What problem does this feature request solve?
The Speech to Text block returns one large block of text with no clear indication of who is speaking.
What is the use case for this feature?
We analyse long audio recordings of interviews, discussions, focus groups, etc. To analyse these effectively, we need to know when the speaker changes and who says what (speaker 1, speaker 2, etc.).
Please describe the functionality of this feature request.
Insert speaker tags into the text that the Speech to Text block returns from ElevenLabs Scribe, or return the JSON that contains these details.
GPT-4o-transcribe is limited because it does not accept long audio files. Scribe does accept them, but the block currently returns only the text output, not the details of who was speaking.
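To illustrate the kind of output we have in mind, here is a minimal sketch in Python. It assumes the Scribe JSON contains a list of word entries, each with a "text" field and a "speaker_id" field, and that spacing between words arrives as separate entries. Those field names, the sample data, and the helper tag_speakers are assumptions for illustration, not a description of how the block is implemented.

```python
from typing import Iterable


def tag_speakers(words: Iterable[dict]) -> str:
    """Group consecutive entries by speaker and prefix each turn with a tag."""
    turns = []  # each item is [speaker_id, accumulated_text]
    for word in words:
        # Assumed field names; entries without a speaker fall back to "unknown".
        speaker = word.get("speaker_id", "unknown")
        text = word.get("text", "")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous entry: extend the current turn.
            turns[-1][1] += text
        else:
            # Speaker changed: start a new turn.
            turns.append([speaker, text])
    return "\n".join(f"{speaker}: {text.strip()}" for speaker, text in turns)


# Hypothetical sample shaped like the assumed word-level output.
sample = [
    {"text": "Hello", "speaker_id": "speaker_0"},
    {"text": " ", "speaker_id": "speaker_0"},
    {"text": "there.", "speaker_id": "speaker_0"},
    {"text": " ", "speaker_id": "speaker_1"},
    {"text": "Hi!", "speaker_id": "speaker_1"},
]

print(tag_speakers(sample))
```

Either this kind of tagged text or the raw JSON would let us split a transcript by speaker downstream.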