Add ElevenLabs tagging of speakers to Speech to Text Block

What problem does this feature request solve?

The Speech to Text block returns a large block of text without clearly flagging speakers.

What is the use case for this feature?

We analyse large blocks of audio from interviews, discussions, focus groups, etc. To analyse these effectively, it is important to know when the speaker changes and who says what (speaker 1, speaker 2, etc.).

Please describe the functionality of this feature request.

In the Speech to Text block, insert speaker tags into the text returned by Scribe from ElevenLabs, or return the JSON that contains these details.

GPT-4o-transcribe is limited because it does not accept long recordings. Scribe does accept them, but the block currently returns only the text output, not the details of who was speaking.

Great idea! The block has been updated for Scribe to include an option called “Include Speakers”. When set to yes, the response will be in subtitle-style format:

00:00:00,000 --> 00:00:04,739 [speaker_0]
...built a professional services firm to 35 million in revenue.

00:00:05,039 --> 00:00:10,259 [speaker_1]
Let me guess, spreadsheet guy who thought he could DIY his exit strategy.

00:00:10,439 --> 00:00:13,800 [speaker_0]
You know it, Michael. Classic case of "I built this bus-
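
If you want to process that output further (for example, to group what each speaker said), here is a minimal Python sketch that splits the subtitle-style text into segments. It assumes the output always follows the layout shown above: a header line with two timestamps and a speaker tag in square brackets, followed by one or more lines of text. The names Segment and parse_transcript are made up for this example and are not part of the block or of the ElevenLabs API.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: str    # e.g. "00:00:00,000"
    end: str      # e.g. "00:00:04,739"
    speaker: str  # e.g. "speaker_0"
    text: str


# Header layout assumed from the sample above:
# "HH:MM:SS,mmm --> HH:MM:SS,mmm [speaker_N]"
HEADER = re.compile(
    r"^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3}) \[([^\]]+)\]$"
)


def parse_transcript(raw: str) -> List[Segment]:
    """Split subtitle-style output into one Segment per header line."""
    segments: List[Segment] = []
    current = None
    for line in raw.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            # A header line starts a new segment.
            current = Segment(match.group(1), match.group(2), match.group(3), "")
            segments.append(current)
        elif line and current is not None:
            # Any other non-empty line is text belonging to the current segment.
            current.text = f"{current.text} {line}".strip()
    return segments


sample = """00:00:00,000 --> 00:00:04,739 [speaker_0]
...built a professional services firm to 35 million in revenue.

00:00:05,039 --> 00:00:10,259 [speaker_1]
Let me guess, spreadsheet guy who thought he could DIY his exit strategy.
"""

for seg in parse_transcript(sample):
    print(f"{seg.speaker}: {seg.text}")
```

From there you can filter or group the segments by seg.speaker. If the real output differs from the sample (for instance, a different timestamp format), the HEADER pattern would need adjusting.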

Let me know if that fixes it for you!

Yes, that is perfect. Thank you so much for the quick response.

Regards,

Royden