A conversation with Kenan Cerimagic, Executive Producer for Radio Free Europe
Kenan Cerimagic, executive producer of video at the Balkan Service of Radio Free Europe/Radio Liberty, has been testing Google Cloud’s speech-to-text tool for transcription with his team this month.
While they haven’t finished analyzing the results, Cerimagic said early signs show that this tool could significantly help support their newsroom — especially because it had something that few other services offered: transcription for their local languages.
For a long time, Cerimagic and his team had relied on Otter.ai to transcribe interviews, but now they’re exploring alternatives that can transcribe their five Balkan languages. Full transcriptions with time stamps are also important to them, as the multimedia team shares content across platforms like radio, online and social.
We spoke with Cerimagic about his testing of Google Cloud Speech-to-Text, why a good transcription service matters for newsrooms — especially those with a small staff — and what advice he has for others.
Lytle: Tell me more about why transcription services are helpful for your work as a video producer?
Cerimagic: I love to play with the new tools and toys and try to figure out how to use them. Because, [like] many of the companies in the region, we are always lacking staff. So you have to rely on technology to help you out. Transcription can greatly benefit us in terms of making things faster because we lose [an] enormous amount of time on transcribing.
Lytle: What have you learned so far during your trials of Google Cloud’s speech-to-text service?
Cerimagic: For the Balkan languages (we have five languages here), it depends on the quality of the audio or video. For example, if the journalist speaks using standard language, in a clear tone, and it’s recorded in the booth, it’s more or less 80-90% accurate. But if there is natural sound, if there is music, if there is noise, it reduces the quality significantly. And when you have vox pops or people who don’t use standardized language, it creates a real problem.
Also, in English, you can ask for punctuation. In our languages you don’t have that. Still, when you compare the staff you actually need to transcribe a one-hour interview by hand with having something that gives you 50, 60, 70% of the transcript, it gives you a good head start.
We’ve run the test maybe 10 or 15 times. It’s not going to save our lives, but it will make our lives better or easier.
Lytle: If you are going to pay for a transcription service, what makes it worth the cost?
Cerimagic: It depends how you calculate. I like to calculate in head count times working hours, so you can calculate how much you pay a journalist to do the same work. If you pay 100 bucks per hour, and they spend one hour transcribing 10 minutes [of audio], that’s basically a hundred bucks. And you can get it for a quarter [about 24 cents for those 10 minutes] with Google speech-to-text at $0.024 per minute.
Then, you have to calculate the accuracy. I expect in six months to a year that we will be relying on the AI transcription services because it develops so fast. We currently spend the most time editing and subtitling stuff. Now, for the English-speaking global productions, Adobe Premiere is doing all these things for them already. So that’s going to speed up production by 30% to 40%.
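For readers who want to run the same comparison with their own numbers, the arithmetic in Cerimagic’s answer can be sketched in a few lines of Python. The function names and the optional review_hours parameter are illustrative assumptions, not part of any Google tool; the figures ($100 per hour, one hour of work per 10 minutes of audio, $0.024 per minute) are the ones quoted in the interview and may not match current pricing.

```python
def manual_cost(audio_minutes, hourly_rate, audio_minutes_per_work_hour):
    """Cost of a person transcribing by hand.

    audio_minutes_per_work_hour is how many minutes of audio one person
    gets through in an hour of work (10 in the interview's example).
    """
    work_hours = audio_minutes / audio_minutes_per_work_hour
    return work_hours * hourly_rate


def machine_cost(audio_minutes, price_per_minute, review_hours=0.0, hourly_rate=0.0):
    """Cost of an automated transcript, plus optional human clean-up time.

    review_hours is a hypothetical extra: time a journalist spends
    correcting the machine output, which grows as accuracy drops.
    """
    return audio_minutes * price_per_minute + review_hours * hourly_rate


# Figures quoted in the interview.
by_hand = manual_cost(audio_minutes=10, hourly_rate=100, audio_minutes_per_work_hour=10)
by_machine = machine_cost(audio_minutes=10, price_per_minute=0.024)

print(f"By hand:    ${by_hand:.2f}")
print(f"By machine: ${by_machine:.2f}")
```

With the interview’s figures this prints $100.00 by hand and $0.24 by machine. Passing a non-zero review_hours shows how quickly correction time, rather than the per-minute price, comes to dominate the total when the audio is noisy or the language is less well supported.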
Lytle: What advice would you have for people looking for a transcription service?
Cerimagic: Test it and play with it. The technology advancements are so fast that even a month means a lot. The machines are learning, so the bigger the pool, the more accurate it will get. So, for example, even if I’m not satisfied with Albanian transcription, I would do the test again in a month or two when more people may be using it. If the pool of people using it is pretty small, then you don’t have enough data for the machine to learn. That’s the reason English transcription is way better. Punctuation is even better than when you transcribe. It will put the comma really where it’s supposed to go, even the Oxford comma. Just test it.
This interview has been edited for clarity and brevity.