Translating Audio with Python

Sep 28, 2022

Françoise Gilot, *The Telephone Call*, 1952

“The day will come when the man at the telephone will be able to see the distant person to whom he is speaking” - Alexander Graham Bell

I was asked to start transcribing some calls at work. Some calls are short and others run up to an hour. Some also have a lot of background noise, which makes the speakers hard to hear. Having the calls transcribed by code that could listen better than I can would be much faster and would remove a lot of human error. So I made this:

We start with a .wav file, a standard format for audio work in Python. ffmpeg and pydub need to be installed on your computer before you can run the following code. Links are at the end of the post.
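Since calls can run anywhere from a few seconds to an hour, it can help to check a .wav file before processing it. The sketch below is not part of the original script; it uses only Python's built-in `wave` module, and the function name `describe_wav` is made up for illustration.

```python
import wave


def describe_wav(path):
    """Return basic facts about a .wav file: channels, sample rate, duration."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()  # frames per second
        frame_count = wav_file.getnframes()
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "duration_seconds": frame_count / sample_rate,
    }
```

Calling `describe_wav("call.wav")` (a hypothetical file name) returns a small dictionary, which is enough to tell a short voicemail from an hour-long call.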

Imports

The goal is to run speech recognition and then save what was recognized. AudioSegment lets us load the audio file and cut it into segments, and pydub.silence lets us split the audio on timed intervals of silence. The code essentially runs off of Google Translate.
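To make "splitting on timed intervals of silence" concrete, here is a minimal, library-free sketch of the idea. It is neither the script from this post nor pydub's implementation. It assumes the audio has already been reduced to one loudness reading (in dB) per fixed-size window, and the name `split_on_quiet` and the default values are illustrative. pydub's `split_on_silence` takes similarly named `min_silence_len` and `silence_thresh` parameters and works on real audio.

```python
def split_on_quiet(levels_db, silence_thresh_db=-40.0, min_silence_len=3):
    """Return (start, end) index ranges of the loud stretches in levels_db.

    levels_db holds one loudness reading per fixed-size window of audio.
    A run of at least min_silence_len quiet windows ends the current chunk;
    shorter pauses stay inside the chunk. The end index is exclusive.
    """
    chunks = []
    start = None    # index where the current chunk began, if any
    quiet_run = 0   # consecutive quiet windows seen inside the chunk
    for i, level in enumerate(levels_db):
        if level > silence_thresh_db:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_len:
                # The chunk stops where this run of silence began.
                chunks.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Audio ended mid-chunk; drop any trailing quiet windows.
        chunks.append((start, len(levels_db) - quiet_run))
    return chunks
```

With a longer `min_silence_len`, brief pauses inside a sentence stay in one chunk, while a long gap such as the wait between the ring and the greeting starts a new one.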

Function Call

I recorded a quick voicemail clip in which I said, “Morning, how can I help you?”

As we can see, the recording picked up the phone ringing first, and the code was not able to understand the ring as speech, which is correct. The transcription was a fair representation of what was actually said. I was standing a little way back from the phone, so the audio may not have been very clear.
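The behaviour described here, where a chunk such as the ring yields no text while the spoken chunk is transcribed and saved, can be sketched as a simple loop. This is an illustration under stated assumptions, not the code from the repository: `recognize` is a hypothetical callable standing in for whatever speech-recognition call is used, and it is assumed to raise `ValueError` when it cannot make out any speech.

```python
def transcribe_chunks(chunks, recognize, out_path):
    """Run recognize() on each chunk and save the text, one line per chunk.

    recognize is a hypothetical callable: it returns the text for a chunk
    and raises ValueError when it hears no speech (e.g. a phone ringing).
    """
    lines = []
    for index, chunk in enumerate(chunks, start=1):
        try:
            text = recognize(chunk)
        except ValueError:
            # Keep a placeholder so the transcript still shows the gap.
            text = "[unintelligible]"
        lines.append(f"{index}: {text}")
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write("\n".join(lines) + "\n")
    return lines
```

Keeping a placeholder line for unrecognized chunks makes it easy to see where the ringing or background noise fell in the call.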

https://github.com/jiaaro/pydub

https://ffmpeg.org

https://github.com/adavis-85/Call-Parser

Written by Adam Davis