Tweaking Levels – Voice Processing
by Gregg McVicar
Now that Jeff has explained the basics of compression, limiting, and equalization, let’s talk about how to use these classic tools to create radio that sounds warm and clear, even on car radios and Internet streams. Rest assured that most recordings made on modest broadcast-quality equipment will sound just fine on the air with little or no technical fussing. A good mic, well placed, with careful attention to levels, is far more critical to success than anything that follows.
But as our work becomes more complex, we’ll be encountering a wider variety of voices and trying to blend audio from disparate sources into one cohesive sound. There’s an art to it, but also some science, so we’ll need to focus on how audio levels will be perceived by our listeners, wherever they are.
The announcer recorded on a studio mic, shown on the left, has more energy (red and yellow) in the low (bass) frequencies, and much more response in the treble range above 3 kHz, than the voice recorded from the phone on the right, which has a constricted frequency range.
And that’s where our story begins, inside the human ear and the parts of the brain that operate the most miraculous of our senses. Highly evolved for human communication (as well as navigation and survival), the ear is especially sensitive to the range of frequencies produced by the human voice, approximately 500 Hz to 5 kHz. So it’s no surprise that audio equipment is designed to function most effortlessly in this same frequency range. The telephone is a perfect example: for the sake of economy, the phone transmits only those frequencies most essential to vocal communication. Broadcast gear is also designed to handle this vocal range easily, plus frequencies above and below that carry warmth and character.
Music, of course, presents its own challenges, as it occupies the entire bandwidth of human hearing (approximately 20 Hz to 20 kHz), not just the most sensitive range. But as producers, we don’t want to constrict our palette of choices to “telephone quality,” nor do we want our message muddied in the mix or buried below the din of background automotive sounds. Our goal is to manage “apparent loudness”: presenting audio to our listeners that has reasonably consistent levels, even when the ear will perceive some content as “louder” than other content just by virtue of where it falls on the frequency spectrum. Our challenge, then, is to optimize intelligibility while preserving as much “naturalness” as possible; the second goal is one of the things that distinguishes us from our brethren in commercial radio.
And what is natural sound? In the case of an interview, on playback it should sound like that person is right there in the room with you: not a thin and hollow facsimile, nor overly hyped like a burger ad or a morning shock jock. When someone walks in on a dubbing session and says, “Oh, I thought someone was in here with you,” I take that as a compliment. But I also know that this same recording might need a little help to sound just as good in the car.
I like my tracks to be:
- Warm and clear
- Thick but not hard
- Detailed but not brittle
- Smooth and airy on top, but not sibilant or splattery when broadcast.
Think Terry Gross or Tony Kahn. Every voice doesn’t have to sound HUGE, but it must have enough weight and clarity to hold the listener — a person who is accustomed to hearing NPR, PRI, BBC and Pacifica daily and doesn’t want to have to reach over and adjust the volume every time our stuff comes on the radio.
Here’s the test: can you hear every word of your piece while listening to it in the kitchen at low level? Or while driving in the car at a fairly low level? How about on your home stereo? If some things drop out and can’t be clearly understood, or sound boomy and muddy, or blast through higher than the surrounding material, then you need to turn some things down and then maybe apply some EQ, compression and limiting to even everything out. It’s easy if you know what to do.
Here’s How: Let’s say you’re editing together a piece that includes a soft male voice, a more strident female voice, phone sound and music from a CD. All the levels look good on the meters, yet it still sounds uneven.
What’s going on? Nothing unusual, really; it’s just that our ears are less sensitive to the low-pitched male voice and more sensitive, in varying degrees, to the other elements.
We can quickly solve this problem with one or more of three methods:
1) using our knowledge of perceived loudness to adjust relative levels using our ears, not the meters
2) rolling-off low frequencies that generate level but not intelligibility
3) narrowing the dynamic range in a way that boosts softer sounds and holds back unnecessary peaks (using compression, upward compression and peak limiting).
Let’s start with the phone sound. It contains only those frequencies to which the ear is most sensitive, so, all things being equal, it just seems louder. Let’s bring the level down a few dB. How does that sound? Does it match better with the other material? Maybe we should nudge the female voice down a tad as well. But the male voice is still lost in the mud. So let’s fix it by applying some EQ, taking out a little of the bass that is making the meters move but not really imparting intelligibility. We do this by applying a high-pass filter, removing approximately 3 dB in the range under 100 Hz. Play with it, adjusting the “knee” frequency and the amount of dB reduction until it sounds right. You’ll want to keep the warm, round sound of his voice while losing the boominess. After removing some of this low-frequency energy, when you bring the level up you’ll have a punchier sound that’s a better match with the other elements.
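For the digitally inclined, that low-frequency roll-off can be sketched in a few lines of Python. This is only an illustration, not any particular plugin’s algorithm; it assumes NumPy and SciPy are available, and the 100 Hz cutoff and second-order slope are just starting points to tweak by ear:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def rolloff_lows(samples, sr, cutoff_hz=100, order=2):
    """Gentle high-pass: attenuates energy below cutoff_hz.

    A low-order Butterworth gives a soft "knee" rather than a
    brick-wall cut, keeping the voice warm while losing boominess.
    """
    sos = butter(order, cutoff_hz, btype="highpass", fs=sr, output="sos")
    return sosfilt(sos, samples)

# One second of synthetic "boomy" voice: 80 Hz rumble plus 1 kHz content.
sr = 44100
t = np.arange(sr) / sr
voice = 0.5 * np.sin(2 * np.pi * 80 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

filtered = rolloff_lows(voice, sr)
# The 80 Hz rumble is attenuated; the 1 kHz content passes nearly untouched.
```

A lower order gives a softer knee; raising the cutoff or the order takes out more boom. Adjust both by ear, just as you would the controls on a plugin.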
Different microphones may seem “louder” as well. The narrator’s “voice of God” condenser may have more apparent loudness than a handheld dynamic mic. It’s important to make your adjustments and comparisons at a fairly low monitoring level (and not through headphones) to get the best “real world” comparison of apparent loudness.
The human ear is also more sensitive to audio that is dense, or “compressed.” This is because all those details that enhance intelligibility, the sounds and syllables that are soft but important, like “f” or “th,” are brought to the surface where they can be heard, even at low levels. This is why telephones, TV commercials, and pop music are heavily compressed, so that they seem to “pop out” of the speakers. So after you’ve adjusted levels mindful of frequency-based apparent loudness, turn your attention to dynamics.
Just looking at the density of the waveform on your DAW will give you a good indication of how compressed your raw material already is. I call it the “chewy filling.” If the filling is fatter than the furry peaks top and bottom, then the signal is already pretty dense and will sound louder than other recordings that have less pronounced modulation and lots of little thin peaks (such as drums). An interview that ranges from a whisper to raucous laughter will also present serious dynamic challenges.
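If you want a number to go with the eyeball test, the peak-to-RMS ratio (the “crest factor”) quantifies how dense a recording already is. Here is a minimal sketch, assuming NumPy; the two signals are synthetic stand-ins for dense and spiky material, not real recordings:

```python
import numpy as np

def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB: a rough gauge of audio density.

    Heavily compressed material (a fat "chewy filling") has a low
    crest factor; spiky, dynamic material such as drums has a high one.
    """
    peak = np.max(np.abs(samples))
    rms = np.sqrt(np.mean(samples ** 2))
    return 20 * np.log10(peak / rms)

t = np.linspace(0, 1, 44100, endpoint=False)

# A steady sine is already dense: crest factor of about 3 dB.
sine = np.sin(2 * np.pi * 440 * t)

# Sparse spikes (drum-hit-like) are far peakier relative to their average.
spikes = np.zeros(44100)
spikes[::4410] = 1.0

print(round(crest_factor_db(sine), 1))    # prints 3.0
print(round(crest_factor_db(spikes), 1))  # much higher: thin peaks, low average
```

The lower the number, the louder the file will sound at the same meter reading, which is exactly why dense material jumps out of a mix.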
- Wide dynamic range: lower perceived loudness
- Same file limited, with make-up gain added: lower dynamic range, sounds “louder”
- Typical commercial music recording, very compressed: sounds very “loud”
- Dialog with explosive laughter: needs compression or limiting to increase average levels
Let’s remember too that most commercial CDs have been tweaked to perfection with EQ and compression, so that the resulting sound is often too bright and dense to mix smoothly under a voiceover. This is when you need to watch those levels and then consider adding some compression to your voice track to let it compete with the processing already in the prerecorded music.
As we know, compression (and its more intense big brother, limiting) holds back the loudest parts of the signal so that the overall level can be brought up without exceeding 100% modulation. Some tools do this for you, boosting level as you pull down the peaks. This is called “maximizing.”
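The two-step maximizing idea, pull the peaks down and then bring the whole level up, can be sketched crudely in Python. Real limiters use look-ahead and smooth gain envelopes rather than the hard clipping shown here; this sketch, with assumed threshold and ceiling values, only illustrates the order of operations:

```python
import numpy as np

def maximize(samples, ceiling=0.98, limit_at=0.5):
    """Crude "maximizer" sketch: hard-limit peaks, then normalize up.

    limit_at: level above which peaks are clamped (softer material
    below it is untouched). ceiling: normalize just under full scale
    to prevent overshoot. A real limiter shapes gain smoothly instead
    of clipping; this only shows the two-step idea.
    """
    limited = np.clip(samples, -limit_at, limit_at)  # hold back the peaks
    peak = np.max(np.abs(limited))
    return limited * (ceiling / peak)                # make-up gain

sr = 44100
t = np.arange(sr) / sr
quiet_voice = 0.15 * np.sin(2 * np.pi * 220 * t)
loud_burst = np.where((t > 0.4) & (t < 0.5),
                      0.8 * np.sin(2 * np.pi * 220 * t), 0.0)
track = quiet_voice + loud_burst

out = maximize(track)
# Peaks now sit just under the ceiling, and the quiet passages come up with them.
```

Notice that the quiet passages gain almost 6 dB here without the peaks ever exceeding the ceiling, which is the whole point of the technique.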
My favorite “desert island” tool is the Waves L1 limiter, which brings out the warmth and detail of a voice without nasty artifacts. I use a little bit of it on almost every voice and it is simply magic.
But by carefully applying small amounts of compression and limiting to the loudest passages, then normalizing the track back up to just under 100% (to prevent mathematical overshoot) anyone can “maximize” their audio. A useful and sophisticated technique used in film dialog is “upward compression” — others call it an “upward expander.” Quite simply, it gently raises the level of softer passages without changing the louder parts. Some DAW plugins offer this function — Waves C1 and C4, the Arboretum Ionizer, and others. Check your user manual for details.
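Upward compression can also be sketched in code: estimate a loudness envelope, then add gain only where the envelope falls below a threshold. This is a simplified illustration with assumed threshold and boost values, not the algorithm inside Waves C1/C4 or the Ionizer; plugin versions add attack and release smoothing:

```python
import numpy as np

def upward_compress(samples, sr, threshold=0.1, max_boost_db=6, win_ms=50):
    """Sketch of upward compression: raise soft passages, leave loud ones.

    A sliding RMS envelope finds quiet stretches; passages below the
    threshold get up to max_boost_db of gain, tapering to 0 dB at and
    above the threshold.
    """
    win = max(1, int(sr * win_ms / 1000))
    # Sliding RMS envelope via a moving average of the squared signal.
    kernel = np.ones(win) / win
    env = np.sqrt(np.convolve(samples ** 2, kernel, mode="same"))
    # Gain: max_boost_db where the envelope is near zero, 0 dB at threshold.
    boost_db = max_boost_db * np.clip(1 - env / threshold, 0, 1)
    return samples * 10 ** (boost_db / 20)

sr = 44100
t = np.arange(sr) / sr
soft = 0.05 * np.sin(2 * np.pi * 330 * t[: sr // 2])
loud = 0.8 * np.sin(2 * np.pi * 330 * t[sr // 2 :])
signal = np.concatenate([soft, loud])

out = upward_compress(signal, sr)
# The soft half is boosted; the loud half passes through essentially unchanged.
```

The key property, and the reason film mixers like it, is visible here: the loud passage is untouched, so nothing gets squashed, yet the whisper-level material comes up into intelligibility.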
Waves C4
The important thing is to use these tools only in small doses; if the processing distracts from the warmth and intelligibility of the recording, it’s too much. You can always go back and add a little more, but once audio is over-processed, it’s garbage. It’s also important to audition your work in a wide variety of circumstances: on studio monitors, via headphones, in the car, on boom boxes, and so on. Each will reveal something different about the balance of frequencies and dynamics in your mix.
But most importantly, listen to your work on the air and the Internet. Your carefully balanced feed will be affected by the processing at the station and the Net server. That lovely mix that left your studio may sound overdriven on that Real Audio stream or have sizzly, splattery highs once subjected to the indignities of FM pre-emphasis and compression. If you’ve done your homework, your audio should sail through these other systems uneventfully, sounding just as good as the network feeds that come down the satellite all day long.
Or even better!
Gregg McVicar is the host and producer of Earthsongs, a nationally syndicated program of modern music from Native America, and has over three decades of creative radio production experience.