Basics: Digital Audio

Sample Rates and Bits

16 vs. 24

Graphic at 8 bits per pixel

The number of bits determines the number of levels of gradation within a single sample. A graphic analogy: at higher bit depths you get many shades of gray, while at low bit depths you get only black and white. A larger number of bits per pixel produces a smoother gradation in the image. The same thing happens to audio files, which can sound “grainy” at low bit depths.

Graphic at 4 bits per pixel
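
As a quick numerical sketch, the number of available levels doubles with each added bit, which is why low bit depths look, and sound, so coarse:

```python
# Number of distinct amplitude levels for a given bit depth
for bits in (4, 8, 16, 24):
    print(f"{bits:>2} bits -> {2**bits:,} levels")

#  4 bits -> 16 levels          (very coarse, "grainy")
#  8 bits -> 256 levels
# 16 bits -> 65,536 levels      (CD standard)
# 24 bits -> 16,777,216 levels
```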

The “CD standard” is 16 bits, but more and more recorders and editing systems can record at 24 bits or higher. There are significant advantages to recording at higher bit depths, but costs as well: 24-bit soundfiles take more disc space and more computational cycles to process. So unless your audio will benefit from the increased low-level detail afforded by high bit-depth recording, it may be more efficient to stick to 16 bits, especially if the final delivery of your program will be by FM transmission or internet streaming.
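
A useful rule of thumb: each bit buys roughly 6 dB of dynamic range, so the jump from 16 to 24 bits adds a great deal of room for low-level detail. A quick sketch:

```python
# Approximate dynamic range: about 6.02 dB per bit of resolution
for bits in (16, 24):
    print(f"{bits} bits -> about {6.02 * bits:.0f} dB of dynamic range")

# 16 bits -> about 96 dB
# 24 bits -> about 144 dB
```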

Dither

Whenever any manipulation is made to a digital audio file, the resulting math will often create a number that requires more precision than the available bits can represent. If some sample’s volume were represented by the single digit 7, and the gain were reduced by half, the resulting volume would be 3.5, requiring an extra digit to represent the value. If the system cannot accommodate the decimal place, it will round off to the nearest integer. This truncating of the “digital word” can lead to coarse sound and loss of low-level detail. Working at 24 bits pushes these inaccuracies down to very low levels. Another solution is “dither,” which applies random values to the lowest bit, so that a sample that should be 3.5 becomes a 3 or a 4 at random intervals, rather than always being represented by a 3. This process smoothes the sound, but adds a tiny bit of noise. One should apply dither to any signal that has been changed in any way: volume, EQ, fades, etc. But be careful not to dither the same signal too many times, to avoid building up dither noise.
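
Here is a minimal sketch of the idea, using simple rectangular (uniform) dither on the 3.5 example above; real dithering tools use more sophisticated noise shaping, so treat this only as an illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = np.full(10_000, 3.5)        # values that fall between two integer steps

truncated = np.trunc(samples)                              # plain truncation: every value becomes 3
dithered = np.floor(samples + rng.random(samples.size))    # add noise in [0, 1), then truncate

print(truncated.mean())   # 3.0  -- the 0.5 of detail is simply lost
print(dithered.mean())    # about 3.5 -- a random mix of 3s and 4s averages out correctly,
                          # at the cost of a tiny bit of added noise
```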

Sample Rates

The sample rate is the number of “snapshots” of audio that are taken every second. The continuous audio stream is digitally encoded in much the same way that a movie camera captures motion by recording a frame of image many times per second. The higher the sample rate (and bit depth), the closer one can come to accurately representing the original sound. The usual analogy is representing a curve with straight lines: the more blocks you use, the closer you can come to describing the curve.
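
To make that analogy concrete, here is a small sketch (the signal frequency and rates are chosen only for illustration) measuring how far a stair-stepped, sample-and-hold version of a sine wave strays from the true curve at different sample rates:

```python
import numpy as np

def max_step_error(signal_hz=1_000, sample_rate=44_100):
    """Worst-case gap between a sine wave and its sample-and-hold approximation."""
    fine_t = np.linspace(0, 1 / signal_hz, 10_000)           # finely drawn "true" curve
    true = np.sin(2 * np.pi * signal_hz * fine_t)
    sample_t = np.floor(fine_t * sample_rate) / sample_rate   # hold the most recent sample
    held = np.sin(2 * np.pi * signal_hz * sample_t)
    return np.max(np.abs(true - held))

for rate in (8_000, 22_050, 44_100, 96_000):
    print(f"{rate:>6} samples/sec: worst-case error = {max_step_error(sample_rate=rate):.3f}")
```

Real converters reconstruct a smooth waveform rather than a staircase, but the intuition holds: more samples per second leave less room for error.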

44.1k (44.1 thousand samples per second), 48k, and 96k are the most common sample rates. 44.1k is the standard used for CDs; 48k is common in video. 96k is becoming more common among audiophiles, but as with high bit depths, recording the extra samples takes up much more hard-disc space and often requires multiple digital cables to carry the extra data. One generally does not want to use sample rates below 44.1k because of the “Nyquist frequency”: the audio bandwidth of a sampled signal is limited to half the sampling frequency. So in order to cover the roughly 20 kHz range of human hearing, the equipment must sample at more than 40,000 (40k) samples per second. Reducing the sample rate reduces the sound quality and the bandwidth, and therefore should only be done when absolutely necessary, such as for internet streaming of voice-only sources.
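
The Nyquist relationship is simple enough to tabulate directly; a quick sketch:

```python
# Nyquist limit: usable audio bandwidth is half the sample rate
for rate in (22_050, 44_100, 48_000, 96_000):
    print(f"{rate:>6} samples/sec -> bandwidth up to {rate / 2 / 1000:.2f} kHz")

# 22,050 -> 11.03 kHz (noticeably dull; voice-only at best)
# 44,100 -> 22.05 kHz (covers the roughly 20 kHz range of human hearing)
```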

In most cases, only one sample rate can be active at a time, although some digital editing systems allow multiple sample rates in one project. Therefore it is important to make sure that all your audio sources are recorded at the same sample rate, or you will need to “sample rate convert” some of them.
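
If you do need to convert, most audio editors have this built in. As a minimal sketch, assuming the Python soundfile and scipy libraries are available (the file names here are only placeholders):

```python
import soundfile as sf                      # assumed available; any WAV reader/writer would do
from scipy.signal import resample_poly

audio, rate = sf.read("interview_48k.wav")  # hypothetical 48 kHz source file
if rate != 44_100:
    # Resample by the integer ratio 44100/48000 = 147/160
    audio = resample_poly(audio, up=147, down=160, axis=0)
sf.write("interview_44k1.wav", audio, 44_100)
```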

Clock

Digital signals are made up of individual samples and a clock signal. The clock tells the system exactly when each sample starts. The clock must be stable, must run at the proper sample rate, and must be the only clock driving the system. In most cases, digital equipment derives clock from the incoming digital audio signal, and it is no problem to record from different sources, as long as only one is acting as the master clock at any given time. In a typical set-up, a DAT machine will both feed digital audio into a digital workstation and record the output from that editor. While sound is being played from the DAT, that machine acts as the master clock, and the workstation should be set to external clock. But when playing audio from the workstation to the DAT, the workstation needs to act as the master, and so should be set to internal clock. If the workstation is set to external sync, it will be looking to the DAT machine for clock, but because the DAT is set to record from a digital input, it is looking to the workstation for clock; as a result, it will not record, because it cannot lock to a stable clock. Some professional equipment has a “word clock” input for elaborate systems that use a single, dedicated source of “house sync.”

Types of Connections

There are several different types of digital audio connections, a few of them using the same types of cabling and connectors as analog audio, but the signals are very different, so be careful not to confuse them. The most common types of digital connections are AES/EBU and S/PDIF. AES/EBU usually uses 3-pin XLR cables that look the same as microphone cables. While a standard mic cable will generally make a usable connection, one should try to use cables designed for digital data, with the proper impedance, and the shortest length possible.

S/PDIF (Sony/Philips Digital Interface Format) can be sent two different ways: coax (usually on RCA cables) or optical (usually on glass or plastic fiber-optic, or “Toslink,” cables). As is the case with AES/EBU, an S/PDIF digital signal can be sent down an RCA cable that you would use for audio, but to reduce the likelihood of errors, one should use a 75-ohm cable designed for digital signals.

Just to make things even more confusing, the AES/EBU and S/PDIF standards are very similar, but not quite the same. It is inadvisable to connect one type to the other simply by using adapters on a cable, although in an emergency it would probably pass data. Neither standard is really superior to the other, although AES/EBU uses more robust connectors and better cabling and can send signals over longer distances; in all cases these distances should be kept to a minimum.

Optical S/PDIF uses fiber-optic “wiring,” which has the benefit of not being confused with audio cabling. However, there are two optical standards in common use: S/PDIF, which sends two channels, and ADAT “Lightpipe,” which handles eight channels. Some devices can be set to work with either standard, but they are not the same, and one cannot connect S/PDIF optical outputs to Lightpipe inputs or vice versa.

One AES/EBU or S/PDIF cable carries two channels of audio, which can be a stereo signal, or two discrete audio streams.

Storage

Digital signals can be encoded and stored in many different file formats. Record your initial sounds at as high a quality as is practical (higher sample rates and bit depths use more disc space). Resist the urge to record or store your elements at low sample rates or in a “lossy” compressed format such as MP3. Save those compressions for later. Record and save your individual elements and final mix as at least 16-bit, 44.1k AIFF, WAV, or SD2 files. Recorded at 16-bit, 44.1k, audio files take up approximately 10 megabytes per stereo minute, or 5 MB per track-minute.
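
That figure falls straight out of the arithmetic; a quick sketch:

```python
# Uncompressed PCM size: sample_rate * bytes_per_sample * channels * seconds
sample_rate = 44_100          # samples per second
bytes_per_sample = 2          # 16 bits
channels = 2                  # stereo
seconds = 60

size_bytes = sample_rate * bytes_per_sample * channels * seconds
print(f"{size_bytes / 1_000_000:.1f} MB per stereo minute")     # about 10.6 MB
print(f"{size_bytes / 2 / 1_000_000:.1f} MB per track-minute")  # about 5.3 MB
```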

For delivery of your final mix, especially on the internet, the large savings in file size offered by various compression schemes are very attractive, but one should save a full-bandwidth version first, then make copies as MP3, RealAudio, or Windows Media files. Keep in mind that those compression schemes (as well as the ATRAC system used by MiniDisc recorders) throw away data, and although the algorithms are getting better and better at retaining audio quality, there is always some loss of resolution, and your sound can suffer, especially if it is encoded multiple times. There is a data-reduction program called “Shorten” that reduces file sizes significantly while retaining all of the original audio information. This can be very effective, as long as all parties involved have the appropriate software to encode and decode the “Shorten” files.

Jeff Towne

About Jeff Towne

During more than 25 years as a producer of the nationally-syndicated radio program Echoes, Jeff Towne has recorded interviews and musical performances in locations ranging from closets to cathedrals, outdoor stages to professional studios, turning them into radio shows and podcasts. Jeff is also the Tools Editor for Transom.org, a Peabody Award-winning website dedicated to channeling new voices to public media. At Transom, he reviews field recorders, microphones and software, helping both beginning and experienced audio producers choose their tools. In his spare time, Jeff will probably be taking pictures of his lunch in that little restaurant with the strange name that you've been wondering about.

