Speech in Quiet and Speech in Noise:
Audio Exemplars and Some Recommendations for Enhancing the Quality of Oral History Recordings
by Brad Rakerd
Department of Communicative Sciences and Disorders
From an acoustical standpoint, no two oral history recordings are ever exactly alike. One reason for this is that there is wide variation in the equipment that users employ when making their recordings, and perhaps even wider variation in the methods that they employ. Recommendations are offered here to help guide good decision making about both equipment selection and recording methods. Those recommendations are motivated by a series of observations regarding the physical characteristics of natural speech.
Comparison: Speech Sounds of Different Intensities
The spectral and intensive properties of speech can vary substantially from speaking situation to speaking situation, from talker to talker, and even from word to word. Still another source of natural variation is shown below. The consonant and vowel sounds that make up the spoken words of a language differ greatly in their overall sound power. The vowel sounds of English, for example, are all relatively intense (see/listen to the vowels in the top row). Consonants like “l”, “r”, “m”, and “n” (middle row) have an intermediate intensity, and consonants like “f” and “th” (bottom row) are notably weak.
Comparison: Speech in Quiet and in Increasing Levels of Background Noise
One implication of this variation in intensity across the different speech sounds is that some sounds are more vulnerable than others to interference from any noise that may be present in a recording. The recordings below provide some sense of this. In the absence of any background noise, all of the speech sounds are audible and the message is clear throughout (recording #1). But as the noise level increases (recordings #2 and #3), more and more of the speech sounds become difficult or impossible to hear, leaving only the most powerful sounds to carry the message. Not surprisingly, both misunderstanding and frustration can result when listening takes place under noisy conditions.
An added point is that noise levels that are low enough to go unnoticed by most listeners may nevertheless be both noticeable and interfering if a listener has a hearing loss or some other communication challenge.
Speech in Quiet [Listen]
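The vulnerability of weaker sounds to noise can be put in rough numerical terms with a signal-to-noise ratio (SNR) expressed in decibels. The sketch below is only an illustration: the amplitude values in SOUND_RMS and the three noise levels are hypothetical numbers chosen to mirror the strong, intermediate, and weak ordering described above. They are not measurements taken from the recordings on this page.

```python
import math


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in decibels, computed from two RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)


# Hypothetical RMS amplitudes in arbitrary units (assumed for illustration):
# a strong vowel, an intermediate consonant such as "m", and a weak
# consonant such as "f".
SOUND_RMS = {"vowel": 1.0, "nasal": 0.3, "fricative": 0.05}


def snr_table(noise_rms):
    """SNR in dB for each illustrative sound class at one noise level."""
    return {name: snr_db(level, noise_rms) for name, level in SOUND_RMS.items()}


if __name__ == "__main__":
    # Hypothetical quiet, moderate, and loud background noise levels.
    for noise in (0.001, 0.03, 0.2):
        row = snr_table(noise)
        print(noise, {name: round(value, 1) for name, value in row.items()})
```

A negative SNR means the noise is stronger than the speech sound. With these assumed values, the weak consonant falls below the noise at the loudest setting while the vowel remains well above it, which is the pattern described for the noisier recordings.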
Comparison: Speech Recorded With and Without Clipping of the Signal
The wide range of intensities found in natural speech has important implications for anyone who is charged with setting the proper level of “gain” on microphone preamplifiers and any other audio gear that is to be used when recording a speech event. If the gain is set too low, then the weaker speech sounds tend to be lost in noise of various kinds, as shown above. But if the gain is set too high, then a different problem arises: the more intense speech segments may be amplified beyond the limits of the recording system, with the consequence that they become “clipped.” Clipping distorts the speech signal and results in a harsh, unsatisfactory, and sometimes difficult-to-understand recording, as can be heard by comparing the clean and clipped versions of the sentences below.
Sentence One (Clean Recording) [Listen]
Sentence One (Clipped) [Listen]
Sentence Two (Clean Recording) [Listen]
Sentence Two (Clipped) [Listen]
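For readers who want to see what clipping does to a waveform, here is a minimal sketch. It assumes a recording system whose limits are -1.0 and +1.0 (a common convention for normalized digital audio, adopted here as an assumption) and uses a simple sine wave as a stand-in for an intense speech segment. The function names are hypothetical, and this is not the method that was used to produce the clipped sentences above.

```python
import math


def sine_wave(amplitude, n_samples=1000, cycles=5):
    """Generate a sine wave with the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n_samples)
            for i in range(n_samples)]


def apply_gain_db(samples, gain_db):
    """Scale samples by a gain expressed in decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def hard_clip(samples, limit=1.0):
    """Flatten every sample that falls outside [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


def clipped_fraction(samples, limit=1.0):
    """Fraction of samples that exceed the limit and so would be clipped."""
    return sum(1 for s in samples if abs(s) > limit) / len(samples)


if __name__ == "__main__":
    speech_like = sine_wave(0.5)            # stand-in for an intense vowel
    for gain_db in (0.0, 12.0):             # modest gain versus too much gain
        amplified = apply_gain_db(speech_like, gain_db)
        print(gain_db, round(clipped_fraction(amplified), 2))
```

With no added gain the waveform stays inside the limits. With 12 dB of gain its peaks are flattened for a large share of each cycle, and it is this flattening that produces the harsh quality of a clipped recording.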
Here are two guidelines for the selection of any hardware component in a speech recording system. First, it should have a frequency response that is relatively flat and that extends out to at least 8000 Hz. Second, it should have a dynamic range (sometimes expressed as a signal-to-noise ratio) of at least 60 dB. A system assembled from components that meet these specifications should be capable of making a satisfactory recording of the speech of almost any person who is speaking at a conversational level and who is in reasonable proximity to a microphone.
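The two figures above can be translated into more familiar terms with a little arithmetic. The sketch below assumes a digital recorder that uses linear PCM. That assumption, along with the function names, is an addition made for illustration and is not part of the guidelines themselves. The sketch relies on two standard results: the sampling rate must be at least twice the highest frequency to be captured, and ideal linear PCM provides roughly 6 dB of dynamic range per bit.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert a level difference in decibels to an amplitude ratio."""
    return 10 ** (db / 20)


def min_sample_rate_hz(highest_frequency_hz):
    """Lower bound on the sampling rate: twice the highest frequency."""
    return 2 * highest_frequency_hz


def min_bits_for_dynamic_range(dynamic_range_db):
    """Approximate bit depth needed, at about 6.02 dB per bit."""
    db_per_bit = 20 * math.log10(2)
    return math.ceil(dynamic_range_db / db_per_bit)


if __name__ == "__main__":
    print(db_to_amplitude_ratio(60))        # strongest-to-weakest amplitude ratio
    print(min_sample_rate_hz(8000))         # samples per second
    print(min_bits_for_dynamic_range(60))   # bits per sample
```

Under these assumptions, a 60 dB dynamic range corresponds to an amplitude ratio of about 1000 to 1 between the strongest and weakest signals the system can handle, and an 8000 Hz bandwidth calls for a sampling rate of at least 16,000 samples per second. Treat these as lower bounds rather than recommended settings.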
Another equally important guideline is that the environment in which speech recordings are to be made should be selected carefully. Find as quiet a room as you can and, as much as possible, minimize all sources of noise within the room. Typical noise sources include heating and air-conditioning systems, computer fans, and “hum” from electrical equipment.
The following speech samples were all recorded with equipment and recording methods consistent with these recommendations. Most listeners find these recordings to sound natural and they find the speech to be clear and straightforward to understand. Sustained listening to recordings with this level of sound quality should be possible for most listeners.
Sentence-Long Speech Samples
Sample #1: The Clown [Listen]
Sample #2: The Knife [Listen]
Sample #3: The House [Listen]
Sample #4: The Tomatoes [Listen]
Sample #5: The Car [Listen]
Importance for Oral History
Why is all of this important for oral historians? At the most basic level, it is important because the acoustics of a recording can determine whether the recorded message is understood as an interviewee intends. Beyond that, the acoustics will determine whether the task of understanding the interviewee’s message is a relatively easy one for a listener or a harder one. And this, in turn, will either encourage or discourage the sort of thoughtful and sustained attention to an oral history record that is needed if it is to be fully appreciated.
Finally, there is an under-recognized issue about accessibility that pertains to all speech recordings, including those made by oral historians. Younger listeners, older listeners, listeners with hearing loss, and listeners who are non-native speakers of an interviewee’s language all require a high level of acoustical fidelity if they are to achieve reasonable understanding of a recorded message. It is now technically possible to make high quality oral history recordings as a matter of routine. One of the notable benefits of doing so will be to materially enhance the accessibility of oral histories for persons who have hearing loss or other limitations on their ability to process speech.
Further Reading on the Physics of Speech
Borden, G. J., Harris, K. S., & Raphael, L. J. (1994). Speech science primer. Physiology, acoustics and perception of speech (3rd ed.). Baltimore: Lippincott Williams & Wilkins.
Denes, P., & Pinson, E. (1993). The speech chain: The physics and biology of spoken language (2nd ed.). New York: Freeman.
Fry, D. B. (1979). The physics of speech. Cambridge: Cambridge University Press.
Pickett, J. M. (1987). The sounds of speech communication. A primer of acoustic phonetics and speech perception. Austin: Pro-Ed.
Citation for Article
Rakerd, B. (2012). Speech in quiet and speech in noise: Audio exemplars and some recommendations for enhancing the quality of oral history recordings. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Museum and Library Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/speech-in-quiet-and-speech-in-noise/.
Rakerd, Brad. “Speech in Quiet and Speech in Noise: Audio Exemplars and Some Recommendations for Enhancing the Quality of Oral History Recordings,” in Oral History in the Digital Age, edited by Doug Boyd, Steve Cohen, Brad Rakerd, and Dean Rehberger. Washington, D.C.: Institute of Museum and Library Services, 2012, http://ohda.matrix.msu.edu/2012/06/speech-in-quiet-and-speech-in-noise/
This is a production of the Oral History in the Digital Age Project (http://ohda.matrix.msu.edu) sponsored by the Institute of Museum and Library Services (IMLS). Please consult http://ohda.matrix.msu.edu/about/rights/ for information on rights, licensing, and citation.