Published on Dec 16, 2015
Human beings extract a lot of information about their environment using their ears. In order to understand what information can be retrieved from sound, and how exactly it is done, we need to look at how sounds are perceived in the real world.
To do so, it is useful to break the acoustics of a real world environment into three components: the sound source, the acoustic environment, and the listener:
this is an object in the world that emits sound waves. Examples are anything that makes sound - cars, humans, birds, closing doors, and so on. Sound waves get created through a variety of mechanical processes. Once created, the waves usually get radiated in a certain direction. For example, a mouth radiates more sound energy in the direction that the face is pointing than to side of the face.
once a sound wave has been emitted, it travels through an environment where several things can happen to it: it gets absorbed by the air (the high frequency waves more so than the low ones. The absorption amount depends on factors like wind and air humidity); it can directly travel to a listener (direct path), bounce off of an object once before it reaches the listener (first order reflected path), bounce twice (second order reflected path), and so on; each time a sound reflects off an object, the material that the object is made of has an effect on how much each frequency component of the sound wave gets absorbed, and how much gets reflected back into the environment; sounds can also pass through objects such as water, or walls; finally, environment geometry like corners, edges, and small openings have complex effects on the physics of sound waves (refraction, scattering).
this is a sound-receiving object, typically a "pair of ears". The listener uses acoustic cues to interpret the sound waves that arrive at the ears, and to extract information about the sound sources and the environment.
A 3D audio system aims to digitally reproduce a realistic sound field. To achieve the desired effect a system needs to be able to re-create portions or all of the listening cues discussed in the previous chapter: IID, ITD, outer ear effects, and so on. A typical first step to building such a system is to capture the listening cues by analyzing what happens to a single sound as it arrives at a listener from different angles. Once captured, the cues are synthesized in a computer simulation for verification.
The majority of 3D audio technologies are at some level based on the concept of HRTFs, or Head-Related Transfer Functions. An HRTF can be thought of as set of two audio filters (one for each ear) that contains in it all the listening cues that are applied to a sound as it travels from the sound's origin (its source, or position in space), through the environment, and arrives at the listener's ear drums. The filters change depending on the direction from which the sound arrives at the listener. The level of HRTF complexity necessary to create the illusion of 3D realistic hearing is subject to considerable discussion and varies greatly across technologies.
The most common method of measuring the HRTF of an individual is to place tiny probe microphones inside a listener's left and right ear canals, place a speaker at a known location relative to the listener, play a known signal through that speaker, and record the microphone signals. By comparing the resulting impulse response with the original signal, a single filter in the HRTF set has been found. After moving the speaker to a new location, the process is repeated until an entire, spherical map of filter sets has been devised.