
Horizontal Localization of Sound in Humans

Flint

How we locate sound in nature is the basis for how well stereo imaging works AND is the core concept for surround sound systems.

The way the brain processes the aural signals from our ears is complicated, but we know some things extremely well based on research that goes back to the 1920s and has been refined ever since. Researchers are still working to master sound localization in order to create headphone solutions for virtual reality and to improve the lives of people with hearing and cognitive impairments.

Basically, we instinctively and without any conscious awareness use two methods to pinpoint the location of a sound on the horizontal plane.

Interaural delay

First is the difference in the arrival time of the sound at our ears. Our brains rely on very high-order skills to process the difference in time between the excitement of one ear and the other. A short sound, like a bump, click, knock, snap, tap, etc. to the right of us will reach the right ear a fraction of a millisecond before it reaches the left ear. Our brains are VERY good at recognizing the exact difference in that arrival time and fairly accurately estimating how far to the right the sound came from. That's for one click. If there are additional clicks from the same location, we will ever so slightly move our heads and recognize how the time difference changes, and we will keep estimating the location on subsequent clicks until we can home in on it. If we really want to pinpoint the sound, we will turn our heads until the delay is no longer present; at that point the sound should be directly in front of us, and we should be able to see the source by looking straight ahead, changing our focus point, and scanning up and down (and a tiny amount side to side) to make up for limitations in our aural localization skills and any acoustical artifacts which might be reducing the accuracy of our hearing.
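If you want to play with the numbers, here is a rough Python sketch of the simplest version of that geometry, where the extra path to the far ear is approximated as ear spacing times sin(azimuth). The 0.18 m spacing, the 343 m/s speed of sound, and the function names are my own illustrative choices, not anything from the research mentioned above:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, speed of sound in air at room temperature (assumed)
EAR_SPACING = 0.18       # meters, roughly the 7 inches mentioned below (assumed)

def itd_for_azimuth(azimuth_deg: float) -> float:
    """Interaural time difference (seconds) for a distant source at the given
    azimuth, using the simplest path-difference model: d * sin(theta) / c."""
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

def azimuth_for_itd(itd_s: float) -> float:
    """Invert the model: estimate azimuth (degrees) from a measured delay."""
    max_itd = EAR_SPACING / SPEED_OF_SOUND
    ratio = max(-1.0, min(1.0, itd_s / max_itd))
    return math.degrees(math.asin(ratio))

for angle in (0, 15, 30, 60, 90):
    print(f"{angle:3d} deg -> ITD = {itd_for_azimuth(angle) * 1e6:4.0f} microseconds")
# Even a source hard to one side (90 deg) arrives only about 525 microseconds earlier.
```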

This use of delay time is limited to frequencies where the distance between our ears is less than one full wavelength of the sound source. So, if our ears are about 7 inches apart, then our ability to differentiate delay between our ears tops out at a frequency of about 2,000Hz. It also happens that our heads start blocking sound from wrapping around to the far ear at about the same frequency, so frequencies above roughly 2,000Hz are attenuated or completely blocked from reaching the opposite ear. As the frequency gets higher, the head becomes more effective at blocking the sound, and as the sound source moves farther to the side, more of the head blocks the sound from reaching the opposite ear. So, like all filters in the real world, the transition is gradual: our brain goes from relying entirely on interaural delay to locate a sound in the midrange to not being able to use time delay at all. It starts getting more difficult around 2,000Hz and gets harder as the frequency increases, until at a certain point we cannot use delay at all; for most humans that point is somewhere around 5,000Hz. It is also believed that this skill can be trained over long periods by using headphones with processed delay information at higher frequencies, which is helpful for fighter pilots who wear headphones and have a pressing reason to have more information about location.
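For what it's worth, the ~2,000Hz figure is just the frequency whose wavelength equals the ear spacing. Here's the arithmetic as a tiny Python sketch, assuming sound travels about 343 m/s in room-temperature air and taking the 7 inches at face value:

```python
SPEED_OF_SOUND = 343.0        # m/s in air at room temperature (assumed)
EAR_SPACING = 7 * 0.0254      # 7 inches converted to meters, about 0.178 m

# Frequency whose wavelength equals the distance between the ears:
crossover_hz = SPEED_OF_SOUND / EAR_SPACING
print(f"Wavelength matches ear spacing at about {crossover_hz:.0f} Hz")  # ~1930 Hz
```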

We also know that this ability to rely on interaural delay information to locate sound sources is limited in the bass. However, pinpointing bass sounds is nearly impossible more because of the way bass-frequency acoustic waves propagate in a space AND the nature of bass sound sources. In nature, it is rare for a bass sound to be short, with the majority of the energy at the very front of the waveform. When you drop a rock off a cliff, at 100 feet away the bass is louder in the echoes and reverberant sound than in the initial impact. So our brains are trained to rely less heavily on arrival-time delay when it comes to bass frequencies. In an enclosed space, bass sounds tend to reverberate for over a second, so it is even harder for our brains to locate a source based on low-frequency information alone. Most studies have found that it is extremely rare for any human, no matter how well trained, to locate a sound source at any frequency below about 100Hz or so, and most people cannot locate a bass source below about 150Hz.

Thus, interaural arrival-time information is used by our brains in the range from about 150Hz to about 2,000Hz - allowing for the usual variation in people's physiology and recognizing that these aren't brick-wall cutoffs at the end frequencies.

That's one way we locate sounds.
 
The other way we locate sound is by recognizing the difference in loudness of the source between each ear at higher frequencies.

High frequency loudness method

Because of the way our heads block sound at higher frequencies, our brains can pretty easily compare how loud a sound is in one ear versus the other and conclude the source is toward the right side or the left side. If a light click occurs directly to the right of your head, the left ear may not hear it at all while the right ear hears it clearly. If the click happens more than once, and we can move our heads to change the difference in loudness between the ears, we can deduce how far to the right that click is and adjust our head position until the click appears to be the same loudness in each ear and thus directly in front of us. Once our hearing is convinced it is in front of us, we can look forward and hopefully find the source of the clicking. That's how it works.
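Here is a toy Python sketch of that loudness-comparison idea, nothing more than measuring which channel is louder. The function names, the 12 dB of simulated head shadow, and the 1 dB threshold are all made-up illustrative values:

```python
import numpy as np

def rms_db(x: np.ndarray) -> float:
    """RMS level of a signal in dB relative to full scale (1.0)."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(max(rms, 1e-12))

def crude_direction(left: np.ndarray, right: np.ndarray, threshold_db: float = 1.0) -> str:
    """Guess left / straight ahead / right from the level difference alone."""
    ild = rms_db(right) - rms_db(left)
    if ild > threshold_db:
        return f"to the right (right ear louder by {ild:.1f} dB)"
    if ild < -threshold_db:
        return f"to the left (left ear louder by {-ild:.1f} dB)"
    return "roughly straight ahead (levels nearly equal)"

# Fake a short 5 kHz click that the head shadow attenuates by ~12 dB in the left ear.
t = np.arange(0, 0.01, 1 / 48000.0)
click = np.sin(2 * np.pi * 5000 * t) * np.exp(-800 * t)
print(crude_direction(left=0.25 * click, right=click))
```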

Now, and this is where it gets cool, the frequency where loudness starts to be very effective in locating a sound source is about 700Hz. So, from roughly 700Hz to about 2,000Hz our brains can rely on both the time delay between the ears AND the loudness difference between the ears to determine the location of a sound source.

As the frequency climbs, our ability to locate sound sources at all starts to get weak. Most tests put the limit of locating sound at about 10kHz, but that varies wildly between individuals; the commonly accepted range for that upper limit is about 7,000Hz to about 13,000Hz. Above 15,000Hz, no study has shown a single person capable of locating a sound source.

One thing modern humans are dealing with is an abundance of high-frequency information in the form of long tones rather than clicks, cracks, taps, tings, and so on. In nature, almost no high-frequency pitch is sustained other than the vocalizations of animals and the buzzing of insect wings, and even things like crickets are not really sustained pitches. Today, however, we are surrounded by musical instruments, machines, and noise makers (like wind chimes, church bells, computer notifications, and so on) which present tons of long, high-pitched sustained tones. The human brain is not capable of relying on a sustained high-frequency tone to locate its source. Just go to your stereo and play a 5,000Hz sine wave over a single speaker and you'll experience the bizarre effect of that tone seeming to come from some strange place in the room and then leaping around as you move your head. It is unnerving, really. That's why it can be so hard to find a hum or buzz in your house when you hear it. It is also why you can often turn in the wrong direction when trying to find that pesky mosquito in the room.

That's the second method of locating sound.
 
Why do the facts I've shared above matter to audio enthusiasts?

Well, our ability to locate sound is manipulated by stereo recordings to trick us into hearing full, balanced, wide and deep sound stages from great recordings. The better we can deliver a perfect stereo source, the more effective that illusion can be.

This is why, when we discuss acoustic treatments, the range which requires the most attention to placement in relation to the listener and speakers is the midrange from 300Hz to 5,000Hz. All the talk about "early reflections" is about this sound-localization stuff. If a room is too bright at normal listening levels, one could place high-frequency absorption anywhere in the room and tame the brightness. But in a stereo listening room you want to address the left and right early reflection points first, and treating those alone might provide enough material to calm the brightness. If not, then absorbers on the front wall behind the speakers can help with stereo imaging because of how we locate sound, and that might tame the room enough. If not, then the wall behind the listener can be treated to remove some more early reflections and improve the stereo effect, and that should tame the room enough.

See how that works? If the room is no longer bright, but there is a need to improve imaging and the stereo effect, then it is time to consider diffusors and angled reflectors to address the room issues without over taming the room. This is one hell of a complex world, but it isn't impossible if you consider how our bodies and brains work to hear and locate sounds AND how sound is propagated and reflected around the room on the horizontal plane.
 
All of the stuff above applies to how we locate sounds horizontally in front of us. Humans are pretty bad at determining the height of a sound source and even worse at locating sounds behind us. Put simply, we are engineered to locate sounds in a way that lets our eyes immediately take over determining the nature of the source - be it a threat, food, or just environmental awareness - and that ability depends on head motion while the sound is being made. We rely very heavily on small head movements to get close to pinpointing sound sources.

As such, when listening to a stereo system, we will move our heads a bit at the beginning of a song - it can be as little as a centimeter of change in ear position - after which we understand where everything is: what seems to be centered and how wide the soundstage is going to be. We can then remain fairly motionless until the character of the sound changes, such as when a different recording plays, or when a music collection on shuffle presents a completely different stereo image with each song. Just the same, we do move our heads a bit to determine where everything we are hearing is coming from - be it actual (speaker locations) or virtual (the stereo soundstage).

This is one reason many people struggle with headphones: sounds which were mixed to be perceived as being in front of us instead appear to come from the center of our heads. One of the primary reasons this happens is that we unconsciously move our heads, sometimes a lot and sometimes ever so slightly, expecting the sounds we are hearing to change accordingly. However, when wearing headphones the sounds do not change apparent location when we move our heads. So, our brain has no choice but to place the sound inside our heads, between our ears - the one place where head movements would move the sound sources identically.

Adding crosstalk doesn't change that phenomenon. However, adding crosstalk can slightly relieve the effect by artificially adding some of the real-world effects of having sound coming from two speakers in front of us rather than two headphone drivers on our heads. But for the crosstalk to truly work properly, it must be equalized with a transfer function resembling the muffling of the sound as it is blocked by our head and face, and it must be delayed by the difference in arrival time between our ears as if the drivers were speakers. In general, the interaural arrival-time delay for speakers set up in an equilateral triangle with the listener at the third corner is roughly 0.3ms. So, the signal sent to the opposite earpiece needs an EQ filter which reduces the treble similarly to the way your head would, and it needs to be delayed by about 0.3ms.
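As a very rough sketch of what such a crossfeed stage might look like in DSP code, here is a minimal Python version using a short delay plus a gentle low-pass standing in for the head-shadow transfer function. The 0.3ms delay, 2kHz cutoff, and -6dB mixing level are illustrative guesses on my part; a proper implementation would use measured head-related transfer functions:

```python
import numpy as np
from scipy.signal import butter, lfilter

def crossfeed(left: np.ndarray, right: np.ndarray, fs: int = 48000,
              delay_ms: float = 0.3, cutoff_hz: float = 2000.0,
              level_db: float = -6.0):
    """Feed each channel into the opposite ear, delayed and low-passed to
    roughly mimic the head shadow. Cutoff, delay, and level are guesses."""
    delay_samples = int(round(delay_ms * 1e-3 * fs))   # ~14 samples at 48 kHz
    gain = 10.0 ** (level_db / 20.0)
    b, a = butter(1, cutoff_hz, btype="low", fs=fs)    # gentle first-order roll-off

    def shadowed(x: np.ndarray) -> np.ndarray:
        # Delay the opposite-channel signal, then muffle and attenuate it.
        delayed = np.concatenate([np.zeros(delay_samples), x])[: len(x)]
        return gain * lfilter(b, a, delayed)

    return left + shadowed(right), right + shadowed(left)
```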

That sounds simple enough, and any ol' DSP should be capable of doing it. However, it does not account for head motion, which is critical. No matter how still you sit, once you start paying close attention to the audio rather than to holding your head in place, you will involuntarily move your head in small amounts, because it is instinctive to do so when listening. If the sound does not change at least a little when you move your head, your brain will decide the illusion it thought it was hearing is in fact something completely different - most likely that the sounds are coming from inside your head.

It is messy, really.
 