Real-Time Convolution for Auralization

Auralization is, according to Vorländer, “the technique of creating audible sound files from numerical (simulated, measured, or synthesized) data.”


EASE is one program available for acoustic room modeling.

Creating an auralization takes several steps. First, the directivity and sound power of the sound source must be characterized. Next, the source signal must be convolved with the response of the room, real or virtual. Finally, the direct and reverberant sound are convolved with the listener’s head-related transfer functions (HRTFs) to create a binaural signal that differs for each ear.

Each step demands accuracy and detail. The directivity of the source, whether a human voice, a loudspeaker, or a musical instrument, must be accurate. How the source excites the room is a separate question. Rooms can be measured acoustically with a known sound source and several microphones, or modeled geometrically with correct absorption and scattering coefficients for each surface. The source and receiver must be placed at the correct locations within the virtual room. Once the virtual room is set up, ray tracing and image source methods are commonly used to generate the room impulse response.
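As a rough illustration of the image source idea, the sketch below builds a toy impulse response for a rectangular ("shoebox") room from the direct path plus the six first-order wall reflections. The room dimensions, source and receiver positions, the single frequency-independent reflection factor, the 48 kHz sample rate, and all function names are assumptions made for this example. It ignores higher-order reflections, scattering, and source directivity, so it is a teaching sketch rather than a usable room model.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_images(source, room):
    """Mirror the source across each of the six walls of a shoebox room.

    room = (Lx, Ly, Lz); walls are assumed to lie at 0 and L on each axis.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # reflect across this wall
            images.append(tuple(image))
    return images


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def toy_impulse_response(source, receiver, room, reflection=0.8, fs=48000):
    """Direct sound plus first-order reflections as delayed, scaled impulses."""
    paths = [(source, 1.0)]
    paths += [(img, reflection) for img in first_order_images(source, room)]
    arrivals = []
    for position, gain in paths:
        d = distance(position, receiver)
        delay_samples = int(round(d / SPEED_OF_SOUND * fs))
        arrivals.append((delay_samples, gain / d))  # 1/r spreading loss
    ir = [0.0] * (max(n for n, _ in arrivals) + 1)
    for n, amplitude in arrivals:
        ir[n] += amplitude
    return ir


# Hypothetical geometry, in metres.
room = (5.0, 4.0, 3.0)
source = (1.0, 1.0, 1.0)
receiver = (4.0, 1.0, 1.0)
ir = toy_impulse_response(source, receiver, room)
```

Each image source stands in for one wall reflection: its distance to the receiver sets the arrival time, and the reflection factor and 1/r spreading set the level.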

A BYU classroom modelled in EASE

To reduce computation time in this step, many programs use a hybrid method that combines ray tracing and image sources to calculate the room impulse response. Both the direct sound and the room impulse response are then convolved with the receiver’s HRTFs to create a binaural sound file. Each person has unique HRTFs determined by the geometry of their head and pinnae. HRTFs can be measured with binaural microphones placed just outside a person’s ear canals.
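The binaural step amounts to filtering the same signal twice, once with each ear's impulse response. A minimal sketch follows, assuming the HRTFs are available in the time domain as head-related impulse responses (HRIRs). The function names and the very short sample lists are made up for illustration and are not measured data.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution.

    Output length is len(signal) + len(impulse_response) - 1.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Return the (left, right) ear signals for one mono input."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy data (hypothetical): the right ear receives the sound one sample
# later and quieter than the left ear, as for a source on the left.
mono = [1.0, 0.5]
hrir_left = [1.0, 0.0, 0.25]
hrir_right = [0.0, 0.6]

left, right = binauralize(mono, hrir_left, hrir_right)
```

The nested loop is written for clarity. For responses of realistic length, an FFT-based convolution would normally be used instead.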



Recently, real-time and low-latency convolution algorithms have made real-time auralization possible, adding to the acoustic realism of virtual reality situations. Here at BYU, we have built a real-time convolution system based on the work of Cabrera and Yadav. A user of the system hears his or her own direct sound together with the simulated reverberation that this sound would produce in a virtual room. The following picture shows the equipment used in the real-time convolution system.

Hardware and Software for real-time convolution system


Of special note are the off-ear, acoustically open headphones the subject wears. Because they are off-ear, the subject can hear their own direct sound unimpaired by the presence of headphones.

AKG K1000s



The software side of the system performs the real-time convolution. First, simulated room impulse responses are created in EASE, ODEON, CATT-Acoustic, or a similar package. Each response file is then converted to a binaural response file by internally convolving it with a measured HRTF response. The binaural impulse responses are saved in *.wav format for use in the SIR2 VST plugin within the digital audio workstation Reaper. The first 5 ms of each room response are truncated to match the latency of the convolution system and to remove the early reflections that ordinarily come from a person’s own body as he or she speaks. The SIR2 plugin then convolves the talker’s input signal with the prepared binaural room impulse responses in real time. Because the talker in the anechoic chamber can already hear their own direct sound, the “dry” signal is not included in the auralizations, only the “wet” reverberation from the simulated rooms.
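The 5 ms truncation is a matter of dropping a fixed number of samples from the start of each channel. The sketch below assumes a 48 kHz sample rate, which the text does not specify, and uses placeholder sample values. The function name is hypothetical.

```python
def truncate_onset(brir_channels, fs, trim_ms=5.0):
    """Remove the first trim_ms milliseconds from every channel."""
    n_trim = int(round(trim_ms * 1e-3 * fs))
    return [list(channel[n_trim:]) for channel in brir_channels]


fs = 48000  # assumed sample rate in Hz

# Placeholder two-channel response: 10 ms (480 samples) per ear.
left = [float(n) for n in range(480)]
right = [float(-n) for n in range(480)]

trimmed_left, trimmed_right = truncate_onset([left, right], fs)

# 0.005 s * 48000 samples/s = 240 samples removed from each channel.
removed = len(left) - len(trimmed_left)
```

Removing the same number of samples from both channels keeps the interaural time differences encoded in the binaural response intact.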

SIR2 in use with the real-time convolution system


The result of all this computation is simply a sound delivered to the headphones pictured above. This sound comprises the reflections of a room in response to the sound a person makes. Thus, although the person using the system is physically in the anechoic chamber, a room with no reflections, what they hear through the headphones sounds as though they were in another room, whether a gym, a concert hall, or a classroom.

Recordings of what a person using the system hears are below. They are best heard through headphones.


(anechoic trombone)

(auralized trombone in DeJong)

(auralized classroom speaking)