What is RayTracing Audio? Characteristics and operation

You've probably seen the term RayTracing Audio somewhere without knowing what it is. In this article we will explain what RayTracing Audio is about, or rather, TrueAudio Next from AMD. The commercial name RayTracing Audio simply rides on the term ray tracing, which NVIDIA popularized for the tracing of light rays.
In fact, you will find the term RayTracing Audio very few times in this article, because the technology's real name is TrueAudio Next and it was developed by AMD. Strictly speaking, ray tracing is not even the right term for sound; wave tracing would be closer. But, as always, a technology needs a commercial name that attracts attention.
To better understand how sound behaves, it helps to know a little about how the Doppler effect works. To avoid saying anything wrong, I prefer to leave you a video by Santaolalla, since he is more qualified than I am to explain it.
The origin of TrueAudio Next: virtual reality
Virtual reality is a new technology that is growing little by little and that creates significant challenges, including audio processing.
Throughout the history of video games, the realism of audio has remained relatively static. This becomes obvious if we compare it with the advances in graphical rendering and modern cinematics. Even though sound is inherently three-dimensional, audio has not undergone major changes.
In a 2D system, such as a flat screen, if you hear a sound behind you and turn around, you see the source of the sound, or a wall, or some other element. FPS games often use 3D audio to offer tactical assistance that can be beneficial. Even in a cinema, surround sound relies on rear and side speakers that provide ambient fill. They do not usually carry significant auditory cues, simply because doing so could distract us from the action on the screen.
Mounting the screen on our head, as happens with virtual reality headsets, changes everything. We can look in all directions and see a complete visual scene; today we can even walk freely through a virtual world. Virtual reality systems are expected to deliver a sense of presence close to consensus reality: the 'perceptual illusion of non-mediation'. Studies indicate that realistic audio is important for achieving this in virtual reality.

Basics for realistic audio
It is usually claimed that accurate positional and spatial audio rendering using head-related transfer functions (HRTFs) is enough to deliver realistic audio. This works well for closed, static environments: the designer can bake ambient reverb, occlusion, reflection, diffraction, absorption, and diffusion effects into each pre-recorded sound.
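To make the idea concrete, here is a minimal sketch (not AMD's code) of binaural rendering by HRTF convolution; `hrir_left` and `hrir_right` stand in for a measured head-related impulse response pair, which a real engine would select and interpolate from the source's direction:

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Position a mono source by convolving it with a per-ear
    head-related impulse response (HRIR) pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # (2, N) binaural signal
```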
This breaks down when the user can move freely around the scene, even within a limited area. As the user moves or turns their head, the reflection paths and the environmental effects change.
What is usually done is to bake all of these effects into a reverb plug-in, with one reverb setting for the entire scene or several settings for the different rooms in it. To put it simply: if a sound comes from an adjoining room, it is processed with a different reverb configuration than the room we are in.
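As an illustration of that conventional design, a tiny sketch with made-up room names and parameter values; each sound is simply tagged with the room it plays in:

```python
# Hypothetical per-room reverb presets (values are illustrative only).
REVERB_PRESETS = {
    "listener_room":  {"decay_s": 1.2, "wet_level": 0.30},
    "adjoining_room": {"decay_s": 1.9, "wet_level": 0.45},
    "corridor":       {"decay_s": 0.8, "wet_level": 0.20},
}

def reverb_for(sound_room: str) -> dict:
    """Pick the reverb configuration of the room a sound plays in,
    regardless of where the listener actually stands."""
    return REVERB_PRESETS[sound_room]
```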

Technology of the 1990s
Within a virtual world these approximations do not create presence, no matter how precisely the sounds are configured. Take an example in virtual reality: we walk down a corridor and there is an opening from which a sound comes out. With the typical system the sound would always be the same wherever we stand, but in reality it changes as we move. When we approach the opening the volume increases, at the opening it is at its maximum, and once we pass it, it decreases again, along with subtler changes in tone and direction.
This is due to:
- Occlusion of room walls
- Diffraction around the aperture
- Reflection of wall, floor and ceiling surfaces
- Diffusion and absorption of the materials that make up the surfaces
Thus the conventional approach to audio design, plus a room reverb, simply adds attenuation and low-pass filtering to each sound source. Positioning it with an HRTF produces a believable sound, but fails to create presence, no matter how realistic the designer's distance-attenuation and filtering curves are.
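The following sketch shows what that approximation amounts to; the attenuation and cutoff curves are illustrative assumptions, not constants from any real engine:

```python
import numpy as np

def render_distant_source(x, distance_m, sr=48000):
    """Conventional approximation: inverse-distance gain plus a
    one-pole low-pass whose cutoff falls with distance."""
    gain = 1.0 / max(distance_m, 1.0)                        # inverse-distance law
    cutoff_hz = max(500.0, 20000.0 / max(distance_m, 1.0))   # made-up curve
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sr)          # one-pole coefficient
    y, state = np.empty(len(x)), 0.0
    for n, sample in enumerate(np.asarray(x, dtype=float) * gain):
        state += a * (sample - state)                        # low-pass filter step
        y[n] = state
    return y
```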
The simple reason this conventional approach falls short is that real-world acoustics are more complex. Our brains are well trained, by lifelong exposure and adaptation, to recognize real-world acoustics, and we discriminate them accurately. Hearing is a critical survival adaptation: sound is usually the first indicator of danger, and knowing the direction and distance of a sound source in a given environment is vital.

Surround audio modeling
Delivering ambient sound with real-world acoustics requires modeling the physics of sound propagation, a process called auralization. Several propagation modeling approaches exist, with trade-offs between complexity and precision.
The perfect model, solving the acoustic wave equation for every sound propagation, is not practical, because the real-time computing capacity available to virtual reality systems is limited. Real-time GPU computing with AMD's TrueAudio Next may change this, enabling auralization that cannot be achieved with a CPU.
Geometric acoustics begins by tracing ray paths between each sound source and the position of each of the listener's ears. Algorithms then process the set of traced paths and the material properties of the surfaces they bounce off, generating a unique impulse response per sound source and per ear.
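A toy version of that last step, under the assumption that the ray tracer has already produced a list of `(path_length_m, surface_gain)` pairs for one ear:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def impulse_response(paths, sr=48000, ir_seconds=2.0):
    """Turn traced reflection paths into one ear's impulse response:
    each path contributes a delayed, attenuated impulse."""
    ir = np.zeros(int(sr * ir_seconds))
    for length_m, gain in paths:
        delay = int(sr * length_m / SPEED_OF_SOUND)   # propagation delay
        if delay < len(ir):
            ir[delay] += gain / max(length_m, 1.0)    # distance attenuation
    return ir
```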
Low resource requirements
Additionally, reflection, diffusion, and occlusion along each path, as well as diffraction effects and HRTF filters, can be modeled in this framework, folded into each time-varying impulse response.
During rendering, the impulse responses are constantly updated according to the positions of the source and the listener. The convolved signals are mixed separately for each ear to generate the output waveforms the listener hears. It is a scalable approach, implemented on both CPUs and AMD's TrueAudio Next.
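One common way to apply such a constantly updated impulse response without audible clicks is to crossfade between the outputs of the old and new IRs over each buffer; a sketch, with inter-buffer tail handling omitted for brevity:

```python
import numpy as np

def crossfaded_block(block, ir_old, ir_new):
    """Render one buffer through both the previous and the updated
    impulse response, then crossfade between the two results."""
    y_old = np.convolve(block, ir_old)[:len(block)]
    y_new = np.convolve(block, ir_new)[:len(block)]
    fade = np.linspace(0.0, 1.0, len(block))
    return (1.0 - fade) * y_old + fade * y_new
```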
With TrueAudio Next, the number of physically modeled sound sources improves significantly. The restriction to a small number of primary sound sources is removed, allowing scaling to include ambient sources and achieving a complete soundscape of 40 to 64 sounds. This is achieved by reserving 10-15 Compute Units of a GPU, and it scales even further when multiple GPUs or combinations of APUs and GPUs are used.

TrueAudio Next and FireRay
For this geometric acoustic representation, two elements are needed:
- Time Variable Convolution (Audio Processing Component)
- Ray tracing (propagation component)
AMD Radeon GPUs handle the ray tracing using FireRays, an open source library developed by AMD, while the time-varying real-time convolution is performed using the TrueAudio Next library.
TrueAudio Next is a high-performance, OpenCL-based real-time math acceleration library for audio, with special emphasis on GPU compute. Its time-varying, low-latency convolution supports FFT and Fast Hartley transforms.
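The library itself runs as OpenCL code on the GPU, but the principle of FFT-based fast convolution is easy to show on a CPU; it is what makes second-long impulse responses tractable, since the cost drops from O(N·M) to O(N log N):

```python
import numpy as np

def fft_convolve(x, ir):
    """Convolve a signal with an impulse response in the frequency
    domain: multiply spectra instead of sliding a long filter."""
    n = len(x) + len(ir) - 1
    nfft = 1 << (n - 1).bit_length()   # round up to a power of two
    X = np.fft.rfft(x, nfft)
    H = np.fft.rfft(ir, nfft)
    return np.fft.irfft(X * H, nfft)[:n]
```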
Now we are faced with two critical questions:
- Can this technology be used with GPU Stream Processors without penalizing frame rendering (FPS)?
- Can high-performance GPU audio play smoothly and with low latency in a VR game or an advanced cinematic rendering scenario?
Traditionally the answer would be that GPU audio processing creates unacceptable latency and penalizes graphics performance. With TrueAudio Next it can be done without problems, by reserving Compute Units for asynchronous compute.
AMD's asynchronous compute is already known in VR rendering as a key element of features such as asynchronous time warp and direct-to-display. The idea is to stop all queues from contending for a single array of Compute Units: multiple queues use different sets simultaneously, with variable execution priorities under the control of an efficient hardware scheduler.
Reserving Compute Units takes this one step further. A specific set of CUs can be set aside for as long as needed, with real-time queue access. A 32-CU GPU, for example, could reserve 4-8 CUs exclusively for TrueAudio Next. The reservation is made from within the TrueAudio Next-enabled application, plugin, or engine.
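TrueAudio Next's reservation relies on AMD-specific driver support, but standard OpenCL "device fission" gives a rough feel for the idea of splitting a device's Compute Units; a sketch using pyopencl, where the even split is an assumption and not every device supports partitioning:

```python
import pyopencl as cl

device = cl.get_platforms()[0].get_devices()[0]
half = max(1, device.max_compute_units // 2)
try:
    # Split the device into sub-devices with `half` compute units each.
    sub_devices = device.create_sub_devices(
        [cl.device_partition_property.EQUALLY, half])
    audio_queue = cl.CommandQueue(cl.Context([sub_devices[0]]))
except cl.RuntimeError:
    print("This device does not support partitioning")
```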
Reserving Compute Units
This system allows a flexible, scalable CU reservation, sized at the developer's discretion for each game. Audio engines are already experienced at scaling processor resources using profiling tools; TrueAudio Next adds a private, reliable, and highly configurable sandbox.
To avoid potential problems later, a CU reservation can be fixed early in a game's development cycle, so audio and graphics development can proceed independently. This also prevents audio processing from 'stealing' resources from graphics processing. The result is a strict sandbox, but one much larger and more powerful than a CPU can offer.
GPU audio and graphics are thus isolated. Only memory bandwidth is shared, and audio occupies a very small part of it compared to graphics. DMA transfer latencies are also well suited to audio.
Under these conditions, glitch-free convolution has been achieved with a latency of only 1.33 ms using 64-sample buffers at 48 kHz, with impulse responses longer than 2 seconds in the same test. Conventional audio in typical games has a buffer latency of between 5 and 21 ms.
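Those figures follow directly from buffer size divided by sample rate; the 256- and 1024-sample buffers below are assumptions chosen to reproduce the quoted 5-21 ms range:

```python
for buffer_samples in (64, 256, 1024):
    print(buffer_samples, f"{buffer_samples / 48000 * 1000:.2f} ms")
# 64   -> 1.33 ms  (the TrueAudio Next figure above)
# 256  -> 5.33 ms, 1024 -> 21.33 ms (typical game audio buffers)
```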

AMD TrueAudio Next
Sound, like light, interacts in different ways with the elements around us. The texture, shape, position, and volume of an object directly influence sound waves. Many of you may remember the Doppler effect from The Big Bang Theory: it describes how the pitch of a sound source changes as it approaches or moves away from us, the observer.
The TrueAudio Next software solution enables processing of a sound signal using GPU acceleration. A few Compute Units on the graphics card, isolated from the graphics pipeline, are assigned to this task.
It is somewhat similar to NVIDIA's light ray tracing, which relies on RT Cores, dedicated units that only compute the paths of light rays and their interaction with the scene. In AMD's case, developers allocate GPU resources to this feature as needed.
For this, libraries optimized for computationally expensive algorithms are provided, offering time-varying audio convolution, FFT/FHT, and audio-oriented vector math.
All of this gives sound designers the option to render more sound sources with high-resolution physics. It also allows switching to higher-order Ambisonics for higher sound-field resolution, or enabling impulse responses of up to 5 seconds, ideal for cave environments, that could not be rendered on a CPU.

AMD TrueAudio Next 1.2
This version of TrueAudio Next technology implements significant performance and feature enhancements.
The convolution algorithm gains an acceleration option called the 'head-tail' partitioning method. It allows an audio processing thread submitting a real-time audio buffer to receive a response from TrueAudio Next faster than with a conventional convolution.
Most of the computation is done in the background, between successive buffer submissions to TrueAudio Next, which makes the whole scheme much friendlier to parallel processing.
At the same time, this reduces latency and improves performance, because the calling audio thread no longer blocks waiting for the entire convolution to be computed.
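A rough sketch of the head-tail idea (buffer bookkeeping is omitted, and the real implementation runs on the GPU via OpenCL): the short 'head' of the IR is convolved immediately, while the long 'tail' can be computed in the background and mixed into later buffers:

```python
import numpy as np

def head_tail_convolve(block, ir, head_len=256):
    """Split the impulse response: the head is processed at once for
    low latency, the tail is deferred work for the background."""
    head, tail = ir[:head_len], ir[head_len:]
    y_head = np.convolve(block, head)[:len(block)]  # returned immediately
    y_tail = np.convolve(block, tail)               # mixed into later buffers
    return y_head, y_tail
```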
Optimizations were also added to the TrueAudio Next GPU audio acceleration library that minimize the memory required, speed up buffer transfers, and avoid synchronization overhead. Together they significantly improve performance when the IR kernel is dynamically updated while convolution is running.
The latency seen by the calling audio thread is now just the short 'head' computation.
TrueAudio Next in this generation allows reserving resources of an AMD GPU for audio processing. This reservation protects the audio and graphics queues from each other: simply put, it prevents the two queues from blocking each other, letting them work in parallel.
Finally, this version adds GPU-accelerated mixing to minimize buffer transfer overhead, along with a 10-band EQ, IIR (Infinite Impulse Response) filters, time-domain convolution, and Doppler shifts.
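The Doppler shift itself is simple physics: for a stationary listener, the observed frequency is f' = f·c/(c − v), with v positive for an approaching source and negative for a receding one:

```python
SPEED_OF_SOUND = 343.0  # m/s

def doppler_shift(freq_hz, source_speed_ms, approaching=True):
    """Observed frequency for a moving source and a stationary listener."""
    v = source_speed_ms if approaching else -source_speed_ms
    return freq_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - v)

print(doppler_shift(440.0, 20.0))  # ~467 Hz: pitch rises as it approaches
```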

Beyond spatial audio
Graphics are very important in games, but sound is an important factor too. Sound designers typically focus on the spatial 3D rendering of direct sound sources, and direct sound is indeed essential for creating great soundscapes that take gaming to the next level.
The problem is that sound, as it travels, can run into objects that absorb part of it or distort it. The sound reflected by walls, ceilings, floors, or objects is usually given little consideration.
It is neglected because reflected sound requires significantly more resources than direct sound, and the available tools (hardware and software) may not be up to the task. Designers therefore prioritize direct sound and use simple reverbs to stand in for reflections, kept at a low level to avoid losing clarity.
Implementing a real sound-reflection process in a game raises its realism and credibility and enhances the experience. Physically generated, spatialized reflections can achieve this, providing the listener with very useful positional cues that direct sound alone cannot.
For this, Ambisonics offers a 3D audio encoding that can be rotated as the head orientation changes, and it allows reflections to be computed simply, with scalable resolution. Higher Order Ambisonics (HOA) is very useful here: each increase in order sharply increases the spatial resolution of the reverberant sound field, avoiding muddiness and ambiguity.
Zero-order Ambisonics, on the other hand, is omnidirectional: a single channel with no directional information. The new solution is third-order (or higher) Ambisonics, which uses sixteen channels.
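A sketch of the encoding step for first order (B-format, traditional normalization) makes the channel counts concrete; the (order + 1)² rule is where third order's sixteen channels come from:

```python
import numpy as np

def encode_first_order(mono, azimuth, elevation):
    """Encode a mono source into first-order Ambisonics (B-format).
    Angles in radians; order n uses (n + 1)**2 channels in total."""
    w = mono / np.sqrt(2.0)                          # omnidirectional, order 0
    x = mono * np.cos(azimuth) * np.cos(elevation)   # front/back
    y = mono * np.sin(azimuth) * np.cos(elevation)   # left/right
    z = mono * np.sin(elevation)                     # up/down
    return np.stack([w, x, y, z])
```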
Third-order ambisonics
All of this can be a bit tricky to grasp, which is why AMD posted a video where you can see the differences. The first part of the video uses zero-order Ambisonics, that is, without really considering sound reflections. The second part uses third-order Ambisonics, where the differences depending on the objects and the distance become clearly perceptible.
The 16 spheres visible in the room represent different sound sources speaking simultaneously. We start outside the room, move through a hallway, and enter the room. In the second case we can hear how the sound changes with every movement.
Sound of the two scenes:
- First part: real-time sound using zero-order Ambisonics. 16 convolutions in total, easily rendered by a modern CPU.
- Second part: real-time sound using third-order Ambisonics accelerated by TrueAudio Next. There are 16 sources with 256 convolution filters running on the GPU, using resources reserved for this task.
It should be noted that the same direct 3D sound representation is used in both cases. The difference between the two sequences is due to the handling of sound reflections.
Third order Ambisonics video (use headphones for clarity)
What you can hear when TrueAudio Next is enabled:
- Outside the building it is easy to hear the direction of the reflected sound. Direct positional cues cannot work here; only the reflections are audible. No sound-steering tricks were added to the scene to create this effect: it emerges automatically from the physics.
- Inside the corridor the reflected sound is more natural and changes with the movement of the head.
- When we enter the room, despite the increased reverberation from the walls and ceiling, it is easier to distinguish the individual sound sources. The room sounds more natural as we move, and despite everything happening at once, the sound is more comfortable to listen to. With zero-order Ambisonics the designer would have to turn the reverb down to keep the listener from becoming disoriented.
Conclusion
After all this, we can say that RayTracing Audio (the commercial name for AMD TrueAudio Next) can be a huge leap for games, with the potential to offer a far richer gaming experience. We have virtual reality to thank for the development of RayTracing Audio and for leaving the poorer technology of the 1990s behind.
Perhaps one of the most interesting aspects is that RayTracing Audio requires relatively few GPU resources. As explained above, a CPU cannot process this workload in real time, but a GPU can do it easily: with at most around 15% of the GPU's resources we can already enjoy RayTracing Audio.
As far as we know, it is not yet used in any game, but its announced implementation in Microsoft's console should mean standardization. RayTracing Audio will be supported by the Xbox Series X, which is based on an AMD processor and graphics. Possibly, to avoid even a minimal sacrifice of graphics performance, AMD has implemented dedicated hardware for it, although this is not known for certain.
Be that as it may, the jump in gaming experience is significant, with greater clarity and more realistic sound. RayTracing Audio, just like its sibling, ray tracing for lighting, is here to stay.



