Basics of Sound
Desktop Apps Training - Other Applications

Sound Basics

Digital audio is great fun to play with and is becoming much easier to work with as new programs are developed for Linux. Digital audio works much like the still images of a movie: a series of samples is taken which, played back in sequence, recreates the original sound. The more samples that are taken, the better the recreation. A typical example is an audio CD using the WAV format, which stores 44,100 samples per second. Each sample is 16 bits wide, which is the resolution or depth of the sample. These samples are stored as Pulse Code Modulation, or PCM.

PCM devices were dramatically modified and expanded in the 2.6 kernel under the Advanced Linux Sound Architecture, or ALSA. PCM designates the digital stream where it interfaces with the sound card. Two major PCM device types, hw and plughw, let the user control how ALSA talks to the card. A PCM device is opened with specific settings: sample format, sample frequency, number of channels, number of periods (also called fragments), and the size of each period. A problem arises when the sound card does not support the settings the PCM device was opened with. ALSA solves this by letting the user choose the plughw type, which automatically converts the data in a plugin layer to a format the sound card does support. This makes sound far more widely available in Linux; however, the conversion may change the sample format, frequency, and number of channels, and therefore the quality of the sound. If the hw type is used instead, ALSA opens the PCM device directly with the settings requested by the running application.
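The sampling idea above can be sketched in a few lines of Python. The helper name and the 440 Hz test tone are illustrative choices, not part of any real audio API:

```python
import math

SAMPLE_RATE = 44_100          # samples per second, as on an audio CD
BIT_DEPTH = 16                # bits of resolution per sample
MAX_AMPLITUDE = 2 ** (BIT_DEPTH - 1) - 1   # 32767 for signed 16-bit samples

def sample_sine(freq_hz, n_samples):
    """Take n_samples 'still images' of a sine wave, quantized to 16-bit PCM."""
    return [
        int(MAX_AMPLITUDE * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(n_samples)
    ]

samples = sample_sine(440, 100)   # 100 samples of a 440 Hz (A4) tone
```

Raising the sample rate or the bit depth captures the waveform more faithfully, at the cost of more data per second.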

Several common PCM formats exist. WAV uses Windows audio codecs with 8-, 16-, and 24-bit PCM data and sample rates from 2 kHz to 192 kHz. AIFF, used on the Apple Macintosh, is very similar to the WAV format and supports the same bit depths and sample rates.
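Python's standard-library wave module can read and write the WAV format directly. The sketch below writes one second of a 440 Hz tone as 16-bit mono PCM and then reads the stored parameters back; the file name is arbitrary:

```python
import math
import struct
import wave

RATE, BITS = 44_100, 16                      # CD-style sample rate and depth

# One second of a 440 Hz tone as little-endian signed 16-bit samples.
frames = b"".join(
    struct.pack("<h", int(32000 * math.sin(2 * math.pi * 440 * n / RATE)))
    for n in range(RATE)
)

# Write the samples into a WAV container.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(BITS // 8)
    w.setframerate(RATE)
    w.writeframes(frames)

# Read the header back: the PCM settings are stored alongside the samples.
with wave.open("tone.wav", "rb") as w:
    params = (w.getframerate(), w.getsampwidth() * 8, w.getnchannels())

print(params)   # (44100, 16, 1)
```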

Often analog audio must be converted to digital. The sound card has an analog-to-digital converter, or ADC, built in for this purpose. Conversely, when a digital source such as a CD is played, the sound card must convert the digital data back to analog, which uses the DAC (digital-to-analog converter) on the sound card.
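As a rough sketch (these helpers are a toy model, not a real sound-card API), the two conversions are inverses of each other, up to a small quantization error:

```python
SCALE = 32767                      # largest signed 16-bit sample value

def adc(level):
    """Analog level in -1.0..1.0 -> 16-bit digital sample (toy model)."""
    return int(round(level * SCALE))

def dac(sample):
    """16-bit digital sample -> analog level in -1.0..1.0 (toy model)."""
    return sample / SCALE

digital = adc(0.5)
analog = dac(digital)             # within 1/32767 of the original 0.5
```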

Audio Compression

The real problem with audio is that when it is not compressed it consumes about 10 MB per minute. This amount of data is difficult to transport, over a network for example, and also difficult to store. As a result, several compression techniques were developed. One is MP3, which was developed by Fraunhofer IIS and patented; the consequence of the patent is that anyone distributing an MP3 encoder must pay a license fee. In response, the Ogg Vorbis audio compression format was developed, and it is now supported by most audio players.
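The 10 MB per minute figure follows directly from the CD parameters described earlier:

```python
# Uncompressed CD audio: 44.1 kHz, 16-bit (2-byte) samples, 2 channels.
rate, bytes_per_sample, channels = 44_100, 2, 2

bytes_per_minute = rate * bytes_per_sample * channels * 60
print(bytes_per_minute)                  # 10584000 -> roughly 10 MB
```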


MIDI

ALSA has provided a method for Linux to use MIDI, and in fact many sound cards have MIDI ports to accept input from synthesizers, keyboards, or sound modules. Some sound cards can even convert MIDI events into audible sounds with a WaveTable synthesizer. Virtual MIDI keyboards use the computer's typing keyboard to create sound.
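MIDI events themselves are tiny: a note-on message is only three bytes. The helper below is a hypothetical sketch of how such an event is packed, not part of ALSA or any MIDI library:

```python
def note_on(channel, note, velocity):
    """Build a MIDI note-on message: status byte 0x9n (n = channel 0-15),
    then a 7-bit note number and a 7-bit velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

msg = note_on(0, 60, 100)     # middle C on channel 1 at velocity 100
print(msg.hex())              # 903c64
```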


Mixers

Mixing is the process of adjusting the volume and balance of the sound inputs and outputs on the computer's sound system.
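In software terms, a mixer stage simply scales sample values. The sketch below is an illustrative helper, not a real mixer API: it applies a volume and a left/right balance to one stereo sample pair:

```python
def mix(left, right, volume, balance):
    """Scale a stereo sample pair by volume (0.0-1.0) and balance
    (-1.0 = full left, 0.0 = centered, +1.0 = full right)."""
    left_gain = volume * min(1.0, 1.0 - balance)
    right_gain = volume * min(1.0, 1.0 + balance)
    return int(left * left_gain), int(right * right_gain)

print(mix(10000, 10000, 0.5, 0.0))    # (5000, 5000): half volume, centered
print(mix(10000, 10000, 1.0, -1.0))   # (10000, 0): panned fully left
```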



Players

Players are programs that play back the common MP3, WAV, and Ogg Vorbis formats. Linux supports quite a number of easy-to-use players; XMMS is one of the most popular.

Buffering and Latencies

On a computer the CPU must perform multiple tasks at the same time; this is called multitasking. These tasks include system tasks as well as the programs users are running. The problem is that the CPU can only perform one task at a time, so it must give a time slice to each task that needs to be performed.

These time slices are very quick and most of the time are not noticeable to the user. During audio playback, however, clicks may occasionally be heard; these are the CPU switching between tasks as it hands out time slices to each running program. This problem was addressed by providing buffers large enough to span the longest interruption caused by the CPU switching tasks.

A second problem arises from the latency, or reaction time, of a program when a buffer is used: if the buffer is too big, the program responds with a noticeable delay. The solution is to keep the buffers as small as possible while increasing the priority of the audio program or using a real-time scheduler.
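This trade-off can be quantified: a buffer's latency is simply its length in frames divided by the sample rate. The function name and the buffer sizes below are illustrative:

```python
def buffer_latency_ms(frames, sample_rate):
    """Delay introduced by a buffer of the given size, in milliseconds."""
    return frames / sample_rate * 1000

print(buffer_latency_ms(4096, 44_100))   # about 93 ms -- an audible delay
print(buffer_latency_ms(256, 44_100))    # about 5.8 ms -- fine for real time
```

A small buffer keeps latency low, but it only works reliably if the audio program gets CPU time often enough to refill it, which is why a higher priority or a real-time scheduler is needed.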