A Brief Introduction to Audio and Video Encoding

Every day, I listen to music or watch videos on different platforms–the radio, phone, computer, and television. Until recently, I had never considered exactly how that media was delivered to me. I would venture to guess that many people have been, or are in, that situation themselves.

I’ve spent some time looking into exactly how these forms of media are created and distributed, and I wanted to share some of the things that I’ve learned. While the concepts themselves aren’t overly challenging, the terminology can be confusing and misused, so I’m going to review some of the common terms and expressions.

Codec

A blend of the words “coder” and “decoder,” a codec is the software or hardware that encodes data into a specific format and decodes it again for playback. Once media is encoded in a well-known format, any system with a matching decoder can read it. Both video and audio data can be encoded, but each uses its own encoders and decoders.

Most codecs compress the data they encode, and they are either lossy or lossless. Lossy encoding uses approximations to represent the data. The compression ratios are typically higher, but the quality suffers. If done properly, though, the decrease in quality can be unnoticeable.

Lossy formats are what an end user generally consumes, as they can save a large amount of space. Because there’s less data to send, compressed media can be delivered more quickly. Some popular lossy audio codecs are MP3, Opus, WMA, and AAC. Popular video codecs are H.264, H.265/HEVC, VP8, VP9, and Theora.

Lossless encoding is the opposite: all original data remains intact even after being encoded. The data can still be compressed, but decoding it gives back exactly what went in. Some audiophiles prefer to listen to music that hasn’t gone through lossy compression, as they say they can hear the difference, but according to this test, I don’t fit that bill.

Other consumers of this media might be photographers or videographers, but the majority of people won’t find themselves dealing with lossless files. Some popular lossless audio codecs are FLAC and ALAC.
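
To make the lossy/lossless distinction concrete, here is a minimal Python sketch. It is a toy, not how real audio codecs work: the sample values are invented, `zlib` stands in for a lossless codec, and the hypothetical `quantize` function stands in for a lossy one by simply throwing away the low bits of each sample.

```python
import zlib

# Hypothetical 8-bit "audio" samples (0-255), invented for illustration.
samples = bytes([10, 12, 13, 13, 200, 201, 199, 10, 12, 13, 13, 200] * 50)

# Lossless: compress, then decompress, and every byte comes back.
packed = zlib.compress(samples)
restored = zlib.decompress(packed)


def quantize(data: bytes) -> bytes:
    """Toy lossy step: keep only the top 4 bits of each sample."""
    return bytes(value & 0xF0 for value in data)


# Lossy: the result only approximates the original samples.
approx = quantize(samples)
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print("lossless round trip is exact:", restored == samples)
print("lossless output is smaller:", len(packed) < len(samples))
print("lossy output is exact:", approx == samples)
print("largest lossy error per sample:", max_error)
```

The lossless path returns the original bytes exactly, while the lossy path returns something close but not identical. Real lossy codecs are far more careful about which details they discard, which is why the loss can be hard to hear.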

Container

Often (and for good reason) confused with a codec, this is a different concept. A container simply holds data. Composed of metadata and content, it could theoretically hold anything. In the context of media, containers typically combine audio and video data, although some popular containers have one but not the other.

Encoded data can be put inside a container, which is why it’s necessary to distinguish the two. For example, the popular container Matroska can hold virtually any kind of audio or video data, in addition to subtitles. Some popular audio/video containers are Matroska, MPEG, AVI, Ogg, and QuickTime.
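
One way to picture the relationship is as a small data structure: the container is the outer box, and each track inside it records which codec was used. The sketch below is only a mental model. The `Track` and `Container` classes and their fields are hypothetical names made up for this example, not the layout of any real file format.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    kind: str   # "video", "audio", or "subtitle"
    codec: str  # how this track's data was encoded


@dataclass
class Container:
    format_name: str
    metadata: dict = field(default_factory=dict)
    tracks: list = field(default_factory=list)

    def codecs(self, kind: str) -> list:
        """Return the codec names of every track of the given kind."""
        return [t.codec for t in self.tracks if t.kind == kind]


# A hypothetical Matroska file holding video, audio, and subtitles.
movie = Container(
    format_name="Matroska",
    metadata={"title": "Example"},
    tracks=[
        Track("video", "H.264"),
        Track("audio", "Opus"),
        Track("subtitle", "SRT"),
    ],
)

print(movie.format_name, movie.codecs("video"), movie.codecs("audio"))
```

Asking "what format is this file?" therefore has two answers: the container (Matroska here) and the codec of each track inside it.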

Bitrate

This applies to both video and audio, and it is simply the amount of data used to represent the content per unit of time, usually measured in kilobits per second (kbps). Audio can be encoded with either constant or variable bitrates.

  • Constant Bitrate (CBR) keeps the bitrate the same throughout the duration of the media. CBR is generally faster to encode than VBR, but it takes up more space, because quiet parts or silence are given the same amount of data as complex parts even though they need far less.
  • Variable Bitrate (VBR) uses a range of bitrates to encode the audio, with the more complex areas getting more data. Despite the longer encoding time and the lack of support in some software or hardware, VBR is often a better option for encoding audio, as it delivers a much better quality-to-space ratio.
  • Average Bitrate (ABR) is a form of VBR. An average bitrate target is set, and the encoder reaches it by mixing higher and lower bitrate chunks.
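
Since bitrate is just data per second, the size of a stream follows from simple arithmetic. Here is a rough sketch that ignores container overhead and metadata. The function names, the 128 kbps target, and the chunk values are all made up for illustration.

```python
def size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Stream size: bits per second times duration, divided by 8."""
    return bitrate_kbps * 1000 * seconds / 8


def average_kbps(chunks: list) -> float:
    """Duration-weighted average of (bitrate_kbps, seconds) chunks."""
    total_kilobits = sum(kbps * secs for kbps, secs in chunks)
    total_seconds = sum(secs for _, secs in chunks)
    return total_kilobits / total_seconds


# A three-minute track at a constant 128 kbps.
cbr_size = size_bytes(128, 180)

# The same track as three one-minute chunks: quiet, normal, complex.
abr_chunks = [(64, 60), (128, 60), (192, 60)]
abr_size = sum(size_bytes(kbps, secs) for kbps, secs in abr_chunks)

print("CBR size in bytes:", cbr_size)
print("ABR size in bytes:", abr_size)
print("ABR average kbps:", average_kbps(abr_chunks))
```

Both versions come out to the same size, but the second one spends fewer bits on the quiet minute and more on the complex one, which is the trade that ABR and VBR are making.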

Sampling Rate

This represents the number of samples per second in audio. It is measured in Hz (or kHz). The higher the digital sampling rate is, the better the representation of the analog sound wave is.

The Nyquist Theorem states that the sampling rate must be at least twice the highest analog frequency you want to capture. A sampling rate of 44,100 Hz is quite common, as it can represent frequencies up to 22,050 Hz, which comfortably covers sound with a 20,000 Hz maximum frequency. Humans can only hear up to about 20 kHz, so a higher sampling rate isn’t strictly necessary.
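
The rule of thumb is easy to check with a few lines of Python. This is a sketch of the idealized math only: the function names are my own, and `alias_hz` assumes a pure tone and no filtering, which real converters do not rely on. It shows what happens to a tone above the limit: it does not vanish, it shows up at a different, lower frequency (an effect known as aliasing).

```python
def max_frequency_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def min_sample_rate_hz(highest_frequency_hz: float) -> float:
    """Lowest sampling rate that satisfies the twice-the-frequency rule."""
    return 2 * highest_frequency_hz


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency a pure tone appears at after ideal sampling."""
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


print(max_frequency_hz(44_100))    # limit for a 44,100 Hz sampling rate
print(min_sample_rate_hz(20_000))  # rate needed for 20,000 Hz content
print(alias_hz(10_000, 44_100))    # below the limit: unchanged
print(alias_hz(30_000, 44_100))    # above the limit: folds to a lower tone
```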

It is possible to drop the sampling rate for audio if storage space or CPU usage is an issue. If the source material doesn’t exceed a certain frequency, the sampling rate could be lowered, as well.
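
To see how much lowering the sampling rate saves, consider uncompressed audio. The sketch below assumes two things that aren't covered above: a bit depth (the number of bits stored per sample) and a channel count. The 16-bit stereo values are a common choice, but they are assumptions for this example, as is the function name.

```python
def pcm_bits_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Raw data rate of uncompressed audio: samples x bits x channels."""
    return sample_rate_hz * bit_depth * channels


# Assumed format: 16 bits per sample, two channels (stereo).
full_rate = pcm_bits_per_second(44_100, 16, 2)
half_rate = pcm_bits_per_second(22_050, 16, 2)

print("44,100 Hz:", full_rate, "bits per second")
print("22,050 Hz:", half_rate, "bits per second")
print("savings:", 1 - half_rate / full_rate)
```

Halving the sampling rate halves the raw data, at the cost of losing everything above roughly 11 kHz. That can be an acceptable trade for speech, but less so for music.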

How Does this Affect Software?

Beyond the joy of learning a new topic, there are a couple of reasons this information is good to know. First of all, it helps to know what kind of media your software is delivering. If your application relies on video or audio, such as a music streaming platform or video editing software, it’s important to know exactly what goes into it.

Secondly, if your software uses audio or video (and even images), knowing encoding options can be incredibly useful. For instance, if you know about different audio/video codecs and when to use them, you might be able to improve the experience of a mobile user with a bad internet connection.
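
As a small illustration of that last point, an application could keep several encodings of the same audio and pick one based on the connection. This is a hypothetical sketch, not a real streaming protocol: the list of available bitrates, the `headroom` safety margin, and the function name are all invented for the example.

```python
# Hypothetical encodings of the same track, in kbps.
RENDITIONS_KBPS = [64, 96, 128, 256]


def pick_bitrate_kbps(measured_kbps: float, headroom: float = 0.8) -> int:
    """Pick the highest bitrate that fits within a share of the bandwidth.

    Falls back to the lowest rendition when nothing fits.
    """
    budget = measured_kbps * headroom
    fitting = [rate for rate in RENDITIONS_KBPS if rate <= budget]
    return max(fitting) if fitting else min(RENDITIONS_KBPS)


print(pick_bitrate_kbps(1000))  # fast connection
print(pick_bitrate_kbps(150))   # slow mobile connection
print(pick_bitrate_kbps(50))    # very poor connection
```

A user on a slow connection gets a lower-quality stream that actually plays, rather than a high-quality one that keeps stalling.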

I hope you enjoyed learning more about this topic. If you’ve come across other useful audio and video encoding information, please share in the comments.