A few months ago, I received a DVD from an engineer who was looking for a job. He worked on a prominent television show, so I assumed that the DVD would reflect his excellent work. Indeed, the video on the disc was beautiful, but when the disc began to play, I had to dive for the master fader on the monitoring console. The audio was heavily compressed and registered 10 full-scale samples in less than two seconds. I had set the master fader to my reference level for DVD, so for those two seconds I felt as if I were inside an exploding grenade. Fortunately, I was near the console. After that experience, I learned never to trust audio levels on DVDs. Like Forrest Gump's box of chocolates, you never know what you're going to get.
Although there are many emerging technologies that require audio preparation for publishing, such as SACD, streaming audio, interactive games and background “scenarios” for cell phone conversations, for this article, I'll concentrate on audio for DVD-Video and DVD-Audio.
COVERING YOUR ASSETS
Unfortunately for audio, most DVD authoring houses evolved from video post-production studios. That is a big problem for audio, and for audio engineers, in the DVD world, because most authoring packages come with their own audio encoders. Fairly good video editing programs, such as Final Cut Pro, even incorporate rudimentary audio editing and mixing capability. DVD authors will say, "Just send me your multichannel tracks and I'll make them work." Then they will call the audio engineer wondering why the audio sounds distorted or (and this is 10 times worse) why the audio slips out of sync with the video.
For this reason, today's prudent, wise and savvy audio engineer must learn how to prepare the audio for a DVD release so that the authoring house has less chance of ruining it. After all, who gets the blame when the audio sounds bad on a DVD? The authoring house? Not if the audio engineer or recording studio is in the credit roll. While all of the people around you may be talking about “repurposing assets,” it's best to remember that saving your own is by far the most important consideration — especially if you want to stay employed.
DVD-VIDEO, OR SAY GOODBYE TO 44.1 KHZ
No matter how you feel about sample rate conversion, if your wonderfully mastered oeuvre with its excellent soundstage and "warmth with clarity" is a 44.1kHz file or a Red Book CD, it will need to be upsampled to at least 48 kHz for the DVD-V disc's PCM audio track. Also, don't assume that authoring houses have an elegant sample rate converter with polyphase filters. They may say "no problem" and then play your CD on their sweaty Walkman after they finish jogging, right out of its grimy RCA jacks and into the analog ports of their ancient Avid. Because the Walkman runs on its own internal clock, unlocked to the video, the audio drifts, and you'll soon receive a call: "Hello? We sent you the QuickTime movie and you said your audio matched the length, but it's off by 15 frames toward the end of the clip." By now, you should know that "no problem" from an authoring house is code for "It's your problem if it doesn't work."
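Two numbers explain that phone call. A proper 44.1-to-48 kHz conversion needs an awkward rational ratio, which is why good converters use polyphase filters, and a free-running clock turns even a small speed error into a visible frame slip. The Python sketch below assumes a 29.97 fps NTSC project and a hypothetical five-minute clip; the function names are illustrative, not from any product.

```python
from fractions import Fraction

NTSC_FPS = 30000 / 1001  # 29.97 fps video (assumed project rate)


def src_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Reduced upsample/downsample factors a polyphase converter would use."""
    return Fraction(dst_hz, src_hz)


def drift_frames(duration_s: float, clock_error_ppm: float, fps: float = NTSC_FPS) -> float:
    """Video frames of sync slip after duration_s if the audio clock is off by clock_error_ppm."""
    drift_s = duration_s * clock_error_ppm / 1_000_000
    return drift_s * fps


def ppm_for_drift(frames: float, duration_s: float, fps: float = NTSC_FPS) -> float:
    """Clock error (ppm) implied by a measured slip of `frames` over duration_s."""
    return frames / fps / duration_s * 1_000_000


ratio = src_ratio(44_100, 48_000)
print(ratio.numerator, ratio.denominator)   # 160 147
print(round(ppm_for_drift(15, 300)))        # 1668 ppm for 15 frames over 5 minutes
```

In other words, a 15-frame slip over five minutes means the playback clock ran roughly 0.17% off, which no locked, properly converted transfer would show.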
The answer to this dilemma is to learn not only the audio specifications for DVD-V, but also the common authoring programs and their limitations, what equipment the target authoring house has and who handles the audio assets there. Get all of the information you can and always, always, always label everything with as much information as you can legibly squeeze onto the label. A "read-me" doc is nice on a CD or DVD assets package, but never assume that anyone will actually read it. If you are delivering assets on disc, get white printable discs and a CD printer and put the pertinent information on the disc itself. Also, forget about paper labels: stick one on a DVD-R and you may render the disc unreadable. DVD players spin at a much higher rate than CD players do, and unbalancing the disc with a paper label is asking for trouble.
SOME SIMPLE DVD-V SPECS FOR AUDIO
DVD-Video discs can carry up to eight audio streams in Dolby Digital, MPEG-1 or MPEG-2 audio, or linear PCM. The authoring house takes your audio and multiplexes ("muxes") it with the MPEG-2-encoded video. The resulting files reside in the VIDEO_TS folder on the finished DVD-V. You'll often see an empty AUDIO_TS folder on a DVD-V disc; on a DVD-A disc, that folder holds the DVD-Audio content. Unlike CD, which was born as an audio format, DVD-V and DVD-A evolved from the CD-ROM format, and their file structures and directories follow the Universal Disk Format (UDF), not the Red Book audio standard.
Although MPEG-2 and MPEG-1 audio are "in the spec," they are rarely used. For NTSC, DVD-V discs must have one of the primary formats (in practice, Dolby Digital or PCM) as the first audio track. The remaining seven tracks can be any of the three primary formats or one of the two optional formats: DTS (Digital Theater Systems) and SDDS (Sony Dynamic Digital Sound). SDDS, Sony's multichannel format based on the ATRAC codec used in MiniDisc, is seldom used for DVD-V discs. DTS, a lossy format with a high bit rate, is often used as the second audio track on DVD-V discs where multichannel audio quality is the primary concern.
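Before sending a stream plan to the authoring house, it can help to check it against the rules above. This Python sketch encodes only what the paragraph states (at most eight streams, a PCM or Dolby Digital stream first on NTSC discs, formats drawn from the primary and optional lists); the format labels are hypothetical shorthand, not identifiers from the DVD specification or any authoring tool.

```python
# Format labels follow the article's list; they are illustrative names, not spec codes.
PRIMARY = {"PCM", "AC3", "MPEG1", "MPEG2"}
OPTIONAL = {"DTS", "SDDS"}
MAX_STREAMS = 8


def check_audio_streams(streams: list[str], ntsc: bool = True) -> list[str]:
    """Return a list of problems with a proposed DVD-V audio stream order (empty = OK)."""
    if not streams:
        return ["no audio streams"]
    problems = []
    if len(streams) > MAX_STREAMS:
        problems.append(f"{len(streams)} streams exceeds the limit of {MAX_STREAMS}")
    # Rule as stated in the article: on NTSC discs, stream 1 should be Dolby Digital or PCM.
    if ntsc and streams[0] not in {"PCM", "AC3"}:
        problems.append(f"stream 1 is {streams[0]}; NTSC needs PCM or AC3 first")
    for i, fmt in enumerate(streams, start=1):
        if fmt not in PRIMARY | OPTIONAL:
            problems.append(f"stream {i}: unknown format {fmt!r}")
    return problems


print(check_audio_streams(["AC3", "DTS"]))   # []
print(check_audio_streams(["DTS", "AC3"]))   # one problem: DTS cannot lead on NTSC
```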
Multichannel PCM is specified for DVD-V, but decoder manufacturers have declined to implement it, and I've never seen a DVD-V disc authored with multichannel PCM. People have authored DVD-V discs with stereo 96kHz/24-bit PCM, but that stream alone consumes 4.608 Mbps of the format's 10.08 Mbps maximum mux rate, which severely limits video quality.
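The arithmetic behind that trade-off is simple enough to script. The sketch below uses the 6.144 Mbps LPCM ceiling from the chart and the 10.08 Mbps maximum mux rate; subpicture streams and packet overhead eat into the real video budget further, so treat the "video" figure as an optimistic upper bound rather than what an encoder will actually get.

```python
MAX_LPCM_BPS = 6_144_000   # DVD-Video LPCM ceiling, from the chart
MAX_MUX_BPS = 10_080_000   # DVD-Video maximum combined stream rate


def pcm_bitrate_bps(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Uncompressed PCM payload rate (ignores packet overhead)."""
    return sample_rate_hz * bits * channels


def video_budget_bps(audio_bps: int, mux_bps: int = MAX_MUX_BPS) -> int:
    """Upper bound on what is left for video once audio is carved out."""
    return mux_bps - audio_bps


for rate, bits, ch in [(48_000, 16, 2), (96_000, 24, 2), (96_000, 24, 3)]:
    audio = pcm_bitrate_bps(rate, bits, ch)
    legal = audio <= MAX_LPCM_BPS
    print(f"{rate} Hz/{bits}-bit x{ch}: {audio / 1e6:.3f} Mbps, legal={legal}, "
          f"video <= {video_budget_bps(audio) / 1e6:.3f} Mbps")
```

The third case (6.912 Mbps) shows why 96kHz/24-bit PCM on DVD-V stops at two channels.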
All DVD-V audio formats support Karaoke mode. This comprises stereo L and R channels, with an optional melody (M) or guide (G) channel, and two optional vocal channels (V1 and V2). Karaoke mode, however, is usually implemented only in DVD-V players that offer special features for mixing and microphone output. I'll save that discussion for another article, perhaps after a bit more research in the sushi bars (er, in the appropriate audio environments).
The abbreviated chart of DVD-Video audio formats at the end of this article lists the formats you may encounter when preparing audio assets for DVD-V. Audio engineers who work in the DVD-V area know most of the chart by heart, because calculating bit rates and data storage is par for the course when they try to convince DVD-V authors to compress the video more so that the audio can be compressed less. Similarly, when clients want to know why their assets won't squeeze onto a regular 4.7-gigabyte DVD, you can pull out the chart and a calculator to show them why they have to pay for the extra baggage of a DVD-9.
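Here is that calculator conversation as a short Python sketch. The 5% overhead allowance and the program figures are hypothetical; "4.7 GB" on a DVD-5 is decimal gigabytes, and 8.5 GB is the nominal DVD-9 capacity.

```python
DVD5_BYTES = 4_700_000_000   # "4.7 GB" on the box means decimal gigabytes
DVD9_BYTES = 8_500_000_000   # nominal dual-layer capacity
OVERHEAD = 0.05              # assumed allowance for menus, file system and mux overhead


def program_bytes(duration_s: float, video_bps: float, audio_bps: list[float]) -> float:
    """Raw size of the main program: (video + all audio streams) x time / 8."""
    return (video_bps + sum(audio_bps)) * duration_s / 8


def smallest_disc(duration_s: float, video_bps: float, audio_bps: list[float]) -> str:
    needed = program_bytes(duration_s, video_bps, audio_bps) * (1 + OVERHEAD)
    if needed <= DVD5_BYTES:
        return "DVD-5"
    if needed <= DVD9_BYTES:
        return "DVD-9"
    return "does not fit"


# Hypothetical two-hour concert at 4 Mbps video with a 448 kbps AC3 5.1 track.
two_hours = 2 * 60 * 60
print(smallest_disc(two_hours, 4_000_000, [448_000]))              # DVD-5
# Adding a full-rate DTS 5.1 track (1,509.25 kbps) pushes it onto a DVD-9.
print(smallest_disc(two_hours, 4_000_000, [448_000, 1_509_250]))   # DVD-9
```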
PCM AUDIO ASSETS FOR DVD-VIDEO
As stated earlier, although the DVD-Video spec allows up to eight channels of PCM, I've never seen a DVD player that implements it, nor have I seen a DVD-V disc with multichannel PCM. One manufacturer referred to it as a "non-spec" in a recent phone conversation, but I've seen contributors to various Internet lists steadfastly maintain that because it is in the spec, it must be possible. Until that happens, I'll focus on the common PCM formats on DVD-V: stereo PCM (or Dolby Surround matrixed as Lt/Rt) at 48 kHz and 96 kHz, at 16, 20 and 24 bits. Other matrixed formats, such as SRS Circle Surround and Dolby Pro Logic II, may also be used. The file format is still PCM; the target decoder must either recognize these matrixed signals or be set to dematrix them.
Delivery formats include DAT with timecode; two tracks on an MDM tape (such as tracks 7 and 8 on a DA-88 tape, with discrete multichannel on the first six tracks); and files on CDs, DVDs, hard drives (FireWire and SCSI), Jaz drives and even USB keychain drives. You may also be asked to provide an FTP area on your company's Website for clients, or you can use a Web server appliance such as Digidesign's DigiDelivery, which e-mails the client a link to install the client software and then sends the encrypted data over the Internet. Authoring houses generally prefer .WAV, Broadcast .WAV or .AIFF files, but some may request SDII files.
This sounds simple enough, until you start dealing with video synchronization. If you have a DAW, I've found that the easiest way to work is to request a QuickTime video file for the work print. Make sure the video program starts after exactly two seconds of video black, and leave two seconds of digital black at the front of the audio file or tape. I've had audio files that arrived without timecode hit lists and started "somewhere" in the video intro. On one such disc, I had to search through the video and audio to line up the singer's plosives on screen with the vocals in the audio file. I have a line item for that on the bills: "fix audio synchronization." If the audio and video match exactly (down to the audio timecode subframe), it's much more cost-effective for the client.
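Lining up hit-points in a DAW comes down to converting timecode into sample positions. The sketch below assumes 48 kHz audio against 29.97 fps video and uses the standard drop-frame counting rule; the function names are illustrative, and the drop-frame branch is only meaningful with a nominal rate of 30.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)


def timecode_to_frames(tc: str, drop_frame: bool = False, nominal_fps: int = 30) -> int:
    """Frame count for HH:MM:SS:FF (or HH:MM:SS;FF for drop-frame)."""
    h, m, s, f = (int(part) for part in tc.replace(";", ":").split(":"))
    frames = (h * 3600 + m * 60 + s) * nominal_fps + f
    if drop_frame:
        # 29.97 DF skips frame numbers 00 and 01 at the top of every minute
        # except minutes divisible by 10.
        total_minutes = h * 60 + m
        frames -= 2 * (total_minutes - total_minutes // 10)
    return frames


def frames_to_samples(frames: int, sample_rate: int = 48_000, fps: Fraction = NTSC_FPS) -> Fraction:
    """Exact audio sample position of a video frame boundary."""
    return frames * sample_rate / fps


print(frames_to_samples(1))                                   # 8008/5, i.e. 1601.6 samples per frame
print(frames_to_samples(timecode_to_frames("00:00:02:00")))   # 96096
```

Note the second result: at 29.97 fps, two seconds of non-drop timecode is 96,096 samples, not the 96,000 of two clock seconds. That 2 ms difference is exactly the kind of subframe detail that makes a hand-lined-up file drift from its hit list.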
Another quirk I encountered was difficulty in making those "audiophile DVD-V" discs. For one project, I made 96kHz/24-bit .AIFF files and found that Sonic DVD Fusion refused to recognize them. It had been happy with 48kHz/16-bit .AIFF files, so in desperation, I tried using BarbaBatch to convert them to Sonic Solutions format. This failed, too. After some investigation (and sleepless nights), I found that I had to have the 96kHz/24-bit files rendered in the Sonic Solutions 5.4 EDL format. You can hack the files (sorry, I won't tell you how) or open them in a Sonic Solutions HD system and save them in that format. There is also a utility called Sonic Magic, available from Sonic Studio LLC and Dark Matter Digital, that will do this. I downloaded it, and it tried to launch Classic mode on a Mac OS X 10.3.3 machine, but I haven't tested it yet. (I'll save that for a "downtime" day.) It looks like a better alternative than hacking files or calling in favors from friends with SSHD rigs.
Other problems can happen when the authoring house has a buggy software revision. I've run into this a few times: One version of a midlevel Mac authoring program will import a 96kHz Broadcast .WAV file and then encode it as if it were 48 kHz, so it plays back twice as long, at half speed and an octave low! Fortunately, it worked fine with 96kHz .AIFF files. There are various DVD lists on the Internet, and although keeping up can be exhausting, it's worthwhile to join them and track the messages. Searching through file headers at 2 a.m. can sometimes mean the difference between keeping and losing a client.
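Reading the header yourself before delivery takes only a few lines. This sketch uses Python's standard `wave` module, which reads plain PCM WAV and skips extra chunks such as the Broadcast WAV `bext` chunk; it does not read AIFF or SDII, and older Python versions reject the WAVE_FORMAT_EXTENSIBLE headers many 24-bit files use. `silent_wav` is a hypothetical helper that builds a test file in memory so the checker can be tried without real assets.

```python
import io
import wave


def wav_format(f) -> dict:
    """Read the fmt/data chunks of a PCM WAV (path or file object)."""
    with wave.open(f, "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate": rate,
            "bits": w.getsampwidth() * 8,
            "channels": w.getnchannels(),
            "seconds": w.getnframes() / rate,
        }


def check_delivery(f, expected_rate: int, expected_bits: int) -> list[str]:
    """Compare a file's header against what the authoring house was told to expect."""
    fmt = wav_format(f)
    problems = []
    if fmt["sample_rate"] != expected_rate:
        problems.append(f"sample rate {fmt['sample_rate']} Hz, expected {expected_rate} Hz")
    if fmt["bits"] != expected_bits:
        problems.append(f"{fmt['bits']}-bit, expected {expected_bits}-bit")
    return problems


def silent_wav(rate: int, bits: int, channels: int, seconds: float = 1.0) -> io.BytesIO:
    """In-memory silent 16- or 24-bit WAV (hypothetical test fixture)."""
    width = bits // 8
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(width)
        w.setframerate(rate)
        w.writeframes(b"\x00" * width * channels * int(rate * seconds))
    buf.seek(0)
    return buf


print(check_delivery(silent_wav(96_000, 24, 2), 96_000, 24))   # []
print(check_delivery(silent_wav(48_000, 16, 2), 96_000, 24))   # two mismatches
```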
DOLBY DIGITAL/AC3 ASSET DELIVERY
Dolby Digital files for DVD-Video are usually delivered to the authoring house as .AC3 files. They can be encoded with software encoders such as Minnetonka Audio SurCode, Universal Audio SmartCode Pro, the Nuendo Dolby Digital Encoder or Apple A.Pack (the AC3 encoder included with DVD Studio Pro). Dolby also provides a PC utility that captures the AC3 bitstream from its hardware encoders to make AC3 files.
Not all encoders implement all of Dolby's AC3 features. Among the most notable missing features are timecode and RF overmodulation protection. AC3 files with timecode are rarely required, but I was thankful to have an AC3 file with timecode (created with Dolby's capture software) when an authoring house called up saying, “The audio loses sync slowly with the video.” I was able to take timecode hit-points from the video EDL and prove that the AC3 file matched the hit-points on the video delivered to me from the video post house. Further investigation revealed that the authoring house was running a revision of a high-end authoring system that dropped video frames in preview. My collective assets were safe.
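When an authoring house reports that audio "loses sync slowly," it helps to separate a constant offset (an edit or start-point problem) from a growing one (a clock or dropped-frame problem). A least-squares line through measured hit-point errors does that. This is a generic fitting sketch, not Dolby's capture software; the measurements are hypothetical, and times are assumed to have been converted from timecode to seconds already.

```python
def fit_offset_and_drift(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares line through (time_s, audio_minus_video_s) hit-point measurements.

    Returns (offset_s, drift_ppm): the constant error and how fast it grows.
    """
    if len(points) < 2:
        raise ValueError("need at least two hit-points")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_e = sum(e for _, e in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("hit-points must be at different times")
    slope = sum((t - mean_t) * (e - mean_e) for t, e in points) / sxx
    offset = mean_e - slope * mean_t
    return offset, slope * 1_000_000


# Hypothetical measurements: error grows ~10 ms per 100 s, so this is drift, not an offset.
print(fit_offset_and_drift([(0, 0.0), (100, 0.01), (200, 0.02)]))
```

A near-zero drift with a nonzero offset points at where the file starts; a steady drift points at something running at the wrong speed somewhere in the chain.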
The most hotly debated topics regarding Dolby Digital center on audio levels. Threads abound on the Internet, with audio engineers staunchly defending their right to break Dolby Digital encoding rules and Dolby representatives countering that the encoding guidelines exist for well-tested reasons. The best way to deal with this is to get all of the information you can from your client (for instance, ask which DVDs the client likes or would like to sound like) and from the authoring house about the successes and failures it has seen. After a while, you'll learn when to use which settings.
For instance, if you're encoding for a DVD that will be played on computers, make sure RF overmodulation protection is turned off. If your encoder doesn't allow for this, then you need another AC3 encoder for that project.
Also, if you go with Dolby's suggestion to set dial-norm (dialog normalization) to -27, be aware that a track encoded using DTS will sound 4 dB louder than the AC3 track, because at that setting the Dolby Digital decoder attenuates the AC3 track by 4 dB. According to some, you should just tell your client to adjust the playback level when switching to the DTS track. If your client has customers coming back who are complaining that the DVD is "broken," then you may want to change dial-norm to -31, which effectively turns dial-norm attenuation off. Setting dial-norm requires an in-depth knowledge of the Dolby Digital encoder; for more information, read one of the articles on the subject at www.dolby.com.
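The 4 dB figure falls straight out of how dial-norm works: the decoder normalizes dialog to -31 dBFS, so it attenuates by 31 plus the dial-norm value. The sketch below assumes both tracks come from the same mix and that the DTS track gets no equivalent attenuation, so the loudness gap equals the AC3 attenuation.

```python
def dialnorm_attenuation_db(dialnorm: int) -> int:
    """Gain reduction a Dolby Digital decoder applies for a dialnorm value of -1 to -31."""
    if not -31 <= dialnorm <= -1:
        raise ValueError("dialnorm must be between -31 and -1")
    # The decoder normalizes dialog to -31 dBFS, so it attenuates by (31 + dialnorm) dB.
    return 31 + dialnorm


for setting in (-27, -31):
    print(setting, "->", dialnorm_attenuation_db(setting), "dB of attenuation")
```

At -27 the AC3 track drops 4 dB; at -31 it plays at the level you mixed it.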
Music purists would often rather have compression set to none. If the end-user has a decoder set to “heavy compression,” then the music may be squashed anyway because the decoder may override your setting.
Rarely will your AC3-encoded tracks be exactly the same length as the original audio tracks. They can be off by nearly a frame in total length, because AC3 frames (1,536 samples, or 32 ms at 48 kHz) don't line up with video frame boundaries. Leave a few frames of video black and audio black at the end of the program material so that the audio doesn't get truncated with a click.
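The "nearly a frame" figure can be checked directly. This sketch assumes the encoder pads the final partial AC3 frame out to a full 1,536 samples (behavior varies by encoder) and measures the extra length in 29.97 fps video frames.

```python
AC3_FRAME_SAMPLES = 1536          # samples per AC3 frame (32 ms at 48 kHz)
NTSC_FPS = 30000 / 1001


def ac3_padded_samples(n_samples: int) -> int:
    """Length after rounding up to whole AC3 frames (assumes the encoder pads the last frame)."""
    frames = (n_samples + AC3_FRAME_SAMPLES - 1) // AC3_FRAME_SAMPLES
    return frames * AC3_FRAME_SAMPLES


def extra_video_frames(n_samples: int, sample_rate: int = 48_000, fps: float = NTSC_FPS) -> float:
    """How much longer the encoded file is, in video frames."""
    extra = ac3_padded_samples(n_samples) - n_samples
    return extra / sample_rate * fps


# Worst case: one sample past a frame boundary leaves 1,535 samples of padding.
print(round(extra_video_frames(1537), 3))   # 0.958
```

A few frames of black at the tail comfortably absorb that padding, which is why the advice above works.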
ENCODING WITH DTS
If you're encoding using a software DTS encoder, then don't choose the .WAV file option, which is for making DTS CDs. It will not work for DVD-V; instead, choose the .CPT option.
Also, encourage the authoring house to let you use the higher bit rate version of DTS. Especially for multichannel, the sonic improvement is clearly audible with respect to the proper imaging of transients.
LEVELS
While most audio engineers who have mixed multichannel for a while will stress the importance of a calibrated mix room, many DVD authors work in small, cramped spaces with tiny computer speakers for monitors. Even worse, talk about audio for DVD on the Internet often centers on “How high can the levels be without distortion?”
DVD-V levels seem to be climbing, too, so it's important to know what your client expects before delivering the audio.
While experienced engineers will advocate the standard -20 dBFS = 0 VU = +4 dBu = 85 (or 79) dB SPL, those new to the trade will mix with CD “compressed-to-the-max” levels in mind. Notwithstanding the modern movement to bring more dynamic range to mixing, remember that AC3 will “fix” overload caused by overly high levels. When the overload protector kicks in, it ain't pretty.
Others suggest that discrete multichannel should have peaks anywhere from -10 dBFS to -6 dBFS. Some also say that DVD-V audio levels should match those for broadcast, but broadcast specs have been changing lately, with some cable broadcasts showing much higher levels than those in recent years.
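Comparing these camps is easier with the reference alignment written out. The sketch assumes the -20 dBFS = 0 VU = +4 dBu = 85 dB SPL alignment quoted above (use 79 for the alternate SPL reference) and a calibrated, linear monitoring chain.

```python
REF_DBFS = -20.0   # digital reference: -20 dBFS = 0 VU
REF_DBU = 4.0      # = +4 dBu at the converter's analog side
REF_SPL = 85.0     # = 85 dB SPL (or 79, per the alternate reference)


def dbfs_to_dbu(dbfs: float) -> float:
    return dbfs - REF_DBFS + REF_DBU


def dbfs_to_spl(dbfs: float, ref_spl: float = REF_SPL) -> float:
    """Monitor SPL for a steady signal, assuming a calibrated, linear playback chain."""
    return dbfs - REF_DBFS + ref_spl


def peak_above_reference(peak_dbfs: float) -> float:
    return peak_dbfs - REF_DBFS


print(dbfs_to_dbu(0.0))               # 24.0: full scale on this alignment is +24 dBu
for peak in (-10.0, -6.0):
    print(peak, "dBFS peaks sit", peak_above_reference(peak), "dB above reference")
```

On this alignment, the -10 to -6 dBFS peak range leaves 10 to 14 dB of peak excursion above reference, and 6 to 10 dB of unused headroom below full scale.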
Finally, music mixes for DVD-V may, in fact, need to be encoded differently than programs with center-channel dialog. The client and the audience are the final arbiters of taste, and experience in dealing with the public will often trump other encoding factors.
DVD-AUDIO AUDIOPHILE ASSETS
Like DVD-V, DVD-A offers downmix capabilities for playback systems that cannot reproduce the maximum number of channels at the highest bit rates. Downmix for DVD-A is more elegant than downmix for DVD-V, with coefficient tables that prevent overload and assist in mixdown control. Hybrid discs can also carry a separate stereo mix, as well as DVD-V content for backward compatibility. Although DVD-A trails DVD-V in market penetration, interest in the format continues to grow. Lower-cost DVD-A authoring programs that can import VIDEO_TS folders (such as Minnetonka Audio Chrome) are slated to be available by the second quarter of 2004.
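To see why coefficient tables matter for overload, consider a fold-down of five full-range channels to stereo. The table below is hypothetical (common -3 dB center and surround contributions, not taken from any disc), and the normalization step is one simple, generic way to guarantee no clipping; real DVD-A tables are chosen by the author per disc.

```python
MINUS_3DB = 10 ** (-3 / 20)   # about 0.708

# Hypothetical 5.0 -> stereo fold-down table (not taken from any disc).
COEFFS = {
    "Lo": {"L": 1.0, "C": MINUS_3DB, "Ls": MINUS_3DB},
    "Ro": {"R": 1.0, "C": MINUS_3DB, "Rs": MINUS_3DB},
}


def worst_case_gain(coeffs: dict) -> float:
    """Largest possible output if every input channel hits full scale in phase."""
    return max(sum(abs(g) for g in row.values()) for row in coeffs.values())


def normalized(coeffs: dict) -> dict:
    """Scale the whole table down just enough that the worst case cannot exceed full scale."""
    g = worst_case_gain(coeffs)
    if g <= 1.0:
        return coeffs
    return {out: {ch: v / g for ch, v in row.items()} for out, row in coeffs.items()}


def downmix(frame: dict, coeffs: dict) -> dict:
    """Apply a coefficient table to one multichannel sample frame."""
    return {out: sum(g * frame.get(ch, 0.0) for ch, g in row.items()) for out, row in coeffs.items()}


loud = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.0, "Rs": 1.0}
print(downmix(loud, COEFFS)["Lo"])               # about 2.42: would clip
print(downmix(loud, normalized(COEFFS))["Lo"])   # about 1.0: safe
```

Worst-case normalization costs about 7.7 dB of level here, which is why authored tables that trade safety against loudness are the more elegant route.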
The WG-4 (the DVD Forum's DVD-Audio working group) is also considering ratifying the DVD-A/CD "flipdisc," a format tested this spring in several U.S. cities. As I write this in late March, the WG-4 has just chosen the Advanced Audio Coding format as the low-resolution codec for DVD-A's DVD-ROM zone, satisfying the consumer need for a format that is suitable for solid-state and portable audio devices. Again, the engineer who wants to work with DVD-Audio needs to know various encoding formats and the techniques required for the best end result.
The one rule that most engineers follow for DVD-A is to deliver five full-range channels without an LFE (unless there are musical sound effects involved). Although most modern DVD-A players have bass management, some older systems do not, so if you include an LFE, print information about its level setting in large type on the final product. That way, the end user has a means of fixing playback with a strange bottom end.
PREPARING FOR THE FUTURE
Although the DVD Forum has announced "provisional approval" of Microsoft's VC-9 for the video portion of the HD-DVD specification (in addition to the mandatory H.264 and MPEG-2), the audio spec remains to be sorted out. Blu-ray devices can now be purchased, but they are aimed at the home market for recording HDTV.
Will future specifications for DVD-Audio include 384 kHz to compete with the high bit rate of DSD for SACD? Will we be including highly compressed video for portable devices?
The challenge for audio engineers will be to know as much about audio codecs as they do about mixing audio, as it is clear that the complexity of these issues cannot be relegated to the exclusive domain of those who view audio encoding as a push-button afterthought to video.
K. K. Proffitt is chief audio engineer of JamSync, a Nashville facility specializing in multichannel mixing and DVD authoring.
DVD-VIDEO AUDIO FORMATS
| Format | Sample Rate | Bit Depth | Channels | Bit Rates | Typical Use | Compression |
|---|---|---|---|---|---|---|
| PCM | 48, 96 kHz | 16, 20, 24 | 1 to 8 | Up to 6.144 Mbps | 48kHz, 16-bit stereo | None |
| Dolby Digital (AC-3) | 48 kHz | 16, 20, 24 | 1 to 6.1 | 64 to 448 kbps | 192 kbps for stereo; 384 or 448 kbps for 5.1 | AC3 (lossy) |
| DTS | 48, 96 kHz | 16, 20, 24 | 1 to 7.1 | 64 to 1,536 kbps | 377 or 754 kbps for stereo; 754.5 or 1,509.25 kbps for 5.1 | DTS Coherent Acoustics (lossy) |
| MPEG-2 | 48 kHz | 16, 20 | 1 to 7.1 | 32 to 912 kbps, constant and variable bit rate | Seldom used | MPEG (lossy) |
| MPEG-1 | 48 kHz | 16, 20 | 2 | 384 kbps | Seldom used | MPEG (lossy) |
| SDDS | 48 kHz | 16 | 5.1 or 7.1 | Up to 1,289 kbps | Seldom used | ATRAC (lossy) |
DVD-AUDIO FORMATS
| Format | Sample Rate | Bit Depth | Channels | Bit Rates | Typical Use | Compression |
|---|---|---|---|---|---|---|
| PCM | 44.1, 48, 88.2, 96, 176.4, 192 kHz | 16, 20, 24 | 1 to 6, with up to two simultaneous streams | Up to 9.6 Mbps | 5 or 5.1 channels of 96kHz/24-bit with MLP | None, or Meridian Lossless Packing (MLP) |
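The DVD-A chart's "with MLP" entry follows from its 9.6 Mbps ceiling: uncompressed multichannel 96kHz/24-bit simply does not fit. This sketch computes the minimum compression MLP would have to reach; it treats all channels at one rate (DVD-A also allows channel groups at different rates, which it ignores), and since MLP is lossless, the ratio it actually achieves depends on the program material.

```python
DVDA_MAX_BPS = 9_600_000   # DVD-Audio maximum audio rate, from the chart


def pcm_bps(sample_rate_hz: int, bits: int, channels: int) -> int:
    return sample_rate_hz * bits * channels


def required_mlp_ratio(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Minimum compression MLP must achieve to fit; 1.0 means plain PCM already fits."""
    return max(1.0, pcm_bps(sample_rate_hz, bits, channels) / DVDA_MAX_BPS)


for name, cfg in [("stereo 192/24", (192_000, 24, 2)),
                  ("5.0 96/24", (96_000, 24, 5)),
                  ("5.1 96/24", (96_000, 24, 6))]:
    print(f"{name}: {pcm_bps(*cfg) / 1e6:.3f} Mbps, needs ratio >= {required_mlp_ratio(*cfg):.2f}")
```

Stereo at 192kHz/24-bit fits as plain PCM (9.216 Mbps), while six channels at 96kHz/24-bit (13.824 Mbps) need MLP to shave off at least 30% of the data.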