Digital Video Preservation and Oral History
Preserving digital video requires addressing the entire life cycle of the content, from pre-production through capture, editing, archiving, and providing access. Decisions made at the point of creation have implications for the other stages down the road. It is important to understand these implications and make decisions throughout the workflow that will enable efficient, cost-effective, and accessible archiving over the long term.
This paper provides a discussion of preservation issues, primarily for born-digital, file-based workflows, but also for video content digitized from analog sources. Regardless of the source of the content, the long-term preservation and management concerns remain the same for all types of digital video.
Anatomy of a Video File
A digital video file is made up of multiple components. Most important are the file wrapper, the encoded video track, and (if there is sound) the encoded audio track(s).
The file wrapper or container is what we commonly think of as the file format. It is represented on your computer or storage system with an extension such as .mov (QuickTime), .avi (AVI), .mpg (MPEG), and .wmv (Windows Media). The file wrapper is only one part of the video file, albeit an important one. Its role is to bind the video and audio essence together so they can be played back accurately. The file wrapper may also contain important metadata and additional tracks, such as closed captioning or subtitles.
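As a small illustration, the extensions listed above can be mapped to their wrapper names. This is only a sketch: the table holds just the four examples from this paper, and the function name `wrapper_for` is hypothetical.

```python
from pathlib import Path

# Extensions and wrapper names taken from the examples in the text.
WRAPPERS = {
    ".mov": "QuickTime",
    ".avi": "AVI",
    ".mpg": "MPEG",
    ".wmv": "Windows Media",
}


def wrapper_for(filename: str) -> str:
    """Return the wrapper implied by the file extension, or 'unknown'."""
    return WRAPPERS.get(Path(filename).suffix.lower(), "unknown")


print(wrapper_for("interview_01.MOV"))  # QuickTime
```

Note that the extension identifies only the wrapper. It says nothing about how the video and audio tracks inside were encoded.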
The video and audio tracks contained within the file wrapper are created by different encoding formats, or codecs (short for coder/decoder). The codec used to create the video track must also be used to decode it upon playback. Software used to play back video files must have the right codecs in its library in order to play the files. Codecs can thus be thought of as yet another file format within your file. Common codecs today include H.264, DV (Digital Video), Apple ProRes, MPEG-2, and MPEG-4. The encoding format also dictates the type of compression that will be used on the file (unless the video is captured as uncompressed during digitization).
It is important to understand the components of the digital files you will be creating, as the different formats have different uses and different approaches for preservation. When choosing what format and codec to use during recording or digitization, a number of factors should be considered that will impact the interoperability and longevity of the format. These include the adoption of the format (Is it supported by a large number of software and hardware vendors? Is there a large community of users?), documentation of the format (Is it proprietary or open source? Is the source code documented and available to developers? Is it standardized or are there many different flavors of that format out there?), and external dependencies (Does this format depend on specific hardware or software in order to play it back?). An excellent discussion of codecs, compression, and decision factors for selecting video formats for archiving is A Primer on Codecs for Moving Image and Sound Archives: 10 Recommendations for Codec Selection and Management by Chris Lacinak of AudioVisual Preservation Solutions (http://www.avpreserve.com/wp-content/uploads/2010/04/AVPS_Codec_Primer.pdf).
Regardless of your budget, decisions made at the point of creation (i.e., the video-recording stage) will affect the remainder of your workflow: your editing decisions, how much storage you will need, what kind of access you can provide, and how you will need to plan for ongoing preservation.
The main issues that impact the video workflow are the encoding format, bit rate, and, by extension, the file size. The selection of a video camera should allow you to use and archive the recording in the most efficient and effective way given your infrastructure, expertise, and resources.
The first aspect of the recording device to look at is the file format and codec it uses. As mentioned above, it is important to use widely adopted, well-documented, standardized encoding and wrapper formats. This will help ensure that the file will be playable in common software now and in the future. It also implies that there will be a large community invested in maintaining the playability of the file and/or developing solutions for migration to new formats.
Although it feels like there are multitudes of video formats available, in actuality camera manufacturers are settling on a few common encoding formats. When looking to purchase a camera, look at the technical specifications. If you don’t see one of the following listed, you may want to reconsider that item:
If no encoding format is listed in the technical specs, you should ask for more information, or consider looking at other options. Cameras that record to these standard formats can be found in all price ranges.
Bit Rate, Resolution & File Size
Bit rate (or data rate) is the number of bits that are processed over time. For video, it is usually expressed in kilobits per second (kbps) or megabits per second (Mbps). Bit rate impacts both the resolution of the image and the size of the resulting video file created. Thus, different bit rates are used for different purposes. In order to deliver video over networks, video created for the Web typically uses a lower bit rate (e.g. 700 kbps), while video being edited for eventual television broadcast will be of a much higher bit rate (e.g. 25-50 Mbps for standard-definition video) to create a high quality image. The resulting video file, however, will also be much larger.
Some cameras record at fixed bit rates, while others allow the user to adjust the bit rate. It is important to start with the highest bit rate that both your camera allows and your storage and editing infrastructure can support. Video with a higher bit rate will require more computer processing power in order to edit and transcode, as well as more storage space. However, from a high-resolution video file, any number of lower-quality derivatives can be created for different purposes. The same cannot be said for the inverse: no amount of computer processing can replace bits that were not originally recorded. Therefore, if the original copy you have is a 1 Mbps video file, that is also the best copy you are ever going to have.
Edit and Transcode
Every transcode (conversion from one encoding format to another) introduces generation loss, because different codecs employ different compression algorithms. It is important to keep this in mind when deciding whether to transcode your video, as well as what purpose the transcode serves.
Video may need to be transcoded for a number of different reasons:
- If the original encoding format is proprietary or at a high risk of obsolescence.
- To create mezzanine files, also known as working copies or edit masters.
- To create proxy files, also known as access copies.
In most cases, the original video footage should be retained, unedited, as preservation master files. The exception would be when the video is captured using a proprietary format. Recording in a proprietary format is not recommended, but if it is absolutely necessary, it will be important to transcode the video to a more manageable preservation master format for long-term access.
The creation of three file types (preservation master, mezzanine, and proxy) is common practice in the world of video production and archiving. The aim is to keep the original footage intact at the highest possible resolution, while using the mezzanine format to create new edits and proxies for different distribution purposes. This way, the integrity of the original is maintained, and editing and transcoding are made lightweight and efficient. The preservation master is safely stored, while the other files are used for manipulation.
If the originally captured video needs to be transcoded for preservation purposes, it is important to maintain the bit rate, frame size, frame rate, and color sampling of the original.
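One way to check this after a transcode is to compare the technical properties of the two files. The sketch below assumes the properties have already been read from each file into a dictionary; the key names and the sample DV values are hypothetical labels chosen for illustration.

```python
# The four properties named in the text; the key names are hypothetical.
PRESERVED = ("bit_rate", "frame_size", "frame_rate", "color_sampling")


def changed_properties(original: dict, transcoded: dict) -> list:
    """Return the names of the preserved properties whose values differ."""
    return [key for key in PRESERVED if original.get(key) != transcoded.get(key)]


# Illustrative values for a standard-definition DV recording.
original = {
    "bit_rate": "25 Mbps",
    "frame_size": "720x480",
    "frame_rate": "29.97",
    "color_sampling": "4:1:1",
}
# A transcode that altered the color sampling.
transcoded = dict(original, color_sampling="4:2:0")

print(changed_properties(original, transcoded))  # ['color_sampling']
```

An empty list means all four properties were carried over unchanged.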
A mezzanine file is a working copy, or edit master. It should be of high enough resolution that nearly all necessary derivatives can be created from it. The specifications for mezzanine files depend on the original. Some might not need a mezzanine. For instance, if your original video file is a 25 Mbps DV file (the equivalent quality of mini DV tape) and you are using the footage for broadcast, or even to make Web video, the same file will be suitable as a mezzanine. Just make an additional copy and store the other in a separate, dedicated area. On the other hand, if your original is a 100 Mbps HD file, you will probably want to consider a lower-resolution mezzanine format, perhaps 25 Mbps or lower.
The choice of mezzanine format often depends on the software and hardware that you will be using. Many editors work with video codecs that are native to their non-linear editing system. Those who work in Final Cut Pro frequently transcode video to Apple’s ProRes for editing purposes, while people who work in Avid environments often choose Avid’s DNx. While both are high-resolution, effective editing formats, they are also proprietary, subject to frequent changes by the software developer, and not reliable as long-term preservation formats. These may be suitable mezzanine formats, but are not recommended for long-term preservation. If these formats are used during the editing process, it is highly recommended that the final output file retained be in a standard format, such as one of those mentioned above in the camera section.
Partial list of available export encoding, screen size, and frame rate settings in Final Cut Pro
A proxy file is akin to a reference file. Today, proxies are typically created for Web delivery, DVD, or other distribution channels. A proxy is of low resolution, in many cases too low for editing, projection, or broadcast (depending on your needs), but it satisfies the bandwidth and user requirements of the Web and quick desktop screening.
If you do need to convert your file to another format, transcoding to create mezzanine or proxy files can be performed with a variety of different tools on your desktop computer. On the commercial side, products like Compressor, which comes with the Final Cut Studio suite, offer easy transcoding to a wide variety of formats. However, there is no need to rush out and buy this expensive software. Free tools like MPEG Streamclip and HandBrake are simple transcoders, are available for Mac and PC operating systems, and, in the case of MPEG Streamclip, can output to a very wide variety of container and encoding formats. MPEG Streamclip is also a great tool for making short clips from longer video footage.
MPEG Streamclip from Squared 5, showing available export container formats. After selecting the container format, encoding options and other settings are selected.
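A proxy transcode can also be scripted rather than run through a graphical tool. The sketch below is an assumption layered on this paper, not a method it prescribes: it uses the ffmpeg command-line program, assumes an ffmpeg build that includes the libx264 encoder, and borrows the 700 kbps Web bit rate mentioned earlier as its default. The function names are hypothetical.

```python
import subprocess


def proxy_command(source: str, target: str, video_kbps: int = 700) -> list:
    """Build an ffmpeg command line for an H.264 proxy at the given bit rate."""
    return [
        "ffmpeg",
        "-i", source,              # input file (original or mezzanine)
        "-c:v", "libx264",         # encode the video track as H.264
        "-b:v", f"{video_kbps}k",  # target video bit rate
        "-c:a", "aac",             # encode the audio track as AAC
        target,                    # output path; the extension selects the wrapper
    ]


def make_proxy(source: str, target: str, video_kbps: int = 700) -> None:
    """Run the transcode; raises CalledProcessError if ffmpeg reports failure."""
    subprocess.run(proxy_command(source, target, video_kbps), check=True)
```

For example, `make_proxy("smith_mezzanine.mov", "smith_proxy.mp4")` would write a new proxy file and leave the source file untouched.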
Retention & Storage
File Naming and Organization
As mentioned above, original video footage should be retained, unedited, as preservation master files. The “original” may include more than just a video file. Some cameras today output a number of files, all packaged together in a directory structure. Quite often, these additional files, their file names, and the directory structure itself all play an important role in the functionality of the video file. A few important rules of thumb to keep in mind:
- If you are retaining the original, retain all of the other original files as well.
- Don’t change the directory structure.
- Don’t change the names of any of the files, or any of the folder names within the top-level folder.
- Do use a consistent folder naming convention for the organization of the original files. Only change the top-level folder name.
Directory structure for a Panasonic P2 camera
Directory structure for an XDCAM camera
Directory structure for a Canon Mark II 5D camera
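The rules of thumb above can be followed when copying a camera card into storage. A minimal sketch, assuming the card is mounted as an ordinary folder; the function name and the example folder names are hypothetical.

```python
import shutil
from pathlib import Path


def archive_card(card_root: str, archive_dir: str, new_top_name: str) -> Path:
    """Copy a camera card's folder tree intact, renaming only the top-level folder."""
    destination = Path(archive_dir) / new_top_name
    # copytree reproduces every subfolder and file name exactly as recorded.
    # It raises FileExistsError if the destination already exists, which
    # guards against overwriting an earlier transfer.
    shutil.copytree(card_root, destination)
    return destination
```

A call such as `archive_card("/Volumes/P2CARD", "/archive/originals", "oh2012_smith_card01")` would apply the collection's own naming convention to the top-level folder while leaving everything beneath it as the camera wrote it.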
You should create a file naming convention for your final, edited masters, mezzanine, and proxy files. As with all digital files, consistent file- and folder-naming conventions should be used so that files can be easily found, organized, and guaranteed unique (so as to avoid accidentally overwriting files or distributing the wrong version). Avoid the use of spaces and special characters in your file names (e.g., @ # $ % & * : ” ’ < > ? / ).
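A naming convention is easier to keep when file names are checked automatically. The sketch below assumes one possible rule (letters, digits, underscores, hyphens, and periods only); that whitelist and the function names are illustrative assumptions, not a standard.

```python
import re

# Assumed rule: only letters, digits, underscore, hyphen, and period.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_name(name: str) -> bool:
    """True when the name has no spaces or special characters."""
    return SAFE_NAME.fullmatch(name) is not None


def sanitize(name: str) -> str:
    """Replace each run of disallowed characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name)


print(is_safe_name("smith_jane_2012-06-01_proxy.mp4"))  # True
print(sanitize("Jane's interview #2.mov"))              # Jane_s_interview_2.mov
```

Because two different names can sanitize to the same result, uniqueness still needs to be checked separately.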
The amount of storage needed for digital video files again depends largely on the bit rate. By following a simple formula, you can quickly calculate how much storage will be needed for an upcoming new oral history recording project or a digitization project for your analog tapes:
- Divide the number of Mbps by 8. This converts megabits per second (how bit rate is expressed) to megabytes per second, or MB/s (how storage is measured). If the bit rate is given in kbps, first divide it by 1000 to get Mbps.
- Multiply the number of MB/s by 60 to get MB/minute
- Multiply the number of MB/minute by 60 to get MB/hour
- Divide the MB/hour by 1000 to get GB/hour
- Multiply GB/hour by the number of hours you will be recording or digitizing
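The steps above can be sketched as a short function. The function name and the `copies` parameter are illustrative additions, and the bit rate is taken to be the total rate of the file in Mbps.

```python
def storage_gb(bit_rate_mbps: float, hours: float, copies: int = 1) -> float:
    """Estimate storage in GB for a recording at the given bit rate in Mbps."""
    mb_per_second = bit_rate_mbps / 8   # megabits -> megabytes
    mb_per_minute = mb_per_second * 60
    mb_per_hour = mb_per_minute * 60
    gb_per_hour = mb_per_hour / 1000
    return gb_per_hour * hours * copies


print(storage_gb(25, 1))             # 11.25
print(storage_gb(25, 40, copies=2))  # 900.0
```

One hour of 25 Mbps DV therefore needs about 11.25 GB, and a 40-hour project kept in two copies needs about 900 GB.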
There are other tools to help with these calculations. The AJA Data Rate Calculator (http://www.aja.com/products/software/), for example, will compute the number of MB, GB, or TB you will need to store files created in specific codecs, or with certain data rates, for a given number of hours.
Keep in mind that good digital storage practices require keeping at least two copies of your preservation master material and storing these copies on different storage media, ideally in separate geographic locations. Most files are created as compressed video, both in camera and during digitization, because of the very large file sizes (and thus the processing power and storage) that uncompressed video requires. When creating compressed video files, it is important to consider creating and storing a third separate copy. Compressed image, sound, and video files are much more susceptible to visible or audible corruption in case of bit rot (a colloquial term for the gradual decay of data on storage media).
Uncompressed video file with a small amount of corruption (.mov, 61.5 MB, 206 Mbps). Small specks of color appear throughout the image.
DV file with the same corruption applied as to the uncompressed file (.dv, 8.7 MB, 25 Mbps). The image is visibly more degraded.
H.264 file with the same corruption applied (.mov, 968 KB, 1 Mbps). The results of the same corruption are clearly much more devastating.
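Silent corruption of this kind can be noticed by comparing the stored copies of a file. This paper does not prescribe a method; the sketch below assumes one common approach, computing a SHA-256 checksum of each copy and checking that they agree. The function names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths) -> bool:
    """True when every listed copy has the same SHA-256 digest."""
    return len({sha256_of(path) for path in paths}) == 1
```

If the copies disagree, a checksum recorded when the file was first stored would show which copy is still intact.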
Appropriate storage media for digital video files depend on a number of factors, including the size of the video collection, the IT infrastructure and resources available, and the frequency of access required.
For preservation master files, which need to be accessed very infrequently, offline storage will likely be the most cost-effective solution: unpowered hard disk drives for small collections (less than 1 TB), or LTO data tape for larger ones (multiple TB). Keep in mind that at least two copies of your video files need to be stored on different storage media.
Mezzanine and proxy files often need to be accessed more regularly. These files can be stored on hard disk, either on external hard drives, or storage networks such as SAN, NAS, or other servers, if your infrastructure supports these storage approaches. Proxy files can often be stored by video hosts and distribution channels, or cloud-based solutions. The cloud is not a suitable storage method for large preservation master and edit files, however, given the tremendous amount of bandwidth that would be required to move these files over networks.
Technical and Preservation Metadata
Managing digital video collections is greatly supported by metadata of various types. It is important to create and manage good, consistent descriptive metadata in order to support discovery and understanding of the resource. However, technical, structural, and preservation metadata also greatly support collection management of audiovisual content.
Fortunately, this metadata does not all need to be created by hand, nor do you have to start from scratch in deciding what metadata to capture and collect. Technical metadata is created when the file itself is created, and is stored in the file header. You can easily view this metadata using a tool such as MediaInfo (especially recommended for video and audio files). With a little additional work, the most important excerpts from the MediaInfo output can be added to the technical metadata for your files. MediaInfo is a free tool available for PC and Mac platforms.
MediaInfo display of technical metadata for an MPEG-4 file, using the Text View mode.
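Pulling those excerpts out can be automated. The sketch below assumes the MediaInfo command-line program is installed and that its JSON report has the layout produced by recent versions (a "media" object holding a list of "track" entries, each with an "@type"). The selection of fields and the function names are illustrative choices.

```python
import json
import subprocess

# Fields to excerpt; the selection is an illustrative assumption.
FIELDS = ("Format", "Width", "Height", "FrameRate", "BitRate")


def mediainfo_json(path: str) -> dict:
    """Run the MediaInfo command-line tool and return its JSON report."""
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def video_summary(report: dict) -> dict:
    """Pick the selected fields from the first video track of a report."""
    media = report.get("media") or {}
    for track in media.get("track", []):
        if track.get("@type") == "Video":
            return {field: track.get(field) for field in FIELDS}
    return {}
```

The result of `video_summary(mediainfo_json("smith_interview.mov"))` could then be saved alongside the file's other metadata.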
Two standards offer support for capture of extensive technical metadata for video files: 1. videoMD, maintained by the Library of Congress, and 2. PBCore (which is also a descriptive standard), maintained by the Corporation for Public Broadcasting (http://pbcore.org/). Whether or not you choose to implement these standards in your database, they still can be a useful guideline and, in the case of PBCore, offer controlled vocabularies that can be particularly helpful for entering consistent metadata.
Before settling on any proxy format, look at how you will be distributing the video, what specifications the distribution platform sets for video files, and whether the platform does internal transcoding, so that you don’t have to. Many of today’s common video distribution platforms will create proxies for you in the format most suitable for their system. YouTube, for example, stresses that you should upload the original file, so as to ensure that the quality is retained during its transcode process. Vimeo, on the other hand, provides a set of specifications for video format, encoding, frame size, etc. Online video management systems like Kaltura (http://www.kaltura.org/) will create a large number of derivative files from your high-resolution original or mezzanine for distribution to different platforms.
Providing access to remote communities may require the creation of either very low-resolution (low data rate) proxies for streaming to low-bandwidth areas, or alternative distribution formats. DVDs may be the most appropriate media in many cases. If authoring a DVD to distribute to people in other parts of the world, be sure to check whether the DVD needs to be authored in PAL, NTSC, SECAM, or another format, so that it can be played back on local DVD players. This issue should not apply when video files are saved to DVDs as data (rather than as an authored disc with menu, chapters, etc.).
As mentioned above, the use of open, standard file formats and codecs is highly recommended to ensure that the files will be accessible in the future. Long-term retention of proprietary video formats is not recommended. These formats change frequently, playback is limited to specific software, and the source code is not documented so that others can write codecs to read the files. If you have created or acquired proprietary video files as your primary preservation format (this includes Apple ProRes and AVID DNx), you may want to consider migrating these files to a more preservation-friendly file format.
Generally speaking, video formats that are widely supported, documented, and open standards will have a much greater longevity than those proprietary formats subject to frequent change. However, over time, these formats will also change, and it may be necessary to migrate preservation files to a new format. It is important to monitor the technological landscape to know when a format (container or encoding format) is at risk for obsolescence. In the meantime, it is important to maintain original, high-quality files in their native codec and resolution.
The other good news about standard file formats is that there is an incredibly large number of people using these formats, as well as groups of developers who maintain codec libraries, which contain the source code for a large number of standard video codecs. The maintenance of such codec libraries, such as libavcodec maintained by ffmpeg (http://www.ffmpeg.org/), helps to ensure that software developers will continue to be able to create methods to play back video files in these formats, even as new changes in technology come along.
While standard preservation files will not often need to be migrated, the same doesn’t generally hold true for proxy files. As bandwidth increases, and the world’s largest video creation and distribution companies (i.e., Apple, Adobe, and Google) continue to battle for market dominance, the format of the day shifts with the trends. As an example, the Prelinger collection at the Internet Archive (http://www.archive.org/details/prelinger) still contains the original preservation masters created 10 years ago, but the proxy files for this collection have continually changed over the years, from RealMedia, to QuickTime, Flash, and more recently H.264. The Web video world is currently experiencing a battle over which codec will become the standard for HTML5 Web delivery, with Apple and Adobe behind H.264, and Google behind WebM.
The Library of Congress’s File Format Sustainability (http://www.digitalpreservation.gov/formats/) criteria provide very helpful factors for evaluating a given format.
PrestoPRIME threats to mass storage digest: (http://www.prestocentre.eu/sites/www.prestocentre.eu/files/digest_threats_V1.04.pdf)
Library of Congress, “Sustainability of Digital Formats”: (http://www.digitalpreservation.gov/formats/)
Chris Lacinak, “A Primer on Codecs for Moving Image and Sound Archives”: (http://www.avpreserve.com/wp-content/uploads/2010/04/AVPS_Codec_Primer.pdf)
Video Preservation Website: (http://videopreservation.conservation-us.org/)
MPEG Streamclip: (http://www.squared5.com/)
AJA Data Rate Calculator: (http://www.aja.com/products/software/)
Citation for Article
Van Malssen, K. (2012). Digital video preservation and oral history. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Museum and Library Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/digital-video-preservation-and-oral-history/.
Van Malssen, Kara. “Digital Video Preservation and Oral History,” in Oral History in the Digital Age, edited by Doug Boyd, Steve Cohen, Brad Rakerd, and Dean Rehberger. Washington, D.C.: Institute of Museum and Library Services, 2012, http://ohda.matrix.msu.edu/2012/06/digital-video-preservation-and-oral-history/
This is a production of the Oral History in the Digital Age Project (http://ohda.matrix.msu.edu) sponsored by the Institute of Museum and Library Services (IMLS). Please consult http://ohda.matrix.msu.edu/about/rights/ for information on rights, licensing, and citation.