File Naming in the Digital Age

by Dean Rehberger and Brendan Coates

Librarians and archivists know well the importance of consistent file naming. When dealing with thousands (if not millions) of digital objects, names that are both machine and human readable can keep a world close to chaos stable and usable. However, many of us feverishly working on oral history/narrative projects often leave behind directories full of helter-skelter file names. At the time of production, we don’t think much about it. Surely we will remember what file names like “interview3” and “work” mean. But a short time down the road (particularly as we get older), the confusion of files makes it difficult to remember what was what, and a good deal of time is wasted trying to find the right file and deciding what to keep and what to discard.

A few minutes of forethought and a little planning can make your files more usable for years to come, easier to deposit in archives and libraries, and more useful for developing museum exhibits. The following recommendations for file naming consistency work for digital audio files and extend Kara Van Malssen’s important discussion of file naming in “Digital Video Preservation and Oral History” to mezzanine, edited, and dissemination video files (but not preservation files).

Consistent naming structures are crucial to file identification. These can take a variety of forms depending on the system, its operators, and the content type. Drawing on the guidelines published by Michigan State University[1] and Indiana University at Bloomington[2], this recommendation incorporates a mix of human-readable and machine-specific IDs. The basic name structure consists of the interviewer, the interviewee, the status, the part, and the date, all lower case and separated by underscores. The directory that houses the files should retain the project name.

Oral History: interviewer_interviewee_[status]_[part#]_[date].format

Oral Narrative: fieldworker_subject_[status]_[part#]_[date].format

Example: boyd_johnson_pres_01_20120801.wav

Example: macdowell_benberry_mez_02_20120801.h264

The interviewer and interviewee are given as straightforward names, and longer names can be shortened to 8 letters. For folklorists doing fieldwork, “interviewer” and “interviewee” can be replaced by “fieldworker” and “subject” (again, longer names and subjects can be shortened to 8 letters). Status is denoted by one of the following 3- to 4-letter designations:

  • raw – the raw capture format. As Van Malssen notes, the raw file names should not be changed for video format.
  • pres – the preservation copy of the file. Again, as Van Malssen notes, the raw file names should not be changed for video format.
  • mez – the mezzanine copy (often known as the working copy or edit master).
  • ed01 – the edited version of the file. The designation combines letters and numbers to allow for more than one edited copy (ed01, ed02, and so on).
  • dis – the dissemination copy of the file (often for the web).

Part# designates the parts of an interview when it is recorded in more than one part or session and each part has a separate digital file. The part is a two-digit designation ([01], [02], [03]), allowing up to 99 parts of one interview or subject. The date is the date of collection or recording in the format yyyymmdd.
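The structure described above can be sketched in a few lines of Python. This is only an illustration, not part of the recommendation itself: the function names (shorten, build_name, parse_name) are made up for this example, and the sketch assumes that names are always cut to 8 letters and that edited copies are numbered ed01 through ed99.

```python
import re
from datetime import date

# Status codes from the list above; "ed" is followed by two digits (ed01, ed02, ...).
STATUS_PATTERN = r"raw|pres|mez|dis|ed\d{2}"

# interviewer_interviewee_status_part_date.format
NAME_RE = re.compile(
    r"^(?P<interviewer>[a-z]+)_(?P<interviewee>[a-z]+)_"
    rf"(?P<status>{STATUS_PATTERN})_(?P<part>\d{{2}})_(?P<date>\d{{8}})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def shorten(name, limit=8):
    """Lower-case a name, drop anything that is not a-z, and keep `limit` letters."""
    letters = re.sub(r"[^a-z]", "", name.lower())
    return letters[:limit]


def build_name(interviewer, interviewee, status, part, recorded, ext):
    """Assemble a file name from its five components plus the format extension."""
    if not re.fullmatch(STATUS_PATTERN, status):
        raise ValueError(f"unknown status: {status!r}")
    if not 1 <= part <= 99:
        raise ValueError("part must be between 1 and 99")
    return (
        f"{shorten(interviewer)}_{shorten(interviewee)}_{status}_"
        f"{part:02d}_{recorded:%Y%m%d}.{ext.lower()}"
    )


def parse_name(filename):
    """Split a file name back into its components, or return None if it does not fit."""
    match = NAME_RE.match(filename)
    return match.groupdict() if match else None


print(build_name("Boyd", "Johnson", "pres", 1, date(2012, 8, 1), "wav"))
# boyd_johnson_pres_01_20120801.wav
print(parse_name("macdowell_benberry_mez_02_20120801.h264"))
```

The parser deliberately does not enforce the 8-letter limit, since shortening is optional in the text, so a name such as macdowell_benberry_mez_02_20120801.h264 still parses.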

This is one suggested file name structure; many are possible. You can develop a file naming structure that works best for you, but the key is consistency, which keeps files both machine and human readable. It is also best to use the following guidelines:

  • Keep file names short, no more than 25 characters;
  • Avoid weird symbols that computers use for other things: / \ : * ? " < > [ ] & $. It is best to use letters, numbers, and underscores only (really, are not letters and numbers more than enough?);
  • The first character of the filename should be an ASCII letter (‘a’ through ‘z’);
  • Computers abhor blank spaces. Avoid spaces and use underscores (oral_history) or camel case (OralHistory), in which each word starts with a capital letter and there are no spaces;
  • Use all lower case (except when adopting camel case);
  • Keep directory names under 20 characters and use them to denote the project. The rules above apply (except the first).
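The guidelines above are simple enough to check automatically. The following Python sketch shows one way to do it. The function name check_name is invented for this example, and it assumes that the 25-character limit counts the whole name including the extension and that the lower-case convention (not camel case) is in use.

```python
import os
import re


def check_name(filename, max_length=25):
    """Return a list of guideline problems found in `filename` (empty if none)."""
    problems = []
    stem, _ext = os.path.splitext(filename)

    # Keep file names short.
    if len(filename) > max_length:
        problems.append(f"longer than {max_length} characters")

    # Letters, numbers, and underscores only (this also rules out spaces).
    if not re.fullmatch(r"[A-Za-z0-9_]+", stem):
        problems.append("uses characters other than letters, numbers, and underscores")

    # The first character should be an ASCII letter.
    if not re.match(r"[A-Za-z]", filename):
        problems.append("does not start with an ASCII letter")

    # All lower case (assumes camel case is not being used).
    if filename != filename.lower():
        problems.append("contains upper-case letters")

    return problems


print(check_name("oral_history.wav"))         # []
print(check_name("Interview 3 (final).wav"))  # two problems reported
print(check_name("3_interview.wav"))          # one problem reported
```

The limit is left as a parameter because names that use the full five-part structure, such as boyd_johnson_pres_01_20120801.wav (33 characters), run past 25 characters; a project can pass a larger max_length if it adopts that structure.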

A few minutes of file name planning can save lots of time and make it easier for those who follow you to use your work. It is wonderful to be able to look down a list of files and know exactly what each one is and when it was made (without having to open them). And consistency allows programmers to do amazing things.
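As a small illustration of what consistent names make possible, the Python sketch below groups a list of files by interviewee and orders each group by date and part, using nothing but the names themselves. The function name group_by_interviewee is hypothetical, and the sketch assumes the five-part, underscore-separated structure suggested earlier; names that do not fit are skipped.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_interviewee(filenames):
    """Group file names by interviewee, each group sorted by date, then part."""
    groups = defaultdict(list)
    for name in filenames:
        fields = PurePath(name).stem.split("_")
        if len(fields) != 5:
            continue  # not in interviewer_interviewee_status_part_date form
        _interviewer, interviewee, status, part, recorded = fields
        groups[interviewee].append((recorded, part, status, name))
    # Sorting the tuples orders by date first, then part, then status.
    return {who: [name for *_, name in sorted(items)] for who, items in groups.items()}


files = [
    "boyd_johnson_pres_02_20120801.wav",
    "boyd_johnson_pres_01_20120801.wav",
    "macdowell_benberry_mez_02_20120801.h264",
    "notes.txt",
]
for who, names in group_by_interviewee(files).items():
    print(who, names)
```

Because the date is written as yyyymmdd and the part as two digits, plain alphabetical sorting already puts the files in chronological order.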

In addition to the file naming resources at Michigan State University (http://www.lib.msu.edu/about/diginfo/collect.jsp) and Indiana University (https://wiki.dlib.indiana.edu/display/INF/Filename+Requirements+for+Digital+Objects), check out the file naming conventions at the University of Wisconsin (http://researchdata.wisc.edu/manage-your-data/file-naming-and-versioning/).

[2] See Sound Directions Best Practices 13-21 and IU File Name Guide: https://wiki.dlib.indiana.edu/display/INF/Filename+Requirements+for+Digital+Objects

