“Oral History Core”: An Idea for a Metadata Scheme
by Nancy MacKay
In the twenty-first century, the need to organize, preserve, access, and share information is no longer a matter for discussion. It is a given. In no other field is the disconnect between the creators and the curators more strongly felt than in oral history. Interviews collected by family historians, community groups, scholars, universities, and government agencies over the past half century are currently scattered among academic institutions, libraries, and historical societies, as well as the file cabinets of individuals and small organizations. Many are neither catalogued nor inventoried, so no one really knows how many oral histories exist or what condition they are in. Without guidelines for inventorying and cataloging these materials, the situation will only get worse.
The best way to gain a handle on the vast amount of information within oral histories is to develop standards for collecting and organizing this information that institutions of all kinds and sizes can easily adapt. This report presents an idea for such a solution: a metadata scheme for oral histories with the working title Oral History Core.
Metadata is generally defined as data about data. It is structured information that describes, explains, locates, or otherwise organizes information and makes it easier for both computers and humans to use. Though metadata has been around for a long time in the form of book indexes, auto parts manuals, etc., the volume of information generated in the computer age makes it an essential component for any information system. Creating metadata is similar to the process of traditional cataloging, and the terms are often used interchangeably.
By itself, metadata is not very useful in large information systems. Therefore, user communities (such as botanists, sound preservationists, geographers, and archivists) develop metadata schemes, which consist of rules and structure for organizing their data. Such a scheme has two functions: 1) it is a tool for a user community to organize relevant data within the community, and 2) it structures that data so that selected parts can be shared outside the community in a more general information system.
The information that comprises a metadata record can come from a variety of sources. Some information is entered by humans; other information is transferred from computerized systems. For example, an oral historian can contribute descriptive information about the content and context of an interview; an archivist may enter administrative information about acquisitions and provenance; and technical information about the sound or video file containing the interview may be transferred directly from the recorder. This metadata coming from multiple sources is added to the “envelope” containing the digital file, making it possible for the digital resource to be searched, exported, batch updated, or otherwise manipulated at the field level.
Metadata can be loosely organized into categories based on the source and function of the information collected. The five categories suggested by Anne J. Gilliland (2008) would work well for an oral history metadata model: descriptive, administrative, preservation, technical, and rights/access. Though these categories help us to organize our work processes, the distinctions are arbitrary and shouldn’t be applied too rigidly.
- Descriptive. This category is most closely associated with traditional cataloging. It documents the physical and intellectual content of the resource (the name of the interviewer and narrator, the date and place of the interview) and content information in the form of keywords, summary, and controlled vocabulary.
- Administrative. This category includes acquisition and provenance information, number and location of copies, holding institution, and copy-level metadata.
- Preservation. This category documents the physical condition, preservation events, and data-refreshment events.
- Technical. This category includes all the physical and technical properties of the digital object, including size, format, compression, and date stamps for recording. It can also include digital provenance data, such as the recording device and the settings used, along with technical data relating to any editing or transfers along the way.
- Rights and access. This category documents ownership and rights to use, including copyright, use and fair use, permission criteria, and restrictions.
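The five categories above, together with the idea that different people and devices each contribute part of a record, can be sketched as a small data structure. This is only an illustration: the class name `OralHistoryRecord`, the field names, and all sample values are hypothetical and are not part of any existing scheme.

```python
from dataclasses import dataclass, field

# The five categories suggested by Gilliland, used here as dictionary keys.
CATEGORIES = ("descriptive", "administrative", "preservation",
              "technical", "rights_access")


@dataclass
class OralHistoryRecord:
    """One metadata record, grouped by category (hypothetical structure)."""
    identifier: str
    fields: dict = field(default_factory=lambda: {c: {} for c in CATEGORIES})
    # Remembers who or what supplied each field: (category, name) -> source.
    sources: dict = field(default_factory=dict)

    def add(self, category, source, **values):
        """Add one or more fields to a category and note their source."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        for name, value in values.items():
            self.fields[category][name] = value
            self.sources[(category, name)] = source


# Sample values are invented for illustration.
record = OralHistoryRecord(identifier="interview-001")
record.add("descriptive", "oral historian",
           narrator="Jane Doe", interviewer="John Roe",
           interview_date="2010-05-14")
record.add("administrative", "archivist",
           holding_institution="Example Historical Society")
record.add("technical", "recorder", file_format="wav", sample_rate_hz=44100)
```

Keeping the source of each field alongside its value is one possible way to reflect that a record is assembled incrementally by several contributors.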
Challenges of Cataloging Oral Histories
Oral histories create big problems for catalogers any way you look at it. As a result, oral histories are catalogued inconsistently, superficially, and often not at all. Here are some of the reasons why oral histories present problems.
- Varying practices of repositories. Oral histories end up in a variety of repositories: libraries, archives, museums, historical societies, and now, digital repositories. Each type of repository uses cataloging systems designed specifically for its institutional goals. Museums and historical societies usually use stand-alone systems designed to catalog artifacts. Libraries usually share cataloging via a bibliographic utility, and digital repositories have yet another set of practices. In addition, different types of institutions practice cataloging differently. For example, libraries catalog by the item (each oral history has an entry), while archives traditionally catalog by the collection (a group of related oral histories cataloged as an entity).
- Difficulty in adapting to changes in oral history practice. The first step in cataloging consists of identifying the information unit to be described. Early oral histories coming to repositories were catalogued as print monographs because oral historians considered the transcript the primary document (information unit). Currently, oral historians consider the audio or video recording the primary document (information unit). The result is three common, and different, informational starting points: print, audio, and video. Since cataloging rules (and the skills of catalogers) are contingent upon the type of information being described, oral histories are often distributed to catalogers with expertise in the particular format that has been deemed primary (print, audio, or video), with no connection made to oral histories catalogued in another format. As a result, oral histories are scattered among books, audio, video, and electronic resources, both in library catalogues and on the shelves.
- Formats and media. Transcript, audiocassette, minidisc, compact disc, digital file. Audio, video, paper, hard drive. Analog, mp3, wav, digital video. Deciphering this technical information requires understanding beyond the range of the everyday cataloger. In addition, media storage devices deteriorate or the machines to play them become obsolete, rendering the recording not only uncatalogable, but also unusable. Each obstacle that a cataloger must overcome adds to the time, the expense, and the likelihood that the oral history will go uncatalogued.
- Incomplete or incorrect data. There has not been a consistent practice for the creators of oral histories to label media or to provide clear information to those who curate them. Poor or nonexistent labeling of media and incomplete documentation about the interview or the oral history project will result in incorrect or incomplete metadata. A data sheet with correct spellings and forms of proper names is necessary to create quality metadata, as the cataloger cannot be expected to listen to the interview. Historians are well aware that misspelled names and incorrect dates found in primary resources are repeated over and over in secondary sources, confounding the historical record.
There are many additional factors complicating the cataloging of oral histories: provenance, rights management, cultural context, relationships to other catalogued objects, version/copy/format, and preservation, all of which should be addressed in a metadata scheme.
Goals for a Metadata Scheme for Oral Histories
Most of the problems associated with cataloging oral histories can be rectified by a clear set of rules and standards, clear communication between the creators and the curators of oral histories, and a toolkit of templates to record information consistently.
Here are the goals for a successful metadata scheme for oral histories:
- Extensible. It must have the ability to describe a resource at every level of granularity.
- Scalable. It must work for oral history cataloging projects of any size or scope.
- Inclusive. It must include all circumstances of oral history, with provisions for language, restrictions, multiple representations, cultural context, rights management, obsolete formats and other elements as they come up.
- Flexible. It must be adaptable to any kind of content management system (database).
- Easy. It must be designed so that cataloging skills are not necessary for entering data, and that data can be entered easily from templates submitted by a non-specialist.
- Backward compatible. It must be able to accommodate previously conducted oral histories on earlier recording formats.
- Relationships. It must acknowledge and accommodate the multiple kinds of relationships that exist within oral histories: among institutions, oral history projects, interviews, series, and formats.
- Instances. It should accommodate multiple instances and copies: master/preservation/user copy, .wav/.mp3 copy.
- Provisions for crosswalks. It should be designed with the assumption that cataloging data will be shared among computer systems, and that existing cataloging, such as MARC records, can be easily mapped to the newer system.
- Thinking ahead. It must make a good-faith effort to predict the future needs for oral histories in the digital age, especially the assumption that most oral histories will eventually migrate to digital repositories and be available on the Internet.
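The goals for relationships and instances in the list above can be pictured as a simple hierarchy: an institution sponsors a project, a project contains interviews, and each interview exists as one or more copies. The sketch below is a minimal illustration only. The class names, the role labels, and the sample paths are assumptions made for the example, not part of a defined scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One copy of an interview recording (hypothetical structure)."""
    role: str          # e.g. "master", "preservation", or "user"
    file_format: str   # e.g. "wav" or "mp3"
    location: str      # where this copy is stored


@dataclass
class Interview:
    title: str
    instances: list = field(default_factory=list)


@dataclass
class Project:
    name: str
    institution: str
    interviews: list = field(default_factory=list)


def instances_by_role(project, role):
    """Return (interview title, instance) pairs for copies with the given role."""
    return [(interview.title, inst)
            for interview in project.interviews
            for inst in interview.instances
            if inst.role == role]


# Invented sample data: one interview held as a master and a user copy.
interview = Interview(
    title="Jane Doe oral history",
    instances=[
        Instance("master", "wav", "archive/doe_master.wav"),
        Instance("user", "mp3", "web/doe_access.mp3"),
    ],
)
project = Project(name="Example Community Project",
                  institution="Example Public Library",
                  interviews=[interview])

user_copies = instances_by_role(project, "user")
```

Modeling copies separately from the interview itself lets descriptive information be entered once while technical details vary per copy.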
Metadata Scheme: Oral History Core
I suggest that those who work with oral histories collaborate to develop a metadata scheme. Teams of oral historians, information professionals, recording specialists, and curators could work separately and together to compile sample field definitions and guidelines for descriptive, administrative, preservation, technical, and rights/access data. To be successful, it would require significant coordination among constituent groups, and a system of field testing concurrent with development. One way to begin is to develop one area, such as descriptive metadata, and fill in the gaps based on lessons learned and user needs.
Long-term goals for a metadata standard should include endorsement from the Oral History Association (OHA), a cataloging committee through OHA with liaisons to the Society of American Archivists and American Library Association, a manual of rules and best practices, training tools, and outreach. An ultimate goal would be registration as a NISO standard.
PBCore as model.
PBCore is a “public broadcasting metadata dictionary project” designed for a user community similar to the oral history community. It is media-based, free, open-access, flexible, and grassroots. PBCore and the accompanying documentation are highly developed, and would serve as a good model for structuring Oral History Core. Perhaps PBCore developers could assist with Oral History Core.
Opinions from the Field
To test my ideas about the importance of a metadata scheme for oral histories, I sent a short questionnaire to oral historians, librarians, and curators who represent a broad variety of potential users and beneficiaries of a metadata scheme. These are the examples I chose:
- Oral history project within a university.
- Subject repository for oral histories collected from a variety of sources.
- Oral history program within a university.
- Oral history program within a public library.
- State historical society.
- Digital repository around a specific topic.
- Community oral history project planned and executed by volunteers.
Below is a summary of responses.
Oral history project within a university. The Nevada Test Site Oral History Project is an award-winning project based at the University of Nevada, Las Vegas (UNLV). The project is “dedicated to documenting, preserving and disseminating the remembered past of persons affiliated with and affected by the Nevada Test Site during the era of Cold War nuclear testing.” It consists of more than 150 interviews (335 hours), along with transcripts, related documents, and photographs. It is designed as a project, implying, at least in theory, that it has a specific end date.
Since the project is conducted within a university, the oral histories have a built-in institutional home as well as the advantage of the library’s highly skilled staff. Currently the physical and digital manifestations are handled separately. The transcript is catalogued in MARC format for the library OPAC, WorldCat, and an internal database for the special collections library. Online versions are handled by the University of Nevada’s Digital Collections department, which created a full-text searchable collection using Dublin Core on CONTENTdm. As would be expected, the most highly developed metadata is descriptive, though there are fields for rights management, digitization specifications, and some preservation metadata.
UNLV Digitization Projects Librarian Cory Lampert responded to my questionnaire. She mentioned hopes to integrate the workflows of the physical and digital manifestations in the future through a streamlined metadata/cataloging process. There are plans to contribute this collection to the Mountain West Digital Library, a regional digital repository, and the library has just completed a metadata audit for this purpose. The special collections cataloger does not have trouble cataloging these oral histories; however, the managers of the digital manifestations have more questions about metadata schema, controlled vocabulary, full-text access, and digital preservation. Most significant for this discussion, Ms. Lampert offers a caution about the limitations of a metadata scheme: “It is difficult to overcome all these issues in a [metadata] schema because digital collections managers are often working within the constraints of what their local system can manage.”
Subject repository for oral histories collected from a variety of sources. The Veterans History Project was initiated in 2000 by an act of Congress to collect, preserve, and make accessible the personal accounts of American war veterans. Despite its name, it is an ongoing program under the jurisdiction of the American Folklife Center at the Library of Congress. In theory, this digital repository has access to the best resources in the world; on the other hand, it is restrained by the rules and protocols which govern any large institution. Important for this discussion is the fact that, in addition to interviews, diaries, letters, and official military documents are contributed by the general public in a variety of formats and without quality control.
Oral histories are catalogued at three levels for three different databases: collection level, intellectual item level, and physical item level. The physical- and item-level cataloging is primarily for inventory control. Catalog records live mostly in local databases, though there is an effort to build crosswalks for data migration and to minimize duplication of project cataloging that exists at the Library of Congress. The emphasis is on descriptive metadata, though some technical, preservation, and rights management data is captured in the local digital management system, MAVIS.
Bertram Lyons, the digital assets manager at the American Folklife Center, responded to my questionnaire. His comments about the worth of a metadata scheme for oral histories are significant, especially in light of the fact that the Veterans History Project could be a model for a subject-based digital collection. He suggested a series of templates (more flexible than a metadata scheme) that could be used by archives, museums, or libraries to accommodate the cataloging needs of each. He makes a good point that “the definition of oral history is loose enough that defining a scheme that fits all scenarios will be difficult.” He emphasizes that the most important first step is to define the concept of oral history before imposing a metadata scheme upon it. For example, “An oral history may consist of multiple audio (and/or video) recordings, transcript, photograph, supplemental permissions documentation. How do you build a scheme with extensibility enough to encompass the potential variety of oral history collections?”
Oral history program within a university. For more than 40 years, Baylor University’s Institute for Oral History (BUIOH) has been documenting and preserving oral memories from Texas and beyond. This program is an example of a longstanding oral history program within a university which must meet the metadata needs of past (the analog world), present, and future oral history practice.
Though the Baylor University library system has a metadata department, Elinor Maze, senior editor at the institute, creates the metadata for the oral history collection. Because of the program’s age, cataloging goes back to a FileMaker database, which has subsequently been mapped to MARC with plans to upload to WorldCat. More recently a crosswalk has been created to Dublin Core for online access using CONTENTdm. Most of the cataloging data is descriptive, with some copyright and formatting data.
Elinor Maze responded to my questionnaire, providing the perspective of both librarian and oral historian. She emphasizes the importance of flexibility and scalability in a metadata scheme as well as the importance of planning for metadata in the initial stages of the project.
Oral history program within a public library. Since 1976, the Maria Rogers Oral History Program (MROHP) in Boulder, Colorado has been documenting Colorado history through oral history. The program is physically located within the Boulder Public Library System and receives IT and cataloging support from the public library. The program runs mostly through volunteer effort and adds 40 to 100 interviews each year, mostly video.
Program Director Susan Becker has a deep understanding of the importance of cataloging and has cultivated a successful relationship with the library cataloging department. One of the secrets to success is a template that the interviewer fills out to accompany the video to the cataloging department. The template ensures that the metadata is entered exactly as the interviewer intended. Catalog records reside on the library OPAC, a regional consortium, and WorldCat. Oral histories are also available on the program’s Web site.
Cyns Nelson responded to my questionnaire for Susan Becker. She mentions that the oral history program has become important to the Boulder community and that tools for access and discovery would benefit the program and the users.
State historical society. Since 1948 the Minnesota Historical Society (MHS) has been the guardian of Minnesota’s history. In addition to the oral history collections, the MHS also collects photographs, manuscripts, and other primary documents related to Minnesota history. MHS catalog records reside in the Minnesota consortium catalog MnPALS and WorldCat. In addition, oral histories are catalogued in the society’s stand-alone collection management system, KE EMu, which is an entry point to Web site access. Data is currently encoded in MARC, EAD, and DACS.
Sarah Quimby, Library Processing Manager at the society, responded to my questionnaire. Her response to my question of whether a metadata scheme for oral histories is worthwhile was guarded: “It depends on the repository and how specialized the cataloging needs to be.”
Digital repository. Densho: The Japanese American Legacy Project is an award-winning learning center with the mission to promote the understanding of the Japanese-American internment and its legacy. The collection consists of 400 visual histories (as of 2/2011) and more than 10,000 photos, clippings, and other primary documents. The organization was founded as a non-profit in 1996 by former Microsoft executive and third-generation Japanese American Tom Ikeda.
Densho is a digital repository only and does not currently share cataloging [verify]
Questionnaire response forthcoming.
Community oral history project planned and executed by volunteers.
The Stories of Transformation Oral History Project consists of 30 interviews documenting Barack Obama’s campaign in Colorado’s Arapahoe County. Like most grassroots oral history projects, its seed was planted in a conversation among community workers, in this case among campaign workers. They realized the historical significance of the political shift in their community from Republican to Democrat during the Obama campaign. Fortunately, a trained oral historian was part of the conversation, and the project took off. Oral history best practices were followed throughout the interview stage, but as an unaffiliated volunteer project, there were no options for cataloging or archiving. Digital copies of the interviews reside on home computers of two of the participants and one external hard drive.
The significance of this project is the absence of information to report. It represents thousands of oral history projects conducted at the grassroots level by volunteers who fulfill the immediate goal of the oral history project but have few options for archiving, preservation, and access. Project consultant Cyns Nelson responded to my questionnaire.
Lessons Learned From the Field
The most surprising responses to my questionnaire came from the question: “Do you think a metadata scheme for oral histories is a worthwhile endeavor? Why or why not?”
Only one response was emphatically in favor; other respondents suggested caution in a number of areas. The strongest caution came from larger institutions: “Any metadata scheme would have to be approved by and be compatible [with] existing institutional data structures.” The responses were significant enough that it would be wise to conduct a more extensive survey of potential metadata users.
Creating a metadata scheme and accompanying cataloging guidelines for oral histories is one important step in the process to track, share, and preserve oral histories. The communication gap between the creators and the curators of oral histories is widening rather than shrinking as more communities undertake oral history projects and, without guidance, are forced to make up rules as they go along. Clear guidelines and best practices that are responsive to oral history as practiced in the twenty-first century will benefit not only our communities today, but researchers in future generations.
APPENDIX A. Principles for Metadata for Oral Histories
- Metadata must be considered an essential step in collecting and processing oral history. Quality metadata is just as important as the care, preservation, and dissemination of materials the metadata describes.
- The creation of metadata is integral to every step of the oral history process. Planning for metadata must begin at the collecting stage, and continue through recording, rights management, description, location, and use.
- Creation of metadata is an incremental process and should be a shared responsibility. Metadata will be added from a variety of sources, both human and computer, at various stages in the oral history’s life cycle. The responsibility for metadata is distributed among the stakeholders: the interviewer, recording professional, and curator.
- In order to be successful, Oral History Core must be made freely available, along with sufficient training materials and documentation, for oral history curators at every level of expertise.
- Oral History Core can be most successful if it is accepted as a global standard. The development team should seek the support and endorsement of the Oral History Association, the International Oral History Association, and other international bodies, and should invite a broad range of participation from these organizations and their constituencies.
- Oral History Core must be interoperable not only among the variety of institutions holding oral histories, but also within the cataloging systems of the repositories that hold them.
- Success in a metadata standard is dependent on the buy-in at all levels, including policy makers, funders, and administrators.
APPENDIX B. Definitions
Content Management System (CMS). A computerized system for organizing intellectual content. CONTENTdm and PastPerfect are common CMSs in this context.
Controlled vocabulary. A system for structuring language using one preferred term among variants in order to achieve consistency and avoid ambiguity. In this context, controlled vocabulary is also useful for normalizing forms of personal names.
Crosswalks. A method whereby data can be exchanged from one computer system to another by mapping database fields from one data structure (metadata scheme) to another.
Data mapping. To prepare for exchanging data from one computer system to another, each database field must be identified in the first database and marked for import into a differently named field in a second database.
Information unit. A library term which refers to the exact unit being cataloged. In this context an information unit could be a single interview, a series of interviews of one person, or an oral history collection.
Metadata. Data about data.
Metadata dictionary. The composite of data elements, guidelines, and controlled vocabularies associated with a particular metadata scheme.
Metadata scheme. A system of database fields and definitions designed for the use of a specific user community.
Metadata standard. A metadata scheme that has come to be accepted as a universal standard. A standard can be registered with standards agencies such as NISO or ANSI.
Orphans. Items in an archive that cannot be made available to the public for any reason. Oral histories are most commonly orphaned because a proper consent agreement is lacking, because the storage media is too fragile to play, or because the storage media is obsolete and playback equipment is not available.
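The definitions of crosswalks and data mapping above can be made concrete with a short sketch. It maps a handful of MARC tags to Dublin Core element names and reports any field that has no mapping. The tag-to-element pairs shown are illustrative assumptions chosen for the example and should not be read as an official crosswalk. The function name `crosswalk` and the sample record are likewise hypothetical.

```python
# Illustrative mapping only: MARC tag -> Dublin Core element name.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "520": "description",
    "650": "subject",
    "651": "coverage",
    "655": "type",
    "700": "contributor",
}


def crosswalk(marc_fields, mapping=MARC_TO_DC):
    """Map (tag, value) pairs to a Dublin Core style record.

    Returns the mapped record and a list of fields with no mapping,
    so that nothing is silently dropped during the exchange.
    """
    dc_record = {}
    unmapped = []
    for tag, value in marc_fields:
        element = mapping.get(tag)
        if element is None:
            unmapped.append((tag, value))
        else:
            # Dublin Core elements are repeatable, so collect values in lists.
            dc_record.setdefault(element, []).append(value)
    return dc_record, unmapped


# Invented sample record for illustration.
sample = [
    ("100", "Doe, Jane"),
    ("245", "Jane Doe oral history"),
    ("650", "Public libraries"),
    ("650", "Librarians"),
    ("999", "local note"),
]
dc_record, unmapped = crosswalk(sample)
```

Returning the unmapped fields separately is a design choice that makes gaps in a crosswalk visible to the person reviewing the migration.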
APPENDIX C. Questionnaire
- Describe how cataloging/metadata is done now for your oral histories.
- Do you currently use a metadata standard such as MARC or Dublin Core? Have you developed a homegrown system? Describe your experience.
- Do your metadata records currently include preservation, technical, and/or rights management data in addition to descriptive metadata? Please describe.
- Where does cataloging currently reside? Local OPAC? Consortium? WorldCat? Website? Standalone database? Have you developed crosswalks to accommodate different destinations for records? Please describe details, especially issues that could be resolved with a metadata scheme for oral histories.
- Do you think a metadata scheme specifically for oral histories is a worthwhile endeavor? Why or why not?
- What have I forgotten to ask? Please share additional thoughts, challenges, and ideas for a metadata scheme.
- Would you be interested in participating actively in developing a metadata scheme?
APPENDIX D. Respondents
Cyns Nelson, email@example.com, Director, Colorado Voice Preserve, responding for “Stories of Transformation Oral History Project”
Cyns Nelson, firstname.lastname@example.org, responding on behalf of Susan Becker, Director, Maria Rogers Oral History Program, http://boulderlibrary.org/carnegie/collections/mrohp.html
APPENDIX E. Sample Template
|Narrator’s Name (100)| | |
|Interviewer’s Name (700)| | |
|Sponsoring Institution (710)| | |
|Subject – Personal Names (600)| | |
|Subject – Corporate Names (610)| | |
|Subject – Geographic Names (651)| | |
|Subject – Topics (650)| | |
|Genre (655)|Oral histories|LCSH|
Project title (740): Evolution or Revolution: Reflecting on Public Librarianship, 1980-2010.
Project description (520): Paste or paraphrase from project design statement
Title (245): [First name, last name] oral history
Physical description (300): [Example: sound file, .mp3, 64 MB]
Date and place of interview (518):
Biographical summary (545): Compile from interview and Narrator Recommendation Form
Interview summary (520): Paste or paraphrase from Interview Information Form
Uncontrolled keywords (653):
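A worksheet like the sample template above lends itself to simple processing, because each label carries its MARC tag in parentheses. The sketch below reads the tag from each label and turns a filled-in worksheet into a list of tagged fields, skipping blanks. It is a minimal illustration under stated assumptions: the function names and the sample values are hypothetical, and the output is a plain list rather than a valid MARC record.

```python
import re

# Matches labels of the form "Some Label (123)" as used in the template.
LABEL_PATTERN = re.compile(r"^(?P<label>.+?)\s*\((?P<tag>\d{3})\)$")


def parse_label(label):
    """Split a template label into (field name, three-digit tag)."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"label has no three-digit tag: {label!r}")
    return match.group("label"), match.group("tag")


def template_to_fields(filled):
    """Convert {label: value or list of values} into sorted (tag, name, value) rows."""
    rows = []
    for label, value in filled.items():
        name, tag = parse_label(label)
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item and item.strip():  # skip fields left blank
                rows.append((tag, name, item.strip()))
    return sorted(rows, key=lambda row: row[0])


# Invented sample worksheet; labels follow the template above.
filled = {
    "Title (245)": "Jane Doe oral history",
    "Narrator's Name (100)": "Doe, Jane",
    "Genre (655)": "Oral histories",
    "Subject – Topics (650)": ["Public libraries", ""],
    "Interviewer's Name (700)": "",
}
fields = template_to_fields(filled)
```

Because the interviewer fills in plain labels and the tags travel with them, no cataloging expertise is needed at the point of data entry.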
Dublin Core Metadata Best Practices, Version 2.1.1. (Collaborative Digitization Project, 2006). http://www.bcr.org/dps/cdp/best/dublin-core-bp.pdf
Introduction to Metadata, Murtha Baca, ed. 2nd ed. (Getty Research Institute, 2008). Available in print or online, http://www.getty.edu/research/conducting_research/standards/intrometadata/
Metadata Standards and Guidelines Relevant to Digital Audio (American Library Association, updated 2/2010), http://www.ala.org/ala/mgrps/divs/alcts/resources/preserv/audio_metadata.pdf
Metadata Standards Crosswalks (Getty Research Institute, 2009), http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.pdf
A Public Trust at Risk: the Heritage Health Index Report on the State of America’s Collections, (Heritage Preservation and IMLS, 2005). http://www.imls.gov/pdf/hhifull.pdf
Saint-Pierre, Margaret and LaPlant, William P. Issues in Crosswalking Metadata Standards (NISO, 1998), http://www.niso.org/publications/white_papers/crosswalk/
The State of Recorded Sound Preservation in the United States : a National Legacy at Risk in the Digital Age (CLIR Publication 148, 2010). Available in print or online, http://www.clir.org/pubs/abstract/pub148abst.html
Understanding metadata. (NISO, 2004). http://www.niso.org/publications/press/UnderstandingMetadata.pdf
AgileZen, http://agilezen.com. Project management software used as a content management system by some oral history projects. Has a free version.
Archivists’ Toolkit, http://www.archiviststoolkit.org/. Free, open source content management system for archivists.
CONTENTdm, http://www.contentdm.org/. Collection management software offered by OCLC. This is an expensive option, but it is used by many universities, libraries, historical societies, and museums.
Digital preservation bibliography, http://digital-scholarship.org/dcpb/dcpb.htm
PBCore, http://pbcore.org/2.0/. A metadata scheme (aka metadata dictionary) for public broadcasting. This model has many similarities to the goals for oral history, and would serve as a great resource.
PastPerfect, http://www.museumsoftware.com/. Museum collection management software.
 Oral historians and archivists agree that the majority of interviews have not even made it to a repository. As further evidence, the Heritage Health Index (2005) reports more than 40% of sound recording collections already in repositories are not catalogued.
 Though I speak about digital repositories in this paper, the majority of oral histories on the Internet are not in digital repositories at all, but simply posted on websites by independent oral history projects. The variety of ways oral histories are presented online illustrates the lack of standards. Independent oral history projects that post oral histories on their websites would be among the biggest beneficiaries of a metadata scheme for oral histories.
 See Appendix C
 Adapted for oral history from Murtha Baca’s Introduction to Metadata, 2nd ed. (Getty Research Institute, 2008), pp. 71-72.
Citation for Article
MacKay, N. (2012). Oral history core: an idea for a metadata scheme. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Museum and Library Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/oral-history-core/.
MacKay, Nancy. “Oral History Core: An Idea for a Metadata Scheme,” in Oral History in the Digital Age, edited by Doug Boyd, Steve Cohen, Brad Rakerd, and Dean Rehberger. Washington, D.C.: Institute of Museum and Library Services, 2012, http://ohda.matrix.msu.edu/2012/06/oral-history-core/
This is a production of the Oral History in the Digital Age Project (http://ohda.matrix.msu.edu) sponsored by the Institute of Museum and Library Services (IMLS). Please consult http://ohda.matrix.msu.edu/about/rights/ for information on rights, licensing, and citation.