Metadata at BUIOH: A Case Study
By Steven Sielaff
This case study is concerned with the use of descriptive metadata for online oral history collections created in a content management system, or CMS. It highlights the Baylor University Institute for Oral History (BUIOH) and its history of uploading and describing its digital collection of transcripts and audio files in Baylor University’s CMS, CONTENTdm. Specifically, it addresses the issue of managing a digital archive while conforming to the metadata standards set university-wide for digital collections.
- Brief Overview of BUIOH and its Collection
- Current Descriptive Metadata Policies at BUIOH
- Conforming to a Metadata Schema for Digital Collections at Baylor University
- Concluding Thoughts
Brief Overview of BUIOH and its Collection
Founded in 1970, the Baylor University Institute for Oral History (BUIOH) is a free-standing institute at Baylor University in Waco, Texas. As such, it does not administratively fall under the larger university library hierarchy as is the case with most other university oral history programs. However, BUIOH does participate in providing online access to its collection through the Baylor University Libraries Digital Collections archive, powered by CONTENTdm.
As of this writing, BUIOH houses over 5,800 interviews recorded on a variety of analog and digital media. The resulting collection is two-fold: the preserved original recording media and the derivative transcripts that BUIOH has created since its inception for every active interview accessioned. For a more detailed description of BUIOH’s pre-digital-age collection management policies, please see the BUIOH case study written by my predecessor, Elinor Mazé, in 2012: /2012/06/baylor-institute-for-oral-history/. [1]
Currently, around 3,500 interview transcripts and 1,000 complete interview audio files are available online in BUIOH’s digital collection via Baylor’s CMS, CONTENTdm. Thanks to an ongoing partnership with the Ray I. Riley Digitization Center (part of Baylor’s Electronic Library), the process of transferring analog recording materials to preservation-worthy digital WAV masters is well underway. The resulting digitized WAV audio files, plus all the born-digital interview files, are used to create a directory of access MP3s, which students now quality check and upload to pair with their transcript counterparts. Our current long-term goal for both the preservation and curation of the BUIOH collection is that, by the completion of our digitization project, every interview that has passed through our review process will be represented online in both audio and transcript form.
Current Descriptive Metadata Policies at BUIOH
Beginning with a large-scale memoir scanning project in 2009 [2], BUIOH has spent the last half decade uploading interview records in CONTENTdm. Within this CMS, we use what are called compound objects to house all materials related to a particular interview, interview series, or, in some past cases, entire interview projects. The compound, or parent, object is typically used for materials such as books, so that each resulting chapter or page can exist as a child object. For our collection, we use it to house all materials related to a particular memoir, which will typically mean the individual interview transcripts and their source audio.
Both the compound object and each object contained therein have their own set of metadata fields. The metadata entered for the compound object describes the entire series, whereas the metadata for an audio file only describes that particular interview, and more specifically, the digital audio file itself. Therefore, each item type contained in the online archive must have a defined set of metadata descriptors to maintain uniformity across the collection.
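The parent and child relationship described above can be sketched as a small data structure. This is only an illustration and not how CONTENTdm stores records internally: the class names, the `kind` labels, and the sample child titles are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChildObject:
    """One item inside a compound object, e.g. a transcript PDF or an audio file."""
    kind: str  # hypothetical label: "transcript" or "audio"
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class CompoundObject:
    """Parent record whose metadata describes the whole memoir."""
    metadata: dict[str, str] = field(default_factory=dict)
    children: list[ChildObject] = field(default_factory=list)

    def field_names_by_kind(self) -> dict[str, set[str]]:
        """Collect the metadata field names used by each kind of child object."""
        names: dict[str, set[str]] = {}
        for child in self.children:
            names.setdefault(child.kind, set()).update(child.metadata)
        return names


# Placeholder values only; the child titles are invented for illustration.
memoir = CompoundObject(
    metadata={"Title": "Oral Memoirs of Full Name", "Number of Interviews": "1"},
    children=[
        ChildObject("transcript", {"Title": "Interview 1 transcript", "Format": "PDF"}),
        ChildObject("audio", {"Title": "Interview 1 audio", "Format": "Audio"}),
    ],
)
```

Grouping field names by item type, as `field_names_by_kind` does, is one simple way to spot objects of the same type that carry different sets of descriptors.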
Of course, as is very often the case in archives, changes in policy over time can wreak havoc on the consistency of entries. BUIOH currently utilizes 26 fields of metadata per object, though as many as 42 have been used in the past. Changes in how objects are structured (we no longer create project-based compound objects, for example) can also result in inconsistencies. To help remedy this situation, in early 2014 templates were created for each metadata set, from which both students and staff will work when inputting metadata. At the same time, certain fields that will always contain the same information (e.g. Sponsor – BUIOH) were changed with collection-wide administrator commands. This still leaves a good amount of legacy metadata to update, but thankfully a majority of this work can be achieved concurrently by our student workers during our ongoing audio upload project.
As for the fields used to describe our collection, it is probably best to summarize by sharing one of the aforementioned template documents. Here is the template for our compound objects:
CONTENTdm Compound Object Metadata Values | |
Interviewee | Last Name, First Name; repeat as needed |
Interviewer | Last Name, First Name; repeat as needed |
Title | Oral Memoirs of Full Name |
Interview Details | Interviewed by Full Name on Date, in City, State. (spell out everything) |
Number of Interviews | # |
Interview Date(s) | Year(s) |
Resource Type | Interview |
Format | PDF; Audio |
Genre | Oral Histories |
Language | English |
Description, transcript | In process/# of pages; index (if extant) |
Description, sound recordings | analog audio tape/analog video tape/digital audio file; audio length in x.xx hr. (repeat as needed) |
Collection | Collection Name |
Project | Project Name |
Summary | “First Line” of abstract text |
Subject–Library of Congress | Enter as able (the more geographically specific the better) |
Sponsor | Baylor University Institute for Oral History |
Sponsor Location | Waco, TX |
Rights | https://www.baylor.edu/lib/digitization/digitalrights |
Deposit Date | Year |
Genre Source | local |
Index | Y (if present, otherwise blank) |
ID | Parentname (sielask) |
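As a rough sketch of how such a template might be used to clean up legacy records, the snippet below fills in constant fields and lists blank ones. The field names and values come from the template above, but the split into "fixed" and "per-interview" fields, the function names, and the sample legacy record are assumptions made for illustration. It does not represent the collection-wide commands available in the CONTENTdm administration panel.

```python
# Fields whose value is the same on every record (an assumed subset of the template).
FIXED_VALUES = {
    "Resource Type": "Interview",
    "Genre": "Oral Histories",
    "Sponsor": "Baylor University Institute for Oral History",
    "Sponsor Location": "Waco, TX",
    "Genre Source": "local",
}

# Fields that must be entered for each memoir (also an assumed subset).
PER_INTERVIEW_FIELDS = [
    "Interviewee", "Interviewer", "Title", "Interview Details",
    "Number of Interviews", "Interview Date(s)", "Collection", "Project",
]


def apply_fixed_values(record: dict[str, str]) -> dict[str, str]:
    """Return a copy of the record with every constant field set to its template value."""
    return {**record, **FIXED_VALUES}


def missing_fields(record: dict[str, str]) -> list[str]:
    """List per-interview fields that are absent or blank, in template order."""
    return [name for name in PER_INTERVIEW_FIELDS if not record.get(name, "").strip()]


# Hypothetical legacy record with an abbreviated sponsor and a blank project.
legacy = {
    "Interviewee": "Last Name, First Name",
    "Interviewer": "Last Name, First Name",
    "Title": "Oral Memoirs of Full Name",
    "Sponsor": "BUIOH",
    "Project": "",
}
updated = apply_fixed_values(legacy)
todo = missing_fields(updated)
```

Here `updated` carries the full sponsor name from the template, while `todo` lists the fields a student worker would still need to fill in by hand.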
A majority of these fields use data collected at the time of the interview from one of our required accession forms. Others are more technical in nature, sometimes using controlled Dublin Core mapping vocabulary to provide functionality when records are harvested (e.g. Resource Type: Interview). Our internal interview process tracking database (created in FileMaker Pro) contains all the information needed to enter a majority of the data. The separation of these two databases and the resulting duplication of certain aspects of data entry are lamentable side effects of decades of slow progress towards pure-digital data management [3]. For us, the system works thanks in large part to certain database redesigns and streamlined processes. Of course, dreams of future large-scale data transfer between systems still persist, so for others, especially those oral history repositories still in their infancy, I highly recommend investigating all-in-one CMS options, especially those based on open-source platforms such as Omeka and Drupal.
While most of the discovery of our collection takes place via Google Scholar or general Google searches, it is nice to have our CONTENTdm memoirs also listed among the thousands of volumes on WorldCat. In fact, as of this writing, the oral history collection is the only digital collection at Baylor that is harvested for WorldCat upload. Additionally, each metadata field is searchable in the local CONTENTdm advanced search interface, allowing greater discoverability by those generally browsing the collection, as well as enabling the creation of “canned” search links. Such links on the BUIOH collection splash page at present include a full project listing and a searchable listing of all online records with audio recordings currently available.
Conforming to a Metadata Schema for Digital Collections at Baylor University
As mentioned previously, the oral history collection component of Baylor’s CONTENTdm has existed for over five years now and has very much been driven structurally by BUIOH, with the Baylor Electronic Library providing technical guidance. However, beginning in late 2014, the leadership of the Electronic Library decided to pursue a unifying metadata schema for all its digital collections. Heretofore, collections from across the campus were added to the CONTENTdm system on a project-by-project basis, sometimes coinciding exactly with the acquisition of new equipment for the Riley Digitization Center. With an impressive host of equipment and a growing number of collections, the library now seeks a bit more structure among its digital partners, particularly concerning metadata entry.
The resulting in-progress 2015 document is a metadata schema for seven required fields, each with Dublin Core mapping specific to the information type. During meetings with the Baylor University metadata librarian, we perused the extant BUIOH metadata fields and evaluated how they fit in the new system. The following chart is a representation of said investigation:
Schema Required Field | DC Mapping | BUIOH Equivalent | Current DC Map |
Title | dc.title | Title | dc.title |
Collection Title | dc.relation.isPartof | Collection? | dc.publisher |
Institution (Custodian) | dc.source | Sponsor? | dc.publisher |
Identifier | dc.identifier | ID | dc.identifier |
Language | dc.language | Language | dc.language |
Rights | dc.rights | Rights | dc.description |
Resource Type | dc.type | Resource Type | dc.type |
The highlighted sections (the entries marked with a question mark, and those whose current mapping differs from the schema’s) represented questions or discrepancies when it came to matching old and new fields. As you can see, there was not too much to alter, and overall the adaptation process was fairly painless: I made the required changes within a few minutes from the CONTENTdm administration panel. I will say, however, that this was mainly due to the fact that BUIOH has always been fairly meticulous when it comes to both our metadata field structure and entry systems. With these changes in place, our collection can continue to be autonomously administered by BUIOH while sharing a correlative metadata base with the rest of the Baylor Digital Collections family.
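The comparison in the chart can also be expressed as a short check. This is a minimal sketch, not the process actually used: the rows are transcribed from the chart (with the question marks dropped from the BUIOH field names), and the function name `remaps_needed` is hypothetical.

```python
# (schema field, required DC mapping, BUIOH field, current DC mapping)
ROWS = [
    ("Title", "dc.title", "Title", "dc.title"),
    ("Collection Title", "dc.relation.isPartof", "Collection", "dc.publisher"),
    ("Institution (Custodian)", "dc.source", "Sponsor", "dc.publisher"),
    ("Identifier", "dc.identifier", "ID", "dc.identifier"),
    ("Language", "dc.language", "Language", "dc.language"),
    ("Rights", "dc.rights", "Rights", "dc.description"),
    ("Resource Type", "dc.type", "Resource Type", "dc.type"),
]


def remaps_needed(rows):
    """Return (BUIOH field, current mapping, required mapping) where the two mappings differ."""
    return [
        (buioh_field, current, required)
        for _schema_field, required, buioh_field, current in rows
        if current.lower() != required.lower()  # ignore differences in letter case
    ]


changes = remaps_needed(ROWS)
for buioh_field, current, required in changes:
    print(f"{buioh_field}: {current} -> {required}")
```

Run against the chart, this flags three fields for remapping: Collection, Sponsor, and Rights. The other four already match the schema.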
Concluding Thoughts
The recent work of integrating BUIOH’s metadata fields into the greater digital collection has parallels for me in considering the ultimate goals of OHA’s Metadata Task Force. We are all looking for the best method not only to describe our own collections, but also to have them “play nice” in a greater online oral history environment. Obviously, at BUIOH we have many fields we plan on keeping beyond the Baylor Digital Library’s required minimum, fields that are very important in describing our unique collection. Similarly, I feel success for the Task Force will reside not in a simple “take it or leave it” schema for all to adopt, but in the formation of certain baseline requirements to ensure maximum discoverability with minimal effort. See related essays by Task Force members here.
[1] Mazé, Elinor. “Case Study: Baylor Institute for Oral History,” in Oral History in the Digital Age, edited by Doug Boyd, Steve Cohen, Brad Rakerd, and Dean Rehberger. Washington, D.C.: Institute of Museum and Library Services, 2012, /2012/06/baylor-institute-for-oral-history/
[2] Ibid.
[3] Ibid.