OHMS: Enhancing Access to Oral History for Free
by Doug Boyd
Since finding a home in digitized, networked archives, oral history has grown as a resource for historical and cultural documentation by both academic and community scholars. Yet it remains underutilized. Why? Because oral history can be a cumbersome resource to use, even in an online environment. We aim to fix that. This article is about the most recent effort at the Louie B. Nunn Center for Oral History to make using oral histories more efficient and fluid. It’s called OHMS (the Oral History Metadata Synchronizer). OHMS, which was born in 2008, improves the ways oral history collections are used online and enhances the oral history user experience. It inexpensively and efficiently increases access to oral histories by locating the precise segments of online audio or video that match a search term entered by a user.
Since the original launch of the OHMS system, the Nunn Center has made over 700 interviews accessible using it. Usability improved. However, a review revealed two major issues. First, OHMS was originally designed to work with transcribed oral history, and very few repositories can afford to transcribe on a mass scale. Second, OHMS was originally designed to work only for the Kentucky Digital Library. To broaden the pool of online collections that might use OHMS, the Nunn Center and UK Libraries have launched a new phase of OHMS development. OHMS now includes an indexing component that offers an option for distributing oral history for a fraction of the cost that comes with transcribing. Most importantly, with the assistance of IMLS (the Institute of Museum and Library Services), we are preparing OHMS for free, open source distribution. As part of this initiative, OHMS will be developed as a plug-in, working seamlessly with other content management systems including Omeka, Kora, CONTENTdm and Drupal. To be clear, OHMS was initially created in an archival context to serve an archival access imperative, yet its possibilities extend far beyond the archives. This case study reflects on the history of OHMS as it enters a new phase designed for public consumption.
BACKGROUND and DESIGN
I have written in more detail on the creation of OHMS in the chapter “Achieving the Promise of Oral History in a Digital Age” in Donald A. Ritchie’s The Oxford Handbook of Oral History. So I won’t dwell too much here on the motivation behind the OHMS system. In short, while digital technologies have transformed the way researchers and users discover and engage with archival materials, the digital world is in its infancy. Software designed to offer access to archived materials is not sensitive to specific challenges posed by oral histories.
Digital archival platforms (in the United States) are typically designed and optimized for providing access to digital photo and manuscript collections. They tend to treat the different components of the oral history information package (audio, video, transcript, index or metadata) as separate entities. Users can access audio, access a transcript, and even search the text while listening to or watching the interview. However, the different components are rarely joined into one integrated experience that can aid access and use. The first version of OHMS built a bridge between transcripts and audio to speed up access to audio and video. When an oral history collection is transcribed, the searchable text dramatically increases efficiency for users of oral history collections. However, most online platforms still require the user, once they locate the desired information in the text, to manually find the corresponding moment in the audio or video. But that is really a job for computers. So we designed OHMS to connect the user from a location in the text to a moment in the recording; it proved quite successful.
This original version of OHMS was an efficient search and retrieval system designed for the Kentucky Digital Library (http://kdl.kyvl.org). It synchronized transcribed text with time code in the audio/video (on the backend), as well as providing a user map/viewer that connected search results of a transcript to the corresponding moments in the audio or video. From its inception, OHMS contained two parts: 1) the Synchronizer and 2) the Viewer. The OHMS Synchronizer was a web-based space where metadata could be prepared for an interview and a transcript could be synchronized. The OHMS Viewer brought the digital interview and the synchronized transcript together online in an integrated space. Figure 1 presents a look at the OHMS viewer and a glimpse of the user experience. Searching Ian Abney’s interview in Figure 1 for the word “Fallujah” presented contextual search results in the right column.
Figure 1: Original version of the OHMS Viewer
Clicking on a search result took the user to the appropriate location in the transcript. On the left side of the transcript, time-code markers appeared at 1-minute intervals. Users could click on the nearest time-code marker and see or listen to the corresponding moment in the interview. The design empowered users to navigate an oral history interview effectively.
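The lookup described above, from a search hit in the text to a moment in the recording, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OHMS code: the `SyncedLine` structure, the function names, and the sample transcript lines are all invented for the example. It assumes each transcript line already carries the most recent 1-minute marker that precedes it.

```python
import re
from dataclasses import dataclass


@dataclass
class SyncedLine:
    minute: int  # most recent 1-minute marker at or before this line (assumed layout)
    text: str    # one line of transcript text


def search_transcript(lines, term):
    """Return (seconds, text) for every line containing the term, ignoring case."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [(line.minute * 60, line.text) for line in lines if pattern.search(line.text)]


def format_timecode(seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Invented transcript lines, for illustration only.
transcript = [
    SyncedLine(0, "I grew up in a small town."),
    SyncedLine(12, "We lived right beside the river then."),
    SyncedLine(13, "My father worked at the mill."),
    SyncedLine(47, "The River flooded the following spring."),
]

hits = search_transcript(transcript, "river")
for seconds, text in hits:
    # A player would seek to `seconds`; here we only print the result.
    print(format_timecode(seconds), text)
```

Because markers sit at 1-minute intervals, a hit resolves to the start of the minute that contains it, so the user may listen for up to a minute before reaching the exact passage.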
The OHMS Synchronizer was designed to quickly locate the 1-minute intervals in an interview and drop time code markers into the transcript. Today, time code can be included in a transcript at the time of creation (See OHDA Case Study: Transcripts, Time-Coding, and You by Michael Sesling). We designed the original OHMS transcript synchronizer to efficiently encode previously created transcripts with time code. The OHMS transcript synchronizer automatically takes the “time-coder” (or as we call them, the TC) to the 50-second mark of each minute, giving the TC ten seconds to locate themselves in the text. When the clock hits the minute, a bell rings and, with a click, the TC embeds the appropriate time code into the text. Without pause, the system skips to the next interval, where the process repeats. Figure 2 presents a glimpse of the transcript synchronizing process utilizing Robert Penn Warren’s 1964 interview with Martin Luther King.
Figure 2: OHMS Transcript Synchronizer
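The skipping pattern of the synchronizer can be expressed as a simple schedule of cue points. The sketch below is a hedged illustration rather than the OHMS implementation; the function name `sync_cues` and its parameters are hypothetical, and the 10-second lead-in and 60-second interval are taken from the description above.

```python
def sync_cues(duration_seconds, lead_in=10, interval=60):
    """Yield (jump_to, marker) pairs, both in seconds.

    Playback jumps to `jump_to`, the time-coder listens for `lead_in`
    seconds, and the time code is embedded when playback reaches `marker`.
    """
    marker = interval
    while marker <= duration_seconds:
        yield (marker - lead_in, marker)
        marker += interval


# A short 185-second recording has three whole-minute markers.
cues = list(sync_cues(185))
print(cues)

# For a one-hour recording, count the cues and the listening time they imply.
hour_cues = list(sync_cues(3600))
listening_seconds = sum(marker - jump_to for jump_to, marker in hour_cues)
print(len(hour_cues), listening_seconds)
```

Under these assumptions, a one-hour recording has 60 cues and about 600 seconds (10 minutes) of listening, not counting the time the time-coder spends finding their place and clicking.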
OHMS inexpensively and efficiently encodes transcripts of interviews, and then connects the transcripts to the corresponding moments in the audio or video interview. Users reap the benefits. One hour of transcribed interview can be synchronized and submitted in a matter of minutes, and an index can be generated while listening to or viewing the interview. In addition to using OHMS to embed time codes into the transcript, we constructed a viewer so that users could see the time correlation between the transcript or index and the audio/video and navigate the interview. The result was an integrated final product in which the multiple components of an oral history interview information package all worked together. I was proud of OHMS and our innovative approach to presenting interviews online. However, as I indicated in the introduction, OHMS was only being utilized by the Nunn Center and only worked with transcripts.
THE BEAUTY AND CHALLENGE OF TRANSCRIPTS
Many oral history repositories transcribe every interview that they collect. However, because transcription is expensive, this limits the number of interviews that a single repository can accession. (See Linda Shopes’ OHDA essay Transcribing Oral History in the Digital Age). At present, the Nunn Center has over 8,500 interviews in its collection, sometimes accessioning up to 700 interviews in a single year. I have always known that, even with a modest endowment, we could never afford to transcribe all of our interviews unless we dramatically limited our intake of interviews each year. Additionally, many of our interviews that were transcribed in the past never had the final quality control audit conducted. As a result, many of the transcripts in our collection were “first draft” transcripts. Although we will provide a “first draft” transcript (with disclaimers) as part of our reference process, we committed to not hosting “first draft” transcripts online.
Generally, the cost of transcribing and auditing the resultant transcript averages between $180 and $200 per hour of oral history interview. There are less expensive vendors for transcribing, and the Nunn Center has experimented with utilizing students as transcribers, but when the quality control audit is factored in, the final costs tend to remain at $180 to $200 per interview hour. The original version of OHMS required a transcript, and Nunn Center policy required that, in order to be a candidate for the OHMS system, the transcript be a fully audited, final version. As a result, only projects that had major funding were accessible in our own innovative system. There are thousands of interviews in our collection that I would want to use the OHMS system to make publicly accessible. Raising the millions of dollars needed to transcribe and audit those collections, however, remains highly unlikely.
A potential alternative to transcription is automatic speech recognition. However, it is not far enough along, especially for large-scale collections of often poorly recorded interviews containing multiple dialects (See the OHDA essay: Can Automatic Speech Recognition Replace Manual Transcription? by Doug Oard). And even a perfect transcript has limitations as a way to support text searches that offer access to points in the audio. In an oral history interview, a narrator could describe living under segregation for several hours without ever using the word “segregation.” Future researchers interested in the topic of segregation would, logically, search for “segregation” in the hopes of discovering content useful for their intended research. Keyword searching of a verbatim transcript fails to map natural language conversation to descriptive and meaningful concepts.
So a balancing act between resources and access via transcript emerges. Having observed the oral history reference process firsthand, it is clear to me that researchers using interviews overwhelmingly prefer the transcript. I have always worked with relatively large oral history archives that tended to privilege the transcript as an access point and, therefore, expended major time and both financial and human resources over the years on the creation of those transcripts. Yet privilege has its limits; scale and cost are insurmountable factors and tend to push the recordings into the background. Michael Frisch once wrote, “Scale tends to drive choices in tools and approach as well.” Like many others, I have been influenced by Michael Frisch and his advocacy of indexing in the process of creating descriptive access points in an oral history interview or collection. Frisch’s writings and numerous presentations on the limitations of transcripts and on the benefits of indexing were always exciting and energizing to me. I must confess that having worked with major transcribing budgets in the past, accompanied by the underlying perception that the transcript was the access ideal, I always just focused the Nunn Center’s energies on transcription. But following the recent economic collapse and experiencing both underwater and frozen endowments, it was clear to me that we could no longer afford to transcribe and audit transcripts on such a mass scale. As an administrator, I found the cost and time savings created by indexing oral history interviews extremely attractive. As we proceeded into a second phase of OHMS development, I chose to expand the system’s capabilities to include an indexing feature.
INDEXING: QUICK AND AFFORDABLE OPTIONS FOR ENHANCED ACCESS
If you were to go to the grocery store and ask the manager, “Where can I find the Cheerios?” the manager would, most likely, smile and say “aisle 10.” At that point, the information seeker first identifies the location of aisle 10 and then proceeds down the aisle, looking for the specific location of the Cheerios. In many ways, this is how I view the role of indexing an oral history interview.
Having curated numerous analog oral history collections in the past (mostly on cassette), I have always valued the analog “tape-logging” approach to creating descriptive metadata. Diligent interviewers listened back to their audio cassettes and identified and described major subject changes in an interview using a variety of techniques, which included partial transcription, keywords and narrative description of the content. They then marked the location of each segment by the cassette counter number. The “tape-log” or index gave the future researcher a sense of the content being discussed as well as when it was discussed in the interview. The breakdown in the analog system always fell on the time markers. Analog counter markers were never true time code markers and proved inaccurate and vague when attempting to connect the user to the specific point in the recording referenced in the index. The expectation was, always, that the user would manually navigate to the corresponding moment in the analog recording. The digital index has the advantage of true time code. However, very few digital archival systems were developed to automatically link the time code of an indexed segment to the corresponding moment in the recording. This seemed to be the logical next step for OHMS.
The OHMS indexing system works very simply and efficiently. The indexer logs into OHMS (presuming the digital interview has been uploaded or imported into OHMS) and commences. After choosing the interview that they want to index, they are taken to a new page containing a player and a “tag now” button. While listening to the interview, the indexer chooses moments that they want to describe. Pressing the “tag now” button opens the indexing dialogue, which contains the following fields:
- Time Stamp (Auto filled)
- Partial Transcript
- GPS coordinates
Of course, not being clairvoyant, the indexer will not know something should be tagged until they have already heard the content. To compensate for the fact that indexing on the fly will always be just a few moments too late, the “tag now” button automatically drops the audio player back in time a few seconds. The indexer can then drop back or skip forward at 15-second intervals to place the time stamp at the accurate location for the segment’s beginning. OHMS allows you to create thesauri that can be uploaded and used to auto-fill the title, subject and keywords fields to encourage standardization and consistency. Once the indexer has described the segment, they hit the save button, which closes the segment window, and move on to the next segment. The following video example of indexing (Figure 3) features an interview with Buffalo Trace Distillery’s Master Distiller Emeritus Elmer T. Lee discussing what, in his opinion, makes good tasting Bourbon.
Figure 3: OHMS indexing module (back end)
Figure 4: OHMS Index Viewer (User Side)
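The “tag now” behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the OHMS indexing module: the `Segment` record, the function names, and the 5-second rewind are hypothetical (the description says only “a few seconds”), while the 15-second nudge step and the field names come from the text above.

```python
from dataclasses import dataclass, field

REWIND_ON_TAG = 5  # assumed value; the article says only "a few seconds"
NUDGE_STEP = 15    # from the article: drop back or skip forward in 15-second steps


@dataclass
class Segment:
    time_stamp: int  # seconds from the start of the recording (auto-filled)
    title: str = ""
    partial_transcript: str = ""
    subjects: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    gps: str = ""


def tag_now(playhead):
    """Return the initial time stamp: the playhead dropped back a few seconds."""
    return max(0, playhead - REWIND_ON_TAG)


def nudge(time_stamp, steps, duration):
    """Move the stamp by whole 15-second steps (negative = back), clamped to the recording."""
    return min(max(0, time_stamp + steps * NUDGE_STEP), duration)


# The indexer presses "tag now" at 12:34 (754 seconds), then drops back one step.
stamp = tag_now(754)
stamp = nudge(stamp, -1, duration=3600)
segment = Segment(time_stamp=stamp, title="(segment title goes here)")
print(segment.time_stamp)
```

Clamping keeps the stamp inside the recording, so tagging in the first few seconds or nudging near the end never produces an invalid time.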
Indexing oral history interviews makes sense for a variety of reasons. As I previously mentioned, the Nunn Center, although still committed to targeted transcription, can no longer afford to transcribe on a mass scale, yet I am still a major proponent of using the OHMS system to enhance online access to interviews. Indexing an hour-long interview using the OHMS system can take anywhere from 2 to 3 hours, depending on the level of specificity in the indexing. At that point, the interview is ready to go online. The cost for the Nunn Center to utilize graduate students to index this hour-long interview is under $30. This same hour-long interview would cost $200 to transcribe and audit. To an administrator experiencing an economic downturn, this is an attractive option.
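To make the comparison concrete, the per-hour figures quoted in this article can be scaled to a backlog. The sketch below is only arithmetic on those figures; the function name and the 400-hour backlog are hypothetical, and $30 is treated as an upper bound because the text says “under $30.”

```python
TRANSCRIBE_PER_HOUR = (180, 200)  # dollars per interview hour, transcribed and audited
INDEX_PER_HOUR_MAX = 30           # dollars per interview hour, upper bound for indexing


def compare_costs(interview_hours):
    """Return (transcribe_low, transcribe_high, index_max) in dollars."""
    low, high = (rate * interview_hours for rate in TRANSCRIBE_PER_HOUR)
    return low, high, INDEX_PER_HOUR_MAX * interview_hours


# A hypothetical backlog of 400 interview hours.
low, high, index_max = compare_costs(400)
print(f"Transcribe and audit: ${low:,} to ${high:,}")
print(f"Index: at most ${index_max:,}")
```

At these rates, indexing costs at most about one sixth of the low-end transcription estimate for the same number of interview hours.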
It is not just about the money, however. Access beckons as well. I see many un-transcribed interviews in important collections sitting on shelves (often virtual), which are not candidates to put online using the original OHMS system. For example, we have over 400 interviews with World War II veterans in our collection, most of which were awaiting funding for transcribing and auditing so that we could put them online. Indexing in OHMS allows us to create a workflow that puts these interviews online much more quickly and efficiently for a very low cost. As I indicated when discussing the limitations of the transcript, an index creates a very different representation of the content embedded in an oral history interview, one that possesses several advantages over the verbatim transcript. (See the OHDA essay: Meaningful access to audio and video passages: A two-tiered approach for annotation, navigation, and cross-referencing within and across oral history interviews, by Doug Lambert and Michael Frisch). The newly developed indexing module of OHMS creates a searchable online index containing a variety of descriptive fields that also connect to the corresponding moment in the audio or video interview. The interview index can be created for a fraction of the cost of verbatim transcription and can be done much more quickly.
The OHMS viewer is not restricted to presenting either a transcript or an index. It accommodates both a transcript and an index, allowing the user to toggle between and search both resources if available. In fact, I see this as the best of both worlds. When an interview being presented in the OHMS viewer contains both a transcript and an index, a user will first encounter an index enabling them to browse quickly or search the contents of an interview. When keyword searchability becomes necessary, the user switches to the transcript and begins a more specific and targeted approach to information seeking.
OPEN SOURCE AND FREE
OHMS was rewritten using PHP, a ubiquitous programming language, and was optimized for implementation by others. In January 2012, the Nunn Center was awarded an IMLS National Leadership Grant to prepare OHMS for open source distribution. Open source does not always mean free or simple. Often, “open source” tools require a massive amount of IT and programming support in order to implement. The goal of this national leadership grant is to take the open source tool we have already created and build in compatibility with some common archivally oriented content management systems used today. By the end of the NLG in 2013, OHMS will be compatible as a plug-in with CONTENTdm, KORA, Omeka, and Drupal. In addition to building in the compatibility with the OHMS viewer, our team intends to create educational modules not only for using OHMS but also for implementing OHMS on the server side. The hope is that the smaller historical societies or similar organizations with very limited budgets and almost no IT support can take full advantage of the OHMS system for presenting oral history collections online.
Open source tools must be designed with sustainability in mind in order to be truly successful. As an open-source solution, the OHMS system, once integrated into other content management systems, will need a sustained user community. Community is central to the successful sustainability of any open-source software. We will actively engage this community to provide feedback and information that will allow for effective and ongoing development and for the future innovation of this freely available tool.
OHMS was originally created to enhance online archival access to oral history interviews, but the potential for oral history in the digital age is only beginning to emerge. OHMS is not the only annotation tool available, but it is free, simple to implement, simple to use, and designed specifically to accommodate oral history. OHMS is one tool in a growing toolbox serving as an intermediate step to improve access and efficiency. What lies ahead? We are awaiting the maturity of automatic speech recognition and artificial intelligence and their potential for automating access to large oral history collections. Likewise, the search is on for ways to help develop new kinds of metadata that not only capture the informed perspective of the archivist, but also reach out to the worldwide audience that is beginning to use archived materials. As we celebrate OHMS, it is important that we recognize the limitations of our current conventions and practices while preparing our collections for future access. Transcribing, indexing, and descriptive metadata development, even when conducted by the authoritative archivist, are incredibly limited and subjective methods of representation. Increasingly, online communities will contribute to and drive the creation of future access points. We must focus on improving representations of the rich content embedded in each oral history interview, and design archives that effectively connect information seekers to content. Digital access is not limited to a local audience, and our metadata standards need to better represent content for global users in multilingual contexts.
We are very proud of what we have accomplished in the creation of OHMS. Now is the time to make it more available to other repositories, institutions or individuals. I believe that the mission of oral history is to record and make individual stories an accessible part of the historical record. However, when oral histories remain hidden on shelves in archives or prove too difficult or cumbersome for researchers to use online, we fall short of fulfilling this mission. We are experiencing an exciting level of awareness of oral history as both a methodology and an archival resource. My hope is that OHMS can be part of connecting these individual stories to the historical record. The mission of OHMS has transformed from enhancing access to the Nunn Center collections to empowering institutions, both large and small, to provide an effective, user-centered discovery interface for oral history on a larger scale for a fraction of the price.
Citation for Article
Boyd, D. A. (2012). OHMS: enhancing access to oral history for free. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Museum and Library Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/ohms-2/.
Boyd, Douglas A. “OHMS: Enhancing Access to Oral History for Free,” in Oral History in the Digital Age, edited by Doug Boyd, Steve Cohen, Brad Rakerd, and Dean Rehberger. Washington, D.C.: Institute of Museum and Library Services, 2012, http://ohda.matrix.msu.edu/2012/06/ohms-2/