Mapping Approaches To Oral History Content
Management In The Digital Age

By Michael Frisch, with Douglas Lambert

Almost every traditional assumption about the collecting, curation, and uses of oral history is collapsing in the digital age. This is particularly true for content management: providing meaningful access to specific content within and across long oral history interview documents in a collection. This dimension has been diminutive to the point of invisibility in conventional practice, but it is coming to assume a crucial role in an age of digital practice. We can have instant digital access to just about anything in even large collections of digital interview recordings and documents. With digital-age tools for production and dissemination we can do just about anything imaginable with what we find, for an unlimited range of users and purposes. But realizing the potential of these new capacities depends on knowing what is there, and how to find it: fluidly, flexibly, responsively, and on demand. Hence content management is becoming a central determinant, whether propelling or limiting, of oral history practice in the digital age.

For decades, most oral histories have been described, if at all, at the collection and interview level;   text transcriptions, or sometimes informal logs or summaries, have been relied on for the only real avenue of access to specific content.  The original recordings have rarely been accessible at all, much less meaningfully.   In many collections recordings were not even retained once transcriptions were made and became, in effect, the primary source of record.

Digital tools and approaches are changing all of this, dramatically.   And they are providing resources for engaging what has long been a pervasive, intractable dilemma facing users of oral history collections:   the difficulty of identifying and reaching material of interest more directly and easily than by moving through the entire text or recording.

Whether focused on the actual digitized recording as the primary source, or on media-linked transcription or other annotations as the mode for access and exploration, digital oral history now provides tools for locating points of interest, specific terms and references, or even framed passages within interviews, and for immediate access to them. Interviews can be searched and explored for content and meaning; where appropriate, content and even media passages can then be easily extracted for use. In a digital age, these tools are widely distributable and easily adapted, meaning that the process of content management itself becomes a sharable, collaborative activity rather than a narrowly based curatorial function.

How is all this being done, and where is the practice headed?   Imagine a three-dimensional territory of diverse, emerging digital content-management practice. This  field can be mapped and organized around three axes:

  • Cataloging v. Indexing
  • Transcription v. Recordings
  • Content Mapping v. Meaning Mapping

This brief essay provides a sketch map of this territory, so to speak, one that we hope will be useful for organizing rapidly evolving digital modes of oral history content management. The map can expand awareness of the widening range of content-management approaches and options; it can also be used more actively to locate potential tools and particular choices in this landscape as they are encountered. In this way, for any oral history context, the strengths, limitations, and implications of a particular approach to rich content management may become clearer.

Cataloging v. Indexing

The first axis involves a seemingly simple dimension revolving around cataloging at one end and indexing at the other.  In traditional libraries, the catalog helps you find a needed or relevant book, and the index locates content of interest within that book once you have found it. Modern information tools narrow the distance between these very different functions, opening up an intriguing but sometimes confusing middle ground. No longer limited, as were old card catalogs, to the author and title plus one or two subject headings, digital catalogs reach more deeply into content, identifying sources through multiple subject heading tags and varied combinations among tags. That said, such descriptors still tend to be relatively general, and do not necessarily identify or connect to specific passages.

Indexes have always been different in this respect. For hundreds of years, indexing has offered flexible tools for identifying very precise content, and dimensions of meaning or abstract theme as well, with no privileging or narrowing of engagement in the process. When the index points to page 312 of a book, readers have access to the full text surrounding the identified term or passage. The index, in effect, is a nonlinear, hypertextual mode for navigating the text. And in digital realms there is no barrier to extending the concept of indexing from one book to a shelf of related books, from one interview to an entire collection, or across similar collections for that matter. The ease of manipulation and navigation, and the analytic capacity these confer, are one reason why contemporary information tools have so dramatically advanced the power of fluid, relational approaches to information, as the same content can easily be explored from complementary and contrasting directions. In electronic form, such approaches become more and more powerful, as if the entire book or group of books were being re-indexed on demand, with its content displayed and organized through the lens of any combination of index terms.

For oral history, the significance and challenges of this approach are magnified considerably as access to large digitized interview collections expands dramatically, online and within institutional holdings.  The basic dilemma is that powerful library science and archival tools for describing, identifying, organizing, and mapping content within and across large collections turn out to be designed primarily for core reference units: they identify a book, article, artifact, object, document, or, in oral history, an interview. Efforts to concentrate cross-collection use around shared, consistent metadata standards, such as Dublin Core, necessarily focus on such broader, object- or unit-level descriptors. These can be pushed to open access to particular themes or aspects or sections within, but the fundamental item of reference remains the unit, and it remains more closely bound by the assumptions and contours of unit cataloging than of content indexing as such.

This presents some real obstacles for oral histories, especially where the unit is large, the time demands of review and exploration are considerable, and the range of content, theme, or dimension of interest within the material is expansive. Such obstacles are heightened when collections offer users access within and across very large collections, and especially when they offer access to the primary source (the audio or video recording), since unlike text transcripts that can be skimmed, recordings must be listened to and watched in real time. In practical terms, referencing that cannot lead users pretty directly to the sections or specific passages or points of interest within larger units is not likely to prove very satisfying or useful. How to provide this within library or collection management database systems that are essentially unit-based, like CONTENTdm, is at present the object of considerable attention and experimentation. Some approaches drill down from the more general plain of collection-management tools; others are building up from the interior power of indexing and annotating tools. However it is approached, finding ways to bring together the intra-unit power of close indexing and the cross-unit, large-scale collection power of complex digital cataloging is very much a front-burner issue in the field as a whole.

Transcript v. Recording

A second main axis revolves around the roles of recordings and text, especially transcription, at the core of digital age oral history content management.

Perhaps the most profound consequence of digitization is the ability to work with media directly as the core primary source. There being no inherent difference between digitized text, sound, and image, in a variety of systems digital recordings can now be mapped, organized, and accessed as easily as text through indexing and cross-referencing. Among many other implications, for oral history these tools bring within reach content and meanings in interviews not easily captured in transcription. Now we can see, hear, study, and select nuances of voice, gesture, performance, and expression that are not representable in transcription, and often not lexical at all. Whether in audio or in video, this is what it means to say that the orality of oral history is moving excitingly back into primacy. There is considerable excitement, and much development, centering on how direct, inquiry-driven access to media points, segments, and passages broadens the value and usefulness of oral history interview collections.

But text transcriptions, even when not the “end,” remain very important and useful as a practical means for accessing the audio or video stream efficiently.  In many cases, of course, they are already available as resources once it is determined to bring the original audio or video into use more actively.   Even when not, the traditional preference for transcription is being reinforced by a variety of factors.

Perhaps most important is the ease and utility of instant transcript searches, not to mention the anticipation—even if always around the next corner– of speech recognition software able to produce adequate transcription for collections whose scale, or limited budget, or both, places them beyond the feasibility of conventional transcription.   Additionally, in a somewhat ironic development, the appeal of direct access to digitized interview recordings is increasing the appeal of transcription,  in that it is proving so easy to embed recording time-codes in transcription files, meaning that any identified point of interest in the text can lead, if not always instantly, to the corresponding point in the recording.
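
The link between a time-coded transcript and its recording can be sketched in a few lines. What follows is only an illustration under stated assumptions: the `Segment` structure, the function names, and the sample passages are all hypothetical, invented here, and do not describe any particular oral history system.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript passage with the recording time-code where it starts."""
    start_seconds: float
    text: str


def find_time_codes(segments, term):
    """Return the start time of every segment whose text contains the term."""
    needle = term.lower()
    return [s.start_seconds for s in segments if needle in s.text.lower()]


def format_time_code(seconds):
    """Render seconds as HH:MM:SS, e.g. for seeking in a media player."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Hypothetical sample data, invented for illustration only.
transcript = [
    Segment(0.0, "We moved to the farm in the spring."),
    Segment(75.5, "My mother ran the dairy while he was away."),
    Segment(3725.0, "The strike started the following winter."),
]

hits = find_time_codes(transcript, "strike")
print([format_time_code(t) for t in hits])
```

A text hit thus becomes a point in the recording: the word search does the finding, and the embedded time-code does the leading.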

The temptations of scale are a final consideration making the cost and inherent limitations of transcript-based access seem variously acceptable, preferable, or even requisite in particular settings.   More and more archives are leveraging the unbounded capacity of cyberspace to post large, complex collections to the web so they can be instantly “accessible” by anyone, anywhere.  The larger these collections, the more difficult it is to imagine providing meaningful access beyond the usual listing of interview-level descriptors and, perhaps, themes. Thus transcription linked to the audio or video source, combined with powerful text searches, provides efficient tools for moving around in massive media collections.   In some settings, increasingly sophisticated word search tools move beyond the overly literal clumping of ‘hits’ and counter-productively inclusive “false positives”:  context and proximity controls, for example, can help distinguish an interview discussing ‘bomb’ and ‘airplane’ from one discussing ‘bomb’ and ‘Broadway.’
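
A proximity control of the kind just mentioned might look roughly like this. It is a minimal sketch, not the method of any actual search tool: the `near` function, its `window` parameter, and the two sample sentences are assumptions made up for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def near(text, first, second, window=10):
    """True if `first` and `second` occur within `window` words of each other."""
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= window
               for i in first_positions
               for j in second_positions)


# Hypothetical sample passages, invented for illustration only.
wartime = "The bomb bay doors jammed before the airplane reached the coast."
theater = "The show was a bomb and closed on Broadway within a week."

print(near(wartime, "bomb", "airplane"))  # both terms close together
print(near(theater, "bomb", "airplane"))  # 'airplane' never appears
```

Both passages contain "bomb," so a plain word search returns both; requiring "airplane" nearby keeps the wartime passage and drops the theatrical one.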

But the limits of this approach are clearly the inverse of its strengths.  Even accepting the practical value of searchable transcriptions,  the stubborn fact is that people in interviews do not say “And now I will tell a story about the social construction of gender,” or “about class consciousness.”  They just tell a story about their mother, or a strike—and in so doing they may not actually use the word “mother” or “strike.”    Which is why many have been seeking to transcend the limits inherent in referencing or searching only the words of an interview, not to mention the potentially even broader range of content and meaning that is expressive or affective or simply not at all lexical.

There is thus a range of approaches in play. At one end of the spectrum is near-total reliance on text, with all searches based on the words in transcripts synchronized seamlessly to the audio or video stream. At the other end is a near-total dispensing with transcription as unnecessary and unhelpful, in contrast to searching or navigation based on various combinations of summary annotations, coding, and tags, and not mediated through interview transcripts. Many approaches are coming to be located somewhere in between these poles, with various combinations of transcript-based and transcript-independent cross-referencing.

Content Mapping v. Multidimensional Meaning Mapping

A third dimension of choice in oral history content management, related to but distinctly independent of our discussion of cataloging v. indexing and transcription v. recordings, involves the distinction between linear, funneled searching, on the one hand, and a multi-dimensional, relational database approach to organizing and exploring complex information, on the other. The power of each of these is vastly enhanced in digital form, but in many respects this power ends up pulling them in different directions in terms of broad content-management approaches and capacities.

For most oral historians, catalog-like subject headings and linear searching are more familiar and comfortable. The revealing term “subject heading” is usually given a literal, descriptive, denotative meaning applied to content, and digital tools combined with multiple subject headings, often arranged in category and sub-category hierarchies, offer very powerful drilling and searching within this realm. Indexers, on the other hand, have always felt free to reference all sorts of things in addition to content, such as abstract ideas, themes, nominal references, and very different categories of meaning and experience. Referencing dimensions other than literal content lies at the core of the most traditional indexing; it is anything but a new or exotic impulse driven by media technology.

Social scientists are more comfortable than historians, archivists,  and librarians with the notion that significance in data is not necessarily or even usually explicit or nominal, the object of a straightforward search.   It is, rather, meaning identified in response to an inquiry, and then coded for sorting and analysis as such. Social scientists are also comfortable with the notion that there may be wholly different fields through which every unit of data in a study needs to be described and categorized.

In the digital age, at the same time as content searching is becoming more and more powerful over vast collections of data, including interviews, these multidimensional sensibilities  are also coming into range for oral historians in tools and approaches propelling a data-based sensibility for content management.  If interviews can be mapped in many possible fields or domains of reference, what are those domains? What are the independent dimensions of historical meaning and reference, for the project or set of stories?    How do we reference a World War II anecdote that is in one sense about a particular combat maneuver, in another sense about the uses of particular weapons, and in another sense about the relationship between officers and troops under battle stress?  How do we reference a story in an agricultural oral history collection that is in one sense about the decision to introduce milking equipment, in another sense about farm wives and gender roles, in another sense about the impact of World War II on farm families?

In many emerging digital modes, a multi-dimensional approach to indexing can develop distinct controlled-vocabulary taxonomies for all such dimensions, or facets, by which any content can be meaningfully mapped and explored. With such indexing one views and sorts the entire collection through one or a combination of “lenses” that are anything but mutually exclusive: each maps a different facet of the subject matter, and can thus be used to explore content through these distinct views, and to filter it through various combinations of lenses.
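
Filtering through combinations of lenses can be illustrated with a small sketch. The facet names ("topic," "theme," "period"), the index terms, and the passage identifiers below are hypothetical, loosely modeled on the agricultural and World War II examples above; no existing tool or vocabulary is implied.

```python
# Hypothetical facets and terms, invented for illustration only.
passages = [
    {"id": "farm-01", "facets": {
        "topic": {"milking equipment"},
        "theme": {"gender roles"},
        "period": {"World War II"}}},
    {"id": "farm-02", "facets": {
        "topic": {"milking equipment"},
        "theme": {"technology adoption"},
        "period": {"postwar"}}},
    {"id": "war-01", "facets": {
        "topic": {"combat maneuver", "weapons"},
        "theme": {"officer-troop relations"},
        "period": {"World War II"}}},
]


def filter_by_lenses(items, **lenses):
    """Return ids of passages that match every requested facet term."""
    return [item["id"] for item in items
            if all(term in item["facets"].get(facet, set())
                   for facet, term in lenses.items())]


# One lens alone, then two lenses combined.
print(filter_by_lenses(passages, period="World War II"))
print(filter_by_lenses(passages, topic="milking equipment",
                       period="World War II"))
```

The same passage sits in several facets at once, so a single lens gathers stories from very different settings, while a combination narrows to the one story that is about both things.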

In this dimension, then, current practices in digital content management are arrayed along  a spectrum that ranges from powerful content-driven searching at one end, to multi-dimensional, meaning- and analysis-driven exploration at the other.  The distinction has particular relevance to the promise of oral history, in that personal narratives are the ground informing broader reflection and analysis of everything from specific historical contexts to the dynamics of life course and personality to the workings of memory and narrativity.    And it is proving additionally significant because of its capacity to incorporate additional user-driven fields– tags and annotations and subjective responses that map a collection from a vernacular or community perspective,  providing a valuable counterpoint to formal content-management—and vice versa.

Taken together, our map of emergent approaches to working with audio-video materials involves three overlapping, interrelated, but conceptually and operationally distinct dimensions, or axes: 1) from cataloging to indexing; 2) from text-transcript based audio or video access to direct or observational cross-referencing of audio or video as such; and 3) from one-dimensional descriptions of a unit of data to a multi-dimensional, multi-field approach to data mapping, that is, from powerful linear, hierarchical, zoom-in, zoom-out content-referencing frameworks to differently powerful multi-dimensional meaning and qualitative-analysis referencing.

At the current moment, most oral history collections remain closer to the first-mentioned end of each of these dimensions: they are closer to cataloging than indexing; they are generally more reliant on transcript-driven searches than non-transcript or observational referencing; they rely more on linear searches than on relational database approaches to organization and navigation; and they are more comfortable with content-searching than with meaning-mapping. These preferences are driven to a certain degree by scale, to an additional degree by the archival and library collection-management auspices of most of these projects, and to some extent as well by the state of current technology.

In contrast, your humble cartographer believes that the most promising direction for content management, the direction in which emergent methods have the need, the capacity, and increasing momentum to develop, will be towards the opposite pole of each of the axes I have described: away from subject headings and towards more comprehensive indexing of specific sections and passages within and across interviews; towards more direct access to recording media as the primary source, unmediated by transcript word-searches; and towards the mapping of meanings as well as content, towards multi-dimensional or multi-field cross-referencing in something closer to a relational database framework.

Whatever the balance in any setting or development trajectory, however, what all of these approaches have in common, what defines the current and prospective development of the field in this regard, is that one way or the other, from large-scale archive to small community project to home and family collections, it is becoming more and more feasible to explore oral history content in highly directed, responsive ways, and to hear, see, browse, search, study, refine, select, export, and make use of audio and video extracts from oral histories directly.

In a world driven by expanding multi-media uses and applications for oral history, developing such content-management capacities even further necessarily stands at the center of the field, a critical dimension of curation that has the power to either limit or liberate the vast potential of both oral history collecting and dissemination.

Frisch, M., w/ Lambert, D. (2012). Mapping approaches to oral history content management in the digital age. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Museum and Library Services. Retrieved from http://ohda.matrix.msu.edu/2012/07/mapping/.


This is a production of the Oral History in the Digital Age Project (http://ohda.matrix.msu.edu) sponsored by the Institute of Museum and Library Services (IMLS). 

