New Roots: An Oral History Metadata Case Study at the University of North Carolina at Chapel Hill
by Jaycie Vos
Since 2007, faculty, staff, and students at the University of North Carolina at Chapel Hill (UNC) have conducted oral history interviews focused on issues relating to Latino migration to North Carolina and the formation of Latino communities. The interviews are in English or Spanish, and interviewees include immigrants, U.S.-born second generations, professionals who work with immigrants, policy makers, religious leaders, educators, students, and local business owners. This growing initiative, called New Roots, is part of the Latino Migration Project, under the direction of Dr. Hannah Gill, in collaboration with the Center for Global Initiatives, the Institute for the Study of the Americas, and the Southern Oral History Program (SOHP). Since 2011, these interviews have been archived and made accessible online through the SOHP’s collection in the Southern Historical Collection in Wilson Library at UNC. Thanks to a generous award from the National Endowment for the Humanities, the Latino Migration Project, the SOHP, and University Libraries at UNC are working to make New Roots accessible to broader regional, national, and global audiences in new ways beyond the library catalog, finding aid, and SOHP digital archive.
To increase visibility, reach larger audiences, and improve access to New Roots oral histories, faculty and staff are building a dedicated Omeka website to serve as a portal to the interviews and related resources. The entire SOHP collection, including the New Roots interviews, is presented online through CONTENTdm. Rather than expecting researchers to sift through the entire SOHP collection, the Omeka website serves as a separate, public-facing space for New Roots, allowing audiences to access the materials directly and explore them more easily. Using this new platform has also encouraged the SOHP to consider different approaches to describing, organizing, and sharing oral histories at UNC, and the project has helped the SOHP and University Libraries to consider broader audiences, particularly Spanish-speaking populations. Together, these considerations have led to innovative features with which audiences can engage and interact, improving their overall experience.
Metadata is at the heart of describing, organizing, and sharing oral histories digitally. In conceptualizing the Omeka website, the New Roots team set out to determine how best to describe these oral histories so they are intuitive to users. Omeka provided a fresh slate: the team could examine the existing descriptive metadata that SOHP used in CONTENTdm for the entire collection, then figure out (through weeks of brainstorming, survey results, and trial and error) which fields were not very useful and what other information would be helpful and meaningful to include in each New Roots interview description. Many of the existing fields, such as “interviewee,” “interviewer,” “date,” and “interviewee ethnicity” seemed obvious to include in Omeka because they contain essential information about the interview. Other fields, including those about materials and access, such as “audio access” and “medium of original,” provide information that is important for archivists and librarians to capture but is largely irrelevant and potentially confusing to researchers. For example, the audio is clearly available on the Omeka site, so including metadata stating “audio access: online” is unnecessary. There are also numerous existing fields about different series and the repository that are meaningful within CONTENTdm but not in Omeka. For example, New Roots is one of over 150 series in the SOHP collection, so it is necessary to present the full project title (complete with project number) “R.34. Special Research Projects: New Roots” in SOHP’s larger digital archive in CONTENTdm; however, in the standalone New Roots site in Omeka, that information is redundant.
Because such metadata is already captured and available in CONTENTdm, the team decided to trim the existing schema to include only the most essential, valuable information for users of the New Roots Omeka site. See the table below for a complete listing of fields retained and eliminated in Omeka.
|Fields retained in Omeka||Fields eliminated|
|Interviewee, Interviewer, Interview number, Restrictions, Date, Interviewee gender, Interviewee occupation, Interviewee ethnicity, Interviewee date of birth, Abstract, Citation||Project, Project description, Transcript, Transcript access, Number of pages, Subject Topical (LCSH), Subject Name (LCSH), Listening copy, Audio access, Listening copy file type, Medium of Original, Duration, Notes, Life history, Field notes, Photographs, Tapelog, Supplementary materials, Hardware – filename, Bit Depth – filename, Sample rate, Channels, Resource Type, Digital collection, Collection in Repository, Repository, Host|
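The trimming described in the table above can be sketched in a few lines of Python. This is only an illustration: the field names come from the table, but the flat dictionary record layout, the `trim_record` helper, and the sample values are assumptions, not the team's actual export format or code.

```python
# Field names are copied from the "Fields retained in Omeka" column.
# The flat dict record layout is an assumption made for illustration.
RETAINED_FIELDS = [
    "Interviewee", "Interviewer", "Interview number", "Restrictions", "Date",
    "Interviewee gender", "Interviewee occupation", "Interviewee ethnicity",
    "Interviewee date of birth", "Abstract", "Citation",
]


def trim_record(contentdm_record):
    """Return a copy of the record holding only the fields retained in Omeka."""
    return {
        field: contentdm_record[field]
        for field in RETAINED_FIELDS
        if field in contentdm_record
    }


# Hypothetical sample record; the values are placeholders, not real metadata.
sample = {
    "Interviewee": "Jane Doe",
    "Interviewer": "John Roe",
    "Audio access": "Online",
    "Medium of Original": "Digital audio",
    "Abstract": "Placeholder abstract.",
}
trimmed = trim_record(sample)
print(trimmed)
```

Keeping the retained fields in one list means the schema can be adjusted in a single place if the team later decides to restore or drop a field.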
During this evaluation, the team realized that the existing metadata schema lacked important pieces of information that would significantly enhance interview descriptions. Guided by a desire to reach broad audiences outside of the academy, particularly Latino communities throughout North and South America, and keeping clarity, helpfulness, and ease of use as primary goals for describing and organizing these interviews, the team introduced new geographic metadata, topic/theme metadata, and bilingual metadata.
Because physical locations, migration, and journeys are so important in immigrant stories, the team introduced metadata fields for where the interviewee was born (“Interviewee place of origin”) and where they lived at the time of the interview (“Interviewee place of residence”), which are presented through place names in city-county-state-country format. The team also added a field to capture the immigration journey called “Interviewee journey coordinates.” However, they could not find a standard for storing latitude, longitude, and a date together, so they created a modified WKT (well-known text) format that includes latitude, longitude, and the year the interviewee arrived at a given location. The metadata from this “Interviewee journey coordinates” field is swept from CONTENTdm to an interactive map where visitors to the Omeka website can visually follow an interviewee’s journey over time and space. There will be journey maps for individual interviewees as well as a map showing all of the New Roots interviewees’ journeys. These new fields and the maps will allow users to search and browse by location, to uncover migration patterns, and to engage with these oral histories in a visually compelling way.
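The article does not give the exact syntax of the modified WKT, so the following Python sketch assumes a hypothetical layout: a `LINESTRING` whose points are "longitude latitude year" triples (standard WKT puts x, that is longitude, first). The `JourneyStop` type, the `parse_journey` function, and the sample coordinates are all illustrative assumptions rather than the team's format.

```python
import re
from typing import List, NamedTuple


class JourneyStop(NamedTuple):
    longitude: float
    latitude: float
    year: int  # year the interviewee arrived at this location


# Assumed layout: LINESTRING (lon lat year, lon lat year, ...)
_JOURNEY = re.compile(r"^\s*LINESTRING\s*\((.*)\)\s*$", re.IGNORECASE)


def parse_journey(text: str) -> List[JourneyStop]:
    """Parse the assumed journey string into stops ordered by arrival year."""
    match = _JOURNEY.match(text)
    if match is None:
        raise ValueError(f"not a journey string: {text!r}")
    stops = []
    for triple in match.group(1).split(","):
        parts = triple.split()
        if len(parts) != 3:
            raise ValueError(f"expected 'lon lat year', got {triple!r}")
        lon, lat, year = parts
        stops.append(JourneyStop(float(lon), float(lat), int(year)))
    # Sorting by year lets a map draw the journey in chronological order.
    return sorted(stops, key=lambda stop: stop.year)


# Illustrative coordinates only; not taken from any interview.
sample = "LINESTRING (-79.06 35.91 2005, -99.13 19.43 1998)"
journey = parse_journey(sample)
for stop in journey:
    print(stop.year, stop.latitude, stop.longitude)
```

Storing the year alongside each coordinate pair is what allows a single field to drive both the "where" and the "when" of a journey map.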
In order to make browsing easier and to present the general content of each interview in a straightforward manner, the team also added a new field for topics and themes reflected in the interviews. They spent several weeks evaluating the content of the oral histories and developed a clear, easily understandable set of terms that represented themes important to these interviews, such as “DREAMers,” “migratory experience,” “activism,” and “racism and discrimination.” These terms function like tags or keywords in that they are concise, direct, and relevant to the interview, and they stand on their own instead of being buried in a lengthier abstract. Rather than use Library of Congress Subject Headings (LCSH) or another existing thesaurus or controlled vocabulary, the team developed a vocabulary based specifically on the contents of the New Roots interviews for clarity, accuracy, and simplicity. The terms are drawn from the natural, everyday language used in the interviews, in discussions about immigration, and in Latino communities; by letting the oral histories themselves guide the terminology, the language reflects how users are actually searching and browsing and does not feel overly academic, complicated, or unnatural. For example, rather than the LCSH heading “Emigration and immigration – Religious aspects – Baptists, [Catholic Church, etc.],” a New Roots interview would simply be assigned the terms “immigration” and “religion.” This was especially important in considering the audience New Roots hopes to reach, which spans beyond the academy to K-12 educators and students, media outlets, interviewees and their communities, immigrants, and other Spanish-speaking and Latino communities. Users can use these terms to glean the general topic of an interview at a glance and to browse and find related interviews. Because this controlled vocabulary was developed in-house, it is also easy to modify and extend as necessary.
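To illustrate how an in-house vocabulary of this kind can support browsing, here is a minimal Python sketch. The six terms are those quoted above; the interview identifiers, the dictionary layout, and the `build_theme_index` helper are hypothetical and are not drawn from the New Roots site.

```python
from collections import defaultdict

# Terms quoted in the article; the real vocabulary is larger.
THEMES = {
    "DREAMers", "migratory experience", "activism",
    "racism and discrimination", "immigration", "religion",
}


def build_theme_index(interviews):
    """Map each theme to the interviews tagged with it, rejecting unknown terms."""
    index = defaultdict(list)
    for interview_id, themes in interviews.items():
        unknown = set(themes) - THEMES
        if unknown:
            raise ValueError(f"{interview_id}: terms not in vocabulary: {sorted(unknown)}")
        for theme in themes:
            index[theme].append(interview_id)
    return dict(index)


# Hypothetical identifiers and tag assignments, for illustration only.
interviews = {
    "interview-001": ["immigration", "religion"],
    "interview-002": ["immigration", "activism"],
}
index = build_theme_index(interviews)
print(index["immigration"])
```

Because the vocabulary is a plain set, adding a term is a one-line change, which matches the flexibility described above.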
See the table below for the full list of fields added for New Roots oral histories in Omeka.
|New metadata fields for New Roots oral histories in Omeka||Contents of new field|
|Language||English or Spanish|
|Themes||Topic(s) / theme(s) of interview|
|Interviewee place of origin||Where interviewee was born|
|Interviewee place of residence||Where interviewee lived at time of interview|
|Interviewee journey coordinates||Modified WKT reflecting lat/long and year interviewee arrived at given location, used to populate map|
|Reference URL||Link to the full CONTENTdm record for the interview as presented in Omeka|
|Repository||Content from “digital collection,” “collection in repository,” “repository,” and “host” fields in CONTENTdm|
The last big piece in describing, organizing, and sharing these oral histories was creating bilingual metadata so that each interview would be searchable and accessible in both English and Spanish. Several fields’ content did not need to be translated (such as “Interviewee,” “Interviewer,” “Date,” and “Interview number”), and thanks to an Omeka plugin developed by team members, the content can be automatically synced from CONTENTdm into both the English and Spanish metadata fields in Omeka (e.g. “Interviewee” and “Nombre de entrevistado”). For the content that does need to be translated in order to be meaningful, the New Roots team added the new fields found in the table below. New Roots team members who are native or fluent Spanish speakers translated this content and developed controlled vocabularies for these fields.
|New Spanish language field in Omeka||Corresponding field in English|
|Ocupación de entrevistado||Interviewee occupation|
|Género de entrevistado/a||Interviewee gender|
|Lengua de entrevista||Language|
|Abstracto en español||Abstract|
The team ran into some challenges throughout this project. Some terms did not have a clear English–Spanish translation, and there were disparities in translation. For example, one term in “Interviewee occupation” is “Factory workers,” which was translated alternately as “Trabajadores de las fabricas” and “Obreros” before the team settled on “Obreros de fabricas.” These disparities were resolved through discussion and regular communication, and addressing them early on meant that the content could be translated consistently moving forward. Another challenge came in the form of quality control: because not all team members are fluent in Spanish, it was difficult to notice if a term was entered incorrectly. Like many steps in this project, quality control problems were worked out through trial and error as well as discussion among the different team members, all of whom brought a variety of strengths to New Roots.
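One way to keep translations consistent and catch mistyped terms is a lookup table with a single agreed Spanish term per English term. The Python sketch below is an assumption-laden illustration, not the team's Omeka plugin: the Spanish field labels and the “Factory workers” pair come from the article, while the record layout, the `add_spanish_fields` helper, and the sample values are hypothetical.

```python
# Fields whose content is copied unchanged into the Spanish field.
COPIED_FIELDS = {"Interviewee": "Nombre de entrevistado"}

# Fields whose content must be translated via a controlled vocabulary.
TRANSLATED_FIELDS = {"Interviewee occupation": "Ocupación de entrevistado"}

# One agreed Spanish term per English term (only the article's example here).
VOCABULARIES = {
    "Interviewee occupation": {"Factory workers": "Obreros de fabricas"},
}


def add_spanish_fields(record):
    """Return a copy of the record with Spanish fields added alongside English ones."""
    result = dict(record)
    for english, spanish in COPIED_FIELDS.items():
        if english in record:
            result[spanish] = record[english]
    for english, spanish in TRANSLATED_FIELDS.items():
        if english not in record:
            continue
        value = record[english]
        terms = VOCABULARIES.get(english, {})
        if value not in terms:
            # Surfaces typos and untranslated terms without requiring fluency.
            raise KeyError(f"no agreed Spanish term for {english}: {value!r}")
        result[spanish] = terms[value]
    return result


# Hypothetical sample record; the name is a placeholder.
bilingual = add_spanish_fields(
    {"Interviewee": "Jane Doe", "Interviewee occupation": "Factory workers"}
)
print(bilingual["Ocupación de entrevistado"])
```

Raising an error on any term missing from the table gives team members who do not read Spanish a mechanical check that only agreed translations are entered.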
New Roots has been an informative and revealing project in terms of accessibility and oral history metadata, and the SOHP and UNC Libraries have already learned new approaches to describing, organizing, and sharing oral histories based on this work. Purposefully seeking an audience outside of academia pushed the team to consider language and its potential barriers. Because the team trimmed the existing CONTENTdm schema to include only the most relevant and meaningful fields, users will not be deterred by confusing or unnecessary description. Providing a new field for topics/themes tailored to these oral histories allows the interviews to be organized in a new, thematically meaningful and useful manner. This field also allows users to browse and search in intuitive, straightforward ways that reflect the natural language of the interview contents, and it saves archivists from trying to apply an existing, outside controlled vocabulary that may be overly academic in tone, confusing, lengthy, and often simply not quite the right fit. Additionally, the SOHP and UNC Libraries have long been in conversation about developing a topic/theme browsing tool for the SOHP collection, so developing this metadata for New Roots has served as an excellent sample that the entire SOHP collection can build upon. Bilingual metadata makes the New Roots interviews accessible to immigrants and Latino communities across North and South America, an enormous benefit achieved with less difficulty than might be expected: the amount of content to translate was relatively small because numerous fields do not need to be translated, and having a Spanish speaker translate terms at the beginning establishes consistency and accuracy.
Location and movement are important in conceptualizing the stories in oral histories across many communities, and consistently capturing this information opens up exciting possibilities for visualizing and interacting with this content on maps. The SOHP is now capturing geographic metadata for other projects (such as new projects about North Carolina foodways and the Rural South) and will potentially create interactive maps for these projects too.
This New Roots project gave the team at UNC fresh eyes toward oral history metadata and inspired us to ask “What do we really need?” and “What do our users want?” in ways that encouraged clarity, directness, and ease of use in describing oral histories and developing new features to reach new audiences. This work also informs and reflects the work of the Oral History Association’s Metadata Task Force, founded in 2014, which seeks to promote knowledge about oral history metadata and collaboration across the profession. To follow progress on New Roots, read more at the project blog. The Omeka website is currently being developed and is scheduled to go live in spring 2016. Learn more about the Metadata Task Force through its related essays.