What do we mean by “Collections As Data” (CAD)? by Cory Lampert & Emily Lapworth

Closeup of Las Vegas City Commission minute book dating from 1911
Las Vegas City Commission Records, 1911-1960. MS-00237.

This is part one of a two-part article on Collections as Data research at the UNLV Libraries.

Select Las Vegas City Commission Records are now available as a dataset that can be analyzed using computational research tools. Read on for more information, or go directly to the dataset at https://github.com/UNLV-Libraries/UNLV-Collections-as-Data/tree/master/Las-Vegas-Commission.

Cory Lampert, Head of Digital Collections, and Emily Lapworth, Digital Special Collections Librarian, provide us with an overview of how Digital Collections is working on this important research initiative.

Research in library collections is no longer limited to locating and physically handling collections; nor does it need to stop with the access and download of digitized materials. Increasingly, content providers in libraries seek to take library collections and present them as datasets that can be acted upon in various ways by machines or through use of computer-aided tools and methods. 

“Collections as data” is the idea that collections (such as UNLV’s Digital Collections) can be used as data for researchers to analyze using computers. For example, a historian can use a computer program to quickly read thousands of pages of text and identify patterns such as topics or people that are named. Digital or computational research methods such as text mining, data visualization, mapping, image analysis, audio analysis, and network analysis automate steps in the research process that would take humans many hours to do, or are even impossible for humans to do manually. If you’ve heard of Digital Humanities, Collections as Data falls under that umbrella of scholarly activity.
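As a rough illustration of the text mining idea described above, here is a minimal Python sketch that counts the most frequent terms across a set of pages. The sample sentences, the names in them, the stopword list, and the `term_counts` function are all made up for this example and are not drawn from the actual dataset; a real analysis would load the transcribed text files and would likely use a dedicated text analysis library.

```python
import re
from collections import Counter

# Hypothetical sample pages, invented for illustration only.
# A real run would read transcribed text files from a dataset instead.
pages = [
    "The Commission approved the water contract. Mayor Smith voted yes.",
    "Mayor Smith opened the meeting. The water contract was discussed again.",
]

# A tiny, assumed stopword list; real projects use much longer ones.
STOPWORDS = {"the", "was", "a", "an", "and", "of", "to", "in"}


def term_counts(texts):
    """Count how often each non-stopword term appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep only runs of letters as words.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


counts = term_counts(pages)

# Show the five most frequent terms across all pages.
print(counts.most_common(5))
```

Even this simple count surfaces recurring subjects and names across pages, which is the kind of pattern a researcher would otherwise have to find by reading every page.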

“Collections as Data” was also the name of a grant-funded project that began in 2016 and formed interdisciplinary teams that “documented, iterated on, and shared current and potential approaches to developing cultural heritage collections that support computationally-driven research and teaching.” This work was extended in 2018 through additional grant funding for Collections as Data: Part to Whole. The grants have since had far-reaching impacts on practitioners, including at the University of Nevada, Las Vegas Libraries, where faculty are taking a more data-focused look at how we provision cultural heritage collections, increase data literacy capacity in our community, and support this type of research across disciplines and communities. Thomas Padilla, currently Interim Head of Knowledge Production, served as PI for both Collections as Data grants.

UNLV Special Collections and Archives (SCA) has been active in creating digital collections since 2006 and has set goals in recent years to focus on increased production of reformatted materials through large-scale digitization methods. The availability of larger sets of digital image files and metadata provides an opportunity for SCA to investigate Collections as Data practices using locally-curated archival collections and user profiles that reflect the researchers in the Southern Nevada community as well as on the UNLV campus.

Why SCA/DC is doing CAD

In 2018, a small team led by Thomas Padilla, Emily Lapworth, and Doris Morgan Rueda worked on an initial Collections as Data Project using the Entertainment digital collection. The digitized archival records in the collection document the history of entertainment in Las Vegas and include over 38,000 items. It was chosen as a candidate for experimentation with CAD concepts in the context of historical research and digital humanities. 

This project highlighted that while CAD is a compelling area of work for cultural heritage professionals, many professionals in the special collections field (such as archivists, digital project managers, metadata professionals, curators, and public services librarians) still find a gap between theory and best practice in effectively preparing collections as data and serving researchers who seek to use computational methods in their studies. UNLV SCA, in collaboration with other interested library faculty, set out to create space to work on these gaps and focus on concrete goals that will help solidify CAD concepts into programmatic approaches to digital collection work. The resulting strategy focuses on:

  • small, quickly attainable goals
  • creation of data sets that serve a known user need 
  • collaboratively shared and documented learning beyond SCA 
  • open publishing of datasets and associated documentation

In Part II of this article on Collections as Data research, Emily Lapworth and Cory Lampert will provide an overview of the UNLV Libraries datasets that are open for research through this pilot project and offer ideas about how the Las Vegas City Commission (LVC) datasets can be used for research.
