The HathiTrust Research Center Workset Ontology: A Descriptive Framework for Non-Consumptive Research Collections

Authors

  • Jacob Jett, University of Illinois at Urbana-Champaign
  • Timothy W. Cole, University of Illinois at Urbana-Champaign
  • Christopher Maden, University of Illinois at Urbana-Champaign
  • J. Stephen Downie, University of Illinois at Urbana-Champaign

DOI:

https://doi.org/10.5334/johd.3

Keywords:

research collections, formalisms, data models, digital libraries, large-scale corpuses

Abstract

The HathiTrust Digital Library (HTDL) is a digital library containing about 14 million volumes, which comprise billions of pages of content. The HathiTrust Research Center (HTRC) is a collaborative research initiative jointly led by Indiana University and the University of Illinois at Urbana-Champaign. This paper describes the development of a collections data model by the Workset Creation for Scholarly Analysis project, an HTRC research initiative funded by the Andrew W. Mellon Foundation. The HTRC Workset data model is designed to help humanities scholars describe the selected portions of the HTDL corpus that serve as the objects of their research. The resulting worksets are persistent and citable, and other scholars can assess them for reuse in further research.

Author Biographies

Jacob Jett, University of Illinois at Urbana-Champaign

Jacob Jett is a PhD student at the Graduate School of Library and Information Science (part of the University of Illinois at Urbana-Champaign). He employs formal methods to examine issues in the conceptual foundations of information access, organization, and retrieval, especially with regard to web and data semantics. Knowledge representation techniques and modeling exercises, such as ontology development and conceptual modeling, represent a sizable area of overlap in his research.

Timothy W. Cole, University of Illinois at Urbana-Champaign

Tim Cole is a professor at the Graduate School of Library and Information Science and the Mathematics Librarian at the University of Illinois. His research interests include metadata, linked open data, annotation of digital resources, and digital library interoperability.

Christopher Maden, University of Illinois at Urbana-Champaign

Chris Maden is a research programmer at the University of Illinois. He has worked on electronic processing of textual and structured information for more than fifteen years. He helped create the HTML, XML, and XSL standards, and knows them like the back of his hand.

J. Stephen Downie, University of Illinois at Urbana-Champaign

J. Stephen Downie is Associate Dean for Research and a Professor at GSLIS, and the Illinois Co-Director of the HathiTrust Research Center. He has been an active participant in the digital libraries and digital humanities research domains. He is best known for helping to establish a vibrant music information retrieval research community. Since 2005, he has directed the annual Music Information Retrieval Evaluation eXchange (MIREX). He was also a founder of the International Society for Music Information Retrieval (ISMIR) and its first president.

Published

2016-03-18

Section

Discussion Paper