Cleaning up legacy metadata for ETDs: Strategies, tools and a look into the future [Presentation]






Since July 2015, the University of Houston (UH) Libraries Metadata and Digitization Services (MDS) Metadata Unit, in collaboration with the UH Libraries Digital Repository Services (DRS) department, has been working to improve the quality of legacy Electronic Thesis and Dissertation (ETD) metadata in the UH Institutional Repository. In addition to standardizing the metadata for internal purposes, this effort, known as the ETD Metadata Upgrade Project, will align UH ETD metadata with the newest Dictionary of Texas Digital Library Descriptive Metadata for Electronic Theses and Dissertations. The Texas Digital Library (TDL) is a consortium of higher education institutions in Texas that provides shared services in support of research, teaching, and the advancement of scholarship. It facilitates collaboration among the TDL community and with external partners. Bringing the University of Houston’s ETD metadata into compliance with TDL guidelines strengthens the connection to other TDL member institutions and better positions the Libraries to take advantage of ETD system developments.

This presentation will describe the background of this project, the procedures that have been developed, the tools used in the work, and plans for future work. Presenters will share how they wrangled and revised legacy ETD metadata exported from the Libraries’ TDL-hosted DSpace and Vireo instances using Microsoft Access and OpenRefine. They will detail how they addressed challenges presented by particular fields, such as standardizing advisor, committee member, department, and degree discipline names, and how they intend to re-import the cleaned data into DSpace. The team will also share strategies for project communication, documentation, and task management using Basecamp and PmWiki. Finally, presenters will share goals of further streamlining and automating ongoing metadata remediation, deploying a local RDF vocabulary management system to aid in name and department standardization, and exploring the publication of ETD metadata as linked data.
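
As a rough illustration of the name standardization challenge mentioned above, the following Python sketch groups variant spellings of advisor names under a shared key. It is not the project's actual procedure, which the abstract does not specify. The approach is similar in spirit to the key-collision (fingerprint) clustering available in OpenRefine, and the function names `fingerprint` and `cluster_names`, as well as the sample names, are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a normalized key so that variant forms of a name collide.

    Assumed steps (illustrative only): strip accents, lowercase,
    drop punctuation, then sort and de-duplicate the tokens.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    no_punct = ascii_only.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    tokens = sorted(set(no_punct.split()))
    return " ".join(tokens)


def cluster_names(names):
    """Return only the keys under which more than one distinct variant appears."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return {key: group for key, group in clusters.items() if len(set(group)) > 1}


if __name__ == "__main__":
    # Hypothetical advisor values, as they might appear in an exported spreadsheet.
    advisors = ["Smith, John A.", "John A. Smith", "Doe, Jane"]
    for key, variants in cluster_names(advisors).items():
        print(key, "->", variants)
```

Each cluster would still need human review before a preferred form is chosen, since a shared key only suggests that two strings may refer to the same person or department.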

Overall, these strategies and tools have improved ETD metadata quality and workflows, strengthened communication and collaboration between the DRS department and the MDS Metadata Unit, and given insight into opportunities for future development. This presentation will be useful for libraries and information centers that have similar ETD goals. Session attendees will come away with techniques to address ETD metadata maintenance needs for their digital repositories.


This presentation was given at the 2016 United States Electronic Thesis and Dissertation Association Conference in Columbus, OH, September 27, 2016.


Metadata, Electronic theses and dissertations, Metadata remediation