Cleaning up legacy metadata for ETDs: Strategies, tools and a look into the future [Poster]






Since July 2015, the University of Houston (UH) Libraries Metadata and Digitization Services (MDS) Metadata Unit, in collaboration with the UH Libraries Digital Repository Services (DRS) department, has been working to improve the quality of legacy Electronic Theses and Dissertations (ETD) metadata in the UH Institutional Repository. In addition to standardizing the metadata for internal purposes, this effort, known as the ETD Metadata Upgrade Project, will align UH ETD metadata with the newest Dictionary of Texas Digital Library Descriptive Metadata for Electronic Theses and Dissertations. The Texas Digital Library (TDL) is a consortium of higher education institutions in Texas that provides shared services in support of research, teaching, and the advancement of scholarship. It facilitates collaboration among the TDL community and with external partners. Bringing the University of Houston’s ETD metadata into compliance with TDL guidelines strengthens the connection to other TDL member institutions and better positions the Libraries to take advantage of ETD system developments.

This poster presentation will describe the background of this project, the procedures that have been developed, the tools used in the work, and plans for future work. Poster presenters will share how they wrangled and revised legacy ETD metadata exported from the Libraries’ TDL-hosted DSpace and Vireo instances using Microsoft Access and OpenRefine. They will detail how they addressed the challenges presented by particular fields, such as standardizing advisor, committee member, department, and degree discipline names, and how they intend to reimport the cleaned data into DSpace. The team will also share strategies for project communication, documentation, and task management using Basecamp and PmWiki. Finally, presenters will share their goals of further streamlining and automating ongoing metadata remediation, deploying a local RDF vocabulary management system to aid in name and department standardization, and exploring the publication of ETD metadata as linked data.
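
As an illustration of the name standardization challenge described above, the following is a minimal Python sketch of key-collision clustering, similar in spirit to the fingerprint keying method that OpenRefine offers. It is not the project's actual procedure. The function names and the sample advisor names are hypothetical, and the sketch assumes that name variants differ only in case, punctuation, accents, or word order.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Reduce a name to a normalized key so that surface variants collide."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase and strip ASCII punctuation.
    cleaned = ascii_only.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Deduplicate and sort tokens so that word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster_names(names):
    """Group names that share a fingerprint; return only groups with variants."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


# Hypothetical advisor values, for illustration only.
advisors = ["Smith, John A.", "John A. Smith", "Doe, Jane"]
clusters = cluster_names(advisors)
```

Each cluster is only a candidate set of variants. A person would still need to review it and choose the authorized form, since this keying cannot match initials to full names and may group distinct people who share the same name tokens.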

Overall, these strategies and tools have improved ETD metadata quality and workflows, strengthened communication and collaboration between the DRS department and the MDS Metadata Unit, and given insight into opportunities for future development. This poster will be useful for library and information professionals who have similar ETD goals. Viewers will come away with techniques to address ETD metadata maintenance needs for their own digital repositories.


This poster was presented at the 2016 United States Electronic Thesis and Dissertation Association Conference in Columbus, OH, September 27, 2016.


Metadata, Electronic theses and dissertations, Metadata remediation