Alas, poor Metadata!

Submitted by: Jennie Levine Knies, February 8, 2013

The Born-Digital Working Group has already undergone a radical change since the last blog post.  Originally, the group members divided into four subgroups in order to tackle the different aspects of the born-digital workflow.  We are now three.  RIP Metadata subgroup. The original intent of the Metadata subgroup was to look at everything needed to create a properly-described submission information package (SIP).   The group met on January 28 and quickly discovered that it was both very easy and very difficult to talk about this topic in a vacuum.  We discussed the redundancies not only between our work and the work of the Tools subgroup, but also with future decisions about access to content.   After much soul-searching, and a confusing white-board diagram involving a monkey, a hat, and a floppy disk, we suggested folding the Metadata subgroup into the Tools subgroup and focusing more on the initial acquisition and processing of born-digital content. Understanding the digital files and how to accession them on the digital shelf is our first real challenge.

The Tools subgroup will be using two different types of workstations to develop workflows to image, analyze, and prepare the born-digital content for submission into our repository.  In the non-digital world, the work of the Tools subgroup equates to picking up archival materials from a donor, moving them from the garbage bags in which they were stored to clean records-center cartons, assigning an accession number, and describing them enough that a basic accession record can be created.  We envisioned the work of the Metadata subgroup picking up at this point – at the point where the archivists would appraise, describe, place in context, and arrange the content. This is where the monkey and the hat come into the picture.

The Beast from Ryder, Djuna Barnes, 1928

The University of Maryland currently uses a home-grown system for capturing archival description. The “monkey” is a Microsoft Access database fondly referred to as “The Beast.” Special Collections librarians enter all of their archival description into its convenient forms; a Java-based script then extracts that description into a neat EAD-encoded archival finding aid, which is distributed online via ArchivesUM. The Beast supports the basic metadata collection that EAD allows: we gather series, subseries, box, folder, title, dates, physical description, and restriction information at the folder level, and occasionally at the item level.

The “hat” is our Fedora-based Digital Collections repository. In a separate workflow, the University of Maryland is creating digitized content and ingesting it into our digital repository. The Digital Collections descriptive and technical metadata are also home-grown (something we hope to migrate out of in the not-so-distant future) and much more detailed than what you might find in a traditional EAD finding aid. Like the archival collections, some material is described at the folder level and some at the item level, but item-level description is more common here. Currently, the two systems do not talk to each other. We developed a process to ingest the EAD finding aids into our Fedora-based Digital Collections at the time of ingest into ArchivesUM, but what is searchable in Digital Collections for the EAD finding aids is really just a collection-level record.

As a side note, the University of Maryland Libraries also host an institutional repository (DRUM), which is entirely separate and based on DSpace. DRUM already houses a great deal of born-digital content, and the distinction between what is there and what is collected by our Special Collections may be growing less clear. We also have large amounts of data (both digitized books and web archives) currently stored in the Internet Archive, not to mention descriptive metadata in our catalog, that ultimately will need to be integrated with our other digital content.
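
For readers who have not seen what "folder-level description extracted into EAD" looks like in practice, here is a minimal sketch. It is not our actual Java-based script, and it does not reflect The Beast's real table structure: the rows, field names, and the `build_dsc` helper are all hypothetical, made up for illustration. It only shows the general idea of grouping folder-level records by series and writing them out with standard EAD elements (`dsc`, `c01`, `c02`, `did`, `container`, `unittitle`, `unitdate`, `physdesc`, `accessrestrict`).

```python
import xml.etree.ElementTree as ET

# Hypothetical folder-level rows standing in for records exported from a
# description database. The field names are illustrative, not a real schema.
ROWS = [
    {"series": "Correspondence", "box": "1", "folder": "1",
     "title": "Letters received", "dates": "1926-1928",
     "physdesc": "12 items", "restriction": ""},
    {"series": "Correspondence", "box": "1", "folder": "2",
     "title": "Letters sent", "dates": "1927",
     "physdesc": "5 items", "restriction": "Closed until 2030"},
    {"series": "Manuscripts", "box": "2", "folder": "1",
     "title": "Drafts", "dates": "1928",
     "physdesc": "1 folder", "restriction": ""},
]


def build_dsc(rows):
    """Group folder-level rows by series and return an EAD-style <dsc> element."""
    dsc = ET.Element("dsc")
    series_nodes = {}
    for row in rows:
        series = series_nodes.get(row["series"])
        if series is None:
            # First time we see this series: create its <c01> component.
            series = ET.SubElement(dsc, "c01", level="series")
            series_did = ET.SubElement(series, "did")
            ET.SubElement(series_did, "unittitle").text = row["series"]
            series_nodes[row["series"]] = series
        # Each row becomes a folder-level <c02> component inside its series.
        folder = ET.SubElement(series, "c02", level="file")
        did = ET.SubElement(folder, "did")
        ET.SubElement(did, "container", type="box").text = row["box"]
        ET.SubElement(did, "container", type="folder").text = row["folder"]
        ET.SubElement(did, "unittitle").text = row["title"]
        ET.SubElement(did, "unitdate").text = row["dates"]
        ET.SubElement(did, "physdesc").text = row["physdesc"]
        if row["restriction"]:
            restrict = ET.SubElement(folder, "accessrestrict")
            ET.SubElement(restrict, "p").text = row["restriction"]
    return dsc


dsc = build_dsc(ROWS)
xml_text = ET.tostring(dsc, encoding="unicode")
print(xml_text)
```

A real finding aid would wrap this fragment in a full `ead` document with a header and collection-level description. The sketch covers only the container list, which is where the folder-level fields named above end up.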

Where do born-digital materials fit into all of this? Like the rest of the five linear miles of archival collections at the University of Maryland, these items are part of archival collections, just in newer formats. Like the content in Digital Collections, they are digital, the difference being that they are not surrogates of analog items. Should born-digital materials be described in an archival finding aid? Should they be discoverable and viewable in some way in their native environment? Yes. Will our staff and users be happy about having to learn how to use another silo system to keep track of born-digital materials? Probably not. And this is why we dissolved the Metadata subgroup. Until we know what our initial analyses and boxing/packaging process are capable of returning to us, it is difficult to envision how the archivists will be able to describe the material.

Parallel to the work of the Born-Digital Working Group is the expectation that in the next two years, the University of Maryland Libraries will migrate out of their home-grown system for archival finding aids and move to something more widely adopted, most likely ArchivesSpace. When that happens, more dynamic automated linking between Digital Collections and the archival management tool will be developed. Thinking holistically, we need the management of born-digital content to fall into that workflow somehow. We still envision that the Tools subgroup will gather some requirements that really fall more into the area of archival description, and we still plan to experiment with tools that allow for metadata gathering, such as BitCurator, Archivematica, and Curator’s Workbench, to better understand how these work and what parts of the workflow they might help us to capture. Is this the right approach? After much thought, it feels more manageable to us, and anything that keeps us from feeling paralyzed or overwhelmed is a step in the right direction.


The Born Digital Working Group Divides and Conquers

Back in October, we introduced the MITH/UM Libraries Born Digital Working Group (BDWG) with a post about processing the Bill Bly Collection. Since then we’ve firmed up our goals (“start collecting/working with diverse born digital materials in the libraries” being a bit nebulous and… huge) and divided ourselves into subgroups to conquer them. With goals and groups decided upon, we’re going to try to give bi-weekly updates on our work, cross-posted to the MITH and Special Collections blogs. We’ll be cycling through the groups to ensure every area is covered. Those areas are tools, policies/procedures, metadata, and administration.

Originally called “Technology/BitCurator/hardware/software/tools,” the Tools group is dedicated to pre-processing work: everything that happens before an acquisition is deposited in the digital repository. It is led by Jennie Levine Knies and includes Amanda Visconti, Eric Cartier, Matt Kirschenbaum, Porter Olsen, and Rachel Donahue.

The Policy/Procedures group is dedicated to developing the many guidelines necessary to implement new digital workflows in the libraries. It is led by Joanne Archer and includes Caitlin Wells, Daniel Mack, Rachel Donahue, Robin Pike, and Trevor Muñoz.

The Metadata group is dedicated to data about data. Specifically, this group will look at everything that’s needed to create a properly-described submission information package (SIP). It is led by Joshua Westgard and includes Eric Cartier, Jennie Levine Knies, and Rachel Donahue.

The Administration group is dedicated to providing the high-level support needed by change agents everywhere. Administration was originally lumped in with Policy/Procedures, but we broke it out to keep things specific and manageable. It is led by Trevor Muñoz and includes Daniel Mack, Jennie Levine Knies, Joanne Archer, Matthew Kirschenbaum, and Rachel Donahue.

As you read our posts in the future, bear in mind that we’re essentially starting from scratch. We’re unlikely to have anything amazingly groundbreaking to share, but we hope that being transparent about our work might help other organizations undergoing similar changes.

Searching UMD Libraries’ Digital Collections Using BASE

The Bielefeld Academic Search Engine (BASE), sponsored by the library of the University of Bielefeld in Germany, is an electronic index to more than 37 million digital objects in over 2,300 repositories around the world, including over 15,000 items from UMD Libraries’ Digital Collections. BASE provides not only an alternative method of searching and browsing our digital collections, but also the opportunity to search our collections alongside those of many other, similar repositories. At the same time, because BASE’s scope is limited to collections in academic libraries and other scholarly repositories, and because it allows results to be searched and delimited on the basis of detailed metadata, it makes possible more targeted searching than generalized search engines such as Google.  To search BASE, go to  To search or browse UMD’s collections on BASE, go to