Monthly Archives: March 2014

Thanks to Mark Phillips!

This post is overdue: I talked to Mark Phillips at the University of North Texas in January. Still, our conversation was so helpful that I want to recount some of the topics we discussed. Mark and his team at the University of North Texas libraries have been leaders in the world of metadata aggregation, and he generously shared some of his expertise with me. We talked about two of his projects, the Portal to Texas History and Texas Heritage Online. The Portal takes a different approach from our project: UNT digitizes and hosts all the materials it makes available. Mark noted that they prefer to handle digitization and hosting themselves partly because they are focused on the long-term preservation of these digital objects. Texas Heritage Online is more similar to our project because it does not host items; it aggregates metadata and sends users to the partners’ websites.

Mark brought up three issues that were very applicable to our project. The first was how much detail to provide from partner metadata. He has a lot of experience with how messy metadata from different institutions can be: different fields, different expectations of content, and many inconsistencies. He noted that they show users less metadata rather than more. By showing fewer fields on the Texas Heritage Online site, they have less cleanup work to do on partner records, and users can still see the full record by clicking through to the partner’s site. It was a revelation to me that showing less might actually be more useful to our users than showing more metadata that is messy, inconsistent, and possibly confusing.

The second issue he suggested we spend more time on is instituting controlled vocabularies. Because Blacklight allows users to facet on fields, the values in those fields need to be consistent for the facets to be useful. I have written before that the facets on our beta site are not working effectively right now. One reason is that partners often use the same metadata field for different content. For example, one partner might put measurements in the format field while another puts a text description there; both are completely acceptable, but they don’t play well together. If we implement controlled vocabularies for some fields, those fields will be more consistent and the facets will be more effective.
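
To make the idea a little more concrete, here is a minimal sketch in Python of how a controlled vocabulary might be applied to a type field before records are indexed. The vocabulary, the field, and the names `normalize_type` and `split_for_review` are hypothetical illustrations, not part of Blacklight or of anything we have built. Values that don’t match a preferred term come back as `None`, so a person can review them (a measurement in the format field, say) instead of having them land in a facet unchecked.

```python
from typing import List, Optional, Tuple

# Hypothetical controlled vocabulary: each preferred term lists the variant
# spellings partners might use for it (all variants stored in lowercase).
TYPE_VOCABULARY = {
    "Photographs": {"photo", "photograph", "photographs", "photographic print"},
    "Letters": {"letter", "letters", "correspondence"},
    "Speeches": {"speech", "speeches", "address", "addresses"},
}

# Invert the vocabulary once so each lookup is a single dictionary access.
_VARIANT_TO_TERM = {
    variant: term
    for term, variants in TYPE_VOCABULARY.items()
    for variant in variants
}


def normalize_type(raw_value: str) -> Optional[str]:
    """Map a partner's free-text value to a preferred term, or None if unmapped."""
    return _VARIANT_TO_TERM.get(raw_value.strip().lower())


def split_for_review(values: List[str]) -> Tuple[List[str], List[str]]:
    """Separate values that map to a preferred term from ones needing human review."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for value in values:
        term = normalize_type(value)
        if term is None:
            unmapped.append(value)
        else:
            mapped.append(term)
    return mapped, unmapped


if __name__ == "__main__":
    print(normalize_type("  Photograph "))  # Photographs
    print(normalize_type("8 x 10 in."))     # None
```

Only the preferred terms would be shown as facet values, which is what keeps the facet list short and consistent across partners.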

One example Mark gave was using a controlled vocabulary for dates. Our partners present their dates in different formats and with different content: some have days, some have only years, and some have a span of years. Mark suggested that if we create a decade date for each item, users could facet on the decade before encountering each organization’s inconsistent date formats. I am not exactly sure how we would implement controlled vocabularies, but I think they would be a very valuable addition that makes the CPC site more useful.
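
Here is one way the decade idea might work, sketched in Python under some assumptions: that partner dates arrive as free-text strings, and that the first and last four-digit years in a string are good enough to place it. The function name `decades_for` and the year pattern are hypothetical. A single date yields one decade, a span yields every decade it touches, and a string with no recognizable year yields nothing, so the partner’s original date can still be displayed untouched.

```python
import re
from typing import List

# A four-digit year from 1000 to 2099 that is not part of a longer number.
# This range is an assumption chosen for historical collections.
YEAR_RE = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def decades_for(date_string: str) -> List[str]:
    """Return decade facet labels such as '1830s' for a free-text date."""
    years = [int(year) for year in YEAR_RE.findall(date_string)]
    if not years:
        return []  # undated or unparseable: leave it out of the decade facet
    first = min(years) // 10 * 10
    last = max(years) // 10 * 10
    return [f"{decade}s" for decade in range(first, last + 1, 10)]


if __name__ == "__main__":
    print(decades_for("1837-03-04"))     # ['1830s']
    print(decades_for("March 4, 1837"))  # ['1830s']
    print(decades_for("1837-1841"))      # ['1830s', '1840s']
    print(decades_for("ca. 1840s"))      # ['1840s']
    print(decades_for("undated"))        # []
```

The computed labels would go into a separate field used only for faceting; the partner’s own date stays as written in the displayed record.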

The third issue that Mark and I discussed is how closely we should follow a metadata schema. We are using Dublin Core as our schema right now, but it is missing some fields that would be useful for us. For example, we record the partner organization as part of the metadata, and it is an important field. However, Dublin Core has no field for it, so we had been using the publisher field for the partner organization’s name. Mark suggested that we create our own fields, adapting Dublin Core to meet our specific needs. He also suggested that we clearly document every decision we make about our metadata fields. The tension is that the more we customize Dublin Core for our own project, the less easily our metadata can be shared. Schemas exist so that information can be shared easily and consistently. If we create our own schema, we reduce our ability to share the metadata. I think there is a happy medium here, but I’m not yet sure what it is.
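
One possible arrangement, sketched in Python purely as an assumption and not a decision we have made: keep the standard Dublin Core elements as they are, give project-specific fields a recognizable prefix (the hypothetical `cpc_partner` below), and keep a data dictionary that documents every field. Anything undocumented can then be flagged, and the prefixed fields can be dropped whenever the metadata is shared as plain Dublin Core. The field notes, function names, and sample values are all illustrative.

```python
from typing import Dict, List

# Hypothetical data dictionary: every field we use, with a note recording the
# decision behind it. Fields prefixed "cpc_" are project-specific additions.
FIELD_DOCS: Dict[str, str] = {
    "title": "Dublin Core title, copied from the partner record.",
    "creator": "Dublin Core creator.",
    "date": "Dublin Core date, kept exactly as the partner wrote it.",
    "format": "Dublin Core format.",
    "publisher": "Dublin Core publisher, used only for actual publishers.",
    "cpc_partner": "Project-specific: partner organization that holds the item.",
}


def undocumented_fields(record: Dict[str, str]) -> List[str]:
    """List fields in a record that have no entry in the data dictionary."""
    return sorted(set(record) - set(FIELD_DOCS))


def to_plain_dublin_core(record: Dict[str, str]) -> Dict[str, str]:
    """Drop project-specific fields so the record can be shared as plain Dublin Core."""
    return {field: value for field, value in record.items() if not field.startswith("cpc_")}


if __name__ == "__main__":
    record = {
        "title": "Special Message from President Martin Van Buren",
        "date": "1837",
        "cpc_partner": "Example Historical Society",  # hypothetical partner
        "rights": "Public domain",
    }
    print(undocumented_fields(record))  # ['rights']
    print(to_plain_dublin_core(record))
```

The prefix costs little: our own site can use the extra field for display and faceting, while anyone harvesting the shared version sees only elements they already understand.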

CPC and the Catalog–distinguishing their goals

We have been thinking through the Catalog part of the grant project and trying to make some decisions to move it forward. We have been grappling with questions such as how we want to handle internet resources in the Catalog and what to do about presidential speeches, which often appear in multiple places on the internet. How many times do we want “Special Message from President Martin Van Buren” to appear in the Catalog?

We are also thinking through the relationship between the Catalog and the Connecting Presidential Collections (CPC) website. We originally envisioned the Catalog as a way to identify presidential collections and their hosting organizations so that we could partner with them and add the collections to CPC. But once we made our first effort to catalog a president’s collections, we were reminded how large and diverse the world of presidential collections is. We identified many interesting presidential resources that might be useful to include in the Catalog but that I didn’t think would be appropriate for CPC. I couldn’t quite articulate to my coworkers why some of the resources in the Catalog prototype made me uncomfortable. Useful resources such as YouTube videos about the presidents might be good to identify in the Catalog, but in my mind they wouldn’t work in CPC.

After talking through the issues with one of my coworkers, we hit upon a useful way to distinguish between the two parts of this IMLS grant. The Catalog and the CPC website both focus on presidential collections, but their goals are quite distinct. The Catalog will be useful to a wider audience if we include most of the available presidential resources, such as collections of digitized speeches, educational YouTube videos, and perhaps even lesson plans about the presidents. But CPC has a different mission.

The CPC website is focused on exposing hidden collections of presidential materials. CPC doesn’t necessarily need to include an internet resource that anyone can easily find through a Google search. What we hope to do with CPC is reach out to partner organizations to make their collections more accessible. It doesn’t matter to us whether we do that by aggregating their metadata into CPC to increase traffic to their websites or by providing training so they can digitize materials and put them online. The goal is to shed light on valuable historical resources that might be hard to find right now.

This distinction between the two parts of the project was very helpful to me. We can include resources in the Catalog that make it easier for people to find presidential resources already available on the internet. But CPC’s mission skews in another direction: it focuses on organizations with hidden collections that might need a little assistance in bringing them to light. I am sure that as we continue our IMLS grant project, I will have many similar revelations that help clarify my thinking about our work. And each time, I will benefit from conversations with my coworkers and others involved in this universe, learning a little more each step of the way.