This post is overdue, since I talked with Mark Phillips at the University of North Texas (UNT) in January. Still, our conversation was so helpful that I want to recount some of what we discussed. Mark and his team at the UNT Libraries have been leaders in the world of metadata aggregation, and he generously shared some of his expertise with me. We talked about two of his projects, the Portal to Texas History and Texas Heritage Online. The Portal takes a different approach from our project: UNT digitizes and hosts all the materials it makes available. Mark noted that they prefer to handle digitization and hosting themselves in part because they are focused on the long-term preservation of these digital objects. Texas Heritage Online is more similar to our project because it does not host items; it aggregates metadata and sends users to the partners’ websites.
Mark brought up three issues that apply directly to our project. The first was how much detail to show from partner metadata. He has a lot of experience with how messy metadata from different institutions can be: different fields, different expectations about content, and many inconsistencies. He noted that they show users less metadata rather than more. By displaying fewer fields on the Texas Heritage Online site, they have less cleanup work to do on partner records, and users can still see the full record by clicking through to the partner’s site. It was a revelation to me that showing less might actually serve our users better than showing more metadata that is messy, inconsistent, and possibly confusing.
The second issue he suggested we spend more time on is instituting controlled vocabularies. Because Blacklight lets users facet on fields, those fields need consistent values for the facets to be useful. I have written before that the facets on our beta site are not working effectively right now. One reason is that partners often use the same metadata field for different content. For example, one partner might put measurements in the format field while another puts a text description there; both are perfectly acceptable on their own but don’t play well together. If we implement controlled vocabularies for some fields, those fields will be more consistent and the facets will be more effective.
One example Mark gave was using a controlled vocabulary for dates. Our partners present their dates in different formats and with different content: some include days, some have only years, and some give a span of years. Mark suggested that if we create a decade date for each item, users could facet on the decade before encountering each organization’s inconsistent date formats. I am not exactly sure how we would implement controlled vocabularies, but I think they would be a very valuable addition for making the CPC site more useful.
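To make the decade idea a little more concrete, here is a minimal sketch in Python, not something we have built. It assumes partner dates arrive as free-text strings (for example "1923-05-14", "1923", "1920-1935", or "ca. 1890s") and that pulling out four-digit years is good enough; the function name `decades_for` is hypothetical.

```python
import re

# Assumption: only four-digit years from 1000 to 2099 matter for faceting.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})")


def decades_for(date_text: str) -> list[str]:
    """Return decade facet labels (e.g. "1920s") covered by a free-text date.

    Hypothetical helper: a span of years yields every decade it touches,
    and a string with no recognizable year yields no decade at all.
    """
    years = [int(y) for y in YEAR_PATTERN.findall(date_text)]
    if not years:
        return []  # leave undated items out of the decade facet
    start = min(years) // 10 * 10  # round the earliest year down to its decade
    end = max(years) // 10 * 10    # round the latest year down to its decade
    return [f"{decade}s" for decade in range(start, end + 1, 10)]


if __name__ == "__main__":
    for raw in ["1923-05-14", "1923", "1920-1935", "ca. 1890s", "undated"]:
        print(raw, "->", decades_for(raw))
```

One design choice here is that an item dated "1920-1935" is tagged with both "1920s" and "1930s", so it shows up under either decade facet while the partner’s original date stays untouched for display.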
The third issue Mark and I discussed is how closely we should follow a metadata schema. We are currently using Dublin Core as our schema, but it lacks some fields that would be useful for us. For example, the partner organization is an important part of our metadata, yet Dublin Core has no field for it, so we had been using the publisher field for the partner organization’s name. Mark suggested that we create our own fields, adapting Dublin Core to meet our specific needs. He also suggested that we clearly document every decision we make about our metadata fields. The tension is that the more we customize Dublin Core for our own project, the less easily our metadata can be shared. Schemas exist so that information can be shared easily and consistently; if we create our own schema, we reduce that ability. I think there is a happy medium here, but I’m not yet sure what it is.
As you can tell, Mark Phillips has years of experience and expertise. I so appreciated his taking the time to talk with me and share what he has learned!