The Connecting Presidential Collections (CPC) project is, at its heart, a metadata aggregation site: it combines descriptive information about partner collections into a single search index so that users can search across presidential collections at once. This descriptive information, known as metadata, includes fields such as title, date, subject, author, and type. To structure metadata in a digital environment, communities have developed schemas that label each piece of information with tags such as <title> or <dc:subject>. A handful of common schemas exist, each developed for a different purpose. For example, TEI is often used to encode digital texts, while METS is a packaging standard for digital objects that allows records in other metadata schemas to be contained within it.
Using Metadata in CPC
Each CPC partner records different types of metadata about its collection, and the content within those fields is often unique as well. One of the tasks of the CPC staff is to map the partner metadata fields to a single metadata schema so that the data in CPC is consistent before we import it into our Solr search index. For example, if one partner has a field called “author” and another has a field called “creator_illustrator,” we might map both of those fields to “creator” so that when a user searches, she will find records from both partners.
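The field mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CPC’s actual import pipeline: the partner names (`partner_a`, `partner_b`), the `title`/`name` mappings, and all values are hypothetical; only the “author” and “creator_illustrator” field names come from the example above.

```python
# A minimal field-mapping sketch. The partner names and the "title"/"name"
# mappings are hypothetical; "author" and "creator_illustrator" come from the
# example in the text.
FIELD_MAP = {
    "partner_a": {"author": "creator", "title": "title"},
    "partner_b": {"creator_illustrator": "creator", "name": "title"},
}


def normalize(partner: str, record: dict[str, str]) -> dict[str, list[str]]:
    """Rename a partner's fields to shared CPC field names.

    Values are collected into lists, so two source fields that map to the
    same target are both kept. Fields with no mapping are skipped.
    """
    mapping = FIELD_MAP[partner]
    doc: dict[str, list[str]] = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is not None:
            doc.setdefault(target, []).append(value)
    return doc


docs = [
    normalize("partner_a", {"author": "Example Author", "title": "Example Letter"}),
    normalize("partner_b", {"creator_illustrator": "Example Illustrator",
                            "name": "Example Cartoon"}),
]

# Both documents now share a "creator" field, so one query on that field
# can match records from either partner once they are indexed.
for doc in docs:
    print(doc["creator"])
```

Keeping values as lists fits a search index where a field can hold several values; how our Solr schema actually defines its fields is not shown here.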
Dublin Core: Our Initial Metadata Format
When we started the Connecting Presidential Collections project, we used Dublin Core as our metadata schema. We chose Dublin Core because of its simplicity and flexibility.
The original Dublin Core element set has 15 elements:
- title
- creator
- subject
- description
- publisher
- contributor
- date
- type
- format
- identifier
- source
- language
- relation
- coverage
- rights
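To make this flat structure concrete, here is a minimal sketch that builds a simple Dublin Core record with Python’s standard `xml.etree.ElementTree`. The namespace URI is the standard one for the Dublin Core 1.1 elements; the `record` wrapper element and the sample values are hypothetical. Every element is optional and repeatable, which is a large part of Dublin Core’s flexibility.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the 15 Dublin Core 1.1 elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Serialize elements in this namespace with the familiar "dc:" prefix.
ET.register_namespace("dc", DC_NS)


def dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a flat Dublin Core record; "record" is a hypothetical wrapper."""
    root = ET.Element("record")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not one of the 15 Dublin Core elements")
        for value in values:  # each element may repeat
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


rec = dc_record({
    "title": ["Example letter"],
    "creator": ["Example Author"],
    "subject": ["Diplomacy", "Trade"],
    "date": ["1900-01-01"],
})
print(ET.tostring(rec, encoding="unicode"))
```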
When we received partner metadata, we mapped each partner’s fields to Dublin Core. One partner was already using Dublin Core, which made the mapping very simple. Another partner did not use a standard metadata schema but instead relied on the one that came with its content management system. In that case, we had to do a lot of mapping work. For example, “Object Name (Web Title)” was mapped to “Title” and “Original.Recipient” was mapped to “Contributor.” Each field needed to be assigned to its equivalent in Dublin Core.
Although Dublin Core was simple to use, we began to realize that mapping to it lost some of the complexity of partner data: if there was no Dublin Core field for a certain type of data, we had to exclude that data.
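The mapping work above, and the data loss it can cause, can be sketched as a crosswalk table plus a function that reports whatever it could not place. This is an illustration, not the tooling we actually used: the first two entries are the mappings described above, while `Date Created`, `Conservation Notes`, and all values are hypothetical.

```python
# Hypothetical crosswalk from a partner's CMS fields to Dublin Core.
# Only the first two entries reflect mappings described in the text.
CMS_TO_DC = {
    "Object Name (Web Title)": "title",
    "Original.Recipient": "contributor",
    "Date Created": "date",  # hypothetical
}


def crosswalk(record: dict[str, str]) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Map a partner record to Dublin Core.

    Returns (dc_fields, dropped): any field the crosswalk has no target for
    ends up in `dropped` instead of the output record.
    """
    dc: dict[str, list[str]] = {}
    dropped: dict[str, str] = {}
    for field, value in record.items():
        target = CMS_TO_DC.get(field)
        if target is None:
            dropped[field] = value
        else:
            dc.setdefault(target, []).append(value)
    return dc, dropped


dc, dropped = crosswalk({
    "Object Name (Web Title)": "Example letter",
    "Original.Recipient": "Example Recipient",
    "Date Created": "1900-01-01",
    "Conservation Notes": "Example note",  # hypothetical field with no mapping
})
print(dc)
print("dropped:", dropped)
```

Even the fields that survive can lose detail: the recipient becomes a generic `contributor`, so the record no longer says that this person received the item.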
MODS? Moving Forward with CPC Metadata
Given the challenges noted above, we have begun to consider switching to MODS (Metadata Object Description Schema) in place of Dublin Core. Developed by the Library of Congress, MODS is a richer, more complex schema that allows us to include more partner data. We have not yet made the switch from Dublin Core to MODS but will gradually change our mapping over the next few months.
The Library of Congress has an example of a full MODS 3.5 record.
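For a much smaller comparison than that full record, here is a minimal MODS sketch built with `xml.etree.ElementTree`. Unlike the flat Dublin Core `contributor`, a MODS `name` can carry a `role`, here the MARC relator term “Recipient” (code `rcp`), so the relationship behind a field like “Original.Recipient” can be kept. The values are hypothetical, and the element choices are one reasonable way to encode this, not a CPC mapping decision.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def mods_record(title: str, recipient: str, date: str) -> ET.Element:
    """Build a small MODS record whose name element records a recipient role."""
    mods = ET.Element(q("mods"), {"version": "3.5"})

    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title

    # In Dublin Core this person could only be a generic "contributor";
    # MODS attaches an explicit role from the MARC relator vocabulary.
    name = ET.SubElement(mods, q("name"), {"type": "personal"})
    ET.SubElement(name, q("namePart")).text = recipient
    role = ET.SubElement(name, q("role"))
    ET.SubElement(role, q("roleTerm"),
                  {"type": "text", "authority": "marcrelator"}).text = "Recipient"
    ET.SubElement(role, q("roleTerm"),
                  {"type": "code", "authority": "marcrelator"}).text = "rcp"

    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateCreated"), {"encoding": "w3cdtf"}).text = date
    return mods


rec = mods_record("Example letter", "Example Recipient", "1900-01-01")
print(ET.tostring(rec, encoding="unicode"))
```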
As our work progresses, we will create a wiki that shows what ideal metadata for CPC looks like. Its purpose will be to communicate an ideal format for metadata, one that we hope partners (and potential partners) can work toward. That said, much of the CPC project is meant to empower and assist partner organizations. Regardless of our metadata decisions and partners’ technical abilities, we will continue to receive a myriad of formats and will happily work alongside partners to import their data accurately into the CPC site.