We are learning so much in the final six weeks of this grant project. One lesson is that an XML mapping built from sample data does not carry over cleanly to actual data. I created the initial XML mapping document from sample data, but it did not match the maps that resulted from the actual data. In many cases the differences were not major, just tweaks here and there. In other cases the partners sent me data in a different format or had refined it based on our conversations about the sample data.
Even so, I do not regret the sample mapping, because it taught me a lot about XML and Dublin Core. There are some real limitations with Dublin Core. For example, it does not have an obvious field for transcripts of letters or speeches, or even a full-text field. It also does not have a natural field for To: and From: in correspondence. We were able to adapt it in most cases, but we also had the luxury of leaving out metadata that did not fit the Dublin Core fields. Since we are offering users an abbreviated record for any given digital object and we want them to go to the partner organizations’ websites for more information, I had no problem simply not mapping metadata that did not fit.
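To make the "map what fits, skip what does not" approach concrete, here is a minimal Python sketch. It is not the mapping we actually used: the partner field names (`item_title`, `recipient`, `transcript` and so on) and the `map_to_dublin_core` helper are hypothetical, made up purely for illustration. Only the Dublin Core field names come from our project.

```python
# Hypothetical crosswalk from one partner's field names to Dublin Core.
# The partner-side names are invented for this example.
PARTNER_TO_DC = {
    "item_title": "Title",
    "author": "Creator",
    "summary": "Description",
    "date_created": "Date",
    "collection": "Source",
}


def map_to_dublin_core(record, mapping):
    """Split a partner record into mapped Dublin Core values and skipped fields.

    Returns a tuple (mapped, unmapped): `mapped` is a dict of Dublin Core
    field name -> list of values, and `unmapped` lists the partner fields
    that had no Dublin Core home and were left out of the abbreviated record.
    """
    mapped = {}
    unmapped = []
    for field, value in record.items():
        dc_field = mapping.get(field)
        if dc_field is None:
            # No natural Dublin Core field (e.g. a letter's recipient).
            unmapped.append(field)
        else:
            # Dublin Core fields are repeatable, so collect values in a list.
            mapped.setdefault(dc_field, []).append(value)
    return mapped, unmapped


letter = {
    "item_title": "Letter to a friend",
    "author": "Jane Doe",
    "recipient": "John Smith",
    "transcript": "Dear John...",
}
mapped, unmapped = map_to_dublin_core(letter, PARTNER_TO_DC)
```

In this made-up example the recipient and transcript end up in `unmapped`, which mirrors the To:/From: and full-text gaps described above.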
Still, as this project goes forward, we might need to consider whether Dublin Core is the best choice as our metadata standard. We have had some discussions with our consultants about whether Dublin Core really works well with Solr. One consultant suggested that we use Solr’s native data structure because it seemed well matched to our data. We had long since decided to go forward with Dublin Core, so for the purposes of this beta we kept to our original path. But I think it is a good idea to revisit that assumption in subsequent phases of this project to see if we might want to switch from Dublin Core to Solr’s native data structure or even something else.
To provide some measure of comparison, the fields that our consultant suggested using included:
- title
- subtitle
- author
- format
- url
- language
- published
- callnum
- isbn
- full_text – anything you want to be used in searching.
The fields that we are using are:
- Title
- Creator
- Subject
- Description
- Publisher
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
Clearly Dublin Core gives us more fields to fill, but the only fields used uniformly across all partner collections were Title, Creator, Description, Publisher, Date and Source. I am not sure how much the other fields add. Going forward, perhaps we should consider a minimal set of metadata fields so that users are more likely to click through to partner organizations. It is a balancing act, though: we need to provide enough information about a digital object to be useful, but not so much that we give it all away.
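One way to check which fields are really shared before trimming the set is to count them in the mapped records. The sketch below is only an illustration, not something we ran for this project. It assumes each partner's records have already been mapped to dictionaries keyed by Dublin Core field name, and it treats a field as "used" by a collection if at least one record has a non-empty value for it. The partner names and sample records are invented.

```python
def fields_used_by_all(collections):
    """Return the set of fields populated in every partner collection.

    `collections` maps a partner name to a list of record dicts. A field
    counts as used by a partner if any of that partner's records has a
    non-empty value for it (an assumption made for this sketch).
    """
    per_collection = []
    for records in collections.values():
        used = set()
        for record in records:
            used.update(name for name, value in record.items() if value)
        per_collection.append(used)
    if not per_collection:
        return set()
    # Keep only the fields that appear in every collection's "used" set.
    return set.intersection(*per_collection)


# Invented sample data for two hypothetical partners.
sample = {
    "partner_a": [
        {"Title": "A", "Creator": "X", "Rights": "Public domain"},
    ],
    "partner_b": [
        {"Title": "B", "Creator": "", "Date": "1901"},
        {"Title": "C", "Creator": "Y"},
    ],
}
shared = fields_used_by_all(sample)
```

A stricter version could require the field in every record rather than in at least one record per collection. Which definition is more useful depends on how sparse the partners' metadata is.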