I published the last entry too quickly, without saying anything about the work I have been doing over the last few weeks. I have been doing XML mapping: taking the metadata from the partner organizations and comparing the fields and data they collect with the metadata schema we have chosen to use. We decided to use Dublin Core for this project, so the mapping involves examining each partner's fields and matching them to the closest Dublin Core fields.
In some cases this is easy work. For example, title maps to title without any problem. But in other cases there are multiple possibilities, and we need to make sure that we are consistent across all partner organizations’ data. Since we are using a single index in Solr, we need the fields to match up not just between each partner organization and Dublin Core but also across all the partner organizations. So when one partner uses Contributor for their organization’s name and another uses Publisher, we need to decide which one we will use and apply it consistently across all the partner data.
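To make the idea concrete, here is a minimal sketch of what such a mapping could look like in Python. It is not our actual mapping: the partner names, the partner field names, and the choice of Publisher for the organization's name are all made up for illustration.

```python
# Hypothetical crosswalks: partner field name -> Dublin Core field.
# Partner names and field names are invented for this example.
CROSSWALKS = {
    # This partner stores its organization name in "Contributor".
    "partner_a": {"Title": "title", "Contributor": "publisher", "Date": "date"},
    # This partner stores its organization name in "Publisher".
    "partner_b": {"title": "title", "Publisher": "publisher", "created": "date"},
}


def to_dublin_core(partner, record):
    """Map one partner record onto the shared Dublin Core field names."""
    crosswalk = CROSSWALKS[partner]
    mapped = {}
    for field, value in record.items():
        dc_field = crosswalk.get(field)
        if dc_field is None:
            continue  # unmapped fields are simply dropped in this sketch
        # Dublin Core fields are repeatable, so collect values in a list.
        mapped.setdefault(dc_field, []).append(value)
    return mapped


record_a = to_dublin_core("partner_a", {"Title": "Harbor map", "Contributor": "Org A"})
record_b = to_dublin_core("partner_b", {"title": "Town plan", "Publisher": "Org B"})
```

Both records end up with the organization's name under the same field, which is what a single shared index needs.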
Another ramification of using a single index in Solr is that the unique identifier takes on a new importance. Every single digital object in the beta product needs a unique id. But we also need to be able to match that unique id with the original item from the partner for any subsequent updates or changes. Solr needs to recognize the object, match the incoming partner data to the record already in the index, and determine whether changes have been made. Right now we are planning to generate the unique ids from a combination of information from the partners and codes that we assign to each partner.
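As a rough sketch of that plan, the id could be built by prefixing the partner's own identifier with our partner code. The partner codes, the separator, and the fingerprint helper below are assumptions for illustration only; we have not settled on a format, and the fingerprint is just one possible way to check whether a record has changed.

```python
import hashlib
import json

# Hypothetical codes we might assign to each partner.
PARTNER_CODES = {"partner_a": "pa", "partner_b": "pb"}


def make_unique_id(partner, local_id):
    """Combine our partner code with the partner's own identifier."""
    return f"{PARTNER_CODES[partner]}:{local_id.strip()}"


def fingerprint(record):
    """Hash a record so a later harvest can be compared with the indexed copy."""
    # Sorting keys makes the hash independent of field order.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


uid = make_unique_id("partner_a", "12345")
old = fingerprint({"title": ["Harbor map"], "publisher": ["Org A"]})
new = fingerprint({"title": ["Harbor map (revised)"], "publisher": ["Org A"]})
changed = old != new
```

Because the same partner item always produces the same id, a re-harvested record lines up with the one already in the index, and two partners that happen to use the same local number still get different ids.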