Monthly Archives: October 2012

XML Mapping (continued)

I published the last entry too quickly, without saying much about the work that I have been doing over the last few weeks. I have been doing XML mapping: taking the metadata from the partner organizations and comparing the fields and data that they collect against the metadata schema that we have chosen to use. We decided to use Dublin Core for this project, so the mapping involves examining each partner’s fields and matching them to the closest Dublin Core elements.

In some cases this is easy work. For example, title maps to title without any problem. But in other cases there are multiple possibilities, and we need to make sure that we are consistent across all partner organizations’ data. Since we are using a single index in Solr, the fields need to match up not just between each partner organization and Dublin Core but also across all the partner organizations. So when one partner uses Contributor for their organization’s name and another uses Publisher, we need to decide which one we will use and apply it consistently across all the partner data.
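To make the idea concrete, here is a minimal sketch in Python of what a per-partner crosswalk could look like. Everything specific in it is an assumption made for illustration: the partner codes, the partner field names, and the choice of publisher as the agreed-upon element for an organization’s name. It is not the mapping we actually use, only a picture of how one shared set of Dublin Core targets keeps the partners consistent with each other.

```python
# Hypothetical crosswalks: partner field name -> Dublin Core element.
# Partner codes and field names are invented for this example.
CROSSWALKS = {
    "partner_a": {"Title": "title", "Publisher": "publisher", "Created": "date"},
    # partner_b stores its organization name under Contributor; this sketch
    # assumes the project settled on "publisher" for that information.
    "partner_b": {"Title": "title", "Contributor": "publisher", "Date": "date"},
}


def map_record(partner, record):
    """Map one partner record (a dict) onto Dublin Core elements.

    Returns (mapped, unmapped): mapped is a dict of element -> list of values,
    unmapped is the list of partner fields with no crosswalk entry, so they
    can be reviewed by hand instead of being silently dropped.
    """
    crosswalk = CROSSWALKS[partner]
    mapped = {}
    unmapped = []
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is None:
            unmapped.append(field)
            continue
        mapped.setdefault(element, []).append(value)
    return mapped, unmapped
```

Calling map_record("partner_b", {...}) on a record with Title, Contributor, and an unrecognized Notes field would return the title and publisher values and report Notes as unmapped. Keeping the list of unmapped fields is a design choice: those are exactly the fields that still need a decision.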

Another ramification of using a single index in Solr is that the unique identifier takes on a new importance. For every single digital object in the beta product, we need to have a unique id. But we also need to be able to match that unique id to the original item from the partner for any subsequent updates or changes. Solr needs to be able to recognize the object, match it between the partner data and the index, and determine whether changes have been made. Right now we are planning on generating the unique ids from a combination of information from the partners and codes that we assign to each partner.
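A rough sketch of that id scheme in Python might look like the following. The separator, the partner codes, and the function names are all assumptions for the sake of the example, and the fingerprint function is just one possible way to notice that a record has changed since it was last indexed, not something we have settled on.

```python
import hashlib
import json

SEPARATOR = ":"  # assumed separator; partner codes must not contain it


def make_solr_id(partner_code, partner_identifier):
    """Combine our code for a partner with the partner's own identifier."""
    local = str(partner_identifier).strip()
    if not local:
        raise ValueError("partner identifier is empty")
    if SEPARATOR in partner_code:
        raise ValueError("partner code must not contain the separator")
    return f"{partner_code}{SEPARATOR}{local}"


def split_solr_id(solr_id):
    """Recover (partner_code, partner_identifier) from a generated id."""
    partner_code, _, local = solr_id.partition(SEPARATOR)
    return partner_code, local


def record_fingerprint(record):
    """Hash of a record's content, independent of the order of its keys.

    Comparing a stored fingerprint with a fresh one shows whether the
    partner's record changed between imports.
    """
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the partner’s own identifier is embedded in the id, the same item always produces the same id on a later import, which is what lets an update land on the existing entry in the index rather than creating a duplicate.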

XML Mapping

I can’t believe how quickly the month of September flew by. The combination of the semester starting and events picking up meant it was a busy month around here. The IMLS project is still moving (slowly) along. The firm that is developing the beta product in Solr and Blacklight has been working away and importing partner data into Solr so we can begin to get an idea of some of the issues that we will face with the partner data.

One of the questions that we have temporarily resolved for the beta product, but may want to rethink for subsequent phases, is whether to have a single core (or index) in Solr or multiple cores. Multiple cores would allow each partner organization to have an index that is solely devoted to their data. However, there seem to be some issues that make it harder for Blacklight to search across multiple cores in Solr. So we have decided to go with a single core for the beta product while making a note to reconsider the issue for subsequent phases of the project.