I’ve been focusing on the “educational materials” for a few weeks now and I have to admit—this is a very daunting task.
The purpose behind this part of the project is to empower organizations who, for whatever reasons, haven’t prioritized digital development. We want to provide some tools, as well as some inspiration, for pushing the ball forward. (Build [or update] your website! Digitize records! Put things online!)
But now that I’m attempting to synthesize all my interviews, all my surveys, into usable material, I’m just feeling a little lost. Do I believe digital development is a good move? Absolutely! But all of these conversations have served to paint such a rich picture that it’s very difficult to speak to these issues with just one voice.
Every organization we’ve met with has been impressive. If there’s one thing I now know for sure, it’s that the people who choose to work at libraries and museums are dedicated, passionate (and overworked, underpaid) people. Finding ways to speak to them as one and create some “hooks” that they can grab on to is so daunting.
But I guess that’s why my goal shouldn’t be to instruct or to pressure but to assist. How could we help? What could we create that would be actually useful, actually empowering?
My hope is that this set of videos (some instructional, some inspirational… after all, we usually have bosses to convince!) and write-ups (ideas, funding sources, digital resources) will start to get there. The subjects of the videos and write-ups come from the most common requests in surveys and in-person interviews… but I’m always open to more feedback!
As you read this, what comes to mind that would be helpful to you? Here, complete this sentence: “If someone showed up at my office to [help with, finish, tell me about] ____________, I’d be thrilled!”
Recordings from Taft and Coolidge, along with Woodrow Wilson, Theodore Roosevelt, and Warren Harding, allow CPC users to hear presidents whose voices are unknown to most people. The audio recordings also add a multimedia dimension to CPC that we hope to expand in the future.
Beyond its presidential selection, the Vincent Voice Library holds more than 100,000 hours of audio recordings. The collection was founded by G. Robert Vincent, who began making recordings in 1912 after borrowing a recording device from his friend Charles, Thomas Edison’s son. We are excited to welcome the Vincent Voice Library as a partner to CPC!
The CPC team is excited to be headed west! Not permanently, but we are packing our bags for the Digital Library Federation (DLF) Forum, being held in Vancouver, British Columbia, in October.
Connecting Presidential Collections was selected as a Snapshot project update. In 2013, we did a poster presentation of CPC at the DLF Forum in Denver, CO. At that point, the project was mostly theoretical–we had only created a very basic beta website (that looked very different than the site does today). We had 6 partnerships and only about 25,000 items in CPC. Today CPC has 12 partnerships with more than 260,000 items covering 32 out of the 43 presidents. We have come a long way!
Of course, we will cover these updates in our talk but more importantly we will talk about the surprises that have come up and the lessons we have learned in the 2 years we have been working on this 3-year grant project sponsored by the IMLS and the Miller Center. We will also cover the work that we have been doing to smooth out the rough (and varying) edges of our partners’ metadata so that it can all play nicely together in our Solr index.
We will post the slides to our talk after October. Hope to see some of you there!
I thought I’d write about where we are with the microfilm digitization part of our project. In particular, we had a minor setback with our scanning that proved (to me, at least) to be quite educational.
We enlisted the help of The Crowley Company, a commercial digitization lab that offered state-of-the-art equipment and a clean room. That, along with experienced operators doing the work, convinced us that we would produce digital copies of these collections of the highest possible quality. But what of the microfilm itself? Were the reels in good enough condition, and was the original photography done well? We were eager to view the results, and it fell to me to analyze the scans and assess their quality.
How’s it look to you?
I’ll admit to having been a little unsure of myself. I explored various reels, looking for poor exposure, blurry images, anything indicating an error or gap in the digitization process. Eventually, I got a sense of what the overall resolution of the images was. But something didn’t feel right. When I looked at the top of the page (and why wouldn’t you start there?), things looked fine, but occasionally I saw things that looked blurry, and this gave me pause. Was I looking at the maximal sharpness of the image, or could things be improved? Without the microfilm on hand, I couldn’t even determine whether it was the digital photography that was to blame or the original microphotography. If the latter, there was really nothing to be done, short of re-shooting the papers, and that wasn’t an option!
Fortunately, when we had an on-site visit from Crowley, I had a chance to raise the issue with their representative. He very quickly identified a curious trend: the top of each page was crisp and clear, while the bottom was less clear. A few minutes later, he was on the phone to the scanning lab, as they still had the microfilm in their possession. At his request, the film was examined under a microscope, and we confirmed that the original microphotography was in focus. This can be done ad hoc, by looking at a frame selected at random. But, fortunately, there is a more systematic method for answering questions such as these.
Resolution Calibration Test Targets Are Your Friends
When microfilm is professionally manufactured, there is a calibration chart included in the material photographed. The photographs of the chart document the focal quality of the entire batch of images. Moreover, they are designed to exhibit the limitations of any optical lens used in photography: aberration. The target looks like this:
The four seemingly redundant charts at the corners tell you something very important: how the photography resolves an image at the extreme edges of the camera lens. It’s not enough to have good focus at the center; a lens must be designed and calibrated to produce sharp images at the periphery, and these charts document how the photographic lens performed for the microfilming session. Like an eye chart, finding the smallest lines that can be resolved (and are not just a grey blur) indicates the limit of your ability to focus in on a detail in an image.
As this is a standard, highly accurate printed chart, rather than pen ink on a hundred-year-old manuscript, you can get a much more reliable sense of what the photography (and subsequent scanning) captured. And here we can see the variation between a top-corner and bottom-corner chart on the very same exposure:
I find that if you examine the bars labeled “2.0”, you can see a distinct difference. (Click on the image to expand.)
Once we’d established that at least some of our scanned reels resulted in partially out-of-focus digital images, more detective work was required. Our Crowley representative quickly established that there was a calibration error with the scanner that processed the samples we examined, and their records indicated the entire run was done with the same machine. They very quickly sent that scanner to be repaired and re-calibrated, and since they still had the microfilm on hand, they offered to rescan the material as soon as the machine was deemed fit to return to service.
We were more than happy with this arrangement. I’m hoping to use these images in an effort to extract transcripts and/or metadata from the documents photographed, and clarity is very important. Consider this side-by-side example:
The first scanning attempt:
And the results of recalibration:
I think the increase in clarity is pretty obvious (your mileage may vary!).
I’m glad we investigated this, and the turnaround time for the rescanning was excellent. So thanks again to Crowley for their diligence and expert eyes! I have a much higher degree of confidence in our digital images, and look forward to the next phase of our work.
My name is Matthew Stephens, and most recently I was a technologist at Alderman Library, here at the University of Virginia. This is where Blacklight was born, and I’ve been a fan of it and Apache Solr for many years. Prior to my coming to UVA, I worked at Intelex, a digital publisher here in Charlottesville, and that’s where I cut my teeth on metadata aggregation, something very much on the minds of everyone here. Needless to say, I’m very excited to work on a project where I can build upon past experience and learn new ways to do things I care about.
The first item on my (growing) to-do list is updating the CPC site. Open source software is a moving target, and many of the components of the site are due for an update. The Blacklight team has released version 5.3.0 in the past month, and upgrading to this, along with many dependencies, will keep the CPC application current with the broader community of Ruby on Rails and Solr enthusiasts. Along with the upgrades, I will be streamlining a few things under the hood, including the indexing of the metadata provided by our partners. A Solr index is a many-splendored thing, and part of my role will be to ensure that the information we receive from partners can be effectively searched and discovered by our users.
I’m delighted to join a team that has already impressed me with their creativity, expertise, and drive. I’ve also learned of their passion for the serial comma, so you may view that last sentence as a peace offering, given my expressed agnosticism. (I’m sure we’ll work that out in the months to come. I’m also from Canada, so my pronunciation of the letter ‘Z’ may be an issue.)
Modern web development is a fascinating endeavor, but so much more so when the endless possibilities are shaped by serving a community. I invite any and all interested readers to have a look at the site as it changes over the coming months. We’re all eager to make something interesting and useful, and we welcome any suggestions you’d care to make.
As any archivist, librarian, or web developer will tell you, the landscape of copyright law surrounding dissemination of digital artifacts is a rocky one. The process of facilitating a world with easy access to material must be balanced with a necessary level of caution regarding respect for intellectual property rights. No scale can healthily swing too far in either direction: Offering all content for broad distribution, without regard for rights status, is blatantly illegal (and deeply disrespectful to the rightful creators and owners of said content). But the other end of the spectrum—the end where all content is locked tight for fear of violating rights agreements—doesn’t sit right either, especially when the goal of an archive is to facilitate the widest possible open access to materials.
The CPC website, which gathers metadata about existing presidential collections, is an area where we knew we’d need to address rights issues from the very beginning.
Thanks to projects like the Digital Public Library of America (DPLA), the notion of Creative Commons licensing, and specifically CC0 (“Creative Commons Zero”) licensing, is increasingly being accepted as the wisest route to responsible but open access. Marking material as CC0 “permanently surrenders copyright and related rights, placing the work as nearly as possible into the public domain, worldwide.” In fact, to participate in the DPLA project at all, descriptive metadata must carry a CC0 license.
To be clear (and perhaps to lower your blood pressure after reading that last sentence): The metadata we refer to is the collection of descriptive attributes that surround a digital object. It does not apply to the object itself.
As an example, let’s pretend that Sox Clinton penned a memoir titled “In Mice We Trust: Currency in Cat Culture, 1990-2000”. The publication itself is most certainly protected by restrictive copyright unless it is intentionally released into the public domain or under a Creative Commons license. But if an organization creates a metadata record about the publication, that metadata can be made available to the public under a CC0 license. That metadata might include something like this:
Title: In Mice We Trust
Subtitle: Currency in Cat Culture, 1990-2000
Author: Sox Clinton
Publisher: Cornell University Press
Format: Hardcopy, Paperback, ePub
Pages: 800
When an organization signs on as a CPC site partner (or as a participant in the DPLA, for that matter), they agree that their metadata, like the fictitious example above, will be made freely available to the public without restrictions. All rights applied to the original object or content (be it prose, a painting, a speech, etc.) remain intact.
All this allows the CPC project to collect metadata, like the entry above, toss it together with similar entries (provided by a wide variety of organizations), and make them all searchable from one handy URL.
Here’s a wonderful blog post from Dan Cohen, Executive Director of the DPLA, about the choice of CC0 and the process of moving intellectual property assessments out of the legal realm and “into the [ethical] realm”.
My favorite parts of his writeup are the parts that call attention to the social contract. If we’re going to build something wonderful together (and we are), let’s do it with the understanding that respecting the integrity of content benefits everyone. And information access and dissemination is unfailingly powerful.
This post is overdue, as I talked to Mark Phillips at the University of North Texas in January. Still, our conversation was so helpful that I want to recount some of the items we discussed. Mark and his team at the University of North Texas libraries have been leaders in the world of metadata aggregation, and he generously shared some of his expertise with me. We talked about two of his projects, the Portal to Texas History and Texas Heritage Online. The Portal takes a different approach from our project in that UNT digitizes and hosts all the materials that they make available. Mark noted that they prefer to handle the digitization and hosting themselves in part because they are focused on long-term preservation of these digital objects. Texas Heritage Online is more similar to our project because it does not host items; it aggregates metadata and sends users to the partners’ websites.
Mark brought up three issues that were very applicable to our project. The first was the issue of how much detail to provide with partner metadata. He has a lot of experience with how messy metadata from different institutions can be–different fields, different expectations of content, and many inconsistencies. He noted that they show users less metadata rather than more. By showing fewer fields on the Texas Heritage Online site, they have less cleanup work to do on the partner records, and users can see the full record by clicking through to the partner’s site. It was a revelation to me that showing less might actually be more useful to our users than showing more metadata that is messy, inconsistent, and possibly confusing.
The second issue that he suggested we spend more time on is instituting controlled vocabulary. Because Blacklight allows users to facet on fields, the fields need to be consistent to make the facets useful. I have written before that the facets on our beta site are not working effectively right now. One reason is that the partners often use the same metadata field for different content. For example, one partner might put measurements in the format field while another might put a text description–both are completely acceptable but don’t play well together. If we implement controlled vocabulary in some fields, it will make the fields more consistent and the facets will be more effective.
One example Mark gave was using a controlled vocabulary for dates. Our partners present their dates in different formats and with different content–some have days, some have only years, and some have a span of years. Mark suggested that if we create a decade date for each item, users could facet on the decade before encountering the inconsistent date formats from each organization. I am not exactly sure how we would implement controlled vocabulary but I think it would be a very valuable addition to make the CPC site more useful.
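To make the decade idea a little more concrete, here is a minimal sketch in Python. It is not how the CPC site works today; the function name, the 1700-2099 year range, and the sample date strings are all assumptions made up for illustration. It simply pulls any four-digit years out of a partner's date string and returns every decade those years touch, which could then be stored in a multi-valued facet field.

```python
import re
from typing import List

# Assumption: years of interest fall between 1700 and 2099.
# The lookarounds keep us from matching part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[7-9]\d{2}|20\d{2})(?!\d)")


def decade_facets(raw_date: str) -> List[str]:
    """Return decade labels (e.g. '1810s') covering every year found in raw_date.

    A single date yields one decade, a span of years yields each decade from
    the earliest to the latest year, and a string with no recognizable year
    yields an empty list.
    """
    years = [int(y) for y in YEAR_PATTERN.findall(raw_date)]
    if not years:
        return []
    first_decade = min(years) // 10 * 10
    last_decade = max(years) // 10 * 10
    return [f"{decade}s" for decade in range(first_decade, last_decade + 10, 10)]


if __name__ == "__main__":
    # Hypothetical examples of the varied formats partners might send.
    for sample in ["March 4, 1817", "1877", "1817-1825", "undated"]:
        print(sample, "->", decade_facets(sample))
```

The original date string would still be kept and displayed as the partner supplied it; the decade is only an extra, consistent value for users to facet on before they meet the varying formats.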
The third issue that Mark and I discussed is how closely we should follow a metadata schema. We are using Dublin Core as our schema right now but it is missing some fields that would be useful for us. For example, we have partner organizations as part of the metadata, and it is an important field. However, it doesn’t exist in Dublin Core. So we had been using the publisher field for the partner organization name. Mark suggested that we create our own fields, adapting Dublin Core to meet our specific needs. He also suggested that we clearly document all decisions that we make related to our metadata fields. The tension is that the more we customize Dublin Core for our own project, the less easily it can be shared. The reason that schemas exist is so information can be easily shared in a consistent manner. If we create our own schema, we reduce that ability to share the metadata. I think that there is a happy medium here, but I’m not yet sure what it is.
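One way to picture a possible middle ground is sketched below in Python. This is only an illustration, not a decision we have made: the `cpc:partner` field name, the crosswalk table, and the sample record are all hypothetical. The idea is to keep a project-specific partner field internally, and to fold it back into the Dublin Core publisher element (as we had been doing) whenever the record needs to be shared in plain Dublin Core.

```python
from typing import Dict, List

Record = Dict[str, List[str]]

# Hypothetical crosswalk from a partner's field names to Dublin Core elements.
CROSSWALK = {
    "title": "dc:title",
    "author": "dc:creator",
    "date": "dc:date",
    "format": "dc:format",
}

# Hypothetical project-specific field, kept outside the dc: namespace on purpose.
PARTNER_FIELD = "cpc:partner"


def to_cpc_record(partner_name: str, partner_record: Dict[str, str]) -> Record:
    """Map a partner record onto Dublin Core plus the custom partner field."""
    record: Record = {PARTNER_FIELD: [partner_name]}
    for source_field, value in partner_record.items():
        target = CROSSWALK.get(source_field)
        if target is None:
            continue  # unmapped fields are skipped in this sketch
        record.setdefault(target, []).append(value)
    return record


def to_plain_dublin_core(record: Record) -> Record:
    """Produce a shareable record using only dc: elements.

    The partner name is folded into dc:publisher so it is not lost.
    """
    shared = {key: list(values) for key, values in record.items() if key.startswith("dc:")}
    shared.setdefault("dc:publisher", []).extend(record.get(PARTNER_FIELD, []))
    return shared


if __name__ == "__main__":
    internal = to_cpc_record(
        "Example Historical Society",  # hypothetical partner
        {"title": "Letter", "author": "James Monroe", "shelf": "B12"},
    )
    print(internal)
    print(to_plain_dublin_core(internal))
```

Writing the crosswalk down as data like this also doubles as documentation of the field decisions, which was Mark's other suggestion.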
As you can tell, Mark Phillips has years of experience and expertise. I so appreciated him taking the time to talk and share his valuable information with me!
We have been thinking through the Catalog part of the grant project and trying to make some decisions to move that part forward. We have been grappling with questions such as how do we want to handle internet resources in the Catalog and what to do about presidential speeches that often appear in multiple places on the internet. How many times do we want “Special Message from President Martin Van Buren” to appear in the Catalog?
We are also thinking through the relationship between the Catalog and the Connecting Presidential Collections website. We originally envisioned the Catalog as a way to identify presidential collections and their hosting organizations so that we could then partner with them and add the collections to CPC. But once we made the first effort to catalog a president’s collections, we were reminded how large and diverse the world of presidential collections is. We identified many interesting presidential resources that might be useful to include in the Catalog that I didn’t think would be appropriate for CPC. I couldn’t quite articulate to my coworkers why some of the resources in the Catalog prototype made me uncomfortable. Useful resources such as videos about the presidents on YouTube might be good to identify in the Catalog but wouldn’t work in my mind in CPC.
After talking through the issues with one of my coworkers, we hit upon a useful way to distinguish between the two parts of this IMLS grant. The Catalog and the CPC website are both focused on presidential collections but in fact their goals are quite distinct. The Catalog will be useful to a wider audience if we include most of the presidential resources available, resources such as collections of digitized speeches, educational YouTube videos, and even perhaps lesson plans about the presidents. But CPC has a different mission.
The CPC website is focused on exposing hidden collections of presidential materials. The CPC doesn’t necessarily need to include an internet resource that anyone can easily find through a Google search. What we hope to do with CPC is reach out to partner organizations to make their collections more accessible. It doesn’t matter to us if we make them more available by aggregating their metadata into CPC to increase traffic to their website or whether we make them more available by providing training to digitize materials and put them online. The goal is to shed light on valuable historical resources that might be hard to find right now.
This distinction between the two parts of the project was very helpful to me. We can include resources in the Catalog that might make it easier for people to find presidential resources already available on the internet. But CPC’s mission skews in another direction–to focus on the organizations with hidden collections that might need a little assistance in bringing them to light. I am sure that as we continue through our IMLS grant project, I will have many similar revelations that help clarify my thinking about our work. And each time, I will benefit from conversations with my coworkers and others involved in this universe, learning a little more each step of the way.
As one part of this grant project, we are digitizing reels of microfilm and then using different techniques to try to gather item-level metadata about the materials. One challenge with digitizing microfilm is that the end result can be just one big bunch of images with very little metadata to guide users. The resulting images can be quite impenetrable. We are trying to address that challenge by interconnecting finding aids with the images and extracting information from the handwritten documents.
We chose three sets of presidential papers in microfilm—The James Monroe Papers in Virginia Repositories, Millard Fillmore Papers, and the Papers of Rutherford Birchard Hayes. For each set, we are trying to track down the master reels for the microfilm. The master reels are the original reels, usually created as master negatives. In order to preserve them, the master reels are used very rarely. From those reels, the service reels are created. The service reels are those that a person might check out from the library.
We would like to use the master reels for the digitization because they are generally the most pristine copies of the materials. Nan Card, the Curator of Manuscripts at the Rutherford B. Hayes Presidential Center, has kindly offered us the use of the master reels of the Hayes Papers. Similarly, Cynthia Van Ness, the director of the Library and Archives at the Buffalo History Museum, is willing to let us use either master reels or pristine service copies to digitize.
For the third set of microfilm, The James Monroe Papers in Virginia Repositories, I have been having a hard time tracking down where the master reels are located. The University of Virginia’s Alderman Library undertook creating the Monroe Papers microfilm in 1969. The papers mostly focused on public documents (such as county courthouse records) during Monroe’s legal career and time in office (not just as president but also as a Virginia statesman and governor). The collection also included privately held letters. The National Historical Publications Commission (now the National Historical Publications and Records Commission), a federal agency, funded the project.
When I started trying to track down the master reels, I began first with the librarians at UVA’s Alderman Library and then contacted the NHPRC. I learned that neither organization held the master reels of the collection. Eventually I found that a private company, Primary Source Media, held the master reels of the Monroe collection. According to its website, it has been building one of the world’s largest microform archives for the last 40 years. When I contacted the company, I was told that it would cost me $2600 to order the 13 reels of microfilm.
I do not understand how Primary Source Media ended up with the master reels of this microfilm. The creation of The James Monroe Papers in Virginia Repositories was undertaken by a public university and paid for with federal funds. And most of the records in the collection are government records which should be in the public domain. I understand that the microfilm collection becomes a new work and is subject to different copyright but what I don’t understand is how these master reels have ended up with a private company. Can anyone shed light on this? Is it common practice for master reels of microfilm to be bought by private companies and then sold back to the public that funded their creation?
Ironically, the guide to the Monroe microfilm collection has a quote from Thomas Jefferson that states, “The lost cannot be recovered; but let us save what remains; not by vaults and locks which fence them from the public eye and use, in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.”
It is hard to believe that we are 2 months into the second round of this IMLS-grant project. It has been a busy 2 months, and I want to detail what we have been doing. Since we think about this project in terms of its 4 parts, I will talk about each of them individually.
Part I: Catalog of Collection-Level Presidential Materials—we are working with a consultant, Susan Perdue, on this part of the project. Since she is so familiar with the early presidents (because of her work on Founders Online), she has a good idea of where to start when looking for all the collections of materials for any given president of the Founding Era. We met with her in early December to get her started on doing the first entry for an early president. We are using her expertise to create a template for how to research this information, how to compile it, and what to include for each collection. We imagine that she will run across many questions that we will need to answer as well as decisions that we will need to make. This first president will serve as a test of the roadmap for how to create the catalog. Having Susan work on this project also fits well with our schedule. We are still wrapping up other projects before we can turn our attention fully to this. We are hopeful that Susan’s effort will give us a clear path to follow as we start to catalog materials for subsequent presidents.
Part II: Connecting Presidential Collections (CPC) website—the first round of this grant project resulted in a beta website. To begin this second round, we wanted to have an audit done of the Blacklight/Solr configuration that we are using as the site’s front-end and search index, respectively. Since the beta site included fewer than 12,000 items from just 6 partners, we wanted to make sure that the site is stable and will be able to scale up and support hundreds of thousands of items from dozens of partners (our goals over the next three years). We have hired Performant Software to perform an audit of our current Blacklight and Solr instances. They are going to set up a local copy of the CPC site and test it for stability, scalability, and usability issues and bugs. During this process, they are also going to write user documentation for importing new metadata (that was sorely lacking from the initial work) and write up their results and recommendations for the site’s next steps.
We also have a number of items we would like to change or improve on the existing site. For example, the faceting is not very effective right now, and some fields that were imported into Solr (and seem to be indexed by Solr) are not showing up on the item detail pages. We are also interested in incorporating some new features into version 1.0 such as adding thumbnails to the detail pages and running some scripts on the metadata to make it more consistent. As of now, it is unclear how much work the existing site will need for us to consider it version 1.0. And since we are planning to have version 1.0 available by late spring, we may have to adjust how much we can accomplish before that time. Still, we are excited to have Performant working on the audit, and we will have more information in January to create a work plan for the site.
Part III: Partner Outreach and Training Materials—we are not focused on this part in these early months. Our schedule does not begin until February because we are not officially adding new partners until after the rollout of CPC version 1.0. However, we have been talking with a few new partners as well as reconnecting with existing partners. We have also been reaching out to other people who are working on similar projects. It is very valuable to talk to others who can share lessons that they have learned, give us advice and suggestions, and refer us to other people, articles, or websites from which we can learn. We spend a fair amount of time listening to learn from others and evangelizing about our own project.
This component of the grant will become much more a focus of our time in the spring and summer as we reach out to people at Presidential Sites and Libraries (and hope to meet many of them in person at the Presidential Sites and Libraries Conference in Little Rock, Arkansas, in June 2014).
Part IV: Scanning Microfilm of Presidential Papers—we have been spending a fair amount of time on this part. The first reason is that Waldo Jaquith, who was the point person for the microfilm, left the Miller Center this month to start the U.S. Open Data Institute. Before he left, he reviewed reels from the three collections of presidential papers (Monroe, Fillmore, and Hayes) so that he could make notes in our GitHub repository for future reference. He wanted to note specific examples of challenges in the collections and ideas sparked by looking at the actual material.
We have also been spending time trying to decide whether to buy a scanner and do the scanning ourselves or outsource the scanning. In our grant proposal, we planned to buy a scanner and do the work onsite. With further research, however, we have concerns that we will not be able to afford to buy the most appropriate scanner for the budgeted amount. We have also been trying to track down the master negative reels, which are the highest quality. For the Hayes papers, those reels are available to us, but for the Monroe papers they are only available for purchase at about $2,000. (I will discuss that further in a future blog entry.)
The reasons to do the scanning ourselves include learning during the process and the ability to adjust the settings as we scan. The downside is the time and learning necessary for us to do the scanning. One option that we recently learned about is ribbon scanning, in which a reel is captured as one big image. This process might allow us to adjust the settings after the images have been captured. We are still gathering information to make this decision. (Just a note—if we change our plan, we will contact the IMLS to get approval before we reallocate funds.)
As we head into the holidays, we are excited to be working on the Connecting Presidential Collections project. In addition to the four parts detailed above, we have also been working on general housekeeping tasks such as choosing and implementing project management software to track the schedule and tasks. Because there are multiple people working on different parts of the project at the same time, it will be important to be able to track progress so that everyone can see where we are and what is being done. Thanks to the wonderful Jen Starkey at the Miller Center, we don’t have to closely track budget—she does that for the project with much effort and hair pulling (I imagine).
We will be back with more updates in 2014. Happy Holidays!