08 April 2014

Radcliffe Workshop on Technology & Archival Processing

Recently I had the privilege of participating in a workshop on technology and archival processing sponsored by the Radcliffe Institute for Advanced Study and the Association of Research Libraries.[1] The workshop sought to explore ways to apply technology to tangible collections, with only secondary consideration of born-digital materials.  In particular, how can technology facilitate arranging and describing archival collections?  A second, inherent question focused on how finding aids might change or be improved through technology.

Aaron Trehub of Auburn and I were asked to offer closing comments.  I offer my observations here, after taking time to reflect on the many excellent insights and ideas.

§ § §

Throughout the workshop, I wondered (as I often do) if archivists were confronting evolution or revolution.  Are we seeing the transformation of the profession?  Or, have things changed so much that we’re really witnessing the demise of archives (and archivists) as we know them?

I believe that archivists must persevere in their noble profession because they serve a distinct role in society.  They are focused on the long-tail value of records, the usefulness of records long after they’ve been created.  William Maher observed that archivists “must stand fast and hold true to [their] role as custodians and guardians of the authentic record of the past,” “to provide an authentic, comprehensive record that ensures accountability for our institutions and preservation of cultural heritage for our publics.”[2]

For years, I've said that what archivists do (at the abstract level) remains the same, but how we do it must change.[3], [4]  Among other things, archivists

-      Select and acquire records that capture a complete (representative, if not exhaustive), accurate, and authentic story of the past.  If they fail to do so, cultural memory will be lost, the future will not have the records it needs to understand its past, and individuals and organizations will not have the evidence necessary to protect their rights and interests.

-      Organize and describe those records to provide physical and intellectual control.  Archivists must help people find their way in what is, for many, a strange land of primary sources, where meaning often lies not in the document itself but in the contextual relationships between records, relationships that reflect their provenance and original order.  Archives aren’t your grandfather’s Dewey Decimal library, and can be alien and confusing for many.

-      Provide access and reference services.  Where some see archivists as gatekeepers and barriers to the records, the reality is that archivists are advocates for researchers.  Not only do archivists help researchers find relevant records, they often help researchers hone their questions.

Getting into the weeds a bit, how we do it must change.  In the past, when we transferred records from the file cabinets where they were stored during use, across the archival threshold, and into our custody, we carefully placed them in boxes to preserve their original order.  That doesn’t work for electronic records.  The records are not on paper but in databases, and may need to be extracted from fielded data and templates into a document-like report.  Nor does placing electronic records in boxes make sense.  But that change is trivial, as we have a number of readily available possibilities.  Files can be placed in zip or tar files, then transferred via a network connection, thumb drive, tape, or disk.  The workshop suggested many more interesting possibilities, changes in how we do our jobs that envision new, more effective ways to work.
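As a minimal sketch of the packaging step, assuming Python and its standard `tarfile` module, the following bundles a folder of electronic records into a single compressed file. Because tar stores relative paths, the folder hierarchy (the nearest digital analogue to box and folder order) travels with the files. The function name `package_records` and the sample collection are hypothetical, and a real transfer would likely also need checksums to verify that nothing changed in transit, which this sketch does not attempt.

```python
import tarfile
import tempfile
from pathlib import Path


def package_records(source_dir: Path, archive_path: Path) -> list[str]:
    """Bundle every file under source_dir into a gzip-compressed tar file.

    Relative paths are kept, so the folder hierarchy travels with the
    records. Returns the sorted names of the files stored in the archive.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps member paths relative to the collection folder
        tar.add(source_dir, arcname=source_dir.name)
    # Reopen the archive to list what was actually stored
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical collection with two series folders
        root = Path(tmp) / "smith_papers"
        (root / "correspondence").mkdir(parents=True)
        (root / "minutes").mkdir()
        (root / "correspondence" / "1998-03-01_letter.txt").write_text("Dear ...")
        (root / "minutes" / "1998-04_board.txt").write_text("Minutes ...")
        print(package_records(root, Path(tmp) / "smith_papers.tar.gz"))
```

The resulting `.tar.gz` file can then move over any of the channels mentioned above; the archive, not the carrier, preserves the arrangement.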

Moving finding aids from typewritten paper to DACS/EAD files on the web was just a start.  To a large extent, digital finding aids are protodigital forms, a replication of the existing structure and functionality without taking advantage of the virtual medium.[5]  Not that I’m discounting DACS or EAD.  We must continue to describe our collections, but technology offers us much more than markup.  We need to take advantage of technology to go well beyond the protodigital and find new ways to connect researchers with relevant records they might formerly have overlooked.

Many would immediately think of the scale of information as the most significant change facing archivists.  While the size of backlogs and digital information is a problem, it’s hardly new.  Archivists have struggled with information explosions for years.  After World War I, Jenkinson specifically addressed the issue in his Manual of Archive Administration: Including the Problem of War Archives and Archive Making.[6]  The volume of records that resulted from the growth of the federal government during the Depression and following World War II drove Schellenberg and others at the National Archives to come up with new ways to manage both active records and archives.  And the phrase “information explosion” took off in the 1960s and was largely replaced in the 1980s by “paperless office.”[7]

At the workshop, I heard three themes about how technology can change the way we do our jobs.  (Other themes were mentioned, of course.  And there are other areas of the archival enterprise where technology will have an impact, but the workshop focused on processing and providing access to collections.)

First, researchers asserted that finding aids remain valuable.[8]  Hierarchical description based on provenance and original order is largely derived from European tradition.  In many ways, the model is as much pragmatic as theoretical: archives have never had the resources for item-level description.  (In the early 20th century, the Library of Congress’ manuscripts processing manual bemoaned backlogs, even as it prescribed item-level calendaring.[9])  The structure remains useful as a framework.  The finding aid is an important means to document the original order of the collection, to preserve the contextual relationship between records.  New tools described by Dan Cohen of the Digital Public Library of America, which can search repositories and assemble collections based on geotagging, name extraction, and more, are invaluable.[10]  But those assemblages are artificial and do not have the authority of the order established by the creators, an order that reflects the primary value of the records.

Bill Landis observed that recent archival practice has trended away from item-level description toward higher and higher levels of abstraction.  I would argue that technology allows us to reverse that trend.  It gives us the tools to provide much more detailed access.  In the past, we didn’t have the staff or time to provide item-level access.  Now, we have computing power that can go beyond item-level access to data mining.  Many researchers don’t have ready access to the software or don’t know how to use those tools.  That’s a service archivists can, and I think should, provide.  Trevor Owens noted that the 4chan records were put online as a zip file with a collection-level description.  But why not pipe the collection through a full-text indexing tool and let people have at it?  People may find what they’re looking for in the text, but not in the collection-level description.[11]
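To make the full-text indexing idea concrete, here is a minimal sketch of an inverted index in plain Python, assuming the records have already been extracted to plain text (OCR and file-format conversion are out of scope). It maps each word to the set of documents containing it and answers queries by intersecting those sets. The function names and sample records are hypothetical; in practice an archives would more likely load the text into an established search engine than write its own.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs in which it appears."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical plain-text records from a born-digital collection
    records = {
        "doc_0001.txt": "Meeting about the county budget vote",
        "doc_0002.txt": "Photos from the budget protest downtown",
        "doc_0003.txt": "Recipe thread, nothing about budgets",
    }
    idx = build_index(records)
    print(search(idx, "budget vote"))  # → ['doc_0001.txt']
    print(search(idx, "budget"))       # → ['doc_0001.txt', 'doc_0002.txt']
```

The sketch deliberately skips stemming and ranking, so "budgets" does not match "budget"; even so, it lets a researcher reach individual documents that a collection-level description would never surface.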

Second, archivists need to be better at what they do.  Which raises the question, what is better?  Ironically, better may be sloppier.  Lambert Schomaker, who presented on automated recognition of handwriting, noted that Google provides reasonable results.  At one point, he observed that archivists sought perfect results, an exact hit.  In archivists’ defense, I think there’s a profound difference between searching the web and searching records.  More often than not, the web has a range of documents that contain overlapping information, whereas archives hold unique documents that may be the only authoritative, authentic source of a very specific piece of data.  You might find someone’s birthday scattered across the web, but their birth certificate is likely in one place.  Even so, Schomaker’s point is well-taken.  It’s better to have a mess of reasonably relevant documents than nothing at all.  Google can get you in the neighborhood and give you clues where to look.

Luis Francisco‐Revilla noted that there was no consistency in how a group of archivists, working separately, arranged a small collection of personal papers.  In response, one participant[12] expressed her concern that there were no normative practices for arrangement or for much of archival practice.  (I expressed some skepticism about the test.  Original order is a normative principle, but personal collections are notorious for being chaotic, with no meaningful order to preserve.  Moreover, I argued, to tweets in agreement, that such a small collection didn’t merit any arrangement; to the extent arrangement facilitates rapid access, it would take very little time for a researcher to peruse so few records.  Again, providing access without arrangement may be an example of where sloppy is better.)

Better also means that we need to think about what the finding aids say about the collections.  Do they answer users’ questions and help them find relevant collections and records?  One researcher wanted more back story on how the collections were acquired, something usually missing from finding aids.  One researcher’s comment that scope notes were of little value might have pained the archivists in attendance (it broke my heart), but I don’t find the observation surprising.  Recently, I asked my students to survey mission statements and collecting policies on university archives’ websites.  What they found was often little more than a few bullet points of questionable value, with little substance that would help users (or archivists) know what was in or out of scope.  A recurring theme at the workshop was that finding aids needed to do more than report the structure of the collection.  I’ve always admired Cutter’s Rules, although they are more than a hundred years old, because Cutter begins with a strategy that focuses on the user.  His last object for the catalog is “to assist in the choice of a book as to its edition [and] as to its character.”[13]  I believe that spirit needs to be at the heart of finding aids: they should be way-finders that help researchers make sense of the collection.  The quality of description must be measured by the degree to which it communicates the information researchers need, not the degree to which it complies with formal rules.

Finally, and possibly most important, are archivists so wedded to the tradition of how we do things that we can’t (or won’t) innovate?  When working on a project to explore automated workflows to process digital collections, a participant who processed collections and was proud of her craft fumed at her supervisor, “You can’t automate what I do!”  He responded, “You’re exactly right!  We don’t want to automate what you do.  We need to do something different.”

That is a revolutionary statement that could portend the demise of archivists.  I am concerned that if archivists don’t step up to the plate, if they don’t adapt and take advantage of technology, they may become extinct and others may take their place.  I’ve already seen examples of this.  When heads of companies and government agencies get questions about email, they call the head of IT, not the records manager or archivist.  I suspect most archives are struggling with limited resources to manage an overwhelming number of tangible records.  But ignoring these tools and clinging to historical approaches can paint records managers and archivists into a corner.  Investing at least some time experimenting with and touting innovative uses of technology may be an essential part of outreach that demonstrates we remain relevant and current.

At the closing reception, a participant questioned my observation, asking if the archival function would persist even if others took our place.  I don’t know that the fundamental value of archives, the function of cultural memory that sees the long-tail value of some records, will persist.  Technologists, like the record creators, are appropriately focused on the job at hand, the here and now.  They aren’t focused on “paperwork” or how the records that result from the work might be needed in ten, fifty, or a hundred years.

Archivists, I believe, should view the present from a future perspective.  What will the future need to remember about its past (our present)?  We need to be creative, and we need to put aside practical worries long enough to think big, think outside the proverbial box (records center or virtual).  We can’t let the desire for the perfect finding aid be the enemy of the possible.  After all, our patrons are accustomed to Google search results.

[1] See Corydon Ireland, “Books meet Bytes,” Harvard Gazette (4 April 2014) for a description of the first day of the conference.  http://news.harvard.edu/gazette/story/2014/04/books-meet-bytes/.  See also the Twitter feed by searching #radtech14.  Shane Landrum was actively tweeting and captured a summary at https://github.com/cliotropic/radtech14.
[2] “Lost in a Disneyfied World: Archivists and Society in Late-Twentieth-Century America,” American Archivist 61 (Fall 1998), pp. 261, 263.
[3] “Janus in Cyberspace: Archives on the Threshold of the Digital Era,” American Archivist 70 (Spring/Summer 2007), pp. 13-22.  Available online at http://archivists.metapress.com/content/n7121165223j6t83/fulltext.pdf.
[4] I would like to acknowledge that Catherine Stollar and Thomas Kiehne challenged my formulation, proposing instead “What we do as archivists will change (practice), but why we do it will not (theory).”  See Richard Pearce-Moses and Susan E. Davis, New Skills for a Digital Era (Society of American Archivists, 2008), p. 64.  Available online at http://www.archivists.org/publications/proceedings/NewSkillsForADigitalEra.pdf.
[5] Kudos to Ken Withers of the Sedona Conference for coining the term ‘protodigital.’
[6] (Clarendon Press, 1922).  Available through Google Books.
[7] Dates based on Google ngram analysis.
[8] Suzanne Kahn and Rhae Lynn Barnes, two historians actively involved in research, discussed their perspectives on finding aids as part of the program.  Both noted that finding aids, even if imperfect, were valuable for a variety of reasons.  Other speakers on the panel, moderated by Ellen Shea, included Trevor Owens and Maureen Callahan.  Callahan’s presentation is on her blog at http://icantiemyownshoes.wordpress.com/2014/04/04/the-value-of-archival-description-considered/
[9] J. C. Fitzpatrick, Notes on the Care, Cataloguing, Calendaring and Arranging of Manuscripts (Library of Congress, 1913).  Available from HathiTrust at http://hdl.handle.net/2027/uc2.ark:/13960/t7br8zr3b.
[10] Cohen gave a brilliant opening plenary that did a great job setting the stage for the discussion. 
[11] In the archives’ defense, Rome was not built in a day; the archives deserves credit for what it did, not criticism for not doing even more.  I ask the question to illustrate that these approaches must become so commonplace that they’re routine.
[12] In the spirit of the Chatham House Rule, I have omitted names of people making comments unless they were part of the published program or unless they tweeted their comments publicly.  Anyone who wishes to be acknowledged may contact me to have this piece edited, or they may identify themselves in the comments.
[13] Charles A. Cutter, Rules for a Printed Dictionary Catalogue (Department of the Interior, Bureau of Education, 1876).  Accessible through Google Books.

8 April 2014 : 1:48 p.m. EDT.  Corrected Dan Cohen's name.  I had no idea who Fred Cohen was; he is a participant in the InterPARES Trust project.  Apologies! <g>

29 October 2015 : 12:15 p.m. EDT. Grammatical edit, and I remembered who Fred Cohen is.  

