18 July 2013

Planning a Digital Archives

What does it take to build a digital archives?  To incorporate electronic records into an existing archives?

Those questions don’t have trivial answers.  At the same time, I would hope a digital archivist should be able to sketch out a high-level plan rather quickly.  As much as anything, a rough plan will identify questions that need to be answered.  Students in Clayton State University’s Master of Archival Studies program begin their capstone project by creating such a plan, based on what they've learned throughout the program. Their response structures the work they need to do the rest of the semester -- find answers and explain the process.

The essence of a plan can be found in three questions that professional archivists should be able to answer (at a high level) rather quickly.

What is an archives?  In a nutshell, ‘archives’ may refer to the repository (as a space), to the program (as purpose and activities), or to the collections.  
          The plan must clearly define a space, distinct from folders, boxes, and shelves.  This high-level outline might describe that space as “digital storage,” setting the stage for future discussions of local and cloud storage, backup, and specific technologies designed for digital archives.  The plan must consider the purpose of the digital archives, whether by establishing, revising, or revisiting its vision, mission, and goals. Finally, the plan must consider the collections, at a broad level: what records should the archives acquire to accomplish its purpose?

What does an archivist do?  The Academy of Certified Archivists’ Handbook identifies several broad functions that should be addressed in the plan.  The Handbook also includes professional, ethical, and legal responsibilities, which touch on the archival functions noted below.

Selection, appraisal, and acquisition.  At a more detailed level, what approaches will be used to identify records for consideration, what criteria will be used to include them, and how will the records be transferred?  Existing records retention schedules and collecting policies may need little modification to appraise records based on content.  Other factors, such as acceptable formats, condition, and preservation costs, may need to be addressed more specifically.  Procedures (including identifying specific tools) to transfer the records will almost certainly have to be developed for digital records.

Arrangement and description. Rumors of original order’s death may -- or may not -- be exaggerated.  At the same time, the plan must address approaches to organizing the records in storage (the virtual equivalent of folders and boxes).  The plan should also consider ways to automate description by extracting metadata and generating file lists.
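As a rough illustration of what automated description could look like, the sketch below walks a folder of records and produces a file list with a few pieces of extracted metadata. It uses only the Python standard library. The function names (build_file_list, to_csv) and the choice of columns are my own assumptions for the example, not a standard, and a real workflow would likely capture much more (format identification, embedded metadata, and so on).

```python
import csv
import io
import os
from datetime import datetime, timezone

# Hypothetical column set chosen for this example only.
FIELDS = ["path", "extension", "size_bytes", "modified_utc"]


def build_file_list(root):
    """Walk `root` and return one metadata row (a dict) per file."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a predictable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                # Path relative to the root, with forward slashes on any OS.
                "path": os.path.relpath(path, root).replace(os.sep, "/"),
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified_utc": modified.isoformat(timespec="seconds"),
            })
    return rows


def to_csv(rows):
    """Render the rows as CSV text, e.g. for an appendix to a finding aid."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Calling to_csv(build_file_list("some_folder")) on a transfer would give a starting point for description that an archivist can then review and enrich by hand.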

Reference and access.  Making electronic records available online can be a great boon for access. At the same time, publishing records online raises questions about copyright, privacy, and other issues. 

Preservation and protection.  Even a high-level outline of digital preservation needs to identify a broad range of questions, from backup and system security to demonstrating authenticity and integrity, format migration, preferred formats, and audits.  Just as important, who on staff will be responsible for the technical aspects of preservation?

Outreach, advocacy, and promotion.  These activities may not be urgent, but given the impact and costs of incorporating digital archives into the program, they are important and can't be delayed long.

Managing archival programs.  Beyond those activities specific to archivy, archivists must identify who will do the work and find the money to support people, hardware, and software.

What is a record?  Although archivists and records professionals may debate the nuances, the heart of the definition -- in my mind -- focuses on information in a fixed format used as evidence of the past.  Geoffrey Yeo poses a similarly broad definition, “persistent representations of activities, created by participants or observers or their authorized proxies.” (In “Concepts of Records (1): Evidence, Information, and Persistent Representation,” American Archivist 70 (Fall/Winter 2007), pp. 515-543.)
          The plan needs to raise discussion about what will be preserved and what constitutes the digital record to be preserved.  Are digital records necessarily preserved in native formats?  Is it significant that formulas in an Excel spreadsheet are lost when converted to PDF?  Is it possible to keep databases accessible indefinitely? If not, how can that information be captured?  How do these decisions affect the trustworthiness of the archives? 

02 July 2013

Technical knowledge necessary for archival jobs

Posted on Archives & Archivists, 1 July 2013, in response to a thread with Bruce Montgomery and Frederic Grevin, later joined by Fynnette Eaton.

I worked with digital archives for several years before coming to Clayton State to begin a master's program in archival studies that emphasizes digital archives.  (And, with Susan Davis, I published the proceedings of New Skills for a Digital Era, a colloquium designed to identify what technical skills archivists need to work with electronic records. http://www.archivists.org/publications/proceedings/NewSkillsForADigitalEra.pdf)

IMHO, the more archivists know about technology, the better.  Consider what archivists know about traditional record formats.  While they may not have studied paper documents as a medium, they understand the material nature of the records because they grew up with them.  They may never have considered a staple to be metadata, but in fact it enforces the document's boundaries (pages included) and sequence. Archivists who work with photographs have to learn about the materiality of those records, what can be learned from format and process, and how to preserve them.

Today, many people are sophisticated consumers of technology.  They can browse, send email, write documents, do spreadsheets, and more.  One reason they can do that is that software vendors have made so many advances in hiding the complexity of the process.  (Ancient history: when using WordStar in the early 1980s, I had to edit the binary software to insert the printer codes if I wanted to take advantage of such advanced features as bold and italic. <g>)  The flipside is that many people who've grown up with computers don't know what's going on "under the hood."  The materiality of e-records is largely hidden from them -- all they see is what's on the screen.

By comparison, many people are sophisticated consumers of automobiles.  They can use maps, turn on the ignition, and get from point A to point B with little problem.  But when something goes wrong, when they have to do anything more technically complex than put gas in the tank, they're stuck.

Digital archivists need to understand the materiality of e-records to be successful.  A year ago, I asked my students what ASCII was.  I was shocked to discover only a handful knew. (If you don't know either: http://en.wikipedia.org/wiki/ASCII). In some ways, that's like being able to read and write, but not knowing what the word "alphabet" means.  How can archivists work with or preserve electronic records without such basic knowledge?  Would the concept of bitrot make any sense without an appreciation that the records are binary?  How will they be able to plan for format migration without knowledge of software versions?
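To make the point about ASCII and bitrot concrete, here is a small sketch in Python. It shows the bytes behind a short piece of text and what happens when a single bit is flipped, which is one simple way to picture bitrot. The helper names (to_bits, flip_bit) are made up for this illustration, and real-world decay is rarely this tidy.

```python
def to_bits(data):
    """Show each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


def flip_bit(data, byte_index, bit_index):
    """Return a copy of `data` with one bit inverted (a simulated error)."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit_index  # XOR toggles just that one bit
    return bytes(damaged)


original = "Archives".encode("ascii")   # text -> ASCII bytes
print(list(original)[:3])               # [65, 114, 99]  ("A", "r", "c")
print(to_bits(original[:1]))            # 01000001       (the letter "A")

# Flip the second-lowest bit of the first byte: 65 becomes 67.
damaged = flip_bit(original, 0, 1)
print(damaged.decode("ascii"))          # Crchives
```

One changed bit out of sixty-four turns "Archives" into "Crchives." On screen the damage looks like a typo; in a compressed or binary format, the same single-bit change might make a file unreadable.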

At Clayton, all courses include a technology component.  For example, the course on appraisal and acquisition looks at tools like BagIt to ensure the integrity of transfers.  The program offers two courses to ensure that graduates have background in technology.  At a minimum, they'll know the concepts so they can have intelligent discussions with CS/IT folk.  They'll also have basic competency so they can do many basic tasks on their own.  (We expect processing archivists to be able to arrange records.  In the digital era, they'll need to be able to sort them -- which involves queries and software tools.)  Students will also have a foundation for proficiency -- they'll know enough to be able to continue learning new tools on their own.  Clayton also teaches a course in database design to ensure that students can properly manage information about the records.  I hope to offer an intensive course in digital curation and preservation tools very soon.
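For readers who haven't seen the idea behind transfer integrity checks, the sketch below shows the core of it: record a checksum for every file before transfer, then recompute and compare after. This is not BagIt itself and does not use any BagIt library; it only imitates the manifest concept with Python's standard hashlib module. The function names (sha256_of, make_manifest, verify_transfer) are assumptions for this example.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(payload_dir):
    """Map each file's relative path to its checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(payload_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, payload_dir).replace(os.sep, "/")
            manifest[rel] = sha256_of(path)
    return manifest


def verify_transfer(payload_dir, manifest):
    """Compare received files against the sender's manifest.

    Returns a list of problems; an empty list means everything matched.
    """
    received = make_manifest(payload_dir)
    problems = []
    for rel, expected in sorted(manifest.items()):
        if rel not in received:
            problems.append(f"missing: {rel}")
        elif received[rel] != expected:
            problems.append(f"changed: {rel}")
    for rel in sorted(set(received) - set(manifest)):
        problems.append(f"unexpected: {rel}")
    return problems
```

The donor or records creator would run make_manifest before sending the files, and the archives would run verify_transfer on receipt. Tools built for the purpose handle packaging, metadata, and edge cases that this sketch ignores.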

All courses are live, online lectures.  Students meet in a WebEx "classroom" weekly in the evenings (6:30p Eastern).  I believe coursework has significant advantages over intensive workshops.  Spread out over a semester, there's more time to cover more content.  Even more important, students can experiment at work during the week with what was covered in class.  Applying the information in a real world setting makes it practical and often raises great questions for class discussion. 

When you factor in transportation, housing and per diem, and workshop registration, Clayton's courses are competitively priced: about $1300 per course.  Courses are the same rate, regardless of where the student lives. (At this time, the program cannot accept students who live in Alabama, Alaska, Maryland, Minnesota, Oregon, or Wyoming.  The University System of Georgia is negotiating agreements with these states to allow residents to enroll in the online MAS program.)

Clayton welcomes individuals working in archives to take courses, either working towards a Master's or as a part-time, non-degree student. 

For more information, see http://www.clayton.edu/mas/ .  Or, give me a call.  We still have space for fall admissions. (Classes start 12 August.)