KEYWORDS: digital archives, user interfaces, data security
AFFILIATION: Max-Planck-Institut für Geschichte, Göttingen
E-MAIL: saumann@gwdg.de
FAX NUMBER: +49 - 551 - 495670
PHONE NUMBER: +49 - 551 - 495632
Recent discussions of the possibilities for storing digital manuscript material have most often focused on producing high quality representations of a rather restricted amount of digitized source material. In the archival world, on the other hand, digital systems have frequently been designed on the understanding that the digital storage of bulk material is primarily a replacement for the classical microfilming operations of archives.
Taking as its example a German project which intends to create a pilot "edition" of a serial source of ca. 50,000 pages, this paper discusses how far archival systems can provide a starting point for incomparably more intensive access to bulk material than traditional techniques allow.
The presentation will start with a short review of the existing access methods for digital archives. It is well known that, while the scanning campaign of a digitization project represents a serious organizational task, the provision of the various tools which allow a user to access the digitized material actually requires a considerably larger effort.
Let us recapitulate what the purpose of these access mechanisms is. The user of a digital facsimile or edition should have the possibility to select those pages he or she wants to look at by specifying characteristics of the text contained on the individual pages. The user of a digital archive should have the possibility to access by similar means all parts of the archival holdings which interest him or her in the form of high quality reproductions right at the desk in the user's room in the archive.
Traditionally this is done with the help of either full text retrieval systems or structured databases containing descriptions of the material; preparing such descriptions is rather time consuming. Three forms of access can be distinguished.
1) Access by Browsing
The user encounters the manuscript(s) as a - potentially structured - collection of pages. (S)he pages through the material in the order imposed by the structure holding the documents.
This is the only traditional access tool which can be realized speedily. More popularly speaking: you go to the traditional catalogue of the archive, look up the shelf mark, enter it into the computer (or select it there from a list) and get the first page of the relevant document onto your screen.
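As a minimal sketch of this browsing model, assume the holdings are held as a mapping from shelf mark to an ordered list of page images. The names `Document`, `Browser` and the file names are hypothetical and serve only as illustration; they are not taken from the project discussed here.

```python
from dataclasses import dataclass


@dataclass
class Document:
    shelf_mark: str
    pages: list  # page image identifiers in archival order (assumed non-empty)


class Browser:
    """Pages through one document at a time, selected by shelf mark."""

    def __init__(self, holdings):
        self._by_shelf_mark = {doc.shelf_mark: doc for doc in holdings}
        self._document = None
        self._position = 0

    def open(self, shelf_mark):
        # Look up the shelf mark and show the first page of the document.
        self._document = self._by_shelf_mark[shelf_mark]
        self._position = 0
        return self.current_page()

    def current_page(self):
        return self._document.pages[self._position]

    def next_page(self):
        # Stop at the last page instead of running past the end.
        last = len(self._document.pages) - 1
        self._position = min(self._position + 1, last)
        return self.current_page()

    def previous_page(self):
        # Stop at the first page instead of running before the start.
        self._position = max(self._position - 1, 0)
        return self.current_page()


# Hypothetical holdings for illustration only.
holdings = [
    Document("A 12", ["a12_001.tif", "a12_002.tif"]),
    Document("B 3", ["b3_001.tif"]),
]
browser = Browser(holdings)
```

Calling `browser.open("A 12")` corresponds to entering the shelf mark found in the traditional catalogue; `next_page()` and `previous_page()` then follow the order imposed by the document itself.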
2) Access by Query
The user specifies a query in the query language of an underlying database system. This query addresses formal descriptions - which can contain partial or complete transcriptions - of the document. As a result the user is presented with an ordered collection of qualifying pages.
Less technically: you are spared the trip to the catalogue, which is itself administered by the computer as a database on which you can employ traditional database tools. The problem with such an approach, as mentioned before, is that it is usually a very complex operation to convert a traditionally very flexible and highly irregular archival catalogue into a rigidly structured database.
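To make the idea of a query against formal descriptions concrete, the following sketch uses Python's built-in `sqlite3` module. The table layout (`shelf_mark`, `page_no`, `year`, `description`), the sample rows and the function `find_pages` are assumptions made for illustration; the database system actually used is not specified here.

```python
import sqlite3

# In-memory database standing in for a catalogue converted into structured form.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    "shelf_mark TEXT, page_no INTEGER, year INTEGER, description TEXT)"
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, ?)",
    [
        ("A 12", 1, 1510, "account of wine duties"),
        ("A 12", 2, 1510, "payments to the town clerk"),
        ("B 3", 7, 1540, "wine duties, later copy"),
    ],
)


def find_pages(conn, term, year_from, year_to):
    """Return (shelf_mark, page_no) pairs, ordered, whose description matches."""
    rows = conn.execute(
        "SELECT shelf_mark, page_no FROM page "
        "WHERE description LIKE ? AND year BETWEEN ? AND ? "
        "ORDER BY shelf_mark, page_no",
        (f"%{term}%", year_from, year_to),
    )
    return rows.fetchall()


qualifying = find_pages(conn, "wine", 1500, 1520)
```

The ordered result list plays the role of the "ordered collection of qualifying pages". The rigid column structure also shows why the conversion is costly: every irregular catalogue entry has to be forced into the same fields.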
3) Access by Content
Partial or complete transcriptions are loaded into a fulltext system, presenting the complete vocabulary of some holdings as an "active list". By dynamically specifying the formulae needed, the selection is narrowed down to a manageable number of documents, which are then displayed.
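One possible reading of the "active list" is an inverted index: every word of the transcriptions points to the pages on which it occurs, and each additional search term narrows the selection. The sketch below rests on that assumption; the function names and the sample transcriptions are hypothetical.

```python
import re
from collections import defaultdict


def build_index(transcriptions):
    """Map each lower-cased word to the set of page identifiers containing it."""
    index = defaultdict(set)
    for page_id, text in transcriptions.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_id)
    return index


def vocabulary(index):
    """The complete vocabulary of the holdings, as a sorted list."""
    return sorted(index)


def narrow(index, terms):
    """Return the pages that contain every one of the given terms."""
    page_sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*page_sets) if page_sets else set()


# Hypothetical transcriptions, keyed by shelf mark and page number.
transcriptions = {
    "A 12/1": "Item received for wine duties",
    "A 12/2": "Item paid to the town clerk",
    "B 3/7": "Wine duties of the town",
}
index = build_index(transcriptions)
```

Here `vocabulary(index)` yields the word list presented to the user, and `narrow(index, ["wine", "town"])` reduces the selection to the pages containing both words.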
Because of the heterogeneous nature of traditional archival tools, such a conversion is usually easier to accomplish than the creation of a rigidly structured database. This idea of creating a computer based access tool directly out of an existing one leads us one step further, to:
4) Access by Digitized Versions of Traditional Tools
An existing catalogue or finding aid is itself digitized. The digital version of this tool can be accessed by any of the access methods described so far. "Activating" an entry of the digitized tool initiates the display of the page(s) described by it.
Less formally: you search within a graphic reproduction of the old catalogue on the screen and click on a specific entry within it to see the first page of the file described by that entry.
This notion of using a visual object as an access tool for other visible objects leads directly to:
5) Access by a Graphic Overview
The organizational scheme representing the order of the collection - for example a map of a community or territory - is presented as user interface. By activating a "house" or "location" on the map, the related documents are displayed.
More intuitively: you click on a map to start browsing through all the documents related to the village clicked on. While this is more intuitive, it can be shown that, for actual access to information within real-size historical territories, the popular "clickable" map of toy applications may need some rethinking to reach an acceptable information density on the screen.
6) Access by Fragment
Significant sections of the manuscript - for example illuminated initials or miniatures - are administered as a primary database. By activating such a fragment the part of the complete manuscript from which it is taken is displayed.
Little experience with this kind of approach exists as yet; it remains to be seen whether such a tool, which has been used experimentally within the realm of digital facsimiles, can successfully be extended to large scale digital archives.
After having shown examples of the basic access mechanisms, we go on to demonstrate that the actual software functionality required to implement these techniques is very closely related to the functionality which has been implied by Dino Buzzetti's discussion of variant readings.
We take this to demonstrate that the various ways of using digitized manuscript material are closely related to each other. This supports the thesis that the appropriate response of archival institutions to the new technologies should primarily be the creation of an institutional framework flexible enough to allow one and the same institution to act as logistical host for a few groups of manuscripts with very intensive editorial information assigned to them, while acting at the same time as supplier of very shallowly described mass documents.
This may seem doubtful for one reason: documents in which extremely intensive editorial preparation has been invested create different problems of copyright and protection against illegitimate distribution than mass documents with little, if any, explanatory information attached to each individual page.
We therefore close our considerations on digital archives with a discussion of the protection mechanisms employed within the organizational and software environment from which the examples of this paper are drawn. Concerns about data security in archives arise broadly for three reasons.
a) The institutions from which the source material originates have been awakened recently to the problems of copyright with regard to digitized source material.
Museums are afraid that they will be robbed of large revenues if cheaper pictorial reproductions of their holdings, and particularly reproductions which can easily be copied, begin to circulate. This is not quite so obvious in the archival case, but it certainly represents a reason for much concern for the author of a digital facsimile or edition.
b) While nobody in a small city archive really believes that they will lose huge sums because their early 16th century account books can easily be copied, there is a widespread fear in the archival world that the systematic digitization of source material will threaten the position of the archives in two ways. On the one hand there is a widespread feeling that these technologies will let the archives lose control over their material. There is probably no technical answer to that: it is part of the implications these technologies have for the organisation of the research process. A more immediate fear, particularly in smaller archival institutions, is related to the fact that many archives are funded, among other reasons, because the local authorities are convinced of the importance of an institution which has so and so many users a year. This effect, it is feared, will be lost when large portions of the archival holdings are accessible from the outside.
c) A third problem arises with sensitive material, as, for example, in the case of an attempt to convert the holdings of the archive at the former concentration camp in Auschwitz into digital form. While the manipulation of high quality images is not quite as easy as that of the low quality reproductions on which it is usually demonstrated, in the case quoted the danger of falsifications produced by some right-wing lunatics to prove the non-existence of the Holocaust is quite real.
Within the various projects implicitly discussed here, we have not yet found any definite solutions to these problems. In general, however, the following procedures will probably be implemented. To protect the rights of the institution generating the material, it will be distributed in an internal format which can only be accessed with a specific copy of the program issued with it; this should solve the problems described under a) and b).
In that area we assume that any protection scheme can only protect as long as no serious criminal attempt is made to break it. (If you want to produce an unauthorized copy of a fairly traditional publication, you can do so just as well.) In the last case, however, where historical integrity is in question and the potential offenders have a clear criminal potential, this is deemed insufficient. In principle it will always be possible to display visual material on a computer and dump a copy of the screen into a file, where it can then be processed further. While it requires quite some effort to recreate the original quality out of such dumps, it could in principle be done. The distribution of the material is not the problem in a case like Auschwitz: the more people see the authentic sources about the Holocaust, the better. It has to be possible, however, to prove easily that a specific visually reproduced document has not been tampered with. For such purposes digital reproductions of images or manuscripts can contain embedded "watermarks" or "seals" which are as difficult to break as the identification codes for credit cards and similar devices.
The presentation concludes with a brief attempt to show how these mechanisms for the protection of manuscript security fit into the overall logic of manuscript processing, which is supposed to be the covering theme of this session.
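The principle behind such a "seal" can be illustrated with a keyed hash. Note that this is a substitute technique chosen for brevity: the sketch computes a detached seal with HMAC-SHA256 from Python's standard library, not a watermark embedded in the image, and it does not describe the mechanism actually used in the projects discussed. The key and the image bytes are placeholders.

```python
import hashlib
import hmac


def seal(image_bytes: bytes, key: bytes) -> str:
    """Compute a keyed seal over the exact bytes of a reproduction."""
    return hmac.new(key, image_bytes, hashlib.sha256).hexdigest()


def is_untampered(image_bytes: bytes, key: bytes, expected_seal: str) -> bool:
    """Check a reproduction against the seal issued by the archive."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(seal(image_bytes, key), expected_seal)


# Hypothetical values for illustration only.
key = b"secret held by the issuing archive"
original = b"bytes of the scanned page as distributed"
issued_seal = seal(original, key)

forged = b"bytes of the scanned page as manipulated"
```

With these values `is_untampered(original, key, issued_seal)` holds, while the same check on `forged` fails, since any change to the bytes changes the seal. A keyed hash can only be verified by whoever holds the key; if anyone at all is to be able to check a document, a public-key signature would be needed instead.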