A course to be held at ALLC/ACH '96 in Bergen
| TITLE: | A Course in SGML, TEI, MECS, and an introduction to tools in text encoding for humanistic research |
| TIME: | Saturday 22 June, 1 pm - 7 pm; Sunday 23 June, 10 am - 6 pm |
| PLACE: | University of Bergen |
| INSTRUCTORS: | Lou Burnard, Peter Cripps, Claus Huitfeldt, and C. M. Sperberg-McQueen |
| REGISTRATION FEE: | 400 NOK |
| REGISTRATION DEADLINE: | 1 June 1996 |
Topics to be covered include:
* General Principles of Text Markup: What is markup for? Varieties of markup; effect of markup. What are electronic texts for? Markup and interpretation. Markup as a means of enabling intelligent retrieval.
* Basics of SGML: What it is and isn't; the case for using it. Basic SGML syntax for the document instance (tags, entity references, comment declarations). Examination and explication of simple examples.
* Basics of MECS: What it is and isn't; syntax of document and declarations; simple examples. Why isn't MECS the same as SGML?
* Document Analysis: What document analysis is, and why it is an essential part of any e-text project. Phases of document analysis. Group document analysis of a sample text.
* Basics of the TEI: origins and goals of the TEI, overall organization of the TEI encoding scheme, basic structural notions of the TEI DTD and the pizza model: the base, additional, and core tag sets, and how they may be extended, modified, and documented.
* Group tagging of the sample text in TEI and MECS.
* Special problems: discussion of sample texts posing special problems for markup (e.g. hypertext, text with critical apparatus, text with literary, philosophical, or linguistic analysis), with tagging in MECS and TEI, and comparison of the two.
* Group tagging of further examples.
* Practical issues: types of software available for work with electronic texts in SGML and MECS, issues of project organization, publication on the net, and a review of where to go for further information.
Computer-aided research now crosses many political, linguistic, temporal, and disciplinary boundaries; the TEI Guidelines have been designed to be applied to texts in any language, from any period, in any genre, encoded for research of any kind. As far as possible, the Guidelines eschew controversy; where consensus has not been established, only very general recommendations are made. The object is to help the researcher make his or her position explicit, not to dictate what that position should be.
Viewed as a standard, the TEI scheme attempts to occupy the middle ground. It offers neither a single all-embracing encoding scheme, solving all problems once and for all, nor an unstructured collection of tag sets. Rather, it offers an extensible framework containing a common core of features, a choice of frameworks or bases, and a wide variety of optional additions for specific application areas. Somewhat light-heartedly, we refer to this as the Chicago Pizza model (in which the customer chooses a particular base - say deep dish or whole crust - and adds the toppings of his or her choice), by contrast with both the Chinese menu or laissez-faire approach (which allows for any combination of dishes, even the ridiculous) and the set meal approach, in which you must have the entire menu.
In contrast to the top-down approach of SGML, whereby the structure of any legal document is determined by a DTD, MECS allows you to introduce codes into a document as and when the need for them arises. From a document written in this way one can then derive a minimal code definition table (CDT) which will, if desired, serve to limit the introduction of further codes. This way of working is of value when developing a code set by means of a pilot project or similar. Alternatively, MECS allows you to impose upon a user the need for a CDT, thereby ensuring that only a limited set of codes is used.
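The bottom-up way of working described above can be illustrated with a minimal sketch. It is not the MECS software, and it does not use the actual MECS syntax: it assumes a simplified, hypothetical notation in which a code opens as `<name/` and closes as `/name>`. The function names `derive_code_table` and `undeclared_codes` are likewise invented for illustration. The first collects the codes actually used in a pilot document; the second reports codes in a later document that are not in that table.

```python
import re

# Hypothetical simplified notation (an assumption, not real MECS syntax):
# a code opens as "<name/" and closes as "/name>".
OPEN_CODE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/")


def derive_code_table(text):
    """Return the set of code names opened anywhere in the text.

    This stands in for a 'minimal code definition table': it lists
    exactly the codes the document uses and nothing more.
    """
    return set(OPEN_CODE.findall(text))


def undeclared_codes(text, table):
    """Return the codes used in the text that are absent from the table."""
    return derive_code_table(text) - set(table)


if __name__ == "__main__":
    # Pilot document: codes are introduced as the need arises.
    pilot = "<line/ To be <del/ or /del> not to be /line>"
    table = derive_code_table(pilot)
    print(sorted(table))

    # Later document: the derived table now limits which codes may appear.
    later = "<line/ text <add/ new /add> /line>"
    print(sorted(undeclared_codes(later, table)))
```

A real code definition table would presumably record more than code names; the sketch only shows the order of work, in which the table follows from the document rather than preceding it.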
It is at present possible to define MECS such that all MECS documents are SGML-conformant document instances and vice versa. There also exists a range of MECS software, including tools for code-syntax and/or CDT document validation, document formatting, SGML document conversion, code extraction, statistical document analysis, spell-checking, etc.
Participation will be limited according to the space available.
Updated 15.04.96
Claus Huitfeldt