The Helsinki Corpus of English Texts

MANUAL TO THE DIACHRONIC PART OF

THE HELSINKI CORPUS OF ENGLISH TEXTS

CODING CONVENTIONS AND LISTS OF SOURCE TEXTS

Third Edition

Compiled by Merja Kytö
Department of English
University of Helsinki
Helsinki, 1996

Merja Kytö
Manual to the Diachronic Part of the Helsinki Corpus of English Texts: Coding Conventions and Lists of Source Texts

Information on the Helsinki Corpus and the manual available from:

The HIT Centre/Humanities Information Technologies Research Programme
Allégaten 27
N-5007 Bergen
Norway
E-mail: icame@hit.uib.no
URL: http://www.hit.uib.no/
Tel: +47 55 582954/55/56
Fax: +47 55 589470

The Oxford Text Archive
Oxford University Computing Services
13 Banbury Road
Oxford OX2 6NN
United Kingdom
E-mail: info@ota.ahds.ac.uk
URL: http://info.ox.ac.uk/~archive
Tel: +44 1865 273238
Fax: +44 1865 273275

Department of English
P.O. Box 4 (Yliopistonkatu 3)
FI-00014 University of Helsinki
Finland
E-mail: merja.kyto@helsinki.fi
merja.kyto@engelska.uu.se
Fax: +358 9 19123072
+46 18 4711229

Third Edition
ISBN 951-45-7470-2
Helsinki University Printing House
Helsinki, 1996

Note:

For the Helsinki Corpus

* texts and file names, see Part One, Section 2
* distribution formats, see Part One, Section 4

For the use of the Helsinki Corpus

* and the Oxford Concordance Program, see Part One, Section 5.1.
* and the WordCruncher (4.1; 4.30), see Part One, Section 5.2.

For looking up the source texts

* according to authors and names of texts, see Part Two
* according to abbreviated titles, see Part Three

Preface

The Helsinki Corpus of English Texts: Diachronic and Dialectal is a computerized collection of extracts of continuous text. It is the result of a project commenced in 1984 and directed by Matti Rissanen and Ossi Ihalainen at the University of Helsinki. The Corpus contains a diachronic part covering the period from c. 750 to c. 1700 and a dialect part based on transcripts of interviews with speakers of British rural dialects from the 1970's. The aim of the Corpus is to promote and facilitate the diachronic and dialectal study of English as well as to offer computerized material to those interested in the development and varieties of language. The material is intended for both mainframe and microcomputer use.

The aim of this guide is to help the users of the text files included in the diachronic part of the Corpus. It contains a key to coding conventions, and lists of source references and abbreviated titles to the extracts included. A volume by Rissanen et al. (1993, see Note 3) discusses the principles of compilation and offers a number of pilot studies illustrating the use of the material.

A brief introduction to the overall structure of the diachronic part will precede the list of source texts and the index of abbreviated titles. For the benefit of those interested in dialect material, a concise description of the dialectal part is included. The guide will give information on the character set and coding conventions used in the diachronic part, with practical advice on how to identify examples of data drawn from individual texts.

A number of the sections printed here accompany the Corpus as text files. These sections include Texts and File Names (Part One, Section 2), Character Set (Part One, Section 3.1.), Source Texts (Part Two) and Abbreviated Titles (Part Three). Owing to restrictions set by the lay-out format adopted and the character set used in most mainframe systems, other sections only appear printed in this guide.

The following researchers have participated in the team work resulting in the present version of the diachronic part of the Helsinki Corpus:

Old English:

Leena Kahlas-Tarkka, Matti Kilpiö, Ilkka Mönkkönen, Aune Österman

Middle English:

Inkeri Blomstedt, Juha Hannula, Mailis Järviö, Leena Koskinen, Saara Nevanlinna, Tesma Outakoski, Päivi Pahta, Kirsti Peitsara, Irma Taavitsainen

Early Modern English:

Merja Kytö, Anneli Meurman-Solin, Terttu Nevalainen, Helena Raumolin-Brunberg, Ritva Tiusanen

The Old English section has profited especially from the work of Leena Kahlas-Tarkka and Matti Kilpiö, and the Middle English section from that of Saara Nevanlinna and Irma Taavitsainen. In compiling the Early Modern English section, Terttu Nevalainen and Helena Raumolin-Brunberg have been responsible for collecting the Southern British English texts, Anneli Meurman-Solin the Scottish texts, and Merja Kytö the American English texts.

The Old English section of the Corpus is based on the material taken, with the kind permission of the Editors, from the machine-readable transcript (Release 1, October 1982) prepared for the Dictionary of Old English Project at the University of Toronto. We gratefully acknowledge the help offered by the Editors, Antonette diPaolo Healey and Ashley Crandell Amos, and the staff of the Dictionary.

The researchers who have contributed to building up the dialectal part of the Helsinki Corpus include Markku Filppula, Jussi Klemola, Anna-Liisa (Ojanen) Vasko, Kirsti Peitsara, Ossi Stigell and Irmeli Tammivaara-Balaam. A special mention should be given to Gunnel Melchers for permission to include her transcriptions of the Yorkshire material in the Corpus.¹ The availability of the dialectal texts depends on the individual researchers, who have full copyright for the material (for permission to use the files, contact Matti Rissanen).

Matti Rissanen has supervised and coordinated the work on the diachronic part and Ossi Ihalainen on the dialectal part; Merja Kytö has acted as the project secretary coordinating the team work and devising the database arrangements.

The team of undergraduate students at the English Department who have made an invaluable contribution to keying in and proofreading the texts includes Kirsi Heikkonen, Jussi Klemola, Asta Kuusinen, Tuula Lehtonen, Tom Löfström, Arja Nurmi, Minna Palander, Tiina Selki and Päivi Öhman. We have also been helped in proofreading by Lisa Arnold, Päivi Käki and Deborah Ruuskanen.

We are most indebted to Norman Blake, Derek Pearsall, Martyn Wakelin, Roger Lass and Robert Stanton for helping us solve many problems concerning the compilation of the Corpus. A number of visiting scholars have generously given their time and expertise in proofreading the texts. They include Fran Colman, Christiane Dalton-Puffer, Anne Finell, Jonathan Hope, Hanna Mausch, Irmeli Valtonen, Susan Wright, and Brita Wårvik.

Invaluable help in more technical problems has been received from, among others, Stig Johansson, Knut Hofland, Lou Burnard, Daryl Gibb, Kimmo Koskenniemi, Visa Rauste, Hannu Hartikka, Teo Kirkinen, Leena Sadeniemi and Seppo Syrjänen.

We owe thanks to all those who have granted us permission to use the texts included in the Corpus. We are particularly happy to acknowledge the generosity of all authors of the editions, and especially the following persons, publishers and institutions:

Almqvist & Wiksell International, Stockholm, Sweden
American Philosophical Society, Philadelphia, Pa., U.S.A.
Edward Arnold, Kent, U.K.
A. Asher & Co. B.V., Amsterdam, The Netherlands
Cambridge University Press, Cambridge, U.K.
Centaur Press Ltd., Arundel, U.K.
The Chetham Society, Manchester, U.K.
Professor Peter Clemoes, Emmanuel College, Cambridge, U.K.
Columbia University Press, Irvington, Ny., U.S.A.
Constable Publishers, London, U.K.
J. M. Dent & Sons Ltd., London, U.K.
The Early English Text Society, Oxford, U.K.
Elsevier Science Publishers, Physical Sciences & Engineering Division, Amsterdam, The Netherlands
Garland Publishing Inc., New York, Ny., U.S.A.
H. Gianotten, Tilburg, The Netherlands
Hakluyt Society, London, U.K.
The Johns Hopkins University Press, Baltimore, Md., U.S.A.
Houghton Mifflin Company, Cambridge, Ma., U.S.A.
Indiana University Press, Bloomington, In., U.S.A.
Manchester University Press, Manchester, U.K.
The Modern Language Society, Helsinki, Finland
Norfolk Record Society, Norwich, U.K.
Oxford University Press, Oxford, U.K.
Professor Derek J. Price, Yale University, New Haven, Ct., U.S.A.
Princeton University Press, Princeton, Nj., U.S.A.
Random Century Group, London, U.K.
Royal Historical Society, London, U.K.
Scolar Press, Berkeley, Ca., U.S.A.
The Surtees Society, Durham, U.K.
Université de Liège, Liège, Belgium
The University of Michigan Press, Ann Arbor, Mi., U.S.A.
The University of Tennessee Press, Knoxville, Tn., U.S.A.
The University of Toronto Press, Toronto, Canada
Unwin Hyman Ltd., London, U.K.
The Wellcome Institute for the History of Medicine, London, U.K.
Carl Winter Universitätsverlag, Heidelberg, Germany
Yale University Press, New Haven, Ct., U.S.A.

A bibliography of the works published or in preparation is given in Appendix 2. The team of compilers wishes to be informed about published research based on the Corpus, and scholars using the Corpus are requested to report on studies published and in progress to Matti Rissanen, address: Department of English, P.O. Box 4 (Yliopistonkatu 3), FIN-00014 University of Helsinki, Finland.

Preface to the Second Edition

In the second edition, Appendix 2 has been updated and Appendix 3 added. Bibliographical references and distribution information (formats, addresses etc.) given in Part I have, similarly, been brought up-to-date. A few minor corrections have been introduced in the text throughout.

Preface to the Third Edition

In the third edition, Appendix 2 and the information given on distribution and other addresses have been updated.

CONTENTS

Preface
Preface to the Second Edition

Preface to the Third Edition

PART ONE

INTRODUCTION

1. Overall Structure
1.1. Diachronic Part
1.2. Dialectal Part

2. Texts and File Names

3. Coding

3.1. Character Set
3.2. Main Coding Principles

Preliminary Remarks
1. General Format
(1a) Lines
(1b) Paragraphs
(1c) Word-boundaries and Punctuation
(1d) Hyphens
(1e) `Ash', `Eth', `Thorn' Etc.
(1f) Superscript (=)
(1g) Abbreviations (~)
(1h) Accents (`)
(1i) Material Excluded

2. `Text Level' Codes

Preliminary Remarks
(2a) `Font Other Than the Basic Font' (^...^)
(2b) `Foreign Language' (\...\)
(2c) `Runes' (}...})
(2d) `Emendation' [{...{] and `Editor's Comment' [\...\]
(2e) `Heading' [}...}]
(2f) `Our Comment' [^...^]
(2g) `Multiple Text Level Codes'

3.3. Reference Codes

3.3.1. Preliminary Remarks
3.3.2. COCOA Format

3.3.3. An Example

3.3.4. Reference Code Values

Preliminary Remarks
(1) <B = `Name of Text File'
(2) <Q = `Text Identifier'
(3) <N = `Name of Text'
(4) <A = `Author'
(5) <C = `Part of Corpus'
(6) <O = `Date of Original'
(7) <M = `Date of Manuscript'
(8) <K = `Contemporaneity'
(9) <D = `Dialect'
(10) <V = `Verse' or `Prose'
(11) <T = `Text Type'
(12) <G = `Relationship to Foreign Original'
(13) <F = `Foreign Original'
(14) <W = `Relationship to Spoken Language'
(15) <X = `Sex of Author'
(16) <Y = `Age of Author'
(17) <H = `Social Rank of Author'
(18) <U = `Audience Description'
(19) <E = `Participant Relationship'
(20) <J = `Interaction'
(21) <I = `Setting'
(22) <Z = `Prototypical Text Category'
(23) <S = `Sample'
(24) <P = `Page'
(25) <R = `Record'

4. Distribution Formats

4.1. Availability
4.2. Larger Units

5. Using Concordance Programs

5.1. Oxford Concordance Program
5.2. WordCruncher

5.3. Lexa

Notes to the Preface and Sections 1 to 5

PART TWO

SOURCE TEXTS

Introductory Note
1. Old English

2. Middle English

3. Early Modern English

PART THREE

ABBREVIATED TITLES

Introductory Note
Abbreviated Titles

Appendix 1.

Full List of `File Names' and `Text Identifiers'

Appendix 2.

Studies Using Evidence from the Diachronic Part of the Helsinki Corpus [OMITTED]

Appendix 3.

Lexa - Corpus Processing Software

PART ONE

INTRODUCTION

1. Overall Structure

The principles of compilation² of the Helsinki Corpus reflect the view that linguistic change should be approached through evidence based on synchronic variation inherent in the structure of the language studied.³ It follows that the researcher of variant expressions conveying one and (more or less) the same meaning welcomes access to a rich collection of texts, representative of various text types, levels of style and modes of expression, geographical and social varieties etc. The emergence of computer techniques has radically changed corpus-based study of language. Depending on the topic, hours of tedious work spent on the manual collecting of data can be cut down by using a computer for compiling and processing routines. The compilers of the Helsinki Corpus aim at facilitating access to versatile data for those interested in computer-based corpus linguistics and the structure and development of the English language.

1.1. Diachronic Part

The diachronic part of the Helsinki Corpus includes a basic selection of texts compiled from the Old, Middle and Early Modern (British) English periods, and a supplementary part focusing on regional varieties (Scots now available and early American English in preparation). Except for shorter texts given in toto, the length of the extracts varies from 2,000 to 10,000 words. At present the Old English section of the Corpus contains 413,300 words, the Middle English section 608,600 words and the Early Modern British English section 551,000 words, a total of 1,572,800 words (the figures exclude passages in foreign languages, and our own and the editor's comments).⁴ In the supplementary part the Scots section contains 870,000 words and the early American English section 300,000 words. In the basic selection, the focus of this guide, the different sub-periods are represented as follows:

Sub-period               Words        %

OLD ENGLISH
I    -850                2 190      0.5
II   850-950            92 050     22.3
III  950-1050          251 630     60.9
IV   1050-1150          67 380     16.3
Total                  413 250    100.0

MIDDLE ENGLISH
I    1150-1250         113 010     18.6
II   1250-1350          97 480     16.0
III  1350-1420         184 230     30.3
IV   1420-1500         213 850     35.1
Total                  608 570    100.0

EMODE, BRITISH
I    1500-1570         190 160     34.5
II   1570-1640         189 800     34.5
III  1640-1710         171 040     31.0
Total                  551 000    100.0
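The percentage column can be recomputed from the word counts. The following Python sketch does this for the sub-period figures given above; the dictionary layout and the function name `shares` are illustrative assumptions and are not part of the Corpus or its software.

```python
# Word counts per sub-period, copied from the sub-period figures above.
WORD_COUNTS = {
    "OLD ENGLISH": {"I": 2190, "II": 92050, "III": 251630, "IV": 67380},
    "MIDDLE ENGLISH": {"I": 113010, "II": 97480, "III": 184230, "IV": 213850},
    "EMODE, BRITISH": {"I": 190160, "II": 189800, "III": 171040},
}


def shares(counts):
    """Return (total, {sub-period: percentage rounded to one decimal})."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total, 1) for k, v in counts.items()}


for section, counts in WORD_COUNTS.items():
    total, percentages = shares(counts)
    print(section, total, percentages)
```

Plain rounding of each share may differ by 0.1 from a printed value, since printed percentages are sometimes adjusted so that a column sums to exactly 100.0.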

By and large, the selectional criteria adopted for including a text in the diachronic part of the Corpus reflect the principles of socio-historical variation analysis: the selection strives for a representative coverage of language written in a specific period. Periodization has been of primary importance (with regard to the date of the original text and the date of the manuscript), but attention has also been paid to geographical dialect, type and register of writing (text type, relationship to spoken language, setting on formal-informal axis) and sociolinguistic variation (different author-related parameters such as gender, age, social rank). For a more detailed discussion, cf. Rissanen et al. (1993, cf. Note 3). At the moment, no grammatical tagging is included, though texts are described by a system of parameter codings (cf. Section 3.3.).

As pointed out above, the Old English section of the Corpus is drawn from the text files of the Toronto Corpus.⁵ The material has been edited both by computer and manually to make it compatible, though not fully identical in all details, with the format adopted for the material keyed in at Helsinki. Mention will be made of the main discrepancies between the Old English section and the Middle and Early Modern English sections of the Corpus in the relevant parts of the guide.

A number of texts included in the Middle English section of the Corpus are based on computerized material obtained from the Oxford Text Archive. Both manual and automatic editing has been carried out to convert the material into the Helsinki Corpus format. The texts based on the Oxford Text Archive material include the extracts drawn from Katherine, Margarete, Juliane and Hali Meiðhad in the Katherine Group (MS Bodley), from Layamon's Brut (MS Caligula), Chaucer's Canterbury Tales, The Pricke of Conscience, and Cursor Mundi.

The texts keyed in at Helsinki are based on the best editions available. In most cases the use of original manuscripts has not been possible. All Middle and Early Modern English texts have been proofread twice and the Toronto-based Old English texts once. It is obvious, however, that a certain number of errors inevitably remains.

1.2. Dialectal Part

The material included in the dialectal part of the Helsinki Corpus is transcribed orthographically by the postgraduate students who made the recordings in the 1970's. The material, based on interviews of elderly male and female natives of small rural villages, is unique in the field of present-day dialectology, characterized as it is by an interest in urban varieties. As the speakers may show great variation in some areas of grammar even within the range of the same local dialect, the aim has been to include a substantial sample of material from each individual speaker rather than take a great number of shorter passages from several speakers. The following geographical regions are covered at present (the material collected from the West Midlands of England will be included in the near future):

Region                   Words

EAST ANGLIA            117,700
THE SOUTH-WEST         254,000
YORKSHIRE               16,000
CLARE (IRELAND)         18,900

Total                  406,600

The textual codes (in COCOA format, cf. Section 3.3.2.) introduce each sample of material, identifying the speaker and indicating, among other things, his or her age and occupation, the county, the village and the date of recording (<F = `file name'; <S = `speaker'; <G = `gender'; <A = `age'; <O = `occupation'; <C = `county'; <V = `village'; <D = `dialect'; <I = `interviewer'; <P = `page number').
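Reference lines of this kind can be read mechanically. The Python sketch below assumes the usual COCOA shape, that is, an angle-bracketed one-letter category followed by a value, one code per line. The sample lines and their values are invented for illustration and are not taken from the Corpus files; the function name is likewise hypothetical.

```python
import re

# Category letters for the dialectal part, as listed in the paragraph above.
DIALECT_CODES = {
    "F": "file name", "S": "speaker", "G": "gender", "A": "age",
    "O": "occupation", "C": "county", "V": "village", "D": "dialect",
    "I": "interviewer", "P": "page number",
}

# Assumed line shape: "<X value>", one code per line.
COCOA_LINE = re.compile(r"^<([A-Z])\s+(.*?)>\s*$")


def parse_header(lines):
    """Collect COCOA-style codes into a dict keyed by their description."""
    header = {}
    for line in lines:
        match = COCOA_LINE.match(line.strip())
        if match and match.group(1) in DIALECT_CODES:
            header[DIALECT_CODES[match.group(1)]] = match.group(2).strip()
    return header


# Hypothetical sample, for illustration only.
sample = ["<F EXAMPLE1>", "<S SPEAKER-01>", "<A 75>", "some transcribed text"]
print(parse_header(sample))
```

Lines that do not match the assumed shape, such as running text, are skipped rather than reported as errors.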

Tagging and parsing systems that will make it possible to add grammatical information to the data are being prepared.⁶ Experiments with the CLAWS1 tagging suite (originally devised for the LOB Corpus) have yielded promising results.⁷ Much interest lies in experiments aimed at linking the database with sound by way of an interface built on the Macintosh Hypercard and Mac-Recorder applications.

For the availability of the material included in the dialectal part, contact Matti Rissanen, cf. above.

The principles of organizing and coding the material included in the diachronic part will be discussed in the sections below.

2. Texts and File Names

The following short title list of texts and text file names presents the contents of the diachronic part of the Corpus in a condensed form. The list is organized to show how different texts and text files relate to sub-period and text type codings. The relevant word(s) locating an entry in Part Two of this guide are given in bold face (for the representation of non-ASCII characters, see below).

The list follows the order of sub-period sections. When dealing with texts which have different codes for the original and the manuscript versions, the date of the manuscript has been followed. Within the sub-periods, the texts are listed according to the text type code they have been ascribed.

The order of text types is, by and large, that given below. In most cases a file contains samples coded as representative of one text type only. Sometimes, however, it has been necessary to ascribe different text type codes to samples of text located in one and the same file. In the list below these files are indicated with asterisks (** = `the file contains several types of text'; * = `this extract can be found in the file with the same name marked by two asterisks').
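An entry of the list can be split into its title, file name and asterisk marker. The following Python sketch assumes that an entry ends with the file name in parentheses, optionally followed by one or two asterisks; the function and key names are illustrative only.

```python
import re

# Title, then "(FILENAME)", then an optional "*" or "**" marker.
ENTRY = re.compile(
    r"^(?P<title>.*\S)\s*\((?P<file>[A-Z0-9]+)\)(?P<stars>\*{0,2})\s*$"
)


def parse_entry(line):
    """Split a list entry into title, file name and asterisk flags.

    Returns None when the line does not end in a parenthesized file name.
    """
    match = ENTRY.match(line)
    if match is None:
        return None
    stars = match.group("stars")
    return {
        "title": match.group("title"),
        "file": match.group("file"),
        # "**": the file contains several types of text.
        "mixed_file": stars == "**",
        # "*": the extract is in the file of the same name marked "**".
        "cross_reference": stars == "*",
    }


print(parse_entry("VESPASIAN HOMILIES (cf. Philosophy) (CMVESHOM)**"))
print(parse_entry("CHAUCER, THE PARSON'S TALE (CMCTPROS)*"))
```

Because the title pattern is greedy, only the last parenthesized group is read as the file name, so titles that themselves contain parentheses are kept whole.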

TEXTS AND FILE NAMES:

PERIOD, TEXT TYPE AND TEXT (FILE NAME)

OLD ENGLISH

OE I ( -850)

Documents

DOCUMENTS 1 (HARMER; ROBERTSON; BIRCH) (CODOCU1)

Text type undefined (OE verse)

CAEDMON'S HYMN; BEDE'S DEATH SONG; THE RUTHWELL CROSS; THE LEIDEN RIDDLE (CONORTHU)

OE II (850-950)

Law

LAWS (ALFRED'S INTRODUCTION TO LAWS; ALFRED; INE) (COLAW2)

Documents

DOCUMENTS 2 (HARMER; ROBERTSON; SWEET-WHITELOCK) (CODOCU2)

Handbooks, medicine

LAECEBOC (COLAECE)

Philosophy

ALFRED'S BOETHIUS (COBOETH)

Religious treatises

ALFRED'S CURA PASTORALIS (COCURA)

Prefaces

ALFRED'S PREFACE TO CURA PASTORALIS (COPREFCP)

History

CHRONICLE MS A EARLY (COCHROA2)
BEDE'S ECCLESIASTICAL HISTORY (COBEDE)

OHTHERE AND WULFSTAN (MS L) (COOHTWU2)

ALFRED'S OROSIUS (COOROSIU)

Bible

THE VESPASIAN PSALTER (COVESPS)

Text type undefined (OE verse)

THE BATTLE OF BRUNANBURH (COBRUNAN)

OE III (950-1050)

Law

LAWS (ELEVENTH CENTURY) (COLAW3)

Documents

DOCUMENTS 3 (HARMER; ROBERTSON; WHITELOCK) (CODOCU3)

Handbooks, medicine

LACNUNGA (COLACNU)
QUADRUPEDIBUS (COQUADRU)

Science, astronomy

BYRHTFERTH'S MANUAL (COBYRHTF)
AELFRIC'S DE TEMPORIBUS ANNI (COTEMPO)

Homilies

WULFSTAN'S HOMILIES (O3) (COWULF3)
THE BLICKLING HOMILIES (COBLICK)

AELFRIC'S CATHOLIC HOMILIES (II) (COAELHOM)

AELFRIC'S HOMILIES (SUPPL. II)

Rules

THE BENEDICTINE RULE (COBENRUL)
THE DURHAM RITUAL (CODURHAM)

Religious treatises

AELFRIC'S FIRST AND SECOND LETTERS TO WULFSTAN; AELFRIC'S LETTER TO SIGEFYRTH (COAELET3)

Prefaces

AELFRIC'S PREFACE TO CATH. HOM. I; II; LIVES OF SAINTS; GRAMMAR (COAEPREF)

AELFRIC'S PREFACE TO GENESIS (COAEPREG)

History

CHRONICLE MS A LATE (O3) (COCHROA3)
OHTHERE AND WULFSTAN (MS G) (COOHTWU3)

Geography

MARVELS (COMARVEL)

Travelogue

ALEXANDER'S LETTER (COALEX)

Biography, lives

AELFRIC'S LIVES OF SAINTS (COAELIVE)
GREGORY THE GREAT, DIALOGUES (MS H) (COGREGD3)

MARTYROLOGY (COMARTYR)

Fiction

THE OLD ENGLISH APOLLONIUS OF TYRE (COAPOLLO)

Bible

THE OLD TESTAMENT (COOTEST)
THE PARIS PSALTER (COPARIPS)

WEST-SAXON GOSPELS (COWSGOSP)

LINDISFARNE GOSPELS (COLINDIS)

RUSHWORTH GOSPELS (CORUSHW)

Text type undefined (OE verse)

FATES OF APOSTLES; ELENE; JULIANA (COCYNEW)
GENESIS (COGENESI)

EXODUS (COEXODUS)

CHRIST (COCHRIST)

THE KENTISH HYMN, THE KENTISH PSALM (COKENTIS)

ANDREAS (COANDREA)

THE DREAM OF THE ROOD (CODREAM)

THE WANDERER; THE SEAFARER; WIDSITH; THE FORTUNES OF MEN; MAXIMS I; THE RIMING POEM; THE PANTHER; THE WHALE; THE PARTRIDGE; DEOR; WULF AND EADWACER; THE WIFE'S LAMENT (COEXETER)

BEOWULF (COBEOWUL)

RIDDLES (CORIDDLE)

THE METRICAL PSALMS OF THE PARIS PSALTER (COMETRPS)

PHOENIX (COPHOENI)

THE METERS OF BOETHIUS (COMETBOE)

OE IV (1050-1150)

Law

LAWS (LATE; WILLIAM) (COLAW4)

Documents

DOCUMENTS 4 (ROBERTSON; ROBERTSON, APPENDIX) (CODOCU4)

Handbooks, astronomy

PROGNOSTICATIONS (COPROGNO)

Philosophy

THE OLD ENGLISH DICTS OF CATO (CODICTS)

Homilies

WULFSTAN'S HOMILIES (O3/4) (COWULF4)
A HOMILY FOR THE SIXTH ... SUNDAY (COEPIHOM)

Rules

WULFSTAN'S `INSTITUTES OF POLITY' (COINSPOL)

Religious treatises

AELFRIC'S LETTER TO SIGEWEARD; WULFSIGE (COAELET4)
ADRIAN AND RITHEUS (COADRIAN)

SOLOMON AND SATURN (COSOLOMO)

AN OLD ENGLISH VISION OF LEOFRIC (COLEOFRI)

Prefaces

ALFRED'S PREFACE TO SOLILOQUIES (COPREFSO)

History

CHRONICLE MS E (O3/4); (O4) (COCHROE4)

Biography, lives

CHAD (COCHAD)
GREGORY THE GREAT, DIALOGUES (MS C) (COGREGD4)

A PASSION OF ST MARGARET (COMARGA)

MIDDLE ENGLISH

ME I (1150-1250)

Handbooks, medicine

PERI DIDAXEON (CMPERIDI)

Philosophy

VESPASIAN HOMILIES, NO. III (cf. Homilies) (CMVESHOM)*

Homilies

ORM, THE ORMULUM (CMORM)
TRINITY HOMILIES (CMTRINIT)

VESPASIAN HOMILIES (cf. Philosophy) (CMVESHOM)**

BODLEY HOMILIES (CMBODLEY)

LAMBETH HOMILIES (CMLAMBET)

SAWLES WARDE (CMSAWLES)

Religious treatises

HISTORY OF THE HOLY ROOD-TREE (CMROOD)
ANCRENE WISSE (CMANCRE)

HALI MEIDHAD (CMHALI)

VICES AND VIRTUES (CMVICES1)

History

THE PETERBOROUGH CHRONICLE (CMPETERB)
LAYAMON (CMBRUT1)

Biography, lives

KATHERINE (CMKATHE)
MARGARETE (CMMARGA)

JULIANE (CMJULIA)

ME II (1250-1350)

Documents

THE PROCLAMATION OF HENRY III (CMDOCU2)

Homilies

KENTISH SERMONS (CMKENTSE)

Religious treatises

DAN MICHEL, AYENBITE OF INWYT (CMAYENBI)
A BESTIARY (CMBESTIA)

History

ROBERT OF GLOUCESTER (CMROBGLO)
HISTORICAL POEMS (in MS Harley 2253) (CMPOEMH)

Biography, lives

THE LIFE OF ST. EDMUND (THE EARLY SOUTH-ENGLISH LEGENDARY) (CMSELEG)

Fiction

MAN IN THE MOON (CMMOON)
DAME SIRITH; INTERLUDE (CMSIRITH)

THE FOX AND WOLF IN THE WELL (CMFOXWO)

THE THRUSH AND THE NIGHTINGALE (CMTHRUSH)

Romances

THE ROMANCE OF SIR BEUES OF HAMTOUN (CMBEVIS)
KYNG ALISAUNDER (CMALISAU)

HAVELOK (CMHAVELO)

KING HORN (CMHORN)

Bible

THE EARLIEST COMPLETE ENGLISH PROSE PSALTER (CMEARLPS)

Text type undefined (ME verse)

SONG OF THE HUSBANDMAN; SATIRE ON THE CONSISTORY COURTS; SATIRE ON THE RETINUES (in MS Harley 2253) (CMPOEMS)

ME III (1350-1420)

Documents

USK, APPEAL(S); PETITIONS (M3); RETURNS; JUDGEMENTS; TESTAMENTS AND WILLS; PROCLAMATIONS (CMDOCU3)

Handbooks, astronomy

CHAUCER, A TREATISE ON THE ASTROLABE (CMASTRO)
THE EQUATORIE OF THE PLANETIS (CMEQUATO)

Handbooks, medicine

A LATE MIDDLE ENGLISH TREATISE ON HORSES (CMHORSES)

Science, medicine

A LATIN TECHNICAL PHLEBOTOMY (CMPHLEBO)

Philosophy

CHAUCER, BOETHIUS (CMBOETH)
Idem, THE TALE OF MELIBEE (cf. Fiction) (CMCTPROS)*

Homilies

THE NORTHERN HOMILY CYCLE (THE EXPANDED VERSION) (CMNORHOM)

Sermons

ENGLISH WYCLIFFITE SERMONS (CMWYCSER)

Rules

THE BENEDICTINE RULE (CMBENRUL)
AELRED OF RIEVAULX'S DE INSTITUTIONE INCLUSARUM (MS VERNON) (CMAELR3)

Religious treatises

PURVEY, THE PROLOGUE TO THE BIBLE (CMPURVEY)
THE CLOUD OF UNKNOWING (CMCLOUD)

MANNYNG, ROBERT OF BRUNNE'S "HANDLYNG SYNNE" (CMHANSYN)

THE PRICKE OF CONSCIENCE (CMPRICK)

CHAUCER, THE PARSON'S TALE (CMCTPROS)*

History

CURSOR MUNDI (CMCURSOR)
THE BRUT OR THE CHRONICLES OF ENGLAND (CMBRUT3)

TREVISA, POLYCHRONICON (CMPOLYCH)

Travelogue

MANDEVILLE'S TRAVELS (CMMANDEV)

Fiction

CHAUCER, THE GENERAL PROLOGUE TO THE CANTERBURY TALES; THE WIFE OF BATH'S PROLOGUE; THE SUMMONER'S TALE; THE MERCHANT'S TALE (CMCTVERS)
CHAUCER, THE TALE OF MELIBEE (cf. Philosophy) (CMCTPROS)**

GOWER, CONFESSIO AMANTIS (CMGOWER)

Letters, non-private

HENRY V, LETTERS (AN ANTHOLOGY; A BOOK OF LONDON ENGLISH); LETTER(S), LONDON (CMOFFIC3)

Bible

THE OLD TESTAMENT (WYCLIFFE) (CMOTEST)
THE NEW TESTAMENT (WYCLIFFE) (CMNTEST)

ME IV (1420-1500)

Law

STATUTES (II) (CMLAW)

Documents

INDENTURE, PETITIONS (M4); SHILLINGFORD (DOCUMENT(S)) (cf. Proceedings, depositions) (CMDOCU4)**

Handbooks, medicine

THE `LIBER DE DIVERSIS MEDICINIS' IN THE THORNTON MS (CMTHORN)

Handbooks, other

REYNES, THE COMMONPLACE BOOK (CMREYNES)
METHAM, PHYSIOGNOMY (CMMETHAM)

Handbooks, astronomy

METHAM, DAYS OF THE MOON (CMMETHAM)

Science, medicine

THE CYRURGIE OF GUY DE CHAULIAC (CMCHAULI)

Sermons

MIDDLE ENGLISH SERMONS ... MS. ROYAL (CMROYAL)
CAPGRAVE, CAPGRAVE'S SERMON (CMCAPSER)

MIRK, MIRK'S FESTIAL (CMMIRK)

GAYTRYGE, DAN JON GAYTRYGE'S SERMON (CMGAYTRY)

IN DIE INNOCENCIUM (CMINNOCE)

FITZJAMES, SERMO DIE LUNE (CMFITZJA)

Rules

AELRED OF RIEVAULX'S DE INSTITUTIONE INCLUSARUM (MS BODLEY 423) (CMAELR4)

Religious treatises

THE BOOK OF VICES AND VIRTUES (CMVICES4)
KEMPE, THE BOOK OF MARGERY KEMPE (CMKEMPE)

JULIAN OF NORWICH, ... REVELATIONS OF DIVINE LOVE (CMJULNOR)

HILTON, ... EIGHT CHAPTERS ON PERFECTION (CMHILTON)

ROLLE, THE BEE AND THE STORK (CMROLLBE)

Idem, PROSE TREATISES (CMROLLTR)

Idem, THE PSALTER OR PSALMS OF DAVID (cf. Bible) (CMROLLPS)*

Prefaces

CAXTON, THE PROLOGUES AND EPILOGUES (CMCAXPRO)

Proceedings, depositions

DEPOSITIONS (cf. Documents) (CMDOCU4)*

History

CAPGRAVE, ... ABBREUIACION OF CRONICLES (CMCAPCHR)
GREGORY, THE HISTORICAL COLLECTIONS OF A CITIZEN OF LONDON (CMGREGOR)

Biography, lives

THE LIFE OF ST. EDMUND (MIDDLE ENGLISH RELIGIOUS PROSE) (CMEDMUND)

Fiction

CAXTON, THE HISTORY OF REYNARD THE FOX (CMREYNAR)

Romances

MALORY, MORTE DARTHUR (CMMALORY)
THE SIEGE OF JERUSALEM IN PROSE (CMSIEGE)

Drama, mystery plays

LUDUS COVENTRIAE (CMLUDUS)
MANKIND (CMMANKIN)

THE WAKEFIELD PAGEANTS IN THE TOWNELEY CYCLE (CMTOWNEL)

THE YORK PLAYS (CMYORK)

DIGBY PLAYS (CMDIGBY)

Letters, private

SHILLINGFORD (LETTERS); PASTON (CLEMENT; MARGARET; JOHN); MULL; STONOR; BETSON; CELY (GEORGE; RICHARD (THE YOUNGER)) (CMPRIV)

Letters, non-private

PASTON, WILLIAM (CMOFFIC4)

Bible

ROLLE, THE PSALTER OR PSALMS OF DAVID (cf. Religious treatises) (CMROLLPS)**

EARLY MODERN ENGLISH

EModE I (1500-1570)

Law

STATUTES (III) (CELAW1)

Handbooks, other

FITZHERBERT, THE BOOK OF HUSBANDRY (CEHAND1A)
TURNER, A NEW BOKE OF ... ALL WINES (CEHAND1B)

Science, medicine

VICARY, THE ANATOMIE OF THE BODIE OF MAN (CESCIE1A)

Science, other

RECORD, THE PATH-WAY ... OF GEOMETRIE (CESCIE1B)

Educational treatises

ELYOT, THE BOKE NAMED THE GOUERNOUR (CEEDUC1A)
ASCHAM, THE SCHOLEMASTER (CEEDUC1B)

Philosophy

COLVILLE, BOETHIUS (CEBOETH1)

Sermons

FISHER, SERMONS BY JOHN FISHER (CESERM1A)
LATIMER, SERMON ON THE PLOUGHERS; SEVEN SERMONS BEFORE EDWARD VI (CESERM1B)

Proceedings, trials

THE TRIAL OF SIR NICHOLAS THROCKMORTON (CETRI1)

History

MORE, THE HISTORY OF KING RICHARD III (CEHIST1A)
FABYAN, THE NEW CHRONICLES OF ENGLAND (CEHIST1B)

Travelogue

LELAND, THE ITINERARY OF JOHN LELAND (CETRAV1A)
TORKINGTON, YE OLDEST DIARIE (CETRAV1B)

Diaries

MACHYN, THE DIARY OF HENRY MACHYN (CEDIAR1A)
EDWARD VI, THE DIARY OF EDWARD VI (CEDIAR1B)

Biography, autobiography

MOWNTAYNE, THE AUTOBIOGRAPHY (CEAUTO1)

Biography, other

ROPER, WILLIAM, THE LYFE OF SIR THOMAS MOORE (CEBIO1)

Fiction

A HUNDRED MERY TALYS (CEFICT1A)
HARMAN, A CAVEAT ... FOR COMMEN CURSETORS (CEFICT1B)

Drama, comedies

UDALL, ROISTER DOISTER (CEPLAY1A)
STEVENSON (?), GAMMER GVRTONS NEDLE (CEPLAY1B)

Letters, private

BEAUMONT; PLUMPTON (AGNES; ISABEL; WILLIAM; DOROTHY; ROBERT); MORE (LETTER(S), THE CORRESPONDENCE); ROPER (MARGARET); CROMWELL (GREGORY); CUMBERLAND; SCROPE (CEPRIV1)

Letters, non-private

HOWARD; TUNSTALL; A LETTER BY THE LORDS; WOLSEY; HENRY VIII; BEDYLL; CROMWELL (THOMAS); MORE (LETTER(S), ORIGINAL LETTERS) (CEOFFIC1)

Bible

THE OLD TESTAMENT (TYNDALE) (CEOTEST1)
THE NEW TESTAMENT (TYNDALE) (CENTEST1)

EModE II (1570-1640)

Law

STATUTES (IV) (CELAW2)

Handbooks, other

GIFFORD, A DIALOGUE CONCERNING WITCHES (CEHAND2A)
MARKHAM, COUNTREY CONTENTMENTS (CEHAND2B)

Science, medicine

CLOWES, TREATISE FOR THE ARTIFICIALL CURE OF STRUMA (CESCIE2A)

Science, other

BLUNDEVILE, A BRIEFE DESCRIPTION OF THE TABLES ... LINES SECANT (CESCIE2B)

Educational treatises

BRINSLEY, LUDUS LITERARIUS OR THE GRAMMAR SCHOOLE (CEEDUC2A)
BACON, THE TWOO BOOKES ... ADVANCEMENT OF LEARNING (CEEDUC2B)

Philosophy

ELIZABETH I, BOETHIUS (CEBOETH2)

Sermons

HOOKER, TWO SERMONS UPON PART OF S. JUDES EPISTLE (CESERM2A)
SMITH, TWO SERMONS ON "OF USURIE" (CESERM2B)

Proceedings, trials

THE TRIAL OF THE EARL OF ESSEX (CETRI2A)
THE TRIAL OF SIR WALTER RALEIGH (CETRI2B)

History

STOW, THE CHRONICLES OF ENGLAND (CEHIST2A)
HAYWARD, ANNALS OF THE FIRST FOUR YEARS... (CEHIST2B)

Travelogue

TAYLOR (JOHN), THE PENNYLES PILGRIMAGE (CETRAV2A)
COVERTE, A TRVE AND ALMOST INCREDIBLE REPORT OF AN ENGLISHMAN (CETRAV2B)

Diaries

MADOX, AN ELIZABETHAN IN 1582: THE DIARY ... (CEDIAR2A)
HOBY, DIARY OF LADY MARGARET HOBY (CEDIAR2B)

Biography, autobiography

FORMAN, THE AUTOBIOGRAPHY (CEAUTO2)

Biography, other

PERROTT (?), THE HISTORY OF THAT MOST EMINENT STATESMAN, SIR JOHN PERROTT (CEBIO2)

Fiction

ARMIN, A NEST OF NINNIES (CEFICT2A)
DELONEY, JACK OF NEWBURY (CEFICT2B)

Drama, comedies

SHAKESPEARE, THE MERRY WIVES OF WINDSOR (CEPLAY2A)
MIDDLETON, A CHASTE MAID IN CHEAPSIDE (CEPLAY2B)

Letters, private

KNYVETT; HARLEY; PASTON (WILLIAM; KATHERINE); FERRAR (NICHOLAS; RICHARD); BARRINGTON (JOHN); MASHAM; BARRINGTON (THOMAS); EVERARD; PROUD; PETTIT; OXINDEN (RICHARD; KATHERINE); PEYTON; GAWDY (CEPRIV2)

Letters, non-private

CECIL (ROBERT); EDMONDES; ELIZABETH I (LETTERS); CECIL (WILLIAM); A LETTER BY THE FELLOWS OF TRINITY COLLEGE; CONWAY (CEOFFIC2)

Bible

THE OLD TESTAMENT (AUTHORIZED VERSION) (CEOTEST2)
THE NEW TESTAMENT (AUTHORIZED VERSION) (CENTEST2)

EModE III (1640-1710)

Law

STATUTES (VII) (CELAW3)

Handbooks, other

WALTON, THE COMPLEAT ANGLER (CEHAND3A)
LANGFORD, PLAIN AND FULL INSTRUCTIONS TO RAISE ALL SORTS OF FRUIT-TREES (CEHAND3B)

Science, other

HOOKE, MICROGRAPHIA (CESCIE3A)
BOYLE, ELECTRICITY & MAGNETISM (CESCIE3B)

Educational treatises

LOCKE, DIRECTIONS CONCERNING EDUCATION (CEEDUC3A)
HOOLE, A NEW DISCOVERY OF THE OLD ART OF TEACHING SCHOOLE (CEEDUC3B)

Philosophy

PRESTON, BOETHIUS (CEBOETH3)

Sermons

TILLOTSON, SERMONS (CESERM3A)
TAYLOR (JEREMY), THE MARRIAGE RING (CESERM3B)

Proceedings, trials

THE TRIAL OF TITUS OATES (CETRI3A)
THE TRIAL OF LADY ALICE LISLE (CETRI3B)

History

BURNET, ... HISTORY OF MY OWN TIME (CEHIST3A)
MILTON, THE HISTORY OF BRITAIN (CEHIST3B)

Travelogue

FIENNES, THE JOURNEYS OF CELIA FIENNES (CETRAV3A)
FRYER, A NEW ACCOUNT OF EAST INDIA (CETRAV3B)

Diaries

PEPYS, THE DIARY OF SAMUEL PEPYS (CEDIAR3A)
EVELYN, THE DIARY OF JOHN EVELYN (CEDIAR3B)

Biography, autobiography

FOX, THE JOURNAL OF GEORGE FOX (CEAUTO3)

Biography, other

BURNET, SOME PASSAGES OF THE LIFE AND DEATH OF ... EARL OF ROCHESTER (CEBIO3)

Fiction

PENNY MERRIMENTS (CEFICT3A)
BEHN, OROONOKO (CEFICT3B)

Drama, comedies

VANBRUGH, THE RELAPSE (CEPLAY3A)
FARQUHAR, THE BEAUX STRATAGEM (CEPLAY3B)

Letters, private

HADDOCK (RICHARD, SR; RICHARD, JR; NICHOLAS); STRYPE; OXINDEN (HENRY; ELIZABETH); HATTON (CHARLES; FRANCES; ALICE; ANNE; ELIZABETH); PINNEY (JANE; JOHN); HENRY (PHILIP) (CEPRIV3)

Letters, non-private

SOMERS; SPENCER; A LETTER BY THE PRIVY COUNCIL; CAPEL; CHARLES II; OSBORNE; AUNGIER; A LETTER BY THE COMMISSIONERS (CEOFFIC3)

3. Coding

3.1. Character Set

A number of conventions have been introduced for coding special characters, typographical practices, editorial comments and so forth in the computerized version. The set of conventions used is introduced below.

Main Coding Key

The coding system is based on the set of ASCII codes (96 printable characters). In the list below the coding symbol is followed by the coded symbol.⁸ The character "=" stands for "represents". The use and functions of the codes are discussed in more detail in Section 3.2.

I CHARACTERS USED

A. Alphanumeric characters

The following characters represent themselves:

A = A
B = B
C = C
Etc.

a = a
b = b
c = c

0 = 0
1 = 1
2 = 2

B. Non-alphanumeric characters

B.1. The following characters represent themselves:

Dec.  Hex.  Char.  Description

34    22    "      = quotation marks (double quote)
39    27    '      = apostrophe (single quote)
40    28    (      = opening parenthesis (cf. B.2.1., below)
41    29    )      = closing parenthesis (cf. B.2.1., below)
45    2D    -      = hyphen or minus (within or outside a word)
33    21    !      = exclamation point
44    2C    ,      = comma
46    2E    .      = period or decimal point
47    2F    /      = slash (cf. `reference codes', 3.3.2.; `punctuation', 3.2. (1c))

58    3A    :      = colon
59    3B    ;      = semicolon
63    3F    ?      = question mark

Abbreviations: Dec. = decimal code; Hex. = hexadecimal code;
Char. = character.

B.2. The following characters are used for coding purposes:

B.2.1. Characters used for coding `text levels':

Dec.  Hex.  Char.  Description

40    28    (      = opening parenthesis (cf. B.1., above)
41    29    )      = closing parenthesis (cf. B.1., above)
91    5B    [      = opening bracket
93    5D    ]      = closing bracket
123   7B    {      = opening brace
125   7D    }      = closing brace
92    5C    \      = back slash
94    5E    ^      = circumflex

`Text levels':

(^.....^) = `font other than the basic font'

(\.....\) = `foreign language'

(}.....}) = `runes'

[{.....{] = `emendation'

[\.....\] = `editor's comment'

[^.....^] = `our comment'

[}.....}] = `heading'

for `multiple text level' codes, see below
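
The paired codes listed above lend themselves to mechanical processing. The following is a minimal sketch in Python and is not part of the Corpus distribution; the function name text_levels and the way spans are reported are illustrative assumptions. It scans a string for the two-character opening and closing codes and returns each coded span together with its `text level'.

```python
# Illustrative sketch only: the names below are assumptions, not Corpus tools.
OPENING = {
    "(^": "font other than the basic font",
    "(\\": "foreign language",
    "(}": "runes",
    "[{": "emendation",
    "[\\": "editor's comment",
    "[^": "our comment",
    "[}": "heading",
}
# Each closing code mirrors its opening code, e.g. "(^" is closed by "^)".
CLOSING = {code[1] + (")" if code[0] == "(" else "]"): code for code in OPENING}


def text_levels(text):
    """Return (level, inner text) pairs for every coded span, innermost first."""
    spans, stack, i = [], [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in OPENING:
            # Remember which code was opened and where its content starts.
            stack.append((pair, i + 2))
            i += 2
        elif pair in CLOSING and stack and stack[-1][0] == CLOSING[pair]:
            opener, start = stack.pop()
            spans.append((OPENING[opener], text[start:i]))
            i += 2
        else:
            i += 1
    return spans
```

Because an inner span closes before the span that contains it, embedded codes are reported before their enclosing code.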

B.2.2. Characters other than those used for coding `text levels':

Dec.  Hex.  Char.  Description

38    26    &      = ampersand (= `and', `ond', `et')
43    2B    +      = plus (= `ash', `eth' etc.)
60    3C    <      = less than (= `reference codes', cf. 3.3.2.)
62    3E    >      = greater than (= `reference codes', cf. 3.3.2.)
61    3D    =      = equals (= `superscript')
96    60    `      = grave accent (= `accent')
126   7E    ~      = tilde (= `abbreviation')

`Ash', `eth', `yogh', `thorn' etc.:

+A   = u.c. ash
+D   = u.c. eth
+G   = u.c. yogh
+T   = u.c. thorn
+TT  = u.c. crossed thorn
+Tt  = u.c. crossed thorn
+L   = (£) pound sign

+a   = l.c. ash
+d   = l.c. eth
+g   = l.c. yogh
+t   = l.c. thorn
+tt  = l.c. crossed thorn

+e   = l.c. e caudata

For the Lexa Font Module, which enables the display of the original Old and Middle English characters, see Appendix 3.
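
For users who prefer to view the compound characters as single characters, the conversion can be sketched in a few lines of Python. This is an illustration only and is unrelated to the Lexa Font Module. The function name decode_compounds and the choice of target characters are assumptions, in particular the use of thorn with stroke for `crossed thorn' and of e with ogonek for `e caudata'.

```python
import re

# Assumed mapping from Helsinki Corpus compound characters to single characters.
COMPOUND = {
    "+TT": "\uA764", "+Tt": "\uA764",   # u.c. crossed thorn (thorn with stroke)
    "+tt": "\uA765",                    # l.c. crossed thorn
    "+A": "\u00C6", "+a": "\u00E6",     # ash
    "+D": "\u00D0", "+d": "\u00F0",     # eth
    "+G": "\u021C", "+g": "\u021D",     # yogh
    "+T": "\u00DE", "+t": "\u00FE",     # thorn
    "+L": "\u00A3",                     # pound sign
    "+e": "\u0119",                     # e caudata (assumed: e with ogonek)
}
# Three-character codes are tried first so that "+tt" is not read as "+t" + "t".
PATTERN = re.compile(r"\+(?:TT|Tt|tt|[AaDdGgTtLe])")


def decode_compounds(text):
    """Replace every compound character in text by its single-character form."""
    return PATTERN.sub(lambda match: COMPOUND[match.group(0)], text)
```

Note that the sketch always reads the string +tt as `crossed thorn'; a thorn followed by the letter t would need separate handling.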

II CHARACTERS NOT USED

The following characters are not used:⁹

Dec.  Hex.  Char.  Description

36    24    $      = dollar sign
37    25    %      = percent sign
42    2A    *      = asterisk
64    40    @      = commercial at
95    5F    _      = underline
124   7C    |      = vertical line

3.2. Main Coding Principles

Preliminary Remarks

When coding the material, the main aim has been to key in as much as possible of the source text as reliably and consistently as possible. When editorial interference has proved inevitable, the changes follow the format below, or are indicated by `our comments' in the computerized text. Obvious instances of ill-formed characters in the source text (e.g. a character produced upside down) have been silently corrected. In this section the following points will be taken up in more detail (for textual parameters, or `reference codes', cf. Section 3.3.):

1. GENERAL FORMAT

(1a) Lines
(1b) Paragraphs

(1c) Word-boundaries and punctuation

(1d) Hyphens

(1e) `Ash', `eth', `thorn' etc.

(1f) Superscript (=)

(1g) Abbreviations (~)

(1h) Accents (`)

(1i) Material excluded

2. `TEXT LEVEL' CODES

(2a) `Font other than the basic font' (^...^)
(2b) `Foreign language' (\...\)

(2c) `Runes' (}...})

(2d) `Emendation' [{...{] and `editor's comment' [\...\]

(2e) `Heading' [}...}]

(2f) `Our comment' [^...^]

(2g) `Multiple text level' codes

1. GENERAL FORMAT

(1a) Lines

In the Middle and Early Modern English sections of the Corpus the main line division of the source text has been preserved (i.e. a new line in the source text begins a new line in the computerized version). However, to leave some space for further coding (on an 80-character line), the maximum length of the line is no more than 65 characters in the computerized version: column 64 (when the left margin is set on 1) is reserved for a space, and column 65 for the `line continues' character (#). Lines longer than 65 characters in the source text have been continued by placing the `line continues' character in column 65 and keying in the remaining words on the next line.
In the Old English section of the Corpus the line divisions do not follow those of the source texts. Instead, the material based on the Toronto Corpus (with lines longer than 80 characters in the format delivered) has been converted into the Helsinki Corpus format by editing the line length accordingly.
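
To recover the line division of the source text, lines split by the `line continues' character can be joined again. The following Python sketch is an illustration under stated assumptions: the function name join_continued is hypothetical, and a line is treated as continued whenever it ends in a space followed by #, without checking the column position.

```python
def join_continued(lines):
    """Merge each line ending in ' #' with the line(s) that continue it."""
    joined, buffer = [], ""
    for raw in lines:
        line = raw.rstrip()
        # Continuation lines lose their leading spaces when appended.
        piece = line.lstrip() if buffer else line
        if piece.endswith(" #"):
            # Drop the marker and keep a single space before the next piece.
            buffer += piece[:-2].rstrip() + " "
        else:
            joined.append(buffer + piece)
            buffer = ""
    if buffer:
        # The input ended on a continued line; keep what has been collected.
        joined.append(buffer.rstrip())
    return joined
```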

(1b) Paragraphs

A new paragraph begins on the fourth space (i.e. indented by three spaces). Strings of spaces in the original (marking a gap in the manuscript, or introducing a heading, for instance) have been ignored by moving the text to begin from the left margin. Paragraph-initial capitalized words (or capitalized words in comparable positions) have been standardized by capitalizing the first character only. Paragraph signs (¶) have been omitted (cf. item (1i)).
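
The indentation rule makes it possible to group lines into paragraphs automatically. The Python sketch below is an illustration only; the function name split_paragraphs is hypothetical, and the sketch assumes that any line beginning with three spaces opens a new paragraph and that blank lines can be skipped.

```python
def split_paragraphs(lines):
    """Group lines into paragraphs; a three-space indent opens a new one."""
    paragraphs = []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no text
        if line.startswith("   ") or not paragraphs:
            paragraphs.append([line.strip()])
        else:
            paragraphs[-1].append(line.strip())
    return paragraphs
```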

(1c) Word-boundaries and punctuation

A space (or, inadvertently, two or more spaces) indicates a typographic word-boundary in the source text. Extra spaces between words in the original have been omitted. As the main aim has been to prepare the word form for subsequent indexing and concordance work, it has sometimes been necessary to separate two words typed as one word in the source text or, conversely, join parts of words separated in the original, e.g.:

Ex. 1.
Source text:

- - Ah nis nawt

bi þeos iseid. þ ha forrotieð þrin.' 3ef ha hare wed
lac lahe-liche haldeð. Ah þe ilke sari wrecches þe
iþe fule wurðinge. vnwedde waleweð.' beoð þe deof
les eaueres. þ rit ham & spureð ham to don al þ he
wule. þeos walewið iwurdinge & forrotieð þrin.
(Hali Meiðhad, in The Katherine Group. Edited from
Ms. Bodley 34, ed. S. T. R. O. d'Ardenne, Paris, 1977, p. 137).

Helsinki Corpus:

- - Ah nis nawt

bi +teos iseid. +tt ha forrotie+d +trin; +gef ha hare wedlac

lahe-liche halde+d. Ah +te ilke sari wrecches +te
i +te fule wur+dinge. vnwedde walewe+d, beo+d +te deofles
eaueres. +tt rit ham & spure+d ham to don al +tt he
wule. +teos walewi+d i wurdinge & forrotie+d +trin.

The punctuation of the source text has been retained as far as possible. The main punctuation marks are as follows:

! = exclamation point
, = comma
. = period or decimal point
/ = slash
: = colon
; = semicolon
? = question mark
- = hyphen or minus (also within the word-boundaries, cf. (1d), below)

Line-initial periods (.) are preceded by a space. The slash and double slash (//) may occur as clause separators or in fractions (`1/2' for `½', etc.). For the sentential use of hyphens, cf. (1d), below; for punctuation and `text level' codes, cf. 2. `TEXT LEVEL' CODES, Preliminary Remarks, below.

(1d) Hyphens

Used within words
Line-end hyphens in the source text, when regarded as additional to the normal spelling of the word, have been deleted. If space allows, the rest of the word is moved to the same line as the beginning of the word; if there is no space on the line or if the continuation of the word is the only item on the line, the whole word is moved to the next line (and the `line continues' character is placed at the end of the line). Hyphens have been preserved when deemed essential to the form of the word (e.g. in compounds); where doubt remains, the line-end hyphen has been retained or deleted according to the judgment of the coder.

Used outside words

Hyphens or dashes used outside the word-boundaries in the source text have been keyed in as hyphens, preceded and followed by a space (this type of hyphen may occur in a line-end position). Cf., for instance:

Ex. 2.
& tuss he toc forr+trihht anan
To m+alenn wi+t+t +te Laferrd;
Ma+g+gstre, - we witenn sikerrli+g
+Tatt tu +turrh Godess wille
& all o Godess hallfe arrt sennd
Larfaderr her to manne;
(The Ormulum (I-II), ed. R. Holt, Oxford, 1878, p. II, 225).

(1e) `Ash', `eth', `thorn' etc.

A set of compound characters has been introduced to key in the u.c. and l.c. `ash' (+A / +a), `eth' (+D / +d), `yogh' (+G / +g), `thorn' (+T / +t), `crossed thorn' (+Tt, +TT / +tt), and `e caudata' (+e) into the computerized version. `Yogh' does not occur in the Old English section of the Corpus, as this character is replaced by the letters g and G in the Toronto Corpus. The only compound characters to occur in the Early Modern English section of the Corpus are u.c. +A for `ash' and +L for the `pound sterling sign'.
For the Lexa Font Module, which enables the display of the original Old and Middle English characters, see Appendix 3.

(1f) Superscript (=)

Owing to possible syncretisms (e.g., or / o^r for `or' / `our'), letters printed in superscript in the source text have been indicated as in, e.g., y=t= for y^t, yo=r= for yo^r, Ma=tie= for Ma^tie, int=r=cess=rs= for int^rcess^rs, =xx=llll for llll.
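
The superscript coding can be turned back into the caret notation used in the explanation above. The Python sketch below is an illustration only; the function name show_superscripts is hypothetical, and the sketch assumes that every pair of equals signs within a word encloses one superscript string.

```python
import re

# One superscript run: "=", then characters other than "=" or space, then "=".
SUPERSCRIPT = re.compile(r"=([^=\s]+)=")


def show_superscripts(token):
    """Rewrite =...= superscript coding with a caret before the raised letters."""
    return SUPERSCRIPT.sub(r"^\1", token)
```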

(1g) Abbreviations (~)

Abbreviations indicated by a tilde or dash above the letter(s), by a letter with a flourish, or (in rare cases) by an apostrophe in the source text, have been coded with the letter followed by a tilde (~), as in, e.g., fro~ for fro, p~voked for pvoked, co~mau~dyd for comaudyd, s=r~=prised for s^r'prised, Cobhm~ for Cobhm, Cobh~m for Cobh'm etc.

(1h) Accents (`)

Accents of various types, when functional in the spelling of the word, have been coded by placing the grave accent (`) after the accented letter as in, e.g., cite` for cité, charite` for charité, me`me for même, tho` for thô etc. Other accents have been ignored, as in, e.g., swá, þám, pút etc.
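
For indexing it may be useful to reduce a coded word to its bare letters while recording which codes it contained. The Python sketch below is an illustration only: the function name strip_word_codes is hypothetical, and the sketch assumes that the characters =, ~ and ` occur in a word only as `superscript', `abbreviation' and `accent' codes. The accent type itself cannot be recovered, since all functional accents are coded alike.

```python
import re

CODE_KINDS = {"=": "superscript", "~": "abbreviation", "`": "accent"}


def strip_word_codes(token):
    """Return the token without =, ~ and ` plus the kinds of code it carried."""
    kinds = {name for char, name in CODE_KINDS.items() if char in token}
    bare = re.sub(r"[=~`]", "", token)
    return bare, kinds
```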

(1i) Material excluded

Extra-textual material in the margins or in the text (titles, tables, diagrams, pictures, signs of the zodiac etc.) has been excluded when not relevant to the main line of the running text; explanatory comments on omissions have been added when deemed necessary.

Paragraph signs (¶) have been omitted.

Folio numbers have been omitted, as well as the characters (e.g. / and |) frequently used for introducing a new folio.

Lists of names, longer extracts of foreign language, and verse in a prose text have been omitted.

2. `TEXT LEVEL' CODES

Preliminary Remarks

As editorial and typographical conventions vary in different source texts (e.g. emendations can be indicated by italics, parentheses, brackets etc.), a number of `text level' codes have been used to transfer the function of the convention to the computerized version, irrespective of the particular format followed in the source text.
The opening and closing parentheses or brackets are keyed in next to the following or preceding word. `Emendation' codes may occur within the word (e.g. r+a[{d{]e[{+d{] for ræ[d]e[ð]; w[{ha{]-swa for w[ha]-swa), but otherwise a space has been keyed in to separate the character (punctuation marks included) following the closing parenthesis or bracket, e.g.:

Ex. 3.
+Durh +dessere senne ic, un+gesali saule, fel in to an #
o+der senne,
+de is icleped (\propria voluntas\) , +tat is, au+gen-wille. #
(Vices and Virtues, Part I, ed. F. Holthausen (E.E.T.S., O.S. 89), London, 1888, p. 13).

Similarly, `multiple text level' codes are separated by a space, cf. (2g), below. If the whole clause or sentence is coded as one of the `text level' types, the end-clause or end-sentence punctuation usually precedes the closing code as in, e.g.:

Ex. 4.
- - So as saynt Paule maketh many hedes sayenge.
(\Caput mulieris vir. caput viri christus. christi
vero deus.\) Se here be thre heedes vnto a woman. god,
chryst, & hyr husbande. - -¹⁰
(The English Works of John Fisher, Bishop of Rochester, Part I, ed. J. E. B. Mayor (E.E.T.S., E.S. 27), London, 1935 (1876), p. 321).

(2a) `Font other than the basic font' (^.....^)

In the Middle and Early Modern English sections of the Corpus typographical shifts (italics, bold face, gothic etc.) in the source text have been distinguished from the basic font. If the basic font has been italics (bold face, gothic etc.), fonts other than the basic font have been coded apart. Font changes coinciding with `foreign language' have not been coded, except in the instances of stage directions and names of characters found in drama, see (2g), A.1., below. For italicized emendations and expansions, which occur repeatedly through the text, see (2d), below.

(2b) `Foreign language' (\.....\)

Coding language other than English separately from the running text has been deemed useful, as coded material can be excluded from subsequent word indexes, word lists and concordances. An effort has been made to distinguish between English and foreign-language material, but a certain amount of inconsistency has been unavoidable. Instances of loan words (e.g. technical and professional terms) have occasionally been somewhat problematic, particularly in pioneering texts dating from the periods of great influx of foreign elements into English.
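
Excluding coded foreign-language material before indexing can be sketched as follows. The Python code is an illustration only; the function name english_tokens is hypothetical, and the sketch assumes that `foreign language' spans are not embedded in one another and that tokens are separated by spaces.

```python
import re

# A `foreign language' span runs from "(\" to the next "\)", possibly across lines.
FOREIGN = re.compile(r"\(\\.*?\\\)", re.DOTALL)


def english_tokens(text):
    """Return the space-separated tokens left after removing (\\...\\) spans."""
    return FOREIGN.sub(" ", text).split()
```

Punctuation marks that are keyed in as separate tokens, such as the comma following a closing code, remain in the list and can be filtered out in a later step.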

The period, subject-matter and stylistic register were taken into consideration when judging whether to code a borderline case as foreign language or not. A number of ad hoc strategies were adopted (e.g. no coding was applied to liturgical terms such as Kyrieleison, Magnificat, Agnus Dei etc. when found in English contexts). Explicit references to foreign language status were taken into consideration, cf. below:

Ex. 5a.
On +tam geare +te man h+at (\solarem\) on Lyden beo+d +treo hund daga & fif & syxtig daga, & syx tida.
<R 64.9>
+ta synd on Lyden (\quadrantes\) genemned.
(Byrhtferth's Manual (A. D. 1011), I, ed. S. J. Crawford (E.E.T.S., 177), London, 1966 (1929), p. 64).

Ex. 5b.
C.^) and (^B. D.^) & passing
both through
the Centre (^E.^) is diuided
into fower
Quadrantes or
quarters, the upper
Quadrante
whereof on the left hand is marked with the letters #
(^A. B. E.^) in
which Quadrant, the right perpendicular line marked with the
letters F. H. betokeneth the right Sine of the giuen Arke #
(^A. F.^)
(Blundevile, A Briefe Description of the Tables etc., London, 1597, p. 49V).

Latin ligature characters have been replaced by combined characters in English (AE for Æ, ae for æ, OE for Œ, oe for œ, etc.). Material printed in other than the Roman alphabet (Greek, for the most part) has been omitted with a note given in `our comment'; for `accents', cf. (1h), above.

(2c) `Runes' (}.....})

In the Old English section of the Corpus runes are distinguished from the nonrunic basic text (the conversion is based on the coding included in the Toronto Corpus).

(2d) `Emendation' [{.....{] and `editor's comment' [\.....\]

If italicized emendations and expansions occur frequently and repeatedly through the text, they have been left uncoded as in e.g. him, from, drihten, spæcon etc. When emendations indicated by italics are coded, the emendation code covers the whole word or words, even when only part of the word is italicized in the source text.

Emendations and editorial comments (in the text or in the editorial apparatus) have been included when deemed necessary or otherwise helpful (manuscript variants given in the text apparatus have been ignored). Whether information given in the apparatus has been coded as an instance of emendation or of editorial comment has been the choice of the coder.

In the material drawn from the Toronto Corpus, instances of `emendation' (checked and corrected against manuscript readings by the Editors) are indicated with a percent sign (%) placed after the word (see examples given below). This sign has been replaced by the `emendation' signs used for the other sections of the Corpus. Whether to include several successive emended words within one `emendation' code or leave the Toronto Corpus single-word format as such has been up to the coder. Examples (line divisions in the material quoted from the Toronto Corpus have been edited to follow the Helsinki Corpus format):

Ex. 6.
Source text:

- - ado in hluttor eala;

beren[d]² / & 3e3nid feowerti3 / lybcorna [&]³ ado þonne in[to]⁴
ðæm wyrtum; læt standan þreo niht; syle drincan ær uhton
lytelne scænc fulne þæt se drænc sy ðe ær 3eleored.

Notes:

2 berend C; beren MS. L.
3 &. The MS. has, in error, a crossed l; expanded to oððe L; omitted in text C.
4 into: in MS. CL.
(Lacnunga in Anglo-Saxon Magic and Medicine, Illustrated Specially from the Semi-Pagan Text `Lacnunga', ed. J. H. G. Grattan and C. Singer, London, 1972 (1952), p. 118).

Toronto Corpus:
ado in hluttor eala, berend% & gegnid feowertig lybcorna
&% ado *onne into% =#m wyrtum, l#t standan *reo
niht, syle drincan #r uhton lytelne sc#nc fulne *#t se
dr#nc sy =e #r geleored.
(Character key: % = `emendation'; * = l.c. `thorn';
= = l.c. `eth'; # = l.c. `ash')

Helsinki Corpus:
ado in hluttor eala, [{berend{] & gegnid feowertig lybcorna
[{&{] ado +tonne [{into{] +d+am wyrtum, l+at standan +treo
niht, syle drincan +ar uhton lytelne sc+anc fulne +t+at se
dr+anc sy +de +ar geleored.
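
The conversion illustrated in Ex. 6 can be approximated mechanically. The Python sketch below is an illustration only and does not reproduce the procedure actually used by the compilers. The function name toronto_to_helsinki is hypothetical; only the three lower-case characters given in the character key above are converted, and every emended word receives its own `emendation' code, whereas the coders sometimes grouped successive words under one code.

```python
# Character key from Ex. 6: * = l.c. thorn, = = l.c. eth, # = l.c. ash.
CHARACTERS = str.maketrans({"*": "+t", "=": "+d", "#": "+a"})


def toronto_to_helsinki(line):
    """Convert one Toronto Corpus line to Helsinki Corpus coding (sketch)."""
    converted = []
    for token in line.split(" "):
        if "%" in token:
            # The percent sign follows the emended word; keep any punctuation after it.
            word, _, tail = token.partition("%")
            converted.append("[{" + word.translate(CHARACTERS) + "{]" + tail.translate(CHARACTERS))
        else:
            converted.append(token.translate(CHARACTERS))
    return " ".join(converted)
```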

Ex. 7.
Source text:
- - seo is ealra²
duna mæst & hyhst.
[Þær syndon gedefelice menn þa habbað him]³
to cynedome þone re[a
dan⁴ sæ & to anwalde - -
- -
hyda hy habbað him to hrægle gedon [þa syndan]³ hundic
gean⁶ swiðast nemde⁷ . & [fore hundum]³ tigras & leon⁸
- -

Notes:

2 K: ealra.
3 Bracketed words supplied from T.
4 K: re[a]dan. H: readan.
6 K: hunticgean. T: huntigystran. Lat. text: venatrices.
7 K, C, H: nemde.
8 C: leon[es].
(`Wonders of the East' in Three Old English Prose Texts in Ms. Cotton Vitellius A XV, ed. S. Rypins (E.E.T.S. 161), New York, 1971 (1924), p. 64).

Toronto Corpus:
Seo is ealra duna m#st & hyhst.
<R 25.6>
*#r% syndon% gedefelice% menn% *a% habba=% him%
to cynedome *one readan s# & to anwalde. - -
- -
<R 26.1>
Ymb *as stowe beo= wif acenned, *a habba= beardas swa
side o= hyra breost, & horses hyda hy habba=
him to hr#gle gedon.
<R 26.3>
*a% syndan% hundic gean% swi=ast nemde, &
fore% hundum% tigras & leon

Helsinki Corpus:
Seo is ealra duna m+ast & hyhst.
<R 25.6>
[{+t+ar syndon gedefelice menn +ta habba+d him{]
to cynedome +tone readan s+a & to anwalde. - -
- -
<R 26.1>
Ymb +tas stowe beo+d wif acenned, +ta habba+d beardas swa
side o+d hyra breost, & horses hyda hy habba+d
him to hr+agle gedon.
<R 26.3>
[{+ta{] [{syndan{] hundic[{gean{] swi+dast nemde, &
[{fore{] [{hundum{] tigras & leon

In the Middle and Early Modern English sections of the Corpus the use of the `emendation' code closely reflects the practice followed by the editor in the source text: in the computerized version the `emendation' brackets cover only the strings of characters (or words) as indicated in the source text. Compare:

Ex. 8.
Source text:
The .vij. day ys fortunat to begynne alle werkys vp-on;
that persone [that ys born] that day schuld be dysposyd to
be sotel off wytt and dyuerse off condycionnys and chongabyl,
and dysposyd to lyfe longe; and yff a body falle in-to seke-
nes that day, he schuld sone r[e]couer; and [qwat that]
a man dremyth schuld turne to trwthe with-in half a yere;
(Days of the Moon, in The Works of John Metham Including the Romance of Amoryus and Cleopes, ed. H. Craig (E.E.T.S., O.S. 132), London, 1916, p. 150).

Helsinki Corpus:
The .vij. day ys fortunat to begynne alle werkys vp-on;
that persone [{that ys born{] that day schuld be dysposyd to
be sotel off wytt and dyuerse off condycionnys and chongabyl,
and dysposyd to lyfe longe; and yff a body falle in-to sekenes
that day, he schuld sone r[{e{]couer; and [{qwat that{]
a man dremyth schuld turne to trwthe with-in half a yere;

(2e) `Heading' [}.....}]

Text coded as `heading' is always in upper case. See, also, `multiple text level' codes (2g), below.

(2f) `Our comment' [^.....^]

Major changes (omissions of material, comment on manuscript changes etc.) are indicated in `our comment'. Comments are typed in capital letters, and the forms cited generally follow the font used in the source text.

(2g) `Multiple text level' codes

When two or three types of `text level' codes coincide, the codes are embedded as follows:
A. Combinations of two `text level' codes:
1. (^... (\...\) ...^)

2. (^... [{...{] ...^)
3. (\... [{...{] ...\)
4. (\... [\...\] ...\)
5. (\... [^...^] ...\)
6. [{... (^...^) ...{]
7. [{... (\...\) ...{]
8. [\... (\...\) ...\]
9. [^... [{...{] ...^]
10. [}... (^...^) ...}]
11. [}... (\...\) ...}]
12. [}... [{...{] ...}]
13. [}... [\...\] ...}]
14. [}... [^...^] ...}]

Examples:

1. `font other than the basic font' and `foreign language' (in Middle and Early Modern English drama texts only):

(^ (\Dixit angelus:\) ^)
(^ (\Angelus.\) ^)
O Joseph, ryse vp, and loke thu tary nought!
Take Mary with the, and into Egipt flee!
(The Late Medieval Religious Plays of Bodleian MSS Digby 133 and E Museo 160, ed. D. C. Baker, J. L. Murphy, and L. B. Hall, Jr. (E.E.T.S. 283), Oxford, 1982, p. 104).

2. `font other than the basic font' and `emendation'

And +get Poule specifie+t more of sixe synnes +tat men don.

(^Dwelle we not in ofte etyngis and drunkenesse[{s{] ^) +tat #
sue+t aftur. - -
(English Wycliffite Sermons, I, ed. A. Hudson, Oxford, 1983, p. 478).

3. `foreign language' and `emendation'

+Tis said he +tam for to vpbraid,

And +tan till his men +tus he said,
(\Cecus autem si ceco ducatum prestet,
ambo in fou[{e{]am cadunt.\)
(The Northern Homily Cycle, Part II, ed. S. Nevanlinna (Société Néophilologique de Helsinki, 41), Helsinki, 1973, p. 73).

4. `foreign language' and `editor's comment'

(\Item ordinat(um) est (etc) [\THE LETTERS IN PARENTHESES #

EXPANDED\] \)
(An Anthology of Chancery English, eds. J. H. Fisher, M. Richardson and J. L. Fisher, Knoxville, 1984, p. 197).

5. `foreign language' and `our comment'

As for any particular commemorations, I call

to minde what (^Cicero^) saide, when hee gaue generall
thanks. (\Difficile [^EDITION: difffcile^] non aliquem; #
ingratum quenquam praeterire:\)
(Francis Bacon, The Twoo Bookes of the Proficience and Advancement of Learning (1605) (English Experience, 218), Amsterdam and New York, 1970, p. 3R).

6. `emendation' and `italics'

I returned in the Evening &c:

20 Our Viccar preached on 11: (^Heb:^) 7: In the afternoone #
our
Curate on [{(II. 1. (^Cor:^) 24){] :
(The Diary of John Evelyn, ed. E. S. de Beer, London, New York and Toronto, 1959, p. 928).

7. `emendation' and `foreign language'

- - Touchynge this instruccyon thre

thynges I wold do. First I wold shewe that the instruccyons
of this holy gospell perteyneth to the vniuersal
chirche of chryst. Secondly that the heed of the
vnyuersall chirche [{ (\iure diuino\) {] is the pope. - -
(The English Works of John Fisher, Bishop of Rochester, Part I, ed. J. E. B. Mayor (E.E.T.S., E.S. 27), London, 1935 (1876), p. 314).

8. `editor's comment' and `foreign language'

[\ (\Uxor\) addresses the women in the audience:\]

(The Wakefield Pageants in the Towneley Cycle, ed. A. C. Cawley, Manchester, 1958, p. 24).

9. `our comment' and `emendation'

[^INTERLINEATIONS AND MARGINAL INSERTIONS

INDICATED BY SQUARE BRACKETS IN THE EDITION
ARE SURROUNDED BY ROUND BRACKETS AND CODED
AS `EMENDATIONS' IN THE VERSION BELOW:
[{(.....){] ^]
(The Diary of John Evelyn, as above).

10. `heading' and `italics'

[}OF CERTAINE CIRCLES CALLED (^ALMICANTERATHES^) .}]

(Blundevile, A Briefe Description of the Tables etc., London, 1597, p. 155R).

11. `heading' and `foreign language'

[} (\DE NOCTE.\) }]

(Ælfric's De Temporibus Anni, ed. H. Henel (E.E.T.S. 213), London, 1942, p. 18).

12. `heading' and `emendation'

[}BI +D+AM +DE FOR EA+DMODNESSE FLEO+D - -

- -
HIE NE WINNA+D WI+D [{+D{]ONE GODCUNDAN DOM.}]
(King Alfred's West-Saxon Version of Gregory's Pastoral Care, Parts I-II, ed. H. Sweet (E.E.T.S., O.S. 45, 50), London, 1958 (1871), p. 47).

13. `heading' and `editor's comment'

[} [\LETTER LXXII.\] }]

[} [\SIR THOMAS MORE TO CARDINAL WOLSEY.\] }]
(Original Letters, Illustrative of English History; Including Numerous Royal Letters, Third Series, I, ed. H. Ellis, London, 1846, p. 203).

14. `heading' and `our comment'

[} [^SIR RICHARD HADDOCK TO HIS SON RICHARD.^] }]

(The Camden Miscellany, Volume the Eighth: Containing ... Correspondence of the Family of Haddock, 1657-1719, ed. E. M. Thompson (Camden Society, N.S. XXXI), London, 1965 (1883), p. 44).

B. Combinations of three `text level' codes:

1. [} ..(\... [{ ... {] ...\) ... }]

2. [} ..[\ ... (\ ...\) ...\] ... }]

Examples:
1. `heading', `foreign language' and `emendation'
[} (\DOMINICA QUARTA [{POST FESTUM TRINITATIS.
EVANGELIUM.{] SERMO 4.\) }]
(English Wycliffite Sermons, as above, p. 236).

2. `heading', `editor's comment' and `foreign language'
[} [\ (\DE DIE.\) \] }]
(Ælfric's De Temporibus Anni, as above, p. 2).
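
The combinations listed under A and B can be used to check coded text for unexpected embeddings. The Python sketch below is an illustration only; the function name unlisted_nestings and the short level names are assumptions. It records each (outer, inner) pair of codes found in a string and returns those pairs that do not appear in list A. The two three-level combinations under B consist entirely of pairs from list A.

```python
# Illustrative sketch; the short level names are assumptions made for this example.
OPENING = {"(^": "font", "(\\": "foreign", "(}": "runes", "[{": "emendation",
           "[\\": "editor", "[^": "our", "[}": "heading"}
# Each closing code mirrors its opening code, e.g. "[{" is closed by "{]".
CLOSING = {code[1] + (")" if code[0] == "(" else "]"): code for code in OPENING}

# (outer, inner) pairs taken from list A above.
ALLOWED = {
    ("font", "foreign"), ("font", "emendation"),
    ("foreign", "emendation"), ("foreign", "editor"), ("foreign", "our"),
    ("emendation", "font"), ("emendation", "foreign"),
    ("editor", "foreign"), ("our", "emendation"),
    ("heading", "font"), ("heading", "foreign"), ("heading", "emendation"),
    ("heading", "editor"), ("heading", "our"),
}


def unlisted_nestings(text):
    """Return the (outer, inner) code pairs in text that list A does not mention."""
    stack, found, i = [], [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in OPENING:
            if stack:
                # The code on top of the stack encloses the one just opened.
                found.append((OPENING[stack[-1]], OPENING[pair]))
            stack.append(pair)
            i += 2
        elif pair in CLOSING and stack and stack[-1] == CLOSING[pair]:
            stack.pop()
            i += 2
        else:
            i += 1
    return [nesting for nesting in found if nesting not in ALLOWED]
```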

3.3. Reference Codes

3.3.1. Preliminary Remarks

Each text or group of comparable texts is introduced by a set of textual parameters consisting of 24 reference codes intended to help identify and describe the text (two additional codes appear in the Old English section of the Corpus). The codes also make it possible to execute computer searches through the material selectively, focusing only on those sections of the Corpus that fulfil a defined set of criteria.

The main principles for defining the values will be explained in Rissanen et al. (1993, cf. Note 3). Decisions on the code values are, of course, based on earlier scholarship and, in the last resort, the subjective views of the compilers. Misjudgments and errors in debatable matters of this kind are inevitable. We are most grateful for corrections and suggestions by the users of the Corpus.

3.3.2. COCOA Format

The reference codes follow the COCOA format (cf. Oxford Concordance Program and Micro-OCP, Section 5.1.). The format can be easily converted to suit the format required by other concordance programs (cf. e.g. WordCruncher, Section 5.2.).¹¹

The value of each reference code introduced by an angular bracket is valid until it is replaced by a new value (for details, cf. Hockey and Marriott 1984: 16-19; Hockey and Martin 1988: 17-19, cited in Note 11). In the majority of cases, if the value of one reference category (other than <S for `sample', <P for `page' or <R for `record') changes in the middle of a text, the whole set of references is repeated.

The maximum length for the value ascribed to a reference code is 20 characters (allowed by the COCOA format of the OCP, version 1.0., and the first level reference code in the WordCruncher, versions 4.1 and 4.30). The characters used for the reference code are in the upper case. Alternative or double values are indicated by a slash (/), cf. 3.1. The character X given as a value reads `irrelevant' or `not known'. Expansions for the abbreviations used will be given in Section 3.3.4., following an extract of a corpus text given as an example below.
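
The rule that a reference value stays valid until it is replaced can be sketched in Python as follows. The code is an illustration only and is not a substitute for the Oxford Concordance Program or WordCruncher. The function name with_references is hypothetical, and the sketch assumes that each reference code stands alone on its line in the form <X VALUE>.

```python
import re

# A reference line: "<", one upper-case letter, a space, the value, ">".
REFERENCE = re.compile(r"^<([A-Z]) ([^>]*)>$")


def with_references(lines):
    """Pair every text line with the reference values in force at that point."""
    current, result = {}, []
    for raw in lines:
        line = raw.rstrip()
        match = REFERENCE.match(line.strip())
        if match:
            # A new value replaces the earlier value of the same code.
            current[match.group(1)] = match.group(2).strip()
        elif line.strip():
            result.append((dict(current), line))
    return result
```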

3.3.3. An Example

The beginning of a text file typically looks as follows:

[^TREVISA, JOHN.
POLYCHRONICON RANULPHI HIGDEN,
MONACHI CESTRENSIS, VOLS. VI, VIII.
ENGLISH TRANSLATIONS OF JOHN TREVISA AND OF
AN UNKNOWN WRITER OF THE FIFTEENTH CENTURY.
ROLLS SERIES, 41.
ED. J. R. LUMBY.
LONDON, 1876, 1882.
VI, PP. 209.14 - 231.7 (SAMPLE 1)
VIII, PP. 83.1 - 111.19 (SAMPLE 2)
VIII, PP. 347.1 - 352.13 (SAMPLE 3)^]

<S SAMPLE 1>

[} (\CAPITULUM VICESIMUM QUARTUM.\) }]

Leo +te emperour lete be +te enemyes of +te empere, and
werrede a+genst figures and ymages of holy seyntes. Pope

Gregory and Germanius of Constantynnoble wi+tstood hym
nameliche, as +te olde usage and custome wolde +tat is allowed
and apreeved by holy cherche, and seide +tat it is wor+ty and
medeful to do hem +te affecioun of worschippe. For we #
worschippe+t
in hem but God, [{and{] in worschippe of God and of
holy seyntes, +tat man have+t in mynde efte by suche ymages,
God allone schal be princepalliche worschipped, [{and after
hym creatures schal be i-worschipped{] in worschippe of hym.
Beda, (\libro 5=o=, capitulo 24=o=.\) +Tat +gere deide #
Withredus kyng
of Caunterbury, and Thobias bisshop of Rouchestre, +tat cou+te
Latyn and Grew as wel as his owne longage. (\Paulus,
libro 7=o=.\) +Tat +gere Sarasyns com to Constantynnoble and #
byseged
it +tre +gere, and took +tennes moche good and catel.

In the above passage the reference codes indicate that we are dealing with Polychronicon, a prose work of historical writing and non-imaginative narration translated from Latin into English in a Southern dialect by John Trevisa, a representative of the professional ranks, aged between 40 and 60. Both original and manuscript versions date from the third sub-period of Middle English (1350-1420). The set of reference codes is followed by a bibliographical reference to the source text giving information on the volume, page and line references to the extracts selected.

3.3.4. Reference Code Values

Preliminary Remarks

The values used for defining the textual parameters are listed below in the order the codes occur at the beginning of each text file. To help select only sections from the Corpus for computer searches, all the values appearing in the database will be given for each parameter (except for <B, <Q, <N, <A, <S, <P and <R) and for the three main periods distinguished.

(1) <B = `name of text file'
(2) <Q = `text identifier'
(3) <N = `name of text'
(4) <A = `author'
(5) <C = `part of corpus'
(6) <O = `date of original'
(7) <M = `date of manuscript'
(8) <K = `contemporaneity'
(9) <D = `dialect'
(10) <V = `verse' or `prose'
(11) <T = `text type'
(12) <G = `relationship to foreign original'
(13) <F = `foreign original'
(14) <W = `relationship to spoken language'
(15) <X = `sex of author'
(16) <Y = `age of author'
(17) <H = `social rank of author'
(18) <U = `audience description'
(19) <E = `participant relationship'
(20) <J = `interaction'
(21) <I = `setting'
(22) <Z = `prototypical text category'
(23) <S = `sample'
(24) <P = `page'
(25) <R = `record'

(1) <B = `name of text file'

The names of the 242 files follow MS-DOS conventions. Each file name begins with the character C (for `Corpus'), followed by O (for `Old English'), M (for `Middle English') or E (for `Early Modern English'). The file names reflect, by and large, the names of authors or texts in the Old and Middle English sections of the Corpus. In the Early Modern English section the file names are based on the systematic coverage of different text types.
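
The first two characters of a file name thus identify the main period. A minimal Python sketch, with the hypothetical function name period_of, might read:

```python
PERIODS = {"O": "Old English", "M": "Middle English", "E": "Early Modern English"}


def period_of(file_name):
    """Return the main period indicated by the second character of a file name."""
    name = file_name.upper()
    if len(name) < 2 or name[0] != "C" or name[1] not in PERIODS:
        raise ValueError("not a Helsinki Corpus file name: " + file_name)
    return PERIODS[name[1]]
```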

(2) <Q = `text identifier'

The purpose of the `text identifier' is to sum up the main characteristics of the text in one code and permit identifying the source for an example retrieved from the Corpus (cf. Section 5). A full list of the <Q codes found in the Corpus is given in Appendix 1.

The four items distinguished in the `text identifier' (maximum 20 characters allowed) are as follows (e.g. <Q O2/4 NN BIL GDC>):

<Q   O2/4   NN    BIL   GDC>
     (a)    (b)   (c)   (d)

(a) `part of corpus'
(b) `prototypical text category'
(c) `text type'
(d) abbreviated title

Of these `part of corpus', `prototypical text category' and `text type' conveniently repeat the information given by the reference codes <C, <Z and <T, see (a) and (b)-(c) below. For abbreviated titles, see (d) below.
(a) The values used to indicate `part of corpus' in the <Q code are those listed in (5).
(b)-(c) The abbreviations used for the values of `prototypical text category' and `text type' in the <Q code are as follows (the code <T is further discussed in (11), and the code <Z in (22)):

`Prototypical text category' in <Q code
EX = `expository'
IR = `instruction religious'
IS = `instruction secular'
IS/EX = `instruction secular'/`expository'
NI = `narration imaginative'
NN = `narration non-imaginative'
STA = `statutory'
XX = `none of the above'

`Text type' in <Q code
BIA = `biography, autobiography'
BIBLE = `Bible'
BIL = `biography, life of a saint'
BIO = `biography, other'
COME = `drama, comedy'
CORO = `correspondence, non-private'
CORP = `correspondence, private'
DEPO = `proceeding, deposition'
DIARY = `diary'
DOC = `document'
EDUC = `educational treatise'
FICT = `fiction'
GEO = `geography'
HANDA = `handbook, astronomy'
HANDM = `handbook, medicine'
HANDO = `handbook, other'
HIST = `history'
HOM = `homily'
LAW = `law'
MYST = `drama, mystery play'
NEWT = `New Testament'
OLDT = `Old Testament'
PHILO = `philosophy'
PREF = `preface' or `epilogue'
RELT = `religious treatise'
ROM = `romance'
RULE = `rule'
SCIA = `science, astronomy'
SCIM = `science, medicine'
SCIO = `science other'
SERM = `sermon'
TRAV = `travelogue'
TRI = `proceeding, trial'
XX = `none of the above'

(d) Abbreviated titles are listed in alphabetical order in Part Three of this guide and can be used to look up the source references to the texts (cf. Section 5.).
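As an illustration of the four-item structure described above, the following Python sketch splits a `text identifier' into its parts. It is not part of the Corpus distribution; the names TextIdentifier and parse_text_identifier are hypothetical, and the sketch assumes that the four items are separated by spaces and contain no internal spaces.

```python
from typing import NamedTuple


class TextIdentifier(NamedTuple):
    """The four items of a <Q code, in the order (a)-(d)."""
    part_of_corpus: str              # (a), e.g. "O2/4"
    prototypical_text_category: str  # (b), e.g. "NN"
    text_type: str                   # (c), e.g. "BIL"
    abbreviated_title: str           # (d), e.g. "GDC"


def parse_text_identifier(code: str) -> TextIdentifier:
    """Split a code such as '<Q O2/4 NN BIL GDC>' into its four items.

    Assumption: items are space-separated and contain no spaces themselves.
    """
    body = code.strip()
    if body.startswith("<Q"):
        body = body[2:]
    if body.endswith(">"):
        body = body[:-1]
    items = body.split()
    if len(items) != 4:
        raise ValueError(f"expected four items in text identifier, got {items!r}")
    return TextIdentifier(*items)


if __name__ == "__main__":
    print(parse_text_identifier("<Q O2/4 NN BIL GDC>"))
```

The abbreviated title returned in the last field is the key under which the source text is listed in Part Three.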

(3) <N = `name of text'

Names ascribed to texts reflect, by and large, the key words of the title (or author or type) of the text, e.g.
<N GREG DIAL C>

stands for `Gregory's Dialogues', MS C. Many `names of text' in the Old English section of the Corpus reflect those given in Healey and Venezky 1980 (cf. Note 5); many of those adopted for the Middle English section are based on the title stencils given in the Middle English Dictionary (eds. H. Kurath, S. M. Kuhn etc., Ann Arbor, Michigan: University of Michigan Press, 1954-). In the Middle and Early Modern English sections the words LET TO stand for `letter to'.

(4) <A = `author'

The names of the authors, when known (in full), are given in the order `surname' - `first name', e.g.
<A WAERFERTH>
<A CHAUCER GEOFFREY>

(5) <C = `part of corpus'

The value for the sub-period represented by the text is ascribed as a combination of the original and manuscript versions (cf. codes <O and <M, below). When the values coincide, only one figure is given; when they differ, the value given for the original version precedes that given for the manuscript, separated by a slash (/). In printed texts the value <M is marked as irrelevant (X).

The organization of individual text files into larger units (Old, Middle and Early Modern English sections) is based on the value given in the code <C (and repeated in the first part of the code <Q). The numbers refer to the successive sub-periods distinguished for the three main periods (for dates, cf. codes <O and <M, below). If the dates of the original and manuscript versions differ, the text file is placed in a larger unit according to the value given to the manuscript; if the value for the manuscript is X (`unknown'), the value of the original version is followed.

Old English Middle English EMod English
O1 M1 E1
O2 MX/1 E2
O1/2 M2 E3
OX/2 MX/2
O3 M3
O2/3 M2/3
OX/3 M4
O4 M2/4
O2/4 M3/4
O3/4 MX/4
OX/4
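The way a <C value is put together from the sub-period figures of the original and the manuscript can be sketched in Python as follows. The function name and its arguments are hypothetical: the Corpus files store <O and <M as date ranges, so the sketch assumes that these have already been mapped to sub-period figures (`1' to `4', or `X'). It also assumes that a manuscript value of X yields a single figure, as in the Early Modern English values listed above.

```python
def part_of_corpus(period: str, original: str, manuscript: str) -> str:
    """Combine sub-period figures into a <C value such as 'O2/4'.

    period     -- 'O', 'M' or 'E'
    original   -- sub-period figure of the original ('1'..'4' or 'X')
    manuscript -- sub-period figure of the manuscript ('1'..'4' or 'X')
    """
    if period not in ("O", "M", "E"):
        raise ValueError(f"unknown period: {period!r}")
    # Coinciding values, or an irrelevant manuscript value (printed texts),
    # give a single figure (assumption for the X case, see lead-in).
    if manuscript == "X" or manuscript == original:
        return period + original
    # Otherwise the original precedes the manuscript, separated by a slash.
    return f"{period}{original}/{manuscript}"


if __name__ == "__main__":
    print(part_of_corpus("O", "2", "4"))  # O2/4
    print(part_of_corpus("M", "X", "1"))  # MX/1
```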

(6) <O = `date of original'

(7) <M = `date of manuscript'

(8) <K = `contemporaneity'

The overall time spans covering the dates of the original version and the manuscript of the source text are indicated by the codes <O and <M; contemporaneity of the two is specified by the code <K (within the time span of some 40 years). The value SAME in this code means that, as far as we know, the manuscript is the same as the original text (e.g. Ormulum).

        Old English     Middle English  EMod English

<O      -850            1150-1250       1500-1570
        850-950         1250-1350       1570-1640
        950-1050        1350-1420       1640-1710
        1050-1150       1420-1500
        X               X

<M      -850            1150-1250       X
        850-950         1250-1350
        950-1050        1350-1420
        1050-1150       1420-1500
                        X

<K      CONTEMP         CONTEMP         X
        NON-CONTEMP     NON-CONTEMP
        SAME            SAME
        X               X

(9) <D = `dialect'

Old English     Middle English  EMod English

A/X             EML             ENGLISH
AM              EML/NL
AM/X            EMO
AN              WML
K               WMO
K/X             NL
WS              NO
WS/K            NO/EMO
WS/A            SL
WS/AM           SO
WS/X            KL
                KO
                X

Abbreviations: The elements of mixed dialects are separated by slashes. The final letter in Middle English dialect codings denotes the source of the definition: L = LALME (A Linguistic Atlas of Late Mediaeval English by Angus McIntosh, M. L. Samuels and Michael Benskin. Aberdeen: Aberdeen University Press, 1986); O = source other than LALME.

A = `Anglian'
AM = `Anglian Mercian'
AN = `Anglian Northumbrian'
K = `Kentish'
WS = `West-Saxon'

EML, EMO = `East Midland'
WML, WMO = `West Midland'
NL, NO = `Northern'
SL, SO = `Southern'
KL, KO = `Kentish'
X = `unknown'

ENGLISH = `Southern British standard'
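The dialect codings can be expanded mechanically from the abbreviations above. The Python sketch below is only an illustration: the function decode_dialect and the period argument (`O', `M' or `E') are hypothetical, and the sketch assumes that X stands for `unknown' in the Old English mixed values as well.

```python
from typing import List

OLD_ENGLISH = {
    "A": "Anglian",
    "AM": "Anglian Mercian",
    "AN": "Anglian Northumbrian",
    "K": "Kentish",
    "WS": "West-Saxon",
}
MIDDLE_ENGLISH_AREAS = {
    "EM": "East Midland",
    "WM": "West Midland",
    "N": "Northern",
    "S": "Southern",
    "K": "Kentish",
}
# Final letter of a Middle English coding: source of the definition.
SOURCES = {"L": "LALME", "O": "source other than LALME"}


def decode_dialect(value: str, period: str) -> List[str]:
    """Expand a <D value; elements of mixed dialects are split on slashes."""
    decoded = []
    for element in value.split("/"):
        if element == "X":
            decoded.append("unknown")
        elif period == "O":
            decoded.append(OLD_ENGLISH[element])
        elif period == "M":
            area, source = element[:-1], element[-1]
            decoded.append(f"{MIDDLE_ENGLISH_AREAS[area]} ({SOURCES[source]})")
        elif period == "E" and element == "ENGLISH":
            decoded.append("Southern British standard")
        else:
            raise ValueError(f"unknown dialect element: {element!r}")
    return decoded


if __name__ == "__main__":
    print(decode_dialect("EML/NL", "M"))
```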

(10) <V = `verse' or `prose'

The values VERSE and PROSE occur throughout the main three periods of the Corpus.

(11) <T = `text type'

Old English         Middle English      EMod English

LAW                 LAW                 LAW
DOCUM               DOCUM
HANDB ASTRONOMY     HANDB ASTRONOMY
HANDB MEDICINE      HANDB MEDICINE
                    HANDB OTHER         HANDB OTHER
SCIENCE ASTRONOMY
                    SCIENCE MEDICINE    SCIENCE MEDICINE
                                        SCIENCE OTHER
                                        EDUC TREAT
PHILOSOPHY          PHILOSOPHY          PHILOSOPHY
HOMILY              HOMILY
                    SERMON              SERMON
RULE                RULE
REL TREAT           REL TREAT
PREFACE/EPIL        PREFACE/EPIL
                    PROC DEPOS
                                        PROC TRIAL
HISTORY             HISTORY             HISTORY
GEOGRAPHY
TRAVELOGUE          TRAVELOGUE          TRAVELOGUE
                                        DIARY PRIV
BIOGR LIFE SAINT    BIOGR LIFE SAINT
                                        BIOGR AUTO
                                        BIOGR OTHER
FICTION             FICTION             FICTION
                    ROMANCE
                    DRAMA MYST
                                        DRAMA COMEDY
                    LET PRIV            LET PRIV
                    LET NON-PRIV        LET NON-PRIV
BIBLE               BIBLE               BIBLE
X                   X

Abbreviations:

BIOGR AUTO = `biography, autobiography'
BIOGR LIFE SAINT = `biography, life of a saint'
DIARY PRIV = `diary private'
DOCUM = `document'
DRAMA MYST = `drama, mystery play'
EDUC TREAT = `educational treatise'
EPIL = `epilogue'
HANDB = `handbook'
LET PRIV = `letter private'
PROC DEPOS = `proceeding, deposition'
REL TREAT = `religious treatise'

(12) <G = `relationship to foreign original'

(13) <F = `foreign original'

        Old English     Middle English  EMod English

<G      GLOSS           TRANSL          TRANSL
        TRANSL          X               X
        X

<F      LATIN           LATIN           LATIN
        X               LATIN/FRENCH    OTHER
                        FRENCH          X
                        DUTCH
                        X

(14) <W = `relationship to spoken language'

Old English     Middle English  EMod English

X               WRITTEN         WRITTEN
                SCRIPT          SCRIPT
                X               SPEECH-BASED

Abbreviations: SCRIPT = `written to be spoken'

(15) <X = `sex of author'

(16) <Y = `age of author'

(17) <H = `social rank of author'

        Old English     Middle English  EMod English

<X      X               MALE            MALE
                        FEMALE          FEMALE
                        X               X

<Y      X               -20             -20
                        20-40           20-40
                        40-60           40-60
                        60-             60-
                        X               X

<H      X               HIGH            HIGH
                        HIGH PROF       HIGH PROF
                        PROF            HIGH OTHER
                        PROF HIGH       PROF
                        OTHER           PROF HIGH
                        X               OTHER
                                        X

Abbreviations:

HIGH PROF = `the author is moving from higher social ranks to professional ranks'

(18) <U = `audience description'

(19) <E = `participant relationship'

(20) <J = `interaction'

(21) <I = `setting'

        Old English     Middle English  EMod English

<U      X               PROF            PROF
                        NON-PROF        NON-PROF
                        X               X

<E      X               INT DOWN        INT DOWN
                        INT UP          INT UP
                        INT EQUAL       DIST DOWN
                        DIST DOWN       DIST UP
                        DIST UP         DIST EQUAL
                        X               X

<J      X               INTERACTIVE     INTERACTIVE
                        X               X

<I      X               INFORMAL        INFORMAL
                        FORMAL          FORMAL
                        X               X

Abbreviations:

PROF = `the work is intended for a professional audience'
INT DOWN = `intimate down' = `participants have an intimate relationship, the author in a superior position to the addressee'
DIST UP = `distant up' = `participants have a distant relationship, the author in a lower position than the addressee'

(22) <Z = `prototypical text category'

Finally, the texts have been grouped in larger categories, which are presumed to reflect the continuity of the types of text represented throughout the history of English.

Old English     Middle English  EMod English

STAT            STAT            STAT
INSTR SEC       INSTR SEC       INSTR SEC
INSTR REL       INSTR REL       INSTR REL
EXPOS           EXPOS           EXPOS
NARR NON-IMAG   NARR NON-IMAG   NARR NON-IMAG
NARR IMAG       NARR IMAG       NARR IMAG
X               X               X

Abbreviations:

EXPOS = `expository'
INSTR REL = `instruction religious'
INSTR SEC = `instruction secular'
NARR IMAG = `narration imaginative'
NARR NON-IMAG = `narration non-imaginative'
STAT = `statutory'

(23) <S = `sample'

(24) <P = `page'

On the one hand, the category of `sample' has been applied as a flexible means of marking the different extracts drawn from one text. On the other hand, the category has been used to group several related texts or text extracts (e.g. charters fulfilling one and the same set of parameter values, letters written by the representatives of one and the same family etc.).

When confusion might arise, page numbers include volume references; columns are marked with C1 and C2, e.g.

<P I,64.C1>

reads `first volume, page 64, column 1'. When page numbers are not indicated, quarto (or folio) numbers are referred to instead (R = `recto', V = `verso'), e.g.

<P E4R>

reads `quarto E4, recto'.

(25) <R = `record'

The category `record' occurs in the Old English section of the Corpus only, as keyed in for the purposes of the Toronto Corpus (Healey and Venezky 1980, cf. Note 5).

4. Distribution Formats

4.1. Availability

At the moment the different versions of the Helsinki Corpus are distributed by the HIT Centre/Humanities Information Technologies Research Programme (Allégaten 27, N-5007 Bergen, Norway; fax: +47-55-589470; e-mail: icame@hit.uib.no) and the Oxford Text Archive (Oxford University Computing Services, 13 Banbury Road, Oxford OX2 6NN, United Kingdom; fax: +44-1865-273275; e-mail: info@ota.ahds.ac.uk).

For tape and diskette formats offered, please consult the order forms of the distributors.

The following versions are available (four sections of this guide accompany all versions as text files, see above):

1. For mainframe use the material is available in two versions:
(1) 242 source files

(2) 1-file version, created by appending the 242 source files one after another

The tape distributed by the HIT Centre contains the 1-file version of the Corpus and the 242 source files (10 Mb + 10 Mb in size; in ASCII or EBCDIC code).

The tape distributed by the Oxford Text Archive contains the 1-file version of the Corpus (10 Mb in size; in ASCII or EBCDIC code).

2. For microcomputer use the material is available in two main versions distributed by the HIT Centre only:

(1) Text files for personal computers (MS-DOS or Macintosh)

The 1-file version and the 242 source files are distributed as compressed files to be decompressed with a program that accompanies the diskettes. The compressed files are organized to follow the sub-section division of the Corpus.

(2) WordCruncher files for MS-DOS machines
To use the WordCruncher files of the Corpus, first acquire the WordCruncher program (version 4.1 or 4.30) and make sure that it runs on your machine. The WCView part of the program permits carrying out searches on words and phrases; the WCIndex part of the program is needed for re-indexing the text files into new versions. The program can be obtained from Johnston and Company (P.O. Box 446, American Fork, Utah 84003-0446, U.S.A.; fax: +1-801-756-0242), or from the HIT Centre.

Three WordCruncher versions of the Corpus are available:

1-file version: all the material in one file

3-file version: the material organized into three main period files
11-file version: the material organized into eleven sub-period sections

For further details, see Section 4.2. and Section 5.2.

Moreover, several versions of the Corpus are included in the CD-ROM disk ICAME Collection of English Language Corpora available from the HIT Centre (text version for MS-DOS, Macintosh and Unix, and WordCruncher and TACT versions for MS-DOS).

How the 242 files occur in the larger units is explained in Section 4.2.

4.2. Larger Units

For mainframe use the 242 source files were appended to create the one-file version HKI (the order of the 242 files in the HKI file is shown in Section 2 and in Appendix 1).

Similarly, for the WordCruncher version the 242 files were compiled as one HKI file. In addition, mainly to make it easier to include only parts of the Corpus in WordCruncher searches, the following versions were compiled:

(1) eleven sub-period files:
HO1, HO2, HO3, HO4 = Old English sub-sections
HM1, HM2, HM3, HM4 = Middle English sub-sections
HE1, HE2, HE3 = Early Modern English sub-sections

(2) three main period files:

HCO = HO1 + HO2 + HO3 + HO4

HCM = HM1 + HM2 + HM3 + HM4
HCE = HE1 + HE2 + HE3
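To make the composition of the larger units concrete, the following Python sketch looks up the sub-period file, the main period file and the one-file version in which a text with a given <C value is found. The function name larger_units is hypothetical; the placement rule (the manuscript value decides unless it is X) is the one stated in Section 3.3.4. (5).

```python
from typing import Tuple

SUB_PERIOD_FILES = {
    "O": ["HO1", "HO2", "HO3", "HO4"],
    "M": ["HM1", "HM2", "HM3", "HM4"],
    "E": ["HE1", "HE2", "HE3"],
}
MAIN_PERIOD_FILES = {"O": "HCO", "M": "HCM", "E": "HCE"}


def larger_units(c_value: str) -> Tuple[str, str, str]:
    """Return (sub-period file, main period file, one-file version)."""
    period, figures = c_value[0], c_value[1:].split("/")
    if period not in SUB_PERIOD_FILES:
        raise ValueError(f"unknown period in {c_value!r}")
    # The manuscript value (after the slash) decides the placement;
    # if it is X, the value of the original is followed.
    figure = figures[-1] if figures[-1] != "X" else figures[0]
    sub_period_file = f"H{period}{figure}"
    if sub_period_file not in SUB_PERIOD_FILES[period]:
        raise ValueError(f"no sub-period file for {c_value!r}")
    return sub_period_file, MAIN_PERIOD_FILES[period], "HKI"


if __name__ == "__main__":
    print(larger_units("O2/4"))
```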

The following table summarizes how the material is organized (the period code is ascribed according to the reference code <C, cf. Section 3.3.4. (5)). The files under the title MF are available for mainframe use in ASCII or EBCDIC format; the files under the title T are available for microcomputer use; the files under the title WCr are available in WordCruncher format:

Sub-periods   242 files                   Larger units
              MF/T                        WCr    WCr    MF/T/WCr

-850          CODOCU1  CONORTHU           HO1    HCO    HKI
850-950       COLAW2   CODOCU2   Etc.     HO2    HCO    HKI
950-1050      COLAW3   CODOCU3   Etc.     HO3    HCO    HKI
1050-1150     COLAW4   CODOCU4   Etc.     HO4    HCO    HKI

1150-1250     CMPERIDI CMORM     Etc.     HM1    HCM    HKI
1250-1350     CMDOCU2  CMKENTSE  Etc.     HM2    HCM    HKI
1350-1420     CMDOCU3  CMASTRO   Etc.     HM3    HCM    HKI
1420-1500     CMLAW    CMDOCU4   Etc.     HM4    HCM    HKI

1500-1570     CELAW1   CEHAND1A  Etc.     HE1    HCE    HKI
1570-1640     CELAW2   CEHAND2A  Etc.     HE2    HCE    HKI
1640-1710     CELAW3   CEHAND3A  Etc.     HE3    HCE    HKI

Abbreviations: MF = mainframe (ASCII/EBCDIC); T = text file for microcomputers; WCr = WordCruncher

5. Using Concordance Programs

This section is aimed at illustrating the use of the Helsinki Corpus with two ready-made concordance program packages, the Oxford Concordance Program (OCP) and WordCruncher. The early versions of these programs were available or being prepared, when the Corpus project was launched. Though the coding scheme adopted is largely tailored to fit these programs, applications with other programs and operating systems have also proved highly rewarding. A recent application is Lexa, a multi-purpose program package illustrated in Section 5.3.

5.1. Oxford Concordance Program

Given the COCOA format adopted for coding the textual parameters, the Helsinki Corpus text files lend themselves to OCP and Micro-OCP applications as such.

The command SELECT can be used to include only portions of text for processing. Information on the reference categories coded in the Helsinki Corpus is given in Section 3.3. above. The source texts used are listed in Part II (according to authors and names of texts) and Part III (according to abbreviated titles). Part III is intended for immediate identification of the source of an example drawn from the Corpus; Part II gives further information on the samples included from each text.
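For readers without access to OCP, the effect of restricting a search by a reference code can be approximated in a few lines of Python. The sketch below is not a reimplementation of the SELECT command: it assumes that every reference code stands on a line of its own in the form <X value>, that a <B code opens each text, and the function name select_lines and the toy sample lines are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Set, Tuple

# A reference code line: '<', one capital letter, a space, the value, '>'.
REFERENCE = re.compile(r"^<([A-Z]) (.*)>\s*$")


def select_lines(lines: Iterable[str], wanted_c: Set[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (<Q value, <P value, text line) for lines whose current <C value is wanted."""
    current: Dict[str, str] = {}
    for line in lines:
        match = REFERENCE.match(line)
        if match:
            code, value = match.group(1), match.group(2).strip()
            if code == "B":
                current.clear()  # a new text file begins (assumption)
            current[code] = value
            continue
        if current.get("C") in wanted_c:
            yield current.get("Q", ""), current.get("P", ""), line.rstrip("\n")


if __name__ == "__main__":
    sample = [  # toy data, not quoted from the Corpus
        "<Q O2/4 NN BIL GDC>", "<C O2/4>", "an Old English line",
        "<Q E3 EX SCIO HOOKE>", "<C E3>", "<P 116>", "an Early Modern English line",
    ]
    for hit in select_lines(sample, {"E1", "E2", "E3"}):
        print(hit)
```

Returning the <Q and <P values with each line mirrors the use of these two categories for identifying the source text and the page of an example.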

Please note the following practical points when using the Corpus with OCP:

- the values used to define the reference code categories are given in Section 3.3.4.

- the reference category <Q can be conveniently used
    - for obtaining information on the background of the source texts (sub-period, prototypical text category and text type), see Section 3.3.4.
    - for identifying the source text on the basis of the abbreviated titles listed in Part III, see the example below.

- the reference category <P can be used
    - for obtaining the page or like reference to the source text, see the example below.

- to make sure that all possible word forms of the items looked for will be taken into consideration,
    - check the word forms from kwic-concordances prepared with OCP or some other program, or use the WordCruncher version of the Corpus for this purpose.
    - check ASD, MED, OED or other basic reference works for possible variant spellings.

- notice that the characters declared as DIACRITICS in the OCP command file may help to sort out the output data but may also exclude occurrences not taken into consideration in the *ACTION section (e.g. if the superscript character is declared as a DIACRITIC, the form d=r= (for d^r and D^r) is not found if only the form dr is given in the PICK command).

- runs may fail if too many types of COMMENTS are defined in the OCP command file when applied to the one-file version or other larger units of the Corpus (the term COMMENTS is used to refer to text which does not appear as headwords but is given in the context of other words). The remedy is usually to reduce the number of types of COMMENTS defined.

By way of illustration, the following OCP file (for the mainframe version 2.30) makes a kwic-concordance of certain occurrences of the impersonal construction me thinks in Middle and Early Modern British English. The search is restricted to the instances written as two words and based on the -i- and -y- stems of the verb. Foreign language is defined as COMMENTS, and the contexts of the occurrences of each keyword are sorted according to what appears to the right of the keyword.

Example:

*input

references cocoa "<" to ">".
text 1 to 80.
comments between "(\" to "\)".
select where C="M1", C="MX/1", C="M2", C="MX/2", C="M3", C="M2/3", C="M4", C="M2/4", C="M3/4", C="MX/4", C="E1", C="E2", C="E3".
{ kwic concordance, ME+EModE, foreign language excluded }

*words

alphabet "A=a +A=+a B=b C=c D=d +D=+d E=e +E=+e F=f G=g +G=+g H=h I=i J=j K=k L=l +L=+l M=m N=n O=o P=p Q=q R=r S=s T=t +T=+t U=u V=v W=w X=x Y=y Z=z 1 2 3 4 5 6 7 8 9 0 &".
padding "` ~ - :' := # [ ] { } ( ) ^ / \ :"".

*action

do concordance.
include only phrases
"me thin*", "me +tin*", "me +din*",
"mee thin*", "mee +tin*", "mee +din*",
"my thin*", "my +tin*", "my +din*",
"mi thin*", "mi +tin*", "mi +din*",
"me thyn*", "me +tyn*", "me +dyn*",
"mee thyn*", "mee +tyn*", "mee +dyn*",
"my thyn*", "my +tyn*", "my +dyn*",
"mi thyn*", "mi +tyn*", "mi +dyn*".
references Q=20, P=8.
contexts sorted by right of keys.
maximum context span L.

*format

layout length 80.

*go

The search carried out on the HKI file yields the following data (only the top of the output file can be given here; the relevant examples are printed in italics; the location of the `text identifier', abbreviated title and `page' codes is indicated above the first example):

`text identifier' with
abbreviated title
(see Part Three)      `page'

me +din   1
M1 IR RELT VICES1     23         +de-liker, +gif +du   me +din         # uncu+de name

me thine   1
E2 XX COME SHAKESP    43.C2      I say, loue me: By    me, thine       owne true Knight,

me things   1
E2 NI FICT DELONEY    73         een ready # to tell   me things       for my profit, thou

Me think   3
M3 IR HOM NHOM        II, 205    ws +tus changed se.   Me think        I wate noght what
E1 XX TRI THROCKM     I,72.C1    Queenes owne Mouth,.  me think        you ought of right
E1 XX TRI THROCKM     I,76.C1    (^Throckmorton.^)     Me think        you would conclude

me thinke   6
E1 XX COME STEVENSO   14         hyn taketh, And Tyb   me thinke       at his elbowe almos
E1 XX COME UDALL      L. 330     is a thombe to day    me thinke,      I care not to let
E1 IR SERM LATIMER    26         npreaching prelates   me thinke       I coulde gesse what
E1 XX TRI THROCKM     I,65.C2    m to the Sea apace,   me thinke       it well done, you p
E2 IS HANDO GIFFORD   A4R        etily well, but yet   me thinke       my meate doth me
E1 XX TRI THROCKM     I,72.C2    nsell aunswere him?   Me thinke,      (^Throckmorton^)

Me thinkes   6
E2 XX COME MIDDLET    16         me Calues Head now,   Me thinkes      her Husbands Head
E2 EX SCIM CLOWES     28         nt such greeuances,   me thinkes      I cannot speake to
E2 IS HANDO GIFFORD   B1R        g or other, and yet   me thinkes      shee frownes at me
E2 XX COME SHAKESP    55.C2      liquely sham'd, and   me thinkes      there would be no
E2 XX COME MIDDLET    1          a dull Mayd alate;    me thinkes      you had need haue
E1 XX COME STEVENSO   60         ur tonges ye holde,   Me thinkes      you shuld remembre

Me thinketh   3
E1 NI FICT HARMAN     40         lesse him for me!"    "Me thinketh,   by the masse, by
E1 XX CORP MORELET    507        wne conscience. And   me thinketh     in good faith, th
E2 IS HANDO GIFFORD   E2V        (^Sam.^) Yea, that    me thinketh     is more than the

me thinks   1
E3 EX SCIO HOOKE      13.5,116   ve in it; # though,   me thinks,      it seems very prob

Etc.

Further searches on the construction should take into consideration instances written as one word (using the PICK WORDS command), forms with other stem vowels and so forth.
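The phrase list in the command file above is a regular combination of four spellings of the pronoun, three spellings of the initial consonant (th, and the Corpus codes +t and +d) and two stem vowels. When the search is extended, such lists can be generated rather than typed. The following Python sketch is one way of doing this; the function names are hypothetical, and the output should be checked against the OCP manual before it is pasted into a command file.

```python
from typing import List

PRONOUNS = ["me", "mee", "my", "mi"]
ONSETS = ["th", "+t", "+d"]   # th, thorn, eth
STEM_VOWELS = ["i", "y"]


def methinks_patterns(vowels: List[str] = STEM_VOWELS) -> List[str]:
    """Two-word phrase patterns, in the order used in the command file."""
    return [
        f"{pronoun} {onset}{vowel}n*"
        for vowel in vowels
        for pronoun in PRONOUNS
        for onset in ONSETS
    ]


def phrase_clause(patterns: List[str], per_line: int = 3) -> str:
    """Format the patterns as an 'include only phrases' clause."""
    quoted = [f'"{pattern}"' for pattern in patterns]
    rows = [
        ", ".join(quoted[i:i + per_line])
        for i in range(0, len(quoted), per_line)
    ]
    return "include only phrases\n" + ",\n".join(rows) + "."


if __name__ == "__main__":
    print(phrase_clause(methinks_patterns()))
```

Calling methinks_patterns(["i", "y", "e"]) would, for instance, add patterns for a further stem vowel; which vowels are worth including is a matter for the reference works mentioned above.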

5.2. WordCruncher

When compiling the WordCruncher (4.1 or 4.30) versions of the Corpus, the changes in the text files were kept to the minimum. Please note that

- to provide information on the source text and page references, the code <Q (`text identifier') was converted into the 1st-level code |Q (20 characters allowed) and the code <P (`page') into the 2nd-level code |P (eight characters allowed).
- as no 3rd-level codes (a number between 0 and 65535) were used, WordCruncher added the word `Heading' at the end of the reference outline.

- the files can, of course, be re-indexed using other reference codes when need be.

- the larger WordCruncher units are named as follows (cf. 4.2. above):

1-file version:    HKI                   (all material)

3-file version:    HCO                   (Old English)
                   HCM                   (Middle English)
                   HCE                   (Early Modern English)

11-file version:   HO1, HO2, HO3, HO4    (OE by sub-sections)
                   HM1, HM2, HM3, HM4    (ME by sub-sections)
                   HE1, HE2, HE3         (EModE by sub-sections)
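References in WordCruncher output take the form (E1 IS HANDO FITZH 41:Heading), i.e. the |Q code, the |P code and the word `Heading'. The Python sketch below splits such a reference into its parts. It is an illustration only: the function name is hypothetical, and it assumes that the `text identifier' always consists of four space-separated items, with whatever follows them being the page reference.

```python
from typing import Dict

SUFFIX = ":Heading)"


def parse_wordcruncher_reference(reference: str) -> Dict[str, str]:
    """Split a reference such as '(E1 IS HANDO FITZH 41:Heading)'."""
    text = reference.strip()
    if not (text.startswith("(") and text.endswith(SUFFIX)):
        raise ValueError(f"not a WordCruncher reference: {reference!r}")
    tokens = text[1:-len(SUFFIX)].split()
    if len(tokens) < 5:
        raise ValueError(f"expected text identifier and page: {reference!r}")
    return {
        "part_of_corpus": tokens[0],
        "prototypical_text_category": tokens[1],
        "text_type": tokens[2],
        "abbreviated_title": tokens[3],
        "page": " ".join(tokens[4:]),
    }


if __name__ == "__main__":
    print(parse_wordcruncher_reference("(E1 IS HANDO FITZH 41:Heading)"))
```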

Which version to use depends largely on the item(s) investigated and on the level of delicacy required when sorting out the output lists. When looking for high-frequency words, working through the HCO, HCM and HCE files one by one may facilitate the task of sorting out the variant forms. Working on low-frequency items, on the other hand, may be handiest when using the HKI file.

By way of illustration, instances of the conjunction as if were retrieved from the Early Modern English texts by combining the possible variant forms of as and if within the space of eight characters. The context was restricted to seven lines. The first four examples obtained run as follows (the location of the `text identifier', abbreviated title and `page' codes is indicated under the first example):

Example:

Computer Book: c:\HC\HCE.BYB

Reference List: as,ase,if,iff,yf,yfe,yff

in euerye quarter a London busshell, or there about. For
the small corne lyeth in the holowe and voyde places of
the greate beanes, and yet shall the greate beanes be solde
as dere, as if they were all together, or derer, as a man
may proue by a famylier ensample. Let a man bye
.C. hearynges, two hearynges for a penye, and an other
.C. hearynges, thre for a peny, and let hym sell these
(E1 IS HANDO FITZH 41:Heading)
 |           |     |
 |           |     `page'
 |           abbreviated title (see Part Three)
 `text identifier'

but the Skinne and Musculus fleshe, for the
panicle vnderneth it is of Pericranium, and
the bone is of the Coronal bone. Howebeit there
it is made broade, as yf ther were a double bone,
whiche maketh the forme of the Browes. It is called
the Forhead or Front, from one Eare to the other, and
from the rootes of the Eares of the head before, vnto y=e=
(E1 EX SCIM VICARY 34:Heading)

that thyng semeth cheyfly to be desyred or wished, for the
#
cause
or loue, wherof any thing is desyred. As yf a man would ryde
for cause of helth, he desyreth not so much the mouing to
ryde,
as the effect of his helth. Therfore when that all thyngs be
(E1 XX PHILO BOETHCO 77:Heading)

thy selfe, for thou hast cast thy selfe into the worste
#
thynges.
Like as if thou shouldest loke vpon the foule erth and heuen
in
order (all outwarde thynges leyde apart for the tyme) then
it
(E1 XX PHILO BOETHCO 101:Heading)
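A comparable proximity search can be sketched in Python for use on the plain text files. The sketch is not a description of how WordCruncher works: the function name is hypothetical, and `within the space of eight characters' is read here as meaning that the stretch from the beginning of the as-form to the end of the if-form is at most eight characters long, which is only one possible interpretation.

```python
import re
from typing import List

AS_FORMS = {"as", "ase"}
IF_FORMS = {"if", "iff", "yf", "yfe", "yff"}
WORD = re.compile(r"[A-Za-z]+")


def find_as_if(text: str, window: int = 8) -> List[str]:
    """Return stretches where an as-form is followed by an if-form within the window."""
    words = [(m.group(0).lower(), m.start(), m.end()) for m in WORD.finditer(text)]
    hits = []
    for index, (form, start, _end) in enumerate(words):
        if form not in AS_FORMS:
            continue
        for next_form, _next_start, next_end in words[index + 1:]:
            if next_end - start > window:
                break  # too far from the as-form
            if next_form in IF_FORMS:
                hits.append(text[start:next_end])
                break
    return hits


if __name__ == "__main__":
    line = "as dere, as if they were all together, or derer, as a man"
    print(find_as_if(line))
```

Applied to the line quoted from the first example above, the function reports the one stretch in which an as-form is directly followed by an if-form.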

5.3. Lexa

The software package Lexa consists of a set of programs intended to perform the following types of task:

(i) automatically tag a file or files grammatically. Tagging is cumulative and permits later revision and re-tagging. Manual tagging with user confirmation is also available.

(ii) generate a database of unique word forms from a group of input files (reverse dictionary generation is also catered for).

(iii) produce a concordance file from a set of input files (KWIC or KWOC type).

(iv) allow various types of information retrieval from any input texts.

The main tagging program (lexa.exe), along with others in the set, works interactively from a desktop with pull-down menus and mouse support or in a batch mode from the DOS command line. The programs have been adapted to deal with the Helsinki Corpus texts, e.g. they recognize comments and special coding in these and can skip (or include) them. The user profile (set-up) can be stored in a configuration file and can thus be accessed repeatedly. A font module is included, which allows users to see original Old and Middle English symbols on screen and have these printed as well.

An install program is also included with which one can set up Lexa for one's own system; all programs are MS-DOS executable files which are machine-independent. To avail oneself of the font module, a system must have a VGA, EGA or Hercules Plus video adapter.

Included in the Lexa suite are a number of statistical options. These can be used in conjunction with any text files to carry out standard inferential statistical tests (chi-square test, Wilcoxon or Spearman tests, among many others). Lexical density and frequency tables can be generated, values can be ranked and various calculations later computed.

The above package has been published as three volumes with accompanying diskettes. Interested users of the Helsinki Corpus can obtain further information from the following address:

The HIT Centre/Humanities Information Technologies Research Programme
Allégaten 27
N-5007 Bergen
Norway

For more details on this corpus processing software, see Appendix 3.

Notes to the Preface and Sections 1 to 5

1. Cf. G. Melchers, Studies in Yorkshire Dialects, Based on Recordings of 13 Dialect Speakers in the West Riding (I-II) (= Stockholm Theses in English, 9), Stockholm University, 1972.

2. For previous reports on the structure and compilation of the Corpus, cf. O. Ihalainen, M. Kytö and M. Rissanen, "The Helsinki Corpus of English Texts: Diachronic and Dialectal: Report on work in progress", Corpus Linguistics and Beyond. Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora, ed. W. Meijs (Amsterdam: Rodopi), 1987: 21-32; M. Kytö and M. Rissanen, "The Helsinki Corpus of English Texts: classifying and coding the diachronic part", Corpus Linguistics, Hard and Soft. Proceedings of the Eighth International Conference on English Language Research on Computerized Corpora (= Language and Computers: Studies in Practical Linguistics, 2), eds. M. Kytö, O. Ihalainen and M. Rissanen (Amsterdam: Rodopi), 1988: 169-179; M. Rissanen, "Three problems connected with the use of diachronic corpora", ICAME Journal, 1989 13: 16-19; M. Kytö, "Progress report on the diachronic part of the Helsinki Corpus", idem, pp. 12-15 and "Introduction to the Use of the Helsinki Corpus of English Texts: Diachronic and Dialectal", Proceedings from The Stockholm Conference on the Use of Computers in Language Research and Teaching, September 7-9, 1989, ed. M. Ljung (Stockholm: English Department, University of Stockholm), 1990: 41-56; M. Kytö and M. Rissanen, "A Language in Transition: The Helsinki Corpus of English Texts", ICAME Journal, 1992 16: 7-27.

3. For basics, cf. e.g. U. Weinreich, W. Labov and M. I. Herzog, "Empirical foundations for a theory of language change", Directions for Historical Linguistics: A Symposium, eds. W. P. Lehmann and Y. Malkiel (Austin and London: University of Texas Press), 1968: 95-195; M. Rydén, An Introduction to the Historical Study of English Syntax (= Stockholm Studies in English, LI) (Stockholm: Almqvist and Wiksell International), 1979; S. Romaine, Socio-historical Linguistics, Its Status and Methodology (= Cambridge Studies in Linguistics, 34) (Cambridge, London etc.: Cambridge University Press), 1982; M. Rissanen, "Variation and the study of English historical syntax", Diversity and Diachrony (= Current Issues in Linguistic Theory, 53), ed. D. Sankoff (Amsterdam and Philadelphia: John Benjamins), 1986: 97-109. For a more detailed discussion, cf. Rissanen et al., Early English in the Computer Age: Explorations through the Helsinki Corpus (= Topics in English Linguistics, 11) (Berlin and New York: Mouton de Gruyter, 1993).

4. The word counts are obtained using a microcomputer program devised by Mr. Michał Jankowski (Adam Mickiewicz University, Poznań, Poland).

5. Cf. A. diPaolo Healey and R. L. Venezky, A Microfiche Concordance to Old English. The List of Texts and Index of Editions (Publications of the Dictionary of Old English, 1) (Toronto: Pontifical Institute of Mediaeval Studies), 1980.

6. Cf. O. Ihalainen, "Creating linguistic databases from machine-readable dialect texts", Methods in Dialectology. Proceedings of the Sixth International Conference Held at the University College of North Wales, 3rd-7th August 1987 (= Multilingual Matters, 48), ed. A. R. Thomas (Clevedon, Philadelphia: Multilingual Matters Ltd), 1988: 569-584; "A source of data for the study of English dialectal syntax: the Helsinki Corpus", Theory and Practice in Corpus Linguistics (= Language and Computers: Studies in Practical Linguistics 4), eds. J. Aarts and W. Meijs (Amsterdam and Atlanta, GA: Rodopi), 1990: 83-103.

7. Cf. R. Garside, "The CLAWS word-tagging system", The Computational Analysis of English. A Corpus-Based Approach, eds. R. Garside, G. Leech and G. Sampson (London and New York: Longman), 1987: 30-41.

8. The character descriptions are based on VAX EDT Reference Manual (VAX/VMS Volume 3A), Software Version VAX/VMS Version 4.0 by Digital Equipment Corporation, Maynard (Massachusetts), September 1984 (pp. A/1-6).

9. Owing to the problems encountered with the WordCruncher substring searches, the asterisk (*) used for `ash', `eth' (*a, *A, *d, *D) etc. in the pilot versions of the Corpus has been replaced by the plus sign (+) in the current version.

10. Two dashes in the examples cited indicate material deleted here.

11. Cf. WordCruncher. Text Indexing and Retrieval Software (Versions 4.1 and 4.30) (Provo, Utah: Brigham Young University and Electronic Text Corporation), 1987 and 1989; OCP = Oxford Concordance Program, cf. Users' Manual (Version 2), comps. S. Hockey and J. Martin (Oxford: Oxford University Computing Service), 1988; cf. also Users' Manual (Version 1.0), comps. S. Hockey and I. Marriott (Oxford: Oxford University Computing Service), 1984; Micro-OCP = a microcomputer implementation of OCP, comps. S. Hockey and J. Martin, Oxford University Computing Service (Oxford: Oxford University Press), 1988. For further applications, cf., e.g., TACT, User's Guide, Version 1.2 by J. Bradley (Toronto: University of Toronto Computing Services, 1990). A UNIX-based retrieval program devised for the diachronic part of the Corpus is being prepared at the Department of General Linguistics, University of Helsinki.

Old English:	Leena Kahlas-Tarkka, Matti Kilpiö, Ilkka Mönkkönen, Aune Österman
Middle English:	Inkeri Blomstedt, Juha Hannula, Mailis Järviö, Leena Koskinen, Saara Nevanlinna, Tesma Outakoski, Päivi Pahta, Kirsti Peitsara, Irma Taavitsainen
Early Modern English:	Merja Kytö, Anneli Meurman-Solin, Terttu Nevalainen, Helena Raumolin-Brunberg, Ritva Tiusanen

Dec.	Hex.	Char.	Description
40 41 91 93 123 125 92 94	28 29 5B 5D 7B 7D 5C 5E	( ) [ ] { } \ ^	= opening parenthesis (cf. B.1., above) = closing parenthesis (cf. B.1., above) = opening bracket = closing bracket = opening brace = closing brace = back slash = circumflex

Dec.	Hex.	Char.	Description
36	24	$	= dollar sign
37	25	%	= percent sign
42	2A	*	= asterisk
64	40	@	= commercial at
95	5F	_	= underline
124	7C	|	= vertical line

Old English	Middle English	EMod English
X	WRITTEN SCRIPT X	WRITTEN SCRIPT SPEECH-BASED
Abbreviations: SCRIPT = `written to be spoken'

Sub-periods	242 files	Larger units
	MF/T	WCr	WCr	MF/T/WCr
-850	CODOCU1 CONORTHU	HO1
850-950	COLAW2 CODOCU2 Etc.	HO2	HCO
950-1050	COLAW3 CODOCU3 Etc.	HO3
1050-1150	COLAW4 CODOCU4 Etc.	HO4
1150-1250	CMPERIDI CMORM Etc.	HM1
1250-1350	CMDOCU2 CMKENTSE Etc.	HM2	HCM	HKI
1350-1420	CMDOCU3 CMASTRO Etc.	HM3
1420-1500	CMLAW CMDOCU4 Etc.	HM4
1500-1570	CELAW1 CEHAND1A Etc.	HE1
1570-1640	CELAW2 CEHAND2A Etc.	HE2	HCE
1640-1710	CELAW3 CEHAND3A Etc.	HE3
Abbreviations: MF = mainframe (ASCII/EBCDIC); T = text file for microcomputers; WCr = WordCruncher
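
The relationship between sub-periods and file names in the table above can be sketched as a simple lookup in Python. The file names are copied from the table; the function name and the idea of deriving the larger unit from the second letter of the WordCruncher file name are illustrative assumptions.

```python
# Sub-period -> WordCruncher file name, copied from the table above.
WCR_FILES = {
    "-850": "HO1", "850-950": "HO2", "950-1050": "HO3", "1050-1150": "HO4",
    "1150-1250": "HM1", "1250-1350": "HM2", "1350-1420": "HM3",
    "1420-1500": "HM4",
    "1500-1570": "HE1", "1570-1640": "HE2", "1640-1710": "HE3",
}

# Larger units: one file per period, and one file for all material.
PERIOD_FILES = {"O": "HCO", "M": "HCM", "E": "HCE"}
WHOLE_CORPUS = "HKI"


def units_for(sub_period):
    """Return (sub-period file, period file, whole-corpus file)."""
    wcr = WCR_FILES[sub_period]
    # Assumption: the second letter of the name (O, M or E) marks the period.
    return wcr, PERIOD_FILES[wcr[1]], WHOLE_CORPUS


if __name__ == "__main__":
    print(units_for("1250-1350"))
```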

*input	references cocoa "<" to ">".
	text 1 to 80.
	comments between "(\" to "\)".
	select where C="M1", C="MX/1", C="M2", C="MX/2", C="M3", C="M2/3", C="M4",
	C="M2/4", C="M3/4", C="MX/4", C="E1", C="E2", C="E3".
	{ kwic concordance, ME+EModE, foreign language excluded }
*words	alphabet "A=a +A=+a B=b C=c D=d +D=+d E=e +E=+e F=f G=g +G=+g H=h I=i J=j K=k L=l +L=+l M=m N=n O=o P=p Q=q R=r S=s T=t +T=+t U=u V=v W=w X=x Y=y Z=z 1 2 3 4 5 6 7 8 9 0 &".
	padding "` ~ - :' := # [ ] { } ( ) ^ / \ :"".
*action	do concordance.
	include only phrases
	"me thin", "me +tin", "me +din",*
	"mee thin", "mee +tin", "mee +din",*
	"my thin", "my +tin", "my +din",*
	"mi thin", "mi +tin", "mi +din",*
	"me thyn", "me +tyn", "me +dyn",*
	"mee thyn", "mee +tyn", "mee +dyn",*
	"my thyn", "my +tyn", "my +dyn",*
	"mi thyn", "mi +tyn", "mi +dyn".*
	references Q=20, P=8.
	contexts sorted by right of keys.
	maximum context span L.
*format	layout length 80.
*go

me +din	1
	M1 IR RELT VICES1	23	+de-liker, +gif +du	me +din	# uncu+de name
me thine	1
	E2 XX COME SHAKESP	43.C2	I say, loue me: By	me, thine	owne true Knight,
me things	1
	E2 NI FICT DELONEY	73	een ready # to tell	me things	for my profit, thou
Me think	3
	M3 IR HOM NHOM	II, 205	ws +tus changed se.	Me think	I wate noght what
	E1 XX TRI THROCKM	I,72.C1	Queenes owne Mouth,.	me think	you ought of right
	E1 XX TRI THROCKM	I,76.C1	(^Throckmorton.^)	Me think	you would conclude
me thinke	6
	E1 XX COME STEVENSO	14	hyn taketh, And Tyb	me thinke	at his elbowe almos
	E1 XX COME UDALL	L. 330	is a thombe to day	me thinke,	I care not to let
	E1 IR SERM LATIMER	26	npreaching prelates	me thinke	I coulde gesse what
	E1 XX TRI THROCKM	I,65.C2	m to the Sea apace,	me thinke	it well done, you p
	E2 IS HANDO GIFFORD	A4R	etily well, but yet	me thinke	my meate doth me
	E1 XX TRI THROCKM	I,72.C2	nsell aunswere him?	Me thinke,	(^Throckmorton^)
Me thinkes	6
	E2 XX COME MIDDLET	16	me Calues Head now,	Me thinkes	her Husbands Head
	E2 EX SCIM CLOWES	28	nt such greeuances,	me thinkes	I cannot speake to
	E2 IS HANDO GIFFORD	B1R	g or other, and yet	me thinkes	shee frownes at me
	E2 XX COME SHAKESP	55.C2	liquely sham'd, and	me thinkes	there would be no
	E2 XX COME MIDDLET	1	a dull Mayd alate;	me thinkes	you had need haue
	E1 XX COME STEVENSO	60	ur tonges ye holde,	Me thinkes	you shuld remembre
Me thinketh	3
	E1 NI FICT HARMAN	40	lesse him for me!"	"Me thinketh,	by the masse, by
	E1 XX CORP MORELET	507	wne conscience. And	me thinketh	in good faith, th
	E2 IS HANDO GIFFORD	E2V	(^Sam.^) Yea, that	me thinketh	is more than the
me thinks	1
	E3 EX SCIO HOOKE	13.5,116	ve in it; # though,	me thinks,	it seems very prob
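
The selection made by the phrase list in the command file can be approximated outside OCP. The following Python sketch is not an OCP implementation: it assumes that the second word of each phrase is matched as a word beginning (as the output above suggests, e.g. "me things"), that a comma may separate the two words, and that case is ignored. All names in it are hypothetical.

```python
import re

# The 24 search phrases of the command file are the combinations of
# four spellings of the pronoun and six spellings of the verb stem.
PRONOUNS = ["me", "mee", "my", "mi"]
STEMS = ["thin", "+tin", "+din", "thyn", "+tyn", "+dyn"]
PHRASES = [f"{p} {s}" for p in PRONOUNS for s in STEMS]

# Assumptions: the stem is a word beginning, a comma or white space may
# separate the two words, and upper and lower case are not distinguished.
_KEY = re.compile(
    r"(?<![\w+])(?:%s)[\s,]+(?:%s)[\w+]*"
    % (
        "|".join(re.escape(p) for p in PRONOUNS),
        "|".join(re.escape(s) for s in STEMS),
    ),
    re.IGNORECASE,
)


def find_keys(line):
    """Return every key phrase found in one line of corpus text."""
    return [m.group(0) for m in _KEY.finditer(line)]


if __name__ == "__main__":
    print(find_keys("I say, loue me: By me, thine owne true Knight,"))
    print(find_keys("+de-liker, +gif +du me +din # uncu+de name"))
```

The sketch only locates the keys; sorting the contexts by the words to the right of the key, as the command file requests, is left out.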

1-file version:	HKI	(all material)
3-file version:	HCO	(Old English)
	HCM	(Middle English)
	HCE	(Early Modern English)
11-file version:	HO1, HO2, HO3, HO4	(OE by sub-sections)
	HM1, HM2, HM3, HM4	(ME by sub-sections)
	HE1, HE2, HE3	(EModE by sub-sections)