Structure of the WSC (1988-1994)

 

1.  Categories of the WSC

The WSC comprises different proportions of formal, semi-formal and informal speech. The extracts are divided into 15 categories, which cover a range of contexts in which each type of speech is found. Table 8.1, WSC Categories and Word Targets, groups the categories according to whether they are monologue or dialogue, public or private, and scripted or unscripted. It also gives the code assigned to each category and the category's word target.

The formal speech section of the WSC comprises all the monologue categories and Parliamentary debate (DGU). The semi-formal section consists of the interview categories, both public and private: oral history (DPH), social dialect (DPP) and broadcast interviews (DGI). The remaining dialogue categories make up the informal speech section; private face-to-face conversations (DPC) alone account for 50% of the overall corpus.

Table 8.1: WSC CATEGORIES AND WORD TARGETS

Category                               Text Category              Code  Word Target
Monologue, public scripted, broadcast  Broadcast news             MSN        24,000
                                       Broadcast monologue        MST        10,000
                                       Broadcast weather          MSW         2,000
Monologue, public unscripted           Sports commentary          MUC        20,000
                                       Judge's summation          MUJ         4,000
                                       Lecture                    MUL        28,000
                                       Teacher monologue          MUS        12,000
Dialogue, private                      Conversation               DPC       500,000
                                       Telephone conversation     DPF        70,000
                                       Oral history interview     DPH        20,000
                                       Social dialect interview   DPP        30,000
Dialogue, public                       Radio talkback             DGB        80,000
                                       Broadcast interview        DGI        80,000
                                       Parliamentary debate       DGU        20,000
                                       Transactions and Meetings  DGZ       100,000
TOTAL                                                                     1,000,000
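As a rough cross-check on the proportions of formal, semi-formal and informal speech described earlier, the word targets in table 8.1 can be totalled by section. The sketch below is illustrative only and is not part of the corpus distribution: the names TARGETS, SEMI_FORMAL, formality and section_targets are invented for this example, and the assignment of codes to sections simply follows the description in the text (all monologue categories plus DGU as formal; DPH, DPP and DGI as semi-formal; the remaining dialogue categories as informal).

```python
# Word targets copied from table 8.1, keyed by category code.
TARGETS = {
    "MSN": 24_000, "MST": 10_000, "MSW": 2_000,
    "MUC": 20_000, "MUJ": 4_000, "MUL": 28_000, "MUS": 12_000,
    "DPC": 500_000, "DPF": 70_000, "DPH": 20_000, "DPP": 30_000,
    "DGB": 80_000, "DGI": 80_000, "DGU": 20_000, "DGZ": 100_000,
}

# Interview categories, which the text treats as semi-formal.
SEMI_FORMAL = {"DPH", "DPP", "DGI"}


def formality(code):
    """Map a category code to the section named in the text (assumed mapping)."""
    if code.startswith("M") or code == "DGU":
        return "formal"
    if code in SEMI_FORMAL:
        return "semi-formal"
    return "informal"


def section_targets(targets):
    """Sum the word targets for each formality section."""
    totals = {"formal": 0, "semi-formal": 0, "informal": 0}
    for code, words in targets.items():
        totals[formality(code)] += words
    return totals


totals = section_targets(TARGETS)
grand_total = sum(TARGETS.values())
for section, words in totals.items():
    print(f"{section:<12}{words:>10,}{words / grand_total:>7.0%}")
```

On these targets the formal, semi-formal and informal sections come to 120,000, 130,000 and 750,000 words respectively.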

     

Table 8.2, WSC Categories – Targets and Final Figures, shows the number of words actually collected for each category, as well as the number of extracts. The WSC consists of extracts of approximately 2,000 words (the extract size used by the Brown, LOB and ICE corpora). Exceptions to this target are made when the entire speech event is shorter than 2,000 words (e.g. weather reports, shop transactions and news bulletins).

Table 8.2: WSC CATEGORIES – TARGETS AND FINAL FIGURES

Code  Text Category              Number of Extracts  Word Target  Words Transcribed
MSN   Broadcast news                             36       24,000             28,929
MST   Broadcast monologue                         5       10,000             11,205
MSW   Broadcast weather                          12        2,000              3,641
MUC   Sports commentary                          10       20,000             26,010
MUJ   Judge's summation                           2        4,000              4,489
MUL   Lecture                                    14       28,000             30,406
MUS   Teacher monologue                           8       12,000             12,496
DPC   Conversation                              226      500,000            500,363
DPF   Telephone conversation                     46       70,000             70,156
DPH   Oral history interview                     10       20,000             21,972
DPP   Social dialect interview                   11       30,000             31,058
DGB   Radio talkback                             37       80,000             84,321
DGI   Broadcast interview                        40       80,000             96,775
DGU   Parliamentary debate                       14       20,000             22,446
DGZ   Transactions and Meetings                  80      100,000            102,332
TOTAL                                           551    1,000,000          1,046,599
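The figures in table 8.2 can be used to work out, for each category, the mean extract length and how far the transcribed total exceeds the target. The following is a minimal sketch for readers who want to reproduce that arithmetic; TABLE_8_2, mean_extract_length and column_totals are names invented for this example, and the numbers are simply copied from the table.

```python
# (number of extracts, word target, words transcribed), copied from table 8.2.
TABLE_8_2 = {
    "MSN": (36, 24_000, 28_929),
    "MST": (5, 10_000, 11_205),
    "MSW": (12, 2_000, 3_641),
    "MUC": (10, 20_000, 26_010),
    "MUJ": (2, 4_000, 4_489),
    "MUL": (14, 28_000, 30_406),
    "MUS": (8, 12_000, 12_496),
    "DPC": (226, 500_000, 500_363),
    "DPF": (46, 70_000, 70_156),
    "DPH": (10, 20_000, 21_972),
    "DPP": (11, 30_000, 31_058),
    "DGB": (37, 80_000, 84_321),
    "DGI": (40, 80_000, 96_775),
    "DGU": (14, 20_000, 22_446),
    "DGZ": (80, 100_000, 102_332),
}


def mean_extract_length(code):
    """Average number of transcribed words per extract in one category."""
    extracts, _target, transcribed = TABLE_8_2[code]
    return transcribed / extracts


def column_totals():
    """Column sums: (extracts, word target, words transcribed)."""
    return tuple(sum(column) for column in zip(*TABLE_8_2.values()))


for code, (extracts, target, transcribed) in TABLE_8_2.items():
    over_target = (transcribed - target) / target
    print(f"{code}  {mean_extract_length(code):7.0f} words/extract  "
          f"{over_target:+6.1%} against target")

print("Totals (extracts, target, transcribed):", column_totals())
```

Categories whose mean falls well below 2,000 words, such as MSW and MSN, correspond to the short speech events mentioned above.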

     

The word counts for some of the categories include words from individuals who could not be contacted for permission or for background information sheets (see section 11.1.1, Who counts as a New Zealander?). The MSN Broadcast news category includes 709 words from such people (2% of words in this category). The DGB Radio talkback category includes 49,016 words from such people (58% of words in this category). In all other categories the number of words contributed by such people is negligible.

More information on the different categories in the WSC is provided in section 15, Texts, along with information on each extract included.

The word counts quoted in this guide are based on DOS word counts produced from the original WordPerfect files. These files have been converted and reformatted for the release version of the corpus, so word counts in the release version may differ slightly.

2.  WSC Gender, Ethnicity and Age Breakdowns

The following tables give the final figures for the number of words in each category, with a breakdown by gender and by the two main ethnic groups represented, Pakeha and Maori. The age breakdown for the overall corpus is shown in figure 8.1, Age Composition of WSC. In this section, the figures for several of the categories (especially MSN and DGB) do not match the figures in table 8.2, WSC Categories – Targets and Final Figures, because demographic information is not available for all participants.

Table 8.3: WSC Composition by gender

Code  Text Category              Overall Words  Number from Females  Number from Males
MSN   Broadcast news                    28,166               10,114             18,052
MST   Broadcast monologue               11,205                4,453              6,752
MSW   Broadcast weather                  3,641                  388              3,253
MUC   Sports commentary                 26,010                    0             26,010
MUJ   Judge's summation                  4,489                    0              4,489
MUL   Lecture                           30,406               11,151             19,255
MUS   Teacher monologue                 12,493                9,479              3,014
DPC   Conversation                     500,332              301,521            198,811
DPF   Telephone conversation            70,156               50,554             19,602
DPH   Oral history interview            21,972               12,760              9,212
DPP   Social dialect interview          31,058               14,083             16,975
DGB   Radio talkback                    35,304                6,554             28,750
DGI   Broadcast interview               96,696               36,043             60,653
DGU   Parliamentary debate              22,446                6,349             16,097
DGZ   Transactions and Meetings        102,122               52,826             49,296
TOTAL                                  996,496              516,275            480,221
Percentage                                                      52%                48%

The WSC data was collected between 1988 and 1994. Overall New Zealand population figures from the 1991 Census show that 51% of the population was female and 49% male. (Census figures are taken from New Zealand Official Yearbook, 95th edition, Department of Statistics 1992.)
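The difference between the word counts in table 8.2 and the overall counts in table 8.3 can be itemised per category. The sketch below is an illustration only: TRANSCRIBED and WITH_DEMOGRAPHICS are names invented here for the "Words Transcribed" column of table 8.2 and the "Overall Words" column of table 8.3, and the interpretation of the gap as words without demographic information rests on the explanation given at the start of this section.

```python
# "Words Transcribed" column of table 8.2.
TRANSCRIBED = {
    "MSN": 28_929, "MST": 11_205, "MSW": 3_641, "MUC": 26_010,
    "MUJ": 4_489, "MUL": 30_406, "MUS": 12_496, "DPC": 500_363,
    "DPF": 70_156, "DPH": 21_972, "DPP": 31_058, "DGB": 84_321,
    "DGI": 96_775, "DGU": 22_446, "DGZ": 102_332,
}

# "Overall Words" column of table 8.3 (words with demographic information).
WITH_DEMOGRAPHICS = {
    "MSN": 28_166, "MST": 11_205, "MSW": 3_641, "MUC": 26_010,
    "MUJ": 4_489, "MUL": 30_406, "MUS": 12_493, "DPC": 500_332,
    "DPF": 70_156, "DPH": 21_972, "DPP": 31_058, "DGB": 35_304,
    "DGI": 96_696, "DGU": 22_446, "DGZ": 102_122,
}

# Words in table 8.2 that are missing from table 8.3, category by category.
gaps = {code: TRANSCRIBED[code] - WITH_DEMOGRAPHICS[code] for code in TRANSCRIBED}
nonzero_gaps = {code: gap for code, gap in gaps.items() if gap}

for code, gap in nonzero_gaps.items():
    print(f"{code}  {gap:>7,} words ({gap / TRANSCRIBED[code]:.1%} of category)")

print(f"Total gap: {sum(gaps.values()):,} words")
```

On the figures above, the gap is concentrated in DGB and, to a much lesser extent, MSN; the remaining categories differ by a few hundred words at most.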

     

     

Table 8.4: WSC Composition by ethnicity

Code  Text Category              Overall Words  Number from Pakeha  Number from Maori
MSN   Broadcast news                    28,166              20,300              7,204
MST   Broadcast monologue               11,205              11,205                  0
MSW   Broadcast weather                  3,641               3,641                  0
MUC   Sports commentary                 26,010              24,732                  0
MUJ   Judge's summation                  4,489               4,489                  0
MUL   Lecture                           30,406              26,315              4,091
MUS   Teacher monologue                 12,493              10,360                  0
DPC   Conversation                     500,332             366,047             92,154
DPF   Telephone conversation            70,156              62,985              1,689
DPH   Oral history interview            21,972              21,972                  0
DPP   Social dialect interview          31,058                 706             30,352
DGB   Radio talkback                    35,304              31,226              1,765
DGI   Broadcast interview               96,696              56,735             39,466
DGU   Parliamentary debate              22,446              22,257                189
DGZ   Transactions and Meetings        102,122              92,772              3,771
TOTAL                                  996,496             755,742            180,681
Percentage                                                     76%                18%

Ethnicity figures from the 1991 New Zealand Census show that Pakeha constituted 73.8% of the population and Maori 12.9%. In collecting the WSC, an effort was made to ensure that a reasonable proportion of the data was collected from Maori (see section 11.12, Ethnic and Gender Representation).
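The percentages in table 8.4 and the comparison with the Census can be reproduced from the table itself. The sketch below is illustrative: TABLE_8_4, CENSUS, shares and unattributed are names invented for this example. The remainder it computes is simply the overall word count minus the Pakeha and Maori counts; the table does not say how those words are distributed, so no interpretation of them is attempted here.

```python
# (overall words, words from Pakeha, words from Maori), copied from table 8.4.
TABLE_8_4 = {
    "MSN": (28_166, 20_300, 7_204),
    "MST": (11_205, 11_205, 0),
    "MSW": (3_641, 3_641, 0),
    "MUC": (26_010, 24_732, 0),
    "MUJ": (4_489, 4_489, 0),
    "MUL": (30_406, 26_315, 4_091),
    "MUS": (12_493, 10_360, 0),
    "DPC": (500_332, 366_047, 92_154),
    "DPF": (70_156, 62_985, 1_689),
    "DPH": (21_972, 21_972, 0),
    "DPP": (31_058, 706, 30_352),
    "DGB": (35_304, 31_226, 1_765),
    "DGI": (96_696, 56_735, 39_466),
    "DGU": (22_446, 22_257, 189),
    "DGZ": (102_122, 92_772, 3_771),
}

# 1991 Census shares quoted in the text, as percentages.
CENSUS = {"Pakeha": 73.8, "Maori": 12.9}

# Column totals, then the words not attributed to either group in the table.
overall, pakeha, maori = (sum(column) for column in zip(*TABLE_8_4.values()))
unattributed = overall - pakeha - maori

shares = {
    "Pakeha": 100 * pakeha / overall,
    "Maori": 100 * maori / overall,
}

for group, share in shares.items():
    print(f"{group:<7} WSC {share:5.1f}%   Census {CENSUS[group]:5.1f}%")
print(f"Not attributed to either group: {unattributed:,} words "
      f"({100 * unattributed / overall:.1f}%)")
```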

     

     

Figure 8.1: Age Composition of WSC (Number of words by age group)

Figure 8.1, Age Composition of WSC, shows the number of words contributed to the WSC by each age group. Figure 8.2, Age Comparison for WSC and New Zealand Population, compares the age distribution of the WSC with that of the overall New Zealand population (estimated for 1990 from figures in the New Zealand Official Yearbook, 95th edition, Department of Statistics 1992). The WSC figures show the percentage of words contributed to the corpus by each age group, while the overall population figures show the percentage of the adult population in each age group.

Figure 8.2: Age Comparison for WSC and New Zealand Population

3.  WSC and ICE-NZ Overlap

As mentioned earlier, the WSC and the spoken component of ICE-NZ share nine categories. The following table lists the shared categories, the number of words collected for each corpus and the number of words actually shared. WSC extracts which are included in both corpora are identified in section 15, Texts.

Table 8.5: WSC and ICE-NZ OVERLAP

Code  Text Category              Words WSC  Words ICE-NZ  Actual Overlap
MSN   Broadcast news                28,929        40,396          26,401
MST   Broadcast monologue           11,205        45,276          11,205
MUC   Sports commentary*            26,010        52,007          26,010
MUJ   Judge's summation              4,489        22,473           4,489
DPC   Conversation                 500,363       206,624         203,864
DPF   Telephone conversation        70,156        22,688          22,688
DGI   Broadcast interview           96,775        21,810               0
DGU   Parliamentary debate          22,446        22,446          22,446
DGZ   Transactions and Meetings    102,332        22,145          22,145
TOTAL                                                            339,248

*ICE-NZ's commentary section is not limited to sports commentary.

The spoken component of ICE-NZ is still being finalised, so these figures are not final.
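For readers who want to see how much of each shared category is common to both corpora, the overlap in table 8.5 can be expressed as a fraction of each corpus. The following is a minimal sketch based on the provisional figures above; TABLE_8_5, overlap_share, wholly_in_ice and wholly_from_wsc are names invented for this example.

```python
# (words in WSC, words in ICE-NZ, words shared), copied from table 8.5.
# The ICE-NZ figures were provisional when the table was compiled.
TABLE_8_5 = {
    "MSN": (28_929, 40_396, 26_401),
    "MST": (11_205, 45_276, 11_205),
    "MUC": (26_010, 52_007, 26_010),
    "MUJ": (4_489, 22_473, 4_489),
    "DPC": (500_363, 206_624, 203_864),
    "DPF": (70_156, 22_688, 22_688),
    "DGI": (96_775, 21_810, 0),
    "DGU": (22_446, 22_446, 22_446),
    "DGZ": (102_332, 22_145, 22_145),
}


def overlap_share(code):
    """Shared words as a fraction of the WSC and ICE-NZ counts for one category."""
    wsc, ice, shared = TABLE_8_5[code]
    return shared / wsc, shared / ice


total_shared = sum(shared for _wsc, _ice, shared in TABLE_8_5.values())

# Categories whose whole WSC word count also appears in ICE-NZ.
wholly_in_ice = [c for c, (wsc, _ice, shared) in TABLE_8_5.items() if shared == wsc]
# Categories whose whole ICE-NZ word count is drawn from the WSC.
wholly_from_wsc = [c for c, (_wsc, ice, shared) in TABLE_8_5.items() if shared == ice]

for code in TABLE_8_5:
    of_wsc, of_ice = overlap_share(code)
    print(f"{code}  {of_wsc:6.1%} of WSC   {of_ice:6.1%} of ICE-NZ")
print(f"Total shared: {total_shared:,} words")
```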