Structure of the WSC (1988-1994)
The WSC comprises different proportions of formal, semi-formal and informal speech. The extracts are divided into 15 categories and these categories cover a range of contexts in which each type of speech is found. In table 8.1, WSC Categories and Word Targets, the categories are grouped in terms of whether they are monologues or dialogues, public or private, scripted or unscripted. The codes assigned to the categories are also provided, along with the word targets for each category.
The formal speech section of the WSC comprises all the monologue categories together with Parliamentary debate (DGU). The semi-formal section comprises the interview categories, both public and private: oral history (DPH), social dialect (DPP) and broadcast interviews (DGI). The remaining dialogue categories make up the informal speech section; private face-to-face conversations (DPC) alone account for 50% of the overall corpus.
Table 8.1: WSC CATEGORIES AND WORD TARGETS

| Category | Code | Target |
|---|---|---|
| Broadcast news | MSN | 24,000 |
| Broadcast monologue | MST | 10,000 |
| Broadcast weather | MSW | 2,000 |
| Sports commentary | MUC | 20,000 |
| Judge's summation | MUJ | 4,000 |
| Lecture | MUL | 28,000 |
| Teacher monologue | MUS | 12,000 |
| Conversation | DPC | 500,000 |
| Telephone conversation | DPF | 70,000 |
| Oral history interview | DPH | 20,000 |
| Social dialect interview | DPP | 30,000 |
| Radio talkback | DGB | 80,000 |
| Broadcast interview | DGI | 80,000 |
| Parliamentary debate | DGU | 20,000 |
| Transactions and Meetings | DGZ | 100,000 |
| TOTAL | | 1,000,000 |
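The division into formal, semi-formal and informal sections can be checked against the word targets in table 8.1. The Python sketch below is an illustration only: the variable names are introduced here, and treating every code that begins with "M" as a monologue category is an assumption read off the table, not something the guide states. The targets and the assignment of categories to sections come from the text.

```python
# Word targets per category code, copied from table 8.1.
targets = {
    "MSN": 24_000, "MST": 10_000, "MSW": 2_000, "MUC": 20_000,
    "MUJ": 4_000, "MUL": 28_000, "MUS": 12_000,
    "DPC": 500_000, "DPF": 70_000, "DPH": 20_000, "DPP": 30_000,
    "DGB": 80_000, "DGI": 80_000, "DGU": 20_000, "DGZ": 100_000,
}

# Section membership as described in the text.
# Assumption: monologue categories are exactly the codes starting with "M".
formal = {code for code in targets if code.startswith("M")} | {"DGU"}
semi_formal = {"DPH", "DPP", "DGI"}           # the interview categories
informal = set(targets) - formal - semi_formal  # remaining dialogue categories

total = sum(targets.values())
sections = [("formal", formal), ("semi-formal", semi_formal), ("informal", informal)]

# Percentage of the overall word target falling in each section.
section_share = {
    name: 100 * sum(targets[code] for code in codes) / total
    for name, codes in sections
}

for name, share in section_share.items():
    print(f"{name}: {share:.0f}% of target words")
```

On these targets the formal section accounts for 12% of the corpus, the semi-formal section for 13% and the informal section for 75%.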
Table 8.2, WSC Categories Targets and Final Figures, shows the number of words actually collected for each category, as well as the number of extracts. The WSC consists of extracts of approximately 2,000 words (as used by the Brown, LOB and ICE corpora). Exceptions to this target are made when the entire speech event is less than 2,000 words (e.g. weather reports, shop transactions and news bulletins).
Table 8.2: WSC CATEGORIES TARGETS AND FINAL FIGURES

| Code | Category | Extracts | Target | Transcribed |
|---|---|---|---|---|
| MSN | Broadcast news | 36 | 24,000 | 28,929 |
| MST | Broadcast monologue | 5 | 10,000 | 11,205 |
| MSW | Broadcast weather | 12 | 2,000 | 3,641 |
| MUC | Sports commentary | 10 | 20,000 | 26,010 |
| MUJ | Judge's summation | 2 | 4,000 | 4,489 |
| MUL | Lecture | 14 | 28,000 | 30,406 |
| MUS | Teacher monologue | 8 | 12,000 | 12,496 |
| DPC | Conversation | 226 | 500,000 | 500,363 |
| DPF | Telephone conversation | 46 | 70,000 | 70,156 |
| DPH | Oral history interview | 10 | 20,000 | 21,972 |
| DPP | Social dialect interview | 11 | 30,000 | 31,058 |
| DGB | Radio talkback | 37 | 80,000 | 84,321 |
| DGI | Broadcast interview | 40 | 80,000 | 96,775 |
| DGU | Parliamentary debate | 14 | 20,000 | 22,446 |
| DGZ | Transactions and Meetings | 80 | 100,000 | 102,332 |
| | TOTAL | 551 | 1,000,000 | 1,046,599 |
The word counts for some of the categories include words from individuals whom it was not possible to contact for permission or background information sheets (see section 11.1.1, Who counts as a New Zealander?). The MSN Broadcast news category includes 709 words from such people (2% of words in this category). The DGB Radio talkback includes 49,016 words from such people (58% of words in this category). In all other categories the number of words contributed by such people is negligible.
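The two percentages quoted above follow directly from the transcribed word counts in table 8.2. As a minimal sketch (the names `transcribed`, `uncontacted` and `share_percent` are introduced here for illustration only), the arithmetic is:

```python
# Words transcribed per category (table 8.2) and words from speakers who
# could not be contacted (figures quoted in the paragraph above).
transcribed = {"MSN": 28_929, "DGB": 84_321}
uncontacted = {"MSN": 709, "DGB": 49_016}


def share_percent(part: int, whole: int) -> int:
    """Return part as a percentage of whole, rounded to the nearest integer."""
    return round(100 * part / whole)


shares = {
    code: share_percent(uncontacted[code], transcribed[code])
    for code in transcribed
}
print(shares)  # {'MSN': 2, 'DGB': 58}
```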
More information on the different categories in the WSC is provided in section 15, Texts, along with information on each extract included.
The word counts quoted in this guide are based on DOS word counts produced from the original WordPerfect files. These files have been converted and reformatted for the release version of the corpus, so word counts in the release version may differ slightly.
2. WSC Gender, Ethnicity and Age Breakdowns
The following tables give the final figures for the number of words in each category, with a breakdown by gender and by the two main ethnic groups represented, Pakeha and Maori. The age breakdown for the overall corpus is shown in figure 8.1, Age Composition of WSC. In this section, the figures for several of the categories, especially MSN and DGB, do not match the figures in table 8.2, WSC Categories Targets and Final Figures, because we do not have demographic information for all participants.
Table 8.3: WSC Composition by gender

| Code | Category | Overall Words | Female | Male |
|---|---|---|---|---|
| MSN | Broadcast news | 28,166 | 10,114 | 18,052 |
| MST | Broadcast monologue | 11,205 | 4,453 | 6,752 |
| MSW | Broadcast weather | 3,641 | 388 | 3,253 |
| MUC | Sports commentary | 26,010 | 0 | 26,010 |
| MUJ | Judge's summation | 4,489 | 0 | 4,489 |
| MUL | Lecture | 30,406 | 11,151 | 19,255 |
| MUS | Teacher monologue | 12,493 | 9,479 | 3,014 |
| DPC | Conversation | 500,332 | 301,521 | 198,811 |
| DPF | Telephone conversation | 70,156 | 50,554 | 19,602 |
| DPH | Oral history interview | 21,972 | 12,760 | 9,212 |
| DPP | Social dialect interview | 31,058 | 14,083 | 16,975 |
| DGB | Radio talkback | 35,304 | 6,554 | 28,750 |
| DGI | Broadcast interview | 96,696 | 36,043 | 60,653 |
| DGU | Parliamentary debate | 22,446 | 6,349 | 16,097 |
| DGZ | Transactions and Meetings | 102,122 | 52,826 | 49,296 |
| TOTAL | | 996,496 | 516,275 | 480,221 |
| | | | 52% | 48% |
The WSC data was collected between 1988 and 1994. The New Zealand overall population figures from the 1991 Census show that 51% of the population was female and 49% male. (Census figures are taken from New Zealand Official Yearbook, 95th edition, Department of Statistics 1992.)
Table 8.4: WSC Composition by ethnicity

| Code | Category | Overall Words | Pakeha | Maori |
|---|---|---|---|---|
| MSN | Broadcast news | 28,166 | 20,300 | 7,204 |
| MST | Broadcast monologue | 11,205 | 11,205 | 0 |
| MSW | Broadcast weather | 3,641 | 3,641 | 0 |
| MUC | Sports commentary | 26,010 | 24,732 | 0 |
| MUJ | Judge's summation | 4,489 | 4,489 | 0 |
| MUL | Lecture | 30,406 | 26,315 | 4,091 |
| MUS | Teacher monologue | 12,493 | 10,360 | 0 |
| DPC | Conversation | 500,332 | 366,047 | 92,154 |
| DPF | Telephone conversation | 70,156 | 62,985 | 1,689 |
| DPH | Oral history interview | 21,972 | 21,972 | 0 |
| DPP | Social dialect interview | 31,058 | 706 | 30,352 |
| DGB | Radio talkback | 35,304 | 31,226 | 1,765 |
| DGI | Broadcast interview | 96,696 | 56,735 | 39,466 |
| DGU | Parliamentary debate | 22,446 | 22,257 | 189 |
| DGZ | Transactions and Meetings | 102,122 | 92,772 | 3,771 |
| TOTAL | | 996,496 | 755,742 | 180,681 |
| | | | 76% | 18% |
Ethnicity figures from the 1991 New Zealand Census show Pakeha constitute 73.8% of the population and Maori 12.9%. In collecting the WSC an effort was made to ensure that a reasonable proportion of the data was collected from Maori (see section 11.12, Ethnic and Gender Representation).
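The percentages in the last row of table 8.4 can be reproduced from its totals and set beside the census figures quoted above. The following Python sketch is illustrative: the variable names are introduced here, and it assumes that 755,742 and 180,681 are the Pakeha and Maori totals respectively, as the 76% and 18% figures suggest.

```python
# Totals from table 8.4 (assumed column order: Pakeha, then Maori).
overall_words = 996_496
words_by_group = {"Pakeha": 755_742, "Maori": 180_681}

# 1991 Census population shares quoted in the text (percent).
census_1991 = {"Pakeha": 73.8, "Maori": 12.9}

# Share of WSC words contributed by each group, in percent.
corpus_share = {
    group: 100 * words / overall_words
    for group, words in words_by_group.items()
}

# Words not attributed to either group in the table.
unattributed = overall_words - sum(words_by_group.values())

for group, share in corpus_share.items():
    print(f"{group}: {share:.1f}% of WSC words, "
          f"{census_1991[group]}% of the 1991 population")
print(f"Not attributed to either group: {unattributed:,} words")
```

The comparison shows that the Maori share of words in the corpus is higher than the Maori share of the population, which is consistent with the collection effort described above.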
Figure 8.1: Age Composition of WSC (Number of words by age group)
Figure 8.1, Age Composition of WSC, shows the number of words contributed to the WSC by each age group. Figure 8.2, Age Comparison for WSC and New Zealand Population, compares the age distribution of the WSC with that of the overall New Zealand population (estimated for 1990 from figures in the New Zealand Official Yearbook, 95th edition, Department of Statistics 1992). WSC figures show the percentage of words contributed to the corpus by each age group, while the overall population figures show the percentage of the adult population in each age group.
Figure 8.2: Age Comparison for WSC and New Zealand Population
As mentioned earlier, WSC and the spoken component of ICE-NZ share 9 categories. The following table lists the categories which are shared, the number of words collected for each corpus and the actual number of words which are shared. WSC extracts which are included in both corpora are identified in section 15, Texts.
Table 8.5: WSC and ICE-NZ OVERLAP

| Code | Category | WSC | ICE-NZ | Overlap |
|---|---|---|---|---|
| MSN | Broadcast news | 28,929 | 40,396 | 26,401 |
| MST | Broadcast monologue | 11,205 | 45,276 | 11,205 |
| MUC | Sports commentary* | 26,010 | 52,007 | 26,010 |
| MUJ | Judge's summation | 4,489 | 22,473 | 4,489 |
| DPC | Conversation | 500,363 | 206,624 | 203,864 |
| DPF | Telephone conversation | 70,156 | 22,688 | 22,688 |
| DGI | Broadcast interview | 96,775 | 21,810 | 0 |
| DGU | Parliamentary debate | 22,446 | 22,446 | 22,446 |
| DGZ | Transactions and Meetings | 102,332 | 22,145 | 22,145 |
| TOTAL | | | | 339,248 |
*ICE-NZ's commentary section is not limited to sports commentary.
The spoken component of ICE-NZ is still being finalised, so these figures are not final.
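As a quick consistency check on table 8.5, the per-category overlap figures can be summed and compared with the stated total. The sketch below is illustrative only: the names are introduced here, and the WSC total of 1,046,599 words is taken from table 8.2. Because the ICE-NZ figures are provisional, the result may change.

```python
# Overlap (shared words) per category, copied from table 8.5.
overlap = {
    "MSN": 26_401, "MST": 11_205, "MUC": 26_010, "MUJ": 4_489,
    "DPC": 203_864, "DPF": 22_688, "DGI": 0, "DGU": 22_446,
    "DGZ": 22_145,
}

wsc_total_words = 1_046_599  # total words transcribed, from table 8.2

total_overlap = sum(overlap.values())

# Proportion of the WSC that also appears in ICE-NZ, in percent.
overlap_share = 100 * total_overlap / wsc_total_words

# Categories that are shared in name but contribute no shared words.
no_shared_words = [code for code, words in overlap.items() if words == 0]

print(f"Total overlap: {total_overlap:,} words")
print(f"Share of WSC also in ICE-NZ: {overlap_share:.1f}%")
print(f"Categories with no shared words: {no_shared_words}")
```

The sum agrees with the total of 339,248 words given in the table, which is roughly a third of the WSC.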