
Vocabulary and English for Specific Purposes Research

Vocabulary and English for Specific Purposes Research provides an important contribution to the study of vocabulary and its relationship to ESP research and teaching. Presenting Coxhead’s original research plus a comprehensive review of research in this field, this volume advances understanding of the theoretical and methodological research in this area, and relates the findings to ESP teaching. Key features include the following: an outline of the nature and role of vocabulary in ESP from both quantitative and qualitative approaches; analysis of context in vocabulary research in four key areas; and a review of the application of vocabulary research to professional and pedagogical practice. Written by a leading researcher, Vocabulary and ESP Research provides key reading for those working in this area. Averil Coxhead is a Senior Lecturer in the School of Linguistics and Applied Language Studies, Victoria University of Wellington, New Zealand.

Routledge Research in English for Specific Purposes
Series editors: Brian Paltridge and Sue Starfield

Routledge Research in English for Specific Purposes is a series of monograph studies showcasing cutting-edge research in the field of English for Specific Purposes. Books in this series provide theoretically innovative and empirically rigorous examples of research that advance understanding of topics within ESP, each providing a comprehensive background, a survey of modern research and avenues for future exploration in the area. Brian Paltridge is Professor of TESOL at the University of Sydney. He has taught English as a second language in Australia, New Zealand and Italy and has published extensively in the areas of academic writing, discourse analysis and research methods. He is editor emeritus for the journal English for Specific Purposes and coedited the Handbook of English for Specific Purposes (Wiley, 2013). Sue Starfield is a Professor in the School of Education and Director of The Learning Centre at the University of New South Wales. Her research and publications include tertiary academic literacies, doctoral writing, writing for publication, identity in academic writing and ethnographic research methods. She is a former editor of the journal English for Specific Purposes and coeditor of the Handbook of English for Specific Purposes (Wiley, 2013). www.routledge.com/Routledge-Research-in-English-for-Specific-Purposes/bookseries/RRESP

Titles in this series

Aviation English
Dominique Estival, Candace Farris and Brett Molesworth

Vocabulary and English for Specific Purposes Research: Quantitative and Qualitative Perspectives
Averil Coxhead

Vocabulary and English for Specific Purposes Research Quantitative and Qualitative Perspectives

Averil Coxhead

First published 2018
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

and by Routledge
711 Third Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2018 Averil Coxhead

The right of Averil Coxhead to be identified as author of this work has been asserted by her in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this title has been requested

ISBN: 978-1-138-96313-9 (hbk)
ISBN: 978-1-315-14647-8 (ebk)

Typeset in Sabon
by Apex CoVantage, LLC

Contents

List of figures
List of tables
Acknowledgements
1 Introduction
2 Approaches to identifying specialised vocabulary for ESP
3 The role and value of word list research for ESP
4 Multi-word units and metaphor in ESP
5 Specialised vocabulary in secondary school/Middle School
6 Pre-university, undergraduate and postgraduate vocabulary
7 Specialised vocabulary research and the professions
8 Vocabulary in the trades
9 Vocabulary research and ESP: curriculum, classroom tasks and materials design and testing
10 Future directions and conclusion
Appendix 1
Appendix 2
References
Index

Figures

3.1 Examples of the target word stress in an Electrical Engineering corpus and the BNC spoken corpus using Lex Tutor
3.2 An example of a shell noun mechanism
4.1 Examples from an academic corpus of ‘the consequences of’ as a frame
4.2 Three patterns of use for ‘on the basis of’
5.1 An example of grammar integrated into a Literature class in an international school
5.2 Example from an EAL lesson in an international school
5.3 A section of teacher talk in the German International School grade 6 Mathematics corpus
5.4 Mini solar system text from Hook (2005) with marked GSL, AWL, Science list and words not found in any list
5.5 Two extracts on distillation from an international school Science class, year 6
5.6 Example of a text on taxation from a social enquiry unit at level 5 on tax education and citizenship
5.7 The top ten words in the Middle School Social Studies and History Vocabulary List
6.1 A sample of the Applied Linguistics text showing the various kinds of words
8.1 Connor, a Carpentry tutor, on specialised vocabulary in the trades
8.2 A section from Unit Standard 13036, carry out safe working practices on construction sites
8.3 A sample of text on diesel from a textbook in Automotive Engineering
8.4 Example from a building site interaction in the Carpentry corpus
8.5 Example of specialised trades vocabulary in context: professional writing in Carpentry
8.6 An example of a Builders’ Diary by a student
8.7 ‘Screw’ as a technical and non-technical vocabulary item in a student’s Builders’ Diaries
8.8 Interview conversation about vocabulary and the Builders’
8.9 A sample of text on diesel from a textbook in Automotive Engineering
9.1 An example of a vocabulary-related episode from Basturkmen and Shackelford
10.1 Examples of teacher talk from university lectures

Tables

2.1 Quero’s (2015) top ten medical words in a Medical and a general English corpus
2.2 Meanings and distribution of ‘consist’, ‘credit’ and ‘abstract’ across Science, Engineering and Social Sciences
2.3 Steps in the Chung and Nation scale for Anatomy vocabulary
2.4 Levels, methods and examples from Chujo and Utiyama
3.1 The top 20 lemmas in the AVL
3.2 Fourteen subject areas of the written Science corpus
3.3 Some examples of potential specialised vocabulary from a written Carpentry corpus
3.4 Initial analysis of coverage of Nation’s BNC/COCA frequency lists over a Carpentry corpus
3.5 The most frequent proper nouns in TED Talks six-by-six corpus
3.6 Commonly used specialised items selected by a Carpentry tutor
3.7 Acronyms, abbreviations and Latinate forms from the written academic corpus used for the AWL
3.8 Coverage of the GSL/AWL/Science-specific lists over the four secondary Science textbooks
4.1 Top 20 key academic collocations and their mean frequencies from Durrant
4.2 Collocations to the left and right of ‘analysis’
5.1 The first 12 most frequent items in the first three BNC/COCA lists from Delta Mathematics
5.2 Examples of mathematical collocations and multi-word units in Barton and Cox
5.3 The ten most frequent word families in the Middle School Vocabulary List
5.4 Categorisations and examples of Science-specific vocabulary from Ardasheva and Tretter adapted from Miller
6.1 Coverage of the AWL over a range of academic corpora by frequency
6.2 Examples of the distribution of meanings of ‘consist’, ‘credit’ and ‘abstract’ across three disciplines (%)
6.3 Most frequent academic word families in the sections of the AgroCorpus
6.4 Examples of general, semi-technical and technical vocabulary in Computer Science from Lam
6.5 Most frequent technical items and meanings in Computer Science from West’s second 1,000 words of the GSL
6.6 Examples from Watson-Todd’s opacity-ranked Engineering word list
7.1 Examples of specialised vocabulary from Breeze
7.2 Ten examples from Nelson’s keyword categorisations of Business nouns
7.3 Top 20 Business Service list words
7.4 Examples from Wette and Hawken of a written formal and informal medical terminology test
8.1 The written corpus of the LATTE project
8.2 Examples of high frequency specialised vocabulary in Plumbing, Fabrication and Carpentry
8.3 Questionnaire responses on specialised vocabulary of Carpentry
8.4 Examples of frequent Carpentry words in the Builders’ Diaries up to 6,000 of Nation’s BNC lists and beyond
8.5 Warm-up items for the tutor task
8.6 The first 26 words of the Automotive Engineering (AE) list and all of sublist 13
8.7 Thirty common abbreviations in Fabrication and their meanings
8.8 First 25 Fabrication words by frequency and by alphabet

Acknowledgements

I would like to thank my colleagues and postgraduate students in the School of Linguistics and Applied Language Studies, Victoria University of Wellington, for their advice and feedback on ideas and drafts. Thank you also to all the teacher and student participants and research assistants who have taken part in research that forms a significant portion of this book.

Chapter 1 Introduction

Introduction

This book is about vocabulary research in English for Specific Purposes (ESP) – that is, technical or specialised vocabulary. The book is meant for established and new researchers, and for interested teachers in ESP and vocabulary studies. Its aim is to bring together vocabulary research in ESP in one volume, drawing on the strengths of research in vocabulary studies over recent years. ESP is an umbrella term for many areas of specialisation, including English for Academic Purposes (EAP), Professional and Occupational English and English in the Trades. The volume uses research in these areas as a way to help build our understanding of vocabulary through the lens of ESP. That said, this is not a book about vocabulary acquisition per se. ESP vocabulary research includes a broad base of quantitative research, mostly drawing on large-scale, corpus-based analyses of written and some spoken texts in ESP, and a less well-established, but no less important, body of qualitative research. Qualitative studies can shed light on specialised vocabulary in ways which corpora alone cannot. As Durrant (2014, p. 354) writes, corpus-based studies cannot tell us ‘How students interact with the texts or what they need to be able to know about or do with words to complete their tasks successfully’. Technical vocabulary is known by a large number of different terms in the field (see Nation, 2013), including semi-technical and specialised vocabulary. A well-known distinction is Beck, McKeown and Kucan’s (2013) three-tier model: basic vocabulary (Tier One), high frequency/utility words that are cross-curricular (Tier Two) and low frequency, domain-/area-specific lexis (Tier Three). This book is concerned mostly with Tier Two and Tier Three vocabulary, for which I use the term specialised vocabulary. This volume approaches vocabulary research for ESP by looking first at ways to identify this lexis, then at word list research in the field and at multi-word units. The second part focuses on ESP vocabulary in four contexts: secondary school, university, professional and occupational contexts and trades-based education. The final part is on ESP vocabulary research in language curricula, materials design and testing. The book also aims to identify gaps in these fields and suggest possible research to help fill them.

Why is vocabulary important in ESP?

There are many reasons why vocabulary is important in ESP, and each chapter in this book begins with reasons for investigating this field. Overall, there are several main reasons common to all these areas. The first reason is closely related to a feature of specialised vocabulary in ESP, which is its limited range of use (Nation, 2013). Defining this lexis can be difficult because we need to decide whether words which are closely related to the subject count as specialised, or only those which are unique to the subject area. If we take the first approach, the definition is much wider and more inclusive. If we take the second approach, the definition is much narrower and more exclusive. For this reason, estimating the size of a technical vocabulary is difficult, because a great deal depends on which approach is taken. Estimates of the proportion of technical vocabulary in a text range from 20% to 30% (Chung & Nation, 2003). If up to one word in three in a discipline-specific text could be technical in nature, then the sheer amount and frequency of discipline-specific lexical items in specialised texts is a powerful reason why this vocabulary is important. Nation (2013) points out that Medicine and Botany are fields with large technical vocabularies.

A second reason is that learners of English as a second or foreign language need a large vocabulary to cope with their studies in academic or professional environments. Evans and Morrison (2011, p. 203), in a paper on the first-year experience in English-medium higher education in Hong Kong, found a lack of technical vocabulary to be a major source of difficulty for students. In research into vocabulary in trades education, students report the same problem (Coxhead, Demecheleer & McLaughlin, 2016). Vocabulary research in EAP can help identify the single words and multi-word units these learners need. It can also reveal more about the vocabulary these learners use in their writing. For example, Hyland and Tse (2007) and Durrant (2014, p. 353) found that vocabulary use differs across disciplines. To use Durrant’s examples, Philosophy students use specialised adjectives such as ontological, engineers use specialised nouns and Science students use specialised verbs. Another reason why specialised vocabulary is important is that knowledge of the vocabulary of a field is tightly related to content knowledge of the discipline (Woodward-Kron, 2008). In a longitudinal study of undergraduate students’ academic writing in Education, Woodward-Kron writes:

The specialist language of a discipline is intrinsic to students’ learning of disciplinary knowledge; students need to show their understanding of concepts, phenomena, relations between phenomena etc. by incorporating the specialist language and terminology of their discipline into their writing accurately. They also need to adopt the specialist language in order to make meaning and engage with disciplinary knowledge. (Woodward-Kron, 2008, p. 246)

This engagement with disciplinary knowledge and vocabulary is also important because it signals belonging to a community which shares the same concepts and understandings of a field (Ivanič, 1998; Wray, 2002). Technical vocabulary in a field may or may not be shared with other technical areas, and learners tend not to meet this specialised or technical vocabulary outside the discipline of their studies. Medical vocabulary, for example, is not typically part of everyday conversation in English. Plumbing vocabulary tends not to be well known outside the field but can become particularly important in the event of a burst pipe or worse. That said, we all need, at some point, to communicate with plumbers and medical professionals, and it is important that these specialists also know how to help non-specialists understand what they are saying. Vocabulary research can also help with these endeavours.
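Chung and Nation’s (2003) estimate of 20% to 30% is a proportion of running words: the number of technical tokens divided by the total number of tokens in a text. As an illustration only, that calculation can be sketched in Python. The sketch assumes a ready-made set of technical words and a naive regular-expression tokeniser; both are assumptions, not the procedure of the studies cited, and the function name, example sentence and word set are hypothetical.

```python
import re


def technical_coverage(text: str, technical_words: set[str]) -> float:
    """Percentage of running words (tokens) in `text` found in `technical_words`.

    Assumed tokeniser: lowercase alphabetic word forms, optionally with an
    internal apostrophe. Real studies may lemmatise or count word families.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    technical_tokens = sum(1 for token in tokens if token in technical_words)
    return 100 * technical_tokens / len(tokens)


# Hypothetical example: 2 of the 10 running words are on the technical list.
sentence = "Nail the cladding over the insulation before the rain arrives"
print(technical_coverage(sentence, {"cladding", "insulation"}))  # 20.0
```

Because the result depends entirely on which words are in the technical set, the wider and narrower definitions discussed above would give different percentages for the same text.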

Why am I interested in specialised vocabulary?

My interest in this field developed firstly through teaching in language schools in various countries, such as Romania, Hungary and Estonia. The students in these schools were predominantly adult learners, and many had quite low levels of proficiency in English. Many of these students were professionals, for example, heart surgeons, agricultural scientists, teachers and business people, and their language needs did not seem to be well served by the general English textbooks which made up the curricula in the schools. These textbooks and materials had other important functions for the students, such as providing helpful ways to meet and talk about general topics and supporting the development of language skills. At a teachers’ conference in Estonia, Larry Selinker, professor emeritus of linguistics at the University of Michigan, gave a talk in which he emphasised the importance of empirical research to support learning and teaching. This talk served as a turning point, as I began to wonder what sort of empirical research I needed to know about for my teaching, and what assumptions I was making as a teacher. During my postgraduate studies back in Aotearoa/New Zealand, I began to teach EAP. It was during this time that I became more aware of research in vocabulary studies and how it could inform and, in some cases, transform the learning and teaching objectives of a class. I consulted Jim Dickie, a wise lecturer in my postgraduate studies at Victoria University, about doing a thesis as part of my master’s study. Jim said, ‘You know what works, but you don’t know why.’ This was another turning point. And then John Read, also then at Victoria University, mentioned that Xue and Nation’s (1984) University Word List needed updating. So I went to talk to Paul Nation. This is how the Academic Word List (AWL) (Coxhead, 2000) research began.
I have been lucky enough to have opportunities to talk about research with these and other great colleagues in Aotearoa/New Zealand and in far-flung places many times over the last 20 years.

How is this book organised?

The book is organised into three main parts. The first part contains Chapters 2 to 4, which focus on different aspects of research into vocabulary in ESP. Chapter 2 looks at approaches to identifying vocabulary in ESP, from corpus-based approaches with quantitative measures through to qualitative approaches, including, for example, using a scale, consulting experts and consulting a corpus for evidence of language in use. Chapter 3 looks into specialised word lists, which is a fast-moving and fairly large area of research. There seem to be more word lists for ESP than ever before. This chapter looks first of all into developing and validating word lists and then shows how word lists have been used to find out more about the nature of specialised texts, particularly in EAP, and to find out how many words learners need to deal with the vocabulary of these texts (Nation, 2006). Chapter 4 focuses on multi-word units and metaphor, particularly in EAP, because this is where much of the research is to be found. The multi-word unit section of the chapter draws on research into general and specific collocations for EAP, lexical bundles and academic formulas, and on disciplinary perspectives (for example, work by Hyland, 2008; Biber, 2006; Simpson-Vlach & Ellis, 2010; Liu, 2012, to name a few). Part Two is about vocabulary in a range of contexts, beginning with secondary and Middle School lexis (Greene & Coxhead, 2015) in Chapter 5. Four main subject areas form the main part of this chapter: English Literature, Mathematics, Science and Social Sciences, with examples from written and spoken corpora. Chapter 6 focuses on pre-university, university and postgraduate vocabulary research, which are areas of major activity in EAP. Case studies from a range of subject areas are included, such as Sciences, Agriculture, Engineering, Medicine and Computer Science.
Chapter 7 is based on vocabulary in English for Professional and Occupational Purposes, drawing on research into a variety of areas such as Aviation, Legal English and Business and Finance, and occupational vocabulary in Medical Communication and Nursing. The final chapter in this group is on vocabulary in the trades, based on a major research project between Victoria University of Wellington and the Wellington Institute of Technology. The project investigates discourse and lexical elements of four trades: Carpentry, Plumbing, Automotive Engineering and Fabrication. The vocabulary part of the research into each of these trades is discussed in turn and used to illustrate key aspects of vocabulary for specific purposes. The last part of the book contains two chapters. Chapter 9 is about vocabulary in ESP in relation to teaching, learning and testing. The chapter begins with two overarching frameworks in vocabulary studies: Nation’s (2007) Four Strands and Laufer and Hulstijn’s (2001) Involvement Load Hypothesis, and their relationship to specialised vocabulary in learning and teaching. The chapter also includes a section on using word list research in course design and materials. The final part of the chapter looks at testing in ESP vocabulary research. Chapters 2 to 9 end with a section on limitations of research in these areas. These limitations are picked up in Chapter 10, where five main areas of needed research are discussed: more qualitative research; testing English vocabulary for specialised purposes; theorising in vocabulary studies (Schmitt, 2010); evaluations of specialised vocabulary research when it is incorporated into courses of study and materials design, and the need for replication; and, finally, widening the areas of research to include more analysis of spoken language, different contexts of research and multi-word units.

Chapter 2 Approaches to identifying specialised vocabulary for ESP

Introduction

The focus of this chapter is research approaches to identifying specialised vocabulary for ESP. The chapter begins by considering why this is an important aspect of lexical research in ESP, before moving on to corpus-based quantitative approaches to identifying vocabulary for ESP. Qualitative approaches follow, including analysing concordances from corpora, consulting experts, using a scale, using a technical dictionary, surveys, questionnaires, glossaries and case study analyses of teacher and learner decision making, including analysing student texts for annotations. The chapter ends with a brief discussion of how a reader might carry out a research project in this area.

Why is identifying specialised vocabulary important for ESP?

Identifying and categorising academic and disciplinary vocabulary for ESP is important for a range of researchers, learners, teachers and dictionary and materials designers. For researchers, identifying specialised vocabulary in ESP is important because there are many outstanding questions in this field of research. For example, one research question which has been approached in several ways is ‘When does general vocabulary stop and specialised vocabulary begin?’ (see Hwang & Nation, 1995; Coxhead & Hirsh, 2007 for examples). Part of the difficulty is that many everyday words also have technical meanings in particular fields. Word list developers need to be aware of these technical meanings of everyday words so that selection principles can be followed as closely as possible (for more, go to Chapter 3). This research needs to be based on solid principles of selection, guided by the context for learning and the proficiency level of the learners. For dictionary and materials designers, identifying specialised vocabulary is important for deciding what lexical items are included in resources and what kind of attention they are given. A good example of the possible impact of a study of specialised vocabulary is the AWL (Coxhead, 2000, 2016a). This list is widely used in textbook series, dictionaries and paper-based and online materials design. For learners and teachers, identifying this vocabulary is vital for setting goals for learning and for programmes of study, as well as for checking a learner’s progress and helping make tomorrow’s vocabulary learning easier (see Nation, 2013). Organising vocabulary learning is important for language learning, so finding out what learners know before they start a course of study can help determine what their vocabulary needs are. These needs might differ depending on how much background knowledge of the subject a learner already has and on their proficiency in English. There is quite a range of terminology in the research on specialised vocabulary. Here are some examples. Some researchers use the term technical, as in Chung and Nation (2003, 2004), who measured the strength of the relationship between the word and the specialised subject area (for more on this research, see the following). Semi-technical vocabulary is a term used by Farrell (1990) to describe words which are neither technical (specific to a discipline) nor non-technical (everyday). Fraser (2007, 2009) uses the term cryptotechnical in Pharmacology rather than semi-technical (see Chapter 6).
More recently, Watson-Todd (2017) has used the term opaque vocabulary for words that learners might struggle with because their general meaning is not the same as their technical meaning (see Chapter 6 for more on this study in Engineering). In this book, specialised vocabulary is used as an umbrella term for vocabulary which relates in some way to its particular discipline, whether the word is high, mid or low frequency (Nation, 2016).

Quantitative approaches to identifying vocabulary for ESP

Corpora are commonly used in quantitative studies to identify vocabulary for ESP. A corpus is a body of texts of written or spoken language, and, increasingly, a corpus can be multi-media. An early example of multi-media in EAP is Hilary Nesi’s Essential Academic Skills in English (EASE) series (see Nesi, 2001). Corpus-based studies allow for larger-scale investigations of words in context than the page-by-page analysis by hand of early studies. A key feature of corpus analysis is that such studies should be relatively easy to replicate, but only if the corpora are publicly available. Unfortunately, many corpora are not made available for further analysis, although some have been, such as the British Academic Written English Corpus (BAWE) and the British Academic Spoken English Corpus (BASE) (both available at sketchengine.co.uk) and the Hong Kong Polytechnic University Corpora of Professional English (available at the website for the Research Centre for Professional Communication in English: rcpce.engl.polyu.edu.hk). In specialised vocabulary research, a corpus of a particular field could contain a range of texts. For example, in Nelson’s (n.d.) work on business-specific vocabulary, four corpora were used: the written texts of business people, texts which they read, their spoken texts and texts which they listened to. This organisation of the corpus illustrates Nelson’s concern that a corpus should represent, as much as possible, the kinds of documents or texts of that field. Corpus studies have contributed a great deal to our quest to identify and understand more about specialised vocabulary. They have been particularly useful for developing word lists for use in language classrooms and for independent study.
Cheng’s (2012) volume on corpus linguistics contains clear step-by-step instructions on extracting lexis from a corpus, for example, to generate a word list and research single words, as well as on ways to identify multi-word units in corpora through n-gram analysis. The following are some ways that researchers have investigated corpora to identify specialised vocabulary in ESP.
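Before turning to these approaches, the two basic operations for which Cheng (2012) gives instructions, building a frequency-ranked word list and counting n-grams, can be illustrated with a minimal Python sketch. It is not Cheng’s procedure: it assumes plain-text input and a naive tokeniser that counts word forms, whereas research word lists usually count lemmas or word families. The function names are hypothetical.

```python
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    """Naive tokeniser: lowercase word forms, no lemmatisation (an assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(tokens: list[str], min_freq: int = 1) -> list[tuple[str, int]]:
    """Frequency-ranked list of word types with their token counts."""
    return [(w, c) for w, c in Counter(tokens).most_common() if c >= min_freq]


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous sequence of n tokens (candidate multi-word units)."""
    # zip(tokens[0:], tokens[1:], ..., tokens[n-1:]) yields n-token tuples
    return Counter(zip(*(tokens[i:] for i in range(n))))


tokens = tokenise("On the basis of the data and on the basis of the results")
print(word_list(tokens)[0])                           # ('the', 4)
print(ngrams(tokens, 4)[("on", "the", "basis", "of")])  # 2
```

Sequences found this way are only candidates; whether a string such as on the basis of behaves as a multi-word unit still needs to be judged, as Chapter 4 discusses.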

Corpus comparison

A fairly quick way of finding technical vocabulary is a corpus-comparison approach, which involves two corpora: a specialised corpus and a general-purpose corpus. First, items which occur only in the technical or specialised corpus are labelled ‘technical’, whereas those which occur only in the general corpus are labelled ‘not technical’. In any comparison like this, there will be clear examples of lexical items which occur only in the specialised corpus and not in the general corpus. For example, when a Carpentry corpus is compared with a fiction corpus, words such as insulation and cladding occur only in the Carpentry corpus (Coxhead, Demecheleer, & McLaughlin, 2016). The second step is to look at items shared by both corpora. For each shared word, its frequency in the specialised corpus is compared with its frequency in the general corpus as a ratio. For example, Chung (2003) decided on a ratio of 50 occurrences in the specialised corpus to one occurrence in the general corpus, and found that lexical items which were 50 times more frequent in the specialised corpus had more than a 90% chance of being technical vocabulary. Chung and Nation (2003) compared the corpus-comparison approach with several other methods of identifying technical vocabulary, including technical dictionaries and using a scale (see the following). They found corpus comparison to be the most practical and effective option for identifying technical vocabulary. Gardner and Davies (2014) used corpus comparison for their Academic Vocabulary List (AVL), based on the 120-million-word academic subsection of the Corpus of Contemporary American English (COCA). This subsection of the corpus contains journal articles, newspapers and magazines, with journal articles constituting around two thirds of the sub-corpus.
The sub-corpus is divided into 11 disciplines: Business and Finance, Education, Humanities, History, Law and Political Science, Medicine and Health, Philosophy, Religion, Psychology, Science and Technology and Social Science. The smallest section contains 8,030,324 running words and the largest contains 22,777,656 running words. The AVL is explored in more detail in Chapter 3.

Table 2.1 Quero’s (2015) top ten medical words in a Medical and a general English corpus

Rank  Word type          Medcorpus frequency  Gencorpus frequency  Medcorpus/Gencorpus ratio
1     vascular           2,463                2                    1,231.50
2     viral              2,240                2                    1,120
3     lesions            4,795                5                    959
4     lesion             1,653                2                    826.50
5     meningitis         1,528                2                    764
6     DNA                1,952                3                    650.67
7     gastrointestinal   1,853                3                    617.67
8     ventricular        2,415                4                    603.75
9     atrial             1,182                2                    591
10    CT                 1,743                3                    581

Table 2.1 shows the top ten items from a comparative analysis of a Medical textbook corpus and a general English corpus in Quero’s (2015) study of Medical vocabulary in textbooks for university study. The items are ranked by the ratio of their frequency in the Medical corpus to their frequency in the general corpus. Note how much more frequently specialised words such as vascular and viral occur in the Medical corpus than in the general corpus. Abbreviations such as DNA and CT are included in the table because these lexical items are particularly prevalent in Medical texts.
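The two-step corpus-comparison procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used by Chung (2003) or Quero (2015): it assumes raw frequency counts per word type are already available for both corpora, and it uses Chung’s 50:1 ratio as the default cut-off. The three medical counts come from Table 2.1, whose ratios are the raw Medcorpus frequency divided by the raw Gencorpus frequency (for example, 2,463 / 2 = 1,231.50 for vascular); the counts for the and holiday are hypothetical, as is the function name.

```python
import math


def corpus_comparison(spec_freq: dict[str, int], gen_freq: dict[str, int],
                      threshold: float = 50.0) -> dict[str, tuple[str, float]]:
    """Label word types by comparing a specialised and a general corpus.

    Returns word -> (label, specialised/general frequency ratio).
    """
    labels = {}
    for word, f_spec in spec_freq.items():
        f_gen = gen_freq.get(word, 0)
        if f_gen == 0:
            # Step 1: occurs only in the specialised corpus
            labels[word] = ("technical", math.inf)
        else:
            # Step 2: shared item; compare its frequency ratio with the cut-off
            ratio = f_spec / f_gen
            label = "technical" if ratio >= threshold else "not technical"
            labels[word] = (label, ratio)
    for word in gen_freq:
        if word not in spec_freq:
            # Occurs only in the general corpus
            labels[word] = ("not technical", 0.0)
    return labels


med = {"vascular": 2463, "viral": 2240, "lesions": 4795, "the": 60000}
gen = {"vascular": 2, "viral": 2, "lesions": 5, "the": 50000, "holiday": 300}
result = corpus_comparison(med, gen)
print(result["vascular"])  # ('technical', 1231.5)
print(result["the"])       # ('not technical', 1.2)
```

Raw counts are only directly comparable when the two corpora are of similar size; otherwise frequencies would normally be normalised (for instance, per million words) before the ratio is taken, and the cut-off would need to be reconsidered.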

Keyword analysis Like corpus comparison, keywords are determined by looking at their frequency in several corpora and using a statistical formula for comparing them with a norm. One way to do this is to compare the frequency of words in one specialised field against a general corpus. The concept of keyness is linked to the probability of a word occurring in a text. If a word has a high level of keyness, the occurrence is probably not by chance. The Lancaster University Corpus Linguistics website has a good example of the concept of keyness (go to www.lancaster.ac.uk/fss/courses/ling/corpus/blue/l03_2.htm) using an analysis of

Baptist Church newsletters and a general corpus. Paquot’s (2010) Academic Keyword List (AKL) was developed using keyness, range and distribution of vocabulary in two academic written corpora (professional writing and student writing by native speakers of English). The study included single and multi-word items, and incorporated high frequency lexis. Examples from the AKL include same, second, which, scope, requirement, leading, late, according, according to and relation to. The full AKL is available at www.uclouvain.be/en-372126.html. Gilquin, Granger and Paquot (2007) point out the potential of learner corpora for comparative studies between writers in English with different first languages, for example, at different levels of proficiency, and with first language corpora (see Flowerdew, 2014 for more examples of studies using keyword analysis in ESP). A second way to use keyword analysis in specialised texts is exemplified in a study by Grabowski (2015), who wanted to find keywords in a corpus of pharmaceutical English which had four kinds of texts from the field: patient information leaflets, summaries of product characteristics, clinical trial procedures and chapters from academic textbooks. He did not use a general corpus, being instead interested in the keyness of lexical items in the four different kinds of texts in pharmacology – in other words, how each of these different text types is different from or the same as the others. To help with the comparison, Grabowski (2015) decided on a minimum number of occurrences for a word and used the statistical measure of log likelihood – a measure for comparing word frequency in corpora and whether lexical items occur more often in one section of a corpus than another (see McEnery, Xiao & Tono, 2006) to determine the probability of whether an occurrence of a word is by chance. Through this analysis, Grabowski was able to rank the four kinds of texts according to the number of keywords in them. 
The academic textbooks contained the highest number of keywords and the information leaflets for patients contained the fewest. Grabowski (2015) then provides examples of keywords linked to the communicative purpose of the texts in the corpus. Kwary (2011) points out that keyword analyses do not take multi-word units into account, and sees this as a drawback of such studies.
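The keyness comparison described above can be sketched in Python. This is a minimal illustration and not Grabowski’s (2015) procedure. It assumes pre-tokenised, lower-cased word lists and uses the widely cited log-likelihood (G2) calculation for two corpora. The minimum frequency of 5 and the cut-off of 3.84 (the chi-square critical value for p < 0.05 at one degree of freedom) are illustrative settings only. The function names `log_likelihood` and `keywords` and the toy data are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) for one word observed in two corpora."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were spread evenly across both corpora,
    # in proportion to each corpus's size.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq_target > 0:
        g2 += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * g2


def keywords(target_tokens, ref_tokens, min_freq=5, critical=3.84):
    """Return (word, G2) pairs over-represented in the target, highest G2 first."""
    target, ref = Counter(target_tokens), Counter(ref_tokens)
    n_target, n_ref = len(target_tokens), len(ref_tokens)
    results = []
    for word, freq in target.items():
        if freq < min_freq:
            continue  # too rare to rank reliably
        if freq / n_target <= ref[word] / n_ref:
            continue  # keep positive keywords only
        g2 = log_likelihood(freq, n_target, ref[word], n_ref)
        if g2 >= critical:  # 3.84 ~ p < 0.05 with one degree of freedom
            results.append((word, g2))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy example: a hypothetical "leaflet" sample against pooled other text types.
leaflet = ["dose"] * 20 + ["the"] * 80
others = ["dose"] * 1 + ["the"] * 99
print(keywords(leaflet, others))
```

A Grabowski-style comparison without a general corpus would pass the pooled tokens of the other text types as the reference. For example, patient information leaflets would be compared against the other three text types combined.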

Issues and critiques of corpus design

A major issue in corpus studies is determining the size of the corpus. A corpus needs to be large enough to ensure that sufficient samples of specialised vocabulary occur for analysis or for selection in word lists, for example (see Nation & Sorrell, 2016). For high frequency words, corpora need not be very large, because the words occur often. Similarly, words which are closely related to a specialised field will also tend to occur quite frequently. In Plumbing, for example, a very frequently used word in written texts is pipe. Lower frequency words are more problematic, because they do not occur as often. Specialised corpora for ESP can help narrow down the analysis of texts and focus on specialised vocabulary in particular (see Krishnamurthy & Kosem, 2007; Ghadessy, Henry & Roseberry, 2001). Representativeness of the corpus is an important issue in corpus-based research, and it has a number of elements. One question for representativeness is whether the corpus represents the kind of writing, reading or multi-media ‘text’ to which ESP students would be exposed. For example, Gardner and Davies (2016) take issue with Durrant’s (2016) study of the AVL in university student writing in the BAWE corpus, which suggests that undergraduate student writing is representative of writing in the disciplines. They argue that many other kinds of texts also represent academic writing in the disciplines. This is not to say that investigating student writing is not a valid research activity, but that larger claims need to be based on wider and more representative samples of language. Paquot (2010) focuses on keywords in student writing, arguing that learner corpora shed a different light on academic vocabulary from analyses based on corpora of professional academic writers. Another example of representativeness can be found in the Language in the Trades (LATTE) project (see Chapter 8).
The spoken corpus in this study includes both classroom and on-site recordings in the case of Carpentry. This corpus focuses on teacher talk, mostly for practical reasons: building sites are noisy places, multiple microphones would be needed across areas as big as a building site for a house and over 30 microphones would be needed to capture the language use of one whole class out of a possible cohort of 120 students. For Automotive Engineering, recordings include classroom sessions where the talk changes from more engineering-oriented classroom talk about vehicles to a broader and more general chat about cars (see Parkinson & MacKay, 2016 for more on talk in the trades). Decisions need to be made about the purpose of the research and how it affects corpus development (see Nation, 2016 for more on taking account of the purpose of research in corpus development). Miller and Biber (2015) raise another question in relation to representativeness. They note that as corpora in word list studies grow larger, words occur with different frequency rankings, so lists might contain different words. They posit:

Corpus-based vocabulary researchers have paid considerable attention to the validity of their lists, usually evaluated through analyses of their predictive power when applied to a new corpus (i.e. the percent coverage of words in a new corpus accounted for by the words in the list). But reliability is a prerequisite to validity, and, in general, corpus-based vocabulary studies have not included evaluations of reliability: the extent to which we would discover the same set of words, ranked in the same order of importance, based on analysis of another corpus that represents the same discourse domain. (Miller and Biber, 2015, p. 33)
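The notion of reliability in this quotation can be illustrated with a split-half check. The sketch below simplifies heavily and does not reproduce Miller and Biber’s method. It splits a list of tokenised texts alternately into two halves and builds a top-n frequency list from each half. It then reports two figures: the proportion of items the lists share (Jaccard overlap), and Spearman’s rank correlation for the shared items. The alternating split, both measures and all function names are assumptions chosen for illustration.

```python
from collections import Counter


def top_words(texts, n):
    """Rank words by total frequency across a list of tokenised texts."""
    counts = Counter()
    for tokens in texts:
        counts.update(tokens)
    return [word for word, _ in counts.most_common(n)]


def spearman_on_shared(list_a, list_b):
    """Spearman's rho over the words two ranked lists share (ranks have no ties)."""
    in_b = set(list_b)
    shared_a = [w for w in list_a if w in in_b]
    n = len(shared_a)
    if n < 2:
        return None  # too few shared items to correlate
    # Re-rank the shared words within each list, keeping each list's order.
    rank_a = {w: i for i, w in enumerate(shared_a)}
    shared_b = [w for w in list_b if w in rank_a]
    rank_b = {w: i for i, w in enumerate(shared_b)}
    d_squared = sum((rank_a[w] - rank_b[w]) ** 2 for w in shared_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def split_half_reliability(texts, n=100):
    """Split texts alternately into two halves and compare their top-n lists."""
    half_a, half_b = texts[0::2], texts[1::2]
    list_a, list_b = top_words(half_a, n), top_words(half_b, n)
    union = set(list_a) | set(list_b)
    overlap = len(set(list_a) & set(list_b)) / len(union) if union else 0.0
    return overlap, spearman_on_shared(list_a, list_b)
```

Values close to 1 on both measures would suggest that a list stays stable across halves of the corpus. Lower values would point to the kind of unreliability Miller and Biber describe.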

Miller and Biber (2015) use a corpus of Psychology textbooks for university study and experiment with techniques to produce a reliable – that is, replicable – subject-specific word list. This task proved particularly tricky to achieve, as texts in corpora can vary in size, topic and number (p. 49). Miller and Biber (2015) dealt with the texts of different lengths by splitting their corpus in half, resulting in two corpora of about 1.75 million words each (p. 44). Different approaches to classification systems in corpus development and analysis can pose problems for comparing results from different studies. Decisions need to be made on a principled basis regarding whether a particular field of enquiry fits into one area or another. Becher (1989) classifies academic disciplines in higher education into four categories: hard-pure, hard-applied, soft-pure and soft-applied. This classification was used for the BASE and BAWE corpora. Krishnamurthy and Kosem (2007) compare approaches to classifying academic disciplines across several systems and research studies in corpus linguistics. They show that levels of classification can be very different across studies. Compare, for example, the following schemes:
- the four disciplines of Arts, Commerce, Law and Science in Coxhead (2000);
- Becher’s (1989) hard-pure, hard-applied, soft-pure and soft-applied categories, used in the Michigan Corpus of Academic Spoken English (MICASE)/BASE/BAWE suite of corpora (see Nesi & Gardner, 2012);
- the ten disciplines used by libraries to classify books, drawing on the Dewey Decimal Classification System;
- the 19 disciplines in a classification from the Higher Education Statistics Agency (see www.hesa.ac.uk/jacs.htm).

The purpose, scope and size of a study can determine the classification of a corpus. For example, the AWL (Coxhead, 2000) study was based on divisions and fields of study at a university in New Zealand in 1999, which had a fairly large Law school but no Engineering or Medical school at that stage. Finally, Bennett (2010) makes a particularly important point about corpus analysis: corpora can only give evidence of what is possible, rather than evidence of what is not possible.

Qualitative approaches to identifying specialised vocabulary for ESP

This section looks at some qualitative approaches to finding out more about vocabulary in ESP, beginning with corpora, which can be used both quantitatively and qualitatively in studies on specialised vocabulary.

Using a corpus for qualitative analysis

Qualitative analyses of corpora can provide information about the specificity of vocabulary. Researchers can consult concordance lines to support lexical decisions in writing for legal purposes (Hafner & Candlin, 2007). They can also consult a corpus to check the meaning of technical vocabulary in context, as Coxhead and Demecheleer (under review) do to support the selection of the specialised vocabulary of Plumbing. Hyland and Tse (2007) investigated the occurrences of words from Coxhead’s (2000) AWL and found variations in the frequency and meaning of AWL items across three disciplines. Table 2.2 shows some examples from Hyland and Tse (2007): consist, credit and abstract. The second column shows the meanings of these words and their occurrences in the Science, Engineering and Social Science corpora. Note that the meanings are presented in order of highest total occurrences overall. This research shows the value of comparing corpora and of close analysis of the context and meaning of target lexical items.

Table 2.2 Meanings and distribution of consist, credit and abstract across Science, Engineering and Social Sciences (adapted from Hyland & Tse, 2007, p. 245)

Note that Hyland and Tse’s (2007) corpus includes both professional and student writers, which allows for comparison and contrast. Durrant (2014) also researches disciplinary differences in vocabulary in academic corpora. He urges researchers to investigate student writing as well as professional writing, because the texts which learners read at university are not the same as the texts which they are required to write.

Using a scale (Chung & Nation, 2004)

Developing and applying a scale-based approach to identifying specialised vocabulary in a text is a highly principled, if time-consuming, method. Chung and Nation (2003) used a scale to identify the technical vocabulary of an Anatomy textbook. Their overall estimate was that at least one word in three in the Anatomy textbook was technical. In this method, the textbook is used as a corpus. Lexical items are drawn out of the corpus analysis and then categorised according to a four-step scale. Chung and Nation (2003) outline the four steps in the scale. These range from words with no connection to the field of Anatomy to words that occur only in that field and/or are unlikely to be known outside it (see Table 2.3).

Table 2.3 Steps in the Chung and Nation (2003, p. 105) scale for Anatomy vocabulary

Step in the scale | Definition | Examples

Step 1 | Words such as function words that have a meaning that has no particular relationship with the field of Anatomy. | the, is, between, it, by, 12, adjacent, amounts, common, commonly, directly, constantly, early, especially

Step 2 | Words that have a meaning that is minimally related to the field of Anatomy in that they describe the positions, movements or features of the body. | superior, part, forms, pairs, structures, surrounds, supports, associated, lodges, protects

Step 3 | Words that have a meaning that is closely related to the field of Anatomy but are also used in general language, or may occur with the same meaning in other fields and not be technical terms in those fields. | chest, trunk, neck, abdomen, ribs, breast, cage, cavity, shoulder, girdle, skin, muscles, wall, heart, lungs, organs, liver, bony, abdominal, breathing

Step 4 | Words that have a meaning specific to the field of Anatomy and are not likely to be known in general language. They refer to structures and functions of the body. These words have clear restrictions of usage depending on the subject field. | thorax, sternum, costal, vertebrae, pectoral, fascia, trachea, mammary, periosteum, hematopoietic, pectoralis, viscera, intervertebral, demifacets, pedicle

The researchers found that the majority of the technical vocabulary in the Anatomy textbook was at step 4 of the scale (64.4%). This vocabulary was specific to the field and would be virtually unknown to people outside that field. A further 35.6% was at step 3, meaning it was closely related to Anatomy but could also occur in general English and in other fields. Chung and Nation (2003) used the same scale approach on an Applied Linguistics textbook and found that approximately 20% of the lexis was technical, that is, roughly two running words in every ten. This finding suggests that the Applied Linguistics textbook is lexically more accessible than the Anatomy textbook. Around 88% of the Applied Linguistics technical vocabulary occurred at step 3, while only 11.6% was at step 4 of the scale (vocabulary unique to that subject area). This study is important for two reasons. First, it shows that the amount of technical vocabulary can vary from subject to subject, in this case from one word in three in Anatomy to one word in five in Applied Linguistics. Second, it shows that technical vocabulary can overlap heavily with general English and other fields in areas such as Applied Linguistics, and much less in areas such as Anatomy. Categorising words using a scale, as in Chung and Nation’s (2004) study, may seem relatively straightforward. However, it demands a great deal of time, skill, specialised knowledge of a field and decision making. A scale like this can be adapted so that teachers and students can decide which words they need to focus on in class or in independent learning, and why (see Coxhead, 2014a). Schmitt (2010) points out that the third and fourth steps in the scale are perhaps quite closely related. Any subjective method such as using a scale needs its results to be checked by another rater, a process outlined clearly by Chung and Nation (2003).
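Once raters have assigned each word a step, the proportions reported for the scale follow from simple counting. The sketch below assumes a hypothetical dictionary of ratings produced by human raters; the scale does not rate words automatically. It counts running words, treats any unrated word as step 1 and counts steps 3 and 4 as technical, in line with the Anatomy figures above. The function name and ratings are illustrative.

```python
from collections import Counter


def technical_profile(tokens, step_ratings):
    """Summarise scale ratings over the running words (tokens) of a text.

    step_ratings maps a lower-cased word to a step from 1 to 4. Words missing
    from it are treated as step 1, an assumption made for this sketch.
    Steps 3 and 4 are counted as technical vocabulary.
    """
    if not tokens:
        raise ValueError("no tokens to rate")
    steps = Counter(step_ratings.get(token.lower(), 1) for token in tokens)
    technical = steps[3] + steps[4]
    return {
        "technical_share": technical / len(tokens),
        "step3_share_of_technical": steps[3] / technical if technical else 0.0,
        "step4_share_of_technical": steps[4] / technical if technical else 0.0,
    }


# Hypothetical ratings for a tiny fragment of Anatomy text.
ratings = {"the": 1, "is": 1, "chest": 3, "sternum": 4, "thorax": 4}
print(technical_profile(["the", "sternum", "chest", "is", "the", "thorax"], ratings))
```

A technical_share of about 0.33 would correspond to the ‘one word in three’ estimate for Anatomy. A share of about 0.2 would match the Applied Linguistics figure.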

Using a technical dictionary

Technical dictionaries are widely available in a range of fields. Chung and Nation (2004) report on using dictionaries to help identify technical vocabulary. They found this approach quite straightforward, because the dictionaries had already been developed using decisions about what was technical and what was not. That said, the researchers also found that many decisions still needed to be made about whether a lexical item was technical or not, because of features of the dictionaries themselves. These features included:
- the size of the dictionaries;
- whether a headword appeared in a main entry or a sub-entry;
- whether word families or single words were to be included in the definitions.

Chung and Nation (2004) report that this method of identifying technical vocabulary was about 80% accurate. Not all professional or academic fields have technical dictionaries. Before embarking on a time-consuming analysis of technical vocabulary using this method, it is important to find out as much as possible about how a dictionary was compiled and how its principles for identification were decided. Consulting technical dictionaries was an important part of the checking process in developing the trades-based vocabulary lists outlined in Coxhead et al. (2016). The researchers were not experts in the content area, in this case Carpentry, so they used dictionaries as well as corpora and experts to support decision making (see Chapter 8 for more on trades-based vocabulary research).

Consulting an expert

Asking experts to comment on the technical vocabulary of their field is an interesting method of identifying this lexis. It may seem to be a direct and helpful method, because these people are well placed to comment and provide guidance. However, Schmitt (2010) notes that experts might disagree on whether a word is technical or not, because they may have different levels or areas of expert knowledge within a field. Consider, for example, Engineering as a field of expertise: water, roading and automotive engineers have vastly different areas of expertise. Schmitt (2010) also makes the point that experts may not approach the task of identifying technical vocabulary in the same way. This point became very clear in studies which involved expert opinions on technical words in Carpentry (Coxhead et al., 2016) and in Plumbing (Coxhead & Demecheleer, under review). One set of experts judged a high proportion of words in their trade to be technical, while the other set disagreed wholeheartedly, despite having been given the same instructions for identifying specialised vocabulary. Deciding on the technicality of ESP vocabulary can be quite difficult. Answers are likely to vary depending on several factors (Coxhead & Demecheleer, under review):
- whether the expert is considering the needs and knowledge of learners who are starting their studies or are part way through;
- whether the experts themselves actually used the technical words;
- whether they thought trainee plumbers would already know the words.

Chapter 4 discusses several studies into formulaic language or multi-word units which have called on experts in language, including teachers, in developing word lists for learners (see Simpson-Vlach & Ellis, 2010, for example).
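Disagreement between experts, or between a researcher and a second rater, can be put into numbers. The chapter does not name an agreement statistic. Cohen’s kappa, a common choice, is used in this sketch instead: it adjusts raw percentage agreement for the agreement expected by chance. The function name and the ‘tech’/‘gen’ labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement, from each rater's overall label proportions.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Two hypothetical experts rating five words as technical or general.
expert_1 = ["tech", "tech", "gen", "tech", "gen"]
expert_2 = ["tech", "gen", "gen", "tech", "tech"]
print(cohens_kappa(expert_1, expert_2))
```

In this example, the two experts agree on three of the five words. Kappa is nonetheless only about 0.17, because much of that agreement would be expected by chance.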

Surveys, interviews and questionnaires

Surveys can be a useful approach to gathering perspectives on specialised vocabulary. Coxhead (2011a, 2012a) developed an online survey to find out more about teachers’ perspectives on subject-specific or specialised vocabulary in New Zealand secondary schools. The survey focused on the teaching of specialised vocabulary. It asked how teachers identified this vocabulary, how they introduced it and consolidated their students’ understanding of it, and which resources and hands-on activities they used in class for learning it. The teachers reported being guided by questions and feedback from their learners when deciding which vocabulary to focus on in class (Coxhead, 2011a) (see Chapter 5 for more on this study and on vocabulary in secondary school contexts). Peters and Fernández (2013) conducted interviews with Spanish students studying Architecture to find out more about the vocabulary these students focused on and the resources they used to study it. The students reported that they tended to look up technical words (such as gutter, façade and rubble) in technical dictionaries. They also had problems learning words that would be considered general or scientific vocabulary, such as framework, sustainability and consumption. The learners reported that these words caused them more difficulty than the more technical words. The LATTE project also used interviews with tutors and students to investigate specialised vocabulary (see Coxhead et al., 2016; Coxhead et al., under review; and Chapter 8). Interviews can be time consuming for participants and researchers, but they do allow for in-depth discussion and the checking of concepts and interpretations. Questionnaires were developed in the LATTE project to find out more about the linguistic demands of students’ studies. One question in particular focused on what students considered to be specialised vocabulary (Coxhead et al., 2016). This question asked, ‘What kinds of words do you need to know to study Carpentry?’ The student responses were compared to a pedagogical word list developed from a written corpus of Carpentry. There was a large overlap between items in the student responses and items in the specialised word list. These overlapping items included dwangs, frames, galvanised (nails), bevel back, weather boards, cavity, Hardies, claddings, partitions, eliminate, isolate and minimise (see Chapter 8 for more on vocabulary in the trades).

Classroom-based approaches

There is a range of ways in which technical vocabulary for ESP could be identified using classroom-based approaches. Case study analyses of teacher and learner decision making (Coxhead, 2007, 2011a, 2012b) illustrate one such approach. Coxhead (2011b) analysed learners’ written texts and reading texts and carried out follow-up interviews with learners and teachers. The study found that university EAP students had strong views on whether a word was worth learning and using in their writing. Students made decisions on academic vocabulary depending on factors such as:
- the connection between the target word and the topic of the text;
- the strength of their knowledge of that word;
- whether university lecturers might appreciate and give good grades for using technical words well;
- confidence, risk taking and self-belief;
- whether a particular word fitted the context and purpose of the writing.

Personal beliefs about which words to learn in an academic setting, and why, were important. Teachers draw on their experience in the classroom to help decide which vocabulary to focus on with their learners (Coxhead, 2011a). In the quote that follows, an English Literature teacher sums up her approach to deciding which words to focus on:

Some words are words that every year students seem to struggle with, some are critical to understanding the main idea in the text, some are relevant to language features being identified and, most importantly, some are key English terms that students must be familiar with in assessment conditions.

Glossaries in textbooks can be a source of specialised vocabulary for teachers. In highly collaborative contexts, colleagues and heads of departments may decide which specialised words should be the focus of attention in class (Coxhead, 2011a). Observations of teachers and learners in class can shed light on how specialised vocabulary is treated in classrooms. Basturkmen and Shackleford (2015) observed, recorded and transcribed eight hours of first-year accountancy lectures at a university in New Zealand. The researchers focused on language-related episodes (LREs) and found that the lecturer initiated more LREs than the students did. A total of 76 out of 164 (46%) episodes focused on vocabulary, the highest proportion of LREs in the data set. Lecturers tended to be pre-emptive rather than reactive in their LREs, thinking ahead to what might be problematic for their learners in terms of vocabulary. Folse (2010) observed a group of upper intermediate students in a pre-university intensive English program in the USA and looked for episodes of explicit vocabulary focus. He found that there were nearly five vocabulary-focused episodes per class meeting, which added up to around 24 vocabulary-focused episodes per day. Folse (2010) also found that these relatively few lexical episodes were not very rich. They tended to be carried out in speech without much visual support, such as writing a word up on the classroom board.
Ardasheva and Tretter (2017) used interviews and observations in their study of science-specific vocabulary in secondary schools to inform their analysis of lexis in a Physics textbook (see Chapter 5 for more on this research). Such qualitative studies are invaluable because they reveal actual instances of specialised vocabulary in use by both learners and teachers.

Annotations and glossaries

Two early pieces of research into academic lexis, Lynn (1973) and Ghadessy (1979), analysed annotations in students’ textbooks and readings as a guide for selecting items for their word lists. Their reasoning was that learners clearly either had difficulty with these items or considered them important. However, the reasons a student might choose to annotate one word and not another could vary considerably. Perhaps the simplest way of identifying discipline-specific vocabulary is to be guided by vocabulary-related support in classroom textbooks and readings. Glossaries of specialised vocabulary can appear in margins, in appendixes or in the front matter of a book. In-text definitions are sometimes used for specialised vocabulary. Chung and Nation (2004) discuss using definitions and other clues in an Anatomy textbook (discussed earlier in this chapter) to identify technical vocabulary. The following example from a university Biology textbook shows the target word plasticity, with its definition immediately after it. Note also that fanwort has its Latin name in parentheses (cited in Coxhead, 2017a):

Whatever else the fanwort (Cabomba caroliniana) may be, it is a striking example of plasticity – an organism’s ability to alter or “mold” itself in response to local environmental conditions.

In electronic versions of texts, these specialised lexical items could be hyperlinked to a dictionary. Textbooks might also support specialised vocabulary with images such as pictures or graphics that illustrate a specialised term. Chung and Nation (2004) note that these in-text clues might give the reader helpful information about the technical terms in the text. However, the clues are not the most useful way of building a full list of specialised vocabulary. The researchers reported that this technique was the least accurate of the approaches to identifying technical vocabulary that they analysed, because many technical words were not defined in the text.

Using multiple measures to identify specialised vocabulary

Kwary (2011) proposes a hybrid approach to identifying specialised vocabulary using three methods. The first step is keyword analysis using corpora, chosen because it quickly generates an automatic list of possible target items. The next step involves systematic classification, using the keywords to help identify multi-word units in the corpus. This step is based on lexicographical methods for developing technical dictionaries. It includes subject-field classification, carried out either with a library-based system or by examining specialised texts in the field, and it can include consultation with subject specialists. The third step involves text analysis to investigate aspects of texts such as abbreviations and symbols which have technical meanings in context. In search of a statistical method to identify specialised vocabulary, Chujo and Utiyama (2006) used a section of the British National Corpus (BNC) made up of Commerce and Finance texts. They analysed this corpus of over seven million running words with seven different statistical measures to extract vocabulary for beginner, intermediate and advanced learners. Table 2.4 shows examples at each of these levels, along with the method of extraction.

Table 2.4 Levels, methods and examples from Chujo and Utiyama (2006, pp. 261–262)

Level | Method | Examples of specialised business vocabulary

Beginner | Cosine and complementary similarity measure | market, price, account, share

Intermediate | Log-likelihood, chi-square test, and chi-square test with Yates’s correction | dividend, exchange, capital, credit, payment, stock

Advanced | Mutual information and McNemar’s test | asset, shareholder, liquidity, fiduciary, volatility

Chujo and Utiyama (2006) note that the methods used for beginner level extractions brought high frequency words to the surface, including quite a high proportion of function words. The average frequency of the extracted words dropped sharply from beginner to advanced level. It fell from 58,517 for the Cosine method (beginner) to 134 for the McNemar method (advanced). Word length increased from beginner to advanced, as can be seen in Table 2.4. The researchers then investigated the top 500 items from each type of analysis. They examined the items’ occurrence in the BNC frequency lists (Nation, 2006) and in textbooks, as well as their word length and their use in a native speaker corpus.
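Several of the measures in Table 2.4 can be computed from a 2x2 contingency table for each word. The sketch below does not reproduce Chujo and Utiyama’s implementation. Its cell layout is an assumption, and its formulas are common textbook versions. Mutual information is computed as pointwise mutual information in bits. The complementary similarity measure, log-likelihood and McNemar’s test are left out, and the function name is hypothetical.

```python
import math


def contingency_measures(freq_target, size_target, freq_ref, size_ref):
    """Score one word with several 2x2 association measures.

    Assumed cells: a = word in target corpus, b = word in reference corpus,
    c = other tokens in target, d = other tokens in reference.
    """
    a, b = freq_target, freq_ref
    c, d = size_target - freq_target, size_ref - freq_ref
    if a + b == 0:
        raise ValueError("word must occur at least once")
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    diff = a * d - b * c
    chi_square = n * diff ** 2 / margins
    # Yates's continuity correction, clamped at zero so it cannot overshoot.
    yates = n * max(0.0, abs(diff) - n / 2) ** 2 / margins
    # Pointwise mutual information: log2 of observed a over expected a.
    mutual_info = math.log2(a * n / ((a + b) * (a + c))) if a else float("-inf")
    cosine = a / math.sqrt((a + b) * (a + c))
    return {"chi_square": chi_square, "yates": yates,
            "mutual_information": mutual_info, "cosine": cosine}


# A hypothetical word concentrated in a 10,000-word business corpus,
# compared with a 100,000-word reference corpus.
print(contingency_measures(50, 10_000, 10, 100_000))
```

Mutual information rewards words that are rare overall but concentrated in the target corpus. It therefore tends to favour lower frequency items, which fits the advanced-level examples in Table 2.4.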

Conclusion

This chapter has presented a range of options for identifying and classifying specialised vocabulary for ESP by drawing on the methodologies of researchers in different studies. In some cases, only one method is used, for example, a corpus-based study of vocabulary used by university students across disciplines (Durrant, 2016). In other cases, several types of data are used to identify vocabulary for specific purposes, for example, expert opinions, corpus research and interview data, as in the Carpentry study by Coxhead et al. (2016). New approaches to identifying specialised vocabulary for ESP will emerge as this field of research gains ground and matures. We have already seen how corpus linguistics has transformed the analysis of spoken and written texts for technical vocabulary. Some disciplines and fields have been much more extensively explored than others. Consider, for example, the amount of research on Aviation and Medicine, or the amount of university-based research in comparison with secondary or primary-school-based research. Identifying and categorising vocabulary are important steps for research activities such as developing word lists and teaching and learning materials for specific purposes. An issue remains as to whether technical vocabulary comprises all the words which are closely related to a subject or only those that are unique to the subject area. The next chapter focuses on word lists in particular.

Chapter 3 The role and value of word list research for ESP

Introduction

This chapter focuses on the role and value of word list research for ESP, using examples from a range of corpus-based studies in professional and academic contexts from two perspectives. The first focuses on research into developing and validating word lists and discusses the variety of lexical items that might be considered for selection for word lists, as well as principles for selection. The second investigates the use of word lists for research into vocabulary in ESP, including tracking vocabulary change in a field over time, researching the size of technical vocabulary, and the assessment of lexical knowledge based on word lists. The chapter ends by considering missing elements in word list research in ESP (for more on ways to identify specialised vocabulary, see Chapter 2; multiword unit lists, see Chapter 4; and specialised word lists, see Chapters 5 to 8).

Background and issues in word list research for ESP

In vocabulary research, corpus linguistics studies use word lists for various reasons, including identifying, classifying and quantifying the amount or proportion of technical vocabulary in texts. Ways to identify specialised vocabulary were explored in Chapter 2. This kind of research tells us more about
the nature and size of technical vocabulary in particular professional and academic areas, helps us track technical vocabulary change over time, and enables the assessment of the vocabulary load of texts. This chapter focuses on such uses of word lists for vocabulary in ESP. A particular use of word lists is to estimate the coverage of a list over a text or corpus (see Nation, 2016; Nation, 2006; Coxhead, 2000). The purpose of these estimates is to find out how many words are needed to understand a text. Laufer (1989) suggests that 95% coverage is needed for comprehension but later estimates such as Nation (2006) and Hu and Nation (2000; see also Schmitt, Jiang & Grabe, 2011) suggest 98% is needed for written texts. Note that 98% coverage reported in Nation’s (2006) study equates to 8,000 word families (see the section on units of counting that follows for more on word families). In a study of spoken texts, van Zeeland and Schmitt (2013) find 95% coverage is sufficient for listening comprehension. Hsu (2014) is an example of an ESP analysis (using Engineering textbooks) which uses coverage figures to develop a word list for a particular group of learners in Taiwan. For a discussion of coverage and suggestions for replication research in this area, see Schmitt, Cobb, Horst and Schmitt (2017). Word list research has also been driven by the needs of particular groups of language learners and to help set learning goals (Nation, 2016). While early word list development was based perhaps on a mix of judgement and counting words by hand in texts, computer-based approaches have recently made the process quicker. It is now also considerably easier to make word lists, whether they are motivated by the needs of particular learners or not. This, in turn, means that critical assessments are needed of both existing and newly developed lists to determine their usefulness and validity. 
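Coverage is simple arithmetic: it is the proportion of running words in a text that a given list covers. The sketch below makes one simplifying assumption. It uses a flat set of lower-cased word forms, whereas studies such as Nation (2006) count word families. The function names are hypothetical.

```python
def coverage(tokens, known_words):
    """Proportion of running words (tokens) found in known_words.

    known_words is assumed to be a set of lower-cased word forms; a real
    study would usually map forms to word families first.
    """
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token.lower() in known_words)
    return known / len(tokens)


def meets_threshold(tokens, known_words, threshold=0.98):
    """Compare coverage with a comprehension threshold such as 0.95 or 0.98."""
    return coverage(tokens, known_words) >= threshold


# A hypothetical 100-word text with three words outside the list.
text = ["pipe"] * 97 + ["flux", "swage", "trap"]
print(coverage(text, {"pipe"}))
print(meets_threshold(text, {"pipe"}, 0.95), meets_threshold(text, {"pipe"}, 0.98))
```

The thresholds translate into unknown-word rates. At 98% coverage, one running word in 50 is unknown, and at 95% coverage, one in 20 is unknown.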
In pedagogically oriented word list development, the proficiency of the students, the subject area, their needs and the context for their studies all affect how the lists are made. Technical vocabulary would generally be expected to be limited in range to its specialised subject area or discipline and to be well known and regularly used by professionals in that field. People outside the specialised field might have a limited knowledge of that vocabulary, or might never have heard or come across these technical items at all. In some cases, the meaning of a word might be vaguely known by laypeople, but a specialist would be expected to know much more precise information about its meaning, use and nuances. Take the example of file from Carpentry, where a file is a tool used in woodwork or metalwork, for example, to smooth a surface. Consider what a layperson might know about that word, for whom file commonly means to organise paper into folders in an office, in comparison to a carpenter. A fundamental issue in word list development is whether there is a common core of vocabulary which is shared by all language learners or whether all vocabulary is specialised (Basturkmen, 2006; Coxhead, 2013). This issue is important because it affects the starting point of the development of a word list for ESP. A common core approach attempts to take into account learners’ prior or existing knowledge of high frequency vocabulary, whereas the second approach does not make that assumption. Also, a common core approach assumes that different disciplines share a common core of academic language. An EAP teacher should therefore focus on those items first, leaving discipline-specific vocabulary to be learnt outside the EAP class in the learner’s discipline. The teaching context plays a major part in the common core versus specialised debate. In wide-angled contexts where English for General Academic Purposes (EGAP) has been selected, the teacher is likely to have a heterogeneous group of students and cannot tailor what they do to one discipline. Examples include English as a second language contexts such as Aotearoa/New Zealand, Australia, Canada, Britain and the USA. But in more specific models, where the learners are more homogeneous, a more specific approach makes sense (see Basturkmen, 2010, for more on wide- and narrow-angled approaches to ESP). An example of a common core approach is Coxhead’s (2000) AWL, which used the first 2,000 word families of West’s (1953) General Service List of English Words (GSL) as a principled high frequency word list.
The AWL was designed to take into account the existing knowledge of learners in English-medium institutions, whether they were studying to enter university or were already at university. For this reason, items selected for the AWL had to occur outside West’s GSL. A disadvantage of this approach is that any weaknesses in, or decisions made during, the development of West’s GSL affect the AWL (see, for example, Coxhead, 2000; Hyland & Tse, 2007; Nation & Webb, 2011; Gardner & Davies, 2014). One principle which West (1953) employed is ‘coverage’. Under this principle he selected items based on their overall utility for language learners. That is, if one item, such as ‘work’, provided coverage of a concept, then he selected that item in favour of other similar items (such as ‘job’). In Coxhead’s (2000) study, ‘job’ met the principles for selection for the AWL and is therefore included in the list. Other examples are ‘end’ (GSL) and ‘final’ (AWL). This principle therefore had an effect on the AWL and on other lists based on the GSL/AWL studies.

A specialised approach to making word lists would begin with no existing word list to represent learners’ existing knowledge. A clear advantage of this approach is that it fits the language learning needs of particular groups of learners well. An example of this approach is Ward’s (2009) Engineering word list for lower proficiency university students in Thailand. Ward’s word list contains 299 word types (that is, individual words rather than word families) and covers 16.4% of a corpus of Engineering textbooks. It is important to note the overlap with existing lists: of the 299 word types, 188 are also in the first 1,000 word families of West’s (1953) GSL, 28 are in the second 1,000 of West’s GSL and 78 also occur in the AWL. The top ten words from Ward’s list are system, shown, equation, example, value, design, used, section, flow and given (Ward, 2009).

Another example of a specialised approach, this time for general academic English, is Gardner and Davies’ (2014) Academic Vocabulary List (AVL). This list contains around 3,000 lemmas. A lemma contains a base word (for example, develop) and its inflections within the same part of speech (develops, developed, developing). Table 3.1 contains the top 20 lemmas in the AVL, including their part of speech. Gardner and Davies also converted the lemma-based list into word families so that they could compare the AVL with other word lists such as Coxhead’s (2000) AWL. The lemma and family lists are downloadable at www.academicwords.info. Note that the items listed in Table 3.1 from the AVL might seem to also be high frequency items in general English.
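Ward’s breakdown of the list against the GSL and AWL is essentially an overlap check. Each item is looked up in a series of baseline lists, taken in order. The Python sketch below is a minimal illustration of that check. It assumes lower-case word types, and it assumes an item is counted only under the first baseline list that contains it, so the counts do not overlap. The function name is hypothetical. The membership sets are invented placeholders and are not the actual contents of West’s GSL or the AWL.

```python
from collections import Counter


def profile_against_baselines(items, baselines):
    """Label each item with the first baseline list that contains it.

    items     -- word types from a specialised word list
    baselines -- ordered (label, set_of_words) pairs, general lists first
    Items found in none of the baselines are labelled 'not in any baseline'.
    """
    profile = {}
    for item in items:
        label = "not in any baseline"
        for name, members in baselines:
            if item in members:
                label = name
                break  # first matching baseline wins, so counts do not overlap
        profile[item] = label
    return profile


# Toy membership sets, invented for illustration only (NOT the real GSL/AWL).
baselines = [
    ("general 1st 1,000", {"system", "shown", "example", "used", "given"}),
    ("general 2nd 1,000", {"value", "flow"}),
    ("academic list", {"design", "section"}),
]
specialised_list = ["system", "shown", "equation", "example", "value",
                    "design", "used", "section", "flow", "given"]

profile = profile_against_baselines(specialised_list, baselines)
summary = Counter(profile.values())
print(summary)
```

With the real list files substituted for the toy sets, the same tally gives a breakdown of the kind Ward reports.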
Nation (2016) points out that the selection criteria for the AVL resulted in the inclusion of high frequency lemmas such as between, however and group. He notes that these words ‘seem to be only marginally academic’ (p. 150). In a recent study, Durrant (2016) investigated the coverage of the AVL over the British Academic Written English (BAWE) corpus. He found that the AVL covered, on average, 34% of each text in the BAWE. This finding suggests that the AVL is a very useful resource for researchers, teachers and learners in EAP.

Table 3.1 The top 20 lemmas in the AVL (Gardner & Davies, 2014, p. 317)

Most frequent 1–10        Most frequent 11–20
1.  study.n               11. important.j
2.  group.n               12. process.n
3.  system.n              13. use.n
4.  social.j              14. development.n
5.  provide.v             15. data.n
6.  however.r             16. information.n
7.  research.n            17. effect.n
8.  level.n               18. change.n
9.  result.n              19. table.n
10. include.v             20. policy.n

Key to part of speech: n = noun, v = verb, j = adjective, r = adverb

Flowerdew (2014, p. 27) comments on the core versus specificity debate by saying, ‘perhaps the conclusion that can best be drawn is that… different studies generate different results on account of the varied composition of the corpora and different software used.’ The short discussion earlier on core versus specific word lists certainly bears out Flowerdew’s observation.

Developing word lists for ESP using corpora

In this section, I discuss the kinds of research-based decisions that have to be made when developing a word list using corpora. The discussion draws on examples from Coxhead (2000) on the development of the AWL and on several other word lists. As Chapter 2 shows, corpus-based studies create opportunities for large-scale analyses of words and multiword units in a range of texts, including spoken, written and multi-modal texts. Corpus studies should allow for replication, and current research tools can handle corpora of many millions of words. While computer-based analyses can be quite quick, corpus studies still require principled decision making by humans (Byrd & Coxhead, 2010), and this process can be very time consuming. Corpus building for word list development involves several methodological considerations and decisions:
- the size and context of the corpus;
- the kinds of texts and their representativeness;
- the balance of the corpus;
- the length of texts; and
- how many corpora are required.

See also Nation and Webb (2011) and Nation (2016) for an outline of general principles and guidelines on how to develop word lists.

One decision in developing a word list is whether the corpus will be written, spoken or both. There are often practical reasons for choosing written texts over spoken ones, and there are many more word lists in EAP and ESP based on written texts than on spoken texts. Dang, Coxhead and Webb (in press) developed a spoken academic word list to address this imbalance. Spoken texts are considerably more difficult to obtain because they involve more steps in gathering and checking the data. Compare, for example, the effort and funding required to record a corpus of specialised English in first-year university Law lectures with the effort of locating and downloading or buying online texts related to the topics of the first-year Law class. The written documents are far easier to acquire than the spoken ones. Furthermore, written documents such as whole textbooks are likely to contain far more running words than spoken texts such as possibly quite short interactions on building sites. Ways to deal with this imbalance in size include the following:
- ensuring all documents contain the same number of running words;
- using a programme which automatically cuts off texts at a specific number of running words;
- norming, where researchers adjust the raw frequency of words in texts to a common basis such as per 1,000 words or per 1,000,000 words (see Biber, Conrad & Reppen, 1998); or
- employing ratios.

An important concept concerning the distribution of vocabulary in texts is Zipf’s law (1935). It states that the frequency of a word in a text multiplied by its rank in a frequency-based list gives a roughly constant figure.
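A few lines of Python can illustrate both norming and Zipf’s rank-by-frequency relationship. This is a minimal sketch. It assumes the text has already been split into lower-case word tokens. The function names and the sub-corpus sizes (120,000 and 40,000 running words) are hypothetical. A toy text this short will not show a near-constant rank × frequency product, which only emerges in larger samples.

```python
from collections import Counter


def normed_frequency(raw_count, corpus_size, basis=1_000):
    """Convert a raw count into a rate per `basis` running words."""
    return raw_count * basis / corpus_size


def zipf_table(tokens):
    """Rank word types by frequency and give rank * frequency for each."""
    ranked = Counter(tokens).most_common()  # ties keep first-seen order
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


def hapax_proportion(tokens):
    """Share of word types that occur exactly once."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# The same raw count of 30 means different things in sub-corpora of
# different sizes: 0.25 vs 0.75 occurrences per 1,000 running words.
written_rate = normed_frequency(30, 120_000)
spoken_rate = normed_frequency(30, 40_000)

tokens = "the saw cuts the board and the file smooths the edge".split()
for row in zipf_table(tokens)[:3]:
    print(row)
print(f"hapax share: {hapax_proportion(tokens):.2f}")
```

In this tiny sample, most types occur only once. This is an extreme version of the pattern that makes norming difficult.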
Zipf’s law means that words near the top of a frequency-ranked list (rank 1, 2, 3 and so on) occur very often. Words further down the list, with much larger rank numbers, occur rarely. Yet if we multiply each word’s rank by its frequency, the result is roughly the same. Approximately half of the different words in a text occur only once, which makes norming difficult. See Nation (2016, pp. 3–5) for more explanation of Zipf’s law and examples of its use in course design.

A study by Coxhead and Hirsh (2007) shows how this kind of decision making informs and affects the methodology of a corpus study. They wanted to find out whether there was a group of lexical items outside West’s GSL (1953) and Coxhead’s (2000) AWL which occurred with wide range and reasonable frequency across university-level Science texts. One of the early steps was the development of the corpus for the study. Coxhead and Hirsh used the seven existing Science subject corpora from Coxhead’s (2000) written corpus of academic texts. They added seven more subjects to cover a wider range of subject areas at the first-year university level. When developing the AWL corpus, Coxhead (2000) did not collect Engineering and Medicine texts. Those subjects were not taught at the first-year level at the university where she collected her corpus at that time. Coxhead and Hirsh (2007) expanded the corpus to include those two subjects, and five others. The aim was to represent the wider scope of first-year subjects offered as majors at Massey University, New Zealand (where Coxhead was then working) and the University of Sydney, Australia (where Hirsh is based). Texts from second-year, third-year and postgraduate courses were not included. The purpose of the study was to develop a word list to support learners preparing to study at first-year level in an English-medium university. See Table 3.2 for the list of subject areas and the number of running words in each subject in the corpus. Note that the sub-corpora are all roughly similar in size. An asterisk (*) marks the seven new subject areas added for the Science-specific word list.

Table 3.2 Fourteen subject areas of the written Science corpus (Coxhead & Hirsh, 2007, p. 70)

Subject area                        Running words
Agricultural Science*                     129,492
Biology                                   125,898
Chemistry                                 124,400
Computer Science                          124,589
Ecology*                                  123,759
Engineering and Technology*               128,561
Geography                                 125,833
Geology                                   125,144
Horticultural Science*                    128,124
Mathematics                               127,234
Nursing and Midwifery*                    124,218
Physics                                   123,136
Sport and Health Sciences*                124,488
Veterinary and Animal Sciences*           126,504
Total                                   1,761,380

Coxhead and Hirsh (2007) made a pragmatic decision to develop a smaller, balanced, wide-ranging corpus for their word list rather than a larger one. The resulting corpus contains just over 1.75 million running words, which is relatively small. A danger with a small corpus is that lower frequency items might not have enough opportunity to occur, whereas high frequency items should occur frequently in any corpus. Developing a larger corpus would mean enlarging each of the subject areas. Any attempt to make specialised word lists from individual subject-specific corpora would require a much larger collection of texts in each subject area.

The written academic texts gathered for this corpus included study guides, laboratory manuals and textbook chapters from core texts. Academic staff members were consulted on the representativeness of the texts, that is, whether the texts were used in courses and were part of the expected reading of first-year students in those subjects. In several cases, online textbooks were included in the corpus, and these present a new challenge to corpus development. Publishers of online textbooks can update chapters without waiting for a new edition, so online textbooks are potentially more dynamic. As a result, a chapter downloaded for a corpus might not be locatable in the same form online even a few months later.

The Science word list developed in this study by Coxhead and Hirsh (2007) contains 318 word families, which cover 3.71% of the Science corpus. The top ten items from this list are cell, species, acid, muscle, protein, molecule, nutrient, dense, laboratory and fluid. Once developed, a word list can be used to explore the nature of specialised words in other contexts.
Coxhead, Stevens and Tinkle (2010) used Coxhead and Hirsh’s Science list in a study of the vocabulary of secondary school Science texts. They found that 264 word families from the Science list covered 5.9% of a series of four textbooks (279,733 running words). The secondary school corpus is very small, but this coverage figure suggests that these lexical items are useful for secondary school learners. The second 1,000 word families of West’s (1953) GSL also cover around 6%, but with nearly four times as many word families. Therefore, those 264 word families are useful for secondary school students who are studying Science using those textbooks. However, there are clearly also differences between the kinds of Science vocabulary found at secondary school and at university level. These differences can be seen in the coverage figures from the university study by Coxhead and Hirsh (2007) and the secondary study by Coxhead et al. (2010). Coxhead and Quero (2015) found that the Coxhead and Hirsh Science list covered roughly 6% of two corpora of Medical textbooks (both containing five million words). Greene and Coxhead (2015) report another example of a Science word list developed from a corpus, based on Middle School textbooks in the USA (see Chapter 5 for more on these lists). Coxhead is currently undertaking a larger study with a much larger corpus of secondary school Science texts. This study looks more closely at the vocabulary of secondary school Science from a common core approach and compares and contrasts these existing word lists.
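Coverage is the figure used throughout these studies. It is simply the percentage of running words (tokens) in a corpus that belong to a word list. The Python sketch below is a minimal illustration. It assumes tokens are lower-case and that each word family has already been expanded into a set of its member forms. The family sets are hypothetical toy data. The last lines use the figures reported above to show why 264 families covering 5.9% is efficient compared with 1,000 families covering about 6%.

```python
def coverage(tokens, list_forms):
    """Percentage of running words (tokens) that are members of a word list."""
    hits = sum(1 for token in tokens if token in list_forms)
    return 100 * hits / len(tokens)


# Hypothetical word list: each family expanded into its member forms.
science_families = {
    "cell": {"cell", "cells", "cellular"},
    "acid": {"acid", "acids", "acidic"},
}
list_forms = set().union(*science_families.values())

tokens = "the cell wall and the acidic cells of the leaf".split()
print(f"{coverage(tokens, list_forms):.1f}%")

# Coverage per family, using the figures reported in the text.
science_list_per_family = 5.9 / 264    # about 0.022% per family
gsl_second_1000_per_family = 6 / 1000  # about 0.006% per family
print(round(science_list_per_family / gsl_second_1000_per_family, 1))
```

On these figures, each Science list family contributes roughly three to four times as much coverage as an average family from the second 1,000 of the GSL.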

Research principles for word list development

Developing and applying consistent principles for selecting lexical items is important because these principles guide researchers through the many decisions involved in word list development. One principle could be that the items have to meet a particular frequency requirement both in the corpus overall and in the sub-corpora for specific subject areas. For example, a researcher might want to develop a word list of Biomedical Science for first-year students. The corpus might then contain several sub-corpora of subjects such as Cell Biology, Chemistry, Human Biology, Psychology and Statistics, depending on the requirements of the degree. The frequency of lexical items in the Biomedical Science corpus overall would give an indication of candidates for a word list. The frequency of those candidates in each subject sub-corpus then helps narrow down the possible candidates for the word list. This second measure of frequency in different sub-corpora relates to the range of occurrence of the words in a corpus.

Range concerns the occurrence of lexical items across several corpora, sub-corpora or texts. Take a corpus with sub-corpora, such as the Biomedical Science corpus described earlier. Items which are shared across Cell Biology, Chemistry, Human Biology, Psychology and Statistics might be selected for a word list of Biomedical Science. On the other hand, items which occur only in one subject area, such as Cell Biology, might be selected for a word list for that particular subject only. In this case, the range principle could be applied across the texts in the Cell Biology sub-corpus. Hyland and Tse (2007) found differences in AWL word patterns and frequencies across subject areas in a corpus of writing by university-level professional and student writers. They argue that such differences add to the arguments against the common core approach.

The AVL (Gardner & Davies, 2014) was developed through a corpus-comparison approach. It was based on four key principles for selecting lexical items from their large sub-corpus of academic English: frequency, range, dispersion and discipline.
- Frequency: candidates for the word list had to be at least 50% more frequent in academic texts than in non-academic texts.
- Range: this principle involved two steps. Items had to occur in at least seven of the nine disciplines in the academic corpus, and they needed to occur there with at least 20% of their expected frequency.
- Dispersion: while range ensures that lexical items occur across academic disciplines, dispersion focuses on how evenly these items occur across the academic corpus.
- Discipline: this final principle takes the range of disciplines into account so that general academic vocabulary is selected instead of technical vocabulary tied to one discipline.
It is important to understand the methodology and principled decisions of researchers in the development of word lists such as the AVL (for more on the word lists for general and specific academic purposes, go to Chapter 6).
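The frequency and range principles can be combined into a simple filter over sub-corpora. The Python sketch below is a minimal illustration using the hypothetical Biomedical Science example from this section. The thresholds, toy token lists and function name are invented. The sketch does not model the AVL’s additional frequency-ratio, dispersion or discipline criteria.

```python
from collections import Counter


def select_candidates(subcorpora, min_total_freq, min_range):
    """Keep items that meet an overall frequency threshold and a range threshold.

    subcorpora -- dict mapping sub-corpus name to a list of tokens
    Returns a dict: item -> (total frequency, range), where range is the
    number of sub-corpora the item occurs in.
    """
    per_sub = {name: Counter(tokens) for name, tokens in subcorpora.items()}
    total = Counter()
    for counts in per_sub.values():
        total.update(counts)

    selected = {}
    for item, freq in total.items():
        item_range = sum(1 for counts in per_sub.values() if item in counts)
        if freq >= min_total_freq and item_range >= min_range:
            selected[item] = (freq, item_range)
    return selected


# Toy sub-corpora for a hypothetical Biomedical Science word list.
biomedical = {
    "Cell Biology": "cell membrane protein cell".split(),
    "Chemistry": "acid protein molecule".split(),
    "Statistics": "mean sample protein".split(),
}
print(select_candidates(biomedical, min_total_freq=2, min_range=2))
```

With `min_range=2`, only protein survives. Lowering `min_range` to 1 also keeps cell, the kind of item the text suggests might instead belong on a subject-specific Cell Biology list.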

Deciding on the unit of counting for developing word lists

The unit of counting is an important aspect of word list development. Units of counting can include individual word types, lemmas or word families.
- Types are individual word forms; for example, dog and dogs are two different types.
- A lemma contains the base form of a word and its inflected forms, for example, like, likes, liked and liking.
- A word family is a much broader unit of counting than types and lemmas. It contains the inflections and derived forms of a word.

An advantage of counting types for ESP word lists is that individual types may be technical words. By contrast, a word family might include items that are technical vocabulary as well as items that are not technical in nature. For example, patient might be a technical word in Nursing, but patience is not. Bauer and Nation (1993) categorise affixation in English and provide a series of levels of affixation. For example, the AWL was developed up to Level Six of Bauer and Nation’s (1993) scale. Here is an example of a word family from the AWL: benefit, beneficial, beneficiary, beneficiaries, benefited, benefiting and benefits. Nation and Webb (2011) point out that the decision on the unit of counting depends on the purpose of the word list. If a word list is being used to measure the vocabulary load of written texts, then a word family is perhaps the most useful unit of counting. For specialised word lists, for example in Carpentry or Aviation, word types might be the better unit. This is because individual types might be classified as technical or specialised vocabulary, as in the example of file mentioned earlier. Output from computer programmes such as RANGE (Heatley, Nation & Coxhead, 2002) reports on both type and family frequency and range. As a result, types can be considered for selection in ESP word lists relatively easily.
Nation (2016) has more definitions and examples of types, lemmas and word families, pointing out that lemmas are really one kind of word family.
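The difference between the three units of counting shows up as soon as the same tokens are counted three ways. The Python sketch below is a minimal illustration. The lookup tables are hypothetical and hand-made; a real study would use a lemmatiser or published family lists. The sketch also ignores part of speech, so its lemma count only approximates a true lemma count.

```python
from collections import Counter

# Hypothetical hand-made lookups; forms missing from a table count as themselves.
LEMMA_OF = {
    "likes": "like", "liked": "like", "liking": "like",
    "benefits": "benefit", "benefited": "benefit", "benefiting": "benefit",
}
FAMILY_OF = {
    **LEMMA_OF,
    "beneficial": "benefit", "beneficiary": "benefit", "beneficiaries": "benefit",
}


def units(tokens, lookup):
    """Frequency of each counting unit after mapping forms via `lookup`."""
    return Counter(lookup.get(token, token) for token in tokens)


tokens = "benefit benefits beneficial benefited like liked liking likes".split()
types = units(tokens, {})            # every distinct form is its own type
lemmas = units(tokens, LEMMA_OF)     # inflections grouped under a base form
families = units(tokens, FAMILY_OF)  # inflections and derivations grouped
print(len(types), len(lemmas), len(families))  # 8 3 2
```

The same eight running words yield eight types, three lemmas (beneficial stays separate because it is derived, not inflected) and two families.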

Frequency and range requirements

When a corpus has several components, such as spoken and written texts, it is useful to keep similar numbers of running words in the different sub-corpora. Texts of different lengths yield different frequency and range results, and this affects studies into the vocabulary of ESP. Low frequency items have more opportunity to occur in longer texts than in shorter ones. Range means comparing occurrence across several specialised fields, or across types of texts within one specialised field. Dispersion measures how evenly words are distributed across sections of a corpus that are equal in size (see Leech, Rayson & Wilson, 2001). For example, in the Coxhead and Hirsh (2007) Science Word List study, dispersion was measured across all 14 subject areas. Dispersion can also be measured across texts in a corpus. It is an important measure because it guards against a particular kind of bias. An item might occur in only one text in a sub-corpus but with enough frequency to become a candidate for selection, and dispersion helps avoid selecting it. Biber, Reppen, Schnur and Ghanem (2016) point out problems with dispersion measures when they are used with large corpora.
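One widely used dispersion measure is Juilland’s D, the dispersion statistic reported in Leech, Rayson and Wilson (2001). It compares an item’s frequencies across n equal-sized parts of a corpus:

D = 1 - V / sqrt(n - 1)

Here V is the coefficient of variation (standard deviation divided by mean) of those frequencies. D is 1 when an item is spread perfectly evenly and 0 when it occurs in only one part. The Python sketch below is a minimal illustration. It assumes the parts really are equal in size, and the function name is hypothetical. The cautions raised by Biber et al. (2016) about dispersion measures in large corpora apply to it.

```python
from math import sqrt
from statistics import mean, pstdev


def juilland_d(part_frequencies):
    """Juilland's D for one item's frequencies in n equal-sized corpus parts."""
    n = len(part_frequencies)
    if n < 2:
        raise ValueError("need frequencies for at least two parts")
    m = mean(part_frequencies)
    if m == 0:
        raise ValueError("item does not occur in any part")
    v = pstdev(part_frequencies) / m  # coefficient of variation
    d = 1 - v / sqrt(n - 1)
    return min(1.0, max(0.0, d))  # clamp tiny floating-point overshoots


print(round(juilland_d([10, 10, 10, 10]), 3))  # evenly spread -> 1.0
print(round(juilland_d([40, 0, 0, 0]), 3))     # all in one part -> 0.0
print(round(juilland_d([20, 10, 10, 0]), 3))   # uneven -> 0.592
```

A word list study would typically combine a dispersion cut-off like this with the frequency and range thresholds. The cut-off value itself is a research decision.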

What kinds of specialised words might be found in a corpus and how might they be classified?

In this section, I look at various kinds of vocabulary that might be found in specialised corpora. I also consider issues around their classification in relation to selection principles for word list development. An analysis of a written Carpentry corpus (Coxhead, Demecheleer & McLaughlin, 2016) brought to light a range of specialised vocabulary closely related to this trade. Table 3.3 contains examples of lexical items from the corpus which could be identified as specialised. The examples in the table illustrate how wide ranging a specialised vocabulary can be. They include everyday words (for example, floor and roof), abbreviations (H1 and H1.2) and proper nouns, such as Hardies. Note that jack could be a proper noun, but in the Carpentry corpus, jack is not used as a proper noun. In building, GIB is a type of plasterboard, but in other kinds of texts, such as Medicine, this abbreviation may be an acronym with other meanings. Let’s look now at high frequency words, highly specialised or technical words, compound nouns, proper nouns, abbreviations and Latinate forms.

Table 3.3 Some examples of potential specialised vocabulary from a written Carpentry corpus

floor         H1          Gang nail    Studs
roof          H1.2        truss        Hardies
foundation    Bullnose    Knurl        flange
stud          GIB         mullion      2 × 4
dwangs        jack        dunnage      flashing

High frequency items

Any text will contain high frequency items (Nation, 2013), and these items make up the majority of running words in texts. Table 3.4 shows that 83.77% of the running words in the Carpentry texts occur in the first 3,000 word families of Nation’s (2006) BNC/COCA frequency lists. In other words, just under 84% of the running words in this corpus of Carpentry textbooks are high frequency words. The coverage percentages drop rapidly from the high frequency lists towards the less frequent lists, a drop which is typical of frequency lists in general. Table 3.4 also shows how much of the corpus is covered by a list of proper nouns and by a list of compounds, both also compiled by Nation. A small percentage consists of marginal words, including an interesting range of swear words from student writers in Builders’ Diaries (Coxhead, 2015c), and acronyms. Because the BNC lists are based on a general corpus, specialised Carpentry terms are highly unlikely to feature in them. Accordingly, Table 3.4 shows that 2.84% of the words in this Carpentry corpus do not occur in any of the lists. Further work needs to be done to analyse and categorise these items. This identification and categorisation research should reveal more about the specialised vocabulary of Carpentry. It should also result in a word list for further research in this field and for comparison across other trades. Nation (personal communication, 29 January 2017) analysed the Carpentry list and found that many of these words were parts of multiword units.
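A coverage profile like Table 3.4 assigns every running word to the first frequency band, or supplementary list, that contains it. The percentages are then added together to give the cumulative column. The Python sketch below is a minimal illustration. It assumes lower-case tokens and uses hypothetical toy band lists, not Nation’s BNC/COCA lists. The final lines apply the same cumulative sum to the percentages reported in Table 3.4.

```python
from collections import Counter
from itertools import accumulate


def band_coverage(tokens, bands):
    """Percentage of tokens in each band; the first matching band wins.

    bands -- ordered (label, set_of_forms) pairs
    Tokens found in no band are reported as 'Not in any lists'.
    """
    tally = Counter()
    for token in tokens:
        for label, forms in bands:
            if token in forms:
                tally[label] += 1
                break
        else:  # no band matched this token
            tally["Not in any lists"] += 1
    labels = [label for label, _ in bands] + ["Not in any lists"]
    return [(label, 100 * tally[label] / len(tokens)) for label in labels]


# Toy bands, invented for illustration only.
bands = [("1st-3rd 1,000", {"the", "roof", "nail", "a"}),
         ("4th-8th 1,000", {"truss"})]
tokens = "the roof truss the dwangs nail".split()
rows = band_coverage(tokens, bands)

# Cumulative coverage from the per-band percentages in Table 3.4.
table_3_4 = [83.77, 8.49, 1.75, 0.67, 1.43, 0.74, 0.31, 2.84]
cumulative = [round(total, 2) for total in accumulate(table_3_4)]
print(cumulative)  # matches the Cumulative coverage column
```

Rounding after each addition keeps small floating-point errors out of the printed column.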

Table 3.4 Initial analysis of coverage of Nation’s BNC/COCA frequency lists over a Carpentry corpus

BNC/COCA word lists              Coverage    Cumulative coverage
First 3,000 word families          83.77%       83.77%
4,000–8,000 word families           8.49%       92.26%
9,000–25,000 word families          1.75%       94.01%
Proper nouns                        0.67%       94.68%
Compounds                           1.43%       96.11%
Marginal words                      0.74%       96.85%
Acronyms                            0.31%       97.16%
Not in any lists                    2.84%      100.00%

High frequency/everyday words with technical meanings

High frequency words can take on specialised meanings in particular contexts. The occurrence of high frequency or everyday words in ESP is not surprising, since these words carry a heavy load in any text in English. A feature of these everyday words is their sheer frequency, which means there is already an expectation of a reasonably high level of occurrence. In specialised texts, however, these words may also have a specialised meaning, and the closer they are to the topic of the text, the more frequently they will be used. Fraser (2009) calls everyday words used with a technical meaning cryptotechnical, because their specialised meanings might not be immediately obvious. High frequency vocabulary is likely to make up a large percentage of the words in a text. High frequency words with technical meanings are therefore likely to occur very frequently too. For example, Sutarsyah, Nation and Kennedy (1994) found that 34 words (including cost, supply and average) occurred on average once every ten words in a university-level Economics textbook. In practical terms, one word in ten is roughly one occurrence per line of text. Their analysis showed that 20 of these 34 words were clearly essential to Economics. Contrast these everyday words in Economics with Anatomy items such as sternum, costal and vertebrae from Chung and Nation (2003). These items might stand out more in an ESP text because they look like medical words. Learners might expect highly specialised vocabulary to look different from everyday words, perhaps by being Graeco-Latin in origin or marked by narrowness of use. Yet specialised vocabulary, as Sutarsyah et al. (1994) showed, can include everyday words.

Another important reason for knowing more about everyday words with specialised meanings is that once learners know the everyday meaning of a word, they can find it difficult to apply that word in a new context. Teachers are aware of this problem. In an interview on specialised vocabulary, a secondary school teacher in Aotearoa/New Zealand commented that ‘in Social Studies and History, students think they know words because they know the everyday meaning but they don’t know the specialised meaning [of the word]’ (Coxhead, unpublished data). Another secondary school teacher of Science in Aotearoa/NZ identified the other side of that problem when she said:

Teaching biology is like teaching a language subject. For every known word students are familiar with, there is a Biology word. For example, dissolve is not scientific English. It [Science] refers to solubility and insolubility. It relates to solute and solvent. Students have to be able to explain this meaning in a scientific context. If they use a scientific word in general terms, it will not be used in the correct way in normal language.
Non-native speakers of English are not the only ones who might have this problem; native speakers also have to learn new meanings for words they already know in an everyday sense. Computer Science as a field of expertise is especially good at using everyday terms for specialised purposes (Radford, 2013). Words such as print, save, file, send and share all refer to common activities in word processing applications. On the surface, these words appear to mirror the everyday pen-and-paper actions which preceded them, such as file, save, open, close and folder, but in Computer Science they have much more specific meanings. Some everyday words in Computer Science are much less obvious, such as host (a computer) or string (a data type). Another example of a technical term in Computer Science is software release train, a way to release versions of software for different projects using a planned schedule (Peter Coxhead, personal communication). In a software release train, several distinct series of versioned software releases for multiple products are released as different ‘trains’ on a regular, preplanned schedule. Users might work on several releases at a time while developers test, trial, release and polish aspects of software. This example also illustrates how words can combine in multiword units to form a new meaning. Everyday vocabulary can also take on specialised meanings outside Computer Science. Accurate and precise in Physics, for example, carry specific meanings which are not the same as in everyday English (see Coxhead, 2012a).

However, computer programmes such as RANGE (Heatley et al., 2002) tend not to differentiate between homographs. This means that items such as patient as a noun or adjective in Nursing or Medical studies are not flagged as being technical in nature. In such cases, researchers can use concordance programmes such as AntConc (go to www.laurenceanthony.net/software.html) to help decide how words such as patient are being used and in what patterns. Tom Cobb’s Compleat Lexical Tutor website (www.lextutor.ca/) allows users to create concordances of target items, using either corpora provided on the website or a corpus supplied by the user. The context of any key word can be shown in one line or in an extended context. Figure 3.1 shows such a comparison for the target word stress in Electrical Engineering and in spoken English.

Figure 3.1 Examples of the target word stress in an Electrical Engineering corpus and the BNC spoken corpus using Lex Tutor (Cobb, n.d.)

This process is an example of how quantitative corpus research can combine with qualitative methods. Developments in corpus linguistics research suggest that problems with homonyms (such as bank as in river bank and money bank to use examples from Cobb, 2013) can be addressed through corpus tools, for example by identifying, counting, and highlighting collocation patterns of the target words in context and allowing researchers to examine these patterns in concordances.
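Concordancers such as AntConc and Lextutor automate this kind of step. The core of it can be sketched in a few lines of Python: a keyword-in-context (KWIC) listing plus a count of the words immediately around the target. This is a minimal illustration. It assumes plain-text input and simple regular-expression tokenisation. The function names and the example sentences are invented. Dedicated concordancers offer far more, including sorting, extended context and tagged corpora.

```python
import re
from collections import Counter


def kwic(text, node, width=25):
    """Keyword-in-context lines for each whole-word match of `node`."""
    pattern = r"\b" + re.escape(node) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


def collocates(tokens, node, span=1):
    """Count words within `span` tokens either side of each `node` token."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


text = ("The patient was admitted overnight. Be patient with new staff. "
        "Each patient receives a care plan.")
for line in kwic(text, "patient"):
    print(line)

tokens = re.findall(r"[a-z]+", text.lower())
print(collocates(tokens, "patient").most_common())
```

Here the adjective use appears next to be, while the noun uses follow determiners such as the and each. A researcher would then confirm patterns like these by reading the concordance lines.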

Abbreviations as specialised vocabulary

Abbreviations are possibly less problematic than high frequency lexis in specialised fields because abbreviations are often explained in texts. For example, it is common practice to write the full form in a text followed by the abbreviation in brackets, as in English for Specific Purposes (ESP). Some acronyms are shared across fields with very different meanings. A good example is EAP, as in English for Academic Purposes or Employee Assistance Programme. Quero’s (2015) study of technical vocabulary in Medical texts found a number of items which, on the surface, might appear to be general rather than specialised, including words such as ten and fish. These words were written in capital letters (as in TEN and FISH) in the Medical texts in Quero’s corpus, which suggested that these ‘words’ are acronyms. TEN stands for ‘toxic epidermal necrolysis’ and FISH stands for ‘fluorescence in situ hybridisation’. Mistaking TEN for the number ten (10) while reading these Medical texts would be embarrassing at the very least, and potentially problematic. That said, it is also clear that a great deal of knowledge about the medical field is packed into these acronyms. It is important to note that, at the time of writing, corpus analysis (see the following) still requires researchers to spend time checking instances of words in context. This checking establishes when everyday words are being used in a text with a technical meaning. Acronyms occur in all kinds of specialised fields, including trades, secondary and university studies, and professional areas of endeavour.
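A first pass at catching cases like TEN and FISH is to flag tokens written entirely in capitals whose lower-case form is also an everyday word. The Python sketch below is a minimal illustration. It assumes a hypothetical everyday-word set; in practice this would be something like a high frequency word list. The function name is also hypothetical. The sketch will also flag capitalised headings and emphasis, so, as noted above, every hit still needs checking in context.

```python
import re


def flag_acronym_homographs(text, everyday_words):
    """Count all-capital tokens (two or more letters) that are also everyday words."""
    hits = {}
    for token in re.findall(r"\b[A-Z]{2,}\b", text):
        if token.lower() in everyday_words:
            hits[token] = hits.get(token, 0) + 1
    return hits


everyday_words = {"ten", "fish", "was", "the"}  # hypothetical toy list
text = ("Suspected TEN was confirmed on biopsy. FISH showed ten copies. "
        "DNA was extracted.")
print(flag_acronym_homographs(text, everyday_words))  # {'TEN': 1, 'FISH': 1}
```

DNA is not flagged because its lower-case form is not in the toy everyday list. The lower-case ten is ignored because only all-capital tokens are checked.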

Proper nouns as technical vocabulary

Proper nouns are another example of everyday words which might have specialised meanings in particular fields. Proper nouns include names of places, as in Wellington (the capital city of New Zealand), people (Peter Jackson, the film director) and other items such as companies (Toyota, Fonterra). Paul Nation’s proper noun list is very useful for analysing the proportion of proper nouns in a text or larger corpus. The list is available as part of the RANGE BNC program (see Paul Nation’s website: www.victoria.ac.nz/lals/about/staff/paul-nation). Some proper nouns can also be identified as vocabulary for ESP. One example is Alzheimer, as in Alzheimer’s Disease in Medicine. Another is Maslow, a proper noun that is part of a word string or phrase, as in Maslow’s hierarchy of needs in Psychology. Proper nouns appear to have higher frequency in some disciplines, for example Medicine and History, than in others such as Mathematics (Greene & Coxhead, 2015). Proper nouns also occur in the trades, such as Aqualine, Braceline, Branz, Ecoply, Ezybrace, Flexibrace, Fyreline and Gantt in Carpentry (Coxhead et al., 2016). One debate about proper nouns is whether to include them in the first 1,000 words of English or to separate them from frequency analyses. A key caution is that proper nouns are not problem free for language learners (Brown, 2010; Nation & Kobeleva, 2016).

The number of proper nouns in texts can vary according to the subject area and specialisation of a text or corpus. A figure of around 2% seems to be typical of most general corpora. In the field of Education, the coverage figures for proper nouns in texts vary quite a lot. For example, Coxhead’s (2012c) study of secondary school English Literature texts found that proper nouns accounted for 1.92% of the words in a corpus of just over 88,000 running words. This figure is very close to Nation’s (2006) finding of just over 2% for the novel Lady Chatterley’s Lover. With specialised texts, however, the proportion of proper nouns can be considerably higher. An initial analysis covered a corpus of postgraduate Applied Linguistics research articles, textbook chapters and other academic readings. It found that proper nouns accounted for up to 4% of this corpus (Coxhead, unpublished data). This figure is not surprising because the reference lists alone in such a corpus contain many names of authors and publishers. Proper nouns also account for over 4% of a corpus of Middle School textbooks in Social Sciences (see Greene & Coxhead, 2015). Again, this is not surprising given the importance of place names and people in History and Geography. In the New Zealand context, a History and Geography corpus containing texts about the country would be expected to include proper nouns reflecting its people and places (for example, Māori, Wellington, Sir Edmund Hillary, the Treaty of Waitangi/Te Tiriti o Waitangi). However, we would also expect to see proper nouns that relate to people and places in other countries, depending on the topics of the texts. Medicine is another field where proper nouns can play a large role in a text. Two examples of proper nouns in Medical corpora from Quero (2015) illustrate the difficulties they can present in corpus studies. These examples are Stevens-Johnson syndrome and Parkinson’s (disease).
Stevens-Johnson and Parkinson could appear in a Medical text as an in-text reference for published papers, as a reference to the researchers in general, or as a reference to the syndrome or condition. An analysis of a 110,000 word corpus of Fabrication (welding) for polytechnic students found just under 1% of the text were proper nouns (Coxhead, unpublished data). Examples of high frequency proper nouns in a Plumbing corpus include Buchan, Eco, Kelvin, Legionella, Newton, Pascal, Pex, Swarf and Zincalume (Coxhead & Demecheleer, under review). More research on proper nouns in specialised subjects and professions is required, initially on

identification of these items and how they are used in texts. Table 3.5 The most frequent proper nouns in TED Talks six-by-six corpus (Coxhead & Walls, 2012)

The range of proper nouns in a corpus is demonstrated in Table 3.5, which contains the nine most frequent proper nouns in a 43,000-word corpus of TED Talks (Coxhead & Walls, 2012). The corpus included six topic areas: Technology, Science, Global Issues, Entertainment, Design and Business. The purpose of the study was to investigate the coverage of various word lists over the corpus in order to evaluate the vocabulary load that listening to TED Talks places on EAP learners. Coxhead and Walls found that 629 proper nouns covered 1.44% of the total tokens in this corpus. Note that TED is the most frequent proper noun in the table, which is unsurprising given the nature of the corpus. The range figure in the second column indicates the number of files, out of the six in the corpus, in which each proper noun occurs. The number of occurrences in each file appears in the final six columns. TED occurs in five of the six files, and its occurrences are not evenly distributed across them (compare nine and ten occurrences in Entertainment and Business with the lower numbers in the other areas). The frequency of these proper nouns drops considerably from the most frequent to the least frequent.

In sum, the key points here about everyday words, abbreviations and proper nouns are that they can occur in different fields with specialised meanings, that high frequency items with specialised meanings could account for a fairly large proportion of a text, and that these specialised meanings may not be known by learners or be particularly easy to learn. Whether proper nouns and acronyms are considered part of high frequency vocabulary is an important decision for word list developers (see Chapter 3). The examples of everyday words, proper nouns and acronyms have all been gleaned through corpus analyses, which is the focus of the next section.
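The range and per-file frequency figures described for Table 3.5 can be computed with a few lines of code. The sketch below is a minimal illustration, not the tool Coxhead and Walls used: it assumes each file has already been tokenised into a list of words, and the function name `range_and_frequency` and the toy files are hypothetical.

```python
from collections import Counter


def range_and_frequency(files, word):
    """Return (range, per-file counts) for `word`.

    `files` maps a file or topic label to its list of tokens. Range is
    the number of files in which the word occurs at least once.
    """
    per_file = {label: Counter(tokens)[word] for label, tokens in files.items()}
    word_range = sum(1 for count in per_file.values() if count > 0)
    return word_range, per_file


# Invented toy data (not the TED Talks corpus)
files = {
    "Technology": "TED talk about TED".split(),
    "Science": "a talk on cells".split(),
    "Business": "TED TED TED".split(),
}
r, counts = range_and_frequency(files, "TED")
print(r, counts)  # 2 {'Technology': 2, 'Science': 0, 'Business': 3}
```

Matching is case-sensitive here, which suits proper nouns such as TED; a real study would also need to decide how to treat possessive forms such as Parkinson’s.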

Highly specialised words

As well as high frequency vocabulary, specialised corpora will contain highly specialised words. As mentioned earlier, researchers who are not experts in the technical field under investigation might not be able to identify and classify some highly specialised vocabulary because it may appear to be everyday vocabulary. For example, in a small pilot study of ten texts used in Carpentry studies at a polytechnic in New Zealand, a highly experienced tutor was asked to identify the technical or specialised vocabulary in a list of items (listed in Table 3.6) that relate closely to Carpentry (Coxhead et al., 2016). These items did not occur in Nation’s (2006, 2013) 25,000-word-family BNC/COCA lists and were therefore candidates for a specialised Carpentry word list. The tutor highlighted the items that he regularly used in class as the most important items in the list. The first 17 items he selected, their range across the ten texts in the corpus, and their frequency can be seen in Table 3.6.

Table 3.6 Commonly used specialised items selected by a Carpentry tutor

Type            Range   Frequency
claddings       10      374
radiata         10      56
weatherboards   9       100
subfloor        8       91
H3              8       71
flashings       7       157
dwangs          7       49
PPE             7       47
on-site         7       19
lineal          5       47
substructure    5       45
r-value         5       18
NZS3604         5       13
onsite          5       13
scribing        5       11
straightness    5       6
hands-on        5       5
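The first step described above, finding corpus items that do not appear in a large baseline list and ordering them by range and frequency for an expert to review, can be sketched as follows. This is an illustration under stated assumptions: the baseline is a toy set standing in for the BNC/COCA lists (which are organised into word families rather than single forms), matching is on lower-cased word forms (so H3 becomes h3), and `out_of_list_candidates` is a hypothetical helper.

```python
from collections import Counter


def out_of_list_candidates(texts, baseline, min_range=1):
    """Rank word types that are missing from a baseline word list.

    texts: list of token lists, one list per text in the corpus.
    baseline: set of lower-cased word forms (a toy stand-in for a large
    general list).
    Returns (type, range, frequency) tuples, sorted by range and then
    frequency, highest first.
    """
    frequency = Counter()
    text_range = Counter()
    for tokens in texts:
        lowered = [token.lower() for token in tokens]
        frequency.update(lowered)
        text_range.update(set(lowered))  # count each type once per text
    rows = [
        (word, text_range[word], frequency[word])
        for word in frequency
        if word not in baseline and text_range[word] >= min_range
    ]
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


# Invented mini-corpus of three 'Carpentry' texts
texts = [
    ["the", "dwangs", "and", "claddings"],
    ["claddings", "on", "the", "subfloor"],
    ["fix", "the", "claddings", "dwangs"],
]
baseline = {"the", "and", "on", "fix"}
for row in out_of_list_candidates(texts, baseline):
    print(row)
# ('claddings', 3, 3)
# ('dwangs', 2, 2)
# ('subfloor', 1, 1)
```

The output is only a candidate list; as the Carpentry example shows, an expert still has to decide which of these items are genuinely technical and which matter most in teaching.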

Dwangs are also known as nogging, noggs or blocking in New Zealand and other areas of the world. Parkinson and Mackay (2016, p. 41) explain the meaning of dwangs or noggs and illustrate the importance of specialised vocabulary in the trades, using an extract from an interview with a Carpentry tutor, Colin, as part of the LATTE project (see Chapter 8 for more on this project). Colin says, They learn the terminology in Construction… like ‘fix’ means attaching something. The term ‘fixings’ is the bolts screws etc. […] people use a brand name – ‘stanley knife’ instead of ‘craft knife’. You’ve got to be in the industry to know what it is. […] plumbers and carpenters might use different names for the same tool: pliers or nippers. Again, when I went through my apprenticeship, I was taught that you use the term ‘noggs’, the small bit of timber in between your studs, […] the equivalent to it is ‘dwang’… ‘dwang’ sounds too much like an Australian way of speaking so we use ‘noggs’. (Parkinson and Mackay, 2016, p. 41)

It is interesting to compare two more examples from the Carpentry corpus: H1 and H1.2. H1 refers to a building code. It was not selected by the tutor for inclusion in his list of commonly used terms in class. H1.2 refers to a kind of timber. It is often used in class and on the building site and was included in the list by the tutor. Another approach to identifying specialised vocabulary with an expert might be to ask the tutor to rank the items according to how closely the

items relate to Carpentry, and then check with several other tutors to see how they rank the same list of words. Researchers need expert support to help identify and classify technical terms from corpora in the initial stages of word list development so that the word list they produce can be more useful to other researchers who then draw on these lists for their own research. See Chapter 8 for more on research into specialised vocabulary in the trades.
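The suggestion of asking several tutors to rank the same items could be followed up by measuring how closely their rankings agree. One common option, assumed here rather than taken from the studies cited, is Spearman’s rank correlation; the sketch uses the no-ties formula rho = 1 - 6Σd²/(n(n² - 1)), and the function name and tutor rankings are invented for illustration.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two complete rankings without ties.

    ranks_a, ranks_b: dicts mapping each item to its rank (1 = most closely
    related to the field). Both dicts must rank the same items.
    """
    if set(ranks_a) != set(ranks_b):
        raise ValueError("both tutors must rank the same items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two items")
    d_squared = sum((ranks_a[item] - ranks_b[item]) ** 2 for item in ranks_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented rankings from two hypothetical tutors
tutor_a = {"dwangs": 1, "claddings": 2, "flashings": 3, "scribing": 4}
tutor_b = {"dwangs": 2, "claddings": 1, "flashings": 3, "scribing": 4}
print(round(spearman_rho(tutor_a, tutor_b), 3))  # 0.8
```

Values close to 1 indicate strong agreement between tutors. Where tutors give tied ranks, a tie-aware implementation such as `scipy.stats.spearmanr` would be more appropriate than this simple formula.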

Compounds

Nation (2016) notes that an important part of any corpus-based study is to make sure that the corpus is as clean as possible. A common problem in this regard is inconsistent hyphenation in a corpus and its impact on compound forms. Examples of hyphenated and compound forms can be seen in Table 3.4, such as onsite (and on-site) and weatherboards. If these items were split into their constituent parts, on and site and weather and boards, they would be counted in very different ways by the RANGE program. On, site and boards are in the first 1,000 word families of Nation’s BNC lists (see Nation, 2006), and weather is in the second 1,000 word families. However, keeping the compound nouns together keeps the meaning of the items clear. Nation (2016) calls these forms ‘transparent compounds’. Nation’s (2006) BNC lists include a list of transparent compound nouns. The compound nouns in the Carpentry texts could be added to that existing list, and the RANGE program would then identify all the words in the compound noun list which appear in the Carpentry corpus. Alternatively, a separate list of compound nouns for Carpentry could be developed and kept for research purposes. A common frustration in working with texts is the slow and laborious process of finding hyphenated forms and compounds and deciding what to do with them. Nation (2016) has a chapter on hyphenated lexical words and transparent compounds, with suggestions on what to do with them based on Nation’s work with his BNC corpus and word list development.
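The effect of splitting or keeping compounds can be seen with a small tokeniser that keeps hyphenated forms whole only when they appear on a compound list. This is a hypothetical sketch, not how the RANGE program works internally; the regular expression, the `tokenize` function and the `compounds` set are all assumptions made for illustration.

```python
import re

# Words, optionally joined by hyphens (e.g. "on-site", "r-value")
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def tokenize(text, compounds=frozenset()):
    """Lower-case and tokenise text.

    Hyphenated forms listed in `compounds` are kept as single tokens; any
    other hyphenated form is split at its hyphens. Closed compounds such
    as 'weatherboards' are already single tokens and are left alone.
    """
    tokens = []
    for match in WORD_PATTERN.findall(text):
        word = match.lower()
        if "-" in word and word not in compounds:
            tokens.extend(word.split("-"))
        else:
            tokens.append(word)
    return tokens


text = "Check the on-site weatherboards onsite."
print(tokenize(text))
# ['check', 'the', 'on', 'site', 'weatherboards', 'onsite']
print(tokenize(text, compounds={"on-site"}))
# ['check', 'the', 'on-site', 'weatherboards', 'onsite']
```

Note that on-site and onsite still surface as two different types; mapping spelling variants onto one form is a separate decision that the researcher has to make explicitly.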

Abbreviations, acronyms and Latinate forms

Abbreviations, acronyms and Latinate forms can also occur in academic and professional corpora, and they require principled decisions on whether to include them in word lists or exclude them. Keeping track of common acronyms, abbreviations and Latinate forms is important in word list research, so that the number of items in the ‘not found in any list’ category is reduced. It also tells us more about the kinds of specialised items that appear in corpora and how they might be similar or different across a range of subjects and professions. Table 3.7 contains a list of 24 acronyms, abbreviations and Latinate forms found in the corpus of written academic texts compiled by Coxhead (2000) for the AWL study. Note that et and al are listed separately in the table but commonly occur together in academic texts. The items in Table 3.7 reflect academic writing conventions (for example, et al. and ibid.), as well as abbreviations that reflect the content of the corpus, such as GST (Goods and Services Tax) and HR (Human Resources). Items such as CM, DR and BR need to be checked for their range and frequency in the texts, because they may stand for different things in different subject areas.

Table 3.7 Acronyms, abbreviations and Latinate forms from the written academic corpus used for the AWL (Coxhead, 2000)

AB      ALPHA   BR      CO      DR      GST     IE      KM
AC      BC      CA      DER     ET      HR      INC     LE
AL      BETA    CM      DNA     ETC     IBID    INT     LTD

Validating word lists for ESP

Validation of word lists is important because any word list is shaped by the corpus it was made from. Validation can be done by running the word list over a second corpus, preferably one that mirrors the first, to check coverage and look for any differences. For example, Coxhead (2000) gathered two corpora for her AWL study. The first was the 3.5-million-running-word corpus of written academic texts that she used to develop the word list; the coverage of the AWL over that corpus was 10% overall. The second was a smaller corpus of written academic texts, used to check how the list performed over another corpus of similar texts; the coverage of the AWL over that corpus was 8.5%. The difference in coverage was attributed to the different sizes of the corpora and the predominance of Science texts in the second corpus. The coverage of the AWL over Science texts tends to be around 9%, depending on the level of technicality of the texts. Coxhead also developed a fiction corpus to check whether the AWL was more academic than general in nature. The coverage of the AWL over that corpus was only 1.4%, which supports the view that the AWL is academic rather than general in nature (Miller & Biber, 2015).
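Coverage here means the percentage of running words (tokens) in a corpus that belong to the list. A minimal sketch of the validation check follows, assuming pre-tokenised texts and a list of word forms; the real AWL is organised into word families, so a full implementation would first map each form to its family. The `coverage` function, the toy list and both corpora are invented.

```python
def coverage(tokens, word_list):
    """Percentage of tokens whose lower-cased form is in word_list."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in word_list)
    return 100 * hits / len(tokens)


# Invented toy data: a tiny 'academic list' and two small corpora
academic_list = {"analysis", "data", "research", "method"}
development = "analysis of the data".split()
validation = "the research method was new and simple and very clear".split()

print(coverage(development, academic_list))  # 50.0
print(coverage(validation, academic_list))   # 20.0
```

A large drop in coverage between the development corpus and a mirror corpus would suggest that the list reflects the particular texts it was built from rather than the genre as a whole.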

Using word lists for research into vocabulary in ESP

In this section, I explore how research that has resulted in word lists is used in other vocabulary research to find out more about the nature of vocabulary in ESP.

Analysing and comparing the vocabulary load of texts

Word lists have also been used to research the vocabulary coverage of texts and find out more about the nature of the lexis in written and spoken texts. Coxhead et al. (2010) investigated the extent to which existing word lists (the AWL and the Science List from Coxhead & Hirsh, 2007) could be a shortcut for learners reading secondary school texts. Table 3.8 shows the running totals of coverage over four secondary school Science textbooks in a series, from the first year of secondary school (Year 9) through to Year 12. Coxhead and Hirsh’s (2007) Science word list covered 3.18% of their university-level corpus, whereas the list covered 5.8% of the secondary school Science textbooks on average. This higher coverage suggests that the secondary school texts contain more high frequency words than do university-level texts. Similarly, the AWL coverage over the secondary school Science texts (5.8%, on average) is lower than its coverage over the Science sub-corpus of Coxhead’s (2000) university-level written academic corpus (9%). This finding suggests that university-level texts use more AWL words than secondary school texts do (see Chapter 5 for more on specialised vocabulary in secondary school contexts).

Table 3.8 Coverage of the GSL/AWL/Science-specific lists over the four secondary Science textbooks (adapted from Coxhead et al., 2010, p. 46)

The coverage of the second 1,000 words from West’s (1953) GSL is worth mentioning because this list covers 2% more of these Science textbooks than of other kinds of texts. This higher coverage can be explained by the high number of occurrences in the texts of items such as ray, electric, reflect, angle and temperature (Coxhead et al., 2010). Note the increase in proper nouns and in words not found in any list from the Year 9 text through to the Year 12 text. Coxhead et al. (2010) concluded that these existing word lists do not go far enough to cover the vocabulary needed by second language learners working with these textbooks in schools.

A recent study of Computer Science journal articles by Radford (2013) used Nation’s BNC lists to investigate the vocabulary load of those texts over the decades from the 1950s to the 2000s. Radford also gathered first-year university teaching materials for a second corpus and textbooks for a third. His findings make sobering reading for teachers and researchers of Computer Science. The teacher materials had the lowest vocabulary demands, which suggests that the teachers had mediated the vocabulary load of the reading they were giving to the students. These results fit with other studies of the vocabulary load of university-level texts, such as Nation (2006). The Computer Science textbooks had the next highest vocabulary load, and the journal articles had by far the highest, well above both the textbooks and the teacher-generated material. Radford’s (2013) results showed that the cumulative coverage of the corpus at 30,000 words plus proper nouns dropped from approximately 79% in the 1950s journal articles section of his Computer Science corpus to approximately 65% in the 2000s section. This study rather starkly demonstrates the kind of specialised vocabulary that could be needed to read Computer Science journal articles and suggests that more such work needs to be done in vocabulary research. A closely related issue in research on the size of a technical vocabulary is how that vocabulary changes over time. Radford’s (2013) study suggests that vocabulary in Computer Science has changed and grown rapidly in just half a century. More such studies could be carried out to track changes in specialised vocabulary over time and inform our understanding of the kinds and rates of change in a variety of areas.
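Cumulative coverage figures such as Radford’s are produced by adding frequency bands one at a time and recalculating coverage after each. The sketch below assumes each band is a set of word forms (for example, one 1,000-word list per band) and counts proper nouns as known from the start, following the ‘N words plus proper nouns’ convention; the function name `cumulative_coverage` and the toy data are hypothetical.

```python
def cumulative_coverage(tokens, bands, proper_nouns=frozenset()):
    """Return cumulative % coverage after each frequency band is added.

    tokens: list of running words.
    bands: list of sets, ordered from most to least frequent band.
    proper_nouns: treated as known throughout.
    """
    if not tokens:
        return [0.0 for _ in bands]
    known = set(proper_nouns)
    results = []
    for band in bands:
        known |= band  # everything up to and including this band is known
        hits = sum(1 for token in tokens if token in known)
        results.append(round(100 * hits / len(tokens), 2))
    return results


tokens = ["the", "user", "runs", "the", "compiler", "on", "Linux", "today"]
bands = [{"the", "on", "today"}, {"user", "runs"}, {"compiler"}]
print(cumulative_coverage(tokens, bands, proper_nouns={"Linux"}))
# [62.5, 87.5, 100.0]
```

Comparing the final figure for texts from different decades gives the kind of contrast Radford reports (roughly 79% for the 1950s articles versus 65% for the 2000s articles).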

Missing elements in word list research

Nation’s (2016) book on word lists is a comprehensive analysis of research into the development and evaluation of word lists. Nation, Coxhead, Chung and Quero (2016) recommend that, for research into specialised word lists using corpus comparison, the quantitative results require careful checking, the comparison corpus needs to be large enough for target items to occur, and it should be carefully checked to ensure that it contains no specialist texts. As seen earlier, the majority of research in this chapter has drawn on computer-based, quantitative analyses of corpora. In terms of such analyses, there is little research that compares lexical items in different corpora to investigate how differences in corpora are reflected in differences in word lists (see Miller & Biber, 2015). That is, we need more work that investigates how the choice of corpora, and the way they were compiled, affects a range of word lists. Table 3.7 shows a range of studies using the AWL on different corpora, and this is an example of the kind of work that needs to be done, particularly around the validation, evaluation and replication of word lists in many research studies on vocabulary in ESP. Another missing element, this time in qualitative research, is in-depth analysis of words from lists in context in corpora from professional and academic fields (Byrd & Coxhead, 2010). This includes looking at the contexts to see what affordances for learning vocabulary, such as definitions and examples, might be there to support learning. Shell nouns (see Schmid, 2000) are an example of the kinds of lexical items which require closer analysis of texts. A shell noun is an abstract noun which is used to express or refer to complex ideas. Figure 3.2 shows an example, adapted from Coxhead and Byrd (2012, p. 13), in which there is a chain from sentence one through to sentence three. Note how ‘the results’ in the first sentence lead the reader to ‘an alternative mechanism’, which is defined in the second sentence. That definition is then wrapped up in sentence three with the words ‘this concept’, so the writer avoids repeating the definition from the second sentence. A frequency count alone would not bring such patterns to light. Research therefore needs to look at corpora more closely, using word lists to decide which words to examine first on the basis of their frequency and range.
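Looking at corpora more closely usually starts with a concordance, which shows every occurrence of a word with its surrounding context. The key-word-in-context (KWIC) sketch below is a generic illustration, not the tool used by Coxhead and Byrd; the `kwic` function is hypothetical, and the sample sentence is invented and only echoes the phrases discussed above.

```python
def kwic(tokens, node, width=4):
    """Return key-word-in-context lines for each occurrence of `node`.

    Matching is case-insensitive; `width` is the number of tokens shown
    on each side of the node word.
    """
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


sample = ("The results suggest an alternative mechanism . "
          "This concept explains the data").split()
for line in kwic(sample, "concept", width=2):
    print(line)
```

Reading the left-hand context of a shell noun such as concept is what reveals the backward link to the earlier definition, which a frequency count cannot show.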

Figure 3.2 An example of a shell noun mechanism (adapted from Coxhead and Byrd, 2012, p. 13)

Word list research needs to build links with research into other elements of the learning environment to measure their affordances for, or effects on, vocabulary learning. An example of such research is a study of teacher talk in Mathematics, Science and English as an Additional Language (EAL) classes, recorded over a week in each subject at the start, middle and end of an academic year at an international school in Berlin. Coxhead (under review) finds that one feature of the teacher talk in these recordings is that it contains a very high percentage of high frequency vocabulary. This project also focuses on how subject-specific vocabulary is explained or taught in class, and in textbooks and materials, with the aim of developing pedagogically oriented word lists for this specialised context. Analysis in this project is ongoing. The importance of research which focuses on teachers and learners as mediators in their own approaches to word lists is underscored by an interview from a study of secondary school teachers’ approaches to specialised vocabulary (Coxhead, 2011a). One teacher explained her approach to specialised vocabulary in this way, I try to get students to be reflective about their learning which includes noticing the words they are unfamiliar or less familiar with and then making judgments about their usefulness. This is often a student by student discussion and can be very quick but if more than one student has raised the same interest in a word I attempt to draw the whole classes’ attention to it, often leaving it on the board if no general consensus can be made about its usefulness. Creating an environment where vocabulary is discussed is my main aim rather than ‘learning’ a preconceived list of words. (Coxhead, 2012a)

It is important that the voices and perspectives of the people who might use word lists are heard in word list research.

Conclusion

This chapter has looked at word lists in ESP through research on developing lists, the kinds of lexical items that might be available for selection from a specialised corpus, and principles for selecting items for lists. The chapter also looked at research based on word lists, including vocabulary size, language change and assessment. Whereas this chapter has concentrated on word lists based on single words, Chapter 4 looks at multiword units and some examples of word lists that include more than single words.

Chapter 4 Multiword units and metaphor in ESP

Introduction

This chapter focuses on multiword units and metaphor in vocabulary research in ESP. According to Nation (2016, p. 71), ‘Multiword units are phrases that are made up of words that frequently occur together’. Examples of multiword units range from collocations made up of two words (e.g. heavy work, heavy heart) to lexical bundles of three or more words (e.g. on the other hand; in the case of). The chapter begins with a short discussion of why research in this area is important. The next section looks into various types of multiword units, including two-word collocations, lexical bundles and academic formulas. Examples in this section include corpus-based research into general and subject-specific EAP, based on professional academic writing, textbooks and learner corpora. The section on metaphor draws predominantly on research into EAP and ESP in Engineering, Health Communication and Computer Science. The chapter ends with a discussion of the limitations of research into multiword units and metaphor in ESP.

Why research multiword units and metaphor in ESP?

There are many types of formulaic language (Schmitt, 2010) and many names for it (Wray, 2000). Multiword units might include common collocations (such as significant finding and data analysis) as well as bundles of three or four words (for example, as a result of / on the basis of). The relationship between these words is not a matter of chance, and much research has focused on the statistical relationships between words. Granger and Paquot (2008) explain how continuous sequences of words (which have no free slots between words, for example, at the end of the) are called clusters, bundles or n-grams, and those with slots are called collocations and frames. Discontinuous sequences require statistical analysis that takes co-occurrence into account. Multiword units can make up a fairly large proportion of texts (Nattinger & DeCarrico, 1992), but estimates of the amount of formulaic language in English vary. Some accounts suggest high proportions of formulaic language; Altenberg (1998) suggests that around 80% of the words in the London-Lund Corpus of Spoken English ‘form part of a recurrent word-combination in one way or another.’ Erman and Warren (2000) found that formulaic language made up between 52% and 58% of the English texts they analysed. Gardner and Davies (2007) found high proportions of phrasal verbs in the BNC. Such estimates can vary depending on how the units are counted. A particularly important point is that multiword units in spoken and written English can be quite different (Carter & McCarthy, 2006; Scott & Tribble, 2006). That said, in both speaking and writing, high frequency words are commonly part of multiword units, as can be seen in these examples of lexical bundles from Biber, Johansson, Leech, Conrad and Finegan (1999): on the other hand, in the case of the, and as a result of.

Research suggests that multiword units benefit language learners in several ways. Wood and Appel (2014, p. 9) state that multiword units play ‘an essential role in creating meaning and structure in academic discourse’. Biber, Conrad and Cortes (2004, p. 371) refer to lexical bundles as ‘basic building blocks of discourse’. Wray (2002) points out that formulaic sequences allow speakers to process language, interact, and express their identity as members of a group.
Schmitt (2010) reports benefits while reading for both native and non-native speakers. Finally, just like single technical lexical items in disciplines, specialised multiword units relate closely to subject knowledge. For example, Pinna’s (2007) study of dentistry examines words on their own as well as clusters of two and three words. Pinna finds key relationships between words such as ‘bone’ and other words such as ‘graft’ and ‘cortical’. While awareness of multiword units in teaching and learning for specific purposes has been rising, as Gardner and Davies (2007) and Coxhead (2008) state, it has not been very clear which multiword units to focus on and in what way.

An important element of research into multiword units is to consider the difficulties multiword units might present for language learners. Coxhead (2008), in a study of vocabulary use in academic writing, reported learner beliefs as a factor that affects whether learners will actively attempt to learn and use multiword units in academic writing. For example, one learner in Coxhead’s (2008) study chose to learn verbs in academic English so that she did not have to learn more than one word at a time. This research can help shed light on the choices learners make. Byrd and Coxhead (2010) point out that teachers (and by extension learners) would find it difficult to access information about the context of multiword unit use in research which is based on privately held corpus data. Websites such as Mark Davies’ COCA Academic Word and Phrase, a searchable interface to the academic section of COCA, mean that teachers and learners can now go online to explore the context and use of academic multiword units themselves (go to www.wordandphrase.info/academic/). The gathering of large corpora and analysis of vocabulary patterns in different academic subjects has begun to shed light on the frequency, roles and use of such multiword units in ESP. Frequency analyses carried out through corpus analysis (Schmitt, 2010) are commonplace in multiword unit research (Biber, 2006). Two basic approaches to identifying multiword units in texts have been described: a frequency-based approach guided by large-scale corpus studies, and a semantic/grammatical approach to phraseology (Ebeling & Hasselgård, 2015). Quantitative and qualitative research can be combined to provide a fuller picture of the frequency and meaning of multiword units, as in Gardner and Davies (2007).
These researchers identified the most frequent phrasal verbs (lexical verbs plus an adverbial particle, for example go out, take up and carry on) in the BNC and then documented the meanings of the 100 most frequent phrasal verbs they found. The phrasal verbs with the most meanings (or word senses) in the Gardner and Davies (2007, p. 352) study were take in (17 senses), pick up (16 senses) and set up (15 senses). Other research into multiword units in general English includes Martinez and Schmitt’s (2012) list of 505 phrasal expressions such as work on and think about (also based on the BNC). Martinez and Schmitt (2012) organised these expressions by frequency of occurrence, and a learner-oriented test of items from this list is available on Cobb’s (n.d.) Lex Tutor website (www.lextutor.ca/tests/levels/recognition/phrasal/). Shin and Nation (2010) also used the BNC, but focused on its spoken section to develop their list of high frequency multiword units. Shin and Nation (2010) note that many of the items in their list would fit within the first 2,000 words of English, reinforcing the idea that high frequency words occur in high frequency multiword units.

Corpus analysis allows researchers not only to identify frequent multiword units, but also to categorise them and investigate their functions. Biber (2006, p. 134) defines lexical bundles as ‘simply the most frequently occurring sequences of words, such as do you want to and I don’t know what’. Biber et al. (2004) identified three functions of lexical bundles in corpora of university classroom teaching and university textbooks: stance expressions (for example, it’s important to; well I don’t know), discourse organisers (for example, on the other hand) and referential expressions (for example, at the end of). Many studies have since used this functional analysis in research on lexical bundles (see the following). Biber et al. (2004) list bundles in each of these three categories; see also Biber (2006) for more on these three functions. An analysis of play and role by Cheng, Greaves, Sinclair and Warren (2009) illustrates how lexical items that co-occur with these two items (as in play an important role, fundamental role to play/significant role to play) add to the meaning of ‘participate and/or contribute in a weighty/meaningful manner’ (see also Cheng, 2012). Investigations into two-word collocations in academic texts, lexical bundles (strings of recurrent multiword units, typically four to six words long), slots or frames, and academic formulas are more common in the literature now than ever before.
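Because lexical bundles are defined by frequency rather than by structure, they can be extracted mechanically as n-grams that pass frequency and range cut-offs. The sketch below is illustrative only: the thresholds are placeholders (published studies normally set them per million words and across many texts), and the function name `lexical_bundles` and the sample texts are hypothetical.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=2, min_range=2):
    """Find recurrent n-word sequences (candidate lexical bundles).

    texts: list of token lists (one per text). A sequence is kept if it
    occurs at least min_freq times overall and in at least min_range texts.
    Returns {bundle: (frequency, range)}.
    """
    freq = Counter()
    found_in = defaultdict(set)
    for idx, tokens in enumerate(texts):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            found_in[gram].add(idx)  # record which texts contain the n-gram
    return {
        " ".join(gram): (count, len(found_in[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(found_in[gram]) >= min_range
    }


texts = [
    "on the other hand the results".split(),
    "but on the other hand it".split(),
    "on the other side".split(),
]
print(lexical_bundles(texts))  # {'on the other hand': (2, 2)}
```

The range condition matters: without it, a sequence repeated many times by a single writer could qualify as a bundle even though it is idiosyncratic to one text.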
Studies in ESP and EAP of each of these types of multiword units are looked at more closely in the next section.

Collocations in general academic written texts

Identifying multiword units in context is an important step in finding out more about their frequency and specialised nature. Several studies on collocations in academic and discipline-specific contexts have appeared in the literature in recent years (for instance, Carter & McCarthy, 2006; Ackermann & Chen, 2013; Durrant, 2009; Liu, 2012). The focus of this research has primarily been on EAP. Large-scale corpora are a feature of this research, as can be seen in Ackermann and Chen’s (2013) Academic Collocation List (ACL). The ACL was based on the Pearson International Corpus of Academic English, which contains more than 25 million running words of journal articles and textbooks from 28 disciplines. The corpus was divided into four disciplinary areas: Applied Sciences and Professions, Humanities, Social Sciences and Natural/Formal Sciences (Ackermann & Chen, 2013, p. 237). Initial analysis of the corpus was carried out by computer to identify collocations, followed by a refinement process involving quantitative and qualitative analysis, a review by experts, and organisation of the remaining collocations to form the ACL, which contains 2,468 items. Importantly, Ackermann and Chen (2013, p. 241) categorised this large list by grammatical pattern. The largest group, at 74.3% of the list, is combinations of an adjective or noun with a noun, such as anecdotal evidence and target audience. The next biggest group, at 13.8%, is verb + noun/adjective combinations, such as undertake research and seem plausible. Verb + adverb (e.g. explicitly state) and adverb + adjective (e.g. highly controversial) combinations make up the remaining 6.9% and 5.0% of the list. The research used a common core approach rather than a discipline-specific approach, which is why the most frequent adjective + noun and noun + noun combinations include academic writing, brief overview and causal link.

Another large-scale study of academic collocations is Durrant’s (2009) corpus-based analysis of five academic disciplines: Arts and Humanities; Engineering; Medicine and Health Sciences; Science; and Social Sciences, Law and Education.
Across the 25-million-word corpus, Durrant initially identified 1,000 two-word collocations. He compared the frequency of these 1,000 items across all five academic areas and found that, taken together, the collocations occurred between 30,000 and 35,000 times per million words in four of the sub-corpora, but only around 17,000 times per million words in the Arts and Humanities sub-corpus. The principles Durrant (2009) employed in selecting collocations from the corpus included focusing on word forms rather than lemmas or word families, limiting the analysis to collocations occurring within four-word spans, a keyword analysis comparing the collocations in the academic corpus with a non-academic corpus (in this case, 85 million words of the BNC), and a frequency criterion. It is also useful to look at the items which Durrant (2009) did not select, including two-word collocations containing proper nouns, abbreviations, acronyms, Latin terms or numbers. He also excluded items with higher frequencies in more marginal parts of the academic texts, such as reference lists. This kind of information is vital for understanding how studies were carried out and how they might be replicated.

Of the 1,000 collocations, Durrant (2009, p. 163) notes that most are grammatical (see examples in Table 4.1), and he highlights patterns such as verb + that (for example, confirm that, hypothesise that) as useful for EAP learners. The top 20 items from Durrant’s study are shown in Table 4.1, arranged by their mean frequency per million words. Table 4.1 clearly illustrates the grammatical patterns of these common academic collocations. The patterns also show that this study was based on written corpora, given the appearance of items such as as shown, and that the corpus is academic in nature, as seen in examples such as these results, present study and our study. Table 4.1 also shows that high frequency words play a major role in academic written texts. Durrant (2009, p. 165) notes, The identification of such patterns remains methodologically problematic, though programs such as Concgram (Cheng et al., 2009) seem to offer an interesting way forward here. It should be borne in mind, however, that as collocations become longer their frequency will in general decrease, and their range of applications is likely to narrow (that is, they become more situationally specific). The existence of a useful cross-disciplinary set of two-word items is therefore a necessary, but not a sufficient, condition for the existence of a similarly useful cross-disciplinary set of longer collocations.
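Frequencies from corpora of different sizes, such as Durrant’s discipline sub-corpora, are compared after normalising them to occurrences per million words. A minimal sketch follows, assuming that a ‘mean frequency’ is the unweighted mean of the per-million rates across sub-corpora (the exact calculation is not spelled out here); the function names, counts and sizes are invented.

```python
def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million running words."""
    return count * 1_000_000 / corpus_size


def mean_per_million(counts, sizes):
    """Unweighted mean of per-million rates across sub-corpora."""
    rates = [per_million(c, s) for c, s in zip(counts, sizes)]
    return sum(rates) / len(rates)


# Invented example: one collocation counted in two sub-corpora
counts = [50, 30]
sizes = [100_000, 200_000]
print([per_million(c, s) for c, s in zip(counts, sizes)])  # [500.0, 150.0]
print(mean_per_million(counts, sizes))                      # 325.0
```

Pooling the raw counts instead (80 occurrences in 300,000 words, about 266.67 per million) gives a different figure, which is why the averaging method should be reported alongside any mean frequency.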
Methodological difficulties in analysing collocations in academic texts also arose in a study by Coxhead and Byrd (2012). A primary problem was theoretical: unpacking how collocations are to be defined in the field. In this case, Coxhead and Byrd used the statistical measure of log likelihood in their methodology, following McEnery, Xiao and Tono (2006), but Byrd and Coxhead (2010) reported raw data where possible so that further research might be carried out using their data. Coxhead and Byrd (2012) analysed common collocations of words from Coxhead’s (2000) AWL in the 3.5-million-word corpus used for the AWL study. Using AWL words, Coxhead and Byrd illustrate a narrow analysis of the noun collocates of create, suggesting concrete and abstract categories. Concrete nouns collocating with create include document, environment, database, record and field, and mostly come from Computer Science (Coxhead & Byrd, 2012), while abstract collocates include impression, difficulties, reasons, problems and rights. An analysis of analysis, as it were, showed that the collocations before and after analysis can differ (see Table 4.2), although some collocates operate both before and after the target word. For example, method can occur before analysis as well as after it, as in methods of analysis and analysis of methods. Coxhead and Byrd (2012) also noted that some AWL words tend to co-occur, such as analysis with assessment, data, evaluation and interpretation.

Table 4.1 Top 20 key academic collocations and their mean frequencies from Durrant (2009, p. 166)

| Academic two-word collocation | Mean frequency per million words |
|---|---|
| between and | 935.56 |
| number of | 634.6 |
| based on | 404.64 |
| due to | 374.12 |
| associated with | 315.52 |
| this study | 296.96 |
| and respectively | 249.68 |
| related to | 190.72 |
| was used | 184.64 |
| this paper | 163.68 |
| compared to | 152.04 |
| note that | 134.56 |
| as shown | 126.16 |
| consistent with | 121.88 |
| to determine | 121.08 |
| respect to | 107.16 |
| these results | 91.88 |
| was performed | 84.8 |
| present study | 76.28 |
| our study | 68.32 |

Table 4.2 Collocations to the left and right of analysis (Coxhead & Byrd, 2012, p. 11)

| Left collocations: analysis and ~ | Right collocations: ~ and analysis |
|---|---|
| description | assessment |
| data collection | evaluation |
| representation | interpretation |
| | management |
| | results |
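Log likelihood, the measure Coxhead and Byrd (2012) used following McEnery, Xiao and Tono (2006), compares observed with expected counts in a 2 × 2 contingency table. The Python sketch below is a standard textbook (Dunning-style G²) formulation, assuming the marginal counts and the total are measured on the same basis (for example, tokens); it is not necessarily the exact computation used in that study. The same kind of table can also compare an item’s frequency in two corpora, as in a keyword analysis.

```python
import math


def log_likelihood(o11, f_w1, f_w2, n):
    """G2 statistic for a word pair.

    o11:  number of times w1 and w2 co-occur
    f_w1: total frequency of w1
    f_w2: total frequency of w2
    n:    total number of tokens (the sample size)
    """
    # Fill in the rest of the 2 x 2 table from the marginal counts.
    o12 = f_w1 - o11            # w1 without w2
    o21 = f_w2 - o11            # w2 without w1
    o22 = n - o11 - o12 - o21   # neither word
    observed = [[o11, o12], [o21, o22]]
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing (0 * log 0 -> 0)
                e = row_totals[i] * col_totals[j] / n
                g2 += o * math.log(o / e)
    return 2 * g2


print(log_likelihood(50, 100, 100, 10_000))
```

When a pair occurs exactly as often as chance predicts (for example, once, with two words of frequency 100 in 10,000 tokens), the statistic is 0; it grows as the observed co-occurrence count moves further from that expectation.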

Collocations in discipline-specific texts

Gledhill’s (2000) examination of a corpus of Pharmaceutical Sciences focused on collocations of high frequency items (for example, been, have, can and to) in the introduction sections of 150 research articles (RAs). This analysis draws attention to the functions of these high frequency words in highly specialised texts. For example, of has several functions:

‘In RA Introductions, of serves to qualify empirical process nouns (e.g. characterization of… measurement of…) and to form fixed biochemical or clinical terminology. While of is salient in Titles and Abstracts, fixed expressions and collocations (such as effects of treatment Y) are repeated but also expanded to longer stretches of phraseology in Introductions.’ (p. 125)

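Gledhill’s approach of looking at what sits immediately before and after a high frequency item such as of can be sketched in Python as follows. This minimal example assumes pre-tokenised, lower-cased text and a window of one word on each side; the function name is hypothetical.

```python
from collections import Counter


def left_right_collocates(tokens, node):
    """Return two Counters: the words immediately to the left and
    immediately to the right of each occurrence of `node`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1
    return left, right


text = ("the effect of cells and the number of studies "
        "and the effect of compounds")
left, right = left_right_collocates(text.split(), "of")
print(left.most_common())
print(right.most_common())
```

Keeping the two sides in separate counters is what allows left patterns (effect of, number of) to be compared with right patterns (of cells, of studies).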

Gledhill also notes collocates to the left and right of the target word to show the variation in patterns, such as left collocates in effect/s of, treatment of and number of, and right collocates in of cells, of compounds and of studies. Another example of examining collocations in specialised corpora is Ward’s (2007) study of specialisation and precision in Engineering. Two corpora were used for this study: a 380,000-word corpus of undergraduate textbooks in Chemical Engineering, and a 250,000-word corpus of Chemical, Civil, Electrical, Industrial and Mechanical Engineering. Ward (2007, p. 25) looks at several lexical items in these corpora and concludes that gas, for example, is an everyday word, ‘but in chemical engineering it is a) technical because it is precise, and b) precise because it is technical.’ Ward (2007) identifies other such items in Chemical Engineering which have both everyday and technical meanings: rate, control, system, temperature and time. It is important to note that high frequency words such as time occur over 100 times in Chemical, Electrical and Industrial Engineering in Ward’s corpus, compared to 54 occurrences in Mechanical Engineering and 34 in Civil Engineering (Ward, 2007, p. 22). The collocations of time in Ward’s corpus also vary, and they reflect the specialised nature of the corpus, as these examples show: settling time, reaction time and residence time (Ward, 2007, p. 22). Let’s now look at sets of longer multiword units, beginning with lexical bundles and academic formulas.

Lexical bundles: functions and categories

Lexical bundles are strings of three or more words that occur frequently (Biber et al., 1999). Biber (2006, p. 134) notes that lexical bundles tend not to be idiomatic. Bundles can be grammatically complete (for example, on the other hand) or not (in the case of the) (Paquot & Granger, 2012, p. 138). Analysis by Biber et al. (1999) showed that 60% of the lexical bundles in academic prose are phrasal (in the case of; as a result of), parts of noun phrases (on the basis of), or prepositional phrases (on the other hand). As in single-word research, frequency criteria are usually applied when selecting lexical bundles from corpora. Studies have used cut-off points of ten occurrences per million words (Biber et al., 1999), 20 per million (Biber, 2006; Cortes, 2013; Hyland, 2008) and up to 40 occurrences per million words of text (Biber & Barbieri, 2007). Range of occurrence is also taken into account in some way, particularly in studies involving multiple disciplines or subject areas (see, for example, Simpson-Vlach & Ellis, 2010). Analyses of corpora for academic purposes showed that lexical bundles occurred more often in classroom discourse than in textbooks and academic prose (Biber et al., 2004). Biber (2006) explains that classroom teaching discourse uses bundles from each of the three main categories (stance, discourse organisation and reference) because

‘lexical bundles are useful for instructors who need to organise and structure discourse which is at the same time informational, involved, and produced with real-time production constraints.’ (p. 148)

These categories can be broken down into more fine-grained categories. For example, discourse organising bundles include topic introduction and topic focus, and topic elaboration and topic clarification (Biber, 2006). Here are some examples of these categories from Biber and Barbieri (2007, p. 271):

Topic introduction bundles: What I want to do is quickly run through the exercise…
Topic elaboration/clarification bundles: It has to do with the START talks, with the Russians,
Identification/focus bundles: For those of you who came late I have the, uh, the quiz.
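The frequency and range criteria described above can be combined in a short Python sketch of lexical bundle extraction. It assumes each text is a list of lower-cased tokens; the default cut-off of 20 per million follows one of the values reported above, while the range threshold (a minimum number of distinct texts) is a hypothetical parameter, since studies operationalise range in different ways.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, per_million_cutoff=20, min_texts=5):
    """Return {bundle: rate per million} for n-grams that meet both a
    normalised frequency cut-off and a range criterion (the number of
    distinct texts the n-gram appears in)."""
    freq = Counter()
    texts_containing = defaultdict(set)
    total_tokens = 0
    for text_id, tokens in enumerate(texts):
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            texts_containing[gram].add(text_id)

    bundles = {}
    for gram, count in freq.items():
        rate = count * 1_000_000 / total_tokens
        if rate >= per_million_cutoff and len(texts_containing[gram]) >= min_texts:
            bundles[gram] = rate
    return bundles


texts = [
    "on the other hand we".split(),
    "on the other hand".split(),
    "in the case of".split(),
]
print(lexical_bundles(texts, min_texts=2))
```

In a toy corpus this small every n-gram easily passes a per-million cut-off, so the range criterion does the filtering here; in a real corpus of millions of words both thresholds matter.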

Lexical bundles in EAP: corpus-based studies of writing and speaking

Lexical bundles in professional academic writing were the focus of research by Byrd and Coxhead (2010), using Coxhead’s 3,500,000-word corpus, which was the corpus for Coxhead’s AWL study (2000). The texts in this corpus included textbooks, journal articles, laboratory manuals, book chapters and technical reports. Byrd and Coxhead (2010) investigated four-word lexical bundles in four sub-corpora: Arts, Commerce, Law and Science. These four disciplines shared 73 bundles (each occurring at least 20 times per million words), which represented 1.1% of the total corpus. Byrd and Coxhead (2010) then took range of occurrence into account, and selected only the lexical bundles with at least 10% of their occurrences in each of the four disciplines in the corpus. On the other hand, for example, occurred 353 times in the whole corpus, and its range was 23% (Arts), 27% (Commerce), 35% (Law) and 15% (Science). Like Hyland (2008), who found lower levels of lexical bundles in Applied Linguistics, Byrd and Coxhead (2010) found that Arts contained the lowest percentage of lexical bundles (1.44%). Science also had a fairly low proportion of lexical bundles, at 1.46%. In contrast, bundles accounted for 5.44% of the Law texts and 2.65% of the Commerce texts. This study shared 21 bundles with those found by Biber et al. (2004) and Hyland (2008). For more on lexical bundles in professional arenas, such as Ha (2015) on Finance, Crawford Camiciottoli (2007) and Nelson (2000) on Business Studies, and Verdaguer, Laso and Salazar (1996) on Biomedicine, see Chapter 8.

Nesi and Basturkmen (2006) investigated the amount and discourse functions of lexical bundles in an academic spoken corpus made up of 160 lectures from Nesi’s British Academic Spoken English (BASE) corpus (see Thompson & Nesi, 2001) and 40 lectures from the MICASE (Simpson, Briggs, Ovens & Swales, 2002). In total, the corpus contained 1,270,798 words in four disciplines: Arts and Humanities, Social Sciences, Life Sciences and Physical Sciences. For closer analysis, Nesi and Basturkmen (2006) focused on bundles which occurred at least 10 times in each discipline and at least 50 times in the whole corpus. Seventeen of the 20 most frequent bundles in this study were also among Biber et al.’s (2004) top 20 bundles in classroom discourse (for example, the end of the, at the same time and if you want to). Nesi and Basturkmen (2006) then worked carefully through concordances of bundles to examine their role in the cohesive discourse of the lectures. They reported in some detail on referential bundles and discourse organisers from the lecture corpus, based on the categories from Biber et al. (2004), and concluded that second language learners in academic contexts need to be aware of these bundles and the roles they play in the discourse of these very common academic listening events.
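The two kinds of range criteria described above can be compared in a small Python sketch. Reading Byrd and Coxhead’s percentages for on the other hand as each discipline’s share of the bundle’s occurrences (they sum to 100%), the first check applies a minimum share; the second applies raw-count thresholds like Nesi and Basturkmen’s (at least 10 per discipline and 50 overall). The per-discipline counts below are hypothetical, chosen only so that they total 353 and round to the reported shares.

```python
def shares(counts):
    """Percentage of a bundle's occurrences that falls in each discipline."""
    total = sum(counts.values())
    return {d: 100 * c / total for d, c in counts.items()}


def meets_min_share(counts, min_share=10.0):
    """Share criterion: every discipline holds at least `min_share` per cent."""
    return all(s >= min_share for s in shares(counts).values())


def meets_min_counts(counts, min_each=10, min_total=50):
    """Raw-count criterion: `min_each` per discipline and `min_total` overall."""
    return (all(c >= min_each for c in counts.values())
            and sum(counts.values()) >= min_total)


# Hypothetical counts for 'on the other hand' (total 353).
on_the_other_hand = {"Arts": 81, "Commerce": 95, "Law": 124, "Science": 53}
print({d: round(s) for d, s in shares(on_the_other_hand).items()})
print(meets_min_share(on_the_other_hand), meets_min_counts(on_the_other_hand))
```

Because the shares are computed from raw occurrences, sub-corpora of unequal size would skew them; normalising each discipline’s count per million words before comparing is one alternative.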

Lexical bundles in textbooks

Lexical bundles in textbooks have been the subject of further research to find out more about the frequency and function of bundles in texts which EAP/ESP students read in their university studies. These studies have extended the field of lexical bundles research. Chen and Baker (2010) found 105 four-word bundles in a corpus of Electrical Engineering textbooks, and classified them into the functional categories from Biber et al. (2004). The largest group of bundles in the corpus were referential bundles (78%), followed by stance bundles (19%) and discourse organising bundles (3%) (Chen & Baker, 2010). Comparing these bundles with those found in a corpus of Electrical Engineering materials for ESP, Chen and Baker (2010) found that the learning materials had different proportions of lexical bundles in the three categories, although the frequency order was the same as in the textbook corpus: referential bundles (88%), stance bundles (9%) and discourse organisers (3%).

In a study of first-year Business and Engineering university textbooks and of intermediate and advanced EAP textbooks, Wood and Appel (2014) analysed three- to five-word formulaic sequences (which they term multiword constructions). They looked at the boundary between three- and four-word constructions. Taking the multiword unit at the end of the as an example, this five-word sequence contains a range of possible shorter sequences, including at the end, the end of, at the end of and the end of the. Wood and Appel (2014) approached this challenge in two ways, using their corpora of first-year university Business textbooks (774,042 words) and Engineering textbooks (804,071 words) to compare with five EAP textbooks. First, they identified root structures by examining the frequency of three-word constructions within four-word constructions. That is, they looked at whether the three-word construction as long as was more or less frequent than the four-word construction as long as the. By doing this, they could list the root three-word structure and reduce the number of constructions which ended with a or the. For example, the amount of occurs 375 times in their Engineering and Business corpus, compared to the amount of the, which occurs 45 times. In this example, Wood and Appel (2014) list the construction as the amount of (the), to indicate that the amount of is the root structure and that the in brackets is a variable option. The second step was to consider overlaps between four-word sequences, again to identify root structures. In this step, at the end of and the end of the were combined and listed as (at) the end of (the). Wood and Appel (2014) found that these multiword constructions were often not included in the EAP textbooks in their study and that the sequences tended not to be the focus of pedagogy in the textbooks.

Some studies have investigated lexical bundles in specific disciplines, such as Grabowski’s (2015) research on Pharmaceutical lexical bundles in English textbooks and three other non-academic text types in the discipline. The detailed comparison of lexical bundles showed examples related to administering medicine, and to processes and procedures related to medicine (for example, metabolised in the liver and approved for treatment) (Grabowski, 2015). Studies such as Chen and Baker (2010), Wood and Appel (2014) and Grabowski (2015) are useful because they investigate the functions and boundaries of lexical bundles in learning materials. This research can inform decisions by textbook writers, learners and teachers about which lexical bundles are worth focusing on, and where materials might need to be adapted to be more like the textbooks students will read in their university studies.
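The two steps Wood and Appel (2014) describe can be approximated in a short Python sketch. The decision rules here (collapse a final a or the when the three-word root is more frequent; merge two four-word sequences that overlap by three words) are one reading of the description above, not the authors’ published procedure, and the function names are hypothetical. The counts in the usage line are the ones reported above.

```python
ARTICLES = ("a", "the")


def collapse_final_article(trigram_counts, fourgram_counts):
    """Step 1: label a four-word sequence ending in 'a' or 'the' as
    'root (article)' when its three-word root is more frequent."""
    labels = {}
    for four, c4 in fourgram_counts.items():
        words = four.split()
        root = " ".join(words[:-1])
        if words[-1] in ARTICLES and trigram_counts.get(root, 0) > c4:
            labels[four] = f"{root} ({words[-1]})"
    return labels


def merge_overlap(first, second):
    """Step 2: merge two four-word sequences that share three words,
    e.g. 'at the end of' + 'the end of the' -> '(at) the end of (the)'.
    Returns None when the sequences do not overlap in this way."""
    a, b = first.split(), second.split()
    if a[1:] != b[:-1]:
        return None
    return f"({a[0]}) {' '.join(a[1:])} ({b[-1]})"


print(collapse_final_article({"the amount of": 375}, {"the amount of the": 45}))
print(merge_overlap("at the end of", "the end of the"))
```

Bracketing the variable words keeps one entry per root structure, which is the reduction in list length that the two steps are meant to achieve.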

Lexical bundles in student and professional writing in EAP

Lexical bundles have also been examined in research into student and professional academic writing (for example, Tribble, 2011; Hyland, 2008; Cortes, 2013). Hyland (2008) identified 240 different four-word bundles in a 3.5-million-word written corpus of published and student writing (including PhD dissertations and master’s theses). These bundles occurred almost 16,000 times in the corpus, accounting for approximately 2% of the total words. Hyland (2008, p. 12) found differences in the proportion of lexical bundles across disciplines: Electrical Engineering (3.5%), Business Studies (2.2%), Applied Linguistics (1.9%) and Biology (1.7%). The functions of the bundles were also analysed and categorised into research-oriented (for example, the structure of the), text-oriented (for example, as a result of the) and participant-oriented bundles (for example, as can be seen). Hyland’s (2008) findings included that 50% of the bundles occurred in only one discipline, and 30% were shared with two other disciplines. These findings suggest that the amount and kind of lexical bundles can vary depending on the discipline. Cortes (2013) investigated lexical bundles in student and professional writing in History and Biology. She found that academic writers (academics) and students varied in the amount of lexical bundles they used in writing, reporting that students use fewer lexical bundles, use them less often in writing, and tend to rely on a fairly small group of bundles. Chen and Baker (2010) examined the use of lexical bundles by Chinese learners of English and by first language novice and professional writers of English. Professional writers were found to use more noun-phrase bundles and referential bundles than both groups of learner writers. The Chinese writers were found to use a small group of lexical bundles, a common finding in studies of second language writers, who tend to use the lexical bundles that they feel most comfortable with. Nesselhauf (2005, p. 69) refers to lexical bundles used in this way as ‘lexical teddy bears’.
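Figures such as Hyland’s 2%, or the 1.44% to 5.44% reported by Byrd and Coxhead, express bundle coverage: the proportion of running words that fall inside bundle occurrences. The minimal Python sketch below assumes tokenised text and bundles given as token tuples. Because overlapping bundles (for example, as a result of and a result of the) share words, it marks covered positions rather than summing bundle lengths; the cited studies may have handled overlaps differently.

```python
def bundle_coverage(tokens, bundles):
    """Percentage of running words that fall inside at least one
    occurrence of any bundle. Positions are marked rather than summed,
    so overlapping bundles are not double-counted."""
    if not tokens:
        return 0.0
    covered = [False] * len(tokens)
    for bundle in bundles:
        n = len(bundle)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == bundle:
                for j in range(i, i + n):
                    covered[j] = True
    return 100 * sum(covered) / len(tokens)


tokens = "as a result of the study as a result".split()
bundles = [("as", "a", "result", "of"), ("a", "result", "of", "the")]
print(f"{bundle_coverage(tokens, bundles):.1f}%")
```

Here the two overlapping bundles cover five of the nine words; simply adding their lengths would have counted eight.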

Lexical bundles in learner corpora

Lexical bundles have been researched fairly widely in studies of learner corpora (for example, Ebeling & Hasselgård, 2015; Paquot & Granger, 2012), with frequent comparisons of first and second language writers in English. Several studies have found differences in the amount of lexical bundles used in student writing. Ädel and Erman (2012), for example, examined lexical bundles in the writing of Linguistics students in Stockholm, Sweden, and at King’s College London, England. In total, 325 essays were analysed, first with a quantitative, frequency-based approach to four-word bundles in the corpora, and then with a qualitative analysis of the functions of these bundles using Biber’s (2006) framework. Ädel and Erman (2012) found that the native speakers used more lexical bundles, and a wider variety of them, in their writing than the Swedish students writing in English. Differences were found in the use of stance bundles (used more by the first language writers) and discourse organising bundles (favoured more by the Swedish writers). Both groups of writers used about the same number of referential bundles.

The academic formulas list (Simpson-Vlach & Ellis, 2010)

Simpson-Vlach and Ellis (2010) used a combination of quantitative corpus-comparison analysis and qualitative approaches to determine the most frequent and pedagogically useful academic formulas. They used four corpora in this study: an academic speech corpus, an academic writing corpus, a non-academic speech corpus and a non-academic writing corpus. In the quantitative analysis, the authors used n-grams, mutual information and log likelihood to identify academic formulas, comparing occurrences in all four corpora. They then called on people with language testing and language teaching experience to rate a sample of the academic formulas they had identified. The rating exercise had three elements for the raters to consider: whether the formulas were ‘a formulaic expression, or fixed phrase, or chunk’ (p. 496); whether they had ‘a cohesive meaning or functions, as a phrase’ (p. 496); and ‘the formula teaching worth’ (p. 488). After correlating the quantitative and qualitative data, Simpson-Vlach and Ellis (2010) presented three sublists: the ‘core’ AFL list of written and spoken formulas (for example, and the same; as opposed to), the first 200 formulas of spoken academic English (for example, (nothing) to do; the same thing; blah, blah, blah) and the first 200 formulas of written academic English (for example, be related to the; is more likely). The authors also categorised the formulas into functions such as contrast and comparison. The Academic Formulas List has been used to compare second language writing in English with first language writing, as in Eriksson’s (2012) study of English for Specific Academic Purposes (ESAP) students in Sweden, and Hiltunen and Mäkinen’s (2014) comparison of the writing of Swedish doctoral students with either Swedish or Finnish as a first language with the BAWE (Nesi & Gardner, 2012). For more on general academic multiword units in academic corpora, such as Liu’s (2012) research on general academic formulaic language, including lexical bundles, phrasal/prepositional verbs and idioms, see Chapter 6. See also Carter and McCarthy’s (2006) work on items such as the importance of and for example in different academic contexts.
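Mutual information, one of the measures Simpson-Vlach and Ellis (2010) used, compares how often a sequence occurs with how often it would occur if its words were independent. The Python sketch below shows only the common two-word (pointwise, log base 2) form; for the longer n-grams in the AFL the calculation has to be generalised, which this sketch does not attempt.

```python
import math


def mutual_information(pair_count, f_w1, f_w2, n):
    """Pointwise mutual information (log base 2) for a two-word sequence.

    The expected count assumes the two words occur independently:
    E = f_w1 * f_w2 / n.
    """
    expected = f_w1 * f_w2 / n
    return math.log2(pair_count / expected)


# A pair seen 8 times where chance predicts 1 occurrence scores 3 bits.
print(mutual_information(8, 100, 100, 10_000))
# Two rare words seen together twice score far higher, which is one reason
# MI is often combined with a frequency-sensitive measure such as log likelihood.
print(mutual_information(2, 2, 2, 10_000))
```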

Frames

Frames contain slots for words to fit in, such as the XXX of (as in the concept of). Biber et al. (1999) investigated variation in frames using a corpus of 5.3 million words of academic research articles and books. The frames had to occur more than 200 times in the corpus. Biber et al. (1999) found that ‘the XX of the’ and ‘in the XX of’ were two of the most frequent patterns in their data set. Ädel and Römer (2012) investigated the use of with the XX of (for example, with the intention/idea/use of) in an academic corpus and found that the most frequent nouns fitting into that frame were idea, use and help, as in with the help of. Flowerdew (2014) notes that these kinds of patterns, while frequent, may not be continuous strings and might not be meaningful. Nation (2016) has a similar concern: teachers need to know how multiword unit word lists have been constructed, because items in a list might not be meaningful. An interesting feature of common frames is how far they might stretch; that is, how many words might occur in the frame, or how many words might appear between the parts of the frame. This is why researchers tend to select items in a four-word frame, for example, so that the words are kept close together. Figure 4.1 contains examples of a target frame, the consequences of, from an analysis of an academic written corpus (Coxhead, 2017a, p. 66). The examples show how this frame is extended by the addition of extra words in each case.

Figure 4.1 Examples from an academic corpus of the consequences of as a frame (adapted from Coxhead, 2016b, p. 183)
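A frame with a single open slot, such as Ädel and Römer’s with the XX of, can be searched for with a short Python sketch. It assumes lower-cased tokens and a contiguous frame, with None marking the slot (a hypothetical convention). Because it only matches contiguous strings, it would not capture variants lengthened by additional words, such as the extended instances of the consequences of discussed above.

```python
from collections import Counter


def frame_fillers(tokens, frame):
    """Count the words that fill the open slot (marked None) of a
    contiguous frame such as ("with", "the", None, "of")."""
    n = len(frame)
    slot = frame.index(None)
    fillers = Counter()
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(f is None or f == w for f, w in zip(frame, window)):
            fillers[window[slot]] += 1
    return fillers


text = "with the help of x with the idea of y with the help of z"
print(frame_fillers(text.split(), ("with", "the", None, "of")).most_common())
```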

Metaphor in EAP and ESP

EAP research suggests metaphor is also important for second language learners because metaphors can make up over 4% of an academic lecture (Littlemore, Chen, Liyen Tang, Koester & Barnden, 2010). These researchers found that metaphors can carry important elements of meaning, such as evaluation, in academic speech, and that second language learners find metaphor difficult to identify and understand. Such information on the function and meanings of metaphor in context is useful for second language learners and teachers in ESP. Metaphor research in EAP and ESP has focused predominantly on the occurrence of metaphor in particular disciplines, such as Medicine, Economics (White, 2003; Charteris-Black, 2000) and Engineering. Computer Science is a field which has been noted for its extensive use of metaphor, particularly where everyday words are used for dealing with complex and abstract concepts (Izwaini, 2003). Some examples of metaphor in Computer Science include common words such as file, folder, button and save. In health communication, Ferguson (2013) discusses a range of metaphors, such as medicine is war, in which the enemy is disease and the doctors (fighters) use technology as weapons to fight the war on behalf of the patient. Another metaphor in health communication is the body is a machine, whereby the heart is referred to as a pump, the brain is a computer and other body parts might be referred to as the plumbing. Metaphors in Medicine have been assigned functions by van Tongeren (1997, cited in Ferguson, 2013, p. 245). These functions include filling a vocabulary gap, explaining medical concepts (for example, to patients) and exploring new concepts which do not have ‘well-established terms’ (Ferguson, 2013, p. 245).

There are several examples of ESP research into metaphor in business studies. One example is Charteris-Black and Musolff (2003), who compared the use of metaphors for euro trading in two corpora of financial reporting, one British and the other German. In the English data, three main clusters of metaphorical meaning were found. In order of frequency, the first cluster presents the value of the euro as an entity that moves up and down; the second concerns states of health or strength; and the third can be summed up as euro trading is physical combat (italics added). There are two main subtypes of combat: boxing and general war metaphors. Examples of these three clusters include low/lower, fall/fell and downside for movement (p. 160); support, weak and ailing for health/strength (p. 163); and batter, hit and impact for physical combat (p. 165). According to the authors, both corpora reflect movement and health or strength, but the German data showed more concern with stability than with combat. Interestingly, health metaphors are used more often in winter than in summer (Boers, 1997). Pérez and de los Rios (2015) explored metaphor in Finance in Spanish and English corpora. They found on average around 34% more use of metaphor in the English corpus, as well as differences between the corpora in the amount of metaphor that refers to Finance in terms of a path, health and war, a living organism, or other domains such as colours, games and performance.
Skorczynska Sznajder (2010) examines the use of war, health and sports metaphors in a business English textbook corpus and a corpus of business journal articles and business periodicals, and finds implications for how learners respond to metaphor in language and in thought:

‘Approaches to specialist vocabulary instruction through conceptual metaphors are necessary to enhance students’ understanding of a discipline, especially if the learners are to be aware of possible social effects derived from conceptualizing a particular discipline through ideologically-motivated metaphors, as in the case of war metaphors in business and economic discourse.’ (Skorczynska Sznajder, 2010, p. 40)


Boers (1997) found that exposure to health, fitness and fighting metaphors affected the language used by 100 business and economics university students in a problem-solving activity and, to some extent, the decisions they made in response to a socio-economic issue. Pardillos (2016) recommends raising awareness of legal metaphors in ESP, and uses a qualitative approach in which multilingual legal specialists judged items such as burden of proof and beyond reasonable doubt in English sentences and considered whether there are similar metaphors in their own languages. Metaphor in spoken academic English has been investigated in several studies. Littlemore et al. (2010) investigated metaphor in four university lectures from a spoken academic corpus and found that the average metaphoric density was 4.1%. Of the 132 items (on average) that second language participants found problematic in a lecture, 50 (38%) were used metaphorically, and the students were not able to explain the meaning of almost 50% of these problematic metaphors. The metaphors had three main functions in the university lectures: evaluation, discourse organisation and the expression of key ideas. The evaluative function was used to indicate the importance, centrality or worth of key ideas, while the expression of these ideas in metaphor was related more to explaining particularly difficult concepts or points in an argument. Littlemore (2001) found comprehension problems for 20 Bangladeshi postgraduate students in lectures because the students often missed the evaluative component of lectures, and concluded that students who are not able to follow the metaphors are likely to misunderstand the lecture or miss the key points altogether. Littlemore et al. (2010, p. 202) therefore suggest that training second language speakers to recognise and understand metaphor in academic lectures ‘is no luxury’ in EAP classes.

Limitations of multiword units in research

A key limitation in multiword unit research is defining exactly what is being investigated, given the many possible terms (Nation, 2016). As we have seen in this chapter, the range and combinations of lexical patterns are quite varied, from two-word combinations through to larger formulaic sequences. Some of these patterns can be incomplete and not very meaningful, as in to do with the or I think it was. Simpson-Vlach and Ellis (2010, p. 493) pick up on this issue, noting that such patterns are ‘neither terribly functional nor pedagogically compelling’. Selection principles, such as whether incomplete patterns are included in analyses, can also vary as researchers consider important aspects of the design of their studies, such as the frequency and range of sequences, the text types and disciplines for analysis, and the overall purpose of the research. Long strings of words might not be continuous in texts (Paquot & Granger, 2012). For example, a lexical bundle such as the consequences of is part of a highly frequent frame, ‘the something of something’. This means that an instance of a very frequent pattern such as a/the something of something might contain a high frequency word and occur often, or a low frequency word and not occur very often. Frequency is important for language learners and teachers, and while some lexical bundles or collocations might have a strong statistical relationship, they might not be very frequent in texts. Byrd and Coxhead (2010, pp. 46–47) make this point:

‘The scale used to report lexical bundles is typically in terms of the number of bundles per million words. For example, on the basis of… occurs 308 times in the 3.6 million words that make up the AWL corpus. That’s 106 times per million words, or 53 times per 500,000 words, or twice per 15,625 words. Studies of vocabulary acquisition report that learners need many encounters with a word or phrase before it becomes part of their lexicon (Nation, 2008). Few learners will read a million words in an EAP class. Most will read fewer than the 15,000 words needed to encounter on the basis of even twice.’

While the frame ‘the XXX of XXX’ (for example, the basis of research) might be frequent in academic texts, actual strings such as on the basis of may not occur very often at all. Furthermore, deciding on the unit of counting can be problematic. For example, in an analysis of a set of academic readings on Midwifery, the target word labour co-occurs with both stage and stages, as in stage of labour and stages of labour. In such cases, which words should be included in the multiword units? Should the plural form and the singular form both be included?

A clear limitation of the research so far is its main focus on EAP, with few examples of research into specialised disciplines and professional corpora. Few studies, moreover, go beyond quantitative analyses of corpora to explore the use of multiword units and metaphor in writing and speaking. A further limitation is that much research has focused on written texts rather than on both written and spoken texts. An exception is the work by Biber (2006) and colleagues on the T2K-SWAL corpus, which includes a spoken corpus of over 1.6 million running words (see Chapter 6 for more on this research). Another limitation is the lack of information on the context of bundles found in corpora (Byrd & Coxhead, 2010). An example of the kind of contextual information on lexical bundles which learners and teachers might find useful came from an analysis of on the basis of in concordance lines from Coxhead’s AWL written academic corpus. Three patterns arose in the data, as Figure 4.2 illustrates.

Figure 4.2 Three patterns of use for on the basis of (adapted from Byrd & Coxhead, 2010, pp. 53–54)
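Concordance lines like those behind Figure 4.2 are usually produced in keyword-in-context (KWIC) format. A minimal Python sketch is shown below, assuming tokenised text and a fixed context window (the `width` parameter is an arbitrary choice). Sorting the lines by their right-hand context is one simple way to make recurring patterns after a bundle easier to see.

```python
def kwic(tokens, phrase, width=5):
    """Return keyword-in-context lines for each occurrence of `phrase`
    (a tuple of tokens), with up to `width` tokens on each side.
    Lines are sorted by their right-hand context."""
    n = len(phrase)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == phrase:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((right, left))
    node = " ".join(phrase)
    return [f"{left:>40} [{node}] {right}" for right, left in sorted(hits)]


text = ("decisions were made on the basis of the evidence and "
        "grouped on the basis of age")
for line in kwic(text.split(), ("on", "the", "basis", "of"), width=2):
    print(line)
```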

This kind of analysis raises questions about how to bring such data into classrooms and programmes of learning, and about how effective any teaching approach that includes it might be. A final limitation of this area of research is the lack of replication studies in the literature to confirm findings, explore differences or similarities, and build certainty in the field in terms of methodological approaches and generalisability.
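The per-million arithmetic in the Byrd and Coxhead (2010) passage quoted above can be reproduced with a small Python function. Run on the inputs stated in that passage (308 occurrences in 3.6 million words), it gives roughly 86 occurrences per million words, about one occurrence every 11,700 words, and on average only about 1.3 encounters in 15,000 words of reading. These values differ from the rounded figures in the quotation, but the point about low frequency in a typical reading load holds either way. The `reading_words` parameter is a hypothetical name for the reading load used in the quotation.

```python
def normalised_rates(count, corpus_size, reading_words=15_000):
    """Express a raw corpus count as normalised rates.

    Returns the rate per million and per 500,000 words, the average
    number of words between occurrences, and the expected number of
    encounters in a reading load of `reading_words` running words.
    """
    words_per_occurrence = corpus_size / count
    return {
        "per_million": count * 1_000_000 / corpus_size,
        "per_500k": count * 500_000 / corpus_size,
        "words_per_occurrence": words_per_occurrence,
        "expected_encounters": reading_words / words_per_occurrence,
    }


rates = normalised_rates(308, 3_600_000)
for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```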

Conclusion

This chapter has focused on multiword units and metaphor in EAP as well as ESP. This area of language in use has predominantly been researched using corpus-based approaches, and most of the research is in the area of general EAP. ‘Multiword units’ is just one of the many terms used in this area of research, and readers need to be mindful of the terms being used, the selection principles and the analysis, so that results can be interpreted appropriately. Metaphor research provides interesting findings for specific areas of study, such as Business and Computer Science. Much more work is needed on specialised vocabulary in both multiword units and metaphor. The next chapter moves into specialised vocabulary in the context of secondary schools.

Chapter 5 Specialised vocabulary in secondary school/Middle School

Introduction

This chapter begins with a discussion of the context of learning specialised vocabulary in secondary/Middle School and the importance of carrying out vocabulary research in this area. It then focuses on vocabulary in four core areas of secondary school studies: English Literature, Mathematics, Science and Social Studies. Discussions of the effect of specialisation at school on vocabulary development, and of the vocabulary load of secondary school texts and teacher talk, follow. To highlight differences in contexts and approaches in research, I draw on findings and examples from three research projects in English: on vocabulary in context in school corpora in New Zealand, on vocabulary in Middle School texts in the United States (Greene & Coxhead, 2015), and on teacher talk in an international school in Germany. The chapter also addresses research on teachers and the teaching of specialised vocabulary, based on interviews, case studies and classroom observations in second and foreign language contexts. The chapter concludes with some challenges for research into vocabulary in schools in countries where English is a foreign language.

Why research vocabulary in secondary school education?

A key driver for my own research interest in this field is that today’s secondary school student is potentially tomorrow’s first-year university student; these students may one day become my students. From this point of view, it makes sense for me to take notice of vocabulary in these settings. A second reason is that EAP as an area of inquiry is not restricted to higher education. This point came home to me during many conversations with secondary school teachers about Coxhead’s (2000) AWL, in which it became clear that there was little research on vocabulary in the high school context. These teachers had noted that the AWL contained items that were useful in their context, and they were keen to find out more about the specialised vocabulary of their subject areas. This interest sparked a range of studies into the vocabulary load of secondary school texts in Science (Coxhead, Stevens & Tinkle, 2010) and English (Coxhead, 2012c), teachers’ understandings of and approaches to specialised vocabulary in their classrooms (Coxhead, 2011a, 2012a), vocabulary size research in schools (Coxhead, Nation & Sim, 2015), the nature and growth of vocabulary knowledge in the international school context, and academic vocabulary knowledge in secondary schools in New Zealand (Luxton, Fry & Coxhead, 2017). Much of the research into the nature of specialised vocabulary in this book, and in Applied Linguistics so far, has focused on university and professional fields. Secondary school contexts have been less explored from an EAP perspective. That said, there are a number of approaches to secondary education in the wider literature. For example, Humphrey (2016) investigates school settings and EAP and discusses models such as Language Across the Curriculum and content and language integrated learning (CLIL) in these contexts. In relation to vocabulary, Humphrey (2016) compares everyday and academic contexts under the dimension of subject matter.
The contrast here is between everyday vocabulary ‘in simple nominal groups’ and ‘technical lexis, defined and classified in complex nominal groups’ (p. 452). In this definition, technical vocabulary is tied to academic disciplines and bound closely to concerns around literacy. Learners and teachers, therefore, potentially carry a heavy burden in this area. Vocabulary research in EAP can help, for example by identifying specialised vocabulary in disciplines and by examining the teaching and learning of vocabulary in and out of classrooms. For more on the potential of massively multiplayer online games for vocabulary learning outside the classroom, see Coxhead and Bytheway (2015).

In the secondary school arena, vocabulary has begun to make its way into national curriculum documents. The New Zealand Curriculum (Ministry of Education, 2010), for example, states that learners need assistance with the specialised vocabulary of its eight learning areas: English, the Arts, Health and Physical Education, Languages, Mathematics and Statistics, Science, Social Sciences and Technology. The Ministry of Education identifies vocabulary size as a major challenge for educational achievement (Ministry of Education, n.d.). New Zealand-based research by Gleeson (2010) found that secondary school teachers consider vocabulary to be a major challenge for their students. Vocabulary is also an area of concern in the USA’s Common Core State Standards (CCSS) (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2010). Greene and Coxhead (2015) discuss the relationship between the CCSS and academic vocabulary for Middle School students in the US system. As in the New Zealand curriculum, vocabulary is embedded in the CCSS, for example in standards which require the use of specialised vocabulary related to History and Social Sciences in school texts and in students’ writing and speaking (Gardner, 2013; Greene & Coxhead, 2015; Johns, 2016). Gardner (2013) responded to this need by developing a Common Core Word List, combining the most frequent words in Nation’s BNC frequency lists and the Corpus of Contemporary American English (COCA). The presence of vocabulary in national curricula and in teacher-based research suggests a need to find out more about the nature of vocabulary at secondary school level.

Vocabulary in secondary school texts

Secondary school students need to engage with texts in all of their courses of study. Reading textbooks can be particularly demanding in terms of vocabulary load (see Coxhead et al., 2010, and the sections that follow). Subject area textbooks are not necessarily written with non-native speakers or readers of English in mind. Furthermore, secondary school students study across a range of subject areas; in New Zealand, there are eight such curriculum areas. In other contexts where English is a medium of instruction, for example international schools in Thailand, China or Germany, students could well be studying multiple academic subjects in English and also be learning in another foreign language. In these cases, it is important to know more about the specialised vocabulary of English in secondary school subjects, so that teachers and learners can focus on developing lexical knowledge as well as subject knowledge in English. One key feature of secondary school texts to consider is whether and how they take on the role of teaching or explicitly explaining specialised vocabulary. They might do so through glossaries, word lists, in-text explanations such as word meanings given in brackets, and highlighting of particular lexical items to draw readers’ attention to them.

Challenges of specialised vocabulary in schools

As already mentioned, English is used in classrooms across a wide range of contexts and subject areas, and a single class may include learners with different proficiencies in English and different first languages. Learners may meet a high frequency word in one subject area that carries a specialised meaning in another. For example, product has specialised meanings in both Economics and Mathematics, as well as its general English use. In an online survey of secondary school teachers in Aotearoa/New Zealand, Coxhead (unpublished data) asked teachers to reflect on elements of specialised vocabulary in schools. One teacher responded by writing:

There is specialised vocabulary for every subject that does not translate and this is an area that could be exploited more. More importantly for senior students is the language of instructions in assessments. This also does not translate from subject to subject and this issue should be addressed. For example what does ‘describe’ mean in English and what does ‘describe’ mean in Geography. Maybe visual depictions would be useful. (Coxhead, unpublished data)

This detailed answer suggests that students’ movement between subject areas in their everyday timetable is problematic in terms of the specialised vocabulary they encounter in different subjects. Another teacher, asked how they decide whether words are specialised in their subject area, replied, ‘Blank looks on faces!’ (Coxhead, unpublished data). See Appendix 1 for the survey questions. Other factors which can influence specialised vocabulary in schools include the kinds of reading required in different subjects, the age of learners, their language learning background and level of vocabulary knowledge, and what teachers and learners consider to be important about vocabulary in and out of class. And, of course, it is not just what students read in the course of their studies that needs to be investigated from a specialised vocabulary perspective. Classroom talk between teachers and learners is also an important source of input and output for learners. Gibbons (2006, p. 1) draws attention to the importance of talk in language learning contexts, noting, ‘The talk of teachers and students draws together – or bridges – the ‘everyday’ language of students learning through English as a second language, and the language associated with the academic registers of school which they must learn to control’. Llinares, Morton and Whittaker (2012) contrast examples of everyday (things, objects, food, changes) and scientific (biology, organism, features, characteristics) lexis. They point out that ‘teachers are aware of the importance of eliciting the right technical word from the students, especially when it is particularly relevant for the topic under study’ (p. 191). They also note that teachers in CLIL classrooms tend to pay most attention to meaning ‘with priority mainly given to the meaning of key concepts for the understanding of important subject content’ (p. 192).
A feature of learning through English in schools, for these authors, is the contrast between the ‘technical and often abstract’ language used in content classes and the communication-focused language of general English or English as a foreign language classes (p. 191). In highly mobile learning situations, such as international schools where parents, caregivers and children may move between countries and language contexts quite often, some students can miss out on opportunities to fully develop academic and specialised vocabulary in their secondary school education. A possible danger in these cases is that these students fail to develop this language in their first language, let alone their second.

Progression in schools from the early to the later years of study involves increasing specialisation in subject areas. For example, general Mathematics in the early years becomes algebra or calculus in the later years of high school. This specialisation has an impact on the vocabulary used in textbooks and classrooms. The next section presents specialised or technical vocabulary from four core areas of secondary school studies: English Literature, Mathematics, Science and Social Sciences. It also looks at the effect of specialisation at school on vocabulary development and the vocabulary load of secondary school texts. To highlight differences in contexts and approaches in research, I draw on findings and examples from a research project on vocabulary in school corpora in New Zealand (Coxhead & White, 2012; Coxhead, 2012c), on Middle School texts in the United States (Greene & Coxhead, 2015) and on teacher talk in an international school in Germany (Coxhead, 2017b). Let’s start with the Middle School Vocabulary Lists from Greene (2008) by way of background.

Middle School Vocabulary Lists (Greene, 2008)

Greene’s (2008; see also Greene & Coxhead, 2015) research into academic vocabulary for Middle School students was a response to Coxhead’s (2000) earlier work on the AWL, but targeted the needs of Middle School learners in the USA. Greene (2008) gathered a corpus of 109 textbooks used in Grades 6–8 in Middle Schools, in the following subjects: English grammar and writing, Health, Mathematics, Science, and Social Sciences and History. The corpus is roughly balanced across the three grades but less evenly balanced across the subjects, with Mathematics, Social Sciences and History, and Science containing more textbooks and Health fewer. The total corpus size is over 18 million running words, spread fairly evenly across the grades: Grade 8 contains the most running words (nearly 6.7 million) and Grade 6 the fewest (5.9 million). This large-scale corpus allowed Greene to measure the coverage of existing word lists such as West’s GSL (1953) and Coxhead’s (2000) AWL over the textbook corpora. This first step established whether there would be candidates for a new word list outside the GSL and AWL. Coverage figures show what percentage of the running words in a text is ‘covered’ by a word list. The GSL covers nearly 80% and the AWL nearly 5.4% of the Middle School texts. These figures indicate that the Middle School texts are less lexically demanding than university-level texts, with higher coverage by general English (the GSL) and lower coverage by academic English (the AWL) than in Coxhead’s (2000) study of university-level texts. The lists do not include proper nouns, abbreviations or compound nouns. The next step in Greene’s (2008) research was to identify candidates for inclusion in a Middle School Vocabulary List for each subject area. Greene’s study is important for several reasons. Firstly, it focused on the actual texts which students are required to read. Secondly, it considered the lexical needs of students in different subject areas. And, thirdly, Greene made principled decisions about the size and balance of the corpus and the selection of items for the word lists. Greene considered only items in the corpus outside the roughly 2,000 words of West’s GSL. From the remaining words, she first selected AWL items which met frequency and range cut-offs across the subjects in her corpus. She then considered items not in the AWL which met the same frequency and range cut-offs. Finally, she selected items which met a discipline-specific frequency cut-off within each subject corpus. These selection principles mean that the Middle School lists can overlap, since some lexical items occur in all subject areas. An example of such a word is chapter (Greene & Coxhead, 2015), which is unsurprising because the corpus is made up of textbooks with chapters.
The selection principles also mean that the subject or discipline is taken into account, so the Health list, for example, contains items such as drug, muscle and infect. The Middle School lists are discussed below in the sections on English Literature, Mathematics, Science and Social Sciences, but as an overview, each list contains roughly 600 to 800 types. Greene (2008; Greene & Coxhead, 2015) included word family members only when they actually occurred in the textbook corpus, unlike Coxhead (2000), who used Bauer and Nation’s (1993) word families as a guide for the AWL (see Chapter 3 on word lists). The coverage of these lists over the textbook corpus is quite impressive, ranging from 10.17% in Science down to 5.83% in Social Sciences and History. Greene then compiled a parallel corpus (nearly nine million running words) to validate her first study and found similar coverage of the Middle School lists over it. A third corpus, of Middle School fiction texts, was used to establish whether the Middle School lists contained academic rather than general-purpose vocabulary (following Coxhead’s (2000) validation method). Coverage of the fiction corpus ranged from 1.73% for the Mathematics list to 2.89% for the English grammar and writing list. These figures are far lower than the lists’ coverage of the textbooks, which indicates that the lists capture vocabulary typical of academic texts rather than of general reading. Let’s now turn to case studies of secondary school subjects and specialised vocabulary, beginning with English Literature.
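Greene’s selection steps described above (setting aside general service words, keeping candidates that pass frequency and range cut-offs across the subject corpora, then adding items that pass a cut-off within a single subject) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Greene’s (2008) actual procedure or software: the function names, cut-off values and toy corpora are hypothetical, the separate AWL and non-AWL passes are collapsed into one, and raw word forms stand in for the word types and families used in the real study.

```python
from collections import Counter

def select_across_subjects(subject_corpora, exclude, min_freq, min_range):
    """Words meeting frequency and range cut-offs across subject corpora.

    subject_corpora: dict of subject name -> list of tokens (hypothetical input).
    exclude: words set aside first (a stand-in for a general service list).
    min_freq: minimum total frequency across all subjects.
    min_range: minimum number of subjects a word must occur in.
    """
    total = Counter()
    range_count = Counter()
    for tokens in subject_corpora.values():
        counts = Counter(t.lower() for t in tokens)
        total.update(counts)
        range_count.update(counts.keys())  # each subject adds at most 1
    return sorted(
        w for w, f in total.items()
        if w not in exclude and f >= min_freq and range_count[w] >= min_range
    )

def select_within_subject(tokens, exclude, min_freq):
    """Words meeting a discipline-specific frequency cut-off in one subject."""
    counts = Counter(t.lower() for t in tokens)
    return sorted(w for w, f in counts.items() if w not in exclude and f >= min_freq)

# Toy data, invented for illustration only.
corpora = {
    "maths": "the chapter defines a fraction and a decimal".split(),
    "science": "the chapter explains a cell and energy".split(),
    "health": "this chapter covers a muscle and energy".split(),
}
general = {"the", "a", "and", "this"}

print(select_across_subjects(corpora, general, min_freq=2, min_range=2))
# ['chapter', 'energy']
print(select_within_subject(corpora["maths"], general, min_freq=1))
# ['chapter', 'decimal', 'defines', 'fraction']
```

As in the text, a word such as chapter passes the cross-subject cut-offs simply because every textbook has chapters, which is why lists built this way can overlap.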

English Literature

English Literature classes and curricula are an important area of study for specialised vocabulary in secondary education. In many cases in Aotearoa/New Zealand, students of English as a foreign or second language seem to be placed in English Literature classes as a way to support their language learning or to provide a way into learning English. English Literature is a core subject at secondary school in New Zealand. In a corpus-based study of English Literature texts in New Zealand schools (Coxhead, 2012c), a key question early in the development of the corpus was, ‘What is a text in English Literature?’ The New Zealand curriculum is enquiry-based, which in essence means that there are no prescribed texts. There are certainly no textbooks readily available for building a corpus like Greene’s (2008). The corpus for the New Zealand study needed to include a wide range of literature texts, written and spoken, such as films, TV, radio, newspaper, magazine, billboard and other advertisements, websites, YouTube clips, short stories, newspaper stories, television news clips, novels and plays. To find out which texts were commonly used, a national association of English Literature teachers contacted its members on my behalf, and a crowdsourced list of texts commonly used by teachers was developed. The list could be roughly divided into senior and junior school texts, which is very useful for building a corpus. But the size and purpose of the texts proved quite tricky to manage (Coxhead & White, 2012). Teacher-made resources were often used in classes, quite possibly handed down and around between teachers. The Ministry of Education Te Kete Ipurangi website (available at www.tki.org.nz/) is a repository for teacher-made and shared resources. Mindful of Radford’s (2013) master’s research on university-level Computer Science, which suggested that teacher-made or sourced materials have lower vocabulary load figures than textbooks and journal articles, we kept these resources separate from the source texts of English Literature. Finally, for one part of the curriculum, students can choose their own texts to study, and extending the crowdsourcing beyond teachers was outside the scope of a small study at that stage. This point does raise some interesting questions about the nature of the texts, and potentially their vocabulary, in a study of secondary school texts. A small-scale study was carried out using recommendations from the crowdsourced English Literature list, with separate junior and senior school collections of texts. The corpus contained just over 250,000 running words in the senior section and 170,000 in the junior section. An imbalance like this between corpora can be seen as a weakness of the study. However, senior-level texts are, on the whole, longer than junior-level texts. Compare, for example, Pride and Prejudice at 122,816 running words in the senior corpus and Much Ado About Nothing (available from www.gutenberg.org/ebooks/2240) at 23,297 running words in the junior corpus.
The six films in the corpus have similar running word totals, at around 20,000, but are quite different from each other: Run, Lola, Run; Moulin Rouge; and Shine in the senior corpus, and Lemony Snicket, Shrek and Whale Rider in the junior corpus. What did this study find about the nature of vocabulary in English Literature in schools? Firstly, it showed that students need a large vocabulary to cope with the demands of reading secondary school literature at both junior and senior levels. As in Nation’s (2006) study of novels, the junior and senior texts needed 8,000–9,000 word families plus proper nouns to reach 98% coverage. To give an idea of the kinds of specialised vocabulary which might be in secondary school English Literature texts in New Zealand, here is the response of a Drama/English Literature teacher to Coxhead’s (2011a) online survey on specialised vocabulary in schools (Question 5 in Appendix 1; the quotation is from unpublished data in that study). The teacher is responding to a survey request for more information on how teachers decide which specialised vocabulary to focus on with their students: I research the topic we are working on and the action words within that (i.e. Devising = split stage, levels, contrast, voice over; Shakespeare = groundlings, courtly behaviour, enjambment lines) and then explain these to the class in terms they will understand whilst still using the correct terminology. (Coxhead, unpublished data)
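The 98% figure reported above for English Literature texts is a cumulative one: coverage is added up frequency band by frequency band (here, bands of 1,000 word families), with proper nouns counted as known, until the target is reached. A minimal Python sketch of that arithmetic follows, assuming every token has already been assigned a band number or the label "proper" (proper noun) or "offlist". The function name and the sample band assignments are hypothetical; they are not data from Coxhead (2012c) or Nation (2006).

```python
from collections import Counter

def bands_needed(token_bands, target=98.0, free=("proper",)):
    """Smallest band number at which cumulative coverage reaches target %.

    token_bands: one label per token, either an int (1 = first 1,000 families,
    2 = second 1,000, ...) or a string such as "proper" or "offlist".
    free: labels counted as known from the start (e.g. proper nouns).
    Returns 0 if free labels alone reach the target, None if it is never reached.
    """
    counts = Counter(token_bands)
    total = sum(counts.values())
    if total == 0:
        return None
    covered = sum(counts[label] for label in free)
    if 100 * covered / total >= target:
        return 0
    for band in sorted(b for b in counts if isinstance(b, int)):
        covered += counts[band]
        if 100 * covered / total >= target:
            return band
    return None

# Invented profile of a 100-token text.
sample = [1] * 90 + [2] * 4 + [3] * 2 + ["proper"] * 2 + [5] + ["offlist"]
print(bands_needed(sample))             # 3  (90 + 4 + 2 + 2 proper = 98%)
print(bands_needed(sample, target=99))  # 5
```

In a real analysis with Nation’s 1,000-family lists, a return value of 9 would correspond to the 9,000-family level mentioned above.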

Middle School English grammar and writing list

Greene’s (2008; see also Greene & Coxhead, 2015) vocabulary list for Middle School English grammar and writing was based on a corpus of 18 textbooks from Grades 6–8, containing nearly three million running words. The list contains 722 word types, grouped into 374 word families. The highest frequency items in the Middle School English Grammar and Writing Vocabulary List (Greene & Coxhead, 2015) include pronoun, phrase, adjective, paragraph, topic, adverb and clause. These lexical items are very specific to grammar and writing and reflect the textbook nature of Greene’s source corpus. Paragraph and topic are in Coxhead’s (2000) AWL. The list then continues with a mix of specialised and more general items: identify, chapter, compound, preposition, modify and predicate. Coverage of this word list over the English grammar and writing textbook corpus is 6.83%, and this fairly high coverage is sustained over a parallel corpus (6.08%). Together, West’s GSL (1953) and the Middle School grammar and writing list cover 88.97% of the original grammar and writing corpus. The Greene list clearly reflects the narrow focus of English grammar and writing, and the high coverage of the GSL (compared to around 70%–75% over Coxhead’s AWL corpus) suggests that there is a lexical difference between textbooks for Middle School students and university-level texts.
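Coverage figures such as the 6.83% and 88.97% above are the percentage of running words (tokens) in a corpus that belong to a word list, or to several lists combined. A minimal Python sketch follows, assuming a deliberately simple tokeniser and word lists held as sets of word forms; the tiny lists and the sentence are invented, and real studies match whole word families using dedicated profiling software.

```python
import re

def tokenize(text):
    """Lower-cased alphabetic word forms; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(tokens, word_list):
    """Percentage of running words (tokens) found in word_list."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in word_list)
    return 100 * hits / len(tokens)

# Invented stand-ins for a general list and a subject-specific list.
general = {"the", "in", "this", "to", "first"}
grammar = {"pronoun", "clause", "noun"}

tokens = tokenize("The pronoun in this clause refers to the noun in the first clause.")
print(len(tokens))                                    # 13
print(round(coverage(tokens, general), 1))            # 61.5
print(round(coverage(tokens, grammar), 1))            # 30.8
print(round(coverage(tokens, general | grammar), 1))  # 92.3
```

Adding the coverage figures of two lists gives the combined figure only when the lists do not overlap; taking the union of the sets, as in the last line, avoids counting any word twice.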

Vocabulary in teacher talk in an EAL Literature class in an international school

The sample of teacher talk that follows comes from a Grade 6 EAL class (11–12-year-old students) in an international school in Germany (Coxhead, 2017b) and shows how an English Literature class integrates grammar and literature. In this example (Figure 5.1), the teacher and students are discussing a punctuation point in relation to a story they are reading. The teacher talk sample in Figure 5.1 contains two words from Greene’s Middle School grammar and writing list: possessive and apostrophe. This example illustrates well that the students in this class need to know the specialised vocabulary of English grammar and literature in order to understand the teacher talk. The teacher scaffolds and checks understanding using questions. Figure 5.2 comes from a sample of teacher talk a little later in the school year. In this example, the teacher works directly on vocabulary learning by drawing deliberate attention to a target word, in this case centre, and eliciting aspects of vocabulary knowledge from the learners. Note that the right-hand column in Figure 5.2 gives notes on the focus of the teacher talk at each point.

Figure 5.1 An example of grammar integrated into a Literature class in an international school

The example in Figure 5.2 shows clearly how the teacher focuses on aspects of knowing a word such as meaning, word families and strategies for learning vocabulary, all in a short space of time in class. These aspects of knowing a word relate to Nation’s (2013) elements of word knowledge: form, meaning and use. The next section looks at vocabulary in Mathematics at school.

Figure 5.2 Example from an EAL lesson in an international school

Vocabulary in Mathematics

Mathematics is a core subject at secondary school in New Zealand. To identify examples of Mathematics vocabulary, I carried out an analysis of an advanced Mathematics textbook by Barton and Cox (2013) used in New Zealand secondary schools. Table 5.1 shows the 12 most frequent lexical items in the textbook from each of the first three 1,000-word BNC lists (Nation, 2013). The first column shows words from the first 1,000 list; most are function words, which carry little content meaning. That said, the 11th most frequent word in this group is cos (the abbreviation for cosine). Other examples from among the 100 most frequent BNC 1,000 items in the textbook, beyond the dozen shown in Table 5.1, are point, number, fixed, related and let (as in Let X be…). The middle column, from the second 1,000 BNC list, contains more words closely related to Mathematics, including calculate and constant.

Table 5.1 The 12 most frequent items from each of the first three BNC/COCA lists in Delta Mathematics (Barton & Cox, 2013)

BNC 1,000: the, a, of, is, and, to, in, 1, that, for, cos, at
BNC 2,000: example, value, determine, calculate, log, maximum, length, path, exercise, distance, mathematics, constant
BNC 3,000: sin, function, equation, curve, complex, solution, task, minimum, solve, method, angle, obtain
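Table 5.1 is, in effect, a frequency count of the textbook’s word forms, grouped by the BNC band each form belongs to. A minimal Python sketch of how such a table could be produced follows, assuming band membership is available as a simple word-to-band dictionary; the dictionary and sentence below are hypothetical stand-ins for Nation’s lists and the Barton and Cox (2013) textbook.

```python
from collections import Counter

def top_items_by_band(tokens, band_of, n=12):
    """Return {band: up to n most frequent word forms in that band}.

    band_of: dict mapping a word form to its band (1, 2, 3, ...).
    Words missing from band_of are ignored. Ties keep first-seen order.
    """
    by_band = {}
    for word, _freq in Counter(tokens).most_common():
        band = band_of.get(word)
        if band is None:
            continue
        items = by_band.setdefault(band, [])
        if len(items) < n:
            items.append(word)
    return by_band

tokens = "the value of the function is the maximum value of the curve".split()
band_of = {"the": 1, "of": 1, "is": 1, "value": 2, "maximum": 2,
           "function": 3, "curve": 3}
print(top_items_by_band(tokens, band_of, n=2))
# {1: ['the', 'of'], 2: ['value', 'maximum'], 3: ['function', 'curve']}
```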

The third column in Table 5.1 also contains lexis closely related to Mathematics, such as function, equation, curve and angle. The table also shows how everyday and technical vocabulary can overlap: items such as value, complex and solution have general meanings as well as specific mathematical ones, which can make them problematic for learners. Leung (2005) investigates formal and informal vocabulary use in primary school Mathematics and argues for ‘weaning’ students off informal language in Mathematics. She writes,

The technical and specialist use of language for specific purposes can be interpreted in two very different ways: (A) technical language as a sign of expertise and valued knowledge (positive evaluation) and (B) technical language as unnecessary jargon (negative evaluation). (p. 127)

Leung (2005) also notes that both connotations ‘are underpinned by an implicit acknowledgement that the use of technical language is a form of meaning making and meaning interpretation’ (p. 128). The collocations and multi-word units in Table 5.2, each occurring more than 50 times in the Barton and Cox (2013) textbook, show the kind of technical vocabulary students need in order to work with their Mathematics textbook. Being able to recognise and interpret this vocabulary is essential in this subject area.
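Table 5.2 lists word sequences that occur more than 50 times in the textbook. One simple way to surface candidates like these is to count every contiguous sequence of n words and keep those above a frequency threshold. The Python sketch below does only that; it is an assumed method for illustration, not necessarily how the items in Table 5.2 were identified, and multi-word unit research usually applies further criteria (see Chapter 4).

```python
from collections import Counter

def frequent_ngrams(tokens, n, min_count):
    """Contiguous n-word sequences occurring more than min_count times."""
    # zip over n staggered copies of the token list yields each n-gram once.
    grams = Counter(zip(*(tokens[i:] for i in range(n))))
    return {" ".join(g): c for g, c in grams.items() if c > min_count}

tokens = ("solve the differential equation then "
          "solve the differential equation again").split()
print(frequent_ngrams(tokens, 2, 1))
# {'solve the': 2, 'the differential': 2, 'differential equation': 2}
print(frequent_ngrams(tokens, 3, 1))
# {'solve the differential': 2, 'the differential equation': 2}
```

In real use, min_count would be 50 and the token list would be the whole textbook; sequences such as solve the show why frequency alone does not make a technical multi-word unit.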

Middle School Mathematics vocabulary list

Greene’s (2008; Greene & Coxhead, 2015) Middle School Mathematics Vocabulary List contains 616 word types, grouped into 312 word families. The ten most frequent word families are shown in Table 5.3.

Table 5.2 Examples of mathematical collocations and multi-word units in Barton and Cox (2013)

sin x; critical path; trig functions; X axis; simultaneous equations; sin x cos; complex numbers; derived function; differential equation; conic sections; parametric equations

Table 5.3 The ten most frequent word families in the Middle School Mathematics Vocabulary List (Greene, 2008; Greene & Coxhead, 2015)

1. Equate, equation, equations
2. Graph, graphic, graphing, graphs
3. Area, areas
4. Fraction, fractional
5. Chapter, chapters
6. Data
7. Triangle, triangles, triangular
8. Percent, percentage, percentages
9. Decimal, decimals
10. Factor, factored, factoring, factorisation, factors

Figure 5.3 A section of teacher talk from the Grade 6 Mathematics corpus of the international school in Germany

Note that area is the third most frequent word family in Table 5.3. Area is a good example of an everyday word which has high frequency and a specific meaning in Mathematics. The Middle School corpus is textbook-based, which is reflected in the high frequency of the words chapter and chapters.

Teacher talk in Mathematics

The section of teacher talk in Mathematics in Figure 5.3 is taken from a study in an international school in Germany. The teacher in this segment is a first-language speaker of English from the United States of America. The students, aged 11 to 12, are in Grade 6, their first year of high school. In Figure 5.3, the words in bold are items which could be identified as technical in this text, such as triangles (see the top ten list from Greene’s Middle School Mathematics Vocabulary List in Table 5.3). Note the occurrence of multi-word units such as four-sided polygon and interior angle sum in Figure 5.3, as well as examples of classroom management by the teacher and references to the class’s earlier work on the concept of quadrilaterals. The word number is highlighted in the text because the teacher is clearly not talking about just any number; she is referring to numbers in relation to quadrilaterals. In this example, therefore, number is potentially a specialised item of vocabulary. This sample contains more than one technical word per line, which suggests that teacher talk is also a potential source of technical vocabulary for learners. The sample certainly shows how the teacher works to help students relate language to concepts, for example by referring to possible four-sided polygons in everyday contexts, such as a kite.

Vocabulary in Science

Harmon, Hedrick and Wood (2005) divide Science vocabulary in textbooks into three main areas: technical (e.g. photosynthesis), non-technical (e.g. component) and procedural (e.g. be the result of). Research by Taboada (2012) found that knowledge of technical vocabulary in Science was a better predictor of reading comprehension in Science than general vocabulary knowledge. Ardasheva and Tretter (2017) investigated Science vocabulary at secondary school for newcomer students. Using a 76-page chapter on Newton’s laws of motion from a Physics textbook, they selected and categorised the related lexis. Categorisation was carried out by adapting a schema from Miller (2009, cited in Ardasheva & Tretter, 2017) on levels of difficulty of scientific words. The aim was to develop a list of lexical items which might cause comprehension difficulties for newcomers. Table 5.4 gives categories and examples of Science-specific vocabulary identified from Run and jump, an activity which introduced Newton’s laws of motion through experiments and tasks. What is interesting here is the sheer range of specialised vocabulary in this one textbook chapter.

Table 5.4 Categorisations and examples of Science-specific vocabulary from Ardasheva and Tretter (2017, p. 7) adapted from Miller (2009)

Category of vocabulary: examples
Scientific processes/descriptions of motion: Acceleration, increase
Apparatus: Accelerometer, meter stick
Common scientific terms: Motion, diagram, gravity
Everyday and scientific meanings: Log, weight, value
Difficult conceptual phrases: Apply force, exert a force
Measurements: Number, amount

Part of the categorisation involves deciding what kind of comprehension difficulties would be caused by scientific vocabulary. Some items may cause problems because they have everyday meanings and scientific meanings, such as act and travel. Ardasheva and Tretter (2017) also identify cultural references from the text, such as Newton, Galileo, tackle and football, which would be new concepts for learners, as well as general words that are not scientific but would

be new for learners, such as backward (for directions) and neglect (low frequency lexis). As well as this identification and categorisation exercise, Ardasheva and Tretter (2017) used qualitative methods such as interviews and observations to find out more about how the class teacher worked with science-specific vocabulary in class, both before and after the intervention outlined by Ardasheva and Tretter (2017). The intervention involved four Grade 9 and 10 classes taught by the same teacher for students aged between 14 and 19 years old, and was based on a ‘learning routines cycle’. This weekly cycle began with definitions such as descriptions, examples, drawings, gestures and discussion. The cycle continued through the week with activities such as matching pictures, written homework, card games, charades, other games and quizzes where students completed a cloze exercise. Pre and post-tests demonstrated learning of technical terms in the course of the programme, and Ardasheva and Tretter (2017) advocate for time on technical vocabulary in class. They also point out that older learners who are beginning their English language studies were faced with ‘dramatically more complex’ (p. 15) science than younger learners at elementary levels. Fang (2006) investigates technical and everyday vocabulary in Science in schools, and notes that technical vocabulary can include everyday vocabulary used in combination, such as school of fish and geological fault, to express a scientific term. Harmon et al. (2005) find that, ‘The heavy use of scientific terminology to explain concepts […] raises the readability level of science textbooks’ (p. 271). Note that ‘raising the readability level’ in this context is not a desired outcome, because the higher the readability level, the more difficult the text is to read. 
Kim (2016), in an article on talking to learn in a science class, notes that research on the specialised, technical vocabulary of science remains limited, particularly when it comes to low-literacy bilingual learners.

Investigating a university-based Science list in secondary school texts

In an effort to investigate whether an existing university-based, Science-specific word list (Coxhead & Hirsh, 2007) can go some way towards identifying the specialised vocabulary of Science in schools, Coxhead et al. (2010) analysed the vocabulary in a secondary school Science textbook series. They found that Coxhead and Hirsh’s EAP Science List covered 5.90% of the textbooks. This coverage is higher than the 3.79% coverage of the same word list over tertiary level science texts reported by Coxhead and Hirsh (2007). The higher coverage figure in the secondary school Science textbooks suggests that this science vocabulary overlaps to some extent across secondary and tertiary levels. There is also a difference: 54 word families (17%) in the list appeared in the tertiary corpus but did not occur in the secondary school textbooks. This could be because the secondary school textbook corpus is quite small, so these lexical items had little opportunity to occur, or it could reflect genuine differences between the vocabulary of the sciences at secondary and tertiary levels. The study needs replication and validation before any generalisations can be made. Figure 5.4 shows a section of text from one of the Science textbooks (Hook, 2005, p. 88). The example is from a Year 9 textbook and is about atoms. The text has been marked up with words from West’s GSL (1953), Coxhead’s AWL (2000) and the EAP Science List (Coxhead & Hirsh, 2007). The GSL words are in normal text, the AWL words are bolded, the words from the Science list are shaded and the words outside these lists are in italics.

Figure 5.4 Mini solar system text from Hook (2005) with marked GSL, AWL, Science list and words not found in any list

The science-specific words from the Coxhead and Hirsh (2007) list appear on almost every line of the textbook sample from Hook (2005) and are repeated in the text. The textbook study by Coxhead et al. (2010) found that the first 2,000 word families of Nation’s (2006) BNC lists had fairly consistent coverage across the secondary school Science textbooks, and that the textbook for the final year of study at school had a larger vocabulary load than the other textbooks. The textbook for the final year included some words which did not occur in the other textbooks, such as sulphate, sulphur, gondwana and Gondwanaland, and the names of some native New Zealand animals, such as tuatara and takahe. So far, we have focused on specialised vocabulary in written texts; now let’s move to spoken texts in secondary school.
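The mark-up in Figure 5.4 amounts to a lexical profile: each running word is assigned to the first list that contains it (general list, then academic list, then Science list), anything left over is off-list, and coverage is the share of running words in each category. The following is a minimal Python sketch of that procedure, not the software used by Coxhead et al. (2010). The three word lists are tiny hypothetical stand-ins whose membership is invented for illustration, and the sketch matches exact word forms, whereas the real lists are organised into word families.

```python
import re
from collections import Counter

# Hypothetical toy lists, checked in priority order. Membership is invented
# for illustration; real profiles use full word-family lists such as the
# GSL, the AWL and the EAP Science List.
LISTS = [
    ("general", {"the", "of", "an", "move", "around", "like", "sun"}),
    ("academic", {"structure"}),
    ("science", {"atom", "electrons", "nucleus"}),
]


def profile(text, ordered_lists=LISTS):
    """Label each running word with the first list containing it
    ('off-list' if none does) and return (labels, coverage), where
    coverage maps each category to its percentage of running words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    labels = []
    for token in tokens:
        label = "off-list"
        for name, words in ordered_lists:
            if token in words:  # exact form match, not word families
                label = name
                break
        labels.append((token, label))

    counts = Counter(label for _, label in labels)
    categories = [name for name, _ in ordered_lists] + ["off-list"]
    total = len(tokens)
    coverage = {c: (100 * counts[c] / total if total else 0.0) for c in categories}
    return labels, coverage


if __name__ == "__main__":
    sample = ("The structure of an atom: electrons move around "
              "the nucleus like planets around the sun.")
    labels, coverage = profile(sample)
    for category, pct in coverage.items():
        print(f"{category:9s} {pct:5.1f}%")
```

With these toy lists, 10 of the 15 running words in the sample are general, 1 is academic, 3 are from the science list and 1 (planets) is off-list. Checking the lists in a fixed order means a word that appears in more than one list is counted only once, in the earliest list.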

Teacher talk and Science vocabulary: an example from an international school

In recordings of teacher talk from a Science class in an international school (see Figure 5.5), we can see the teacher (in normal type) working with the class to elicit and discuss concepts related to distillation by recalling previous classes, discussing concepts from everyday life, and talking about the materials in the classroom itself. In Figure 5.5, the students’ responses are in italics. The teacher explicitly focuses on vocabulary in class, for example, by asking the students to review the vocabulary and by pushing them to use the target language in context. There is a high level of interaction in this example, as the students and teacher work back and forth in the extracts. Note the large amount of vocabulary in these extracts.

Figure 5.5 Two extracts on distillation from an international school Science class, year 6

Vocabulary in Social Sciences

Social Sciences tend to include subjects such as Business Studies, Classics, Economics, Geography and History. In New Zealand, junior secondary school students take ‘Social Studies’, which narrows into specialised areas such as Business Studies, Geography, History and Economics in the senior years. Social Studies is now available at senior levels in some schools, which blurs distinctions somewhat when it comes to identifying specialised vocabulary in the Social Sciences. The New Zealand Ministry of Education hosts a website of learning and teaching materials for Social Studies, including the example that follows. The unit of work on Tax Education and Citizenship directs students to explore websites and develop their own resources, such as quizzes, as part of a social enquiry cycle of learning. Figure 5.6 contains a sample of the taxation text, which was developed by Inland Revenue, a government department. Analysis shows that there are many high frequency words in this text. Words in the text which also occur in Nation’s frequency-based first 1,000 words of the BNC (Nation, 2006, 2013, 2016) include tax, rates, government and unemployed. The first 1,000 BNC list covers just over 80% of the text. Items from the second 1,000 of Nation’s lists include legal, duty, borrow and income; this list covers 9.28% of the text. Items from the third 1,000 list include mortgage, inheritance, sovereignty and increment; this list covers 3.38% of the text. In total, these three high frequency word lists cover nearly 95% of the text, which means that, with support, students who know the first 3,000 words should be able to cope with the vocabulary of this text. If learners do not have a vocabulary of 3,000 words, the text could be simplified by replacing low frequency words with high frequency words, but this needs to be done carefully so that learners still encounter the specialised vocabulary in context to support their learning of Social Sciences.

Figure 5.6 Example of a text on taxation from a social enquiry unit at level 5 on tax education and citizenship

Note that 4.2% of the words in this text are also in Coxhead’s (2000) AWL. Examples of these words are legal, document, area and depression.
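The reasoning about the taxation text is cumulative: the coverage of each 1,000-word band is added to the bands before it until the total reaches a benchmark such as 95%. A minimal Python sketch of that calculation follows. The band percentages in the usage example are hypothetical round numbers chosen for illustration, not the figures for the taxation text, and the threshold is passed in as a parameter rather than built in.

```python
from itertools import accumulate


def cumulative_coverage(band_percentages):
    """Running coverage totals as each frequency band is added in order
    (first 1,000 words, then the second 1,000, and so on)."""
    return list(accumulate(band_percentages))


def bands_needed(band_percentages, threshold):
    """Return how many bands are needed for combined coverage to reach
    `threshold` percent, or None if all bands together fall short."""
    for n_bands, total in enumerate(cumulative_coverage(band_percentages), start=1):
        if total >= threshold:
            return n_bands
    return None


if __name__ == "__main__":
    # Hypothetical coverage of one text by the 1st to 4th 1,000-word bands.
    bands = [82.0, 9.5, 3.5, 1.5]
    print(cumulative_coverage(bands))
    print(bands_needed(bands, 95))
    print(bands_needed(bands, 98))
```

Because the bands are added in frequency order, the number returned corresponds to a vocabulary size in thousands of words: in this made-up example, three bands (the first 3,000 words) reach 95%, while even four bands fall short of 98%.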

Middle School Social Studies and History Vocabulary List

The Middle School Social Studies and History Vocabulary List (Greene, 2008; Greene & Coxhead, 2015) was selected from a corpus of 26 textbooks totalling nearly 5,600,000 running words. The list contains 809 word types and 394 word families, and covers 5.83% of the Social Studies and History textbook corpus. Proper nouns play a major role in these subject areas, and represent up to 5% of the textbook corpus for this discipline. The top ten words in the Middle School Social Studies and History Vocabulary List are shown in Figure 5.7. The fact that chapter is the most frequent word shows that the analysis was based on a written corpus. Beyond that, the words seem to reflect geographical and historical content. The appearance of congress in the top ten reflects the fact that the corpus comes from textbooks in the USA.

Figure 5.7 The top ten words in the Middle School Social Studies and History Vocabulary List (Greene, 2008; Greene & Coxhead, 2015)

Researching teachers and the teaching of specialised vocabulary

In this section, we move to teacher cognition about specialised vocabulary in secondary schools, drawing on studies carried out in the New Zealand context. Coxhead (2011a, 2012b) looked into how secondary school teachers in Aotearoa/New Zealand identified specialised vocabulary in their subject areas. Survey questions also asked teachers how they introduced and consolidated specialised vocabulary in their classes, and what resources they used in their teaching (see Appendix 1 for the survey). A total of 153 teachers from 50 New Zealand schools began the online survey, but only 61 respondents completed it in full. These teachers fell into four subject areas: Science (n = 21) (including Biology, Chemistry and Physics), English to Speakers of Other Languages (ESOL) and Languages (n = 17) (including French), English Literature and Arts (n = 16) (such as Theatre Studies, Visual Art and Drama) and Social Sciences and Economics (n = 7). Unfortunately, this last group is much smaller than the others. The data analysis showed that a range of factors influenced the decisions teachers made about selecting specialised vocabulary. Nearly three quarters relied on student feedback and questions in class to identify which specialised vocabulary to focus on. Over 50% of the teachers were guided by their own content knowledge or by the needs of the English as a second or foreign language learners in the class, while just under 50% were guided by discussions in class and the content of the class. One ESOL teacher reported gathering vocabulary from many sources and considering how words might ‘transfer’ between subjects. She also made sure that when students raised questions about words in class, she included these words in assessment, as a way to encourage students’ enquiries about lexis in class (Coxhead, 2012b). In the survey overall, teachers were less likely to draw on textbooks as a source of support for deciding which vocabulary to focus on. Everyday words were clearly considered problematic by a Biology teacher, who reported a focus on words which have a specialised meaning in Biology but a different meaning in other school subjects (Coxhead, 2012b). This teacher thought it was important to ensure that students were well aware of the correct use of specialised vocabulary in Biology. As mentioned in the Social Sciences section earlier, online sources such as the New Zealand Ministry of Education resources and word lists provided guidance on which words to focus on in class. Other curricula internationally also provide word lists to guide teachers, such as the International Baccalaureate Diploma Programme (IBDP).
How the vocabulary is selected and defined by curriculum designers for the IBDP, for example, would be an interesting research project. In response to the survey questions about how teachers introduced and consolidated specialised vocabulary in their classes (Coxhead, 2011a), the everyday context of the class was widely reported as important (77%). A total of 71% of the teachers reported thinking about specialised vocabulary when planning lessons and teaching. It is common for teachers to augment textbooks with materials they have designed or gathered themselves. More than 50% reported using their own resources in class, and over 40% reported using textbooks and online tools once a week. Reported use of dictionaries and vocabulary cards as learning strategies varied widely across the teacher group, from every day to never. The teachers completed Likert scales indicating how often they used particular resources in class for specialised vocabulary (see Appendix 1). For example, Science teachers reported using dictionaries most often, followed by Social Sciences and Economics teachers, and then English and Arts teachers. ESOL and Languages teachers were the least likely group to use dictionaries. Teacher experience also seemed to affect pedagogical choices in teaching vocabulary. Teachers who had taught for more years were more likely to use activities requiring students to use target lexical items in their writing and/or speaking. Teachers who had taught for more than 16 years reported doing more activities such as think-pair-share and dictionary practice. This small-scale study suggests that these teachers varied somewhat in their approaches to specialised vocabulary in schools, depending on their subject area and experience. The most important difference seemed to be in the materials and resources used in class, including dictionary use.

Challenges of specialised vocabulary and technology in schools: Henry

In a follow-up to the aforementioned study, I interviewed ten teachers about their subject areas and the vocabulary in their teaching. What follows are notes from an interview with a teacher of Technology at a secondary school in a large New Zealand city (Henry; not his real name). Henry taught at a low decile school. In the New Zealand system, a school’s decile level reflects the socioeconomic status of the communities its students come from. Approximately 10% of all schools in New Zealand are in each of the ten decile levels (see www.minedu.govt.nz/Parents/AllAges/EducationInNZ/SchoolsInNewZealand/SchoolDecileRat for more information about decile ratings). The decile levels are used to allocate government funding, with lower decile schools receiving more funding. It should be noted that even mid-decile and high-decile schools have some learners from low-income families. The purpose of including Henry’s interview notes in this chapter is to share concerns around language and the teaching and learning of specialised vocabulary. Henry was a trained language teacher who had migrated to New Zealand from South Africa around five years before the interview. He had experience in industry before moving into education. His classes were normally almost 100% male, with one or two girls. The students spoke a variety of first languages, such as Tongan, Samoan, Te Reo Māori and English. One of Henry’s main concerns was that students were regularly pushed into Technology because they were not doing particularly well in their academic studies, and Technology was seen as a practical, hands-on subject. Henry reported that vocabulary is ‘a big part of what we teach in tech. Students need the vocabulary to get credits in tech’. Henry said he focused on teaching the names of materials and objects that the students needed to know for class, such as car parts and woodworking joints like tongue and groove. He would actively teach the names of all the tools, covering a tool’s name, what it was used for and its maintenance at the same time. In his words, ‘There will always be some [health and] safety usage and vocabulary as well.’ This point was important: if Henry saw students getting a tool out for woodwork in the next class, he wanted to be sure they knew when to use a chisel rather than a screwdriver. One of Henry’s key techniques was to teach what he called ‘concrete words’ with the actual parts in front of students. He talked through what students were doing as they were making objects in class.
For example, when the students were labelling car engines, he told them the names of the engine parts, asked them to repeat the words, checked their memory of the words in class the next day and sometimes set verbal or written assessment activities on the target vocabulary, such as labelling pictures or picking out a word from a list. Bilingual himself, Henry often explored vocabulary in his students’ languages, for example, when identifying a tool, but he mostly taught in English. He learned early on that the distinction between a concept and its label is interesting for students and can build confidence in students with low self-confidence. Henry pointed out a range of challenges for vocabulary in Technology classes.

The most challenging aspect of vocabulary was that the assessment materials were not understandable for the students. The assessment was set by industry and targeted at people in the workplace rather than students in secondary school, so the students struggled to understand the assessment questions because the text contained too many words they did not know. The local polytechnic prescribed the assessment material, which came through the motor trade organisation and the building trades. This vocabulary load in assessment meant that Henry found himself teaching towards the assessment. He checked every year that the terms in assessment questions, and the items that his students might possibly use in their answers, were explicitly taught. Another challenge was that students were introduced to many new terms each day in Technology, for example, health and safety vocabulary such as precautions, hazardous, projecting, assigned and codes of practice. Through a course of study in Teaching English to Speakers of Other Languages (TESOL), Henry had learned about collocations, and he was developing a list of common collocations for his students, for example, legislative regulations and maintaining adequate room. He also kept ahead of the class, preparing lists of vocabulary that would be coming up and noting which words he thought his students would not know. Henry could not count on students who were interested in becoming mechanics knowing anything about cars. When deciding on the amount of vocabulary to focus on per class, Henry found that for most of the students, ten new words would be too much of a stretch. He decided that a smaller number, perhaps five to eight words, would be manageable. Students needed to be able to use the words as part of their learning.
Henry found that if words were concrete, and if students actually handled the tools or other objects related to the vocabulary, the students remembered these words quite well. Maintaining interest was also a challenge: Henry found he often had only a few minutes of class time before students might begin to lose interest. The final challenge, from Henry’s perspective, was the most heartbreaking. He reported that Technology in higher level schools was moving towards thinking, reading, writing and solutions for a modern world. This shift meant that a gap was developing in Technology between lower and higher level schools, as the lower level schools were focused on hands-on skills rather than higher level problem solving. This gap would have major consequences for the students in Henry’s classes at his lower level school. As he said, ‘It’s likely that you will not get employed if you are good with a chisel, unless perhaps in the arts. It’s important to not only earn a salary but to have something to live for.’ Henry’s account of his experiences teaching Technology in a lower decile school gives us some insight into teachers’ concerns and approaches regarding vocabulary and other aspects of education. Interviews such as this one present an opportunity to find out more about the challenges, needs and opportunities for vocabulary in schools. This kind of data can challenge or expand the findings from computer analysis of texts.

Conclusion

This chapter has focused on specialised vocabulary in schools through examples of research in different contexts and subject areas. It is clear that much more research is needed in secondary schools to find out more about specialised vocabulary, for example, in a wider variety of subject areas, and about the progression of vocabulary from the early to the later years of school. The next chapter focuses on vocabulary in university contexts.

Chapter 6 Pre-university, undergraduate and postgraduate vocabulary

Introduction

This chapter is divided into three main areas of vocabulary research in university contexts: English for General Academic Purposes (EGAP) for pre-university studies, English for Specific Academic Purposes (ESAP) for undergraduate studies and ESAP for postgraduate studies. EAP is a sub-field of ESP. The chapter begins by considering why pre-university and university contexts have been, and continue to be, a major area of interest for vocabulary studies. The pre-university section focuses on written and spoken vocabulary studies for EGAP. Subject- and discipline-specific vocabulary at undergraduate and postgraduate levels follows, looking in particular at the Sciences, including a Science list for EAP and subject-specific research in Agriculture, Chemistry and Computer Science; Medicine, including Pharmacology; Engineering; and Applied Linguistics. The chapter concludes with a discussion of the limitations of research into specialised vocabulary in academic settings.

Why research vocabulary in pre-university and university contexts?

Maxwell (2013) makes a very telling point about EAP: ‘Nobody is a native speaker of Academic English’. For this reason, research into specialised vocabulary for academic purposes is important from a range of perspectives. In a study of vocabulary in use in academic settings (Coxhead, 2011d), an experienced EAP lecturer in a university in New Zealand referred to vocabulary as ‘the hidden curriculum in EAP’ (p. 150). He explained this point further by saying:

Well, it’s [vocabulary] there in everything we do but sometimes we don’t actually realise that vocabulary learning is actually going on and it is part of the task. You know, sometimes we are so focused on the writing element that it is easy to let it slide and not think enough about how they are developing the words they are using, because we are so focused on getting them to deal with the form and such things like that. Equally I think you can go the other way in that you can focus so much on explicit teaching of vocabulary that students don’t see it as part of the reading and writing process…. It is a balance I suppose getting the right balance is the hard part. (Coxhead, 2011d, p. 150)

In the same study, Coxhead (2011d) interviewed English as a second language writers about their approaches to specialised vocabulary in their academic writing. Fale, a participant from Samoa who was studying Nursing at university, highlighted how some of her encounters with specialised vocabulary came about through her husband beginning university studies:

Fale: Ever since [my husband] came to university he use (sic) this word [perception]. It is getting common in the house because of him. We called it ‘[my husband’s] word’. He brought it home from university and uses it when we are talking together. [For example, he says], ‘Come on, that is your own perception.’ (Coxhead, 2011d, p. 139)

Fale and the whole family began to use this word as a marker of university language, and found that perception and other academic words from university studies came to be used more often in family debates. Other students in the same study remarked that using specialised vocabulary was particularly important when writing academic assignments for lecturers.

There has been a large amount of research into vocabulary in pre-university and university contexts. Like ESP, the field of EAP has been driven by learner needs. Early attempts to identify the lexis needed to succeed in university settings range from Lynn (1973) and Ghadessy (1979), who developed word lists based on lexical items that learners annotated in their textbooks, to Xue and Nation’s (1984) University Word List, which was built by combining four existing academic word lists, including Lynn’s and Ghadessy’s. Many of the words in the resulting word lists are academic rather than technical in nature (Nation, 2013). Students’ difficulty with academic vocabulary has been noted in a range of more recent studies, such as those by Biber and colleagues (see Biber, 2006) and Evans and Morrison’s (2011) research into the first year of study in the medium of English at a university in Hong Kong. In his book on word lists, Nation (2016, p. 149) makes a key point about academic vocabulary:

Academic vocabulary needs to be seen as cutting across the three frequency bands of high, mid-, and low frequency words. Words can be both academic words and high frequency words, both academic words and mid-frequency words and so on.

We have seen this same point illustrated in Chapter 2 of this book, where everyday words, for example, can also carry technical or specialised meanings in different subject areas. In this chapter, we will see how different selection criteria and approaches to research into academic vocabulary can affect the results of studies. It can be difficult to decide whether a study of technical vocabulary in Engineering or Medicine, for example, belongs in a chapter on university contexts or in Chapter 7 on occupational or professional contexts. This problem is compounded if a study draws on a corpus of professional writing, such as research articles, but is intended for university-level or pre-university-level students. In this case, the question is whether these learners actually read these kinds of texts.
The number of English language learners studying at English-medium universities began to increase in the 1990s and 2000s and, apart from some periods of economic decline when numbers decreased, the trend continues upwards. Evans and Morrison (2011) identify four types of students who undertake studies in English-medium institutions: students wanting to study in English-speaking countries, students in post-colonial countries where English-medium universities still remain, students studying by distance or at overseas campuses of existing English-medium universities, and those who study in countries such as Germany and parts of Scandinavia where programmes are delivered in English. Vocabulary is a major area of concern for the students in Evans and Morrison’s (2011) study. Reflecting the range of possible areas and levels of study, EAP tends to be separated into EGAP and ESAP (Dudley-Evans & St. John, 1998). As these labels suggest, general Academic English focuses on areas of academic activity common to a range of disciplines, for example, writing general academic essays, listening to lectures, giving presentations and reading for academic purposes. EGAP students might enrol in a pre-university programme as preparation for their studies, and perhaps as a way to develop the language skills needed to pass an international, academically focused test which is recognised by English-medium institutions and allows entrance into university studies. ESAP tends to focus on the ‘language, discourse structure, and terminology of the genre-specific and discourse-specific domains, such as writing a PhD thesis in biochemistry’ (Flowerdew, 2015a, p. 466). Chapter 2 on identifying specialised vocabulary and Chapter 3 on word lists have already introduced examples of research into single words and multiword units from both EGAP and ESAP. The focus of this chapter is to delve more deeply into vocabulary for pre-university, undergraduate and postgraduate purposes, using examples from quantitative and qualitative approaches (and from studies which draw on both).

Pre-university studies: vocabulary in English for general academic purposes

Studying at university can mean exposure to several million running words a year through reading textbooks, source books, content and learning-based websites and other academic sources of information. Pre-reading for lectures, tutorials and laboratories is commonly assigned to learners. The nature of the words encountered while reading academic texts is a fundamental area of research into university language. An example of this research is work carried out by Miller (2011), whose US-based study analysed the percentage of AWL (Coxhead, 2000) items as well as the readability/complexity and syntactic features of texts in two corpora: university textbooks and ESL reading books. The textbook corpus covered six disciplines: Business, Humanities, Natural Science, Social Science, Education and Engineering. In the AWL comparison, Miller found that the ESL reading materials contained roughly half as many AWL items as the university-level texts. Miller (2011) points out that this means that, on an average page of 400 words, an ESL textbook would contain approximately 15 fewer AWL items than a university textbook. Miller (2011) comments:

It is possible, then, that the ESL textbooks are providing students neither the exposure to the range of academic vocabulary nor the number of encounters with academic vocabulary that they may need to develop successful comprehension of university textbooks. (Miller, 2011, p. 39)
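Miller’s page-level figure can be unpacked with simple arithmetic. The back-calculation below is a sketch that assumes ‘roughly half’ applies to the number of AWL running words per page; the symbols $c_U$ (the proportion of running words in the university textbooks that are AWL items), $N_U$ and $N_{ESL}$ are introduced here for illustration and are not figures Miller reports.

```latex
\begin{align*}
N_U &= 400\,c_U && \text{AWL items on a 400-word university page}\\
N_{ESL} &\approx \tfrac{1}{2}\,N_U && \text{roughly half as many on an ESL page}\\
N_U - N_{ESL} &\approx \tfrac{1}{2}\,(400\,c_U) = 15
  \;\Rightarrow\; c_U \approx \tfrac{30}{400} = 7.5\%
\end{align*}
```

On these assumptions, a university textbook page would carry about 30 AWL items and an ESL textbook page about 15. The implied coverage of about 7.5% is an inference from Miller’s summary, not a reported result.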

Written academic texts have been gathered into corpora for analysis of lexis which is shared across disciplines. Examples of this kind of research include Coxhead’s AWL (2000), Gardner and Davies (2014), Browne, Culligan and Phillips (2013a) and Liu (2012). As we have already seen in Chapter 2, academic word lists have started from two different points. One is to assume that EAP learners have a basic general knowledge of vocabulary before they start to specialise in academic studies. Coxhead’s AWL is an example of this approach, drawing on the most fully formed and principled general word list available in the late 1990s, West’s (1953) GSL. The coverage of the AWL has been reported in a range of studies since the original 2000 study by Coxhead (see Table 6.1). Note that studies of written academic texts tend to report higher coverage of the AWL, while spoken academic texts and secondary school texts have lower coverage figures overall.

Table 6.1 Coverage of the AWL over a range of academic corpora by frequency (adapted from Coxhead, 2011c, p. 356)

Study | Corpus | Number of running words | Percent coverage of the AWL (Coxhead, 2000)
Khani and Tazik (2013) | Applied Linguistics research articles | 1,553,450 | 11.96
Cobb and Horst (2004) | Written learned section of the Brown corpus (Francis & Kucera, 1979) | 14,283 | 11.60
Konstantakis (2007) | Business | 1 million | 11.51
Ward (2009) | Engineering | 271,000 | 11.3
Vongpumivitch, Huang and Chang (2009) | Applied Linguistics research papers | 1.5 million | 11.17
Hyland and Tse (2007) | Sciences, Engineering, and Social Sciences, written by professional and student writers | 3,292,600 | 10.6
Li and Qian (2010) | Finance | 6.3 million | 10.46
Chen and Ge (2007) | Medical research articles | 190,425 | 10.073
Coxhead (2000) | Academic writing in Arts, Commerce, Law and Science | 3.5 million | 10
Valipouri and Nassaji (2013) | Chemistry | 4 million | 9.60
Martinez, Beck and Panza (2009) | Agricultural sciences research articles | 826,416 | 9.06
Coxhead and Hirsh (2007) | Science | 1.5 million | 8.96
Coxhead, Stevens and Tinkle (2010) | Pathway series of secondary Science textbooks | 279,733 | 7.05
Webb and Paribakht (2015) | University admission listening tests | not given | 6.48
Thompson (2006) | Lectures in the BASE corpus | not given | 4.9
Dang and Webb (2014) | Lectures and seminars in the BASE corpus | 1,691,997 | 4.41
Coxhead and Walls (2012) | TED Talks | 43,656 | 4
Hincks (2003) (cited in Dang & Webb, 2014) | Student presentations | not given | 2.4

As well as researching the coverage of the AWL over spoken academic texts, Dang and Webb (2014) used the BASE corpus to find out more about the vocabulary profile of these texts. The 1,691,997-word corpus contained four discipline areas: Arts and Humanities, Life and Medical Sciences, Physical Sciences and Social Sciences. Dang and Webb (2014) found that 4,000 word families plus proper nouns and marginal words provided 96.05% coverage of the corpus, and 8,000 word families provided 98% coverage. Life and Medical Sciences, however, required 13,000 word families plus proper nouns and marginal words to reach 98.05% coverage. Again, studying in the Sciences seems to involve a larger vocabulary load than other academic disciplines. The AWL has also been investigated in a range of studies that focus on textbooks for university students and EAP learners. Miller (2011), for example, carried out a lexical analysis of 75 reading texts from 3 EAP student textbooks and 28 university textbooks from 6 academic disciplines, and found that the EAP textbooks contained fewer items from the AWL than the university textbooks. Another example of an academic word list which takes high frequency general English into account is Browne et al.’s (2013a) New Academic Word List (available at www.newgeneralservicelist.org/nawl-new-academic-word-list/). This list was developed from a corpus of over 288 million running words in three main sections: the Cambridge English Corpus of academic journals, non-fiction, student essays and academic discourse (over 248 million words); the Michigan Corpus of Academic Spoken English (MICASE) and BASE corpora (three million words); and an academic textbook corpus (approximately 36 million words). In building the list, Browne et al. excluded the vocabulary in their New GSL (Browne et al., 2013b), which was developed from the two-billion-word Cambridge English Corpus (and is available at www.newgeneralservicelist.org/). The resulting word list contains 963 words, such as repertoire, obtain, distribution, parameter, aspect, dynamic, impact, domain, publish and denote. The list is available in several formats: headwords, lemmas, a bilingual English and Japanese list of meanings of the words in the list, and a frequency-based version of the list. An approach to building an academic word list that assumes learners already know high frequency words has several weaknesses. One is whether learners actually do know a basic vocabulary before they start learning vocabulary for EAP. Studies in Vietnam (Webb & Chang, 2012; Nguyen & Nation, 2011), Denmark (Henriksen & Danelund, 2015) and Indonesia (Nurweni & Read, 1999) suggest that foreign language learners tend not to show mastery of the first 1,000 words of English even after quite a few years of studying the language. Such estimates are not so readily available for English as a second language contexts. There are now more general word lists, including Nation’s (2006) BNC lists, a later BNC/COCA high frequency word list also by Nation (2012), Brezina and Gablasova’s (2015) New GSL, and Browne et al.’s (2013b) list, which is also called the New GSL. Dang and Webb (2014) compared these four existing general word lists over a large corpus of spoken and written general English texts in their quest to develop a word list for beginners. They broke their Essential Word List into sets of 100 items and argued for an overall size of 800 items, because this number sets a reasonable and focused goal of high frequency words for learners and teachers. Hyland and Tse (2007) were critical of Coxhead’s (2000) approach of identifying vocabulary occurring across a range of academic subjects, pointing out that lexis can behave differently in terms of meaning and grammatical patterning in texts.
Hyland and Tse (2007) compared lexical items from Coxhead's AWL in corpora of professional and student writing in Science, Engineering and Social Science, and found variation in both frequency and meaning across these corpora. Table 6.2 shows examples from Hyland and Tse (2007) of consist, credit and abstract from the AWL. The meanings of these words are in the second column, followed by their distribution in Science, Engineering and Social Science in the next columns. Note that the meanings are listed in descending order of total occurrences.

Table 6.2 Examples (adapted from Hyland & Tse, 2007, p. 245) of the distribution of meanings of consist, credit and abstract across three disciplines (%)
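Percentages like those in Table 6.2 can be obtained by dividing the count for each meaning by a word's total occurrences within a discipline. The caption does not say whether the percentages run within each discipline or across disciplines, so the minimal Python sketch below assumes the former; the counts and the meaning labels are invented placeholders, not Hyland and Tse's data.

```python
def meaning_distribution(counts):
    """counts maps discipline -> {meaning: occurrences}.
    Returns discipline -> {meaning: percentage of that discipline's total}."""
    result = {}
    for discipline, by_meaning in counts.items():
        total = sum(by_meaning.values())
        result[discipline] = {
            meaning: (100 * n / total if total else 0.0)
            for meaning, n in by_meaning.items()
        }
    return result


# Hypothetical counts for one word; 'sense A' and 'sense B' are placeholders.
counts = {
    "Science": {"sense A": 30, "sense B": 10},
    "Engineering": {"sense A": 5, "sense B": 15},
}
dist = meaning_distribution(counts)
print(dist["Science"]["sense A"])  # 75.0
```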

Gardner and Davies (2014) developed their AVL from scratch, that is, without assuming knowledge of any high frequency vocabulary. They used a ratio-based corpus-comparison approach, based on the 120-million-word academic section of the COCA. This academic corpus contained 13,000 academic journal articles and magazine articles with an academic orientation in the following nine subject areas: Business and Finance; Education; History; Humanities; Law and Political Science; Medicine and Health; Philosophy, Religion and Psychology; Science and Technology; and Social Science. The AVL is available in three formats: a word family list of 1,991 words, a list of 3,015 lemmas and a list of 20,845 word types. The coverage of the AVL over the academic COCA and the academic sections of the BNC is 14% (Gardner & Davies, 2014). This coverage is higher than the coverage of Coxhead's AWL over her academic written corpora (10%) and than that found in most of the corpus-based studies of the AWL in academic writing discussed earlier. The difference in coverage comes from general high frequency words, such as so and because, meeting the selection criteria for the AVL.
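A ratio-based comparison can be sketched as follows: normalise each word's frequency per million words in the academic corpus and in a non-academic comparison corpus, then keep words whose academic rate is sufficiently higher. The cut-off of 1.5 used here is an assumption for illustration, Gardner and Davies also applied range, dispersion and discipline measures that this Python sketch leaves out, and all frequencies are invented.

```python
def per_million(freq, corpus_size):
    """Frequency normalised to occurrences per million running words."""
    return freq / corpus_size * 1_000_000


def ratio_candidates(acad_freq, nonacad_freq, acad_size, nonacad_size, min_ratio=1.5):
    """Words whose normalised academic frequency is at least min_ratio times
    their normalised non-academic frequency (min_ratio is an assumed value)."""
    selected = []
    for word, f_acad in acad_freq.items():
        acad_rate = per_million(f_acad, acad_size)
        nonacad_rate = per_million(nonacad_freq.get(word, 0), nonacad_size)
        # A word absent from the comparison corpus has an unbounded ratio.
        if nonacad_rate == 0 or acad_rate / nonacad_rate >= min_ratio:
            selected.append(word)
    return sorted(selected)


acad = {"analysis": 300, "the": 50_000, "hypothesis": 50}
nonacad = {"analysis": 200, "the": 100_000, "hypothesis": 0}
print(ratio_candidates(acad, nonacad, 1_000_000, 2_000_000))  # ['analysis', 'hypothesis']
```

Because the test is relative rather than absolute, a general high frequency word can pass if it is proportionally more frequent in academic writing, which is consistent with words such as so and because appearing in the AVL.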

A multiword unit approach to general academic vocabulary

Some examples of corpus-based multiword unit research in general academic vocabulary have already been discussed in Chapter 4, including Biber (2006), Ackermann and Chen's (2013) ACL, Durrant's (2009) research into collocation in academic texts, and Simpson-Vlach and Ellis's (2010) Academic Formulas List. Liu (2012) investigated a range of multiword units in academic corpora, drawing on earlier research to find out more about lexical bundles, idioms and phrasal verbs for general academic purposes. For this study, Liu used the academic sections of the COCA (nearly 83 million running words) and the BNC (over 15 million running words). He interrogated these corpora using existing lists of general academic lexical bundles (Biber et al., 1999; Carter & McCarthy, 2006; Simpson-Vlach & Ellis, 2010), phrasal verbs (Biber et al., 1999; Gardner & Davies, 2007), several of his own earlier studies, and idioms sourced from a number of dictionaries. Biber et al.'s (2004) categorisation system was used to qualitatively identify the functions of the lexical bundles (discourse organising bundles, referential bundles and stance bundles – see Chapter 4 for more on this method). Liu (2012) also provided three frequency-based bands of 228 frequent multiword constructions: the first band contains 77 constructions which occurred more than 100 times in the corpora, the second band contains 85 constructions which occurred 50–99 times and the third band contains 67 items which occurred 20–49 times. Examples from the most frequent band include such as (det + N), as well as (det + N), NP suggest that, according to (det + N) and (be) based on (det + N). These examples show again the prevalence of high frequency vocabulary in academic texts, and that high frequency words often make up high frequency multiword units.
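Liu's three bands amount to simple frequency cut-offs. The minimal Python sketch below assigns constructions to bands under assumed boundaries: because the text gives "more than 100" for the first band and "50–99" for the second, an exact count of 100 is ambiguous, and the sketch places it in band 1. All frequencies and the placeholder construction names are invented.

```python
def liu_band(freq):
    """Return 1, 2 or 3 for a construction's frequency band, or None.

    Assumed boundaries: band 1 is 100 or more occurrences, band 2 is 50-99
    and band 3 is 20-49. Anything below 20 falls outside the bands.
    """
    if freq >= 100:
        return 1
    if freq >= 50:
        return 2
    if freq >= 20:
        return 3
    return None


def group_by_band(construction_freqs):
    """Group constructions by band, dropping those below the lowest cut-off."""
    bands = {1: [], 2: [], 3: []}
    for construction, freq in construction_freqs.items():
        band = liu_band(freq)
        if band is not None:
            bands[band].append(construction)
    return bands


# Invented frequencies; 'construction B/C/D' are hypothetical placeholders.
freqs = {"such as (det + N)": 640, "construction B": 75,
         "construction C": 22, "construction D": 5}
print(group_by_band(freqs))
```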

Specialised vocabulary in undergraduate studies

A particular difficulty with research in this field is deciding whether it fits into academic or occupational purposes. The examples of research in this section focus on subject-specific research in undergraduate studies. Chapter 7 contains research on English for Professional or Occupational Purposes. In some cases, this division is somewhat arbitrary. Aviation, for example, is in Chapter 7. Medicine is in both this chapter (research into Medicine as an area of academic study) and the next (Medicine in occupational settings, such as workplace communication).

Sciences

Studies of specialised vocabulary in the Sciences in university contexts have found lower coverage by high frequency words than in secondary school texts (see Coxhead et al., 2010). In Chapter 2, we looked briefly at Coxhead and Hirsh's (2007) Science List for EAP. The genesis of this research was the lower coverage of West's GSL (1953) and Coxhead's (2000) AWL over the Science subcorpus in Coxhead's (2000) study. This lower overall coverage suggested that, although Science texts share a large amount of vocabulary with the other disciplines, they also contain vocabulary of their own. Science has been noted in the literature for its differences in lexis. Biber's (2006) study of lexical bundles in university textbooks found higher percentages of bundles in Natural Science than in Social Science, Business, Engineering and Humanities. Biber attributes this greater reliance on lexical bundles in Natural Science to the heavy technical content of the textbooks. Taken together, this research indicates that the vocabulary of Science is an important area to focus on, particularly when it comes to preparing pre-university Science students to read Science textbooks and other texts in their university studies. Coxhead and Hirsh (2007) started with the Science subcorpus from Coxhead's (2000) AWL study, which contained 750,000 running words in seven subject areas, and decided to enlarge it. They added seven more Science subjects, doubling the size of the corpus (see Chapter 2). One purpose of the study was to find out where a general academic word list such as the AWL might stop and a Science word list might begin. This study differs from subject-specific studies such as Valipouri and Nassaji's (2013) study of the GSL and AWL in Chemistry research articles (see the following), in that Coxhead and Hirsh (2007) looked across a range of subjects within the Sciences rather than focusing in depth on one particular subject.
The second purpose of the study was to develop a Science-specific word list. The 318 word families in Coxhead and Hirsh's (2007) EAP Science List covered 3.79% of the Science corpus, but 3.06% of the original seven subjects of the AWL corpus (Biology, Chemistry, Computer Science, Geography, Geology, Mathematics and Physics) and 4.52% of the seven new subject areas (Agricultural Science, Ecology, Engineering and Technology, Horticultural Science, Nursing and Midwifery, Sport and Health Science, and Vet and Animal Science). This finding suggests that the Science list offers a better return for learning in these newer subject areas. Frequency analysis gave quite an interesting picture of where the AWL might stop and a Science-specific list might start. The most frequent 300 word families of the AWL (which contains 570 word families in total) covered 7.1% of the Science corpus, whereas the Science list (318 word families) covered 3.71%. Breaking the whole lists down into their sublists and looking at the coverage figures, Coxhead and Hirsh (2007) found that Sublist 1 of the AWL, which contains the most frequent 60 word families, covered 2.87% of the Science corpus and Sublist 1 of the Science list covered 2.01%. Together, then, these 120 word families cover almost 5% of the corpus. The AWL sublists continue to have higher coverage over the Science corpus than the EAP Science sublists, while both lists drop in coverage following a basic Zipf (1935) pattern (see Chapter 3); that is, the coverage roughly halves with each sublist. Coxhead and Quero (2015) investigated the coverage of the Science list over a corpus of ten million running words of Medical textbooks to see whether the list offered any value for medical students preparing for their studies. The coverage of the EAP Science List over the Medical corpus was 5.98%, which suggests that this list contains lexical items which deserve attention in this specific area of study. The next section moves into more subject-specific research on vocabulary in the Sciences at university.
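The sublist arithmetic above can be made explicit. The Python sketch below adds the two Sublist 1 figures (which assumes the AWL and the EAP Science List share no word families) and builds an idealised profile in which each sublist covers exactly half as much as the one before; real sublists only roughly follow this Zipf-like pattern, so the profile is a model, not reported data.

```python
def halving_profile(first_sublist_pct, n_sublists):
    """Idealised coverage of each sublist if every sublist covers half as
    much as the one before it (a simplification of the Zipf pattern)."""
    return [first_sublist_pct / 2 ** i for i in range(n_sublists)]


awl_sublist1 = 2.87      # AWL Sublist 1 over the Science corpus (Coxhead & Hirsh, 2007)
science_sublist1 = 2.01  # EAP Science List Sublist 1 over the same corpus

# Summing assumes the two lists do not overlap.
print(round(awl_sublist1 + science_sublist1, 2))  # 4.88, i.e. 'almost 5%'
print(halving_profile(awl_sublist1, 3))
```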

Agriculture

Martínez et al. (2009) carried out a corpus-based study of specialised vocabulary in research articles in the agricultural sciences (the AgroCorpus) to find out more about the AWL (Coxhead, 2000) and items outside that list. The corpus contained 826,416 running words from 218 research articles written by academics based at English-speaking universities and published between 2000 and 2003. The analysis included a comparison across the sections of the articles: introduction, method, results and discussion. Eight AWL types covered 0.4% of the AgroCorpus: data, analysis, significant, similar, significantly, area, response and concentration. The highest proportion of AWL items was in the discussion sections. Martínez et al. (2009) suggest that discussion sections of research articles tend to be more argumentative (cf. Martínez, 2003), which could be a reason for the higher use of AWL vocabulary. Discussion sections also tend to be longer than method or results sections, which gives these lexical items more opportunity to occur. The distribution of lexis across the other sections varied, with less variation and fewer AWL items in results sections, and more variation in lexis in methods sections. Table 6.3 shows examples of AWL items across the sections of the AgroCorpus. A comparison of these findings from Agriculture with Chen and Ge's (2007) medical study (see the following), Coxhead's (2000) general academic vocabulary research and Hyland and Tse's (2007) analysis of lexis across disciplines and in student and professional writing revealed that relatively little vocabulary is shared between the corpus-based findings of all four studies. This finding is not entirely surprising, given the different research goals and corpora involved in each study. Martínez et al. (2009, p. 196) conclude that their study shows 'that the more specific the corpus, the greater the specificity of items, and consequently, the lower the variability.'

Table 6.3 Most frequent academic word families in the sections of the AgroCorpus (Martínez et al., 2009, p. 190)

RA Introduction: response, area, environmental, stress, variation, interactions, region, factors, potential, available

RA Methods: analysis, data, site, area, method, medium, estimated, extracted, conducted, culture

RA Results: significantly, data, similar, sites, sequence, response, analysis, concentration, indicated, range

RA Discussion: similar, significant, response, sites, data, area, environments, indicate, stress, accumulation
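Martínez et al.'s point that discussion sections are longer suggests a check worth making before comparing sections: raw counts favour long sections, so counts are often normalised, for example per 1,000 running words. The Python sketch below uses invented counts; normalising in this way is common corpus practice and is not necessarily the procedure Martínez et al. followed.

```python
def rate_per_thousand(hits, section_sizes):
    """AWL tokens per 1,000 running words in each section."""
    return {section: 1000 * hits[section] / size
            for section, size in section_sizes.items()}


# Invented figures: the discussion has more raw hits partly because it is longer.
awl_hits = {"introduction": 180, "methods": 120, "results": 150, "discussion": 400}
sizes = {"introduction": 9_000, "methods": 8_000, "results": 10_000, "discussion": 20_000}
print(rate_per_thousand(awl_hits, sizes))
```

In this toy example the discussion has more than twice the raw hits of the introduction, yet both sections come out at the same rate per 1,000 words.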

A further step in the study by Martínez et al. (2009) was to select several items from the corpus and carry out a qualitative analysis using a technical dictionary and examples of the lexical items in context and in clusters from the AgroCorpus. A good example is culture, which relates to plant cultivation, as in blueberry cell cultures, cultures were grown, cultures were maintained and cultures were incubated. Martínez et al. (2009) also point out that high frequency words from West's GSL (1953), for example effect and control, occur with specialised meanings in Agriculture in their corpus, and that learners may not know these words before starting their studies. This point provides a counter-argument to the assumption that learners already know the first 2,000 words of English before commencing study of an academic nature (Coxhead, 2000). Further analysis of target words through consultation with dictionaries and experts, as Martínez et al. (2009) have done, enriches quantitative findings and provides more evidence of language in use in specialised subjects.
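Examining items "in context and in clusters" usually starts from a key-word-in-context (KWIC) concordance. The Python below is a minimal sketch that matches exact word forms only (a real concordancer would also handle the other members of a word family); the sample sentence is invented from the phrases quoted above, not taken from the AgroCorpus.

```python
def kwic(tokens, node, window=3):
    """Key-word-in-context lines for every exact occurrence of `node`,
    showing up to `window` tokens on each side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


text = "blueberry cell cultures were grown and cultures were maintained"
for line in kwic(text.split(), "cultures"):
    print(line)
# blueberry cell [cultures] were grown and
# were grown and [cultures] were maintained
```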

Chemistry

Using a corpus of four million words containing 1,185 research articles in Chemistry, Valipouri and Nassaji (2013) focused on developing a Chemistry-based word list for foreign language learners in the field. The corpus covered four main subject areas: Analytical, Organic, Inorganic and Physical/Theoretical Chemistry. The process of developing the Chemistry list included a corpus analysis using selection principles such as frequency and a focus on content words (which meant that abbreviations were excluded from the list). A qualitative analysis of the corpus data was carried out using Chung and Nation's (2003) rating scale (see Chapter 2) to help judge the level of technicality of the items, because Valipouri and Nassaji (2013, p. 252) wanted to exclude items that were too technical in nature. That is, words such as hydroxyl and octahedral were eliminated because they were specific to a particular area of Chemistry and not common to all four areas of the corpus. This step also involved judgements of lexical technicality by three Chemistry professors, and a total of 145 items were eliminated through it. The resulting Chemistry Academic Word List (CAWL) contained 1,400 academic word families and covered 81.18% of the corpus. The ten most frequent words in the CAWL are use, show, react, result, solve, spectrum, can, form, temperature and high, each of which occurred over 9,000 times in the corpus.
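The range part of this reasoning, keeping only items common to all four sub-areas, can be sketched in Python. This is a simplified illustration: Valipouri and Nassaji's actual elimination step also relied on Chung and Nation's rating scale and on expert judgements, which cannot be reduced to a frequency check, and the counts and minimum total below are invented.

```python
def common_to_all_areas(freq_by_area, min_total):
    """Keep words that occur in every sub-area and reach min_total overall."""
    areas = list(freq_by_area)
    vocabulary = set().union(*freq_by_area.values())
    kept = []
    for word in sorted(vocabulary):
        per_area = [freq_by_area[a].get(word, 0) for a in areas]
        if all(n > 0 for n in per_area) and sum(per_area) >= min_total:
            kept.append(word)
    return kept


# Invented counts: 'octahedral' occurs in only one sub-area.
freq_by_area = {
    "Analytical": {"react": 900, "spectrum": 800},
    "Organic": {"react": 1200, "spectrum": 300},
    "Inorganic": {"react": 700, "octahedral": 250, "spectrum": 150},
    "Physical/Theoretical": {"react": 400, "spectrum": 600},
}
print(common_to_all_areas(freq_by_area, min_total=100))  # ['react', 'spectrum']
```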

Computer Science

Lam (2001) researched the impact of new Computer Science vocabulary on first-year students at a Hong Kong university. With a group of 425 first-year Computer Science students, Lam (2001) used qualitative methods, including interviews, computer usage exercises and think-aloud techniques, along with quantitative methods such as multiple-choice questionnaires, to find out more about vocabulary in this subject area. Table 6.4 shows some examples of general, semi-technical and technical vocabulary in Computer Science according to Lam (2001).

Table 6.4 Examples of general, semi-technical and technical vocabulary in Computer Science from Lam (2001, p. 28)

General vocabulary: employee, is, wish, maintain, six, target

Semi-technical Computer Science vocabulary: child, dialogue, mouse, parent, tree, view

Technical Computer Science vocabulary: fortran, gotrue, hipo, keyboard, megabyte, tuple

Note that some examples in the semi-technical category of Table 6.4, such as parent and tree, might appear to be everyday vocabulary, but they also have a specific meaning in Computer Science. The learners in Lam's (2001) study reported that while dictionaries were useful for developing an understanding of semi-technical vocabulary, they used subject-specific glosses to help with their learning.

Table 6.5 Most frequent technical items and meanings in Computer Science from West's second 1,000 words of the GSL (Radford, 2013, p. 32)

GSL word: technical meaning

bit: binary
double: an integer data type
host: a computer
list: a data structure
print: a term for outputting data from a program
program(s): in the computer programming sense
root: either the 'superuser' on a Unix system, the top-level node of a tree-like data structure, or the mathematical square root
row: in a data structure
string: a data type

Radford (2013) drew on his many years of study and teaching in Computer Science to identify lexical items with a technical meaning in his corpus of teaching materials. Table 6.5 shows examples of high frequency specialised vocabulary in Computer Science (occurring over 200 times in the corpus) and their meanings. These lexical items also occur in the second 1,000 words of West's GSL (1953). Both Lam and Radford contend that it is the semi-technical words in Computer Science with which second language learners struggle most during their university studies. The 'fully' technical words, on the other hand, require effort from all learners, native and non-native speakers of English alike, and they include a range of content words, such as proper nouns, acronyms and abbreviations.

Specialised vocabulary in Medicine

A key feature of English for Medical Purposes is a large technical vocabulary, including a substantial proportion of lexical items from Latin and Greek (Ferguson, 2013). A range of studies have used corpora to find out more about frequently used lexical items in Medicine, mostly with the aim of identifying vocabulary worth focusing on for second language learners studying Medicine. This research is important because Chung and Nation (2003) found that one word in three in an Anatomy textbook was technical. Such a high proportion of technical vocabulary suggests that learners in this field face a large vocabulary learning task. Chen and Ge (2007) used a corpus of medical research articles to investigate the frequency and distribution of the AWL (Coxhead, 2000) word families. They found that the AWL covered 10% of their medical research article corpus, but that only 51.1% of the AWL word families occurred with the frequency needed to be selected for a Medical word list. This result is not surprising, because the AWL was not developed from a corpus of Medical texts; instead, it was designed for general academic purposes. Chen and Ge (2007) also compared the coverage of the AWL across the sections of their research articles (abstract, introduction, materials and methods, results, and discussion) and found that the AWL covered just under 10% of the materials and methods and results sections, but more than 10% of the other sections. This study shed light on how a general academic word list such as the AWL might be useful to some degree for students studying Medicine, but there is a much larger group of words outside the AWL which are important for specialisation in this field. A follow-up study of Medical vocabulary was carried out by Wang, Liang and Ge (2008), who developed a Medical Academic Word List (MAWL).
This word list was created using a 1.09-million-word corpus, again made up of research articles with the introduction, method, results and discussion structure. The corpus covered 32 subject areas, including Anesthesiology and Pain Medicine, Medicine and Dentistry, Cardiology and Cardiovascular Medicine, Nephrology and Clinical Neurology. The principles used in making this list included specialised occurrence of lexis (outside the first 2,000 words, represented by West's, 1953, GSL), range and frequency. Two subject experts were consulted to check the items in the word list. The resulting MAWL contains 623 word families and covers 12.24% of the source corpus. The MAWL and AWL overlap by 55%, and the Wang et al. (2008) Medical list includes general academic lexical items such as previous and data, as well as specialised Medical lexis such as vein and lesion.

A more recent study by Lei and Liu (2016) used a Medical textbook corpus (3.5 million running words) as a reference corpus and a corpus of medical journal articles (2.7 million running words) as a comparison corpus in order to develop a lemma-based Medical word list: the Medical AVL. This project combined selection and checking procedures from Gardner and Davies (2014) and Coxhead (2000). The selection process began with frequency of occurrence in the Medical corpus, followed by a comparison of frequency in the BNC and the Medical corpus. Other selection principles included range of occurrence, dispersion across texts, a measure to determine whether items were discipline-specific, and a final dictionary check for high frequency words and specialised meanings. The resulting word list contains 819 lemmas. Lei and Liu (2016) checked the coverage of their list over three corpora (general, academic and medical) to check whether it was more specialised than general in nature. The list contains part-of-speech information for each item (for example, adult_a and adult_n), and lemmas are also notated by part of speech: alter_v, alteration_n, altered_a, alternatively_r (adverb).

Reflecting the kind of reading that medical students in Taiwan are required to do, Hsu's (2013) study drew on a 15-million-word corpus of 155 Medical textbooks in 31 subject areas, together with Nation's (2006) BNC lists. A key focus of Hsu's (2013) study is the gap between semi-technical and highly technical vocabulary in Medicine, particularly for medical students in the Taiwanese context. Because a large number of lexical items in the Medical corpus occurred outside the first 14,000 words of the BNC (the lists were available only up to 14,000 at the time of this study), Hsu (2013, p. 467) selected lexical items that occurred outside the first 3,000 words of the BNC lists (Nation, 2006), in more than half of the 31 subject areas of her corpus, and with a frequency of over 800 occurrences. Some of the high frequency items in this word list are diagnosis, renal and syndrome.

An example of a medical study which focuses on one subject area is Fraser's (2007, 2009) research into the vocabulary of Pharmacology, drawing on a 185,000-word corpus of 51 research articles from six main areas of the field. The resulting Pharmacology Word List contained 601 word families which covered 12.91% of the corpus (examples include abnormality, mutation, plasma, saline and toxicity). Fraser (2007) also developed lists of abbreviations and acronyms for Pharmacology. Grabowski (2015) also investigated pharmaceutical lexis using a corpus, concentrating on multiword units (see Chapter 4). This section has focused on studies of specialised vocabulary in Medicine which use various approaches to extract lexis from corpora. This research can be particularly challenging for researchers who do not have medical training, especially when high frequency words have a meaning that occurs only in the medical context (Hsu, 2013). Examples of such specialised vocabulary include susceptible to colds, tear duct, anesthesia induction agents, dilate pupils and cerebrospinal fluid (Hsu, 2013, p. 467).
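Hsu's (2013) three selection criteria (beyond the first 3,000 BNC words, present in more than half of the 31 subject areas, more than 800 occurrences) can be expressed as a single filter. The Python below is a minimal sketch. It represents the BNC lists as 1,000-word bands (band 1 being the first 1,000 words), and every frequency, range figure and band assignment in the sample data is invented for illustration.

```python
def hsu_style_selection(freq, areas_with_word, bnc_band,
                        n_areas=31, min_freq=800, top_bands_excluded=3):
    """Select items beyond BNC bands 1-3 (the first 3,000 words) that occur in
    more than half of the subject areas and more than min_freq times."""
    selected = []
    for word, f in freq.items():
        band = bnc_band.get(word)  # None: not found in the BNC lists
        beyond_general = band is None or band > top_bands_excluded
        wide_range = len(areas_with_word.get(word, set())) > n_areas / 2
        if beyond_general and wide_range and f > min_freq:
            selected.append(word)
    return sorted(selected)


# Invented data for illustration only.
freq = {"diagnosis": 5000, "renal": 3000, "patient": 20000, "tibial": 900}
areas_with_word = {"diagnosis": set(range(25)), "renal": set(range(18)),
                   "patient": set(range(31)), "tibial": set(range(4))}
bnc_band = {"diagnosis": 4, "renal": 7, "patient": 2, "tibial": 12}
print(hsu_style_selection(freq, areas_with_word, bnc_band))  # ['diagnosis', 'renal']
```

Here "patient" fails the first criterion because it falls within the first 3,000 words, and "tibial" fails on range.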

Engineering

Like Medicine, Engineering has been an area of research into specialised vocabulary, and the majority of this research is corpus-based. Several studies have used student textbooks as a corpus for identifying specialised vocabulary to support student learning. Chapter 2 outlined a study (Ward, 2009) which discussed the development of a Basic Engineering List for university students in Thailand. This exercise created a fairly powerful list of 299 words (such as equation, process, show and temperature) which covered 11.3% of the Engineering textbook corpus. Another study of Engineering was carried out by Mudraya (2006), who developed an Engineering Academic Word List made up of 1,200 word families based on a corpus of 12 textbooks from 13 disciplines, totalling nearly two million running words. This study also focused on learners in Thailand and identified 1,260 word families for students to focus on in their studies, including items such as all, force, form, give, part, point, show, problem and work. A study based in Taiwan by Hsu (2014) used a textbook corpus (4.57 million running words; 100 textbooks; 20 subject areas) to develop an Engineering English Word List (EEWL). The first 2,000 words of English were excluded from this study, and other selection principles included range, frequency and uniformity. A total of 729 word families met the selection criteria, covering 14.3% of the corpus. Hsu (2014) identified lexical items that occurred in some parts of the corpus but not others. For example, membrane and enzyme are high frequency items in Biomedical, Biochemical, Biotechnology and Marine Engineering but do not often occur in Communication, Mechanical and Civil Engineering.

In a recent paper, Watson-Todd (2017, p. 35) focuses on specialised vocabulary in Engineering through a corpus analysis which involves both quantitative and qualitative approaches. The first steps involve identifying Engineering vocabulary in a corpus, and the next steps involve investigating the 'opaqueness' of these words. 'Opaque' words are defined as 'words which do not have their usual meaning'. This concept is discussed in Chapter 2 in the section on everyday words as specialised vocabulary. Watson-Todd's research took place in Thailand and focused on the English vocabulary needed by Engineering students. His corpus contained 27 textbooks (approximately 1.15 million words). The initial steps for identifying technical vocabulary were a keyword analysis comparing the Engineering corpus with the BNC and the application of selection criteria such as removing abbreviations. These first steps produced a list of 186 candidates for the opacity analysis. The overlap between this list and Ward's list was 47.31%, and the overlap with the first 100 word families of Mudraya's list was 39.78%. The top ten words in the 186-word list from Watson-Todd (2017, p. 38) are determine, flow, figure, temperature, energy, force(s), pressure, function(s), equation(s) and shown. For the opacity analysis, Watson-Todd (2017) used the corpus to compare the meanings of the 186 words in context with their meanings in online dictionaries commonly used by students.
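The overlap percentages above are consistent with a denominator of 186 candidates: on that assumption (the text does not state the denominator), 47.31% corresponds to 88 shared items and 39.78% to 74. The Python sketch below reproduces this arithmetic with placeholder words rather than the real lists.

```python
def overlap_pct(candidates, other_list):
    """Percentage of the candidate list that also appears in other_list."""
    cands = set(candidates)
    if not cands:
        return 0.0
    return 100 * len(cands & set(other_list)) / len(cands)


# Placeholder words standing in for the 186 candidates and a comparison list.
candidates = [f"word{i}" for i in range(186)]
ward_like = [f"word{i}" for i in range(88)] + ["extra"]  # hypothetical: 88 shared items
print(round(overlap_pct(candidates, ward_like), 2))      # 47.31
```

Note that the measure is asymmetric: dividing by the size of the other list instead would give a different figure.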
Using six criteria for selecting candidates for his word list, including part-of-speech and meaning checks against the Engineering corpus, the BNC and several dictionaries, Watson-Todd (2017) identifies items such as the Engineering noun constant as having a high opacity rating, because it has the specialised meaning 'fixed number'. The meaning-based analysis is illustrated well by the contrast between note (n) with a low opacity rating, meaning 'brief record', and note with a high opacity rating, meaning 'notice'. Table 6.6 contains examples of words and their opacity ratings from Watson-Todd (2017). Note that this table also includes collocations to the left and right of the target word, and examples from the corpus. The final list of opaque Engineering words contains 41 items which are less general and more oriented towards Engineering (Watson-Todd, 2017, p. 37).

Table 6.6 Examples from Watson-Todd's (2017, p. 36) opacity-ranked Engineering word list

Word (opacity rating): meaning in the Engineering corpus

constant (6): fixed number
note (6): point out
acting (5): influencing
tree (3): hierarchy
force (2): strength

Patterns of collocation and examples from the Engineering corpus (as extracted): a spring has a force constant k note that when the -that element is supplying energy acting the external forces acting on loads -; the system have no forces -; horizontal components spanning -; can be represented by search -; the binary binary tree in figure 2 driving-; the conservative frictional -; gravitational gravitational force exerted by the -exerted surface is a -; the -k
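Watson-Todd's first step was a keyword analysis comparing the Engineering corpus with the BNC. The text does not say which keyness statistic was used, so the Python sketch below assumes the log-likelihood measure commonly used in corpus linguistics for this kind of comparison; the counts are invented. Higher values indicate a larger frequency difference between the corpora, but the statistic itself does not show direction, so a real analysis would also check that the word is relatively more frequent in the Engineering corpus.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word occurring a times in a study corpus of
    c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Invented counts: 100 hits in a 10,000-word study sample,
# 100 hits in a 100,000-word reference sample.
print(round(log_likelihood(100, 100, 10_000, 100_000), 2))
```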

The case of Applied Linguistics

Several studies have investigated the lexis of Applied Linguistics, beginning with Chung and Nation's (2003) use of a scale to identify technical vocabulary in a textbook (see Chapter 2). Chung and Nation (2003) found that just over 5,000 types in the 93,445-word corpus were technical, accounting for about 20% of the Applied Linguistics textbook. Just over 40% of this technical vocabulary was in the first 2,000 words of West's GSL (1953), 17.4% was also in Coxhead's AWL (2000), 16.3% was technical and 24.5% was low frequency vocabulary. In other words, this technical vocabulary occurred at many different frequency levels in English. Figure 6.1 shows a portion of the Applied Linguistics textbook (Ellis, 1999) from Chung and Nation (2003), where words in normal type are high frequency words, AWL words are bolded (e.g. interaction and input), words in italics are low frequency (e.g. pedagogy and interpersonal) and underlined words are technical. The sample in Figure 6.1 illustrates Chung and Nation's (2003) estimate that 20%, or one word in five, is technical. It also shows the repetition of key technical words in context, and how this text contains technical vocabulary relating to both education and research. This study took place before Nation's (2006) BNC frequency lists and the newer academic and general word lists which have appeared since 2014 were available, so a further study could map these newer lists onto the data from the Chung and Nation (2003) research.

Figure 6.1 A sample of the Applied Linguistics text showing the various kinds of words (Ellis, 1999, p. 1)
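The frequency-level breakdown reported by Chung and Nation (just over 40% in the GSL, 17.4% in the AWL, and so on) is a profile of technical types across word lists. A minimal Python sketch follows. The level labels for interaction, input and pedagogy follow the Figure 6.1 description, while the level for form, the treatment of interlanguage as unlisted, and the short list of technical types are assumptions for illustration.

```python
from collections import Counter


def level_profile(technical_types, level_of):
    """Percentage of technical types at each frequency level; level_of maps a
    type to a level label, and unlisted types are counted as 'not in lists'."""
    if not technical_types:
        return {}
    levels = Counter(level_of.get(t, "not in lists") for t in technical_types)
    total = len(technical_types)
    return {level: 100 * n / total for level, n in levels.items()}


technical = ["form", "input", "interaction", "pedagogy", "interlanguage"]
level_of = {"form": "GSL 1-2,000", "input": "AWL", "interaction": "AWL",
            "pedagogy": "low frequency"}
print(level_profile(technical, level_of))
```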

Two other studies which focus on Applied Linguistics are Vongpumivitch et al. (2009) and Khani and Tazik (2013). Both studies developed corpora of research articles in order to analyse the presence of the AWL in the corpus and to identify words outside that word list which could be candidates for a specialised word list. Both studies found that the AWL covered over 11% of their Applied Linguistics texts, in contrast to Chung and Nation's finding of 6.9% coverage over the Applied Linguistics textbook. Vongpumivitch et al. (2009) developed a corpus of 200 research articles from five journals and identified 128 non-AWL words, including items that relate to this specialised area of education (for example, metalinguistic, morphology/morphological and phonological) and to research (for example, ANOVA, correlated/correlation(s) and longitudinal). Khani and Tazik (2013) also drew on academic journal articles for their research, this time downloading 240 articles from 12 journals to make a corpus of 1,553,450 running words. They identified 773 types which cover 12.48% of their corpus. Fifteen of the top 20 words in their list are AWL words, for instance research, text, data and task. The five non-AWL words in the top 20 are discourse, classroom, linguistic, corpus and proficiency. These studies all illustrate the relationship between vocabulary in Applied Linguistics, research and language education. They also illustrate the broad variety of lexis which occurs in this area of the Arts and Humanities.

Limitations of specialised vocabulary research in EAP

There is often overlap between academic and professional purposes, which can make it difficult to decide where a piece of research fits. To help make this kind of decision, it is important to consider the purpose of the research, since ESP is often driven by student needs. A good example is the work on Engineering, which can be divided between Engineering for academic purposes and Engineering for professional purposes. Research on Engineering for academic purposes, such as the work by Ward (1999) and Hsu (2014), focuses on the vocabulary of written texts for learners at university, using textbooks to identify specialised vocabulary. The needs of Engineering students once they leave university and enter the workforce would shift the focus to specialised vocabulary for professional purposes, perhaps with more attention to professional reading such as research articles. Another example is Medical English, where students begin with foundational studies in subjects such as Biology and Chemistry, and the following years of education move into more specific areas of study and specialisation. After university, medical purposes also move into professional English. This is why some work on Medicine and Engineering is in this chapter, and some is in the following chapter on specialised vocabulary in the professions. The Humanities have not been as thoroughly researched in university vocabulary studies. The hard-pure and hard-applied Sciences seem to have attracted a great deal of attention, possibly because of the current push for Science, Technology, Engineering and Mathematics (STEM) and the high enrolment of international students and second language speakers of English in STEM areas. That said, while areas such as Design, Education and Criminology do not seem to have attracted much research activity, many undergraduate areas of the Sciences, such as Biology, Chemistry and Psychology, have also received little attention.

Conclusion

This chapter has focused on general and specialised vocabulary in EAP and university contexts. Overall, this area of research has been particularly productive, as researchers use corpus-based research to find out more about the lexis of general EAP, and, increasingly, about vocabulary in the disciplines.

Chapter 7 Specialised vocabulary research and the professions

Introduction

This chapter focuses on English for Professional Purposes and English for Occupational Purposes. Basturkmen (2010, p. 6) draws some helpful distinctions in ESP. Using her examples, English for Professional Purposes includes English for the health-care sector as English for General Professional Purposes, and English for Nursing as English for Specific Professional Purposes. In English for Occupational Purposes, Basturkmen gives English for the hospitality industry as an example of English for General Occupational Purposes, and English for hotel receptionists as an example of English for Specific Occupational Purposes. The chapter begins with a discussion of why researchers might investigate the specialised vocabulary of English for Professional and Occupational Purposes. It then covers research into professional vocabulary in Aviation, Legal English, and Business and Finance, and into occupational vocabulary in Medical Communication and Nursing. These areas are drawn upon to illustrate the potential and limitations of vocabulary research for the professional and occupational spheres.

Why investigate specialised vocabulary in the professions?

The relationship between knowledge of a profession or occupation and its specialised vocabulary is very strong. Students need to know the vocabulary of their field well in order to function as professionals. Peters and Fernández (2013, p. 236) make the point that ‘since professional decisions and judgements hang upon their [learners’] command of specialised language, there is a strong incentive for learners to grapple with the challenges of bridging the gap between their L1 and L2, and develop the necessary lexicon.’ Learners often consult specialised dictionaries to support their learning, as was the case in Peters and Fernández’s (2013) study of Spanish students of Architecture. The students in this study sought support for three types of vocabulary in their field (p. 240). The first was lexical items that were specific to architecture and building, such as duct and cladding. The second was lexis that was shared with other academic areas of study and disciplines (general academic vocabulary), such as resources and stress. The final type was everyday words for materials used in the building process, such as measure and straw. So specialised vocabulary in the professions is important because it is essential for learners and it can cover different types of vocabulary within one field. Professional purposes vocabulary is also important because it is needed for communication between professionals and between professionals and laypeople. At the personal level, everyday types of encounters with professionals in the health area can remind us of the importance of clear communication between experts and non-experts. Medical staff need to know the technical vocabulary of their field as well as the everyday or layman’s terms for the same diseases or conditions, so that they can communicate with patients and their families and support people. 
When a basketballer in New Zealand suffered an eye injury on the court in early 2017, he was interviewed on the radio about what the medical staff had said. His response was basically that the doctors had said ‘a bunch of big technical words and then a bunch more’. But he understood enough to know that his sight was going to be fine and the injury was not permanent. Another good example, from Dentistry, is the use of the word tartar in advertising and everyday English to mean hardened plaque on teeth. Tartar and plaque are more likely than the technical term calculus to be used in conversation with a non-specialist audience or in advertising. Not all specialised vocabulary is a matter of life and death, but it is important to know when to use it, how to use it and what other vocabulary can be used in its place as necessary. The characteristics of specialised vocabulary make it important for research in professional contexts. One characteristic is the sheer amount of specialised vocabulary in a field of study; it potentially represents up to a third of the vocabulary needed in any field. As mentioned in earlier chapters, Chung and Nation (2003) found that one word in three in an Anatomy textbook was technical. Another characteristic is that these technical words are not all long and complicated. They can be high, mid and low frequency vocabulary (Nation, 2016), and they can be single words or multi-word units. Examples in Aviation (Aiguo, 2007) include black box, sniffer dog, base leg, downwind leg, crosswind leg and upwind leg. Some of this lexis may be shared between professions, while some may be limited to a particular field. All of these points can present difficulties for researchers who are experts in Applied Linguistics but not in the field they are researching. It is also important to find out what aspects of specialised vocabulary might be needed in a field. For example, writing for publication in English for Academic Professional Purposes (Belcher, Serrano & Yang, 2016; J. Flowerdew, 2015) demands a high level of academic subject knowledge and English language proficiency. Verdaguer, Laso and Salazar (1996) developed a corpus of Biology, Medicine and Biochemistry texts to use as a reference tool for Spanish Biomedical academic writers needing to publish in English. Their main focus was ‘high frequency non-specialised lexical items and phraseology, which pose the main difficulties to researchers whose mother tongue is not English’ (p. 22). This response to a need for non-specialised vocabulary indicates how important lexis is in professional academic writing in English.
Another example of such support comes from Cheng (2014), who draws on a multi-disciplinary corpus of academic research articles to identify common collocations used in different sections of published texts, including introductions, discussions and results sections. In Aviation, pilots and air traffic controllers rely on spoken communication and need to know what vocabulary they can use in standard situations and what to use in the event of an emergency. Let’s now look more closely at specialised vocabulary in Aviation.

Aviation

Vocabulary in Aviation English is very tightly controlled and is divided into phraseology and plain English (Moder, 2013). This lexis is prescribed by the ICAO (ICAO Document 4444 Air Traffic Management, cited in Moder, 2013). Estival, Farris and Molesworth (2016) provide a basic definition of Aviation English, saying that it ‘is generally considered to consist of prescribed exchange formats, standard phraseology, which is defined as prescribed vocabulary and syntax, and specific pronunciation. Each of these elements is an attempt at solving problems of communication that could be critical for safety’ (p. 15).

Air traffic controllers and pilots are required to have sufficient vocabulary to communicate both in standardised language and in ‘plain’ language. As mentioned earlier, Aviation professionals have to be able to communicate in unexpected situations, and through radio-based communication rather than face-to-face (Moder, 2013). The routine communication of Aviation English involves sharing information from written documents and radar displays (air traffic controllers) and aviation instrumentation (pilots). Vocabulary presents a major learning challenge for people who are training to be pilots, not just because there is a great deal of new vocabulary but also because they need to learn to leave out ‘unnecessary words’ in their communication (Estival et al., 2016, p. 25). An idea of how restricted the vocabulary of Aviation English is can be gained from Estival (2016, pp. 37–46), including examples that may cause confusion, such as request (as in ‘I should like to know or wish to obtain’, p. 39) and require (which is ‘not a preference but an operational requirement’, p. 39). Aviation language has been investigated in several ways. Moder and Halleck (2012, cited in Moder, 2013) found that frequent verbs in Aviation follow the stages of a flight and include everyday words such as hold, turn, maintain and control. Lopez, Condamines and Josselin-Leray (2013) carried out a study of ‘standardised official phraseology’ using a reference corpus of 16,821 words of radiotelephony communication for radio controllers. A second corpus was extracted from two manuals and an advanced learner corpus of French controllers and international pilots (77,782 tokens). Lopez et al. (2013) report that the reference corpus contained more nouns than the learner corpus, with acronyms (such as the acronym for calculated take-off time) making up 8.2% of all noun tokens in the written corpus. The spoken data also contained a large number of acronyms, which covered 3.1% of noun tokens. Testing the language skills of pilots is an important area of research. Knoch (2014) carried out a small-scale study in which pilots were asked to rate the performance of test takers in an Aviation test. During the rating process, four of the ten pilots in the study raised a concern about one test taker’s use and knowledge of vocabulary. The judges commented that the candidate seemed able to cope with the scripted or standard language demands in the test; however, the pilot participants were concerned that the candidate would not be able to cope with using plain language to communicate. This lack of flexibility in lexis suggested a lack of technical knowledge. What is interesting in these judgements from a lexical perspective is the interaction between judgements of linguistic ability, level of technical knowledge, and the overall effectiveness of communication. Sufficient language ability combined with sufficient technical knowledge meant sufficiency in total communication. However, sufficient language ability combined with insufficient technical knowledge meant insufficiency in total communication. These insights from the pilots’ judgements led Knoch (2014) to argue that in Aviation, ‘the testing of language and technical knowledge cannot and should not be separated’ (p. 85), because these two elements of professional knowledge are so intertwined. A feature of Aviation English is the challenging environment in which this heavily prescribed specialised vocabulary is used.
In an experimental study of native and non-native English-speaking pilots in flight simulators, Estival and Molesworth (2016) investigated the effect of four conditions on the error rate in communication: the rate of speech of Air Traffic Control (ATC), the information load of ATC communications, pilot workload, and congestion on the radio frequency. For each condition, the pilots flew a baseline flight (for example, a slow rate of speech from ATC with pauses, or a low information load in transmissions from ATC) and a paired flight with the increased challenge (for example, a fast speech rate from ATC with no pauses, or a high information load per transmission from ATC). Estival and Molesworth (2016) found that increased pilot workload affected the communication accuracy of all the pilots in the study. Lexical errors in communication were more likely to be omissions than incorrect items, and incorrect items were most likely to be numbers. Estival and Molesworth (2016, p. 173) conclude from their data that ‘mistakes rarely occur with the limited vocabulary of Aviation English phraseology, but are more frequent with the actual numbers to be transmitted, which are less predictable from context’. In Aviation-related research, Cutting (2012) investigated English for ground staff, based on four groups: security guards, bus drivers, catering staff and ground handlers. These four ‘trades’ have very different roles and communicate with different sets of people. For example, according to Cutting (2012), ground handlers have several main roles, including baggage handling, communication with pilots, the safety and movement of aircraft, and driving the truck which pushes the aircraft back from the gate. The research drew on a range of data gathered through observations and interviews in workplaces in Britain and France. Cutting’s focus was predominantly functional and therefore grammatical. It would be interesting to find out more about the lexis required in these occupations and how much, if any, overlap there might be between the various jobs in the different sections of the same workplace.

Specialised vocabulary for legal purposes

Vocabulary in Legal English has a reputation for being particularly difficult, and recent growth in international law has increased the need for learning this lexis (Breeze, 2015). Breeze comments that more research is needed into Legal vocabulary and the way it functions in Legal texts. Maher (2016) carried out a keyword analysis comparing a corpus of one million running words of postgraduate legal student writing (essays and theses) with the BAWE corpus, to develop a list of semi-technical Legal vocabulary shared across a range of Legal disciplines. As well as uncovering a range of Legal vocabulary, such as court, law, justice and article, Maher (2016) noted preferences in the Legal English writing for particular verbs (e.g. hold, state, note and require). Maher’s (2016) analysis also focused on high frequency words in the data set (that and of) and their functions in the corpus, such as referring to sources and evaluation by the writer. As well as the variety of specialised vocabulary across areas of law and Legal texts, Northcott (2013) notes that a key point in English for Legal Purposes is that different legal systems may require different language. This means that researchers need to take care when selecting texts for a corpus analysis in Law for a particular group of learners or purposes. For example, Marín (2014) focuses on vocabulary from the United Kingdom Supreme Court in her quest to identify Legal vocabulary to support law students in the Spanish context, while Hartig and Lu (2014) focus on plain English and Legal writing by novice and professional writers in law schools in the United States. To identify Legal vocabulary, Breeze (2015) developed a 400,000-word corpus of legal documents downloaded from the Internet. These documents included ten legal text types, for example, contracts of sale, non-competition agreements, merger agreements and lease agreements. A frequency count helped identify the most frequent words in the corpus, and a comparison with the BNC frequency lists cited in Scott and Tribble (2006) indicated that there are major differences between general and Legal English vocabulary. For example, shall, company and agreement occur in the first 20 words of the Legal frequency list, and board, obligations and forth occur in the first 100 words. Breeze (2015, p. 50) also investigated Legal collocations and clusters in her corpus. High frequency items included subject to, in effect, shall have the meaning set forth, and any interest in. Each of the ten areas of the corpus was also investigated for its specialised vocabulary, rather than for general Legal vocabulary.
Table 7.1 contains examples of keywords from Breeze’s analysis. These keywords were identified through comparing the subcorpora with the whole corpus. The keyness factor for these words is 45.

Table 7.1 Examples of specialised vocabulary from Breeze (2015, p. 58)

Non-competition agreements: executive, firm, interests, area, buyer, agreement, damages, exchangeable, service, liquidated

Merger agreements: company, merger, stock, holding, subsidiaries, election, options, shares, holder, stockholder

Loan agreements: bank, borrower, advance, loan, interest, rate, committed, amendment, accounts, principal
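Keyword analyses such as Maher’s comparison of Legal student writing with BAWE, or Breeze’s comparison of each subcorpus with the whole corpus, rank words by how much more often they occur in a target corpus than the reference corpus would predict. One statistic often used for this is log-likelihood (G2). The sketch below assumes that statistic and a conventional significance threshold; the cited studies may have used a different keyness measure or settings, so its values are not comparable with the keyness value reported for Table 7.1, and the toy counts are invented purely for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word occurring a times in a target corpus of
    c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target_tokens, reference_tokens, threshold=3.84):
    """Positive keywords: words relatively more frequent in the target corpus,
    with G2 at or above threshold (3.84 is the chi-square critical value
    for p < 0.05 with 1 degree of freedom), sorted by keyness."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(t.values()), sum(r.values())
    result = []
    for word, a in t.items():
        b = r.get(word, 0)
        if a / c > b / d:  # keep only words over-represented in the target
            g2 = log_likelihood(a, b, c, d)
            if g2 >= threshold:
                result.append((word, round(g2, 2)))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented toy data: 'shall' is 10% of a legal sample but 0.1% of a reference sample.
    legal = ["shall"] * 10 + ["the"] * 90
    reference = ["shall"] * 1 + ["the"] * 999
    print(keywords(legal, reference))  # [('shall', 41.45)]
```

Restricting the output to words that are relatively more frequent in the target corpus mirrors the usual focus on positive keywords; words that are significantly rarer in the target (negative keywords) are discarded here.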

In a practical study of the technical vocabulary of law, Csomay and Petrović (2012) identified instances of Legal terminology in a 128,897-word corpus made up of transcripts from legal television shows (e.g. Law and Order) and movies (for example, A Few Good Men and Runaway Jury), and checked the words in legal dictionaries. They found that technical words, such as bar, constitute, deny, court, document and excuse, covered more than 5% of their TV and movie corpus. Csomay and Petrović (2012) also found that technical vocabulary was not evenly distributed throughout the movies and television programmes. It makes sense that courtroom scenes may well contain more legal lexis than other scenes in such shows. The number of technical word families per movie (22.4) was higher than the number per TV episode (12.2). Examples of high frequency specialised Legal vocabulary in the corpus, such as argue (for example, I’m prepared to argue the motion) and depose/deposition (I’m going to depose Mr Lefkin. The deposition is set for next Thursday afternoon. I’m going to take depositions from all the executives), illustrate how these legal terms are used in context (Csomay & Petrović, 2012, p. 312). This study is particularly interesting because it interrogates a spoken corpus as a resource for encountering specialised vocabulary in context, and illustrates how the research can be used for pedagogical purposes. Attempts to help law students with vocabulary in Legal writing include Hafner and Candlin’s (2007) study in Hong Kong, which used a set of online tutorials, a concordancer and a collocation tool. On the whole, the study found that rather than using these tools for lexical support, the students were more likely to search for examples of the legal documents (for example, affirmation or defence and counterclaim) that they were required to write for their course (Hafner & Candlin, 2007, p. 312). The students’ use of the corpus tools for lexical support was patchy. That said, the students did recognise a specialised Legal corpus as a useful source of language.

Business Studies

Business vocabulary has been investigated in a range of studies, primarily through corpus analysis and with a focus on word list development. Nelson (2000, n.d.) gathered texts for his corpora on ‘writing about business’, including, for example, business books, journals and articles, and ‘writing to do business’, drawing on annual reports, faxes and letters. Mirror spoken corpora were also gathered: ‘talking about business’, containing interviews and business reports on radio and TV, and ‘speaking to do business’, including texts from meetings, speeches and presentations. Nelson carried out a keyword analysis using a smaller version of the BNC as the reference corpus. Table 7.2 shows examples from Nelson’s categorisations of nouns for people, companies, business activities, money and finance, and ‘things’. These words were found more often in the Business corpora than in the BNC. The categorisation was based on a computer analysis and Nelson’s own intuition as an experienced teacher of Business English. Categorisations of this kind can help language learners and teachers identify the kinds of nouns which are common in Business English. These words cut across high, mid and low frequency profiles and include abbreviations and acronyms. Nelson (n.d.) also categorises verbs from his keyword analysis, grouping invest, restructure, underwrite, compete and merge as verbs related to work and business, and announce, relate, motivate, inform and propose as personal and interpersonal verbs. Like Nelson, Crawford Camiciottoli (2007) categorised the 174 technical words she identified, in her case in the Business Studies Lectures Corpus. For example, in the categories of business activities and economic trends, the technical words include failing and deal, while price and cost were categorised as common technical words in the area of increasing business performance.

Table 7.2 Ten examples from Nelson’s (n.d.) keyword categorisations of Business nouns

People: customer, management, contractor, manager, seller/’s, buyer, supplier, distributor, director, shareholder

Companies & Institutions: company/’s, industry, .com, organisation, airline, telecom, plc, EU, CO, subsidiary

Activities: business, delivery, transmission, development, production, communication, operation, installation, competition, implementation

Money/Finance: investment, payment, expense, earnings, economy, revenue, currency, fee, margin, salary

Things: product, information, property, strategy, sector, equipment, material, requirement, opportunity, partnership

Konstantakis (2007) developed a Business Word List by drawing on part of Nelson’s (2000) corpora. Konstantakis (2007) used the Business English textbook corpus of 33 course books (approximately 600,000 words) to identify lexis outside the first 2,000 word families of West’s (1953) GSL and Coxhead’s (2000) AWL. After a process of refinement and categorisation of abbreviations, acronyms and proper nouns, the final word list of 560 word families covered 2.79% of the corpus. Examples from this list (Konstantakis, 2007, p. 98) include telefax, television, telex, tennis, terrific, territory, textiles, prawns, preferably, premises, price-list and printer. Browne and Culligan (2016) developed a Business Service List using a 64.5-million-word business corpus, which contained texts from the BNC, textbooks, newspapers, journals and websites. Their list contains approximately 1,700 high frequency general business English words. The word list is available for download in a range of formats for teaching and research, including lemmatised and frequency versions, as well as ready-made versions for analysis with AntWordProfiler (Anthony, 2016) and Lex Tutor (Cobb, n.d.). Table 7.3 shows the top 20 items in the Business Service List (Browne & Culligan, 2016). Note the range of items, from everyday vocabulary through to more specialised lexis. Nelson (2000) and Scott and Tribble (2006) make the point that business English contains a great deal of general English, with business terms making up a smaller proportion of the text. This point is true of other areas of specialisation as well (Nation, 2013, 2016).

Table 7.3 Top 20 Business Service List words (Browne & Culligan, 2016)

Word            Business Service List rank
mister          1
goods           2
equity          3
dividend        4
portfolio       5
sponsorship     6
inventory       7
transaction     8
non             9
lease           10
hedge           11
distribution    12
premium         13
client          14
impact          15
authority       16
obtain          17
maturity        18
publish         19
sometime        20

Tangpijaikul (2014) investigates English for Business and Economic News in the Thai context. The researcher developed an 890,000-word corpus of business and economic news, arguing that ‘Financial English words that are frequently used in business and economic news are important for business people, traders and economists’ (p. 52). This study involved both a quantitative keyword analysis comparing the specialised corpus with the BNC, and a qualitative rating scale analysis carried out by the researcher and two experienced managers in Marketing and Finance. Tangpijaikul (2014) found that 134 words were agreed upon through the analyses as being technical for business and economic news, including items such as bancassurance, brokerage, capitalisation, demutualisation, populist, shareholder, surge, venture and waiver (p. 65).

Finance

Several studies of Finance have drawn on the Hong Kong Financial Services Corpus, a product of the Research Centre for Professional Communication in English, Department of English, Hong Kong Polytechnic University. This corpus contains over seven million running words of annual reports and earnings calls. Li and Qian (2010) analysed the corpus using existing word lists: Coxhead’s AWL, West’s GSL and Nation’s BNC 1,000. Their coverage figures suggest that the annual reports and earnings calls contained a large amount of high frequency vocabulary. The AWL performed differently across the various areas of the corpus; for example, its coverage in the area of procedures was almost 20%, as opposed to several other areas where coverage was closer to 8%. The most frequent AWL items in the corpus were finance, invest, fund, issue, secure, period, corporate, income, option and require. Neufeld, Hancioğlu and Eldridge (2011) reanalysed the corpus used in the Li and Qian study. These researchers focused on cleaning the corpus and dealing with its non-ASCII characters. The coverage of the AWL and GSL together came to just over 91% in the reanalysed corpus. As Neufeld et al. (2011) note, this reanalysis is helpful because it shows the value of accessible corpora for verifying findings. Moving on from single words to multi-word units in Finance, Cheng (2012) describes the extraction of multi-word units of two to five words from the Hong Kong Financial Services Corpus using a programme called ConcGrams (Greaves, 2009). Examples of such phrases include risk management, management of risk and management of operational risk. In other words, the words in these multi-word units do not need to be adjacent or in a fixed order. High frequency phrases occur regularly in the corpus, as would be expected. Mid-frequency ConcGrams from the corpus include outflow/resources, risk shares and trading treasury (p. 98). In a wide-ranging study of Financial English involving both quantitative and qualitative methodologies, Ha (2015) investigated a written and spoken corpus. The 6,753,212-word corpus contained written annual reports, scripted spoken earnings calls and spoken impromptu question and answer sessions (which follow the scripted earnings call presentations) from four sectors: Banks, Financial Services, Insurance and Real Estate. Ha carried out keyword analyses with the annual reports and a written academic corpus of Computer Science, identifying 1,361 Finance-specific lexical items that covered up to 30% of the corpus. Out of that larger number, Ha (2015) chose 837 words for a technicality analysis (on a five-point scale from least technical to most technical), which involved referring to existing word lists and checking the meanings of the words in both general and financial dictionaries. Of the 837, 802 were rated between least and moderately technical. Nine items were found to be in the most technical group, including accretable, accretive and lien (p. 169). Ha (2015) also rated the technicality of 539 multi-word units extracted from the Financial corpus using a meaning-based categorisation. Examples of moderately technical multi-word units include real estate, fair value, capital expenditures, balance sheet and deferred tax. Very technical multi-word units include common stock, carrying value and mutual fund. This categorisation depended on how far the meaning of each multi-word unit was from the literal meanings of its constituent parts. The inclusion of written and spoken corpora in Ha’s (2015) research is important because spoken vocabulary for professional purposes has been researched less than written vocabulary.
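Cheng’s examples risk management, management of risk and management of operational risk show why a count of contiguous word sequences would treat these as three unrelated strings. The underlying idea of counting words that co-occur within a span, whatever their order and whatever comes between them, can be sketched as below. This is a simplified illustration of the non-contiguous idea, not the ConcGrams algorithm itself: it counts word pairs only, and the window of four words and the stopword set are hypothetical choices rather than ConcGrams settings.

```python
from collections import Counter


def cooccurring_pairs(tokens, span=4, stopwords=frozenset()):
    """Count unordered pairs of content words that occur within `span`
    tokens of each other, in either order and with other words between.
    Each pair of token positions is counted once."""
    pairs = Counter()
    for i, w1 in enumerate(tokens):
        if w1 in stopwords:
            continue
        # Looking only forward covers both orders, because pairs are sorted.
        for j in range(i + 1, min(i + span + 1, len(tokens))):
            w2 = tokens[j]
            if w2 in stopwords or w2 == w1:
                continue
            pairs[tuple(sorted((w1, w2)))] += 1
    return pairs


if __name__ == "__main__":
    text = ("risk management and the management of risk "
            "and management of operational risk")
    stop = {"and", "the", "of"}  # hypothetical stopword set
    pairs = cooccurring_pairs(text.split(), span=4, stopwords=stop)
    for pair, n in pairs.most_common(3):
        print(pair, n)
```

Here all three variants contribute to the single pair (management, risk), which is the kind of pattern a contiguous n-gram count would split apart.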
Salvi’s (2014) study of specialised single-word and multi-word units in English for Professional Purposes gives us insight into the lexical choices of two European political leaders speaking in a time of financial crisis. Not surprisingly, Salvi (2014) finds topic-specific vocabulary such as financial, crisis, economic and euro in common between the speakers, as well as differences, in that one speaker favours growth and the other favours budget. A follow-up analysis of multi-word units in the speeches shows that one speaker used more specific technical units than the other, such as percent of GDP, the sovereign debt crisis and the financial stability board.

Health-care communication

Specialised vocabulary in Medical English has been the subject of a range of EAP research aimed at identifying this vocabulary to support language learners and teachers (see Chapter 6); however, it is vital to also consider the use of this lexis in health communication, that is, how complex medical concepts and procedures are communicated between medical professionals and patients, as well as between members of the profession. In a study carried out in Aotearoa/New Zealand, Franken and Hunter (2012) report that practitioners’ choice of language had an impact on the effectiveness of their communication with patients. In particular, Franken and Hunter (2012) note examples of when vocabulary use was problematic, such as unfamiliar vocabulary, ‘They use words that you’ve never heard before’ (p. 15), and the need to take the audience into account:

The doctors also need to simplify their medical terms to patients. I understand a lot of it but I have been in with doctors with my parents they used medical terms where my mother wouldn’t understand. If they could simplify that, make it simple and understandable for people, maybe we would know then what they are talking about and… this is your job. But they’re all using big medical words. (Franken and Hunter, 2012, pp. 15–16)

Compare these problems with a description of successful communication through considerate language use: ‘She [the nurse] was so good because she would talk in the language that I understand’ (Franken & Hunter, 2012, p. 16). Ferguson (2013) notes that for learners in English for Medical Purposes (EMP) there are many variables, such as the first language background of learners and their existing medical knowledge. Having existing medical training in a first language should facilitate learning specialised vocabulary in the same field in a second language such as English. Dahm (2011) looked into everyday and medical terminology among international medical graduates in Australia to support the development of a medical professional ESP course. Dahm’s (2011) study showed that medical students know Medical language, but that medical communication needs attention because of ‘divergences in the meaning and perception of medical terms’ (p. 187) between medical staff and patients. A feature of communication between doctors and patients in EMP is idiomatic language. Basturkmen (2010) reports on two ways in which idiomatic expressions were used in medical consultations, in a data set gathered by an ESP teacher through observations. One use of idiomatic language was to describe pain (for example, the odd pain, shooting pains), and the other was to describe symptoms (for example, be under the weather and broke out in this red rash). Basturkmen (2006) states that such lexical choices (using non-medical vocabulary instead of medical vocabulary) are an attempt to lessen the distance between medical professionals and patients. Multi-word units were another feature that the ESP teacher noticed in the observations, including expressions doctors might use when offering suggestions or options to patients, such as it would be a good idea to and what I’d like to suggest is… (p. 104). Case studies are another important area of Medical English. Canziani and Mungra (2013) undertook a study of the vocabulary in clinical case studies, based on a small corpus of 200 case histories totalling just under 250,000 words, with the aim of developing a word list. Examples of the areas of Medicine in the corpus include Infectious Diseases, Medicine and Dentistry (General), Nephrology, Obstetrics, Gynecology and Women’s Health, and Oncology. The texts in the corpus varied considerably in length, but the average case history was just over 1,200 words long. Case histories are possibly not very familiar publications for people outside the field. All the case histories in the corpus contained these parts: abstract/introduction/presentation, diagnostic procedure, management of the patient and outcome, and discussion/conclusion.
The study included a comparison of the results with some existing specialised word lists, such as Coxhead’s (2000) AWL and Wang, Liang, and Ge’s (2008) Medical Academic Word List. Some of the most frequent items in Canziani and Mungra’s (2013) academic word list for clinical case studies were patient, diagnose/diagnosis, symptom/symptomatic/asymptomatic, clinic/clinically, infect/infected/infections/infectious, artery/arteritis/arteries, surgery and cardiac/cardiovascular/endocarditis. These words include both medical words which might be in general usage and highly technical words which might not be well known outside the field of Medicine (for example, endocarditis).

A final point to make about specialised vocabulary in medical communication is the presence of proper nouns, such as Parkinson’s, which are used for diseases and conditions. A wealth of medical information stands behind proper nouns such as Stevens-Johnson, which is a form of toxic epidermal necrolysis, as Quero (2015) found in the course of her analysis of Medical textbooks. Abbreviations are also very common in Medical English. Some of these items could readily be mistaken for other words, as in the cases of TEN (toxic epidermal necrolysis) and FISH (fluorescence in situ hybridisation). This section has highlighted that different aspects of specialised vocabulary in health-care communication have been investigated in several ways. The next section focuses on Nursing.

Nursing

Specialised vocabulary in Nursing, according to Bosher (2013), is an important part of accuracy in spoken language in this profession. Elements of lexis which are important for accuracy include understanding shades of meaning among synonyms. For example, Cameron (1998) compares the differences in connotation involved in choosing one of the synonyms belly, stomach or abdomen, depending on the situation. This point suggests that decisions on specialised vocabulary use in Nursing require layers of understanding of general and specialised vocabulary. Part of the decision on word selection depends on the audience, that is, the person a nurse is talking with. Nursing involves patient communication, and lexical choice matters when paraphrasing sometimes quite dense medical information, including the decision whether or not to use specialised vocabulary, as already seen in the earlier example from Franken and Hunter (2012). An example from O’Hagan et al. (2014) demonstrates that the choice to use specialised vocabulary with patients is not always viewed negatively. The researchers used simulated patient recordings of interactions with nurses to explore judgements of nurses’ communication skills. The simulations were assessed by 15 nurse educators and clinicians. In one case, a nurse chose to use technical vocabulary in a discussion with a patient who had presented with asthma in hospital previously. In this case, the use of specialised vocabulary was viewed positively, because the nurse showed sensitivity to the needs of the patient by recognising the patient’s level of knowledge and matching it with the choice of technical and specific vocabulary.

Table 7.4 Examples from Wette and Hawken (2016) of a written formal and informal medical terminology test

Test instructions | Examples
Replace six formal phrases or words with informal equivalents | For example, previous episodes, excise the dead tissue, practise good hygiene, to lose consciousness
Rephrase 11 questions that are unclear or overly formal | For example, Do you have any other medical conditions? Do you have any history of abdominal pain? Have you had any significant illnesses in the past? Are you ever over-intoxicated?
Identify 15 lay terms for parts of the body, bodily functions, pain and illness, using a multiple-choice knowledge assessment | For example, match these terms with any of the five options that follow: (a) I've had some spotting, (b) I've had the runs, (c) I've had some leaking. Options: discharge, diarrhea, light bleeding, frequent urination, incontinence

Single words can be important in Nursing studies, as Marston and Hansen (1985, cited in Bosher, 2013) note. They discuss the need for the sub-technical vocabulary that underlies technical vocabulary in Nursing, for example administer, position and record. Cameron (1998) also records word strings which are important in Nursing studies, such as bring up, hold on and turn up with. Table 7.4 shows an example of an assessment of lay-medical vocabulary and appropriate formulaic language in a medical degree programme for international students (Wette & Hawken, 2016). Note that the examples include knowledge of appropriate language in informal and formal expressions (first two rows in Table 7.4) and understanding medical terminology and how a layperson might describe it (third row).

High frequency vocabulary plays an important role in Nursing communication. Staples and Biber (2014) investigate grammatical patterns in a spoken corpus of nurse-patient interactions in comparison with a corpus of general English conversations, using a functional analysis of stance. This study provides a wealth of examples of nurse-patient interactions, such as this one showing a nurse using hedging (in this example, kind of) when talking about symptoms: So, when us you you have a low grade temperature which could kind of be from inflammation so we’re not going to be real worried about that (p. 134). These examples of language in use would also be useful for a lexical analysis to find out more about everyday vocabulary and the nature and size of technical vocabulary in Nursing. One study of a corpus of research articles in Nursing by Yang (2015) was an attempt to find out more about specialised vocabulary in context. This study resulted in a specialised word list of 676 word families which covered approximately 13.64% of the source corpus. The one-million-word corpus contained 252 articles, and the analysis excluded high frequency words in English. Examples from this word list include words which we might expect to see in a medically oriented word list, such as participate, cancer, surgery and symptom. Other examples from the list reflect the research orientation of the corpus: data, analyse and method.
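Coverage figures such as the 13.64% reported for Yang’s (2015) list express the proportion of running words (tokens) in a corpus that belong to the list. The sketch below shows that calculation in its simplest form. It assumes a basic regular-expression tokeniser and a list of word forms rather than word families; the sample sentence and the list are invented for illustration and are not taken from Yang’s corpus or list.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simplified tokeniser (an assumption, not the one used in any cited study):
    # lower-cased alphabetic words, keeping internal hyphens and apostrophes.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def coverage(tokens: list[str], word_list: set[str]) -> float:
    """Percentage of tokens (running words) that appear in word_list."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in word_list)
    return 100 * hits / len(tokens)


# Hypothetical toy data for illustration only.
sample = "The patient reported symptoms after surgery; the data suggest the patient recovered."
nursing_list = {"patient", "symptoms", "surgery", "data"}

tokens = tokenize(sample)
print(f"{coverage(tokens, nursing_list):.2f}%")  # 41.67% (5 of 12 tokens)
```

Because coverage counts tokens, a frequently repeated word such as patient contributes every time it occurs, which is why a list of a few hundred families can account for a noticeable share of a large corpus.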

Limitations of research into vocabulary and the professions in ESP

This chapter has focused on areas of ESP where vocabulary has received attention, and a number of studies in each field have been included where possible. The field of Aviation has been fairly well served by vocabulary studies, because the industry regulates language in the interests of safety. That said, most of the effort has gone into the specialised vocabulary of pilots and air traffic control, for obvious reasons, and yet many more people in all kinds of professions also work in the Aviation industry (see Cutting, 2012). Another area which has been fairly well served is Business, for some quite obvious reasons. A main limitation of research into vocabulary in the professions, therefore, is that there seems to be little of it overall, or that vocabulary is a small part of a larger study and is mentioned only in passing. Collections of research such as Paltridge and Starfield’s (2013) Handbook of English for Specific Purposes and Gotti and Giannoni’s (2014) book on corpus analysis provide useful examples of research into some other areas of ESP. However, in the case of corpus-based research, much of the work focuses on grammatical rather than lexical features of text. Another important point is that while some of the research in this chapter has come from qualitative research, such as Knoch’s (2014) analysis of a rating scale for pilot communication and O’Hagan et al. (2014) on assessing nursing communication, much of the research into vocabulary has remained steadfastly quantitative. The word list study by Yang (2015) is a useful start to finding out more about the vocabulary of Nursing. This corpus-based study focused on the lexis in research articles in the field. Much of the health communication research into Nursing centres on the nature of spoken language in the profession, as in the earlier examples of word choice when speaking with colleagues, with patients who have little knowledge of their condition, and with patients who have greater knowledge of it. A spoken corpus of health communication in a workplace setting would be a particularly rich and helpful source of information on vocabulary use, choice and technicality. Marra (2013) discusses techniques for gathering such data which have been honed in the area of workplace discourse. Overall, this sampling of the literature in several areas of English for Professional and Occupational Purposes shows more research in areas such as Business and Medicine than in other areas of professional vocabulary. The next chapter picks up on this point and focuses on research into vocabulary in the trades.

Conclusion

This chapter has shown that there is some research on vocabulary in professional and occupational purposes areas, and that it comes from several areas of Applied Linguistics research. For example, the medical communication studies involve discourse and grammatical feature analysis, corpus linguistics techniques and ESP initiatives. The literature includes several large-scale studies, such as Nelson (n.d.) and Browne and Culligan (2016), whose results are shared in ways which will help with replication studies, for example through freely available corpus tools such as AntConc (Anthony, 2016) and the Compleat Lexical Tutor (Cobb, n.d.). This means that users of the research are free to analyse their own corpora and compare results. The Hong Kong Professional-Specific Corpora also allow for multiple studies to be carried out on the same corpora, as we have seen in the work of Li and Qian (2010), Cheng (2012) and Neufeld et al. (2011). Word lists feature in this area of specialised vocabulary research, but the scope of the research needs to widen considerably to include corpora which reflect the needs of learners in the professions and the language used in those professions. These corpus-based studies also need to consider what qualitative research might bring to bear on the data, particularly in terms of word choice and the identification of multi-word units and their functions. Chapter 8 focuses on vocabulary in another area of professional education: the trades.

Chapter 8 Vocabulary in the trades

Introduction

This chapter reports on a major research collaboration investigating language use in the trades, an under-researched area in specific purposes and vocabulary research. The LATTE project in Aotearoa/New Zealand is based at Victoria University of Wellington and Wellington Institute of Technology (Weltec). The context for this study is a campus in urban Wellington with an ethnically diverse student body, including just over 50% Pakeha (NZ European), 14% NZ Māori, 15% Asian, 10% Pacific and small numbers of European and other ethnicities (Parkinson & Mackay, 2016, p. 37). The majority of the students in the trades are male. The Pasifika students tend to be bilingual and have a separate stream in Carpentry for cultural and linguistic support (Parkinson & Mackay, 2016). There is a focus on practical education in this context, which means that classes have a very hands-on, talk-based flavour. Students in the Carpentry programme build a house in the course of their studies. The chapter begins with a discussion of the importance of the trades as an area of vocabulary research in ESP. An overview of the LATTE project, which focuses on four trades (Automotive Engineering, Fabrication, Carpentry and Plumbing), follows. This project involves both quantitative analysis using corpora and qualitative data in the form of interviews, observations and case studies. The corpus-based research includes lexical analyses of professional writing in all four trades, followed by a section on the spoken corpus for the LATTE project. The data analysis for the spoken corpus was not complete at the time of writing this chapter, and so this section is quite brief. Next, each of the trades illustrates a different element of specialised vocabulary in turn, beginning with the two construction trades, Carpentry and Plumbing. The Carpentry section focuses on qualitative data from student questionnaires about specialised vocabulary in this trade, quantitative analyses of vocabulary use in student writing in the form of Builders’ Diaries, and interview data about the use of the diaries for learning and recording specialised vocabulary. Plumbing is the trade in the next section of the chapter, and the focus for this trade is on using experts to identify specialised vocabulary for developing word lists. The Engineering trades follow, with Automotive Engineering focusing on developing a word list by corpus analysis alone. The section on Fabrication (Welding) focuses on abbreviations in this trade. The chapter ends with a discussion of the limitations of research on vocabulary in the trades.

The Language in the Trades Education project

The LATTE project is a collaboration between Weltec and Victoria University of Wellington, New Zealand. The project leader is Dr Jean Parkinson (project contact leader), Victoria University of Wellington, and the team includes Dr Averil Coxhead, Victoria University of Wellington, Emma McLaughlin, Weltec, Dr James Mackay, Weltec, Len Matautia, Weltec and Murielle Demecheleer, Victoria University of Wellington. The project focuses on questions such as these: What are the discourse and lexical features of the language which is specific to a trade (for example, Coxhead, Demecheleer & McLaughlin, 2016; Coxhead & Demecheleer, under review), and how do learners go about learning that language? What are the features of language and visual elements in trades’ texts? What are the literacy practices of trades’ tutors (Parkinson & Mackay, 2016)? The investigations include both the written and spoken language which students are exposed to in the building trades (Carpentry and Plumbing) and in the Engineering trades (Automotive Technology and Fabrication). The research includes corpora of professional written texts for each of these trades, including course materials, manufacturers’ instructions, Building Codes, Standards and Specifications. This corpus reflects two levels within the New Zealand Qualifications Framework, Levels 3 and 4. To gain an understanding of whether there are differences or similarities between novice and professional vocabulary in the trades, there is also a corpus of student writing in the form of Builders’ Diaries in Carpentry (Parkinson, Demecheleer & Mackay, 2017), which can be compared with the professional written corpus. The spoken corpus includes classroom and on-site/workshop tutor-based language, including sessions with Automotive students in which everyone in class simply talks about their common passion: cars. In the following sections of this chapter, the vocabulary research in the LATTE project will be used to illustrate qualitative and quantitative research in vocabulary for ESP.

Why research vocabulary in the trades?

Most research on specialised vocabulary tends to focus on academic and specific purposes, concentrating, for example, on pre-university studies and the vocabulary such students may meet in the course of their university studies. Examples of such work can be found in Chapters 5 and 6. Trades vocabulary is important to research because this area is under-explored in ESP. To illustrate the kinds of vocabulary used in this context, Figure 8.1 gives an example of tutor talk from a Carpentry class in the LATTE project. In this example, Connor, the tutor, is discussing thermal insulation in a theory class. Theory classes take place alongside practical building in Carpentry. This discussion takes place in a Pasifika stream of the programme and begins with several brand names before moving on to the topic of insulation. Note that Connor refers to the islands, where temperatures are considerably warmer over the course of a year than in New Zealand. This cultural information is an important part of the Pasifika trades class and is woven into the classroom talk to help illustrate a key point about insulation. The key word, insulation, is repeated often throughout the talk, and a member of its word family (insulated, as in a fully insulated house) also features.

Figure 8.1 Connor, a Carpentry tutor, on specialised vocabulary in the trades

Another important point is that while research on the trades appears to focus on literacy requirements for vocational education (see, for example, Ivanič et al., 2009), little of that work focuses on vocabulary. The New Zealand Qualifications Authority includes vocabulary as part of its Literacy Learning Progressions, as can be seen in this description of what ‘literacy learners need’ (Ministry of Education, 2010): ‘Literacy learners need to learn to make meaning of texts. This learning includes the use of background knowledge (including knowledge relating to their culture, language, and identity), vocabulary knowledge, knowledge of how language is structured, knowledge about literacy, and strategies to get or convey meaning.’ There are clearly examples of research which focus on the kinds of knowledge required for trades education, for example Vaughn, Boone and Eyre (2015) on carpentry and ‘vocational thresholds’, and Parkinson and Mackay (2016) on literacy practices in Carpentry and Automotive Technology. Research on identity such as Holmes and Woodhams (2013) focuses on how talk in the workplace supports identity building in apprenticeships. A key component of learning to be a tradesperson is to demonstrate through words and actions that the learners belong to the trades (see Colley, James, Diment & Tedder, 2003; Chan, 2013). Another example of literacy and trades can be found in Casey et al. (2006) on embedding literacy and numeracy into plastering. Vocabulary is an important part of trades education. Interviews in the Carpentry programme at a Polytechnic in Aotearoa/New Zealand (Parkinson & Mackay, 2016) uncovered a core concern for tutors about the vocabulary of Carpentry for all students in the programme, whether they were first-language or second/foreign-language learners of English. As with educators in other sectors, such as secondary schools and universities, a major concern of tutors was that students were not aware of the technical nature of everyday words. Another concern was that learners needed to know the meanings of words and to be able to use them precisely. Learners need to be able to distinguish between different types of hammers, for example, and between trade names. Trades students are also clear that vocabulary is important in the trades and that it poses challenges for their learning (Coxhead, Demecheleer & McLaughlin, 2016). One particular challenge is that much of the learning is through talk in classrooms and, in the case of Carpentry, on building sites. The apprenticeship model encourages working on projects alongside registered tradespeople to gain qualifications, which also suggests a focus on learning through listening. As part of the LATTE project, Carpentry students were asked to complete a questionnaire (see Appendix 2) about some of the challenges of vocabulary in their programmes. 
The students’ concerns included the sheer amount of vocabulary to learn, ‘Learning what word means which component because there are so many to learn it can get confusing and mix up’ (p. 46), ‘safiet’ [soffit]. They were also concerned that the words in Carpentry are unfamiliar: ‘They sound really weird making it hard to remember and spell out’. Another worry was that trades education required a great deal of learning while listening (Coxhead et al., 2016). These comments from students relate closely to key concepts of vocabulary learning in terms of memorisation, form and meaning connections, and the need for creative use of vocabulary to support learning. The comments also reflect some key points about vocabulary for specific purposes: it is closely related to the subject area, and there are many new lexical items to learn in the course of study. Part of knowing specialised vocabulary in the trades relates closely to assessment. This point is evident in this section of a class transcript from Automotive Engineering, where Bob, a tutor, has been revising the content of the course with his students (and promises a reward of a YouTube video of the Isle of Man crash afterwards). Questions from Bob’s quizzes on starter motors and current flows include ‘Why is there low current flow through the armature? What’s reducing the current flow through the armature?’ He continues: Bob: Remember all those words that have been spoken this morning because you need to, in the test it is not gonna be drawing the current paths, it’s gonna be explaining with words because the level four students, you need to do more than regurgitate current paths. You need to be able to speak with technical terms or record with technical terms the current flow paths. Ok excellent, moving on then. (Unpublished data, LATTE project)

Another reason why specialised vocabulary is important in trades-based education is that students who are studying a trade develop their knowledge of the language of the trade as they develop their knowledge of the trade. Woodward-Kron (2008) points out that a student’s disciplinary knowledge is closely tied to command of the specialised language of that discipline. Furthermore, everyday vocabulary plays a role in trades-based language (examples are shown in Figure 8.2), and there is a clear expectation of using the right term at the right level of specificity for trades-related vocabulary. Learners in the trades are expected to know which tools are used for which jobs, for example. These points all lead us to consider some important questions around trades-related vocabulary, such as: What are the most frequently used lexical items in the four trades in the LATTE study (Automotive Engineering, Fabrication, Carpentry and Plumbing)? How might we identify these words for learners and teachers? What kinds of vocabulary make up a specialised word list of Plumbing? The first main step towards answering some of these questions was gathering written and spoken corpora for analysis.

Corpus-based approaches to specialised vocabulary in the trades

Several corpora were gathered for the LATTE project. The purpose of the written professional corpus was to find out more about the nature and frequency of lexical items in the texts which learners were exposed to in their studies in the trades. The written corpus was gathered by interviewing tutors about the texts they used in class and, where possible, collecting these texts into a corpus for analysis. Table 8.1 shows the breakdown of the running words in the written professional corpus. Note that only texts which were actually used in the trades’ courses were included in the corpus, which means that the overall number of running words is fairly low at 1,641,000. In terms of balance between the overarching trades, Construction and Engineering, there are over 90,000 more running words in Construction than in Engineering. Carpentry and Fabrication have the lowest numbers of running words. What kinds of vocabulary might be in this professional corpus? To give a sense of the specialised lexis, Figures 8.2 and 8.3 provide examples of texts from two trades: Carpentry and Automotive Engineering. Figure 8.2 shows a short section (121 words or tokens) of the Unit Standard text about safe work practices on a construction site for New Zealand Carpentry students (New Zealand Qualifications Authority, 2017). This is an example of a Carpentry text used for assessment purposes. This sample is probably quite understandable for a general audience because its subject is health and safety. An analysis of this text found that almost 92% of its tokens come from the first 4,000 words of Nation’s (2006, 2013) BNC/COCA 25,000 lists. Only four items, accordance, respirator, UV and extinguisher, occur outside the first 4,000 words of the BNC/COCA lists. Note the repetition of ‘employer’s safety procedures’ in the text as an example of a multi-word unit. 
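Profiles like the one described above assign each token to a frequency band and report the percentage of tokens in each band, which tools such as the Compleat Lexical Tutor do with Nation’s BNC/COCA lists. The sketch below illustrates only the counting logic. The three tiny band lists are invented stand-ins (real BNC/COCA lists are organised by word family and are far larger), so the band membership shown here should not be read as the real placement of these words; the tokeniser is likewise a simplifying assumption.

```python
import re
from itertools import accumulate


def tokenize(text: str) -> list[str]:
    # Simplified tokeniser (assumption): lower-cased words, internal hyphens/apostrophes kept.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def band_profile(tokens: list[str], bands: list[tuple[str, set[str]]]) -> dict[str, float]:
    """Percentage of tokens per band; each token goes to the first band that lists it."""
    counts = {name: 0 for name, _ in bands}
    counts["off-list"] = 0
    for tok in tokens:
        for name, words in bands:
            if tok in words:
                counts[name] += 1
                break
        else:  # no band contained the token
            counts["off-list"] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {name: 100 * n / total for name, n in counts.items()}


# Hypothetical band lists of word forms, invented for illustration only.
BANDS = [
    ("1k", {"a", "and", "the", "on", "follow", "wear", "site"}),
    ("2k", {"safety"}),
    ("3k", {"procedures"}),
]

text = "Wear a respirator and follow the safety procedures on site."
profile = band_profile(tokenize(text), BANDS)
cumulative = list(accumulate(profile[name] for name, _ in BANDS))
print(profile)     # {'1k': 70.0, '2k': 10.0, '3k': 10.0, 'off-list': 10.0}
print(cumulative)  # [70.0, 80.0, 90.0]
```

The cumulative figure is the kind of number reported in statements such as ‘almost 92% from the first 4,000 words’: it is the running total across the first few bands, with everything else falling into later bands or off-list.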
Table 8.1 The written corpus of the LATTE project (corpus of professional trades writing)

Overarching trade | Individual trade | Running words | Subtotal (running words)
Construction | Carpentry | 300,000 | 867,000
Construction | Plumbing | 567,000 |
Engineering | Automotive and Panel & Paint | 570,000 | 774,000
Engineering | Fabrication | 204,000 |
Total | | | 1,641,000

Figure 8.2 A section from Unit Standard 13036, carry out safe working practices on construction sites

Figure 8.3 A sample of text on diesel from a textbook in Automotive Engineering (Weltec, 2016)

In contrast to Figure 8.2, Figure 8.3 contains 137 tokens from a set of materials on Diesel in Automotive Engineering. As in Figure 8.2, just over 90% of the tokens in this text are in the first 4,000 word lists of the BNC/COCA, including mounted, port, pump, pumps, reduces, seal, transfer and trapped from BNC-COCA-2,000; consists, distributor fuel, injection, input, squeezed and volume from BNC-COCA-3,000; and offset, outlet, rotates, shaft and slots from BNC-COCA-4,000. Two items occur outside the first 4,000 lists of the BNC/COCA: rotor in BNC-COCA-7,000 and vane/vanes in BNC-COCA-11,000. Note the multi-word units in the Automotive text, such as distributor type injection.

Table 8.2 Examples of high frequency specialised vocabulary in Plumbing, Fabrication and Carpentry

Plumbing: carport, claddings, drainlayer, drainlayers, drainlaying, flashings, gasfitters, gasfitting, loadbearing, open-vented, seaming, underlays, upstand

Fabrication: compoundslide, setsquare, feedshaft, tailstock, headstock, markingout, cross-slide, scribing, fourjaw, datums, Tsquare, leadscrew, threejaw

Carpentry: baseplate, flashings, dwang, radiata, subfloor, claddings, underlays, sarking, weathertightness, precast, prefinished, insitu, weathertight

Table 8.2 shows some more examples of vocabulary in Plumbing, Fabrication and Carpentry. When looking at this table, the first point to consider is whether people outside these fields of expertise would recognise these words or use them in their daily communication. Words such as sarking in the third column of Table 8.2 or seaming in the first column are unlikely to be well known outside the trade. These words would clearly be at Step 3 or 4 of Chung and Nation’s (2004) scale of technical vocabulary (see Chapter 2). This point is important for learners and teachers focused on early learning in the trades because it illustrates the specialised nature of trades vocabulary, some of which is not readily accessible in everyday language situations. One aspect to consider about vocational vocabulary is whether there is evidence of vocabulary shared between the trades. Table 8.2 has the words underlays, claddings and flashings in both Plumbing and Carpentry. This sharing in the Construction trades is not surprising, but the extent of the overlap has yet to be explored in research. Such research would be useful in determining whether there might be a shared vocabulary across the trades. Shared vocabulary could perhaps be found between various types of Engineering, but careful checking is needed to ensure that technical meanings are taken into account in any comparison. Another point to consider is how members of a possible word family, such as drainlayer, drainlaying and drainlayers, occur in a corpus (see the first column in Table 8.2). Debate around the unit of counting for word lists is ongoing (see Nation, 2016 and Chapter 2), but it is important to keep in mind that while individual types can have technical meanings, not all members of a word family will necessarily be technical in nature. Consider fix and fixings from Carpentry, for example. 
Fix is a fairly common word in general English, meaning to repair something or to attach one thing to another, usually to a wall. Fixings, however, occurred only in the trades corpora and did not occur in Nation’s BNC/COCA lists.
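A first, purely formal step towards measuring overlap between trades is to compare the lists as sets of word forms. The sketch below does this for the Plumbing and Carpentry columns of Table 8.2. It uses Jaccard similarity (shared items divided by all distinct items), a measure chosen here for illustration rather than one used in the LATTE project, and because the columns are only examples, the result says nothing about the full vocabularies. As the paragraph above notes, matching forms do not guarantee matching technical meanings, so any overlap found this way would still need checking by hand.

```python
# Plumbing and Carpentry columns transcribed from Table 8.2 (examples, not full word lists).
PLUMBING = {
    "carport", "claddings", "drainlayer", "drainlayers", "drainlaying",
    "flashings", "gasfitters", "gasfitting", "loadbearing", "open-vented",
    "seaming", "underlays", "upstand",
}
CARPENTRY = {
    "baseplate", "flashings", "dwang", "radiata", "subfloor", "claddings",
    "underlays", "sarking", "weathertightness", "precast", "prefinished",
    "insitu", "weathertight",
}


def overlap(a: set[str], b: set[str]) -> tuple[set[str], float]:
    """Return the shared word forms and the Jaccard similarity |A & B| / |A | B|."""
    shared = a & b
    union = a | b
    return shared, (len(shared) / len(union) if union else 0.0)


shared, jaccard = overlap(PLUMBING, CARPENTRY)
print(sorted(shared), round(jaccard, 3))  # ['claddings', 'flashings', 'underlays'] 0.13
```

Run on the two columns, the comparison picks up flashings as well as underlays and claddings, which shows how even a crude mechanical check can surface shared items that are easy to overlook by eye.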

The table also shows a range of compounds in the trades, such as leadscrew and underlay. All three trades include compounds, which presents some interesting problems when it comes to identifying and classifying technical terms. The first problem is whether these words belong with a word family or whether they should go into a list of compounds. Another problem is that some compounds might be hyphenated in one text but not hyphenated in another. Hyphens present their own problems in corpus analysis for word lists (Nation, 2016).
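One way to stop hyphenated and solid spellings of the same compound being counted as separate types is to count them under a shared, hyphen-free key while recording every spelling seen, so that the variants can still be checked by hand. The sketch below assumes that deleting hyphens is an acceptable normalisation, which will not always hold: it cannot handle open (spaced) compounds such as set square, and it would wrongly merge any pair whose meaning depends on the hyphen. The sentence is invented, and the solid spelling crossslide is a made-up variant used only to show the merge.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified tokeniser (assumption): lower-cased words, internal hyphens kept.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def merge_hyphen_variants(tokens: list[str]) -> tuple[Counter, dict[str, set[str]]]:
    """Count tokens under a hyphen-free key and remember each spelling observed."""
    counts: Counter = Counter()
    spellings: dict[str, set[str]] = defaultdict(set)
    for tok in tokens:
        key = tok.replace("-", "")
        counts[key] += 1
        spellings[key].add(tok)
    return counts, spellings


text = ("Check the cross-slide before moving the crossslide; "
        "the wall is load-bearing and the beam is loadbearing.")
counts, spellings = merge_hyphen_variants(tokenize(text))
print(counts["loadbearing"], sorted(spellings["loadbearing"]))  # 2 ['load-bearing', 'loadbearing']
```

Keeping the set of observed spellings alongside the merged count leaves the decision about whether the variants really are one item with the analyst, rather than hiding it inside the counting step.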

Spoken vocabulary in the trades

The purpose of the spoken vocabulary corpus for the LATTE project was to find out more about the vocabulary used by trades tutors at Weltec. If exposure to language is a key part of learning, then spoken classroom discourse is clearly a key data set. The longstanding Language in the Workplace research project led by Janet Holmes and colleagues at Victoria University of Wellington illustrates the importance of gathering spoken data. Holmes and Woodhams (2013) provide an example of research from that stable, illustrating how builder talk contains technical vocabulary which apprentices need to pick up in their work, for example to rip, meaning ‘to cut timber with the grain; a specialized rip saw is used’, and pallets, which are ‘a flat wooden structure on which goods are stored’ (p. 283). There is also an important precedent in discourse studies of classrooms in foreign and second language learning; see, for example, Horst (2010) on English as a second language community conversation classes. As Gibbons (2006) points out, ‘The talk of teachers and students draws together – or bridges – the ‘everyday’ language of students learning through English as a second language, and the language associated with the academic registers of school which they must learn to control’ (p. 1).

As noted earlier, initial interviews with students in the trades clearly showed that listening and talking were important avenues for learning technical vocabulary (Coxhead et al., 2016). The LATTE project has gathered and transcribed spoken texts from Carpentry, Plumbing, Automotive Engineering and Fabrication. Each of the spoken corpora contains over 95,000 running words. This corpus was gathered in classrooms and on building sites and contains theoretical and practical talk by tutors and students. As far as possible, a range of tutors were included in the corpus to capture language use by different speakers and to avoid sampling bias. A full analysis of this corpus is not yet complete. Figure 8.4 shows an example of a 118-word interaction between a tutor (T.) and a student (S.) on a building site from the Carpentry corpus (unpublished data, LATTE project). Just over 89% of this text is in the first 1,000 word families of Nation’s BNC/COCA lists. Items outside the first 1,000 word families include screw in the BNC-COCA-2,000; angle, drill, drilling and hips in the BNC-COCA-3,000; flush in the BNC-COCA-4,000; diagonally in the BNC-COCA-6,000; and rafter in the BNC-COCA-8,000. Note the ‘here and now’ language in this sample, illustrated by the tutor saying ‘that angle there, not that one. See that one there. See all of them need to be’. Note also that understanding this sample of talk hinges on knowing what ‘square’ means in Carpentry, and the relationship between angles, hips and facings.

Figure 8.4 Example from a building site interaction in the Carpentry corpus

The next section looks at each of the four trades in the LATTE project in turn, beginning with the Construction trades, and uses examples from the project to look more closely at aspects of the specialised vocabulary in both qualitative and quantitative research.

Construction trades: Carpentry vocabulary

This section has a triple focus on Carpentry vocabulary. The first part presents quantitative questionnaire data on students’ own perceptions of the vocabulary they need to learn in Carpentry. The second part looks at a sample of Carpentry vocabulary in a professional text. The third part looks at specialised vocabulary in Builders’ Diaries, a form of student writing in Carpentry which involves keeping a daily diary of work done. These diaries are not only records of work but also tools the students use for vocabulary learning. The section ends by comparing vocabulary use in the professional written corpus with student vocabulary use in the Builders’ Diary corpus. An early part of the LATTE project involved finding out more about language and language skills from learners in the trades. Emma McLaughlin from Weltec compiled a questionnaire for the learners that included several questions about vocabulary in the trades (see Appendix 2). Table 8.3 shows some responses from students to the question: What kind of words do you need to know to learn Carpentry? The table shows 30 of the more than 100 words from the Carpentry students’ questionnaires.

Table 8.3 Questionnaire responses on specialised vocabulary of Carpentry

No hammer marks

Hardies

Dwang/nog

brightsteel nails galvinised nails truss frames purlins

claddings rusticated faciers [sic] studs site plan

hard up level bevel square it off cavity/ceiling battens

purlins flashings hot dipped galvanised (HDG) purlins

site plan specifications partition 2 × 4

cavity/ceiling battens weather boards router bevel back

This example shows the kinds of lexical items the learners thought were central to their studies. The words and multi-word units they noted down include instructions on what to do (square it off), items which they have used in their classes or on the building site (truss, flashings and purlins), and words that describe a tool or object very specifically, such as galvanised nails, brightsteel nails and 2 × 4 (as in a 2 × 4 piece of wood). For faciers [sic], read fascia.

Figure 8.5 Example of specialised trades vocabulary in context: professional writing in Carpentry

The following example from a Carpentry text in the LATTE written corpus (Figure 8.5) contains some of the vocabulary identified by the students. The focus of this text is the procedure for setting up a builder’s level. Note the use of imperative verbs for instructions and how specific the text is about the kinds of screws needed (attachment screws and foot screws). The key word level is also used very specifically, and pronouns are rarely used. This example can be contrasted with the spoken example in Figure 8.4 in terms of the specificity of its lexis and the here-and-now nature of the spoken language. The professional written texts and spoken texts of Carpentry provide part of the picture of the specialised vocabulary of this trade. Another part of the picture is student writing in the trade. In Carpentry, students’ ‘Builders’ Diaries’ provide some insight into the kinds of lexis the students use in their writing, and evidence of developing lexical knowledge over time. Figure 8.6 shows an example from a student’s diary. Note the pictures included in the diary to show the process being described; the example shows close connections between the pictures and the text. Other images in diaries include diagrams, as can be seen in an earlier figure. Note the use of specialised vocabulary in all these examples, such as joist, pile and specific measurements. Some uses of specialised vocabulary did not necessarily relate to the trade, as can be seen in Figure 8.7, where the target word screw is used in two ways by the same writer (unpublished data, LATTE project). The first example comes from a problem/solution description in which the problem is solved by using a particular kind of screw. The second example shows the writer using screw in its colloquial sense of ‘to mess something up’ or ‘get something wrong’, in this case the measurements for joists.

Figure 8.6 An example of a Builders’ Diary by a student

Figure 8.7 ‘Screw’ as a technical and non-technical vocabulary item in a student’s Builders’ Diaries

The diaries are used by some of the Carpentry students as a vocabulary learning technique. In Figure 8.8, a sample of an interview between a student and a researcher in the LATTE project shows how the diary is used by the student as a vocabulary learning tool (unpublished data, LATTE project). In this case, the student uses the diary as a record of weekly words and encodes them to show learning and use.

Figure 8.8 Interview conversation about vocabulary and the Builders’ Diaries

In another interview, a student (CL) gave advice about vocabulary learning to people who might be thinking about taking up Carpentry studies the following year. CL replied, ‘Definitely write it in their diaries when they are doing their diaries, because the diaries are the most important thing, I wish I had started my diary earlier in the year, like every day because I have lost a lot of words that I could have known… things that help me… I forgot my diary for a couple of weeks and I forgot the words.’

The diaries, then, act as an aide-mémoire for some students and as a record of their learning and classroom activities during their studies. What vocabulary do these learners use in their Builders’ Diaries (unpublished data, LATTE project)? Table 8.4 shows examples of Carpentry words at each level of Nation’s (2006) BNC lists from an initial analysis of the student Builders’ Diary corpus in the LATTE project. The examples include lexical items such as building, line and edge in the first 1,000 words. These words were rated as technical by tutors in a lexical decision task (see the example from Plumbing that follows for more on this process) and are clear examples of technical words which are also part of general English. These words would perhaps be rated at Step Three on Chung and Nation’s (2004) scale (see Chapter 2). A number of these words have appeared in the examples of professional Carpentry texts and diaries in this chapter. The table also shows some lexical items which occurred outside the first 5,000 word families of Nation’s BNC lists. These items include proper nouns, some marginal words which are more likely to appear in the students’ diaries than in professional English (for example, crap), and words which reflect the Aotearoa/New Zealand context of the LATTE study, such as types of wood with Māori names, including Matai and Kauri. The final row in the table lists examples of lexical items which only appeared in the Carpentry corpora for the LATTE study.

Table 8.4 Examples of frequent Carpentry words in the Builders’ Diaries up to 6,000 of Nation’s BNC lists and beyond

Word list | Examples from diaries
1st 1,000 | Building, fixing, figure, line, edge
2nd 1,000 | Roof, steel, framing, tools, equipment, project
3rd 1,000 | Construction, installation, joints, foundation
4th 1,000 | Insulation, timber, moisture, horizontal, trim
5th 1,000 | Bracing, exterior, stud
Proper nouns | GIB, Karori, Batts, Zincalume, H3, radiata, rimu, matai
Marginal | MM, X, crap
Compounds | Formwork, plasterboard, handrail, flathead, bathroom, Hardfill, offcut
Abbreviations | PM, Cont, PVC
Carpentry only items | dwang, gibbing, skirtings

Note that MM appears in the table as a marginal word because in spoken texts it is a filler. In the context of the trades, however, MM relates to the measurement of a millimetre. This example shows that quantitative analysis by computer needs close follow-up by qualitative analysis of the corpus, to check for instances of technical word use that might affect results or seem strange. A comparison of the LATTE professional writing corpus in Carpentry and the student diary corpus (almost 210,000 running words) finds differences in vocabulary use between the corpora (see Chapter 1 for more on vocabulary load analysis). For example, the professional corpus contained fewer items from the first 2,000 word families of Nation’s BNC lists (76.44%) than the student diaries (approximately 80%). These lists had been ‘backfilled’ with types found in the corpus which belonged to existing word families (for more on this process, see the example from Fabrication). In contrast, the professional writing contained between two and three times as many items from the third 1,000 BNC list as the student diaries. Diaries which Carpentry tutors had judged as having higher language levels and accurate use of terminology had lexical coverage by the first 3,000 word families of the BNC (nearly 84.5%) similar to that of the professional corpus. The diaries judged as having slightly lower language levels and accurate use of terminology had lower coverage by the first 3,000 BNC word families, at just over 82%. These figures suggest that less proficient diary writers use some of the same vocabulary as professional writers and more proficient writers, but that there are also some differences. This section has focused on Carpentry. The next section moves to another Construction trade, Plumbing, and shows how specialised vocabulary was identified in this trade by consulting experts and closely analysing the vocabulary in the professional written corpus of this trade.

Construction trade: Plumbing and tutor decision tasks

The focus of this section is drawing on experts to identify specialised vocabulary in the trade, once a written corpus analysis had been carried out and general English words with no specialised meanings had been removed from the data set (except for items we wanted to check). As part of the process of selecting lexical items for a Plumbing list (Coxhead & Demecheleer, under review), a tutor decision task was developed. Chapter 2 highlighted some of the main concerns around consulting experts for insights into technical or specialised vocabulary, particularly those raised by Schmitt (2010), who notes that experts tend not to agree. For the LATTE project, we were not looking for agreement between experts per se, although complete agreement would have been a particularly sweet outcome of the process. Rather, we were conscious that the Applied Linguistics researchers were not well versed in the vocabulary of Plumbing, and we were concerned that lexical items might be overlooked in our quest for specialised vocabulary. Guidance from experts, and in particular experts who were involved in teaching Plumbing, was therefore used as a check against the results of our non-expert analysis of specialised vocabulary. This process began with an analysis of the written Plumbing corpus using Nation’s (2016) BNC/COCA lists. The results of this analysis showed that a large number of words from the first 3,000, or high frequency, lists occurred in the corpus (over 86%). Examples of these words include building, gas, pipe, pressure, drain and discharge. These lexical items were problematic, since they are highly likely to occur in general texts as well. We therefore needed to decide which words from these lists would be candidates for inclusion in a Plumbing word list. At this point, it is important to remember that the list being developed was for Plumbing only, not for other trades.
Some of these lexical items might well also be specialised vocabulary for other trades, Carpentry in particular. To start this process, we selected all of the words in the 3,000 BNC lists which appeared ten times or more in the professional writing corpus. This frequency cut-off was chosen because it provided a large number of samples to work from, and because frequencies dropped quickly below ten instances down to one (there was a long tail). For these high frequency items, two researchers worked independently and rated all the items as general or technical. They used the Plumbing corpus, dictionaries and websites to check the general English items. Any items which appeared to be both general and specialised, such as water, were set aside for inclusion in the expert tutor decision task, so that professional judgements could guide word list selection. Items from the 4,000 and other lists which occurred more than ten times in the corpus were considerably less problematic for selection, because they were clearly specialised in nature, but these items were also set aside for the tutor decision task. Some lexical items were completely new to the research team. A total of 815 items were selected for the tutor decision task. The first part of the decision task was a ‘warm-up task’ of 50 items which all the tutors completed, so that we could compare their responses to the same words. The remaining 765 words were divided into two separate lists to make the decision task more manageable for the tutors. Table 8.5 shows some examples of the warm-up task items in the left column. The middle column asks for a rating, where 2 is for items the tutors see as technical, 1 is for items that are related to Plumbing but not so technical, and 0 is for items which are not related to Plumbing at all. The column on the right asks tutors to note whether they feel they need to teach this word to students in their Plumbing classes. The tutors took this task very seriously and took time to discuss the meanings of words and the decisions they make about teaching lexical items in class. Three tutors did this task.
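The two-pool selection described above can be sketched in Python as follows. This is a minimal illustration, not the project’s actual procedure: it assumes a single cut-off of ten occurrences or more for both pools (the text says ‘ten times or more’ for the first 3,000 lists and ‘more than ten times’ for the lower-frequency lists), and the frequencies and band labels are hypothetical. The later manual step, in which researcher-pool items that looked both general and specialised (such as water) joined the tutor task, is not modelled.

```python
from collections import Counter

HIGH_FREQUENCY_BANDS = {"1k", "2k", "3k"}  # the first 3,000 word families


def split_candidates(corpus_counts, band_of, min_freq=10):
    """Split frequent corpus words into two review pools.

    Words below min_freq are dropped. Frequent words from the first 3,000
    families go to the researchers for a general/technical rating first;
    frequent words from lower-frequency bands (or no list) go to the tutor task.
    """
    researcher_pool, tutor_pool = [], []
    for word, freq in corpus_counts.most_common():  # most frequent first
        if freq < min_freq:
            continue
        if band_of.get(word, "off-list") in HIGH_FREQUENCY_BANDS:
            researcher_pool.append(word)
        else:
            tutor_pool.append(word)
    return researcher_pool, tutor_pool


# Hypothetical frequencies and band labels, for illustration only.
counts = Counter({"pipe": 40, "water": 25, "flashing": 12,
                  "sealant": 10, "building": 9, "drainlayer": 3})
bands = {"pipe": "2k", "water": "1k", "building": "1k",
         "flashing": "5k", "sealant": "9k"}
print(split_candidates(counts, bands))
# (['pipe', 'water'], ['flashing', 'sealant'])
```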
One tutor rated all the words in both tasks, because he was keen to discuss vocabulary in Plumbing and to see all the words that had been selected. The two other tutors did one task each. Did they agree on their ratings? As we had expected, the level of disagreement on technical words was much higher than the level of agreement (about 13% agreement), but the discussions and selections all helped provide insight into pedagogical considerations and technical vocabulary in Plumbing. All three tutors agreed, for example, that shower is a technical word in this field. The tutor who taught the entry-level classes had 68% agreement with the Applied Linguistics researchers on the technicality of words. Decisions on teaching appeared to be made on both personal and professional grounds. An example of a professional decision would be whether the tutors thought the learners might already know a word: if they thought the learners would already know it, the word was not given a technical rating. Clearly, if the tutors taught different levels of Plumbing or had different learners in mind, the ratings would be different. Similar factors affected the tutors’ decisions about whether to teach a word. Interestingly, the tutors discussed multi-word units as well as single words when deciding what to teach. For example, if a word was central to a task in Plumbing, they would discuss whether the lexis would be taught as part of teaching that task. An example is measure an angle, where the target word in the decision task was angle. Insights such as this are particularly useful for selecting items for word lists and for uncovering more about the nature of vocabulary in the trades. For more on this study and the resulting pedagogical word list, see Coxhead and Demecheleer (under review).

Table 8.5 Warm-up items for the tutor task

Words (20) | Level of technicality: 2/1/0 | Do you need to teach this word? Y/N
pressure, asbestos, external, welding, flashings, earth, sealants, straight, diameter, drainlayers, length, opening, amps, penetrations, plans, tap, impact, shower, chemical, positive

The pedagogical word lists have been translated into Tongan, on the basis that trades education is English medium and uses English textbooks (Coxhead, Parkinson & Tu’amoheloa, under review). This research drew on a Pasifika research methodology called Talanoa (Vaioleti, 2006), a culturally informed approach to gathering data based on developing warm and reciprocal relationships through face-to-face communication between participants and researchers. Many of the high frequency words in the Carpentry list have Tongan equivalents, such as asphalt/valitā and ceiling/‘aofi, but some English words have been adopted as loan words (or Tonganised), for example, wire/uaea, mortar/mota and beam/pimi. Dwang(s), however, requires paraphrasing in Tongan: papa pātini pe nōkingi ‘o ha alangafale (‘horizontal bracing pieces between house frame’). The next section looks at the Engineering trades from the LATTE project, Automotive Engineering and Fabrication. These two areas provide examples and discussion points for specialised vocabulary using qualitative and quantitative data.
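Returning briefly to the Plumbing tutor decision task: the chapter does not specify how the agreement figures (about 13% among the tutors, 68% between the entry-level tutor and the researchers) were calculated. One simple reading is raw percentage agreement, the share of items on which every rater gives the same rating. The Python sketch below assumes that definition and the 2/1/0 scale of Table 8.5; the ratings are invented purely for illustration. Chance-corrected statistics such as Cohen’s or Fleiss’ kappa are often preferred in practice.

```python
def percent_agreement(*rating_lists):
    """Percentage of items on which every rater gave the same rating.

    Each argument is one rater's ratings for the same items, in the same
    order, on the 2/1/0 technicality scale (2 = technical, 1 = related to
    Plumbing but not so technical, 0 = not related).
    """
    if len({len(ratings) for ratings in rating_lists}) != 1:
        raise ValueError("every rater must rate the same items")
    items = list(zip(*rating_lists))
    if not items:
        return 0.0
    agreed = sum(1 for ratings in items if len(set(ratings)) == 1)
    return 100 * agreed / len(items)


# Hypothetical ratings for five warm-up items:
# pressure, asbestos, shower, length, tap
tutor_a = [2, 2, 2, 0, 1]
tutor_b = [1, 2, 2, 0, 2]
tutor_c = [2, 1, 2, 1, 2]
print(percent_agreement(tutor_a, tutor_b, tutor_c))  # 20.0
print(percent_agreement(tutor_a, tutor_b))           # 60.0
```

Agreement among three raters is naturally lower than pairwise agreement, because a single dissenting rating breaks agreement on an item.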

Engineering trades: Automotive Engineering

The focus of this section is using a corpus to identify the specialised vocabulary of Automotive Engineering (AE). This field covers a range of topics such as understanding and repairing engines, electronics and electrical work, and vehicle servicing. The AE list (Coxhead, in preparation) developed in the LATTE project was based on a written corpus of texts used in classes at Weltec, just as corpora were the starting points for all the other trades. A total of 1,226 items were identified for this word list using frequency principles. These items are divided by frequency into 12 sublists of 100 words, followed by a shorter thirteenth sublist. That is, the first sublist contains the 100 most frequent words, the second sublist contains the next 100 most frequent words, and so on. Sublist 13 contains the remaining 26 items. Table 8.6 shows the first 26 words of Sublist 1 on the left and all of the words in Sublist 13 on the right. Note how much more general the words in the left column are than those on the right. In some ways, it is heartening to know that the most frequent word in the AE corpus is check. Sublist 13 contains hyphenated items, such as vent-cap and main-shaft. These hyphenated items need careful analysis in future research, as do compounds; these areas of specialised vocabulary often present complex issues for analysis. The words in this list were all checked using dictionaries, the corpus and AE tutor decision tasks to determine whether they were specialised vocabulary. Some items in the right-hand column may seem odd, such as sipes or dieseling, but the meanings of all these words were checked before selection for the AE list. There are a number of very interesting general words on the AE list, including toe, satisfactory and tooth. These words serve as a reminder that while a word might be considered general, it could also have a technical meaning. Learning a new meaning for a known word can be confusing, as we have seen in Chapter 5.

Table 8.6 The first 26 words of the Automotive Engineering (AE) list and all of Sublist 13

The most frequent 26 items from Sublist 1 of the AE list | Sublist 13 of the AE list
check | megavolt
engine | self-steer
test | vent-cap
volt | wash-wipe
figure | no-load
pressure | air-filled
battery | all-season
vehicle | all-terrain
fuel | ball-in-race
circuit | Brushgear
air | can-bus
complete | detent
operate | dieseling
valve | field-effect
read | main-shaft
work | nickel-metal
control | press-in
resist | press-out
connect | pulse-width
current | self-discharge
correct | side-stand
flow | single-wire-system
require | sipes
sensor | thermoswitch
component | U-shaped
pump | varistor
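Dividing a frequency-ranked list into sublists of 100, as described for the AE list, amounts to simple chunking. The Python sketch below assumes the items are already ranked from most to least frequent; placeholder strings stand in for the 1,226 real items, which are not reproduced in this chapter.

```python
def make_sublists(ranked_items, size=100):
    """Chunk a frequency-ranked word list into consecutive sublists.

    ranked_items must already be sorted from most to least frequent;
    every sublist holds `size` items except possibly the last one.
    """
    return [ranked_items[i:i + size] for i in range(0, len(ranked_items), size)]


# Placeholder items standing in for the 1,226 AE list entries.
ae_list = [f"item{n}" for n in range(1, 1227)]
sublists = make_sublists(ae_list)
print(len(sublists), len(sublists[0]), len(sublists[-1]))  # 13 100 26
```

With 1,226 items this gives 12 full sublists and a thirteenth sublist of 26 items, the same structure reported for the AE list.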

The following is the sample text on diesel from earlier in this chapter (Figure 8.9), this time with the items from the AE word list in bold and italics. The text contains 137 words, and over 34% of these words are in the AE list. This figure is very close to Chung and Nation’s (2004) scale-based estimate that one word in three (about 33%) in an Anatomy text is technical. This marked-up text from AE highlights the importance of multiword units, such as pump housing, input shaft and distributor type injection pumps, and more work needs to be done in this area. The marking up in the text also suggests why items such as fuel and pump are in the highest frequency sublist of the AE list (see Table 8.6): they are used regularly in the sample text.
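For the 137-word diesel text, ‘over 34%’ corresponds to at least 47 running words from the AE list (46/137 is about 33.6%; 47/137 is about 34.3%). The Python sketch below shows one way a marked-up version and its percentage could be produced: it wraps list words in asterisks, standing in for bold and italics, and counts them. The sentence and the four-item word list are hypothetical. The sketch matches single word forms only (a hyphenated item such as main-shaft counts as one token), so multiword units such as pump housing are not treated as units, which is the gap noted in the paragraph above.

```python
import re

# A word is a run of letters, optionally joined by hyphens (e.g. main-shaft).
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def mark_list_words(text, word_list):
    """Wrap tokens found on word_list in asterisks; return (marked, percent)."""
    listed = {w.lower() for w in word_list}
    hits = 0
    total = 0

    def mark(match):
        nonlocal hits, total
        total += 1
        token = match.group(0)
        if token.lower() in listed:
            hits += 1
            return f"*{token}*"
        return token

    marked = WORD.sub(mark, text)
    percent = 100 * hits / total if total else 0.0
    return marked, percent


# Hypothetical sentence and word list, for illustration only.
marked, percent = mark_list_words(
    "Check the fuel pump housing and input shaft.",
    ["check", "fuel", "pump", "shaft"])
print(marked)   # *Check* the *fuel* *pump* housing and input *shaft*.
print(percent)  # 50.0
```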

Figure 8.9 A sample of text on diesel from a textbook in Automotive Engineering (Weltec, 2016)

Engineering trades: Fabrication

The second Engineering trade in the LATTE project is Fabrication (Welding), and the focus of this section is abbreviations as specialised vocabulary. Fabrication is divided roughly into heavy and light areas of work, and includes welding, sheet metal work, boiler making, steel construction and fitter-welder jobs. In an analysis of the specialised vocabulary of written Fabrication texts for the LATTE project, we found 53 common abbreviations, which are listed alphabetically with their meanings (see Table 8.7 for 30 of the 53). Some of these abbreviations may well be used in other trades, such as AMP and Ar. Others are clearly closely related to Fabrication, such as AWS for the American Welding Society and GTAW, which stands for gas tungsten arc welding. Meanings were checked against dictionaries and fabrication websites, and by tutors. An alphabetical arrangement is useful for finding lexical items quickly, but alphabetical word lists can be problematic: while they make words easy to find, they do not give the user any guide to the frequency of the words. Table 8.8 shows the top 25 items for Fabrication arranged by frequency and by alphabet to illustrate this point. Word parts seem to play quite a major role in Fabrication texts. Some examples include over~, as in overload, overalls, overlap, overheat, overextend, overtight and breakover. Another striking aspect of Fabrication lexis, as with other trades in the LATTE project, is that this vocabulary ranges from general words used with a technical meaning through to highly technical words, such as parallax, vanadium, tirfor and thermosetting. With highly specialised lexis such as these words, it is clear that developing lexical proficiency in the trades is no mean feat.

Table 8.7 Thirty common abbreviations in Fabrication and their meanings

Abbreviation | Meaning
AC | alternating current
AMP | ampere
AWS | American Welding Society
BHN | Brinell Hardness Number
CNC | Computer Numerical Control
DC | direct current
DCEN | direct current electrode negative
DRO | dial or digital readout
DTI | Dial Test Indicator
GMAW | gas metal arc welding
GTAW | gas tungsten arc welding
HAZ | heat affected zone
HERA | Heavy Engineering Research Association
HSE | Health & Safety in Employment
HSS | High Speed Steel
ISO | International Organization for Standardization
MIG | metal inert gas
MMAW | manual metal arc welding
MPa | megapascal
Nm | Newton metre
OCV | open circuit voltage
PCD | pitch circle diameter
PPE | personal protective equipment
PTFE | polytetrafluoroethylene
PVC | polyvinyl chloride
RCD | residual current device
SI | International System of Units
SWL | Safe Working Load
TIG | tungsten inert gas
WPS | Welding Procedure Specification

Table 8.8 First 25 Fabrication words by frequency and by alphabet

First 25 Fabrication words by frequency | First 25 Fabrication words by alphabet
weld | angle
work | centre
figure | check
cut | cut
tool | draw
material | drill
machine | equipment
source | figure
steel | hazard
centre | lift
hazard | line
metal | load
equipment | machine
check | material
angle | measure
drill | metal
surface | part
measure | point
load | require
line | source
point | steel
draw | surface
require | tool
part | weld
lift | work
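Both orderings in Table 8.8 can be derived from a single frequency count. The Python sketch below uses hypothetical counts, since the chapter does not report raw frequencies, and breaks ties in frequency alphabetically; that tie-breaking rule is an assumption, not necessarily the procedure used for the table.

```python
from collections import Counter


def orderings(counts, n=25):
    """Return the top-n items by frequency, and the same items alphabetically."""
    # Sort by descending frequency; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    by_frequency = [word for word, _ in ranked[:n]]
    by_alphabet = sorted(by_frequency)
    return by_frequency, by_alphabet


# Hypothetical counts, for illustration only.
counts = Counter({"weld": 90, "work": 75, "figure": 61, "cut": 60, "tool": 41})
print(orderings(counts, n=4))
# (['weld', 'work', 'figure', 'cut'], ['cut', 'figure', 'weld', 'work'])
```

The alphabetical version makes a word easy to find, but the position of weld at the bottom of the alphabetical column hides the fact that it is the most frequent item.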

Limitations of this research on specialised vocabulary in the trades

It is clear from this chapter that there is little research into specialised vocabulary in the trades, so in essence one limitation of research in this field is that there is so much more to be done. The LATTE project, reported in this chapter, is limited to four trades for practical reasons: extending it to more trades would be time-consuming and outside the brief of the project. That said, much more needs to be done on different trades in education, and the LATTE project provides some support for extending the research. Furthermore, in trades-based research, as in other educational research, it is important to recognise that government, educational institutions and industry play major roles in the organisation of learning and the content of courses. This means that replication studies in vocabulary research in the trades might not be easily achievable, depending on the levels and focus of the courses on offer. These limitations mean that it is not particularly easy to generalise from localised research such as the LATTE project. This chapter reports on a range of elements of vocabulary for the trades, mostly focusing on single words. Much more research needs to be carried out on multiword units, including collocations, frames, lexical bundles and metaphor. Another important point is that, while the LATTE project shows that a great deal of learning in the trades is based on spoken communication, much of the research presented in this chapter is based on written sources. As mentioned, the LATTE project includes spoken corpora which, at the time of writing, have yet to be fully analysed. Another limitation, and therefore a gap for further research, concerns trades-based vocabulary and different first languages. In the Aotearoa/New Zealand context, as we have seen in this chapter, some students in the trades come from Pacific islands (for example, Tonga, Fiji, Samoa and others) and bring with them cultural and linguistic learning experiences. More research is needed to consider the needs of ESP learners from different backgrounds and what vocabulary is needed in these contexts for learning in the trades. Furthermore, the impact of the research findings on materials and course development needs to be investigated fully. Finally, trades-based education incorporates apprenticeship models of education, where learning on the job or in the workplace is a key element of a programme of study. The LATTE project focuses on language in the classrooms and on the building sites belonging to the institution, but some students are more heavily involved in apprenticeships or workplace learning. This workplace element is clearly an area for more research.

Conclusion

This chapter has focused on specialised vocabulary in the trades, based on the LATTE project. The trades area has relatively little research on lexis, but this chapter shows that this vocabulary shares the characteristics of other specialised vocabulary. The examples of texts and specialised lexis in this chapter demonstrate that general vocabulary in English can also take on particular meanings in the trades, that people outside the trades do not necessarily understand specialised trades-based lexis, and that there are a large number of words in English that learners in the trades need to know. A particularly interesting element in the trades is that oral communication is a main avenue of teaching and learning, and this point is very important if learners do not have strong literacy skills. This chapter completes Part Two and its focus on vocabulary for ESP in different contexts. It is fitting that the trades, as an area of high need for vocabulary research, are followed by a chapter which focuses on curriculum and materials design for vocabulary in ESP.

Chapter 9 Vocabulary research and ESP: Curriculum, classroom tasks and materials design and testing

Introduction

The focus of this chapter is vocabulary in ESP in relation to teaching, learning and testing. The first part focuses on research-based principles and concepts that are important in curriculum, classroom tasks, and materials design for specialised vocabulary. Two theoretical frameworks are presented first: Nation’s Four Strands (2007) and the Involvement Load Hypothesis (Hulstijn & Laufer, 2001; Laufer & Hulstijn, 2001), with examples. The next section focuses on studies into learning and teaching specialised vocabulary, including the use of specialised word lists for pedagogical purposes. The final section focuses on testing and vocabulary for ESP, with a main focus on classroom-based assessment. The chapter ends with a discussion of the limitations of research into specialised vocabulary learning and testing.

Why focus on curriculum, classroom tasks, materials design and testing in specialised vocabulary and ESP research?

L. Flowerdew (2015b, p. 104) makes an important point about a gap in research on teaching and learning specialised vocabulary when she writes, ‘There are few EAP-oriented studies that go beyond simple frequency counts and also consider learnability and teachability’. Learners and teachers can spend a great deal of time in class together, and they need to know which approaches to classroom tasks will help with retention of vocabulary knowledge. Knowing which aspects of word knowledge are important for learners, and which elements of materials design help rather than hinder learning, is also important for learning specialised vocabulary. These areas matter for specialised vocabulary both as single words and as part of multi-word units (Coxhead, 2008; Meunier & Granger, 2008).

Figure 9.1 An example of a vocabulary-related episode from Basturkmen and Shackleford (2015, p. 92)

Nation (2016) points out that while word lists can help with course design, their influence has yet to be ascertained, because little research has focused on their role and effectiveness in course development, design and assessment. Chapters 4 to 8 focused on specialised vocabulary in a range of contexts, and this research is predominantly aimed at teaching and learning in some way. For example, word list research is often framed as important because lists can help learners and teachers decide which words to focus on. Research into specialised vocabulary, lexical bundles and collocations is increasingly being integrated into textbooks and dictionaries, particularly for learners with intermediate and advanced levels of English proficiency (Gouverneur, 2008). Learner corpus research has formed the basis for selecting lexis to include in learner dictionaries such as the Louvain English for Academic Purposes Dictionary (see Granger & Paquot, 2010) and the Macmillan English Dictionary for Advanced Learners. See Granger (1998) for more on learner corpora and materials design, dictionaries, textbooks and online materials.

Specialised vocabulary is an important part of all language skills in ESP. For example, academic essay writing is a major form of assessment in undergraduate studies (Moore & Morton, 2005), and it requires engagement with academic concepts, lexis and texts. Depending on the subject area, as we have seen in Chapters 2 and 3, specialised vocabulary can make up a substantial proportion of a text. Research by Basturkmen and Shackleford (2015) (see Chapter 6) showed how two Accountancy lecturers drew attention to vocabulary during classes. On average, the lectures included 20 vocabulary-related episodes each hour. A total of 46% of the language-related episodes (LREs) in the data set were vocabulary based. An example of a vocabulary-related episode can be seen in Figure 9.1. In this example, the lecturer explains what ‘delayed payment to trades payable’ means, using a short definition and an example. If specialised vocabulary were not important in Accounting, this lecturer would not spend valuable class time focusing on it. Testing is also a key element of specialised vocabulary. Douglas (2013) states that tests in ESP are based on three key understandings of the field: ‘First, that language use varies with context, second, that specific purpose language is precise, and third that there is an interaction between specific purpose language and specific purpose knowledge’ (p. 368).

There is a clear connection between identifying the vocabulary of particular fields of ESP endeavour, developing courses and materials which take the results of such studies into account, and testing learners’ knowledge of that language at the end of language courses. If specialised vocabulary is included in course planning but not in assessment, then the message to learners is that, in the end, this vocabulary is not important. If we are serious about specialised vocabulary, then we need to be serious about testing how well learners are learning it and, if they are not learning it, finding out why not.

Nation’s Four Strands

The fundamental idea of Nation’s (2007) Four Strands is that they provide an organisational framework for a vocabulary curriculum. The strands are designed
to run through the curriculum. The Four Strands are meaning-focused input, which involves learning from listening and reading; meaning-focused output, which involves learning from writing and speaking; language-focused learning, where learners concentrate on learning aspects of words such as spelling, pronunciation, and grammar; and fluency, where learners practise the vocabulary they know well in speaking, reading, writing, and listening. In meaning-focused input and meaning-focused output, ‘meaning’ does not refer to the meanings of individual words; rather, learners are concentrating on the messages, ideas and concepts being communicated through the input and output. With fluency, the focus is also on communication, but in conditions where learners know the vocabulary well and also know well the ideas they want to communicate. Having three strands focused on communication fits with a recommendation of Ellis (2005) that learners spend more time focusing on meaning and less time focusing on form. Nation advocates equal amounts of time and focus for each strand, arguing that the three communication-based strands are ‘more widely beneficial’ and that language-focused learning is efficient (2007, p. 9). The Four Strands encapsulate some key conditions for learning another language, such as the importance of input and output. Input is recognised as important for vocabulary learning (Nation, 2013). Output can lead to noticing, ‘or giving attention to an item’ (Nation, 2001, p. 63). Noticing, according to Swain (1995), is an essential part of language learning, because the process of using lexis in speaking or writing can expose gaps in learners’ knowledge which they can then focus on filling. Input and output also provide opportunities for learners to use or encounter words in new ways. Wittrock (1974) calls this use of vocabulary ‘generative’; Nation (2013) now terms it ‘creative’.
This generative or creative process is an important part of developing learners’ understanding of vocabulary, as Corson (1985, p. 115) explains: ‘Meaning is clarified in the act of trying out new words in the context of one’s own utterances and in hearing them used in reply in the original utterances of others.’ Joe (1998) developed a scale for measuring the amount of generative vocabulary use by learners when retelling texts, ranging from 0, where there is no evidence of generative use, to 4, which indicates high generative use. High generation is signalled by learners working with a synthesis of elements of
an input text, their own experience and knowledge, and the specialised meaning and use of the word. Nation and Yamamoto (2012) state, ‘The four strands principle is primarily a way of providing a balance of learning opportunities’ (p. 167). The strands can operate at the curriculum level and at the classroom activity level. An example of an activity which fits into the meaning-focused output strand is ranking. Using a sample of frequent items from the Coxhead and Hirsh (2007) EAP Science List, students could be asked to rank the words in terms of their closeness to particular fields of Science (Hirsh & Coxhead, 2009). The sample words are cell, species, acid, muscle, protein, molecule, nutrient, dense and laboratory. The students could rank the words in relation to fields such as Computer Science, Nursing, Biology, Agriculture and Sport Science by giving each word a number from one to ten (ten would signal a close relationship between the word and the field, whereas one would signal a more distant connection). The students would then share their ideas and give reasons for their rankings. Similarly, the following Technical Law terms from Csomay and Petrović (2012, pp. 314–315) (see Chapter 7) could be categorised into groups related to different aspects of legal processes or people’s roles in courtrooms: bar, arrest, constitute, deny, court, document, permit, warrant, withdraw, proceed, firearm, fingerprint, exam and excuse. The meaning-focused output in these activities comes from discussing the rankings or groupings. Other suggested strands-based activities for classrooms which draw on specialised vocabulary include a fluency activity based on the 4–3–2 speaking activity (Nation, 2013; Thai & Boers, 2016). In this activity, learners prepare a talk on a subject they know well, using vocabulary related to that subject that they also know well.
The 4–3–2 is essentially a repetition activity: speakers first give their talk in four minutes, then repeat it in three minutes and finally in two minutes. Some classroom activities might include several strands. For example, student presentations as part of an EAP or ESP course could involve both meaning-focused input and output. Having learners research a specialised topic by reading and listening as part of their preparation involves meaning-focused input. Preparing visual aids and giving the talk to multiple audiences involves meaning-focused output and fluency practice. Coxhead’s (2014b) poster carousel for ESP vocabulary comes from Lynch and McLean’s (2000) work on repetition and
recycling. The poster carousel is based on the conference poster model, where learners research a topic they are interested in or read a research article and then prepare a poster for an in-class conference. They then present their posters to their class as they would at an academic conference. This activity combines all four of Nation’s strands, particularly if the learners present their poster multiple times and if vocabulary work in preparation by the students includes elements of language-focused learning based on the specialised vocabulary needed for the presentation, such as pronunciation, word stress and word parts. Nation (2016) provides discussion and guidelines for the Four Strands and word lists.
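Nation’s advice to give roughly equal time to each strand can be checked against a course plan. The Python sketch below is illustrative only: the plan, the minutes and the strand tags are invented, the 5% tolerance is arbitrary, and splitting a multi-strand activity (such as the poster carousel) evenly across its strands is a simplifying assumption, not part of Nation’s framework.

```python
from collections import defaultdict

# The four strands named by Nation (2007).
STRANDS = (
    "meaning-focused input",
    "meaning-focused output",
    "language-focused learning",
    "fluency",
)


def strand_shares(plan):
    """Return each strand's share (0.0 to 1.0) of the total planned time.

    `plan` is a list of (activity, minutes, strands) tuples. An activity
    tagged with several strands has its minutes split evenly between them,
    a simplifying assumption made only for this sketch.
    """
    minutes = defaultdict(float)
    for _activity, mins, strands in plan:
        for strand in strands:
            if strand not in STRANDS:
                raise ValueError(f"unknown strand: {strand}")
            minutes[strand] += mins / len(strands)
    total = sum(minutes.values())
    return {strand: minutes[strand] / total for strand in STRANDS}


def unbalanced(shares, tolerance=0.05):
    """Strands whose share differs from an equal split by more than `tolerance`."""
    target = 1 / len(STRANDS)
    return [s for s, share in shares.items() if abs(share - target) > tolerance]


# A hypothetical week built from activities mentioned in the text.
plan = [
    ("reading on a specialised topic", 60, ["meaning-focused input"]),
    ("ranking Science words by field", 30, ["meaning-focused output"]),
    ("4-3-2 talk", 30, ["fluency"]),
    ("pronunciation, word stress and word parts", 30, ["language-focused learning"]),
    ("poster carousel", 120, list(STRANDS)),
]
shares = strand_shares(plan)
print(unbalanced(shares))  # ['meaning-focused input']
```

In this invented week, the extra reading pushes meaning-focused input to a third of the time, so a teacher might shift some of it to another strand.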

The Involvement Load Hypothesis

The Involvement Load Hypothesis was conceptualised by Hulstijn and Laufer (2001) as a way to measure the level of involvement in a vocabulary-learning task. This hypothesis is a useful framework for materials design and classroom activities in particular, but it can be applied to the wider vocabulary curriculum as well. Need, search and evaluation are the three elements of this hypothesis. According to Hulstijn and Laufer (2001), the more need, search and evaluation in a learning task, the higher the involvement load, and an activity with a higher involvement load leads to better retention of vocabulary than an activity with a lower involvement load. The need element can be moderate or strong, depending on whether the need comes from outside the learner, perhaps imposed by a teacher (moderate), or from the learner (strong). The search element means that the learner has to find the meaning of a word, for example by looking it up in a dictionary. For evaluation, learners have to consider how a word might fit into a particular context or decide whether to use one word or another (Hulstijn & Laufer, 2001). The Involvement Load Hypothesis has been operationalised in several vocabulary studies, such as Hulstijn and Laufer’s (2001) research, which compared three tasks: (1) reading comprehension with marginal glosses, (2) reading comprehension plus ‘fill in’ and (3) writing a composition incorporating target words. The group who wrote a composition had the highest involvement load and scored better on a post-test than the other two groups. In a follow-up
study, Folse (2010) found that learners who completed repeated fill-in-the-blank exercises retained more vocabulary than learners who wrote original sentences with the words. This finding supports the well-established role of repetition in vocabulary learning. Folse (2010) controlled the time on task, whereas Hulstijn and Laufer (2001) did not. It should be noted, however, that these studies have focused on low frequency lexical items in English classes, rather than on specialised vocabulary in English. What can this hypothesis offer classroom activities for specialised vocabulary? The Involvement Load Hypothesis suggests that vocabulary-learning tasks can be analysed to see whether they include the three elements of need, search and evaluation, and that tasks and classroom materials can be manipulated to increase the involvement load. For example, if a classroom activity does not involve searching for the meaning of a word, then this element can be added. Similarly, if students are not required to evaluate specialised vocabulary in any way, then evaluation can be added.
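The task analysis described above can be made concrete by scoring tasks. The following Python sketch rests on an assumption not spelled out in the text: the 0 to 2 index commonly used to operationalise the hypothesis (need and evaluation scored 0 absent, 1 moderate, 2 strong; search scored 0 absent, 1 present). The task scores are illustrative, and `add_search` is a hypothetical helper showing one way a task might be manipulated.

```python
from dataclasses import dataclass, replace

# Assumed index (not given in the text): need and evaluation are scored
# 0 = absent, 1 = moderate, 2 = strong; search is scored 0 = absent, 1 = present.
ALLOWED = {"need": (0, 1, 2), "search": (0, 1), "evaluation": (0, 1, 2)}


@dataclass(frozen=True)
class Task:
    name: str
    need: int
    search: int
    evaluation: int

    def __post_init__(self):
        # Reject scores outside the assumed index.
        for component, allowed in ALLOWED.items():
            if getattr(self, component) not in allowed:
                raise ValueError(f"{component} must be one of {allowed}")

    @property
    def involvement_load(self) -> int:
        """Sum of need, search and evaluation; higher loads are predicted to aid retention."""
        return self.need + self.search + self.evaluation


def add_search(task: Task) -> Task:
    """Hypothetical manipulation: remove glosses so learners must look meanings up."""
    return replace(task, name=f"{task.name}, no glosses (dictionary look-up)", search=1)


# Illustrative scores for the three tasks compared by Hulstijn and Laufer (2001);
# each task is teacher-set, so need is scored as moderate (1).
tasks = [
    Task("reading with marginal glosses", need=1, search=0, evaluation=0),
    Task("reading plus fill-in", need=1, search=0, evaluation=1),
    Task("composition using target words", need=1, search=0, evaluation=2),
]

for task in sorted(tasks, key=lambda t: t.involvement_load, reverse=True):
    print(task.involvement_load, task.name)

print(add_search(tasks[0]).involvement_load)  # 2
```

Under these assumed weights, the ordering of the three tasks matches the ordering of the post-test results reported above, and adding a search step raises the glossed reading task from 1 to 2.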

Specialised vocabulary research into teaching and learning in ESP

Many research studies recommend, in their sections on implications for pedagogy, raising awareness of specialised vocabulary. For example, Pardillos (2016) argues for raising learners’ awareness of metaphor in Legal English as an important part of an ESP course for legal purposes. Littlemore, Chen, Liyen Tang, Koester and Barnden (2010) recommend training EAP students in the recognition and understanding of metaphor. Corpus-based learning is sometimes recommended as a technique for learning vocabulary. Hafner and Candlin (2007) discuss using corpus tools to work with Legal vocabulary. Charles (2012) provides support for corpus consultation as a way to help postgraduate students at her university use specialised vocabulary in their writing. With over 40 participants from a range of disciplines and 21 language backgrounds, Charles (2012) set out to use academic corpora as a means to develop grammatical and rhetorical skills. This work followed on from earlier research with postgraduate students by Lee and Swales
(2006), for example, in which the learners developed their own discipline-specific corpora. This do-it-yourself corpus approach is also reported in independent learning research with postgraduate learners (see Starfield, 2004, for example). Charles (2012) reported on student feedback about the course and its benefits. Some students commented that consulting discipline-specific corpora helped with specialised vocabulary. As one student wrote, ‘It helps me use the same language as others in my field’ (Charles, 2012, p. 100). Chapter 4 of Charles and Pecorari’s (2016) Introducing English for Academic Purposes has a section on a corpus-based approach to EAP and a helpful discussion of direct and indirect uses of corpora for vocabulary-related learning and teaching. Some research papers focus on ways to use corpus analysis and specialised vocabulary for making classroom materials. An example comes from Vincent (2013), who recommends using a text-based technique for identifying frequent words and corpus consultation to check for common collocations of those words. This article offers practical suggestions on how EAP teachers can integrate data-driven learning based on corpus-based discoveries into their teaching, and on ways to work with corpora and vocabulary. Another example is from Breeze (2015), who analyses specialised vocabulary in a corpus of legal documents and presents options for focusing on lexis such as arise and hold. Suggestions for working with clusters of specialised vocabulary, such as landlord, lease and premises, are also made based on patterns from the corpus analysis. Rusanganwa (2013) reports benefits of multi-media for undergraduate Physics students in Rwanda learning technical vocabulary such as capacitor. His students scored higher on post-tests after studying the vocabulary using multi-media than with blackboard-based instruction.
Breeze (2015) argues for a systematic approach to teaching and learning specialised vocabulary in ESP, where the lexis is drawn from the area of specialisation. While this study does not include a measurement of vocabulary learning, it is useful to look at Breeze’s examples, which are based on the Legal corpus analysis. Many of these examples use a gap-fill (fill-in-the-blank) format or ask learners to choose between options. According to Boers, Demecheleer, Coxhead and Webb (2014), exercises such as these tend not to lead to a great deal of learning. Using the Involvement Load Hypothesis (Hulstijn & Laufer, 2001) as a framework, we might find that gap fills, for
example, contain some elements of need and evaluation, but if the exact words to fill the gaps are presented at the bottom of the page, then search is not part of the activity. If more words are provided for the gap fill than there are spaces in the text, then the element of evaluation is brought into play for the learners. Increasing the amount of involvement in this way may help with retention of vocabulary.
It is one thing to develop word lists of multi-word units, but it is another to take this research into classrooms (Byrd & Coxhead, 2010). How can these multi-word units be focused on, and how should they be integrated into classroom materials and course design? One example comes from Jones and Haywood (2004), who selected formulaic sequences from existing word lists to introduce into two EAP classes in a UK university. Over ten weeks, students in one class were trained to notice and learn target formulaic sequences, for example through concordancing, highlighting, analysing sequences and using them in writing. The other class was used as a control group. Jones and Haywood (2004) found that while the experimental group’s awareness of formulaic sequences was raised, post-tests showed that they did not learn the sequences very well, nor did they use them much in their writing. This finding relates to other studies, such as Cortes (2013), which have noted that second language writers in English do not use as many multi-word units in their writing as professional writers and native speakers do. Jones and Haywood (2004) attributed the lack of learning of formulaic sequences in their study to a range of factors, such as the learners failing to memorise the sequences well or using sequences they already knew in their writing. There is certainly an element of risk in using semi-familiar lexis in writing (Laufer, 1998; Coxhead, 2011b).
Boers and Lindstromberg (2012) comment that learners in the Jones and Haywood study might have needed more exposure to the sequences and more practice using them to lay a stronger memory trace. They recommend intentional learning of sequences to aid memory. When specialised word lists are used in language learning, it is important to consider how these lists might be integrated into courses of study and how this integration might be assessed. In a study of an EAP postgraduate writing course in an Australian university, Storch and Tapper (2009) looked at the impact of a focus on academic vocabulary using Coxhead’s (2000) AWL. Storch and Tapper (2009) found that the learners in their study used more academic lexis
in their writing as a result of the course, and that they used it appropriately. The EAP course had multiple elements that could have supported the development of that lexical knowledge and use, including direct teaching, reading in specialised academic areas, feedback on lexis in student writing and class-based discussions about academic vocabulary. Using specialised vocabulary is a particularly important element of writing in EAP and ESP. In a New Zealand–based study of the use of vocabulary in writing in English by second language university students, Coxhead (2012b) found that the writers had concerns about how their vocabulary selection and accuracy would be viewed by academic readers, how risky it might be to use unfamiliar lexis and whether they knew enough about a word to use it in writing. A tennis-playing friend once likened these difficulties to ‘running around her backhand’ while playing tennis, that is, working hard to position herself on the court so that she could hit the ball with her forehand, because it was stronger than her backhand. The participants in Coxhead’s (2012b) study struggled with all three aspects of knowing a word identified by Nation (2013): form, meaning and use. Form was difficult for some learners who could not recall how to spell some words. The form-meaning connection was difficult for others, as they found it hard to connect a particular word with the content of their writing. The dilemma with use is that while learners might recognise a word or multi-word unit when reading, and know its meaning, they might not know enough about it to use it in writing. Mežek, Pecorari, Shaw, Irvine and Malmström (2015) compared the learning of specialised vocabulary by bilingual learners in a large-scale research project in Sweden.
They investigated the learning of specialised vocabulary in English in Linguistics and Literature studies, and how that learning was affected by whether the vocabulary was presented in reading, in listening or in both modes. The findings of this study suggest that specialised vocabulary that is learned in both reading and listening (the order of presentation does not matter) is better remembered, even if lecturers only briefly mention the terms in English in lectures. Gablasova (2014) also compared the learning of specialised vocabulary in English and in the first language of students in Slovakia. In an oral recall task after reading, the participants who learned the vocabulary in English initially demonstrated weaker recall of the meanings of the target words and weaker knowledge of the items than those who had
learned the specialised vocabulary in Slovak. A delayed test showed that the level of forgetting was higher in the English group than in the Slovak group.

Using specialised word lists in learning and teaching

Nation (2016) outlines ways to use word lists to set learning goals for vocabulary in curriculum design. A first principle in this decision-making process is to focus on the most frequent vocabulary to ensure a better return for learning effort. Frequency-based word lists for ESP can help with these decisions because they rank items from the most frequent to the least frequent. An example of such ranking is Coxhead’s (2000) AWL, which has ten sublists. The most frequent word families are in Sublist One, the next most frequent are in Sublist Two and so on. The division into ten sublists was made to help create attainable learning goals. A word of caution is necessary when working with word lists in ESP or EAP courses. There may be a temptation to start at the top of an alphabetised list of words without first thinking about the importance of frequency. Alphabetising is often done because it provides an easy way to present a word list (see the example from the trades in Chapter 8). However, this format does not convey useful information about word selection, which can guide learners, teachers, and researchers about which words are the most important or should be tackled first. For this reason, Coxhead presents the AWL in several formats on the AWL website (www.victoria.ac.nz/lals/resources/academicwordlist/): the ten frequency-based sublists are available separately, and in the word family-based AWL lists the most frequent member of each family in the AWL corpus is shown in italics. For example, assessment is the most frequent member of the word family with the headword assess. Another common problem with word lists is conflating the use of word families as a unit of counting in research (for example, to find out more about the vocabulary load of a text) with a mandate that all the words in a family need to be learned.
When word lists are used to guide the learning of specialised vocabulary, it is important to take into account how that lexis actually occurs in context.
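The contrast between frequency-ranked sublists and an alphabetised list can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the family counts are invented toy data, not AWL figures, and the AWL’s own selection criteria are not modelled. The sketch ranks families by total frequency, splits them into sublists and records each family’s most frequent member, mirroring the italics on the AWL website.

```python
def make_sublists(family_counts, n_sublists):
    """Rank word families by total frequency and split them into sublists.

    family_counts maps a headword to {family member: corpus frequency}.
    Returns up to n_sublists lists of equal size (the last may be shorter),
    most frequent first. Each entry is (headword, most frequent member),
    so that member can be highlighted.
    """
    ranked = sorted(
        family_counts.items(),
        # Highest total frequency first; ties broken alphabetically by headword.
        key=lambda item: (-sum(item[1].values()), item[0]),
    )
    entries = [(head, max(members, key=members.get)) for head, members in ranked]
    size = -(-len(entries) // n_sublists)  # ceiling division
    return [entries[i:i + size] for i in range(0, len(entries), size)]


# Invented counts for illustration only.
family_counts = {
    "abandon": {"abandon": 5, "abandoned": 8},
    "accompany": {"accompany": 4, "accompanied": 6},
    "analyse": {"analyse": 50, "analysis": 90},
    "assess": {"assess": 40, "assessment": 120, "assessed": 30},
    "concept": {"concept": 60, "conceptual": 10},
    "derive": {"derive": 20, "derived": 25},
}
sublists = make_sublists(family_counts, 3)
print(sublists[0])                # [('assess', 'assessment'), ('analyse', 'analysis')]
print(sorted(family_counts)[:2])  # ['abandon', 'accompany']
```

With these toy counts, starting at the top of the alphabetised list would mean beginning with the two least frequent families, which is exactly the problem described above.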

Another reason to think carefully about taking an alphabetical approach to word lists is that presenting words that look very similar is likely to create interference for learners. Nation (2000) warns that learners can easily confuse words that look or sound the same or are opposites. This problem can occur even when learners already have some knowledge of the words, as Coxhead (2011d) found in a study of vocabulary use in writing from academic input. Coxhead deliberately ensured that items which could cause interference were included in the input texts and that using these words was part of the writing task. Several learners commented in post-writing interviews on problems with interference between these words. One of the participants in the study, Crystal, confused the words ethnic and ethical, which were in a source reading on Internet banking. Crystal explained that she struggled to use those words, even though she knew them both, because their spelling was similar. She abandoned her attempts to use these words in writing because of her confusion. She also confused ensure, a word she already knew, with a similar-looking word in the text (assure). It is also important to think carefully about how a word list was developed and for what purpose. An example is Coxhead’s AWL (2000), which was developed with second or foreign language learners of English who were planning to study at university in mind. The list was developed using a corpus of written university-level academic texts in English. I often receive emails about the use of the AWL in a range of teaching and learning situations, including one request for word cards for the list so that a parent could use them with a three-year-old native speaker of English to develop the child’s vocabulary.
Byrd and Coxhead (2010, p. 51) provide some questions to ask when considering adopting or adapting a specialised word list for a classroom context, for example, whether the list was developed from a spoken or written corpus or both, the kinds of texts in the corpus, whether they represent the kinds of texts that students would read or write, the principles of selection, and any details on how the list was evaluated. A specialised word list could be used as a starting point for finding out more about the vocabulary learners know at the start of a course. A simple technique would be a yes/no test, in which learners work quickly through the most frequent items in a list and tick the words they know. Any concern that this knowledge might be overestimated or that the learners might only
recognise a word could be checked in a short interview, by asking questions about the words. The students could then go back over the words they identified as unknown and, with the teacher, set goals for learning the meaning and spelling of these words as quickly as possible. If the high frequency words are already well known, then the learners could check through the rest of the sublists. It is important to check that learners recognise words in both spoken and written form, so fairly simple dictation or word recognition tasks could be devised for this purpose.
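A yes/no check of this kind is straightforward to score. The Python sketch below is hypothetical: it assumes ticks are recorded per word and grouped by frequency-ranked sublist, and it uses an arbitrary 90% threshold for treating a sublist as well known. It does not adjust for overestimation; as the text notes, that would be checked separately, for example in a short interview.

```python
def score_yes_no(sublists, ticked, threshold=0.9):
    """Score a yes/no check of a frequency-ranked word list.

    sublists: lists of words, most frequent sublist first.
    ticked: set of words the learner ticked as known.
    threshold: hypothetical proportion at which a sublist counts as well known.

    Returns (report, goals). report holds (sublist number, proportion ticked);
    goals holds the unticked words from the first sublist below the threshold,
    or is empty if every sublist meets it.
    """
    report, goals = [], None
    for number, words in enumerate(sublists, start=1):
        proportion = sum(word in ticked for word in words) / len(words)
        report.append((number, proportion))
        if goals is None and proportion < threshold:
            goals = [word for word in words if word not in ticked]
    return report, goals or []


# Sample headwords from the first two AWL sublists, with one learner's ticks.
sublists = [
    ["analyse", "approach", "area", "assess", "assume"],
    ["achieve", "acquire", "administrate", "affect", "appropriate"],
]
ticked = {"analyse", "approach", "area", "assess", "assume", "achieve", "affect"}
report, goals = score_yes_no(sublists, ticked)
print(report)  # [(1, 1.0), (2, 0.4)]
print(goals)   # ['acquire', 'administrate', 'appropriate']
```

Because the first sublist is fully ticked, the learner moves on, and the unticked words from the second sublist become candidate learning goals.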

Using word list research to analyse TED Talks for EAP

Byrd and Coxhead (2010, p. 56) state, ‘Students who are preparing for academic study need to read academic texts rather than focusing extensively on stories or other literary types’. Having noted that TED Talks were often used by EAP teachers preparing learners for their university studies, Coxhead and Walls (2012) investigated the vocabulary load and the vocabulary profile of these readily available online talks to see whether TED Talks were useful preparation for EAP from a lexical perspective. The corpus of TED Talks for this study was small, at just over 40,000 running words in total, drawn from six areas of talks: Business, Design, Entertainment, Global Issues, Science and Technology. One of the questions in the study was whether TED Talks were more similar in their vocabulary load to academic written texts or to academic spoken texts. The Coxhead and Walls (2012) study showed that learners need to know about 4,000 word families plus proper nouns to reach 95% coverage of the TED Talks corpus. Coverage of 98% was reached at around 8,000–9,000 word families plus proper nouns. A second part of the Coxhead and Walls (2012) study was to find the coverage of word lists over the TED Talks corpus. Using the RANGE Programme (Heatley, Nation & Coxhead, 2002) for analysis, together with West’s (1953) GSL, Coxhead’s (2000) AWL, and Coxhead and Hirsh’s (2007) pilot Science list for EAP, Coxhead and Walls (2012) found that these word lists and proper nouns covered 92.54% of the TED Talks corpus. Coxhead’s (2000) AWL covered almost 4% of the corpus (a common figure for the AWL and academic spoken texts), and examples in the
corpus from this word list include design, image, images, computer, percent and technology. The EAP Science List (Coxhead & Hirsh, 2007) covered only 0.79% of the TED Talks corpus. The TED Talks corpus contained specialised and current vocabulary (e.g. crowdsource, cymatics) as well as everyday spoken language (e.g. guys and amazing). Taken together, these findings by Coxhead and Walls (2012) suggested that TED Talks have more in common, from a vocabulary perspective, with written than with spoken academic texts, and that TED Talks would be similar to academic lectures in terms of their academic vocabulary load. Learners with vocabulary sizes of 4,000 word families might find TED Talks easier to cope with than learners who have smaller vocabulary sizes.
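Coverage figures such as 95% and 98% are the proportion of running words (tokens) in a text that a learner is assumed to know, with proper nouns counted as known. Coxhead and Walls used the RANGE program for this. The Python sketch below only illustrates the arithmetic: the text, the band assignments and the helper names are invented, and it matches individual word forms rather than whole word families.

```python
import re


def tokenize(text):
    """Lower-cased running words (tokens); a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens, known):
    """Proportion of tokens whose form is in the set `known`."""
    return sum(token in known for token in tokens) / len(tokens)


def bands_needed(tokens, band_of, proper_nouns, target=0.95):
    """Smallest number of 1,000-family bands that gives at least `target` coverage.

    band_of maps a word form to its hypothetical frequency band
    (1 = first 1,000 families). Proper nouns always count as known.
    Returns None if the target is never reached.
    """
    known = set(proper_nouns)
    for band in range(1, max(band_of.values()) + 1):
        known |= {form for form, b in band_of.items() if b == band}
        if coverage(tokens, known) >= target:
            return band
    return None


# Toy text and band assignments, invented for illustration only.
tokens = tokenize("Guys, the design of this image shows how Tokyo uses technology and design.")
band_of = {
    "the": 1, "of": 1, "this": 1, "shows": 1, "how": 1, "uses": 1, "and": 1,
    "guys": 2, "design": 3, "image": 3, "technology": 4,
}
print(len(tokens))                                             # 13
print(bands_needed(tokens, band_of, {"tokyo"}))                # 4
print(bands_needed(tokens, band_of, {"tokyo"}, target=0.90))   # 3
print(round(coverage(tokens, {"design", "image", "technology"}), 3))  # 0.308
```

The last line shows how the coverage of a single word list over a corpus (as with the AWL figure above) is computed in the same way. In real analyses, forms such as shows and uses would be grouped into the families show and use, which this form-based sketch does not attempt.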

Testing specialised vocabulary in ESP

One of the main areas of testing in EAP and ESP is course-based, where teachers in institutions and language schools, for example, carry out class-based testing at the end of a course. This activity is underrepresented in the testing literature, according to Schmitt and Hamp-Lyons (2015), who also see EAP practitioners as needing stronger awareness of and literacy in testing. They argue that EAP teachers are expected to be knowledgeable about all elements of language testing (development, administration and interpretation), but postgraduate degree courses for language teaching professionals, more often than not, do not mandate language assessment courses. This lack of assessment literacy is a major problem, according to Schmitt and Hamp-Lyons (2015), because EAP teacher-testers face a myriad of similar questions that are not readily answered by mainstream assessment textbooks. This is arguably because these books primarily focus on how to create tests and exams that are similar in format to large-scale standardised examinations and thus give little attention to how to integrate assessments into curriculum planning or how to develop both curricula and assessment for specific groups of students. For example, textbooks offer little guidance to teacher testers in EAP programmes which are experiencing an influx of mature students who have performed poorly on standardised exams such as IELTS despite having considerable specialised language and knowledge in their specific field, e.g. academics who are required by their universities or governments to get PhDs in order to keep their jobs. (p. 5)

Another part of the tension in EAP and ESP testing comes from the very close relationship between content and linguistic knowledge. Douglas (2013) makes the point that it should always be part of the construct of specific purposes tests that learners’ specific purposes language needs include not only linguistic knowledge but also background knowledge relevant to the communicative context in which learners need to operate. (p. 371)

ESP teachers are experts in pedagogy and language, but not necessarily experts in the specialised fields of their learners. This particular tension plays out more in areas such as standardised testing in Aviation (see Knoch, 2014, for example). As well as localised testing in ESP and EAP classes, there are two other main areas of testing. The first is EAP, the site of the majority of language assessment development (Douglas, 2013), where there are large, standardised international tests such as IELTS, TOEFL and Pearson Academic. At the local level, institutions may develop their own assessments for entry, such as the Diagnostic English Language Needs Assessment (DELNA) used at the University of Auckland in New Zealand (see Read, 2015; for more information go to www.delna.auckland.ac.nz/en.html). The second key area of ESP testing is English for Employment in fields such as Aviation, Medicine and Business (Douglas, 2013). As we have seen in Chapter 7, vocabulary is seen as central in Aviation assessment and is part of an integrated assessment/judgement (Knoch, 2014), rather than a separate section of a test as it is in Wette and Hawken’s (2016) English for Medical Purposes test. Read and Knoch (2009) examine Aviation English tests more fully from an Applied Linguistics perspective. All of this raises the question of how specialised vocabulary fits into testing in ESP. Potentially thousands of specialised words have been identified in Medical English, for example, in Quero’s (2015) analysis of Medical textbooks, in the LATTE research (Coxhead, Demecheleer & McLaughlin, 2016) and in multi-word unit studies such as Ackermann and Chen (2013) and Simpson-Vlach and Ellis (2010). Read (2000) points out that it is not very clear how best to include multi-word units in assessment.
Other questions which arise in relation to specialised vocabulary in ESP include the following: How much of this lexis can be usefully addressed in courses and targeted in course-based testing? How can this research be usefully drawn on for larger-scale
tests? What elements of lexical knowledge might be usefully assessed and how? Word lists have been picked up as one way to focus on specialised vocabulary in testing, so let us look at that area in particular now.

Word lists and testing

Several word lists in ESP have been used to inform vocabulary tests. The Vocabulary Levels Test (VLT) (Nation, 1983; Schmitt, Schmitt & Clapham, 2001) is a receptive test of vocabulary which includes a section based on the AWL (Coxhead, 2000). The AWL section of the VLT has been used to assess vocabulary knowledge in EAP (see Akbarian, 2010, for an example). Read (2015) discusses the relationship between academic vocabulary and academic literacy in EAP, including the differentiation between specialised or technical vocabulary in disciplines and general academic vocabulary. Read notes that general academic items such as classify and describe are a source of difficulty for learners, and that EAP teachers are highly aware of the problems these less salient items present in texts for learners. Webb and Sasao (2013) outline the development of a test on the AWL. However, to the best of my knowledge, other lists of general academic vocabulary, for example Gardner and Davies’s (2014) AVL, have yet to be incorporated into testing on a large scale. Another example of a test which has been developed using word lists is Beglar and Nation’s Vocabulary Size Test (VST) (2007) (see also Nation, 2013; Nation & Coxhead, 2014; Nation & Webb, 2011), but this test is not specifically for ESP purposes. This test draws on Nation’s (2006, 2012) frequency word lists based on the BNC and the COCA corpus. It has been used to assess the vocabulary size of first and second language speakers in New Zealand secondary schools (Coxhead, Nation & Sim, 2015) and university (see Nation & Coxhead, 2014; Elgort & Coxhead, 2016). The VST uses sampling from frequency levels of the BNC. The format of the test has been the subject of some criticism from researchers such as Gyllstad, Vilkaite and Schmitt (2015), who argue, for example, that the format of the VST might lead to guessing and therefore to inflated measures of vocabulary size. The BNC/COCA lists by Nation have also been used in a
Listening Vocabulary Levels Test (McLean, Kramer & Beglar, 2015). These tests provide an example of how a word list could be used to develop a size test for specific purposes. Nation (2016) has a useful chapter on word lists and vocabulary testing. Taking account of the purpose for assessment is important in ESP testing (Douglas, 2013). Studies into identifying specialised vocabulary, such as Nelson (2000), recognise this point by gathering spoken and written corpora that include texts which people need to read and write, as well as texts that they will hear and say in their occupations. The vocabulary used between professionals and between professionals and laypeople can be very different (see Chapter 7 for examples). Language testing needs to take these points into account. Knoch (2014) and Wette and Hawken (2016) discuss approaches to Aviation and Medical purposes testing which have integrated and segregated approaches to specialised lexis. Monitoring learning during courses could perhaps involve regular and often low-stakes assessment of lexical knowledge. File (2014) presents an option for inclass, low-stakes assessment of vocabulary, which can be easily adapted to EAP and ESP classrooms at any level of proficiency. In File’s approach, students are in charge of selecting the lexis for their in-class test, after a week or period of time when the learners would have dedicated attention to learning the target words. The procedure involves a set of tasks for the students, such as writing down all the 15 target words to be tested. This first step is about retrieving the words from memory and thinking about their spelling. The students then focus on adding word stress to each word and their part of speech. Evaluative tasks include deciding which words have positive or negative meanings. 
File (2014) suggests asking the students to talk about times in the week when they used their target vocabulary, as well as what they could do to use the words that they had not used during the week. This kind of activity addresses different aspects of word knowledge identified by Nation (2013), including form, meaning and use. Including low-stakes assessment like this example from File (2014) could have a positive effect on learners’ motivation to study and learn specialised vocabulary in an organised way. A post-course assessment of technical vocabulary can give both learners and teachers information about overall learning, and help refine the vocabulary programme for the next cohort.
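The scoring logic behind a sampled size test such as the VST, discussed in this section, can be made concrete with a short sketch. It assumes the original Nation and Beglar (2007) design of 14 frequency levels of 1,000 word families with 10 items per level, so that each correct answer stands for 100 word families; later versions of the test differ, and the function and variable names are illustrative only.

```python
# A minimal sketch of VST-style scoring, ASSUMING the original format of
# 14 levels x 10 items, where each level covers 1,000 word families.
# Names are illustrative; this is not part of any published tool.
LEVELS = 14
ITEMS_PER_LEVEL = 10
FAMILIES_PER_ITEM = 1_000 // ITEMS_PER_LEVEL  # each item stands for 100 families


def estimate_size(correct_per_level: list[int]) -> int:
    """Estimate the number of word families known from per-level scores."""
    if len(correct_per_level) != LEVELS:
        raise ValueError(f"expected {LEVELS} level scores")
    for level, score in enumerate(correct_per_level, start=1):
        if not 0 <= score <= ITEMS_PER_LEVEL:
            raise ValueError(f"level {level}: score {score} is out of range")
    return sum(correct_per_level) * FAMILIES_PER_ITEM


# Hypothetical learner: strong on high frequency levels, weaker beyond them.
scores = [10, 10, 9, 8, 7, 6, 5, 4, 3, 3, 2, 1, 1, 0]
print(estimate_size(scores))  # 6900
```

Because each item stands in for 100 word families, a single lucky guess adds 100 to the estimate, which is one way to see why the guessing concern raised by Gyllstad, Vilkaite and Schmitt (2015) matters for size estimates.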

Limitations of research into specialised vocabulary in ESP

There has been little published research on integrating specialised vocabulary into EAP and ESP classes, and testing research in this area is also lacking (Schmitt & Hamp-Lyons, 2015). The exception is research on professional purposes, such as Aviation and Medicine, where specialised vocabulary can be either integrated with content or assessed in a discrete section of a test. The two frameworks presented in this chapter, Nation’s (2007) Four Strands and the Involvement Load Hypothesis (Hulstijn & Laufer, 2001), provide a structure for decisions and critical thinking about the organisation of learning. However, the teaching and learning of specialised formulaic language lacks theoretical underpinning (Granger, 1998), as does vocabulary research in general (Schmitt, 2010). This chapter has also highlighted the comparative lack of testing research on specialised vocabulary in any context apart from limited efforts in some professions.

Conclusion

This chapter has looked at specialised vocabulary in research into teaching and learning, as well as testing. Overall, we have seen that while there are some examples of research into teaching and learning, there is scope for much more, particularly given the huge number of EAP and ESP courses worldwide. Specialised word lists have had some impact on dictionary and textbook development, particularly in EAP, but little research has examined the effectiveness of integrating word lists into curricula, materials design and assessment.

Chapter 10 Future directions and conclusion

Introduction

The focus of this chapter is future directions for vocabulary in ESP, framed around five main avenues for future research. The first is the need for more qualitative studies and what they might bring to the field of vocabulary in ESP. The second is the need for more testing research, since this is a major gap in the field. The third is a push for theorising vocabulary studies overall and ESP lexical acquisition in particular. The fourth is the need for comprehensive evaluations of research and for replication in the field of vocabulary and ESP. The chapter ends with the fifth avenue, which calls for more work on spoken, professional and multiword unit vocabulary in particular. Note that Chapter 8 contains a section on trades-based vocabulary research.

The need for more qualitative research in vocabulary and ESP

The majority of research into vocabulary and ESP reported in this book comes from quantitative research involving corpora. Some qualitative approaches have been built into word list research, for example through consultation with context or language experts. Peters and Fernández (2013) used qualitative approaches to find out more about which words Spanish speakers of English as a second language on an Architecture course actually looked up in their dictionaries. In this section, we will look at how qualitative approaches might be taken into account in vocabulary and ESP research and what they might bring to the field. First of all, qualitative approaches such as interviews, questionnaires and observations can bring to light issues which can help guide quantitative research and potentially help learning and teaching communities in unexpected ways. For example, in the course of gathering corpus-based data for the Coxhead, Stevens and Tinkle (2010) study of vocabulary in secondary school textbooks, it became clear that school librarians and teacher-education librarians at tertiary institutions needed help with deciding which books and resources were used by teachers and learners in schools. The researchers therefore drew on several further sources of information, including interviews with teachers of different subject areas in schools, participation in an online forum for English Literature teachers, and a survey (Coxhead, 2011d, 2012a). We found that textbooks were not necessarily the main source of reading for classes in secondary schools, except in the Sciences, as Parkinson (2013) notes. In the course of interviewing Science teachers and publishers, a clear picture emerged of the most used textbooks in New Zealand. The inquiry-based curriculum in New Zealand meant that teachers and learners in English Literature, for example, could be working with different texts in the classroom, and that texts in this field could range from poems, plays and novels to advertising hoardings, websites and visual images with little text. This information on textbooks and resources helped form the corpora used for the secondary school studies reported in Coxhead and White (2012), Coxhead et al. (2010) and Coxhead (2012b). One result of the online forum part of the study was a fairly comprehensive list of texts used in New Zealand schools at junior and senior levels.
This list has now been shared with the community as a resource for researchers, teachers and librarians in schools. Qualitative data can shed new light on quantitative studies and directions for research, yet little research on vocabulary in ESP draws on more than corpus data alone. New research into spoken academic vocabulary (Dang, Coxhead & Webb, in press) drew on survey data on high frequency vocabulary from language teachers, corpus data and the vocabulary test results of learners in Vietnam to design a word list that takes learner proficiency and teacher concerns into account. This means that learners at different levels of proficiency with general or academic goals, and their teachers, can decide where to start using the word list. It also means that further research can evaluate this approach and perhaps adapt it for language learners in different contexts or with different first language backgrounds. Taking the cultural contexts of learners and teachers into account can lead to new qualitative research methodologies in vocabulary and ESP. Chapter 8 on vocabulary in the trades discussed the adoption of a Pacific research framework called Talanoa (Vaioleti, 2006; Coxhead, Parkinson & Tu’amoheloa, under review). The Talanoa framework was adopted for the part of the LATTE project which involved translating specialised word lists into Tongan. Coxhead et al. (under review) worked with researchers, teachers and students from Tonga, and with New Zealand-born Tongans, to check translations of specialised trades-based word lists in English (for example, Coxhead et al., 2016 in Carpentry) into Tongan, and to investigate the usefulness of word lists built on New Zealand trades materials in different contexts. This Talanoa methodology, made possible by the skills and contacts of Falakiko Tu’amoheloa, is based on developing an ongoing relationship between the participants and the researchers, built through face-to-face communication and time spent together. This meant spending time and sharing meals with participants, and discussing everyday topics such as family and friends. This contact often resulted in introductions to others in the Tongan trades and education communities in Wellington and Tonga. Using a Pasifika approach in this context not only makes sense, but is culturally and linguistically appropriate (Coxhead et al., under review).
The Talanoa framework has also helped with gathering ideas on useful ways to present the findings of the study to the participants and on how we might develop and evaluate materials based on the trades vocabulary research, and it has confirmed that more multiword unit analysis of the corpora is sorely needed. Such work is important since, to adapt a phrase from Maxwell (2013), nobody is a native speaker of Carpentry.

The need for more testing research in vocabulary and ESP

Testing in vocabulary for ESP needs considerable research effort. Tests in some professional areas, such as Aviation and Medicine, have lexical elements built into them, and the Vocabulary Levels Test (Schmitt, Schmitt & Clapham, 2001) contains a section on Coxhead’s (2000) AWL. This test has been used extensively by teachers and researchers worldwide. The productive version of the Vocabulary Levels Test (Laufer & Nation, 1999) contains a section based on Xue and Nation’s (1984) University Word List. Another example of a productive test which includes more formal vocabulary is Fountain and Nation’s (2000) set of dictation tests, which control for lexical level. There are very few other readily available tests of vocabulary for specific purposes, possibly because many EAP or ESP programmes use in-house tests. Read (2015) reviews two research approaches from Corson (1997) for investigating productive academic vocabulary knowledge for diagnostic purposes before entry into university studies. The first approach involves interviewing participants and eliciting examples of language using pairs of words (for example, product/multiply; product/market). Gyllstad, Vilkaite and Schmitt (2015) also used interviews, in their case to tap into learners’ knowledge of vocabulary as a follow-up to testing vocabulary size. According to Corson (1997), using academic vocabulary can be problematic for learners who are inexperienced with academic contexts, and they may choose high frequency vocabulary instead, which can result in an inaccurate or inappropriate word choice. The second approach outlined by Corson (1997) involves analysing students’ written and spoken language to identify the percentage of Graeco-Latin vocabulary, because a large percentage of academic vocabulary is of Greek or Latin origin.
Coxhead (2000), for example, identifies over 80% of the AWL words as Graeco-Latin. Corson’s (1997) concern about this element of academic vocabulary is that learners from lower socio-economic backgrounds would not have encountered or used these kinds of words in their everyday lives as much as learners from higher socio-economic backgrounds. Corson (1985) referred to the barrier created by these words as a ‘lexical bar’. Vocabulary tests are also needed to assess learners’ knowledge of specialised vocabulary before or after a course of study. They can also be used to research technical vocabulary size in a first or second language and how it develops over time. Receptive and productive tests of multiword units and metaphor are also possible avenues of research.
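Corson’s (1997) second approach, calculating the percentage of Graeco-Latin words in a student’s text, reduces to a simple token count once each word has been classified. The sketch below assumes a tiny, hypothetical etymology lookup (`GRAECO_LATIN`); a real study would need a principled etymological classification of every word, which is the demanding part of the method.

```python
# Sketch of a Graeco-Latin percentage in the spirit of Corson's (1997)
# second approach. GRAECO_LATIN is a HYPOTHETICAL stand-in lookup, not a
# real etymological resource.
import re

GRAECO_LATIN = {"analyse", "analysis", "data", "theory", "significant",
                "process", "factor", "evidence"}


def graeco_latin_percentage(text: str, lexicon: set[str]) -> float:
    """Percentage of running words (tokens) that appear in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return 100 * hits / len(tokens)


sample = "The data show a significant factor in the process"
print(round(graeco_latin_percentage(sample, GRAECO_LATIN), 1))  # 44.4
```

Comparing such percentages across groups of learners is one way a researcher might look for the ‘lexical bar’ effect Corson describes, although token percentages on short texts are unstable and should be read with care.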

The need for theorising in vocabulary and ESP

Schmitt (2010) identifies theory as a gap in vocabulary research in terms of acquisition. He calls this gap ‘the Holy Grail of vocabulary studies’ (p. 36) and attributes it to the complex and varied nature of acquisition. Granger (1998) finds the same gap in phraseological studies, and McNamara (2015) focuses on this gap in Applied Linguistics as a field. While this volume does not focus on vocabulary acquisition per se, it has hopefully shown that there are many and varied ways to investigate specialised vocabulary in ESP, for example through identifying this vocabulary in different contexts, for different purposes and in different ways, including single words and multiword units, using written, spoken and multimedia data. We can research the teaching and learning of specialised vocabulary in classrooms as well as outside classrooms and online, through experimental, qualitative or mixed approaches. As Schmitt (2010) suggests for general vocabulary studies, through the accumulation of specialised studies and data we can build our understanding and theory of how specialised vocabulary is acquired. Chapter 9 identified two frameworks in vocabulary which have been developed from research in several fields of study, including Educational Psychology (for example, research on retrieval, noticing and avoiding interference; see Nation, 2013) and Applied Linguistics. One framework is Nation’s (2007) Four Strands (meaning-focused input, meaning-focused output, language-focused learning and fluency) and the other is Hulstijn and Laufer’s (2001) Involvement Load Hypothesis (with its core elements of need, search and evaluation). These frameworks are helpful for identifying elements or factors that might affect vocabulary acquisition. They are also helpful for analysing pedagogical tasks, classroom language learning and textbooks, for example.
These frameworks are an important part of the task of building our understanding of theory in vocabulary in ESP.
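As an illustration of how the Involvement Load Hypothesis can be used to analyse pedagogical tasks, the sketch below scores tasks on its three components. It assumes the commonly used operationalisation in which need is rated 0 (absent), 1 (moderate) or 2 (strong), search 0 or 1, and evaluation 0, 1 or 2, with the involvement index being their sum. The task descriptions and their ratings are hypothetical examples, not taken from a published study.

```python
# Sketch: scoring ESP tasks with the Involvement Load Hypothesis, ASSUMING
# need 0-2, search 0-1, evaluation 0-2, summed into an involvement index.
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    need: int        # 0 absent, 1 moderate, 2 strong
    search: int      # 0 absent, 1 present
    evaluation: int  # 0 absent, 1 moderate, 2 strong

    def __post_init__(self) -> None:
        if self.need not in (0, 1, 2):
            raise ValueError("need must be 0, 1 or 2")
        if self.search not in (0, 1):
            raise ValueError("search must be 0 or 1")
        if self.evaluation not in (0, 1, 2):
            raise ValueError("evaluation must be 0, 1 or 2")

    @property
    def involvement_index(self) -> int:
        return self.need + self.search + self.evaluation


# Hypothetical tasks with illustrative ratings.
tasks = [
    Task("read a text with glossed technical terms", need=1, search=0, evaluation=0),
    Task("gap-fill with target terms supplied", need=1, search=0, evaluation=1),
    Task("write a report using target terms", need=1, search=0, evaluation=2),
]
for task in sorted(tasks, key=lambda t: t.involvement_index, reverse=True):
    print(task.involvement_index, task.name)
```

Other things being equal, the hypothesis predicts better retention of target words from tasks with a higher index; ranking a course’s tasks this way is a quick first check, not a substitute for evaluating learning outcomes.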

The need for more evaluations of word lists and courses of learning and the need for replication studies

Word lists

As new word lists are developed, it is useful to compare and contrast existing lists with new ones, which benefits researchers both in replication studies and when trialling a new technique for identifying specialised vocabulary. For example, Gardner and Davies (2014) compare their AVL with Coxhead’s AWL, and new medical word lists can compare their results with existing studies such as Lei and Liu (2016) and Wang, Liang and Ge (2008). Importantly, these word lists are available for other researchers to use and explore, as is Nelson’s work on his Business English Lexis website (http://users.utu.fi/micnel/business_english_lexis_site.htm). This website also contains data from Nelson’s (2000) PhD research. Corpora, however, are often not publicly available, which can make it difficult to obtain them to run a replication study or to check findings, as Neufeld, Hancioğlu and Eldridge (2011) did in their reanalysis of Li and Qian’s (2010) study of the AWL in a Finance corpus. There is also room in the word list field for more specialised lists, along the lines of Yang (2015) in Nursing (see Chapter 6), and for more multiword unit research in word lists. Little word list research focuses on postgraduate study; instead, the majority of this research focuses on EGAP, some academic-specific purposes and some professional purposes (for example, Business). It is not clear, for example, what happens to vocabulary between undergraduate and postgraduate studies in the same field, let alone what vocabulary demands face learners who have studied in one undergraduate field and then change to another for their postgraduate studies. Furthermore, there does not seem to be much balance in word list research in terms of Becher’s (1989) framework of hard-pure, soft-pure, hard-applied and soft-applied areas of academia, but see Dang et al. (in press).
Nation’s (2016) timely volume on word list research provides a particularly strong framework for considering some of the main issues in word list research, such as units of counting, critical analysis of a word list study, multiword units, and corpus selection and design. At a time when many word lists are available or being developed for teaching and learning purposes, it is particularly important that we understand what the hallmarks of word list development might be, what considerations need to be taken into account depending on whether a word list is general or specific in nature, and what kind of corpus approach was taken in its development. The representativeness of the corpus is particularly important. Miller and Biber (2015) call for more consideration of internal representation in corpora so that word list research takes variation in lexis into account, while acknowledging from their own experience that this is not an easy task. Current thinking around high, mid and low frequency bands is based on word lists for English for general purposes (Schmitt & Schmitt, 2012; Nation, 2013). An important question is whether, or to what extent, this framework holds for English for specific or academic purposes. The high-mid-low frequency framework is based on Nation’s extensive work on frequency lists using the BNC (2006). Readily available tools can help anyone interested apply this framework of frequency to their own learner or professional texts, for example Heatley, Nation and Coxhead’s (2002) RANGE programme, Tom Cobb’s Lex Tutor (Cobb, n.d.) or Laurence Anthony’s AntConc (Anthony, 2016). Questions to consider when looking at output from a frequency analysis from these programmes include the following: Which frequency category (high, mid or low) do most of the specialist words from the text fall into? Which ones occur most often and why? Which words are closely related to the topic of the text and to the field?
Which of these words would you expect someone with basic knowledge of the area or field to know already? What is specialised vocabulary in this text? Is it all the words which are closely related to the subject or only those that are unique to the subject area? Note that Schmitt, Cobb, Horst and Schmitt (2017) call for replication of studies into how much vocabulary is needed to use English, such as Nation’s (2006) work. Research that focuses on the international generalisability of word list findings misses opportunities to uncover local language use. For example, a Geography textbook from Aotearoa/New Zealand contains a range of commonly used words from te reo Māori, such as tangata (people), waka (canoe), pohutukawa (the so-called New Zealand Christmas tree), Rangitoto (an island off the coast of Auckland) and wharenui (meeting house). Other examples include the names of iwi (tribes), leaders, regions, towns and cities, food and so on. These examples illustrate how corpora can reflect the origin and purpose of the texts within them, and such lexical items, commonly used items from one language in another, could well be the focus of word list research. Rather than being seen as a limitation of locally based research, in that it is not globally generalisable, this kind of research can inform and perhaps even inspire research projects in other regions.
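The frequency analysis described above, in which RANGE-style tools sort the words of a text into high, mid and low frequency bands, can be sketched in a few lines. This is not how RANGE, Lex Tutor or AntConc are implemented. It assumes a tiny hypothetical lookup (`LEVELS`) from word form to 1,000-word frequency level, where real work would use full word-family lists such as Nation’s BNC/COCA lists, and the band cut-offs are parameters because researchers draw these boundaries differently.

```python
# Sketch of a frequency-band profile. LEVELS holds HYPOTHETICAL frequency
# levels for illustration; the default cut-offs (high <= 3, mid <= 9) are
# assumptions, exposed as parameters.
import re
from collections import Counter
from typing import Optional

LEVELS = {"the": 1, "in": 1, "of": 1, "near": 1,
          "gather": 2, "island": 2, "volcanic": 6}


def band(level: Optional[int], high_max: int = 3, mid_max: int = 9) -> str:
    """Map a frequency level to a band; None means the word is off-list."""
    if level is None:
        return "off-list"
    if level <= high_max:
        return "high"
    if level <= mid_max:
        return "mid"
    return "low"


def profile(text: str, levels: dict[str, int]) -> tuple[dict[str, int], list[str]]:
    """Count running words per band and list the off-list word types."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # Unicode letters only
    counts = Counter(band(levels.get(token)) for token in tokens)
    bands = {name: counts[name] for name in ("high", "mid", "low", "off-list")}
    off_list = sorted({token for token in tokens if token not in levels})
    return bands, off_list


sample = "Tangata gather in the wharenui near the volcanic island of Rangitoto"
print(profile(sample, LEVELS))
# ({'high': 7, 'mid': 1, 'low': 0, 'off-list': 3}, ['rangitoto', 'tangata', 'wharenui'])
```

In this toy example the off-list words are exactly the te reo Māori items a general English list would not categorise; whether to treat such items as specialised vocabulary, proper nouns or a separate local list is a research decision rather than something the profile settles.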

Vocabulary in ESP courses of learning

A serious gap in this field is research into pedagogical and institutional decisions based on word lists and their effect on learning and teaching. In a research project involving specialised vocabulary in secondary schools, Coxhead (2012b) gathered interview and survey data from teachers from a range of subject areas in New Zealand. She found a wide range of views on specialised vocabulary and of classroom practices involving the selection of items for teaching and learning, the references and sources used in the classroom, and the kinds of activities teachers use with their students to focus on the specialised vocabulary of a subject area. One teacher remarked that the most important thing for her was creating a learning and teaching environment where the main aim is to discuss vocabulary rather than to learn a ‘preconceived’ list of words (Coxhead, 2012a). One of the challenges of word list research, outlined by Byrd and Coxhead (2010), is bringing published word lists into learning environments. Much more research is needed to measure the effectiveness of word lists for language learning and to find out more about how they are used in everyday teaching and learning. There are examples of pedagogically oriented studies which draw on concordancing and qualitative analysis of vocabulary in context. Csomay and Petrović (2012), for example, take a corpus-based quantitative approach to specialised vocabulary in legal dramas and movies and present examples of concordances, as do other studies such as Breeze (2015) on Legal English. Evaluations of the success of such initiatives in courses are needed and could be carried out with pre- and post-testing of specialised vocabulary, together with interviews or surveys to find out what learners and teachers say about concordancing as an approach and whether it leads to more learning. The use of television dramas and movies could also be evaluated, especially in fields such as Medicine, where there is plenty of opportunity to build on this initial research idea. Researchers such as Ward (2009) have focused on the specialised vocabulary needed by learners in particular contexts, in his case Engineering students at a university in Thailand. A large-scale project could use the same basic framework and collect and analyse all the texts to be read in a course of study. Based on the commonly occurring specialised vocabulary in those texts, a series of diagnostic tasks or tests could be developed so that incoming students could find out what vocabulary they need for each course before it starts, and then work on developing their knowledge of this vocabulary to prepare for their studies. A post-test would also be needed to measure any gains in knowledge of this specialised vocabulary after their studies. The tests could also be used to investigate the size of technical vocabulary knowledge for this group, and the results of the pre-test could be used for learning and pedagogical purposes. Follow-up studies could draw on Watson-Todd’s (2017) study of opaque vocabulary (see Chapter 6) to identify lexis which carries both everyday and specialised meanings in particular contexts.
Chapter 9 presents an example of analysing the vocabulary of a commonly used website, TED Talks, to find out more about the specialised nature of these texts, whether they are more like academic written or academic spoken texts in terms of vocabulary, and what the website might offer EAP students in terms of support for coping with vocabulary (Coxhead & Walls, 2012). This study focused on six-minute TED Talks for a pedagogical reason: we did not think teachers would use longer TED Talks in a listening class, given the constraints of listening multiple times within a limited, perhaps hour-long, class. Longer talks would, however, give more opportunities for vocabulary to occur. This research did not focus on the specialised vocabulary of the TED Talks, but this could be a useful area for research, particularly since the categorisation of the talks on the TED Talks website helps with organising a corpus and transcripts of the talks are available to download. Future research could also examine the effect of a ‘book flood’ (Elley & Mangubhai, 1981) type of experiment, in which learners undertake a programme of extensive listening to TED Talks and reading in a particular subject area, and their development of specialised vocabulary through that programme is tracked (Coxhead & Walls, 2012). Like the TED Talks study, which looked into texts commonly used in teaching, a written corpus study could involve selecting a text which represents typical reading for a particular group of professionals or students in a field, such as a book chapter, a journal article, a class handout or a lab manual from the sciences, and making sure the text is in electronic form. Laurence Anthony’s website (www.laurenceanthony.net/software.html) has freeware for converting files and tools for analysing texts, such as AntConc and AntWordProfiler. These tools can help identify the vocabulary in a text, for example through a Key Word In Context (KWIC) analysis, which shows the vocabulary in context in concordance lines, or by running the text against Paul Nation’s BNC/COCA (2012) lists. Tom Cobb’s Compleat Lexical Tutor website also provides concordancing and vocabulary profiling (see www.lextutor.ca/). Finally, research to evaluate Nation’s (2007) Four Strands in vocabulary in ESP is needed. The Four Strands offer a framework for curriculum design that emphasises a balance of meaning-focused input, meaning-focused output, language-focused learning and fluency. Applying this framework to a course of study and evaluating its effectiveness is important, and ESP and EAP learning and teaching programmes would be a very useful place to carry out that research.
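The Key Word In Context (KWIC) analysis mentioned above can be illustrated with a minimal concordancer. This is a sketch of the kind of display that tools such as AntConc or the Compleat Lexical Tutor produce, not their implementation; the function name, the context width and the lecture excerpt are all hypothetical.

```python
# Minimal KWIC concordancer sketch (illustrative; not AntConc's or
# Lextutor's implementation). Matching is whole-word and case-insensitive.
import re


def kwic(text: str, keyword: str, width: int = 25) -> list[str]:
    """Return one concordance line per occurrence of keyword."""
    flat = " ".join(text.split())  # collapse line breaks and extra spaces
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines


# Hypothetical lecture excerpt, written for illustration.
excerpt = ("The court awarded damages to the buyer. Damages are meant to "
           "compensate for a loss, not to punish the seller.")
for line in kwic(excerpt, "damages"):
    print(line)
```

Aligning every occurrence in one column is what makes concordance lines useful in class: learners can scan the words to the left and right of a technical term and notice its typical collocates and grammatical patterns.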

Replication studies

Porte (2012) has called for more replication studies in Applied Linguistics. The journal Language Teaching now has a dedicated slot for these, thanks to Porte’s considerable work as editor of the journal. Schmitt et al. (2017) on vocabulary load research and Coxhead (2015), a replication of Jones and Haywood’s (2004) study of formulaic sequences in an EAP course, are examples of responses in vocabulary studies to these calls for replication. Many studies in this book lend themselves to replication, for example Watson-Todd’s (2017) study of opaque vocabulary. Miller and Biber (2015) point out that word list research in particular would benefit from replication studies to find out whether the same list of words would be generated from a new corpus.
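Miller and Biber’s (2015) question, whether a new corpus would generate the same list, can be framed as building two lists with identical selection criteria and measuring how far they overlap. The sketch below assumes corpora stored as subject area mapped to tokens, uses hypothetical frequency and range thresholds (not the criteria of any published list), and uses Jaccard overlap as one of several possible measures.

```python
# Sketch of a word list replication check. The corpora and thresholds are
# HYPOTHETICAL placeholders, not the selection criteria of any real list.
from collections import Counter


def build_list(corpus: dict[str, list[str]], min_freq: int, min_range: int) -> set[str]:
    """Select words meeting a total frequency and a range (number of
    subject areas) threshold. `corpus` maps subject area -> tokens."""
    freq: Counter = Counter()
    spread: Counter = Counter()
    for tokens in corpus.values():
        freq.update(tokens)
        spread.update(set(tokens))  # count each word once per subject area
    return {w for w in freq if freq[w] >= min_freq and spread[w] >= min_range}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared items as a proportion of all items in either list."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


original = {"law": ["analyse", "data", "contract", "analyse"],
            "science": ["analyse", "data", "cell"],
            "commerce": ["data", "market", "analyse"]}
replication = {"law": ["data", "contract", "policy"],
               "science": ["data", "policy", "cell", "analyse"],
               "commerce": ["policy", "data", "market"]}

list_a = build_list(original, min_freq=3, min_range=2)
list_b = build_list(replication, min_freq=3, min_range=2)
print(sorted(list_a), sorted(list_b), round(jaccard(list_a, list_b), 2))
# ['analyse', 'data'] ['data', 'policy'] 0.33
```

Keeping the selection criteria fixed and changing only the corpus isolates the effect of corpus composition on the resulting list, which is the point of this kind of replication.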

Figure 10.1 Examples of teacher talk from university lectures (Hunter & Coxhead, 2007)

Chung and Nation’s (2003) study, which used a rating scale to identify and categorise technical vocabulary, could usefully be replicated. Its main findings concern the amount of technical vocabulary in an Anatomy textbook (one word in three in a line of text) and an Applied Linguistics textbook (20%). One replication study could select the same subject areas and textbooks and repeat the study; another could select different textbooks from the same subject areas. A small-scale study using Chung and Nation’s (2003) scale (see Table 2.3) is also possible. Figure 10.1 shows two examples from first-year university lectures (Hunter & Coxhead, 2007): the first is from Business Law and the second is from Media Studies. If we use the scale to identify words from these texts, we will see that damages in the Business Law lecture is a candidate for a word that has both general and specialised meanings in this context. When using the scale, further research could consider questions such as the following: Which steps of the scale does most of the vocabulary in the lectures belong to? What proportion of the texts is specialised vocabulary? How might a larger-scale study be carried out, with what kinds of texts, and in what subject areas?
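Once each running word in a text has been rated by hand, the question of what proportion of a text is specialised vocabulary becomes a simple calculation. The sketch below assumes a four-step scale like Chung and Nation’s (2003) and treats words rated at step 3 or above as technical; that cut-off is a parameter, and the ratings shown are hypothetical, not taken from the Hunter and Coxhead (2007) lectures.

```python
# Sketch: proportion of technical vocabulary in a hand-rated text, ASSUMING
# a four-step scale where ratings >= technical_from count as technical.
def technical_proportion(ratings: list[tuple[str, int]], technical_from: int = 3) -> float:
    """Share of rated running words at or above the technical cut-off."""
    if not ratings:
        raise ValueError("no rated tokens")
    for word, step in ratings:
        if step not in (1, 2, 3, 4):
            raise ValueError(f"{word!r}: step must be 1-4, got {step}")
    technical = sum(1 for _, step in ratings if step >= technical_from)
    return technical / len(ratings)


# Hypothetical ratings for a short stretch of a Business Law lecture.
rated = [("the", 1), ("plaintiff", 4), ("can", 1), ("claim", 3),
         ("damages", 3), ("for", 1), ("the", 1), ("loss", 2)]
print(f"{technical_proportion(rated):.1%}")  # 37.5%
```

Because the function works on tokens rather than word types, its output is directly comparable with coverage-style findings such as one word in three; running it on the same texts rated by two researchers would also give a quick check on rating consistency.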

Vocabulary in ESP in spoken corpora, different contexts and multiword units

Spoken corpora

Spoken corpora have so far featured far less often in corpus research (Flowerdew, 2015), let alone in vocabulary in ESP research, and this gap is especially important. There are two elements of spoken research to consider: the vocabulary learners are exposed to in English and the vocabulary they produce in speaking. If we first consider the vocabulary that learners are exposed to, there are some studies on vocabulary in lectures, particularly in EAP; in general, however, these corpora are not as large as those in written corpus studies and include a smaller range of spoken text types. Studies such as Horst (2010) on vocabulary used in teacher talk in a general English community class, teacher talk examples in secondary school contexts (see Chapter 5), studies using the BASE and MICASE spoken corpora (see Chapter 6), Dang et al. (under review), and Biber’s (2006) work on university language for EAP show that there is some research on spoken corpora. Further research into vocabulary in a range of classroom talk is needed. For example, we know little about the vocabulary which EAP students are exposed to in their daily studies, how this lexis relates to their future courses of study at university, or how or what specialised vocabulary is used in ESP courses. Future research using spoken texts could include analysis of multiword units by examining discipline-specific spoken corpora and identifying formulaic language in those texts. Research in EAP so far has been based much more on written than on spoken corpora, for obvious funding and resource reasons, and studies of spoken corpora are mostly small in scale. That said, some spoken academic corpora are publicly available for researchers and teachers to work with, such as BASE (see Thompson & Nesi, 2001) and 40 lectures from MICASE (Simpson, Briggs, Ovens & Swales, 2002). Finally, general and discipline-specific research in vocabulary in ESP is in need of replication studies. For example, studies such as Ward (2009) and Hsu (2014) in Engineering could be replicated using materials from other contexts as a corpus, to find out whether the specialised vocabulary found in those studies overlaps with the vocabulary in the new context. To take up the second point, about spoken learner corpora for specialised purposes: to the best of my knowledge, there are very few studies on the spoken vocabulary produced by learners in ESP, EAP or even general English. In the LATTE project, for example, we focused on recording tutors in their practical and theory classes rather than recording the learners and their productive use of specialised vocabulary in learning contexts. This research gap means we know little about learners’ productive use of specialised vocabulary in speaking, whereas we know something about vocabulary use in writing, through studies of learner corpora in particular.
That said, more research is needed into the productive vocabulary of ESP learners as well.
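The suggestion above to identify formulaic language in discipline-specific spoken corpora could start from something as simple as counting recurrent word sequences across transcripts. The sketch below treats a candidate lexical bundle as a contiguous four-word sequence that recurs across several transcripts. The transcripts and the `min_freq` and `min_texts` thresholds are hypothetical; published studies normally work with much larger corpora and normalise frequency by corpus size (for example, per million words).

```python
# Sketch: candidate lexical bundles (recurrent 4-word sequences) in a small
# spoken corpus. Thresholds and transcripts are HYPOTHETICAL.
import re
from collections import Counter
from typing import Iterator


def ngrams(tokens: list[str], n: int) -> Iterator[tuple[str, ...]]:
    """Yield contiguous n-word sequences."""
    return zip(*(tokens[i:] for i in range(n)))


def lexical_bundles(transcripts: list[str], n: int = 4,
                    min_freq: int = 2, min_texts: int = 2) -> dict[str, int]:
    """Return sequences meeting both a frequency and a range threshold."""
    freq: Counter = Counter()
    spread: Counter = Counter()
    for text in transcripts:
        tokens = re.findall(r"[a-z']+", text.lower())
        grams = [" ".join(g) for g in ngrams(tokens, n)]
        freq.update(grams)
        spread.update(set(grams))  # count each sequence once per transcript
    return {g: c for g, c in freq.most_common()
            if c >= min_freq and spread[g] >= min_texts}


transcripts = [
    "If you have a look at the first slide, you can see that the load is on the beam.",
    "Have a look at the diagram and you can see that it fails.",
    "So you can see that the joint is weak.",
]
for bundle, count in lexical_bundles(transcripts).items():
    print(count, bundle)
```

The range requirement matters as much as raw frequency: it keeps out sequences that are frequent only because one speaker repeats them, which is a particular risk in small spoken corpora of the kind described above.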

Vocabulary in ESP in different contexts

Chapters 5 to 8 of this book focus on vocabulary in different contexts: secondary school, pre-university and university studies, English for Occupational and Professional Purposes and vocabulary in the trades. These chapters were necessarily selective in the areas they concentrated on. It is clear, however, that while areas such as general EAP, Sciences, Health Communication, Medicine, Engineering and Aviation have been the subject of some research, much more needs to be done. There is a range of possible further research opportunities into vocabulary in secondary school contexts. Chapter 5 reported on research into only four school subjects (English Literature, Mathematics, Science and Social Sciences), all in predominantly English-as-a-first-language contexts. Research on a wider range of subjects would expand our understanding of different subject areas and of language across the curriculum. More research on specialised vocabulary would also be useful from countries where English is a foreign language, and in a larger range of schools, including more international and bilingual schools. It would be useful to investigate specialised vocabulary across year levels in textbooks and other texts used in classrooms. Coxhead et al. (2010), for example, investigated the vocabulary load of a series of Science textbooks used in New Zealand secondary schools. The researchers found, not surprisingly, that the vocabulary load increased from the junior to the senior texts. However, as Nation (2016) points out, vocabulary load research needs to involve completely 'clean' texts. This means that all of the vocabulary in the texts can be analysed by computer, with no unrecognised or uncategorised words. The Coxhead et al. (2010) study may have overestimated the vocabulary load of the Science texts because not all of the vocabulary in them was categorised. At that time, the BNC lists only went up to 14,000 words, and the researchers did not take all of the remaining vocabulary into account.
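A small calculation shows how unclassified words can inflate a vocabulary load estimate. In the Python sketch below, vocabulary load is taken to be the smallest 1,000-word frequency band at which cumulative coverage of the running words reaches a target, assumed here to be 98%. Everything else in the sketch is a hypothetical illustration, not a value from the BNC/COCA lists or from Coxhead et al. (2010):

- the function name;
- the convention of using band 0 for supplementary lists such as proper nouns, which are assumed to be known;
- the band assignments in the example.

```python
from collections import Counter


def vocabulary_load(tokens, band_of, target=0.98, max_band=25):
    """Smallest frequency band at which cumulative coverage reaches `target`.

    band_of maps a lower-cased word form to a 1,000-word band number (1, 2, ...).
    Band 0 is used here (an assumption) for supplementary lists such as proper
    nouns, treated as known. Words absent from band_of are unclassified and
    never count towards coverage. Returns None if `target` is not reached
    by max_band.
    """
    if not tokens:
        raise ValueError("no tokens to profile")
    band_counts = Counter(band_of.get(t.lower()) for t in tokens)  # None = unclassified
    covered = 0
    for band in range(0, max_band + 1):
        covered += band_counts.get(band, 0)
        if covered / len(tokens) >= target:
            return band
    return None


if __name__ == "__main__":
    # Hypothetical 100-token text and hypothetical band assignments
    text = ["the"] * 80 + ["beam"] * 15 + ["joist"] * 2 + ["Auckland"] * 3
    clean = {"the": 1, "beam": 3, "joist": 9, "auckland": 0}  # every word classified
    unclean = {"the": 1, "beam": 3, "joist": 9}               # proper noun left out
    print(vocabulary_load(text, clean))    # 3
    print(vocabulary_load(text, unclean))  # None
```

When every word is classified, 98% coverage is reached at band 3. When the proper noun is left unclassified, its three tokens can never count as covered. The target is then not reached within the lists at all, and the text appears much harder than it is.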
For this reason, the LATTE project has ensured that all of the written and spoken corpora from the four trades have been categorised according to Nation's 25,000-word BNC/COCA lists (see Nation, 2013), and that words outside those lists have also been categorised. Such work is painstaking, but more accurate. Increased specialisation is a feature of Science education. Learners move from more general science topics in the junior school to more specialised subjects in the senior school, for example, Biology and Chemistry. Mathematics also becomes more specialised at the senior school level, for example, in Calculus and Statistics. A useful project for teachers and learners could be to map the development of technical vocabulary from junior to senior school. Furthermore, research into the vocabulary actually produced by learners in schools would be useful. Such research could follow the learner corpus work of colleagues in this field, such as Sylviane Granger and others at Louvain University in Belgium.

Chapter 8 focused on specialised vocabulary in the trades. There are many potential areas of research on vocabulary in this specialised field of education. It is important that more trades-based research projects on vocabulary are carried out, to validate findings from projects such as LATTE. It has become clear in the course of the LATTE project that trades education differs markedly between countries. In Norway, for example, trades education follows a similar path to other tertiary-level studies. An international project on vocabulary in the trades could help us understand more about trades education in different education systems, and the findings could be used to support learning and teaching in English and other languages. The LATTE project has shed light on the importance of oral communication in the trades for learners and teachers. As reported in Chapter 8, the learners emphasised that paying attention to the vocabulary of the trades is essential to their learning. Literacy is a challenge for some of these learners. A longitudinal study of a range of learners in trades education would help us understand how learners' knowledge of trades vocabulary develops over time. Such a study could also follow learners from trades education into the workplace, to investigate specialised vocabulary in use when they talk with clients and others in the industry.

Multiword units

Research into multiword units has focused mostly on EAP, so there is much more work to be done in other areas of ESP. What are the multiword units of secondary school subjects, for example, and what role do they play in written and spoken texts? Many of the EAP lexical bundle studies discussed in Chapters 4 and 6 categorise the bundles by function. This helps learners and teachers understand why these items may be common in texts and what role they play. Discourse studies using move analysis (Swales, 1990) are contributing more to this field, as studies such as Cortes (2013) show. Formulaic language research is beginning to pay more attention to slot or frame analysis. This raises methodological problems to which researchers such as Cheng (2012) and Greaves (2009) have paid particular attention. There is clearly scope for more research and analysis of metaphor. This could include identifying and categorising metaphors, as in the work of Littlemore and colleagues (2010, 2011), and in different areas of specialisation such as Business and Medicine. It could also address integration into courses of study and effective ways of learning and teaching, as in the work of Boers and colleagues (2014).
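The lexical bundle studies mentioned above generally begin with a frequency-driven extraction step, before any functional or move-based categorisation. The Python sketch below shows that step under simplifying assumptions. It extracts four-word sequences and applies two thresholds: a normalised frequency per million words, and a range threshold (the number of different texts a sequence must occur in). The default values are illustrative, and published studies set their own cut-offs. The function name `lexical_bundles` is hypothetical, and this version lets sequences cross sentence boundaries and punctuation, which real studies often treat more carefully.

```python
import re
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_per_million=40, min_range=5):
    """Extract recurrent n-word sequences from a list of texts (sketch).

    Returns {bundle: normalised frequency per million words} for sequences
    that meet both the frequency and the range (number of texts) thresholds.
    Sequences may cross sentence boundaries in this simplified version.
    """
    freq = Counter()               # raw frequency of each n-gram in the corpus
    texts_with = defaultdict(set)  # indices of the texts each n-gram occurs in
    total_tokens = 0
    for i, text in enumerate(texts):
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        total_tokens += len(tokens)
        for j in range(len(tokens) - n + 1):
            gram = " ".join(tokens[j:j + n])
            freq[gram] += 1
            texts_with[gram].add(i)
    if total_tokens == 0:
        return {}
    bundles = {}
    for gram, count in freq.items():
        per_million = count * 1_000_000 / total_tokens
        if per_million >= min_per_million and len(texts_with[gram]) >= min_range:
            bundles[gram] = per_million
    return bundles


if __name__ == "__main__":
    corpus = [
        "on the other hand we agree",
        "on the other hand they left",
        "on the other hand it rained",
        "on the other hand she stayed",
        "on the other hand prices fell",
        "at the end of the day",
    ]
    print(lexical_bundles(corpus))
```

In a toy corpus this small, every sequence easily passes the per-million threshold, so the range criterion does the filtering. Only "on the other hand", which occurs in five different texts, is kept.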

Conclusion

The purpose of this chapter has been to point out areas for future research. It has also aimed to prompt us to look again at research that has already been done, whether for inspiration for new areas of research, for ideas for expansion or as candidates for replication. A wide range of interesting and substantial work has been carried out on vocabulary for academic and specific purposes, by people working alone or in research groups, in many parts of the world. This book has been motivated by these people, their approach to research, their suggestions for future research and their desire to find out more about vocabulary in ESP to better serve the needs of learners and teachers. I hope this book has identified possible gaps in the field of specialised vocabulary in ESP and provided some suggestions for future research. There is much work to be done, and there are many good reasons to do it.

Appendix 1 Questions 4 and 5 from the online survey on how teachers decide what specialised vocabulary to focus on (Coxhead, 2011)

5. Please tell us about other ways you decide on what specialised vocabulary to focus on.

Appendix 2 Student questionnaire for the Language in the Trades Project

1. What qualification are you studying?
2. What courses are you currently taking?
3. Rank each skill in terms of its importance for your study. 1 = least important; 10 = most important
   (Most important / Fairly important / Not very important / Least important)
   Reading
   Writing
   Listening
   Speaking
   Vocabulary
4. What reading do students need to do in courses that you are taking? (Daily / Weekly / Monthly / Never)
   Course textbook
   Chapters from the textbook
   Course workbook
   Lecture slides/PowerPoints
   Worksheets/handouts
   Websites
   Instruction manuals
   Site plans
   Building codes
   Official documents – e.g. industry codes, manufacturer’s specifications
   Other (please specify)
   Are you assessed on any of these?   □ Yes   □ No
   If yes, which ones are you assessed on? How are you assessed on them?
5. What writing do students need to do in courses that you are taking? (Daily / Weekly / Monthly / Never)
   Report on what you do in the workshop
   Report on work done on-site
   Summaries
   Short answers to questions in workbooks
   Reports written in teams/groups
   Notes on work completed – e.g. Builders' Diaries/record of work
   Short answers to questions in assessments
   Other (please specify)
   Are you assessed on any of these?   □ Yes   □ No
   If yes, which ones are you assessed on? How are you assessed on them?
6. What speaking do students need to do in courses that you are taking? (Daily / Weekly / Monthly / Never)
   Working with a group on writing tasks
   Pair work on writing tasks
   Class discussions
   Presentation
   Working with a group in workshops
   Pair work in workshops
   Working with a group on-site
   Pair work on-site
   Talking to site manager
   Talking to visiting officials – e.g. BCITO, building inspectors
   Other (please specify)
   Are you assessed on any of these?   □ Yes   □ No
   If yes, which ones are you assessed on? How are you assessed on them?
7. What listening do students need to do in courses that you are taking? (Daily / Weekly / Monthly / Never)
   Listening to tutor in classes
   Listening to tutor in workshops
   Listening to tutor on-site
   Working with other students on writing tasks
   Working with other students in classes
   Working with other students in workshops
   Working with other students on-site
   Listening to site manager
   Listening to visiting officials – e.g. BCITO, building inspectors
   Other (please specify)
   Are you assessed on any of these?   □ Yes   □ No
   If yes, which ones are you assessed on? How are you assessed on them?
8. What other language tasks do students have to do in courses you are taking?
9. What kind of words do you need to know to learn Carpentry?
10. What’s the most difficult thing for you about learning new vocabulary in your trade and why?
11. What do you do when you hear a word or phrase that is new?
12. How does your tutor support you with learning new words or terms related to your trade?
13. If you were advising a friend about taking this trade course next year, what advice would you give him or her about how to learn the vocabulary that they need?
14. What reading, writing, speaking and listening do you think you will have to do in your job?

References

Ackermann, K. & Chen, Y.-H. 2013. Developing the Academic Collocation List (ACL): A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12(4): 235–247.
Ädel, A. 2014. Selecting quantitative data for qualitative analysis: A case study connecting a lexicogrammatical pattern to rhetorical moves. Journal of English for Academic Purposes, 16: 68–80.
Ädel, A. & Erman, B. 2012. Recurrent word combinations in academic writing by native and non-native speakers of English: A lexical bundles approach. English for Specific Purposes, 31: 81–92.
Ädel, A. & Römer, U. 2012. Research on advanced student writing across disciplines and levels: Introducing the Michigan corpus of upper-level student papers. International Journal of Corpus Linguistics, 17(1): 3–34.
Aiguo, W. 2007. Teaching aviation English in the Chinese context: Developing ESP theory in a non-English speaking country. English for Specific Purposes, 26: 121–128.
Akbarian, I. 2010. The relationship between vocabulary size and depth for ESP/EAP learners. System, 38(3): 391–401.
Altenberg, B. 1998. On the phraseology of spoken English: The evidence of recurrent word-combinations, in Phraseology: Theory, analysis, and application, edited by Anthony Cowie. Oxford: Clarendon Press: 101–122.
Anthony, L. 7 February 2016. Ant Word Profiler. [Online]. Available: www.laurenceanthony.net/software/antwordprofiler/.
Ardasheva, Y. & Tretter, T. 2017. Developing science-specific, technical vocabulary of high school newcomer English learners. International Journal of Bilingual Education and Bilingualism, 20(3): 252–271.
Barton, D. & Cox, A. 2013. Delta mathematics. Auckland: Pearson.
Basturkmen, H. 2006. Ideas and options for English for Specific Purposes. Mahwah, NJ: Continuum.
Basturkmen, H. 2010. Developing courses in English for Specific Purposes. Basingstoke, Hampshire, UK: Palgrave Macmillan.
Basturkmen, H. & Shackleford, N. 2015. How content lecturers help students with language: An observational study of language-related episodes in interaction in first year accounting classrooms. English for Specific Purposes, 37: 87–97.
Bauer, L. & Nation, I. S. P. 1993. Word families. International Journal of Lexicography, 6: 253–279.
Becher, T. 1989. Academic tribes and territories. Milton Keynes, UK: The Society for Research into Higher Education and Open University Press.
Beck, I., McKeown, M. & Kucan, L. 2013. Bringing words to life: Robust vocabulary instruction (second edition). New York: Guilford Press.
Belcher, D., Serrano, F. & Yang, H. 2016. English for professional academic purposes, in The Routledge handbook of English for academic purposes, edited by Ken Hyland & Phillip Shaw. Abingdon, Oxon, UK: Routledge: 502–514.
Bennett, G. 2010. Using corpora in the language learning classroom. Ann Arbor, MI: University of Michigan Press.
Biber, D. 2006. University language. Amsterdam: John Benjamins.
Biber, D. & Barbieri, F. 2007. Lexical bundles in university spoken and written registers. English for Specific Purposes, 26(3): 263–286.
Biber, D., Conrad, S. & Cortes, V. 2004. If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3): 371–405.
Biber, D., Conrad, S. & Reppen, R. 1998. Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman grammar of spoken and written English. Harlow, England: Pearson Education.
Biber, D., Reppen, R., Schnur, E. & Ghanem, R. 2016. On the (non)utility of Juilland’s D to measure lexical dispersion in large corpora. International Journal of Corpus Linguistics, 21(4): 439–464.
Boers, F. 1997. No pain, no gain in a free-market: A test for cognitive semantics? Metaphor and Symbol, 12: 231–241.
Boers, F. 2000. Enhancing metaphoric awareness in specialised reading. English for Specific Purposes, 19(2): 137–147.

Boers, F., Demecheleer, M., Coxhead, A. & Webb, S. 2014. Gauging the effectiveness of exercises on verb-noun collocations. Language Teaching Research, 18(1): 50–70.
Boers, F. & Lindstromberg, S. 2012. Experimental and intervention studies on formulaic sequences in a second language. Annual Review of Applied Linguistics, 32: 83–110.
Bosher, S. 2013. English for nursing, in The handbook of English for specific purposes, edited by Brian Paltridge & Sue Starfield. Boston: Wiley-Blackwell: 263–281.
Breeze, R. 2015. Teaching the vocabulary of legal documents: A corpus-driven approach. ESP Today, 3(1): 44–63.
Brezina, V. & Gablasova, D. 2015. Is there a core general vocabulary? Introducing the new general service list. Applied Linguistics, 36(1): 1–22.
Brown, D. 2010. An improper assumption? The treatment of proper nouns in text coverage counts. Reading in a Foreign Language, 22: 355–361.
Browne, C. & Culligan, B. 2016. A New Business Service List. [Online]. Available: www.newgeneralservicelist.org/bsl-business-servicelist/.
Browne, C., Culligan, B. & Phillips, J. 2013a. A New Academic Word List. [Online]. Available: www.newacademicwordlist.org/. [7 February, 2017].
Browne, C., Culligan, B. & Phillips, J. 2013b. A New General Service List. [Online]. Available: www.newgeneralservicelist.org/. [7 February, 2017].
Byrd, P. & Coxhead, A. 2010. On the other hand: Lexical bundles in academic writing and in the teaching of EAP. University of Sydney Papers in TESOL, 5: 31–64.
Cameron, R. 1998. Language-focused needs analysis for ESL-speaking nursing students in class and clinic. Foreign Language Annals, 31: 203–218.
Canziani, T. & Mungra, P. 2013. Lexicographic studies in medicine: Academic word list for clinical case histories. Ibérica, 25: 39–62.
Carter, R. & McCarthy, M. 2006. Cambridge grammar of English. Cambridge: Cambridge University Press.
Casey, H., Cara, O., Eldred, J., Grief, S., Hodge, R., Ivanič, R., Jupp, T., Lopez, D. & McNeil, B. 2006. ‘You wouldn’t expect a maths teacher to teach plastering…’: Embedding literacy, language and numeracy in post-16 vocational programmes: The impact on learning and achievement. London: National Research and Development Centre for Adult Literacy and Numeracy.
Chan, S. 2013. Learning a trade: Becoming a trades person through apprenticeship. Wellington: Ako Aotearoa. Available: https://akoaotearoa.ac.nz/download/ng/file/group-7/learning-a-trade.pdf.
Charles, M. 2012. ‘Proper vocabulary and juicy collocations’: EAP students evaluate do-it-yourself corpus-building. English for Specific Purposes, 31: 93–102.
Charles, M. 2014. Getting the corpus habit: EAP students’ long term use of personal corpora. English for Specific Purposes, 35: 30–40.
Charles, M. & Pecorari, D. 2016. Introducing English for academic purposes. London: Routledge.
Charteris-Black, J. 2000. Metaphor and vocabulary teaching in ESP economics. English for Specific Purposes, 19: 149–165.
Charteris-Black, J. & Musolff, A. 2003. ‘Battered hero’ or ‘innocent victim’? A comparative study of metaphors for euro trading in British and German financial reporting. English for Specific Purposes, 22(2): 153–176.
Chen, Q. & Ge, C. 2007. A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word families in medical research articles. English for Specific Purposes, 26: 502–514.
Chen, Y.-H. & Baker, P. 2010. Lexical bundles in L1 & L2 academic writing. Language Learning & Technology, 14(2): 30–49.
Cheng, W. 2012. Exploring corpus linguistics: Language in action. New York: Routledge.
Cheng, W. 2014. Using concgrams to investigate research article sections, in Corpus analysis for descriptive and pedagogical purposes, edited by Maurizio Gotti & Davide S. Giannoni. Bern: Peter Lang: 63–84.
Cheng, W., Greaves, C., Sinclair, J. & Warren, M. 2009. Uncovering the extent of the phraseological tendency: Towards a systematic analysis of concgrams. Applied Linguistics, 30(2): 236–252.
Chujo, K. & Utiyama, M. 2006. Selecting level-specific specialized vocabulary using statistical measures. System, 34: 255–269.
Chung, T. 2003. A corpus comparison approach for terminology extraction. Terminology, 9(2): 221–245.
Chung, T. & Nation, I. S. P. 2003. Technical vocabulary in specialised texts. Reading in a Foreign Language, 15(2): 103–116.
Chung, T. & Nation, I. S. P. 2004. Identifying technical vocabulary. System, 32(2): 251–263.
Clyne, M. 1985. Language maintenance and language shift: Some data from Australia, in Language of inequality: A reader in sociolinguistics, edited by Nessa Wolfson & Joan Manes. The Hague: Mouton: 195–206.
Clyne, M. 1991. Community languages: The Australian experience. Cambridge: Cambridge University Press.
Clyne, M. & Kipp, S. 1996. Language maintenance and language shift in Australia, 1991. Australian Review of Applied Linguistics, 19(1): 1–19.
Cobb, T. n.d. Compleat Lexical Tutor. [Online]. Available: www.lextutor.ca/vp/comp/. [6 February, 2017].
Cobb, T. 2013. Frequency 2.0: Incorporating homoforms and multiword units into pedagogical frequency lists, in L2 vocabulary acquisition, knowledge and use: New perspectives on assessment and corpus analysis, edited by Camilla Bardel, Christina Lindqvist & Batia Laufer. Eurosla Monographs Series, 2: 79–107.
Cobb, T. & Horst, M. 2004. Is there room for an AWL in French? in Vocabulary in a second language: Selection, acquisition, and testing, edited by Paul Bogaards & Batia Laufer. Amsterdam: John Benjamins: 15–38.
Colley, H., James, D., Diment, K. & Tedder, M. 2003. Learning as becoming in vocational education and training: Class, gender and the role of vocational habitus. Journal of Vocational Education & Training, 55(4): 471–498. DOI: 10.1080/13636820300200240.
Corson, D. 1985. The lexical bar. Oxford: Pergamon Press.
Corson, D. 1997. The learning and use of academic English words. Language Learning, 47(4): 671–718.
Cortes, V. 2013. The purpose of this study is to: Connecting lexical bundles and moves in research article introductions. Journal of English for Academic Purposes, 12: 33–43.
Coxhead, A. 2000. A new academic word list. TESOL Quarterly, 34(2): 213–238.
Coxhead, A. 2007. Factors and aspects of knowledge affecting L2 word use in writing, in Teaching and learning vocabulary in another language, edited by Peter Davidson, Christine Coombe, Dwight Lloyd & David Palfreyman. Dubai: TESOL Arabia: 331–342.
Coxhead, A. 2008. Phraseology and English for academic purposes: Challenges and opportunities, in Phraseology in language learning and teaching, edited by Fanny Meunier & Sylviane Granger. Amsterdam: John Benjamins: 149–161.
Coxhead, A. 2011a. Exploring specialised vocabulary in secondary schools: What difference might subject, experience, year level, and school decile make? TESOLANZ Journal, 19: 37–52.
Coxhead, A. 2011b. What is the exactly word in English? Investigating second language vocabulary use in writing. English Australia, 27(1): 3–17.
Coxhead, A. 2011c. The academic word list 10 years on: Research and teaching implications. TESOL Quarterly, 45(2): 355–362.
Coxhead, A. 2011d. Using vocabulary in writing in a second language: Writing from sources. Köln: LAP Lambert Academic Publishing.
Coxhead, A. 2012a. Specialised vocabulary in secondary school classrooms: Teachers’ perspectives, in Teaching language to learners of different age groups, edited by Hannah Pillay & Marie Yeo. Singapore: SEAMEO Regional Language Centre: 194–205.
Coxhead, A. 2012b. Academic vocabulary, writing and English for academic purposes: Perspectives from second language learners. RELC Journal, 43: 137–145.
Coxhead, A. 2012c. Researching vocabulary in secondary school English texts: The Hunger Games and more. English in Aotearoa, 78: 34–41.
Coxhead, A. 2013. Vocabulary and ESP, in The handbook of English for specific purposes, edited by Brian Paltridge & Sue Starfield. Boston: Wiley-Blackwell: 115–132.
Coxhead, A. 2014a. Identifying specialised vocabulary, in New ways in teaching vocabulary, revised, edited by Averil Coxhead. Alexandria, VA: TESOL Inc: 251–252.
Coxhead, A. 2014b. Poster carousels, in New ways in teaching vocabulary, revised, edited by Averil Coxhead. Alexandria, VA: TESOL Inc: 266–268.
Coxhead, A. 2015a. Vocabulary, English for academic purposes, and the Janus moment: Mind the gap, in Proceedings of the BALEAP 2013 conference: The Janus moment in EAP, edited by Michael Kavanagh & Lisa Robinson. Reading, UK: Garnett Education: 13–21.

Coxhead, A. 2015b. Replication research in pedagogical approaches to formulaic sequences: Jones & Haywood 2004 and Alali & Schmitt 2012. Language Teaching/FirstView Article, June: 1–11, DOI: 10.1017/S0261444815000221. [Published online: 5 June, 2015].
Coxhead, A. 2015c. Joists, dwangs & pink batts: Writing and the specialised vocabulary of Carpentry, paper presented to 14th Symposium on Second Language Writing, Auckland, NZ, 19–21 November 2015. Available: http://sslw.asu.edu/2015/programme.pdf. [3 August, 2017].
Coxhead, A. 2016a. Reflecting on Coxhead (2000) a new academic word list. TESOL Quarterly, 1: 181–185.
Coxhead, A. 2016b. Acquiring academic and discipline specific vocabulary, in Routledge handbook of English for academic purposes, edited by Ken Hyland & Phillip Shaw. London: Routledge: 177–190.
Coxhead, A. 2017a. Approaches and perspectives on teaching vocabulary for discipline-specific academic writing, in Discipline-specific writing: Theory into practice, edited by John Flowerdew & Tracey Costley. London: Routledge: 62–76.
Coxhead, A. 2017b. The lexical demands of teacher talk: An international school study of EAL, maths & science, in Academic language in a Nordic setting: Linguistic and educational perspectives, edited by Ruth Vatvedt Fjeld, Kristin Hagen, Birgit Henriksen, Sofie Johannson, Sussi Olsen & Julia Prentice. Oslo Studies in Language, 9(1). (ISSN 1890–9639).
Coxhead, A. in press. The lexical demands of teacher talk: An international school study of EAL, Maths and Science, in Academic language in a Nordic setting – linguistic and educational perspectives, edited by Ruth Fjeld, Kristin Hagen, Birgit Henriksen, Sofie Johannson, Sussi Olsen & Julia Prentice. Oslo: University of Oslo.
Coxhead, A. & Byrd, P. 2012. Collocations and academic word list: The strong, the weak and the lonely, in Encoding the past, decoding the future: Corpora in the 21st century, edited by Isabel Moskowich & Begoña Crespo. Cambridge: Cambridge Scholars Publishing: 1–20.
Coxhead, A. & Bytheway, J. 2015. Learning vocabulary using two massive online resources: You will not blink, in Learning beyond the classroom, edited by David Nunan & Jack Richards. New York: Routledge: 65–74.

Coxhead, A. & Demecheleer, M. Under review. Investigating the technical vocabulary of Plumbing: Using corpus research to support pedagogy.
Coxhead, A., Demecheleer, M. & McLaughlin, E. 2016. The technical vocabulary of Carpentry: Loads, lists and bearings. TESOLANZ Journal, 24: 38–71.
Coxhead, A. & Hirsh, D. 2007. A pilot science word list for EAP. Revue Française de Linguistique Appliquée, 12(2): 65–78.
Coxhead, A., Nation, I. S. P. & Sim, D. 2015. Vocabulary size and native speaker secondary school students. New Zealand Journal of Educational Studies, 50(1): 121–135.
Coxhead, A., Parkinson, J. & Tu’amoheloa, F. Under review. Using Talanoa to develop bilingual word lists of technical vocabulary in the trades.
Coxhead, A. & Quero, B. 2015. Investigating a science vocabulary list in university medical textbooks. TESOLANZ Journal, 23: 55–65.
Coxhead, A., Stevens, L. & Tinkle, J. 2010. Why might secondary science textbooks be difficult to read? New Zealand Studies in Applied Linguistics, 16(2): 35–52.
Coxhead, A. & Walls, R. 2012. TED Talks, vocabulary, and listening for EAP. TESOLANZ Journal, 20: 55–67.
Coxhead, A. & White, R. 2012. Building a corpus of secondary school texts: First you have to catch the rabbit. New Zealand Studies in Applied Linguistics, 18(2): 67–73.
Crawford Camiciottoli, B. 2007. The language of business studies lectures. Amsterdam: John Benjamins.
Csomay, E. & Petrović, M. 2012. Yes, your honor!: A corpus-based study of technical vocabulary in discipline-related movies and TV shows. System, 40: 305–315.
Cutting, J. 2012. English for airport ground staff. English for Specific Purposes, 31: 3–13.
Dahm, M. 2011. Exploring perception and use of everyday language and medical terminology among international medical graduates in a medical ESP course in Australia. English for Specific Purposes, 30: 186–197.
Dang, Y., Coxhead, A. & Webb, S. in press. The academic spoken word list. Language Learning.
Dang, Y. & Webb, S. 2014. The lexical profile of academic spoken English. English for Specific Purposes, 33: 66–76.
Douglas, D. 2013. ESP and assessment, in The handbook of English for Specific Purposes, edited by Brian Paltridge & Sue Starfield. Boston: Wiley-Blackwell: 367–383.
Dudley-Evans, T. & St. John, M. 1998. Developments in English for Specific Purposes. Cambridge: Cambridge University Press.
Durrant, P. 2008. Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes, 28: 157–169.
Durrant, P. 2009. Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes, 28: 157–169.
Durrant, P. 2014. Discipline- and level-specificity in university students’ written vocabulary. Applied Linguistics, 35(3): 328–356.
Durrant, P. 2016. To what extent is the academic vocabulary list relevant to university student writing? English for Specific Purposes, 43: 49–61.
Ebeling, S. & Hasselgård, H. 2015. Learner corpora and phraseology, in The Cambridge handbook of learner corpus research, edited by Sylviane Granger, Gaëtanelle Gilquin & Fanny Meunier. Cambridge: Cambridge University Press: 207–230.
Edwards, R., Minty, S. & Miller, K. 2013. The literacy practices for assessment in the vocational curriculum: The case of hospitality. Journal of Vocational Education and Training, 65(2): 220–235.
Elgort, I. & Coxhead, A. 2016. An introduction to the vocabulary size test: Description, application and evaluation, in Trends in language assessment research and practice, edited by Janna Fox & Vahid Aryadoust. Cambridge: Cambridge Scholars Publishing: 286–301.
Elley, W. & Mangubhai, F. 1981. The impact of a book flood in Fiji primary schools. Wellington: New Zealand Council for Educational Research.
Ellis, R. 1999. Learning a second language through interaction. Amsterdam: John Benjamins.
Ellis, R. 2005. Principles of instructed language learning. System, 33: 209–224.
Eriksson, A. 2012. Pedagogical perspectives on bundles: Teaching bundles to doctoral students of biochemistry, in Input, process and product: Developments in teaching and language corpora, edited by James Thomas & Alex Boulton. Brno: Masaryk University Press: 195–211.

Erman, B. & Warren, B. 2000. The idiom principle and the open choice principle. Text, 20(1): 29–62.
Estival, D. 2016. Aviation English: A linguistic description, in Aviation English: A lingua Franca for pilots and air traffic controllers, edited by Dominique Estival, Candace Farris & Brett Molesworth. London: Routledge: 22–53.
Estival, D., Farris, C. & Molesworth, B. 2016. Aviation English: A lingua Franca for pilots and air traffic controllers. London: Routledge.
Estival, D. & Molesworth, B. 2016. Native English speakers and EL2 pilots: An experimental study, in Aviation English: A lingua Franca for pilots and air traffic controllers, edited by Dominique Estival, Candace Farris & Brett Molesworth. London: Routledge: 22–53.
Evans, S. & Morrison, B. 2011. The first term at university: Implications for EAP. ELT Journal, 65(4): 387–397.
Fang, Z. 2006. The language demands of science reading in middle school. International Journal of Science Education, 28(5): 491–520.
Farrell, P. 1990. Vocabulary in ESP: A lexical analysis of the English of electronics and a study of semi-technical vocabulary. CLCS Occasional Paper No. 25. Trinity College.
Ferguson, G. 2013. English for medical purposes, in The handbook of English for specific purposes, edited by Brian Paltridge & Sue Starfield. Boston: Wiley-Blackwell: 343–261.
File, K. 2014. A low-stakes vocabulary test, in New ways in teaching vocabulary, revised, edited by Averil Coxhead. Alexandria, VA: TESOL Inc: 107–110.
Flowerdew, J. 2015. Some thoughts on English for Research Publication Purposes (ERPP) and related issues. Language Teaching, 48(2): 250–262.
Flowerdew, L. 2014. Which unit of linguistic analysis of ESP corpora of written text?, in Corpus analysis for descriptive and pedagogical purposes, edited by Maurizio Gotti & Davide Giannoni. Bern: Peter Lang: 25–41.
Flowerdew, L. 2015a. Learner corpora and language for academic and specific purposes, in The Cambridge handbook of learner corpus research, edited by Sylviane Granger, Gaëtanelle Gilquin & Fanny Meunier. Cambridge: Cambridge University Press: 465–484.
Flowerdew, L. 2015b. Corpus-based research and pedagogy in EAP: From lexis to genre. Language Teaching, 48(1): 99–116.

Folse, K. 2010. Is explicit vocabulary focus the reading teacher’s job? Reading in a Foreign Language, 22(1): 139–160.
Fountain, R. L. & Nation, I. S. P. 2000. A vocabulary-based graded dictation test. RELC Journal, 31(2): 29–44.
Francis, W. N. & Kucera, H. 1979. A standard corpus of present-day edited American English, for use with digital computers. Providence, RI: Department of Linguistics, Brown University.
Franken, M. & Hunter, J. 2012. The Midlands Health Literacy Project, Phase 3: Research Report. [Online]. Available: www.midlandshn.health.nz/uploads/midland-health-phase-3-researchreport.pdf. [16 May, 2017].
Fraser, S. 2007. Providing ESP learners with the vocabulary they need: Corpora and the creation of specialized word lists. Hiroshima Studies in Language and Language Education, 10: 127–143.
Fraser, S. 2009. Breaking down the divisions between general, academic, and technical vocabulary: The establishment of a single, discipline-based word list for ESP learners. Hiroshima Studies in Language and Language Education, 12: 151–167.
Gablasova, D. 2014. Learning and retaining specialized vocabulary from textbook reading: Comparison of learning outcomes through L1 and L2. The Modern Language Journal, 98(4): 976–991.
Gardner, D. 2013. Exploring vocabulary: Language in action. London and New York: Routledge.
Gardner, D. & Davies, M. 2007. Pointing out frequent phrasal verbs: A corpus-based analysis. TESOL Quarterly, 41(2): 339–359.
Gardner, D. & Davies, M. 2014. A new academic vocabulary list. Applied Linguistics, 35(3): 305–327.
Gardner, D. & Davies, M. 2016. A response to ‘To what extent is the academic vocabulary list relevant to university student writing?’. English for Specific Purposes, 43: 62–68.
Ghadessy, M. 1979. Frequency counts, word lists, and material preparation: A new approach. English Teaching Forum, 17(1): 24–27.
Ghadessy, M., Henry, A. & Roseberry, R. 2001. Small corpus studies and ELT theory and practice. Amsterdam: John Benjamins.

Gibbons, P. 2006. Bridging discourses in the ESL classroom: Students, teachers and researchers. London: Continuum.

Gilquin, G., Granger, S. & Paquot, M. 2007. Learner corpora: The missing link in EAP pedagogy. Journal of English for Academic Purposes, 6: 319–335.

Gledhill, C. 2000. The discourse function of collocation in research article introductions. English for Specific Purposes, 19: 115–135.

Gleeson, M. 2010. Language AND content: How do curriculum teachers of year 12 English language learners combine two disciplines? Unpublished PhD thesis. Wellington: Victoria University of Wellington.

Gotti, M. & Giannoni, D. 2014. Corpus analysis for descriptive and pedagogical purposes. Bern: Peter Lang.

Gouverneur, C. 2008. The phraseological patterns of high-frequency verbs in advanced English for general purposes: A corpus-driven approach to EFL textbook analysis, in Phraseology in foreign language learning and teaching, edited by Fanny Meunier & Sylviane Granger. Amsterdam: John Benjamins: 223–243.

Grabowski, L. 2015. Keywords and lexical bundles within English pharmaceutical discourse: A corpus-driven description. English for Specific Purposes, 38: 23–33.

Granger, S. 1998. Prefabricated patterns in advanced EFL writing: Collocations and formulae, in Phraseology theory, analysis and applications, edited by Anthony Cowie. Oxford: Clarendon Press: 145–160.

Granger, S. & Paquot, M. 2008. Disentangling the phraseological web, in Phraseology: An interdisciplinary perspective, edited by Sylviane Granger & Fanny Meunier. Amsterdam: John Benjamins: 27–50.

Granger, S. & Paquot, M. 2010. Customising a general EAP dictionary to meet learner needs, in eLexicography in the 21st century: New challenges, new applications. Proceedings of ELEX2009, edited by Sylviane Granger & Magali Paquot. Cahiers du CENTAL 7. Louvain-la-Neuve: Presses universitaires de Louvain: 87–96.

Greaves, C. 2009. ConcGram: A phraseological search engine. Amsterdam: John Benjamins.

Greene, J. 2008. Academic vocabulary and formulaic language in middle school content area textbooks. Unpublished doctoral dissertation. Atlanta: Georgia State University.

Greene, J. & Coxhead, A. 2015. Academic vocabulary for Middle School students: Research-based lists and strategies for key content areas. Baltimore: Brookes Publishing.

Gyllstad, H., Vilkaite, L. & Schmitt, N. 2015. Assessing vocabulary size through multiple-choice formats: Issues with guessing and sampling rates. International Journal of Applied Linguistics, 166: 276–303.

Ha, Y.-H. 2015. Technical vocabulary in finance: A corpus-based study of annual reports and earnings calls. Unpublished PhD thesis. Hong Kong: The University of Hong Kong.

Hafner, C. & Candlin, C. 2007. Corpus tools as an affordance to learning in professional legal education. Journal of English for Academic Purposes, 6: 303–318.

Harmon, J., Hedrick, W. & Wood, K. 2005. Research on vocabulary instruction in the content areas: Implications for struggling readers. Reading & Writing Quarterly, 21: 261–280.

Hartig, A. & Lu, X. 2014. Plain English and legal writing: Comparing expert and novice writers. English for Specific Purposes, 33: 87–96.

Heatley, A., Nation, I. S. P. & Coxhead, A. 2002. RANGE Programme. [Online]. Available: www.victoria.ac.nz/lals/about/staff/paul-nation. [7 February, 2017].

Henriksen, B. & Danelund, L. 2015. Studies of Danish L2 learners’ vocabulary knowledge and the lexical richness of their written production in English, in Lexical issues in L2 writing, edited by Päivi Pietilä, Katalin Doró & Renata Pípalová. Newcastle upon Tyne: Cambridge Scholars Publishing: 1–27.

Hiltunen, T. & Mäkinen, M. 2013. Formulaic language in Economics papers: Comparing novice and published writing, in Corpus analysis for descriptive and pedagogical purposes, edited by Maurizio Gotti & Davide S. Giannoni. Bern: Peter Lang: 347–368.

Hincks, R. 2003. Pronouncing the Academic Word List: Features of L2 student oral presentations, in Proceedings of the 15th International Congress of Phonetic Sciences, edited by Maria-Josep Solé, Daniel Recasens & Joaquin Romero. Barcelona: Causal Productions: 1545–1548.

Hirsh, D. & Coxhead, A. 2009. Ten ways of focusing on science-specific vocabulary in EAP. English Australia Journal, 25(1): 5–16.

Hoang, H. 2014. Metaphor and second language learning: The state of the field. TESL-EJ, 18(2). [Online]. Available: www.teslej.org/wordpress/issues/volume18/ej70/ej70a5//. [7 February, 2017].

Holmes, J. & Woodhams, J. 2013. Building interaction: The role of talk in joining a community of practice. Discourse & Communication, 7(3): 275–298.

Hook, G. 2005. Science year 9. Auckland: New House.

Horst, M. 2010. How well does teacher talk support incidental vocabulary acquisition? Reading in a Foreign Language, 22(1): 161–180.

Hsu, W. 2013. Bridging the vocabulary gap for EFL medical undergraduates: The establishment of a medical word list. Language Teaching Research, 17(4): 454–484.

Hsu, W. 2014. Measuring the vocabulary load of engineering textbooks for EFL undergraduates. English for Specific Purposes, 33: 54–65.

Hu, M. & Nation, I. S. P. 2000. Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1): 403–430.

Hulstijn, J. H. & Laufer, B. 2001. Some empirical evidence for the involvement load hypothesis in vocabulary acquisition. Language Learning, 51(3): 539–558.

Humphrey, S. 2016. EAP in school settings, in The Routledge handbook of English for academic purposes, edited by Ken Hyland & Phillip Shaw. Abingdon, UK: Routledge: 447–460.

Hunter, J. & Coxhead, A. 2007. New technologies in university lectures and tutorials: Opportunities and challenges for EAP Programmes. TESOLANZ, 15: 30–41.

Hwang, K. & Nation, I. S. P. 1995. Where would general service vocabulary stop and special purposes vocabulary begin? System, 23(1): 35–41.

Hyland, K. 2008. As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27: 4–21.

Hyland, K. & Tse, P. 2007. Is there an ‘academic vocabulary’? TESOL Quarterly, 41(2): 235–253.

Ivanič, R. 1998. Writing and identity: The discoursal construction of identity in academic writing. Amsterdam: John Benjamins.

Ivanič, R., Edwards, R., Barton, D., Martin-Jones, M., Fowler, Z., Hughes, B., Manion, G., Miller, K., Satchwell, C. & Smith, J. 2009. Improving learning in college: Rethinking literacies across the curriculum. London: Routledge.

Izwaini, S. 2003. Corpus-based study of metaphor in information technology. Proceedings of the workshop on corpus-based approaches to figurative language, Corpus Linguistics. Lancaster: Lancaster University: 1–8.

Joe, A. 1998. What effects do text-based tasks promoting generation have on incidental vocabulary acquisition? Applied Linguistics, 19(3): 357–377.

Johns, A. 2016. The common core in the United States: A major shift in standards and assessment, in The Routledge handbook of English for academic purposes, edited by Ken Hyland & Phillip Shaw. Abingdon, UK: Routledge: 461–501.

Jones, M. & Haywood, S. 2004. Facilitating the acquisition of formulaic sequences: An exploratory study in an EAP context, in Formulaic sequences, edited by Norbert Schmitt. Amsterdam: John Benjamins: 269–291.

Khani, R. & Tazik, K. 2013. Towards the development of an academic word list for applied linguistics research articles. RELC Journal, 44(2): 209–232.

Kim, H. Y. 2016. Talking to learn: The hidden curriculum of a fifth-grade science class. English for Specific Purposes, 43: 1–12.

Knoch, U. 2014. Using subject specialists to validate an ESP rating scale: The case of the International Civil Aviation Organisation (ICAO) rating scale. English for Specific Purposes, 33: 77–86.

Konstantakis, N. 2007. Creating a business word list for teaching business English. Elia, 7: 79–102.

Krishnamurthy, R. & Kosem, I. 2007. Issues in creating a corpus for EAP pedagogy and research. Journal of English for Academic Purposes, 6: 356–373.

Kwary, D. 2011. A hybrid method for determining technical vocabulary. System, 39(2): 175–185.

Lam, J. 2001. A study of semi-technical vocabulary in computer science texts, with special reference to ESP teaching and lexicography. Hong Kong: Hong Kong University of Science & Technology.

Laufer, B. 1989. What percentage of text-lexis is essential for comprehension?, in Special language: From humans thinking to thinking machines, edited by Christer Laurén & Marianne Nordman. Clevedon: Multilingual Matters: 316–323.

Laufer, B. 1998. The development of passive and active vocabulary in a second language: Same or different? Applied Linguistics, 19(2): 255–271.

Laufer, B. & Hulstijn, J. 2001. Incidental vocabulary acquisition in a second language: The construct of task-induced involvement. Applied Linguistics, 22(1): 1–26.

Laufer, B. & Nation, P. 1999. A vocabulary size test of controlled productive ability. Language Testing, 16(1): 33–51.

Lee, D. & Swales, J. 2006. A corpus-based EAP course for NNS doctoral students: Moving from available specialized corpora to self-compiled corpora. English for Specific Purposes, 25: 56–75.

Leech, G., Rayson, P. & Wilson, A. 2001. Word frequencies in written and spoken English. Harlow: Longman.

Lei, L. & Liu, D. 2016. A new medical academic word list: A corpus-based study with enhanced methodology. Journal of English for Academic Purposes, 22: 42–53.

Leung, C. 2005. Mathematical vocabulary: Fixers of knowledge or points of exploration? Language and Education, 19(2): 126–134.

Li, Y. & Qian, D. 2010. Profiling the Academic Word List (AWL) in a financial corpus. System, 38(3): 402–411.

Littlemore, J. 2001. Metaphor as a source of misunderstanding for overseas students in academic lectures. Teaching in Higher Education, 6(3): 333–351.

Littlemore, J., Chen, P., Liyen Tang, P., Koester, A. & Barnden, J. 2010. The use of metaphor and metonymy in academic and professional discourse and their challenges for learners and teachers of English, in Fostering language teaching efficiency through cognitive linguistics, edited by Sabine De Knop, Frank Boers & Teun De Rycker. Berlin: de Gruyter: 189–211.

Littlemore, J., Trautman Chen, P. & Koester, A. 2011. Difficulties in metaphor comprehension faced by international students whose first language is not English. Applied Linguistics, 32(4): 408–429.

Liu, D. 2012. The most frequently-used multiword constructions in academic written English: A multi-corpus study. English for Specific Purposes, 31: 25–35.

Llinares, A., Morton, T. & Whittaker, R. 2012. The roles of language in CLIL. Cambridge: Cambridge University Press.

Lopez, S., Condamines, A. & Josselin-Leray, A. 2013. An LSP learner corpus to help with English radiotelephony teaching, in Twenty years of learner corpus research: Looking back, moving ahead, edited by Sylviane Granger, Gaëtanelle Gilquin & Fanny Meunier. Louvain-la-Neuve: Presses universitaires de Louvain: 301–311.

Luxton, J., Fry, J. & Coxhead, A. 2017. Exploring the knowledge and development of academic English vocabulary of students in NZ secondary schools. SET, 17(1): 12–22.

Lynch, T. & McLean, J. 2000. Exploring the benefits of task repetition and recycling for classroom language learning. Language Teaching Research, 4: 221–250.

Lynn, R. W. 1973. Preparing word lists: A suggested method. RELC Journal, 4(1): 25–32.

Maher, P. 2016. The use of semi-technical vocabulary to understand the epistemology of a disciplinary field. Journal of English for Academic Purposes, 22: 92–108.

Marín, M. J. 2014. Evaluation of five single-word term recognition methods on a legal English corpus. Corpora, 9(1): 83–107.

Marra, M. 2013. English in the workplace, in The handbook of English for specific purposes, edited by Brian Paltridge & Sue Starfield. Boston: Wiley-Blackwell: 175–192.

Marston, J. & Hansen, A. 1985. Clinically speaking: ESP for refugee nursing students. MinneTESOL Journal, 5: 29–52.

Martínez, I. 2003. Aspects of theme in the method and discussion sections of biology journal articles in English. Journal of English for Academic Purposes, 2: 103–123.

Martínez, I., Beck, S. & Panza, C. 2009. Academic vocabulary in agriculture research articles. English for Specific Purposes, 28: 183–198.

Martinez, R. & Schmitt, N. 2012. A phrasal expressions list. Applied Linguistics, 33(3): 299–320.

Maxwell, L. 2013. Common core ratchets up language demands for English learners. Education Week. [Online]. Available: www.edweek.org/ew/articles/2013/10/30/10cc-academiclanguage.h33.html. [6 February, 2017].

McEnery, T., Xiao, R. & Tono, Y. 2006. Corpus-based language studies: An advanced resource book. London: Routledge.

McLean, S., Kramer, B. & Beglar, D. 2015. The creation and validation of a listening vocabulary levels test. Language Teaching Research, 19(6): 741–760.

McNamara, T. 2015. Applied linguistics: The challenge of theory. Applied Linguistics, 36(4): 466–477.

Meunier, F. & Granger, S. 2008. Phraseology in foreign language learning and teaching. Amsterdam: John Benjamins.

Mežek, Š., Pecorari, D., Shaw, P., Irvine, A. & Malmström, H. 2015. Learning subject-specific L2 terminology: The effect of medium and order of exposure. English for Specific Purposes, 38: 57–69.

Miller, D. 2011. ESL reading textbooks vs. university textbooks: Are we giving our students the input they may need? Journal of English for Academic Purposes, 10: 32–46.

Miller, D. & Biber, D. 2015. Evaluating reliability in quantitative vocabulary studies: The influence of corpus design and composition. International Journal of Corpus Linguistics, 20(1): 30–53.

Miller, J. 2009. Teaching refugee learners with interrupted education in Science: Vocabulary, literacy and pedagogy. International Journal of Science Education, 31: 571–592.

Ministry of Education. 2010. The literacy learning progressions. Wellington: Learning Media.

Ministry of Education. n.d. Te Kete Ipurangi. [Online]. Available: https://www.tki.org.nz/. [24 July, 2017].

Moder, C. L. 2013. Aviation English, in The handbook of English for specific purposes, edited by Brian Paltridge & Sue Starfield. Boston: Wiley-Blackwell: 227–242.

Moder, C. & Halleck, G. 2012. Designing language tests for specific social uses, in Handbook of language testing, edited by Glenn Fulcher & Fred Davidson. Abingdon, UK: Routledge.

Moore, T. & Morton, J. 2005. Dimensions of difference: A comparison of university writing and IELTS writing. Journal of English for Academic Purposes, 4: 43–66.

Mudraya, O. 2006. Engineering English: A lexical frequency instructional model. English for Specific Purposes, 25: 235–256.

Nation, I. S. P. 1983. Testing and teaching vocabulary. Guidelines, 5(1): 12–25.

Nation, I. S. P. 2000. Learning vocabulary in lexical sets: Dangers and guidelines. TESOL Journal, 9(2): 6–10.

Nation, I. S. P. 2001. Learning vocabulary in another language. Cambridge: Cambridge University Press.

Nation, I. S. P. 2006. How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63(1): 59–82.

Nation, I. S. P. 2007. The four strands. Innovation in Language Learning and Teaching, 1(1): 2–13.

Nation, I. S. P. 2008. Teaching vocabulary: Strategies and techniques. Boston: Heinle, Cengage Learning.

Nation, I. S. P. 2012. The BNC/COCA Word Family Lists. [Online]. Available: www.victoria.ac.nz/lals/about/staff/paul-nation. [15 May, 2017].

Nation, I. S. P. 2013. Learning vocabulary in another language (second edition). Cambridge: Cambridge University Press.

Nation, I. S. P. 2016. Making and using word lists for language learning and testing. Amsterdam: John Benjamins.

Nation, I. S. P. & Beglar, D. 2007. A vocabulary size test. The Language Teacher, 31(7): 9–13.

Nation, I. S. P. & Coxhead, A. 2014. Vocabulary size research at Victoria University of Wellington, New Zealand. Language Teaching, 47(3): 398–403.

Nation, I. S. P., Coxhead, A., Chung, T. M. & Quero, B. 2016. Specialized word lists, in Making and using word lists for language learning and testing, edited by Paul Nation. Amsterdam: John Benjamins: 145–151.

Nation, I. S. P. & Kobeleva, P. 2016. Proper nouns, in Making and using word lists for language learning and testing, edited by Paul Nation. Amsterdam: John Benjamins: 55–64.

Nation, I. S. P. & Sorrell, J. 2016. Corpus selection and design, in Making and using word lists for language learning and testing, edited by Paul Nation. Amsterdam: John Benjamins: 95–105.

Nation, I. S. P. & Webb, S. 2011. Researching and analyzing vocabulary. Boston: Heinle.

Nation, I. S. P. & Yamamoto, A. 2012. Applying the four strands. International Journal of Innovation in English Language Teaching and Research, 1(2): 167–181.

National Governors Association Center for Best Practices & Council of Chief State School Officers. 2010. Common Core State Standards Initiative. [Online]. Available: http://www.corestandards.org/. [3 August, 2017].

Nattinger, J. & DeCarrico, J. 1992. Lexical phrases and language teaching. Oxford: Oxford University Press.

Nelson, M. n.d. Mike Nelson’s Business English Lexis Site. [Online]. Available: http://users.utu.fi/micnel/business_english_lexis_site.htm. [6 February, 2017].

Nelson, M. 2000. A corpus-based study of Business English and Business English teaching materials. Unpublished PhD Thesis. Manchester: University of Manchester.

Nesi, H. 2001. EASE: A multimedia materials development project, in CALL: The challenge of change, edited by Keith Cameron. Exeter: Elm Bank Publications: 287–292.

Nesi, H. & Basturkmen, H. 2006. Lexical bundles and discourse signalling in academic lectures. International Journal of Corpus Linguistics, 11(3): 283–304.

Nesi, H. & Gardner, S. 2012. Genres across the disciplines: Student writing in higher education. Cambridge: Cambridge University Press.

Nesselhauf, N. 2005. Collocations in a learner corpus. Amsterdam: John Benjamins.

Neufeld, S., Hancioğlu, N. & Eldridge, J. 2011. Beware the range in RANGE, and the academic in AWL. System, 39(4): 533–538.

New Zealand Qualifications Authority. 2017. Carry Out Safe Working Practices on Construction Sites. [Online]. Available: www.nzqa.govt.nz/. [3 February, 2017].

Nguyen, L. T. C. & Nation, I. S. P. 2011. A bilingual vocabulary size test of English for Vietnamese learners. RELC Journal, 42: 86–99.

Northcott, J. 2013. Legal English, in The handbook of English for specific purposes, edited by Brian Paltridge & Sue Starfield. Boston: Wiley-Blackwell: 213–226.

Nurweni, A. & Read, J. 1999. The English vocabulary knowledge of Indonesian university students. English for Specific Purposes, 18(2): 161–175.

O’Hagan, S., Manias, E., Elder, C., Pill, J., Woodward-Kron, R., McNamara, T., Webb, G. & McColl, G. 2014. What counts as effective communication in nursing? Evidence from nurse educators’ and clinicians’ feedback on nurse interactions with simulated patients. Journal of Advanced Nursing, 70(6): 1344–1355.

Paltridge, B. & Starfield, S. 2013. The handbook of English for specific purposes. Boston: Wiley-Blackwell.

Paquot, M. 2010. Academic vocabulary in learner writing. London: Continuum.

Paquot, M. & Granger, S. 2012. Formulaic language in learner corpora. Annual Review of Applied Linguistics, 32: 130–149.

Pardillos, M. 2016. Increasing metaphor awareness in legal English teaching. ESP Today, 4(2): 165–183.

Parkinson, J. 2013. English for science and technology, in The handbook of English for specific purposes, edited by Brian Paltridge & Sue Starfield. Boston: Wiley-Blackwell: 155–173.

Parkinson, J., Demecheleer, M. & Mackay, J. 2017. Writing like a builder: Acquiring a professional genre in a pedagogical setting. English for Specific Purposes, 46: 29–44.

Parkinson, J. & Mackay, J. 2016. The literacy practices of vocational training in carpentry and automotive technology. Journal of Vocational Education & Training, 68(1): 33–50.

Perea-Barberá, M. & Bocanegra-Valle, A. 2014. Promoting specialised vocabulary learning through computer-assisted instruction, in Languages for specific purposes in the digital era, edited by Elena Bárcena, Timothy Read & Jorge Arus. Cham, Switzerland: Springer: 129–154.

Pérez, M. & de los Rios, M. 2015. The financial language used to communicate the same socio-economic events in English and Spanish press through metaphor and metonymies. ESP Today, 3(2): 216–237.

Peters, P. & Fernández, T. 2013. The lexical needs of ESP students in a professional field. English for Specific Purposes, 32: 236–247.

Pinna, A. 2007. Exploiting LSP corpora in the study of foreign languages, in Languages for specific purposes: Searching for common solutions, edited by Dita Gálová. Newcastle, UK: Cambridge Scholars Publishing: 146–162.

Porte, G. (Ed.). 2012. Replication research in Applied Linguistics. Cambridge: Cambridge University Press.

Quero, B. 2015. Estimating the vocabulary size of L1 Spanish ESP learners and the vocabulary load of medical textbook. Unpublished PhD thesis. Wellington, New Zealand: Victoria University of Wellington.

Radford, P. 2013. The tyranny of (semi-)technical vocabulary: Challenges facing the student of computer science. Unpublished MA thesis. Wellington: Victoria University of Wellington.

Read, J. 2015. Assessing English proficiency for university study. Basingstoke: Palgrave Macmillan.

Read, J. & Knoch, U. 2009. Clearing the air: Applied linguistic perspectives on aviation communication. Australian Review of Applied Linguistics, 32(3): 1–21.

Rusanganwa, J. 2013. Multimedia as a means to enhance teaching technical vocabulary to physics undergraduates in Rwanda. English for Specific Purposes, 32: 36–44.

Salvi, R. 2014. Exploring political and banking language for institutional purposes, in Corpus analysis for descriptive and pedagogical purposes, edited by Maurizio Gotti & Davide S. Giannoni. Bern: Peter Lang: 241–261.

Schmid, H.-J. 2000. English abstract nouns as conceptual shells: From corpus to cognition. Berlin: Mouton de Gruyter.

Schmitt, D. & Hamp-Lyons, L. 2015. The need for EAP teacher knowledge in assessment. Journal of English for Academic Purposes, 18: 3–8.

Schmitt, N. 2010. Researching vocabulary. Basingstoke: Palgrave Macmillan.

Schmitt, N., Cobb, T., Horst, M. & Schmitt, D. 2017. How much vocabulary is needed to use English? Replication of van Zeeland & Schmitt (2012), Nation (2006) and Cobb (2007). Language Teaching, 50(2): 212–226.

Schmitt, N., Jiang, X. & Grabe, W. 2011. The percentage of words known in a text and reading comprehension. The Modern Language Journal, 95: 26–43.

Schmitt, N. & Schmitt, D. 2012. A reassessment of frequency and vocabulary size in L2 vocabulary teaching. Language Teaching, 47(4): 484–503.

Schmitt, N., Schmitt, D. & Clapham, C. 2001. Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18(1): 55–88.

Scott, M. & Tribble, C. 2006. Textual patterns: Key words and corpus analysis in language education. Amsterdam: John Benjamins.

Shin, D. & Nation, I. S. P. 2008. Beyond single words: The most frequent collocations in spoken English. ELT Journal, 62(4): 339–348.

Simpson, R., Briggs, S., Ovens, J. & Swales, J. 2002. The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan.

Simpson-Vlach, R. & Ellis, N. 2010. An academic formulas list: New methods in phraseology research. Applied Linguistics, 31(4): 487–512.

Skorczynska Sznajder, H. 2010. A corpus-based evaluation of metaphors in a business English textbook. English for Specific Purposes, 29: 30–42.

Staples, S. & Biber, D. 2014. The expression of stance in nurse-patient interactions: An ESP perspective, in Corpus analysis for descriptive and pedagogical purposes, edited by Maurizio Gotti & Davide S. Giannoni. Bern: Peter Lang: 123–142.

Starfield, S. 2004. Why does this feel empowering? Thesis writing, concordancing, and the ‘corporatising’ university, in Critical pedagogy and language learning, edited by Bonny Norton & Kelleen Toohey. Cambridge: Cambridge University Press: 138–157.

Storch, N. & Tapper, J. 2009. The impact of an EAP course on postgraduate writing. Journal of English for Academic Purposes, 8: 207–223.

Sutarsyah, C., Nation, I. S. P. & Kennedy, G. 1994. How useful is EAP vocabulary for ESP? A corpus based case study. RELC Journal, 25(2): 34–50.

Swain, M. 1995. Three functions of output in second language learning, in Principle and practice in Applied Linguistics: Studies in honour of H.G. Widdowson, edited by Guy Cook & Barbara Seidlhofer. Oxford: Oxford University Press: 125–144.

Swales, J. 1990. Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.

Taboada, A. 2012. Relationships of general vocabulary, science vocabulary, and student questioning with science comprehension in students with varying levels of English proficiency. Instructional Science, 40(6): 901–923.

Tangpijaikul, M. 2014. Preparing business vocabulary for the ESP classroom. RELC Journal, 45(1): 51–65.

Thai, C. & Boers, F. 2016. Repeating a monologue under increasing time pressure: Effects on fluency, complexity, and accuracy. TESOL Quarterly, 50: 369–393.

Thompson, P. 2006. A corpus perspective on the lexis of lectures, with a focus on economics lectures, in Academic discourse across disciplines, edited by Ken Hyland & Marina Bondi. New York: Peter Lang: 253–270.

Thompson, P. & Nesi, H. 2001. The British Academic Spoken English (BASE) corpus project. Language Teaching Research, 5(3): 263–264.

Tribble, C. 2011. Revisiting apprentice texts: Using lexical bundles to investigate expert and apprentice performances in academic writing, in A taste for corpora: In honor of Sylviane Granger, edited by Fanny Meunier, Sylvie De Cock, Gaëtanelle Gilquin & Magali Paquot. Amsterdam: John Benjamins: 85–108.

Vaioleti, T. M. 2006. Talanoa research methodology: A developing position on Pacific research. Waikato Journal of Education, 12: 21–34.

Valipouri, L. & Nassaji, H. 2013. A corpus-based study of academic vocabulary in chemistry research articles. Journal of English for Academic Purposes, 12: 248–263.

van Tongeren, G. 1997. Metaphor in medical texts. Amsterdam: Rodopi.

van Zeeland, H. & Schmitt, N. 2013. Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension? Applied Linguistics, 34(4): 457–479.

Vaughn, K., Bonne, L. & Eyre, J. 2015. Knowing practice: Vocational thresholds for GPs, carpenters, and engineering technicians. Wellington: New Zealand Council for Educational Research and Ako Aotearoa.

Verdaguer, I., Laso, N. J. & Salazar, D. 1996. Biomedical English: A corpus-based approach. Amsterdam: John Benjamins.

Vincent, B. 2013. Investigating academic phraseology through combinations of very frequent words: A methodological exploration. Journal of English for Academic Purposes, 12: 44–66.

Vongpumivitch, V., Huang, J. & Chang, Y.-C. 2009. Frequency analysis of the words in the Academic Word List (AWL) and non-AWL content words in applied linguistics research papers. English for Specific Purposes, 28(1): 33–41.

Wang, H., Runtsova, T. & Chen, H. 2013. A comparative study of metaphor in English and Russian economic discourse. Text & Talk, 33(2): 259–288.

Wang, J., Liang, S. & Ge, G. 2008. Establishment of a medical academic word list. English for Specific Purposes, 27: 442–458.

Ward, J. 1999. How large a vocabulary do EAP engineering students need? Reading in a Foreign Language, 12(2): 309–324.

Ward, J. 2007. Collocation and technicality in EAP engineering. Journal of English for Academic Purposes, 6: 18–35.

Ward, J. 2009. A basic engineering English word list for less proficient foundation engineering undergraduates. English for Specific Purposes, 28(3): 170–182.

Watson-Todd, R. 2017. An opaque engineering word list: Which words should a teacher focus on? English for Specific Purposes, 45: 31–39.

Webb, S. & Chang, A. 2012. Second language vocabulary growth. RELC Journal, 43(1): 113–126.

Webb, S. & Paribakht, T. 2015. What is the relationship between the lexical profile of test items and performance on a standardized English proficiency test? English for Specific Purposes, 38: 34–43.

Webb, S. & Sasao, Y. 2013. New directions in vocabulary testing. RELC Journal, 44(3): 263–277.

Weltec. 2016. Certificate in automotive technology programme handbook. Wellington: Weltec.

West, M. 1953. A general service list of English words. London: Longmans, Green and Co.

Wette, R. & Hawken, S. 2016. Measuring gains in an EMP course and the perspectives of language and medical educators as assessors. English for Specific Purposes, 42: 38–49.

White, M. 2003. Metaphor and economics: The case of growth. English for Specific Purposes, 22(2): 131–151.

Wittrock, M. 1974. Learning as a generative process. Educational Psychologist, 11(1): 87–96.

Wood, D. C. & Appel, R. 2014. Multiword constructions in first year business and engineering university textbooks and EAP textbooks. Journal of English for Academic Purposes, 15: 1–13.

Woodward-Kron, R. 2008. More than just jargon: The nature and role of specialist language in learning disciplinary knowledge. Journal of English for Academic Purposes, 7(4): 234–249.

Wray, A. 2000. Formulaic sequences in second language teaching: Principle and practice. Applied Linguistics, 21(4): 463–489.

Wray, A. 2002. Formulaic sequences and the lexicon. Cambridge: Cambridge University Press.

Xue, G. & Nation, I. S. P. 1984. A university word list. Language Learning and Communication, 3(2): 215–229.

Yang, M.-N. 2015. A nursing academic word list. English for Specific Purposes, 37: 27–38.

Zipf, G. K. 1935. Psycho-biology of language. New York: Houghton-Mifflin.

Index

abbreviations 34, 39, 142–3
academic collocations 48–51; Academic Collocations List 48; discipline-specific collocations 51
Academic Formulas List 56–7
academic vocabulary 87–105; Academic Vocabulary List 8–9, 40, 93–4; Academic Word List 23, 90–2, 115, 157, 159; postgraduate specialised vocabulary 103–5; pre-university specialised vocabulary 90–4; Science vocabulary at university 95–6
Ädel, A. 56–7
Agriculture vocabulary 96–7
Anthony, L. 33, 113, 121, 167
Applied Linguistics vocabulary 103–5
Architecture vocabulary 16, 106–7
Ardasheva, Y. 18, 76–7
Automotive Technology vocabulary 140–2
Aviation vocabulary 108–10
Baker, P. 54–5
Basturkmen, H. 53, 117
Biber, D. 11–12, 46–7, 52–4, 57, 94–5, 119, 166–7
bilingual word lists 139
Boers, F. 58–9
Breeze, R. 110–11, 153
British National Corpus frequency lists see Nation, I. S. P.
Browne, C. 92, 113–14
Builders’ Diaries 133–7
Business studies vocabulary 112–15; Business word lists 112–15
Byrd, P. 49–53, 60–1

Carpentry vocabulary 37–8, 132–7
categorising specialised vocabulary 30–9
challenges of specialised vocabulary in secondary schools 65–7, 84–6; in professions 106, 108; in the trades 125, 173
Chemistry vocabulary 97–8
Chen, Q. 100
Chen, Y.-H. 48, 54–5
Cheng, W. 115
Chujo, K. 19–20
Chung, T. 2, 8, 13–15, 18–19, 103–4, 129, 135, 170
classification of specialised vocabulary 30–9
classroom-based approaches 17–18, 153–4, 169
classroom-based research into specialised vocabulary 167–9
Cobb, T. 33–4, 37, 167
collocations 48–55
compounds 38–9
Computer Science vocabulary 98–9
concordancing 12–13
consulting experts 15–16, 137–9; expert decision tasks 137–9
corpus-based word list development 24–9, 127–30; corpus analysis 47–8; corpus comparison 8–9; corpus design 10–12; spoken corpora 171–2
Crawford Camiciottoli, B. 113
curriculum design and specialised vocabulary 147–51
Dang, T. N. Y. 25, 92
Davies, M. 8–9
Douglas, D. 160
Durrant, P. 48–50
EAP Science List 25–7, 40–1, 77–9, 157
Elgort, I. 160
Ellis, N. 56–7
Engineering vocabulary 102–3
Estival, D. 108–10

Fabrication (welding) vocabulary 142–4
Farris, C. 108–10
Ferguson, G. 117
Finance vocabulary 115–16
Flowerdew, J. 89, 108
Flowerdew, L. 24, 57, 147
formulaic sequences 46, 60, 153–4
Four Strands see Nation, I. S. P.
frames 57
Franken, M. 116–17
Gardner, D. 8–9
Ge, C. 100
general academic vocabulary 90–4
Gibbons, P. 130, 166
Gledhill, C. 51
glossaries 18–19
Granger, S. 45, 52, 56, 60, 148
Greene, J. 67–8, 70, 74, 81–2
Ha, Y.-H. 115–16
health-care communication 116–18
high frequency vocabulary 31–4, 47, 73, 119–20
highly specialised vocabulary 37–8
Hirsh, D. 25–7, 40–1, 77–9, 157
Horst, M. 130, 167
Hsu, W. 101–2
Hulstijn, J. 151–2
Hunter, J. 116–17, 170–1
hybrid methods of identifying specialised vocabulary 19–20
Hyland, K. 55, 93
identifying specialised vocabulary 6–20

international school vocabulary 71, 79–80
interviews 16–17, 132–5
Involvement Load Hypothesis 151–2
keyword analysis 9–10
Knoch, U. 109
Konstantakis, N. 113
Lam, J. 98–9
Language in the Trades (LATTE) project 123, 127–46
Laufer, B. 151–2
Legal vocabulary 110–12
lexical bundles 52–6; in EAP 52–3; in learner corpora 55–6; in student writing and professional writing 55; in textbooks 53–5
Li, Y. 115
Littlemore, J. 58–9, 152, 174
Liu, D. 94, 100, 166
MacKay, J. 38, 123
materials design 152–5
Medical vocabulary 9, 99–101
metaphor in EAP and ESP 58–60
Middle School vocabulary 67–8; English grammar and writing list 70; Mathematics list 74–5; Social Sciences and History list 81–2
Miller, D. 11–12, 40, 90–2, 166–7
multi-word units 45–8, 94, 173–4; multi-word unit research limitations 60–2
Nassaji, H. 97–8
Nation, I. S. P.: BNC/COCA frequency lists 34–5, 37–9, 73, 115, 127, 130–1, 135–7; Four Strands 147, 149–51, 169; multi-word units 47; rating scale 8, 13–15, 170; technical vocabulary 2, 7–8, 18–19, 103–4, 107; University Word List 4, 88, 164; vocabulary knowledge 72, 156; Vocabulary Levels Test 159; Vocabulary Size Test 159–60; word lists 22, 25, 29, 31–2, 42, 88, 155, 166, 175
Nelson, M. 112–13
Nesi, H. 7, 12, 53, 171

Nursing vocabulary 118–20
Parkinson, J. 38, 123, 125, 139, 163
Plumbing vocabulary 137–9
proper nouns 34–7, 118
Qian, D. 115
qualitative approaches 12–19; need for more 162–4
quantitative approaches 7–12
Quero, B. 9, 34–5, 96, 118
questionnaires 16–17
Radford, P. 98–9
RANGE program 29, 33–4, 157, 167
Read, J. 159, 164
replication studies in vocabulary in ESP 166–7
research and teaching specialised vocabulary 152–5, 167–9
Römer, U. 57
Schmitt, D. 159, 165, 175
Schmitt, N. 159, 175
secondary school vocabulary 63–7, 173; English Literature (fiction) texts 68–70; Mathematics 72–4, 75–6; Science in secondary school contexts 76–81, 95–6; Social Sciences 80–1
semantic rating scale 13–15
specialised vocabulary for professional purposes 106–8
specialised vocabulary in localised contexts 22, 35, 65–6, 82, 168
spoken academic vocabulary 25, 92
spoken vocabulary in the trades 130–2
surveys 16
teachers and specialised vocabulary 82–6
teacher talk and specialised vocabulary in international schools 66; English as an Additional Language classes 71–2; Mathematics 75–6; Science 79–80
technical dictionaries 15

testing specialised vocabulary 157–60, 164–5
trades vocabulary 122–6
undergraduate specialised vocabulary 92, 94–5
unit of counting 29
University Word List 4, 88, 164
Utiyama, M. 19–20
validation of word lists 40
Valipouri, L. 97–8
Vincent, B. 152
vocabulary in TED Talks 157
vocabulary load analysis 21, 40–2, 70, 78, 80, 175
Ward, J. 51, 102
Watson-Todd, R. 102–3
Webb, S. 25, 29, 92
West, M. 23, 115, 157
Wette, R. 119
Wood, D. 46, 54
Woodward-Kron, R. 2–3
word families 29
word lists for ESP 24–7; in learning and teaching 155–6; missing elements in word list research 42–4; principles for selecting items for word lists 29–30; using word lists in ESP research 40–2; word list evaluation 166–7; word lists and testing 159–60
Wray, A. 46
Yang, M.-N. 119–20
Zipf’s law 25
