STATISTICAL MEASURES IN CORPUS LINGUISTICS
DOI:
https://doi.org/10.5281/zenodo.17947238
Keywords:
Corpus linguistics, statistical measures, word frequency, collocation, dispersion, association metrics, concordance, linguistic analysis, corpus analysis, quantitative linguistics.
Abstract
Corpus linguistics is a branch of linguistic study that examines language through systematically compiled and structured collections of texts known as corpora. The use of statistical measures in corpus linguistics allows researchers to analyze language patterns quantitatively and objectively. Such measures help in identifying word frequency, distribution, collocation, and patterns of association, providing insights into both general and specialized language use. This article discusses the key statistical tools employed in corpus analysis, including frequency counts, concordances, dispersion indices, and association metrics, and highlights their applications in language teaching, lexicography, discourse analysis, and computational linguistics.
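As a rough illustration of the kinds of measures named above, the following Python sketch computes a normalized frequency, a simple dispersion value, and one association score on toy data. It rests on several assumptions that do not come from the article: the function names are hypothetical, tokenization is a plain whitespace split, dispersion is taken as range (the share of corpus parts containing the word), and pointwise mutual information (PMI) over adjacent word pairs stands in for the association metrics.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def relative_frequency(tokens, word, per=1_000_000):
    # Occurrences of `word` normalized to a rate per `per` tokens.
    return Counter(tokens)[word] / len(tokens) * per


def dispersion_range(parts, word):
    # Share of corpus parts (each a list of tokens) that contain `word`.
    return sum(1 for part in parts if word in part) / len(parts)


def pmi(tokens, w1, w2):
    # Pointwise mutual information of the adjacent pair (w1, w2), in bits.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(w1, w2)]
    if pair_count == 0:
        # The pair never occurs, so the logarithm is undefined.
        return float("-inf")
    p_pair = pair_count / (len(tokens) - 1)
    p_w1 = unigrams[w1] / len(tokens)
    p_w2 = unigrams[w2] / len(tokens)
    return math.log2(p_pair / (p_w1 * p_w2))


# Toy data, invented for illustration only.
tokens = tokenize("strong tea strong tea weak tea strong coffee")
parts = [["strong", "tea"], ["weak", "tea"], ["coffee"]]

tea_rate = relative_frequency(tokens, "tea", per=100)
tea_range = dispersion_range(parts, "tea")
strong_tea_pmi = pmi(tokens, "strong", "tea")
```

A positive PMI for a pair such as "strong tea" indicates that the two words co-occur more often than their individual frequencies would predict. On real corpora, PMI tends to overrate rare pairs, so it is usually read alongside raw frequency.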