In my last blog post, I wrote about a free online grammar checker called PaperRater. This week I focus on a “free” corpus from Brigham Young University known as COCA (the Corpus of Contemporary American English). Using corpora in language studies, especially as an analysis tool in second language writing classes, offers the user insight into the way a word is used in its native language across different textual genres. COCA, which represents the American variety of English, contains 450 million words! It is divided equally among five genres of text: academic journals, fiction, magazines, newspapers, and spoken language. What makes this corpus even more appealing is the promise made by its developer, Mark Davies, to update it twice a year (though it seems to be stuck at materials dating back to the summer of 2012). If, like me, you’re a novice at data-driven learning and/or are new to using COCA in your writing classes, then read on. Comments and suggestions from more proficient users of corpora in the writing curriculum are more than welcome in the comments section.
How it works:
As I mentioned before, COCA is free to a degree. Non-registered users are limited to 20 queries per day, and the number of entries retrieved is also limited. Ultimately, it is best to register with the good folks at BYU first. Registering gives them a sense of who you are and the volume of usage they can expect from you, which helps them regulate bandwidth among the tens of thousands of users who use it every month (as of the writing of this post, there were over 51,000 unique users for the day). If you register as an undergraduate student, you may not get as many results on a search as you would if you registered as a graduate student or a researcher. Nevertheless, you get a lot of “mileage” for registering.
For example, an undergraduate user is allowed 100 general queries per day and may retrieve a maximum of 10,000 KWIC (Key Word in Context) entries per day, with resulting hits ranging from 5,000 to 30,000 depending on the word.
At first, I limited my searches to single-word entries in the “academic” genre using the KWIC feature. Since introducing COCA to my university writing classes this semester, I have come to appreciate the “list” feature with a wild card search because I can compare the frequency of a word with that of the other members of its word family, broken down by part of speech. For example, if I run a “list” search for the word clear and add the wild card (*) immediately at the end (see the screenshot below), a breakdown of the word clear and its family members is “listed” in the top right corner and sample sentences appear in the bottom right area.
When I performed the above search, my students were surprised to find out that the adverb clearly was used more frequently than the verb or the past participle.
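For readers who like to tinker, the idea behind a wild card “list” search can be tried offline on a few sentences of your own. The sketch below is only an illustration in Python: it does not connect to COCA, the sample text is made up, and the function name wildcard_frequency is my own invention. It simply counts every word that matches a pattern such as clear* and ranks the matches by frequency.

```python
from collections import Counter
from fnmatch import fnmatchcase


def wildcard_frequency(tokens, pattern):
    """Count lowercased tokens that match a shell-style wildcard pattern."""
    pattern = pattern.lower()
    matches = (t.lower() for t in tokens if fnmatchcase(t.lower(), pattern))
    # most_common() returns (word, count) pairs, highest count first
    return Counter(matches).most_common()


# Made-up sample text, split on whitespace (punctuation is pre-separated)
sample = (
    "The results clearly show a clear trend . She cleared the table and "
    "clearly explained the clear rules , clearing up any confusion ."
).split()

ranked = wildcard_frequency(sample, "clear*")
for word, count in ranked:
    print(f"{word:10} {count}")
```

Unlike COCA, this toy version knows nothing about parts of speech, so it cannot separate clear the adjective from clear the verb; it only shows how a single wild card pulls in the whole word family.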
I’m starting to realize that wild card searches can reveal an incredible wealth of data. For example, a search for clearly [v*] will show results for the word clearly followed by any verb (see screenshot below). In my students’ academic essays, I have noticed that they struggle to use a “booster” like the word clearly in the follow-up comments they write after quoting a source to support their claim. So, I encouraged my students to click on the third most popular result on the list: clearly shows. The first hit (see the bottom left area of the above screenshot) is from a source called “Parenting. Early Years” and it uses the target expression as follows: “could also be genetically wired to be more sensitive to bitter foods, as research clearly shows some kids are. # Work with it: You’ve no doubt heard”. The students were able to notice “as research clearly shows” as an effective expression to use in a sentence that follows up a quoted source.
If the learners feel that the snippet is still not enough to provide a full understanding of the word, they can click on the source (clicking on “Parenting. Early Years”, for example) and a paragraph will appear with clearly shows in bold, thus providing even more context for them to appreciate the full range of the expression (see screenshot below).
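To make the Key Word in Context idea concrete, here is a rough Python sketch that lines up a target expression with a few words on either side, using the snippet quoted above as input. It is not how COCA itself is implemented and it does not query the corpus; the function name kwic and the width parameter are assumptions made purely for illustration.

```python
def kwic(tokens, phrase, width=3):
    """Return (left, match, right) context for each occurrence of phrase."""
    target = [w.lower() for w in phrase.split()]
    n = len(target)
    lowered = [t.lower() for t in tokens]
    lines = []
    for i in range(len(tokens) - n + 1):
        # Compare a window of n tokens against the target phrase
        if lowered[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(tokens[i:i + n]), right))
    return lines


# The snippet quoted in the post, split on whitespace
text = ("could also be genetically wired to be more sensitive to bitter "
        "foods, as research clearly shows some kids are.")
hits = kwic(text.split(), "clearly shows", width=3)

for left, match, right in hits:
    print(f"{left:>25} [{match}] {right}")
```

Raising width plays the same role as clicking through to the source paragraph: the learner sees more of the surrounding sentence.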
The following syntax can be used to limit searches to particular parts of speech: use [n*] to search for nouns; [jj*] for adjectives; [rr*] for adverbs; [ii*] for prepositions; and [at*] for articles.
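If you prepare search strings for students ahead of time, a small lookup table can save you from memorizing the tags. The Python sketch below only assembles the text you would type into the search box; it does not contact COCA. The tags are the ones mentioned in this post, while the names POS_TAGS, pos, and build_query are hypothetical helpers of my own.

```python
# Part-of-speech tags as listed in this post (not an exhaustive list)
POS_TAGS = {
    "noun": "[n*]",
    "verb": "[v*]",
    "adjective": "[jj*]",
    "adverb": "[rr*]",
    "preposition": "[ii*]",
    "article": "[at*]",
}


def pos(name):
    """Look up the tag for a part-of-speech name such as 'verb'."""
    try:
        return POS_TAGS[name.lower()]
    except KeyError:
        raise ValueError(f"no tag listed for part of speech: {name!r}") from None


def build_query(*parts):
    """Join literal words and tags into one search string."""
    return " ".join(parts)


print(build_query("clearly", pos("verb")))        # clearly [v*]
print(build_query(pos("adjective"), "evidence"))  # [jj*] evidence
```

The first line reproduces the clearly [v*] search described earlier; the second would look for any adjective followed by the word evidence.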
Since introducing the concordancer to my students, I have asked them to highlight words that they’ve researched in COCA and subsequently used in their essays. Overall, I am starting to see some very intelligent sentence structures, especially in the area of prepositional usage. Nevertheless, I believe my students can benefit from structured lessons that focus on one aspect of sentence writing, such as constructing noun and verb phrases, or on more complex searches, such as finding language used to support claims. I believe the introduction of a tool like COCA, coupled with a grammar checker like PaperRater, can go a long way toward helping emerging writers become more self-directed, if not more confident, in their writing tasks. If you decide to give COCA a try, by all means let us know how it goes.