CALLing all Brains! (2016)

Are you interested in the role digital technology can play in second language learning? Do you want to find out more about the intersection of neuroscience and language acquisition? If so, join us at the JALTCALL2016 Conference!

JALTCALL2016 will be held at Tamagawa University (Tokyo, Japan) from June 3rd to 5th. This year we are holding a joint conference with the recently formed BRAIN SIG (neuroscience). The theme is “CALL and the BRAIN”, and we are planning to host a wide range of presentations and workshops on these topics.

Our keynote speaker this year will be Mark Pegrum (University of Western Australia), a key researcher in the field of digital literacies and the author of Mobile Learning: Languages, Literacies and Cultures.



There will also be a virtual presentation by our plenary speaker, Tracey Tokuhama-Espinosa (FLACSO, Quito, Ecuador), a leading researcher in the neuroscience of learning and author of Making Classrooms Better and Mind, Brain, and Education Science.

The call for proposals is open, so if you have something to share, please make a submission. The deadline is February 15th, 2016. For more information about submissions and the conference itself, please go to

Hope to see you there!



Language Datasets and You: A Primer

Collocate Clusters

What are language datasets? Examples that most teachers will be familiar with are word lists such as the General Service List or the Academic Word List.

There are some publicly available sources for language datasets (for example, the Speech and Language Data Repository or the Language Goldmine), yet most won’t be of much immediate use to teachers. Fortunately, some teachers, like Paul Raine, are putting such datasets into a form that fellow professionals can use.

I would like to make the case that playing with such datasets ourselves can be beneficial.

It is reasonable to assume that to write well it is necessary (but not sufficient) to read well. Similarly, spending time playing with language data can benefit language awareness, or knowledge about language.

For example, I was reading an article titled Towards an n-Grammar of English, which argues for using continuous sequences of words (n-grams) taken from corpora as the basis for an alternative grammar syllabus. It uses a publicly available language dataset of 5-grams to make its case. As I was reading the paper, I wanted to see how the authors derived their example language patterns.

My first thought was to download the text file and import it into Excel. One problem: the text file contains more rows than Excel can take. One option is to split the file over several sheets in Excel. However, this is cumbersome, so another option is to use what is called an IPython Notebook.
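As a rough check before trying Excel, the rows in a text file can be counted with a few lines of Python. The sketch below is only an illustration: it assumes the worksheet limit of 1,048,576 rows that applies to Excel 2007 and later, it uses a tiny in-memory sample in place of the real n-gram file, and the helper names are made up for this example.

```python
import io

# Assumption: row limit per worksheet in Excel 2007 and later.
EXCEL_MAX_ROWS = 1_048_576


def count_rows(handle):
    """Count the lines in an open text file without loading it all into memory."""
    return sum(1 for _ in handle)


def fits_in_excel(handle, limit=EXCEL_MAX_ROWS):
    """Return True if the file has no more rows than the given limit."""
    return count_rows(handle) <= limit


# Tiny in-memory stand-in for a real file.
# With a real file, use open("your_file.txt", encoding="utf-8") instead.
sample = io.StringIO("row one\nrow two\nrow three\n")
n_rows = count_rows(sample)
print(n_rows, n_rows <= EXCEL_MAX_ROWS)
```

If the count comes out above the limit, the file has to be split across sheets or handled with a different tool.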

IPython Notebook is an environment that lets you combine computer code, text, images and plots in a single document. It was originally designed as a way to show reproducible work.

Below is a screenshot of an (incomplete) notebook for the article I was reading. Learning the commands is relatively straightforward, depending on what you want to do.

Screenshot from example notebook

The screenshot shows that the first command imports a module called pandas, which will be used to query the data. The next command imports the data file, which is tab-separated. For those interested in exploring Python notebooks, there are many resources available on the net. Usually when I want to look for a command, I include the word “pandas” in a search.
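In text form, those two steps might look something like the sketch below. This is not the notebook from the screenshot: the column names, the simplified three-column layout and the two placeholder frequencies (100 and 50) are assumptions made purely for illustration, and a small in-memory sample stands in for the real file.

```python
import csv
import io

import pandas as pd

# Invented toy rows in a simplified layout: frequency, n-gram, tag pattern.
# The frequencies 100 and 50 are placeholders, not real corpus counts.
sample_tsv = (
    "12659\tI do n't want to\tppis1 vd0 xx vvi to\n"
    "100\tthe rest of the world\tat nn1 io at nn1\n"
    "50\tthe top of the hill\tat nn1 io at nn1\n"
)

ngrams = pd.read_csv(
    io.StringIO(sample_tsv),      # with a real file, pass its path instead
    sep="\t",                     # the file is tab-separated
    header=None,                  # the sample has no header row
    names=["freq", "ngram", "pattern"],  # hypothetical column names
    quoting=csv.QUOTE_NONE,       # treat quote marks in the text as ordinary characters
)

print(ngrams.head())
```

Setting `quoting=csv.QUOTE_NONE` is a cautious choice for corpus text, where stray quotation marks could otherwise confuse the parser.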

As an example of how making an IPython notebook helped me understand the article, I was initially confused about why “I don’t want to” was not in the top 100 n-grams, given that “I don’t want to” has 12659 instances. Using the notebook, I saw that the grammar pattern which it instantiates, [ppis1 vd0 xx vvi to], has only 51 types (or rows in the dataset), whereas the number one ranked pattern, [at nn1 io at nn1], has 7272 rows.

ppis1 – 1st person sing. subjective personal pronoun (I); vd0 – do, base form (finite); xx – not, n’t; vvi – infinitive (e.g. to give… It will work…); at – article (e.g. the, no); nn1 – singular common noun (e.g. book, girl); io – of (as preposition)
(from the CLAWS7 tagset)
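The difference between a frequent n-gram and a pattern with many types can be seen in a small pandas sketch. The table below is invented toy data in a hypothetical layout (columns named freq, ngram and pattern; the frequencies 100 and 50 are placeholders), so it shows the kind of query involved rather than the article's actual figures.

```python
import pandas as pd

# Toy data: one very frequent n-gram and two rarer ones sharing a pattern.
ngrams = pd.DataFrame(
    {
        "freq": [12659, 100, 50],  # 100 and 50 are placeholder values
        "ngram": [
            "I do n't want to",
            "the rest of the world",
            "the top of the hill",
        ],
        "pattern": [
            "ppis1 vd0 xx vvi to",
            "at nn1 io at nn1",
            "at nn1 io at nn1",
        ],
    }
)

# For each tag pattern: how many distinct rows (types) it covers,
# and how many instances (tokens) those rows add up to.
summary = (
    ngrams.groupby("pattern")
    .agg(types=("ngram", "count"), tokens=("freq", "sum"))
    .sort_values("types", ascending=False)
)

print(summary)
```

Sorted by types, the [at nn1 io at nn1] pattern comes first in this toy table even though its n-grams are individually far less frequent, which mirrors the situation described above.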

Note: Links to information on how to set up a Python notebook and to the n-gram grammar paper are included in the example notebook.

Datasets can also come from research papers. I have used a word list of the top 150 phrasal verbs and their most common meanings to create a phrasal verb dictionary. This is a step beyond simply querying a dataset (as can be done using an IPython Notebook or Excel) and may not be for everyone. However, I imagine many teachers have used paper-based word lists when designing lessons, so such datasets and ways of manipulating them will not be completely unfamiliar.

Luckily, as mentioned before, people like Paul Raine are using publicly available datasets to build tools that are easy for teachers to use. On his apps4efl site he has a paired sentence app that uses the Tatoeba Corpus of sentence pairs (which internet users have translated), a wiki cloze app (that uses Wikipedia data), video activities (using YouTube) and so on (see list below).

The best-known type of dataset is the corpus. Interfaces to such data, such as the BYU interfaces to COCA (Corpus of Contemporary American English) or the BNC (British National Corpus), are the most popular. I won’t go into detail about exploiting such data; those interested can read more about such datasets on my blog, or over at the G+ Corpus Linguistics community. Suffice it to say that this kind of data is becoming better supported on the net.

Hopefully this short primer on the value of language datasets will encourage you to start exploring them; or, if you already are, why not drop a comment? Readers may also know of publicly available language datasets that they would like to share. If so, please share!

List of datasets:

Speech and Language Data Repository

Language Goldmine

COCA n-grams

Thanks to Paul Raine for the following that he uses for apps4efl:

Wikis (Creative Commons license)
Native English Wikipedia (via API)
Simple English Wikipedia (via API)
Native English WikiNews (via API)

TED (via download) (Creative Commons non-derivative)

VOA Learn English (via download) (Public Domain)

Example sentences
Tatoeba corpus (via download) (Creative Commons license)

The Open Multilingual Wordnet (via download) (Creative Commons license)
CMU Pronouncing Dictionary (via download) (BSD license)

The New General Service List, New Academic Word List (via download) (Creative Commons license)



Rosetta Stone: Not Quite There Yet

Recently I downloaded the Rosetta Stone app for Android with great anticipation. I think the Rosetta Stone concept is fantastic (putting an immersion-type approach to language learning into a CALL environment), and having a mobile version of the Rosetta Stone system just makes sense in today’s mobile world. However, all that glitters is not gold, and to my great disappointment I found this to be the case with Rosetta Stone’s current offering.

At first glance, the mobile app seems to be all that it claims: Rosetta Stone on a mobile. However, after a bit of testing, I found that certain key functions did not promote language learning as expected.

For my first test, I excitedly dove right into a relatively new language for me, Arabic. I opened the Arabic language learning option and found the typical (to anyone familiar with Rosetta Stone) beginner’s course. It starts by teaching the target language’s words for boy, girl, man, and woman. The Rosetta Stone system first presents pictures with the spoken words in the target language, then a picture-matching exercise in which the learner matches the correct picture to the spoken audio, followed by a pronunciation practice stage. It was in the pronunciation practice stage that things began to go awry.

At first it seemed I was an expert in Arabic pronunciation… perhaps even a little too much of an expert… My hackles of suspicion rose. So, I tried an experiment; I purposely began pronouncing differently from what I was being taught.

At first I made slight changes, then gradually larger and larger mispronunciations. It seemed I could not go wrong! It was virtually impossible for me to mispronounce a word unless it was entirely off in syllable count. Then I tried purposely pronouncing the Arabic word for girl when I was prompted for boy, and vice versa. To my great chagrin, I was rewarded with approval for my completely and contextually wrong pronunciation. This, with only my first two Arabic words!

Because of my determination to solve the riddle (for I was sure it was a riddle), I proceeded to test other languages which I was already familiar with. I tried the course in Japanese (which I have a fairly strong command of), and my native language of English (American form). Again, I was met with similar results. It was nearly impossible to mispronounce words.

Then I had an idea: perhaps there was a setting I was unaware of. So, I searched the app’s menu to find the culprit setting. I was somewhat relieved when I discovered a setting for accuracy relating to pronunciation checking. It was preset to easy. Though I am not sure that accepting entirely wrong pronunciation constitutes “easy,” I adjusted the setting to the highest difficulty level and tried again. Much to my disappointment, there was little if any change to the program’s operation. I finally retired the app.

It seems to me that, from a programming perspective, simply making a call to Android’s built-in speech-to-text feature would enable the Rosetta Stone app to function better than it does now. When I use the phone’s built-in speech-to-text software, I have to pronounce things very clearly to get it to recognize my intended phrase. In fact, here is a very simple Android app which attempts to teach pronunciation doing just that.
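To make the idea concrete, here is a minimal sketch of the comparison step, written in Python rather than as Android code. It assumes that some speech-to-text engine has already returned a transcript as a string. The function names and the 0.8 threshold are invented for illustration, and this is not a description of how Rosetta Stone or any particular app works.

```python
import difflib
import unicodedata


def normalize(text):
    """Lowercase the text and drop punctuation so only letters, digits and spaces remain."""
    text = unicodedata.normalize("NFKC", text).casefold()
    kept = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    return kept.strip()


def pronunciation_ok(target, recognized, threshold=0.8):
    """Accept the attempt only if the transcript is close enough to the target.

    `recognized` is assumed to be the transcript from a speech-to-text engine;
    `threshold` is an arbitrary cut-off chosen for this example.
    """
    matcher = difflib.SequenceMatcher(None, normalize(target), normalize(recognized))
    return matcher.ratio() >= threshold


# Saying the prompted word is accepted; saying a different word is not.
print(pronunciation_ok("girl", "Girl!"))
print(pronunciation_ok("boy", "girl"))
```

Even a check this crude would reject "girl" when the prompt was "boy", which is the case that went wrong in my test.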

So what was the programming decision, or perhaps marketing decision, which brought about the offering currently available? And what is the purpose of a piece of software that fools someone into thinking they are learning a language, when in fact they are learning to speak gibberish? Well, that will have to remain the domain of our imagination. But whatever Rosetta Stone’s reasons are, one thing seems clear: our jobs as language teachers are still secure.