Are you interested in the role digital technology can play in second language learning? Do you want to find out more about the intersection of neuroscience and language acquisition? If so, join us at the JALTCALL2016 Conference!
JALTCALL2016 will be held at Tamagawa University (Tokyo, Japan) from June 3rd to 5th. This year we are having a joint conference with the recently formed BRAIN SIG (neuroscience). The theme is “CALL and the BRAIN” and we are planning to host a wide range of presentations and workshops on these topics.
Our keynote speaker this year will be Mark Pegrum (University of Western Australia), a key researcher in the field of digital literacies, and the author of Mobile Learning: Languages, Literacies and Cultures.
There will also be a virtual presentation by our plenary speaker, Tracey Tokuhama-Espinosa (FLACSO, Quito, Ecuador), a leading researcher in the neuroscience of learning, and author of Making Classrooms Better and Mind, Brain, and Education Science.
The call for proposals is open, so if you have something to share, please make a submission. The deadline is February 15th, 2016. For more information about submissions and the conference itself, please go to conference2016.jaltcall.org.
What are language datasets? Examples most teachers will be familiar with are word lists such as the General Service List or the Academic Word List.
There are some publicly available sources for language datasets (for example the Speech and Language Data Repository or the Language Goldmine), yet most won’t be of much immediate use to teachers. Fortunately, some teachers, like Paul Raine, are putting such datasets into a form that is usable by fellow professionals.
I would like to make the case that playing with such datasets ourselves can be beneficial.
It is reasonable to assume that to write well it is necessary (but not sufficient) to read well. Similarly, spending time playing with language data can benefit language awareness or knowledge about language.
For example, I was reading an article titled Towards an n-Grammar of English, which argues for using continuous sequences of words (n-grams) taken from corpora as a basis for an alternative grammar syllabus. It uses a publicly available language dataset of 5-grams to make its case. As I was reading the paper I wanted to see how the authors derived their example language patterns.
My first thought was to download the text file and import it into Excel. One problem: the text file contains more rows than Excel can take. One option is to split the file over several sheets in Excel. However, this is cumbersome, so another option is to use what is called an IPython Notebook.
IPython Notebook is an environment that allows you to combine computer code, text, images and graph plots in a single document. It was originally designed as a way to show reproducible work.
Below is a screenshot of an (incomplete) notebook for the article I was reading. Learning commands is relatively straightforward depending on what you want to do.
The screenshot shows that the first command imports a module called pandas, which will be used to query the data. The next command imports the data file, which is tab-separated. For those interested in exploring Python notebooks there are many resources available on the net. Usually when I want to look for a command I include the word “pandas” in a search.
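For readers who cannot see the screenshot, here is a minimal sketch of those two steps. It is not the notebook itself: the column names (freq, ngram, pattern) and the small in-memory sample are assumptions made for illustration, and only the frequency of 12659 comes from the article's data.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded tab-separated 5-gram file.
# Column names are assumptions; the two "at nn1 io at nn1" frequencies
# are invented purely so there is something to query.
sample = io.StringIO(
    "freq\tngram\tpattern\n"
    "12659\ti do n't want to\tppis1 vd0 xx vvi to\n"
    "900\tthe end of the day\tat nn1 io at nn1\n"
    "850\tthe rest of the world\tat nn1 io at nn1\n"
)

# sep="\t" tells pandas the columns are separated by tabs. With a real
# file, pass its path here instead of the StringIO object.
ngrams = pd.read_csv(sample, sep="\t")

# A simple query: keep rows whose n-gram contains "want",
# sorted with the most frequent first.
want = ngrams[ngrams["ngram"].str.contains("want")].sort_values(
    "freq", ascending=False
)
print(want)
```

Because pandas holds the table in memory rather than in a worksheet, Excel's row limit does not apply, although a very large file still needs enough RAM.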
As an example of how making an IPython notebook helped me understand the article, consider my initial confusion about why “I don’t want to” was not in the top 100 n-grams, even though it has 12659 instances. Using the notebook I saw that the grammar pattern which it instantiates, [ppis1 vd0 xx vvi to], has only 51 types (or rows in the dataset), whereas the number one ranked pattern, [at nn1 io at nn1], has 7272 rows.
Tag key (from the CLAWS7 tagset): ppis1 – 1st person sing. subjective personal pronoun (I); vd0 – do, base form (finite); xx – not, n’t; vvi – infinitive (e.g. to give… It will work…); at – article (e.g. the, no); nn1 – singular common noun (e.g. book, girl); io – of (as preposition)
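The comparison of types (rows) per pattern can be sketched with a pandas groupby. This is an illustration under assumptions rather than the notebook's actual code: the column names are hypothetical, the toy rows are invented apart from the 12659 figure, and it assumes patterns are ranked by number of types, which is how the article's ranking appears to work.

```python
import pandas as pd

# Toy data: one row per 5-gram type. Column names are assumptions and
# every frequency except 12659 is invented for illustration.
rows = [
    {"freq": 12659, "ngram": "i do n't want to", "pattern": "ppis1 vd0 xx vvi to"},
    {"freq": 900, "ngram": "the end of the day", "pattern": "at nn1 io at nn1"},
    {"freq": 850, "ngram": "the rest of the world", "pattern": "at nn1 io at nn1"},
    {"freq": 700, "ngram": "the top of the hill", "pattern": "at nn1 io at nn1"},
]
ngrams = pd.DataFrame(rows)

# For each tag pattern, count its types (rows) and sum its tokens
# (instances), then rank the patterns by number of types.
summary = (
    ngrams.groupby("pattern")
    .agg(types=("ngram", "count"), tokens=("freq", "sum"))
    .sort_values("types", ascending=False)
)
print(summary)
```

In this toy table the pronoun pattern has by far the most tokens but only one type, so it ranks below the article-noun pattern, which mirrors the 51 versus 7272 rows described above.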
Note. Links to information on how to set up a python notebook and to the n-gram grammar paper are included in the example notebook.
Datasets can also come from research papers. I have used a word list of the top 150 phrasal verbs and their most common meanings to create a phrasal verb dictionary. This is a step beyond simply querying a dataset (as can be done using an IPython Notebook or Excel) and may not be for everyone. However, I imagine many teachers have used paper-based word lists when designing lessons, hence such datasets and ways of manipulating them will not be completely unfamiliar.
Luckily, as mentioned before, people like Paul Raine are using publicly available datasets to build tools that are easy for teachers to use. On his apps4efl site he has a paired sentence app that uses the Tatoeba Corpus of sentence pairs (which internet users have translated), a wiki cloze app (that uses Wikipedia data), video activities (using YouTube) and so on (see list below).
The best-known type of dataset is the corpus. Interfaces to such data, such as the BYU interfaces to COCA (Corpus of Contemporary American English) or the BNC (British National Corpus), are the most popular. I won’t go into detail about exploiting such data; those interested can read more about such datasets on my blog, or over at the G+ Corpus Linguistics community. Suffice it to say that this kind of data is becoming better supported now on the net.
Hopefully this short primer on the value of language datasets may encourage you to start to explore them; or, if you are already, why not drop a comment? Readers may also know of publicly available language datasets that they would like to share. If so, please share!
My son has an infatuation with Minecraft and watching Minecraft videos on YouTube. Some may find it strange that a parent let their child spend so much time watching YouTube videos, but they have been an important supplemental source of authentic English for our children.
Trying to raise bilingual children in Japan is challenging, so we try and find authentic sources of English wherever we can. However, we are careful to make sure they are watching safe YouTube channels. A great Minecraft-related channel for kids is Joseph Garrett’s channel featuring an orange cat named Stampy.
Minecraft is a sandbox game, meaning players have a lot of control over the environment. In Minecraft, players can freely build structures and make up their own games. It provides a lot of room for creativity. Think Legos, but digital.
The First Person View (FPV) aspect of Minecraft inspired me to try and connect students telecollaboratively using Minecraft as a virtual classroom. I thought that at the very least, learners from different countries could collaboratively build structures using English as their lingua franca. In fact, I spent nearly 6 months in preparation to teach such a course using a modification to the software created for teachers called MinecraftEdu.
Not only does MinecraftEdu offer great LMS-style features, there is also a thriving community of educators surrounding it who share the worlds they create. There are over 100 worlds free to use in the MinecraftEdu world library. There are also a number of great mods available through MinecraftEdu which add interesting features such as support for teaching computer programming (computercraftedu), and quantum physics (qcraft).
Unfortunately my university’s computer center refused to open what is commonly referred to as the Minecraft Port (port 25565) through their server firewall, listing some very generic “security” concerns. This sent my research to simmer indefinitely on the back burner.
Recently, however, I have been using a private server to host Minecraft interaction between my 7-year-old son here in Japan and his friends in the United States. Collaborating with other parents, we use Skype (video off) as our Voice over IP to engage our children in telecollaborative Minecraft play.
My son also plays Minecraft with his grandmother in the United States. Yes, that’s right, I said grandmother! Another grandson (U.S. based) had already installed Minecraft on her computer at home, and so it wasn’t too difficult to set up an account so she could try it out. She is highly motivated to figure it out, so that she can interact with her grandchildren in Japan. As Christmas approached this year, and grandma was feeling particularly detached from her grandchildren, we organized a special 2-hour Minecraft play session.
My son is not the only one who can play. His 4-year-old sister also interacts with grandma using the keyboard and mouse to move her avatar through a virtual world. Here they are playing hide-and-seek.
As you can see in the video, they are using English to communicate. Playing Minecraft telecollaboratively has increased our children’s pool of English speaking play partners, helped them keep in touch with friends who have moved away, and even allowed them to spend time over the holidays with grandma.
The kids have a lot of fun keeping in touch with family overseas using Minecraft. It’s become so much a thing in our house that we plan on making some customized “skins” for our Minecraft avatars so that they better represent grandma and the grandkids during their digital play. My son is happy with a zombie avatar, but for grandma, we may need something special.