L2 Korean Pronunciation Over Time - A Graphical Look at Error Rates

I am working on a project that investigates the effect of a supplemental pronunciation instructional treatment for learners of Korean as a foreign language. As part of this project, in the spring semester we had 36 learners in their first or second year of Korean study complete speaking tasks 11 weeks apart. Learners were split into a control group and a treatment group, the latter of which received 8 hours of classroom-based supplemental instruction over the course of 9 weeks. One analysis I'm looking at is how their error rates changed over time.

Today, I looked at one task: a paragraph read-aloud. The pre- and post-test recordings were phonemically transcribed by two native speakers of Korean. Their agreement on roughly 12% of the data was 93%; the disagreements were resolved via discussion before each coder began working independently. When the transcripts came in, I tallied the total sentences, words, syllables, and phonemes, and then coded each deviation from the script as either a segment error (e.g., a single segment-for-segment substitution) or a syllable error (e.g., deleting a syllable or inserting one where there shouldn't be one). I also looked at sentence intonation, which ultimately wasn't very interesting (like many languages, Korean uses falling intonation for declarative sentences, and most learners had no problem with this; it will probably be more interesting in another task I have yet to analyze). Because the task was timed and not all learners completed the full paragraph, and because I eliminated "rough starts" (conventionally, first utterances with stutters/hesitations are deleted), I normed the error counts to get rates.
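For readers who want to see what norming and simple percent agreement look like in practice, here is a minimal sketch in Python (not the R used for the plots). The function names, the choice of "per 100 units", and all of the counts are made up for illustration; they are not the project's actual data or scripts.

```python
def error_rate(error_count, unit_count, per=100):
    """Return errors per `per` units (e.g., errors per 100 syllables)."""
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    return per * error_count / unit_count


def percent_agreement(codes_a, codes_b):
    """Percentage of items on which two coders gave the same transcription."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty, equal-length code lists")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)


# Hypothetical learner who read more of the paragraph at post-test.
# Raw counts are not comparable across recordings, but rates are.
pre_rate = error_rate(error_count=12, unit_count=150)
post_rate = error_rate(error_count=9, unit_count=180)
print(pre_rate, post_rate)

# Hypothetical double-coded segments from two transcribers.
coder_1 = ["p", "b", "t", "k"]
coder_2 = ["p", "p", "t", "k"]
print(percent_agreement(coder_1, coder_2))
```

The point of the denominator is that a learner who produces more syllables has more chances to make errors, so dividing by the units actually produced keeps shorter and longer recordings on the same scale.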

So without further ado, plots! Each plot compares control (C) and treatment (T) groups. The color of the lines shows L1 (English or Chinese), and the shape of the points shows year in the program (first or second). Each line and set of points represents one student, so these plots highlight individual variation and provide a way to visually evaluate change patterns among the groups.

Fig. 1. Segmental error rates over time.

Fig. 2. Syllable Error rates over time.

Fig. 3. Total error rates (based on total syllables) over time, including segmental, syllable, and sentence intonation errors.
These plots could use some tweaks, but overall I'm pretty happy with what I'm able to see. As I've seen in other data from the project, the control group did, overall, improve their pronunciation. I think this is partially explainable by the large number of beginners in the group: at such early stages of language learning, there's a lot of phonological/pronunciation development going on quite rapidly just via standard instruction and input. But looking at the individual data, things look noisier on the left (control) side, with a bit more backsliding (more people with error rates increasing). Second-year students (the triangles) in both groups seem to show lower error rates and less improvement.

Another trend seems to be that learners with higher error rates to start with see larger improvements; this is more apparent in the control group but also visible in the treatment. It's also interesting to see the differences between L1 Chinese and L1 English learners- particularly in syllable structure errors, Chinese students had generally higher error rates to begin with, but also made some substantial improvements.

Plots were made in R with ggplot2.  Any tips appreciated!

L2 Research: Statistical Analysis and Asking the Right Questions

It's no secret that there are some problems with how research and knowledge-creation work in the social sciences (and medical science, too, by the way). Long story short, social science researchers have a very hard time replicating results when the same study is run again (a team of psychologists recently attempted to replicate 100 published studies; read about it here).

One source of this problem is a research framework called null-hypothesis significance testing (NHST). For those of you who aren't knee-deep in doing or reading primary research, this is where the now-common phrase "significant results" comes from. Basically, NHST begins with the idea that there is no effect of whatever it is you are researching- a type of classroom instruction, a pill, whatever. If you collect your data, run your stats, and then get something called a p-value that is small enough, you get to reject the idea that there was no effect of the instruction or the pill. This lets you say your study (or, your instruction or your pill) has significant results, because the effect is different from nothing. p-values are malleable little things, and very sensitive to sample size; large samples with very small effects will be "significant" while smaller samples with medium to large effect sizes might end up being "non-significant" (this is related to a concept called power in statistics). The thing is, this p-value says nothing about how large the effect is; the word "significant" only means that it passed some (arbitrary) threshold. Now, people do tend to look at how big the difference is (an effect size), but the results of the significance test tend to overshadow other results, and lead to biases in publication (no significant effect = much lower chance of getting published).
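To make the sample-size point concrete, here is a small Python sketch. It uses a simplified two-group z-test (it treats the standard deviation as known, which a real analysis with a t-test would not), and the effect sizes and group sizes are invented purely for illustration.

```python
import math


def two_sided_p(effect_size_d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference.

    Assumes two equal-sized groups and a known standard deviation
    (a z-test), so z = d * sqrt(n / 2). This is a simplification.
    """
    z = abs(effect_size_d) * math.sqrt(n_per_group / 2)
    # For a standard normal variable, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Tiny effect, huge sample: comes out below the usual .05 threshold.
p_small_effect = two_sided_p(effect_size_d=0.1, n_per_group=2000)

# Medium effect, small sample: comes out above the .05 threshold.
p_medium_effect = two_sided_p(effect_size_d=0.5, n_per_group=15)

print(f"d = 0.1, n = 2000 per group: p = {p_small_effect:.4f}")
print(f"d = 0.5, n = 15 per group:   p = {p_medium_effect:.4f}")
```

In this toy case the effect that is five times larger is the one that fails the significance test, which is exactly why the p-value alone is a poor guide to how much something matters.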

Andrew Gelman, a statistician working in the social sciences, recently wrote a piece about how the idea of things having absolutely no effect is kind of absurd. Rather, he advocates moving from attempting to "discover" an effect to "measuring" the size of the effect, or creating models that best explain what is going on. This is right in line with a recent statement by the American Statistical Association calling for a major move away from p-values as the be-all-end-all in research (on a side note: if you do quantitative research, or know people who do, read that statement). Thinking about this for L2 research, it really strikes a chord for me- after all, even very bad L2 instruction results in some kind of learning or development. The more important question is how much?

In L2 research, I think it is fair to say that we're still primarily in an NHST mode of thinking (though we are seeing changes and improvements). We're asking questions like Does working memory have an effect on reading comprehension? and Does explicit instruction have an effect on the processing of case marking?  instead of asking how much of an effect or to what degree. And while L2 research is not unique among social science disciplines in its preference for NHST and p-value uber alles evaluation of research findings, I do wonder if certain theoretical concerns have contributed to the popularization and entrenchment of NHST in the field.

Specifically, in second language acquisition, a distinction has long been made between acquisition (integration of forms into the implicit linguistic system- you have 'real' language in your head) and learning (gaining declarative knowledge of linguistic forms- you can talk about how past tense works), which runs parallel with the ideas of implicit knowledge and explicit knowledge, which extends further to implicit instruction (providing lots of input, perhaps featuring a particular form) and explicit instruction (offering metalinguistic explanation of a form). There are then theoretical positions which hold that explicit knowledge has NO effect on acquisition, and in turn that explicit instruction has NO effect on acquisition (in both cases, it is able to have an effect on learning, however). This theoretical position is arguably well-served by NHST, but I think there are still shortcomings. First, the idea that explicit knowledge or instruction has absolutely zero effect on acquisition just seems extremely unlikely, even though I'd be inclined to agree that it has relatively little effect. Second, it tends to result in glossing over effect sizes. An implicit instructional treatment might have a statistically significant effect but small effect size, or an explicit instructional treatment might not be statistically significant but have a non-trivial effect size, but the focus is often on the two being "significantly different."

Why does this matter? Well, unfortunately, the bits of a study about statistical significance are what help it get published in the first place, and they are primarily what comes out of a published study when it is picked up by teachers, media, and other interested parties. I also think this way of using stats and asking questions prevents us from considering more informative and interesting results that could come out of studies. Here's hoping that L2 research keeps moving away from NHST and starts taking more heed of folks like Gelman and the ASA.

Coding Linguistic Data: Down the Rabbit Hole

Language is complicated. It's an intricate system of concrete signs that link to mental abstractions. Looking directly at those mental abstractions is... difficult, to say the least (though things being done in neurolinguistics are getting us a bit closer). So we often look at the concrete signs of language- spoken or written words, which we can analyze by pinning down soundwaves or letters on a page and making inferences about what is going on mentally.

One tool for analyzing linguistic data is coding. Not writing computer programs, but tagging segments or sections of text (spoken or written, we use the word "text"). One thing we're particularly interested in in L2 research is errors- in some sense, error analysis kick-started the whole field of second language acquisition (SLA). So take the following sentence, for example (just made up, but typical of a learner of English):

  • He go to store today.
You don't need to be a linguist or even an armchair grammar guru to spot a couple of errors there: "go" should be "goes", and "store" should be preceded by an article (most likely "the" in this context, but "a" could work). So at a very basic level of coding, we could say this sentence has 2 errors. That's informative, but it might not be fine-grained enough to answer many questions about SLA, so we end up with more elaborate coding schemes. The first error becomes an agreement error (the verb does not agree with the subject) and the second error becomes an article error (missing or misuse of an article).

Coding can get harder when you're dealing with less isolated chunks of text, requiring more inferences on the researcher's part.  Let's add a little context:

  • Teacher: What did he do today?
  • Student:  He go to store today.
Now that verb error is harder to classify. On the surface, it's an agreement error, because English does not permit "He go". But, given the context, it would be more appropriate to say "He went" (the question was about completed actions). So now it could also be considered a tense error.

Coding, and reliably pinning down increasingly fine-grained subcategories, is just as hard for analyzing other linguistic features. I'm working on a project that deals with L2 pronunciation, and I'm knee-deep in coding phonological errors in transcripts of learner speech samples. Let's consider a pronunciation-related error (another hypothetical L2 English example):

  • I like bet dug
First, we assume that the speaker meant "pet dog" for "bet dug" (this inference is easier to support when the speakers were asked to produce very controlled speech samples, like reading a sentence aloud or filling in a sentence template based on a picture). We see two apparent errors- "b" for "p", and "u" ("uh" sound) for "o" ("aw" sound). Do we stop there, or do we jump down the rabbit hole? The first error is a consonant, the second a vowel. We could also categorize both errors as substitutions, a common phonological error where one sound is switched for another (usually, a sound that's easier to produce or found in the L1). But wait, there's more- do we care about the particulars of the substitutions- do we want to note what the specific sound swap was and count them up? We could even look at the context- "b" is word and syllable initial, "u" is found inside a word/syllable. As you can see, this gets to be potentially labyrinthine.
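One way to picture the grain-size question is to store every detail once and then decide afterwards how finely to count. The sketch below, in Python, is only an illustration: the field names and category labels are hypothetical, not an established coding scheme.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCode:
    """One coded pronunciation error (all field names are hypothetical)."""
    target: str        # sound the script called for
    produced: str      # sound the learner produced
    segment_type: str  # "consonant" or "vowel"
    process: str       # e.g., "substitution", "deletion", "insertion"
    position: str      # e.g., "word-initial", "word-medial"


# The two errors in "I like bet dug" (intended: "pet dog").
codes = [
    ErrorCode("p", "b", "consonant", "substitution", "word-initial"),
    ErrorCode("o", "u", "vowel", "substitution", "word-medial"),
]


def tally(codes, *fields):
    """Count errors at whatever grain size the chosen fields define."""
    return Counter(tuple(getattr(c, f) for f in fields) for c in codes)


# Coarse grain: both errors fall into one "substitution" bucket.
print(tally(codes, "process"))

# Finer grain: each error becomes its own category.
print(tally(codes, "segment_type", "position"))
print(tally(codes, "target", "produced"))
```

The finer the fields you tally on, the more categories you get and the fewer tokens land in each one, which is part of why overly elaborate schemes become hard to code reliably.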

For me, it's tempting to keep falling down that rabbit hole while coding- my logic is something like "might as well knock it all out while I'm here." But ultimately this slows down your progress, and you might not end up needing such a fine grain size to answer your research questions. You also might not be able to reliably code when your scheme is overly elaborate. I also know that you can always go back to your data later for a different analysis. I'm a relatively novice researcher, so I haven't had the personal experience of doing that so much with my own data, but a recent project I worked on did involve going back to my colleague's dataset and doing more detailed phonological analyses of learner-learner interactions.

Is there a moral to this story? I don't know, I just needed a break from coding! But I'll try to leave a couple bits of advice, mostly for myself:
  1. Keep your original goals in sight. Research can/does evolve, but your original RQs can provide guidance.
  2. Get comfortable with the idea of going back to your data for subsequent analysis. It might be a post-hoc in the same project/article, or if you get a really novel inspiration while doing primary coding, you can return to it later for a fresh analysis and write-up.

Language Learning and Pokémon Go (and mobile gaming/apps more generally)

Reality augmented with purple rats... purple rats everywhere!
Everyone is talking about Pokémon Go, the new Augmented Reality (AR) mobile game from Nintendo and Niantic Labs. National Public Radio is posting articles and talking about it on the radio! Go to any college campus or high-volume public space and you'll see why: people are walking around with their smartphones, pausing every few steps to catch the little critters. Language teachers aren't oblivious to Pokémon Go's popularity, and they're talking about it, too- (how) could Pokémon Go be used to support language learning?


In the specific case of Pokémon Go, I do not see much potential for language learning. The game itself lacks chat or messaging functions, so there is no in-game interaction with other speakers of your target language. Also, the game is pretty simple, and there's not a large quantity or variety of text- you would become familiar with the meaning of the few messages and words/phrases pretty quickly, and ultimately there's not a lot of input provided by the game. On a side note, even after setting my phone and Google account to my target language, I couldn't get the game to display in Korean. I would say that the biggest potential for language learning with Pokémon Go is its ability to provide face-to-face interaction opportunities with a variety of speakers. In my experience playing so far, I've had a number of short chats with strangers, all quite pleasant. For a language learner on a college campus, the game could be a fantastic icebreaker that sparks some authentic, meaningful communication, which is key for language learning. This aspect of playing the game could be very helpful for international students who are shy or have trouble connecting with domestic student peers on campus.

Beyond Pokémon Go, I'm more interested in the underlying technology behind AR and context-sensitive applications. While displaying purple rats in random locations does not hold much promise for language learning, displaying useful phrases for sending a package when you get near a post office does. Especially if, let's say, you could "catch" the words or phrases that were new to you for flashcard study later (the app could also remember to display those words prominently the next time you get near a post office). Apps could also keep up with your reading, listening, and/or viewing habits, and suggest articles/songs/videos of interest in a target language you have indicated you wish to learn when you're at home in the evening... and imagine if you could take a vocabulary size test periodically to help the app select appropriate material for you? I think functions like these are the future of language learning games and apps- useful resources keyed in to meaningful contexts and good learning habits.

An Odd Lexical Form Error: Mis-Ordered Compounds

"Where is the paper-toilet?"

Have you ever heard a language learner flip the order of a compound word? Have you ever done it yourself? I have caught myself doing this on multiple occasions. Here's a recent example from a text chat:

Me saying "outage power" instead of "power outage" in Korean.
If you don't read Korean, basically what's happening here is I tried to ask if a power outage was still going on. Instead of "power outage", though, I wrote "outage power" (actually, the examples work the other way around in Korean, which will be important later). My wife (KyuJin Lee in the chat window) corrected me. Somehow, I had retrieved this word in the wrong order but with the correct constituent meanings. The two stems/roots in the Korean word for 'power outage' (or 'blackout') are phonemically and orthographically similar, differing only by the type of nasal consonant in the coda (ㄴ v. ㅇ). But this has happened to me in the past with less similar stems. What could be going on here? And furthermore, why couldn't I recall any of my former ESL students doing this? (Not to say that they never did, but I just couldn't recall it, and couldn't recall reading about these kinds of errors, where the lexical form and root meaning of individual stems were kept intact.)

For one thing, I probably had weak form representation in my mental lexicon for this particular word. I had only learned it the day before! You can see in this next screenshot where I engage in some negotiation of meaning and Kyujin provides an explanation ('the lights don't turn on') and an L1 translation for me:


My first encounter with 정전 'power outage'.
I don't think it was purely related to weak phonological/orthographic form issues, though... I am very familiar with 전 as a stem that refers to electricity. And again, I've caught myself in the past flipping stems in compounds that are more phonemically/orthographically distinct. I asked r/linguistics for any ideas, and u/Darkgamma suggested looking into morphological headedness and possible crosslinguistic transfer- something that hadn't even crossed my mind.

Briefly, the concept of a 'head' in linguistics captures how language is organized around some key, prime component. Syntactically, English is head-initial: the head typically comes at the beginning of the phrase or clause ('in the park', 'the man I met yesterday'). Korean, syntactically, is head-final ('park-in', 'yesterday met man'). However, morphologically, both languages are considered to be head-right. For most compounds, this means the primary element will come last. Take 'toothbrush' for example- it's a brush for teeth; its essential category of being is 'brush'; 'tooth' just specifies what it's used for. Similarly, Korean for 'toothbrush' is 칫솔 ('tooth'+'brush'). So in my case, morphological headedness transfer shouldn't lead to errors. However, it can for people whose first language is left-headed, such as Spanish or French. Nicoladis (2002) reported that young French-English bilingual children were more likely to flip English compounds than monolingual peers.

Anyway, with headedness transfer not sufficiently explaining the matter in my case, I turned to alternate explanations, and found some promise in Ko, Wang, and Kim (2011) (there are a number of studies looking at Korean-English and Chinese-English compounds, I am finding). Looking at Korean-English bilinguals, they found beneficial processing effects for direct translations ('toothbrush' would fit this category) compared to translations that didn't line up (English 'bankbook' does not translate directly to Korean, which uses a compound formed from different stems to express the same concept). They also found an effect for word frequency of constituents in the compound, and concluded that there is cross-language lexical activation and morphological decomposition involved in L2 compound word recognition. This at least offers a satisfying account of my error with 'power outage'/정전: despite both languages being morphologically head-right, Korean places the stem for 'power' on the right while English has it on the left, resulting in a cross-linguistic misalignment for me- I tried to put 'power' on the left in Korean! Additionally, the stem 정 doesn't exactly line up semantically with 'out' or 'outage'... it's closer to 'stay' (as in stop moving, remain in place for a while) in this sense. It would be very interesting to collect more instances of this sort of error in production and see how this all pans out.

If you have any stories about flipping compounds, I'd definitely be interested in hearing them!


References:

Ko, I. Y., Wang, M., & Kim, S. Y. (2011). Bilingual reading of compound words. Journal of Psycholinguistic Research, 40, 49-73.

Nicoladis, E. (2002). What's the difference between 'toilet paper' and 'paper toilet'? French-English bilingual children's crosslinguistic transfer in compound nouns. Journal of Child Language, 29, 843-863.


Second Languages and Summer Travels: Racialized Languages

This summer, I've had the pleasure of two overseas trips: 2.5 weeks in South Korea and 1 week in Italy. The trip to Korea was for personal reasons (visiting in-laws, and of course eating copious amounts of Korean food) and the trip to Italy was for presenting at the Language Testing Research Colloquium in Palermo, Sicily (with a lot of eating also going on- but this isn't a food blog!). Both trips were highly enjoyable, and they provided a sharp contrast in terms of race and language.

Korean in Korea


I speak Korean reasonably well, though I am far from proficient (I have been certified as Intermediate-High based on an ACTFL SOPI test, or roughly B1 in terms of the Common European Framework of Reference). My wife is Korean, and I need the language to communicate with my in-laws. It's also quite nice to be able to follow along and get in a word or two in Korean social situations without having everyone else switch to English to accommodate me. For the 2.5 weeks I spent in Korea this summer, I had lots of opportunity to converse in Korean with family and friends, which was really enjoyable.

However, more public contexts of communication were different. On planes, at stores, and in restaurants, it was more common for strangers to initiate encounters in English (or speak only to my Korean companions). This was also true when I lived in Korea in the past. Although it could be frustrating back when I was a resident of Korea, it nonetheless makes sense from the perspective of the cashier or flight attendant: it would be very uncommon for a Caucasian (or Black/Latin@) tourist to have an easy time handling a service encounter in Korean. Even among resident foreigners (i.e., people with non-tourist visas), the majority of Western non-Asians are US military or English teachers who typically stay for a year or two and do not progress very far with Korean (though of course there are exceptions). So, the waitress/clerk/attendant makes a choice to use his or her linguistic resources in a way that is most likely to be successful, and you can't necessarily fault them for it.

Italian in Italy


I speak virtually no Italian (Novice Low ACTFL, sub-A1 CEFR). I picked up a few phrases before leaving and a few more while I was there, along with a smattering of single-word vocabulary items. I also do not have any personal relationships with Italians in Palermo. Nonetheless, I had lots of opportunities to speak the language- well, at least try to speak it, and often switching to English shortly after. Shopkeepers, cashiers, baristas, and flight attendants typically initiated encounters in Italian. Although I may not necessarily look like a prototypical Italian, I blended in well enough with the Caucasian majority. And it was fun to struggle through the simple transactions of buying tickets or coffee, telegraphing my requests with my limited vocabulary and gestures. And of course, many Italians who work with tourists have excellent English and were very accommodating in using it for more complicated matters.

Racialization of Language

Reflecting on these experiences, which were separated by just a couple of weeks, highlighted a contrast and made me think about how race interacts with language use in multi/plurilingual settings. Racialization is the phenomenon of associating a social practice with a particular racial/ethnic group. A common example is religion, such as how Middle Easterners are often assumed to be Muslim. This happens with languages, too, and I think the phenomenon is magnified in public and tourist realms. In the case of Korea and Korean, it's often assumed that white people speak English and cannot speak Korean. In Italy, and I hesitate to speak too definitively about this because I know so little of the country, it may be assumed that whites speak Italian: even in places with lots of tourist traffic, I'd imagine the majority of whites one encounters would be Italian-speaking locals. It may also be the case in Italy that Italian simply comes first, and other linguistic resources are brought in only if communication in Italian fails (and Italy has become quite racially/ethnically diverse, so this practice would also make a lot of sense).

It's very interesting to think about how racialization of language as a broader social phenomenon affects the individual and potentially affects language learning. In my case as a white dude with passable Korean and virtually non-existent Italian, my individual linguistic competencies were washed-over by larger social practices- but in ways that made sense, logically, from the perspectives of the people I was communicating with. I recall similar things happening when my wife (who is Korean) and I traveled to Japan and Hong Kong: even when doing very touristy things in touristy areas, waiters and clerks would often lock their gaze on my wife and address her in Japanese or Cantonese (where she no doubt felt like I did in Italy). From the view of language learning, racialization has the potential to affect the quantity, and perhaps quality, of linguistic input that is crucial for learning, particularly at the beginning stages.

Racialization of language, though not necessarily malicious, has also had effects on language and political policy in countries around the world. For example, it determines schooling for students in Singapore (the Mother Tongue system) and has been used as one means of separating "non-Hispanic whites" and Latin@s in US demographics. It also has effects on the hiring of language teachers (e.g., job postings for EFL teachers specifically requesting Caucasians). A topic worth thinking about, for sure!

-------------------------------------------------

So obviously, I had a lot of time to get lost in thought while waiting in airports and sitting on planes. Hope you enjoyed the read, and feel free to share your thoughts or experiences related to racialization of language, Korean, or Italian (or anything else) in a comment below.