Friday, November 25, 2011

A formula to calculate language-learning success

One of the first formulas you'll learn in high-school physics is that distance travelled (D) equals speed (s) multiplied by time spent travelling (t), or:

D = s * t
So if you're going 30 km/h for 3 hours, you know you've travelled 90 kilometers.

The same formula can be applied to language learning, where learning (D) equals learning speed (s) multiplied by time spent learning (t). So if you've been learning 1 new item (vocab word, grammar rule, character, etc.) every 3 minutes and your exposure time is 100 hours (or 6,000 minutes), you've learned 2,000 items.
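That arithmetic can be sketched in a few lines of Python (the function name is just for illustration; the numbers are the ones from the example above):

```python
# Learning analogue of D = s * t:
# items learned = learning speed (items/minute) * time spent (minutes)

def items_learned(minutes_per_item: float, hours_of_exposure: float) -> float:
    """Total items learned at a fixed pace over a fixed exposure time."""
    speed = 1 / minutes_per_item       # items per minute
    minutes = hours_of_exposure * 60   # exposure time in minutes
    return speed * minutes

print(items_learned(minutes_per_item=3, hours_of_exposure=100))  # 2000.0
```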

Your language learning goal, or what you ultimately want D to be, will be a constant; if your goal is fluency, there's only so much you need to know to reach that goal (despite how bottomless learning a language may seem). In order to get to that goal, you've then got two variables to play with: time spent and learning speed.

Time spent will also, to some extent, be static. Your goal should always be to maximize time spent on the language, but this will be constrained by unrelated factors, e.g., sleep, work, etc. Your control over this factor will be limited to things like skipping that television show in your native tongue in exchange for some language-learning time, and so on.

Your learning speed is where you'll have the most control, as this is largely determined by learning method, and you're fully in control of how you go about learning. So, to speed up the pace, you can, for example, skip out on the slow-paced classes and dive into some exposure on your own.

This concept would actually lead to a great way to compare language-learning resources and methods: items per minute. We'd just have to come up with some standard of "items" (e.g., the individual prompt-response pairs used in SRS programs) and some standard of "learning" (e.g., having a 90% or greater chance to be able to recall the item one year from now), and suddenly you'd have apples-to-apples comparisons for anything out there and could quickly determine what's efficient and what's not.
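As a rough sketch of what such a comparison could look like (the method names and study logs below are invented for illustration), items per minute is just the count of items meeting the "learned" standard divided by the minutes spent:

```python
# Hypothetical study logs: (items meeting the "learned" standard, minutes spent)
methods = {
    "slow-paced class": (120, 600),
    "solo SRS reviews": (450, 600),
}

for name, (items, minutes) in methods.items():
    print(f"{name}: {items / minutes:.2f} items/min")
```

With these made-up numbers, the class works out to 0.20 items/min and the solo reviews to 0.75 items/min, and the two can be ranked directly.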


  1. So - counting the amount of data learned... doesn't really say much about how well one knows the language, does it?

  2. Sure it would. If someone knows, say, 15,000 items in a language (which would of course mostly be vocab), that'd be a pretty darn good indicator of how well they know the language.

    The tricky part of the metric would be defining what an item is to make sure it covers various usages of language, including listening and speaking.

    What data would you add to the formula?

  3. But the concept of *knowing* an item is pretty fuzzy. You could know something vaguely, or you could know it with complete mastery. What is the metric for saying that you know something?

    The concept of *item* is also pretty fuzzy. If you were evaluating say a Pimsleur CD, how would you define what an item is?

    Since both of those things are fuzzy, quantifying them probably wouldn't be very meaningful.

    I do think you may be onto something though. Speed and time are interesting ways to think about learning. I just doubt that speed could be meaningfully quantified.

  4. I agree, testing the knowing of something in a meaningful way would be too difficult (you could test it in a consistent way, but that is not often very useful). How many levels of knowing are there with 了 or 的 in Chinese? Look how many meanings the simple word om can be used for in Afrikaans. In fact, I have discovered that a number of small filler words in this language are like Swiss army knives, and what they actually mean is heavily dependent on context.

    The data can be tested in isolation but also has to be tested in multiple contexts to demonstrate useful knowledge.

    Assuming you could come up with a way to test the data, I bet you would get more useful results from just three hour-long discussions with a native speaker (who had some training in the tester's expectations) and getting them to mark a level, even though this is subjective.

    A truly objective test of the data would be a huge undertaking.

  5. This somewhat relates to something Khatzumoto posted over at AJATT about how important consistency is. Doing the sort of calculating you're talking about is more important than you may realize, because you need to figure out just how much you can do every day SUCH THAT you can consistently study the language EVERY SINGLE DAY--doing a little every day is far better than a lot every now and then or, even worse, burning yourself out and giving up before you reach your goal.


  6. hi, first time visiting your blog, and fell in love with it straight away.

  7. The concept of "knowing" doesn't have to be fuzzy. At the extreme, I think anyone would agree that you "know" something if the chance of you forgetting it during your lifetime is negligible. If you know someone's exposure to something, SRS algorithms can figure out how likely a person will be to recall that thing at a given point in the future, so all we'd need to do is pick a metric from that data to mean "know". Perhaps it's a 95% chance of recall 1 year from now. Or perhaps it's an 80% chance of recall 5 years from now.

    For items, Pimsleur CDs would be no different than anything else, in that you'd just make each item the base prompt-response unit that SRSs call for. If context is needed to understand the prompt, then it'd have to be included. It would take some effort to systematize this for things like grammar rules, but I don't think it's beyond our capabilities.

    From there, it'd just be a matter of being consistent to have an oranges-to-oranges way to compare language-learning speed.
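The "know" threshold described in comment 7 could be checked like this, assuming a simple exponential forgetting curve P(recall) = e^(-t/S), where S is the item's memory stability in days. This is a modeling convenience for illustration; real SRS algorithms estimate recall differently.

```python
import math

def recall_probability(days_elapsed: float, stability_days: float) -> float:
    """Exponential forgetting-curve model: P(recall) = e^(-t / S)."""
    return math.exp(-days_elapsed / stability_days)

def is_known(stability_days: float, horizon_days: float = 365,
             threshold: float = 0.95) -> bool:
    """'Known' = at least a 95% chance of recall one year from now."""
    return recall_probability(horizon_days, stability_days) >= threshold

print(is_known(stability_days=7300))  # stability of ~20 years -> True
print(is_known(stability_days=365))   # stability of 1 year -> False
```

Swapping in the other proposed metric (80% recall at 5 years) is just a matter of changing `horizon_days` and `threshold`; what matters for the comparison is that every method is scored against the same pair.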

  8. It would certainly be a huge and difficult undertaking, but it wouldn't be impossible, and it would probably be one of the most valuable things language learners could have: consistent data to compare any given method A to any given method B.