Monday, February 11, 2013

How many words do you need to know in a foreign language?

When looking into what seems to be the never-ending abyss of learning a language, it's nice to have an idea of where your finish line might be. Most of the individual pieces of language data that you'll be storing in your head consist of vocabulary, so knowing how much vocab you'll need to reach the vaunted native level is a pretty good indicator of where your finish line is.

So how many words does an average native speaker know? Good numbers are pretty hard to come by and the jury still seems to be largely out on any conclusive numbers, but there does seem to be a rough consensus that with 20,000 or so words you'll pretty much be covered in anything you want to use the language for.

When I first put this post up, my web searching somehow failed to bring me to this very thorough on-point post on How to Learn Spanish. That's worth a read on its own, but the key thing I'd highlight from it is a quote from Alexander Arguelles (a guy who has devoted his life to language learning):
The maddening thing about these numbers and statistics is that they are impossible to pin down precisely and thus they vary from source to source. The rounded numbers that I use to explain this to my students I usually write in a bull’s eye target on the whiteboard, but I don’t have the computer skills to draw circles in this post, so I will just have to give a list:
  1. 250 words constitute the essential core of a language, those without which you cannot construct any sentence.
  2. 750 words constitute those that are used every single day by every person who speaks the language.
  3. 2500 words constitute those that should enable you to express everything you could possibly want to say, albeit often by awkward circumlocutions.
  4. 5000 words constitute the active vocabulary of native speakers without higher education.
  5. 10,000 words constitute the active vocabulary of native speakers with higher education.
  6. 20,000 words constitute what you need to recognize passively in order to read, understand, and enjoy a work of literature such as a novel by a notable author.
And as I do have the computer skills to draw circles in this post, here are those numbers in a proportional bull's-eye target:


Let's dive into some specific languages to see how those numbers play out.


English

English seems to have some fairly good data. To sum up the findings of a number of sources, native English speakers know somewhere around 15,000 to 20,000 "base words", which are actually word families that include all inflected and derivative forms ("run", "runs", "running", "runner", etc.). According to this indirect source, each base word equals about 1.6 actual words, so the total would be approximately 24,000 to 32,000 words.
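As a quick sanity check on that arithmetic, here's a minimal Python sketch. The function name and constant are hypothetical, made up for illustration; the only inputs are the 15,000 to 20,000 base-word range and the 1.6 ratio quoted above.

```python
# Assumption: 1.6 actual words per base word, the ratio cited in the paragraph above.
WORDS_PER_BASE_WORD = 1.6


def total_words(base_words: int, ratio: float = WORDS_PER_BASE_WORD) -> int:
    """Convert a count of base words (word families) into an approximate count of individual words."""
    return round(base_words * ratio)


low = total_words(15_000)
high = total_words(20_000)
print(f"{low:,} to {high:,} words")
```

If a different source gives a different ratio, pass it as the second argument rather than changing the constant.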

English, however, might not be very representative of other languages:
The English language is likely to contain the most words of all languages, according to the Oxford English Dictionary, and estimates for the number of words range from one to two million.
So let's turn to some other languages.


Spanish

Spanish might bear out the idea that English isn't representative. Gerald Erichsen estimates that Spanish has about half as many words as English, so it might be reasonable to presume that the same ratio applies to what the average native speaker knows, i.e., somewhere around 15,000 words for a native Spanish speaker.


German

Years ago I heard that the average German speaker only needs to know 10,000 base words (which they can then mix and match to form more complex words), although I was unable to find a source stating that on the net.


Japanese

Japanese, on the other hand, seems to have some higher numbers. NTT's Vocabulary Count Estimate Test estimates that university students know 45,000-50,000 words. However, I was unable to determine how "word" is defined (the test is based on a book, which is where that definition would be). My score on the test ranged from 12,000 to 19,000 words, which by their scoring method would make me no more functional than an elementary school student. But both my use of Japanese at work and the large gap in understanding that remains between me and my daughter (who, according to her Japanese teachers, is reading at somewhere around an end-of-elementary-school level) seem to indicate that something's off in how these numbers play out.


Chinese

For Chinese, user rezaf on Chinese-forums.com suggests what seems to be a pretty reasonable estimate of 21,000 words:
In the last few months, I have focused on memorizing the useful words of my dictionary chosen by my Chinese friends. My dictionary has 120,000 words and 2,000 pages. I have noticed that on average they choose between 10 to 11 words per page which might mean 21,000 words when I finish this project in 3 or 4 years. … (What I mean by word includes chengyu and other stuff as you can see in the attachment.)
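The 21,000 figure follows from scaling a per-page count up to the whole dictionary. Here's a rough Python sketch of that calculation, using only the numbers in the quote (10 to 11 words per page, 2,000 pages); the function name is hypothetical.

```python
def estimate_from_sample(words_per_page: float, total_pages: int) -> int:
    """Scale an average per-page word count up to the whole dictionary."""
    return round(words_per_page * total_pages)


PAGES = 2_000  # page count given in the quote above

low = estimate_from_sample(10, PAGES)   # lower bound: 10 useful words per page
high = estimate_from_sample(11, PAGES)  # upper bound: 11 useful words per page
midpoint = (low + high) // 2

print(f"{low:,} to {high:,} words, midpoint {midpoint:,}")
```

The same scaling works for estimating your own vocabulary from a dictionary sample: count the words you know on a handful of pages, average them, and multiply by the page count.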


Russian

And here's a table summarizing one language learner's data from Russian:

Most common words    % of occurrences
75                   40%
200                  50%
524                  60%
1,257                70%
2,925                80%
7,444                90%
13,374               95%
25,508               99%
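To see where a given vocabulary size lands in that table, here's a small Python sketch. The data is copied straight from the table; the function name is hypothetical, and it only reports the highest tabulated tier reached rather than interpolating between rows.

```python
from bisect import bisect_right

# (most common words, % of occurrences covered), copied from the table above
COVERAGE = [
    (75, 40),
    (200, 50),
    (524, 60),
    (1_257, 70),
    (2_925, 80),
    (7_444, 90),
    (13_374, 95),
    (25_508, 99),
]


def coverage_for(vocab_size: int) -> int:
    """Return the highest tabulated coverage (in %) reached by knowing the
    vocab_size most common words, or 0 if below the first row."""
    sizes = [words for words, _ in COVERAGE]
    idx = bisect_right(sizes, vocab_size)  # number of rows whose word count is <= vocab_size
    return COVERAGE[idx - 1][1] if idx else 0


for size in (2_500, 10_000, 20_000):
    print(f"{size:,} words -> at least {coverage_for(size)}% of occurrences")
```

Note that this assumes you know exactly the most frequent words, which is rarely true of a real learner's vocabulary.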




What to take home from all this? The numbers are far from exact, but it seems like 10,000 words and 20,000 words are two pretty reasonable goalposts in whatever foreign language you're learning. If you reach the 10,000 marker and find you're struggling to come up with new vocab (as might happen in Spanish or German), then you might be able to forgo the 20,000 marker altogether and just pick up words as they come. In English, Chinese or Japanese, on the other hand, it seems clear that 20,000 would be a better marker. No number of words will ever be a clear end point (as even native speakers will continue to learn new words throughout their lives, including words that hadn't yet been coined when any given word count was made), but as far as language-learning goal-setting goes, they make for good rough estimates of what you should be aiming at.

If you've got any info on how many words native speakers of other languages know, or anything else on the languages mentioned above, please drop a line in the comments!

Update: There were a bunch of updates to the post above based on Andrew's comment below, which pointed me to some great resources I had missed earlier (luckily those resources ended up with the same 20,000 rule-of-thumb number I had settled on previously).

27 comments:

  1. Actually, yes, I wrote a very long and detailed article about this exact subject as it regards Spanish a couple of years ago and it's become one of my most popular posts: How Many Words Do You Need to Know in Spanish (or any other foreign language)? And WHICH Words Should You Be Learning?

    The funny thing is that the data on this for Spanish is shockingly sparse, only 2 solid studies as I recall in the last 50 years or so, though luckily the more recent one by Mark Davies was superb and has been turned into a book in the form of a frequency dictionary which is readily available to anyone who wants to read it.

    The more interesting aspect of this, for me and language learners in general, is the discovery that you really only need a relatively small vocabulary (generally estimated at around 2000 words, but they have to be the right 2000 words) in order to be functional and capable of expressing pretty much anything you want to--you may not be able to be poetic or sophisticated in your speech, but you will be able to express whatever you need to, though it may be in somewhat simplistic terms, and you should be able to understand almost everything anyone else says in the language even when they use words that are not part of that 2000 word vocabulary because you'll be able to infer their meaning from the context most of the time.  This is where frequency lists and dictionaries come in: learn the most commonly used words in either speech or text (depending on your focus, and yes they're different, see my link at the top) thereby making the most efficient possible use of your "vocabulary learning time" while learning a language.  Make sense?

    Hope that helps.

    Cheers,
    Andrew

  2. Note that you can estimate your vocabulary size for "base words" in any language using a dictionary. Count the number of words that you know (up to you to define "know") on one or more pages, then multiply by the number of pages in the dictionary and divide by the number of pages you counted.

  3. Great tip.  I'd just add as a postscript that the more pages you actually count the more accurate your end result will be.

  4.  That is the most efficient way of doing things but be aware that you will not be efficient (paradoxically, yes) in explaining things. For example, for a native English speaker, my Spanish is very advanced (college degree, lived abroad a while, speak Spanish ALL the time) and even after years of learning/speaking, I still come across a LOT of words I don't know. For example, I just learned how to say manhunt (cacería humana) and junkyard (deshuesadero).

    To explain manhunt, I would have to say something like,

    "¿Cómo se llama eso cuando la policía está persiguiendo a un sospechoso?"

    If I knew the word, I could just say:

    And for junkyard, I would have to say,

    "Es el sitio donde echas las cosas que no quieres, como los coches descompuestos, etc, etc"

    So your speech will be pretty inefficient unless you have lived abroad for years....

  5.  Wow, thanks so much for the link, that was really cool of you!

    Cheers,
    Andrew

  6.  This is completely true and a very good point. I'm always shocked upon realizing that I don't know a word in Spanish that's really common in English and that any 6th-grader would know. This happens, as you said, even at a very advanced level from time to time; it's maddening.

    Cheers,
    Andrew

  8. Among all languages, English is an international language. English is commonly spoken everywhere, such as in airports, banks, offices, industries, schools, colleges, bus stations, online ordering, coffee shops, etc. I have started learning English with videos at http://www.youtube.com/user/twominenglish/videos, which is an easy and interesting way.

  9. I agree with all of this; my question is, what are the words specifically? Surely someone has done the research as to what the first 250, 750, 2,500, etc. words are. It seems that sequencing would be quite important. Trying to memorize the word for "manhunt" before having acquired words like "television" or "knife" seems counterproductive.

  10. Specifically, the words you should start learning are the words you'll most commonly encounter. Frequency lists are a good place to start. If you used Wiktionary's TV and movie script frequency list to study English, for example, "television" would be the 1,873rd word you'd learn, "knife" the 1,980th, and "manhunt" the 18,573rd. However, after you learn the most common words, frequency lists are less useful as you'll want for context.

  11. Very useful topic, thank you

  12. How long does it take to learn 20,000 English words when you study 2 hours a day, 5 days a week?

  13. That's because "number of words" and "ability to speak fluently" are 2 unrelated things... Some people speak fluently with very few words, some can't speak with a wide vocabulary...

    Speaking is about your confidence, not your vocabulary. Of course vocabulary helps, but if you don't know a word and can speak, you explain what you want to say, people help you, and you learn a new word that you'll hardly forget...

    I liked the comparison with "how many notes do you need to know to compose a masterpiece?" it says it all IMO.

  14. great article , thanks!

  15. you are a genius Jonathan, i had noticed that too!

  16. OMG I would kill to possess your writing ability. I am Russian but greatly passionate about the English language. I would like to write a best-selling book in English hahhaha :)

  17. Nice post, but what about actually learning 20,000 words? It is hard enough to learn 2,000-4,000 words.

  18. You should also take pages with different starting letters!

  19. Thank you for the great article. I would like to take the chance to invite English tutors and students who want to have conversations with native English speakers online via Skype to take a free look at this excellent website that offers English conversation classes: http://preply.com/en/skype/english-native-speakers I am currently taking English conversation classes there with native speakers, and the quality is professional and satisfying.

  20. Omg xD Andrew you aren't speaking very well the Spanish , I am a native Spanish speaker, so I can correct you:
    Manhunt: cazador, just
    When you are looking for a wicked boy: buscando al malechor
    And about your question, It's better like this: que se dice cuando los pacos persiguen al gamberro?

  21. Thanks for sharing this valuable information. It's really useful for me. If anyone wants to get German Classes in Chennai, reach us at FITA, rated as the No.1 German Training Institute in Chennai.

  22. Nico Álvarez Acebal, Jul 18, 2015, 11:51:00 PM

    I agree with your comment. When you are abroad, you are learning things that are more related to the culture of the country or of the province; if you are in Spain you will learn a different kind of Spanish than you would learn in Latin America. And when you learn (let us say) "academic Spanish", you are learning a neutral Spanish without dialects or cultural influences, which serves you to communicate with native people but always shows them that you are a foreign speaker of the language. I have the experience of having stayed in Germany for two years with no previous knowledge of the language. After one year of just talking to people and practicing the vocabulary I learnt at home, I reached the level of a native speaker. Albeit a native speaker of that province! I usually use interjections that I do not notice, that do not appear in German textbooks, and that are not used outside of the province where I lived either, like "Hömma!" or "Wat is dat denn?". People who are not told that I am foreign do not notice it and treat me as a native person. So the most important thing for me is that you have native people to talk to (talking and listening are more active than writing or reading) when you are learning a language. A language is not like mathematics; it is influenced passively by many factors (geography, politics, economy, other languages). What we are taught at a normal school about languages is the "what" but not the "why". The "why" is made by the association of a word with its meaning, and the "what" is the translation of a single word into a similar word of another language. That is why I, as a native speaker of Asturian and Spanish, cannot recognize the word "deshuesadero" and use "desguace" instead.
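    PS.: Siento decirlo, pero los coches descompuestos no existen, sino más bien aquellos corroídos por óxido o coches destrozados, quemados, etc... [Translation: I'm sorry to say it, but "coches descompuestos" don't exist; rather, there are cars corroded by rust, or cars that are wrecked, burned, etc.]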
    May you all be happy and have a nice day,
    Nico.
