When looking into what seems to be the never-ending abyss of learning a language, it's nice to have an idea of where your finish line might be. Most of the individual pieces of language data that you'll be storing in your head consist of vocabulary, so knowing how much vocab you'll need to reach the vaunted native level is a pretty good indicator of where your finish line is.
So how many words does an average native speaker know? Good numbers are pretty hard to come by and the jury still seems to be out on anything conclusive, but there does seem to be a rough consensus that 20,000 or so words will pretty much cover anything you want to use the language for.
When I first put this post up, my web searching somehow failed to bring me to this very thorough on-point post on How to Learn Spanish. That's worth a read on its own, but the key thing I'd highlight from it is a quote from Alexander Arguelles (a guy who has devoted his life to language learning):
The maddening thing about these numbers and statistics is that they are impossible to pin down precisely and thus they vary from source to source. The rounded numbers that I use to explain this to my students I usually write in a bull’s eye target on the whiteboard, but I don’t have the computer skills to draw circles in this post, so I will just have to give a list:
- 250 words constitute the essential core of a language, those without which you cannot construct any sentence.
- 750 words constitute those that are used every single day by every person who speaks the language.
- 2500 words constitute those that should enable you to express everything you could possibly want to say, albeit often by awkward circumlocutions.
- 5000 words constitute the active vocabulary of native speakers without higher education.
- 10,000 words constitute the active vocabulary of native speakers with higher education.
- 20,000 words constitute what you need to recognize passively in order to read, understand, and enjoy a work of literature such as a novel by a notable author.
And as I do have the computer skills to draw circles in this post, here are those numbers in a proportional bull's-eye target:
[Figure: bull's-eye target showing the six word counts above in proportion]
Let's dive into some specific languages to see how those numbers play out.
English seems to have some fairly good data. To sum up the findings of a number of sources, native English speakers know somewhere around 15,000 to 20,000 "base words", which are actually word families that include all inflected and derivative forms ("run", "runs", "running", "runner", etc.). According to this indirect source, each base word equals about 1.6 actual words, so the total would be approximately 24,000 to 32,000 words.
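If you'd like to check that arithmetic, or try it with a different ratio, here's a minimal sketch. The names `WORDS_PER_BASE_WORD` and `total_words` are just made up for illustration, and the 1.6 ratio is only the rough figure cited above, not an exact constant.

```python
# Rough ratio of actual words to base words (word families), as cited above.
# Treat this as an approximation, not a precise figure.
WORDS_PER_BASE_WORD = 1.6


def total_words(base_words: int, ratio: float = WORDS_PER_BASE_WORD) -> int:
    """Estimate the number of actual words from a count of base words."""
    return round(base_words * ratio)


# Range of base words a native English speaker is estimated to know.
low = total_words(15_000)
high = total_words(20_000)
print(f"{low:,} to {high:,} words")
```

Passing a different `ratio` lets you see how sensitive the total is to that 1.6 figure.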
English, however, might not be very representative of other languages:
The English language is likely to contain the most words of all languages, according to the Oxford English Dictionary, and estimates for the number of words range from one to two million.
So let's turn to some other languages.
Spanish might bear out the idea that English isn't representative. Gerald Erichsen estimates that Spanish has about half as many words as English, so it might be reasonable to presume that the same ratio holds for what the average native speaker knows, i.e., somewhere around 15,000 words for a native Spanish speaker.
Years ago I heard that the average German speaker only needs to know 10,000 base words (which they can then mix and match to form more complex words), although I was unable to find a source stating that on the net.
Japanese, on the other hand, seems to have some higher numbers. NTT's Vocabulary Count Estimate Test estimates that university students know 45,000-50,000 words. However, I was unable to determine how "word" is defined (the test is based on a book, which is where that definition would be). My score on the test ranged from 12,000 words to 19,000 words, which would make me no more functional than an elementary school student based on their scoring method. However, both my use of Japanese at work and the large gap in understanding that remains between me and my daughter (who, according to her Japanese teachers, is reading somewhere around an end-of-elementary-school level) seem to indicate something's off in how these numbers play out.
For Chinese, user rezaf on Chinese-forums.com suggests what seems to be a pretty reasonable estimate of 21,000 words:
In the last few months, I have focused on memorizing the useful words of my dictionary chosen by my Chinese friends. My dictionary has 120,000 words and 2,000 pages. I have noticed that on average they choose between 10 to 11 words per page which might mean 21,000 words when I finish this project in 3 or 4 years. … (What I mean by word includes chengyu and other stuff as you can see in the attachment.)
And here's a table summarizing one language learner's data from Russian:
|Most common words||% of occurrences|
What to take home from all this? The numbers are far from exact, but it seems like 10,000 words and 20,000 words are two pretty reasonable goalposts in whatever foreign language you're learning. If you reach the 10,000 marker and find you're struggling to come up with new vocab (as might happen in Spanish or German), then you might be able to forgo the 20,000 marker altogether and just pick up words as they come. In English, Chinese or Japanese, on the other hand, it seems clear that 20,000 would be a better marker. No number of words will ever be a clear end point (as even native speakers continue to learn new words throughout their lives, including words that hadn't yet been coined when any given word count was made), but as far as language-learning goal-setting goes, these numbers make for good rough estimates of what you should be aiming at.
If you've got any info on how many words native speakers of other languages know, or anything else on the languages mentioned above, please drop a line in the comments!
Update: There were a bunch of updates to the post above based on Andrew's comment below, which pointed me to some great resources I had missed earlier (luckily those resources ended up with the same 20,000 rule-of-thumb number I had settled on previously).