Saturday, December 20, 2008

Top 10,000 words in Dutch, English, French, and German

This page has lists of the top 10,000 words in each of Dutch, English, French, and German. Since the page is in German, I've put together a little table to take you directly to the lists. Unfortunately, the lists do not include translations.

After the jump, the table and another word frequency list for French.

Dutch: Top 100 | Top 1,000 | Top 10,000
English: Top 100 | Top 1,000 | Top 10,000
French: Top 100 | Top 1,000 | Top 10,000
German: Top 100 | Top 1,000 | Top 10,000

About.com has a top-100 word frequency list for French here. Unlike the lists above, About.com's list does include English translations.

7 comments:

  1. This is really helpful - but a word of warning: these lists are obviously based on print sources (likely newspapers), not on spoken language. Still helpful, but the distribution will probably be biased toward formal language. Thanks for the easy links!!!! :-)
  2. Very true, and it's a problem with most frequency lists. Frequency lists that include spoken language are much rarer because it's a heckuva lot harder to get speech into lists: you need some way to get the spoken word into a database, which can be quite a hassle whether you're recording, using speech recognition software, or just writing it down.
  3. There are duplicates in these lists, or at least the English list.
  4. There are duplicates in these lists. Anyway, thanks.
  5. It's human nature to find the flaws first, then the good parts. Take it as it is, guys, or improve it and post it yourselves.
    Congrats on the lists! Nice job!
  6. These lists have been a big help, but I prefer ones based on TV subtitles rather than print sources. That's an easy way to get something that approximates real speech. I haven't managed to find one for Dutch, though, only for Spanish. Here's the Spanish one for anyone interested:


    http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists#Spanish

    The Dutch list has "the" at least twice (it has "The" and "the"). As far as I know, "the" is not a Dutch word, and Wiktionary isn't turning up any Dutch results. I think this comes from Dutch news sources quoting organizations and keeping their names in English. It also has several nationalities (Belgische, Amerikaanse) and acronyms (CDA), which make the list a lot harder to wade through...
    Other English words that have somehow crept in are "many" and "workshops". I could see the latter italicized as an English word in many publications, but I don't see how "many" got in.