Monday, November 21, 2011

Learning with Texts Review: Great for languages that use spaces, cumbersome but still useful for those that don't

Learning with Texts allows you to copy and paste in a body of text, note which words within that text you do and don't know, quickly look up the words you don't, and create flashcards from them. It was designed with languages that use spaces in mind (to determine where one word stops and the next begins), and works quite well with them. For languages that don't use spaces, like Japanese, it's much more cumbersome (as some elbow grease will be needed to indicate where one word stops and the next begins), but it's still a useful tool. The interface is not very intuitive and there's definitely a significant learning curve to climb before you find your footing, but I'd recommend breaking out your climbing gear because the price is right ($0) and there's no other free tool that does the same thing.

In fact, the only place you can do the same thing that I am aware of (to the comments if you know of another!) is LingQ. However, LingQ only allows you to input 100 terms for free; from there, you have to subscribe to get more. While I've found that LingQ is a bit more user friendly and intuitive, it's hard to beat free.

My initial approach to reviewing Learning with Texts was to simply pick some article I was reading, throw it up there, run through the process with it, and then report back in the form of a review. However, the initial article I selected was in Japanese, and it quickly became apparent that the Learning with Texts experience is going to be vastly different depending on whether you're using a language with spaces, like all major Western languages, or a language without spaces, like Japanese. As such, I also decided I'd add the text of a short comment from my blog that was written in Portuguese to test out how it works with languages that use spaces.

Adding a text. To get a text into the system: (1) sign in; (2) click on "My Texts" from the list; (3) click on "New Text…" above the table; (4) select the language, enter a title, copy and paste the text, and then click "Save and Open". Once that's done, you can start marking the words you do and don't know.

Identifying vocab in languages without spaces. Soon after Benny first announced the implementation of Learning with Texts on his site, I hopped over and added a Japanese text to the system, only to find the text was broken down character by character instead of word by word. My thought was, "Oh, damn, doesn't work with Japanese," and I set it aside for a while.

Upon deciding to give it a second go, it took some poking around in the forums to discover that, while Japanese does work, it doesn't exactly work well. Basically, the system is not designed to work with languages that don't have spaces, such as Japanese or Chinese. Generally, the system relies on spaces to tell it where one word ends and another begins, but take away the spaces and suddenly it's much more difficult to figure out what constitutes a word; to do it automatically, you'd need some kind of language-specific parser for each such language, and those are not part of the Learning with Texts package. (The forums seem to indicate that you can use things like MeCab or KAKASI to parse Japanese text for use with Learning with Texts, and if anyone could point me to an explanation of how to do that for Japanese or for Chinese it would be much appreciated.)

Learning with Texts won't find the words for you in languages that don't use spaces, but that doesn't mean you're without options. There are basically two ways to go about it. You can either manually put spaces between words before importing the text, or you can manually combine single characters into multi-character words as needed. Either one of these is cumbersome and is going to be difficult for beginners (as it's not always readily apparent where one word ends and the next begins).
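
To make the first option (pre-segmenting the text before importing) more concrete, here's a minimal sketch of the idea. A real workflow would use a morphological analyzer like MeCab, whose "wakati" output mode emits space-separated words; this toy version uses a hand-made dictionary and greedy longest-match, just to show what the preprocessing step does. The dictionary contents are illustrative, not part of any real tool.

```python
# Toy pre-segmenter for Japanese text: inserts spaces between "words"
# using greedy longest-match against a small hand-made dictionary.
# (A real setup would run the text through MeCab's wakati mode instead.)

TOY_DICT = {"サービス", "オペレーション", "チーム", "日本語", "勉強"}
MAX_LEN = max(len(w) for w in TOY_DICT)

def segment(text: str) -> str:
    """Insert spaces between words using greedy longest-match.

    Characters not covered by the dictionary fall through as
    single-character tokens.
    """
    out, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if length == 1 or chunk in TOY_DICT:
                out.append(chunk)
                i += length
                break
    return " ".join(out)

print(segment("オペレーションチームのサービス"))
# → オペレーション チーム の サービス
```

Once the text is space-separated like this, Learning with Texts can treat it the same way it treats a Western language, at the cost of the text looking a bit odd to read.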

The pop-up window when you click on the サ sa of サービス sa-bisu ("service"). You need to click on "4..ビス" to indicate that it's a 4-character word (ending in ビス bisu).
As my text was already imported into Learning with Texts, I went for the latter and began combining single characters into multi-character words. To do this, you select the first character of each word and then select the number of characters you want to extend the word to from the "Expr" list in a pop-up window (see accompanying image). Unfortunately, there is an arbitrary limit of nine characters; while that might work OK in a Western language (where it's used for multi-word, not multi-character, expressions), it's overly limiting in Japanese. For instance, the article I tested the system out with uses the term オペレーションチーム opere-shon chi-mu ("operation team"), which weighs in at 10 characters so cannot be added to the system using this method. (The character limitation was raised in this forum post, but unfortunately the only answer from a Learning with Texts project member was unhelpfully "So long???" Yes, so long!!!)

Once you set the length of the word, it will appear on the right side of the screen as a new term. For each of those words that you already know, you need to select "WKn", or well-known, from the right side of the screen and click save; otherwise, it goes back to being a string of single-character terms.

Contrast this with languages that use spaces (see below), where every word you don't create a term for is automatically deemed known when you press the "I know all" button at the end, and you can see how much more cumbersome this is.

That all said, the more you use this in Japanese (or any language that doesn't use spaces), the fewer terms you will need to add. Thus, with repeated use, the burden of adding multi-character terms will continually decline.

There were several other issues I noticed when using Japanese:

  • Words in a Roman script are ignored, and there didn't seem to be a way to get these recognized as terms. While this generally will avoid English words simply being used in Japanese, it will also ignore homemade Japanese terms that use Roman letters, such as OL ("office lady").

  • Numbers are also ignored, even when they form part of a word; e.g., 1つ hitotsu can only be made into a term as つ tsu. Like with Roman letters, there doesn't seem to be a way to include these in a term in Japanese.

  • Similarly, the character 々 is ignored. This character indicates a repetition of the previous Chinese character, and thus forms an integral part of the word in question. In a two-character term using 々, you either need to limit the term to the previous character or include something after it, neither of which will be exactly right.

  • There is an issue when a new term starts with an existing term of two or more characters. The text I used first contained 上手 jouzu and then 上手い umai. In order to enter umai as a term after already having entered jouzu as a term, I first needed to delete jouzu and then re-add it after entering umai. While this is a pain in a single text, it becomes pretty unworkable if the term you need to delete is in another text, as you'll need to track that term down. (This problem does not occur if the existing term consists of only one character or if the existing term is somewhere other than at the beginning of the term, which makes the behavior seem like a bug rather than an intentional feature.)

  • While you can control the size of the text in the body of text itself, you can't control text size in the dictionary search field, which led to some more-complex characters being difficult to read. I remedied this by using my browser's zoom feature, but this led to me later having trouble finding the "I know all" button because it's contained in a frame of static size and will get pushed down below the visible fold when zoomed in. I was stumped as to where that button was until I remembered to zoom out.

Adding a new language. Turning to Portuguese after all that Japanese complication, I was looking forward to an easy-breezy time, but was quickly faced with the painful fact that Portuguese is not a language that's supported out of the box (which was surprising because I figured that Benny would have incorporated at least all of the languages he has learned), meaning that I'd have to add a new language.

Here's the form for adding a new language:

Ouch. If that form isn't crying out for some user-friendliness TLC, I don't know what is.

Rather than wading into the muck to try to calibrate a dictionary to look things up properly, etc., I simply typed "Portuguese" in the language field and pressed save, which resulted in Google Translate (or "GTr") being the default dictionary.

Identifying vocab in languages with spaces. Once I got through the language-adding process, things were a breeze. Google Translate worked well enough (although I only had one term that I needed to look up), and Learning with Texts easily added a bunch of terms to my "known terms" list, without any of the hassle of Japanese. So for languages that use spaces between words, marking known vocab with Learning with Texts is a cakewalk.

Inflexibility in adding terms. One annoying thing that seems to apply to any language is that you can only create terms from the text exactly as they appear in the document; if a Spanish text contains only the conjugated hablo ("I speak") and you try to edit that into the unconjugated form hablar ("to speak") as a term, you'll get an error message (which makes me wonder why editing is permitted at all). For instance, when I tried to change the Portuguese plural word esboços ("outlines") to its singular form esboço ("outline"), I got this somewhat-unclear error:

The same would of course apply to Japanese volitional forms, declined nouns in German, etc. This weakens Learning with Texts' usefulness for creating flashcards; if you already know the grammatical changes but have just come across a new word, you're still forced to create a term from whatever form happens to be in your text rather than a standard word form that might be more useful. More flexibility here would be great.

Adding vocab to a spaced-repetition system. Learning with Texts has a built in flashcard system, but I'd much rather incorporate the vocab into a full-fledged spaced-repetition system like Anki. And that's possible, although it requires a long, not-so-user-friendly set of steps to make it happen.
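
To give a flavor of what that Anki workflow involves, here's a minimal sketch of the reshaping step, assuming you've gotten your terms out of Learning with Texts as tab-separated text. The column layout (term, translation, example sentence) and the function name are my own illustrative assumptions, not LWT's actual export format; the point is just that Anki's text importer accepts a simple tab-separated front/back file, so the conversion is a small transformation.

```python
import csv
import io

# Sketch: reshape a hypothetical tab-separated export
# (term <TAB> translation <TAB> example sentence) into the simple
# "front <TAB> back" file that Anki's text importer accepts.
# The three-column layout here is an assumption for illustration,
# not LWT's actual export format.

def lwt_to_anki(export_tsv: str) -> str:
    reader = csv.reader(io.StringIO(export_tsv), delimiter="\t")
    cards = []
    for term, translation, sentence in reader:
        # Front: the term; back: translation plus the example sentence
        # (Anki renders HTML, so <br> gives a line break on the card).
        cards.append(f"{term}\t{translation}<br>{sentence}")
    return "\n".join(cards)

sample = "esboço\toutline\tEle fez um esboço do projeto."
print(lwt_to_anki(sample))
# → esboço	outline<br>Ele fez um esboço do projeto.
```

Save the result to a .txt file and Anki's "Import" menu can bring it in as basic cards; the fiddly part in practice is getting the export out of LWT in the first place.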

User friendliness. My initial impression, and one that proved true as I continued to use it, was that Learning with Texts is very open-source-y, in that it's full of features but the design isn't intuitive. This means there's a learning curve as you figure out what obscure abbreviations like "Expr", "WKn", "Ign" and "St" mean (although on-mouse-over tooltips help with those), and, as noted above, things like adding a new language and exporting to Anki are far from user friendly. I also found a lot of stuff on various pages whose use wasn't clear, which resulted in me simply ignoring it.

There do seem to be explanations for all this stuff if you look hard enough—some are straightforward and provided by Benny, but for others you'll need to go spelunking into the forums. It's nice to have explanations, of course, but it's even nicer not to need them.

Growing pains. Learning with Texts as hosted on Fluent in 3 Months also seems to be experiencing some growing pains. It seemed to inexplicably load very slowly a number of times, and one time it got so slow for me that I thought my internet connection had gone out, but other websites were loading fine. That time, I walked away from my computer for a while and came back to see a 404 error, although reloading the page at that point brought it up right away. Later, I found Learning with Texts completely inaccessible, but Benny was already on the case, apparently dusting off some of his programming skills. I'm sure that these are the kinds of things that will be ironed out over time, but they do make the system more of a pain to use at the moment.

*     *     *

Although there's plenty of room for improvement, especially with respect to languages without spaces, this is still a great, free tool for picking out the vocab you need to focus on from texts you read and then getting that vocab into your spaced repetition system. Like Lang-8, RhinoSpike, Anki, and others, this fits perfectly into my language-learning workflow and looks primed to become one of my regular language-learning tools, and I'd recommend climbing the learning curve and starting to use this tool right away, even as I look forward to that curve getting flattened.


  1. A couple of advantages that LingQ still has over LWT are: a built in community, and a large available content library complete with audio for every text.  Of course, since the vast majority of the content is free at LingQ, I'm sure most will realize before long that they can simply import content from the LingQ library into LWT.

  2. Agreed. LingQ's content library is excellent, as is its community, although I think Fluent in 3 Months will give (or is giving) it a run for its money in the community aspect. (And, as an aside, LingQ's content library is so good that it's long been one of the links on the right bar of this blog.)

  3. Thorough review! :) 

    I've tried LWT too and was really enthusiastic about it. I used to love LingQ too but they introduced some changes (introduce is a soft term...) that I didn't really like (no, I'm not talking about the price). 

    So I tried LWT too but in my humble opinion it is really, I mean, waaay too complicated for an average user. I was really looking forward to exporting the sentences to ANKI - which turned out to be a hassle. Couldn't be bothered. What a shame. 

    Plus, unfortunately for LWT and many other services that use Google Translate's API, they will crash and burn as Google will put a hold on the API in December this year. :( 

    Anyway, thanks Vincent for the review! This was my 2c. :) 

  4. "...they will crash and burn as Google will put a hold on the API in December this year."
    That's simply not true, the Google Translate API is NOT used.

    "...which turned out to be a hassle"
    Export from LWT into Anki is very easy with the Anki template in the standard distribution.

    " LWT is ... waaay too complicated for an average user."
    If you learn French, etc., it's very easy. If you learn a language with a special character set, you have to configure that language. With LingQ you cannot do that at all; with LWT, you can.

    And don't forget: it's free.

  5. One issue I ran into with Turkish was that it is a highly agglutinating language - meaning it uses lots of suffixing, infixing and such.  So in English I might come up with a handful of variations of a word like run (run, runs, running, runner, ran); in Turkish I could feasibly have hundreds of variations of the same word.  And here is the problem.  I added three blog posts about trail running.  All similar but about different races.  I went through the first and had to figure out what to do with the 20 different times a derivative of 'run' was used.  Then I went to the second article - so much blue still!  So many more forms of the word run.  The third was the same and it felt like I was spending a lot of time ignoring 'run' words and it was a bit discouraging.  

    Does that make sense?  Any way around it?

    I too found lingQ a bit easier to use, but no Turkish . . . yet.

  6. Yes, it makes sense. In the beginning you need to create a lot of nearly similar terms with different suffixes to learn all these variations. Later, if you come across a word plus suffix you know both well, just ignore the term and press "I know all" after creating all terms you really don't know. You have to use LWT in a creative way. If you later come across a WELL-KNOWN term you don't remember, just create it then and set the status to 1,2,..

  7. I did try that, but it really killed the performance of the database queries, having 100,000 level-zero terms.  Had to remove them.

  8. Ok, it took me a while (and watching Benny's video on it) to understand what exactly this is and what it does, but I think it's pretty cool.  It sort of streamlines a bunch of stuff that you could do separately before (hover/click on a word and get a definition of it, SRS, audio recordings, etc.) and puts it together into an effective system specifically designed for language-learners. Very cool.  Thanks for posting about this.


  9. I would not import 100,000 terms but only the 10,000 most important, if this is feasible. Or buy a faster machine...

  10. Hi SSLL! Sorry to post so randomly but I've been having trouble getting this to work (following the link to the kanji koohii forum didn't help :(). Is there any chance you could explain how to use mecab to parse the text? I tried the mecab input.txt -o output.txt etc but nothing seems to work, or I'm putting the text file for input in the wrong place. Any tips? Thanks a lot in advance! And apologies again.

  11. Anyone manage to get mecab to parse #Japanese text for use on Learning with Texts? I've tried a few times but never managed to get it working, so any tips would be appreciated.

  12. That's true, but I ended up using LingQ simply because the whole experience is more streamlined and user friendly. With LWT I spent more time preparing my texts and creating terms than I do with LingQ, which means I get to spend more time actually reading and encountering new words in LingQ.

    Using audio at the same time as reading texts has a lot of advantages (it's also the idea behind the Language Bridge method) and I have found it hard to find lots of sources for such text and audio. Furthermore, simply taking all the content off LingQ and loading it into LWT would require more steps and would waste time.

    I also like the 4 different ways of testing vocabulary in LingQ. The tutors are also excellent and the whole system just looks and feels better to use.

    That said, LWT is a fantastic free program and if I didn't have LingQ or couldn't afford it I would definitely use LWT more.