Anki is perhaps the most important tool in my endeavor to learn French. Every day I dutifully spend 20 to 40 minutes reviewing my flashcards, and a couple of times a week I spend more time adding new cards.
I’m following Gabe Wyner’s overall strategy for learning the language, and when it comes to vocabulary, Gabe recommends learning words from a frequency dictionary. I’ve been following this approach for over a year now.
Recently I’ve been wondering: how many French words do I actually know? I know it’s somewhere between 1,000 and 2,000 of the most frequent words, but how can I be more precise? I decided to write a small Ruby script to mine Anki’s database and find out more about my French vocabulary.
The Counting Problem
Gabe has two approaches to creating vocabulary flashcards:
- The French word on one side and a picture on the other side.
- A fill-in-the-blank card, where I see a picture and a French sentence with a missing word, and I need to fill in the blank.
The trouble is, as I work my way through the frequency list, some words are simply too abstract or too difficult for me to create a picture card for right away. Gabe’s suggestion? Skip it! I can always come back later and add some fill-in-the-blank cards when I have a stronger overall vocabulary. But this means I can’t simply glance at my frequency dictionary to see how many words I know.
Tools
- ActiveRecord – Anki’s data is stored in a SQLite database.
- Anki 2 Annotated Schema – Thank goodness Shawn Moore (sartak) documented the schema! The table and column names are not immediately obvious.
- Ruby, of course, drives the script.
The Ruby script is available in this GitHub repository. For the rest of this post, I will assume familiarity with Anki’s basic data structures: notes, models, fields, etc. Please see the Anki manual for additional information.
Counting Word-Picture Cards
The first of my two types of cards is the picture card, which is represented by the vocabulaire model. Each note using the vocabulaire model yields two cards: one card with the French word on the front and the picture on the back, and another card with the picture on the front and the French word on the back. If I can count how many vocabulaire notes I have, then I know how many words I have.
…Except that isn’t quite right. For some words, when I find some extra special pictures, or need additional help learning the word, I’ll add multiple picture notes for one word. In this case, the extra notes do not have the revers field set. For simplicity’s sake, I’m going to skip notes that do not have the revers field set, so that each word is counted only once.
Given Anki’s database schema and nested JSON data, to count vocabulaire notes, I need to:
- Find the vocabulaire model and get its id.
- Figure out the index of the revers field among each note’s fields.
- Find all of the notes referencing the above model id.
- Filter out notes which have a blank value for the revers field.
- Get a final count of the remaining notes.
As you can see, counting these notes isn’t trivial, but it’s not too bad either. Check out the count_vocabulaire_words method for the implementation details.
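The steps above can be sketched in plain Ruby. This is not the count_vocabulaire_words method from the repository, just a minimal illustration that works on in-memory data instead of ActiveRecord records. It assumes the Anki 2 layout, where the models live in one JSON object keyed by model id and each note stores its field values in a single string joined by the 0x1f separator. The helper name, the sample model id, and the field names other than revers are made up for the example.

```ruby
require 'json'

# Anki stores a note's field values in one string, joined by the
# ASCII unit separator (0x1f).
FIELD_SEPARATOR = "\x1f"

# Hypothetical helper (not from the post's repository): counts the notes of
# the named model whose named field is non-blank.
# models_json: the JSON text holding the models, keyed by model id.
# notes: an array of hashes with :mid (model id) and :flds (joined fields).
def count_notes_with_field(models_json, notes, model_name, field_name)
  # Step 1: find the model by name; its id is the JSON object key.
  model_id, model = JSON.parse(models_json).find { |_id, m| m['name'] == model_name }
  return 0 if model.nil?

  # Step 2: locate the field's position in the model's field list.
  index = model['flds'].index { |field| field['name'] == field_name }
  return 0 if index.nil?

  # Steps 3-5: keep this model's notes, skip blank fields, count the rest.
  notes.count do |note|
    next false unless note[:mid].to_s == model_id
    !note[:flds].split(FIELD_SEPARATOR, -1)[index].to_s.strip.empty?
  end
end

# Made-up sample data; only "vocabulaire" and "revers" come from the post.
SAMPLE_MODELS = JSON.generate(
  '1001' => {
    'name' => 'vocabulaire',
    'flds' => [{ 'name' => 'mot' }, { 'name' => 'image' }, { 'name' => 'revers' }]
  }
)
SAMPLE_NOTES = [
  { mid: 1001, flds: "chat\x1fcat.jpg\x1fy" },   # counted
  { mid: 1001, flds: "chat\x1fcat2.jpg\x1f" },   # extra picture, revers blank
  { mid: 1001, flds: "chien\x1fdog.jpg\x1fy" },  # counted
  { mid: 2002, flds: "other\x1fmodel\x1fy" }     # different model
]

puts count_notes_with_field(SAMPLE_MODELS, SAMPLE_NOTES, 'vocabulaire', 'revers')
```

With the sample data, two of the four notes survive the filter: one is dropped for its blank revers field and one belongs to another model.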
Counting Fill-in-the-Blank Cards
The second type of vocabulary card I create is the fill-in-the-blank card, which uses the vocabulaire – cloze model via cloze deletions. I like how my vocabulaire notes generate two cards per word, so I also create two fill-in-the-blank notes for every word.
To count this type of note, I need to:
- Find the vocabulaire – cloze model and get its id.
- Find all of the notes referencing this model id.
- Divide the number of notes by two.
Again, you may reference the count_vocabulaire_cloze_words implementation to see the details. This method is quite straightforward. Thanks, ActiveRecord.
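For comparison, here is a rough sketch of the same three steps on in-memory data. Again, this is not the repository's count_vocabulaire_cloze_words method. The helper name, the model ids, and the sample notes are invented for illustration, and the two-notes-per-word ratio is passed in as a parameter rather than hard-coded.

```ruby
require 'json'

# Hypothetical helper (not from the post's repository): estimates how many
# words the notes of the named model cover, given a fixed number of notes
# per word.
def estimate_words_from_notes(models_json, notes, model_name, notes_per_word = 2)
  # Find the model by name; its id is the JSON object key.
  model_id, _model = JSON.parse(models_json).find { |_id, m| m['name'] == model_name }
  return 0 if model_id.nil?

  # Count the notes that reference this model id.
  note_count = notes.count { |note| note[:mid].to_s == model_id }

  # Integer division: a word with an unpaired note is not counted.
  note_count / notes_per_word
end

# Made-up sample data; only the model name comes from the post.
CLOZE_MODELS = JSON.generate('3003' => { 'name' => 'vocabulaire – cloze' })
CLOZE_NOTES = [
  { mid: 3003 }, { mid: 3003 },  # two notes for one word
  { mid: 3003 }, { mid: 3003 },  # two notes for another word
  { mid: 3003 },                 # a word with only one note so far
  { mid: 4004 }                  # a note from a different model
]

puts estimate_words_from_notes(CLOZE_MODELS, CLOZE_NOTES, 'vocabulaire – cloze')
```

Because the division rounds down, a word that has only one cloze note so far is left out of the estimate, which slightly undercounts rather than overcounts.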
Conclusion
The result? 1,432 vocabulaire words plus 258 vocabulaire – cloze words means I know about 1,690 French words. Of course, this is a rough estimate, but it looks about right. In reality, I probably know quite a few more French words that I’ve picked up in my grammar book, Easy French Reader, and other readings, but scouring the Anki database gives me 1,690 words, and I’m happy with that number for now.
It’s also worth noting that I probably could’ve gotten a reasonable estimate by using the browser built into Anki. But this was a nice way to test out connecting ActiveRecord to the Anki database, and it makes me wonder what kind of interesting queries I can write next.
P.S.
If you’re interested in learning a new language, be sure to check out Gabe’s wildly successful Kickstarter project. Jared and I backed the project and are looking forward to Gabe’s pronunciation decks coming out later this year!
P.P.S.
In the above vocabulaire – cloze card example, Julie has made a typical error that you’d expect from a native English speaker. Do you see it?
Edit 2/18/2014: I changed the name of the GitHub repository and have updated the link in this post.