2023-06-25

Kickoff: Anki Data Merging Project

Goal: Cut the number of Anki entries by one third, from 1.2 million to 800K. This should make the future migration into Obsidian much easier. The plan is to additionally exclude the 450K-entry JMYH dictionary and migrate only the remaining 350K entries. That is still a big challenge, but it has a much better chance of success than migrating all 800K.

Anki data merging project (WIP, with ongoing change log)

#rfp
#project/started
Taking the normalized headword "inflection" as an example, produce a single tab-separated values (TSV) file laid out as follows.

Processing

  • Get rid of escaped double quotes (") around or inside a field, so that each physical double quote is written as is, without any escaping.
  • Turn in-line \n characters into <br/>.
  • Turn #word in the content portion of a record into _word_ (surrounded by a pair of underscores).
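The three processing steps above could be sketched in Python roughly as follows. This is only a minimal sketch under assumptions that the note does not confirm: that the export uses CSV-style quoting (field wrapped in double quotes, inner quotes doubled), that "in-line \n" means real newline characters, and that the replacement is `<br/>` as in the sample output. All function names here are hypothetical.

```python
import re

def unescape_quotes(field: str) -> str:
    # Assumption: CSV-style escaping, i.e. the whole field is wrapped in
    # double quotes and every inner double quote is doubled ("").
    if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
        field = field[1:-1]
    return field.replace('""', '"')

def newlines_to_br(field: str) -> str:
    # Assumption: "\n" in the note means a real newline character.
    return field.replace("\r\n", "\n").replace("\n", "<br/>")

# Match #word, but not a Markdown heading marker such as "## Title"
# and not a "#" glued to the end of another word.
HASH_WORD = re.compile(r"(?<![#\w])#(\w+)")

def hash_to_underscores(content: str) -> str:
    # Apply to the content portion only; the {tags: ...} line keeps its "#".
    return HASH_WORD.sub(r"_\1_", content)

def clean_content(raw: str) -> str:
    # Order matters: unescape first, convert newlines last.
    return newlines_to_br(hash_to_underscores(unescape_quotes(raw)))
```

For example, `clean_content('"BrE #inflexion\nsaid ""hi"""')` gives `BrE _inflexion_<br/>said "hi"`. The quote handling is deliberately naive: a field that legitimately begins and ends with a quoted word would need a real CSV parser instead.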

Output for New Anki Merged

Headword (col 1)

inflection\t

Record (col 2)

## Inflection n.<br/>
line 1
line 2<br/>
<br/>
{src:A::A-Dict}<br/>
{tags: #md #d/4}<br/>
<br/>
---<br/>
<br/>
## inflection, an<br/>
line 1<br/>
<br/>
{src:A::ROy::Bil::EC::JMYH}<br/>
{tags: #JMYH}<br/>
<br/>
---<br/>
<br/>
## inflection<br/>
line 1
line 2  
BrE _inflexion_ (line 3)
line 4<br/>
<br/>
{src:A::Aud}<br/>
{tags: #mw #aud #md}\t

Tags (col 3)

`^merged mw d::4 aud JMYH md\n`
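To make the three-column layout concrete, here is a rough sketch of how several source entries for one headword might be folded into a single TSV row. The `Entry` class, the function names, and the first-seen tag order are all assumptions for illustration; the sample above lists the tags in a different order, and the note does not say which order is intended. The sketch also assumes that `#d/4` in a record corresponds to the Anki tag `d::4`, and that each record is written on one physical line.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Entry:
    # Hypothetical container for one source entry of a headword.
    heading: str      # e.g. "Inflection n."
    body: str         # already cleaned; line breaks written as <br/>
    src: str          # e.g. "A::A-Dict"
    tags: List[str]   # without the leading "#", e.g. ["md", "d/4"]

# Entries are separated by a horizontal rule, as in the sample record.
SEP = "<br/><br/>---<br/><br/>"

def render_entry(e: Entry) -> str:
    tag_text = " ".join("#" + t for t in e.tags)
    return (f"## {e.heading}<br/>{e.body}<br/><br/>"
            f"{{src:{e.src}}}<br/>{{tags: {tag_text}}}")

def merged_tags(entries: List[Entry]) -> str:
    # Union of all tags, de-duplicated in first-seen order (an assumption),
    # with "/" mapped to Anki's "::" hierarchy separator.
    seen: List[str] = []
    for e in entries:
        for t in e.tags:
            anki_tag = t.replace("/", "::")
            if anki_tag not in seen:
                seen.append(anki_tag)
    return " ".join(["^merged"] + seen)

def merge_row(headword: str, entries: List[Entry]) -> str:
    record = SEP.join(render_entry(e) for e in entries)
    return f"{headword}\t{record}\t{merged_tags(entries)}\n"
```

Calling `merge_row("inflection", [...])` with the three sample entries would yield one line with the headword, the joined record, and a tag column beginning with `^merged`.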


Change log

tag: #(tag)
#(買)書a