Backup of the 金山詞霸 text format (from MDX), with one source per line

date-created:: 2023-08-23

My never-ending struggle between two competing glossary formats

A few years ago I merged millions of 金山詞霸 entries by headword to reduce the total number of entries in Anki. That is generally the right design approach, but over recent weeks I have found several reasons that argue against it. The key issues are the lack of a consistent, robust database schema and, sadly, my loss of the ability to automate things through programming. Together these make it hard to build an automated workflow for updating the merged entries.

With this realization, and having given up on Obsidian, Logseq, Tana and <insert-the-latest-hyped PKM systems> as my glossary system, I decided to go back to Anki, the oldest of the lot, because of the speed of its web interface. I am thankful and relieved to find that I still have a copy of the old, pre-merge text format of 金山詞霸. This lets me gradually add selected sources to my main Anki profile while excluding bloated, unsuitable general-dictionary sources such as the American Heritage Dictionary. There's no need to reverse-engineer the prior merge. I'm glad I didn't rashly scrap the original format.
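Because the backup keeps one source per line, picking sources can be as simple as filtering lines. Below is a minimal sketch of that idea, not my actual workflow. It assumes a hypothetical layout of `headword<TAB>source<TAB>definition` per line; the function name, the exclusion set, and the sample rows are all made up for illustration, and the real file may be laid out differently.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: sources to leave out of the main Anki profile.
EXCLUDED_SOURCES = {"American Heritage Dictionary"}


def keep_selected_sources(
    lines: Iterable[str],
    excluded: frozenset = frozenset(EXCLUDED_SOURCES),
) -> Iterator[Tuple[str, str, str]]:
    """Yield (headword, source, definition) for lines whose source is not excluded.

    Assumes a hypothetical tab-separated layout with exactly three fields.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) < 3:
            continue  # skip lines that do not match the assumed layout
        headword, source, definition = fields
        if source in excluded:
            continue
        yield headword, source, definition


# Made-up sample rows, only to show the shape of the data.
sample = [
    "apple\tSource A\ta round fruit\n",
    "apple\tAmerican Heritage Dictionary\ta long general-dictionary entry\n",
    "pear\tSource B\ta sweet fruit\n",
    "a malformed line without tabs\n",
]

kept = list(keep_selected_sources(sample))
```

Each kept tuple could then be written out as a tab-separated row for Anki's text import, one source at a time.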

The moral of the story: always keep a copy of your data in its original format. You never know when you might need it again.