2023-06-07


Book digitization

I began scanning some of my paper reference books quite early, when I purchased the then-expensive Fujitsu ScanSnap SV600 book scanner, in an attempt to digitize them for better search.

Book digitization is not for the impatient. I have spent countless hours on it.

Some reference books are simply a lost cause for digitization. Take this book, 萬用英文手冊, as an example: its frameless tabular print layout is anathema to even the most capable OCR tool. Scanning the book takes, say, half an hour, but that gets only 1% of the work done. The subsequent OCR would require so much manual adjustment that it is clearly not worth carrying out.

!_attachments/Screen Shot 2023-06-07 at 05.10.32.png

Luckily, all is not lost with other books.