
How to digitise your paper documents

Scan your documents into electronic format


The space required to store paper documents can be a problem. Digitising your documents renders them exquisitely portable: you can store an entire library on your ebook reader with ease. And because paper documents can be turned into editable computer documents, they become searchable.

Compare typing "Roosevelt" in a search field with spending all day scanning microfiche and old newspapers by eye to research the Square Deal or the New Deal. The digital document is a boon to researchers the world over.

You can store documents digitally in one of two ways: as images or as text files. Images require far more space, but retain the character and flavour of the original document. Converting a scanned image to a text or word processing file involves what's called optical character recognition, or OCR. It's a bit of a misnomer, since you're actually processing digital information, but the term has stuck.

If the original document was written by hand or is art, storing it as an image is generally more desirable, because the style of the handwriting can be as meaningful as the words themselves. The other reason for storing handwritten documents as images is that there are no commercially available handwriting recognition packages that can interpret handwritten characters from scans. So far, it's a technology stuck in the PDA and tablet world.

Anne-Sophie Bellaud of Vision Objects (a purveyor of handwriting recognition software) explains that with tablets you know the order in which hand-printed or -scripted characters were entered. This provides huge clues for the software. Without an entry timeline, handwriting is not nearly as easy to recognise.

Scanners

No matter which way you'll be storing your documents, as images or as text files, you'll need a scanner to digitise them. If you have relatively few documents to process, a multifunction printer or a dedicated flatbed scanner such as those discussed in "Digitise Your Pictures" will suffice. They're relatively slow, however, and only the more expensive models have automatic document feeders to handle multipage documents.

Though pricey, sheet-fed scanners are just the ticket if you need to process a lot of documents. Units such as Fujitsu's ScanSnap S1500 and HP's ScanJet Professional 3000 scan both sides of a document at once and average 20 pages per minute or better.

I'll give the HP props for slightly more reliable paper feeding with mixed document types, but the Fujitsu has the superior, better-integrated software.

OCR Software

Most scanners ship with OCR software that you can install on your PC, but if yours lacks it, you can buy the software separately. ABBYY's FineReader 9 Express, Nuance's OmniPage 17 Standard and Adobe's Acrobat X Standard are all good choices. Nuance's PaperPort 12 Standard also scans, does OCR and adds document management features that make it easier to keep track of your documents. Less expensive versions exist for most of these programs, so slow your heart rate.

In my hands-on tests with clean 300-dpi scans, Acrobat did the best job of converting documents, followed closely by FineReader, and not so closely by OmniPage and PaperPort. But the latter three products did better with the three low-quality, 150-dpi scans that I included among my test documents.

For documents stored as images, 150 to 200 dpi is usually fine, but OCR software works much better with 300-dpi scans. Much depends on your needs. If you just want to retain legibility, you may be able to drop the dpi and reduce your storage requirements.
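
To get a feel for that trade-off, the raw (uncompressed) size of a scanned page can be estimated from its dimensions, resolution and bit depth. The sketch below is only an illustration: it assumes a US letter page (8.5 x 11 inches) scanned in 8-bit greyscale, and the function name is made up for this example. Files saved as JPEG or PDF will usually be a good deal smaller thanks to compression, so treat the figures as relative rather than absolute.

```python
def raw_scan_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=8):
    """Estimate the uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Compare the resolutions discussed above (assumed: letter page, 8-bit greyscale).
for dpi in (150, 200, 300):
    size_mb = raw_scan_bytes(dpi) / 1_000_000
    print(f"{dpi} dpi: about {size_mb:.1f} MB uncompressed")
```

Because the pixel count grows with the square of the resolution, dropping from 300 dpi to 150 dpi cuts the raw size to a quarter.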

Web OCR

Several online services, such as www.free-ocr.com, www.newocr.com and www.ocronline.com, are good for small-scale projects or one-offs. First you scan the original to your PC, then upload the document to the website.

The services have limitations: My tests yielded results that weren't very accurate. Also, only text is recognised, not lines and other page elements.

The first service mentioned above, Free OCR, is free, but files can be no larger than 2MB (about 150 dpi for a letter-sized page) and no wider or higher than 5000 pixels, and you can do no more than 10 uploads per hour.
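
If you are preparing a batch of scans, it can save time to check them against those limits before uploading. The following is a minimal sketch, not anything the service itself provides: the function and constant names are hypothetical, and it assumes "2MB" means 2 x 1024 x 1024 bytes.

```python
MAX_BYTES = 2 * 1024 * 1024   # assumption: "2MB" read as 2 x 1024 x 1024 bytes
MAX_SIDE_PX = 5000            # quoted limit on both width and height


def fits_free_ocr(width_px, height_px, size_bytes):
    """Return True if an image is within the quoted size and pixel limits."""
    return (
        size_bytes <= MAX_BYTES
        and width_px <= MAX_SIDE_PX
        and height_px <= MAX_SIDE_PX
    )


# A letter-sized page (8.5 x 11 inches) at 150 dpi is 1275 x 1650 pixels.
print(fits_free_ocr(1275, 1650, 1_500_000))   # True
# The same page at 300 dpi is 2550 x 3300 pixels; here the file is too large.
print(fits_free_ocr(2550, 3300, 3_000_000))   # False
```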

Another service, www.newocr.com, is also free, but the interface is primitive. It does a much better job, though, of pulling text than free-ocr.com, and it allows documents up to 5MB in size.

Finally, www.ocronline.com requires creating a free account, but allows 4MB images (about 200 dpi per page) and up to 15 uploads per hour. You get 10 free credits, but after that you must pay for them. The site sells credits in varying quantities, from 50 for $3.95 (8 cents per page) up to 5000 pages for $49.95 (1 cent per page). I got good results with this service, which handles graphic elements as well as text, though it wasn't up to the standards of Acrobat X or FineReader 10.
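
The per-page figures quoted above are easy to check, assuming one credit buys one page (which is how the prices are presented). The helper name below is made up for this illustration.

```python
def cents_per_page(price_dollars, pages):
    """Return the cost of a single page in cents."""
    return price_dollars * 100 / pages


# The smallest and largest bundles mentioned above.
for price, pages in ((3.95, 50), (49.95, 5000)):
    cost = cents_per_page(price, pages)
    print(f"{pages} pages for ${price}: {cost:.1f} cents per page")
```

That works out to roughly 7.9 cents and 1.0 cent per page respectively, in line with the rounded figures above.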

Ebooks

There's nothing like the feel, smell and visual stability of a real book, but more and more people are happily reading virtual books using Kindles, Nooks, iPads and other devices. You simply can't beat their portability, and the texts are searchable.

It's even possible to have a decent reading experience on smartphones and iPods. I use the latter and no, the frequent page-turning does not bother me, though I'll undoubtedly go for something larger eventually. You can purchase most books from an online store, but you may have some books in your own collection that aren't available in digital format.

Converting a physical book into an ebook requires first scanning it page by page and then, for lack of a better term, OCR'ing it. This is tedious at best, so use a fast scanner. If you are willing to destroy the book, or know how to rebind, use a sheet-fed scanner. Most of the aforementioned OCR programs have features that help organise the pages.

Once you have the text file (in PDF, Word or other format) in place, grab Calibre, a very capable and free ebook reader, organiser, editor and publisher. Convert the file to the format appropriate for your device: EPUB or PDF, say. Once you've created a viewable file, use a reader app such as Stanza to load the ebook onto your device. Your device or app must support side-loading, that is, loading from a PC.
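
If you have many files to convert, Calibre also installs a command-line tool, ebook-convert, which takes an input file and an output file and works out the formats from the file extensions. The sketch below wraps it in Python; the helper names and the example filename are hypothetical, and it assumes Calibre is installed with ebook-convert on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_format="epub"):
    """Build an ebook-convert command that writes next to the source file."""
    src = Path(source)
    dest = src.with_suffix("." + target_format.lower())
    return ["ebook-convert", str(src), str(dest)]


def convert(source, target_format="epub"):
    """Run the conversion, failing early if Calibre's tool is not available."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_convert_command(source, target_format), check=True)


# Show the command that would be run for a hypothetical scanned book.
print(build_convert_command("scanned_book.pdf"))
```

Calling convert("scanned_book.pdf") would then produce scanned_book.epub in the same folder, ready to side-load.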





