January 23, 2004

How to Digitize Eight Million Books

Fascinating article about how Stanford University set about digitizing the books in its library:

bq. About two and a half years ago, a good friend by the name of Christopher Warnock, the CEO and founder of Ebrary.com, an e-book distributor, came to talk to me when we were digitizing some Stanford University Press books on Latin America. He said, "You have to meet a couple of guys about a pretty interesting robot." They were Ivo Iossiger and Danick Bionda, founders of 4Digital Books, which is based in Switzerland. They showed me a video of their robot scanner. I immediately realized that if we could achieve the speeds they were talking about with their robot, we would have a breakthrough in how fast and how consistently we could digitize our materials.

bq. When you're turning pages by hand, you can do maybe 150 to 200 pages per hour. It's slow. But the robot can easily do 600 to 1,200 pages per hour without damaging the books. And it's rigorously consistent -- the page is always flat, the image is always good, and software conversion allows you to index the text so you can search it.

bq. But it's not just the scanning robot that's needed. There are the servers, the software, the network, the storage. Right now, it is an investment that can only be made by a big place like Stanford that already has a lot of this capacity in place. Even for us, though, a big issue is the large scale required to deal with our collection. With eight million volumes, if we were to digitize everything, we would end up with about a petabyte and a half of data. A petabyte is 10 to the 15th power. Managing the metadata for each individual bibliographic entity and each volume, the coding that allows you to search in a book, or in a collection of books associated by various parameters -- classification, subject heading, author, publisher, place of publication and so forth -- is another petabyte and a half.

bq. We're talking about gargantuan-sized memories and massively parallel supercomputers to whiz through this stuff. Not many institutions in this country have that kind of capacity. Maybe it will require a national effort to really do this.

WOW! They have some photos of the robot as well as a description of how it works. The company that makes the robot (4Digital) has its website here.

Posted by DaveH at January 23, 2004 2:16 PM