The Daltaí Boards: Distributed OCR


Bearn
Member
Username: Bearn

Post Number: 614
Registered: 06-2007


Posted on Tuesday, August 19, 2008 - 09:19 am:

Computer users are digitizing books quickly and accurately with Carnegie Mellon method

Don’t know if they are doing the Gaeilge bit tho…


Millions of computer users collectively transcribe the equivalent of 160 books each day with better than 99 percent accuracy, despite the fact that few spend more than a few seconds on the task and that most do not realize they are doing valuable work, Carnegie Mellon University researchers reported today in Science Express.

They can work so prodigiously because Carnegie Mellon computer scientists led by Luis von Ahn have taken a widely used Web site security measure, called a CAPTCHA, and given it a second purpose: digitizing books produced prior to the computer age. When Web visitors solve one of the distorted-letter puzzles so they can register for email or post a comment on a blog, they simultaneously help turn the printed word into machine-readable text.

More than a year after implementing their version, called reCAPTCHA (http://recaptcha.net/), on thousands of Web sites worldwide, the researchers conclude that their word deciphering process achieves the industry standard for human transcription services: better than 99 percent accuracy. Their report, published online today, will appear in an upcoming issue of the journal Science.

Furthermore, the amount of work that can be accomplished is herculean. More than 100 million CAPTCHAs are solved every day and, though each puzzle takes only a few seconds to solve, the aggregate amount of time translates into hundreds of thousands of hours of human effort that can potentially be tapped. During the reCAPTCHA system's first year of operation, more than 1.2 billion reCAPTCHAs have been solved and more than 440 million words have been deciphered. That's the equivalent of manually transcribing more than 17,600 books.

http://www.brightsurf.com/news/headlines/39571/Computer_users_are_digitizing_books_quickly_and_accurately_with_Carnegie_Mellon_method.html

http://www.archive.org/search.php?query=Irish%20dictionary

sold!



©Daltaí na Gaeilge