http://www.newscientist.com/article.ns?id=dn7998
"The software [] goes further than existing cellphone camera technology by allowing entire documents to be scanned simply by sweeping the phone across the page.
Commuters in Japan already anger bookstore owners and newsagents by using existing cellphone software to try to take snapshots of newspaper and magazine articles to finish reading on the train to work.
This is only possible because some phones now offer very rudimentary optical character recognition (OCR) software which allows small amounts of text to be captured and digitised from images."
HAND SCANNERS!!!!!!!!!!!!
Welcome back, ~1990. : )
This makes me wonder about the power of Japanese OCR. With English you have 26 letters plus upper case and punctuation, probably fewer than 100 characters. Japanese, on the other hand, has a shitload more: several thousand in kanji alone.
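As a rough check on those numbers, here is a small Python sketch that counts printable ASCII characters and the sizes of the relevant Unicode blocks. The block ranges are an assumption about what to count: the main "CJK Unified Ideographs" block also holds characters used only in Chinese, so it is an upper bound, not the number of kanji an OCR program for everyday Japanese has to know.

```python
# Printable ASCII runs from space (0x20) through tilde (0x7E).
ascii_printable = [chr(c) for c in range(128) if chr(c).isprintable()]

# Unicode block ranges; a few code points inside each block are unassigned.
HIRAGANA_BLOCK = range(0x3040, 0x30A0)
KATAKANA_BLOCK = range(0x30A0, 0x3100)
CJK_BLOCK = range(0x4E00, 0xA000)  # main "CJK Unified Ideographs" block

print(len(ascii_printable))                      # 95
print(len(HIRAGANA_BLOCK), len(KATAKANA_BLOCK))  # 96 96
print(len(CJK_BLOCK))                            # 20992
```

So "less than 100" holds for English, and the kanji side is bigger by a couple of orders of magnitude even before fonts and sizes come into it.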
>>3 haha - yeah.
I think the part "This is only possible because some phones now offer very rudimentary optical character recognition" is bull.
I have yet to see a single good English OCR program; the result is full of errors even on a clean scan. So for Japanese it's even more unrealistic.
Assuming the report is correct (that some people scan newspapers to read later), it means they read the page image directly, not using OCR.
How is the phone supposed to get around the age-old B-8 problem with the relatively low-resolution cameras they're equipped with?
Mind you, perhaps Japanese has enough differences in its characters to make this possible. From what I remember from my years studying Japanese (you wouldn't guess it though, with the amount I can speak) each character is very much unique -- so I'd say the chances of correctly identifying each glyph are very much higher for Japanese than for English (where '1', 'I' and 'l' all look very alike).
Nah, Japanese is terrible to decode, at least for me. Just ソ(so) vs ン(n) drives me up the wall sometimes, especially in mags where they use the same glyph for both kana, argh. And those are simple kana. With complex kanji, if the manga scan is only a little too dark or the resolution a little too low, I have to use a lot of guesswork to find the kanji, if I can find it at all.
Do Japanese OCR programs even exist? I can't seem to find any.
>>6
Maybe http://www.gtk.org/~otaylor/kanjipad/ can help identifying those kanji.
>>7 That's handwriting recognition. You have to write it in by hand, stroke by stroke, in the correct sequence.
Also: "However, if you get the stroke order wrong, or you write sloppily, it may have difficulties." You cannot get the stroke order from a printed page. That's yet another obstacle for OCR.
>>6 Ahh okay, wasn't aware of that one. Mind you, Japanese has 3 different character sets, a standard one (can't recall its name), Kanji (borrowed from Chinese sets AFAIK), and Katakana (used for "borrowed" words from foreign languages).
The standard one is the only one I've seen in any sort of detail.
The example you give there would almost certainly cause problems for the OCR scanner.
Basically, what this means is, there's more to it than just optical scanning -- one actually has to look at the words themselves to see which character is the more likely candidate.
>>9
Oh, I'm sure there would be problems OCRing hiragana too.
For example っ(+tsu) vs つ(tsu). In quite a few mags, the small tsu character is nearly identical to the big tsu. So I have to find out whether the full word exists with a small or a big tsu before I write it down. This is complicated by the fact that all kana run together in Japanese: there are no spaces between "words" as there are in Korean or English.
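The "check whether the word exists" step can be sketched in Python. This is only an illustration under assumptions: the confusable pairs are the two mentioned in this thread, and `LEXICON` is a hypothetical four-word stand-in for a real dictionary.

```python
from itertools import product

# Kana that print almost identically (the two pairs from this thread).
CONFUSABLE = {
    "ソ": "ソン", "ン": "ソン",  # katakana so / n
    "つ": "つっ", "っ": "つっ",  # big tsu / small tsu
}

# Hypothetical mini-lexicon; a real system would load a full dictionary.
LEXICON = {"パソコン", "がっこう", "つくえ", "きって"}

def candidates(ocr_text):
    """Yield every reading obtained by swapping confusable kana."""
    options = [CONFUSABLE.get(ch, ch) for ch in ocr_text]
    for combo in product(*options):
        yield "".join(combo)

def resolve(ocr_text, lexicon=LEXICON):
    """Return the candidate readings that are known words."""
    return sorted(c for c in candidates(ocr_text) if c in lexicon)

print(resolve("パンコソ"))  # ['パソコン']
print(resolve("がつこう"))  # ['がっこう']
```

Note that this assumes the input is already a single word. With no spaces in the text, a real program would have to segment the kana into words first, which is the hard part described above.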
Well, it figures that with such a context-based language as Japanese you'd have to have an obligatory word processor to go with your OCR.
Okay, according to http://www.sandg-anime-reviews.net/scanner.htm there are plenty of OCR programs, often given out with the purchase of a scanner.
"There is a multitude of them here in Japan (for Japanese Windows), their price ranging from about $100 to $1300."
But the results are so-so.
"Rather surprisingly, the results with Kanji can be startlingly good, with a very high percentage correct achieved at a very high speed. The results with hiragana/katakana are generally disappointing, with a high rate of errors, not helped by the variable size of kana characters."
And extensive proofreading is needed afterwards, meaning one needs to master the language already.
"I personally think that yes, it is faster to scan in and OCR a Japanese text than to type it in, but it is significantly slower than to do the same with English."
The other way OCR could get around difficult-to-read characters is by looking at the surrounding characters/words.
e.g.: 大学行_ます。<- so the _ is the missing character. Because of the kanji before it and the kana after it, the only common solution would be:
大学行きます。
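A minimal Python sketch of that idea: try every hiragana in the gap and keep the fills that produce a known word covering the gap. The `LEXICON` here is a hypothetical handful of verb forms, not a real dictionary, and the blank is assumed to be a single character.

```python
HIRAGANA = [chr(c) for c in range(0x3041, 0x3097)]  # ぁ .. ゖ

# Hypothetical mini-lexicon of verb forms; a real one would be far larger.
LEXICON = {"行きます", "来ます", "見ます", "食べます"}

def covers(candidate, word, i):
    """True if `word` occurs in `candidate` at a position that includes index i."""
    first = max(0, i - len(word) + 1)
    return any(candidate.startswith(word, s) for s in range(first, i + 1))

def fill_blank(text, blank="_", lexicon=LEXICON):
    """Return every single-kana fill of the blank that yields a lexicon word."""
    i = text.index(blank)  # assumes exactly one single-character blank
    fills = []
    for kana in HIRAGANA:
        candidate = text[:i] + kana + text[i + 1:]
        if any(covers(candidate, w, i) for w in lexicon):
            fills.append(candidate)
    return fills

print(fill_blank("大学行_ます。"))  # ['大学行きます。']
```

With a full dictionary other fills would match as well (行けます, "can go", for one), so a real program would also need some kind of frequency ranking to pick the common reading.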
>>14
Aaaah I wish I had a Japanese spell checker.
Google does nothing "Your search - 大学行ます - did not match any documents." while in English Google tries to correct my spelling.
Don't they have spell checkers in Japan? I need to research this.
I don't think spell-checking Japanese would be an easy task for a machine. It's not an easy task even for humans, imho.
>>15
Google Suggest in Japanese might help a little:
http://www.google.co.jp/webhp?complete=1&hl=ja
>>17 Nope, no correction there either.
大学行ますに該当するページが見つかりませんでした。(no pages matching 大学行ます were found)
Try using the correct particle to make it a full sentence; 大学に行きます returns over 600 hits.
Oh yeah... you needed the き part too...