Camera phones will be high-precision scanners (20)

1 Name: Sling!XD/uSlingU 2005-09-15 14:06 ID:G2Dr17iB

http://www.newscientist.com/article.ns?id=dn7998
"The software [] goes further than existing cellphone camera technology by allowing entire documents to be scanned simply by sweeping the phone across the page.

Commuters in Japan already anger bookstore owners and newsagents by using existing cellphone software to try to take snapshots of newspaper and magazine articles to finish reading on the train to work.

This is only possible because some phones now offer very rudimentary optical character recognition (OCR) software which allows small amounts of text to be captured and digitised from images."

2 Name: Alexander!DxY0NCwFJg!!muklVGqN 2005-09-16 09:18 ID:TPRZhrNI

HAND SCANNERS!!!!!!!!!!!!

Welcome back, ~1990. : )

3 Name: 2005-09-17 12:54 ID:Heaven

This makes me wonder about the power of Japanese OCR... see, at least with English you have 26 characters + upper case + punctuation, probably fewer than 100 characters. Japanese on the other hand has a shitload more, several thousand just in Kanji.

4 Name: Sling!XD/uSlingU 2005-09-17 20:04 ID:cWmOSdr2

>>3 haha - yeah.

I think the part "This is only possible because some phones now offer very rudimentary optical character recognition" is bull.

I have yet to see a single good English OCR program, the result is full of errors even on a clean scan. So for Japanese it's even more unreal.
Assuming the report is correct (that some people scan newspapers to read later), it means they read the page directly - not using OCR.

5 Name: Redhatter 2005-09-18 11:38 ID:Heaven

How is the phone supposed to get around the age-old B-8 problem with the relatively low-resolution cameras they're equipped with?

Mind you, perhaps Japanese has enough differences in its characters to make this possible. From what I remember from my years studying Japanese (you wouldn't guess it though, with the amount I can speak) each character is very much unique -- so I'd say the chances of correctly identifying each glyph are very much higher for Japanese than English. (where '1', 'I' and 'l' all look very alike)

6 Name: Sling!XD/uSlingU 2005-09-18 16:08 ID:cWmOSdr2

Nah Japanese is terrible to decode, at least for me. Just ソ(so) vs ン(n) is driving me up the wall sometimes, especially in mags where they use the same character for both kana, argh. And those are simple kana. With complex kanji if the manga scan is only a little too dark or the resolution a little too low I have to use a lot of guesswork to find the kanji, if ever.

Do Japanese OCR programs even exist? I can't seem to find any.

7 Name: CyB3r h4xX0r g33k 2005-09-18 20:09 ID:XpzpRUkc

>>6
Maybe http://www.gtk.org/~otaylor/kanjipad/ can help identifying those kanji.

8 Name: Sling!XD/uSlingU 2005-09-18 21:17 ID:cWmOSdr2

>>7 That's handwriting recognition. You have to write it in by hand, stroke by stroke, in the correct sequence.
Also: "However, if you get the stroke order wrong, or you write sloppily, it may have difficulties." You cannot get the stroke order from a printed page. That's yet another obstacle for OCR.

9 Name: Redhatter 2005-09-19 08:46 ID:Heaven

>>6 Ahh okay, wasn't aware of that one. Mind you, Japanese has 3 different character sets, a standard one (can't recall its name), Kanji (borrowed from Chinese sets AFAIK), and Katakana (used for "borrowed" words from foreign languages).

The standard one is the only one I've seen in any sort of detail.
The example you give there would almost certainly cause problems for the OCR scanner.

Basically, what this means is, there's more to it than just optical scanning -- one actually has to look at the words themselves to see which character is the more likely candidate.
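
A minimal sketch of that idea, assuming a toy word list in place of a real dictionary. The confusion set only covers the ソ/ン pair mentioned in >>6, and every name here (CONFUSABLE, LEXICON, plausible_readings) is made up for illustration, not taken from any actual OCR package:

```python
from itertools import product

# Glyph groups an OCR pass might mix up. Only the pair from >>6 is listed;
# a real system would need many more (assumption).
CONFUSABLE = [{"ソ", "ン"}]

# Toy lexicon standing in for a real dictionary (assumption).
LEXICON = {"パソコン", "マンション", "ソース"}


def alternatives(ch):
    """Return every glyph that ch could really be, including itself."""
    for group in CONFUSABLE:
        if ch in group:
            return sorted(group)
    return [ch]


def plausible_readings(ocr_word):
    """Try all swaps of confusable glyphs and keep the ones that are real words."""
    candidates = ("".join(chars)
                  for chars in product(*(alternatives(c) for c in ocr_word)))
    return [w for w in candidates if w in LEXICON]


print(plausible_readings("パンコソ"))
```

Here the scanner got both ambiguous kana wrong, but only one of the four possible spellings is in the word list, so the lookup settles it. The number of variants doubles with each ambiguous glyph, so this brute-force version is only reasonable for short words.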

10 Name: CyB3r h4xX0r g33k 2005-09-19 09:44 ID:Heaven

>>9

>a standard one

Hiragana.

11 Name: Sling!XD/uSlingU 2005-09-19 12:30 ID:06mR4gB7

>>9
Oh I'm sure there would be problems OCRing hiragana too.
For example っ(+tsu) vs つ(tsu). In quite a few mags, the small tsu character is nearly identical to the big tsu. So I have to find out whether the full word exists with a small or a big tsu before I write it down. This is complicated by the fact that all kana run together in Japanese, with no spaces between "words" like there are in Korean or English.
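
A rough sketch of how a program might do that check, assuming a tiny hypothetical word list: try both sizes of tsu at each spot and keep only the versions that can be split entirely into known words. The names (LEXICON, can_segment, resolve_tsu) are illustrative only:

```python
from itertools import product

# Toy lexicon (assumption); a real one would hold tens of thousands of entries.
LEXICON = {"がっこう", "に", "いく", "つくえ"}


def can_segment(text, lexicon):
    """True if text can be cut into a sequence of lexicon words with nothing left over."""
    reachable = [True] + [False] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(end):
            if reachable[start] and text[start:end] in lexicon:
                reachable[end] = True
                break
    return reachable[len(text)]


def resolve_tsu(text, lexicon):
    """Try big and small tsu at every tsu position; keep the variants that segment."""
    options = [("つ", "っ") if ch in "つっ" else (ch,) for ch in text]
    variants = ("".join(chars) for chars in product(*options))
    return [v for v in variants if can_segment(v, lexicon)]


print(resolve_tsu("がつこうにいく", LEXICON))
```

Because there are no spaces, the word boundaries have to be guessed at the same time as the tsu size, which is why the check runs over the whole string instead of one word.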

12 Name: CyB3r h4xX0r g33k 2005-09-19 12:35 ID:nPciuBGl

Well, it figures that with such a context-based language as Japanese you'd have to have an obligatory word processor to go with your OCR.

13 Name: Sling!XD/uSlingU 2005-09-19 12:48 ID:06mR4gB7

Okay, according to http://www.sandg-anime-reviews.net/scanner.htm there are plenty of OCR programs, often given out with the purchase of a scanner.
"There is a multitude of them here in Japan (for Japanese Windows), their price ranging from about $100 to $1300."

But the results are so-so.
"Rather surprisingly, the results with Kanji can be startlingly good, with a very high percentage correct achieved at a very high speed. The results with hiragana/katakana are generally disappointing, with a high rate of errors, not helped by the variable size of kana characters."

And extensive proofreading is needed afterwards, meaning one needs to master the language already.
"I personally think that yes, it is faster to scan in and OCR a Japanese text than to type it in, but it is significantly slower than to do the same with English."

14 Name: CyB3r h4xX0r g33k 2005-09-19 21:37 ID:Heaven

The other way OCR could get around difficult-to-read characters is by looking at the surrounding characters/words.
e.g.: 大学行_ます。<- so the _ is the missing character. Because of the kanji before and the kana after, the only common solution would be:
大学行きます。
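
As a minimal sketch of that kind of gap filling, assuming a small made-up list of known verb forms: substitute every hiragana into the gap and keep the results where a known form covers the gap position. KNOWN_FORMS, covers_gap and fill_gap are hypothetical names, not from any real OCR tool:

```python
# Hiragana block from ぁ (U+3041) to ゖ (U+3096).
HIRAGANA = [chr(cp) for cp in range(0x3041, 0x3097)]

# Toy list of known verb forms (assumption).
KNOWN_FORMS = {"行きます", "食べます", "見ます"}


def covers_gap(candidate, form, pos):
    """True if some occurrence of form in candidate spans index pos."""
    start = candidate.find(form)
    while start != -1:
        if start <= pos < start + len(form):
            return True
        start = candidate.find(form, start + 1)
    return False


def fill_gap(text, known_forms, gap="_"):
    """Replace the first gap marker with each hiragana; keep fills that form a known word."""
    pos = text.index(gap)
    results = []
    for kana in HIRAGANA:
        candidate = text[:pos] + kana + text[pos + len(gap):]
        if any(covers_gap(candidate, form, pos) for form in known_forms):
            results.append(candidate)
    return results


print(fill_gap("大学行_ます。", KNOWN_FORMS))
```

With a bigger list of forms there could be more than one fill that fits, so a real system would presumably also need some way to rank them, such as how common each form is.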

15 Name: Sling!XD/uSlingU 2005-09-19 23:13 ID:06mR4gB7

>>14
Aaaah I wish I had a Japanese spell checker.
Google does nothing: "Your search - 大学行ます - did not match any documents." Meanwhile in English, Google tries to correct my spelling.
Don't they have spell checkers in Japan? I need to research this.

16 Name: CyB3r h4xX0r g33k 2005-09-20 00:31 ID:Heaven

>>15

I don't think spell checking Japanese would be an easy task for a machine. It's not an easy task even for humans, imho.

17 Name: CyB3r h4xX0r g33k 2005-09-21 14:18 ID:edwIcp35

>>15
Google Suggest in Japanese might help a little:
http://www.google.co.jp/webhp?complete=1&hl=ja

18 Name: Sling!XD/uSlingU 2005-09-21 15:51 ID:Heaven

>>17 Nope, no correction there either.
大学行ますに該当するページが見つかりませんでした。(page not found)

19 Name: Albright!LC/IWhc3yc 2005-09-22 15:37 ID:04R+w4Ui

Try using the correct particle to make it a full sentence; 大学行きます returns over 600 hits.

20 Name: Albright!LC/IWhc3yc 2005-09-22 15:38 ID:Heaven

Oh yeah... you needed the き part too...
