How do I regex non-Latin alphabets?

1 Name: 4n0n4ym0u5 h4xx0r : 2008-03-29 17:19 ID:6t89q39/

If I wanted to, say, look for katakana in a block of Japanese text, or look for men and women of the same family in a Cyrillic passage (because Russian surnames change with gender), how would I do that?

2 Name: 4n0n4ym0u5 h4xx0r : 2008-03-30 11:35 ID:Heaven

It probably depends on the regular expression engine, but with Java's you can use \p{InKatakana} to match a single character in the Katakana block. Substitute the name of whatever block you need for Katakana; the name seems to be case-insensitive too.
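
A minimal sketch of the \p{InKatakana} approach, assuming a reasonably recent JDK. The class and method names (KatakanaFinder, findKatakana) are made up for illustration, and the sample text, 私はコーヒーとパンが好きです, is written with \u escapes so the source does not depend on the compiler's file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the thread.
public class KatakanaFinder {
    // \p{InKatakana} matches one character from the Unicode Katakana block
    // (U+30A0..U+30FF); the trailing + groups adjacent matches into one run.
    private static final Pattern KATAKANA_RUN = Pattern.compile("\\p{InKatakana}+");

    // Returns every maximal run of Katakana-block characters, in order of appearance.
    public static List<String> findKatakana(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = KATAKANA_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Sample sentence mixing kanji, hiragana and katakana (see the text above).
        String text = "\u79C1\u306F\u30B3\u30FC\u30D2\u30FC\u3068\u30D1\u30F3\u304C\u597D\u304D\u3067\u3059";
        for (String run : findKatakana(text)) {
            System.out.println(run);
        }
    }
}
```

For the sample sentence, findKatakana returns two runs, コーヒー and パン. The long-vowel mark ー (U+30FC) sits inside the Katakana block, which is why コーヒー stays in one piece.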

3 Name: 4n0n4ym0u5 h4xx0r : 2008-03-30 19:13 ID:LfNbRF7m

It doesn't matter; you just use the Unicode characters themselves in the pattern.
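
One possible reading of this suggestion, sketched in Java: put the characters themselves into an ordinary character class instead of using a named block. The class and method names (LiteralRange, findInRange) are hypothetical, and the range U+30A0..U+30FF is chosen here only because it coincides with the Katakana block. The \u escapes in the string literal are translated by the Java compiler, so the regex engine sees the literal characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the thread.
public class LiteralRange {
    // A plain character class whose endpoints are the characters U+30A0 and U+30FF.
    // No \p{...} property is involved; the engine just compares code points.
    private static final Pattern RANGE = Pattern.compile("[\u30A0-\u30FF]+");

    // Returns every maximal run of characters that fall inside the literal range.
    public static List<String> findInRange(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = RANGE.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }
}
```

The trade-off is that you have to look up the code point range yourself, whereas a named block documents the intent in the pattern.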

4 Name: 4n0n4ym0u5 h4xx0r : 2008-04-01 13:04 ID:PttmDpY7

Certainly, for simple tests you can just iterate over the characters, yes. For anything more complicated, having regular expressions is useful.
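
For comparison, a rough sketch of the "just iterate over the characters" approach in Java, with no regex at all. KatakanaScan and its methods are hypothetical names; Character.UnicodeBlock.of and String.codePoints are standard JDK calls (Java 8 or later assumed).

```java
// Hypothetical helper class, not from the thread.
public class KatakanaScan {
    // True if the code point belongs to the Unicode Katakana block.
    // UnicodeBlock.of returns null for code points outside any block,
    // so the == comparison is safe.
    private static boolean isKatakana(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.KATAKANA;
    }

    // Simple test: does the text contain at least one Katakana-block character?
    public static boolean containsKatakana(String text) {
        return text.codePoints().anyMatch(KatakanaScan::isKatakana);
    }

    // Counts the Katakana-block code points in the text.
    public static long countKatakana(String text) {
        return text.codePoints().filter(KatakanaScan::isKatakana).count();
    }
}
```

This is enough for yes/no checks and counts. Once you need runs, context, or alternation, the regex version is less code.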
