Web Framework (97)

64 Name: #!/usr/bin/anonymous : 2008-02-14 14:44 ID:i+ITJfDJ

>>63 That's as stupid as insisting you can't use a void* to refer to an int because you don't know that it is in fact an int. Look at this another way: the text files on your hard disk don't have character set tagging, and yet you can read from them just fine.

If you're reading utf8 files into strings, you know the bytestring contains utf8. If you're reading shift-jis files into strings you know the bytestring contains shift-jis. You generally isolate all of your charset and locale-awareness into a specific part of your program. You don't need to pepper it all over the fucking place to go wordwrap(s,72).
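A minimal sketch of that isolation, in Python for brevity. The names to_internal() and from_internal() and this particular wordwrap() are made up for illustration, and it assumes UTF-8 as the internal encoding. Width is counted in bytes here, so it only matches display columns for ASCII text.

```python
def to_internal(raw: bytes, source_charset: str) -> bytes:
    """Input boundary: transcode from the charset you know the file uses."""
    return raw.decode(source_charset).encode("utf-8")


def from_internal(data: bytes, target_charset: str) -> bytes:
    """Output boundary: transcode from internal UTF-8 to the requested charset."""
    return data.decode("utf-8").encode(target_charset)


def wordwrap(s: bytes, width: int) -> bytes:
    """Greedy wrap that breaks only at ASCII whitespace.

    ASCII bytes never occur inside a multibyte UTF-8 sequence, so this
    cannot split a character. Width is measured in bytes (an assumption
    of this sketch), not display columns.
    """
    lines = []
    current = b""
    for word in s.split():
        if current and len(current) + 1 + len(word) > width:
            lines.append(current)
            current = word
        elif current:
            current += b" " + word
        else:
            current = word
    lines.append(current)
    return b"\n".join(lines)
```

Only the two boundary functions take a charset name. wordwrap() just sees bytes.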

wchar_t was bad engineering. It convinced a lot of people that you needed another set of string APIs, and another kind of string. You don't. Your filesystem doesn't support those kinds of strings anyway, so it doesn't really add any features, or give you any new expressiveness (or conciseness), but it does introduce strange new places to hide bugs.

The fact remains: unicode-support in the language doesn't buy you anything, and costs you a lot. You still need to (as the programmer) be aware of charset-conversion at input and output because that information isn't available from the environment. Two trivial examples that don't exist in reality don't change that. Your isspace() example could simply be called utf8_isspace() because you still need to know that the input was utf8 anyway.
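For what it's worth, a utf8_isspace() over plain bytestrings could look roughly like this. It's a sketch, not anybody's real API: the signature and the return convention (length of the matched sequence, or 0) are assumptions, and the table is the set of code points with the Unicode White_Space property.

```python
# Code points with the Unicode White_Space property.
_WS_CODEPOINTS = (
    list(range(0x09, 0x0E))
    + [0x20, 0x85, 0xA0, 0x1680]
    + list(range(0x2000, 0x200B))
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)

# Their UTF-8 encodings, computed once.
_WS_UTF8 = tuple(chr(cp).encode("utf-8") for cp in _WS_CODEPOINTS)


def utf8_isspace(s: bytes, i: int = 0) -> int:
    """Return the byte length of the whitespace character at offset i, else 0.

    The caller is assumed to already know that s holds UTF-8.
    """
    for seq in _WS_UTF8:
        if s.startswith(seq, i):
            return len(seq)
    return 0
```

Note that the lone byte 0xA0 (a no-break space in Latin-1) doesn't match here, which is the point: the function is only meaningful because you already know the bytes are utf8.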

Maybe this'd be different if the filesystem encoded charset and locale information reliably. It doesn't though, so you're still tasked (as the programmer) with working primarily in bytestrings, and transcoding explicitly when directed.
