Web Framework (97)

74 Name: #!/usr/bin/anonymous : 2008-02-16 13:36 ID:ONvOLVru

>>72 I most certainly did not! I said you can use the utf8_* functions if you know the content is UTF-8 and it matters. If it doesn't matter, don't transcode it. Don't even look at it! Most of the things you want to do with a string are the same when you treat it as a bytestring. The special cases are locale-sensitive comparison and character-sensitive ellipsizing/word-wrapping. If you're writing routines to do that over and over again, then yes, you should have it in your language. However, if you're not, then why are you translating to bignum arrays all the time? Why is substr so slow?
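To make the bytestring point concrete, here is a minimal C sketch (mine, not something anyone in this thread posted): a plain byte operation like strstr works on UTF-8 as-is, and the one special case shown, truncating for ellipsizing, only needs to avoid cutting a multibyte sequence in half. The utf8_truncate name is made up for illustration.

/* Sketch, not from the thread: byte operations work on UTF-8 directly;
 * only truncation needs to know about multibyte sequences. */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: cut a UTF-8 string at no more than max_bytes without
 * splitting a multibyte sequence, by backing up past continuation bytes
 * (anything of the form 10xxxxxx). */
static size_t utf8_truncate(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    if (len <= max_bytes)
        return len;
    size_t cut = max_bytes;
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}

int main(void)
{
    const char *s = "na\xC3\xAFve caf\xC3\xA9";   /* "naïve café" as UTF-8 bytes */

    /* Byte-oriented search finds the substring without any decoding. */
    printf("\"caf\" starts at byte %td\n", strstr(s, "caf") - s);

    /* Cutting blindly at 3 bytes would split the 2-byte "ï"; the helper
     * backs up to the previous character boundary (2 bytes, "na"). */
    printf("safe cut at <=3 bytes: %zu bytes\n", utf8_truncate(s, 3));
    return 0;
}

The same byte-level approach covers concatenation, splitting on ASCII delimiters, and hashing, since UTF-8 never reuses ASCII byte values inside a multibyte sequence.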

If you think there are other special cases, I'd love to hear about them. Nobody seems to post any of them here.

>>73 "Almost all" situations? I was specifically talking about serialization, but platform-default encoding is a better example.

What exactly is the platform-default encoding anyway? When you save HTML files on a Windows PC, do you convert them to US-ASCII? Or do you violate MIME and at least avoid destroying data by converting to the current codepage, and god forbid the user ever changes it?

On a Linux PC, what exactly is the default encoding? Or on a Mac?

If there were a meaningful, lossless default encoding it might be useful to operate this way, but as it is, the "default" IO often simply destroys data, and nobody ever notices, which I think makes it exactly the topic of conversation: Unicode hides bugs.
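For what it's worth, here is a small C sketch of the kind of silent loss I mean, using POSIX/glibc iconv and assuming US-ASCII stands in for the platform "default" (both the sample string and the target encoding are my choices): the conversion just stops at the first character the target set can't represent, and a careless convert-and-write path ships whatever partial output it got, and nobody notices.

/* Sketch of lossy "default" conversion, assuming US-ASCII as the stand-in
 * default target; uses plain POSIX/glibc iconv. */
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char in[]  = "Stra\xC3\x9F" "e";   /* "Straße" as UTF-8 bytes */
    char out[64];

    iconv_t cd = iconv_open("ASCII", "UTF-8");
    if (cd == (iconv_t)-1) {
        perror("iconv_open");
        return 1;
    }

    char  *inp = in, *outp = out;
    size_t inleft = strlen(in), outleft = sizeof out;

    /* The conversion stops at the first character with no ASCII form; a
     * careless caller writes out the partial result and loses the rest. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        fprintf(stderr, "stopped after %td bytes: %s\n", inp - in, strerror(errno));

    iconv_close(cd);
    return 0;
}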

Unicode doesn't solve anything in programming languages, because the messy nonsense is in locale-specific things and system-specific things, and history demonstrates that programming languages can't really solve either of those. Because of that, I contend that Unicode in the language is simply a place to hide bugs and unexpected gotchas, for no real benefit.

http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html brought up this exact topic, though it reaches very different conclusions. The author suggests you pepper your code with NumberFormatInfo.Invariant and StringComparison.CurrentCultureIgnoreCase. Using strcmp for ASCIIZ strings and strcoll when you're comparing user input seems fine to me. If the environment is Unicode, the text had to fit into a bytestring anyway for it to get into argv. The cool thing is that you get better i18n support without putting Unicode in your language, because thinking of characters as the primitive unit of IO is exactly what's wrong.
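Here is a quick C sketch of that strcmp/strcoll split (my illustration, not from the article): strcmp gives a stable byte-wise order for identifiers and wire formats, while strcoll orders by the user's locale, which is what you want when sorting things the user typed. The tr_TR.UTF-8 locale name is an assumption; it has to be installed for setlocale to succeed, otherwise strcoll falls back to byte order.

/* Sketch of strcmp vs strcoll; "tr_TR.UTF-8" is assumed to be installed. */
#include <locale.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *a = "\xC3\xA7" "ay";   /* "çay" as UTF-8 bytes */
    const char *b = "dere";

    /* Byte order: 0xC3 > 'd', so "çay" lands after "dere". Stable and
     * locale-independent, which is fine for keys and wire formats. */
    printf("byte order  : cay sorts %s dere\n",
           strcmp(a, b) > 0 ? "after" : "before");

    /* Locale order: in Turkish collation the letter ç comes between c and d,
     * so "çay" lands before "dere", matching what a user expects. */
    if (setlocale(LC_COLLATE, "tr_TR.UTF-8") == NULL)
        fprintf(stderr, "locale not installed; strcoll keeps byte order\n");
    printf("locale order: cay sorts %s dere\n",
           strcoll(a, b) > 0 ? "after" : "before");
    return 0;
}

Note that neither call needs the string decoded into characters first; strcoll operates on the multibyte bytestring straight out of argv, and the locale machinery does the rest.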
