Web Framework (97)

1 Name: #!/usr/bin/anonymous : 2008-01-14 05:50 ID:EkVUNjls

What should I use for my web application that isn't PHP? I would have gone straight to mod_perl a few years ago, but now there are so many options. Is there anything better?

2 Name: dmpk2k!hinhT6kz2E : 2008-01-14 06:03 ID:Heaven

There are so many -- good luck!

It really depends on what you want to do and what you want to invest. Do you seek good documentation, longevity, ease of deployment, scalability, or explicit control? Python and Ruby in particular have a small army of frameworks that are scattered all about these axes.

The two big ones at the moment appear to be Rails and Django. Other notables are Pylons, Catalyst and Seaside.

mod_perl was never a framework, but can be used as the basis of one. It's a bit like mod_php not being CakePHP.

3 Name: #!/usr/bin/anonymous : 2008-01-14 08:16 ID:Heaven

Ruby on Rails is nifty, but it's pretty slow.
Pylons is good. I really like it, and recommend it to anyone who wants a framework to take care of all the technical stuff but not give up control to it, but you really have to know some Python and be willing to write code in order to make decent use of it.

4 Name: #!/usr/bin/anonymous : 2008-01-14 09:34 ID:9ONFgUdb

>>3

>but you really have to know some Python and be willing to write code in order to make decent use of it.

Frameworks are just like Java: they create dumb programmers. Do your own damn work and learn more; you get more control, you can work around potential problems more easily than when they're buried in a framework, and you can support all the code you hand to the client.

I liked what >>3 said about Pylons though, that's what a framework should do, if anything.

I don't like these big frameworks. It's like we're moving into a time where programmers just write do_guestbook() or do_messageboard(); they're not really evolving as programmers that way.

In my opinion, abstracted languages or frameworks like these create dumbed-down programmers. I wouldn't even call them programmers; I'd call them scripters.

5 Name: #!/usr/bin/anonymous : 2008-01-14 10:27 ID:EkVUNjls

Framework perhaps was the wrong word. What I meant was simply scripting/programming languages.

I'm looking at mod_perl mainly because of its easy-to-use page-caching functions, which PHP has no support for. Not even the larger projects (MediaWiki, WordPress, etc.) have correct support for conditional GETs.
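For reference, the conditional-GET dance is simple enough to sketch in a few lines. This is a hedged, framework-agnostic Python illustration; the function name and calling convention are made up, not from mod_perl or any particular framework:

```python
import hashlib

def conditional_get(request_headers, body):
    """Return (status, headers, body) honoring If-None-Match.

    request_headers: dict of incoming HTTP headers, lowercase keys.
    Hypothetical helper for illustration only.
    """
    # Derive a validator from the response body.
    etag = '"%s"' % hashlib.md5(body.encode("utf-8")).hexdigest()
    if request_headers.get("if-none-match") == etag:
        # The client's cached copy is current: 304, no body sent.
        return 304, {"ETag": etag}, ""
    return 200, {"ETag": etag}, body
```

The first request pays for the full body; repeat requests carrying the ETag back get an empty 304, which is where the bandwidth and rendering savings come from.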

But I'm willing to take a look at anything with good performance. BTW, are there any benchmarks of all the current languages (Ruby, PHP, Python, Perl, etc.)? There don't seem to be any that aren't horribly outdated.

Thanks.

6 Name: dmpk2k!hinhT6kz2E : 2008-01-14 17:17 ID:Heaven

> But I'm willing to take a look at anything with good performance.

http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=all&lang=all

I decided to write a mini-framework for myself in Lua. Performance was only one reason, though, because in practice the largest bottleneck by far will be the database (and memcached, if you use it), followed by templating.

I generally recommend FastCGI over mod_[whatever] because it isn't tied to a single HTTP server, has better isolation (thus you avoid threading issues like mod_php under Apache 2's MPM worker), and it scales just as well. However, I vaguely recall there being problems with conditional GETs, so do some research on that first.

7 Name: #!/usr/bin/anonymous : 2008-01-14 17:25 ID:j/yqEPJq

>>5

You can drop Ruby off the list immediately if you're looking for speed, at least. The others are at least comparable in speed, depending of course on the particular task.

8 Name: #!/usr/bin/anonymous : 2008-01-14 21:19 ID:0Illg7NJ

web.py

9 Name: #!/usr/bin/anonymous : 2008-01-15 00:38 ID:Heaven

>>8
Is fugly. Aaron Swartz half-assed his way through the code just so he could dump something off onto Reddit and make a lot of money on it, then flake out and leave the Reddit guys on their own to maintain it. He did the same with a couple of other sites as well.

Perhaps one of web.py's most dubious benefits is you can rewrite your app without it fairly easily if you decide later that you don't need it. (Incidentally, I don't think Reddit is even using web.py anymore.)

10 Name: #!/usr/bin/anonymous : 2008-02-03 02:14 ID:fMQv8GHe

>>7

It's not Ruby, it's Rails. Repeat after me: Rails is a slow pig and Ruby is not. Have a look at Merb if you're genuinely interested and not just regurgitating some meme.

Anyway, even if Ruby itself were slower executing than Python and Perl, there are two major factors that preclude execution speed alone from eliminating Ruby: YARV and Caching.

I'll leave Ruby 1.9/2.0/YARV/Rubinius/JRuby for the truly interested, but just know that within 2008, Ruby's raw execution speed will be equal to or surpass that of Python.

Of course if you just cache properly in the first place, execution speed won't mean dick, and then you can spend your time optimizing your web (nginx++) and proxy servers or the like.

So.. with all that said, Ruby really is a fucking blast to develop with. :-)

11 Name: #!/usr/bin/anonymous : 2008-02-03 02:51 ID:VBqqpniK

Arc is wonderful. I'm wary of new Lisps because I've been spoiled by the good ones, but Arc is positively wonderful. Think of it as a Lisp designed specifically for web applications.

12 Name: #!/usr/bin/anonymous : 2008-02-03 03:25 ID:Heaven

>>11

For web applications that use <table>s and do not handle character sets?

13 Name: #!/usr/bin/anonymous : 2008-02-03 04:01 ID:Heaven

>>12
It can't be that bad??

14 Name: dmpk2k!hinhT6kz2E : 2008-02-03 08:33 ID:Heaven

> Of course if you just cache properly in the first place, execution speed won't mean dick

Only for small sites with no dynamic content.

> optimizing your web (nginx++) and proxy servers or the like.

A microoptimization in the wrong place.

I'll never understand most web developers. They're like people who use V-tec stickers on a hatchback.

15 Name: #!/usr/bin/anonymous : 2008-02-03 09:16 ID:Heaven

>>14

>A microoptimization in the wrong place.

Caching and front end optimization are critically important, don't even try to say otherwise. And anyway, don't get so tripped up on that one comment. I was just using it as an example of not being execution bound (which nobody in the real world is...), you could easily replace web/proxy in that sentence with database and memcache.

Me? I'll never understand Asperger syndrome.

16 Name: #!/usr/bin/anonymous : 2008-02-03 12:29 ID:A7jyxFQb

>Only for small sites with no dynamic content.

That's a misconception. A site doesn't have to be 100% static to benefit from proper caching. For instance, say you have some information that updates hourly. Do you follow the dmpk2k strategy of saying that caching is a waste of time and hit the system every request, or do you follow a sensible strategy of properly setting the cache parameters so that caches know to keep their own copy for the correct period of time?
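The "sensible strategy" above is mostly a matter of emitting the right headers so downstream caches know how long their copy stays fresh. A hedged Python sketch (the function name is made up; any framework would let you set these headers directly):

```python
import time

def hourly_cache_headers(now=None):
    """Cache headers for a resource that updates on the hour.

    Tells browsers and intermediary caches to keep their copy
    until the next hourly update. Illustrative only.
    """
    now = time.time() if now is None else now
    # Seconds remaining until the top of the next hour.
    seconds_left = 3600 - (int(now) % 3600)
    return {
        "Cache-Control": "public, max-age=%d" % seconds_left,
        "Expires": time.strftime("%a, %d %b %Y %H:%M:%S GMT",
                                 time.gmtime(now + seconds_left)),
    }
```

With that in place, a cache only hits the backend once per hour per resource, no matter how many clients ask.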

17 Name: #!/usr/bin/anonymous : 2008-02-03 13:12 ID:Heaven

>>13

> Which is why, incidentally, Arc only supports Ascii.
> ...
> Arc embodies a similarly unPC attitude to HTML. The predefined libraries just do everything with tables.

http://www.paulgraham.com/arc0.html

18 Name: #!/usr/bin/anonymous : 2008-02-03 18:05 ID:Heaven

>>17

Well, from an otherwise respectable source, that just seems like an odd and IMHO wrong set of decisions. One of his motivations is not getting in the way of writing HTML, and yet a bunch of useless tables is supposed to be better than pure semantic structure!?

19 Name: dmpk2k!hinhT6kz2E : 2008-02-03 18:18 ID:Heaven

>>15
Reread what you're replying to in >>14, please. That specific line is about Nginx. Do you seriously think Nginx as a reverse proxy for mongrels driving some Ruby framework is going to make one whit of difference?

Except that a lot of people in the Ruby webapp community get in a lather about it. Methinks they've never run anything beyond a minuscule site -- or written any C.

>>16
Is that what passes for dynamic? It's not changing per request -- if you dumped it to a static HTML file it would behave the same. In that case of course Squid and Cache-Control can help.

Back in the real world, where frameworks are usually used to change what's served to each visitor, caching won't be so great a help. You can cache images and CSS all you want, but serving them wasn't expensive on the CPU in the first place compared to the actual webapp.

20 Name: #!/usr/bin/anonymous : 2008-02-03 18:53 ID:Heaven

>>19

>Do you seriously think Nginx as a reverse proxy for mongrels driving some Ruby framework is going to make one wit of difference?

I have that exact setup here, what are you asking?

>Back in the real world, where frameworks are usually used to change what's served to each visitor,

Are they, really? Pick any site from Alexa's top 100 and see that the vast majority of those pages aren't customized. In fact most of the sites make an extreme effort to avoid customization (no usernames, generic links, etc.).

>caching won't be so great a help.

So what happens when we're serving 1000 req/s and we cache half of a page in memory, for even as little as a single second? What did we just do to our execution time (aka server load)? ;-)
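That kind of one-second micro-cache is a few lines in any language; a Python sketch for illustration (names made up, not from any framework):

```python
import time

_fragment_cache = {}

def micro_cache(key, ttl, compute, now=None):
    """Serve a cached page fragment if it is younger than `ttl`
    seconds; otherwise recompute and store it.

    At 1000 req/s, even ttl=1 means `compute` runs once instead
    of ~1000 times. Illustrative sketch, not production code.
    """
    now = time.time() if now is None else now
    entry = _fragment_cache.get(key)
    if entry is not None and now - entry[0] < ttl:
        return entry[1]          # fresh enough: skip the work
    value = compute()
    _fragment_cache[key] = (now, value)
    return value
```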

21 Name: dmpk2k!hinhT6kz2E : 2008-02-03 19:54 ID:Heaven

> I have that exact setup here, what are you asking?

Use top to determine the difference between the HTTPD and the webapp. Nginx is probably hovering somewhere around 0%. I don't understand why many Ruby webapp developers care so much about their reverse proxy [Apache/Lighttpd/Nginx/whatever it is now] when it won't make a difference.

> Pick any site from Alexa's top 100 and see that the vast majority of those pages aren't customized.

I can't argue with that. At that scale everything changes -- they usually have their own custom frameworks (probably Jaaaavaaaa J2EE enterprisey), the most heavily hit pages are static, they use a CDN, and so on. Ignoring bandwidth, one reason they have to do that is because execution time matters, although it's probably their databases they're worried about.

> So what happens when we're serving 1000 Req/s and we cache 1/2 of a page in memory

A good idea, as is using memcache and the whole spiel. Your execution time will still mean something though -- the load on app servers comes from somewhere. Also, with a slower language some problems become less feasible. I saw a presentation where a person was using divide-and-conquer in a Google Maps mashup, and had to rewrite that part in C. Wouldn't it be nice if that wasn't necessary?

Actually, my biggest beef with MRI (and presumably YARV) isn't that it's slow -- the real bottlenecks in webapps lie elsewhere, particularly the problem of scaling the database. No, I take issue with the GC marking objects directly, thus preventing the OS from using copy-on-write to any real effect. So you're stuck with ten or so mongrels per box because few memory pages are shared across processes.

22 Name: #!/usr/bin/anonymous : 2008-02-03 20:20 ID:VBqqpniK

The whole complaint about <table>s and unicode is stupid. The programmer doesn't deal with either of those things in Arc. The fact that the programmer has to deal with those things in other languages points out what's stupid about those languages, and in fact isn't exposing anything about Arc.

23 Name: #!/usr/bin/anonymous : 2008-02-04 00:47 ID:j/yqEPJq

>>22

You... you aren't making enough sense for me to even start responding to that.

Could you try to explain that one more time?

24 Name: #!/usr/bin/anonymous : 2008-02-04 02:05 ID:Heaven

>>21

>I don't understand why many Ruby webapp developers care so much about their reverse-proxy [Apache/Lighttpd/Nginx/whatever it is now] when it won't make a difference?

Again, try not to stumble on the Nginx thing. I use it for content expiration, URL rewriting, and output compression, nothing more. And for me at least, those three things make a huge difference.

25 Name: #!/usr/bin/anonymous : 2008-02-04 21:33 ID:Heaven

>>23 The programmer of an application doesn't write HTML in lisp-languages. They don't use templates either. They don't generally deal with unicode transformations, and they don't worry so much about such low-level things.

Did you understand that?

A Python programmer has to be aware that Unicode text transformation occurs transparently and automatically, and has to remember to normalize and sanitize inputs. A PHP programmer has to be aware of the various ini-file magicks and be diligent about making sure the HTML tags balance.

A lisp programmer doesn't do these things. The people complaining about <table> and unicode think that because they deal with these things every day that they need to deal with them. They don't; their programming language is just stupid. Lisp is better because you can make it less stupid.

Take tables for example: You emit effectless semantics with div and class and then "pretty those things up". That prettying up takes time away from building your app. Using tables gives you your app faster, and you can still "pretty those things up".

Or take unicode: When someone posts a unicode string, why do you care that it's unicode? You're either going to save it, display it, or ignore it. You only care if it's going to cause special effects, so simply escape it and move on. You need to ellipsize it? Make a utf8_ellipsize function. You don't need the language to automatically transform unicode for you in order to deal with unicode strings.

A lisp programmer writes these things if they need to. If it comes up twice, they macro it. If it comes up zero times, they don't bother.
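For what it's worth, the kind of one-off byte-level ellipsis helper described above might look like this in Python. The name and exact behavior are one possible choice, not anyone's standard API:

```python
def utf8_ellipsize(raw, limit):
    """Truncate a UTF-8 byte string to at most `limit` bytes and
    append an ellipsis, without cutting a multi-byte character in
    half. Hypothetical helper; sketch only.
    """
    if len(raw) <= limit:
        return raw
    # Back the cut point up until it sits on a character boundary,
    # i.e. until the first excluded byte is not a UTF-8
    # continuation byte (10xxxxxx).
    while limit > 0 and (ord(raw[limit:limit + 1]) & 0xC0) == 0x80:
        limit -= 1
    return raw[:limit] + b"..."
```

Note it needs nothing beyond the UTF-8 byte layout itself: no codepoint database, no language-level Unicode strings, which is the point being made above.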

26 Name: #!/usr/bin/anonymous : 2008-02-04 22:57 ID:Heaven

>>25

>Using tables gives you your app faster, and you can still "pretty those things up".

And that's why we can't have nice things.......

27 Name: #!/usr/bin/anonymous : 2008-02-05 03:52 ID:Heaven

>>25
"What" to the 50th power.

28 Name: #!/usr/bin/anonymous : 2008-02-05 04:12 ID:Heaven

>>27 "What" will have to be qualified. I'm not stuttering. If you don't understand something, you will have to say what in particular.

29 Name: dmpk2k!hinhT6kz2E : 2008-02-05 06:28 ID:Heaven

> When someone posts a unicode string, why do you care that it's unicode?

I've found that sorting and shortening strings is useful. I like my regex working too.

30 Name: #!/usr/bin/anonymous : 2008-02-05 12:25 ID:vwnO7WC3

>>25

> Or take unicode: When someone posts a unicode string, why do you care that it's unicode? You're either going to save it, display it, or ignore it.

Or, you know, process it. Like you do in a programming language. You take input, and you process it, and you produce output. And you can't process a string without understanding its character encoding.

Look, you can ignore the character set in every other language too. You don't have to know it. This is not a feature. Arc or Lisp is not better than everybody else because you can only deal with the equivalent of char pointers. It just means you're on the same level as C code.

> Take tables for example: You emit effectless semantics with div and class and then "pretty those things up". That prettying up takes time away from buildin your app. Using tables gives you your app faster, and you can still "pretty those things up".

In a similar vein, you can output shitty HTML in every other language, too. But that's not really a feature either, is it?

31 Name: #!/usr/bin/anonymous : 2008-02-05 12:51 ID:A7jyxFQb

What? Using tables doesn't give you the app any faster than leaving it in unstyled divs. Additionally it makes it harder later on when you do want to reorder the blocks, and it slows down rendering time.

And indeed, without knowing the encoding there isn't much you can do with a string. You can't even find the first character in it without knowing the encoding. You can't even uppercase it or lowercase it. You can't reverse it (but then again who does, honestly...)

If a language is going to remove the need to think about encodings, the best way for it to go about it is to expose everything as unicode characters in the first place. Java is half way there but even a String in Java may have a length() different from the count of code points in the string. :-/

32 Name: #!/usr/bin/anonymous : 2008-02-05 17:26 ID:p5s4fl90

>>31
javascript is already there. having to write code to convert strings to shift jis for http://hotaru.thinkindifferent.net/trip.html from scratch because no one had ever had to do it in javascript before was a little annoying...

33 Name: #!/usr/bin/anonymous : 2008-02-05 17:40 ID:Heaven

>>31 Wrong.

(tab (map row data))

is shorter than:

print "<div class=\"data\">"
for i in data:
    print "<div class=\"row\">"
    for j in i:
        print "<div class=\"col\">", j, "</div>"
    print "</div>"
print "</div>"

And no, it doesn't slow rendering time. All browsers render tables faster than float or grid views made from divs.

> without knowing the encoding there isn't much you can do with a string.

You're chasing phantoms here. The client sends data using ISO-8859-1 or UTF-8. Your <form> has a hidden input field in it called "charset_detect" that contains a byte-representation that is different between ISO-8859-1 and UTF-8 (like &nbsp;). You then use this information to upcode ISO-8859-1 into UTF-8. That sounds like library code to me.
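A hedged Python sketch of that upcoding step (the field name and sentinel value are illustrative, not from any framework):

```python
def upcode_form_fields(fields):
    """Decode form fields posted as either ISO-8859-1 or UTF-8,
    using a hidden charset_detect field as described above.

    Assumes the form carried a hidden input named charset_detect
    whose value is a non-breaking space: UTF-8 browsers submit it
    as the bytes 0xC2 0xA0, ISO-8859-1 browsers as the single
    byte 0xA0. All names here are hypothetical.
    """
    sentinel = fields.get("charset_detect", b"")
    encoding = "utf-8" if sentinel == b"\xc2\xa0" else "iso-8859-1"
    # Upcode every field to Unicode using the detected encoding.
    return dict((k, v.decode(encoding)) for k, v in fields.items())
```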

Once there, you can normalize it if you like. You can also compose the characters. This requires a database of codepoints so it too sounds like library work.

You can compare the string as bytes, and you can compare a substring. This is easier with bytes because you don't have surprise recodings!

Collating? Sorting? Upcasing? Downcasing? These are always library routines because they involve a database. They don't need to be built in to the language.

Look at Java's mistake as a prime example of how not to do unicode. By thinking unicode would always fit into 16 bits, they made it look like it's going to work most of the time, but fail in subtle hard-to-test ways.

Character sets were a bad hack that interoperability layers need to deal with. Those layers belong in library code, so they can be improved separately without requiring secret knowledge of object internals, and without painting yourself into a corner like Java and Win32 did.

34 Name: #!/usr/bin/anonymous : 2008-02-05 17:46 ID:i+ITJfDJ

>>29

> I've found that sorting and shortening strings is useful. I like my regex working too.

Good. Let me know when a language's unicode support actually does that.

Here's a hint: Where does ÞORN get sorted? After Z? Between T and U? After TH but before TI? Mix with TH? Sorted as Y? Mixed with P? Transliterated as TH? Transliterated as T?

Because you think your language supports unicode, you write code that doesn't handle these cases. Your program will suddenly generate an error when faced with this data, and your user will be unhappy.

On the other hand, by simply treating everything as bytes you know exactly how involved you are and need to be. You can avoid algorithms that depend on sorting characters (which seems to be locale-specific) and you can avoid algorithms that change case (which also seems to be locale-specific). That's because you're supposed to be avoiding these things anyway. Your language has made you lazy and stupid, and the way out ISN'T to just be more careful- to just try harder. It's to stop worrying about this crap altogether.

If someone ever figures out how to do unicode right, or if this were an easy thing, I could possibly agree, but it isn't. Unicode is really fucking hard, and nobody has gotten it right.

35 Name: #!/usr/bin/anonymous : 2008-02-05 18:22 ID:Heaven

> Your program will suddenly generate an error when faced with this data and your user will be unhappy.

Hmm. Let's see here...

>>> a = [u'ÞORN', 'PORN', 'YARN', 'ZEBRA', 'TOUHOU', 'TANK', 'PRALINE', 'PAGAN', 'THEME', 'TITMOUSE']
>>> a.sort()
>>> for i in a: print i

...
PAGAN
PORN
PRALINE
TANK
THEME
TITMOUSE
TOUHOU
YARN
ZEBRA
ÞORN

Oh look, no error.

And suppose you wanted to make an alphabetical index:

>>> for i in sorted(set(i[0] for i in a)): print i

...
P
T
Y
Z
Þ

Still no error, and it works fine. Now if you were just blindly manipulating strings, as you suggest, you would have a problem, because you'd be dumping the first byte of a multi-byte character. But you're right! Why would people need to see the entire letter anyway? They can just guess.

36 Name: #!/usr/bin/anonymous : 2008-02-05 18:28 ID:6za5PNDF

>>34
At least in Python and Ruby, you can redefine the comparison operator for any object, strings included. Whatever you put there affects all sorting operations, so you can put thorn wherever you want.
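In Python the idiomatic way to do this is a sort key rather than a redefined operator. A toy example that files Þ with TH, one of the conventions >>34 lists (purely illustrative; real collation belongs in a locale library):

```python
def thorn_as_th_key(word):
    """Hypothetical collation key: treat thorn as the digraph TH
    for sorting purposes. Sketch only, not a real collator."""
    return word.upper().replace(u"\u00de", u"TH")

words = [u"TANK", u"\u00deORN", u"TITMOUSE", u"THEME"]
# sort() consults the key for comparisons, so thorn lands
# between THEME and TITMOUSE instead of after Z.
words.sort(key=thorn_as_th_key)
```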

And in Perl, I've had to get UTF-8 straight in order to process database output. I had to convert certain columns to uppercase. And we have another app where the character-length (as opposed to byte-length) of strings is very important.

And I just dabble in internationalization, really. For all I know, there are more elegant solutions than the one I used. And you might ultimately be right; in some cases, you don't care about the encoding, you're just shunting bits to and fro. I, for one, want to make sure my app always knows what kind of data it's dealing with.

37 Name: #!/usr/bin/anonymous : 2008-02-05 19:25 ID:i+ITJfDJ

>>35 Þ sorts differently in different languages.

> Now if you were just blindly manipulating strings, as you suggest, you would have a problem,

Read my post again. I didn't say anything about blindly doing anything: I actually said the exact opposite.

>>> print sorted(file("test.txt","r").readlines())
['PAGAN\n', 'PORN\n', 'PRALINE\n', 'TANK\n', 'THEME\n', 'TITMOUSE\n', 'YARN\n', 'YOUHOU\n', 'ZEBRA\n', '\xc3\x9eORN\n']

What's the encoding of this file again? I know, let's assume utf-8!

>>> for i in file("test.txt","r").readlines(): print i.decode('utf-8')

Well that seems to work. Let's just hope users never actually control the contents of test.txt:

>>> for i in file("test.txt","r").readlines(): print i.decode('utf-8')
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

It happens all the time. People do a lot of work, then some user posts actual unicode where it isn't expected and through the magic of transcoding the entire database is hosed.

38 Name: #!/usr/bin/anonymous : 2008-02-05 19:35 ID:i+ITJfDJ

>>36

> At least in Python and Ruby, you can redefine the comparison operator for any object, strings included. Whatever you put in there affects all sorting operations. So you can put thorn wherever you want.

Holy crap! How does redefining the comparison operator have anything to do with what I'm saying?

Sorting is used for lots of things, not just presentation. For example, it's often used for uniqueness verification(!). There it doesn't matter what order is actually produced, just so long as it's stable. Having the built-in sort() operator depend on the locale settings means that you can't safely share sorted data between user sessions.

That sounds like an almost invisible bug that nobody would ever notice.

And none of you have even suggested a single reason why it needs to be in the language. I've given plenty of counter examples, but the best you've got is "oh you might want to know what the first character is". That's stupid. You don't need all these bugs and problems and danger-areas just to get that.

> And I just dabble in internationalization, really. For all I know, there are more elegant solutions than the one I used. And you might ultimately be right; in some cases, you don't care about the encoding, you're just shunting bits to and fro. I, for one, want to make sure my app always knows what kind of data it's dealing with.

Really, I'm only arguing that this is complicated. There isn't anything easy about "adding unicode", and the expectation is ridiculous. Everybody is perfectly fine using languages with no unicode support, and the people that do use unicode-enabled languages frequently have subtle bugs that they don't notice until much later. Getting it right is hard- and I say that because nobody has gotten it right yet; there isn't a language whose unicode support isn't filled with implementation-specific gotchas.

It's not just languages either- protocols do it too. HTTP insists the default charset for text/* is ISO-8859-1 even though MIME says text/plain defaults to US-ASCII. That means you can't save an HTML document unless your filesystem is character-set aware. How stupid is that!?

Seriously. Really smart people fuck this up totally. To blame Arc for not getting it right in a version 0, when it's hard and not altogether important to begin with, is just missing the point.

I swear, the first language to support unicode correctly will xor all of the code points with 0x1234 just to make sure values that aren't byte-packed will actually get tested...

39 Name: dmpk2k!hinhT6kz2E : 2008-02-05 20:00 ID:Heaven

>>34

> Good. Let me know when a language supports unicode that does that.

Because almost nothing out there supports Unicode 5.0 100% we should stick to octets? Is this the Better is Better philosophy, how not to win big?

> That's becuase you're supposed to be avoiding these things anyway.

The things you've just so casually dismissed are things that I've had to implement the last project I worked on. I really can't imagine "we're supposed to be avoiding these things" will go over well with management when it's part of the spec; the site is supposed to be international, after all.

> To blame arc for not getting it right on a version-0 when it's hard, and not altogether important to begin with, it's just missing out.

Arc is built on MzScheme. MzScheme supports Unicode, but Arc does not. What is wrong with this picture?

40 Name: #!/usr/bin/anonymous : 2008-02-05 20:06 ID:Heaven

>>37
How about learning a few things about the language before you start bashing it?

And since in this scenario we're taking user input, let's be a bit lenient with broken input data, too. Because, you know, Python allows you to do that.

>>> import codecs
>>> f = codecs.open('test.txt', encoding='utf-8')
>>> print sorted(f.readlines())

[u'PAGAN\n', u'PORN\n', u'PRALINE\n', u'TANK\n', u'THEME\n', u'TITMOUSE\n', u'TOUHOU\n', u'YARN\n', u'ZEBRA\n', u'\xdeORN\n']

Oh wow, imagine that. I got Unicode data out of it, without having to screw around with .decode() on every damn string.

Now supposing the file has a couple of broken characters in it, I could add errors='replace' to the open() call, and I'll get back Unicode data with the (standard) Unicode replacement character, instead of garbled crap. Not the ideal solution, but the ideal solution would be for nobody to have invalid characters in the first place. Ignoring broken characters doesn't make them go away, but handling them properly will, and as an added bonus, if you want to let your users know that their data might be corrupt, you can do that. Not so if you're just shoveling raw bit strings around.
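Concretely, errors='replace' behaves like this (small Python sketch; the byte string is an invented example of mislabeled data):

```python
# errors='replace' swaps each undecodable byte sequence for the
# standard replacement character U+FFFD instead of raising, so
# the rest of the string survives.
broken = b"caf\xe9"   # a lone Latin-1 byte in data assumed to be UTF-8
text = broken.decode("utf-8", errors="replace")
```

The visible replacement character is also what lets you flag possibly-corrupt data to the user.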

And if you really have no idea what encoding a file is using, try this: http://chardet.feedparser.org/

> Read my post again. I didn't say anything about blindly doing anything: I actually said the exact opposite.

How is your statement -- "the way out ISN'T to just be more careful- to just try harder. It's to stop worrying about this crap altogether." -- not equivalent to "don't bother to handle character encodings"?

41 Name: #!/usr/bin/anonymous : 2008-02-05 20:26 ID:i+ITJfDJ

> The things you've just so casually dismissed are things that I've had to implement the last project I worked on. I really can't imagine "we're supposed to be avoiding these things" will go over well with management when it's part of the spec; the site is supposed to be international, after all.

What are you saying? Either your language has a locale-sensitive sort() or it doesn't. Both are wrong for something; at least when you don't pretend bytes are bignums, sort() is simple and fast.

As soon as you need a sort() that can actually handle multiple languages (a sort for presentation) you need one that accepts locales as well, and frankly your language doesn't have such a beast so you have to write it yourself anyway. You simply can't use the builtin sort() operator for this, so why have a builtin sort() that is slower than it has to be?

Case-folding is another one. Why bother having a str.islower() that gives the wrong answer for unicode? Why bother having a str.lower() that is wrong for unicode?

> Arc is built on MzScheme. MzScheme supports Unicode, but Arc does not. What is wrong with this picture?

That MzScheme doesn't support unicode correctly? What's your problem?

You want arc to support something poorly just because mzscheme does?

What's the point of striving for the most powerful language if you're just going to shit on it with something so obviously important and so obviously complicated that nobody seems to be able to get it right?

42 Name: #!/usr/bin/anonymous : 2008-02-05 20:38 ID:i+ITJfDJ

>>40

> Oh wow, imagine that. I got Unicode data out of it, without having to screw around with .decode() on every damn string.
>>> import codecs
>>> f = codecs.open('test.txt', encoding='utf-8')
>>> print sorted(f.readlines())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.5/codecs.py", line 626, in readlines
return self.reader.readlines(sizehint)
File "/usr/lib/python2.5/codecs.py", line 535, in readlines
data = self.read()
File "/usr/lib/python2.5/codecs.py", line 424, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 57-58: invalid data

What was your point exactly?

How about you learn a few things about the language before you support it?

> Now supposing the character has a couple of broken characters in it, I could add errors='replace' to the open() call

So you're saying that you'd rather destroy user's data than do the right thing?

Seriously, what if I had put an 0xC2 0xA8 in there instead? That's valid to both utf-8 and iso-8859-1 and won't raise an error. You'll just get gibberish.

> and I'll get back Unicode data with the (standard) Unicode replacement character, instead of garbled crap.

I.e. You garble the entire string.

> Not the ideal solution, but the ideal solution would be for nobody to have invalid characters in the first place.

No, the ideal solution depends on why you're transcoding it in the first place.

> Ignoring broken characters doesn't make them go away, but handling them properly will,

There are no broken characters. The example I gave was simply in an unknown encoding. You would rather destroy the user's data, and I find that utterly sophomoric.

> and as an added bonus, if you want to let your users know that their data might be corrupt, you can do that. Not so if you're just shoveling raw bit strings around.

Yeah. Shoveling arrays of bignums around that you can't do anything with is so much better.

43 Name: ☆ゆたか☆ : 2008-02-05 20:54 ID:6EmpZiO7

Good morning!!

44 Name: dmpk2k!hinhT6kz2E : 2008-02-05 21:12 ID:Heaven

> That MzScheme doesn't support unicode correctly? What's your problem?

I would rather have some support than no support. That's where we differ. You're welcome to perfection, but you're using a dynamically-typed language and I doubt you're formally proving your code.

Having said that, it looks like Arc does provide for Unicode, at least so long as it's sitting on MzScheme. Good enough.

> Either your language has a locale-sensitive sort() or it doesn't.

Or maybe it works most of the time for the expected data. I'll make do with a few customers sometimes seeing partially unsorted data rather than all customers seeing completely unsorted data or throwing out a useful site feature. Almost all software development is about good enough.

> What's the point of striving for the most powerful language

How old is Ikarus or Termite Scheme? Clojure? Factor or Cat? They're more interesting than Arc and all are younger.

45 Name: #!/usr/bin/anonymous : 2008-02-05 22:01 ID:Heaven

> How about you learn a few things about the language before you support it?

First you randomly picked an encoding (utf8), and then claimed that, because you picked the wrong encoding, Python's unicode implementation is broken? In a perfect world there would only be one representation of the data on disk, but we have to deal with a lot of different encodings. That's when you make a decent effort to guess, and give the user a prompt in cases where you can't identify the data.

You seem to be assuming that users never want to read the data, and that all you're doing is tunneling it from one place to another, which is perfectly fine if all you're doing is writing a proxy script or the like. However, at some point the user is going to want to look at the data, and there's exactly no way at all to present it if you don't know what format it's in. After all, if you don't know the format, you can't manipulate anything, and unless your goal is to reimplement 'dd', what use is your program?

> So you're saying that you'd rather destroy user's data than do the right thing?

If it's supposed to be utf8, but it's got broken characters, it's already been destroyed. Unless your definition of the right thing is making a file-copy command that writes out exactly what it reads in, you're making assumptions about the input data that you can't verify without handling the encoding to some degree. What if the input is actually utf-16? Then suddenly all your functions that iterate line-by-line will destroy the data because the EOL is two bytes, and depending on how well you're handling the rest of the data, you might end up with nothing at all -- since utf-16 contains null bytes within the string itself.
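The UTF-16 point is easy to demonstrate in a few lines of Python:

```python
# UTF-16 text is full of NUL bytes, and its newline is two bytes,
# so byte-oriented line splitting mangles it. Decoding first works.
data = u"one\ntwo\n".encode("utf-16-le")
raw_lines = data.split(b"\n")                  # byte-level split: wrong
good_lines = data.decode("utf-16-le").splitlines()
```

The byte-level split leaves a stray NUL at the front of every line after the first, which is exactly the kind of silent corruption being described.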

You still have yet to supply anything to support your argument. All you're doing is hand-waving.

46 Name: #!/usr/bin/anonymous : 2008-02-05 23:32 ID:Heaven

>>45

> You still have yet to supply anything to support your argument. All you're doing is hand-waving.

I was going to say the same thing to you.

You haven't demonstrated what having unicode in the language is good for.

I've demonstrated that it's a good way to hide bugs.

>>44

> I'll make do with a few customers sometimes seeing partially unsorted data rather than all customers seeing completely unsorted data or throwing out a useful site feature.

You don't get it. You're making do with that because your language's sort operator has to work on its strings, and its sort operator cannot be useful to humans. You need a human-friendly sort, and you need to be aware of locales.

This is complicated stuff. Saying "mzscheme has it so arc has it" is naive. MzScheme's unicode support isn't better than anyone else's, and it causes problems.

47 Name: dmpk2k!hinhT6kz2E : 2008-02-05 23:41 ID:Heaven

> You don't get it.

Probably not. My understanding so far is that you're arguing that I should make no attempt to sort at all until it's perfect. Yes/no?

> You're making do with that because your language's sort operator has to work on its strings and it's sort operator cannot be useful to humans.

Could you rephrase this?

This thread has been closed. You cannot post in this thread any longer.