archive.org appreciation. (5)

1 Name: 404 - Name Not Found 2005-04-27 14:34 ID:Heaven

Archive.org and its "Wayback Machine" (silly hippies) are a big foundation that mirrors the internet. They are nice enough to comply with robots.txt. However, out of sheer boredom tonight, while talking to people on ICQ, I found some websites that archive.org has kept mirrors of, for all of us.

Before I continue, these links are obviously NOT FUCKING WORKSAFE PEOPLE

Firstly, we have goatse, who lost their domain (thanks, Christmas Island, ho ho ho to you too):
http://web.archive.org/web/*/http://www.goatse.cx
Lemonparty: http://web.archive.org/web/20040210024601/http://lemonparty.org/
Bakla: http://web.archive.org/web/20030801170659/http://www.bakla.net/
And hell, why not. TUBGIRL : http://web.archive.org/web/*/http://www.tubgirl.com
So you can see, posters, that archive.org has a real copy of the internet. Who needs censorship?
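
For anyone who wants to build these links themselves: going by the URLs above, the pattern looks like the archive prefix, then either `*` (list every capture) or a 14-digit timestamp, then the original address. Here is a rough Python sketch of that reading. The function name `wayback_url` is made up for illustration, and treating the timestamp as year, month, day, hour, minute, second is an assumption drawn from the links in this post, not from any official documentation.

```python
from datetime import datetime

# Prefix copied from the links in this post.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(original_url, when=None):
    """Build a Wayback-style link (hypothetical helper).

    when=None gives the "*" form that lists all captures;
    otherwise the datetime is rendered as a 14-digit stamp,
    assumed here to be YYYYMMDDhhmmss.
    """
    stamp = "*" if when is None else when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


if __name__ == "__main__":
    # example.com is a placeholder, not one of the sites above.
    print(wayback_url("http://example.com"))
    print(wayback_url("http://example.com/", datetime(2004, 2, 10, 2, 46, 1)))
```

Whether a capture actually exists at a given timestamp is a separate question; this only assembles the string.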

2 Name: 404 - Name Not Found 2005-04-27 15:02 ID:Heaven

Our ancestors:

http://web.archive.org/web/*/http://world2ch.net

Cool world2ch threads can be found here: http://wakaba.c3.cx/soc/kareha.pl/1099711854

> that archive.org has a real copy of the internet.

Except they respect robots.txt :-(

3 Name: 404 - Name Not Found 2005-04-27 17:39 ID:ojVznUEM

What is robots.txt?

4 Name: 404 - Name Not Found 2005-04-27 18:26 ID:Heaven

>>3

"The robots exclusion standard or robots.txt protocol is a convention to prevent well-behaved web spiders and other web robots from accessing all or part of a website. The information specifying the parts that should not be accessed is specified in a file called robots.txt in the top-level directory of the website."

http://en.wikipedia.org/wiki/Robots.txt
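
To make the quote a bit more concrete, here is a small Python sketch of how a well-behaved spider might check a robots.txt before fetching a page, using the standard library's `urllib.robotparser`. The sample file contents are invented for illustration, and the user-agent name `ia_archiver` is an assumption about what the archive's crawler answers to; `example.com` and `SomeOtherBot` are placeholders too.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: shuts out one named crawler entirely and
# keeps every other robot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network
    # access is needed for this demonstration.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "SomeOtherBot"):
        for page in ("http://example.com/index.html",
                     "http://example.com/private/notes.html"):
            print(agent, page, allowed(agent, page))
```

Note that this is purely a convention: nothing stops a badly behaved robot from ignoring the file, which is why the quote says "well-behaved".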
