Democratic Underground

White House blocks search engines - Iraq content specifically

Home » Discuss » Archives » General Discussion (Through 2005)
DBoon Donating Member (1000+ posts) Mon Oct-27-03 04:56 PM
Original message
White House blocks search engines - Iraq content specifically
As "reported" in Slashdot:

http://yro.slashdot.org/comments.pl?sid=83767&threshold=0&commentsort=0&tid=103&tid=126&tid=95&tid=99&mode=thread&cid=7322178

Based on:

http://www.bway.net/~keith/whrobots/

It appears the White House public Web site is using a "robots.txt" file to stop search engines from cataloging much of its content. Apparently, much of the off-limits content has to do with Iraq. The suspicion is that the White House does not want search engines or other archival directories to keep a history of its statements on Iraq.
Rooktoven Donating Member (1000+ posts) Mon Oct-27-03 04:57 PM
Response to Original message
1. The referenced link:
 
RobertSeattle Donating Member (1000+ posts) Mon Oct-27-03 05:02 PM
Response to Original message
2. Wow - check it out!!!
(I'm not a robots.txt guru)

http://www.whitehouse.gov/robots.txt

Sample:
Disallow: /911/911day/iraq
Disallow: /911/911day/text
Disallow: /911/heroes/iraq
Disallow: /911/heroes/text
Disallow: /911/iraq
Disallow: /911/patriotism/iraq
Disallow: /911/patriotism/text
Disallow: /911/patriotism2/iraq
Disallow: /911/patriotism2/text
Disallow: /911/progress/iraq
Disallow: /911/progress/text
Disallow: /911/remembrance/iraq
Disallow: /911/remembrance/text
Disallow: /911/response/iraq
Disallow: /911/response/text
Disallow: /911/sept112002/iraq
Disallow: /911/sept112002/text
Disallow: /911/text
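For anyone else who isn't a robots.txt guru: a polite crawler treats each Disallow line as a path prefix and skips any URL path that starts with one. Here's a minimal sketch of that check in C++ (the language htdig is written in). It's a simplification: it ignores User-agent groups and wildcards, and the names parseDisallows and isDisallowed are made up for illustration, not taken from any real crawler.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Collect the path prefixes from every "Disallow:" line in a robots.txt body.
// Simplification: User-agent groups are ignored, so every Disallow line counts.
std::vector<std::string> parseDisallows(const std::string& robotsTxt) {
    std::vector<std::string> prefixes;
    std::istringstream in(robotsTxt);
    std::string line;
    const std::string key = "Disallow:";
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) != 0) continue;  // not a Disallow line
        const std::string value = line.substr(key.size());
        // Trim spaces, tabs, and a trailing '\r' left over from CRLF line endings.
        const auto first = value.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // empty "Disallow:" blocks nothing
        const auto last = value.find_last_not_of(" \t\r");
        prefixes.push_back(value.substr(first, last - first + 1));
    }
    return prefixes;
}

// A path is blocked if it begins with any disallowed prefix (plain prefix match).
bool isDisallowed(const std::string& path, const std::vector<std::string>& prefixes) {
    for (const auto& p : prefixes) {
        if (path.compare(0, p.size(), p) == 0) return true;
    }
    return false;
}
```

One consequence of plain prefix matching: "Disallow: /911/iraq" also covers /911/iraq/index.html and anything else whose path begins with those characters.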
 
htuttle Donating Member (1000+ posts) Mon Oct-27-03 05:03 PM
Response to Original message
3. Guess I'm going to have to (ht)dig there myself
I'm sure the crack White House IT staff is aware of it, but the 'robots.txt' file is more 'advisory' than anything else. It doesn't actually 'prevent' indexing, but most of the major search engines honor it.
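To put 'advisory' in concrete terms: the check happens entirely inside the crawler, so the server has no say in whether it is performed at all. A rough C++ sketch, assuming a hypothetical CrawlerConfig switch (real crawlers expose this, if at all, in their own ways):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical crawler setting. Nothing on the server enforces robots.txt,
// so whether it is consulted is decided here, on the client side.
struct CrawlerConfig {
    bool honorRobotsTxt = true;  // the polite default
};

bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Decide whether to fetch `path`, given the Disallow prefixes the site published.
bool shouldFetch(const std::string& path,
                 const std::vector<std::string>& disallowed,
                 const CrawlerConfig& cfg) {
    if (!cfg.honorRobotsTxt) return true;  // the server cannot block this request
    for (const auto& prefix : disallowed) {
        if (startsWith(path, prefix)) return false;
    }
    return true;
}
```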

Since they went to the trouble, I'll have to make sure to add the whole White House site to my local htdig database!

 
LiberalFighter Donating Member (1000+ posts) Mon Oct-27-03 05:10 PM
Response to Reply #3
4. Please explain
Edited on Mon Oct-27-03 05:10 PM by LiberalFighter
What does that involve?

I was thinking that if nothing else just download the whole site.

The major search engines should ignore the robots.txt of public government websites.
 
htuttle Donating Member (1000+ posts) Mon Oct-27-03 05:58 PM
Response to Reply #4
7. htDig is a webcrawler search/indexing engine
Very much like what Google or AltaVista use underneath (although both of theirs are custom). (I think bpilgrim has it running at http://globalfreepress.com for searching various things, for example.) The nice thing about it is that you just point it at a URL, and it will index the whole site, even a remote site.
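Roughly, 'point it at a URL and it indexes the whole site' means a breadth-first walk: fetch a page, pull out its links, queue the ones not seen yet, repeat. This C++ sketch swaps the network for a hard-coded, hypothetical LinkGraph map so it runs on its own; a real indexer like htdig also does the HTTP fetching, HTML parsing, and word indexing left out here.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Stand-in for "fetch the page and extract its links": a hypothetical site
// where each path maps to the paths it links to.
using LinkGraph = std::map<std::string, std::vector<std::string>>;

// Breadth-first crawl from `start`, visiting each reachable page exactly once.
// Returns pages in the order they would be fetched and indexed.
std::vector<std::string> crawl(const LinkGraph& site, const std::string& start) {
    std::vector<std::string> order;
    std::set<std::string> seen{start};
    std::queue<std::string> frontier;
    frontier.push(start);
    while (!frontier.empty()) {
        const std::string page = frontier.front();
        frontier.pop();
        order.push_back(page);  // a real indexer would store the page text here
        auto it = site.find(page);
        if (it == site.end()) continue;  // leaf page or dead link
        for (const auto& link : it->second) {
            if (seen.insert(link).second) frontier.push(link);  // queue unseen links
        }
    }
    return order;
}
```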

The website is here:

http://www.htdig.org/

To get it to ignore the robots.txt file, you need to hack it yourself, since it's considered 'impolite' to not honor the robots.txt file.

That being said, to hack it, merely search through the code that looks for the 'robots.txt' file, change it to something else, like 'norobots.txt', and recompile (I made that sound easy, didn't I?).

Note, this will cause your htdig engine to ignore ALL robots.txt files for all sites.

(obviously the htDig people do not recommend this, since it's rude, and goes against the 'Standards for Robot Exclusion', but hey, this is war...).

Finally, if all of this was unintelligible to you, be aware that it is a fairly technical process to set it all up. It would definitely be easier to just mirror the whole site locally, and search at will in that case.
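Why renaming the file works: under the robot exclusion convention, a site with no robots.txt at all is treated as having no restrictions. A crawler patched to ask for 'norobots.txt' gets a 'not found' back (assuming no such file exists on the server) and carries on as if everything were allowed. The C++ sketch below models that with a hypothetical fetch() stand-in in which std::nullopt plays the part of a 404; none of these names come from htdig.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of requesting a file from the server; nullopt models a 404.
using FetchResult = std::optional<std::string>;

// Hypothetical stand-in for the remote server: only "/robots.txt" exists on it.
FetchResult fetch(const std::string& path) {
    if (path == "/robots.txt") return std::string("User-agent: *\nDisallow: /911/iraq\n");
    return std::nullopt;
}

// Load Disallow prefixes from whichever exclusion file name the crawler asks for.
std::vector<std::string> loadRules(const std::string& exclusionFile) {
    std::vector<std::string> rules;
    const FetchResult body = fetch("/" + exclusionFile);
    if (!body) return rules;  // not found: no rules, so nothing is excluded
    std::istringstream in(*body);
    std::string line;
    const std::string key = "Disallow:";
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) != 0) continue;
        const auto start = line.find_first_not_of(' ', key.size());
        if (start != std::string::npos) rules.push_back(line.substr(start));
    }
    return rules;
}
```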
 
htuttle Donating Member (1000+ posts) Mon Oct-27-03 06:27 PM
Response to Reply #7
8. And for those of you playing along at home...
Edited on Mon Oct-27-03 06:30 PM by htuttle
Comment out Line 76 in htdig/Server.cc

From this:

robotstxt(doc);
break;

to this:
// robotstxt(doc);
break;


That should do it.

I'll post a link if I get it indexed and up on my website. The modified version just finished linking, but it will take a few hours to index the whole WH site.
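A toy model of why one commented-out call is enough: if the routine that reads robots.txt never runs, the exclusion list stays empty and every path passes. This is NOT htdig's actual Server class; only the names Server and robotstxt are borrowed from the post, and the parseRobots flag is a hypothetical stand-in for the comment-out so the stock and patched behavior can be compared side by side.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Toy model only, not htdig's real Server.cc.
class Server {
public:
    explicit Server(bool parseRobots) : parseRobots_(parseRobots) {}

    // Called when the robots.txt document comes back from the remote host.
    void onRobotsDocument(const std::string& doc) {
        if (parseRobots_) robotstxt(doc);  // the patched build never calls this
    }

    // With an empty exclusion list, every path is allowed.
    bool allowed(const std::string& path) const {
        for (const auto& p : disallow_)
            if (path.compare(0, p.size(), p) == 0) return false;
        return true;
    }

private:
    // Hypothetical parser: keeps the prefix from each non-empty "Disallow: " line.
    void robotstxt(const std::string& doc) {
        std::istringstream in(doc);
        std::string line;
        const std::string key = "Disallow: ";
        while (std::getline(in, line))
            if (line.size() > key.size() && line.compare(0, key.size(), key) == 0)
                disallow_.push_back(line.substr(key.size()));
    }

    bool parseRobots_;
    std::vector<std::string> disallow_;
};
```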
 
htuttle Donating Member (1000+ posts) Mon Oct-27-03 06:54 PM
Response to Reply #8
9. I'm an idiot...
I always do things the hard way.

:eyes:

I just realized that instead of building a custom htdig, I could simply set this htdig.conf variable to make the robot identify itself by a different 'robot name'.

Changing this directive in the config file:
robotstxt_name: htdig

to this (or adding it if it's not already there):
robotstxt_name: whsearch

will cause it to use the same, almost nonexistent exclude list that the White House's own search engine uses (so it includes all the documents), as opposed to the huge, paranoid exclude list mentioned above.

The idea is that your htdig would then appear to be the White House's own search engine, and wouldn't exclude any pages from your index.

Oh well, the custom one I built is already well into indexing...I'm going to let it run and see what I get.
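The robotstxt_name trick works because a robots.txt file can hold separate rule groups, each introduced by a User-agent line, and a crawler applies the group matching its own name, falling back to the catch-all '*' group. Here's a minimal C++ sketch of that selection, assuming exact name matches and one User-agent line per group (the real matching rules are looser); parseGroups and rulesFor are made-up names.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Group Disallow prefixes under the User-agent line that precedes them.
// Simplification: one User-agent line per group, names compared exactly.
std::map<std::string, std::vector<std::string>> parseGroups(const std::string& robotsTxt) {
    std::map<std::string, std::vector<std::string>> groups;
    std::istringstream in(robotsTxt);
    std::string line, agent;
    const std::string ua = "User-agent: ", dis = "Disallow: ";
    while (std::getline(in, line)) {
        if (line.compare(0, ua.size(), ua) == 0) {
            agent = line.substr(ua.size());
            groups[agent];  // create the group even if it ends up with no rules
        } else if (!agent.empty() && line.size() > dis.size() &&
                   line.compare(0, dis.size(), dis) == 0) {
            groups[agent].push_back(line.substr(dis.size()));
        }
    }
    return groups;
}

// Rules for a crawler calling itself `robotName`: its own group if present,
// otherwise the "*" group, otherwise no rules at all.
std::vector<std::string> rulesFor(const std::string& robotsTxt, const std::string& robotName) {
    const auto groups = parseGroups(robotsTxt);
    if (auto it = groups.find(robotName); it != groups.end()) return it->second;
    if (auto it = groups.find("*"); it != groups.end()) return it->second;
    return {};
}
```

For example, given a file with a '*' group that disallows /911/iraq and /911/text, plus a 'whsearch' group that disallows only a hypothetical /cgi-bin (the real whsearch list isn't quoted in this thread), rulesFor(file, "htdig") returns the two-entry '*' list, while rulesFor(file, "whsearch") returns just /cgi-bin.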
 
denverbill Donating Member (1000+ posts) Mon Oct-27-03 05:51 PM
Response to Reply #3
6. Yes, please do.
And let us know what you find.

Bush must've made a big speech last year in his big buildup to war, back when he still 'hadn't decided' to go to war (yuck, yuck).
 
JohnOneillsMemory Donating Member (1000+ posts) Mon Oct-27-03 05:30 PM
Response to Original message
5. Invisible tech wars abound. I know lots of folx whose comps have slowed.
Whether it is from viruses or spyware, I don't know. But many people I know here in the North Bay area are having slowed web-page problems, both loading and navigating. Being naturally paranoid, we wonder if there aren't regional attacks and roadblocks put on the internet as well.
 
Generic Other Donating Member (1000+ posts) Mon Oct-27-03 07:34 PM
Response to Original message
10. A simpleton's analysis
The White House takes a speech we all hear on TV, full of lies, evasiveness and pure bull, all delivered in the president's halting, tongue-twisted syntactical style. An aide quickly cleans up the script, repairs the prose and posts the lies online in an "edited" form which makes it sound as if the original words made sense. At a later date, someone else restricts access to the laundered lies, which have since formed the foundation of more lies that have only multiplied, compounding the original deceit. All of this to hide the trail of a simpleminded fool.

So what are they afraid of? The boogeyman?
 
supercrash Donating Member (412 posts) Mon Oct-27-03 07:40 PM
Response to Reply #10
11. Hehe
Sounds like a very efficient Ministry of Truth

Powered by DCForum+ Version 1.1 Copyright 1997-2002 DCScripts.com
Software has been extensively modified by the DU administrators


Important Notices: By participating on this discussion board, visitors agree to abide by the rules outlined on our Rules page. Messages posted on the Democratic Underground Discussion Forums are the opinions of the individuals who post them, and do not necessarily represent the opinions of Democratic Underground, LLC.

© 2001 - 2011 Democratic Underground, LLC