
White House blocking site directories called "Iraq" from search engines

This topic is archived.
Home » Discuss » Archives » General Discussion (Through 2005) Donate to DU
 
chascarrillo Donating Member (1000+ posts) Sun Oct-26-03 05:31 PM
Original message
White House blocking site directories called "Iraq" from search engines
Edited on Sun Oct-26-03 05:43 PM by chascarrillo
(Sorry for the unwieldy title - if some mod wants to change this to be somewhat legible, go ahead!)

From: http://condi.topcities.com/whrobots/index.html

Why is whitehouse.gov (the official White House website) disallowing "Iraq" directories from search engine crawling?

<...>

As of Oct 24, 2003 the robots.txt file at whitehouse.gov (you can access the current version here or an archived version, here) is 1,631 lines long. There are two blank lines between sections, one line (at the very top) that identifies the file, and 8 lines at the very bottom that are instructions to a user-agent called "whsearch", which appears to be the internal whitehouse.gov crawler. The bulk of the file is the section directed to all external search engine robots / crawlers / spiders, which is 1,620 lines long and has 1,620 "Disallow" statements.
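For readers who want to check a breakdown like this themselves, here is a minimal Python sketch that splits a robots.txt file into its user-agent sections and counts the "Disallow" lines in each. The SAMPLE text is a made-up, heavily shortened stand-in that only mimics the layout described above (an identifying comment line, a section for all robots, a section for "whsearch"); it is not the real whitehouse.gov file, and the function name count_disallows is my own.

```python
# Hypothetical, shortened sample in the layout described in the post.
# It is NOT the real whitehouse.gov robots.txt.
SAMPLE = """\
# robots.txt (identifying comment line)

User-agent: *
Disallow: /holiday/2002/barney/iraq
Disallow: /kids/eggroll/iraq
Disallow: /kids/eggroll/text

User-agent: whsearch
Disallow: /cgi-bin
"""


def count_disallows(robots_txt):
    """Return {user_agent: number of Disallow lines}, in file order."""
    counts = {}
    current_agents = []
    in_rules = False  # True once a Disallow line has followed the User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rules starts a new section.
                current_agents = []
                in_rules = False
            current_agents.append(value)
            counts.setdefault(value, 0)
        elif field == "disallow":
            in_rules = True
            for agent in current_agents:
                counts[agent] += 1
    return counts


counts = count_disallows(SAMPLE)
for agent, n in counts.items():
    print(f"{agent}: {n} Disallow lines")
```

Run against a saved copy of the real file, the same function should report one count for "*" (all external robots) and one for "whsearch", which can be compared with the figures quoted above.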

There are 862 instances of the term "text" in the file, which is easily explained because whitehouse.gov generally uses directory paths that end in "text" for printable pages -- the pages that are duplicates of the normal display pages except that they are formatted for printing. It's easy to see why the term "text" appears so often in this file, since disallowing these directories helps lessen the "clutter" in search by excluding the essentially duplicate pages.

There are 783 instances of the term "iraq" in this file, almost all of them appended to paths that already exist in the file. These appear to have been added haphazardly, since the term appears in many path names for which no such terminal "iraq" directory exists, such as:

Disallow: /holiday/2002/barney/iraq

or

Disallow: /kids/eggroll/iraq

However, this robots.txt file does exclude external search engine robots from some 75 directories that actually exist on whitehouse.gov.
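The term counts can be reproduced in much the same way. The sketch below, again in Python, counts how often a term appears anywhere in a robots.txt file and lists the Disallow paths whose last directory is that term. The two "iraq" paths are the ones quoted above; the "text" path and the names term_stats and SAMPLE are assumptions made up for illustration.

```python
def term_stats(robots_txt, term):
    """Return (total occurrences of term, Disallow paths ending in term)."""
    term = term.lower()
    total = robots_txt.lower().count(term)  # case-insensitive substring count
    terminal = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            segments = [s for s in path.split("/") if s]
            # Keep only paths whose final directory is the term itself.
            if segments and segments[-1].lower() == term:
                terminal.append(path)
    return total, terminal


# Hypothetical sample: the two "iraq" paths come from the post,
# the "text" path is invented for illustration.
SAMPLE = """\
User-agent: *
Disallow: /holiday/2002/barney/iraq
Disallow: /kids/eggroll/iraq
Disallow: /kids/eggroll/text
"""

total, terminal = term_stats(SAMPLE, "iraq")
print(total, terminal)
```

Checking whether each listed path actually exists on the site is a separate step that this sketch does not attempt.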


<...>

Google's cache (retrieved from Google on 10/26/03, but actual caching date unspecified) of whitehouse.gov robots.txt. I've archived the cache as it is at this writing here. This file is 1,579 lines long, with 754 instances of "iraq."

The most current whitehouse.gov file archived at the Internet Archive is from April 16, 2003. This file is 780 lines long, with only 10 instances of the word "iraq."

Sometime between April 2003 and late October, 2003, hundreds of instances of the term "iraq" were added to the whitehouse.gov robots.txt file.

E_Zapata Donating Member (1000+ posts) Sun Oct-26-03 05:36 PM
Response to Original message
1. Andy? Andy Card? WTF are you up to, you clever little devil?
tsk tsk
 
w13rd0 Donating Member (1000+ posts) Sun Oct-26-03 05:42 PM
Response to Original message
2. So they are attempting to prevent caching...
...of anything having to do with Iraq. Likely so they can change things at a later time and then pretend "it's always been that way". More mischief from the crooks occupying the people's house.
 
Cocoa Donating Member (1000+ posts) Sun Oct-26-03 05:57 PM
Response to Reply #2
6. preemptive revisionist history
They got burned with their "Mission accomplished" scrubbing. Didn't someone show that scrubbed page on the Senate floor, or was that something else?
 
LeahMira Donating Member (1000+ posts) Sun Oct-26-03 05:42 PM
Response to Original message
3. OK, techies...
So I didn't even understand the "for dummies" explanation. In language you would use with your seven-year-old, please explain what is going on here.

Is the White House blocking access to just stuff coming from the White House or is it blocking everything that anybody puts up? Or do I have no idea what's happening?

Sign me... :dunce:
 
chascarrillo Donating Member (1000+ posts) Sun Oct-26-03 05:45 PM
Response to Reply #3
4. Just White House stuff
The robots.txt file asks automated search engine robots not to crawl or index the listed documents on the site; well-behaved robots honor it. It only applies to the site in question: in this instance, whitehouse.gov.
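To make that concrete, here is a small sketch using Python's standard urllib.robotparser module, which applies robots.txt rules the way a well-behaved crawler would. The RULES text is a hypothetical two-section stand-in (the actual contents of the "whsearch" section are not quoted in this thread), and "ExampleBot" is an invented name standing in for any external search engine robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one section for the internal crawler, one for everyone else.
# The /cgi-bin rule for whsearch is an assumption, not taken from the real file.
RULES = """\
User-agent: whsearch
Disallow: /cgi-bin

User-agent: *
Disallow: /kids/eggroll/iraq
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # load rules from text instead of fetching a URL

url = "http://www.whitehouse.gov/kids/eggroll/iraq"

# An external robot falls under the "*" section, which disallows this path.
external_ok = parser.can_fetch("ExampleBot", url)

# The internal crawler has its own section, which does not list this path.
internal_ok = parser.can_fetch("whsearch", url)

print("external robot may fetch:", external_ok)
print("whsearch may fetch:", internal_ok)
```

With these sample rules the external robot is turned away while "whsearch" is not, since each robot follows only the section addressed to it. The parser holds one site's rules only, so nothing here has any effect on pages hosted anywhere else.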
 
Stephanie Donating Member (1000+ posts) Sun Oct-26-03 05:47 PM
Response to Reply #4
5. Does it only concern external search engines?
Would you still pull up the text if you searched directly at whitehouse.gov?
 
snippy Donating Member (1000+ posts) Sun Oct-26-03 06:06 PM
Response to Original message
7. This is so typical of these evil little fucks.
 
kentuck Donating Member (1000+ posts) Sun Oct-26-03 06:35 PM
Response to Original message
8. Anybody else have al Jazeera taken off history or favorites?
:shrug:
 
kentuck Donating Member (1000+ posts) Sun Oct-26-03 06:47 PM
Response to Reply #8
9. Sorry.....
It was in DU's history. It does not save the site when entered.
 


© 2001 - 2011 Democratic Underground, LLC