This is a carry-over from the following thread:
http://www.democraticunderground.com/discuss/duboard.php?az=show_topic&forum=104&topic_id=495211

In that thread I was trying to make two points. First, if you compare the "no" votes on the recall question to the votes for Arnold on the candidate question, Arnold's "lead" is almost nonexistent. Second, even this "lead" is not valid, because an unknown number of voters voted "no" on the recall and also voted for Arnold on the candidate question, which indicates that their first preference was to keep Gray Davis in office and that having Arnold replace Davis was only their second choice. If that unknown number of voters (which might be estimated through exit poll data) is greater than Arnold's "lead," then a plurality of voters preferred keeping Davis to replacing him with Arnold, and it was only the inherently unfair format of the election that made it seem otherwise.
When I first crunched the numbers, 96-97% of precincts were reporting (I didn't note the exact percentage, but I recall it was 96 point something). This was at 5:58 Pacific Time. Arnold was leading the "no" votes by only 38,000 votes, or 1%. It was certainly plausible at that time that Arnold's "lead" would disappear if you mentally subtracted the votes that overlapped with a "no" vote on the recall. Over 42,000 people voted for Huffington even though she had dropped out of the race, so 38,000 voters opposing the recall but choosing Schwarzenegger as their next choice was not beyond belief.
Anyway, I wanted to keep my numbers updated as more precincts reported in. That's when I noticed that the numbers for Arnold were starting to change. (See post #12 in the referenced thread.) Four hours later, his "lead" over the "no" votes had grown to 3.3%. I subtracted the earlier numbers from the later numbers to isolate the makeup of just those votes that had been reported during that four-hour span. They differed dramatically from the data for the first 96-97% of precincts reporting. I made the following observations:
"Does it seem plausible that these later votes coming in would be so different from the first 96% or so of precincts reporting? This is a sincere question and not an accusation. I know just a little about statistics and even less about how votes are reported in elections. I realize that this is not a random sample and that there may be logical reasons for late-reporting precincts to differ from early-reporting precincts. Maybe they use different voting technologies that take different amounts of time to process votes. Or maybe larger precincts take more time than smaller ones. But it still seems like a very large difference to me. Because you are comparing a quarter-million votes against 7 or 8 million votes, you would expect the larger sample to be more 'stable' (I don't know the statistical term). But this quarter-million was able to make a noticeable difference in the later totals compared to the earlier totals. Anyone with any knowledge of statistics who can shed some light here? Also, does anyone know what determines the order in which precincts report? Are conservative areas traditionally the last to report?"
I'd still like to know if anyone can answer any of those questions. I know that Bev posted my data on the BBV forum. Anyway, I'm going to post the final numbers with 100% of precincts reporting, compared to the data with 96-97% reporting, and finally a look at just the votes for the last 3-4% of precincts reporting. I'll do that in a separate post so that I can use plain text and have the columns line up better...
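For anyone who wants to repeat the subtraction I described (later snapshot minus earlier snapshot), here is a rough Python sketch. The counts in it are made-up placeholders, NOT the real returns, and the names late_batch and lead_pct are just ones I picked for illustration. I am also assuming the "lead" percentage is measured relative to the "no" vote count; if you measure it against total ballots, the percentages will come out different.

```python
def late_batch(earlier, later):
    """Votes reported between two snapshots: later totals minus earlier totals."""
    return {choice: later[choice] - earlier[choice] for choice in later}


def lead_pct(counts, leader="arnold", trailer="no_on_recall"):
    """Leader's margin over the trailer, as a percentage of the trailer's votes.

    Assumption: the percentage base is the trailer's count, not total ballots.
    """
    return 100.0 * (counts[leader] - counts[trailer]) / counts[trailer]


# Placeholder counts for illustration only; substitute the real snapshots.
earlier = {"arnold": 1000, "no_on_recall": 990}
later = {"arnold": 1100, "no_on_recall": 1050}

batch = late_batch(earlier, later)
print("late-reported votes:", batch)
print("lead in earlier snapshot (%):", round(lead_pct(earlier), 1))
print("lead in later snapshot (%):", round(lead_pct(later), 1))
print("lead within late batch only (%):", round(lead_pct(batch), 1))
```

The point of looking at the late batch by itself is that a small batch can only move the overall margin noticeably if its own makeup is very different from everything reported before it.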