What does this methodology predict for out-of-sample data? Used to predict 2008 from averages through 2004 and registration from 2008, it underestimates Obama’s 2008 vote by 7% and overestimates McCain’s by 22%.
There is a lot of variance in the Cuyahoga registration numbers and in the percent of independents captured by the respective parties. The standard deviation in registration numbers for D’s is 26% of the mean, for R’s it’s 19% of the mean, and for I’s it’s 24% of the mean. For the percent of independents captured, the standard deviation for D’s is 9% of the mean and for R’s it’s 14% of the mean.
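For anyone who wants to check spread figures like these, here is a minimal sketch of the "standard deviation as a percent of the mean" calculation (usually called the coefficient of variation). The registration counts in it are made-up placeholders, not the Cuyahoga numbers, and the function name is my own. It uses the population standard deviation; the post does not say which version was used, so that choice is an assumption.

```python
from statistics import mean, pstdev

def std_as_pct_of_mean(values):
    """Return the population standard deviation as a percent of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, so the ratio is undefined")
    return 100.0 * pstdev(values) / m

# Hypothetical registration counts for one party across several elections.
# Placeholders for illustration only, NOT real Cuyahoga data.
example_registration = [200_000, 260_000, 310_000, 180_000]

print(f"{std_as_pct_of_mean(example_registration):.1f}% of the mean")
```

With only a handful of elections in the series, swapping pstdev for stdev (the sample version) would push the percentage up somewhat.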
I’m not a statistician, but I don’t think this model has any meaningful predictions to make, since the numbers it’s based on are themselves highly variable due to factors not used by the model, like what’s happening in the world and whether the candidate is good or terrible. Extrapolating such a weak model from one county over the entire state makes even less sense.
Any competent statisticians or modelers out there care to comment?
LS has been doing excellent work too. He and Ravi are digging down and showing us the facts.
Was a professional math man for 16 years (actuary).
It is impossible to make firm predictions, due to Rs in 2008 voting for Hitlery in the primary, same for Demonrats for Romney/Akin in 2012, etc. Throw in a Demon SoS in 2008, vs. an R in 2012, and I get a 10 point swing, at least, modified by demographics giving Obama another 2, net-net Romney by 2 or more. Standard dev on the ground Oct-Nov is about 2.4% on this, so say 80% chance Romney wins Ohio, 78% chance Romney wins. But don’t bet on PA, unless Philly has a near-honest vote.
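A rough way to see how a 2-point margin and a 2.4-point standard deviation turn into a win chance near 80% is to treat the final margin as normally distributed. The post does not say how its figure was derived, so the normal assumption and the function name below are mine, not the poster's.

```python
import math

def win_probability(expected_margin, std_dev):
    """P(final margin > 0), assuming margin ~ Normal(expected_margin, std_dev)."""
    z = expected_margin / std_dev
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Figures taken from the post: a 2-point margin, a 2.4-point standard deviation.
p_ohio = win_probability(2.0, 2.4)
print(f"Chance of winning Ohio: {p_ohio:.1%}")
```

Under that assumption the result comes out at roughly 80%, in line with the Ohio figure above. A larger expected margin ("2 or more") would push it higher.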
I am convinced that the only way to poll this election is to call only people who voted in 2008, using their voting history information, then screen for who is planning to vote this year, which cuts down and changes the model. Then model the new voters, the ones who did not vote in 2008.
4 groups of voters:
yes ‘08, yes ‘12
yes ‘08, no ‘12
no ‘08, yes ‘12
no ‘08, no ‘12
Then you can weight by the McCain vs. Obama horserace to see if the first 2 groups give you the correct numbers against the actual ‘08 vote.
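The grouping and the weighting step can be sketched roughly like this. Everything in it is hypothetical: the respondent records, the field names, and the 50/50 target split are placeholders for illustration, not real poll data or the real 2008 result.

```python
from collections import Counter

def voter_group(voted_2008, plans_2012):
    """Label a respondent with one of the four groups listed above."""
    a = "yes" if voted_2008 else "no"
    b = "yes" if plans_2012 else "no"
    return f"{a} '08, {b} '12"

def recall_weights(respondents, actual_2008_share):
    """Weights that make the sample's recalled '08 vote match the actual result."""
    voters_2008 = [r for r in respondents if r["voted_2008"]]
    counts = Counter(r["vote_2008"] for r in voters_2008)
    total = sum(counts.values())
    # weight = actual share / share observed among sampled 2008 voters
    return {cand: actual_2008_share[cand] / (n / total)
            for cand, n in counts.items()}

# Hypothetical respondents drawn from a voter file with vote history.
respondents = [
    {"voted_2008": True, "vote_2008": "Obama", "plans_2012": True},
    {"voted_2008": True, "vote_2008": "Obama", "plans_2012": True},
    {"voted_2008": True, "vote_2008": "Obama", "plans_2012": False},
    {"voted_2008": True, "vote_2008": "McCain", "plans_2012": True},
    {"voted_2008": False, "vote_2008": None, "plans_2012": True},
]

# Placeholder target split, NOT the real 2008 result.
target = {"Obama": 0.5, "McCain": 0.5}

groups = Counter(voter_group(r["voted_2008"], r["plans_2012"]) for r in respondents)
weights = recall_weights(respondents, target)
print(groups)
print(weights)
```

In this toy sample the Obama recall is overrepresented, so those respondents get weighted down and the lone McCain voter gets weighted up. The "no ‘08, yes ‘12" group gets no weight from this step and has to be modeled separately.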
This is not simply a polling problem; it requires research on how many new voters to expect ... the 18 to 21 age group is all new.
Since most polls do not use actual voting lists with voting history ... most are useless, they cannot even identify which party the person belonged to in ‘08.
Honored to reply to a post by the famous Buckhead. Thanks for 2004....
All very good points. I like the spreadsheet but it really is not completely determinative as you say. There are quite a few variables at play here.