Posted on 07/11/2012 11:22:45 AM PDT by Ernest_at_the_Beach
Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters’ egregious math error. He claimed that a run of high US temperatures had only a 1 in 1.6 million chance of being a natural occurrence. Here’s his claim:
U.S. heat over the past 13 months: a one in 1.6 million event
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
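For reference, the 1 in 1,594,323 figure is simply (1/3)^13, the chance that 13 independent months all land in the top third. A minimal Python sketch of that arithmetic follows. The reading of how the year 124,652 was reached is an assumption, since the quoted text does not show the calculation.

```python
# Odds that 13 independent months each land in the warmest third.
odds_against = 3 ** 13
print(odds_against)  # 1594323

# Assumption: the quoted year matches the posting year plus odds / 13.
# The source does not state how the year was actually derived.
implied_year = 2012 + odds_against // 13
print(implied_year)  # 124652
```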
All of the other commenters pointed out reasons why he was wrong … but they didn’t get to what is right.
Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.
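The counting step described above can be sketched in Python. This is only an illustration under stated assumptions: each month is ranked against the history of the same calendar month, the top third is taken as the warmest len // 3 values, and the function names and toy data are hypothetical rather than the author's actual spreadsheet.

```python
from collections import defaultdict

def top_third_flags(records):
    """records: iterable of (year, month, temperature) tuples.

    Returns {(year, month): bool}, True when that month ranks in the
    warmest third of all values recorded for the same calendar month.
    """
    records = list(records)
    by_month = defaultdict(list)
    for year, month, temp in records:
        by_month[month].append(temp)

    cutoffs = {}
    for month, temps in by_month.items():
        ordered = sorted(temps)
        k = len(ordered) // 3  # size of the top third (assumed definition)
        cutoffs[month] = ordered[-k] if k else float("inf")

    return {(y, m): t >= cutoffs[m] for y, m, t in records}

def june_to_june_count(flags, start_year):
    """Count top-third months from June of start_year through the next June."""
    span = [(start_year, m) for m in range(6, 13)]
    span += [(start_year + 1, m) for m in range(1, 7)]
    return sum(flags[key] for key in span)

# Hypothetical toy data: nine years in which every month warms steadily.
toy = [(y, m, float(y)) for y in range(2000, 2009) for m in range(1, 13)]
flags = top_third_flags(toy)
print(june_to_june_count(flags, 2006))
```

Tallying `june_to_june_count` over every start year gives the kind of histogram shown in Figure 1.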
Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they have a Poisson distribution with lambda = 5.206, and N (number of 13-month intervals) = 116. The value of lambda has been fit to give the best results.
The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data which represents discrete occurrences, as in this case. Poisson distributions cover things like how many people you’ll find in line in a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.
Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.
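The expected value quoted above can be checked with a few lines of Python. The sketch uses the standard Poisson probability mass function with the fitted lambda = 5.206 and N = 116 from Figure 1; the function name is just an illustrative choice.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected number of June-to-June periods (out of 116) with all 13 months
# in the top third, if the counts follow a Poisson distribution.
expected_all_13 = 116 * poisson_pmf(13, 5.206)
print(round(expected_all_13, 2))
```

The result comes out at roughly 0.2, so seeing one such period in the record is not far from what the fitted curve predicts.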
Once I did that analysis, though, I thought “Wait a minute. Why June to June? Why not August to August, or April to April?” I realized I wasn’t looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13-month periods, from January-to-January to December-to-December.
So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.
Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they have a Poisson distribution with lambda = 5.213, and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fit to give the best results.
Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.
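Counting over every contiguous 13-month period is a sliding-window tally. The Python sketch below assumes a chronological list of True/False flags (one per month, True meaning top third); the flag list here is a hypothetical placeholder, not the NOAA data. A series of n months yields n - 12 such windows, so 1374 windows would correspond to 1386 monthly values.

```python
from collections import Counter

def window_counts(flags, width=13):
    """Number of True flags in each contiguous window of `width` months."""
    if len(flags) < width:
        return []
    running = sum(flags[:width])
    counts = [running]
    for i in range(width, len(flags)):
        # Slide one month forward: add the new month, drop the oldest.
        running += flags[i] - flags[i - width]
        counts.append(running)
    return counts

# Hypothetical placeholder series: every third month flagged.
demo_flags = [i % 3 == 0 for i in range(1386)]
counts = window_counts(demo_flags)
histogram = Counter(counts)  # how many windows had 0, 1, 2, ... flagged months
print(len(counts))  # 1374
```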
The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.
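For readers without Excel, an iterative fit of this kind can be approximated with a simple grid search that minimizes the RMS error between the histogram and N times the Poisson probabilities. This is a stand-in for the solver, not the author's procedure, and the histogram used in the demo is synthetic (generated from a known lambda) because the actual bin counts are not listed in the post.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def rms_error(observed, n_periods, lam):
    """RMS difference between observed bin counts and Poisson expectations."""
    squared = [(obs - n_periods * poisson_pmf(k, lam)) ** 2
               for k, obs in enumerate(observed)]
    return math.sqrt(sum(squared) / len(observed))

def fit_lambda(observed, n_periods, lo=0.1, hi=13.0, step=0.001):
    """Grid search for the lambda with the smallest RMS error."""
    steps = int((hi - lo) / step) + 1
    candidates = (lo + i * step for i in range(steps))
    return min(candidates, key=lambda lam: rms_error(observed, n_periods, lam))

# Synthetic histogram for bins 0..13, built from lambda = 5.2 (hypothetical).
demo_hist = [116 * poisson_pmf(k, 5.2) for k in range(14)]
fitted = fit_lambda(demo_hist, 116)
print(round(fitted, 3))
```

Running the same fit on the June-to-June histogram and on the all-windows histogram is what allows the two lambdas to be compared.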
Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so in fact, far from Jeff Masters’ claim that finding 13 in the top third is a one in a million chance, my results show that finding only one case with all thirteen in the top third is actually below the number that we would expect given the size and the nature of the dataset …
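To see how far apart the two views are, the Python sketch below compares the expected number of all-13 periods among 1374 windows under two models: independent months (a binomial with p = 1/3, which is the assumption behind the 1 in 1.6 million figure) and the fitted Poisson with lambda = 5.213. The variable names are illustrative only.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

n_windows = 1374

# Independent months: every month has a 1/3 chance of a top-third ranking.
expected_independent = n_windows * binomial_pmf(13, 13, 1 / 3)

# Fitted Poisson from Figure 2.
expected_poisson = n_windows * poisson_pmf(13, 5.213)

print(expected_independent, expected_poisson)
```

The independent-months model expects well under one thousandth of a period, while the Poisson fit expects between two and three, in line with the figure quoted above.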
w.
Data Source, NOAA US Temperatures, thanks to Lucia for the link.
****************************************************************
I will look thru them and post a few.
*****************************EXCERPT***********************************************
Murray Grainger says:
1 in 2.6 is close enough to 1 in 1.6 million for the average climate alarmist; what’s your beef? Nothing that a little data adjustment won’t fix.
**************************************EXCERPT********************************************
Willis, can I please borrow your brain for a few days? I could make a gazillion dollars with all that extra smarts and speed of thought. I can’t even comprehend the amount of work and effort that it even took to come up with the line of analysis, let alone sit down with the data. But then, I still don’t have your brain. But then, unfortunately, not many scientists do either.
Thank you, Sir.
**************************************EXCERPT******************************************
Steve R says:
This whole 1 in 1.6 million issue has been great entertainment. It’s also been an eye opener, to see so many climate scientists struggling with a fairly basic statistical concept.
*************************************************************
Poisson distribution From Wikipedia, the free encyclopedia
**************************************EXCERPT*****************************************
In probability theory and statistics, the Poisson distribution (pronounced [pwasɔ̃]) is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.[1] (The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.)
Suppose someone typically gets 4 pieces of mail per day. That becomes the expectation, but there will be a certain spread: sometimes a little more, sometimes a little less, once in a while nothing at all.[2] Given only the average rate, for a certain period of observation (pieces of mail per day, phone calls per hour, etc.), and assuming that the process, or mix of processes, that produce the event flow are essentially random, the Poisson distribution specifies how likely it is that the count will be 3, or 5, or 11, or any other number, during one period of observation. That is, it predicts the degree of spread around a known average rate of occurrence.[2]
The distribution's practical usefulness has been explained by the Poisson law of small numbers.[3]
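The mail example in the excerpt can be made concrete with a short Python sketch. It evaluates the Poisson formula P(k) = e^(-lambda) * lambda^k / k! at lambda = 4 for the counts the excerpt mentions (0, 3, 5 and 11); the function name is an illustrative choice.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average is lam per period."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

average_mail = 4  # pieces of mail per day, as in the excerpt
for pieces in (0, 3, 5, 11):
    print(pieces, round(poisson_pmf(pieces, average_mail), 4))
```

Counts near the average of 4 are the most likely, and a day with 11 pieces is rare but not impossible.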
**************************************EXCERPT************************************
Dave Wendt says:
It would appear that inadvertently Mr. Masters, or whoever provided him with his numbers, has arrived at a ratio that is quite correct, the only problem being the ratio is applied to the wrong query. If you ask “what are the odds of a story about a human caused plague of horrendous heatwaves, which appears in any Lamestream Media source, NOT being complete BS?” the ratio of 1 in 1.6 million appears, to my eye at least, to be just about spot on.
fyi
Anyway just letting you know about it.
Related thread:
May I?... or do you want to?
I think a lot of eyes see stuff here.
OK Ernest. I’ll post it to the general news forums.
I can’t post it. It is a blog site. I tried. FR bounced me.
Just saves the headache.
As always, astounding reasoning, Willis. I find your conclusion flawless. I find that it also supports something that I have long suspected — few people are actually qualified to work with statistics or make statistical pronouncements. From what I recall, Jeff was only quoting some ass at NOAA, so perhaps it isn’t his fault. However, you really should communicate your reasoning to him. I think that there is absolutely no question that you have demonstrated that it is a Poisson process with significant autocorrelation — indeed, from the histogram (exactly as one would expect) and that as you say if anything it suggests that there have (probably) been other thirteen month stretches. It is also interesting to note that the distribution peaks at 5 months. That is, the most likely number of months in a year to be in the top 1/3 is between 1/3 and 1/2 of them!
Yet according to the reasoning of the unknown statistician at NOAA, the odds of having any interval of 5 months in the top third are (1/3)^5, or about 1 in 243. They seem to think that every month is an independent trial or something.
Sigh.
rgb