Wherein I play with the lovely Google Charts API and expose my total incompetence in statistics, economics, agriculture, and geography. And quite possibly other things too.
So I was reading the Open Knowledge Foundation blog and came across this article featuring US wheat production, which points to this dataset of wheaty goodness. My recent work on Clear Climate Code had already made me aware of the availability of GISTEMP’s summary data products.
So it occurred to me that this could be used to answer the question “when the weather is warmer, does more wheat grow?”.
So the wheat data is US wheat production, including yields in bushels/acre, sigh. GISTEMP even do a dataset that shows the temperature anomaly for the US. I think this is incredibly parochial, but it happens to be just what I want.
So the wheat yield (volume of wheat per harvested unit area) has a general upward trend, at least from the mid-1930s or so. Because I’m only interested in the local variation, I have detrended the wheat data:
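Detrending here means fitting a smooth curve to the yields and keeping only the residuals. A minimal sketch in plain Python, assuming a least-squares cubic polynomial (which is what was used for the charts here); the function names `polyfit` and `detrend` and the rescaling of years to [-1, 1] are illustrative choices, not necessarily how the original analysis was coded:

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y, where A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Solve by Gaussian elimination with partial pivoting.
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs


def detrend(years, values, degree=3):
    """Return values minus a fitted polynomial trend (the anomalies)."""
    lo, hi = min(years), max(years)
    # Rescale years to [-1, 1] so the normal equations stay well conditioned.
    ts = [2.0 * (y - lo) / (hi - lo) - 1.0 for y in years]
    coeffs = polyfit(ts, values, degree)
    trend = [sum(c * t ** k for k, c in enumerate(coeffs)) for t in ts]
    return [v - tr for v, tr in zip(values, trend)]
```

Calling `detrend(years, yields)` on two equal-length lists returns the yield anomalies, one per year. Because the fit includes a constant term, the anomalies sum to (very nearly) zero.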
My hypothesis is that any deviation of the temperature from the long term average will lower wheat yields. I think this because I would expect that over the thousands of years of selection humans will have cultivated a variety of wheat that is optimised to grow at the average temperatures and it will do less well when temperatures deviate.
So what do we see? Here’s wheat yields and temperatures together:
Well, there’s no obvious correlation to eyeball. Scattergram:
(which is almost just changing ‘cht=lc’ to ‘cht=s’ in the above chart URL)
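To make the "almost" concrete, here is a rough sketch of building both chart URLs from the same two series. It assumes the Image Charts URL format with text-encoded data (`cht` for chart type, `chs` for size, `chd=t:` for data scaled to 0..100); the endpoint in `BASE`, the helper names, and the default size are assumptions for illustration, not the URL actually used for the charts above.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Image Charts API; adjust if yours differs.
BASE = "https://chart.googleapis.com/chart"


def scale(values):
    """Scale values linearly onto 0..100, the range of text encoding."""
    lo, hi = min(values), max(values)
    return [round(100.0 * (v - lo) / (hi - lo), 1) for v in values]


def chart_url(chart_type, series_a, series_b, size="500x300"):
    """Build a chart URL from two data series."""
    data = "|".join(
        ",".join(str(v) for v in scale(s)) for s in (series_a, series_b)
    )
    params = {"cht": chart_type, "chs": size, "chd": "t:" + data}
    # Keep ':', ',' and '|' readable instead of percent-encoding them.
    return BASE + "?" + urlencode(params, safe=":,|")


line_url = chart_url("lc", [1, 2, 3], [3, 2, 1])
scatter_url = chart_url("s", [1, 2, 3], [3, 2, 1])
```

The two URLs differ only in the `cht` parameter, but the data is read differently: in a line chart both series are plotted against position, whereas in a scatter plot the first series supplies the x values and the second the y values.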
Bit of a blurry mess. If anything a slight negative trend, which would mean that colder temperatures gave a higher wheat yield. And indeed Pearson’s correlation is about -0.3 (assuming my calculations are correct) indicating a weak negative correlation.
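Pearson’s correlation is short enough to hand-code. A minimal sketch in plain Python (the function name `pearson` is illustrative, and this is not necessarily the exact code behind the figure above):

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and variances, without the common 1/n factor (it cancels).
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would be the temperature anomalies and `ys` the detrended wheat yields for the same years. The result lies between -1 and 1, with values near 0 meaning no linear relationship.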
There are problems. One problem is that I have no p-value. That’s partly because I haven’t read that far on the Wikipedia page (I’m not using some fancy stats package for my analysis; everything is hand-coded in Python), and partly because I have a degrees of freedom problem. Temperature is autocorrelated, so whilst I have 128 samples, that’s fewer than 128 degrees of freedom, so the standard assumption of independent samples does not hold.
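One common workaround, sketched here as an assumption rather than anything done for this post, is to shrink the sample count to an "effective" sample size using the lag-1 autocorrelation of each series, and then get an approximate two-sided p-value from the Fisher z-transform. The function names are illustrative, and both steps are approximations.

```python
import math


def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a series."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


def effective_n(xs, ys):
    """Effective sample size for correlating two autocorrelated series."""
    r1 = lag1_autocorr(xs) * lag1_autocorr(ys)
    return len(xs) * (1.0 - r1) / (1.0 + r1)


def approx_p_value(r, n_eff):
    """Approximate two-sided p-value for correlation r via Fisher's z."""
    if n_eff <= 3:
        raise ValueError("effective sample size must exceed 3")
    z = math.atanh(r) * math.sqrt(n_eff - 3)
    # Under the null hypothesis, z is roughly standard normal.
    return math.erfc(abs(z) / math.sqrt(2))
```

With a correlation of -0.3, treating all 128 samples as independent gives a p-value well below 0.01, whereas a purely hypothetical effective sample size of 30 would put it above 0.05. So the autocorrelation matters for whether the correlation counts as significant.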
The other problem is that it looks like the detrending might have introduced a bit of an alarming feature into the wheat anomalies. There’s a gentle hump from 1866 to about 1940 and a similar one from about 1940 to 2000. This is almost certainly because I detrended the data by fitting a cubic polynomial to it. It looks like a two-leg linear fit would be better (with a kink around 1942), but I haven’t found how to do that. I have a sneaking suspicion I have some FORTRAN code lying around here to do it, but I’m too scared to look.
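For what it’s worth, one way to do a two-leg fit is to treat it as ordinary least squares on three basis functions: a constant, the year, and a "hinge" term that is zero before the kink and rises linearly after it. That keeps the two legs joined at the kink. The sketch below is one possible approach under that assumption (not the FORTRAN code mentioned above); the names `fit_two_leg` and `best_kink` are made up for illustration, and the kink is chosen by brute-force search over candidate years.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_two_leg(years, values, kink):
    """Fit a continuous two-leg line with a kink at the given year.

    Returns (coefficients, fitted values, sum of squared errors).
    """
    x0 = min(years)
    # Basis: constant, years since start, and a hinge that starts at the kink.
    rows = [[1.0, y - x0, max(0.0, y - kink)] for y in years]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    coeffs = solve(ata, aty)
    fitted = [sum(c * b for c, b in zip(coeffs, r)) for r in rows]
    sse = sum((v - f) ** 2 for v, f in zip(values, fitted))
    return coeffs, fitted, sse


def best_kink(years, values):
    """Pick the kink year giving the smallest squared error."""
    candidates = sorted(years)[2:-2]  # keep a few points on each leg
    return min(candidates, key=lambda k: fit_two_leg(years, values, k)[2])
```

The detrended anomalies would then be `values` minus the `fitted` list returned by `fit_two_leg(years, values, best_kink(years, values))`. Note that searching for the kink uses up an extra degree of freedom compared with fixing it in advance at, say, 1942.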
Final tiny problem almost too small to be worth mentioning: the wheat data is for the entire US, whereas the temperature data is for the contiguous 48. I’m guessing that Alaska and Hawaii make so little wheat contribution that it doesn’t matter.
In any case it doesn’t really look like fixing these problems would ever indicate a strong positive correlation between temperature anomalies and wheat yields. So we can reject the notion that warmer weather means higher wheat yields. Of course warmer weather might mean we can grow more of something else (possibly just a different variety of wheat); it also might mean that the available belt of land for growing wheat is larger (but this is unlikely since it probably means the available belt of land for growing wheat has moved north).