Tuesday, 20 September 2016

Global Warming goes viral

NOAA's Climate Monitoring Chief Deke Arndt tweeted in May this year: "I look at this stuff every day and it still astonishes me" and showed this graph. Note the locally typical units.

Honey, I broke the graph. Again.

How much the temperature has risen in recent years is amazing.

The prediction made this week for 2016 by Gavin Schmidt of NASA GISS does not look good. With a bit of British understatement he is 99% confident 2016 will be a record year.

Attention grabbers

That temperature jump is one reason people have woken from their slumber. Another reason is that people have started visualising the global temperature increase in interesting new ways. Let me try to explain why they work. It all started with the animated spiral of the global temperature by Ed Hawkins that went viral.

The spiral went viral because it was a new way to present the data. The timing was also very good, because the spiral shows especially well how extraordinary the current temperature jump is.

The modern currency is attention.

So just after the Olympics I tried to top Ed Hawkins with this visualisation.

By my standards it went viral. It works because the visual connects global warming to the famous Olympic photo of Usain Bolt running so fast that he can afford to look sideways and smile at the camera.

I guess the virus did not spread beyond the few thousand people discussing climate change every day because without axes you need to know the temperature signal to get it. Adding axes would have destroyed the beauty.

In the latest episode of Forecast Scott St George talks about his project to convert the climate signal into music, with different instruments representing different climate regions. At the time, this creative idea generated a lot of media attention. It works better on radio and TV than a static graph.

More regional detail can be seen in a so-called Hovmöller plot. The plot by Kevin Anchukaitis shows time on the horizontal axis and the colours indicate the average temperature over latitudinal bands. In the lower half of the figure you see the Southern Hemisphere, which warms less than the Northern Hemisphere at the top.
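
The construction behind such a plot is simple enough to sketch. The snippet below builds a toy Hovmöller array from synthetic gridded data (all numbers are invented for illustration, not taken from the Anchukaitis plot); a real figure would colour these zonal means, with time on the horizontal axis and latitude on the vertical.

```python
# Sketch: building a Hovmoller array from synthetic gridded temperatures.
# A real figure would use observed anomalies; the numbers here are invented.

def zonal_means(grid):
    """grid[time][lat_band][lon] -> per-time-step averages over each band."""
    return [[sum(band) / len(band) for band in step] for step in grid]

# Toy grid: 3 time steps, 2 latitude bands, 4 longitudes per band.
grid = [
    [[0.0, 0.1, 0.2, 0.1], [0.3, 0.4, 0.5, 0.4]],
    [[0.2, 0.3, 0.4, 0.3], [0.5, 0.6, 0.7, 0.6]],
    [[0.4, 0.5, 0.6, 0.5], [0.7, 0.8, 0.9, 0.8]],
]
hovmoller = zonal_means(grid)  # rows are time steps
rows = list(zip(*hovmoller))   # transpose: rows are latitude bands
```

Each row of `rows` is then one coloured stripe of the plot: the history of a single latitude band.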

The additional energy that is available due to the stronger greenhouse effect can go into warming the air or into evaporating water. The Northern Hemisphere has much more land and is drier. Thus evaporation increases less there and warming more.

The front of the new State of the Climate also shows the observed temperature signal in red and brown.

Understanding climate change

Probably the most eye-opening graph for understanding the difference between short-term fluctuations and long-term trends in the temperature signal is this one. An important source of fluctuations is El Niño in the Pacific Ocean. By plotting years with El Niño, its counterpart La Niña and neutral conditions separately, you immediately see that they all have about the same long-term trend and that El Niño mainly matters for the short-term fluctuations. No need for statistics skillz.
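
A minimal sketch of that idea, with purely synthetic numbers (the trend, the offsets and the tidy three-year ENSO cycle are all invented for illustration): a common warming trend plus a phase-dependent offset, with an ordinary least-squares trend fitted to each phase separately.

```python
# Sketch: separating El Nino / La Nina / neutral years before fitting trends.
# All numbers are synthetic and only illustrate the idea behind the graph.

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)

years = list(range(1950, 2016))
phase = ["nino", "neutral", "nina"]                   # toy 3-year cycle
offset = {"nino": 0.1, "neutral": 0.0, "nina": -0.1}  # degC, invented

# Common trend of 0.02 degC/yr plus the phase-dependent offset.
temps = [0.02 * (y - 1950) + offset[phase[y % 3]] for y in years]

slopes = {}
for p in offset:
    xs = [y for y in years if phase[y % 3] == p]
    ys = [t for y, t in zip(years, temps) if phase[y % 3] == p]
    slopes[p] = ols_slope(xs, ys)
# All three slopes are the same: the phases shift the level, not the trend.
```

The three separate fits all recover the underlying warming rate; the ENSO phase only moves individual years up or down.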

I realise that it is just a small tweak, but I like this graph by Karsten Haustein because it emphasises that the data from WWII is not very reliable. The next step would be to also give the decades around 1900 the colour of the orange menace. The data in this period also has some issues and there may well have been warming already.

If you plot monthly temperatures rather than yearly averages, the warming graph becomes noisier. Mitigation sceptics like to plot the data that way; the trends naturally stay the same size, but the noise makes them seem smaller. This beautiful solution by John Kennedy plots every calendar month separately and thus shows that all months are warming, without the distracting noise.
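
The same trick can be sketched with synthetic data: give each month a large seasonal offset plus a small common trend, then fit a trend to each calendar month's own series. The seasonal cycle drops out of every per-month fit, so all twelve series show the warming cleanly. (The 0.02 °C/yr trend and the 5 °C seasonal amplitude below are invented for illustration.)

```python
# Sketch: per-calendar-month trends remove the seasonal cycle.
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)

trend = 0.02  # degC per year, invented for illustration
series = []   # (year, month, temperature)
for year in range(1950, 2016):
    for month in range(12):
        seasonal = 5.0 * math.sin(2 * math.pi * month / 12)  # big cycle
        series.append((year, month, trend * (year - 1950) + seasonal))

# Fit one trend per calendar month: the seasonal offset is constant
# within each series, so every fit recovers the underlying warming.
month_slopes = [
    ols_slope([y for y, m, v in series if m == mm],
              [v for y, m, v in series if m == mm])
    for mm in range(12)
]
```

Pooling all months into one series would leave the 5 °C cycle as noise around the 0.02 °C/yr signal; splitting by month hides nothing and shows everything.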

You can also show the seasonal cycle like in this NASA example, or animate it.

The sun is the source of nearly all energy on Earth and naturally important for the climate, but is it also important for climate change? The sun has been quite stable over the last century, and in recent decades it may even have become slightly less bright. By plotting the sun and the temperature together, Stefan Rahmstorf illustrates that the variations of the sun are too small to influence the climate much.

A longer perspective

Back in 2013 Jos Hagelaars combined the temperature reconstructions (from tree rings and other indirect sources), the instrumentally measured temperatures and the temperature projections up to 2100 into one graph, also called The Wheelchair. It makes clear how large and fast the expected warming will be in historical perspective.

The projected warming jumps up so fast in the graph of Hagelaars that you cannot see well how fast it is. Randall Munroe of the comic XKCD solved this by rotating the graph, so that it could be made a lot longer. To see how we are warming the planet, the graph had to become very long indeed. Happy scrolling. See you below.

I hope you did not miss the mouse-over text:
"[After setting your car on fire] Listen, your car's temperature has changed before."

Some complain that the temperature reconstructions are smoother than the instrumental data, even though this is explained in the comic itself. How much more geeky can you get?

They want to suggest that a peak like we see now could thus be hidden in the reconstructions. That is theoretically possible, but there is no evidence for it. More importantly: the current warming is not a peak, it is a jump; it will still get a lot warmer and it will stay warm for a very long time. If anything like what we are doing to the climate now had happened in the past, it would jump out of the reconstructions.

Chris Colose solves the problem a little more technically and, in this animation, puts the current and future warming in the context of the period during which our civilisation developed.

Mitigation sceptics

HotWhopper gathered and visualised some predictions by mitigation sceptics. Pictures that stand in sharp contrast to scientific predictions and that, in a sane, rational world, would discredit their political movement forever.

David Archibald's prediction from 2006.
Based on solar maxima of approximately 50 for solar cycles 24 and 25, a global temperature decline of 1.5°C is predicted to 2020, equating to the experience of the Dalton Minimum.

Pierre Gosselin of the German blog No Truth Zone in 2008:
-2.5°C by 2020!...My prediction is we’ve started a nasty cold period that will make the 1960s look balmy. We’re about to get caught with our pants down. And a few molecules of CO2 is not going to change it.

Don Easterbrook in 2001.

Christopher Monckton's prediction in 2013.
A math geek with a track-record of getting stuff right tells me we are in for 0.5 Cº of global cooling. It could happen in two years, but is very likely by 2020. His prediction is based on the behavior of the most obvious culprit in temperature change here on Earth – the Sun.
Maybe the Lord's math geek got a minus sign wrong. It will be hard to get so much cooling by 2020, especially cooling promised to come from the sun.

Normal science

A normal scientific graph can be very effective. I loved how the audience was cheering and laughing when, after having to endure the nonsense of Australian Senator [[Malcolm Roberts]], the physicist Brian Cox replied with: "I brought the graph." A good sign that the public is fed up with the misinformation campaign of the mitigation sceptical movement.

The graph Cox showed was similar to this one of NASA-GISS.

You know what they say about laughing at geniuses. I hope.

Related information

Q&A smackdown: Brian Cox brings graphs to grapple with Malcolm Roberts

Temperature observation problems in WWII and the early instrumental period

Early global warming

A similar idea to the orchestra playing global warming is this tune based on the flow of the river Warta in Poland. The red noise of nature sounds wonderful.

* Temperature curve of XKCD used under a Creative Commons Attribution-NonCommercial 2.5 License.

Tuesday, 13 September 2016

Publish or perish is illegal in Germany, for good reason

Had Albert Einstein died just after his wonder year 1905, he would only have had a few publications on special relativity, the equivalence of mass and energy, Brownian motion and the photoelectric effect to his name, and would nowadays be seen as a mediocre researcher. He got the Nobel prize in 1921 "for his services to Theoretical Physics, and especially for his discovery of the law of the photoelectric effect", not for relativity, not for Brownian motion. This illustrates how hard it is to judge scientific work, even more than a decade afterwards, let alone in advance.

Managing scientists is hard. It is nearly impossible to determine who will do a good job, who is doing a good job, and even whether someone did a good job in the past. In recent decades, science managers in most of the world have largely given up trying to assess how good a scientist is, and instead assess how many articles they write and how prestigious the journals are in which those articles appear.

Unsurprisingly, this has succeeded in increasing the number of articles scientists write. Especially in America scientists are acutely aware that they have to publish or perish.

Did this hurt scientific progress? It is unfortunately impossible to say how fast science is progressing and how fast it could progress; the work is, after all, about the stuff we do not understand yet. The big steps, such as evolution, electromagnetism and quantum mechanics, have become rare in recent decades. Maybe the low-hanging fruit is simply gone. Maybe it is also modern publish-or-perish management.

There are good reasons to expect publish-or-perish management to be detrimental.
1. The most basic reason: the time spent writing and reading the ever-increasing number of articles is not spent on doing research. (I hope no one is so naive as to think that the average scientist actually became several times more productive.)
2. Topics that quickly and predictably lead to publications are not the same topics that will bring science forward. I personally try to work on a mix, because only working on riskier science that you expect to be important is unfortunately too dangerous.
3. Stick-and-carrot management works for manual labour, but for creative, open-ended work it is often found to be detrimental. For creative work, mastery and purpose are the incentives.

German science has another tradition, trusting scientists more and focusing on quality. This is expressed in the safeguards for good scientific practice of the German Science Foundation (DFG). It explicitly forbids the use of quantitative assessments of articles.
Universities and research institutes shall always give originality and quality precedence before quantity in their criteria for performance evaluation. This applies to academic degrees, to career advancement, appointments and the allocation of resources. …

criteria that primarily measure quantity create incentives for mass production and are therefore likely to be inimical [harmful] to high quality science and scholarship. …

Quantitative criteria today are common in judging academic achievement at all levels. … This practice needs revision with the aim of returning to qualitative criteria. … For applications for academic appointments, a maximum number of publications should regularly be requested for the evaluation of scientific merit.
For a project proposal to the German Science Foundation this "maximum number" means that you are not allowed to list all your publications, but only your 5 best (for a typical project; for smaller projects even fewer).

While reading the next paragraphs, please hear me screaming YES, YES, YES in your ear at an unbearable volume.
An adequate evaluation of the achievements of an individual or a small group, however, always requires qualitative criteria in the narrow sense: their publications must be read and critically compared to the relevant state of the art and to the contributions of other individuals and working groups.

This confrontation with the content of the science, which demands time and care, is the essential core of peer review for which there is no alternative. The superficial use of quantitative indicators will only serve to devalue or to obfuscate the peer review process.
I fully realize that actually reading someone’s publications is much more work than counting them and that top scientists spend a large part of their time reviewing. In my view that is a reason to reduce the number of reviews and trust scientists more. Hire people who have a burning desire to understand the world, so that you can trust them.

Sometimes this desire goes away when people get older. For the outside world this is most visible in some older participants of the climate “debate” who hardly produce new work trying to understand climate change, but use their technical skills and time to deceive the public. The most extreme example I know is a professor who was painting all day long, while his students gave his lectures. We should be able to get rid of such people, but there is no need for frequent assessments of people doing their job well.

You also see this German tradition in the research institutes of the Max Planck Society. The directors of these institutes are among the best scientists in the world and they can do whatever they think will bring their science forward. Max Planck Director Bjorn Stevens describes this system in the fourth and best episode of the podcast Forecast. The part on his freedom and the importance of trust starts at minute 27, but it is best to listen to the whole inspiring podcast, about which I could easily write several blog posts.

Stevens started his scientific career in the USA, but talks about the German science tradition when he says:
I can think of no bigger waste of time than reviewing Chris Bretherton's proposals. I mean, why would you want to do that? The guy has shown himself to have good idea, after good idea, after good idea. At some point you say: go doc, go! Here is your budget, and let him go. This whole industry that develops to keep someone like Chris Bretherton on a leash makes no sense to me.
Compare scientists who set priorities within their own budgets with scientists who submit research proposals judged by others. If you have your own budget you will only support what you think is really important; if you do A, you cannot do B. Many project proposals are written because they fit into a research programme or because a colleague wants to collaborate; apart from the time wasted on writing, there is no downside to asking for more funding. If you have your own budget, the person with the most expertise and the most skin in the game decides. Yet it is the project funding, where the deciders have no skin in the game, that is called competitive. It is Soviet-style planning; that it works at all shows the dedication and altruism of the scientists involved. Those are scientists you could simply trust.

I hope this post will inspire the scientific community to move towards more trust in scientists, increase the fraction of unleashed researchers and reduce the misdirected quantitative micro-management. Please find below the full text of the safeguards of the German Science Foundation on performance evaluation; above I had to skip many worthwhile parts.

Recommendation 6: Performance Evaluation

Universities and research institutes shall always give originality and quality precedence before quantity in their criteria for performance evaluation. This applies to academic degrees, to career advancement, appointments and the allocation of resources.

For the individual scientist and scholar, the conditions of his or her work and its evaluation may facilitate or hinder observing good scientific practice. Conditions that favour dishonest conduct should be changed. For example, criteria that primarily measure quantity create incentives for mass production and are therefore likely to be inimical to high quality science and scholarship.

Quantitative criteria today are common in judging academic achievement at all levels. They usually serve as an informal or implicit standard, although cases of formal requirements of this type have also been reported. They apply in many different contexts: length of Bachelor, Master or PhD thesis, number of publications for the Habilitation (formal qualification for university professorships in German speaking countries), as criteria for career advancements, appointments, peer review of grant proposals, etc. This practice needs revision with the aim of returning to qualitative criteria. The revision should begin at the first degree level and include all stages of academic qualification. For applications for academic appointments, a maximum number of publications should regularly be requested for the evaluation of scientific merit.

Since publications are the most important “product” of research, it may have seemed logical, when comparing achievement, to measure productivity as the number of products, i.e. publications, per length of time. But this has led to abuses like the so-called salami publications, repeated publication of the same findings, and observance of the principle of the LPU (least publishable unit).

Moreover, since productivity measures yield little useful information unless refined by quality measures, the length of publication lists was soon complemented by additional criteria like the reputation of the journals in which publications appeared, quantified as their "impact factor" (see section 2.5).

However, clearly neither counting publications nor computing their cumulative impact factors are by themselves adequate forms of performance evaluation. On the contrary, they are far removed from the features that constitute the quality element of scientific achievement: its originality, its “level of innovation”, its contribution to the advancement of knowledge. Through the growing frequency of their use, they rather run the danger of becoming surrogates for quality judgements instead of helpful indicators.

Quantitative performance indicators have their use in comparing collective activity and output at a high level of aggregation (faculties, institutes, entire countries) in an overview, or for giving a salient impression of developments over time. For such purposes, bibliometry today supplies a variety of instruments. However, they require specific expertise in their application.

An adequate evaluation of the achievements of an individual or a small group, however, always requires qualitative criteria in the narrow sense: their publications must be read and critically compared to the relevant state of the art and to the contributions of other individuals and working groups.

This confrontation with the content of the science, which demands time and care, is the essential core of peer review for which there is no alternative. The superficial use of quantitative indicators will only serve to devalue or to obfuscate the peer review process.

The rules that follow from this for the practice of scientific work and for the supervision of young scientists and scholars are clear. They apply conversely to peer review and performance evaluation:
  • Even in fields where intensive competition requires rapid publication of findings, quality of work and of publications must be the primary consideration. Findings, wherever factually possible, must be controlled and replicated before being submitted for publication.
  • Wherever achievement has to be evaluated — in reviewing grant proposals, in personnel management, in comparing applications for appointments — the evaluators and reviewers must be encouraged to make explicit judgements of quality before all else. They should therefore receive the smallest reasonable number of publications — selected by their authors as the best examples of their work according to the criteria by which they are to be evaluated.

Related information

Episode 4 of Forecast with Max Planck Director Bjorn Stevens on clouds, aerosols, science and science management. Highly recommended.

Memorandum of the German Science Foundation: Safeguarding Good Scientific Practice. English part starts at page 61.

One of my first posts explaining why stick-and-carrot management makes productivity worse for cognitive tasks: Good ideas, motivation and economics

* Photo of Albert Einstein at the top is in the public domain.

Sunday, 4 September 2016

Believe me, the GOP needs to open itself to rational debate

Major Tom (Coming Home)
Peter Schilling

4, 3, 2, 1
Earth below us
Drifting, falling
Floating weightless
Calling, calling home

Second stage is cut
We're now in orbit
Stabilizers up,
Running perfect
Starting to collect
Requested data
What will it affect
When all is done?
Thinks Major Tom

Back at ground control,
There is a problem
Go to rockets full
Not responding
Hello Major Tom.
Are you receiving?
Turn the thrusters on
We're standing by
There's no reply

4, 3, 2, 1
Earth below us
Drifting, falling
Floating weightless
Calling, calling home

Across the stratosphere,
A final message
Give my wife my love
Then nothing more

Far beneath the ship
The world is mourning
They don't realize
He's alive
No one understands
But Major Tom sees
Now the light commands
This is my home
I'm coming home

Earth below us
Drifting, falling
Floating weightless
Coming home
Earth below us
Drifting, falling
Floating weightless
Coming, coming
Home, home

Much better German original
Major Tom (Völlig Losgelöst)
Peter Schilling

...Völlig losgelöst
Von der Erde
Schwebt das Raumschiff
Völlig schwerelos

Die Erdanziehungskraft
Ist überwunden
Alles läuft perfekt -
Schon seit Stunden

Doch was nützen die
Am Ende
Denkt sich Major Tom

Im Kontrollzentrum
Da wird man panisch
Der Kurs der Kapsel der
Stimmt ja gar nicht

"Hallo Major Tom
Können Sie hören
Woll'n Sie das Projekt
Denn so zerstören?"
Doch, er kann nichts hörn'
Er schwebt weiter...

...Völlig losgelöst
Von der Erde
Schwebt das Raumschiff
Völlig schwerelos

Die Erde schimmert blau
Sein letzter Funk kommt:
"Grüsst mir meine Frau!"
Und er verstummt

Unten trauern noch
Die Egoisten
Major Tom denkt sich
"Wenn die wüssten -
Mich führt hier ein Licht
Durch das All
Das kennt ihr noch nicht
Ich komme bald
Mir wird kalt."

Völlig losgelöst
Von der Erde
Schwebt das Raumschiff
Völlig schwerelos

The Grand Old Party has created a monster and now it has turned on them.

It would be tempting to simply call the monster Donald Trump, but the unnamed monster has many aspects: Trump, conservative media, anti-science, rejection of adult debate, corporate corruption, climate change denial, racism, fear.

One reason to call the monster Trump would be that Trump has taken the syndrome to such extremes. A second is that Trump made many prominent conservatives realise that the monster threatens their party. This threat becomes visible in the list of positions Trump is able to sell that are far from Republican, and in how Trump has started attacking conservative politicians directly.

To solve the problem the GOP will have to return to rational debate, rather than ending every second sentence with "believe me". Conservative readers, did you notice that the "believe me" in my title did not make you believe me? It is just as stupid coming from your side. Give people reasons to accept what you are saying.

The GOP had rational debate in the past, as non-US conservative parties still do. What makes rational debate hard is that US politicians now take positions based on what their donors want. The incoherent mess a politician then has to defend cannot be defended rationally. Rational debate thus has to be replaced with misinformation. To make the misinformation palatable, politicians need to stoke fear to suppress critical thought and fuel tribalism.

John Ziegler, a nationally syndicated conservative talk show host, points to the role of conservative media in this. Initially conservative media was a comfortable way for conservative politicians to spread their talking points without getting critical questions. To increase market share conservative media has convinced its followers that other information sources are biased against conservatives. Ted Newton, former communications adviser to 2012 Republican presidential nominee Mitt Romney, said:
"What it became, essentially, was they were preaching this is the only place you can get news. This is the only place you can trust. All other media outlets are lying to you. So you need to come to us."
That now makes it hard for conservatives to contain Trump and point out his lies. The term "lie" may not even be appropriate for Trump: to lie you have to be aware that what you are saying is wrong, and for a conman like Trump right and wrong are irrelevant; what counts is whether a message sounds convincing.

When what Trump finds convincing does not fit the GOP platform, its politicians cannot point to fact-checkers or The New York Times, because conservatives have been convinced these sources are lying. This is reinforced by Trump tweeting about "the failing New York Times" or "the disgusting and corrupt media".

John Ziegler naturally sees the problem in the conservative media:
"We've ... reached the point, I say, we've left the gravitational pull of the rational Earth, where we are now in a situation where facts don't matter, truth doesn't matter, logic doesn't matter. ...

The conservative establishment that needs to be gotten rid of is the conservative media establishment. Sean Hannity needs to go. Bill O'Reilly needs to go. Sadly, Rush Limbaugh needs to go.

Here's what I'll be very disappointed in: If Trump does lose, as I am very confident that he will, and let's say it's not super close, if he loses by a significant margin and Sean Hannity and people like him have not experienced some significant career pain, if not destruction, because of their role, then it's over. It is over.

Because if there is no price to pay for conservative-media elements having sold out to Donald Trump, then guess what? It's going to happen again and again and again. ... If that doesn't happen, then I think we're done. It's over."
I am not sure how much purging specific persons would help. The system needs to change. The media is nowadays financed more and more per view, per click. This pushes the system towards scandal and rubbish, towards emotion, fear and exaggeration. Europe benefits enormously from a public media system, which may be more boring, but normally gets the facts right, and this forces other media sources to also deliver higher quality.

The population also has an important role in keeping the media and politicians honest by doing their due diligence, giving feedback and selecting credible sources. I think twice before I click on a link to a Murdoch publication, because every click converts misinformation and vitriol into cash. Due diligence is hard in a culture where people have little time for citizenship because of the stress society produces and a focus on working long hours rather than working effectively and creatively.

"I think the conservative media is the worst thing that has ever happened to the Republican Party on a national level,"
John Ziegler, conservative radio host

There is a movement towards newspapers, magazines and video news and entertainment that are supported by members. This will lead to more partisanship and a splintering of the media landscape. Still, members will likely be people interested in quality. Thus hopefully the partisanship will be limited to having a clear point of view and finding certain stories interesting, while the facts will be right. If the quality is right, that would be progress, and the splintering of the media would not have to lead to a splintering of society, because there would still be a common basis that makes communication possible.

Next to the [[Fourth Estate]], the media, science too is important for creating a foundation of knowledge that makes civilised debate possible. I would call science the Fifth Estate. This role of science is just as important as sending people to the Moon or designing a non-stick frying pan. Physicist and philosopher of science John Ziman focuses on this aspect when he argues:
Objectivity is what makes science so valuable in society. It is the public guarantee of reliable disinterested knowledge. Science plays a unique role in settling factual disputes. This is not because it is particularly rational or because it necessarily embodies the truth: it is because it has a well-deserved reputation for impartiality on material issues. The complex fabric of democratic society is held together by trust in this objectivity, exercised openly by scientific experts. Without science as an independent arbiter, many social conflicts could be resolved only by reference to political authority or by a direct appeal to force.
I would expect that it is no coincidence that modern science and the nation state were born at about the same time, and that larger nation states only came up when science had spread. You need to be able to talk with each other.

Anti-science sentiments in the USA are thus worrying. But we should also not freak out. Scientists are still among the most trusted professions and even the enemies of science typically claim to be friends of Science; in Canada even literally. This shows how culturally strong science still is.

Still, when scientists speak truth to power, it is worrying how easy it is for US corporations to hit back via think tanks, FOIA harassment and bribed politicians. Republican politicians are the best investment for corporations because conservatives tend to follow the leader more. Corporations are not charities; those campaign contributions are investments with a high rate of return. Also for the sake of the economy, corporations need to compete on the market again.

The New York Times reports that congressional Republicans are unwilling to help communities in Florida cope with the consequences of sea level rise and block the Navy from adapting to the ongoing changes. When America is invaded during high tide, I hope that Republican Congressman Buck of Colorado will repeat his claim that the military should not be distracted by a "radical climate change agenda". There was a time when national defense was one of the highest priorities of the Republican party; now corporate bribes make them weaken national defense and ignore communities in need. Even communities in swing states.

The good news is that Republican voters are just as fed up with the corrupting influence of money on politics as Democrats. The bad news is that the current politicians got into their positions because they are good at finding donors and do not want to disappoint them. Still, begging for money is no fun and many politicians got into politics for good reasons. So together with some people power, it should be possible to reduce the influence of money.

Accepting the money and misinforming your constituents gives a short-term boost, especially for incumbents. In the long term you cannot do anything without trust, and you lose contact with the ground.

The American political system is more vulnerable to bribery because the voter does not have much choice. If a special interest can convince party A, it can also convince party B with a generous contribution; then there is no downside for either party.

Furthermore, because of the two-party system your vote almost never matters. That makes it less motivating to pay attention to what happens and whether politicians do a good job. If no one is looking, it is less dangerous to do the bidding of the donors rather than the voters. Here the crisis in the media also returns, because fewer journalists also means that fewer people are looking.

"Whenever the people are well-informed, they can be trusted with their own government; that, whenever things get so far wrong as to attract their notice, they may be relied on to set them to rights."
Thomas Jefferson

I prefer parliamentary democracies, but if you want to introduce more competition between the parties within the district system used in the USA, you could introduce preferential voting, as Australia does. Here the voters are required to indicate their first preference, but they can also indicate their order of preference among the other candidates. Free-market Republicans could thus vote for Gary Johnson as first preference and for Trump or Clinton as second preference to make sure their vote is not wasted. Similarly, Bernie Sanders supporters could indicate Green party candidate Jill Stein without the risk of causing a Trump catastrophe.
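
The counting rule behind preferential voting (instant-runoff) is easy to sketch. In this toy count the ballot numbers are hypothetical; the point is only that Johnson first preferences transfer to their second choice instead of being wasted.

```python
# Sketch of an instant-runoff count; the vote counts are hypothetical.
from collections import Counter

def instant_runoff(ballots):
    """Eliminate the weakest candidate each round until someone has a majority."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        counts = Counter()
        for ballot in ballots:
            for choice in ballot:       # highest surviving preference counts
                if choice in remaining:
                    counts[choice] += 1
                    break
        total = sum(counts.values())
        leader, votes = counts.most_common(1)[0]
        if votes * 2 > total:           # strict majority of live ballots
            return leader
        remaining.discard(min(counts, key=counts.get))

ballots = ([["Johnson", "Clinton"]] * 10
           + [["Clinton", "Johnson"]] * 40
           + [["Trump"]] * 45)
winner = instant_runoff(ballots)
# Johnson is eliminated first; his ballots transfer, giving Clinton 50 of 95.
```

Under plain first-past-the-post the same ballots would hand the win to the largest single column; here the second preferences decide.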

Time is running out for the Republican party. If the best-case scenario for America and the world comes true and Trump is defeated, a discussion about the future of the Republican party will break out. NBC News sees four options:
1. The Republican Party remade in Trump's image,
2. Refined Trumpism (Trump without the bigotry),
3. The Party Establishment Wins and
4. The Stalemate.

Without a return to rationality, this process is going to be very messy. I will not call it likely, but I would not even be surprised if the Republican party fell apart like the [[Whig party]] did over slavery. The GOP has not had a real leader for years. The party spans a broad coalition, many groups of which radicalized while suffering under a black president, but because their politics were limited to blocking everything, there was hardly any discussion about the political program. Add to this the frustration of losing and bad future prospects due to demographics, which Trump made a lot worse, and you get a dangerous situation, especially when you cannot negotiate and debate rationally.

The GOP's meddling in the private lives of consenting adults, its xenophobia and its anti-science stance will make its demographic problems with young people below 45, science enthusiasts and non-whites larger and larger. There is no need for that. What people do in their bedrooms could just as well be seen as a private affair rather than as something Washington should police. Immigrants are on average quite conservative and most would likely vote conservative if conservatives did not reject them. Conservative parties outside the USA embrace science, scientists used to be proud conservatives, and in Europe a large part of the faculty, if not the majority, is conservative.

A debate is not possible when the only response to inconvenient facts is "Lying Ted" and insults. The body language of the Trump supporter already indicates that he will defend Trump no matter the argument. I encounter the same attitude when I visit the mitigation skeptical blog Watts Up With That. They are determined not to have a productive debate.

Related reading

Joshua Green at Bloomberg wrote a very favorable bio on Breitbart's Steve Bannon: This Man Is the Most Dangerous Political Operative in America

Washington Post on the 2013 Republican National Committee’s Growth and Opportunity Project report: GOP autopsy report goes bold.

Media Matter on the new book "I’m Right and You’re an Idiot: The Toxic State of Public Discourse and How To Clean it Up": New Book Explains Media’s Role In Today’s Toxic State Of Public Discourse

Sykes on Morning Joe: GOP Made Itself "Hostage" to Trump

National Review: Conservative Scams Are Bringing Down the Conservative Movement

Experts worry Trump’s war on America’s democratic institutions could do long-term damage

Charlie Sykes: Have We Created This Monster? Talk radio and the rise of Donald Trump

Conservative media reaches a large audience: U.S. Media Publishers and Publications – Ranked for July 2016

* Caricature at the top, Donald Trump - Riding the Wrecking Ball by DonkeyHotey used under a Creative Commons Attribution-Share Alike 2.0 Generic (CC BY-SA 2.0) license.

Dinosaur Birthday Cupcakes by abakedcreation used under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.

Monday, 29 August 2016

Blair Trewin's epic journey to 112 Australian weather stations

Blair Trewin is a wonderful character and one of the leading researchers of the homogenization community. He works at the Australian Bureau of Meteorology (BOM) and created their high-quality homogenized datasets. He also developed a correction method for daily temperature observations that is probably the best we currently have. Fitting to his scientific love of homogenization, he has gone on a quest to visit all 112 weather stations that are used to monitor the Australian climate. Enjoy the BOM blog post on this "epic journey".

To Bourke and beyond: one scientist’s epic journey to 112 weather stations

There are 112 weather observation stations that feed into Australia’s official long-term temperature record—and Bureau scientist, Blair Trewin, has made it his personal mission to visit all of them! Having travelled extensively across Australia—from Horn Island in the north to Cape Bruny in the south, Cape Moreton in the east to Carnarvon in the west—Blair has now ticked off all but 11 of those sites.

Map: the 112 observation locations that make up Australia's climate monitoring network

Some of the locations are in or near the major cities, but many are in relatively remote areas and can be difficult to access. Blair says perhaps his most adventurous site visit was on the 2009 trip at Kalumburu, an Aboriginal community on the northernmost tip of the Kimberley, and two days’ drive on a rough track from Broome. ‘I asked the locals the wrong question—they said I’d be able to get in, but I didn’t ask them whether I could get back out again’. After striking trouble at a creek crossing leaving town, he spent an unplanned week there waiting for his vehicle to be put on a barge back to Darwin.

While these locations are remote now, in some ways they were even more remote in the past. These days you can get a signal for your mobile phone in Birdsville, Queensland, but as recently as the 1980s, the only means of rapid communication was often-temperamental radio relays through the Royal Flying Doctor Service. Today distance is no longer an issue; the majority of weather stations in the Bureau’s climate monitoring network—including Birdsville—are automated, with thermometers that submit the information electronically.

Photo: Blair Trewin at the weather observation station at Tarcoola, in the far north of South Australia. The Stevenson screen houses a resistance temperature device (thermometer) and a relative humidity probe

But, even some of the sites closer to home have posed a challenge for Blair’s mission. To get to Gabo Island in Victoria for example, you need to either fly or take a boat, and the runway is just a few hundred metres long, so it can only be used in light winds. ‘I spent two days in Mallacoota waiting for the winds to drop enough to get over there’.

Similarly, the site at the Wilsons Promontory lighthouse, if you don’t use a helicopter, is accessed through a 37 km return hike, which Blair did as a training run with one of his Victorian orienteering teammates.

You can read the rest of this adventure at the Blog of the Australian Bureau of Meteorology.

Sunday, 21 August 2016

Naïve empiricism and what theory suggests about errors in observed global warming

In its time it was huge progress that Francis Bacon stressed the importance of observations. Even if he did not do that much science himself, his advocacy for the Baconian (scientific) method gave him a place as one of the fathers of modern science together with Nicolaus Copernicus and Isaac Newton.

However, you can also become too fundamentalist about empiricism. Modern science is characterized by an intricate interplay of observations and theory. An observation is never free of theory. You may not be aware of it, but you make theoretical assumptions about what you see in any observation. Theory also guides what to observe, what kind of experiments to make.

Charles Darwin often claimed to adhere to Bacon's ideals, but he had another side. University of California professor of biology and philosophy Francisco Ayala writes in Darwin and the scientific method:
“Let theory guide your observations.” Indeed, Darwin had no use for the empiricist claim that a scientist should not have a preconception or hypothesis that would guide his work. Otherwise, as he wrote, one “might as well go into a gravel pit and count the pebbles and describe the colors. How odd it is that anyone should not see that observation must be for or against some view if it is to be of any service”
But his ambivalence is seen in Darwin's advice to a young scientist:
Let theory guide your observations, but till your reputation is well established be sparing in publishing theory. It makes persons doubt your observations.
The same ambivalence is seen in Einstein. Mitigation skeptics like this quote:
No amount of experimentation can ever prove me right; a single experiment can prove me wrong.
They quote this when the observations show fewer changes than the model. If the observations show more changes than the model/theory, they quickly forget Einstein and suddenly the observations are wrong.

In practice Einstein was more realistic. Professor of molecular physics [[John Rigden]] wrote in his book about Einstein's wonder year, 1905: "Einstein saw beyond common sense and, while he respected experimental data, he was not its slave."

That is perfectly reasonable. When theory and observations do not match, the theory can be wrong, the observations can be wrong, or the comparison can be wrong. What we call observations is nearly always something that was computed from observations, and that computation can also be imperfect. Only when we understand the reason can we say which it was.

The main blog of the mitigation skeptical movement, WUWT, on the other hand, is famous for dismissing attempts to understand the reasons for discrepancies as "excuses".

Global mean temperature

That was a long introduction to get to the graph I wanted to show, where theory suggests the global mean temperature estimates in some periods may have problems.

The graph was computed by Andrew Poppick and colleagues; it looks as if the manuscript is not published yet. They model the temperature for the instrumental period based on the known human forcings — mainly increases in greenhouse gasses and aerosols (small airborne particles from combustion) — and natural forcings — volcanoes and solar variations. The blue line is the model, the grey line the temperature estimate from NASA GISS (GISTEMP).

The fit is astonishing. There are two periods, however, where the fit could be better: World War II and the first 40 to 50 years. So either the theory (this statistical model) is incomplete or the observations have problems.
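To make the idea of such a statistical model concrete, here is a minimal sketch in the spirit of (but much simpler than) Poppick's approach: regress a temperature series on a forcing series with least squares. The forcing series, the sensitivity of 0.5 K per W/m² and the noise level are all invented for illustration.

```python
# Illustrative sketch (not Poppick's actual model): fit global mean
# temperature as a linear response to a hypothetical forcing series.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1880, 2017)

# Hypothetical combined forcing (W/m^2): slow greenhouse-gas increase.
forcing = 0.02 * (years - years[0])

# Synthetic "observed" temperature anomaly: response plus weather noise.
true_sensitivity = 0.5  # K per (W/m^2), assumed for illustration
temperature = true_sensitivity * forcing + rng.normal(0.0, 0.1, years.size)

# Least-squares fit of temperature on [intercept, forcing].
X = np.column_stack([np.ones_like(forcing), forcing])
coef, *_ = np.linalg.lstsq(X, temperature, rcond=None)
residual = temperature - X @ coef

print(f"estimated sensitivity: {coef[1]:.2f} K per W/m^2")
print(f"residual std: {residual.std():.3f} K")
```

Periods where the residuals are large relative to their typical size are then the periods where either the model or the observations have problems.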

It is expected that the observations during WWII are more uncertain. Especially the sea surface temperature changes are hard to estimate because the type of ships, and thus the type of observations, changed radically in this period. The HadSST estimate of the measurement methods is shown below. During WWII American war ships dominated and they mainly used Engine Room Intake observations, whereas before and after the war merchant ships would often measure the temperature of a bucket of sea water.

The figure above shows the observational methods estimated by the UK Hadley Centre for HadSST. Poppick's manuscript uses GISTEMP. Its sea surface temperature comes from ERSST v4. (The land data of GISTEMP comes from the stations gathered by NOAA (GHCNv3) and additional Antarctic stations.)

ERSST estimates the observational methods of ships by comparing the sea surface temperature to the night marine air temperature (NMAT). This relationship is only stable over larger areas and multiple years. It thus cannot follow the fast changes in the WWII observational methods well.

Also for HadSST it is not clear whether these corrections are accurate, and they are large: on the order of 0.3°C. What makes this assessment more difficult is that at the beginning of WWII there was a strong and long [[El Nino event]]. Thus a bit of a peak is expected, but it is not clear whether its size is right.

I would not mind if a reviewer requested adding a statistical model that includes El Nino as a predictor to Poppick's paper. That would reduce the noise further (part of the remaining noise is likely explained by El Nino) and make it easier to assess how well the temperature fits during WWII.
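A sketch of why that extra predictor would help: if part of the residual is ENSO-driven, including an ENSO index in the regression soaks it up and shrinks the residual standard deviation. All series here are synthetic; "soi" is a made-up stand-in for a real Southern Oscillation Index.

```python
# Sketch: adding an ENSO index as an extra predictor reduces the
# residual noise of the forcing regression. All data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 140
forcing = 0.02 * np.arange(n)      # hypothetical forcing (W/m^2)
soi = rng.normal(0.0, 7.0, n)      # hypothetical ENSO index

# Synthetic temperature: forced response + ENSO influence + noise.
temperature = 0.5 * forcing - 0.01 * soi + rng.normal(0.0, 0.05, n)

def residual_std(predictors):
    """Residual std of a least-squares fit on the given predictors."""
    X = np.column_stack([np.ones(n)] + predictors)
    coef, *_ = np.linalg.lstsq(X, temperature, rcond=None)
    return (temperature - X @ coef).std()

without_enso = residual_std([forcing])
with_enso = residual_std([forcing, soi])
print(f"residual std without ENSO: {without_enso:.3f} K")
print(f"residual std with ENSO:    {with_enso:.3f} K")
```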

The Southern Oscillation Index (SOI) of the Australian Bureau of Meteorology (BOM), zoomed in to show the period around WWII. Values below -7 indicate El Nino events and values above +7 La Nina events.

It would be an important question to resolve. The peak during WWII is a large part of the hiatus (a real one) we see in the period 1940 to 1980. If you think away the peak in the 1940s, this hiatus is a lot smaller. The lack of warming in this period is typically explained with increases in aerosols. It ended when air pollution regulations slowed the growth of aerosols; especially in the industrialised world, air quality improved a lot. I guess that if this peak is smaller, that would indicate that the influence of aerosols is smaller than we currently think.

While the observations hardly showed any warming in the first 40 to 50 years, the statistical model suggests that there should have been some warming. The global climate models also suggest some warming. And several other climate variables suggest warming as well: the warming in winter, the times lakes and rivers freeze and break up, the retreat of glaciers, temperature reconstructions from proxies, and possibly sea level rise. See for example this graph of the dates rivers and lakes froze over and broke up.

I wrote about these changes in my previous post on "early global warming". Poppick's statistical model adds another piece of evidence and suggests that we should have a look whether we understand the measurement problems in the early data well enough.

By comparing the observations with the statistical model we can see periods in which the fit is bad. Whether the long-term observed trend is right cannot be seen this way because the statistical model would still fit well, just with a different coefficient for the long-term forcings. This relationship is likely biased in a similar way as the simple statistical models used to estimate the equilibrium climate sensitivity from observations. This model, and thus theory, does provide a beautiful sanity check on the quality of the observations and suggests periods which we may need to study better.

Related reading

Falsifiable and falsification in science

Early global warming

On the naive empirical view of Australian politician Malcolm Roberts on science: What Climate Change Skeptics Aren’t Getting About Science

Piers Sellers in The New Yorker: Space, Climate Change, and the Real Meaning of Theory


Andrew Poppick, Elisabeth J. Moyer, and Michael L. Stein, 2016: Estimating trends in the global mean temperature record. unpublished manuscript.

* Portrait of Francis Bacon at the top is taken from Wikipedia and is in the public domain.

Monday, 15 August 2016

Downscaling temperature fields with genetic programming

Sierpinski fractal

This blog is not called Variable Variability for nothing. Variability is the most fascinating aspect of the climate system. Like a fractal you can zoom in and out of a temperature signal and keep on finding interesting patterns. The same goes for wind, humidity, precipitation and clouds. This beauty was one of the reasons why I changed from physics to the atmospheric sciences, not being aware at the time that also physicists had started studying complexity.

There is variability on all spatial scales, from clusters of cloud droplets to showers, fronts and depressions. There is variability on all temporal scales. With a fast thermometer you can see temperature fluctuations within a second and the effect of clouds passing by. Temperature has a daily cycle, day to day fluctuations, seasonal fluctuations and year to year fluctuations and so on.

Also the fluctuations fluctuate. Cumulus fields may contain young growing clouds with a lot of variability, older smoother collapsing clouds and a smooth haze in between. Temperature fluctuations are different during the night when the atmosphere is stable, after sunrise when the sun heats the atmosphere from below, and on a summer afternoon when thermals develop and become larger and larger. The precipitation can come down as a shower or as drizzle.

This makes measuring the atmosphere very challenging. If your instrument is good at measuring details, such as a temperature or cloud water probe on an aircraft, you will have to move it to get a larger spatial overview. The measurement will have to be fast because the atmosphere is changing continually. You can also select an instrument that measures large volumes or areas, such as a satellite, but then you miss out on much of the detail. A satellite looking down on a mountain may measure the brightness of some mixture of the white snow-capped mountains, dark rocks, forests, lush green valleys with agriculture and rushing brooks.

The same problem happens when you model the atmosphere. A typical global atmospheric oceanic climate model has a resolution of about 50 km. Those beautiful snow-capped mountains outside are smoothed to fit into the model and may have no snow any more. If you want to study how mountain glaciers and snow cover feed the rivers you can thus not use the simulation of such a global climate model directly. You need a method to generate a high resolution field from the low resolution climate model fields. This is called downscaling, a beautiful topic for fans of variability.

Deterministic and stochastic downscaling

For the above mountain snow problem, a simple downscaling method would take a high-resolution height dataset of the mountain and make the higher parts colder and the lower parts warmer. How much exactly, you can estimate from a large number of temperature measurements with weather balloons. However, it is not always colder at the top. On cloud-free nights, the surface rapidly cools and in turn cools the air above. This cold air flows down the mountain and fills the valleys with cold air. Thus the next step is to make such a downscaling method weather dependent.
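The height-based rule above can be sketched in a few lines: shift each fine pixel by a lapse rate times its height anomaly, with a different (sign-flipped) rate on stable nights. The lapse-rate values and the simple on/off night switch are illustrative assumptions, not the actual rules.

```python
# Minimal sketch of weather-dependent, height-based temperature
# downscaling. Lapse-rate values are illustrative assumptions.
import numpy as np

def downscale_temperature(coarse_temp, fine_heights, coarse_height,
                          stable_night=False):
    """Return fine-scale temperatures from one coarse grid-box value."""
    # Typical free-atmosphere lapse rate; on stable cloud-free nights
    # cold air pools in the valleys, so the correction flips sign.
    lapse_rate = 2.0e-3 if stable_night else -6.5e-3  # K per m
    height_anomaly = fine_heights - coarse_height
    return coarse_temp + lapse_rate * height_anomaly

fine_heights = np.array([200.0, 400.0, 600.0, 800.0])  # m
coarse = 10.0                                          # grid-box mean, deg C
day = downscale_temperature(coarse, fine_heights, fine_heights.mean())
night = downscale_temperature(coarse, fine_heights, fine_heights.mean(),
                              stable_night=True)
print("daytime:", day)    # higher pixels colder
print("night:  ", night)  # valleys colder (cold-air pooling)
```

Note that the correction has zero mean over the box, so the coarse grid-box mean is preserved.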

Such direct relationships between height and temperature are not always enough. This is best seen for precipitation. When the climate model computes that it will rain 1 mm per hour, it makes a huge difference whether this is drizzle everywhere or a shower in a small part of the 50 times 50 km box. The drizzle will be intercepted by the trees and a large part will evaporate quickly again. The drizzle that lands on the ground is taken up and can feed the vegetation. Only a small part of the heavy shower will be intercepted by trees, most of it will land on the ground, which can only absorb a small part fast enough and the rest runs over the land towards brooks and rivers. Much of the vegetation in this box did not get any water and the rivers swell much faster.

In the precipitation example, it is not enough to give certain regions more and others less precipitation; the downscaling needs to add random variability. How much variability needs to be added depends on the weather. On a dreary winter's day the rain will be quite uniform, while on a sultry summer evening the rain more likely comes down as a strong shower.
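A stochastic-downscaling sketch of this idea: distribute one coarse rain rate over the fine pixels with weather-dependent random weights, while preserving the coarse-box mean. The gamma shape parameters for the two regimes are made-up illustrations, not fitted values.

```python
# Sketch of stochastic precipitation downscaling: weather-dependent
# random disaggregation that preserves the coarse-box mean rain rate.
import numpy as np

def downscale_precip(coarse_rate, n_pixels, convective, rng):
    """Disaggregate one coarse rain rate (mm/h) over fine pixels."""
    # A shower ("convective") concentrates rain in a few pixels;
    # drizzle ("stratiform") spreads it almost uniformly.
    shape = 0.2 if convective else 20.0   # illustrative values
    weights = rng.gamma(shape, 1.0, n_pixels)
    weights /= weights.sum()
    return coarse_rate * n_pixels * weights  # mean equals coarse_rate

rng = np.random.default_rng(2)
drizzle = downscale_precip(1.0, 49, convective=False, rng=rng)
shower = downscale_precip(1.0, 49, convective=True, rng=rng)
print(f"drizzle: mean {drizzle.mean():.2f}, max {drizzle.max():.2f} mm/h")
print(f"shower:  mean {shower.mean():.2f}, max {shower.max():.2f} mm/h")
```

Both fields average to the 1 mm/h the coarse model produced, but the shower field concentrates it in a few very wet pixels.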

Genetic Programming

There are many downscaling methods. This is because the aims of the downscaling depend on the application. Sometimes making accurate predictions is important; sometimes it is important to get the long-term statistics right; sometimes the bias in the mean is important; sometimes the extremes. For some applications it is enough to have data that is locally realistic, sometimes also the spatial patterns are important. Even if the aim is the same, downscaling precipitation is very different in the moderate European climate than it is in the tropical simmering pot.

With all these different aims and climates, it is a lot of work to develop and test downscaling methods. We hope that we can automate a large part of this work using machine learning: Ideally we only set the aims and the computer develops the downscaling method.

We do this with a method called "Genetic Programming", which uses a computational approach that is inspired by the evolution of species (Poli and colleagues, 2016). Every downscaling rule is a small computer program represented by a tree structure.

The main difference from most other optimization approaches is that GP uses a population. Every downscaling rule is a member of this population. The best members of the population have the highest chance to reproduce. When they cross-breed, two branches of the tree are exchanged. When they mutate, an old branch is substituted by a new random branch. It is a cartoonish version of evolution, but it works.
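A cartoon of these ingredients (not the actual downscaling system of the paper): rules as expression trees, evaluation by recursion, and mutation as replacing a random branch with a fresh random branch. The operator set, terminals and depths are arbitrary choices for illustration.

```python
# Toy Genetic Programming ingredients: expression trees and mutation.
import random

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(depth, rng):
    """Grow a random expression tree down to the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return [op, random_tree(depth - 1, rng), random_tree(depth - 1, rng)]

def evaluate(tree, x):
    """Recursively evaluate a tree for predictor value x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def mutate(tree, rng):
    """Replace a random branch by a new random branch."""
    if not isinstance(tree, list) or rng.random() < 0.3:
        return random_tree(2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return [op, mutate(left, rng), right]
    return [op, left, mutate(right, rng)]

rng = random.Random(3)
rule = ["+", ["*", 2.0, "x"], 1.0]   # represents the rule 2*x + 1
print(evaluate(rule, 3.0))           # → 7.0
print(evaluate(mutate(rule, rng), 3.0))
```

Cross-breeding works the same way, except that the new branch comes from another member of the population instead of being random.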

We have multiple aims: we would like the solution to be accurate, we would like the variability to be realistic, and we would like the downscaling rule to be small. You can try to combine all these aims into one number and then optimize that number. This is not easy because the aims can conflict.
1. A more accurate solution is often a larger solution.
2. Typically only a part of the small-scale variability can be predicted. A method that only adds this predictable part of the variability, would add too little variability. If you would add noise to such a solution, its accuracy goes down again.

Instead of combining all aims into one number we have used the so-called "Pareto approach". What a Pareto optimal solution is, is best explained visually with two aims; see the graphic below. The square boxes are the Pareto optimal solutions. The dots are not Pareto optimal because there are solutions that are better for both aims. The solutions that are not optimal are not excluded: we work with two populations, a population of Pareto optimal solutions and a population of non-optimal solutions. The non-optimal solutions are naturally less likely to reproduce.

Example of a Pareto optimization with two aims. The squares are the Pareto optimal solutions, the circles the non-optimal solutions. Figure after Zitzler and Thiele (1999).
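The Pareto idea in the figure can be computed in a few lines. With two aims that are both minimized, a solution is Pareto optimal when no other solution is at least as good in both aims. The (size, error) pairs below are hypothetical numbers, chosen to match the shape of the figure.

```python
# Sketch: find the Pareto optimal members of a population when both
# aims (e.g. rule size and error) are minimized.
def pareto_optimal(points):
    """Return the non-dominated points, in input order."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front

# Hypothetical (size, error) pairs for a population of downscaling rules.
points = [(1, 5), (2, 3), (3, 4), (4, 2), (5, 6)]
print(pareto_optimal(points))  # → [(1, 5), (2, 3), (4, 2)]
```

Here (3, 4) is dominated by (2, 3), which is smaller and more accurate, so it lands in the non-optimal population.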

Coupling atmospheric and surface models

We have the impression that this Pareto approach has made it possible to solve a quite complicated problem. Our problem was to downscale the fields near the surface of an atmospheric model before they are passed to a model for the surface (Zerenner and colleagues, 2016; Schomburg and colleagues, 2010). These were, for instance, fields of temperature and wind speed.

The atmospheric model we used is the weather prediction model of the German weather service. It has a horizontal resolution of 2.8 km and computes the state of the atmosphere every few seconds. We run the surface model TERRA at 400 m resolution. Below every atmospheric column of 2.8x2.8 km, there are 7x7 surface pixels.

The spatial variability of the land surface can be huge; there can be large differences in height, vegetation, soil type and humidity. It is also easier to run a surface model at a higher spatial resolution because it does not need to be computed as often; the variations in time are smaller.

To be able to make downscaling rules, we needed to know how much variability the 400x400 m atmospheric fields should have. We studied this using a so-called training dataset, made by running the atmospheric model at 400 m resolution over a smaller than usual area for a number of days. This would require too much computer power for a daily weather prediction for all of Germany, but a few days on a smaller region are okay. An additional number of 400 m model runs was made to be able to validate how well the downscaling rules work on an independent dataset.

The figure below shows an example for temperature during the day. The panel to the left shows the coarse temperature field after smoothing it with a spline, which preserves the coarse scale mean. The panel in the middle shows the temperature field after downscaling with an example downscaling rule. This can be compared to the 400 m atmospheric field the coarse field was originally computed from on the right. During the day, the downscaling of temperature works very well.

The figure below shows the temperature field during a clear-sky night. This is a difficult case. On cloud-free nights the air close to the ground cools and gathers in the valleys. These flows are quite close to the ground, but a good rule was to take the temperature gradient in the lower model layers and multiply it by the height anomalies (height differences from the spline-smoothed coarse field).
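That night-time rule can be sketched as follows. Only the structure (lower-layer gradient times height anomaly) follows the text; the gradient and height values are made up for illustration.

```python
# Sketch of the night-time downscaling rule: anomaly added to each fine
# pixel = near-surface temperature gradient * pixel height anomaly.
import numpy as np

def night_downscaling(coarse_temp, lapse_lower_layers, height_anomaly):
    """lapse_lower_layers: dT/dz (K/m) in the lowest model layers."""
    return coarse_temp + lapse_lower_layers * height_anomaly

# Stable night: temperature increases with height (inversion), dT/dz > 0,
# so pixels below the smoothed coarse field (valleys) come out colder.
heights = np.array([-150.0, -50.0, 50.0, 150.0])  # m, height anomalies
fine = night_downscaling(5.0, 0.01, heights)
print(fine)  # valleys colder than hill tops
```

Because the model's lower-layer gradient changes with the weather, the same rule automatically weakens on windy or cloudy nights when there is no inversion.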

Having a population of Pareto optimal solutions is one advantage of our approach. There is normally a trade-off between the size of the solution and its performance, and having multiple solutions means that you can study this and then choose a reasonable compromise.

In contrast to artificial neural networks as a machine learning method, the GP solution is a piece of code, which you can understand. You can thus select a solution that makes sense physically and is thus more likely to work in situations that are not in the training dataset. You can study the solutions that seem strange, try to understand why they work, and gain insight into your problem.

This statistical downscaling as an interface between two physical models is a beautiful synergy of statistics and physics. Physics and statistics are often presented as antagonists, but they actually strengthen each other. Physics should inform your statistical analysis, and the above is an example where statistics makes a physical model more realistic (not performing a downscaling is also a statistical assumption, just a less visible and less physical one).

I would even argue that the most interesting current research in the atmospheric sciences merges statistics and physics: ensemble weather prediction and decadal climate prediction, bias corrections of such ensembles, model output statistics, climate model emulators, particle assimilation methods, downscaling global climate models using regional climate models and statistical downscaling, statistically selecting representative weather conditions for downscaling with regional climate models and multivariate interpolation. My work on adaptive parameterisation combining the strengths of more statistical parameterisations with more physical parameterisations is also an example.

Related reading

On cloud structure

An idea to combat bloat in genetic programming


Poli, R., W.B. Langdon and N. F. McPhee, 2016: A field guide to genetic programming. Published via Lulu.com (With contributions by J. R. Koza).

Schomburg, A., V.K.C. Venema, R. Lindau, F. Ament and C. Simmer, 2010: A downscaling scheme for atmospheric variables to drive soil-vegetation-atmosphere transfer models. Tellus B, doi: 10.1111/j.1600-0889.2010.00466.x, 62, no. 4, pp. 242-258.

Zerenner, Tanja, Victor Venema, Petra Friederichs and Clemens Simmer, 2016: Downscaling near-surface atmospheric fields with multi-objective Genetic Programming. Environmental Modelling & Software, in press.

Zitzler, Eckart and Lothar Thiele, 1999: Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE transactions on Evolutionary Computation 3.4, pp. 257-271, 10.1109/4235.797969.

* Sierpinski fractal at the top was generated by Nol Aders and is used under a GNU Free Documentation License.

* Photo of mountain with clouds all around it (Cloud shroud) by Zoltán Vörös and is used under a Creative Commons Attribution 2.0 Generic (CC BY 2.0) license.

Wednesday, 3 August 2016

Climate model ensembles of opportunity and tuning

Listen to grumpy old men.

As a young cloud researcher at a large conference, enthusiastic about almost any topic, I went to a town-hall meeting on using a large number of climate model runs to study how well we know what we know. Or as scientists call this: using a climate model ensemble to study confidence/uncertainty intervals.

Using ensembles was still quite new. Climate Prediction dot Net had just started asking citizens to run climate models on their Personal Computers (old big iPads) to get the computer power to create large ensembles. Studies using just one climate model run were still very common. The weather predictions on the evening television news were still based on one weather prediction model run; they still showed highs, lows and fronts on static "weather maps".

During the questions, a grumpy old man spoke up. He was far from enthusiastic about this new stuff. I still see a Statler or Waldorf angrily swinging his wooden walking stick in the air. He urged everyone, everyone, to be very careful and not to equate the ensemble with a sample from a probability distribution. The experts dutifully swore they were fully aware of this.

They likely were and still are. But now everyone uses ensembles. Often using them as if they sample the probability distribution.

Earlier I wrote about the problems that confusing model spread with uncertainty made in the now mostly dead "hiatus" debate. That debate remains important: after the hiatus debate is before the hiatus debate. The new hiatus is already 4 months old.* And there are so many datasets to select a "hiatus" from.

Fyfe et al. (2013) compared the temperature trend from the CMIP ensemble (grey histogram) to observations (red), implicitly assuming that the model spread is the uncertainty. While the estimated trend is near the edge of the model spread, it is well within the uncertainty. The right panel is for a 20-year period: 1993–2012. The left panel starts in the cherry-picked large El Nino year: 1998–2012.

This time I would like to explain better why the ensemble model spread is typically smaller than the confidence interval. These reasons suggest other questions where we need to pay attention: It is also important for comparing long-term historical model runs with observations and could affect some climate change impact studies. For long-term projections and decadal climate prediction it is likely less relevant.

Reasons why model spread is not uncertainty

One climate model run is just one realisation. Reality has the same problem. But you can run a model multiple times. If you change the model fields you begin with just a little bit, due to the chaotic nature of atmospheric and oceanic flows a second run will show a different realisation. The highs, lows and fronts will move differently, the ocean surface is consequently warmed and cooled at different times and places, internal modes such as El Nino will appear at different times. This chaotic behaviour is mainly found at the short time scales and is one reason for the spread of an ensemble. And it is one reason to expect that model spread is not uncertainty because models focus on getting the long term trend right and differ strongly when it comes to the internal variability.
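A toy illustration of this first reason, using the logistic map as a stand-in for chaotic atmospheric flow: perturb the initial state by one millionth and the two "runs" soon become completely different realisations.

```python
# Chaotic divergence: a tiny initial perturbation grows until the two
# trajectories are unrelated, like two members of an initial-condition
# ensemble. The logistic map (r = 3.9, chaotic regime) is the stand-in.
def trajectory(x0, steps=50, r=3.9):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

run_a = trajectory(0.200000)
run_b = trajectory(0.200001)   # perturbed initial condition

early = abs(run_a[1] - run_b[1])
late = max(abs(a - b) for a, b in zip(run_a[40:], run_b[40:]))
print(f"difference at step 1:        {early:.2e}")
print(f"max difference, steps 40-50: {late:.2e}")
```

The long-term statistics of both trajectories are nevertheless the same, which is why ensembles average over realisations rather than trusting any single one.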

But that is just reason one. The modules of a climate model that simulate specific physical processes have parameters that are based on measurements or more detailed models. We only know these parameters within some confidence interval. A normal climate model takes the best estimate of these parameters, but they could be anywhere within the confidence interval. To study how important these parameters are, special "perturbed physics" ensembles are created in which every model run has parameters that vary within the confidence interval.

Creating such an ensemble is difficult. Depending on the reason for the uncertainty in a parameter, it could make sense to keep its value constant or to continually change it within its confidence interval, and anything in between. It could make sense to keep the value constant over the entire Earth or to change it spatially, and again anything in between. The parameter, or how much it can fluctuate, may depend on the local weather or climate. It could be that when parameter X is high, parameter Y is also high (or low); these dependencies should also be taken into account. Finally, the distributions of the parameters also need to be realistic. Doing all of this for the large number of parameters in a climate model is a lot of work; typically only the most important ones are perturbed.

You can generate an ensemble that has too much spread by perturbing the parameters too strongly (and by making the perturbations too persistent). If you do it optimally, the ensemble would still show too little spread because not all physical processes are modelled; some are thought not to be important enough to justify the work and the computational resources. Part of this spread can be studied by making ensembles using many different models (a multi-model ensemble), developed by different groups with different research questions and different ideas about what is important.

That is where the title comes in: ensembles of opportunity. These are ensembles of existing model runs that were not created to be an ensemble. The most important example is the ensemble of the Coupled Model Intercomparison Project (CMIP). This project coordinates the creation of a set of climate model runs for similar scenarios, so that the results of these models can be compared with each other. This ensemble automatically samples the chaotic flows and it is a multi-model ensemble, but it is not a perturbed-physics ensemble; these model runs all aim at the best possible reproduction of what happened. For this reason alone the spread of the CMIP ensemble is expected to be too low.

The term "ensembles of opportunity" is another example of the tendency of natural scientists to select neutral or generous terms to describe the work of colleagues. The term "makeshift ensemble" may be clearer.

Climate model tuning

The CMIP ensemble also has too little spread when it comes to the global mean temperature, because the models are partially tuned to it. An interesting and readable article on climate model tuning, intended for a general audience, has just appeared in BAMS**. Tuning has a large number of objectives, from getting the mean temperature right to the relationship between humidity and precipitation. There is also a section on tuning to the magnitude of warming over the last century. It states about the historical runs:
The amplitude of the 20th century warming depends primarily on the magnitude of the radiative forcing, the climate sensitivity, as well as the efficiency of ocean heat uptake. ...

Some modeling groups claim not to tune their models against 20th century warming, however, even for model developers it is difficult to ensure that this is absolutely true in practice because of the complexity and historical dimension of model development. ...

There is a broad spectrum of methods to improve model match to 20th century warming, ranging from simply choosing to no longer modify the value of a sensitive parameter when a match is already good for a given model, or selecting physical parameterizations that improve the match, to explicitly tuning either forcing or feedback both of which are uncertain and depend critically on tunable parameters (Murphy et al. 2004; Golaz et al. 2013). Model selection could, for instance, consist of choosing to include or leave out new processes, such as aerosol cloud interactions, to help the model better match the historical warming, or choosing to work on or replace a parameterization that is suspected of causing a perceived unrealistically low or high forcing or climate sensitivity.
Due to tuning, models that have a low climate sensitivity tend to have stronger forcings over the last century, and models with a high climate sensitivity a weaker forcing. The forcing due to greenhouse gasses does not vary much; that part is easy. The forcings due to small particles in the air (aerosols), which like CO2 stem from the burning of fossil fuels, are quite uncertain, and Kiehl (2007) showed that high-sensitivity models tend to have more cooling due to aerosols. For a more nuanced, updated story see Knutti (2008) and Forster et al. (2013).
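The compensation Kiehl found can be illustrated with a back-of-the-envelope calculation. The numbers below are made up for illustration and taken from no specific model; ocean heat uptake is ignored for simplicity.

```python
# Back-of-the-envelope sketch of the Kiehl (2007) compensation:
# a low-sensitivity model with weak aerosol cooling and a
# high-sensitivity model with strong aerosol cooling can produce
# nearly the same historical warming. Illustrative numbers only.

F_2X = 3.7  # radiative forcing of a CO2 doubling, W/m^2

def equilibrium_warming(sensitivity, ghg_forcing, aerosol_forcing):
    """Warming (degC) for a climate sensitivity (degC per CO2 doubling)
    and a net forcing, ignoring ocean heat uptake."""
    return sensitivity * (ghg_forcing + aerosol_forcing) / F_2X

# Low sensitivity, weak aerosol cooling ...
low = equilibrium_warming(sensitivity=2.0, ghg_forcing=2.6, aerosol_forcing=-0.4)
# ... versus high sensitivity, strong aerosol cooling.
high = equilibrium_warming(sensitivity=4.0, ghg_forcing=2.6, aerosol_forcing=-1.5)
```

Both combinations match the observed warming about equally well, which is why the historical temperature record alone cannot separate sensitivity from aerosol forcing.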

Kiehl (2007) found an inverse correlation between forcing and climate sensitivity. The main reason for the differences in forcing was the cooling by aerosols.
Initially this "tuning" was not an explicit tuning of model parameters; it happened mostly because modellers keep working until the results look good, that is, until they look good compared to observations. Bjorn Stevens talks about this in an otherwise also recommendable Forecast episode.

Nowadays the tuning is often performed more formally and is an important part of studying climate models and understanding their uncertainties. The BAMS article proposes to collect information on tuning for the upcoming CMIP. In principle a good idea, but I do not think that that is enough. In a simple example with only climate sensitivity and aerosol forcing, the groups with low sensitivity and low forcing and the ones with high sensitivity and high forcing are happy with their temperature trend and will report not to have tuned. But that choice also leads to too little ensemble spread, just like for the groups that did need to tune. Tuning makes it complicated to interpret the ensemble; it is no problem for a specific model run.

Given that we know the temperature increase, it is impossible not to get a tuned result. Furthermore, I mentioned above several additional reasons why the model spread is not the uncertainty, and these complicate the interpretation of the ensemble in the same way. A solution could be to follow the work in ensemble weather prediction with perturbed-physics ensembles and to tune all models, but to tune them to cover the full range of uncertainties that we estimate from the observations. This should at least cover the climate sensitivity and the ocean heat uptake, but preferably also other climate characteristics that are important for climate impact and climate variability studies. Large modelling centres may be able to create such large ensembles by themselves; the others could coordinate their work in CMIP to make sure the full uncertainty range is covered.

Historical climate runs

Because the physics is not perturbed, and especially due to the tuning, you would expect the CMIP ensemble spread to be too low for the global mean temperature increase. That the CMIP ensemble average fits well to the observed temperature increase shows that with reasonable physical choices we can understand why the temperature increased; known processes are sufficient to explain it. That it fits so accurately does not say much. I liked the title of an article by Reto Knutti (2008): "Why are climate models reproducing the observed global surface warming so well?", which implies it all.

To study how good the models are, spatial patterns and other observations are much more interesting. New datasets are greeted with much enthusiasm by modellers because they allow for better comparisons and are more likely to show new problems that need fixing and lead to a better understanding. Model results for the deep past, which the models are not tuned for, are also important tests.

That the CMIP ensemble mean fits to the observations is no reason to expect that the observations are reliable

When the observations peek out of this too-narrow CMIP ensemble spread, that is to be expected. If you want to make a case that our understanding does not fit the observations, you have to take the uncertainties into account, not the spread.

Similarly, that the CMIP ensemble mean fits the observations is no reason to expect that the observations are reliable. Because of this overconfidence in the data quality, many scientists also took the recent minimal deviations from the trend line too seriously. This finally stimulated more research into the accuracy of temperature trends: into inhomogeneities in the ERSST sea surface temperatures, into the effect of coverage, and into how we blend sea, land and ice temperatures together. More improvements are under way.

Compared to the global warming of about 1°C up to now, these recent and upcoming corrections are large. Many of the problems could have been found long ago. It is 2016; it is about time to study this. If funding is an issue, we could maybe sacrifice some climate change impact studies for wine. Or for truffles. Or caviar. The quality of our data is the foundation of our science.

That the comparison of the CMIP ensemble average with the instrumental observations is so central to the public climate "debate" is rather ironic. Please take a walk in the forest. Look at all the different changes: the ones that go slower as well as the many that go faster than expected.

Maybe it is good to emphasise that for the attribution of climate change to human activities, the size of the historical temperature increase is not used. The attribution is made via correlations between the 3-dimensional spatial patterns of observations and models. By using correlations (rather than root mean square errors), the magnitude of the change in either the models or the observations is no longer important. Ribes (2016) is working on using the magnitude of the changes as well. This is difficult because of the inevitable tuning, which makes specifying the uncertainties very difficult.
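Why a correlation-based comparison is blind to the magnitude of the change can be seen in a few lines. The "patterns" below are made-up numbers; the point is only that scaling the model pattern leaves its correlation with the observed pattern unchanged.

```python
# Minimal sketch of why pattern correlation ignores the magnitude
# of the change: a model that warms "twice as fast" has exactly the
# same correlation with the observations. Patterns are made-up numbers.

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

observed = [0.2, 0.5, 0.9, 1.3, 0.8, 0.4]   # hypothetical warming pattern
modelled = [0.1, 0.6, 1.0, 1.1, 0.9, 0.3]

r_original = pearson(observed, modelled)
# Scale the model pattern by 2, as if the model warmed twice as much.
r_doubled = pearson(observed, [2.0 * m for m in modelled])
```

A root-mean-square error would change under that scaling; the correlation does not, which is why attribution based on correlations does not depend on the modelled magnitude of the warming.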

Climate change impact studies

Studying the impacts of climate change is hard. Whether dikes break depends not only on sea level rise, but also on the changes in storms. The maintenance of the dikes and the tides are important. It matters whether you have a functioning government that also takes care of problems that only become apparent when the catastrophe happens. I would not sleep well if I lived in an area where civil servants are not allowed to talk about climate change. Because of the additional unnecessary climate dangers, but especially because that is a clear sign of a dysfunctional government that does not prioritise protecting its people.

The too-narrow CMIP ensemble spread can lead to underestimates of climate change impacts, because typically the higher damages from stronger-than-expected changes are larger than the reduced damages from smaller changes. The uncertainty monster is not our friend. Admittedly, the effect of these uncertainties is rather modest, and this is only important for those impacts we understand reasonably well already. The lack of variability can be partially corrected in the statistical post-processing (bias correction and downscaling). This is not common yet, but Grenier et al. (2015) proposed a statistical method to make the natural variability more realistic.

This problem will hopefully soon be solved when the research programs on decadal climate prediction mature. The changes over a decade due to greenhouse warming are modest, for decadal prediction we thus especially need to accurately predict the natural variability of the climate system. An important part of these studies is assessing whether and which changes can be predicted. As a consequence there is a strong focus on situation specific uncertainties and statistical post-processing to correct biases of the model ensemble in the means and in the uncertainties.

In the tropics decadal climate prediction works reasonably well and helps farmers and governments in their planning.

In the mid-latitudes, where most of the researchers live, it is frustratingly difficult to make decadal predictions. But even in that case, we would still have an ensemble that can be used as a sample of the probability distribution. That is important progress.

If a lack of ensemble spread is a problem for historical runs, you might expect it to be a problem for projections for the rest of the century as well. This is probably not the case. The problem of tuning is much reduced because the influence of aerosols will be much smaller as the signal of greenhouse gasses becomes more dominant. For long-term projections the main factor is that the climate sensitivity of the models needs to fit our understanding of the climate sensitivity from all studies. This fit is reasonable for the best estimate of the climate sensitivity, which we expect to be 3°C for a doubling of the CO2 concentration. I do not know how good the fit is for the spread in the climate sensitivity.

However, for long-term projections even the climate sensitivity is not that important. For the magnitude of the climatic changes in 2100, and for the impact of climate change in 2100, the main source of uncertainty is what we will do. As you can see in the figure below, the difference between a business-as-usual scenario and strong climate policies is about 3°C (6°F). The uncertainties within these scenarios are relatively small. Thus the main question is whether and how aggressively we will act to combat climate change.

Related information

Discussion paper suggesting a path to solving the difference between model spread and uncertainty by James Annan and Julia Hargreaves: On the meaning of independence in climate science.

Is it time to freak out about the climate sensitivity estimates from energy budget models?

Fans of Judith Curry: the uncertainty monster is not your friend

Are climate models running hot or observations running cold?

Forecast: Gavin Schmidt on the evolution, testing and discussion of climate models

Forecast: Bjorn Stevens on the philosophy of climate modeling

The Guardian: In a blind test, economists reject the notion of a global warming pause


Forster, P.M., T. Andrews, P. Good, J.M. Gregory, L.S. Jackson, and M. Zelinka, 2013: Evaluating adjusted forcing and model spread for historical and future scenarios in the CMIP5 generation of climate models. Journal of Geophysical Research, 118, 1139–1150, doi: 10.1002/jgrd.50174.

Fyfe, John C., Nathan P. Gillett and Francis W. Zwiers, 2013: Overestimated global warming over the past 20 years. Nature Climate Change, 3, pp. 767–769, doi: 10.1038/nclimate1972.

Golaz, J.-C., L.W. Horowitz, and H. Levy II, 2013: Cloud tuning in a coupled climate model: Impact on 20th century warming. Geophysical Research Letters, 40, pp. 2246–2251, doi: 10.1002/grl.50232.

Grenier, Patrick, Diane Chaumont and Ramón de Elía, 2015: Statistical adjustment of simulated inter-annual variability in an investigation of short-term temperature trend distributions over Canada. EGU general meeting, Vienna, Austria.

Hourdin, Frederic, Thorsten Mauritsen, Andrew Gettelman, Jean-Christophe Golaz, Venkatramani Balaji, Qingyun Duan, Doris Folini, Duoying Ji, Daniel Klocke, Yun Qian, Florian Rauser, Cathrine Rio, Lorenzo Tomassini, Masahiro Watanabe, and Daniel Williamson, 2016: The art and science of climate model tuning. Bulletin of the American Meteorological Society, published online, doi: 10.1175/BAMS-D-15-00135.1.

Kiehl, J.T., 2007: Twentieth century climate model response and climate sensitivity. Geophysical Research Letters, 34, L22710, doi: 10.1029/2007GL031383.

Knutti, R., 2008: Why are climate models reproducing the observed global surface warming so well? Geophysical Research Letters, 35, L18704, doi: 10.1029/2008GL034932.

Murphy, J.M., D.M.H. Sexton, D.N. Barnett, G.S. Jones, M.J. Webb, M. Collins and D.A. Stainforth, 2004: Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature, 430, pp. 768–772, doi: 10.1038/nature02771.

Ribes, A., 2016: Multi-model detection and attribution without linear regression. 13th International Meeting on Statistical Climatology, Canmore, Canada. Abstract below.

Rowlands, Daniel J., David J. Frame, Duncan Ackerley, Tolu Aina, Ben B. B. Booth, Carl Christensen, Matthew Collins, Nicholas Faull, Chris E. Forest, Benjamin S. Grandey, Edward Gryspeerdt, Eleanor J. Highwood, William J. Ingram, Sylvia Knight, Ana Lopez, Neil Massey, Frances McNamara, Nicolai Meinshausen, Claudio Piani, Suzanne M. Rosier, Benjamin M. Sanderson, Leonard A. Smith, Dáithí A. Stone, Milo Thurston, Kuniko Yamazaki, Y. Hiro Yamazaki & Myles R. Allen, 2012: Broad range of 2050 warming from an observationally constrained large climate model ensemble. Nature Geoscience, 5, pp. 256–260, doi: 10.1038/ngeo1430 (manuscript).

Aurélien Ribes
Abstract. Conventional D&A statistical methods involve linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. As an alternative to this approach, we propose a new statistical model for detection and attribution based only on the additivity assumption. We introduce estimation and testing procedures based on likelihood maximization. As the possibility of misrepresented response magnitudes is removed in this revised statistical framework, it is important to take the climate modelling uncertainty into account. In this way, modelling uncertainty in the response magnitude and the response pattern is treated consistently. We show that climate modelling uncertainty can be accounted for easily in our approach. We then provide some discussion on how to practically estimate this source of uncertainty, and on the future challenges related to multi-model D&A in the framework of CMIP6/DAMIP.

* Because this is the internet, let me say that "The new hiatus is already 4 month old." is a joke.

** The BAMS article calls any way to estimate a parameter "tuning". I would personally only call it tuning if you optimize for emerging properties of the climate model. If you estimate a parameter based on observations or a specialized model, I would not call this tuning, but simply parameter estimation or parameterization development. Radiative transfer schemes use the assumption that adjacent layers of clouds are maximally overlapped and that two cloud layers separated by a clear layer are randomly overlapped. You could introduce two parameters that vary between maximum and random overlap for these two cases, but that is not done. You could call these implicit parameters, which shows that distinguishing between parameter estimation and parameterization development is hard.

*** Photo at the top: Grumpy Tortoise Face by Eric Kilby, used under a Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) license.