Friday, 15 June 2012

Information and statistics - thoughts following Kevin McConway's Inaugural Lecture

Kevin McConway, Open University Professor of Applied Statistics, in his Inaugural Lecture, Statistical thinking: the good, the bad and the ugly, "explored how, for many people, statistics is about firm, objective answers that are either right or wrong, while one of its main goals is coping with uncertainty, not certainty".

The lecture should be available at http://stadium.open.ac.uk/berrill/ in due course, and it is well worth watching: informative, authoritative and entertaining.

I think he used the word 'information' only once, and then in reference to the content of a website, but, for me, his talk was all about information. But then I see information ideas everywhere these days.

For one thing, he talked of data and knowledge (and 'facts', 'theory', 'opinion', 'evidence', 'truth', 'lies'), which immediately put me in mind of the DIKW* hierarchy, with information notably absent (my issues with DIKW notwithstanding). Data in this context is largely taken to be numbers, and it is the job of statistics and statisticians to turn this into something more useful: knowledge, meaning, or, maybe, information. So perhaps we have a 'special case' of the trapezium, with the input at the bottom as numerical data and the trapezium as the statistics/statistician. I'm increasingly thinking, though, of information as somehow this process of getting meaning out of something. Just as in a semiotic triangle the sign is emphatically not just the signifier but the combination of signifier, signified and signification (or whatever you have chosen to call the vertices), so, too, information is defined by the input, the output and the trapezium itself. "The difference that makes a difference": "The difference", yes, but only together with "a difference" and "that makes a".

* DIKW: Data-Information-Knowledge-Wisdom. I thought I'd written about that previously, but I can't find anything. If I find it I'll add a link to the post. If not, I'll blog about it at a later date! My key point is that none of these are absolute levels. Everything is relative.

A key message of Kevin's talk was about the misunderstanding/misuse/misinterpretation of the p-value in null hypothesis testing, or rather of null hypothesis testing generally. He referred to papers with titles like "Why most published research findings are false". I'd thought, broadly speaking, that I understood how hypothesis testing works, but Kevin said something along the lines of 'if you think you understand it, you probably don't'. A recurring theme of my life as an academic (and not just as an academic) has been discovering that I don't understand things I'd previously thought I did understand, so I'm sure Kevin is right. The one thing I want to pick up on, though, is that Kevin said - I think - that testing against the null hypothesis is about looking for differences, and of course that word 'difference' brings us back to information! The test is to see if there is a difference that can make a difference in the data. That is, to see if the data combined with the statistics and the conclusion constitute information.
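To make the 'looking for differences' point concrete, here is a minimal simulation sketch (my own illustration, in Python with NumPy and SciPy, not anything from Kevin's lecture). It runs a two-sample t-test many times on data where the null hypothesis is true by construction, and counts how often a 'significant' difference turns up anyway - roughly 5% of the time at the conventional p < 0.05 threshold:

```python
# A minimal sketch of why p-values are so easily misread: under a true
# null hypothesis, "significant" results still appear about 5% of the
# time by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
trials = 10_000
false_positives = 0

for _ in range(trials):
    # Two samples drawn from the SAME distribution: there is no real
    # difference, so the null hypothesis is true by construction.
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"'Significant' at p < 0.05 in {false_positives / trials:.1%} of trials")
# Expect roughly 5%: the test found differences, but not differences
# that make a difference - there was nothing there to find.
```

Which is one way of seeing how "most published research findings" could be false: run enough tests on enough noise and significance turns up all by itself.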
