Sunday 6 July 2014

Debunking Big Data

One of the dangers of computer technology is that the ease of cranking out vast calculations can lead to a dulling of critical thought. Something like this is happening in the mass of 'big data' analytics which dominates the contemporary academic landscape. People produce pretty pictures, tables of key phrases and so on as if they were pulling bunnies from a hat: yet, as with the bunnies, there is a sleight of hand going on - one which most of us are struggling to fathom. It seems that abstract algorithmic calculations applied to the agglomerations of data that each of us has contributed to (through writing on social media, journalism, academic papers, etc.) produce insights for us to gasp at and think "isn't it astonishing that Facebook knows so much!" or "well, now I know that, the next time I hear someone say..." But like any magic trick, the game is between the magician, their technique and the audience - and it works because the audience is led to believe they have seen something which they could not have expected to see. The debunkers of magic tricks show that the audience's perception that they could not have expected what they saw is in fact wrong: if the audience had thought logically, they would not have been at all surprised. The skill of the magician is to deflect the audience from logic. I'm sure this is why Heinz von Foerster loved conjuring!

It is important to distinguish magic from science. Unfortunately, in our current academic landscape, there are many over-serious people who believe they are performing science, when they are in fact performing magic tricks. The nature of scientific discovery is, of course, disputed. For Hume, it is about regular successions of events and social construction of causes. "Obviously, this is wrong" says the wonderful Rom Harré, whose PhD student Roy Bhaskar went on to argue that the social constructivism in Hume's theory couldn't be right because beyond the closed-system conditions, the socially-constructed causes still operate: if they didn't, we wouldn't have got rockets to the moon! Bhaskar's solution is to argue that causes are real and discoverable. What scientists do is a process of 'retroduction' in the light of experience, but the resulting explanations have efficacy within both the transitive (i.e. social) and intransitive (i.e. physical) realms of reality. Given this, in the physical and the social sciences, regularity is still fundamental. What Bhaskar argues is that the explanations for regularities (which are social) nevertheless have causal efficacy within the social realm because of their relation to the physical realm. Ultimately, this builds to a theory of science as critical, dialectical and emancipatory, and this is the really important thing that distinguishes magic from science: science is underpinned by ethics and politics; magic isn't.

So what of social network analysis? There are many things to say about this, but the most obvious thing is the problem of any analysis: an analytical move is a power-move. The analyst's results inform decision and action. In computer-based data analytics, the ethics and politics of the decision are never computed: the algorithm is king, and with it, the inventor of the algorithm and the interpreter of the algorithm's results. But this is problematic when we look closely at what is represented. Firstly, there is the distinction between 'nodes' and 'arcs': a node is a node because it has an arc to another node. Whilst the diagrams give the impression that nodes and arcs are separate, really they are an 'expansion' of a single piece of information - the fact that X makes a declaration about Y; if X hadn't made a declaration about anyone, then X would not exist on the diagram, irrespective of whether X exists in reality or not. A node is an entity with declared relations to other entities with declared relations. Once we realise this, we might ask ourselves the extent to which we are already aware of these declarations: indeed, we ourselves are constituted by the existent declarations of others. That means the surprise we feel on seeing a diagram is the surprise of seeing something we already know: if we were logical, we would not be surprised at all! If we became aware of this, we would also become aware of the role of the analyst and the interpreter of the data, and of their own relations of declaration to us, to the diagram, and to their arguments. The problem here is that the mere declaration of the power of social network analysis is itself a declaration which impacts us: we very soon get trapped in a web which we ourselves are spinning.
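The point about nodes and arcs can be made concrete with a few lines of code. What follows is only a minimal sketch in Python, not the method of any particular analytics tool: the names (Ann, Bob, Cat, Dan) and the function build_graph are invented for illustration, and I am assuming each declaration is recorded as a simple (source, target) pair.

```python
from collections import defaultdict


def build_graph(declarations):
    """Expand (source, target) declarations into a node set and an arc map.

    Hypothetical helper for illustration: nodes are never supplied
    separately, they only appear as a side effect of a declaration.
    """
    nodes = set()
    arcs = defaultdict(set)
    for source, target in declarations:
        nodes.add(source)          # X exists on the diagram because X declared something
        nodes.add(target)          # Y exists because something was declared about Y
        arcs[source].add(target)   # the arc is the declaration itself
    return nodes, dict(arcs)


# Assumed toy data: four people exist in reality...
people = {"Ann", "Bob", "Cat", "Dan"}
# ...but only two declarations have been made.
declarations = [("Ann", "Bob"), ("Bob", "Cat")]

nodes, arcs = build_graph(declarations)
missing = people - nodes  # people who exist but never reach the diagram

print(sorted(nodes))
print(sorted(missing))
```

Dan exists in the toy population, but since nobody has made a declaration involving him, he never reaches the diagram. The nodes and the arcs are both read off the same list of declarations, which is the sense in which the diagram is an 'expansion' of what has already been declared.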

But this is not to say that there are not regularities in social network analysis. It is to say that the regularities exist between us (the audience), the technique (the algorithm) and the magician (the interpreter). Magic tricks may not be science, but a magic trick is itself a phenomenon which can be studied scientifically! Where are the regularities? Well, they exist in the ways people are confused. How can we investigate the ways people are confused? We can develop new ways of exposing the logic behind what is going on.

My personal view is that mathematical category theory is the best tool to do this. It allows the relations between any diagram and a set of observers, each of whom can have a different perspective, to be exposed. It can then explore the logic of the different observer perspectives. (I'll elaborate on this in a future post.) Most importantly, it can provide a logic which situates the observer of the diagram within the diagram, in the context of what they know from outside the diagram. When observers see this, it is surprising how surprise disappears! But that isn't magic.
