Data Mining is the process of discovering and extracting concrete knowledge
from large data sets in a way that is understandable, compact and applicable
to real-life problems. Fayyad, Piatetsky-Shapiro and Smyth (1996) succinctly
refer to it as “pattern and knowledge discovery in databases”. An image I like
to use is that of Data Mining as a funnel of information: it takes in large
amounts of data that people cannot normally observe and understand in their
entirety, and produces a compact, summarised version that presents the key
knowledge in a way the human user can understand. That user can then claim to
have informed their decisions with as much information as was available.
You may not know it yet, but Data Mining methods are currently changing almost
every industry in the world. Science is the most notable example, of course,
but so are very “real-life”, day-to-day fields such as medicine, marketing and
even traffic-light coordination to optimise traffic flow. Banks use Data Mining
to decide whether to give someone a loan, and Facebook’s facial recognition
software, which predicts which friend you want to tag in a picture, has Data
Mining at its core. Even post offices use Data Mining methods to digitise
hand-written addresses, removing the need for a human to read them. The fourth
season of Netflix’s House of Cards seems to imply that Data Mining applied to
politics can win elections.
Football is one industry that is only beginning to feel its way around these
new paradigms. A massive amount of data is being collected at the moment, with
companies such as OPTA and Prozone recording millions of events, in-game
statistics and other information on the game and the surrounding industry: far
more data than a single person can look at, let alone decipher. Who can come
to football’s aid?
The involvement of mathematical methods in the game has become a misleading,
media-driven debate, not helped by the “Hollywood-isation” of Moneyball, which
oversimplifies the role of Data Mining into that of a magic crystal ball for
discovering talent. Sceptics seem to think that bringing mathematics into the
game will damage it, replacing the subtleties and flowing aesthetics of
football with something radically different and rather cold, based on staring
blankly at thousands of numbers on spreadsheets or “number crunching” on
calculators.
Let me assure you now: as a mathematician, I haven’t spent a single minute of
my life staring at Excel spreadsheets of thousands of numbers or “crunching”
numbers into a calculator (I don’t even own one). There’s no knowledge to be
gained by doing these things. My brain is incapable of interpreting raw numbers
that simply, not without the help of much more elegant methods that know how to
extract the information from these large databases and present it to me in
compact and useful ways that I can actually understand.
In truth, mathematics is enhancing the game, not replacing anything. On the
contrary, the wisdom and intuitive expertise of experienced football men, for
example in recognising technically gifted young players or designing successful
game tactics, can be studied and codified into these techniques so that we can
build upon their knowledge. In fact, this is what most Data Mining techniques
rely on: using previous successes, knowledge and expectations to derive the
criteria by which new cases are judged, which is what methods such as
supervised machine learning are ultimately all about.
However, just as valuable expertise gained organically can be integrated into
these techniques to make them richer, it must also be said that human intuition
is inevitably flawed and prone to mistakes. As a trivial example, the judgement
of a football scout can be thrown off by a player’s good looks, or he can be
subconsciously biased towards left-footed strikers, potentially causing him to
produce inaccurate valuations. We all know this to be true of ourselves and of
our judgements, and anyone who claims not to fall victim to biases while making
decisions on a daily basis is not being honest with himself. Data Mining, on
the other hand, brings no pre-conceived ideas, prejudices or biases of its own.
I obviously cannot yet claim to understand the inner workings of football clubs
nearly well enough to pass judgement on their performance, but a simple review
of a handful of studies by respected academics in the football industry
(Anderson and Sally, 2013; Kuper and Szymanski, 2009) seems to reveal that
there are still plenty of inefficiencies to be addressed. I believe that
methods that can identify and address these inefficiencies should be greeted
with enthusiasm, not mistrust, by those who love the game.
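To make the supervised learning idea mentioned above a little more concrete,
here is a minimal sketch in Python. It is a toy illustration under my own
assumptions, not a method any club or data provider is known to use: the two
statistics, every number, the "gifted"/"ordinary" labels and the names
past_players and predict are all invented for the example, and k-nearest
neighbours is simply one of the simplest supervised methods available.

```python
from collections import Counter
from math import dist

# Hypothetical past cases: (passes completed per 90 min, dribbles won per 90 min)
# paired with the verdict an experienced scout gave at the time.
# Every number and label here is invented purely for illustration.
past_players = [
    ((62.0, 4.5), "gifted"),
    ((58.0, 5.0), "gifted"),
    ((55.0, 3.8), "gifted"),
    ((30.0, 1.0), "ordinary"),
    ((35.0, 1.5), "ordinary"),
    ((28.0, 0.8), "ordinary"),
]


def predict(features, labelled_examples, k=3):
    """Label a new player by majority vote among the k most similar past players."""
    # Rank past players by straight-line distance to the new player's statistics.
    nearest = sorted(labelled_examples, key=lambda ex: dist(ex[0], features))[:k]
    # The most common scout verdict among those neighbours becomes the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


new_player = (57.0, 4.2)
print(predict(new_player, past_players))  # prints "gifted"
```

The point is only that the criteria come from the scouts' past verdicts rather
than from the programmer. A realistic version would need far more data, many
more statistics, and features rescaled so that no single one dominates the
distance.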
Data Mining is not necessarily a noisy new neighbour in town disrupting
everyone’s way of life. It is not a revolutionary new way of doing things that
pushes tradition off a cliff, but rather the gradual and natural continuation
of humanity’s essential practice of interpreting information and creating
knowledge to inform decision making, simply adapted to the 21st century, where
globalisation and technological development are exponentially increasing the
scale of available information. Our understanding of topics as subjective as
human behaviour, for example, has been greatly enriched by this trend, with
behavioural economics now shaping decisions in government policy, marketing,
investment banking, etc. Football men now have many more sources of information
than they have ever had before to inform their decisions and shape their
actions. Arguably, so much information is available that it is beyond a single
person to store, codify and aggregate it so that tangible recommendations can
be drawn out, and this is why mathematics must be called into action (to funnel
the information). Opting against using mathematics to tap into the whole
potential of the knowledge available in these huge databases makes no sense,
and those who make this decision will inevitably fall behind, missing out on
the competitive advantage of knowing more than their opponents.
Many sceptics throw around phrases such as “numbers can’t tell you everything”
and jump at the opportunity to single out instances where statistics-led
approaches have failed, such as Damien Comolli’s transfer dealings while
Director of Football at Liverpool. To them I would like to point out this: even
Data Mining methods have a human component. There is a person behind the design
and implementation of the methods used, and subsequently someone on the
receiving end to interpret and make use of the results. Applied poorly, some
techniques can reveal nothing worthwhile; but applied with creativity and
skill, these methods can produce some truly revolutionary discoveries.
Ultimately, football clubs must ensure they hire good mathematicians to tap
into these benefits.
I do not like football any less for observing it through the lens of
mathematics; on the contrary, each day I feel like I gain a richer
understanding of it that motivates and captivates me even more. I hope that
this blog will do the same for you.
REFERENCES
1. Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996). The KDD process for
extracting useful knowledge from volumes of data. Communications of the ACM,
39(11), pp. 27-34.
2. Anderson, C. and Sally, D. (2013). The numbers game: why everything you know
about football is wrong. Penguin UK.
3. Kuper, S. and Szymanski, S. (2009). Soccernomics. New York: Nation Books.