At the end of the last entry I touched on the trade-off between comparable
structure and the ‘granularity’ or ‘level
of detail’ of football data. Imagine this: you have a player who has the
ability to pick a certain type of between-lines pass that greatly increases
your team’s chance of scoring in that play. With passing data, we could try and
identify how this pass is represented in the data and then use the data to
identify other players in the world that are also good at making this kind of
pass. To do this, we will need to break the data down in a detailed way, and
differentiate this type of pass from other vertical passes that perhaps aren’t
as effective. We might want to consider where in the pitch this type of pass
comes from or where it finishes, what action it is preceded by, where the
defenders are, what happens after the pass is made, etc.; all of this in the
hope that we can clearly identify the type of pass we’re looking for and tell
it apart from other passes. However, what happens when we go too far and use
too much detail? It is unlikely that each time our player performs this type of
pass he does it in exactly the same
way, in the same coordinates of the pitch and in the same conditions. If we
start using too much detail, we might actually start classifying instances of
this type of pass as different when in reality they correspond to the same sought-after
‘type’. Once this happens we are no longer capable of identifying the “type of
pass” we were initially looking for, and now have hundreds of different passes
that at this level of detail are all different from each other. We can no
longer identify players who can play this pass because this “type” was
obliterated in the detail.
I actually began thinking about
this issue when reading
Dustin Ward’s piece on clustering different
types of passes. He decides to take 100 clusters or types of passes and see how
often each team or player completes each of those types of passes. This is a
good example of the trade-off mentioned above. 100 seems like a good number: it
certainly reveals more information about a team than if we considered just 1 (which
would basically be like looking at overall pass success percentages) or 2
types/clusters of passes.
Choosing 100 is also better than
choosing 100,000. If we chose 100,000, then each player or team would perform
at most one instance per season of each of these highly detailed, highly
differentiated types of passes. We wouldn’t be able to use this information to compare teams or
players in any way. But is choosing 100 better than choosing 120? Or than
choosing 80? How do we know when this trade-off is striking the right balance?
The
key is having something against which to measure ‘balance’, something we want
to optimise. In this entry I’ll show you an example of how this something could
be ‘repeatability’:
For a while I’ve been wanting to
push the Passing Motifs methodology a bit further and include some spatial information
about the passes to see what else it can tell us about teams’ passing networks.
Below is an example of two very different instances of ABAC.
The question I wanted to answer
was this: will we gain any additional valuable information about teams by
differentiating different ‘types’ of motifs according to their angles,
distances and coordinates on the pitch? Crucially, I also wanted to know where
the right balance would be when doing this differentiation in light of our
structure-detail trade-off.
There are two ways of looking at spatial variables
associated with motifs that I felt could be revealing:
x-y Coordinates of Passes: In Opta’s data files, each pass
has a ‘Start x-y’ Coordinate and an ‘End x-y’ Coordinate, meaning each pass has
four variables in terms of coordinates. A 3-pass long motif would therefore
have a set of 12 variables representing where its passes began and ended.
Angles and Lengths: Another way of looking at it is by the ‘angles’ and
‘lengths’ of passes in a motif. The
figure below illustrates how these are found.
With this idea we would have six
variables associated with a motif: the angle of each of the 3 passes of the
motif and the length of each of the 3 passes as well.
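To make the two representations concrete, here is a minimal sketch of how both sets of variables could be built from the start and end coordinates of a motif's passes. The function name `motif_features` and the sample coordinates are hypothetical, and the angle convention is an assumption (each pass's direction measured against the pitch's x-axis with `atan2`); the figure referred to above may define the angle differently.

```python
import math

def motif_features(passes):
    """passes: list of (start_x, start_y, end_x, end_y) tuples, one per pass.

    Returns two feature lists for the motif:
      - the raw x-y coordinates (4 numbers per pass)
      - the angles followed by the lengths (2 numbers per pass)
    """
    xy = [coord for p in passes for coord in p]
    angles, lengths = [], []
    for sx, sy, ex, ey in passes:
        dx, dy = ex - sx, ey - sy
        # ASSUMPTION: angle of the pass direction relative to the x-axis, in radians
        angles.append(math.atan2(dy, dx))
        lengths.append(math.hypot(dx, dy))
    return xy, angles + lengths

# Hypothetical 3-pass motif (made-up coordinates)
motif = [(50, 10, 60, 10), (60, 10, 60, 30), (60, 30, 63, 34)]
xy, geo = motif_features(motif)
print(len(xy), len(geo))  # 12 coordinate variables, 6 angle/length variables
```

Note that the second list is unchanged if the whole motif is shifted elsewhere on the pitch, which is the property the ‘angles+lengths’ idea relies on.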
NOTE: The thing I like about this ‘angles+lengths’ idea is that it doesn’t “care”
where in the pitch a motif happened, only its geometric structure. I like this
because if it has ‘structure’ or ‘insight’ into teams’ styles it will not be as
heavily determined by whether the team dominates the opposition or not: if we only
look at pitch coordinates of motifs then top teams like Chelsea or Manchester
City will perform all of their motifs high up the pitch. Therefore, the method
would be biased towards saying they perform the same ‘types’ of motifs, namely “high
up the pitch” motifs. I’m not saying that this isn’t meaningful, but it is
information we all know by simply looking at the league table and knowing these
teams play deeper into their opposition’s half. However, if the geometric shape
of motifs reveals structure that is independent of the league table, that
structure is interesting precisely because it is not simply correlated with
these "obvious" aspects.
Whichever of the two ideas we go for, there is
going to be a set of variables associated with each motif, and we can then use
k-means clustering on those variables to classify motifs into a certain number
of different types. Our intuition from the trade-off tells us that there is an intermediate
number of categories that has the best representation of “style”. The problem
is that to use a k-means clustering algorithm, we need to manually tell the
algorithm how many different categories we want before knowing this optimum
number.
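For readers who have not met k-means before, below is a bare-bones sketch of the standard Lloyd's iteration in plain Python. It is only an illustration, not the code used for this entry (a library implementation would normally be used instead), and the names `kmeans`, `n_iter` and `seed` are my own. The point to notice is that the number of categories `k` has to be passed in up front.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen points as initial centroids
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

# Toy data: two obvious groups of 2-dimensional "motif variables"
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

Because the starting centroids are random, different seeds can give different categories, which is one reason the experiment described further down is repeated many times.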
Consider this: for each choice of
number of categories, once we have determined the number of categories and
classified the different motifs into the category they correspond to, we can
use the best practice we know from the original passing motif methodology and
look at what percentage of each motif category (in the ABAC-sense) corresponds
to which ‘type’ (in either an x-y coordinate or an angles+lengths sense). So as an
example, if we had chosen to have 3 different types of motifs, then for each
team we would have this set of numbers: what percentage of the team’s ABAB
motifs are type 1, what percentage of the ABABs are type 2, what percentage of
the ABABs are type 3, what percentage of the team’s ABAC motifs are type 1, what
percentage of the ABACs are type 2, etc. What we’ll have is a vector
representing each team (with 5 motifs and 3 types, a vector of 15 numbers).
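The vector described above can be sketched in a few lines. This is an illustrative sketch only: `team_vector` is a hypothetical helper, the input format (a list of `(motif, type)` pairs per team) is my assumption, and types are numbered from 0 here rather than from 1.

```python
from collections import Counter

# The five 3-pass motifs of the original methodology
MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]

def team_vector(motifs, n_types):
    """motifs: list of (motif_name, type_label) pairs for one team,
    with type_label in 0..n_types-1.

    For each motif, returns the share of its occurrences falling in each type,
    concatenated into one vector of length len(MOTIFS) * n_types.
    """
    counts = Counter(motifs)
    vector = []
    for name in MOTIFS:
        total = sum(counts[(name, t)] for t in range(n_types))
        for t in range(n_types):
            # A motif the team never played contributes zeros
            vector.append(counts[(name, t)] / total if total else 0.0)
    return vector

# Made-up example: three ABABs (two of type 0, one of type 1) and one ABAC of type 2
sample = [("ABAB", 0), ("ABAB", 0), ("ABAB", 1), ("ABAC", 2)]
v = team_vector(sample, n_types=3)
print(len(v))  # 15
```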
Now suppose we randomly divide
each team’s motifs into two different sets, so now we have Arsenal’s A motifs
and Arsenal’s B motifs as if we were artificially considering each as the
motifs of different teams. If choosing this number of categories reveals teams’
structure or style, then the style attributed to Arsenal’s A vector should be
very similar to the style attributed to Arsenal’s B vector. The more underlying
structure we’re capturing, the more this effect should be obvious. If on the
other hand we’ve gone too far and now the extreme detail is overshooting the
underlying structure we want to discover, then Arsenal’s A vector will not necessarily
be similar to Arsenal’s B vector because the extreme detail is damaging the
comparability of styles. This is what I mean by “repeatability”.
The following graph reveals how “repeatable”
each choice of number of categories is for both the ‘x-y coordinates’ idea and
the ‘angles+lengths’ idea:
The methodology is as follows: for
each number from 2 to 50, we create that number of motif categories using
k-means clustering and assign each motif to a category. We then randomly divide
each team’s motifs into two different sets to have a vector A and a vector B for
each team. Then we check how “repeatable” the methodology is by checking, on
average, how close each team’s A vector is to its B vector in comparison to the
rest of the vectors representing other teams. Since this process involves
both the randomness of a k-means algorithm and the random division of a team’s motifs
into two sets, it is repeated a hundred times for each number. The graph shows, as
a percentage, the average ‘relative closeness’ over the hundred trials, computed as follows: I took each team’s two vectors and determined on a scale of 1 to 39 how close the team’s A vector was to its B vector. Since there are 20 teams and I divided each one into two vectors, there will be a total of 40 vectors representing 'styles'. Taking a team’s A vector as the focal point, its B vector could be the closest of the other 39 vectors (1), the second closest (2), and so on all the way up to the farthest away (39). I did this for every team, averaged these numbers, and finally converted the averaged outcome into a percentage of 39 (this was done using passing data for the 2015-16 Premier League season).
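The ranking step above can be sketched as follows. This is a hedged illustration rather than the exact code behind the graph: the function names are hypothetical, distances are assumed to be Euclidean, and the final conversion to a percentage is my own assumption (rank 1 maps to 100% and the worst possible rank to 0%), since the text only says the outcome is expressed relative to 39.

```python
import math
import random

def split_vectors(team_motifs, vectorise, rng):
    """Randomly split each team's motifs in two halves and vectorise each half."""
    vectors = {}
    for team, motifs in team_motifs.items():
        shuffled = motifs[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        vectors[(team, "A")] = vectorise(shuffled[:half])
        vectors[(team, "B")] = vectorise(shuffled[half:])
    return vectors

def mean_rank(vectors):
    """Average position (1 = closest) of each team's B vector among all the
    other vectors, ordered by distance from that team's A vector."""
    ranks = []
    for team in {team for team, _ in vectors}:
        a = vectors[(team, "A")]
        others = [key for key in vectors if key != (team, "A")]
        others.sort(key=lambda key: math.dist(a, vectors[key]))  # Euclidean (assumed)
        ranks.append(others.index((team, "B")) + 1)
    return sum(ranks) / len(ranks)

def repeatability(mean_rank_value, n_vectors):
    """ASSUMPTION: linear rescaling so rank 1 -> 100% and worst rank -> 0%."""
    worst = n_vectors - 1  # 39 when there are 40 vectors
    return 100 * (worst - mean_rank_value) / (worst - 1)

# Toy check with two made-up teams whose halves sit right next to each other
toy = {("X", "A"): [0, 0], ("X", "B"): [0, 1],
       ("Y", "A"): [10, 10], ("Y", "B"): [10, 11]}
print(repeatability(mean_rank(toy), n_vectors=len(toy)))  # 100.0
```

In the full experiment this would sit inside two loops: one over the number of categories (2 to 50) and one over the hundred random trials, averaging the score for each number of categories.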
Right off the bat we find evidence
of the balance we’ve been speaking of. When we start increasing the number of
categories we start obtaining more repeatability, meaning we can more closely
recognise two vectors as being the A and B vectors of the same team because
they are similar (i.e. close) to each other. I like to interpret this as
uncovering more underlying information that uniquely identifies a team’s
passing network style: no matter how we randomly divide a team’s motifs into two sets, we roughly still know which sets correspond to the same team because
we know the “style”. We then reach an optimum number of categories for which
this repeatability is optimised: for the ‘x-y coordinates’ idea it’s at 9 and for the ‘angles+lengths’ idea it’s at 13. After this, the
repeatability starts to decrease, meaning that a team’s A and B vectors become
less similar to each other because they’re made up of highly detailed
motifs that overshoot the underlying “style”, i.e. what
a team inherently does with its passing networks.
We have answered our initial question: The
original passing motif methodology (found in this entry) in which we
took the 5 different motifs and compared teams according to how much they
relatively used each motif had about 83.3% repeatability as per our
methodology. By breaking motifs down into an optimum number of categories
for the ‘x-y coordinates’ and ‘angles+lengths’ ideas (9 and 13 respectively), we were able to increase our repeatability to 94.3% and 84.4% respectively (evidently ‘x-y coordinates’ has better repeatability than ‘angles+lengths’, but as we said before, any structure from a purely geometrical classification is interesting).
Below is a set of boxplots illustrating what the 9 categories represent in terms of the different 'x-y' coordinates:
As an illustration, if you look at categories 4 and 8, they both begin a bit past the halfway line really close to the left touchline, but while in Category 4 motifs the three-pass sequence ends a bit further up but still on the left hand touchline, the Category 8 motifs made their way across the pitch to finish closer to the right hand touchline.
The 94.3% repeatability of the 'x-y coordinates 9 category' vectorisation is incredibly high. In fact, if we remove Sunderland and West Bromwich, which for some reason have only 80.2% and 83.5% repeatability respectively, the other teams have an average repeatability of 95.7%!
These results mean that we’ve
managed to pin down an underlying structure in teams’ passing networks that
allows us to identify unique team styles (let's call them "Passing Network Autographs") with a high degree of confidence. We’re
at the point where, if we’re given a set of motifs, we could make a robust educated
guess at which team they correspond to and most likely be right (except perhaps
if they belong to West Brom or Sunderland for some reason). As an example, below is a comparison of Arsenal's autograph versus Bournemouth's (the team whose 'autograph' most differs from Arsenal's):
Perhaps some
readers might be unimpressed with this rather theoretical and un-applied
result, but although I admit that in its raw form this seems a bit unmanageable, I would advise them to keep an open mind and think of the
potential. For example, having such a reliable ‘passing network autograph’ for
teams, we can look through players from outside the Premier League and find
those whose current passing network best fits within a team’s autograph. We
could also use our measure of team style to try and predict which styles are
more effective against each other, or which defenses are the best at
interrupting the attacking flow through a team’s passing network. These
possibilities probably sound more appealing to most readers, but in order to do
them in a meaningful way they must be underpinned by theoretical confidence
that we are indeed identifying team styles. I will try and follow up this
theoretical entry with a more applied one exploring some of these possibilities
later this month.
I want to finish this entry off
by highlighting the important potential of generalisation these ideas have. I
feel they’ve helped me establish best practice when it comes to breaking
passing motifs into different categories according to their spatial properties (and by best practice I mean knowing how many categories I should break it up into);
but the method can also be used to determine best practice in other ideas
currently being explored by football analysts. For example, during my Opta
Forum Presentation’s Q&A,
Marek Kwiatkowski asked whether the
passing motif methodology could be generalised to motifs of more than 3 passes.
The answer is that it can, but we run the risk of going too far and starting to
overshoot the structure that the methodology helped us identify as team and
player passing style: for 3-pass long motifs we had 5 motif types, while by only
going up to 5- or 6-pass long motifs we’re already at 52 and 203 types
respectively, with wacky things like ABCADBA. The ideas presented here can help
us answer the question whether it’s worth looking at longer motifs (another
entry soon perhaps?). It can also help
Dustin Ward to establish exactly
how many types of passes he should consider. In general, it helps us to
establish standardised best practice that the whole of football analytics will
benefit from and that it is currently distinctly lacking. Echoing
Marek’s piece on the state of analytics:
“Established
scientific disciplines rely on abstract concepts to organise their discoveries
and provide a language in which conjectures can be stated, arguments conducted
and findings related to each other. We lack this kind of language for football
analytics”. We need common-ground theory in which our public work can be
related and compared, and its worth truly understood. The lack of it is
holding back all of us who have an active interest in the field really taking
off. I hope this approach to improving our understanding of our ideas, taking
steps towards enhancing them and establishing best practice can inspire other
public (and even private) analysts to attempt similar things in their work and
establish bridges through which we can compare and complement our work.
Valuable applications will inevitably flow from robust, interconnected theory.
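As a side check on the motif counts quoted above for longer sequences (5 types for 3 passes, 52 for 5 passes and 203 for 6 passes), here is a small brute-force sketch. It assumes, as in the original methodology, that a motif only records the order in which distinct players appear and that nobody passes to themselves; the function names are hypothetical.

```python
from itertools import product

def canonical(sequence):
    """Relabel players in order of first appearance, e.g. (3, 1, 3, 2) -> 'ABAC'."""
    labels = {}
    for player in sequence:
        labels.setdefault(player, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[len(labels)])
    return "".join(labels[p] for p in sequence)

def motif_types(n_passes):
    """All distinct motif types for sequences of n_passes consecutive passes."""
    n_touches = n_passes + 1  # a 3-pass motif involves 4 touches, e.g. ABAC
    types = set()
    for seq in product(range(n_touches), repeat=n_touches):
        # Consecutive touches must belong to different players
        if all(a != b for a, b in zip(seq, seq[1:])):
            types.add(canonical(seq))
    return sorted(types)

print(motif_types(3))        # the five 3-pass motifs
print(len(motif_types(5)))   # number of 5-pass motif types
print(len(motif_types(6)))   # number of 6-pass motif types
```

The brute force is slow for long motifs but is enough to see how quickly the number of types grows with each extra pass.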
MATHEMATICAL FOOT-NOTE: Comparing the distance between A and B vectors as their position from 1 to 39 on the closest-farthest away scale may seem a bit unorthodox and one might consider simply using
the z-score of the distance between teams’ A and B vectors in the context of
all the distances between all 40 vectors. However, the reason I don’t do this
is that for each different choice of the number of categories, the dimension of
the space in which these vectors live is different, and on a personal mathematical note I
have a deep mistrust of comparing distances between things that live in different
dimensions.
P.S.:
I want to give a brief mention to Ben Torvaney, who made a small but meaningful contribution which I feel
greatly enhanced the results of this entry.