In the previous
entry we left off with a clustering dendrogram built from a
5-dimensional representation of each team, namely the ratios at which it
uses the 5 passing sequences, based on data from the 2015-2016 Premier League season.
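As a reminder of what that 5-dimensional representation looks like in practice, here is a minimal sketch of how a team's vector could be computed. It is not the exact code behind the dendrograms. Only ABCD is named in this entry; the other four motif labels, the input format (a list of possessions, each a list of player identifiers in passing order) and the function names `motif_of` and `motif_ratios` are assumptions made for illustration.

```python
from collections import Counter

# Assumed motif labels: only ABCD is named in the post; the other four are
# the remaining patterns possible for three consecutive passes.
MOTIFS = ("ABAB", "ABAC", "ABCA", "ABCB", "ABCD")


def motif_of(players):
    """Relabel four consecutive ball carriers as A, B, C, D by first appearance."""
    labels = {}
    out = []
    for player in players:
        if player not in labels:
            labels[player] = "ABCD"[len(labels)]
        out.append(labels[player])
    return "".join(out)


def motif_ratios(possessions):
    """Return the share of each motif over all three-pass windows.

    `possessions` is a hypothetical input format: a list of possessions,
    each a list of player identifiers in the order they had the ball.
    """
    counts = Counter()
    for chain in possessions:
        for i in range(len(chain) - 3):
            motif = motif_of(chain[i:i + 4])
            if motif in MOTIFS:  # ignores malformed windows such as AABC
                counts[motif] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in MOTIFS]


# Made-up possessions, purely illustrative.
example = [["p1", "p2", "p1", "p2"], ["p1", "p2", "p3", "p4", "p1"]]
ratios = motif_ratios(example)
```

Each team then becomes a single point in 5-dimensional space, and the clustering works on the distances between those points.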
It came as a
surprise that Leicester were singled out by the method as the team with the
most distinctive passing style in the league. But then again, Leicester were
eventually crowned champions, so surely something qualitative is there to be
found. The problem is untangling the true causal relationship behind what is being
discovered. Saying “Leicester were champions
because they earned the highest number of points” is a bit moronic.
Something like “Leicester were champions
because they scored the third-highest number of goals and conceded the
second-fewest” or “Leicester
were champions because they were able to name an unchanged starting XI the most
times in the season” can provide a bit more insight, but ultimately, when
using data from the same season it can be difficult to decipher the direction of causality
behind a discovery: did X happen because Leicester had the potential to be
champions, or did Leicester have the potential to be champions because of X?
The essential
question, then, that the whole football world wants answered is whether Leicester’s
championship run could have been predicted BEFORE the campaign kicked off.
Surely the sports trading community would be interested.
To investigate
further, we went back to the data from the 2014-2015 season, when Leicester looked
almost certainly doomed to relegation but miraculously went on an incredible winning
run in the season’s final stretch that took them from being 7 points from
safety in April to finishing comfortably 6 points above the relegation zone. No team
in the history of the Premier League had ever avoided relegation
having fewer than 20 points after 29 fixtures (Leicester had 19).
Could anyone
have predicted Leicester’s exploits back then? Should we have known?
This
was the resulting dendrogram for the data from the 2014-2015 season using the
same methodology from the previous entry:
There are
several important things to say regarding the results. First of all, forgetting
about Leicester for a minute, it’s very satisfying to see that many of the same
pairings from the 2015-2016 season are maintained. Arsenal-Manchester City,
Tottenham-Chelsea and Crystal Palace-Sunderland are all examples of pairings
that arise in both cases. Other general trends are also preserved,
such as Liverpool, Southampton and Swansea being similar, and Leicester,
Arsenal and Manchester City forming the leftmost group in both cases with
either Watford or Aston Villa. This is important because the probability of
this happening (similar groups in both seasons) if the method were randomly
pairing teams would be extremely low. This suggests that the method is identifying
something (which I will call passing style) which is consistent in teams across
a pair of consecutive seasons. This
‘satisfying consistency’ can also be seen in data for the 2013-14 and 2012-13
seasons, for which I also replicated the method.
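To put a rough number on the “random pairing” argument, here is a small simulation sketch. It rests on simplifying assumptions that are mine and not part of the method: a dendrogram is reduced to a plain split of 20 teams into 10 pairs, the same 20 teams are assumed to play in both seasons (in reality three clubs change each year), and “similar” is taken to mean at least three identical pairs. The function names and the threshold are hypothetical.

```python
import random


def random_pairing(teams, rng):
    """Split the teams into unordered pairs uniformly at random."""
    shuffled = teams[:]
    rng.shuffle(shuffled)
    return {frozenset(shuffled[i:i + 2]) for i in range(0, len(shuffled), 2)}


def prob_shared_pairs(n_teams=20, min_shared=3, trials=20000, seed=0):
    """Estimate the chance that two random pairings share >= min_shared pairs."""
    rng = random.Random(seed)
    teams = list(range(n_teams))
    # By symmetry any fixed reference pairing gives the same probability.
    reference = random_pairing(teams, rng)
    hits = 0
    for _ in range(trials):
        if len(reference & random_pairing(teams, rng)) >= min_shared:
            hits += 1
    return hits / trials


# Under pure chance a given team keeps its partner with probability 1/19,
# so the expected number of repeated pairs out of 10 is 10/19 (about 0.53).
estimate = prob_shared_pairs()
```

Even in this crude model, three repeated pairings are far more than the roughly half a pairing one would expect by chance, which is the intuition behind the argument above.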
Let’s return now
to Leicester. Just as in the case of the 2015-16 season, Leicester is the team
that joins a subgroup highest up the clustering tree, meaning its passing style
has the weakest bond to any other group of teams, i.e. it is the most
distinctive. There is a very important caveat so we don’t get carried away: “being distinctive” is in no way
equivalent to “being successful”. In
fact, the second most “distinctive” team is Burnley who were relegated at the
end of 2014-15. Both Leicester and Burnley completed a relatively low total number
of motifs, but this doesn’t necessarily explain their distinctiveness, since both QPR and Crystal Palace
completed fewer motifs than them and have relatively “strong” bonds with other
teams. Also, a truly fascinating characteristic of Leicester’s results is that
in both seasons Leicester’s passing style forms a subgroup
with those of Arsenal and Manchester City, arguably the “passing powerhouses” of the
Premier League.
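For readers who want to see what “joining highest up the tree” means operationally, here is a minimal sketch of agglomerative clustering that records the height at which each team first merges with anything else. The linkage rule (average linkage), the Euclidean distance, the function name `first_merge_heights` and the toy vectors are all assumptions for illustration; the numbers are made up and are not real team data.

```python
import math


def first_merge_heights(vectors):
    """Average-linkage agglomerative clustering on name -> vector.

    Returns name -> height at which that team's own leaf first joins
    another cluster. A larger height means a weaker bond to the rest.
    """
    clusters = [(name,) for name in vectors]
    heights = {}

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of teams.
        total = sum(math.dist(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the two closest clusters and the height of their merge.
        height, a, b = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        for cluster in (clusters[a], clusters[b]):
            if len(cluster) == 1:
                heights[cluster[0]] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return heights


# Made-up 5-d motif ratio vectors for four hypothetical teams.
toy = {
    "T1": [0.10, 0.20, 0.20, 0.20, 0.30],
    "T2": [0.11, 0.19, 0.20, 0.20, 0.30],
    "T3": [0.12, 0.20, 0.19, 0.19, 0.30],
    "T4": [0.30, 0.10, 0.10, 0.10, 0.40],
}
heights = first_merge_heights(toy)
most_distinctive = max(heights, key=heights.get)
```

In this toy example `most_distinctive` is `"T4"`, the team whose vector sits far from the other three. Note that the measure says nothing about whether that distinctiveness is good or bad, which is exactly the Burnley caveat.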
To answer the
question posed before regarding whether we should have known about Leicester, I
would cautiously say “No”. No, I
very much doubt any concrete methodology would have pointed to Leicester as the
eventual winner. However, keeping in mind that “being distinctive” is not synonymous with “being successful” (poor relegated Burnley), the truth is that with
this data before the start of the 2015-16 season I could have said to pundits: ‘Hey,
keep an eye on Leicester, there’s something interesting going on there (they
are distinctive and are close to Arsenal and Manchester City)’. Moreover, I
would also predict that if Leicester keep their players over the summer, this “style”
which has led them to be distinctive in both the 2014-15 and 2015-16 seasons will
still be there and could once again lead them to success. I wouldn’t go as far
as saying they’ll win it again, but I think they’ll be in contention. Then
again, I could be completely wrong and Leicester’s fortunes could fall off a
cliff in the upcoming season; but I know better than to think that would mean that
everything I’ve said here is wrong. I hope readers do as well. (If Leicester end up doing well again, I
would be equally cautious about the omnipotence of my methods; statistics and
probability are all about being better informed about the chaotic randomness of
the world, not about fortune telling…)
I’ll keep on
trying to see what else this methodology has to give. I suspect some sort of “tree/dendrogram”
method could be used to quantify how much success (higher finishes in the
league table) accumulates in which areas of the tree and what a team’s
position in the dendrogram says about its final league position. Also, as I
mentioned a couple of entries back when I first spoke about this methodology,
the really interesting bit could be extrapolating the method to discover how
well prospective recruits would fit within a team’s passing style. I also
hope to have a go at this. Finally, some additional variables could also be
integrated into the methodology to further distinguish passing sequences. For
example, a completely vertical instance of ABCD is very different from a
sequence of ABCD composed of horizontal square passes. Integrating this is also
something I’m working on.
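One possible way to capture that difference, sketched here only as an idea and not as what I will end up using, is a “verticality” score per pass. The assumptions are loud ones: each pass is described by start and end pitch coordinates (x, y) with x running from goal to goal, and the function names `pass_verticality` and `sequence_verticality` are hypothetical.

```python
import math


def pass_verticality(start, end):
    """Score a pass from 0 (square, sideways) to 1 (straight along the pitch).

    Assumes (x, y) coordinates with x running goal to goal. Backward passes
    also score high here because only the absolute x component is used.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    return abs(dx) / length


def sequence_verticality(points):
    """Average verticality over the consecutive passes of one sequence."""
    if len(points) < 2:
        raise ValueError("need at least two ball positions")
    passes = list(zip(points, points[1:]))
    return sum(pass_verticality(a, b) for a, b in passes) / len(passes)


# Two made-up ABCD sequences: one straight up the pitch, one of square passes.
vertical_abcd = sequence_verticality([(0, 0), (10, 0), (20, 0), (30, 0)])
square_abcd = sequence_verticality([(0, 0), (0, 10), (0, 20), (0, 30)])
```

A score like this could be appended to the motif ratios as an extra dimension, or used to split each motif into “vertical” and “horizontal” variants.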
Keep an eye on the blog to see how it all
unfolds.