In the previous entry we left off with the clustering dendrogram for the 5-dimensional representation of teams, in which each coordinate is the ratio at which a team uses one of the 5 passing sequences, built from data for the 2015-2016 Premier League season.
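To make that representation concrete, here is a minimal Python sketch of how such a 5-dimensional vector could be built. It assumes the five sequences are the five possible shapes of a three-pass move (ABAB, ABAC, ABCA, ABCB, ABCD), with letters standing for distinct players in order of first appearance. The function names and the toy shirt-number data are invented for illustration; this is not the code behind the dendrogram.

```python
from collections import Counter

# Assumed motif alphabet: the five shapes a three-pass move can take
# when no player passes to himself.
MOTIFS = ("ABAB", "ABAC", "ABCA", "ABCB", "ABCD")


def motif_of(players):
    """Relabel four consecutive ball holders by order of first appearance."""
    labels = {}
    for player in players:
        if player not in labels:
            labels[player] = "ABCD"[len(labels)]
    return "".join(labels[player] for player in players)


def style_vector(sequences):
    """Share of each motif among a team's sequences, in MOTIFS order."""
    counts = Counter(motif_of(seq) for seq in sequences)
    total = sum(counts[m] for m in MOTIFS)
    return [counts[m] / total for m in MOTIFS]


# Invented shirt numbers, purely for illustration.
example = [(7, 9, 7, 9), (7, 9, 26, 7), (1, 2, 3, 4), (5, 6, 7, 8)]
print(style_vector(example))  # → [0.25, 0.0, 0.25, 0.0, 0.5]
```

Working with shares rather than raw counts is what lets a team that passes a lot be compared with one that passes little.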
It came as a surprise that Leicester were singled out by the method as the team with the most distinctive passing style in the league. But then again, Leicester were eventually crowned champions, so surely something qualitative is there to be found. The problem is untangling the true causality relation of what is being discovered. Saying “Leicester were champions because they earned the highest number of points” is a bit moronic. Something like “Leicester were champions because they scored the third-highest number of goals and conceded the second-fewest” or “Leicester were champions because they were able to name an unchanged starting XI the most times in the season” can provide a bit more insight, but ultimately, when using data from the same season it can be difficult to decipher the true causality of discoveries; i.e., did X happen because Leicester had the potential to be champions, or did Leicester have the potential to be champions because of X?
The essential question, then, that the whole football world wants answered is whether Leicester’s championship run could have been predicted BEFORE the campaign kicked off. Surely the sports trading community would be interested.
To investigate deeper, we went back to the data from the 2014-2015 season, when Leicester looked almost certainly doomed to relegation but miraculously went on an incredible winning run in the season’s final stretch that saw them go from being 7 points from safety in April to finishing comfortably 6 points above the relegation zone. No team in the history of the Premier League had ever avoided relegation having fewer than 20 points after the 29th fixture (Leicester had 19).
Could anyone have predicted Leicester’s exploits back then? Should we have known?
This was the resulting dendrogram for the data from the 2014-2015 season using the same methodology from the previous entry:
There are several important things to say regarding the results. First of all, forgetting about Leicester for a minute, it’s very satisfying to see that many of the same pairings from the 2015-2016 season are maintained. Arsenal-Manchester City, Tottenham-Chelsea and Crystal Palace-Sunderland are all examples of pairings that arise in both cases. Other general trends are respected too, such as Liverpool, Southampton and Swansea being similar, and Leicester, Arsenal and Manchester City forming the leftmost group in both cases together with either Watford or Aston Villa. This is important because the probability of this happening (similar groups for both seasons) if the method were randomly pairing teams would be extremely low. This means that the method is identifying something (which I will call passing style) that is consistent in teams across a pair of consecutive seasons. This ‘satisfying consistency’ can also be seen in the data for the 2013-14 and 2012-13 seasons, for which I also replicated the method.
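The “would be extremely low” argument can be checked with a quick simulation. The Python sketch below is only a simplified stand-in for the real comparison: it assumes 16 teams present in both seasons, assumes every team sits in exactly one pair, and uses placeholder team labels. It then estimates how often a purely random pairing would share at least three pairs with a fixed reference pairing. The function names are made up for this example.

```python
import random


def random_pairing(teams, rng):
    """Shuffle the teams and pair them off two by two (needs an even count)."""
    shuffled = teams[:]
    rng.shuffle(shuffled)
    return {frozenset(shuffled[i:i + 2]) for i in range(0, len(shuffled), 2)}


def chance_of_shared_pairs(teams, reference, at_least, trials=20000, seed=0):
    """Estimate P(a random pairing shares >= at_least pairs with reference)."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = sum(
        len(random_pairing(teams, rng) & reference) >= at_least
        for _ in range(trials)
    )
    return hits / trials


teams = [f"team{i:02d}" for i in range(16)]  # placeholder labels, not real clubs
reference = {frozenset(teams[i:i + 2]) for i in range(0, len(teams), 2)}
p = chance_of_shared_pairs(teams, reference, at_least=3)
print(f"estimated chance of 3+ shared pairs at random: {p:.4f}")
```

A real test would also have to account for the larger groups that reappear, not just the pairs, so this only gives a feel for the size of the effect.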
Let’s return now to Leicester. Just as in the case of the 2015-16 season, Leicester is the team that joins a subgroup highest up the clustering tree, meaning its passing style has the weakest bond to any other group of teams, i.e. it is the most distinctive. There is a very important caveat so we don’t get carried away: “being distinctive” is in no way equivalent to “being successful”. In fact, the second most “distinctive” team is Burnley, who were relegated at the end of 2014-15. Both Leicester and Burnley completed a relatively low total number of motifs, but this doesn’t necessarily explain their distinctiveness, since both QPR and Crystal Palace completed fewer motifs than them and have relatively “strong” bonds with other teams. Also, a truly fascinating characteristic of Leicester’s results is that in both seasons Leicester’s passing style forms a subgroup with those of Arsenal and Manchester City, arguably the “passing powerhouses” of the Premier League.
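For readers who want to see what “joining highest up the tree” means numerically, here is a small Python sketch. It assumes Euclidean distance between style vectors and average linkage, which may not be the exact choices behind the dendrogram above, and the four team vectors are invented. For each team it records the height at which that team stops being on its own; the team with the largest height is the most distinctive one.

```python
from itertools import combinations
from math import dist


def join_heights(vectors):
    """Average-linkage clustering; height at which each team first merges."""
    clusters = {name: [name] for name in vectors}  # cluster id -> members
    heights = {}

    def linkage(a, b):
        # Mean distance over all cross-cluster pairs of teams.
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(dist(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        height = linkage(a, b)
        for cluster_id in (a, b):
            if len(clusters[cluster_id]) == 1:  # a lone team joining a group
                heights[clusters[cluster_id][0]] = height
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return heights


# Invented 5-dimensional style vectors, purely for illustration.
toy = {
    "T1": [0.10, 0.20, 0.15, 0.15, 0.40],
    "T2": [0.11, 0.20, 0.15, 0.15, 0.39],
    "T3": [0.10, 0.22, 0.15, 0.13, 0.40],
    "T4": [0.20, 0.10, 0.10, 0.10, 0.50],
}
heights = join_heights(toy)
print(max(heights, key=heights.get))  # → T4
```

Note that the height says nothing about quality: it only measures how far a team sits from everyone else.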
To answer the question posed before regarding whether we should have known about Leicester, I would cautiously say “No”. No, I very much doubt any concrete methodology would have pointed to Leicester as the eventual winner. However, keeping in mind that “being distinctive” is not synonymous with “being successful” (poor relegated Burnley), the truth is that with this data before the start of the 2015-16 season I could have said to pundits: ‘Hey, keep an eye on Leicester, there’s something interesting going on there (they are distinctive and are close to Arsenal and Manchester City)’. Moreover, I would also predict that if Leicester keep their players over the summer, this “style” which has made them distinctive in both the 2014-15 and 2015-16 seasons will still be there and could once again lead them to success. I wouldn’t go as far as saying they’ll win it again, but I think they’ll be in the contest. Then again, I could be completely wrong and Leicester’s fortunes could fall off a cliff in the upcoming season; but I know better than to think that would mean everything I’ve said here is wrong. I hope the readers do as well. (If Leicester end up doing well again, I would also be cautious about the omnipotence of my methods; statistics and probability are all about being better informed about the chaotic randomness of the world, not about fortune telling…)
I’ll keep on trying to see what else this methodology has to give. I suspect some sort of “tree/dendrogram” method could be used to quantify how much success (higher finishes in the league table) accumulates in which areas of the tree, and what a team’s position in the dendrogram says about its final league position. Also, as I mentioned a couple of entries back when I first spoke about this methodology, the really interesting bit could be extrapolating the method to discover how well prospective recruits would fit within a team’s passing style. I also hope to have a go at this. Finally, some additional variables could be integrated into the methodology to further distinguish passing sequences. For example, a completely vertical instance of ABCD is very different from an ABCD sequence composed of horizontal square passes. Integrating this is also something I’m working on.
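One possible way to capture the vertical versus horizontal distinction is sketched below in Python. It is only one candidate measure, not the one being developed for the blog. It assumes each sequence comes with the pitch position of every ball holder, and that x grows towards the opponent’s goal. The score is the net forward progress divided by the total distance the ball travelled. The name `verticality` and the coordinates are invented.

```python
from math import hypot


def verticality(points):
    """Net forward progress divided by total distance the ball travelled.

    points: (x, y) positions of the consecutive ball holders, with x
    assumed to increase towards the opponent's goal.
    """
    forward = points[-1][0] - points[0][0]
    travelled = sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
    return forward / travelled if travelled else 0.0


# Two invented ABCD sequences on a pitch measured in metres.
straight_up = [(10, 30), (30, 30), (50, 30), (70, 30)]
square_passes = [(40, 10), (40, 30), (40, 50), (40, 70)]
print(verticality(straight_up))    # → 1.0
print(verticality(square_passes))  # → 0.0
```

A sequence played backwards gets a negative score, so the same number also separates build-up that retreats from build-up that advances.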
Keep an eye on the blog to see how it all unfolds.