Friday, 8 July 2016

Quantifying Passing Subsequences: The Mysterious Case of Leicester City

This entry follows up on the previous entry's idea of quantifying teams' and players' passing styles through three-pass motifs (if you haven't read it, I recommend you do so before reading this one).

Now, I decided to try to replicate the results shown previously for the Spanish La Liga using last season's Premier League data. In my application, I quantify the raw passing data by counting the number of times each motif occurs for each team in each match. The table below shows the total number of times each team performed each of the five motifs over the whole season.
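To make the counting step concrete, here is a minimal sketch of how it could be done. It is not the code from my application: it assumes each possession is available as an ordered list of player identifiers (the shirt numbers below are made up), that every window of four consecutive ball-holders counts as one motif, and the names `motif_label` and `count_motifs` are hypothetical.

```python
from collections import Counter


def motif_label(players):
    """Map four consecutive ball-holders to a label such as 'ABAC'.

    The first distinct player becomes A, the second B, and so on.
    """
    letters = {}
    label = ""
    for player in players:
        if player not in letters:
            letters[player] = "ABCD"[len(letters)]
        label += letters[player]
    return label


def count_motifs(possession):
    """Count three-pass motifs in one possession.

    Assumption: overlapping windows are all counted, so a five-player
    sequence contributes two motifs.
    """
    counts = Counter()
    for i in range(len(possession) - 3):
        window = possession[i:i + 4]
        # Skip windows where the same player appears twice in a row,
        # which would not be a real pass (assumed to be a data glitch).
        if any(a == b for a, b in zip(window, window[1:])):
            continue
        counts[motif_label(window)] += 1
    return counts


# Made-up possession: 9 -> 26 -> 9 -> 14 -> 4
example = count_motifs(["9", "26", "9", "14", "4"])
print(example)
```

Summing these counters over every possession of a team in a match would give the per-match totals, and summing over matches would give season totals like those in the table.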

As we can see, Arsenal and Manchester City are either 1st or 2nd in every motif category. However, since teams like Arsenal and Manchester City complete the most passes in a season, it is to be expected that they also complete the most motifs in each category.

A different way of looking at this data, then, is to analyse the relative frequencies of the motifs as a percentage of the total number of motifs completed by a team during a match. That is to say, regardless of how many motifs a team completed in total, we want to look at what percentage of them were ABAB, what percentage were ABAC, etc.
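As a rough illustration of this normalisation, the sketch below turns one team's motif counts for a single match into percentages. The counts are invented for the example, and `relative_frequencies` is a hypothetical helper name.

```python
# The five possible three-pass motifs.
MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]


def relative_frequencies(counts):
    """Convert raw motif counts into percentages of the match total."""
    total = sum(counts.get(m, 0) for m in MOTIFS)
    if total == 0:
        # A match with no completed motifs has no meaningful percentages.
        return {m: 0.0 for m in MOTIFS}
    return {m: 100.0 * counts.get(m, 0) / total for m in MOTIFS}


# Invented counts for one team in one match.
match_counts = {"ABAB": 5, "ABAC": 15, "ABCA": 10, "ABCB": 20, "ABCD": 50}
shares = relative_frequencies(match_counts)
print(shares)
```

A team that completes 100 motifs and a team that completes 400 end up on the same scale this way, which is the whole point of the exercise.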

The following boxplots show the distribution of the relative frequency of each motif during each match for each team of the 2015-16 Premier League season:

Now, isn't that interesting?! Leicester now emerge as a team with a distinctive playing style. If you return for a moment to the previous entry you can see that both Barcelona and Leicester “win” in the ABAB and ABAC categories and noticeably “lose” in the ABCD category (this similarity isn’t there in the other two categories). I'm obviously not claiming that Leicester and Barcelona have a similar style; I'm sure I would lose all credibility with football fans and might as well just close the blog. The main difference is that Barcelona don't only win the relative frequency battle but also the overall total, usually completing many more passes than their opposition. Leicester have a much more modest return in overall motifs completed (i.e. far fewer passes completed). In fact they complete the second lowest number of motifs overall (only WBA complete fewer, and Leicester are only marginally below Sunderland), but among the motifs they do complete, they seem to favour a distinctive mix of passing sequences.

In fact, forget about the whole Barcelona thing for a moment. Even without having ever seen those results for La Liga, the methodology is pointing towards Leicester as the team with the unique style in the Premier League. The following figure shows the Clustering Dendrogram for the data viewing each team as a vector in R^5 where each feature is the mean percentage each motif constitutes of the total for each match of the season:

NOTE: It’s not easy to explain briefly what a clustering dendrogram is or means so please refer to any of the good sources on mathematics widely available (Wikipedia is pretty good), but basically it represents how teams are sequentially grouped according to their similarity (distance in R^5). For example, we can see that Leicester, Watford, Arsenal and Manchester City are a “group” but within that group there are two subgroups consisting of Leicester on its own and the other three, and similarly Arsenal is more tightly grouped with Manchester City than with Watford. The higher in the tree a grouping is made, the “weaker” its bond is. With that in mind, Leicester is the team with the “weakest bond” to any other subgroup of teams.
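For readers who prefer code to trees, here is a minimal sketch of the kind of agglomerative clustering that sits behind a dendrogram. It is only an illustration: the team names and vectors are made up, and average linkage with Euclidean distance is an assumption, not necessarily what produced the figure above. In practice a library routine such as `scipy.cluster.hierarchy.linkage` would do this job.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Straight-line distance between two points in R^5."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_distance(c1, c2, vectors):
    """Average pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(vectors):
    """Repeatedly merge the two closest clusters.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    Later merges happen at larger distances, i.e. higher in the tree.
    """
    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1], vectors),
        )
        merges.append((a, b, average_distance(a, b, vectors)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Made-up mean motif percentages (ABAB, ABAC, ABCA, ABCB, ABCD).
teams = {
    "Team 1": (10.0, 20.0, 10.0, 20.0, 40.0),
    "Team 2": (11.0, 20.0, 10.0, 20.0, 39.0),
    "Team 3": (20.0, 30.0, 10.0, 20.0, 20.0),
}
merges = agglomerate(teams)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```

In this toy example the two similar teams are joined first at a small distance, and the odd one out only joins at the very end at a much larger distance, which is the "weakest bond" situation described above.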

Honestly, I don't know any more than you what this means (yet); but it's very interesting that something pointing towards Leicester came up when I wasn't even looking for it. This was a simple probing methodology which pinpointed Leicester on its own, without me asking: “Is Leicester distinctive?” or “What sets Leicester apart?”.

It would be important to check whether there is any linear relationship between the frequency of each motif and the total number of motifs completed; if that were the case, Leicester’s low number of motifs could explain why it is being set apart. I don’t think this would tell the whole story, though, as WBA and Sunderland would then also be singled out in some way.
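One simple way to run that check would be a Pearson correlation between each team's total motif count and its share of a given motif. The sketch below shows the calculation on invented numbers; it is not a result from the Premier League data, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented values: total motifs per team and the ABAB share (%) per team.
total_motifs = [3000, 4500, 6000, 7500]
abab_share = [12.0, 10.0, 8.0, 6.0]
r = pearson_r(total_motifs, abab_share)
print(round(r, 3))
```

A coefficient close to +1 or -1 for some motif would suggest that its share is largely driven by passing volume, in which case a low-volume team standing apart would be less surprising.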

In the coming weeks I’ll try to get to the bottom of this…
