*This is a pretty exciting entry, so bear with me if it gets a bit long; I think it’s worth it…*

Ever since the first entry on Passing Motifs, I have
mentioned the potential of extending the methodology to study passing
styles at a player level. That first entry mentioned the idea set forth by
Javier Lopez and Raul Sanchez to answer the question “Who can replace Xavi?”. Nevertheless, that
particular example always left me wanting more because the outcome was
noticeably skewed towards players from Barcelona and a few other teams like Arsenal
and, surprisingly, even Swansea. It made me think that the methodology was
ignoring individual player traits and instead picking up stats that
reflect the team the player plays for, not the player himself.

Ever since, I’ve been thinking about the best way
to extract player passing style from passing motifs. Here are some of the
ideas I’ve had:

- A first objective is to neutralise the effect of the team’s passing style on a player. If a team uses ABAB proportionately often, then inevitably so will its players. Therefore, if you put Fernandinho in Barcelona, his motif frequencies will start to resemble those of the whole team without this ever having been inherent to him. The idea I had was to look at how a player’s relative motif frequencies *diverged* from his team’s frequencies in each match. That is to say, if in a match 40% of Arsenal’s motifs were ABAC and 43% of the motifs Coquelin was involved in were ABAC, then Coquelin had a divergence of +3 percentage points for that motif in that match. Averaging over the whole season, Coquelin can be seen as a 5-dimensional vector where each entry corresponds to his average *divergence* for one of the 5 motifs. When the performance of this vectorisation is measured through the methodology outlined in my previous entry, using data from the 2014-15 and 2015-16 seasons of the Premier League (only players who had at least 18 appearances, to avoid outliers), this was the result:

The fairly negative z-scores reveal that this
methodology has an agreeable stability for those two seasons and is therefore
picking up on some underlying quality of the players’ playing style.
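The divergence idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text names ABAB, ABAC and ABCA, and the sketch assumes the remaining two motifs are ABCB and ABCD; the function names and the input layout (one dict of motif counts for the player and one for the team, per match) are hypothetical.

```python
# The five three-pass motifs. ABCB and ABCD are assumed here; the text only
# names ABAB, ABAC and ABCA explicitly.
MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]


def shares(counts):
    """Convert raw motif counts into fractions of the total (zeros if no motifs)."""
    total = sum(counts.get(m, 0) for m in MOTIFS)
    if total == 0:
        return [0.0] * len(MOTIFS)
    return [counts.get(m, 0) / total for m in MOTIFS]


def match_divergence(player_counts, team_counts):
    """Player share minus team share for each motif in a single match."""
    return [p - t for p, t in zip(shares(player_counts), shares(team_counts))]


def season_divergence(matches):
    """Average the per-match divergences.

    `matches` is a non-empty list of (player_counts, team_counts) pairs.
    """
    per_match = [match_divergence(p, t) for p, t in matches]
    return [sum(column) / len(per_match) for column in zip(*per_match)]


# Hypothetical match mirroring the Coquelin example: 40% of the team's motifs
# and 43% of the player's motifs are ABAC.
team = {"ABAB": 40, "ABAC": 80, "ABCA": 20, "ABCB": 20, "ABCD": 40}
player = {"ABAB": 17, "ABAC": 43, "ABCA": 10, "ABCB": 10, "ABCD": 20}
coquelin = season_divergence([(player, team)])
```

Because both sets of shares sum to 1, the five divergences of any match sum to zero, so the vector really only carries four independent numbers.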

- Just as we did for team motifs, instead of considering the raw values of motifs a player performed, we consider each performance in a match by a player as a 5-dimensional vector in which each entry is the percentage of the player’s total motifs that that motif corresponds to. So we can represent a match played by Romelu Lukaku as 5% ABAB, 13% ABAC, 25% ABCA, etc. Averaging over a whole season, each player is represented by a 5-dimensional vector.

Once again, we’re reasonably happy that this
vectorisation is picking up on stable player qualities.
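A minimal sketch of this second vectorisation, assuming per-match motif counts are available as dicts. The names are hypothetical, ABCB and ABCD are assumed to be the two motifs not named in the text, and skipping matches in which the player took part in no motifs is my own choice rather than something stated above.

```python
# ABCB and ABCD are assumed; the text names only ABAB, ABAC and ABCA.
MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]


def match_profile(counts):
    """Share of the player's motifs in one match that each motif accounts for."""
    total = sum(counts.get(m, 0) for m in MOTIFS)
    return [counts.get(m, 0) / total for m in MOTIFS]


def season_profile(matches):
    """Average the per-match profiles, ignoring matches with no motifs at all."""
    played = [c for c in matches if sum(c.get(m, 0) for m in MOTIFS) > 0]
    profiles = [match_profile(c) for c in played]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# First match follows the Lukaku example (5% ABAB, 13% ABAC, 25% ABCA);
# the ABCB/ABCD figures and the whole second match are made up.
lukaku_matches = [
    {"ABAB": 5, "ABAC": 13, "ABCA": 25, "ABCB": 27, "ABCD": 30},
    {"ABAB": 15, "ABAC": 7, "ABCA": 25, "ABCB": 23, "ABCD": 30},
]
lukaku = season_profile(lukaku_matches)
```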

- Another way of looking at the data, which I felt might be useful, is to describe each player’s match by the proportion of each of his team’s motifs that he participated in. That is to say, if Southampton completed 50 instances of ABAB in a match, and Jordy Clasie participated in 25 of those, he would have a 50% score for ABAB in that match. If in that same match Southampton completed 80 instances of ABAC and Clasie participated in 20, he would have a 25% score for that motif. Applying this logic to the 5 different motifs and averaging over the whole season, each player is once again represented by a 5-dimensional vector. This is how well it performs:

Out of the three 5-dimensional vectorisations we have
shown so far, this is by some margin the one which performs the best. Both its
z-scores are considerably lower than those of the other two, meaning it’s capturing pretty
stable information for each player.
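The involvement score can be sketched as follows. The function names and data layout are hypothetical, ABCB and ABCD are assumed to complete the set of five motifs, and the handling of a motif the team never produced in a match (left undefined and excluded from the average) is an assumption of the sketch, not something specified above.

```python
# ABCB and ABCD are assumed; the text names only ABAB, ABAC and ABCA.
MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]


def match_involvement(player_counts, team_counts):
    """Fraction of the team's instances of each motif that involved the player.

    The entry is None where the team produced no instance of that motif.
    """
    result = []
    for motif in MOTIFS:
        team_n = team_counts.get(motif, 0)
        result.append(player_counts.get(motif, 0) / team_n if team_n else None)
    return result


def season_involvement(matches):
    """Average each motif over the matches where it is defined (0.0 if never)."""
    per_match = [match_involvement(p, t) for p, t in matches]
    averaged = []
    for column in zip(*per_match):
        defined = [v for v in column if v is not None]
        averaged.append(sum(defined) / len(defined) if defined else 0.0)
    return averaged


# ABAB and ABAC follow the Clasie example; the other figures are made up.
team = {"ABAB": 50, "ABAC": 80, "ABCA": 30, "ABCB": 40, "ABCD": 100}
clasie = {"ABAB": 25, "ABAC": 20, "ABCA": 6, "ABCB": 10, "ABCD": 30}
scores = match_involvement(clasie, team)
```

Unlike the two previous vectors, these entries do not sum to 1, so this representation also reflects how heavily a player is involved overall, not just which motifs he favours.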

- In the first entry regarding passing motifs we mentioned how the motifs could be vectorised in a 15-dimensional vector for players. To refresh your memory, for an ABAC sequence a player could participate as the A player, the B player or the C player. It’s straightforward to count that looking at all 5 motifs there are 15 “participation” possibilities for each player. If we count how many times each player was each letter in each of the 5 motifs, we are left with a 15-dimensional vector representing each player. This is basically the methodology used in the “Who can Replace Xavi?” article.

Comparing
things in different dimensions is rather difficult and not too standardised in
mathematics, but I would dare say that it performs worse than the previous
5-dimensional vectorisation, especially considering Z-Score 1, which is the most
important indicator.
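To make the count of 15 concrete, here is a small sketch that labels four consecutive pass participants with letters and tallies each player’s role. It assumes the five motifs are ABAB, ABAC, ABCA, ABCB and ABCD (the last two are not named in the text) and that the input is already cut into four-player sequences; how those sequences are extracted from match data is not covered, and all names are hypothetical.

```python
# ABCB and ABCD are assumed; the text names only ABAB, ABAC and ABCA.
MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]

# One slot per (motif, letter) pair: 2 + 3 + 3 + 3 + 4 = 15 slots.
SLOTS = [(motif, letter) for motif in MOTIFS for letter in sorted(set(motif))]


def label_sequence(players):
    """Label four consecutive pass participants, e.g. [x, y, x, z] -> 'ABAC'.

    Returns the motif string and a {player: letter} mapping.
    """
    letters = {}
    for player in players:
        if player not in letters:
            letters[player] = "ABCD"[len(letters)]
    motif = "".join(letters[player] for player in players)
    return motif, letters


def role_counts(sequences):
    """Count, for every player, how often he filled each of the 15 slots."""
    vectors = {}
    for players in sequences:
        if len(players) != 4:
            continue  # only three-pass motifs are considered
        motif, letters = label_sequence(players)
        if motif not in MOTIFS:
            continue  # malformed input, e.g. the same player twice in a row
        for player, letter in letters.items():
            vector = vectors.setdefault(player, [0] * len(SLOTS))
            vector[SLOTS.index((motif, letter))] += 1
    return vectors


# Made-up sequences purely for illustration.
vectors = role_counts([
    ["Xavi", "Iniesta", "Xavi", "Messi"],    # ABAC
    ["Xavi", "Iniesta", "Xavi", "Iniesta"],  # ABAB
])
```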

- Finally, we can take this 15-dimensional idea and slightly alter it to count not the total of each pseudo-motif but its relative frequency. For example, if Dimitri Payet performed the B in an ABAC 15 times out of the 100 total motifs he participated in, that pseudo-motif has a score of 15%. Once again, each player is represented by a 15-dimensional vector:

Immediately we appreciate that this is the best
performing of all the vectorisations we have seen.
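The normalisation is simple arithmetic: a player fills exactly one role in every motif he takes part in, so the 15 role counts add up to his total number of motifs. A tiny sketch follows, with made-up counts and an assumed slot order (ABAB-A, ABAB-B, ABAC-A, ABAC-B, ...). Whether the frequencies are computed per match and then averaged, or pooled over the season as here, is not specified above.

```python
def relative_roles(role_counts):
    """Divide each role count by the number of motifs the player took part in."""
    total = sum(role_counts)
    if total == 0:
        return [0.0] * len(role_counts)
    return [count / total for count in role_counts]


# Made-up season totals for 15 roles, summing to 100. Index 3 is assumed to be
# "B in ABAC", matching the Payet example of 15 out of 100.
payet_counts = [10, 5, 12, 15, 8, 9, 6, 7, 5, 4, 6, 4, 3, 3, 3]
payet = relative_roles(payet_counts)
```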

Now, the first thing we must say is that all 5
ways of obtaining player vectors shown here show evidence of uncovering some
stable and underlying qualities of players’ passing style. We have used the
indicators to compare them and discuss which might be better, but there is no
way of determining whether some information which one of them is picking up on is
missed by another.

Here’s the advantage: there is no downside to combining them
all. If we simply glue together all these representations to make one long
45-dimensional (5+5+5+15+15) vector representation for players, then all the
qualities each methodology picked up on are represented to some degree. If
two players were similar across all representations, they will be similar in the
long one as well; if two players were similar across some of the representations
but not others, then they will be mildly similar depending on how dissimilar they
were in the others; etc.
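Gluing the representations together is plain concatenation. The sketch below assumes Euclidean distance as the similarity measure, which is not stated in this entry, and applies no rescaling; the helper names are hypothetical.

```python
import math


def glue(*blocks):
    """Concatenate several per-player vectors into one long vector."""
    combined = []
    for block in blocks:
        combined.extend(block)
    return combined


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Placeholder blocks with the right sizes: three 5-dimensional vectors and
# two 15-dimensional ones.
player_a = glue([0.0] * 5, [0.2] * 5, [0.1] * 5, [3.0] * 15, [1 / 15] * 15)
player_b = glue([0.0] * 5, [0.2] * 5, [0.1] * 5, [4.0] * 15, [1 / 15] * 15)
distance = euclidean(player_a, player_b)
```

One thing to keep in mind with a distance like this: a block measured in raw counts has much larger values than blocks measured in proportions, so without some standardisation of each coordinate it contributes most of the distance.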

Here is the performance of this long 45-dimensional vectorisation:

The results are very satisfying and it proves to be a robust
vectorisation for player playing style, more than 1 standard deviation below
the mean distance between all players and more than 4 standard deviations below
the Gaussian distances, even in this very high dimensional space.
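The indicators themselves are defined in the previous entry and are not reproduced here. Purely as an illustration, one plausible reading of “standard deviations below the mean distance between all players” is sketched below: compare the average distance between a player’s vectors in two seasons with the distribution of distances between different players. The function name, the use of Euclidean distance and the exact formula are all assumptions and may differ from the actual indicator.

```python
import math
import statistics


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def stability_z(season_a, season_b):
    """Z-score of same-player distances against different-player distances.

    Both arguments map player name -> vector. Only players present in both
    seasons are used, and at least two such players are needed.
    """
    common = sorted(set(season_a) & set(season_b))
    same = [euclidean(season_a[p], season_b[p]) for p in common]
    cross = [
        euclidean(season_a[p], season_b[q])
        for p in common
        for q in common
        if p != q
    ]
    return (statistics.mean(same) - statistics.mean(cross)) / statistics.pstdev(cross)


# Made-up two-dimensional vectors: each player barely moves between seasons,
# while different players are far apart, so the score is strongly negative.
season_1 = {"P": [0.0, 0.0], "Q": [10.0, 0.0], "R": [0.0, 10.0]}
season_2 = {"P": [1.0, 0.0], "Q": [10.0, 1.0], "R": [0.0, 9.0]}
z = stability_z(season_1, season_2)
```

The more negative the score, the closer players stay to themselves from one season to the next compared with how far apart different players are.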

This vectorisation will surely provide me with a
lot of material to explore for a good while; it’s even a little frustrating not
finding an easy visual way in which to convey it to the readers. Let’s settle for now on a hierarchical clustering dendrogram as a visualisation tool.
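For readers unfamiliar with the technique, here is a bare-bones sketch of agglomerative hierarchical clustering, the procedure a dendrogram draws. It assumes Euclidean distance and average linkage, neither of which is specified in this entry, and uses made-up two-dimensional vectors instead of the real 45-dimensional ones.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(vectors):
    """Average-linkage agglomerative clustering.

    `vectors` maps player name -> vector. Returns the merges in order as
    (left_members, right_members, distance) tuples; a dendrogram draws
    exactly this sequence, with `distance` as the height of each join.
    """
    def linkage(c1, c2):
        # Average distance over all cross-cluster pairs of players.
        total = sum(euclidean(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in sorted(vectors)]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        left, right = clusters[i], clusters[j]
        merges.append((left, right, linkage(left, right)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(left + right)
    return merges


# Made-up vectors: two "goalkeepers" close together, two "wingers" close together.
toy = {"GK1": [0, 0], "GK2": [0, 1], "W1": [10, 10], "W2": [10, 11]}
merges = agglomerate(toy)
```

This naive version recomputes every linkage at each step, so it is only meant to show the idea; for a few hundred players a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage` together with `dendrogram` is the practical choice.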

Below is a link to the PDF of the hierarchical clustering dendrogram applied to the data set for the 2015-16 season of the Premier League (only players who played in over 18 matches). Since there are 279 players, the tree labels are really tiny, so the image couldn't be uploaded onto the blog directly, but in the PDF you can use your browser's zoom to explore the results.

https://drive.google.com/file/d/0Bzvjb5fnv1HtZjFtRDJjUVBua0E/view?usp=sharing

If you'd rather not, here is a selection of the methodology's results:

- Mesut Ozil has one of the most distinctive passing styles in the league. Cesc Fabregas is the player closest to him and together they form a subgroup with Juan Mata, Ross Barkley, Yaya Toure and Aaron Ramsey.
- Alexis Sanchez is in a league of his own but the players with the most similar passing style are Payet, Moussa Sissoko, Jesus Navas, Sterling and Martial.
- Troy Deeney is in the esteemed company of Aguero, De Bruyne, Oscar and Sigurdsson.
- David Silva, Willian, Eden Hazard and Christian Eriksen are all pretty similar.
- Nemanja Matic, Eric Dier and Gareth Barry have a similar passing style.
- M’Vila, Lanzini, Capoue, Puncheon, Ander Herrera and Drinkwater are all similar, pretty good and perhaps underrated.
- Walcott, Iheanacho, Scott Sinclair, Jefferson Montero, Wilfried Zaha, Bakary Sako, Albrighton, Bolasie and Michail Antonio form a subgroup of similar wingers.
- Giroud is more similar to some rather underwhelming strikers such as Gomis, Cameron Jerome and Papiss Cisse than to world class strikers. The same can be said of Harry Kane being similar to Arouna Kone, Son and Marc Pugh. Maybe the methodology is not as convincing for strikers?
- Shane Long and Odion Ighalo are good alternatives to Jamie Vardy.
- Diego Costa and Lukaku are similar to Rooney.
- Victor Moses, Aaron Lennon and Jordon Ibe are similar.
- Mahrez is similar to Sessegnon, Nathan Redmond and Jesse Lingard. Did Southampton know this?
- Matt Ritchie (ex-Bournemouth now at Newcastle) is in a group with Lallana, Alli, Pedro and Lamela. An opportunity for the taking?
- Angel Rangel has (and has always had) unusual stats for a full-back.
- The methodology recognises who the goalkeepers are and sets them apart without this information being explicitly available in the datasets. The same applies to many other players in similar positions who are grouped together, like the CBs and full-backs.

This is a poor man’s substitute
for actually exploring the dendrogram yourselves. Not to mention that a
clustering dendrogram is not even the most faithful representation of the
information being collected by this vectorisation, but I’m more than happy with
the results and feel there is some real promise to the methodology. If I can
come up with some better visualisations for the results I’ll post those later
on.

Please have a look through the results from the
dendrogram and comment on whether you feel we’re getting close to convincingly
capturing player passing style through passing motifs.
