A talk on some of my passing motifs work was selected for
the OptaPro Forum 2017 which took place in Central London on February 8th.
You can see the full video (except the Q&A) below:
Most of the content I'd already written entries about: For the general overview on the passing motifs methodology for teams read this entry. For my applied results on teams from the Premier League (and a take on what the hell happened with Leicester) read this and this entry. Below are the images for the hierarchical clustering dendrograms and the principal component analysis graphs of this 5-dimensional representations of Premier League teams for the 2014-15 and 2015-16 seasons.
Moving on to a player level, you can read up on the general methodology in this entry, and then on the specific scoring system that I presented at the forum to create those lists of players in this entry. Below are some of the lists I had on display which are respectively: Premier League 15-16 using 'Key Passes' to award points, Bundesliga 15-16 using 'Key Passes' to award points and finally Premier League using 'Expected Assists (xA)' to award points. For that last one I used Opta's xA numbers which give account for the probability of a pass turning into a shot with a certain xG value.
Finally, I also did a bit on using Topological Data Analysis (TDA) to explore the results for players which I hadn't done before; although to read up on the general methodology of TDA you can read this entry (wow how things have changed since that entry! I of course now know that Opta doesn't really log 'controls with left thigh'. Don't be fooled by how assured I wrote about the analytics industry back then, I honestly didn't know half of what I know now about that world just 10 months later... hopefully my future self in another 10 months will also look back with pity at my current self's ignorance).
Below is the image from the forum:
Finally, I want to use this (non) write-up on the presentation as a platform to discuss some more general reflections about analytics. 'Operationalising' is a hideous sounding word which was horribly difficult to repeatedly say in front of 300 people; but it actually is very important. There is so much complexity in raw football data that those of us who do analytics really need to broaden our scope when thinking how we will represent this raw information in numbers, vectors or variables that will help us uncover the rich underlying information that is there for the taking. The 'passing motif' operationalisation of raw passing network data is 'neat'; it seems harmless when you first see it and I wouldn't blame you for doubting that those 5/45 numbers attributed to teams/players will actually say much about them, but evidently they do. I think that what it's got going for it is that it helps to account for the sequentiality of the raw events, something which most of the work I encounter out there fails to do. As I said in the presentation, we're a bit too focused on events when its actually the sequences of these events that actually matter.
There is a classic problem though (akin to the overfitting problem of modelling) when trying to account for larger and larger sequences: If we become too granular and for example don't do this methodology for 3-pass long motifs but rather for 10-pass long motifs; then the occurrences will become so specific that we actually lose out on comparable capacity in the structure of our information. Alexis would have such a specific distribution of highly differentiated sequences that he would have no neighbours to reward him for their key passes! We need to strike a balance between sequentiality and lets call it "non-granularity" (this was actually one of the questions at the forum: can/should the methodology be generalised for more than 3 passes?).
Finding the correct concepts that strike this balance is the challenge of analytics. Passing motifs are "neat"; but even I recognise that they are nowhere near the ambitions of what I would hope to achieve in analytics. Exciting years to come!
Finally, I also did a bit on using Topological Data Analysis (TDA) to explore the results for players which I hadn't done before; although to read up on the general methodology of TDA you can read this entry (wow how things have changed since that entry! I of course now know that Opta doesn't really log 'controls with left thigh'. Don't be fooled by how assured I wrote about the analytics industry back then, I honestly didn't know half of what I know now about that world just 10 months later... hopefully my future self in another 10 months will also look back with pity at my current self's ignorance).
Below is the image from the forum:
Finally, I want to use this (non) write-up on the presentation as a platform to discuss some more general reflections about analytics. 'Operationalising' is a hideous sounding word which was horribly difficult to repeatedly say in front of 300 people; but it actually is very important. There is so much complexity in raw football data that those of us who do analytics really need to broaden our scope when thinking how we will represent this raw information in numbers, vectors or variables that will help us uncover the rich underlying information that is there for the taking. The 'passing motif' operationalisation of raw passing network data is 'neat'; it seems harmless when you first see it and I wouldn't blame you for doubting that those 5/45 numbers attributed to teams/players will actually say much about them, but evidently they do. I think that what it's got going for it is that it helps to account for the sequentiality of the raw events, something which most of the work I encounter out there fails to do. As I said in the presentation, we're a bit too focused on events when its actually the sequences of these events that actually matter.
There is a classic problem though (akin to the overfitting problem of modelling) when trying to account for larger and larger sequences: If we become too granular and for example don't do this methodology for 3-pass long motifs but rather for 10-pass long motifs; then the occurrences will become so specific that we actually lose out on comparable capacity in the structure of our information. Alexis would have such a specific distribution of highly differentiated sequences that he would have no neighbours to reward him for their key passes! We need to strike a balance between sequentiality and lets call it "non-granularity" (this was actually one of the questions at the forum: can/should the methodology be generalised for more than 3 passes?).
Finding the correct concepts that strike this balance is the challenge of analytics. Passing motifs are "neat"; but even I recognise that they are nowhere near the ambitions of what I would hope to achieve in analytics. Exciting years to come!
No comments:
Post a Comment