Team Style Similarity: Evolution of My Approach and Applications to Recruitment

How advanced similarity ratings can identify better player fits and prevent costly transfer mistakes

The Origins

Today I want to share my work on team style similarity, a project I've been developing for several years. My methodology has evolved significantly, but the starting point was undoubtedly this post from Statsbomb. I initially created similarity ratings based on heatmaps of selected actions for teams, which produced ratings like these for Thiago Motta's Bologna 2023/24 season:

While these ratings provided some insight, I always felt they were somewhat inflated. I also questioned their true value given my limited metrics — around 20 simple events like passes in the first third, take-ons, and so on.

The Evolution

In recent months, I discovered the work of Edoardo Ghezzi and Hadi Sotudeh, who use Statsbomb 360 data to create features for game situations and provide better contextual slicing for events. Although their specific implementation wasn't accessible to me (as I don't work with 360 data), I realized I could adapt parts of their approach to enhance my similarity ratings. I implemented these changes as soon as I found the time, and I'm pleased with the results.

The improvement isn't just in the ratings themselves, which I believe are already better than my previous method:

What's most satisfying is seeing how the model captures complex team styles with surprising accuracy. Take Thiago Motta, for example — a Brazilian of Italian descent who grew up in Barcelona's academy and later played under Gasperini, Mourinho, and Ancelotti (among others). His style is multifaceted, blending positional play, Brazilian relationalism, and a tight defensive approach that clearly borrows elements from Gasperini and resilience from Mourinho.

Looking at the most similar teams to his Bologna from the 2023/24 season, we see Bosz's PSV, Guardiola's City, and De Zerbi's Brighton representing the positional elements, with Ancelotti providing the more relationalist aspects in possession:

When examining out-of-possession similarity, we can identify Mourinho's influence through similarity to teams like Allegri's, Moyes', and Ranieri's, alongside Gasperini's approach and Ancelotti's relationist style:

There's certainly room for improvement — as Thiago would say, "there always is" — but I consider this a significant step forward from the previous version.

The Technical Implementation

To implement Ghezzi and Sotudeh's principles, I built workarounds on top of possession chains: I added features to the chains and derived specific metrics from them. If you're interested in the technical details, I invite you to explore my notebooks, which are now available in my new GitHub portfolio (feel free to share it with anyone looking to learn through replication, or with recruiters in the football industry).

In short, I started from what we need in order to capture a team's style and tried to apply it. For example, we know there are three phases of play: in possession, out of possession, and transition. So from event data I needed to work out which phase each event belonged to. Without pressures, 360 data, or tracking data, transitions and out-of-possession events are very difficult to capture. I therefore had to define what counts as a transition in event data. For the out-of-possession phase, I assigned every opponent event that meets the requirements for the in-possession style, plus the team's own defensive actions.
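
The out-of-possession rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the event dictionaries, the `type` labels in `ON_BALL` and `DEFENSIVE`, and the `split_by_phase` helper are all hypothetical names chosen for the example.

```python
# Hypothetical event-type labels; a real provider's taxonomy will differ.
ON_BALL = {"pass", "carry", "take_on", "cross", "shot"}
DEFENSIVE = {"tackle", "interception", "clearance", "block", "ball_recovery"}


def split_by_phase(events, team):
    """Split one match's events into in-possession and out-of-possession lists
    from the point of view of `team`.

    In possession: the team's own on-ball events.
    Out of possession: the opponents' on-ball events plus the team's own
    defensive actions.
    """
    in_possession, out_of_possession = [], []
    for ev in events:
        is_ours = ev["team"] == team
        if ev["type"] in ON_BALL:
            (in_possession if is_ours else out_of_possession).append(ev)
        elif ev["type"] in DEFENSIVE and is_ours:
            out_of_possession.append(ev)
        # Anything else (fouls, opponent defensive actions, ...) is ignored here.
    return in_possession, out_of_possession
```

Transitions are left out of this sketch on purpose, since they need their own definition on top of possession chains.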

So what did I use for in possession? I used some basic metrics, such as crosses (start and end coordinates) and shot start coordinates, along with more complex ones. For example, I split all possession chains into those that start from set plays and those that start from regaining the ball (either by winning it or because the opponents simply lost it), and created a relative frequency heatmap for the chains I deemed eligible. I also divided the pitch into three zones: buildup, which is the first 40% of the pitch (the part you'd exclude in PPDA calculations); the box, to which I assign events performed inside the box or actions that end inside it; and the rest of the pitch. These are not the usual zones used in analysis, but I think they divide things better for this purpose.
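
To make the three zones concrete, here is a minimal sketch of a zone classifier. It assumes a 105 x 68 metre pitch, coordinates in metres, the team always attacking towards increasing x, and "the box" meaning the opposition penalty area. None of these conventions are stated in the post, and `in_box` and `assign_zone` are hypothetical helper names.

```python
PITCH_LENGTH = 105.0  # metres (assumed)
PITCH_WIDTH = 68.0    # metres (assumed)

BUILDUP_LIMIT = 0.4 * PITCH_LENGTH             # first 40% of the pitch
BOX_X_MIN = PITCH_LENGTH - 16.5                # penalty area depth: 16.5 m
BOX_Y_MIN = (PITCH_WIDTH - 40.32) / 2          # penalty area width: 40.32 m
BOX_Y_MAX = (PITCH_WIDTH + 40.32) / 2


def in_box(x, y):
    """True if the point lies inside the opposition penalty area."""
    return x >= BOX_X_MIN and BOX_Y_MIN <= y <= BOX_Y_MAX


def assign_zone(x, y, end_x=None, end_y=None):
    """Return 'box', 'buildup' or 'rest' for an event.

    An event belongs to the box if it starts there or, when it has an end
    location, if it ends there. Otherwise the start location decides.
    """
    if in_box(x, y):
        return "box"
    if end_x is not None and end_y is not None and in_box(end_x, end_y):
        return "box"
    if x < BUILDUP_LIMIT:
        return "buildup"
    return "rest"
```

For example, `assign_zone(60, 34, 95, 34)` returns `"box"` because the action ends inside the penalty area, while `assign_zone(20, 34)` returns `"buildup"`.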

Why was that important? Because I then split passes, dribbles/carries, and take-ons by those zones. But that's not enough for me. Players in different positions operate in different zones and consequently do different things (that's also why I created the position clustering project). So, using the positions I produced there, I slice things a little further and compare passes from players in a given position in a given zone, by both start and end coordinates. I think this adds several layers of complexity and genuinely helps capture positional or relationist tendencies.
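
As an illustration of slicing by position and zone, the sketch below bins pass start locations into one relative frequency heatmap per (position, zone) pair and then compares two heatmaps. The grid size, the dictionary keys, and the helper names are hypothetical. The post does not say which similarity measure is used, so cosine similarity appears here purely as an assumed stand-in.

```python
from collections import defaultdict
from math import sqrt


def relative_heatmaps(passes, n_x=6, n_y=4, length=105.0, width=68.0):
    """Build one flattened relative-frequency grid per (position, zone).

    Each pass is a dict with 'x', 'y', 'position' and 'zone' keys (assumed
    layout). Every returned grid sums to 1.
    """
    counts = defaultdict(lambda: [0] * (n_x * n_y))
    for p in passes:
        # Clamp to the last cell so points on the far touchline stay in range.
        col = min(int(p["x"] / length * n_x), n_x - 1)
        row = min(int(p["y"] / width * n_y), n_y - 1)
        counts[(p["position"], p["zone"])][row * n_x + col] += 1
    return {
        key: [c / sum(cells) for c in cells]
        for key, cells in counts.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two flattened heatmaps (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

The same binning could be repeated on end coordinates, giving two grids per slice to compare between teams.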

As for transitions, there are "natural" ones, like winning the ball and progressing, and "artificial" ones, created by manipulating the opposition and progressing the ball very quickly and effectively. So how do we capture that? By simplifying. For events inside a possession chain to be classified as a transition, at least two in a set of three must be progressive and move the ball at 10 meters per second or more. While extremely reductive, it is still a starting point.
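
The two-out-of-three rule can be sketched as a sliding window over a possession chain. Several details below are assumptions made for the example rather than the definitions used in the notebooks: the `Event` fields, the meaning of "progressive" (here, gaining at least `MIN_GAIN` metres towards the opponent's goal), and the choice to flag all three events in a qualifying window.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Event:
    x: float         # start coordinates, metres
    y: float
    end_x: float     # end coordinates, metres
    end_y: float
    duration: float  # seconds the ball takes to travel from start to end


MIN_SPEED = 10.0  # metres per second, the threshold from the text
MIN_GAIN = 5.0    # metres gained towards goal; assumed definition of "progressive"


def is_fast_progressive(ev):
    """True if the event is progressive and moves the ball at MIN_SPEED or more."""
    if ev.duration <= 0:
        return False
    gain = ev.end_x - ev.x
    speed = hypot(ev.end_x - ev.x, ev.end_y - ev.y) / ev.duration
    return gain >= MIN_GAIN and speed >= MIN_SPEED


def flag_transition(chain):
    """Return one boolean per event: True if it sits in any window of three
    consecutive events where at least two are fast and progressive."""
    qualifies = [is_fast_progressive(ev) for ev in chain]
    flags = [False] * len(chain)
    for i in range(len(chain) - 2):
        if sum(qualifies[i:i + 3]) >= 2:
            flags[i:i + 3] = [True, True, True]
    return flags
```

Chains with fewer than three events never qualify under this reading, which is one of the ways the rule is reductive.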

Practical Applications in Recruitment

This work is valuable, but its true worth lies in practical applications. Since my interest is in recruitment, I've used these ratings to search for players by weighting performance ratings (created with various metric combinations) by the similarity ratings of the teams of interest.
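
In its simplest form, the weighting idea looks like the sketch below. The multiplication of rating by similarity is an assumption about how the two are combined, and the function, field names, and placeholder players and teams are hypothetical rather than taken from my database.

```python
def rank_for_target(players, similarity_to_target):
    """Rank players by performance rating weighted by how similar their
    team's style is to the target team's style.

    players: list of dicts with 'name', 'team' and 'rating' keys.
    similarity_to_target: dict mapping team name to a similarity in [0, 1].
    Players whose team has no similarity value are skipped.
    """
    ranked = []
    for p in players:
        sim = similarity_to_target.get(p["team"])
        if sim is None:
            continue
        ranked.append({**p, "weighted_rating": p["rating"] * sim})
    ranked.sort(key=lambda r: r["weighted_rating"], reverse=True)
    return ranked


# Placeholder data, not real ratings.
players = [
    {"name": "Player A", "team": "Team X", "rating": 80.0},
    {"name": "Player B", "team": "Team Y", "rating": 70.0},
]
similarity = {"Team X": 0.5, "Team Y": 0.9}
ranking = rank_for_target(players, similarity)
```

Here Player B ends up ahead of Player A despite the lower raw rating, because Team Y plays much more like the target team.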

Let me illustrate with an example. As a Juventus fan, I had high hopes for Thiago Motta's project. Unfortunately, summer signings Nico Gonzalez, Teun Koopmeiners, and Douglas Luiz have underperformed since arriving in Turin.

Let's imagine we're in the summer transfer window, evaluating these potential signings without the constraints of finances or Motta's specific requests. Who would my database suggest Juventus should have signed?

Note about the ratings: The values use weighted xG and weighted atomic VAEP metrics, as in the latest version of my radars. These metrics are weighted based on the rating of the opponents against which the relevant actions were performed. For this reason, ratings for teams from leagues with no cross-league data (MLS, Russian Premier League, and Brazilian Serie A) are less reliable, as they haven't been directly compared against other teams in the dataset. Also, as you'll see, some players have multiple entries. That's because I've used the same aggregated data to build their non-Bologna weighted rating, as an example of how this style model could help. A more precise approach would be to apply the style rating only to aggregated data from matches a player played under the relevant manager. As you'd expect, that would raise some questions, such as schedule strength, and I'd have to reshape my position clusters to account for how different managers use players, which in turn would force me to reshape the style similarity. For this reason I preferred to keep it very simple, so you get multiple entries with the same non-Bologna rating.
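
The opponent weighting mentioned at the start of this note can be pictured with a sketch like the one below. The exact formula is not given in this post, so scaling each action's value by the opponent's rating relative to a baseline is an assumption, as are the function and field names.

```python
def opponent_weighted_total(actions, opponent_rating, baseline):
    """Sum action values (e.g. xG or atomic VAEP), scaling each one by the
    strength of the opponent it was produced against.

    actions: list of dicts with 'value' and 'opponent' keys.
    opponent_rating: dict mapping opponent name to a team rating.
    baseline: rating treated as an average opponent (weight 1.0).
    """
    total = 0.0
    for a in actions:
        weight = opponent_rating[a["opponent"]] / baseline
        total += a["value"] * weight
    return total
```

With a baseline of 100, an action worth 0.5 against a team rated 120 counts as 0.6, while the same action against a team rated 80 counts as 0.4. This also shows why leagues without cross-league matches are less reliable: their team ratings are never anchored against the rest of the dataset.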

Douglas Luiz

Douglas Luiz is perhaps the most difficult to assess since he's barely played under Motta for reasons that remain unclear. When I created a rating for defensive midfielders (his clustered position) weighted for Emery's 2023/24 Aston Villa similarity to Motta's Bologna, he ranked as just the 155th best DM for Motta's system:

For context, Locatelli, who became Juventus captain after Bremer's injury, was classified as the 7th best DM for Motta's style (based on his performance under Allegri, not the brief Montero period). Even Barrenechea, who was sent to Aston Villa after McKennie refused the move, ranked 60th. This raises the question: did Juventus really need a new DM?

If they did require reinforcement in this position, perhaps they should have reconsidered sending Arthur away or pursued Caqueret, who's now at Como:

Nico Gonzalez

As for Nico Gonzalez, classified as a right winger at Fiorentina though used all over the attack — as a false 9 when Vlahovic was unavailable (before Kolo Muani arrived) or mostly on the left to maintain width — he ranked just 80th among right wingers for Motta's system:

Perhaps it would have been better to rely on Conceição's arrival plus Mbangula, Weah, and Yildiz? Or target Aston Villa's Diaby or Bailey instead of Douglas Luiz? Or perhaps pursue future Aston Villa signing Malen, Champions League opponent Zhegrova, Real Madrid bench player Brahim Díaz (who can also play behind the striker), or just retain Soulé? My personal favorite option would have been the discounted jewel Minteh:

If the plan was to use Nico primarily on the left, where he played most frequently, they could have simply kept Chiesa (though financial considerations and his contract renewal demands were factors). Alternatives included Zaragoza, who hasn't received opportunities at Bayern, or even Laurienté from relegated Sassuolo:

Teun Koopmeiners

And what about the €60 million investment in Koopmeiners? After deciding not to renew Rabiot due to wage demands and signing Thuram for €20 million, perhaps they should have allocated resources elsewhere, as Koopmeiners was the 85th best left-leaning center midfielder in my dataset, behind both Rabiot and Thuram:

Even with Fagioli, McKennie, Locatelli, Thuram, and Douglas Luiz in the squad, if they still wanted midfield reinforcement, better options might have included Zielinski (free transfer from Napoli, though reports suggest he had already committed to Inter by January, around when Giuntoli chose Motta). Alternative options included the somewhat underwhelming Kökçü at Benfica, the much-discussed Cherki (who can also play behind the striker or on the wing but offers limited defensive contribution), or the talented – but struggling at Roma – Enzo Le Fée (now in the Championship playoffs):

Perhaps the wisest choice would have been to conserve funds or target Lookman from Atalanta instead of Koopmeiners, as Lookman appeared among the top 30 left winger/left attacking midfielders for Motta's system.

Now that I've completed this analysis (and can lament what might have been under Motta and how funds could have been better utilized — noting that reports suggest Motta wanted some if not all of these three players), you can see how these ratings could enhance recruitment processes. This is the practical application of the algorithm I've developed.

Thank you for reading!