Football Analytics: An Informal Education

This blog entry is the first in a series of posts on football analytics. As I educate myself on predictive analytics, I will update the blog with my progress in discovering the most widely used and accurate processes for converting raw data into predictions about the future.


An Informal Education

In the United States, “sports analytics” is no longer the taboo subject it once was. The days of scoffing at spreadsheets while favoring the raw and undefined “tools” of the prospect have disappeared over the horizon. “Sabermetrics” in baseball and “APBRmetrics” in basketball have transformed their respective sports. Scouts no longer judge players solely on how fast they look or whether they have the famously undefinable “it” factor. Instead, they quantify the players' potential through a combination of analytics and personal history.

How efficient was that left-fielder's path to the fly ball? 74% or 92%? Such seemingly insignificant disparities could be the reason a player is called up to the Majors or is discarded to Minor League purgatory.

Football, or soccer, has been slow to embrace the push for analytics. Though some clubs have adopted analytics into their scouting networks, many are wary of changing their age-old systems. For a sport with hundreds of professional leagues throughout the world, the potential database for a football-based prediction system is tremendous. With games like Football Manager producing frighteningly accurate predictions of future Ballon d’Or winners, the hesitation from many boards to alter their backroom habits baffles analytics enthusiasts.

Scouting’s old guard, however, is correct to point out the critical factor that separates predicting the success of a Bryce Harper or a Luka Doncic from a Christian Pulisic. That critical factor – data.

In his book The Signal and the Noise, Nate Silver points out that “the fuel of any ranking system is information.” In football, how can you plug off-ball runs that free up space in the box for goal scorers into an Excel sheet? If a player like Jorginho passes 2000 times without registering an assist, that means he has no end product, right? However, what if his position is central to a manager’s style of play that rarely has him passing directly to the goalscorer? Can you compare a striker like Suarez to a full back like Marcelo? Moreover, how can a goalkeeper like Jan Oblak be analyzed alongside midfielder Ruben Neves?

This last-minute winner from Belgium in the World Cup is the perfect demonstration of this conundrum. Nacer Chadli scored the goal, and Thomas Meunier got the assist, but should they get the credit? Not exactly.

If anyone was responsible for Belgium’s final goal, it was Romelu Lukaku. His intelligent off-ball movement first drew the defenders towards the center of the pitch, freeing up Meunier on the right wing to receive De Bruyne’s pass with room to run. Then, Lukaku let Meunier’s low cross pass through his legs and into the path of the onrushing, unmarked Chadli.

So, what system would allow Lukaku’s movement to get its deserved credit? That is the problem with analyzing football.

Silver points out in his book that a good baseball prediction system must accomplish three primary tasks:

1. Account for the context of a player’s statistics

2. Separate out skill from luck

3. Understand how a player’s performance evolves as he ages, which is known as the aging curve.

With football, the same rules apply. Clubs understand that they take a risk signing a player past the perilous 30-year-old mark. Scouts also already have trusted systems for comparing a player from the English Championship with one from the Bundesliga, and vice versa.

The desire, therefore, is to create a model that substitutes data-driven results for subjective judgment.

When Nate Silver created PECOTA, his predictive analytics model for baseball, he set out not to eliminate the uncertainty in a player’s potential but to put bounds on it: “PECOTA’s innovation was to acknowledge this by providing a range of possible outcomes for each player, based on the precedents set by his comparables: essentially, best-case, worst-case, and most likely scenarios.”

Football Manager uses a similar system when ranking player potential. Players are assigned a grade out of 5 stars for both their “current ability” and their “potential ability.” The game allows the “manager” to reveal how accurate the predictions are by assigning scouts to the player or adding the player to their squad.

Combining Nate Silver’s “comparables” – players from the past who have similar statistics through a certain point in their career – with a result similar to Football Manager’s “current” and “potential” ratings seems like an ideal conclusion to which football analytics should aspire.
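To make the “comparables” idea concrete for myself, I sketched it in a few lines of Python. This is only a toy under my own assumptions, not how PECOTA or Football Manager actually work: the historical records, the three chosen stats, and the `project` function are all hypothetical, and similarity is measured with a plain Euclidean distance over unscaled stats.

```python
from math import dist
from statistics import median

# Hypothetical history: (stats through a given age, peak rating reached later).
# Stats are (goals per 90, assists per 90, pass completion rate).
HISTORY = [
    ((0.45, 0.20, 0.78), 82),
    ((0.50, 0.15, 0.75), 88),
    ((0.10, 0.30, 0.90), 74),
    ((0.40, 0.25, 0.80), 79),
    ((0.05, 0.10, 0.92), 70),
    ((0.48, 0.18, 0.77), 91),
]

def project(prospect, history, k=3):
    """Return (worst, most_likely, best) peak ratings among the k most similar past players."""
    # Rank past players by how close their early-career stats are to the prospect's.
    nearest = sorted(history, key=lambda record: dist(prospect, record[0]))[:k]
    outcomes = sorted(peak for _, peak in nearest)
    # Worst case, median case, and best case among the comparables.
    return outcomes[0], median(outcomes), outcomes[-1]

worst, likely, best = project((0.47, 0.19, 0.77), HISTORY)
print(f"worst={worst}, most likely={likely}, best={best}")
```

A real system would need far more players, stats scaled to comparable units, and adjustments for league and position, but the shape of the output is the point: a range of outcomes rather than a single number.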

Another area where football could learn from baseball is the famous WAR statistic. For those who are unaware, WAR stands for Wins Above Replacement (Baseball Prospectus publishes a variant called WARP, or Wins Above Replacement Player). This statistic is meant to capture all the ways that a player adds value on the field through a combination of their offensive and defensive statistics. This crucial stat drives hundreds of millions of dollars in contracts and trades. Silver calculates that “baseball teams are willing to pay about $4 million per win on the free agent market. The extra wins the scouts identified were thus worth a total of $336 million over this period.”

Now, if football analysts developed a “WARP” for the beautiful game, it would be more valuable than knowing the name of the next Messi or Ronaldo. A hypothetical “PARP” (Points Above Replacement Potential) could allow a relegation-threatened Premier League side like Fulham to sign a player in January with a “PARP” of 7 (their current distance from safety), knowing he has the potential to save their season. If the player delivered, he would earn the club roughly $130 million in Premier League and TV earnings for the following season. This money could fund the transfers that get them a spot in the middle of the table the next season and then into European competition the year after.
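The arithmetic behind that January decision is simple enough to sketch. Everything here is invented for illustration: the candidate names, their projected points, and the replacement-level baseline are hypothetical, and I am assuming “PARP” is just the points a signing is projected to add minus what a freely available replacement would add. Only the 7-point gap comes from the example above.

```python
def parp(points_added, replacement_points):
    """Points a player is projected to add beyond a replacement-level signing."""
    return points_added - replacement_points

# Hypothetical projections of points added over the rest of the season.
candidates = {"Player A": 9.5, "Player B": 6.0, "Player C": 11.0}

REPLACEMENT_POINTS = 2.5  # assumed value of a replacement-level player
GAP_TO_SAFETY = 7         # the club's distance from safety in the example

# Keep only the signings whose PARP would close the gap on their own.
viable = [
    name
    for name, points in candidates.items()
    if parp(points, REPLACEMENT_POINTS) >= GAP_TO_SAFETY
]
print(viable)
```

The hard part, of course, is not the subtraction but producing trustworthy projections to feed into it.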

One sees the tremendous influence such a golden statistic could have.

Yes, it is entirely hypothetical, and many players would not deliver on their potential. Football is a complex sport. Managers, formations, tactics, weather, physicality, and mentality all affect how a player’s strengths on paper play out on the pitch. But does that mean the risk is not worth taking? Of course not.

This education in the raw basics of analytics provides context for what football’s future prediction systems could look like.

What drives clubs like Chelsea or Bayern Munich to spend $60 million on Pulisic or $40 million on Hudson-Odoi remains unknown. Something, however, whether an elaborate goal or a niche stat, pushed them to say: “That is our man, and we will get his signature whatever the cost.”

Whether these players will become Ballon d’Or winners or busts on the biggest stage remains to be seen.

Look, no one can predict the future. But a football-based prediction system like “Sabermetrics” or “APBRmetrics” could come pretty close.