Analyzing Defensive Metrics: Understanding PPDA in Soccer
Passes per Defensive Action (PPDA) has become a popular statistic in soccer analytics. It measures the number of passes an opponent makes before the defending team attempts a defensive action to regain possession. A lower PPDA signifies a more aggressive press, meaning the side in possession faces resistance sooner. Conversely, a high PPDA suggests a more passive defensive approach, potentially indicating a team that is "parking the bus."
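As a minimal sketch of the definition above, PPDA can be written as the opponent's passes divided by the defending team's defensive actions. The function name and the match numbers below are illustrative assumptions, not values from the dataset.

```python
import math


def ppda(opponent_passes, defensive_actions):
    """Opponent passes per defensive action; NaN when no action was recorded."""
    if defensive_actions == 0:
        return math.nan
    return opponent_passes / defensive_actions


# Hypothetical pressing side: the opponent makes 300 passes and is
# challenged 30 times.
pressing_side = ppda(300, 30)

# Hypothetical deep block: the opponent makes 600 passes and is
# challenged only 10 times.
deep_block = ppda(600, 10)

print(pressing_side, deep_block)
```

The lower of the two values belongs to the side that presses more, which matches the reading of PPDA given above.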
To illustrate this, let's consider two contrasting examples of PPDA values in action. One notable instance from my dataset shows a staggering PPDA of 152 during a match between Bayern Munich and Hannover in the 2018/19 season. Although Hannover played with ten men from the 55th minute onward, the PPDA value reflects their deep defensive line, resulting in 712 total passes for Bayern compared to only 310 for Hannover. The visual representation of this scenario reveals Hannover's ten outfield players positioned deep in their half, with a formation that clearly illustrates their defensive intent.
A more recent example from the 2023/24 season features West Ham facing Manchester City, where West Ham recorded a PPDA of 42.25 against City's 11.4. This reflects a strategic choice to counter-attack rather than press high against one of the top teams. After taking an early lead, West Ham maintained their defensive setup, ceding territory while successfully limiting City’s attacking threats.
On the flip side, let's examine a match where a low PPDA was evident: Liverpool's game against Nottingham Forest in the 2022/23 season, where Liverpool's PPDA was an astonishing 2.53, while Forest's was 32.45. To visualize this intense pressing strategy, I have included a link to a YouTube video showcasing Liverpool's pressing tactics under Klopp.
Now that we have established a foundational understanding of PPDA, it's time to delve into the analysis. For this examination, I analyzed ten seasons' worth of match data (the 2014 through 2023 seasons in Understat's numbering) across five leagues: the EPL, La Liga, Ligue 1, Bundesliga, and Serie A.
Scraping Data
The data was sourced from Understat.com, which offers comprehensive statistics for the top leagues and is convenient to scrape because each page embeds its data as JSON. The following Python code loads each league-season page, using Selenium alongside requests and BeautifulSoup, and collects match data for each team across the seasons, with a final clean-up step that converts the PPDA and PPDA allowed fields into single ratios.
import json
import re
import warnings
from time import sleep, time

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from tqdm import tqdm
# Start the timer
start_time = time()

warnings.filterwarnings('ignore')

# Create empty lists to store URLs and match rows.
base_urls = []
urls = []
df_data = []

seasons = [2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]

chrome_path = "C:/usr/local/bin/chromedriver.exe"

# Understat league identifiers
competitions = ['EPL', 'La_Liga', 'Bundesliga', 'Ligue_1', 'Serie_A']
# Custom function to find the <script> tag that holds the league standings
def find_league_standings(tag):
    return tag.name == 'script' and 'teamsData ' in tag.text


# Set up Chrome options. Headless mode is requested with a command-line flag
# because the options.headless setter is deprecated in recent Selenium 4 releases.
chrome_options = Options()
chrome_options.add_argument('--headless')

for competition in competitions:
    for season in seasons:
        base_url = f'https://understat.com/league/{competition}/{season}'
        base_urls.append(base_url)
        season_col_value = int(base_url[-4:])

        # Create a Chrome service and a webdriver with the service and options
        service = Service(chrome_path)
        driver = webdriver.Chrome(service=service, options=chrome_options)

        # Navigate to the URL
        driver.get(base_url)

        # The HTML parsed below is fetched with requests, not read from the browser
        html_content = requests.get(base_url)
        soup = BeautifulSoup(html_content.content, 'html.parser')

        league_standings_html = soup.find_all(find_league_standings)
        league_info = league_standings_html[0].string

        # Slice out the escaped JSON string passed to JSON.parse('...')
        ind_start = league_info.index("teamsData ") + 24
        ind_end = league_info.index(");\n") - 1
        json_data = league_info[ind_start:ind_end]
        json_data = json_data.encode('utf8').decode('unicode_escape')
        json_data = json.loads(json_data)

        # One row per team per match, tagged with season and competition
        for team_id, team_data in json_data.items():
            for match_data in team_data['history']:
                row = {'team_id': team_id, 'team_title': team_data['title'],
                       'season': season_col_value, 'competition': competition}
                row.update(match_data)
                df_data.append(row)

        # Close this page's browser so Chrome instances do not accumulate
        driver.quit()

df = pd.DataFrame(df_data)

# Convert the {'att': ..., 'def': ...} dictionaries into a single ratio
df['ppda'] = df.apply(
    lambda x: x['ppda']['att'] / x['ppda']['def'] if x['ppda']['def'] != 0 else np.nan,
    axis=1)
df['ppda_allowed'] = df.apply(
    lambda x: x['ppda_allowed']['att'] / x['ppda_allowed']['def']
    if x['ppda_allowed']['def'] != 0 else np.nan,
    axis=1)

df['date'] = pd.to_datetime(df.date, format='ISO8601')
df = df.sort_values(['date', 'season', 'team_id'], ascending=True).reset_index(drop=True)
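The extraction step in the scraper can be tried offline. The sketch below builds a synthetic script string shaped like the one the scraper expects (a `teamsData` variable assigned from `JSON.parse('...')` with hex-escaped content) and recovers the dictionary from it. The helper name `extract_embedded_json`, the sample payload, and the use of a regular expression in place of the fixed character offsets are all assumptions made for illustration, not the scraper's own method.

```python
import json
import re


def extract_embedded_json(script_text, var_name):
    """Return the object assigned to var_name via JSON.parse('...')."""
    pattern = rf"{re.escape(var_name)}\s*=\s*JSON\.parse\('(.*?)'\)"
    match = re.search(pattern, script_text, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"{var_name} not found in script text")
    # Turn the \xNN escape sequences back into ordinary characters.
    decoded = match.group(1).encode("utf8").decode("unicode_escape")
    return json.loads(decoded)


# Made-up payload with the same nesting the scraper iterates over.
payload = {
    "1": {
        "title": "Example FC",
        "history": [{"ppda": {"att": 200, "def": 20}}],
    }
}

# Hex-escape every character to mimic the escaped string in the page.
escaped = "".join(f"\\x{ord(ch):02X}" for ch in json.dumps(payload))
script_text = "var teamsData = JSON.parse('" + escaped + "');\n"

teams = extract_embedded_json(script_text, "teamsData")
print(teams["1"]["title"])
```

Matching on the `JSON.parse('...')` pattern is less sensitive to small layout changes than counting characters, at the cost of relying on the escaped payload never containing an unescaped single quote.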
Correlations
A team's defensive formation can vary based on the quality of their opponents, making it risky to draw conclusions about PPDA values from limited data. I explored the correlation between PPDA and expected points (xPts), which sheds light on how these metrics relate to a team's likelihood of earning points over time.
The correlation coefficient ranges from -1 to 1:
- A value of 1 indicates a perfect positive linear relationship.
- A value of -1 indicates a perfect negative linear relationship.
- A value of 0 indicates no linear relationship.
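To make the three cases concrete, here is a small sketch that computes the Pearson coefficient by hand on made-up numbers. The function name `pearson` and all sample values are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


values = [8.0, 10.0, 12.0, 14.0]

# y rises in a straight line with x: coefficient of 1.
perfect_positive = pearson(values, [2 * v + 1 for v in values])

# y falls in a straight line as x rises: coefficient of -1.
perfect_negative = pearson(values, [-3 * v for v in values])

# y depends on x (y = x squared) but not linearly: coefficient of 0.
no_linear = pearson([-2, -1, 1, 2], [4, 1, 1, 4])

print(perfect_positive, perfect_negative, no_linear)
```

The last case is a reminder that a coefficient near 0 rules out a linear relationship only, not every kind of dependence.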
The correlation matrix shows that xG has the strongest correlation with xPts, as expected, since xPts is derived from it. PPDA shows a correlation of -0.27 with xPts, while PPDA allowed shows 0.29.
This suggests a weak negative relationship for PPDA with xPts, indicating that as PPDA decreases, xPts tends to increase, albeit slightly. This could be attributed to top-tier teams employing high-pressing strategies. In contrast, PPDA allowed presents a positive correlation with xPts, implying that higher values in this metric correspond with increased expected points. This could reflect a common scenario where winning teams adopt more defensive tactics as they manage their lead.
I also examined matches that ended in a 0-0 draw to minimize game state influences. Although the same negative and positive correlations emerged, the differences in their strengths narrowed slightly. The stronger correlation for PPDA allowed suggests that effective game management late in tight contests becomes crucial for securing points, especially in such a low-scoring sport.
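A rough sketch of that filter with pandas follows. The column names `scored`, `missed`, and `xpts` are assumed to match the per-match fields in the scraped data, and the rows are invented toy values, so the resulting coefficients carry no meaning beyond showing the mechanics.

```python
import pandas as pd

# Toy per-team match rows (made-up numbers).
matches = pd.DataFrame({
    "scored": [0, 2, 0, 1, 0, 0],
    "missed": [0, 1, 0, 1, 0, 3],
    "ppda": [9.0, 7.5, 14.0, 11.0, 20.0, 18.0],
    "ppda_allowed": [15.0, 17.0, 11.0, 12.0, 8.0, 9.0],
    "xpts": [1.9, 2.4, 1.3, 1.5, 0.8, 0.6],
})

# Keep only rows where the team neither scored nor conceded (0-0 draws).
goalless = matches[(matches["scored"] == 0) & (matches["missed"] == 0)]

# Correlation of each pressing metric with xPts within that subset.
goalless_corr = goalless[["ppda", "ppda_allowed", "xpts"]].corr()["xpts"]
print(goalless_corr)
```

Restricting the sample this way keeps the scoreline level throughout, though it also shrinks the data considerably, so the coefficients from the subset are noisier than those from the full dataset.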
The analysis indicates that while PPDA and PPDA allowed are significant, they may not be as critical as metrics like xG and xG allowed in determining xPts and actual performance. However, they still provide valuable insights.
Final League Standings
In this section, I analyzed average PPDA and PPDA allowed values in relation to final league standings. I specifically looked at the champions of each season to understand their performance metrics.
Focusing on the EPL, the mean PPDA and PPDA allowed values reveal that league champions typically maintain a low PPDA, generally under 20. This suggests that successful teams excel at pressing and limit opponents' passing opportunities. The accompanying green bar indicates that champions effectively retain possession against pressing strategies. A notable dip in 2016, following Chelsea's title win in 2015, reflects Leicester's unexpected triumph, characterized by a low-block counter-attacking style that diverged from prevailing trends.
When aggregating data across all competitions, the trend persists. Median values show that league champions consistently demonstrate low PPDA, indicating effective ball retrieval, while their PPDA allowed values suggest strong resistance to pressing.
Interestingly, the data for teams finishing in last place reveals higher PPDA values and significantly lower PPDA allowed scores, indicating their struggles with both ball recovery and press resistance.
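One way to build this kind of comparison is to aggregate the match rows per team and season, rank teams by points, and then pick out the first and last positions. The sketch below does this on invented rows; the `pts` column name is assumed to match the scraped data, the team names are placeholders, and ranking by points alone ignores real tiebreakers such as goal difference.

```python
import pandas as pd

# Toy match rows for one league season (made-up teams and numbers).
matches = pd.DataFrame({
    "competition": ["EPL"] * 6,
    "season": [2023] * 6,
    "team_title": ["Alpha", "Alpha", "Beta", "Beta", "Gamma", "Gamma"],
    "pts": [3, 3, 3, 1, 0, 1],
    "ppda": [8.0, 9.0, 11.0, 12.0, 19.0, 21.0],
    "ppda_allowed": [16.0, 18.0, 12.0, 13.0, 8.0, 9.0],
})

# Season totals and averages per team.
season_table = matches.groupby(
    ["competition", "season", "team_title"], as_index=False
).agg(
    points=("pts", "sum"),
    mean_ppda=("ppda", "mean"),
    mean_ppda_allowed=("ppda_allowed", "mean"),
)

# Position within each league season, by points only (simplified tiebreak).
league_groups = season_table.groupby(["competition", "season"])
season_table["position"] = (
    league_groups["points"].rank(method="first", ascending=False).astype(int)
)
n_teams = league_groups["team_title"].transform("count")

champions = season_table[season_table["position"] == 1]
last_placed = season_table[season_table["position"] == n_teams]
print(champions[["team_title", "mean_ppda", "mean_ppda_allowed"]])
print(last_placed[["team_title", "mean_ppda", "mean_ppda_allowed"]])
```

Swapping `"mean"` for `"median"` in the aggregation gives the median-based view used when pooling all competitions.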
Conclusion
PPDA serves as a valuable metric for analyzing team defensive setups and tactics. However, drawing long-term conclusions from a single game's results can be misleading, as teams often adapt their strategies to the quality of their opponents. The correlation analysis highlights a weak negative relationship between PPDA and xPts, suggesting that lower PPDA values generally correspond with higher expected points. Conversely, PPDA allowed shows a slight positive correlation.
In soccer, both attacking and defending play crucial roles, and this analysis suggests that defensive metrics may hold more weight in terms of earning points. Overall, successful teams tend to exhibit low PPDA and high PPDA allowed values, reflecting their dominance in ball control. In contrast, relegated teams struggle with high PPDA and low PPDA allowed scores, emphasizing their deficiencies in pressing and possession retention.
While PPDA metrics are not irrelevant, they are most effective when used in conjunction with other metrics that provide a more comprehensive view of a team's capabilities over time. Coaches should pay attention to PPDA when devising strategies, as teams like West Ham are likely to adopt similar defensive setups against strong opponents like Arsenal, while teams facing Manchester City should avoid high-press tactics that could leave them vulnerable to counterattacks. Arsenal, in particular, should be mindful of their ball retention abilities when approaching such challenges.