Lone Star Football's game and playoff forecasts are the main focus of my page, and they are my primary tool for combining sports with data journalism to tell compelling stories with numbers.
The system I use is called the Elo Rating System, named after its creator, Arpad Elo. It was originally made for chess, but it has since been used for a wide variety of games and activities. The system starts by assigning a rating to each team, then uses those ratings to create game forecasts, which are in turn used to update the ratings after each game.
I don't want to take credit for coming up with all of the data and functions here, because I adapted them from FiveThirtyEight's NFL Forecasts to fit Texas high school football.
For my system, I use data going back to the 2018-19 season, so every team's rating starts at the beginning of the 2018 season. That is a very limited amount of data, but it's all the data I've been able to find en masse. Maybe in the future I'll try to expand past that year, but for now it's what I have. At the start of that season, all 6A teams began with a rating of 2000, 5A-1 teams with 1940, and 5A-2 teams with 1910. I don't track games for teams below 5A, but they do occasionally appear in the ratings when they play a 5A or 6A team. I've found that the best value for those teams is 1850. For teams outside of the UIL, the best value is 1900. There are exceptions to these rules, like when Duncanville played IMG Academy, or their upcoming game against Mater Dei. In those cases, assigning a below-average rating wouldn't be fair, considering we know how strong those teams are. For those teams, I estimate their strength using MaxPreps' rankings.
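The starting values above amount to a small lookup table. Here is a minimal sketch of that idea in Python; the dictionary keys, the function name, and the `overrides` argument are all hypothetical names chosen for illustration, not the code the site actually runs.

```python
# Starting ratings quoted in the text. The key names are made up for this sketch.
STARTING_RATINGS = {
    "6A": 2000,
    "5A-1": 1940,
    "5A-2": 1910,
    "below-5A": 1850,  # default for untracked UIL teams below 5A
    "non-UIL": 1900,   # default for teams outside the UIL
}

def starting_rating(team, classification, overrides=None):
    """Return a team's starting rating.

    `overrides` is a hypothetical dict of hand-estimated ratings for
    exceptional opponents (for example, a national power whose strength
    was estimated from outside rankings). It takes priority over the table.
    """
    if overrides and team in overrides:
        return overrides[team]
    return STARTING_RATINGS[classification]
```

For example, `starting_rating("Temple", "6A")` returns 2000, while a team listed in `overrides` gets whatever hand-estimated value was supplied for it.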
Game Predictions
For any game, I take the two teams' Elo ratings before the game and apply several adjustments to those ratings to create a pregame forecast. After a game, a team's rating increases if they won and decreases if they lost. This process repeats every game up to the state championship.
For any game between two teams (A and B), the chance of team A winning is:
Win chance = 1/(10^((-1*EloDiff)/400)+1)
EloDiff is equal to Team A's rating minus Team B's rating, plus some adjustments, including:
- Home Field Advantage: The home team gets 50 points added to its rating.
- Bye week advantage: Teams that had a bye the week before typically perform worse, so we subtract 15 points from their rating.
- Type of game multiplier: There are fewer upsets during district games, so the difference between the two teams' ratings is multiplied by 1.2 to reflect that.
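To make the formula and adjustments above concrete, here is a minimal Python sketch. The function and parameter names are my own for illustration, and the order of operations (adjust each rating first, then apply the district multiplier to the difference) is an assumption taken from how the worked example on this page proceeds.

```python
HOME_FIELD = 50      # points added to the home team's rating
BYE_PENALTY = 15     # points subtracted from a team coming off a bye
DISTRICT_MULT = 1.2  # multiplier on the rating difference in district games

def pregame_win_chance(rating_a, rating_b, a_is_home=False, b_is_home=False,
                       a_had_bye=False, b_had_bye=False, district_game=False):
    """Chance (0 to 1) that team A beats team B."""
    # Apply the per-team adjustments to each rating.
    a = rating_a + (HOME_FIELD if a_is_home else 0) - (BYE_PENALTY if a_had_bye else 0)
    b = rating_b + (HOME_FIELD if b_is_home else 0) - (BYE_PENALTY if b_had_bye else 0)
    elo_diff = a - b
    # District games stretch the difference, making upsets less likely.
    if district_game:
        elo_diff *= DISTRICT_MULT
    # Win chance = 1/(10^((-1*EloDiff)/400)+1)
    return 1 / (10 ** (-elo_diff / 400) + 1)
```

With two evenly rated teams, `pregame_win_chance(2000, 2000)` is exactly 0.5, and giving team A home field with `a_is_home=True` raises its chance to about 57%.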
Once a game is over, the forecast adjusts each team's rating based on three different factors:
- K-Factor: This is more of a technical term, but it's the value that the forecast delta is multiplied by to figure out how many points to add or subtract. The higher it is, the more confidently the forecast reacts to each new result, and the lower it is, the more cautiously it reacts. I found that the best K-Factor for 6A/5A football is 55.
- Forecast Delta: Another technical term, but this is the difference between a team's actual result and their projected result. While our pregame forecasts can be anywhere between 0% and 100%, the actual result will be either 0% for a loss or 100% for a win. The delta is the actual result minus the pregame win chance.
- Margin of Victory Multiplier: This is definitely the most useful item in adjusting ratings, because it accounts for how close a game was when making changes to a team's rating. A closer game means smaller changes to the teams' ratings, while a blowout means massive changes.
Using all those factors together, we get the change to a team's rating! That's pretty much how it all works. There are some limitations to this system. For instance, there are no changes made during the offseason, so there's no reflection of any changes made to the coaching staff or at different positions on the field. It also can't take into account player injuries, or really any factors besides the ones mentioned above.
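The post-game update described above comes down to one multiplication. Here is a minimal Python sketch. The function name is hypothetical, and the margin of victory multiplier is passed in as a plain number, since its formula isn't spelled out on this page.

```python
K_FACTOR = 55  # the K-Factor quoted above for 6A/5A football

def updated_rating(old_rating, pregame_win_chance, won, mov_multiplier,
                   k_factor=K_FACTOR):
    """Return a team's new rating after a game.

    pregame_win_chance: the team's forecast before the game (0 to 1).
    won: True if the team won, False if it lost.
    mov_multiplier: margin of victory multiplier, computed elsewhere.
    """
    actual_result = 1.0 if won else 0.0
    # Forecast delta: actual result minus the projected result.
    forecast_delta = actual_result - pregame_win_chance
    return old_rating + k_factor * forecast_delta * mov_multiplier
```

As a made-up illustration, a team rated 2000 that was given a 60% chance and won with a multiplier of 2.0 would move to 2000 + 55*0.4*2.0 = 2044. Had it lost with the same multiplier, it would drop to 2000 + 55*(-0.6)*2.0 = 1934.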
Example
On October 4th, 2019, Killeen Shoemaker played an away game at Temple. Before this game, Temple had a rating of 2,076 and Shoemaker had a rating of 1,954. With just their pregame ratings, Temple is favored by 122 Elo rating points. Let's look at the adjustments:
- Home Field Advantage: Temple is playing at home, so they get 50 points added to their rating, changing the difference from 122 to 172.
- Bye week advantage: Shoemaker didn't play a game the week before this, so they're at a disadvantage. 15 points are taken from their rating, which changes the difference from 172 to 187.
- Type of game multiplier: This is a district game, so the difference is multiplied by 1.2 to reflect the lower probability of an upset. The difference changes from 187 to 224 (187*1.2 = 224.4, rounded down).
Using the new difference of 224, we can now calculate each team's chance of winning. To find Temple's chance of winning, we use this formula:
Win chance = 1/(10^((-1*224)/400)+1)
Win chance = 1/(10^(-224/400)+1)
Win chance = 1/(10^(-0.56)+1)
Win chance = 1/(0.2754+1)
Win chance = 1/1.2754
Win chance = 0.784, or about 78%
The two win chances always add up to 100%, so to find Shoemaker's chance of winning, we can just subtract 78% from 100% to get a 22% chance of Shoemaker winning. That's the forecast before the game happens. The game goes on, and Temple ends up winning 38-28. To find Temple's new rating, we use the postgame variables:
- K-Factor: Once again, there's no great way to quantify what this number means other than that it measures how confidently the ratings react to each result, but the K-Factor for all of the games I forecast is 55.
- Forecast Delta: Temple's actual result is 1 since they won, and Shoemaker's is 0 since they lost. Subtracting each team's pregame win chance gives Temple a forecast delta of 1 - 0.78 = 0.22 and Shoemaker a delta of 0 - 0.22 = -0.22.
- Margin of Victory Multiplier: The formula for this is complicated, but it essentially accounts for how a team won by comparing the expected margin of victory with the actual margin. The margin of victory multiplier for this game is 2.177.
We multiply all these factors together to figure out the change in each team's rating after this game:
Temple(new)=Old Rating+(K-Factor*Forecast Delta*MOV Multiplier)
Temple(new)=2076+(55*(1-0.78)*2.177)
Temple(new)=2076+(55*0.22*2.177)
Temple(new)=2076+(12.1*2.177)
Temple(new)=2076+26
Temple(new)=2102
This is Temple's new rating after the game. They gained 26 rating points for their win over Shoemaker. Now, what if Temple had lost? For this hypothetical scenario, we'll use a margin of victory multiplier of 2.2, because that's the average MOV Multiplier.
Temple(new)=Old Rating+(K-Factor*Forecast Delta*MOV Multiplier)
Temple(new)=2076+(55*(0-0.78)*2.2)
Temple(new)=2076+(55*-0.78*2.2)
Temple(new)=2076+(-42.9*2.2)
Temple(new)=2076-94
Temple(new)=1982
As you can see, there would have been a much bigger change in Temple's rating if they had lost this game, since they were given a 78% chance of winning. This is how Elo corrects itself, creating a better rating after each game to give more accurate forecasts in the future. It is also why the ratings tend to be more accurate towards the end of the season, since by then enough changes have been made to better reflect each team's real strength.
Season Simulation
Now I've covered how I forecast each individual game, but not how I use those forecasts to create more forecasts. One of the biggest parts of my blog is the playoff projections and the state champion odds. I use the ratings and forecasts for each individual game to simulate the current season 1,000,000 times and count how many times each team makes the playoffs. This gives us each team's chance of making the playoffs, and it can be adjusted for "what if" scenarios, such as what would happen to a team's chances if they win a particular game. The reason I do one million simulations is that I want enough of them that the random variation from one run to another doesn't affect what is displayed on my blog, which doesn't show any decimals. These simulations are also how I get the average simulated season record and point difference that are displayed.
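Here is a rough Python sketch of the simulation idea, not my production code. It makes several simplifying assumptions that I want to flag loudly: ratings stay fixed during a simulated season, no home field, bye, or district adjustments are applied, playoff spots go to the teams with the most wins, and ties are broken at random. All function and parameter names are made up for this example.

```python
import random
from collections import Counter

def win_chance(rating_a, rating_b):
    """Plain Elo win chance for team A, with no adjustments (a simplification)."""
    return 1 / (10 ** (-(rating_a - rating_b) / 400) + 1)

def simulate_playoff_odds(ratings, wins_so_far, remaining_games,
                          playoff_spots, n_sims=10_000, seed=0):
    """Estimate each team's chance of finishing in the top `playoff_spots`.

    ratings: dict of team -> Elo rating (held fixed in every simulation).
    wins_so_far: dict of team -> wins already on the board.
    remaining_games: list of (team_a, team_b) pairs still to be played.
    """
    rng = random.Random(seed)
    made_playoffs = Counter()
    for _ in range(n_sims):
        wins = dict(wins_so_far)
        # Play out every remaining game with a weighted coin flip.
        for team_a, team_b in remaining_games:
            if rng.random() < win_chance(ratings[team_a], ratings[team_b]):
                wins[team_a] += 1
            else:
                wins[team_b] += 1
        # Rank by wins; the random second key breaks ties (an assumption,
        # real tiebreakers are more involved).
        standings = sorted(wins, key=lambda t: (wins[t], rng.random()), reverse=True)
        for team in standings[:playoff_spots]:
            made_playoffs[team] += 1
    return {team: made_playoffs[team] / n_sims for team in wins_so_far}
```

A "what if" scenario is just a rerun with one game removed from `remaining_games` and the winner's total bumped by one in `wins_so_far`. As for the one million runs: the standard error of a simulated probability p over N runs is sqrt(p*(1-p)/N), which is at most 0.5/sqrt(1,000,000) = 0.0005, or 0.05 percentage points.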
I only use the simulations to project playoff chances, not state championship chances. For those, I can actually use a mathematical formula to figure out exact chances, because I know the odds of a team winning each game from the forecasts above. For 5A, I take my playoff projections each week and use the odds from them to publish my state championship chances. For 6A, we don't know which bracket each team will be in until the end of the regular season, so that strategy doesn't work, since it creates some weird scenarios. For instance, my 2021 preseason projections have Austin Westlake, the 6A-1 champion, being in the 6A-2 bracket and dominating it. While that creates interesting "what if" scenarios, it isn't useful for forecasting. For that reason, I won't publish state championship chances for 6A until the end of the regular season.
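To show how exact title chances can come out of a known bracket without any simulation, here is a minimal Python sketch. It assumes a fixed bracket whose size is a power of two, ratings that don't change between rounds, and no home field, bye, or district adjustments. The function names are hypothetical, and this illustrates the general idea rather than the exact formula I use.

```python
def win_chance(rating_a, rating_b):
    """Plain Elo win chance for team A, with no adjustments (a simplification)."""
    return 1 / (10 ** (-(rating_a - rating_b) / 400) + 1)

def title_odds(bracket, ratings):
    """Exact chance of each team winning a single-elimination bracket.

    bracket: list of team names in bracket order (length must be a power
             of two); neighbors in the list meet in the first round.
    ratings: dict of team -> Elo rating.
    """
    # Each group maps team -> chance of being the survivor of that group.
    groups = [{team: 1.0} for team in bracket]
    while len(groups) > 1:
        next_groups = []
        for i in range(0, len(groups), 2):
            left, right = groups[i], groups[i + 1]
            merged = {}
            for side, other in ((left, right), (right, left)):
                for team, p_reach in side.items():
                    # Chance of beating whoever comes out of the other side,
                    # weighted by how likely each opponent is to get there.
                    p_beat = sum(p_opp * win_chance(ratings[team], ratings[opp])
                                 for opp, p_opp in other.items())
                    merged[team] = p_reach * p_beat
            next_groups.append(merged)
        groups = next_groups
    return groups[0]
```

Because every round's chances are computed from the previous round's, the results always add up to exactly 100% across the bracket, and four evenly rated teams each come out at 25%.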