Wednesday, 3 July 2024

How to price up a football match

Professional punters who make a living from betting on football will tell you that you need to be able to price up a game before you bet on it. Why? What do you mean why? Christ alive you really want all the answers don't you and not have to pay for it. Fine *rolls eyes*, I'll tell you then. Pricing up football matches allows you to identify potential errors in the odds offered and that's where you can extract value when betting. If you think a bet is a 1/10 shot (1.10 on the decimal) and you found a bookmaker offering you even money (2.00) you'd potentially be extracting an extra 90p of value on every £1 staked if it wins. 
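For anyone who likes to see the sums written down, here's a minimal sketch in Python of that 1/10 versus evens example. The function names are made up for illustration, and the expected profit figure assumes your own price is the true one, which is a big assumption.

```python
from fractions import Fraction


def fractional_to_decimal(fractional: str) -> float:
    """Convert fractional odds such as '1/10' to decimal odds (stake included)."""
    return float(Fraction(fractional) + 1)


def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring any bookmaker margin."""
    return 1 / decimal_odds


def expected_profit_per_pound(my_decimal: float, offered_decimal: float) -> float:
    """Expected profit on a £1 bet, assuming my own price is the true price."""
    p = implied_probability(my_decimal)
    return p * (offered_decimal - 1) - (1 - p)


my_price = fractional_to_decimal("1/10")   # 1.1 on the decimal
offered = 2.00                             # even money
extra_return_if_it_wins = offered - my_price   # 0.90 extra back per £1 staked

print(f"my price {my_price:.2f}, offered {offered:.2f}")
print(f"extra return if it wins: £{extra_return_if_it_wins:.2f} per £1")
print(f"expected profit per £1: £{expected_profit_per_pound(my_price, offered):.2f}")
```

The gap between your price and the offered price is the bit you're hunting for. The bigger the gap, the more you should ask whether the bookmaker is wrong or your model is.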

So can we (by which I clearly mean I), find a model using readily available data that helps us price up games prior to kick off to look for those inconsistencies to take advantage of?


Now stay with me here with what I'm about to tell you because where this case study goes is really interesting so far with the results it has thrown up. I'm telling you that now because despite listening to numerous podcasts and reading vast amounts of articles I've got to be honest I'm no closer to refining the model yet to get me close to the opening odds.  


HOWEVER I have a model and this model is throwing out results. Importantly I'm going to throw in the disclaimer once more that this blog is not a tipping service and will never be one, but I am going to intro this case study by pointing out last week it would have left a nice profit betting on J League 1 and after two weeks of running data - both weeks would have been in profit. That's got you interested now hasn't it. Genuinely though I do mean it when I say this is not a tipping service. The data that follows is meant for information purposes only in allowing readers to follow my progress and with full transparency. I'm not trying to sell people anything, anyone who has read my betting guides will know I'm definitely not trying to promote gambling. (Sorry again about the language used in those if you're easily offended.) What I do know is that people call me a smart arse or in Portuguese Chico esperto and if I'm having to read highly complex models over and over again to understand them then other people will struggle too. 


So I'll try to simplify them. 


Added free bonus - I'll be the one doing all the hard work.


All you have to do is read on and I will publish before each round what the data says and compare it after for you using the model you'd find used on pretty much every YouTube video and blog available. All I ask is you don't bet on the data thinking this is going to make you rich. If you're going ‘I'm reading in between the lines and he's saying you should follow me and bet on what he's saying,’ that's not what I'm saying. See Guide 1 Part 1 for my view on tipsters. 


I don't know how scientific papers are written exactly but I vaguely remember the basic premise is you write a hypothesis, record the data and then draw a conclusion. I'm not doing that here. Instead I'll just try to explain it as clearly as I possibly can in a simple way, unlike those people with big mathematical brains who tend to lose most people after the first five lines they've written on the same subject. If you understand those rudimentary basics you can go to the advanced class, I won't be offended. For those of you remaining, are you sitting comfortably? Good, then we'll begin. 


So I wanted to find a model that can help me price football matches to see if I could accurately work out how a bookmaker gets their starting prices.


To do this I got the data from all J League games so far and I split it into every team for home and away. 


So far so simple. 


I found a model online which does some big maths using formulas in Excel that up until a fortnight ago I didn't even know existed. Don't ask me how it does the big maths because I don't know. Just take it as read that it involves a spreadsheet, some complex equations and a large pool of data. That really is all you need to know. 


As I understand it to price a game you first need to work out the average team in a league. 


This is because if you had two average teams play against each other at a neutral venue the odds would be 7/4 (2.75 on the decimal) for each team to win. That's the start point odds makers use when pricing a game. 
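To put numbers on that start point, here's a small Python sketch. It assumes the 7/4 prices carry no bookmaker margin, so whatever probability is left over goes to the draw. The draw figure is my own arithmetic under that assumption, not a quoted price.

```python
# Two average teams at a neutral venue, each priced at 7/4.
decimal_odds = 7 / 4 + 1                     # 2.75 on the decimal
win_probability = 1 / decimal_odds           # about 36.36% for each side
draw_probability = 1 - 2 * win_probability   # what is left over for the draw
draw_decimal_odds = 1 / draw_probability     # the draw price that implies

print(f"each side: {win_probability:.2%}")
print(f"draw: {draw_probability:.2%} (decimal odds {draw_decimal_odds:.2f})")
```

In practice a bookmaker shades every price to build in a margin, so real prices for three outcomes add up to more than 100% once converted to probabilities.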


An average team in this instance doesn't have to be an actual team from the league you're looking at. It's not a word search. I'm not about to go ‘In the Premier League last season Wolves were bang average, Manchester City were brilliant and Sheffield United were frankly crap.’ Firstly I'm using the J League as my data model and secondly it is more complicated than that but as I say I'll try to make it as simple as possible as best I understand it. 


You take all twenty teams in the J League which is to say I've taken all 20 teams. Take their home record; games played, goals scored and goals conceded. 


All those games provide you with the total games played, total goals scored and total goals conceded. 


You then are able to find out what the average number of goals scored for the league is for teams at home and in turn the average goals conceded. 


Those averages represent what the average side would be in the league i.e. our mythical bang average Wolverhampton Wanderers. 


The big maths then tells you how good any side is against the average team, or in turn how bad it is, and gives it a rating for how good it is from an attacking and defensive perspective.


But the average team doesn't exist, you're shouting. I know, I know. Somehow it takes that data from sides that are real, measures it against the average side that doesn't exist in the league to give an attacking value and a defensive value for each team. If you want an exact answer - you watch all the videos, listen to all the podcasts, read all the blogs and then you can write your own blog post. I'm doing the best I can here for you the dear reader. 


The process is then repeated for away performances across the entire league. Again it finds the perceived ‘average team’ and assigns a mathematical value to any side when playing away, showing whether any actual side in the league is better or worse than that. 
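For the curious, the versions of this model you see in most videos and blogs rate a side by dividing its goals per game by the league average. The Python sketch below shows that common approach for home records. It is an assumption about the method rather than a description of the spreadsheet used here, and the three teams and their numbers are made up.

```python
# Hypothetical home records: team -> (games played, goals scored, goals conceded).
# These numbers are invented for illustration and are not real J League data.
home_records = {
    "Team A": (10, 20, 8),
    "Team B": (10, 12, 12),
    "Team C": (10, 10, 16),
}


def league_averages(records):
    """Average goals scored and conceded per game across every side."""
    games = sum(g for g, _, _ in records.values())
    scored = sum(s for _, s, _ in records.values())
    conceded = sum(c for _, _, c in records.values())
    return scored / games, conceded / games


def strength_ratings(records):
    """Rate each side against the 'average team', where 1.0 means bang average.

    Attack above 1.0 means the side scores more than average.
    Defence above 1.0 means the side concedes more than average (worse).
    """
    avg_scored, avg_conceded = league_averages(records)
    ratings = {}
    for team, (games, scored, conceded) in records.items():
        attack = (scored / games) / avg_scored
        defence = (conceded / games) / avg_conceded
        ratings[team] = (attack, defence)
    return ratings


for team, (attack, defence) in strength_ratings(home_records).items():
    print(f"{team}: attack {attack:.2f}, defence {defence:.2f}")
```

Run the same thing on away records and you have the second set of ratings. The average team never has to exist, it is just the 1.0 that everyone else is measured against.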


From those attacking ratings and defensive values based on goals scored, goals conceded, the league averages and whether the side in a fixture is home or away, you put more ridiculously long sets of formulas into the spreadsheet and it allows you to do this…


Select the side playing at home 

Select their opposition 


and like magic it throws out the probability based on goals scored so far for each team home and away and their respective attacking and defensive ratings and decides what the probability of a home win, draw and away win would be.
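One common way spreadsheets like this turn ratings into probabilities is a Poisson distribution: work out how many goals each side is expected to score, then add up the chance of every possible scoreline. The Python sketch below shows that idea. I can't confirm it is what my spreadsheet does, it treats the two sides' goals as independent, and every input number is hypothetical.

```python
from math import exp, factorial


def poisson_pmf(goals: int, expected_goals: float) -> float:
    """Chance of scoring exactly `goals` when `expected_goals` is the average."""
    return expected_goals ** goals * exp(-expected_goals) / factorial(goals)


def match_probabilities(home_expected: float, away_expected: float, max_goals: int = 10):
    """Return (home win, draw, away win) by summing scorelines up to max_goals each."""
    home_win = draw = away_win = 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            p = poisson_pmf(home_goals, home_expected) * poisson_pmf(away_goals, away_expected)
            if home_goals > away_goals:
                home_win += p
            elif home_goals == away_goals:
                draw += p
            else:
                away_win += p
    # The total is a whisker under 1 because scorelines are capped, so rescale.
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Hypothetical inputs, not real J League ratings or averages.
league_avg_home_goals = 1.5
league_avg_away_goals = 1.2
home_attack, home_defence = 1.20, 0.90   # home side's ratings when at home
away_attack, away_defence = 0.95, 1.10   # away side's ratings when away

home_expected = home_attack * away_defence * league_avg_home_goals
away_expected = away_attack * home_defence * league_avg_away_goals

home_p, draw_p, away_p = match_probabilities(home_expected, away_expected)
print(f"1: {home_p:.2%}  X: {draw_p:.2%}  2: {away_p:.2%}")
```

Swap goals scored for XG in the inputs and the same sums give you the second model.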


Probability importantly can be converted into odds. 


Therefore…


I convert those probabilities into odds.


I can find the odds on offer for the games with an operator. 


I can do the comparison checks between the odds offered and my decimal figures. 
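Those three steps are simple enough to sketch in Python. The 61.46% is the goals scored model's figure for Sanfrecce quoted further down this post. The bookmaker price of 1.80 is hypothetical, picked purely to show the comparison, and the function names are my own.

```python
def probability_to_decimal_odds(probability: float) -> float:
    """Fair decimal odds for a given probability, with no margin added."""
    return 1 / probability


def edge_per_pound(model_probability: float, offered_decimal_odds: float) -> float:
    """Expected profit per £1 staked, if the model's probability were right."""
    return model_probability * offered_decimal_odds - 1


model_probability = 0.6146   # goals scored model, Sanfrecce to beat Vissel
offered = 1.80               # hypothetical bookmaker price, not a real quote

fair_odds = probability_to_decimal_odds(model_probability)
edge = edge_per_pound(model_probability, offered)

print(f"model price {fair_odds:.2f} v offered {offered:.2f}")
print(f"edge per £1 if the model is right: {edge:+.2%}")
```

A positive edge only means the model and the bookmaker disagree. It says nothing about which of them is wrong.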


DO NOT STOP READING AFTER THE NEXT LINE.


Does this model using goals scored to give me an attacking and defensive rating give me a good indicator to how the operator prices their games before kick off to look for errors in the market I can take advantage of?


No. See the part above in caps again because…


Despite the answer, this is where it gets interesting.


Me being me I've done the same model for goals scored in the J League using the XG for all sides. I was never going to be satisfied whatever the answer just using one set of data. 


So can XG, using the same model, give me an indicator of how games are priced as above?


Again no. 


Wait, wait, WAIT! Don't stop reading because this is how my brain works; 


What I do have is two sets of data that uses a model to predict a home win, draw and away win. 


I have one model for goals scored.


I have one model for XG (expected goals for the uninitiated).


I have the prices for each of the three options - home win, draw and away win. 


It's taken me a sodding long time to set up and find all the relevant data. So what if I use it for an experiment? 


I'll make another spreadsheet. I do like a spreadsheet me. 


So what if I were to record in one column what result the bookmaker thinks will happen based on the side they make favourite for a game (note bookmakers always seem to make one side or the other a favoured outcome and never the draw.)


In column two I record what the model thinks the result will be based on goals scored this season. 


In column 3 I record what the model thinks the result will be based on a side's current XG for the season. 


I do a hypothetical bet of £1 for each game and then draw a conclusion as to whether using either model would actually have left you in profit had you bet on the probability of a result or could you simply just bet the favourite for every game and make a profit? Because even shit like that would be handy to know right? Imagine if it was all that simple and favourites won everything and actually it was free money being handed out. Sorry I was daydreaming again. Where was I? Oh yes…


So just to recap result set one is what the bookmaker has made the favourite before kick off. There are ten games in each round. If you bet £1 on each game on the team they made favourite would it return a profit? 


With the goals scored and XG models, if by chance the probability is the same for two possible results in the same game then no bet would be placed on that game, so you might bet on fewer games in this experiment on any given round. 
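That rule can be sketched in a few lines of Python. The function name is made up. The example uses the 36.53% each that the goals scored model gives Avispa and Yokohama further down, with the draw figure simply being what is left over from 100%. The second call uses hypothetical numbers.

```python
from typing import Optional


def predicted_outcome(home: float, draw: float, away: float) -> Optional[str]:
    """Return '1', 'X' or '2' for the most likely result, or None for no bet.

    Probabilities are compared at four decimal places (two decimals of a
    percentage), so a tie on the displayed figures means no bet.
    """
    probabilities = {"1": round(home, 4), "X": round(draw, 4), "2": round(away, 4)}
    best = max(probabilities.values())
    leaders = [outcome for outcome, p in probabilities.items() if p == best]
    return leaders[0] if len(leaders) == 1 else None


# Avispa v Yokohama on the goals scored model: 36.53% each, remainder on the draw.
print(predicted_outcome(0.3653, 0.2694, 0.3653))   # None, so no bet

# Hypothetical game with a clear favourite.
print(predicted_outcome(0.55, 0.25, 0.20))         # 1
```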


So for each round of the J League taking the bookmaker's favourites the outlay is going to be £10 in this example to cover all ten games. This is just an example. As I always say scale down to scale up so if you created a model you could bet £0.03 with William Hill on each one for example. I'm just using £1 because it's easier to show the results especially when working with decimal odds to make it easier for everyone reading. The nice thing about the J League is the odds on offer compared to a league like the Premier League. There is money to be made. 


Can I just say for anyone expecting the predictions from these two models to be similar then think again. These two models are like younger siblings arguing and fighting with each other over who gets the TV remote control. Not just a little bit either, I mean they're disagreeing - a lot. I'll give you some examples from the upcoming week of fixtures. Take these as an additional reminder of the volatility of any data points used for a base model. 


Sanfrecce v Vissel


Based on the goals scored model it has Sanfrecce with a 61.46% probability of winning.


Younger brother XG - he says Vissel have a higher probability of winning at 41.85%


Kashima v Hokkaido 


Goals scored has Hokkaido at 53.38% probability of winning. XG has … wait for it … Kashima at 84.18% probability of winning. 


Albirex v Sagan


GS has Sagan at 57.24% 

XG has Albirex at 56.78%


Those for context would all be a 180° turn to go from one direction to another if that helps you visualise it. 


I could release all the data tables and calculations but I'm sure if you wanted complex data modelling you'd go to a site that can provide you with that context. Actually I'm not sure there is one. There must be. Anyway I think it'll only confuse. The point of this simple experiment is as I've said above, can either model actually be used to predict one of the three potential match outcomes? 


So onto the data and week one.


June 26th


Fixtures are provided in full.


(For context 1 is a home win, 2 away win and X is the draw)


The predictions shown next to the fixtures go in the following order;


Bookmaker favourite - GS - XG


1 Avispa v Yokohama 2 - N/A - 2

2 Cerezo v Sagan 1 - 1 - 1

3 FC Tokyo v Hokkaido 1 - X - 1

4 Jubilo v Tokyo 2 - 2 - 1

5 Kashima v Gamba 1 - 2 - 1

6 Kawasaki v Shonan 1 - X - 1 

7 Kyoto v Kashiwa 2 - 2 - 2 

8 Nagoya v Urawa 2 - 2 - 2 

9 Sanfrecce v Albirex 1 - 1 - 1 

10 Vissel v Machida 1 - 1 - 2 


So the bookmaker had games 2, 3 and 8 correct based on the favourites @ 1.75, 1.85 and 2.50. Betting £1 on each game in our example, that totals £6.10 in returns, meaning a loss of £3.90.


Goals scored model couldn't separate Avispa and Yokohama at 36.53% each so no bet was made. It predicted 2, 6 and 8. Game 6 was the draw and that was @ 3.60. So total stake in this instance is £9 and returns of £7.85 (-£1.15). 


XG had a clear prediction for Avispa and Yokohama so £10 staked. It had the same outcome as the bookmakers so -£3.90. 
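The week one sums above can be checked with a short Python sketch. With £1 stakes and decimal odds, the return on a winning bet is just the odds, so you add up the winning prices and take off the total outlay. The helper name is mine.

```python
def round_result(winning_odds, bets_placed, stake=1.0):
    """Return (total returns, profit or loss) for one round of flat stakes."""
    returns = round(sum(odds * stake for odds in winning_odds), 2)
    outlay = bets_placed * stake
    return returns, round(returns - outlay, 2)


# Week one, using the winning prices quoted above.
bookmaker = round_result([1.75, 1.85, 2.50], bets_placed=10)
goals_scored = round_result([1.75, 3.60, 2.50], bets_placed=9)   # no bet on game 1
xg = round_result([1.75, 1.85, 2.50], bets_placed=10)            # same winners as the bookmaker

print("Bookmaker favourites:", bookmaker)   # (6.1, -3.9)
print("Goals scored model:", goals_scored)  # (7.85, -1.15)
print("XG model:", xg)                      # (6.1, -3.9)
```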


Are the bookmaker's prices XG heavy? Possibly…


Now this could just be a happy coincidence but for those who are reading each post in turn they'll know that the expected ROI on football betting if you paid a tipster is 6-8%. Goals scored model hits that marker. Well it did on week one of the experiment.


So what happened in the next round of games? Same principle and display as the above round. 


June 29th


1 Hokkaido v Albirex 2 - 2 - 2 

2 Kawasaki v Sanfrecce 2 - 1 - 2 

3 Yokohama v Tokyo 1 - 2 - 1 

4 Cerezo v Nagoya 1 - 2 - 2 

5 Gamba v Machida 2 - 2 - 2 

6 FC Tokyo v Avispa 1 - 2 - 2

7 Urawa v Jubilo 1 - 2 - 1 

8 Sagan v Kashiwa 2 - 2 - 1 

9 Shonan v Kyoto 1 - 2 - 1 

10 Vissel v Kashima 1 - 1 - 1


£10 stakes this round for all 3 options.


Bookmakers favourite winners;


1, 4, 5, 7, 8 and 10 @ 2.20, 2.37, 2.25, 1.72, 2.10 and 1.95. Returns of £12.29 (+£2.29). 


Goals scored winners;


1, 3, 6, 8, 9 and 10 @ 2.20, 1.90, 2.30, 2.10, 3.70 and 1.95. Returns of £14.15 (+£4.15)


XG winners;


1, 5, 6, 7 and 10 @ 2.20, 3.75, 2.30, 1.72 and 1.95. Returns of £11.42 (+£1.42)


So all three options gave a positive return this week. Again could just be purely coincidence that the goals scored model profited two weeks on the trot. 


For full disclosure I've not bet either round. 


Again I'm not a tipster service. I'm not for a second suggesting you subscribe to the theory of betting the predicted outcomes on either model but it is interesting. Even with a general model it does go to prove the adage of what I said in the first guide about trusting data over what you perceive your knowledge of a team to be based on their reputation. When subscribing to a tipster service you're required to give full commitment i.e. you have to bet everything and, more importantly, you're told to expect peaks and troughs in your betting with them. They'll readily admit they'll go two months on a run of losses. Football betting is not easy and these are the professionals. I'm just an idiot with a sports injury who needs something to fill my time whilst I'm climbing the wall and trying not to lose the sodding plot. Anyway I digress, back to the subject at hand…


Accumulative totals 


Bookmakers £20 stakes £18.39 (-£1.61)

Goals scored model £19 stakes £22 (+£3)

XG Model £20 stakes £17.52 (-£2.48)


Remember that ROI figure of 6-8%. So for every £10 staked their model returns between £10.60 and £10.80. So for anyone going ‘fuck off, +£2? I wouldn't get out of bed for that’ then you know what, that's fine, but be honest with yourself and track and total all your bets. One good win every so often does not a summer make. 
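If you want to track your own bets the same way, ROI is just profit divided by total stakes. Here's a minimal Python sketch using the running totals above. The function name is mine, and two rounds is far too small a sample to read anything into the percentages.

```python
def roi(total_staked: float, total_returned: float) -> float:
    """Return on investment: profit (or loss) as a fraction of total stakes."""
    return (total_returned - total_staked) / total_staked


# Running totals after two rounds: (total staked, total returned).
totals = {
    "Bookmaker favourites": (20, 18.39),
    "Goals scored model": (19, 22.00),
    "XG model": (20, 17.52),
}

for name, (staked, returned) in totals.items():
    print(f"{name}: £{returned - staked:+.2f} on £{staked} staked, ROI {roi(staked, returned):+.2%}")

# The 6-8% benchmark on £10 of stakes.
print(f"6% of £10 returns £{10 * 1.06:.2f}, 8% returns £{10 * 1.08:.2f}")
```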


So in conclusion I guess the question you should ask me is did I think the model using goals scored or XG would actually give me a good indicator of how a bookmaker prices a match? If you've read Part 1 of my guide to betting you'll already know the answer will be no. Because you'd have already ascertained that for every Wolverhampton in a league who might be bang average there's a Vizela in Portugal who get hammered and that skews not only their data but the overall data for a league as a whole. 


Look I don't understand the mathematics enough to explain it or how the formulas extract the data and turn it into probabilities. But if you actually knew how to understand the complexities of it then you’d be sat reading one of those papers that explains it in glorious technical detail with algebraic equations and not this waffle. 


I'm just trying to simplify it enough to be understood by the majority of people who might read it by simply pitching it at a level that is hopefully easier to understand.


The basic premise (idea) to all these models you'll find online is by using goals scored in a league, can you predict a result and a range of possible scorelines? The answer truthfully has to be no. Note that's not what they'll tell you. They'll make it seem like a real key model that's going to aid your decision making. If the probability once converted to odds was close you'd strengthen the argument but you need to sharpen the lines. 


What you're basically doing is allowing data to sort teams in an order. So let's once again use the J League as an example. 


We've 20 teams in the league so we'll rank them from 1 through 20 based on goals they've scored at home. 20 being the side who have scored the most and 1 being the side who have scored the least. Let's say those numbers are now a power rating. 


So the team at the top with 20 has the best power rating for scoring goals. The team with 1 of course have the worst power ranking.


We do the same for goals conceded at home. Again we now have a power ranking for defences.


You take all the games played in the league, average them out and that gives you the starting point. 


But what is the real average game? 


Well the figures used are a true average of what happened overall. But what would have happened if Gamba hadn't had a man sent off in the 33rd minute against Machida on the Sunday just gone when they were leading 1-0? What happens if you went OK I won't count that game? In the same way, look at Portugal's top flight last season when Sporting CP beat Casa Pia 8-0 in a game where no one was sent off. It was genuinely just that one sided, and the opposition coach refused to change his team's tactic of defending with a high line against the best attacking side in the league when they were away from home. Games don't generally have 8 goals in them. So you take an 8 goal game out and OK over the course of an entire season it's a minor change of maybe 0.01 but for the Casa Pia figures it's going to be a huge change in their defensive rating. Same with the XG model because they won't be giving up the same XG to an Estrela or Rio Ave. 
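To show why one freak game barely moves a league average but swings a single team's numbers, here's a rough Python sketch. Every figure in it is hypothetical, including the season totals and the list of goals conceded, so treat the output as a picture of the effect and not as real Primeira Liga data.

```python
def average(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


# Hypothetical season totals for a whole league.
league_games = 306
league_goals = 800

# Hypothetical goals conceded in each away game by one side, including one 8-0.
team_away_conceded = [1, 2, 0, 1, 3, 8, 1, 2, 1, 0, 2, 1, 1, 2, 0, 1, 2]

# League goals per game with and without the 8 goal game.
league_before = league_goals / league_games
league_after = (league_goals - 8) / (league_games - 1)

# The team's away goals conceded per game with and without it.
team_before = average(team_away_conceded)
trimmed = list(team_away_conceded)
trimmed.remove(8)   # drop the one-off hammering
team_after = average(trimmed)

print(f"league: {league_before:.3f} -> {league_after:.3f} goals per game")
print(f"team:   {team_before:.3f} -> {team_after:.3f} conceded per away game")
```

The league figure shifts by a couple of hundredths while the team figure shifts by a large chunk, which is exactly the kind of skew that feeds through into a defensive rating.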


When you start taking all these data points out you'll have a very different average and power rating which would be more accurate as a starting point. Maybe those types of adjustments could see either model be closer to the bookmaker's starting price. To be fair some predictions are pretty much on the money once converted from probability to odds but others are night and day, they're that wide a gap. Yes we're looking for gaps, but not gaps that exist because the model isn't right and needs work and would cost us a fortune. 


A bookmaker will take the XG results for attacking strength and weakness and give a side a rating, they'll take the attacking strength of goals scored and weakness of those conceded and make those another rating and they'll average those two out. They'll have a further equation that takes into account the perceived home advantage in the same formulaic equation which will pick up better data points that are even more relevant. Then they'll have an odds maker pore over that data and they'll have analysts feeding them additional data like players suspended or injured and they'll assign them a value on what their perceived difference to a side winning or losing might be. Did home side A fly 2,000 miles in the early hours of Friday morning after a Europa cup tie, meaning their perceived home advantage of 0.4 is scrubbed off? When it comes to the Premier League that input is all the more conclusive than against a league like the J League. Who is betting on the J League? Well I do every year. Clearly not enough to bring the market down from the 2/1 prices you can find every round for a win. 


So let's go back to the teams ranked from 1 through 20 example and pose a simple question;


How will a team with a high ranking attack perform against a team with a really poor defensive rank if they played at home measured against the average performance in the league?

 

You'd imagine very well, but by updating the data each week you can get a better average picture of the probability. But the key word there is an average picture. It's not an accurate picture. But oddly the goals scored model does seem to have something about it but that could just be pure coincidence and for the next four weeks it all goes Pete Tong. But like a Ronaldo penalty at the Euros that's football. Things go wrong in football every game. That's because it's played by humans and as humans it's part of our nature to make mistakes. It's why pencils have rubbers on the end of them. 


That's about as simple as I can make it so hopefully it makes sense. Any technical questions I suggest you find an expert to ask somewhere else. 


Oh and there will always be one. So when I upload the forthcoming data predictions and odds before the weekend take a screenshot if you think I'd be sad enough to change them. There's no point thinking I'm not going to be transparent because I'm telling you not to do something. If you want to see a screenshot of the ridiculous amounts of data which are whittled down to a line of 1 - X - 1 I'm happy to do it. I just don't want to bore the pants off people like they're back doing GCSE maths. I'll just bore you all like you're doing GCSE English instead waffling on. 






