Sony Open Fantasy Projections
The real kickoff of the daily fantasy golf season happened last week at the Sony Open. This was my first attempt at applying to golf the algorithmic techniques I used this past NFL season to predict games against the spread. That same statistical method went 20-6-1 against the spread on its most confident picks during the last 3 weeks of the season.
During the NFL season I also started posting fantasy points projections for NFL quarterbacks. I did not have the time to use the more proprietary advanced method, and instead simply used an iteratively fit linear model to make these fantasy projections. In contrast to the excellent performance of the advanced algorithm against the spread, these linear methods proved to be too, well, linear to be useful for projecting fantasy points. The flaws were many, but the most important flaw in my opinion was the inability of the linear methods to handle non-linearities that the more advanced algorithm can easily take into consideration.
An example of the deficiency might be described thus… Let’s say we are trying to predict Young Tom Brady’s fantasy performance this week against Denver. A linear method might give him an incremental bump over the average QB for the following (mostly made up) reasons:
1) Playing against a relatively bad Denver defense – 25% increase in projected fantasy points
2) He’s freaking TOM BRADY – 20% increase
3) Weather is looking good in Denver – 20% increase
4) Let’s assume that one of Denver’s cover guys is injured – 10% increase
5) One of NE’s O-linemen is coming back off of being injured – 10% increase
6) Tom Brady had a monster game last game – 10% increase
7) NE’s offensive scheme matches up favorably against Denver’s D – 15% increase
These boosts would then be added together in a linear method. So if the average QB is expected to get 10 fantasy points per week, the linear method would project Brady for this week at:
10 + .25*10 + .2*10 + .2*10 + .1*10 + .1*10 + .1*10 + .15*10 = 21 pts
This is a gross simplification of both methodologies, but bear with me. One thing the advanced methods account for is multiplicative effects, which are typically invisible to linear methods. With multiplicative effects you can consider all 7 of the above facts simultaneously – so instead of adding up the sizes of their effects, you can multiply them together. The projection then becomes:
10*(1.25)*(1.2)*(1.2)*(1.1)*(1.1)*(1.1)*(1.15) = 27.6 pts
The difference between 21 points at QB and 27.6 points could be the difference between a so-so week and a monster performance. I think the conclusion from this is basically that most sports play out in the realm of nonlinearities – trying to make them conform to linear statistical methods without accounting for their non-linear nature is just bat-nuts crazy.
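For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two ways of combining the boosts. It uses the baseline of 10 points and the factors exactly as they appear in the two equations above; the function names are made up for illustration, and this is a toy version of the idea, not the actual algorithm.

```python
from math import prod

# Baseline and boosts taken from the two equations above (the example itself is mostly made up).
BASELINE = 10.0
BOOSTS = [0.25, 0.20, 0.20, 0.10, 0.10, 0.10, 0.15]


def additive_projection(baseline, boosts):
    """Linear combination: every boost is applied to the baseline independently."""
    return baseline * (1.0 + sum(boosts))


def multiplicative_projection(baseline, boosts):
    """Multiplicative combination: each boost compounds on top of the others."""
    return baseline * prod(1.0 + b for b in boosts)


print(f"additive:       {additive_projection(BASELINE, BOOSTS):.1f} pts")
print(f"multiplicative: {multiplicative_projection(BASELINE, BOOSTS):.1f} pts")
```

Running it prints both projections side by side, so the gap between adding the effects and compounding them is easy to see.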
So why didn’t I just use the advanced algorithm I used for picking games against the spread to predict fantasy football performance? Simply stated: Time. Just predicting 16 games against the spread took almost 1.5 days for my 8-core laptop to compute. It would take more kernel-time than I’ve got to predict the games and QB fantasy performance. So I stopped making the projections altogether and decided to stay in my lane just picking games…
Which brings me to fantasy golf. Unlike football, golf only has one position. The same stats are considered for each player, so the overhead of data wrangling is far lower. Injuries, turnovers, bad coaching decisions and other externalities are far less relevant for predicting and understanding fantasy golf performance than these factors are for the NFL; we expect statistical methods to be more applicable to golf than almost any other sport. The number of variables (or at least, variables I can get my hands on easily) is far smaller than the number you might consider relevant for football, which means shorter computation times and more manageable data infrastructure/cleaning.
All in all – golf provides a perfect proving ground to see if the same algorithm that I developed and used to predict 16 game outcomes per week with shocking accuracy in the NFL regular season could predict 150 fantasy golf outcomes per week on the PGA Tour. The first test was last week’s Sony Open. In this post we will introduce the methods by which I plan to evaluate the algorithm’s performance. The benchmark for evaluating performance will be the excellent weekly projections made by Notorious at www.rotogrinders.com. Who is Notorious you ask? You obviously haven’t been playing daily fantasy sports… or haven’t been winning, at any rate.
Introduction to Notorious from www.rotogrinders.com
Believe it or not – I’m not the only person on the planet obsessed with fantasy sports. There are tons of sites dedicated to fueling the obsession of our sorry lot; probably the most influential and relevant being www.rotogrinders.com. If you’re playing daily fantasy sports and have never visited this site – as I said earlier – you probably aren’t doing all that well. Members of the site are lovingly referred to as ‘grinders’ and perhaps one of the most accomplished grinders of all goes by the handle ‘Notorious’. He’s won myriad accolades in the industry, not to mention dump trucks full of cash, through his fantasy sports skill. He’s also one of the grinders who is most interested and successful in fantasy golf.
Each week Notorious puts out a column previewing the upcoming tournament and performing some basic projections on who might be a good pick for the week. His methodology is usually pretty straightforward – which is one of its great virtues. The other is that he’s exactly on point. Take his post for the Sony Open, for instance, where his 4 named picks went 1st (Jimmy Walker), 2nd (Chris Kirk), 8th (Charles Howell III) and 51st (Russell Henley) – all making the cut – which is critical if you want to be competitive in daily fantasy golf.
I always consult Notorious’ picks when setting my lineups. In addition to his picks, he puts together a simple formula to predict fantasy points based off Vegas odds, % of cuts made, and perhaps a few other stats, and uses these projections to get a sense of a player’s value based off of their prices on http://www.draftstreet.com. During the football season I only began to realize the error of my mathematical ways when I started bringing in the projected QB points from CBS.com’s experts. Their projections are created by hand and end up being far less linear than my own; their distribution matching, more or less, the distribution of points we see on a weekly basis. I want to start the fantasy golf season using Notorious’ work as a barometer for the sanity of my statistics so as not to make the same mistake twice.
In the following analysis, which I hope to make a weekly feature, you will see comparisons of my statistical projections against those produced by Notorious. He’s been gracious enough to let me use his work for comparison purposes on the blog – so I want to emphasize that this is in no way a competition. Notorious is a very big and successful name in DFS – if the algorithm helps me to come anywhere near his level of performance I’ll be very happy (and profitable) indeed.
Alrighty – let’s get into it!
Sony Open Fantasy Projection Performance
A bit of clarification about how the algorithm works. It doesn’t try to predict fantasy points – that would be insane. In some tournaments the greens are soft, the rough is tight, the weather is nice, etc.; you can expect the average golfer to put up fantasy numbers that are rather impressive if you ignore the context. On the flip side, the Majors are events in which sometimes the winner barely breaks par. Outside of the context that it’s a Major, if you saw Tiger Woods had shot 71, 70, 70, 69 you would think he’d had a terrible tournament.
Both Notorious and I account for these issues in our own way. I train all my algorithms to predict what I call the “Percent Mean Field Draft Street Points” (‘%MFDSP’) – that is, what percentage of the average golfer in this field’s fantasy points will a particular golfer end up with? I do this for a variety of reasons that I might explore further in another post, but for now suffice it to say that it’s the best approach for what I’m trying to accomplish.
Notorious, on the other hand – and this is just from what I can gather, I may be slightly off on this – sets out the number of fantasy points he thinks the winner of the tournament will likely have. This number is what fluctuates based on context. With this number set, removing context from the equation, he uses his method to predict the fraction of that ‘winning number’ he expects each individual to have. For all practical purposes we are doing the same thing – six of one, half a dozen of the other. The advantage of Notorious’ method is that it produces comprehensible fantasy numbers; i.e. it makes more sense to look at and digest visually. The disadvantage is that it still introduces context into the equation – even if only artificially – and data that needs context to be understood makes data-driven prediction very, very difficult.
So in order to make our projections comparable, I needed to adjust one set or the other so that they would be using the same convention. Since my convention is context neutral, it would be more difficult and contrived to convert my %MFDSP projections into actual fantasy point projections than the other way around. So for comparison purposes I took Notorious’ figures, calculated the average, then divided his projected points by that average to come out with what Notorious would have projected as the %MFDSP. This isn’t a perfect conversion since he didn’t predict the entire field – and I hope he doesn’t take umbrage at the methodology – but for the time being it will do. As you’ll see – and as is almost always the case with real-world data analysis – there are plenty of imperfections in this post, mostly due to time and data constraints… but, if you ask me, some retrospective analysis is better than none.
One final note before we get to the goods; I don’t have perfect access to golf data. Getting a golfer’s performance from last week takes a bit of work – so I only have a golfer’s recent history if they’ve played (or are playing) in another tournament for which I would need to get the data. For this week that means I only have Sony Open results for the golfers that are now participating in the Humana Challenge. Also – I make predictions for almost the entire field while Notorious makes predictions for most of the field, but selects slightly for the golfers that are, or might be, actually fantasy relevant. Only golfers for whom I have the data, and for whom Notorious has made a projection, will be included in this analysis. For the Sony Open that still amounts to a solid 90+ golfers we can analyze.
Below is a grid showing Notorious’ draft street points projections (DSP), the converted mean field draft street points projections (%MFDSP), the machine’s %MFDSP and the actual %MFDSP’s (keeping in mind that the actuals are calculated using only the 95 players for whom the data was in hand). In this grid, figures in green are for players who – for the respective projection method – were projected to make the cut (green in the Actual.%MFDSP implies that the player actually made the cut). Figures in red are those who were projected to miss the cut for each projection type (and similarly, red players in the actuals column are those that actually missed the cut). The grid is sorted descending by Actual.%MFDSP.
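The conversion described above (take a set of projected points, compute the average, divide each golfer's figure by that average) can be sketched in a few lines of Python. The golfer names and point values below are invented purely for illustration, and scaling by 100 so that an exactly average golfer reads as 100 is an assumption about how %MFDSP is expressed.

```python
from statistics import mean


def to_pct_mean_field(points_by_golfer):
    """Express each golfer's points as a percentage of the field average.

    Assumption: 100.0 means 'exactly the average golfer in this field'.
    """
    field_mean = mean(points_by_golfer.values())
    if field_mean == 0:
        raise ValueError("field average is zero; percentages are undefined")
    return {name: 100.0 * pts / field_mean for name, pts in points_by_golfer.items()}


# Hypothetical projected points for a three-golfer 'field' (not real data).
projected_points = {"Golfer A": 90.0, "Golfer B": 60.0, "Golfer C": 30.0}
print(to_pct_mean_field(projected_points))
```

Note that the average here is taken over whichever golfers are in the dictionary, which mirrors the caveat above: if the projections don't cover the whole field, the 'field mean' is only an approximation.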
Visual inspection is always a good way to get a sense for how a qualitative measurement is faring in the real world. Right off the bat I notice that, without any braggadocio, both the machine and Notorious’ picks performed quite well. We both had the top 15 best performing golfers projected to make the cut at least. In particular Notorious dominated the very high-end picks, notably Walker, English, Stuard, Overton and Palmer who he had projected very near their actual %MFDSP. When I first saw these results my heart dropped a little bit (and my angrier half said – ‘told you so, ya moron!’) – because it looked anecdotally at least that Notorious’ simple trick for projecting performance had outperformed all my high-and-mighty math. As he mentioned in this week’s Humana Challenge projections, last week was pretty exceptional as far as Notorious’ daily fantasy golf advice was concerned. Any time you pick the winner and 2nd place, you have to feel good. So perhaps these picks represented an above average week… but still, I was not all that excited about the prospect of his simple method outperforming all of my insanity. But then again, he’s really, really good at this stuff – so what did I expect!?
Fortunately, all is not lost. On average the machine did just barely edge out Notorious in terms of pick accuracy when we consider all 95 picks – mathematically anyway. The standard method for measuring a projection’s accuracy is called the root-mean-squared-error (RMSE) – and when I calculated the RMSE for both sets of projections here’s what I came up with (lower is better for an RMSE):
So the machine projections have a lower RMSE (thank God). I realized that Notorious’ methods and my own are so orthogonally different that some linear combination of them would probably outperform them both. So first I tried a simple projection made up of the average of our two projections (50_50Blended.RMSE) and indeed it did better than either prediction alone. Finally, because I’m curious like that, I wanted to find the weighting of the predictions that would produce the lowest possible RMSE. This means that instead of taking the average (.5 * notorious proj. + .5 * machine proj.), I change the .5’s to some other factors, say .6 and .4, or .25 and .75, any two numbers between 0 and 1 that sum to exactly 1. This actually forms an equation which can be solved with the help of a laptop and any basic programming software (I use Mathematica). Below is a plot of all the RMSE’s for all the blended models, and the lowest point on this graph represents the optimal blend of .273 * notorious picks + .727 * the machine’s picks:
So yes, the machine’s projections outperformed Notorious’ simple figures, but not by so much that they don’t benefit by being blended roughly 30%-70% with the projections of the DFS heavyweight. So far so good, and this is exciting because it means I can improve the weekly picks even more by incorporating this outside source. I probably won’t have time to do it for the upcoming Humana Challenge picks this week (already posted), but I’m interested in actually attempting this blending in future weeks and using these blended projections to produce my rosters … we will see how things go.
As I mentioned earlier – one of the most important aspects of daily fantasy golf is making the cut. So I did a basic analysis of how our rankings fared in this regard. Using our respective top 70’s – which would roughly reflect our predictions for players we feel ought to make the cut – I calculated the percentage of that group that actually did make the cut as follows:
Both are very good numbers in my opinion. The PGA Tour is so remarkably competitive that for 90% of the field it’s truly a coin flip if they are going to be playing on the weekend. To give people a 67-77% hit rate in that critical category is really impressive and extraordinarily valuable. And this percentage is leaving off some seriously big names that I know we also both nailed – not the least of which is Adam Scott (as mentioned earlier – Scott was not included because he is not in this coming week’s field so I don’t yet have his data).
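For the curious, the RMSE comparison and the search for the best blend weight can be sketched in plain Python. This is a stand-in for the Mathematica work, not a copy of it: the function names are hypothetical, the sample numbers are invented (built so the best weight on the first set of projections comes out at 0.25), and it shows two routes to the same answer, a brute-force scan over weights and the closed-form least-squares solution.

```python
from math import sqrt


def rmse(projected, actual):
    """Root-mean-squared error between two equal-length lists (lower is better)."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty lists of equal length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / len(projected))


def blend(first, second, w):
    """Weighted blend: w * first + (1 - w) * second, golfer by golfer."""
    return [w * f + (1.0 - w) * s for f, s in zip(first, second)]


def best_weight_grid(first, second, actual, steps=1000):
    """Scan w = 0, 1/steps, ..., 1 and keep the weight with the lowest RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(blend(first, second, w), actual))


def best_weight_closed_form(first, second, actual):
    """Least-squares weight on `first`, clipped to the range [0, 1]."""
    num = sum((a - s) * (f - s) for f, s, a in zip(first, second, actual))
    den = sum((f - s) ** 2 for f, s in zip(first, second))
    if den == 0:
        raise ValueError("the two projections are identical; any weight works")
    return min(1.0, max(0.0, num / den))


# Invented %MFDSP-style numbers for four golfers (not real projections).
first_proj = [100.0, 120.0, 80.0, 140.0]
second_proj = [110.0, 90.0, 100.0, 130.0]
actuals = [107.5, 97.5, 95.0, 132.5]

print("RMSE first: ", rmse(first_proj, actuals))
print("RMSE second:", rmse(second_proj, actuals))
print("best weight on first (grid):       ", best_weight_grid(first_proj, second_proj, actuals))
print("best weight on first (closed form):", best_weight_closed_form(first_proj, second_proj, actuals))
```

The grid scan corresponds to plotting RMSE against the blend weight and reading off the lowest point; the closed form gets there directly because the squared error is a simple quadratic in the weight.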
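The cut check described above boils down to: rank golfers by projection, keep the top N, and count how many of them actually played the weekend. Here is a rough Python sketch, assuming projections are stored as a name-to-value dictionary and cut results as a name-to-boolean dictionary; those structures, the function name, and the sample data are all hypothetical.

```python
def cut_hit_rate(projections, made_cut, top_n=70):
    """Fraction of the top_n projected golfers who actually made the cut."""
    ranked = sorted(projections, key=projections.get, reverse=True)[:top_n]
    if not ranked:
        raise ValueError("no projections supplied")
    hits = sum(1 for golfer in ranked if made_cut[golfer])
    return hits / len(ranked)


# Invented four-golfer example (not real results).
sample_projections = {"Golfer A": 150.0, "Golfer B": 120.0, "Golfer C": 90.0, "Golfer D": 60.0}
sample_made_cut = {"Golfer A": True, "Golfer B": False, "Golfer C": True, "Golfer D": False}

print(cut_hit_rate(sample_projections, sample_made_cut, top_n=2))
```

With only 95 golfers in the comparison set, a top-70 cutoff covers most of the list, so the hit rate is best read next to the overall share of those 95 who made the cut.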
All in all, this exercise has accomplished two things.
First, it’s made me respect how effective Notorious’ simple methodology for projecting draft street daily fantasy golf points really is. If, as a daily fantasy player, all you ever used to set your rosters was his weekly projections – you’d probably do damn well for yourself. And secondly, it’s reaffirmed my belief that the algorithm is really onto something, and that its application in the realm of golf is a legitimate use of the technology. Once more into the fray, as they say, and this time with a renewed sense of confidence.
Keep a lookout for NFL picks (the machine is getting embarrassed by my old man… I tried to tell him – and you – and myself – that it wasn’t designed for the playoffs… but nooooo! I had to go bet against the old hustler!) on Saturday. May Tom Brady finally get his comeuppance.
Special thanks to Notorious at rotogrinders.com for use of his name and his figures in this post.