Week 13: Spread Picks, Results and Analysis

by thesanction1

“Momma said there’d be days like this…” – The Shirelles

Week 13 was absolutely brutal. 5 for 15. Does anyone have an un-tinker button?

What I failed to mention in that post about the advantages of tinkering is that many of the people who conducted the early experiments with the accidentally discovered X-rays also died early, brutal deaths from rare, exotic cancers, presumably brought on by their increased exposure to radiation.

I think the algorithm suffered roughly the same fate last week.

I’m writing this between flights, so I have to keep it brief, but there were three factors that could explain the algorithm’s abysmal performance in week 13:

1) Predictions were made on Tuesday to accommodate the Thanksgiving day games. Normally the algorithm includes (and is trained on) the most up-to-date data about injuries and weather – things that are much more certain for Sunday’s games by Friday or Saturday than they are earlier in the week. Most teams’ injury / practice reports don’t contain much new information for the upcoming week until at least Wednesday. This lack of information could easily have thrown things off.

2) Tinkering gone wrong. The algorithm needs to be retrained every so often, and I had ideas for improving performance which I tried to implement in the short time between weeks 12 and 13. Changes to the training parameters can sometimes require a few days to run, backtest and implement properly – so while the changes made their way into the week 13 predictions, they had not yet been fully vetted and verified (see the sketch after this list). These half-baked tweaks may have contained undetected errors that knocked things out of whack.

3) Lack of confidence.  Week 13 was the first week in which no games were predicted with a confidence index above 3.  In a sense the algorithm was telling us that something was amiss.  The two previous weeks, in which there were three fewer games played due to bye weeks, contained at least 3 games with a confidence index greater than 3.  With no bye weeks in week 13, there were 16 games played and not a single game was predicted above a confidence of 2.8.
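For what it’s worth, here is a minimal sketch of the kind of “vet before you deploy” gate I’m describing in factor 2: a retrained model’s picks only go live once it has at least matched the incumbent model on a backtest over prior weeks. Everything here (the function names, the toy numbers, the zero threshold) is hypothetical, not the actual pipeline.

```python
# Hypothetical sketch of a "vet before you deploy" gate, not the actual
# pipeline: a retrained model's picks only go live if it at least matches
# the incumbent model on a backtest over prior weeks.

def hit_rate(picks, outcomes):
    """Fraction of picks that matched the actual against-the-spread outcome."""
    hits = sum(1 for p, o in zip(picks, outcomes) if p == o)
    return hits / len(picks)

def promote(candidate_picks, incumbent_picks, outcomes, min_edge=0.0):
    """Deploy the retrained model only if it beats (or ties) the current one."""
    return hit_rate(candidate_picks, outcomes) >= hit_rate(incumbent_picks, outcomes) + min_edge

# Toy backtest over ten historical games: 1 = home team covered, 0 = it didn't.
outcomes        = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
incumbent_picks = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]   # 8 of 10 correct
candidate_picks = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]   # 8 of 10 correct
print("Promote retrained model:", promote(candidate_picks, incumbent_picks, outcomes))
```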

Even more than these factors, I think the poor performance is best understood in the context of statistics. Any single NFL game is one giant pile of uncertainty. Anything can and does happen. When the algorithm was successful in weeks 11 and 12, I insisted that the performance be understood in the context of statistics; a monkey throwing darts has a chance of picking every game perfectly on any given week. An edge is something that plays itself out over many, many contests. Any field that involves predicting probabilistic events will have its ups and downs.
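To put a rough number on that uncertainty, here is a quick back-of-the-envelope calculation. The 55% hit rate below is a purely hypothetical figure for a picker with a genuine edge, not a claim about the algorithm’s true rate; the 15-game week simply matches week 13’s slate.

```python
# Back-of-the-envelope week-level variance, using math.comb (Python 3.8+).
# The 55% hit rate is a hypothetical "genuine edge" figure, not a measured one.
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Chance a picker with a real 55% edge still goes 5-of-15 or worse in a week.
print(f"P(<= 5 of 15 at p = 0.55): {binom_cdf(5, 15, 0.55):.3f}")

# Chance a dart-throwing monkey (p = 0.50) sweeps a 15-game week.
print(f"P(15 of 15 at p = 0.50):   {binom_pmf(15, 15, 0.50):.6f}")
```

Even with a real edge, a 5-of-15 week or worse is far from impossible, and the monkey’s chance of a perfect week, while tiny, is not zero.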

The best investment managers can lose money in any given year. Perhaps even more damaging in the long run are those numerous morons who know nothing and just get lucky early on in their careers. These are people who have no skills, but due to an extended luck streak have confidence that they are somehow superior to their peers. There are literally millions of people managing money around the world; if a million people pick stocks for 10 years, and each has absolutely no skill (with only a 50/50 chance of outperforming the market in a given year), on average nearly 1,000 of those money managers will outperform the market every single year for 10 years straight. These individuals will genuinely believe that their method of investment management is significantly better than the next guy’s, when they might as well be flipping a coin to make their decisions.
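The arithmetic behind that claim is simple enough to check. The inputs are exactly the assumptions stated above: one million managers, ten years, a 50/50 shot each year, independent across years.

```python
# The post's own assumptions: one million managers, ten years, and a 50/50
# shot at beating the market each year, independent across years.
managers = 1_000_000
years = 10
p_beat = 0.5

p_perfect_streak = p_beat ** years            # (1/2)^10 = 1/1024
expected_lucky = managers * p_perfect_streak  # expected count of all-luck streaks

print(f"P(beating the market 10 years straight by luck): {p_perfect_streak:.6f}")
print(f"Expected number of such managers out of 1,000,000: {expected_lucky:.0f}")
```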

Many of the most successful hedge funds and financial firms are filled with individuals who are the product of this type of winning streak. Ten years of market outperformance is the type of thing a person can build a reputation on, and that reputation can sustain another 10 years of mediocre or sub-par performance. Some of them really do have skill, but many more of them are paid millions annually to flip that coin one more time. Every time they flip heads they get more money and trust, attracting more clients and piling up more cash in their accounts. Much to the chagrin of their clients, coin flips are independent events, and these once-lucky individuals rarely sustain their outperformance in the longer term.

Anyone interested in this way of understanding random processes should check out Nassim Taleb’s Fooled by Randomness. All of Taleb’s books are well-written and imbued with his unique way of understanding a world full of random processes.

Ultimately the key to surmounting these types of statistical problems is to focus on process rather than outcome as the barometer for the soundness of any prediction approach (or any process, for that matter). Does the methodology applied make sense? Can you explain and understand where the performance is coming from? Is it possible that this method or process can actually impact the probabilities of success? Is the process sufficiently difficult to replicate that it’s unlikely others have already tried it or can easily duplicate it?

Just as many an undeserving career has been made by a lucky streak, many a deserving individual or method has quit or been forced out of an industry through absolutely no fault of their own. Even if you have a method that really does tip probability in your favor, it’s quite possible that a sustained string of tails will convince you that you’re terrible at something and force you to quit. Again – you have to be honest with yourself and focus on method, not outcomes. If the method is sound you should not be too shaken by an early run of bad luck. In the longer term, outcomes will always prove out the most legitimate processes.
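To make that concrete, here is a small simulation of how often a picker with a genuine edge still finds themselves under .500 early on. The 55% edge and the 90-pick window (roughly six 15-game weeks) are arbitrary, illustrative numbers, not estimates of anything.

```python
# Minimal Monte Carlo: how often does a picker with a genuine edge still have
# a losing record early on? The 55% hit rate and the 90-pick window are
# hypothetical numbers chosen purely for illustration.
import random

def p_losing_start(p_hit=0.55, picks=90, trials=20_000, seed=1):
    """Estimate the chance of being below .500 after `picks` picks."""
    rng = random.Random(seed)
    losing = sum(
        1 for _ in range(trials)
        if sum(rng.random() < p_hit for _ in range(picks)) < picks / 2
    )
    return losing / trials

print(f"P(below .500 after 90 picks at p = 0.55): {p_losing_start():.3f}")
```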

So I could try to make excuses for the algorithm’s performance this past week. I could mention that the games it predicted correctly covered the spread by an average of 12.68 points, while the games it predicted incorrectly failed to cover by an average of only 5.63 points. I could point out that 5 of the games the algorithm missed failed to cover by less than 3 points. But none of this changes the fact that last week’s picks were flat out bad.

Overall performance – week: 5 of 15 (33%), season: 19 of 41 (46%):

[Chart: overall picks performance, 2013 season through week 13]

SCI greater than 1.0 – week: 5 of 12 (42%), season: 15 of 30 (50%):

[Chart: SCI > 1.0 picks performance, 2013 season through week 13]

SCI greater than 2.0 – week: 1 of 6 (17%), season: 6 of 15 (40%):

[Chart: SCI > 2.0 picks performance, 2013 season through week 13]

SCI greater than 3.0 – week: None, season: 4 of 6 (67%)

Looking at results like this is so discouraging. It makes you second-guess everything – did I flip some critical sign from positive to negative in the code? Should it really have gone 10 of 15?! Did I make some other implementation error? Is it all just hopeless?

For all the reasons listed above I’m standing by the process. A world-class statistical approach to predicting sports outcomes has validity, and there’s some edge to be gained. The same tinkering that may have botched last week has been pushed to completion, and backtesting is currently in progress. I’m hoping to have all of this accomplished before this Sunday’s games. I’ve also learned my lesson about projecting before all the injury and weather data is in – this week’s projections will be released no earlier than the Friday updates.

I hope everyone has a great week. Week 13 QB projections analysis should be out tomorrow, and week 14 picks on Saturday.