Tuesday, March 13, 2018

Data Science VS ESPN - Who predicts the best bracket?

1 Million Brackets

I published an article on LinkedIn detailing a project to create one million unique NCAA brackets for this year's tournament.  If you're interested, you can read the article here.  What follows are the results comparing the accuracy of my simulation's predictions with the accuracy of the general public (represented here by all brackets submitted to the NCAA Tournament Challenge at ESPN).

 

Last update: 4-2-18

Status: My best bracket is tied for 2nd place (by points) with a chance to win it all!

 

Results

First Round

Data Science: 30 Wins

ESPN: 31 Wins 

 

The best of my 1 million brackets had 30 wins.  There were 2 brackets with 30 wins (not visible on the histogram below because of the scale of the y-axis). ESPN had 4 brackets with 31 wins.  ESPN is in the lead!


 

 


 

Through Second Round...

Data Science: 40 Wins Total

ESPN: 43 Wins Total

Through two rounds, and some wild upsets, ESPN remains in the lead.  The top bracket on the leaderboard has 43 wins.  I can already see the need for a few changes next year.  First, I need to learn how to translate team statistics into win probabilities, so that a 16 seed has some chance of beating a 1 seed. Assuming it had none was simple, but evidently not accurate.  Second, I am comparing my best bracket's number of wins to ESPN's.  ESPN, however, uses a point system that awards more points per win in each subsequent round.  This makes it harder to pick out their best bracket strictly in terms of number of wins.  Finally, and just for kicks, I may compute my best bracket based on ESPN's scoring system at the end of the tournament.
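One simple way to turn a strength gap into a win probability is a logistic curve. The sketch below is only an illustration of that idea, not the method used in my simulation: it uses the seed difference as the only input, and the function names and the scale constant are hypothetical and not fitted to any data.

```python
import math
import random

def win_probability(seed_a, seed_b, scale=0.3):
    """Chance that the team seeded seed_a beats the team seeded seed_b.

    Logistic curve on the seed gap. `scale` is a hypothetical tuning
    constant; a real model would fit it to historical results.
    """
    return 1.0 / (1.0 + math.exp(-scale * (seed_b - seed_a)))

def team_a_wins(seed_a, seed_b, rng):
    """Simulate one game; True if the seed_a team wins."""
    return rng.random() < win_probability(seed_a, seed_b)

# Usage: a 1 seed is a heavy favorite over a 16 seed, but never a sure thing.
rng = random.Random(2018)
trials = 100_000
upsets = sum(not team_a_wins(1, 16, rng) for _ in range(trials))
print(f"P(1 beats 16) = {win_probability(1, 16):.4f}")
print(f"16-over-1 upsets in {trials} simulated games: {upsets}")
```

With any positive scale the underdog keeps a small but nonzero chance, which is the property the all-or-nothing assumption lacked.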

It was a treat to watch the better strategy and coaching prevail both with UMBC in the first round and Texas A&M in the second round.  What exciting games! I'm looking forward to more great competition in the next round.

 

 


 

 

Going Into the Final

There is good news and bad news.  The bad news is that my best bracket's total number of correctly predicted games is lower than even last year's simulation achieved.  The good news is that, using the same scoring system as ESPN, my best bracket is tied for 2nd place. It had all 4 teams in the final four, which helped, but it missed Villanova in the final matchup, picking Kansas instead.  Luckily it picked Michigan to win it all, so there's still a chance for one of my brackets to beat them all.

As it stands, I am tied for second with 1340 points on my best bracket (bracket number 417502 of 1,000,000).  As shown below, there is still a way for me to have the best bracket: Michigan has to win.  If they do, I will have the best bracket (by points) of all submitted ESPN brackets, because no one with more points than me also picked Michigan to win the final.  This is pretty exciting given that I created 1 million brackets and ESPN historically has 10 million+ submissions.  It is even more exciting given that this year ESPN gives individuals the option to choose a "smart bracket".  In theory, these brackets should make ESPN's pool a stronger competitor.
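To show why ranking by wins and ranking by points can disagree, here is a minimal scoring sketch. It assumes a 10-20-40-80-160-320 points-per-round schedule, which is my understanding of ESPN's system rather than something taken from their documentation, and the data layout (one set of picked winners per round) and function names are hypothetical.

```python
# Assumed points per correct pick, by round (first round through final).
ROUND_POINTS = [10, 20, 40, 80, 160, 320]

def count_wins(picks, results):
    """Number of correct picks. picks/results: one set of winners per round."""
    return sum(len(picked & actual) for picked, actual in zip(picks, results))

def score_bracket(picks, results):
    """Weighted score: a correct pick in a later round is worth more."""
    return sum(
        points * len(picked & actual)
        for points, picked, actual in zip(ROUND_POINTS, picks, results)
    )

# Toy 4-team example: A beats B, C beats D, then A beats C.
results = [{"A", "C"}, {"A"}]
bracket_1 = [{"A", "C"}, {"C"}]   # both openers right, champion wrong
bracket_2 = [{"A", "D"}, {"A"}]   # one opener wrong, champion right

for name, picks in [("bracket_1", bracket_1), ("bracket_2", bracket_2)]:
    print(name, count_wins(picks, results), score_bracket(picks, results))
# prints: bracket_1 2 20
#         bracket_2 2 30
```

Both toy brackets have two wins, yet the one with the correct champion scores higher, which is why the best bracket by wins need not be the best by points.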

The screenshot below shows the ESPN leaderboard on top, with the top point totals highlighted in yellow.  I am tied with the bracket circled in blue.  Beneath the leaderboard is the text file of all my brackets.  The last six teams of the highlighted row show the final four, the final game, and the final winner, with the final four highlighted in yellow.







The odds favor Villanova tomorrow but I'll be rooting for Michigan.

Go Blue!

Final Results

With Michigan's loss, my bracket that was in contention to win it all did not prevail.

All in all, the project served to highlight the benefits of running a large number of simulations, along with some basic probabilities, to produce improved "guesses" compared with a mass of independent "thinkers" performing the same task.

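I find it fascinating that even one bracket of my million predicted a perfect final four.  The naive odds (assuming every game's winner has a 50/50 shot) of calling all 60 games leading up to the final four correctly are 1 in 2^60 (roughly 1 in 1 quintillion).  Naming only the four teams is much easier, though: under the same assumption each team has a 1 in 16 chance of winning its region, so the odds are 1 in 16^4, or 1 in 65,536, and a million coin-flip brackets would be expected to contain about 15 such hits.  The 1 in 2^60 figure is therefore not the right yardstick for my "hit"; a fairer measure of my moderately better than naive probabilities would be how many of my million brackets named the final four compared with that baseline.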
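The odds arithmetic can be checked directly. The sketch below assumes a 64-team field (four regions of 16, ignoring play-in games) and coin-flip games, and it separates two events that are easy to conflate: calling all 60 games before the final four correctly, and simply naming the four regional champions. The variable names are mine.

```python
# Games played before the final four in a 64-team field.
GAMES_BEFORE_FINAL_FOUR = 32 + 16 + 8 + 4   # 60
TEAMS_PER_REGION = 16
REGIONS = 4
N_BRACKETS = 1_000_000

# Event 1: every one of those 60 games called correctly (coin-flip picks).
all_games_odds = 2 ** GAMES_BEFORE_FINAL_FOUR

# Event 2: only the four regional champions named correctly.
# With coin-flip games each team wins its region with probability 1/16.
final_four_odds = TEAMS_PER_REGION ** REGIONS

# Expected number of coin-flip brackets, out of a million, that name
# the correct final four.
expected_hits = N_BRACKETS / final_four_odds

print(f"All 60 games correct: 1 in {all_games_odds:,}")
print(f"Four regional champions correct: 1 in {final_four_odds:,}")
print(f"Expected naive hits in {N_BRACKETS:,} brackets: {expected_hits:.1f}")
```

Under these assumptions the second event is the relevant baseline for a bracket that named the final four without getting every earlier game right.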

I look forward to tweaking the simulation for next year. If you have any ideas or suggestions, I'd love to hear them.





 

 

 

 






