Creating a Kaggle submission

In this project, you're going to wrap up the stuff on eigen-ranking by creating your own Kaggle-like submission file. Those of you who didn't quite get around to submitting your ESPN brackets will have the opportunity to do that for the Sweet Sixteen as well.

What's the Sweet Sixteen, you ask? Well, at this point, the men's NCAA tournament has been whittled down to 16 teams:

In [1]:
# Teams in bracket order: each consecutive pair plays each other.
sweet_sixteen = [
    'Villanova', 'West Virginia',
    'Texas Tech', 'Purdue',
    'Kansas', 'Clemson',
    'Syracuse', 'Duke',
    'Kansas St', 'Kentucky',
    'Loyola-Chicago', 'Nevada',
    'Florida St', 'Gonzaga',
    'Michigan', 'Texas A&M'
]

Note that the order of these teams is important in describing the structure of the tournament. They are listed as pairs of competitors. I guess the whole picture should look something like this:

[Image: the Sweet Sixteen bracket]
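One way to read the matchups off the list is to pair up consecutive entries. Here's a minimal sketch, assuming the `sweet_sixteen` list from above (repeated so the snippet runs on its own); the name `matchups` is just for illustration:

```python
sweet_sixteen = [
    'Villanova', 'West Virginia',
    'Texas Tech', 'Purdue',
    'Kansas', 'Clemson',
    'Syracuse', 'Duke',
    'Kansas St', 'Kentucky',
    'Loyola-Chicago', 'Nevada',
    'Florida St', 'Gonzaga',
    'Michigan', 'Texas A&M'
]

# Pair entries 0-1, 2-3, ... to get the eight Sweet Sixteen games.
matchups = list(zip(sweet_sixteen[0::2], sweet_sixteen[1::2]))

for team_a, team_b in matchups:
    print(f"{team_a} vs. {team_b}")
```

Because the list is in bracket order, the winners of consecutive games in `matchups` would meet in the next round.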

The details of the assignment

Part 1 (for everyone)

Check out our updated information on Optimizing Eigenrankings in HTML or Notebook formats. In addition to explaining how to optimize the eigen-process, it now describes how to create a Kaggle submission file and points to a tool to visualize it. Your mission is to create your own file.

Note that the file is actually created at Input 21, but a lot goes into the process before that. Thus, you'll need to grab the data and work through the code to execute the parts that are needed to produce the file.

To keep it interesting, everyone will go through a slightly different optimization procedure. The actual optimization happens at Input 15 but, again, depends on code before that. The basic idea is that the process depends on several parameters described at Input 11. We then use regular season and tournament data for a previous year and find the values of the parameters that optimize the process for that year. The hope is that those parameters are reasonably good for the current year. In order to individualize the process, you can:

  • First: Everyone should optimize using data from their birth year plus ten. For example, if you were born in 1998, you should optimize using data from 2008.
  • Second: If you'd like to tweak this further, you might try averaging out the procedure over several years or even using a different formula for the matrix entries. This second part is not required, though.
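The two options above can be sketched in a few lines. This is only an illustration: `optimization_year` and `average_parameters` are hypothetical helpers, not part of the course notebook, and averaging the optimized parameter values is just one possible reading of "averaging out the procedure over several years". The parameter values in the example are made-up placeholders, not real optimization results.

```python
def optimization_year(birth_year):
    """Year whose data you optimize against: birth year plus ten."""
    return birth_year + 10


def average_parameters(params_by_year):
    """Average each parameter across the years given.

    params_by_year maps a year to the list of parameter values
    found by optimizing against that year's data.
    """
    param_lists = list(params_by_year.values())
    n_years = len(param_lists)
    # zip(*...) groups the first parameter of every year, then the second, ...
    return [sum(values) / n_years for values in zip(*param_lists)]


print(optimization_year(1998))  # 2008

# Placeholder values only, standing in for whatever Input 15 would return.
placeholder_params = {2007: [1.0, 0.5], 2008: [2.0, 1.5]}
print(average_parameters(placeholder_params))  # [1.5, 1.0]
```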

Part 2 (for those who haven't done an ESPN bracket yet)

If you haven't filled out an ESPN bracket yet, you can do so based on the Sweet Sixteen. Note that our optimization code has a new section on viewing tournaments that describes how to view the Sweet Sixteen. I'll send you an invite to a new group soon!

Turn in and due date

By midnight on Wednesday, March 21, you should email me a Zip file of a folder that contains the following:

  • A valid Kaggle submission file with 2279 lines.
    • A valid submission file should score and format nicely here.
  • A nicely formatted Jupyter notebook containing just:
    • The code that you used to produce the submission file.
    • A brief description of your optimization procedure.
    • Your identification in either Square Brackets or NumericalBracket.
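Before zipping things up, it may help to sanity-check the submission file yourself. The 2279 lines come from one header line plus one prediction for each of the 68 × 67 / 2 = 2278 possible pairings of the 68 tournament teams. The sketch below assumes the usual Kaggle layout of an `ID,Pred` header followed by rows with a probability between 0 and 1; `check_submission` is a hypothetical helper, and the team IDs in the demo are placeholders rather than real Kaggle IDs.

```python
import csv
from math import comb

# 68 teams give comb(68, 2) = 2278 possible matchups, plus one header line.
EXPECTED_LINES = comb(68, 2) + 1


def check_submission(lines):
    """Return a list of problems found in the submission lines (empty if none)."""
    problems = []
    lines = [ln for ln in lines if ln.strip()]  # ignore blank lines
    if len(lines) != EXPECTED_LINES:
        problems.append(f"expected {EXPECTED_LINES} lines, found {len(lines)}")
    rows = list(csv.reader(lines))
    if not rows or rows[0] != ["ID", "Pred"]:  # assumed header
        problems.append("first line is not 'ID,Pred'")
    for row in rows[1:]:
        try:
            ok = len(row) == 2 and 0.0 <= float(row[1]) <= 1.0
        except ValueError:
            ok = False
        if not ok:
            problems.append(f"bad row: {row}")
            break  # report only the first bad row
    return problems


# Demo with placeholder team IDs 1101..1168 (68 made-up teams).
teams = range(1101, 1169)
demo = ["ID,Pred"] + [f"2018_{a}_{b},0.5" for a in teams for b in teams if a < b]
print(check_submission(demo))  # []
```

To check a real file, you could pass in its lines, for example `check_submission(open("submission.csv").read().splitlines())`, where `submission.csv` stands in for whatever you named your file.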