A baseball hackathon in Chicago? Yes, please!

On October 18th, I woke up to a notification on my phone for an email titled “2016 MLBAM Bases Coded.” The preview read:

We are pleased to inform you that your Team’s entry has been selected as a potential Finalist entry in the 2016 MLBAM Bases Coded Technology Challenge

About a week and a half earlier, on October 7th, I rounded up two of my friends and applied at the last minute, literally with minutes to spare, to Major League Baseball’s Bases Coded Challenge. I had first heard of the challenge only a few hours before, through a promotional email sent by Major League Baseball. Why their first email blast came just before the entry deadline, I can’t tell you. Thinking that an entry entailed submitting some sort of project, I was pretty bummed to have missed out on a competition involving software and baseball, two of my passions!

Luckily, I sat down later, after class, and took a closer look at the submission requirements. In fact, MLB only wanted an idea, not a completed piece of software. Winners would be selected based on the potential of their idea and the experience of their team. With a couple of hours left before the deadline, I started filling out the entry form as best I could: profiles of my team, who we were, and why we wanted to compete, along with links to and descriptions of our previous projects.

Thankfully, I had been brainstorming a baseball app during the months prior, and it was fresh in my mind. Though technically complex, the basic idea for the app was simple enough to fit into the 500-character limit the form required. I enlisted my two teammates (Jon and Gio) to help fill out the rest of the entry form. At this point they agreed to help, and to be on the team (I knew they’d say yes). We ended up submitting our entry with 12 minutes left until the deadline, though to be honest, the last 15 minutes before that were spent deciding on a team name and mascot. We ended up with Benny ‘The Jet’ Rodriguez as our mascot, and our team name was, decidedly, C the Ball.

And then we forgot about it.

When the email woke me up, I ran into Jon’s room, threw my phone at him, and asked him to confirm I was not seeing things. Long story short, we had been selected by Major League Baseball to compete in their 24-hour hackathon during the World Series. We would be flown out from San Luis Obispo to Chicago, be put up in the Chicago Hilton, and compete for a chance to win tickets to Game 4 of the World Series at Wrigley Field.

Our flight left SLO early Wednesday morning, and we arrived in Chicago by early afternoon. During the flight, Jon and I had our first opportunity to start brainstorming and debating how the app would work. Because all three of us had to cram schoolwork and studying into the week before our trip, we had had no time to take our idea any further than just an idea. We sat next to an older gentleman who was very interested in our project and who spoke with us for most of the journey. He was even more interesting himself. He spoke of designing an entire city in Saudi Arabia, managing the Chicago Cubs’ spring training, and running a foundation for supporting first-generation college students. I haven’t been able to verify any of that, but it sure made the trip interesting.

From Chicago O’Hare, we took the Blue Line train to The Loop, the meeting grounds for Chicago’s transit lines, and one of the nicer parts of the city (also where our hotel was located). We spent the evening at the hotel making plans for the hackathon the next day.

The hackathon was held at a Marriott hotel a 30-minute walk north of our hotel. When we walked into the lobby and saw signage for “World Series Distribution Headquarters”, we knew we were in the right place. This must have been the hotel where MLB officials were working. On the 7th floor, a large hall fit to hold hundreds of people held just a few small tables, four of which were for competing teams, and we picked the one we’d be at for the next 24 hours straight. As the start of the hackathon clock neared, we met the three other teams, organizers from MLB, representatives from a company called New Relic, as well as a Senior Developer and Director of Research for MLB’s technology arm, MLBAM. All were extremely welcoming, and many of them I’d love the opportunity to meet again in the future! And so it began…

Our application, later named Scout, was to be a mobile application meant to remove the confusion of statistics from the baseball fan experience. By leveraging the data we were provided by MLB directly, our app would determine the most exciting things about each baseball game to be played on a given day. This would be done entirely on the backend, by comparing pitcher/batter matchups, recent pitching/hitting streaks, rivalries, team streaks, and many more factors. For each game, our app would determine the most important factors and assign the game a tag, such as “Pitcher’s Duel” or “Top Rivalry.” We spent the first couple of hours planning what the factors would be, how the data would be funneled through our system, and what we wanted to present to the user. I began working on the iOS app, while Gio and Jon teamed up on the backend side of things.
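
To make the tagging idea a bit more concrete, here is a rough Python sketch of it. This is not our hackathon code: the field names, the thresholds, the rivalry list, and the “Hot Streak” tag are all made up for illustration, and the numbers in the example game are invented rather than real MLB data.

```python
from dataclasses import dataclass

# Hypothetical rivalry pairs, for illustration only
RIVALRIES = {frozenset(("Cubs", "Cardinals")), frozenset(("Yankees", "Red Sox"))}


@dataclass
class Game:
    home: str
    away: str
    home_starter_era: float
    away_starter_era: float
    home_win_streak: int
    away_win_streak: int


def tag_game(game, era_cutoff=3.00, streak_cutoff=5):
    """Return the tags that apply to a game (cutoffs are assumptions)."""
    tags = []
    # Both starting pitchers have a low ERA
    if game.home_starter_era <= era_cutoff and game.away_starter_era <= era_cutoff:
        tags.append("Pitcher's Duel")
    # The two teams are a known rivalry, regardless of who is home
    if frozenset((game.home, game.away)) in RIVALRIES:
        tags.append("Top Rivalry")
    # Either team is on a long winning streak
    if max(game.home_win_streak, game.away_win_streak) >= streak_cutoff:
        tags.append("Hot Streak")
    return tags


# Invented numbers for a single game
game = Game("Cubs", "Cardinals", 2.44, 2.95, 6, 1)
print(tag_game(game))
```

Each rule looks at one factor on its own, which keeps the tags easy to explain to a fan; ranking the “most important” factors would need some kind of scoring on top of this.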

Having participated in several hackathons, I can say we had above-average focus. Maybe it was because we spent less time on the idea, maybe because of the stakes, or maybe because we were so interested in the project. Whatever it was, we got a lot done in a short amount of time. At many points through the night, we discussed how the backend would hook up with the app when the time came. Toward morning, the format of the data returned by our API changed, but otherwise the connection between server and client was quite smooth.

Our biggest problems seemed to stem from using an Amazon EC2 instance to host our backend. Jon and Gio, both first-time users, ran into many problems and noted the lack of documentation and tutorials for simple tasks. Another problem was having to incorporate the monitoring software of New Relic, one of the sponsors and an MLB partner. New Relic monitors your application’s performance, which, while useful, was overkill for a 24-hour hackathon. Installing New Relic in our Python backend was not too bad, but trying to add it to our iOS app was not so graceful. We ended up ripping it out completely, and the entire process unfortunately killed the final hour and a half of the hackathon.

Nevertheless, time was up. The 24 hours had been filled with focus sessions, food breaks (oh man was the food good), Gio dancing in front of the GoPro that had been set up, and hard work on a product we’re very proud of. To finish off the day, each team informally presented its project to the MLB and New Relic folks, without being judged. The real presentations would come the next day, Saturday, when we’d return to present in front of the panel of judges. With just an evening ahead of us, we had to find time to make up some of the lost sleep, build a slide deck, and prepare for a presentation.

We began walking back to the hotel, discussing our thoughts on the other projects and our shot at winning the grand prize. We were confident that our technology was the best designed and most efficient (thanks to some preliminary comments from New Relic), and that if we could kill the presentation, we might have a chance.

After reaching our hotel room, we fell into our beds and didn’t wake up until much later that evening. Hungry, we went out for some Giordano’s pizza, and didn’t get back until just before midnight. Setting to work in Google Docs, we built our presentation, defining our message and homing in on why our application could be a success among current baseball fans and in bringing new fans to the sport. With a completed slide deck, we hit the hay once again, alarms set for early the next morning, since we’d be required back at the Marriott not much later than 8:30.

And so we were there early the next morning. The judging panel was revealed to be a writer for Sports On Earth, the Senior VP of Mobile Product Development at MLBAM, the CEO of a sports startup accelerator, and a VP from New Relic. The crowd for the presentations was not much bigger than the group of people we had met over the previous few days, but the tone was a bit more serious now that presentations were to begin. We ended up presenting third of four teams.

The first team to present was our favorite competitor. This team of friends from the Deep South was extremely friendly, and we had enjoyed chatting with them both before the competition and after. Having seen their project the day prior, we decided they would be our main competition, and as we later found out, they had the same opinion of us! They developed a mobile app that allowed friends to play a simplified version of fantasy baseball. Instead of having to manage a team over an entire season, players chose five teams each day that they thought would win, and the game compared their results to those of their friends. Their project was extremely polished visually, and I would definitely play if it were available in the App Store!

The next team was a group of Chicago natives who were experienced Python developers. They too developed a mobile application, but one that would be used by a fan during a game, not before and after. Their app would know exactly what was happening in a baseball game, thanks to MLB data, and explain different plays, vocabulary, and confusing baseball concepts to the user. Visuals were not the strong point, but the concept certainly tackled the problem of baseball being confusing to newcomers.

Next it was our turn. Our presentation hit a snag when our app froze on Jon’s computer during the demo. This was thanks to a rogue error left behind by the New Relic integration that had gone wrong, but luckily the iOS simulator was running on my computer as well, so we were back up quickly. We didn’t kill the presentation as we had hoped, but we got our points across and did a swell job.

The first three teams were composed of three guys each, but the final team had just two. In fact, one of the two showed up late and didn’t seem to do any work on the project, so it was truly just one. This team made a Twitter bot that tweeted when a hit or an out was unexpected based on historical data.

As the judges deliberated, we started talking again with the first team, with whom we had become friendly. We discovered they too had run into complications with New Relic. After 15 minutes or so, we returned to our seats to hear the final decisions.

As Chad Evans, the Senior VP from MLBAM, read out the judges’ comments on each of the projects, everyone listened intently. For our application, he pointed out the originality of our idea, and described how the tagging of games was something MLB themselves had been trying to figure out for a long time. While each team received nice feedback, I was very inspired by ours. Unfortunately, the Twitter bot won, so Game 4 tickets were not in our sights.

But Chicago was amazing, we met some really nice people, and on the plane home we ran into a part owner of the Arizona Diamondbacks who was wearing an enormous 2001 World Series Championship ring. The best part is that we’ll be continuing to work on our application and hopefully have it ready by the start of the next season!

Thanks to Major League Baseball for the incredible opportunity, and to our teachers who were kind enough to excuse our absence and extend some deadlines 🙂

Generating Drake Lyrics with a Markov Chain

One of my recent data science lab assignments was to choose a musical artist, scrape all of his or her song lyrics from the web, and use the Markov chain technique to generate new lyrics. The following program works by compiling a separate list for every word sung by Drake. Each list holds all the words that have ever followed the word to which the list belongs. For example, “you” –> [“to”, “just”, “need”, “finish”, “boys”]. It then uses these connections and some random selection to generate new sentences/lines. I’ve used Drake as my test subject because I think his lyrical style is very recognizable, but you can run this script for any artist. All you need to do is swap out the LyricsFreak URL. Note that the new URL must be identical in structure (i.e. .com/d/) or else the web scraping will not work.

import random

import requests
from bs4 import BeautifulSoup

lyrics = []
links = []

# fetch the artist's index page and collect a link to every song on it
songs = requests.get("http://www.lyricsfreak.com/d/drake/")
parser = BeautifulSoup(songs.text, "html.parser")

for song in parser.find_all("td", class_="colfirst"):
    link = song.find("a")['href']
    links.append("http://www.lyricsfreak.com" + link)

# download each song page and store its lyrics as a list of lines
for link in links:
    song = requests.get(link)
    parser = BeautifulSoup(song.text, "html.parser")
    lines_dump = parser.find("div", class_="dn", id="content_h")
    if lines_dump is not None:
        lyrics.append(list(lines_dump.strings))

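def train_markov_chain(lyrics):
    # Marker tokens for song start, song end, and line break. The original
    # marker strings did not survive in this post, so these names are assumed.
    transitions = {"<START>": [],
                   "<END>": [],
                   "<N>": []}
    for lyric in lyrics:
        # split each line into words, skipping blank lines
        lines = [line.split() for line in lyric if line.strip()]
        for lnum, chopped in enumerate(lines):
            for wnum, word in enumerate(chopped):
                if word not in transitions:
                    transitions[word] = []
                # what comes before this word: song start or a line break
                if lnum == 0 and wnum == 0:
                    transitions["<START>"].append(word)
                elif wnum == 0:
                    transitions["<N>"].append(word)
                # what comes after this word: song end, line break, or next word
                if lnum == len(lines) - 1 and wnum == len(chopped) - 1:
                    transitions[word].append("<END>")
                elif wnum == len(chopped) - 1:
                    transitions[word].append("<N>")
                else:
                    transitions[word].append(chopped[wnum + 1])
    return transitions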

chain = train_markov_chain(lyrics)

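def generate_new_lyrics(chain):
    # Marker names ("<START>", "<END>", "<N>") are assumed; the original
    # strings did not survive in this post.
    # a list for storing the generated words
    words = []
    # generate the first word
    words.append(random.choice(chain["<START>"]))
    # keep picking a random follower of the last word until the end marker
    done = False
    while not done:
        ondeck = random.choice(chain[words[-1]])
        words.append(ondeck)
        if ondeck == "<END>":
            done = True
    # join the words together into a string with line breaks
    lyrics = " ".join(words[:-1])
    return "\n".join(line.strip() for line in lyrics.split("<N>"))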
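
If you want to see the core idea without scraping anything, here is a tiny self-contained sketch that builds the same kind of word lists from a few made-up lines (not real lyrics) and walks them. The function name build_followers and the toy lines are mine, purely for illustration, and this version skips the song-start and line-break handling of the full script.

```python
import random
from collections import defaultdict


def build_followers(lines):
    """Map each word to the list of words that directly follow it on a line."""
    followers = defaultdict(list)
    for line in lines:
        words = line.split()
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers


# Made-up lines, not real lyrics
toy_lines = [
    "you need to know",
    "you just know",
    "i need you to know",
]

followers = build_followers(toy_lines)
print(followers["you"])   # ['need', 'just', 'to']
print(followers["need"])  # ['to', 'you']

# Walk the chain: start at a word and keep picking a random follower
# until we hit a word with no followers or reach a length cap.
word = "you"
line = [word]
while word in followers and len(line) < 8:
    word = random.choice(followers[word])
    line.append(word)
print(" ".join(line))
```

Duplicates in each list are kept on purpose: a word that follows more often shows up more times, so random.choice picks it more often.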


Here’s one of the songs the script spit back out:

She in it heals all, all, all, switch the ground off multi-platinum recordings 
Drinking watch is this is in here 
I see my pen up to know what she two up the rims on a little, why is in my day I got to who you strip for you just like the A&R men never forget it poppin’ don’t make me there is for me who's it go that you might just show up... Damn. 
... on camera 
I can also raise one so well, this money til next to Texas back in, 
You ain't last season changed you gon' come back about her and screwed, I left your clothes 
We can tell the year 'round me and hotels that I mean one else's 
I need more (More) 
Twenty five o four, need you 
I don't know you wanna be, I say you need some dinner you hope you probably end getting right now that OvO that (Money) So much to check if you pass it anyways, 
Yeah, Tom Ford tuscan leather smelling like 
There ain't nothing 
I'mma do it... still fly though, 
I really have to the ceiling 
I give it start? The Tires Burnin 
I need it up in the morning 
Ever since I know I'm squeezin' in my thoughts of Four You Always gone 
Some Girls That I before I guess that's where you 
Somebody shoula told me. 
This Girl you bring us down and it's where it like oh-ah-oh-oh 
Tuck my whole city faded (the ride) 
My memories of baked ziti  
Let It Hurt Faces 
I see your friend 
No One Else 
Beat the past piss

Side Project: Mapping baseball teams (Pt. 2)

Today I released version 2 of The Baseball Map. I spent many hours of the past weekend rebuilding much of the site (as simple as it was) from the ground up. Building upon much of the feedback I received from my first endeavor on reddit, I fleshed out a more useful interface. The map now provides a quick peek at the Wikipedia entry for each stadium and a satellite view of the stadium, and users can now even view a sort of web visualization of where the minor league affiliates are located for each major league team. Last night I put the final touches on the site and pushed it live. It’s not perfect, but it ended up looking and working really well. After miraculously receiving a Product Hunt invite yesterday afternoon, I posted my project to the site and am currently in the ‘upcoming’ list with 5 upvotes, only 2 of which aren’t from myself or my friends. I’m pessimistic about getting featured on Product Hunt, but Reddit and Twitter have been easier allies.

As this is the first side project I’ve worked on in too long, I’m happy to have gotten something built. I’m excited to have been able to get valuable feedback from users and use it to build something that they have told me they really enjoy. Even if just one person told me that, the last couple of weeks of work would be worth it.

Side Project: Mapping baseball teams across the country

As a huge fan of baseball, I was recently surfing the web trying to find a map of all the baseball teams in my current area. I’m spending the summer in Lincoln, Nebraska, and am aware that the Lincoln Saltdogs play just a few blocks away from my apartment, but I had no idea (until working on this project) that another team plays about 30 minutes away in Omaha. This would not stand. I decided that baseball fans everywhere must have the most convenient way to locate nearby baseball teams, and the best way to do this would be to map every single baseball team in the United States. So I set out to build ‘The Baseball Map.’

I’ve spent the past week pulling team data manually from the web, which has been quite the job. The amount of copying and pasting of location coordinates was too much to handle in one go, so I only did a little bit each day after work. All of the data is currently stored in an array on the front end, but I’d like to implement a simple back end to handle user submissions in the future. I made the website using the Google Maps JavaScript API and spent most of today (July 3rd) styling it up and refining the interface.

I decided early on that I wanted to get a lot of feedback and criticism from baseball fans. I built this just as a fun project, but I’m really curious about what other fans think of what I’m building. Do they think it’s just a neat thing to look at for 30 seconds? Is there potential for them to enjoy using the site repeatedly? I ventured to the reddit.com/r/baseball IRC channel for my first round of feedback, with a basic screenshot and a short description. I got a ‘looks good’ and ‘looks good from that one shot’, which was nice to hear but not the criticism I was looking for. The chatroom discussion quickly resumed with a ‘forgot to buy bread, fuuuuuck’. That’s what I get for going to a chatroom for feedback. I plan on posting the site to the actual /r/baseball subreddit tomorrow.
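
The “locate nearby teams” idea mentioned above, working from a plain array of team records, can be sketched roughly like this. It is written in Python to match the other code on this blog rather than the JavaScript the site itself uses, and it is not the site’s code: the record layout, the placeholder team names, and the 100 km cutoff are assumptions, and the coordinates are approximate city centers, not stadium locations.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical records; coordinates are approximate city centers
TEAMS = [
    {"name": "Lincoln team", "lat": 40.81, "lng": -96.70},
    {"name": "Omaha team", "lat": 41.26, "lng": -95.93},
    {"name": "Chicago team", "lat": 41.88, "lng": -87.63},
]


def distance_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points using the haversine formula."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def nearby_teams(lat, lng, teams, max_km=100):
    """Return (distance_km, name) pairs within max_km, closest first."""
    found = [(distance_km(lat, lng, t["lat"], t["lng"]), t["name"]) for t in teams]
    return sorted(pair for pair in found if pair[0] <= max_km)


# Teams within 100 km of (roughly) Lincoln, Nebraska
for dist, name in nearby_teams(40.81, -96.70, TEAMS):
    print(f"{name}: {dist:.0f} km")
```

With only a few hundred teams, checking every record like this is plenty fast, so there is no need for anything fancier than a loop over the array.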

Why reddit for feedback?

I’ve found reddit is very inviting to side projects, no matter where you post them, and you’ll usually get a decent number of responses within a few hours of posting. What I’m hoping to hear is what value people think this would have. If the answer is none, which it might be right now, I’d be interested to hear if any of the following would add more value for a baseball fan:

  • Expanding the data set to include college teams, foreign teams, etc.
  • Including a large number of historical locations, including stadiums, baseball landmarks, museums, etc.
  • Allowing for user submission of these locations
  • Allowing for commenting on locations – people could share their memorable experiences at their favorite stadiums, or even provide tips for getting into batting practice early, getting autographs, etc.
  • Allowing people to check off which stadiums they’ve been to. Many baseball fans take pride in having visited different stadiums.

Questions I still have:

  • Should the data focus more on the team or the stadium?
  • Are fans more interested in seeing nearby stadiums and learning about the history of past/present stadiums, or are they interested in seeing the teams of past/present? I figure this is something I should settle right now before I put more of my time into adding more data.

Baseball fans love statistics, so what statistics could I gather about teams or stadiums that would be of interest to fans? This could be things like season W-L records for teams throughout history, or attendance records for stadiums throughout history. Or it could be as simple as tracking internal data, such as what teams have the most user submissions (should I implement a commenting system), or which teams have been clicked/viewed the most. This would create a sort of popularity contest.

I’ll post again when I release version 2. For now you can check out the project at thebaseballmap.com