He’s already made his billions.

“He’s 70 years old already made his billions, now he wants to give back to the American people.”

This is something I’ve heard said and seen commented online in defense of Donald Trump’s massive wealth and personal network. The assumption here is that Donald Trump had a cutoff, he reached that cutoff, and now he’s decided to give back.

I’d like to hope the people who hold this assumption have not stopped to think about how he achieved that wealth. That they have not yet considered the possibility that the mindset that got him there doesn’t magically stop once a certain limit is reached. At least then it would be understandable why they vehemently follow this line of thinking.

Other wealthy Americans such as Mark Zuckerberg and Bill Gates have donated huge portions of their worth toward charitable causes, research, and world-changing projects. Yet even if they were to run for president, I would think (at least in passing): if they’re doing so much good outside of politics, why make a bid for the Oval Office?

Donald Trump has done little in the way of directing his money towards anything but himself. He did not become a billionaire by chance, or as a side product of building something that caught on (Facebook, Windows, etc). He became a billionaire through making every decision with the interest of growing his empire.

 

 

 

 

 

Likes and retweets of Donald Trump tweets.

Every tweet has (at minimum) three numerical data points associated with it:

  1. the number of retweets
  2. the number of replies
  3. and the number of favorites it has received.

Several days ago I looked at a Donald Trump tweet and wondered whether there was any insight to be gained from examining the ratios between those three numbers. My hypothesis was that those ratios could allow me to classify the public’s response to that tweet. For example:

  • a tweet that has a higher ratio of replies to favorites was perceived as more controversial and elicited more backlash.
  • a tweet with a higher ratio of likes to retweets was perceived as agreeable but insensitive enough that people might not want to share it.

Unfortunately Twitter’s API doesn’t provide the number of replies to a tweet, but here are just some findings after an hour playing in a Python notebook.

Each of Trump’s 200 most recent tweets has been plotted on the graph below. Tweets posted from an iOS device are shown in red, while those from an Android device are shown in blue. It has been suggested that iOS tweets come strictly from his staff, while Android tweets are strictly Trump’s personal thoughts. The lower popularity of iOS tweets could support that notion.

You can see the ratio between favorites and retweets is approximately 5:1, but what I was interested to see is which tweets strayed the furthest from the line of best fit. Here are snippets of the 7 tweets with the highest ratio of favorites to retweets.

THANK YOU!
#JointSession #MAGA🇺🇸\nhttps://t.co/RDO6Jt2pip
Join me live at 9:00 P.M. \n#JointAddress http...
I will be interviewed on @foxandfriends at 6:0...
Big dinner with Governors tonight at White Hou...
Going to CPAC!

These are certainly not divisive or controversial statements.

Let’s look at the 7 tweets with the fewest favorites per retweet.

How low has President Obama gone to tapp my ph...
Terrible! Just found out that Obama had my "wi...
I hereby demand a second investigation, after ...
We should start an immediate investigation int...
Venezuela should allow Leopoldo Lopez, a polit...
RT @Scavino45: LIVE Joint Statement by Preside...
Iran is playing with fire - they don't appreci...

These are clearly more aggressive messages.
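The ranking behind the two lists above can be sketched roughly as follows. This is a minimal illustration, not the notebook I actually used: the tweets and their counts are made up, and the `favorite_count` and `retweet_count` field names are an assumption about how the data is stored.

```python
# Hypothetical sample data: the texts and counts are made up for illustration.
tweets = [
    {"text": "example A", "favorite_count": 50000, "retweet_count": 10000},
    {"text": "example B", "favorite_count": 90000, "retweet_count": 12000},
    {"text": "example C", "favorite_count": 60000, "retweet_count": 20000},
    {"text": "example D", "favorite_count": 32000, "retweet_count": 8000},
]


def favorites_per_retweet(tweet):
    """Return favorites divided by retweets (retweets assumed non-zero)."""
    return tweet["favorite_count"] / tweet["retweet_count"]


def rank_by_ratio(tweets):
    """Sort tweets from the highest to the lowest favorites-per-retweet ratio."""
    return sorted(tweets, key=favorites_per_retweet, reverse=True)


ranked = rank_by_ratio(tweets)
most_liked = ranked[:2]           # highest favorites per retweet
most_shared = ranked[-2:][::-1]   # fewest favorites per retweet, lowest first

for tweet in ranked:
    print(tweet["text"], round(favorites_per_retweet(tweet), 2))
```

Slicing the top and bottom of the sorted list is what produces the "most agreeable" and "most shared" groups; with real data the slice size would be 7 instead of 2.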

A baseball hackathon In Chicago? Yes, please!

On October 18th, I woke up to a notification on my phone for an email titled “2016 MLBAM Bases Coded.” The preview read

We are pleased to inform you that your Team’s entry has been selected as a potential Finalist entry in the 2016 MLBAM Bases Coded Technology Challenge

Two weeks earlier on October 7th, I rounded up two of my friends and applied last minute, literally with minutes to spare, to Major League Baseball’s Bases Coded Challenge. I had first heard of this challenge only a few hours before, through a promotional email sent by Major League Baseball. Why their first email blast came just before the entry deadline, I can’t tell you. Thinking that an entry entailed submitting some sort of project, I was pretty bummed to have missed out on a competition involving software and baseball, two of my passions!

Luckily I sat down later, after class, and took a closer look at the submission requirements. In fact, MLB only wanted an idea, not a completed piece of software. Winners would be selected based on the potential of their idea, and the experience of their team. With a couple hours left before the deadline, I started filling out the entry form as best I could. I began filling out profiles of my team, who we are, why we wanted to compete- along with links to and descriptions of our previous projects.

Thankfully, I had been brainstorming a baseball app during the months prior, and it was fresh on my mind. Though technically complex, the basic idea for the app was simple enough to fit into the 500 character limit the form required. I enlisted the assistance of my two teammates (Jon and Gio) to help fill out the rest of the entry form. At this point they agreed to help, and to be on the team (I knew they’d say yes). We ended up submitting our entry with 12 minutes left until the deadline, though to be honest, the last 15 minutes were spent deciding on a team name and mascot. We ended up with Benny ‘The Jet’ Rodriguez as our mascot, and our team name was decidedly, C the Ball.

And then we forgot about it.

When the email woke me up, I ran into Jon’s room, threw my phone at him and asked him to confirm I was not seeing things. Long story short, we had been selected by Major League Baseball to compete in their 24 hour hackathon during the World Series. We would be flown out from San Luis Obispo to Chicago, be put up in the Chicago Hilton, and compete for a chance to win tickets to Game 4 of the World Series at Wrigley Field.

Our flight left SLO early Wednesday morning, and we arrived in Chicago by early afternoon. During the flight, Jon and I had our first opportunity to start brainstorming/debating how the app would work. Because all 3 of us had to cram schoolwork and studying into the week before our trip, we had had no time to take our idea any further than just an idea. We sat next to an older gentleman who was very interested in our project and who spoke with us for most of the journey. He was even more interesting. He spoke of designing an entire city in Saudi Arabia, managing the Chicago Cubs spring training, and running a foundation for supporting first generation college students. I haven’t been able to verify any of that, but it sure made the trip interesting.

From Chicago O’Hare, we took the Blue Line train to The Loop, the meeting grounds for Chicago’s transit lines, and one of the nicer parts of the city (also where our hotel was located). We spent the evening at the hotel making plans for the hackathon the next day.

The hackathon was held at a Marriott Hotel a 30 minute walk north of our hotel. When we walked into the lobby and saw signage for “World Series Distribution Headquarters”, we knew we were in the right place. This must have been the hotel MLB officials were working out of. On the 7th floor, a large hall, fit to hold hundreds of people, held just a few small tables, four of which were for competing teams, and we picked the one we’d be at for the next 24 hours straight. As the start of the hackathon clock neared, we met the three other teams, organizers from MLB, representatives from a company called New Relic, as well as a Senior Developer and Director of Research for MLB’s technology arm, MLBAM. All were extremely welcoming, and many of them I’d love the opportunity to meet again in the future! And so it began…

Our application, later named Scout, was to be a mobile application with the purpose of removing the confusion of statistics from the baseball fan experience. By leveraging the data we were provided by MLB directly, our app would determine the most exciting things about each baseball game to be played on a given day. This would be done entirely on the backend, by comparing pitcher/batter matchups, recent pitching/hitting streaks, rivalries, team streaks, and many more factors. For each game, our app would determine the most important factors and assign a tag to it, such as “Pitcher’s Duel”, or “Top Rivalry.” We spent the first couple hours planning what the factors would be, how the data would be funneled through our system, and what we wanted to present to the user. I began working on the iOS app, while Gio and Jon teamed up on the backend side of things.
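To make the tagging idea a little more concrete, here is a rough sketch of how a backend could score a few factors for one game and pick a tag. It is not Scout’s actual code: every field name, weight, and threshold below (`home_era`, `is_rivalry`, the `4.0` cutoff, the “Hot Streak” tag, and so on) is hypothetical and chosen only for illustration.

```python
def tag_game(game):
    """Return the tag of the highest-scoring factor for one game, or None.

    All field names, weights, and thresholds are hypothetical.
    """
    scores = {
        # Two starters with low ERAs suggest a pitcher's duel.
        "Pitcher's Duel": max(0.0, 4.0 - max(game["home_era"], game["away_era"])),
        # A known rivalry gets a fixed score.
        "Top Rivalry": 1.5 if game["is_rivalry"] else 0.0,
        # A long winning streak on either side.
        "Hot Streak": 0.25 * max(game["home_win_streak"], game["away_win_streak"]),
    }
    best_tag = max(scores, key=scores.get)
    # A game where no factor scores above zero gets no tag at all.
    return best_tag if scores[best_tag] > 0 else None


duel = {"home_era": 2.1, "away_era": 2.4, "is_rivalry": False,
        "home_win_streak": 1, "away_win_streak": 2}
rivalry = {"home_era": 4.5, "away_era": 3.9, "is_rivalry": True,
           "home_win_streak": 0, "away_win_streak": 3}
quiet = {"home_era": 5.0, "away_era": 4.8, "is_rivalry": False,
         "home_win_streak": 0, "away_win_streak": 0}

for game in (duel, rivalry, quiet):
    print(tag_game(game))
```

Scoring every factor on a comparable scale and taking the maximum keeps the tagging logic in one place, so adding a new factor means adding one more entry to the dictionary.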

Having participated in several hackathons, I can say we had above average focus. Maybe because we spent less time on the idea, maybe because of the stakes, or maybe because we were so interested in the project. Whatever it was, we did a lot in a short amount of time. At many points through the night, we discussed how the backend would hook up with the app when the time came. Nearer the morning there came a point where the format of the data returned by our API changed, but otherwise the connection between server and client was quite smooth.

Our biggest problems seemed to stem from using an Amazon EC2 instance to host our backend. Jon and Gio encountered many problems, especially as first time users, and noted the lack of documentation/tutorials for achieving simple tasks. Another problem we encountered was having to incorporate the monitoring software of New Relic, one of the sponsors, and an MLB partner. New Relic monitors your application’s performance, which, while useful, was overkill for a 24 hour hackathon. Installing New Relic in our Python backend was not too bad, but trying to add it to our iOS app was not so graceful. We ended up ripping it out completely, but the entire process unfortunately killed the final hour and a half of the hackathon.

Nevertheless, time was up. The 24 hours had been filled with focus sessions, food breaks (oh man was the food good), Gio dancing in front of the GoPro that had been set up, and working hard on building a product we’re very proud of. To finish off the day, each team informally presented their project in front of the MLB/New Relic folks, not to be judged. The real presentations would come the next day, Saturday, when we’d return to present in front of the panel of judges. With just an evening in front of us, we had to find time to make up some of the lost sleep, build a slide deck, and prepare for a presentation.

We began walking back to the hotel, discussing our thoughts on the other projects, and our shot at winning the grand prize. We were confident that our technology was the best designed and most efficient (thanks to some preliminary comments from New Relic), and that if we could kill the presentation, we might have a chance.

After reaching our hotel room, we fell into our beds and didn’t wake up until much later that evening. Hungry, we went out for some Giordano’s pizza, and didn’t get back until just before midnight. Setting to work on Google Docs, we began working on our presentation, defining our message, and homing in on why our application could be a success amongst current baseball fans and in bringing new fans to the sport. With a completed slide deck, we hit the hay once again, alarms set for early the next morning. We’d be required back at the Marriott not much later than 8:30.

And so we were there early the next morning. The judging panel was revealed to be a writer for Sports On Earth, the Senior VP of Mobile Product Development at MLBAM, the CEO of a sports startup accelerator, and a VP from New Relic. The crowd for the presentations was not much bigger than the group of people we had met over the previous few days, but the tone was a bit more serious now that presentations were to begin. We ended up presenting third of four teams.

The first team to present was our favorite competitor. This team of friends from the Deep South was extremely friendly and we had enjoyed chatting with them both before the competition and after. Having seen their project the day prior, we decided they would be our main competition, and as we later found out, they had the same opinion of us! They developed a mobile app that allowed friends to play a simplified version of fantasy baseball. Instead of having to manage a team over an entire season, their game required players to choose 5 teams each day that they thought would win, and compared the results to those of friends. Their project was extremely polished visually, and I would definitely play if it were available in the app store!

The next team was a group of Chicago natives who were experienced Python developers. They too developed a mobile application, but one that would be used by a fan during a game, not before and after. Their app would understand exactly what would be happening in a baseball game, thanks to MLB data, and explain different plays, vocabulary, and confusing baseball concepts to the user. Visuals were not the strong point, but the concept certainly tackled the problem of baseball being confusing to newcomers.

Next it was our turn. Our presentation hit a snag when our app froze on Jon’s computer during the demo. This was thanks to a rogue error left behind by the New Relic integration that had gone wrong, but luckily the iOS simulator was running on my computer as well- so we were back up quickly. We didn’t kill the presentation as we had hoped, but we got our points across and did a swell job.

The first three teams were composed of three guys each, but the final team had just two. In fact, one of the two showed up late and didn’t seem to do any work on the project, so it was truly just one. This team made a Twitter bot that tweeted when a hit or an out was unexpected based on historic data.

As the judges deliberated, we started talking again with the first team who we had become friendly with. We discovered they too had run into complications with New Relic. After 15 minutes or so, we returned to our seats to hear the final decisions.

As Chad Evans, the Senior VP from MLBAM, read out the judges’ comments on each of the projects, everyone listened intently. For our application, he pointed out the originality of our idea, and described how the tagging of games was something MLB themselves had been trying to figure out for a long time. While each team received nice feedback, I was very inspired by ours. Unfortunately, the Twitter bot won, so Game 4 tickets were not in our sights.

But Chicago was amazing, we met some really nice people, and on the plane home we ran into a part owner of the Arizona Diamondbacks who was wearing an enormous 2001 World Series Championship ring. The best part is that we’ll be continuing to work on our application and hopefully have it ready by the start of the next season!

Thanks to Major League Baseball for the incredible opportunity, and to our teachers who were kind enough to excuse our absence and extend some deadlines 🙂

Generating Drake Lyrics with a Markov Chain

One of my recent data science lab assignments was to choose a musical artist, scrape every one of his/her song lyrics from the web, and use the Markov Chain technique to generate new lyrics. The following program works by compiling a separate list for every word sung by Drake. Each list contains all the words that have ever followed the word to which the list belongs. For example, “you” –> [“to”, “just”, “need”, “finish”, “boys”]. It then uses these connections and some random selection to generate new sentences/lines. I’ve used Drake as my test subject because I think his lyrical style is very recognizable, but you can run this script for any artist. All you need to do is swap out the LyricsFreak URL. Note that the new URL must be identical in structure (i.e. .com/d/) or else the web scraping will not work.


import random
import time

import requests
from bs4 import BeautifulSoup

lyrics = []
links = []

# Fetch the artist page and collect a link to every song listed on it.
songs = requests.get("http://www.lyricsfreak.com/d/drake/")
parser = BeautifulSoup(songs.text, "html.parser")

for song in parser.find_all("td", class_="colfirst"):
    link = song.find("a")['href']
    links.append("http://www.lyricsfreak.com" + link)

# Download each song page and keep its lyrics as a list of lines.
for link in links:
    song = requests.get(link)
    parser = BeautifulSoup(song.text, "html.parser")
    time.sleep(0.1)  # brief pause between requests
    lines_dump = parser.find("div", class_="dn", id="content_h")
    if lines_dump is not None:
        lyrics.append(list(lines_dump.strings))

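def train_markov_chain(lyrics):
    # Map each word to the list of every word that has followed it.
    # "<START>" holds song-opening words, "<N>" marks a line break (and
    # holds line-opening words), and "<END>" marks the end of a song.
    transitions = {"<START>": [],
                   "<END>": [],
                   "<N>": []}
    for lyric in lyrics:
        # Drop blank lines so the first/last-line checks below see real words.
        lyric = [line for line in lyric if line.split()]
        for lnum, line in enumerate(lyric):
            chopped = line.split()
            for wnum, word in enumerate(chopped):
                if word not in transitions:
                    transitions[word] = []

                # Record which words can open a song or a new line.
                if lnum == 0 and wnum == 0:
                    transitions["<START>"].append(word)
                elif wnum == 0:
                    transitions["<N>"].append(word)

                # Record what follows this word: end of song, line break, or next word.
                if lnum == len(lyric) - 1 and wnum == len(chopped) - 1:
                    transitions[word].append("<END>")
                elif wnum == len(chopped) - 1:
                    transitions[word].append("<N>")
                else:
                    transitions[word].append(chopped[wnum+1])

    return transitions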

chain = train_markov_chain(lyrics)

def generate_new_lyrics(chain):
    # a list for storing the generated words
    words = []
    # generate the first word
    words.append(random.choice(chain["<START>"]))

    # keep picking a random successor of the latest word until "<END>" comes up
    while True:
        ondeck = random.choice(chain[words[-1]])
        if ondeck == "<END>":
            break
        words.append(ondeck)

    # join the words together into a string with line breaks;
    # "<END>" is never appended, so every word in the list is kept
    lyrics = " ".join(words)
    return "\n".join(line.strip() for line in lyrics.split("<N>"))

print(generate_new_lyrics(chain))

Here’s one of the songs the script spit back out:

She in it heals all, all, all, switch the ground off multi-platinum recordings 
Drinking watch is this is in here 
I see my pen up to know what she two up the rims on a little, why is in my day I got to who you strip for you just like the A&R men never forget it poppin’ don’t make me there is for me who's it go that you might just show up... Damn.
... on camera 
I can also raise one so well, this money til next to Texas back in, 
You ain't last season changed you gon' come back about her and screwed, I left your clothes 
We can tell the year 'round me and hotels that I mean one else's 
I need more (More) 
Twenty five o four, need you 
I don't know you wanna be, I say you need some dinner you hope you probably end getting right now that OvO that (Money) So much to check if you pass it anyways, 
Yeah, Tom Ford tuscan leather smelling like 
There ain't nothing 
I'mma do it... still fly though, 
I really have to the ceiling 
I give it start? The Tires Burnin 
I need it up in the morning 
Ever since I know I'm squeezin' in my thoughts of Four You Always gone 
Some Girls That I before I guess that's where you 
Somebody shoula told me. 
This Girl you bring us down and it's where it like oh-ah-oh-oh 
Tuck my whole city faded (the ride) 
My memories of baked ziti  
Let It Hurt Faces 
I see your friend 
No One Else 
Beat the past piss

Why is it important for our data to be encrypted and private?

Think back to a time when the computer did not exist. Now imagine you’d like to write a letter to a friend. You would have to write out your message on a piece of paper, get an envelope and a stamp, and put it all in a mailbox to be taken by your mailman the next day. Consider that you may have your mom proofread the letter before you seal the envelope, or the roommate of the recipient may accidentally open the letter on arrival, thinking it is hers. Furthermore, at any point between you pressing the pen to paper and your friend reading it, the letter can be intercepted and read by anyone. A postal service worker could steal the dollar bill you sent along. A thief could steal it from your friend’s mailbox before she gets home from work on the day it is delivered. Maybe they even steal it from your mailbox before the mailman puts it in his truck (there is a flag on your mailbox screaming ‘stuff in here!’ after all). To improve security, senders might use envelopes lined with patterns to make it difficult to determine their contents. Think beyond the postal service. Every financial transaction related to you was recorded on paper or not at all. At work, every paper, presentation, or drawing you produced was likely on paper.

This amounts to the sum total of all recorded information existing within large quantities of paper. Paper is tangible, so does that make it secure? On one hand, the storage of small amounts of paper probably does not involve a high-security detail: to steal a file from a doctor’s office or rifle through someone’s recycling bin for their credit card statement would not be terribly difficult. On the other, it would be quite difficult to steal large amounts of paper from widespread locations. To steal all of the medical records from a hospital, or the credit card information of everyone who bought something from any Sears last weekend, would require insane man-hours and would be impossible to do without drawing attention.

Imagine, now, that all of this information has been digitized, making communications and transactions much simpler, but also opening doors to large scale interception. You wouldn’t be in ruin if a thief stole the birthday card and $20 your aunt sent via snail mail, but you probably would be very upset if that thief intercepted every $20 gift to your Paypal account and the $300 your parents tried to direct deposit to your bank account so you can fly home to see them next weekend.


Every communication, transaction, or other piece of data you interact with today is recorded at some time, to some extent, on the internet. Even if your credit card statements are still mailed to you every month as a few sheets of paper, the company that sent them has all of that information recorded in their computer systems. Even if your grandma has never touched a keyboard in her life, all of her financial information, her grocery store purchase history, a log of her phone calls, even the number of miles she’s driven since her last oil change- it’s all stored on a computer. When many of us think of what information we have stored on the web, we think of Facebook pictures and emails we’ve sent to our coworkers, but in reality, other people/companies/governments probably have stored much more information about your life than you have about yourself.

At this point in our story, hopefully you see reason for concern. The legal collection of massive amounts of public and private information has allowed for new businesses to flourish, cities and governments to understand how to become more efficient, and scientists and researchers to learn more about how the world works. But it also leaves us vulnerable to people who are ready to use our private information, whether we ourselves have recorded/stored/transmitted it or not, for personal gain, hurting others in the act. What can be done to prevent a thief from tapping into these streams of information? One of the ways to counter any such effort is encryption.

Encryption is a method of turning words, numbers, pictures, or any sort of information, into a code that is only readable by the people who are supposed to read it. If you send an encrypted message to your brother, before that message leaves your device (laptop/phone/etc) it is translated into a jumble of numbers and letters that would not make sense to the human eye. To create the order of the numbers and letters in the jumble, your device will use a key. Think of a key as another jumble of letters and numbers that acts as a password of sorts for a piece of data. Based on the letters and numbers in the key, the original message to your brother will be turned into a final encrypted message. The software on your brother’s device will have its own key that will, in a similar manner, be able to decrypt the encrypted message you have sent him. When data is encrypted correctly (there are many different ways to encrypt), it is nearly impossible for an ordinary thief who intercepts it to crack: even automated computer programs could need many years to break it, which is certainly not reasonable.
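The idea of a key turning a message into a jumble and back can be shown with a toy example. This is only a sketch for illustration and makes two simplifying assumptions: it uses a plain XOR with a random key as long as the message, not one of the vetted algorithms (such as AES) that real software relies on, and both sides share the same key, unlike the separate keys described above. The function names are made up for this example.

```python
import secrets


def make_key(length):
    """Generate a random key with one byte per message byte."""
    return secrets.token_bytes(length)


def xor_bytes(data, key):
    """XOR each data byte with the matching key byte."""
    if len(data) != len(key):
        raise ValueError("key must be the same length as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = "Meet me at the ballpark at 7".encode("utf-8")
key = make_key(len(message))

ciphertext = xor_bytes(message, key)    # unreadable jumble without the key
recovered = xor_bytes(ciphertext, key)  # applying the same key reverses the XOR

print(ciphertext.hex())
print(recovered.decode("utf-8"))
```

Anyone who intercepts only the printed jumble learns nothing useful without the key, which is the property the paragraph above describes.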

But not every interception attempt is ordinary. The National Security Agency has reportedly spent hundreds of millions of dollars building high speed computer systems that take advantage of flaws in common encryption techniques and can break them. That our own government is spending our money to weaken the security of our data is disheartening. Encryption (hopefully) secures all of the information held by our government, meaning the NSA is willfully working against our own interests. Devising methods to bypass encryption is dangerous, because there is no guarantee the techniques can be kept from leaking, and the NSA does not have a good history of keeping information secret. It is also an incentive for computer scientists and cybersecurity experts to develop more stringent forms of encryption that can hold against larger and more powerful attempts from our government or any other organization.

Encryption is the best tool at our disposal to secure data and keep it out of the wrong hands. To a hacker, whether state-sponsored or amateur, good encryption is like a 3.5 foot thick vault door. It’s very difficult to get through and would require huge amounts of resources and time to drill through. But if someone with the time and resources, such as the NSA, were able to figure out how the door’s lock works, there now exists the ability for someone with fewer resources and less time, but with that knowledge, to break through. Is the answer to keep building thicker doors with more complex locks, or should other methods be used beyond encryption? Both.

To extend the door analogy and to borrow from a professor I had, imagine a hacker is faced with such a thick, complex, daunting door, yet the walls around it are made of plywood. This is how most hacking occurs: by finding loopholes in a system, not necessarily through technically complex wizardry.

It is this scenario that is relevant to the Apple vs. FBI case. The iPhone in question has some information stored locally on the phone, meaning that even without an internet connection, such information would be accessible from the device. It also holds information that is stored on the internet, via iCloud. All of it is encrypted, but Apple has access to the iCloud data from its own servers, meaning it was able to hand it over to the FBI after being shown a warrant. Whether you like it or not, by using iCloud, you agree to this possibility: though the data remains encrypted to keep it away from prying eyes, permission to view the data is given to Apple. No encryption was broken, nor any security bypassed to make this data available. Apple just has a backdoor to get to it, and they guard that backdoor with everything they have. But Apple does not have a backdoor for the information stored locally, and this is exactly what the FBI wanted Apple to create. The problem is that a backdoor to local information is not subject to warrant, and cannot be guarded if stolen from Apple or the government. Once it exists, it can be used to access millions of iOS devices in existence, all around the world, any time or place. Apple itself does not hold this privilege because there is no need to, and the mere existence of said backdoor is too risky.

If by this point you still are okay with weakening encryption and building backdoors, let us review. The following information is protected by cybersecurity methods the government employs but is attempting to weaken at the same time:

  • your SSN
  • all of your money in the bank
  • every stock market in the world
  • your location at any given time (if you use a smartphone)
  • all of your personal information that you would probably put through a shredder if it were on paper
  • top secret military documents
  • intellectual property from US businesses and individuals

 

My Twitter feed and me

I just went through my entire list of people I was following on Twitter and cleaned house. Even though my following count was only just over 100, I’ve begun to really dislike scrolling through my feed without having gained any useful knowledge. Did I really need to spend those few minutes looking at my feed? Did I gain anything of value besides finding a way to spend a few minutes (minimum) of my life?

In deciding who to unfollow, there were so many people that I thought were interesting but didn’t necessarily publish content on a day to day basis that I was interested in. There’s so much duplication of content that I need to sift through when scrolling through my feed. I love baseball and so I follow a lot of baseball related accounts, but that doesn’t necessarily mean that I want to see a majority of baseball content in my feed. It just means I want to see a variety of baseball content/sources in my feed. I ended up creating a baseball list devoted to baseball accounts that I followed. Now if I want to hear some baseball stuff, I’ll have to search the dark corners of Twitter to get to said baseball list. Seriously, Twitter, why even have a Lists feature if it’s so hard to access?

The same is true for politics: I might follow a few Bernie Sanders related accounts but I don’t want to be bombarded with a Bernie tweet every 15 minutes. I might just be following them because I think he’s an interesting guy and want some Bernie related content. Some people publish rich original content occasionally that I don’t want to miss out on, but I don’t want to have the added cost of having to sift through the remainder of what they publish.

Here’s my two cents on what I might like to see on Twitter instead of what it is today:

The most original content from each of the accounts I follow. If a person tweets once a week, show me what they’re saying. If they’re tweeting 10 times a day, don’t show me every tweet right off the bat. Try and predict which of their tweets, if any, will have the most value to me. I can always check to see every one of that person’s tweets if I wish to.

A separate feed or sidebar that contains content either retweeted by people I follow, or recommended content based on who I follow and what I like. It’s interesting that some accounts with the most interesting content often retweet other interesting content, which I may also appreciate. The problem comes when 2 or 3 people are retweeting the same interesting content. Recognize when there’s duplication- only show me once.

 

 

Side Project: Mapping baseball teams ( Pt. 2 )

Today I released version 2 of The Baseball Map. I spent many hours of the past weekend rebuilding much of the site (as simple as it was) from the ground up. Building upon much of the feedback I received from my first endeavor on reddit, I fleshed out a more useful interface. The map now provides a quick peek at the Wikipedia entry for each stadium, a satellite view of the stadium, and users can now even view a sort of web visualization of where the minor league affiliates are located for each major league team. Last night I put the final touches on the site and pushed it live. It’s not perfect, but it ended up looking and working really well. After miraculously receiving a Product Hunt invite yesterday afternoon, I posted my project to the site and am currently in the ‘upcoming’ list with 5 upvotes, only 2 of which aren’t from myself or my friends. I’m pessimistic about getting featured on Product Hunt, but Reddit and Twitter have been easier allies.

As the first side project I’ve worked on in too long, I’m happy to have gotten something built. I’m excited to have been able to get valuable feedback from users and use it to build something that they have told me they really enjoy. Even if just one person told me that, the last couple weeks of work would be worth it.

Side Project: Mapping baseball teams across the country

As a huge fan of baseball, I was recently surfing the web to try and find a map of all the baseball teams in my current area. I’m spending the summer in Lincoln, Nebraska and am aware that the Lincoln Saltdogs play just a few blocks away from my apartment, but I had no idea (until working on this project) that another team plays about 30 minutes away in Omaha. This would not stand. I decided that baseball fans everywhere must have the most convenient way to locate nearby baseball teams, and the best way to do this would be to map every single baseball team in the United States. So I set out to build ‘The Baseball Map.’

I’ve spent the past week pulling team data manually from the web, which has been quite the job. The amount of copying and pasting location coordinates was too much to handle, so I only did a little bit each day after work. All of the data is currently stored in an array on the front end, but I’d like to implement a simple back end to handle user submissions in the future. I made the website using the Google Maps JavaScript API and spent most of today (July 3rd) styling it up and refining the interface.

I decided early on that I wanted to get a lot of feedback and criticism from baseball fans. I built this just as a fun project, but I’m really curious about what other fans think of what I’m building. Do they think it’s just a neat thing to look at for 30 seconds? Is there potential for them to enjoy using the site repeatedly? I ventured to the reddit.com/r/baseball IRC channel for my first round of feedback, with a basic screenshot and a short description. I got a ‘looks good’ and ‘looks good from that one shot’, which was nice to hear but not the criticism I was looking for. The chatroom discussion quickly resumed with a ‘forgot to buy bread, fuuuuuck’. That’s what I get for going to a chatroom for feedback. I plan on posting the site to the actual /r/baseball subreddit tomorrow.
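As a rough illustration of the "array on the front end" approach described above, here is a minimal sketch of finding nearby teams from such an array. Everything in it is an assumption for illustration: the `Team` shape, the function names, and the two sample entries, which use approximate city-center coordinates for Lincoln and Omaha rather than real stadium locations or the site's actual data. Distances use the standard haversine formula rather than anything from the Google Maps API.

```typescript
// Hypothetical record shape; the real site's array may look different.
interface Team {
  name: string;
  lat: number; // latitude in degrees
  lng: number; // longitude in degrees
}

// Illustrative entries only: approximate city centers, not stadium coordinates.
const teams: Team[] = [
  { name: "Lincoln (city center)", lat: 40.8136, lng: -96.7026 },
  { name: "Omaha (city center)", lat: 41.2565, lng: -95.9345 },
];

const EARTH_RADIUS_KM = 6371;

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two points, via the haversine formula.
function distanceKm(aLat: number, aLng: number, bLat: number, bLng: number): number {
  const dLat = toRadians(bLat - aLat);
  const dLng = toRadians(bLng - aLng);
  const sinLat = Math.sin(dLat / 2);
  const sinLng = Math.sin(dLng / 2);
  const h =
    sinLat * sinLat +
    Math.cos(toRadians(aLat)) * Math.cos(toRadians(bLat)) * sinLng * sinLng;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Teams within radiusKm of the given point, nearest first.
function teamsWithin(all: Team[], lat: number, lng: number, radiusKm: number): Team[] {
  return all
    .map(team => ({ team, km: distanceKm(lat, lng, team.lat, team.lng) }))
    .filter(entry => entry.km <= radiusKm)
    .sort((a, b) => a.km - b.km)
    .map(entry => entry.team);
}
```

A linear scan like this is fine for a hand-built list of a few hundred teams. A back end with a proper spatial index would only start to matter if user submissions made the data set much larger.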

Why reddit for feedback?

I’ve found reddit is very inviting to side projects, no matter where you post them, and you’ll usually get a decent number of responses within a few hours of posting. What I’m hoping to hear is what value people think this would have. If the answer is none, which it might be right now, I’d be interested to hear if any of the following would add more value for a baseball fan:

  • Expanding the data set to include college teams, foreign teams, etc.
  • Including a large number of historical locations, including stadiums, baseball landmarks, museums, etc.
  • Allowing for user submission of these locations
  • Allowing for commenting on locations: people could share their memorable experiences at their favorite stadiums, or even provide tips for getting into batting practice early, getting autographs, etc.
  • Allowing people to check off which stadiums they’ve been to. Many baseball fans take pride in having visited different stadiums.

Questions I still have:

  • Should the data focus more on the team or the stadium?
  • Are fans more interested in seeing nearby stadiums and learning about the history of past/present stadiums, or are they interested in seeing the teams of past/present? I think this is something I should figure out right now, before I put more of my time into adding more data.

Baseball fans love statistics, so what statistics could I gather about teams or stadiums that would be of interest to fans? This could be things like season W-L records for teams throughout history, or attendance records for stadiums throughout history. Or it could be as simple as tracking internal data, such as which teams have the most user submissions (if I implement a commenting system), or which teams have been clicked/viewed the most. This would create a sort of popularity contest.

I’ll post again when I release version 2. For now you can check out the project at thebaseballmap.com