text copied from the hackpad
Political Pen Pals
I’m writing letters to get out the vote! And so are others! Voter files and letters and stamps, oh my!
Whenever someone sends Aaron junk mail, he sends them a reply asking them not to send him junk mail, and sometimes it works!
But could there be a better way to spend this time? He got the voter file from the DC Board of Elections, and started thinking about writing to people encouraging them to vote.
Getting the voter file from Wisconsin can be a little expensive, and the party data isn’t really available. So he decided to target New York State Senate District 5, which could be a good swing district, where he could be a pen pal to voters.
But all of this already exists, and Aaron didn’t have to build it himself! (Vote Forward) But he did end up crunching some data to figure out who to target.
Voter turnout improved by about 3% for people who received the Vote Forward letters
Aaron wrote letters saying: Happy Birthday! You should vote, here’s why it’s important.
1200 people in the experiment group (people who have birthdays between now and November), 4000 in the control (people who do not have birthdays between now and November). Send letters to likely voters!
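The birthday-based split above could be sketched roughly like this in Python. The field name birth_date, the sample voters, and the window dates are all hypothetical; the notes don't say how the voter file is laid out or which dates were used.

```python
from datetime import date

def birthday_in_window(birth_date, start, end):
    """True if the birthday (month/day) falls in [start, end]; assumes one calendar year."""
    try:
        this_year = birth_date.replace(year=start.year)
    except ValueError:
        # Feb 29 birthday in a non-leap year: treat as Feb 28 (an assumption).
        this_year = birth_date.replace(year=start.year, day=28)
    return start <= this_year <= end

def split_groups(voters, start, end):
    """Experiment group: birthday inside the window. Control group: everyone else."""
    experiment, control = [], []
    for voter in voters:
        if birthday_in_window(voter["birth_date"], start, end):
            experiment.append(voter)
        else:
            control.append(voter)
    return experiment, control

# Hypothetical voters and a hypothetical "now until election day" window.
voters = [
    {"name": "A", "birth_date": date(1980, 10, 5)},
    {"name": "B", "birth_date": date(1975, 3, 14)},
    {"name": "C", "birth_date": date(1992, 11, 1)},
]
experiment, control = split_groups(voters, date(2018, 9, 1), date(2018, 11, 6))
```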
Vote Forward is targeting people who are more likely to vote; Aaron is targeting people who are much less likely to vote: someone who has voted at some point in their life, but not in the last few years. Specifically, lots of voters who haven’t voted since Obama was on the ballot.
Vote Forward makes sending letters really easy, and allows you to use a return address so you don’t have to use your own address. But you provide the stamps, paper, and envelopes.
Last.fm tracks every song you’ve played across Spotify, Apple, Google Play, etc.
On any individual day he wanted to be able to see what he listened to. He’s been doing that for seven years – and last.fm has lots of data visualizations. But what was he listening to on July 7th, 2015?
It was a pretty emotional experience to be able to go back and look through and visualize all of the different songs he’s listened to. But what are the outliers, what are the bands he forgot about?
This ends up being representative of when he’s playing music in the background.
May 31, 2012 - 150 songs played
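Finding outlier days like that one comes down to counting plays per calendar day. A minimal sketch, assuming the listening history is already available as a list of Unix timestamps (the sample data here is made up, and days are counted in UTC rather than local time):

```python
from collections import Counter
from datetime import date, datetime, timezone

def plays_per_day(timestamps):
    """Count plays per calendar day (UTC) from Unix timestamps."""
    days = (datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in timestamps)
    return Counter(days)

def busiest_days(timestamps, n=3):
    """Return the n days with the most plays, busiest first."""
    return plays_per_day(timestamps).most_common(n)

def utc_timestamp(year, month, day, hour):
    """Helper for building sample data."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()

# Hypothetical sample: three plays on one day, one play on the next.
sample = [
    utc_timestamp(2015, 7, 7, 12),
    utc_timestamp(2015, 7, 7, 13),
    utc_timestamp(2015, 7, 7, 14),
    utc_timestamp(2015, 7, 8, 12),
]
counts = plays_per_day(sample)
```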
He has listened to a lot less music since starting a full-time job, and even less during really busy streaks.
Last.fm has an API where you can pull down your listening history, including tags.
Tags aren’t always super accurate for individual songs, artists, or genres, but they can
There are about 60,000 song plays, so he used WebGL to render the visualization.
It took about 3 hours to pull down the data file.
Songs are grouped by genre and artist, and color-coded by genre for each day.
Wanted to make children’s stories based on public domain stories from Project Gutenberg, specifically “Grimms’ Fairy Tales” and “Hans Christian Andersen Fairy Tales”.
Could he use a neural network to generate some stories? He used a long short-term memory (LSTM) network to predict the next character needed to continue the story, in chunks of 50 characters at a time.
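The chunking idea can be illustrated in plain Python: slide a 50-character window over the text and pair each chunk with the character that follows it. This is only a sketch of the data preparation, not the network itself; the function names and the step parameter are illustrative assumptions.

```python
def make_training_pairs(text, chunk_len=50, step=1):
    """Pair every chunk of chunk_len characters with the character that follows it."""
    pairs = []
    for start in range(0, len(text) - chunk_len, step):
        chunk = text[start:start + chunk_len]
        next_char = text[start + chunk_len]
        pairs.append((chunk, next_char))
    return pairs

def build_vocab(text):
    """Map each distinct character to an integer index for the network's input."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}

# Tiny illustration with a short chunk length so the output is readable.
demo_pairs = make_training_pairs("abcdefghij", chunk_len=3)
demo_vocab = build_vocab("banana")
```

Each pair is one training example: the network sees the chunk and is asked to predict next_char.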
The first few iterations ended up repeating themselves a bit too much! (1 hidden layer, 500 nodes)
Ten trainings later, we get some real grammar (mostly) and punctuation – and it really loves stork-mammas.
20th training, it was getting much better.
Possible next step: predict word by word or morpheme by morpheme instead of character by character.
Using a Deep Dream generator to make pictures and generate artwork from artwork to add to these stories.
Could also play with the temperature and seed to adjust things further.
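For context, temperature usually means dividing the network's output scores by a constant before turning them into probabilities: low values make the output more predictable, high values make it more random. A minimal sketch, assuming "seed" means a random seed (it could equally mean the starting text fed to the model):

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(logits, temperature=1.0, seed=None):
    """Sample the index of the next character; the same seed gives the same draw."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

scores = [1.0, 2.0, 3.0]  # hypothetical scores for three candidate characters
cold = softmax_with_temperature(scores, temperature=0.5)
warm = softmax_with_temperature(scores, temperature=2.0)
```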
Predicting flight delays with tail number as a unique identifier, using Selenium for web-scraping aircraft info and vectorizing time on a cyclical 24-hour clock.
Can we predict flight delays?
Aircraft make/model might be a useful feature.
Data sources: an on-time performance dataset and on-time flight data from flightradar24.com.
The tail number tells you the make, model, airline, and age of the aircraft, and you can get the flight history from flightradar24.com.
Pulling this whole dataset and limiting it down to just the Washington DC area, we could see 2.2
The tail number isn’t perfect: there were lots of unknown aircraft.
Which airline is the best / worst for servicing the DC area?
Defining a delay as more than 15 minutes late, Alaska was the most on-time and American Airlines was the worst. But American Airlines has a hub at DCA; United has a hub at IAD; Delta doesn’t have a hub in DC, so maybe the lower volume of flights means they have fewer delays to begin with?
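The 15-minute rule turns each flight into a yes/no label, and the per-airline comparison is then a delayed share per carrier. A rough sketch with made-up flights and placeholder carrier codes (not the real data from the talk):

```python
from collections import defaultdict

DELAY_THRESHOLD_MINUTES = 15  # a flight counts as delayed only beyond this

def delay_rate_by_carrier(flights):
    """flights: iterable of (carrier, delay_minutes). Returns the delayed share per carrier."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for carrier, delay_minutes in flights:
        totals[carrier] += 1
        if delay_minutes > DELAY_THRESHOLD_MINUTES:
            delayed[carrier] += 1
    return {carrier: delayed[carrier] / totals[carrier] for carrier in totals}

# Hypothetical flights: (carrier code, minutes late).
flights = [("X", 5), ("X", 20), ("Y", 16), ("Y", 40), ("X", 15), ("X", 0)]
rates = delay_rate_by_carrier(flights)
```

A rate like this ignores flight volume, which is the hub caveat raised above.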
Vectorized time by a 24-hour clock; there are pretty much no departures between 3 am and 5 am.
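One common way to put time on a clock face is to map it to a sine/cosine pair, so that 23:00 and 01:00 end up close together instead of 22 hours apart. Whether the talk used exactly this encoding is an assumption:

```python
import math

def encode_time_of_day(hour, minute=0):
    """Map a time of day to a point on the unit circle (sin, cos) of a 24-hour clock."""
    fraction = (hour * 60 + minute) / (24 * 60)
    angle = 2 * math.pi * fraction
    return math.sin(angle), math.cos(angle)

def clock_distance(a, b):
    """Straight-line distance between two encoded times."""
    return math.dist(a, b)

late = encode_time_of_day(23)
early = encode_time_of_day(1)
noon = encode_time_of_day(12)
```

With this encoding, clock_distance(late, early) is small even though the raw hour values 23 and 1 are far apart.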
Next steps: find a more accurate data source or clean up the tail number data.
Most common reason for delay was a late arrival from another aircraft, not weather.
Used logistic regression and random forest as different models.
Used dummy variables for day of week, carrier, departure airport, arrival airport, aircraft type and aircraft maker
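Dummy variables replace each categorical column with one 0/1 column per category. A plain-Python sketch of the idea (the talk most likely used a library for this; the column names and rows below are hypothetical):

```python
def one_hot_encode(rows, columns):
    """Replace each categorical column with one 0/1 column per distinct value."""
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    feature_names = [f"{col}={value}" for col in columns for value in categories[col]]
    encoded = [
        [1 if row[col] == value else 0 for col in columns for value in categories[col]]
        for row in rows
    ]
    return feature_names, encoded

# Hypothetical flights with two categorical columns.
rows = [
    {"day": "Sat", "carrier": "X"},
    {"day": "Thu", "carrier": "Y"},
]
names, matrix = one_hot_encode(rows, ["day", "carrier"])
```

This sketch keeps every category; for logistic regression it is common to drop one category per column as a reference level.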
Saturday/Sunday have lower expected delays, but Thursday and Friday have more delays in both departure and arrival.
Used GridSearchCV and SMOTE as tools
An interactive ClojureScript demonstration that shows how to use linear algebra, and live coding to build a fun web application, all within a web page.
What can you do with Clojure (a dialect of Lisp)?
ClojureScript allows you to run Clojure in your browser and evaluate it all automatically.
Paula created a series of matrices and rotated them using Clojure, and after a series of introductory transformations, she ended up re-creating Conway’s Game of Life within a webpage, in Clojure.
So then let’s draw Conway’s Game of Life with an SVG, and we can watch Conway’s Game of Life come to life.
The map is a torus, so the top wraps around to the bottom.
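The talk did this in ClojureScript; as a language-neutral illustration, here is a small Python sketch of one Game of Life step on a torus. The wrap-around comes from taking row and column indices modulo the grid size.

```python
def step(grid):
    """Advance Conway's Game of Life by one generation on a wrap-around (torus) grid."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbors; the modulo makes the edges wrap around.
            neighbors = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if grid[r][c] == 1 and neighbors in (2, 3):
                new_grid[r][c] = 1  # survival
            elif grid[r][c] == 0 and neighbors == 3:
                new_grid[r][c] = 1  # birth
    return new_grid

def live_cells(grid):
    """Set of (row, col) positions that are alive."""
    return {(r, c) for r, row in enumerate(grid) for c, cell in enumerate(row) if cell}

def make_grid(size, cells):
    """Build a size x size grid with the given live cells."""
    return [[1 if (r, c) in cells else 0 for c in range(size)] for r in range(size)]

# A blinker that straddles the left/right edge of a 5x5 torus.
edge_blinker = make_grid(5, {(0, 4), (0, 0), (0, 1)})
```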
Many links! Inspiration
Montgomery County Ride On bus offers a map, but the map doesn’t say which routes are running on weekends. I imported their GTFS files and created a map for each service day, so that I can see where I can or cannot go on weekends.
What bus routes are running today? He likes to explore the region by bus, but he’s only available to explore on weekends, and some routes aren’t running on weekends.
The official Ride On system map doesn’t tell you which routes are available on the weekends.
Google maps asks where you are going, which isn’t conducive to exploring
There are a lot more buses available on the weekdays than on the weekends, both in density and in area covered.
Trip times are important, too. What bus routes are running today? How many trips will each bus make?
Ride On publishes bus schedules in GTFS format, which is a ZIP of CSV files.
The GTFS files contain a list of bus routes, bus stops with GPS coordinates, a calendar of “services”, and a list of trips (a trip is a bus serving a route from start to end).
So he created the map by importing the GTFS data into a SQLite database, then writing some queries to find out what services are available.
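A rough sketch of that import-and-query step using Python's built-in sqlite3 and csv modules. The table and column names follow the GTFS files routes.txt, trips.txt, and calendar.txt; the tiny feed below is made up, and the query ignores start_date/end_date and the calendar_dates.txt exceptions for brevity.

```python
import csv
import io
import sqlite3

WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}

def load_csv(conn, table, csv_text):
    """Create a table named after a GTFS file and load its rows (all values kept as text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        ([row[col] for col in columns] for row in reader),
    )

def routes_running_on(conn, weekday):
    """Route names with at least one trip whose service runs on the given weekday."""
    if weekday not in WEEKDAYS:
        raise ValueError(f"unknown weekday: {weekday}")
    query = f"""
        SELECT DISTINCT routes.route_short_name
        FROM routes
        JOIN trips ON trips.route_id = routes.route_id
        JOIN calendar ON calendar.service_id = trips.service_id
        WHERE calendar.{weekday} = '1'
        ORDER BY routes.route_short_name
    """
    return [row[0] for row in conn.execute(query)]

# Hypothetical two-route feed: route 1 runs every day, route 2 on weekdays only.
conn = sqlite3.connect(":memory:")
load_csv(conn, "routes", "route_id,route_short_name\nr1,1\nr2,2\n")
load_csv(conn, "trips", "route_id,service_id,trip_id\nr1,wk,t1\nr1,sat,t2\nr2,wk,t3\n")
load_csv(conn, "calendar",
         "service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday\n"
         "wk,1,1,1,1,1,0,0\n"
         "sat,0,0,0,0,0,1,1\n")
saturday_routes = routes_running_on(conn, "saturday")
```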
Next, he wants to automate the GTFS updates and make it multi-agency.
Where can he reach within 75 minutes, and what’s the latest time he has to head back?
Calculating the Boxiness of Fonts
The categories on Google Fonts aren’t very specific. Serif, Sans Serif, Display, and Handwriting are helpful, but what other ways are there to categorize a font?
Inspired by polar histograms of city street orientations of US cities, could he apply this to fonts too?
Each glyph is a set of coordinates, with contours
He used the Python library fontTools to calculate the paths, then used matplotlib to plot all of the strokes. Then we can see how many horizontal strokes there are, how many diagonal strokes there are, etc. And we can apply that to a polar histogram for each stroke and each letter for each font.
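The histogram step can be sketched without any font library, assuming each contour has already been extracted as a closed list of (x, y) points. Real glyph outlines also contain curves, which this sketch treats as straight segments between points; the bin count and the length weighting are assumptions too.

```python
import math

def segment_angles(contour):
    """Return (angle_degrees, length) for each edge of a closed contour of (x, y) points."""
    edges = []
    count = len(contour)
    for i in range(count):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % count]  # wrap around to close the contour
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        angle = math.degrees(math.atan2(dy, dx)) % 360  # direction matters: 0 to 360
        edges.append((angle, length))
    return edges

def polar_histogram(contours, bins=36):
    """Total stroke length per direction bin; bins are centered on 0, 10, 20, ... degrees."""
    width = 360 / bins
    histogram = [0.0] * bins
    for contour in contours:
        for angle, length in segment_angles(contour):
            index = int(((angle + width / 2) % 360) // width)
            histogram[index] += length
    return histogram

# A unit square traced counter-clockwise: right, up, left, down.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_histogram = polar_histogram([square])
```

The resulting list can be plotted as bars on a polar axis in matplotlib.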
Fonts like Futura have a lot of sharp, straight lines. Roboto has lots of horizontal and vertical lines; other fonts have more random distributions of their strokes.
What’s next? Quantifying this data in order to determine how boxy a given font is.
Is a font dominated by horizontal and vertical strokes? Then it’s pretty boxy.
Italics do change it up a little bit depending on the font, but some fonts are still pretty boxy even in italics.
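One possible way to turn that into a number is the share of total stroke length that lies within a few degrees of horizontal or vertical. The talk leaves the actual metric as a next step, so this score and its 5-degree tolerance are only an illustrative assumption:

```python
import math

def boxiness(segments, tolerance_degrees=5.0):
    """Share of total stroke length within tolerance of horizontal or vertical (0.0 to 1.0)."""
    total = 0.0
    boxy = 0.0
    for (x0, y0), (x1, y1) in segments:
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0:
            continue
        # Fold the direction into 0..90 degrees, then measure the distance to the nearest axis.
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 90
        deviation = min(angle, 90 - angle)
        total += length
        if deviation <= tolerance_degrees:
            boxy += length
    return boxy / total if total else 0.0

# Hypothetical strokes: a square is fully boxy, a diamond is not boxy at all.
square = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]
diamond = [((0, 1), (1, 0)), ((1, 0), (2, 1)), ((2, 1), (1, 2)), ((1, 2), (0, 1))]
mixed = [((0, 0), (3, 0)), ((0, 0), (3, 4))]
```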
Handwriting/script fonts have a lot less boxiness; the polar histograms use only the outline and take directionality into account.
Travis loves doing unconventional things with neural networks, so can we sync the hallucinations of a neural network to music?
It’s deep, and all of the little layers see things. Image comes in, prediction comes out, and it visualizes the results. So we can ask the layers, what do they see?
TensorFlow’s Lucid library allows you to interpret what the network is actually seeing.
Compositional pattern-producing network
Your eye is designed to see fine-grained shapes and patterns.
But these layers are designed to see patterns.
Synced it to music using librosa by finding the beat onsets (kicks and hi-hats); the images then change between the beats.
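Once the beat times are known (for example from librosa's onset or beat detection), deciding which image to show in each video frame is a lookup. A minimal sketch, assuming the image switches at every beat; the talk may instead have blended images between beats, and the beat times and frame rate below are made up:

```python
from bisect import bisect_right

def image_index_at(time_seconds, beat_times):
    """Index of the image to show at a given time: one new image per beat passed."""
    return bisect_right(beat_times, time_seconds)

def frame_schedule(beat_times, duration_seconds, fps=30):
    """Image index for every video frame of a clip with the given duration."""
    total_frames = int(duration_seconds * fps)
    return [image_index_at(frame / fps, beat_times) for frame in range(total_frames)]

# Hypothetical beat times in seconds (must be sorted).
beats = [0.5, 1.0, 1.5]
schedule = frame_schedule(beats, duration_seconds=2.0, fps=4)
```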