I have been interested in open public data for as long as I've known about it. Helsinki Region Transport, the public transport authority here, has opened up the API to its Journey Planner, and it has inspired many developers to build helpful services on top of it. So I decided to try it out myself and see whether I could create something new and useful.
I am also interested in infographics and visualizations. I had seen this kind of visualization done elsewhere, but not much in Finland, so I took it upon myself to fix that.
While I was looking for a new home, I noticed that the travel time from place A to B by public transport was hard to estimate from the distance on the map. A nearby bus route, train line or subway station could change the commute to work drastically, so telling which neighborhoods were good for commuting was tricky. Basically, I had to have a hunch about an area and then check the Journey Planner to see how fast I could get from there to work.
An overlay on top of the map showing transit times would help spot potential areas in which to buy a new home.
Getting the data
I started by getting a list of all public transport stops (bus, tram, train, subway and ferry) in the Helsinki area. I scanned all the stops in 5 km² regions near the city with curl. Then I stripped out all the other stop data, duplicate coordinates and extra line breaks, leaving only the coordinates of the stops. I'm not sure why the data contained stops with different names at the same coordinates, but since I was mostly interested in the coordinates themselves, it did not matter much. I then used Google Fusion Tables to plot them on a map and get an idea of what I was working with.
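The grid scan and cleanup can be sketched as follows. This is a Python illustration, not the original curl and bash scripts: it assumes the stops come back as dicts with hypothetical `x` and `y` fields in a projected, metre-based coordinate system, and the box side is a parameter because the exact query shape is not given here. The API request itself is left out.

```python
from collections.abc import Iterable, Iterator

# (min_x, min_y, max_x, max_y) in projected metres
Box = tuple[float, float, float, float]


def grid_boxes(min_x: float, min_y: float, max_x: float, max_y: float,
               side: float) -> Iterator[Box]:
    """Tile a rectangular area into square query boxes of the given side length."""
    y = min_y
    while y < max_y:
        x = min_x
        while x < max_x:
            # Clip the last row/column so no box extends past the area.
            yield (x, y, min(x + side, max_x), min(y + side, max_y))
            x += side
        y += side


def unique_coordinates(stops: Iterable[dict]) -> list[tuple[float, float]]:
    """Drop names and other stop data, keeping each (x, y) pair once, in first-seen order."""
    seen: set[tuple[float, float]] = set()
    coords: list[tuple[float, float]] = []
    for stop in stops:
        xy = (stop["x"], stop["y"])  # hypothetical field names
        if xy not in seen:
            seen.add(xy)
            coords.append(xy)
    return coords


# Two stops with different names at the same spot collapse into one coordinate.
stops = [{"name": "A", "x": 1, "y": 2}, {"name": "B", "x": 1, "y": 2}, {"name": "C", "x": 3, "y": 4}]
print(unique_coordinates(stops))  # [(1, 2), (3, 4)]
```

A side of about 2.24 km gives boxes of roughly 5 km² each.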
There were 7093 stops in my list. I realized that downloading routes between every pair of stops (roughly 50 million origin-destination pairs) would take forever, and that I would hit the API limits if I tried to keep the tool fully dynamic. Something had to be fixed for this test run, so I fixed the destination to the main railway station. It's as good a place as any, and most people can estimate how fast they can get from there to elsewhere.
I then split the coordinates into batches of 1500 queries to stay within the API limits. I wrote a curl command that downloaded the route details for a specific Wednesday morning, so each of the 1500 stops in a batch would report how long it took to get from there to the railway station. The next day, the next 1500 stops were queried for that same Wednesday morning.
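Splitting the stops into nightly batches is simple arithmetic, and a Python sketch of it also shows where the five nights came from. The Journey Planner query itself is not reproduced, since its curl parameters are not given here; the integer list below is only a stand-in for the stop coordinates.

```python
import math
from collections.abc import Iterator, Sequence

BATCH_SIZE = 1500  # queries per night, from the post


def batches(items: Sequence, size: int = BATCH_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items, one slice per night."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


stops = list(range(7093))  # stand-ins for the 7093 stop coordinates
nights = math.ceil(len(stops) / BATCH_SIZE)
sizes = [len(batch) for batch in batches(stops)]
print(nights, sizes)  # 5 [1500, 1500, 1500, 1500, 1093]
```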
I ran the queries at night to cause as few problems as possible for the awesome services running on the same API. This took five nights, and as a result I had 7093 files of detailed route data.
So far I had mostly used curl and bash scripts to get the data into the shape I wanted. Now it was time to do something with it, so I started using Node.js for the first time. I was not sure it would be the right tool for the task, but it was worth a try.
The data was in JSON format, and there was loads of it. The Node script went through all the files, found the fastest of the five routes returned for each stop and calculated the average travel time across them. It then stored the stop coordinates and these two times in a JSON file.
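The per-stop summary can be sketched in Python (the original was a Node.js script). The file layout is an assumption: each hypothetical stop file holds the stop's `x` and `y` plus a `routes` list whose entries carry a `duration` in seconds. The real Journey Planner response format is not shown here.

```python
import json
from pathlib import Path


def summarize_routes(routes: list[dict]) -> tuple[float, float]:
    """Return (fastest, average) travel time in minutes over the routes for one stop.

    Assumes a hypothetical "duration" field holding seconds.
    """
    minutes = [route["duration"] / 60 for route in routes]
    return min(minutes), sum(minutes) / len(minutes)


def summarize_directory(directory: Path, out_file: Path) -> None:
    """Read one JSON file per stop and write its coordinates plus the two times."""
    results = []
    for path in sorted(directory.glob("*.json")):
        data = json.loads(path.read_text())
        fastest, average = summarize_routes(data["routes"])
        results.append({"x": data["x"], "y": data["y"],
                        "fastest_min": fastest, "average_min": average})
    out_file.write_text(json.dumps(results))


# Five returned routes taking 10, 15, 20, 25 and 30 minutes.
routes = [{"duration": s} for s in (600, 900, 1200, 1500, 1800)]
print(summarize_routes(routes))  # (10.0, 20.0)
```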
I was happily surprised at how fast it went through the data. Granted, it was a simple task, but it was fast nonetheless.
Interpreting the data
Getting from a JSON file of coordinates to something pretty and visual proved to be the hardest part of the whole ordeal. As I was new to geospatial work, I had plenty to learn. So far I had worked with one-line scripts and documented each one in a .txt file. From this point onwards I switched to an object-oriented approach, and it proved to be a good choice.
I wanted to write the heatmap data into an image somehow, and after experimenting with a bunch of different ways to draw it, I ended up with ImageMagick.
I used a radial gradient as the starting image: the brightest white was the stop itself, and each darker tone meant one more minute of walking. I stayed within grayscale because it was simpler. That limited me to 255 minutes of travel time, which is plenty to cover a large area of Helsinki. I also capped the walking distance at 500 meters to keep the calculations shorter.
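The walking-circle stamp can be sketched in Python instead of as an ImageMagick radial gradient. Two numbers are assumptions not stated in the post: a walking speed of 5 km/h and a resolution of 10 metres per pixel.

```python
import math

WALK_SPEED_M_PER_H = 5000  # assumed walking speed (5 km/h)
MAX_WALK_M = 500           # walking limit from the post
METRES_PER_PIXEL = 10      # assumed render resolution


def walking_circle() -> list[list[int]]:
    """Grayscale stamp: 255 at the stop, one level darker per full minute of walking.

    Pixels farther than MAX_WALK_M away are 0 (black, not reachable on foot).
    """
    r = MAX_WALK_M // METRES_PER_PIXEL  # radius in pixels
    stamp = []
    for dy in range(-r, r + 1):
        row = []
        for dx in range(-r, r + 1):
            dist_m = math.hypot(dx, dy) * METRES_PER_PIXEL
            if dist_m > MAX_WALK_M:
                row.append(0)
            else:
                walk_min = math.floor(dist_m * 60 / WALK_SPEED_M_PER_H)
                row.append(255 - walk_min)
        stamp.append(row)
    return stamp


stamp = walking_circle()
print(len(stamp), stamp[50][50], stamp[50][100])  # 101 255 249
```

At the assumed 5 km/h, the 500 m cap means a circle spans at most six minutes of walking, so its values run from 255 down to 249.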
These walking circles were then darkened based on how long it takes to get from that stop to the central railway station. So at the railway station the center of the circle would be the brightest white, a stop one minute away would be a little darker, and so on. When the walking circles were combined, a circle only changed a pixel if it made that pixel brighter than it already was, as can be seen in my initial test with four stops.
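The darkening and the brighter-wins combination can be sketched in Python. This is a pure-Python per-pixel maximum (what a "lighten" blend computes), not the ImageMagick commands used in the project; it assumes transit times are already whole minutes and that a stamp value of 0 marks pixels outside walking range.

```python
def stamp_stop(canvas: list[list[int]], stamp: list[list[int]],
               cx: int, cy: int, transit_min: int) -> None:
    """Darken a walking circle by the stop's transit time and lighten-blend it in place.

    A canvas pixel only changes if the shaded stamp value is brighter (a max blend).
    """
    r = len(stamp) // 2
    height, width = len(canvas), len(canvas[0])
    for sy, row in enumerate(stamp):
        y = cy + sy - r
        if not 0 <= y < height:
            continue
        for sx, value in enumerate(row):
            x = cx + sx - r
            if not 0 <= x < width or value == 0:
                continue  # off the canvas, or outside walking range
            # One gray level per minute, so transit and walking minutes add up.
            shaded = max(value - transit_min, 0)
            if shaded > canvas[y][x]:
                canvas[y][x] = shaded


# A tiny 3x3 stamp: 255 at the stop, 254 one minute's walk away.
stamp = [[0, 254, 0], [254, 255, 254], [0, 254, 0]]
canvas = [[0] * 5 for _ in range(5)]
stamp_stop(canvas, stamp, 1, 1, 10)   # stop 10 minutes from the station
stamp_stop(canvas, stamp, 2, 1, 3)    # faster neighbour brightens the overlap
stamp_stop(canvas, stamp, 1, 1, 100)  # slow stop: darker, so nothing changes
print(canvas[1])  # [244, 251, 252, 251, 0]
```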
At first, I tried generating a huge canvas and drawing each stop on it with its "walking circle". That turned out to be really, really slow.
Rendered distances (first run)
That run did, however, give me an initial image to play with in Photoshop; more about that below. Once I knew I could continue with this approach, I really had to make the code faster. I tried a bunch of things, and the solution turned out to be a bit unintuitive.
Drawing one big canvas and plotting every stop's walking circle on it once was slow. Drawing the map in smaller slices, and for each slice plotting the stops inside it and around its edges, was way faster. So drawing many times over small areas was faster than drawing once over one big area. Go figure.
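The per-slice selection can be sketched in Python. The tile size and the use of the walking radius (in pixels) as the margin are assumptions; each tile would then get its own small canvas onto which only its selected stops are stamped.

```python
from collections.abc import Iterator

Stop = tuple[int, int, int]  # (x pixel, y pixel, transit minutes)


def tile_origins(width: int, height: int, tile_size: int) -> Iterator[tuple[int, int]]:
    """Yield the top-left corner of every tile covering a width x height map."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y


def stops_for_tile(stops: list[Stop], x0: int, y0: int,
                   tile_size: int, radius: int) -> list[Stop]:
    """Stops inside the tile, plus those within `radius` pixels of its edges.

    A stop just outside the tile can still have a walking circle reaching into it.
    """
    return [
        (x, y, t) for (x, y, t) in stops
        if x0 - radius <= x < x0 + tile_size + radius
        and y0 - radius <= y < y0 + tile_size + radius
    ]


stops = [(10, 10, 5), (130, 10, 5), (300, 300, 5)]
for x0, y0 in tile_origins(256, 256, 128):
    print((x0, y0), stops_for_tile(stops, x0, y0, 128, 50))
```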
I tried a bunch of different visualizations on top of the first version of the rendered map.
Photoshop mockup of final heatmap #1.
At first I wanted to have highly visible steps between different travel times as seen in image 1.
Photoshop mockup of final heatmap #2.
I also wanted to see if a real heatmap would work. It was pretty, as image 2 shows.
Photoshop mockup of final heatmap #3 showing from how far you can reach the railway station in 30 minutes.
Then I had the idea of a sharp separation between areas within a certain travel time and the rest, as in image 3. I'll get back to this a bit later.
Photoshop mockup of final heatmap #4.
I also tried a slightly crazy rainbow version. :)
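The stepped look of image 1 comes down to reading minutes back out of the gray values and snapping them to bands. A Python sketch, assuming 10-minute bands (the band width used in the mockup is not stated) and treating pure black as unreachable:

```python
from typing import Optional


def travel_minutes(gray: int) -> Optional[int]:
    """Invert the encoding: 255 means 0 minutes, each level darker adds a minute.

    0 is treated as 'not reachable' (no walking circle covered the pixel).
    """
    return None if gray == 0 else 255 - gray


def stepped_band(gray: int, step: int = 10) -> Optional[int]:
    """Snap a pixel's travel time down to the start of its band, e.g. 0, 10, 20 min."""
    minutes = travel_minutes(gray)
    return None if minutes is None else minutes // step * step


print([stepped_band(g) for g in (255, 240, 226, 0)])  # [0, 10, 20, None]
```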
Building the interactive version
Once I had settled on a visualization similar to image 3, I had to figure out a way to present it nicely. I knew I needed some image manipulation on the client side, and since I was familiar with Flash, I chose it. I found a small library to handle the map side, and it supported adding a new data layer on top of Google Maps.
I rendered the grayscale heatmap tiles for 12 different zoom levels over Helsinki, stored them with proper names, and wrote a quick slider component to zoom in and out. (I did not make it pretty; that was not the point.)
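Serving tiles under Google Maps means cutting the heatmap along the standard Web Mercator tile grid, where zoom level z has 2^z by 2^z tiles. A Python sketch of finding which tiles cover an area; the `z_x_y.png` file naming is a hypothetical stand-in for the "proper names" used in the project, and which 12 zoom levels were rendered is not specified.

```python
import math


def tile_xy(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Web Mercator tile containing a point, in the Google Maps / OSM x/y scheme."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_covering(south: float, west: float, north: float, east: float,
                   zoom: int) -> list[str]:
    """File names for every tile overlapping a lat/lon box (tile y grows southwards)."""
    x_min, y_min = tile_xy(north, west, zoom)
    x_max, y_max = tile_xy(south, east, zoom)
    return [f"{zoom}_{x}_{y}.png"  # hypothetical naming scheme
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


print(tile_xy(0.0, 0.0, 1))  # (1, 1)
print(tiles_covering(-1.0, -1.0, 1.0, 1.0, 1))
# ['1_0_0.png', '1_0_1.png', '1_1_0.png', '1_1_1.png']
```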
The Flash object loads the grayscale tiles visible on screen and creates a bitmap with transparency based on the value the user has chosen on the slider.
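The project did the slider-driven transparency in Flash; the same per-pixel rule is easy to express in Python. This sketch assumes the slider value is a time limit in minutes, that areas within the limit are left clear while the rest is shaded (as in image 3), and that the shade alpha of 160 is an arbitrary choice.

```python
def overlay_alpha(gray: int, limit_min: int, shade: int = 160) -> int:
    """Alpha for one overlay pixel: clear where the station is within reach, shaded elsewhere."""
    reachable = gray > 0 and 255 - gray <= limit_min
    return 0 if reachable else shade


def tile_to_rgba(tile: list[list[int]], limit_min: int,
                 shade: int = 160) -> list[list[tuple[int, int, int, int]]]:
    """Turn a grayscale tile into black RGBA pixels with per-pixel transparency."""
    return [[(0, 0, 0, overlay_alpha(g, limit_min, shade)) for g in row] for row in tile]


# Slider set to 30 minutes: 225 (30 min) stays clear, 224 (31 min) gets shaded.
print(tile_to_rgba([[255, 225, 224, 0]], 30))
```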
This is how far I got with the project. I was able to do it, and it worked. It was much slower than I expected, though, and the next step, letting users point out their own area of interest, would be far harder. I would have to write my own algorithms to calculate bus routes throughout the city to get around the five-day rendering limit I had with my data. Frankly, I was not interested in doing all that work. So far, I am happy with where this project is at.