I have had this infographic bookmarked for weeks and I can't stop looking at it.
Last month, the New Yorker took some of the simplest data and turned it into something quite illuminating. Using data on median household income from the US Census Bureau, the New Yorker created an interactive infographic that proves that NYC has a serious problem with income inequality. With the ability to browse individual subway lines, you can see the range of earnings throughout the city. In crunching data from the census, the New Yorker discovered that if the borough of Manhattan were a country, "the income gap between the richest twenty per cent and the poorest twenty per cent would be on par with countries like Sierra Leone, Namibia, and Lesotho." Yes, this sounds shocking. But is it? Clicking through all of the different subway lines, the shifts are pretty intense, but I feel like I could have drawn those squiggles myself.
So what does this all prove? Certainly the infographic exemplifies the polarizing nature of the city, but it also proves the equalizing nature of the subway. In most US cities, the class divide is clear; take public transportation in Los Angeles or Miami and you are labeled poor, illegal or a hipster. In New York? Try telling someone you drive a car and prepare to be judged.
While the data does not exactly prove this extended observation, it is interesting to think about. The subway line with the greatest stretch is the 2 train, with its lowest point hitting $13,750 at E. 180th St. in the Bronx, and its highest peaking at $205,192 at both Park Place and Chambers St. in lower Manhattan. These gaps may be incredible, but they actually appear in relatively close quarters. Just look at the 4/5 lines that go from $104,514 at 86th St. to $15,625 at 125th St. in just one stop, and in the same borough.
Maybe the haves can avoid passing through the areas of the have nots, and maybe if the subways stretched out further into the outer boroughs, the overall average would dip dramatically, but that's income, and we are here to talk about transportation. Riders at the top and the bottom ride together; everyone avoids eye contact with their fellow straphangers and no one wants to sit within ten feet of the sleeping homeless man, no matter what dot you live at. That's the thing about the NYC subway - whether you like it or not - once you swipe, you're just like everybody else.
A recent Community Board 2 meeting proved that bikeshare is not for everyone. Sorry for the convenience, residents of SoHo, NoHo, and the Village. But for those of us who are upset that we (I) have to walk a whole block and a half to get to the nearest Citibike station, there's good news! None of the program's bikes are actually installed, but there is already an app for it!
Citibike is going to be the nation's largest bikeshare program. Hoping to feed off of the success from other cities like Washington, DC and Miami, the 330 docking stations throughout Manhattan and Brooklyn will be up and running before you know it. While thousands have signed up for the program, not a single bike has been installed. Tom Claes of the Belgian mobile app company WebComrades knows that and has released the first bikeshare app anyway. His app, which will be available for iPhone and Android users, will plot users on a map and alert them to the nearest docking station. Based on the version he created for the city of Antwerp, which is one of the world's most bike-friendly cities, the app will offer a bit more than Citibike's official app, which is already available for download and will begin working when the program actually launches.
Perhaps the coolest function of Claes' app is that users will be able to save their favorite stations and have the app tell them which are full and which are empty. This feature will alleviate the fear of many New Yorkers who have yet to sign up, and with the program already attracting a younger and more tech-savvy group of customers, this app is likely to be very popular.
So how did Claes make this app, you ask? According to Transportation Nation, "Using some clever data-scraping of the bare bones information on the CitiBike NYC website, Claes pulls in the information about the docking stations and saves it to a constantly updating WebComrades server." Apparently he has also been in touch with CitiBike and plans to use their officially released data once it is made available. As we patiently wait for the bikes to be installed, we can be grateful for the city's commitment to open data, and look forward to more, better, smarter apps that will help us navigate our newly shared streets.
I remember the day Uber came to town last fall. NYU's Wagner School of Public Service has its home in the Puck Building on the southeast corner of Houston and Lafayette Streets. Across Lafayette is a rare sight: a gas station in the middle of Manhattan. Every day around shift-change time, it's inundated with yellow cabs. On this day last fall, I saw young people signing up cabbies for Uber, showing them how to download the app on their smartphones and register. Best of all, Uber was picking up the tab for this first round of e-hailed NYC cabs!
Before I had a chance to use it, UberTaxi had vanished. Reports on the subject are not entirely clear on why they left, but it seems the TLC either misled them or intentionally made their operation more difficult, leading them to put their energy into "more innovation-friendly cities". This was a low blow to the NYC government, and it raised a few eyebrows.
A few months later, the TLC announced an e-hail Pilot Program, basically saying you can't just roll into town with your technology... you have to play by our rules. Uber and Hailo obliged, and the Pilot was set to begin at the end of April. The livery companies who operate black cars around the city (the only ones legally allowed to have prearranged rides) sued the TLC in short order, saying that the new program would cut into their profits.
A judge dismissed the case on April 22nd, and the city collectively cheered online. Uber launched its Pilot on Tuesday, April 30th. A day later, on May 1st, an appellate judge issued a restraining order halting the pilot program until it can be reviewed by a panel of judges later in May. (To summarize, it was legal and possible to e-hail a cab for just under 24 hours.)
The best part is the statement by the lawyer representing the livery company lobby: “From the day the TLC announced the ‘e-hail’ pilot we have been portrayed as anti-technology obstructionists,” said the pair in a joint statement. “That has never been the case. This ‘pilot’ program silenced the voices of more than 35,000 livery, black car and luxury limousine operators and drivers; it ignored the City Charter and the Administrative Code. We went to court to fight not only for our industry but for our system of checks and balances. We are pleased and gratified the appellate court will allow us our day in court.”
They're claiming a righteous crusade in the name of government checks and balances. I think "anti-technology obstructionists" actually fits pretty well. There's always someone who will lose out when the government makes a regulatory change. The people want e-hailing, and they're going to get it one way or the other. I think the livery companies should be coming up with a better way to coexist or adapt instead of fighting tooth and nail to maintain the status quo.
This brings us to the colorful diagram I've included above. It's a concept for an open source, universal e-hailing platform. The idea is similar to Open311, a standardized platform for electronically submitting and tracking civic service requests. The protocol for Open311 is universal, so if someone writes an app for submitting reports in one city, in theory it can work anywhere that supports the Open311 protocol with only minor changes.
Similarly, an open e-hailing system could accept hails from a variety of sources. It's not the city's proprietary app, or a privately owned closed system like Uber or Hailo. (Uber only works if the hailer and the cab are both using Uber. I'm not sure if the TLC has addressed this, or if cabbies who want to use Uber and Hailo are juggling two smart devices plus their meter and their GPS.)
Requests could come from any OpenTaxi compatible app, and simple hail functionality could be built into any software application. Why stop there? We can make hardware devices for ehailing as well. As illustrated in the diagram, some sort of hailing button could be easily installed at concierge desks, payphones, kiosks, and lampposts. No smartphone? No problem! The devices would connect to the wireless data network and send an e-hail in the exact same way a smartphone app would, but without the phone.
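To make the "exact same way" idea concrete, here is a minimal Python sketch of what a shared hail message might look like. Everything in it is an assumption for illustration: OpenTaxi is only a concept, so the `HailRequest` name, its fields, and the JSON encoding are all hypothetical, and the coordinates are placeholders.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class HailRequest:
    """Hypothetical hail message; the field names are illustrative only."""
    source: str          # what sent the hail, e.g. an app or a street button
    lat: float           # pickup latitude
    lon: float           # pickup longitude
    passengers: int = 1  # optional detail a richer client could fill in


def encode_hail(request):
    """Serialize a hail to JSON so any client can send the same shape."""
    return json.dumps(asdict(request), sort_keys=True)


# A smartphone app and a lamppost button produce the same kind of message.
phone_hail = encode_hail(HailRequest("smartphone-app", 40.7246, -73.9950, 2))
button_hail = encode_hail(HailRequest("lamppost-button", 40.7246, -73.9950))

print(phone_hail)
print(button_hail)
```

The point of the sketch is only that the receiving system would not need to care whether a hail came from a phone or a button, as long as both speak the same format.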
We could also implement metering and electronic payments through OpenTaxi. There's much more to discuss, but this diagram lays out the basic premise. If many cities adopt a standardized platform, apps and hardware that support it become universal as well. What do you think? Could it work?
Here's the Vimeo version.
I set out several months ago to visualize the MTA's turnstile dataset. It's updated weekly and resides here. The animation you see here was made in Processing, but there were numerous steps required to prep the data into a format that could be pulled in. I've met with lots of people at various civic tech events over the past few months who have lamented how hard this dataset is to consume, and I'm pleased that a little bit of scripting and elbow grease (finger grease, really, as in mouse clicking) has resulted in usable data. Observe:
Above are the first two of about 23,000 lines that constitute one week of turnstile data. In case you were wondering, both of these lines contain data for the same turnstile, but the first runs from midnight on 4/13 to 4:00am on 4/14. The second line goes from 8:00am on 4/14 to 12:00pm on 4/15.
So, each line consists of three columns of identifying data, and then a sequence of columns with a timestamp, type of report, entry count, and exit count, which repeats 8 times! Best of all, it gives us running totals for each turnstile instead of just a number of entries or exits, so to get anything useful out of it, you need to do some subtraction in Excel. Simply subtract the previous reading's entry tally from the tally for the current timestamp; that previous reading might be 5 columns to the left, or possibly on the previous line somewhere near the end. Easy peasy. To make things more complicated, not every turnstile has readings at 4-hour intervals, and some that do stick to 4 hours are slightly offset, going from 11:00 pm to 3:00 am, for example.
The first step was to write a Ruby script that would split these verbose lines into individually manageable parts. The script is available on github, and the results look like this:
Sure, there are now 291,000 lines, but now that we have each individual reading on its own line, we can sort by unique ID and do some math with the previous line to get a solid number.
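For readers who want to try the same reshaping, here is a rough Python sketch of the two steps: split each wide line into one reading per row, then subtract consecutive running totals. It is not the Ruby script mentioned above. The layout of five columns per reading (date, time, type, entries, exits), the date format, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

ID_COLS = 3     # identifying columns at the start of each wide line
GROUP_COLS = 5  # assumed columns per reading: date, time, type, entries, exits


def split_wide_row(row):
    """Turn one wide line into a list of one-reading-per-row records."""
    ident = tuple(row[:ID_COLS])
    readings = []
    for i in range(ID_COLS, len(row) - GROUP_COLS + 1, GROUP_COLS):
        date, time, kind, entries, exits = row[i:i + GROUP_COLS]
        readings.append({
            "turnstile": ident,
            # Assumed timestamp format: MM-DD-YY HH:MM:SS
            "timestamp": datetime.strptime(f"{date} {time}", "%m-%d-%y %H:%M:%S"),
            "type": kind,
            "entries": int(entries),
            "exits": int(exits),
        })
    return readings


def interval_counts(readings):
    """Subtract each running total from the next one for the same turnstile."""
    by_turnstile = defaultdict(list)
    for reading in readings:
        by_turnstile[reading["turnstile"]].append(reading)
    deltas = []
    for ident, group in by_turnstile.items():
        group.sort(key=lambda r: r["timestamp"])
        for prev, cur in zip(group, group[1:]):
            deltas.append({
                "turnstile": ident,
                "start": prev["timestamp"],
                "end": cur["timestamp"],
                "entries": cur["entries"] - prev["entries"],
                "exits": cur["exits"] - prev["exits"],
            })
    return deltas


# Made-up sample: two wide lines for one turnstile, two readings per line.
wide_rows = [
    ["A002", "R051", "02-00-00",
     "04-13-13", "00:00:00", "REGULAR", "1000", "500",
     "04-13-13", "04:00:00", "REGULAR", "1010", "503"],
    ["A002", "R051", "02-00-00",
     "04-13-13", "08:00:00", "REGULAR", "1040", "520",
     "04-13-13", "12:00:00", "REGULAR", "1100", "580"],
]

readings = [r for row in wide_rows for r in split_wide_row(row)]
deltas = interval_counts(readings)
for d in deltas:
    print(d["start"], d["end"], d["entries"], d["exits"])
```

Sorting by timestamp within each turnstile is what makes the "previous line" subtraction safe, since the previous reading may have come from the end of a different wide line.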
So, we've conquered the format challenges, but now we have a geocoding problem. But wait, doesn't the GTFS data contain stops.txt, which contains a station identifier and a latitude and longitude? All we need to do is a join or a vlookup to assign lats and longs to this dataset, right? Unfortunately, it's not that simple. The turnstile dataset's unique id for a station is called the Control Unit (Column 2 if you're interested), and has nothing to do with the station_id field in the GTFS data.
Yesterday, a friend who is just as passionately nerdy about subways as I am assisted with the very manual process of grabbing latitudes and longitudes for the 700+ lines in the MTA's key for the Control Units. This task was made slightly more difficult by me not being very familiar with the system outside of Manhattan, and the fact that many stations can have the exact same name and be located miles apart on different lines. I digress. We got through it, I performed my vlookup, and moved the data into Processing.
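The vlookup step is just a join on the Control Unit. A small Python sketch of the same idea follows; the unit codes, station names, and coordinates are placeholders rather than real MTA values, and the `unit` field name is an assumption.

```python
# Hypothetical hand-built key: control unit -> (station name, lat, lon).
unit_locations = {
    "U001": ("Example Station A", 40.70, -74.01),
    "U002": ("Example Station B", 40.75, -73.99),
}


def attach_locations(records, locations):
    """Join readings to coordinates by control unit, like a vlookup.

    Returns the located records plus a sorted list of units with no match,
    so gaps in the hand-built key are easy to spot.
    """
    located, missing = [], set()
    for record in records:
        match = locations.get(record["unit"])
        if match is None:
            missing.add(record["unit"])
            continue
        name, lat, lon = match
        located.append({**record, "station": name, "lat": lat, "lon": lon})
    return located, sorted(missing)


records = [
    {"unit": "U001", "entries": 10},
    {"unit": "U002", "entries": 30},
    {"unit": "U999", "entries": 5},  # not in the key
]

located, missing = attach_locations(records, unit_locations)
print(located)
print("units without coordinates:", missing)
```

Reporting the unmatched units matters here because stations with identical names on different lines are easy to mix up when the key is built by hand.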
My vision for the video was to simulate the actual movements of people by animating the dots moving in and out of stations. The sketch grabs each line and displays it on its own, so there is no aggregating of data by station and trying to make sense of the nonstandard intervals. Since most of the data exists on 4-hour intervals, there were visible waves of activity. I got around this by offsetting the start and end times for each trip slightly, so that they did not take up the full time for their interval. For example, if a turnstile logged 200 entries for a 4-hour period, 4 dots would be drawn, but the start and end times of their movement would be staggered to blend the activity into the next time period.
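One way to picture the staggering is the Python sketch below. It is not the Processing sketch itself: the function name, the `trip_fraction` parameter, and the choice of a uniform random offset are assumptions I've made to illustrate the idea.

```python
import random


def stagger_trips(start, end, dots, trip_fraction=0.5, rng=None):
    """Return a (depart, arrive) time pair for each dot in one interval.

    Each dot moves for only a fraction of the interval, and its departure
    is shifted by a random offset, so dots do not all move in lockstep and
    some arrivals spill over into the next interval.
    """
    rng = rng or random.Random()
    length = end - start
    trip = length * trip_fraction
    trips = []
    for _ in range(dots):
        depart = start + rng.uniform(0, length)
        trips.append((depart, depart + trip))
    return trips


# One 4-hour interval (in seconds) drawn as 4 dots, seeded for repeatability.
trips = stagger_trips(0, 4 * 3600, dots=4, rng=random.Random(42))
for depart, arrive in trips:
    print(round(depart), round(arrive))
```

With departures spread across the interval and trips shorter than the interval, the hard edges between reporting periods soften into overlapping movement.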
Mission accomplished. Even at HD resolutions, it is still difficult to capture the entire NYC region and still see the detail I'd like to. Several people have asked for zoomed-in versions, and I will work on them in the coming weeks.
The ban has been lifted! Soon New Yorkers will be able to e-hail cabs right from their smart phones. Those who sought to block e-hailing, including livery car drivers everywhere, could not stifle the initiative any longer. In fact, in her ruling, Supreme Court Justice Carol Huff explained, "that an e-hail system might eventually be permanently implemented because the program proved to be popular, effective and lawful is not a valid argument against it."
Amen! So while this is exciting news for New Yorkers and for the transportation industry as a whole, Chris and I cannot help but wonder - what role will Open Data play in this initiative? Stay tuned to find out.
A few days ago, online news sites such as Mashable, TechCrunch and Gizmodo were buzzing about a new feature in the popular transit app HopStop. Named “HopStop Live!”, the new feature is touted as a “Waze for public transit”, or a feedback system where users can post alerts about a particular line, station, or agency. Users have the ability to snap a photo and tie it to their post, and add tags to make sure their alerts show up in the right place. Just like with Instagram, you can even multi-cast your alert to Facebook and Twitter with the flip of an iPhone slider.
I took HopStop live for a spin this afternoon to see just what people were writing about transit in NYC. Bear in mind, many other transit apps aggregate social media that matches a transit line based on keywords or hash tags, but these entries on HopStop live were deliberately entered for the rest of HopStop’s users to see. I looked at a few lines I ride often on the NYC subway. I looked at the MTA’s feed, the Staten Island Ferry… hey, the Roosevelt Island cable car is in here? The Peter Pan bus and the IKEA shuttle are too? It’s interesting to see just how many agencies/companies there are operating transit in the region, and easy to forget that it’s more than just subways that move us.
After surfing around in this pseudo-twittersphere of transit, I felt like I was simply reading gripes. Nobody loves their commute. They are probably already pissed off… who is in a good mood on the subway platform surrounded by strangers and usually in a rush (besides fellow transitophiles who soak in every second of ridership as a visceral NYC experience.. rats, bums and all.)? But there seem to be plenty of good mixed in with the bad:
“The 3 train is ALWAYS slow as hell!!!!!!!!”
“It always takes AGES to come. The 4,5 AND even the slow #2 trains come by sooner and with greater regularity than the 3 ever does”
“Always tends to be clean”
“THIS IS A TRAIN THAT IS ALWAYS RIGHT THERE OR IS COMING. CUDOS TO WHO EVERY MAKES IT HAPPEN”
“It is pretty clean… The 3 train at least”
“When riding the 3, the ride is frequently jarring and feels as if you are on an old, wooden roller-coaster (and not in a good way)”
So, from this brief sample at the top of the list, we have 2 reports of slowness, 2 reports of cleanliness, one report of a shaky ride, and one ALL CAPS ENTHUSIASTIC REPORT THAT IT’S ALWAYS RIGHT THERE OR COMING. What should we think? Are these really the real-time crowdsourced alerts that are meant to warn users of delayed trains and revolutionize transit riding where we still don’t have real-time data? I’m sure the real-world service alerts are mixed in there somewhere, but it seems logical that they would be the minority of posts, drowned out by people’s generalized whining about dirty platforms, rats, and panhandlers.
I’ll argue that there are a few value propositions for this sort of system. The first is its stated claim to let the app’s users alert fellow straphangers of an accident elsewhere in the system. The MTA and other agencies are quick to get this kind of information out via social media and their machine-readable service advisory feeds (which many apps make use of), but if HopStop’s users can get the word out faster, saving people precious time to make alternative arrangements, I think that’s a huge step forward. Furthermore, the feeds are segregated so you only read about the line/station you’re interested in. Where it might be difficult to filter the signal from the noise in the MTA’s system-wide Twitter feed, organized feeds like the one in HopStop have the potential to show you only the information you want.
Another huge potential in this kind of service is that it might just get people buzzing about issues and lead to change. This kind of system is ripe for big data sentiment analysis, and it’s only a matter of time before we start reading about the atrocious reviews a particular station or line receives in news articles. Will that be enough to turn the heads of policymakers and transit administrators? Will they even have any incentive to respond to a long list of crowdsourced bad reviews? It’s like yelp for transit lines… but who owns the 1 train? Who is accountable for a specific station? I predict that some community organizer or journalist will soon be using this data to make a case for change.
Lastly, it’s kind of fun. Just like reading comments on YouTube, people say funny things when they can rant and remain anonymous. The system is not without its trolls, such as this lone alert filed under the Peter Pan bus:
Seeing what others have written recently about the train you’re riding makes the whole transit-riding experience just a bit more personal. It’s like eagerly watching the twitter backchannel at an event. You don’t know who these people are, but you’re damn curious what they’re saying.
HopStop Live! has been around all of a few days, but already appears to be well-used, at least in the NYC Subway. I am sure the developers of the app will fine-tune the service to provide the maximum benefit to riders, and we’ll start to see systems like this representing the zeitgeist of the modern straphanger.
Betamore hosted a transportation hackathon in Baltimore this weekend by the name of Reinvent Transit. Like any good displaced Baltimorean who is also a transportation-focused urban planning student with programming skills, I hopped on a Bolt Bus and headed down.
My team and I worked on a data visualization in Processing that I’ve wanted to work on for several months: Real-time bus locations from the Charm City Circulator. We took the idea a step further and thought it would be great to overlay it on top of Google Traffic data, to really show when and where the buses were fighting with traffic.
The circulator does not have a set schedule. Rather, it keeps its routes short and attempts to maintain small headways, meaning that there should always be a bus along in a few minutes… should. Traffic does not cooperate with this well-intentioned plan, and over time, buses that were once evenly spaced along a route will end up rolling around in pairs, or even triples. Big events make it worse, so the O’s game on Saturday evening was the perfect traffic/bus demand-generating event for our data collection.
To log the circulator data, we made use of their easily consumable NextBus-powered XML data feed. It’s as easy to load as any website and is available here. We wrote a Ruby script that polled the feed every 15 seconds, parsed out the vehicle id, route, latitude, and longitude, and dumped them into a CSV along with a timestamp.
To capture Google traffic data, we went with the medium-tech approach. We first created the simplest of web maps, a Google map centered on Baltimore with the appropriate zoom level and the traffic layer turned on. We then wrote a Ruby script that opened the website every 45 seconds in Firefox, took a screenshot, then saved it with a timestamp as the file name.
We mashed all of this data together in Processing, and are pleased with the results. We took the Best Data Visualization prize for the weekend. Open Data FTW!
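The parsing half of that logger can be sketched in a few lines of Python. This is not our Ruby script: the sample XML below is made up, and the element and attribute names (`vehicle`, `id`, `routeTag`, `lat`, `lon`) are my assumption about what a NextBus vehicle-locations response looks like, so check them against the real feed. The sketch works on a string instead of fetching anything over the network.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample in the assumed shape of a vehicle-locations response.
SAMPLE_FEED = """<body>
  <vehicle id="1101" routeTag="purple" lat="39.2904" lon="-76.6122"/>
  <vehicle id="1102" routeTag="orange" lat="39.2871" lon="-76.6050"/>
</body>"""


def parse_vehicles(xml_text, polled_at):
    """Pull timestamp, vehicle id, route, lat, lon out of one feed response."""
    root = ET.fromstring(xml_text)
    rows = []
    for vehicle in root.iter("vehicle"):
        rows.append([
            polled_at,
            vehicle.get("id"),
            vehicle.get("routeTag"),
            float(vehicle.get("lat")),
            float(vehicle.get("lon")),
        ])
    return rows


def append_rows(rows, out):
    """Write the parsed rows as CSV lines to any file-like object."""
    csv.writer(out).writerows(rows)


# A real logger would fetch the feed and call this every 15 seconds.
# Here one placeholder timestamp and an in-memory buffer stand in.
log = io.StringIO()
rows = parse_vehicles(SAMPLE_FEED, "2013-01-01T00:00:00")
append_rows(rows, log)
print(log.getvalue())
```

Keeping the parse step separate from the fetch step makes it easy to test against saved responses before pointing the logger at the live feed.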
I heard lots of anecdotes this weekend about how the MTA buses have multiple GPS receivers in them and multiple over-the-air data connections, each serving different systems, installed by different contractors who don’t play well with one another, and are 100% NOT available for the public to use. I’ve heard similar stories about the Light Rail. Transit agencies, if you don’t have an easily-consumable public-facing real-time data feed for all of your vehicles, you’re doing it wrong. If you have to pay a corporation millions and millions to implement it, you’re doing it wrong.
With the help of OpenPlans, the New York MTA built BusTime using Commercial-Off-The-Shelf hardware and open-source software, and they linked in their tracking computer with other on-board data systems so they could share precious space and bandwidth. Bus riders can scan a QR code or text a bus stop code and get instant real-time arrival information (smartphone not required). App developers can query based on stop or vehicle and get people the information they want, when they want it. Figure it out already, it’s 2013.
Last year, the MTA took an
impressive leap when it opened up its real-time arrival information data for seven
of the city’s subway lines. Last week, Google Maps updated its online and mobile apps to include that data.
In a partnership between
Google Maps and the MTA, straphangers of the numbered subway lines and the
Times Square shuttle can now view live departure times from their mobile apps
and plan their trips with a little more ease. Google Maps is also partnering
with transit systems in cities like Salt Lake City and Washington, DC, but from
the numbers alone, this collaboration is bound to have the largest impact in
NYC. Especially for the stragglers trying to get home at 3am, for whom even if the next train is 27 minutes away, just knowing that can make the wait a whole lot more tolerable.
Google and the MTA are not
the only ones doing the collaboration dance. The MTA has already been taking
serious strides to make its data available to developers who have already
created apps like Roadify, SchedNYC and the MTA’s very own Subway Time. The mission
to get riders from A to B more quickly and easily is a group effort, as more
data is released and more people and agencies take advantage of it.
Among the many mysteries
this city affords us, trip planning is no longer one of them. I doubt the
upgrade will encourage non-subway users to start using public transportation,
but it will certainly make existing riders a lot happier. Unfortunately due
to older technologies, the remaining subway lines are probably many years and
many dollars away from getting that same upgrade, but here’s to a valiant
start!
We recently attended the NY Open Transportation Meetup and
got to see a great presentation by Mike Frumin with MTA’s BusTime program. In my earlier post, I ranted about how tough
it is to consume GTFS-realtime feeds as a novice programmer. BusTime has done it right, making the data
available in so many possible ways from so many possible technologies. Best of all, the platform was built using
open-source technologies and COTS (Commercial, Off the Shelf) equipment, so it
wasn’t some giant contract that would have taken a decade to implement and cost
taxpayers too much. BusTime isn’t simply
a case of opening data, it’s the city’s bottom-up effort to produce that data
and make it open from day 1. It’s almost
the opposite of what we’ve come to expect from government.
You don’t need a $300 phone with a huge dataplan to access
BusTime. They do have a web app that
works great on smartphones, but plain old text messaging works too. You simply text the stop code to BusTime, and
the server responds with information on when the next bus will arrive. It’s a
lower-tech accessible breach in the digital divide, and nobody had to fight for
it… they built it in from the beginning.
BusTime’s API is the best part of the system, which makes
the real-time bus data available to programmers for inclusion in apps,
websites, and transit display boards. Real
time data can be called relative to a specific stop, vehicle, or route, and the
API will return a wealth of data including latitude, longitude, bearing,
distance traveled, distance from a stop in question, etc. If only every transit agency were so
forthcoming with real-time data. They
can if they want to, as all of the technology behind Bus Time is open source,
publicly available, and free for any developer or transit agency to use. How far we’ve come! Unfortunately the subway side of the house is not following the same practices, but that's another blog post.
More about the Tech behind MTA’s BusTime can be found here.
For all of the geeks in the room, here's a screenshot of XML presented by the BusTime API. Look at all that freely accessible data!
A couple of weeks ago, Chris and I attended the NY Open Transportation Meetup "ALL aboard! The reboot, a presentation of MTA's Bus Time & NYC DOT data" co-hosted by NYU's very own Sarah Kaufman. Not only were we greeted with free pizza, we were also given the opportunity to interact with some of the most interesting and innovative people in the transportation field.
First we heard from Neil Freeman of the Department of Transportation, about the agency's strategic communications initiatives and the current state and future of open DOT data. We also heard about how the DOT is undertaking the important initiative of performance measurement. Having completed Wagner's Performance Measurement and Management course, I am familiar with the challenges public agencies face in measuring their success. As the DOT seeks to provide New Yorkers with safe and efficient public space, measuring safety, or even perceptions of safety can be quite difficult. For an agency as large as the DOT, it is good to know that they have a talented and creative team taking the reins.
We also heard from Mike Frumin of the MTA who, while he couldn't go into too much detail about the latest Bus Time API updates, was able to get us excited for what is to come. I am very much looking forward to seeing how developers will utilize open data so that I can know exactly when my bus is coming, and spend less time outside in the cold.
All in all, the Meetup was entertaining, informative and delicious. A relative newbie to all of this open data talk, I was impressed with the MTA and DOT's policies that seem to be truly embracing developers and open data to create better and more efficient ways for New Yorkers to move around, and even enjoy it!