Validating Availtec’s Bus Arrival Estimates

In my last post, I created a dashboard that shows the estimated arrival time of buses at specific stops. After using it, I was curious to see how accurate those estimates are.

So I have created a new project:

availtec-estimate-validation

As always, everything is open source and available on my GitHub page.

This project consists of two parts:

  1. A capture script that queries Availtec’s API every 30 seconds for the estimated arrival times and stores them in a local database.
  2. A graph script that generates pretty visualizations from the captured data.
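To make the capture step concrete, here is a minimal sketch of a polling loop in Python. It is not the code from the project: the table layout, the `fetch` callable (a stand-in for whatever actually calls Availtec’s API), and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3
import time
from typing import Callable, Iterable, Tuple

# Hypothetical row shape returned by the fetch function:
# (route, stop, estimated_arrival as an ISO 8601 string)
Estimate = Tuple[str, str, str]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the table that holds one row per captured estimate."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS estimates ("
        " captured_at REAL NOT NULL,"
        " route TEXT NOT NULL,"
        " stop TEXT NOT NULL,"
        " estimated_arrival TEXT NOT NULL)"
    )
    conn.commit()


def capture_once(
    conn: sqlite3.Connection,
    fetch: Callable[[], Iterable[Estimate]],
    now: Callable[[], float] = time.time,
) -> int:
    """Fetch the current estimates and store them with a capture timestamp."""
    captured_at = now()
    rows = [(captured_at, route, stop, eta) for route, stop, eta in fetch()]
    conn.executemany("INSERT INTO estimates VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def capture_forever(
    conn: sqlite3.Connection,
    fetch: Callable[[], Iterable[Estimate]],
    interval_seconds: float = 30.0,
) -> None:
    """Poll every interval_seconds until the process is stopped."""
    while True:
        capture_once(conn, fetch)
        time.sleep(interval_seconds)
```

Storing the capture time next to each estimate is what makes the later graphs possible: every row says "at this moment, the system predicted that arrival time".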

Here is an example of the Route 44 Inbound at St. Vincents Hospital for Birmingham, AL’s transit system.

Route 44 Estimates

Each “block” is a bus trip throughout the day. The dark black line represents the actual time the bus arrived at this stop. The dark blue line shows, at each point in time, when Availtec was estimating the bus would arrive.

Ideally, the dark blue line would be as close as possible to the dark black line.

From the image above, we can see that the 4th trip, from 10:44:17 to 10:46:17, had the best estimate. From 10:00am to 10:50am, Availtec estimated the bus would arrive within 30 seconds of when it actually did.

We can also see that the trip after it had the worst estimate. From the time the trip started, just before 10:50, until around 11:20, Availtec was estimating the bus would arrive at 11:32, when it actually came at 11:38. It wasn’t until about 11:25 that the system suddenly corrected and started estimating a more realistic time.
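The distance between the blue and black lines can be expressed as a signed error. The snippet below is only a sketch of that arithmetic, using the 11:32 estimate and 11:38 arrival mentioned above; the function name is mine, and the calendar date is an arbitrary placeholder since only the times matter.

```python
from datetime import datetime


def estimate_error_seconds(estimated: datetime, actual: datetime) -> float:
    """Signed error in seconds.

    Negative means the estimate was earlier than the actual arrival
    (the bus showed up later than predicted).
    """
    return (estimated - actual).total_seconds()


# Placeholder date; only the time of day comes from the example above.
day = (2017, 9, 5)
estimated = datetime(*day, 11, 32)
actual = datetime(*day, 11, 38)

error = estimate_error_seconds(estimated, actual)
print(error / 60)  # -6.0, i.e. the estimate was six minutes too early
```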

This project works with any bus system that uses Availtec for real-time tracking. You can leave the capture script running in the background for days, then generate graphs based on that historical data. The graphing script allows you to specify which stops, routes, and days to plot.

API For Historical Bus Data

My previous blog post (Visualizing the On Time Performance of Birmingham’s Bus System) looked at creating visualizations based on the real-time data provided for Birmingham, Alabama’s public transit system. I have now created a simple, public API that can be used to query the historical data captured from the real-time system.

Using my dv8 scripts, I am capturing a snapshot of the real-time data every 30 seconds. I am then storing this data in a SQL database, so that we have a historical record. This allows for generating reports and visualizations over time. As of this writing, I have about 2.5 months of data captured.

I am making all of this data public, along with a simple API for making queries. The API is located at:

https://api.dv8.line72.net/

However, if you visit that URL, you will get a ‘Page Not Found’ error. That is because it is only a backend API, and you need to construct URLs to make queries. Let’s look at a few sample queries. All data is returned in JSON format, and I have found that making these queries in Firefox is preferable to Chrome, since Firefox nicely formats the results.

Query for all the routes:

https://api.dv8.line72.net/routes
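If you would rather script these queries than paste them into a browser, a small helper along these lines should work. This is a sketch using only Python’s standard library; the helper names `api_url` and `fetch_json` are my own, and nothing is assumed about the response beyond it being JSON.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://api.dv8.line72.net"


def api_url(path: str) -> str:
    """Join the API base URL and a path such as 'routes'."""
    return BASE_URL + "/" + path.lstrip("/")


def fetch_json(path: str, timeout: float = 30.0):
    """GET the given API path and decode the JSON body."""
    with urlopen(api_url(path), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json("routes")` would then perform the same request as the URL above and return the decoded result.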

This returned a list of all the known routes and their properties. Each route also has a set of “Trips”. A trip is a unique instance of a bus running along a route. For example, if every day at 4:00pm a bus leaves Central Station along the route 44 and arrives at its final destination at 5:00pm, this would be a Trip. We can get a list of all the trips for a specific route.

Query for all the trips on the Route 44:

Based on the previous query, I have found that the “44 Montclair” has an id of 21.

https://api.dv8.line72.net/route/21/trips
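Looking up the route id by eye works fine, but it can also be done in code. The sketch below assumes the routes response is a list of objects and that the id and name live under keys called `id` and `name`; those key names are guesses on my part, so check them against the actual response or the Swagger file.

```python
from typing import Iterable, Mapping, Optional


def find_route_id(
    routes: Iterable[Mapping],
    name_fragment: str,
    name_key: str = "name",  # assumed key name, not confirmed by the API docs
    id_key: str = "id",      # assumed key name, not confirmed by the API docs
) -> Optional[int]:
    """Return the id of the first route whose name contains name_fragment."""
    wanted = name_fragment.lower()
    for route in routes:
        if wanted in str(route.get(name_key, "")).lower():
            return route.get(id_key)
    return None


def trips_path(route_id: int) -> str:
    """Path of the trips query for one route."""
    return f"/route/{route_id}/trips"


# Made-up stand-in for the routes response; only the id 21 and the
# name "44 Montclair" come from the post.
sample_routes = [{"id": 21, "name": "44 Montclair"}]
print(trips_path(find_route_id(sample_routes, "montclair")))  # /route/21/trips
```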

This returned all 61 trips associated with the route 44. Unfortunately, this doesn’t give us that much information about the individual trip. For instance, we have no idea based on the returned results what time of day a trip runs. For every trip, we can get a series of “Waypoints”. Waypoints are all the known information about a bus on a trip at a given time. They are essentially a snapshot, showing the location, passenger count, time deviation, and various other details. We can pick a specific trip and query it.

Query all Waypoints for a single trip on the Route 44:

I am going to select a trip at random from the previous query. I will use 699, and will query its waypoints:

https://api.dv8.line72.net/route/21/trip/699/waypoints

This returned A LOT of data. This trip runs every weekday, and we have samples every 30 seconds. The API returns a maximum of 1000 waypoints, so only the first part of the data was returned. Typically, we want to filter this data and only get a specific range. For example, let’s only get the waypoints on a specific date. We can use the start_date and end_date parameters to limit the range:

Filter the waypoints for a specific date:

https://api.dv8.line72.net/route/21/trip/699/waypoints?start_date=2017-09-05&end_date=2017-09-06
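Building these filtered URLs by hand gets tedious when you want several days, so here is a sketch that generates them. It mirrors the example above by setting end_date to the day after start_date; whether end_date is inclusive or exclusive is something I am not asserting here. The function names are hypothetical, and requesting one day at a time is just one way to try to stay under the 1000-waypoint limit.

```python
from datetime import date, timedelta
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://api.dv8.line72.net"


def waypoints_url(route_id: int, trip_id: int, day: date) -> str:
    """URL for the waypoints of one trip on one calendar day."""
    params = urlencode(
        {
            "start_date": day.isoformat(),
            # Same pattern as the example: end_date is the following day.
            "end_date": (day + timedelta(days=1)).isoformat(),
        }
    )
    return f"{BASE_URL}/route/{route_id}/trip/{trip_id}/waypoints?{params}"


def daily_urls(route_id: int, trip_id: int, first_day: date, last_day: date) -> List[str]:
    """One URL per day from first_day through last_day, inclusive."""
    urls = []
    day = first_day
    while day <= last_day:
        urls.append(waypoints_url(route_id, trip_id, day))
        day += timedelta(days=1)
    return urls
```

For example, `waypoints_url(21, 699, date(2017, 9, 5))` produces the same URL as the one shown above.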

On this date, this trip was an Outbound (0) trip. It left Central Station at 11:45:00 AM and arrived at Eastwood Mall on time at 12:21:00 PM.

We have samples every 30 seconds of this trip. Using that, we could calculate the average on-time performance or visualize the number of passengers on the bus. You can take this JSON data and convert it to CSV using a simple converter tool. You can then import your CSV into a spreadsheet program, like Excel or LibreOffice Calc, and create visualizations.
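If you prefer to skip the converter tool, the same conversion is a few lines of Python. This is a sketch that assumes the waypoints come back as a flat list of JSON objects; the field names used in the example (`deviation`, `passengers`) are placeholders I made up, so substitute the real names from the API documentation.

```python
import csv
import io
from typing import List, Mapping, Optional


def waypoints_to_csv(waypoints: List[Mapping]) -> str:
    """Render a list of flat JSON objects as CSV text with a header row."""
    if not waypoints:
        return ""
    # Use the union of all keys so a waypoint with a missing field
    # still lines up with the header (the cell is left empty).
    fieldnames = sorted({key for wp in waypoints for key in wp})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(waypoints)
    return buffer.getvalue()


def mean_field(waypoints: List[Mapping], field: str) -> Optional[float]:
    """Average of one numeric field, ignoring waypoints where it is missing."""
    values = [wp[field] for wp in waypoints if wp.get(field) is not None]
    return sum(values) / len(values) if values else None


# Placeholder data with made-up field names, purely to show the shape.
sample = [
    {"deviation": 2, "passengers": 10},
    {"deviation": 4, "passengers": 14},
]
print(waypoints_to_csv(sample))
print(mean_field(sample, "deviation"))  # 3.0
```

The resulting text can be saved to a .csv file and opened directly in Excel or LibreOffice Calc.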

For full documentation on the API including all the routes, parameters, and data types, a Swagger file is available. You can also view the documentation online.

Source Code

The API was written in PHP using the Slim framework. The source code is available on my dv8-api-server GitHub project.