Adventures in Educational Experience Design

Jaymes Dec's Teaching Portal

Subway Schedule Scraper

A few weeks ago, I posted a picture of a new project to Twitter. A few people asked me for more documentation, so here you go.

The Subway Schedule Scraper was inspired by the new displays that tell you how many minutes until your subway train will arrive. I live two short blocks from the Broadway subway station in Astoria, Queens, and it only takes about four minutes to get from my door to the platform. I wanted a display in my own apartment so that I don’t have to go out into the cold and wait for a train. I can just check the display and leave my apartment when it says the train will arrive in four or five minutes. I also decided to display the weather so that I know if I should bring an umbrella or wear long underwear.

I used an Arduino with a WiFi Shield and a large graphic LCD with a serial backpack from SparkFun. The case is laser-cut acrylic.

I’m scraping the subway time data from Google Maps, which gets it from the MTA. I’m not using the MTA data directly because I find it easier to just scrape the data from Google.

You can’t get real-time data for my subway lines yet. The MTA has a pilot program for real-time data for the 1, 2, 3, 4, 5, and 6 trains. So the data I’m using is the scheduled times, not the actual times. But since my stop is the third stop on the line, I’ve found the schedule to be usually accurate.

This code is really hacky, but it works. Occasionally, I’ll get an error from the weather.com feed. I should probably switch to the Weather Underground API.

I’m scraping the subway and weather data and then formatting it for my Arduino using PHP. This script is based on the code for the Air Quality Scraper from Tom Igoe’s excellent book, Making Things Talk. The Arduino code is based on Tom’s WiFi Twitter Client. So thank you, Tom!

To get the URL for your subway stop from Google,

  1. Go to maps.google.com
  2. Search for an address near your station
  3. On the map, click on your station
  4. Then click “more info”
  5. Get the permanent link from “Print – Link”

Here is the PHP code:
<?php
/*
Subway Schedule + Astoria Weather Web Page Scraper
Language: PHP
Adapted from Tom Igoe's AirQuality Scraper from Making Things Talk:
http://shop.oreilly.com/product/9780596510510.do
*/

// Fetch the page with the schedule for the Broadway stop in Astoria:
$subwayPage =
      file_get_contents('https://maps.google.com/maps/place?q=type:transit_station:%22Broadway%22&ftid=0x89c25f374b1fc2c7:0x54be112e97b4cd0');

// Pull out the scheduled departures section of the page
$schedule =  extract_unit($subwayPage, '<table class="pprtjt"> <tr class="pprtjtl">  <td class="pprtjth">', '</td>  </tr> </table> <div class="pprtjmd">');

// Replace html tags and do some formatting to make text more readable
// Delete tags between subway name and due time
$formattedSchedule = str_replace('</td> <td class="pprtjtt">', '', $schedule);

// Replace 'Due' with '0 min' for the "Due" case
$formattedSchedule = str_replace('Due', "0 min", $formattedSchedule);

// Replace "to Coney Island" with "to Manhattan in"
$formattedSchedule = str_replace('to Coney Island - Stillwell Av ', "to Manhattan in ", $formattedSchedule);

// Get rid of - Ditmars Blvd
$formattedSchedule = str_replace('- Ditmars Blvd', "", $formattedSchedule);

// Split up data into an array
$formattedSchedule = explode('</td>  </tr><tr class="pprtjtl">  <td class="pprtjth">   ', $formattedSchedule);

// Send start tag for Arduino String Parsing
echo "<text>";
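// Spit out results for the next 5 trains, each followed by a $ divider.
// If the page lists fewer than 5, send empty fields so the Arduino
// still finds the weather data in the same position.
for ($i = 0; $i <= 4; $i++){
	print (isset($formattedSchedule[$i]) ? $formattedSchedule[$i] : '') . "$";
}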

// RSS Url for Astoria weather
$feedUrl = 'http://rss.weather.com/weather/rss/local/USNY0059?cm_ven=LWO&cm_cat=rss&par=LWO_rss';

// Some magic to convert that xml into usable data:
// collect the description of each of the first 10 items in the feed
$ret = array();
if ($xml = simplexml_load_file($feedUrl)) { // load xml file using simplexml
	$items = $xml->xpath("/rss/channel/item"); // divide feed into array elements
	if ($items) {
		$i = 0;
		foreach ($items as $element) {
			if ($i < 10) {
				$ret[$i]['description'] = (string)$element->description;
				$i++;
			}
		}
	}
}

// The item at index 6 is the one used for today's weather. Fall back to an
// empty string if the feed failed to load or returned fewer items.
$todayWeather = isset($ret[6]) ? $ret[6]['description'] : '';
// end of magic

// print a line break between Subway and Weather data
echo "\n";

// Spit out weather data
$todayWeather = explode("&deg;",$todayWeather);
$formattedWeather = str_replace(' & ', "$", $todayWeather[0]);
echo $formattedWeather;
echo " F";
echo "$";
echo "</text>";

// Function to pull a substring based on start and end substrings.
// Returns an empty string if either marker is missing.
function extract_unit($string, $start, $end)
	{
	$pos = stripos($string, $start);
	if ($pos === false) {
		return '';
	}

	// everything after the start marker
	$str_two = substr($string, $pos + strlen($start));

	$second_pos = stripos($str_two, $end);
	if ($second_pos === false) {
		return '';
	}

	$str_three = substr($str_two, 0, $second_pos);

	$unit = trim($str_three); // remove surrounding whitespace

	return $unit;
	}
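
The script's output is a small text format: everything sits between a <text> start tag and a closing </text>, and each field ends with a $ divider. As a rough illustration of how a client can pick that payload out of an HTTP response one character at a time, here is a minimal sketch in portable C++ (std::string rather than the Arduino String class). PayloadReader and extractPayload are hypothetical names, not part of the project, and the sample response in the note below is made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the original sketch. It is fed one
// character at a time, the way loop() reads from the WiFi client, and
// collects whatever sits between "<text>" and the next '<'.
class PayloadReader {
public:
    // Feed one character; returns true once a complete payload is captured.
    bool feed(char c) {
        if (done_) return true;
        if (reading_) {
            if (c == '<') {            // start of "</text>": payload complete
                reading_ = false;
                done_ = true;
            } else {
                payload_ += c;
            }
            return done_;
        }
        line_ += c;
        if (c == '\n') line_.clear();  // forget each finished header line
        if (endsWith(line_, "<text>")) {
            reading_ = true;           // payload starts with the next character
            payload_.clear();
        }
        return false;
    }

    bool done() const { return done_; }
    const std::string& payload() const { return payload_; }

private:
    static bool endsWith(const std::string& s, const std::string& suffix) {
        assert(!suffix.empty());
        return s.size() >= suffix.size() &&
               s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
    }

    std::string line_;
    std::string payload_;
    bool reading_ = false;
    bool done_ = false;
};

// Run a whole response through the reader; returns "" if no complete
// <text>...</text> payload was found.
std::string extractPayload(const std::string& response) {
    PayloadReader reader;
    for (char c : response) {
        if (reader.feed(c)) break;
    }
    return reader.done() ? reader.payload() : std::string();
}
```

Calling extractPayload on a response such as "HTTP/1.0 200 OK\r\n\r\n<text>A$B$</text>" gives back "A$B$". Because the reader stops at the first '<' after the start tag, the payload itself must not contain that character, which is why the PHP script strips the HTML tags before printing.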

And here is the Arduino code:
/*

 Circuit:
 * WiFi shield attached to pins 10, 11, 12, 13
 * Sparkfun Graphic LCD 160x128 w/ Serial backpack connected to 5V, GND, pins 5, 6
 * Connect Backpack TX -> pin 5, Backpack RX -> pin 6

 created Jan 11 2013
 by Jaymes Dec

 Based on Wifi Twitter Client with Strings
 by Tom Igoe

 */
#include <SPI.h>
#include <WiFi.h>
#include <SoftwareSerial.h>

// Use SoftwareSerial to free up regular serial for debugging
SoftwareSerial screenSerial(5, 6); // RX, TX

// Enter your WiFi SSID and password here
char ssid[] = "YOUR SSID"; //  your network SSID (name)
char pass[] = "your password";    // your network password (use for WPA, or use as key for WEP)

int keyIndex = 0;            // your network key Index number (needed only for WEP)
int serTXpin = 1;   // select the pin for the TX

int status = WL_IDLE_STATUS; // status of the wifi connection

// initialize the library instance:
WiFiClient client;

const unsigned long requestInterval = 30UL * 1000UL;    // delay between requests; 30 seconds

char server[] = "www.yourserver.com";     // your server address where the PHP scraper is hosted

boolean requested;                     // whether you've made a request since connecting
unsigned long lastAttemptTime = 0;     // last time you connected to the server, in milliseconds

String currentLine = "";               // string to hold the text from server
String message = "";                     // string to hold the schedule and weather data
boolean readingMessage = false;          // if you're currently reading the data

void setup() {
  //Change Baud to 9600
  //Serial.begin(115200);
  //Serial.write(0x7C); // Command
  //Serial.write(0x07); // Clear
  //Serial.write(0x32); // Sets baud to 9600 see http://www.sparkfun.com/datasheets/LCD/Monochrome/Corrected-SFE-0016-DataSheet-08884-SerialGraphicLCD-v2.pdf
 // Serial.end();
  Serial.begin(9600);
  screenSerial.begin(115200);
  // reserve space for the strings:
  currentLine.reserve(256);
  message.reserve(150);

  screenSerial.write(byte(0x7C)); // Command
  screenSerial.write(byte(0x00)); // Clear the LCD

  // check for the presence of the shield:
  if (WiFi.status() == WL_NO_SHIELD) {
    Serial.print("WiFi shield not present");
    // don't continue:
    while(true);
  }

  // attempt to connect to Wifi network:
  while ( status != WL_CONNECTED) {
    Serial.print("Attempting to connect to SSID: ");
    Serial.print(ssid);
    // Connect to WPA/WPA2 network. Change this line if using open or WEP network:
    status = WiFi.begin(ssid, pass);

    // wait 10 seconds for connection:
    delay(10000);
  }
  // you're connected now, so print out the status:
  printWifiStatus();
  connectToServer();
}

void loop()
{
  if (client.connected()) {
    if (client.available()) {
      // read incoming bytes:
      char inChar = client.read();

      // add incoming byte to end of line:
      currentLine += inChar;

      // if you get a newline, clear the line:
      if (inChar == '\n') {
        currentLine = "";
      }
      // if the current line ends with <text>, it will
      // be followed by the data:
      if ( currentLine.endsWith("<text>")) {
        // data is beginning. Clear the data string:
        readingMessage = true;
        message = "";
        // break out of the loop so this character isn't added to the message:
        return;
      }
      // if you're currently reading the bytes of the message,
      // add them to the message String:
      if (readingMessage) {
        if (inChar != '<') {
          message += inChar;
        }
        else {
          // if you got a "<" character,
          // you've reached the end of the data:
          readingMessage = false;
          screenSerial.write(byte(0x7C)); // Command
          screenSerial.write(byte(0x00)); // Clear the LCD

          // This is my hacky code that splits the message string into data formatted for the LCD
          int firstEndIndex = message.indexOf('$');
          String firstMessage = message.substring(0,firstEndIndex);

          int secondEndIndex = message.indexOf('$',firstEndIndex+1);
          String secondMessage = message.substring(firstEndIndex+1,secondEndIndex);

          int thirdEndIndex = message.indexOf('$',secondEndIndex+1);
          String thirdMessage = message.substring(secondEndIndex+1,thirdEndIndex);

          int fourthEndIndex = message.indexOf('$',thirdEndIndex+1);
          String fourthMessage = message.substring(thirdEndIndex+1,fourthEndIndex);

          int fifthEndIndex = message.indexOf('$',fourthEndIndex+1);
          String fifthMessage = message.substring(fourthEndIndex+1,fifthEndIndex);

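          // the sixth field starts after the newline that the PHP script
          // prints between the subway and weather data, hence the +2
          int sixthEndIndex = message.indexOf('$',fifthEndIndex+1);
          String sixthMessage = message.substring(fifthEndIndex+2,sixthEndIndex);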

          int seventhEndIndex = message.indexOf('$',sixthEndIndex+1);
          String seventhMessage = message.substring(sixthEndIndex+1,seventhEndIndex);

          //Display the data
          cursorGoTo(1,127);
          screenSerial.print(firstMessage);
          cursorGoTo(1,119);
          screenSerial.print(secondMessage);
          cursorGoTo(1,111);
          screenSerial.print(thirdMessage);
          cursorGoTo(1,103);
          screenSerial.print(fourthMessage);
          cursorGoTo(1,95);
          screenSerial.print(fifthMessage);
          cursorGoTo(1,87);
          screenSerial.print(sixthMessage);
          cursorGoTo(1,79);
          screenSerial.print(seventhMessage);
          // close the connection to the server:
          client.stop();
        }
      }
    }
  }
  else if (millis() - lastAttemptTime > requestInterval) {
    // if you're not connected, and requestInterval (30 seconds) has passed
    // since your last connection, then attempt to connect again:
    connectToServer();
  }
}

void connectToServer() {
  // attempt to connect to the server:
  Serial.print("connecting to server...");
  if (client.connect(server, 80)) {
    Serial.print("making HTTP request...");
    // make HTTP GET request to the server:
    // YOU HAVE TO CHANGE THIS TO YOUR OWN SERVER AND PHP SCRAPER
    client.println("GET /jaymes/SubwayScraper/scraper.php HTTP/1.0");
    client.println("Host: www.jaymesdec.com"); // println already ends the line with \r\n
    client.println("Connection: close");
    client.println(); // blank line marks the end of the headers
  }
  // note the time of this connect attempt:
  lastAttemptTime = millis();
}

void printWifiStatus() {
  // print the SSID of the network you're attached to:
  Serial.print("SSID: ");
  Serial.println(WiFi.SSID());
  screenSerial.print("SSID: ");

  // print your WiFi shield's IP address:
  IPAddress ip = WiFi.localIP();
  Serial.print("IP Address: ");
  Serial.println(ip);

  // print the received signal strength:
  long rssi = WiFi.RSSI();
  Serial.print("signal strength (RSSI):");
  Serial.print(rssi);
  Serial.println(" dBm");
}

// Function to move LCD cursor
void cursorGoTo(int x, int y){
  screenSerial.write(byte(0x7C)); // Command
  screenSerial.write(byte(0x18)); // set x coordinate
  screenSerial.write(x);
  screenSerial.write(byte(0x7C)); // Command
  screenSerial.write(byte(0x19)); // set y coordinate
  screenSerial.write(y);
}
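
The seven hand-written indexOf/substring blocks in loop() could also be expressed as a single loop over the $ dividers. Below is a minimal sketch of that idea in portable C++, assuming std::string and std::vector are available; on the Arduino itself the same logic would have to use the String class's indexOf and substring instead. splitFields and rowToY are hypothetical helpers, not part of the sketch above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a payload into fields that each end with the
// delimiter. A leading newline on a field (the line break the PHP script
// prints between the subway and weather data) is dropped.
std::vector<std::string> splitFields(const std::string& message,
                                     char delimiter = '$') {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (start < message.size()) {
        std::size_t end = message.find(delimiter, start);
        if (end == std::string::npos) break;  // ignore an unterminated tail
        std::string field = message.substr(start, end - start);
        if (!field.empty() && field.front() == '\n') field.erase(0, 1);
        fields.push_back(field);
        start = end + 1;
    }
    return fields;
}

// Each row of text is drawn 8 pixels below the previous one, starting at
// y = 127, matching the cursorGoTo(1,127), cursorGoTo(1,119), ... calls.
int rowToY(int row) {
    assert(row >= 0);
    return 127 - 8 * row;
}
```

With these two helpers, the display code reduces to a loop that calls cursorGoTo(1, rowToY(i)) and prints field i for each entry returned by splitFields. Dropping the leading newline inside the splitter replaces the special +2 offset used for the sixth field.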

Written by admin

February 18th, 2013 at 3:17 pm
