Tracking Nearby Ships with SDR, Elasticsearch, Kibana, and Vega Visualizations

Intro

A year ago, I moved near the Golden Gate Bridge, and seeing the ships come in made me wonder how they’re tracked.  I had seen websites like this one, and was curious how they worked.  It turns out the key is in the title of the page “AIS Marine Traffic.”  The traffic comes from the Automatic Identification System, or AIS.  Ships broadcast their position on frequencies around 162MHz.  You can go out and buy a dedicated receiver and decoder for this data, but I recently had renewed interest in software-defined radio (SDR) and was looking to do a project with it.  If you haven’t heard of SDR before, it’s super neat!  The idea is this: historically, somebody working with radio signals has needed to build a specific antenna and a specific modulator/demodulator in hardware.  Take, for example, a traditional car radio that can pick up AM and FM radio signals:

RF Bands

Image source: GSU

As shown here, FM radio is broken into 200kHz channels; in the United States, the center frequencies run from 87.9 to 107.9 MHz.  When you “tune into” a radio station like “station 98.1,” you’re tuning into 98.1MHz.  You may have noticed that US radio stations always end in odd tenths (98.1, 98.3, 98.5, …).  There are no FM stations ending in even tenths because each station gets 200kHz, or 0.2MHz, so you see 87.9+0.2+0.2+0.2+… and the tenths digit will thus always be odd.  Also worth noting: 98.1 is technically everything from 98.0-98.2; the station is named for the “center frequency” of the transmission.

When you turned old radio knobs, you were generally changing the physical properties of a tuning capacitor, which caused a circuit in your car or home to move the center frequency as you desired.  An electrical engineer designing such a radio needed to know that they were building an antenna for 87.9-107.9MHz, what the modulation characteristics of the signal were, and so on.  They’d design a physical circuit that met those properties and you’d get it in your car or home.  If, instead of FM radio, they were designing for TV, there would be a different set of frequencies and rules to design for.

Nowadays, though, you can buy a single integrated circuit that can listen to a huge range of frequencies and tune the bandwidth and demodulation all through a software interface!  What’s really neat is that you can get the whole thing wrapped into a single USB stick that works with Linux, Windows, or Mac for about $10-50, depending on what you buy and where you buy it.  So that’s what I did!
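
Since the channel layout is just arithmetic, it’s easy to sanity-check.  Here’s a quick sketch (working in kHz integers to sidestep floating-point noise) that generates the US FM center frequencies and confirms the tenths digit is always odd:

```python
# Generate the US FM broadcast center frequencies in kHz.
# The band is split into 200 kHz channels, so the centers run
# 87.9, 88.1, ..., 107.9 MHz.
centers_khz = [87900 + 200 * n for n in range(101)]

# Every center frequency ends in an odd tenths digit (x.1, x.3, ...).
assert all((khz // 100) % 2 == 1 for khz in centers_khz)

print(centers_khz[0] / 1000, centers_khz[-1] / 1000)  # 87.9 107.9
```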

This post is my attempt to recreate a website like marinetraffic.com using a $20 piece of hardware, Elasticsearch, Kibana, and a few hundred lines of code.

Tuning In

I poked around and found a variety of AIS-related projects, including a C project that can directly decode AIS packets from an SDR.  I pulled it down and compiled it to see what it did.  That project publishes the data as a TCP or UDP stream of messages in a very particular format.  You can also have it output to the shell.  For example:

./rtl_ais -n
Edge tuning disabled.
DC filter enabled.
RTL AGC disabled.
Internal AIS decoder enabled.
Buffer size: 163.84 mS
Downsample factor: 64
Low pass: 25000 Hz
Output: 48000 Hz
Found 1 device(s):
0: Realtek, RTL2838UHIDIR, SN: 00000001

Using device 0: Generic RTL2832U OEM
Detached kernel driver
Found Rafael Micro R820T tuner
Log NMEA sentences to console ON
AIS data will be sent to 127.0.0.1 port 10110
Tuner gain set to automatic.
Tuned to 162000000 Hz.
Sampling at 1600000 S/s.
!AIVDM,1,1,,A,15N0;wS002G?MS6E`tCs3V3N0<2D,0*22

The first lines are all the hardware initialization.  The last line is a message from a nearby ship!  It’s not super consumable as-is: the message is in AIVDM format, so I need something to decode that.  By way of this example, we can see in the comma-separated output:

  • The !AIVDM header
  • The number of fragments (in this case, 1)
  • Which fragment number this message is (in this case, 1)
  • A sentence ID for multi-sequence messages (in this case, it’s empty — there’s only 1 sentence)
  • The radio channel code (A in this case — rtl_ais listens to both A and B channels)
  • The data payload (15N0;wS002G?MS6E`tCs3V3N0<2D in this case)
  • A trailing field with the number of fill bits and the checksum (0*22 in this case: 0 fill bits, and a checksum of 22 after the *)
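
The checksum rule, at least, is simple: per NMEA 0183, it’s the XOR of every character between the leading ! and the *, written as two hex digits.  A quick sketch verifies the example sentence above:

```python
def nmea_checksum(sentence: str) -> str:
    """XOR every character between the leading '!' (or '$') and the '*',
    returned as two uppercase hex digits, per NMEA 0183."""
    body = sentence[1:sentence.index('*')]
    csum = 0
    for ch in body:
        csum ^= ord(ch)
    return format(csum, '02X')

sentence = '!AIVDM,1,1,,A,15N0;wS002G?MS6E`tCs3V3N0<2D,0*22'
print(nmea_checksum(sentence))  # 22, matching the trailing *22
```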

That data payload thing is probably the most interesting part of the message.  In order to process more complex messages, we’ll need to do something that can parse fragment numbers and combine them together, but we can get started on writing a parser with just this.  According to the spec:

The data payload is an ASCII-encoded bit vector. Each character represents six bits of data. To recover the six bits, subtract 48 from the ASCII character value; if the result is greater than 40 subtract 8. According to [IEC-PAS], the valid ASCII characters for this encoding begin with “0” (64) and end with “w” (87); however, the intermediate range “X” (88) to “_” (95) is not used.
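
That armoring rule is only a few lines of Python.  As a sketch (with field offsets from the spec: bits 0-5 are the message type and bits 8-37 are the MMSI), here’s enough of a decoder to pull those two fields out of our example payload by hand:

```python
def sixbit(ch: str) -> int:
    """Recover six bits from one payload character, per the spec:
    subtract 48 from the ASCII value; if the result is > 40, subtract 8."""
    v = ord(ch) - 48
    if v > 40:
        v -= 8
    return v

def payload_bits(payload: str) -> str:
    """Unpack an armored payload into a bit string."""
    return ''.join(format(sixbit(ch), '06b') for ch in payload)

bits = payload_bits('15N0;wS002G?MS6E`tCs3V3N0<2D')
msg_type = int(bits[0:6], 2)   # bits 0-5: message type
mmsi = int(bits[8:38], 2)      # bits 8-37: the ship's MMSI

print(msg_type, mmsi)  # 1 367004670
```

Message type 1 is a position report, and the MMSI matches the full decode below.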

Fortunately, there’s even an existing python library for doing AIS message decoding.  A quick test with this message shows:

$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170124] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ais
>>> from pprint import pprint
>>> pprint(ais.decode('15N0;wS002G?MS6E`tCs3V3N0<2D',0))
{u'cog': 283.0,
u'id': 1L,
u'mmsi': 367004670L,
u'nav_status': 3L,
u'position_accuracy': 0L,
u'raim': False,
u'received_stations': 148L,
u'repeat_indicator': 0L,
u'rot': 0.0,
u'rot_over_range': False,
u'slot_timeout': 3L,
u'sog': 0.20000000298023224,
u'spare': 0L,
u'special_manoeuvre': 0L,
u'sync_state': 0L,
u'timestamp': 47L,
u'true_heading': 193L,
u'x': -122.45146166666666,
u'y': 37.818158333333336}

There are a lot of ways to get that data into a consumable location.  You can run one of a variety of applications that can show you the live output.  I like Elasticsearch, so I wanted to index my data there and I didn’t see any projects that did.  The best way would probably be to modify one of these programs to output to Elasticsearch instead of UDP packets or wherever else they shovel data, but I figured it would be easy to just write a little UDP server that listens to port 10110 (the default rtl_ais delivery port), formats the messages, and ships (get it, ships?) them off.  First, we need to see how rtl_ais sends the data.  I could read the source code, but it’s probably faster for me to write a simple UDP server in Python that listens for a buffer and prints what we get:

import socketserver
import ais
from pprint import pprint

class AISHandler(socketserver.DatagramRequestHandler):
  def handle(self):
    data = self.rfile.readline()
    pprint(data)

if __name__ == "__main__":
  server = socketserver.UDPServer(('', 10110), AISHandler)
  server.serve_forever()

Which we can run and see:

$ python3 rtl_ais_server.py
b'!AIVDM,1,1,,A,15MvK40P1go?WQ`E`eL<Ngwh28Pu,0*4F\r\n'
b'!AIVDM,1,1,,B,15Ngp6PPCpG?bs<E`oN:v8Op0hQp,0*1E\r\n'

So basically, rtl_ais sends the raw text followed by a carriage return and newline.  That’s easy enough.  I just need to keep track of multi-packet sequences, decode them, format them how I want, and send them off to Elasticsearch so we can visualize them in Kibana.  I’ll skip past the boring interim code and give you the final result.  The one tricky bit is keeping track of multi-sentence messages.  There are a few situations that complicate multi-sentence messages:

  • At least in theory, I suppose fragments could arrive out of order.  I don’t know if that happens in practice or not: I haven’t seen it yet.  The only explanations I can come up with are faulty encoding/decoding software producing non-sequential packets, or a message traveling backwards in time (cue spooky ghost ship music).
  • It’s likely that you’ll receive only parts of messages, and that certainly needs to be accounted for.  For example, maybe my antenna hears sentences 1, 2, and 4 of 4 total sentences.
  • It’s possible you’ll receive messages from different AIS radios broadcast in an interleaved message set.  That is, if you label boats 1 and 2 as B1 and B2, and sentences 1 and 2 as S1 and S2, it’s possible you’ll receive B1S1, B2S1, B1S2, B2S2.  Because this is rare, some software I’ve seen ignores it.  Not me though!
  • Ships report different data components at different times.  For example, a ship may report its position and heading in one packet and it may report its destination and ETA in a different packet minutes later.  If we want to associate all of the data surrounding a ship, we’ll have to keep track of some kind of “ship” record that will need updating.

A simple solution for the second and third of these is to keep a relatively small queue of partial messages keyed by their sentence ID, and to evict messages from the queue if it grows too large.  Python has a convenient OrderedDict class that shares some properties of a queue (namely, you can pop the oldest item in FIFO fashion with popitem(last=False)).  For the fourth, since I’m storing the data in Elasticsearch, I can use doc_as_upsert to merge ship information together into a single ship object.  Altogether, the code looks like this:

import socketserver
import ais
from elasticsearch import Elasticsearch
from datetime import datetime
from collections import OrderedDict
from pprint import pprint

class AISHandler(socketserver.DatagramRequestHandler):
  _messagemap = OrderedDict()
  _es = Elasticsearch()

  def indexDoc(self, doc):
    timestamp = datetime.utcnow()
    dmy = timestamp.strftime('%Y-%m-%d')
    body = doc
    body['curtime'] = timestamp
    print(body)
    if ('x' in body and 'y' in body):
      body['location'] = {'lon': body['x'], 'lat': body['y']}
    self._es.index(index="ais-" + dmy, doc_type="_doc", body=body)
    self._es.update(index='ships-seen',doc_type='_doc',id=body['mmsi'],
                body={ "doc": body, "doc_as_upsert": True })

  def handle(self):
    data = self.rfile.readline()
    while data:
      data = data.strip().decode("utf-8")
      ais_arr = data.split(',')
      num_fragments = int(ais_arr[1])
      fragment_number = int(ais_arr[2])
      sentence_id = ais_arr[3]
      channel = ais_arr[4]
      data_payload = ais_arr[5]
      # the final field is "<fill bits>*<checksum>"; ais.decode needs the fill-bit count
      fill_bits = int(ais_arr[6].split('*')[0])
      if num_fragments == 1:
        # this is a single-fragment payload, so we can decode it straight away
        decoded_message = ais.decode(data_payload, fill_bits)
        self.indexDoc(doc=decoded_message)
      elif fragment_number < num_fragments:
        # this is a multi-fragment payload and we haven't yet received them all.  Add this to any existing sentences
        if sentence_id in self._messagemap:
          self._messagemap[sentence_id] = self._messagemap[sentence_id] + data_payload
        else:
          # if we have too many items in the queue, evict the oldest (FIFO)
          if (len(self._messagemap) > 100):
            self._messagemap.popitem(last=False)
          self._messagemap[sentence_id] = data_payload
      else:
        # we've hit the end of the multi-fragment data; skip it if we missed
        # the earlier fragments
        if sentence_id in self._messagemap:
          self._messagemap[sentence_id] = self._messagemap[sentence_id] + data_payload
          decoded_message = ais.decode(self._messagemap[sentence_id], fill_bits)
          self._messagemap.pop(sentence_id, None)
          self.indexDoc(doc=decoded_message)
      data = self.rfile.readline()


if __name__ == "__main__":
  server = socketserver.UDPServer(('', 10110), AISHandler)
  server.serve_forever()

Before we run this program, we first need to set up our Elasticsearch index template.  I don’t know a lot about all of the message types the AIS system can send, but fortunately I didn’t need to.  All of the fields I’ve seen are interpreted correctly by Elasticsearch’s dynamic mapping, other than the location (geo_point needs to be manually defined), but we should probably define at least 3 key ones just in case:

PUT /_template/ais
{
  "index_patterns": ["ais-*", "ships-*"],
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        },
        "location": {
          "type": "geo_point"
        },
        "curtime": {
          "type": "date"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1
  }
}
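
If you’d rather apply the template from Python (say, at server startup) than paste it into the Kibana Console, the same body works with the elasticsearch client.  A sketch (apply_template is a hypothetical helper, not part of the server code):

```python
# The same template as above, expressed as a Python dict.
AIS_TEMPLATE = {
    "index_patterns": ["ais-*", "ships-*"],
    "mappings": {
        "_doc": {
            "properties": {
                "name": {"type": "text"},
                "location": {"type": "geo_point"},
                "curtime": {"type": "date"},
            }
        }
    },
    "settings": {"number_of_shards": 1},
}

def apply_template(es):
    """Apply the template with an elasticsearch-py client instance.

    Safe to call on every startup: putting the same template
    again just overwrites it.
    """
    es.indices.put_template(name="ais", body=AIS_TEMPLATE)
```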

Once I start up rtl_ais_server.py I see some documents flow into Elasticsearch:

{
  "_index": "ais-2018-08-17",
  "_type": "_doc",
  "_id": "1-oPRmUBdPKlebHLixCO",
  "_version": 1,
  "_score": 2,
  "_source": {
    "timestamp": 61,
    "raim": false,
    "assigned_mode": false,
    "aton_status": 0,
    "y": 37.80078833333334,
    "off_pos": false,
    "fix_type": 7,
    "x": -122.375045,
    "position_accuracy": 0,
    "dim_d": 0,
    "id": 21,
    "aton_type": 19,
    "spare": 0,
    "dim_b": 0,
    "location": {
      "lon": -122.375045,
      "lat": 37.80078833333334
    },
    "repeat_indicator": 0,
    "curtime": "2018-08-17T04:05:47.615296",
    "virtual_aton": true,
    "mmsi": 993692027,
    "dim_a": 0,
    "dim_c": 0,
    "name": "SF OAK BAY BR VAIS D@"
  },
  "fields": {
    "curtime": [
      "2018-08-17T04:05:47.615Z"
    ]
  }
}
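
As a quick sanity check that the documents are queryable, this is the shape of query I can run against ships-seen to pull every ship with a known location that has reported recently (a sketch; latest_ships is a hypothetical helper, not part of the server code):

```python
# Find every ship with a known location that reported in the last 15 minutes.
RECENT_SHIPS_QUERY = {
    "query": {
        "bool": {
            "must": [
                {"exists": {"field": "location"}},
                {"range": {"curtime": {"gte": "now-15m"}}},
            ]
        }
    },
    "size": 1000,
}

def latest_ships(es):
    """Run the query with an elasticsearch-py client instance."""
    return es.search(index="ships-seen", body=RECENT_SHIPS_QUERY)
```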

The last step I wanted to do here is to build a visualization.  There are a few ways to do this, but what I thought would be interesting was to try my hand at a Vega visualization, which was released in version 6.2 of Kibana.  Coming into Vega cold (clearly I’m really not in tune with front-end frameworks these days!), a few things became clear to me after working on this:

  • The Vega declaration syntax is definitely not an easy thing to get the hang of!
  • Vega is super powerful!
  • Vega is super difficult to debug if you don’t understand the structure of how visualizations are executed.
  • There seem to be a variety of limitations.  For example, it appears that most of the marks are not rotatable, which I ran into (so I just chose a UTF-8 character that looks vaguely like a ship).  I’ve also had some difficulty with things like changing the cursor on hover, and I’m not sure what it takes to have non-system tooltips.  I think you can probably attach the visibility of some text to another mark, but I haven’t spent much time trying to figure that out yet.

Anyway, after a lot of fiddling, the following works.  It seems that text is the only rotatable “mark” right now, so I had to choose a character that kind of looks like I imagined a ship on a map to look.

{
   "$schema": "https://vega.github.io/schema/vega/v3.0.json",
   "config": {
     "kibana": {
       "type": "map",
       "latitude": 37.77,
       "longitude": -122.45,
       "zoom": 12,
       "mapStyle": "default",
       "minZoom": 5,
       "maxZoom": 17,
       "zoomControl": true,
       "delayRepaint": true,
       "controlsDirection": "horizontal"
     }
   },
   "data": [
     {
       "name": "ships",
       "url": {
         "index": "ships-*",
         "body": {
           "query": {
            "bool": {
              "must": [
                { "exists": { "field": "location" } },
                { "exists": { "field": "mmsi" } },
                { "range": {"curtime": {"%timefilter%": true}} }
              ]
            }
           },
           "size": 10000
         }
        },
        "format": { "type": "json", "property": "hits.hits" },
       "transform": [
          {
            "type": "geopoint",
            "projection": "projection",
            "fields": [
              {"expr": "datum._source.location.lon"},
              {"expr": "datum._source.location.lat"}
            ],
            "as": ["x", "y"]
          },
          {
            "type": "formula",
             "expr": "if (datum._source.dim_a != null && datum._source.dim_b != null && datum._source.dim_c != null, log(1 + datum._source.dim_a + datum._source.dim_b + datum._source.dim_c) * 10, 100)",
             "as": "shipsize"
          },
          {
            "type": "formula",
             "expr": "datum._source.mmsi % 20",
             "as": "shipcolor"
          },
           {
             "type": "formula",
             "expr": "if (datum._source.name != null, datum._source.name, 'Unnamed')",
             "as": "shipname"
           },
           {
             "type": "formula",
             "expr": "if (datum._source.true_heading != null, datum._source.true_heading, 0)",
             "as": "heading"
           }
        ]
      }
    ],
    "scales": [
    {
      "name": "shipcolorscale",
      "type": "ordinal",
      "domain": {"data": "ships", "field": "shipcolor"},
      "range": { "scheme": "category20" }
    }
  ],
    "marks": [
      {
          "type": "text",
          "from": {"data": "ships"},
          "encode": {
            "update": {
              "x": {"signal": "datum.x"},
              "y": {"signal": "datum.y"},
              "text": { "value": "⏏" },
              "fontSize": { "signal": "datum.shipsize" },
              "fill": { "scale": "shipcolorscale", "field": "shipcolor" },
              "align": { "value": "center" },
              "fontWeight": { "value": "bold" },
              "angle": { "signal": "datum.heading" },
              "tooltip": { "signal": "datum.shipname" }
            },
            "hover": {
              "fill": {"value": "red"},
              "tooltip": { "signal": "datum.shipname" }
            }
          }
      }
    ]
}

And it looks pretty good!  On hovering, I can see the S.S. Jeremiah O’Brien is safely docked in the bay.

And I can also see that one of our pilot boats is heading out to sea to guide a ship in.

A couple notes about the visualization:

  • I’ve sized the icons by the size of the ship.  Big ship = big icon.
  • The colors of the ships are a hash of the ship’s ID (MMSI) modulo 20, and clearly there are some collisions in such a small space.  I couldn’t find an easy way to derive a deterministic color from a ship’s MMSI in a larger space than 20; I think I’m just missing something though.
  • This uses the ships-seen index, which only keeps the most recent position/heading of the ship due to the upsert.
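
On that second point: outside of Vega, one way to get a deterministic color per MMSI from a much larger space is to hash the MMSI into RGB, computed at index time and stored on the ship document.  A sketch (mmsi_color is my own invention, not a field the code above produces):

```python
import hashlib

def mmsi_color(mmsi: int) -> str:
    """Deterministically map an MMSI to a hex RGB color by hashing it."""
    digest = hashlib.sha1(str(mmsi).encode("utf-8")).digest()
    # Use the first three bytes of the hash as R, G, B.
    return "#{:02x}{:02x}{:02x}".format(digest[0], digest[1], digest[2])

print(mmsi_color(367004670))  # the same MMSI always yields the same color
```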

A visualization of ais-* shows all of the events from incoming and outgoing vessels, which is also interesting because it lets you see the trails the boats leave behind.

I’m interested in exploring further, both in the direction of other types of digital radio data signals and in entirely different kinds of signals.

That about wraps it up.  You can grab all the code for this project on GitHub.  Sea you next time!