Radio Projects

Tracking Nearby Ships with SDR, Elasticsearch, Kibana, and Vega Visualizations


A year ago, I moved nearer to Golden Gate Bridge and seeing the ships come in made me wonder how they’re tracked.  I had seen websites like this one, and was curious how they worked.  It turns out the key is in the title of the page “AIS Marine Traffic.”  The traffic comes from the Automatic Identification System, or AIS.  Ships broadcast their position on a frequency around 162MHz.  You can go out and buy a direct receiver and decoder of this data, but I recently had renewed interest in software-defined radio and was looking to do a project with it.  If you haven’t heard of it before, it’s super neat!  The idea is this: historically in many cases, somebody working with radio signals has needed to build a specific antenna and a specific modulator/demodulator in hardware.  Take, for example, a traditional car radio that can pick up AM and FM radio signals:

RF Bands

Image source: GSU

As shown here, FM radio is broken into 200kHz bands, starting at 88.1 and going to 108.1 MHz in the United States.  When you “tune into” a radio like “station 98.1,” you’re tuning into 98.1MHz.  You may have noticed that radio stations are increments of 0.2 (98.1, 98.3, 98.5, …) in the US — there are no FM radio stations that end in even numbers because each station gets 200kHz, or 0.2MHz, so you see 98.1+0.2+0.2+0.2+… and the frequencies will thus always be odd.  Also worth noting: 98.1 is technically more like everything from 98.0-98.2 and they’re just talking about the “center frequency” of the transmission.  When you turned old radio knobs, you were generally changing the electro/physical properties of a tuning capacitor, which caused a circuit in your car or home to move the center frequency as you desired.  An electrical engineer that was designing such a radio and circuit needed to know that they were designing an antenna for 88.1-108.1MHz, what the modulation characteristics were of the signal, etc.  They’d design a physical circuit that met those properties and you’d get it in your car or home.  If, instead of FM radio, they were designing for TV, there would be a different set of frequencies and rules they’d design for.  Nowadays, though, you can buy a single integrated circuit that can listen to a huge range of frequencies and tune the bandwidth and demodulation all through a software interface!  What’s really neat is that you can get the whole thing wrapped into a single USB stick that works for Linux, Windows, or Mac for about $10-50, depending on what you buy and where you buy it.  So that’s what I did!

This post is my attempt to recreate a website like using a $20 piece of hardware, Elasticsearch, Kibana, and a few hundred lines of code.

Tuning In

I poked around and found a variety of AIS-related projects, including a C project that can directly decode AIS packets from a SDR.  I pulled that down and compiled it to see what it did.  That project publishes the data as a TCP or UDP stream of messages with a very special formatting.  You can also get it to output to the shell.  For example:

./rtl_ais -n
Edge tuning disabled.
DC filter enabled.
RTL AGC disabled.
Internal AIS decoder enabled.
Buffer size: 163.84 mS
Downsample factor: 64
Low pass: 25000 Hz
Output: 48000 Hz
Found 1 device(s):
0: Realtek, RTL2838UHIDIR, SN: 00000001

Using device 0: Generic RTL2832U OEM
Detached kernel driver
Found Rafael Micro R820T tuner
Log NMEA sentences to console ON
AIS data will be sent to port 10110
Tuner gain set to automatic.
Tuned to 162000000 Hz.
Sampling at 1600000 S/s.

The first lines are all the hardware initialization.  The last line is a message from a nearby ship!  It’s not super consumable as-is: the message is in AIVDM format, so I need something to decode that.  By way of this example, we can see in the comma-separated output:

  • The !AIVDM header
  • The number of fragments (in this case, 1)
  • Which fragment number this message is (in this case, 1)
  • A sentence ID for multi-sequence messages (in this case, it’s empty — there’s only 1 sentence)
  • The radio channel code (A in this case — rtl_ais listens to both A and B channels)
  • The data payload (15N0;wS002G?MS6E`tCs3V3N0<2D in this case)
  • A checksum, which there are some rules around what the checksum looks like (0*22 in this case)

That data payload thing is probably the most interesting part of the message.  In order to process more complex messages, we’ll need to do something that can parse fragment numbers and combine them together, but we can get started on writing a parser with just this.  According to the spec:

The data payload is an ASCII-encoded bit vector. Each character represents six bits of data. To recover the six bits, subtract 48 from the ASCII character value; if the result is greater than 40 subtract 8. According to [IEC-PAS], the valid ASCII characters for this encoding begin with “0” (64) and end with “w” (87); however, the intermediate range “X” (88) to “\_” (95) is not used.

Fortunately, there’s even an existing python library for doing AIS message decoding.  A quick test with this message shows:

$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170124] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ais
>>> from pprint import pprint
>>> pprint(ais.decode('15N0;wS002G?MS6E`tCs3V3N0<2D',0))
{u'cog': 283.0,
u'id': 1L,
u'mmsi': 367004670L,
u'nav_status': 3L,
u'position_accuracy': 0L,
u'raim': False,
u'received_stations': 148L,
u'repeat_indicator': 0L,
u'rot': 0.0,
u'rot_over_range': False,
u'slot_timeout': 3L,
u'sog': 0.20000000298023224,
u'spare': 0L,
u'special_manoeuvre': 0L,
u'sync_state': 0L,
u'timestamp': 47L,
u'true_heading': 193L,
u'x': -122.45146166666666,
u'y': 37.818158333333336}

There are a lot of ways to get that data into a consumable location.  You can run one of a variety of applications that can show you the live output.  I like Elasticsearch, so I wanted to index my data there and I didn’t see any projects that did.  The best way would probably be to modify one of these programs to output to Elasticsearch instead of UDP packets or wherever else they shovel data, but I figured it would be easy to just write a little UDP server that listens to port 10110 (the default rtl_ais delivery port), formats the messages, and ships (get it, ships?) them off.  First, we need to see how rtl_ais sends the data.  I could read the source code, but it’s probably faster for me to write a simple UDP server in Python that listens for a buffer and prints what we get:

import socketserver
import ais
from pprint import pprint

class AISHandler(socketserver.DatagramRequestHandler):
  def handle(self):
    data = self.rfile.readline()

if __name__ == "__main__":
  server = socketserver.UDPServer(('', 10110), AISHandler)

Which we can run and see:

$ python3

So basically, rtl_ais sends the raw text followed by a carriage return and newline.  That’s easy enough.  I just need to keep track of multi-packet sequences, decode them, format them how I want, and send them off to Elasticsearch so we can visualize them in Kibana.  I’ll skip past the boring interim code and give you the final result.  The one tricky bit is keeping track of multi-sentence messages.  There are a few situations that complicate multi-sentence messages:

  • At least in theory, I suppose a message could arrive out of order.  I don’t know if that happens in practice or not: I haven’t seen it yet.  The only explanations I could see why they would is that faulty encoding/decoding software results in non-sequential packets.  Or a message traveling backwards in time (queue spooky ghost ship music).
  • It’s likely that you’ll receive only parts of messages, and that certainly needs to be accounted for.  For example, maybe my antenna hears sentences 1, 2, and 4 of 4 total sentences.
  • It’s possible you’ll receive messages from different AIS radios broadcasted in an interleaved message set.  That is, if you have boats 1 and 2 as B1, B2 and you have sentences 1 and 2 as S1 and S2, it’s possible you’ll receive B1S1,B2S1,B1S2,B2S2.  Because this is rare, some software I’ve seen ignores this.  Not me though!
  • Ships report different data components at different times.  For example, a ship may report its position and heading in one packet and it may report its destination and ETA in a different packet minutes later.  If we want to associate all of the data surrounding a ship, we’ll have to keep track of some kind of “ship” record that will need updating.

A simple solution for the second and third of these is just to assume you have some relatively small queue of messages with their sentence ID attached.  Evict messages from the queue if you get too many in order to keep from a huge queue.  Python has a convenient OrderedDict class that shares some properties of a queue (namely, that you can evict/pop items at will in a FIFO fashion).  For the third, since I’m storing the data in Elasticsearch, I can use doc_as_upsert to merge ship information together into a single ship object.  Altogether, the code looks like this:

import socketserver
import ais
from elasticsearch import Elasticsearch
from datetime import datetime
from collections import OrderedDict
from pprint import pprint

class AISHandler(socketserver.DatagramRequestHandler):
  _messagemap = OrderedDict()
  _es = Elasticsearch()

  def indexDoc(self, doc):
    timestamp = datetime.utcnow()
    dmy = timestamp.strftime('%Y-%m-%d')
    body = doc
    body['curtime'] = timestamp
    if ('x' in body and 'y' in body):
      body['location'] = {'lon': body['x'], 'lat': body['y']}
    self._es.index(index="ais-" + dmy, doc_type="_doc", body=body)
                body={ "doc": body, "doc_as_upsert": True })

  def handle(self):
    data = self.rfile.readline()
    while data:
      data = data.strip().decode("utf-8")
      ais_arr = data.split(',')
      num_fragments = int(ais_arr[1])
      fragment_number = int(ais_arr[2])
      sentence_id = ais_arr[3]
      channel = ais_arr[4]
      data_payload = ais_arr[5]
      if num_fragments == 1:
        # this is a single-fragment payload, so we can decode it straight away
        decoded_message = ais.decode(data_payload,0)
      elif fragment_number < num_fragments:
        # this is a multi-fragment payload and we haven't yet received them all.  Add this to any existing sentences
        if sentence_id in self._messagemap:
          self._messagemap[sentence_id] = self._messagemap[sentence_id] + data_payload
          # if we have too many items in queue, we'll force one out
          if (len(self._messagemap.keys()) > 100):
          self._messagemap[sentence_id] = data_payload
        # we've hit the end of the multi-fragment data
        self._messagemap[sentence_id] = self._messagemap[sentence_id] + data_payload
        decoded_message = ais.decode(self._messagemap[sentence_id], num_fragments)
        self._messagemap.pop(sentence_id, None)
      data = self.rfile.readline()

if __name__ == "__main__":
  server = socketserver.UDPServer(('', 10110), AISHandler)

Before we run this program, it does require first setting up our Elasticsearch index templates.  I don’t know a lot about all of the messages types that the AIS system could send, but fortunately I didn’t need to.  All of the fields I’ve seen seem to be interpreted correctly via Elasticsearch’s dynamic mapping other than the location (geo_point needs to be manually defined), but we should probably define at least 3 key ones just in case:

PUT /_template/ais
  "index_patterns": ["ais-*", "ships-*"],
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        "location": {
          "type": "geo_point"
        "curtime": {
          "type": "date"
  "settings": {
    "number_of_shards": 1

Once I start up I see some documents flow into Elasticsearch:

  "_index": "ais-2018-08-17",
  "_type": "_doc",
  "_id": "1-oPRmUBdPKlebHLixCO",
  "_version": 1,
  "_score": 2,
  "_source": {
    "timestamp": 61,
    "raim": false,
    "assigned_mode": false,
    "aton_status": 0,
    "y": 37.80078833333334,
    "off_pos": false,
    "fix_type": 7,
    "x": -122.375045,
    "position_accuracy": 0,
    "dim_d": 0,
    "id": 21,
    "aton_type": 19,
    "spare": 0,
    "dim_b": 0,
    "location": {
      "lon": -122.375045,
      "lat": 37.80078833333334
    "repeat_indicator": 0,
    "curtime": "2018-08-17T04:05:47.615296",
    "virtual_aton": true,
    "mmsi": 993692027,
    "dim_a": 0,
    "dim_c": 0,
    "name": "SF OAK BAY BR VAIS D@"
  "fields": {
    "curtime": [

The last step I wanted to do here is to build a visualization.  There are a few ways to do this, but what I thought would be interesting was to try my hand at a Vega visualization, which was released in version 6.2 of Kibana.  Coming into vega cold (clearly I’m really not in tune with front-end frameworks these days!), a few things became clearer to me after working on this:

  • The Vega declaration syntax is definitely not an easy thing to get a hang of!
  • Vega is super powerful!
  • Vega is super difficult to debug if you don’t understand the structure of how visualizations are executed.
  • There seem to be a variety of limitations.  For example, it appears that most of the marks it has are not rotatable, which I experienced (so I just chose a UTF8 character that).  I’ve also had some difficulties with actually doing things like changing the cursors on hover, and I’m not sure what it actually takes to have non-system type tooltips.  I think you can probably attach visibility of some text to another mark.  But I’ve spent almost no time so far trying to figure out why yet.

Anyway, after a lot of fiddling, the following works.  It seems that text is the only rotatable “mark” right now, so I had to choose a character that kind of looks like I imagined a ship on a map to look.

   "$schema": "",
   "config": {
     "kibana": {
       "type": "map",
       "latitude": 37.77,
       "longitude": -122.45,
       "zoom": 12,
       "mapStyle": "default",
       "minZoom": 5,
       "maxZoom": 17,
       "zoomControl": true,
       "delayRepaint": true,
       "controlsDirection": "horizontal"
   "data": [
       "name": "ships",
       "url": {
         "index": "ships-*",
         "body": {
           "query": {
            "bool": {
              "must": [
                { "exists": { "field": "location" } },
                { "exists": { "field": "mmsi" } },
                { "range": {"curtime": {"%timefilter%": true}} }
           "size": 10000
       "format": { "type": "json", "property": "hits.hits" },
       "transform": [
            "type": "geopoint",
            "projection": "projection",
            "fields": [
              {"expr": "datum._source.location.lon"},
              {"expr": ""}
            "as": ["x", "y"]
            "type": "formula",
            "expr": "if (datum._source.dim_a != null && datum._source.dim_b != null && datum._source.dim_c != null, log(1 + datum._source.dim_a + datum._source.dim_b + datum._source.dim_c) * 10, 100)"
            "as": "shipsize"
            "type": "formula",
            "expr": "datum._source.mmsi % 20"
            "as": "shipcolor"
            type: "formula",
            expr: "if ( != null,, 'Unnamed')",
            as: "shipname"
            type: "formula",
            expr: "if (datum._source.true_heading != null, datum._source.true_heading, 'Unnamed')",
            as: "heading"
   scales: [
      "name": "shipcolorscale",
      "type": "ordinal",
      "domain": {"data": "ships", "field": "shipcolor"},
      "range": { "scheme": "category20" }
    "marks": [
          "type": "text",
          "from": {"data": "ships"},
          "encode": {
            "update": {
              "x": {"signal": "datum.x"},
              "y": {"signal": "datum.y"},
              "text": { "value": "⏏" },
              "fontSize": { "signal": "datum.shipsize" },
              "fill": { "scale": "shipcolorscale", "field": "shipcolor" },
              "align": { "value": "center" },
              "fontWeight": { "value": "bold" },
              "angle": { "signal": "datum.heading" },
              "tooltip": { "signal": "datum.shipname" },
            "hover": {
              "fill": {"value": "red"},
              "tooltip": { "signal": "datum.shipname" }

And it looks pretty good!  On hovering, I can see the S.S. Jeremiah O’Brien is safely docked in the bay.

And I can also see that one of our pilot boats is heading out to sea to guide a ship in.

A couple notes about the visualization:

  • I’ve sized the icons by the size of the ship.  Big ship = big icon.
  • The colors of the ships are a hash of the ship’s ID (MMSI) modulo 20, and clearly there are some collisions in such a small space.  I couldn’t find an easy way to do a kind of deterministic color from a ship’s MSSI in a larger space than 20 — I think I’m just missing something though.
  • This uses the ships-seen index, which only keeps the most recent position/heading of the ship due to the upsert.

A visualization of ais-* shows us all of the events of the incoming/outgoing vessels, which is also interesting, because it allows you to see the sort of trail of the boats.

I’m both interested further in the direction of other types of digital data radio signals as well as different types of signals wholesale.

That about wraps it up.  You can grab all the code for this project on GitHub.  Sea you next time!

Antenna Design Genetic Algorithm


For certain classes of antennas, e.g. Yagi-Uda antennas, the design characteristics have no known “best case” numeric values.  That is, if you want to design a Yagi-Uda antenna for a particular frequency, there is no known numeric solution for the width of the dipoles, number of dipoles, and distance between each dipole in order to achieve the highest gain.  Instead, people rely on experimental evidence: the designs of a number of common frequencies have been tested in the field to produce certain amount of gain, so if you know what frequency you’re looking to design for, you go look up the tables based upon what others have tested for.  If you’re looking for an entirely unique frequency, you have to go experiment yourself.

New Approach

I wrote a MatLab program to use a genetic algorithm to modify the parameters of a antenna and eventually “give birth” to a “best known case” antenna based upon forward gain, etc. Currently, it uses NEC2 (Numerical Electromagnetics Code 2) as the processing engine. It writes out a text file to disk, then calls NEC2 to process that file.  This allows us to try a number of unknown antenna designs and permute possible solutions.  It can be run in a distributed fashion, with each machine “phoning home” to a central database which then redistributes the best-known designs to the worker machines.

The output is something like the following, which shows the forward gain of a given design through a set of frequencies

Output from a forward gain analysis


Genetic Algorithm:

function [beta,stopcode]=ga(funstr,parspace,options,p1,p2,p3,p4,p5,p6,p7,p8,p9)
% Genetic Algorithm for function maximization.
%  beta       = (1 x K) parameter vector maximizing funstr
%  stopcode   = code for terminating condition
%                == 1 if terminated normally
%                == 2 if maximum number of iterations exceeded
%  funstr     = name of function to be maximized (string).
%  parspace   = (2 x K) matrix is [min; max] of parameter space dimensions
%               or, if (3 x K), then bottom row is a good starting value
%  options    = vector of option settings
%  p1,p2,...,p9 are optional parameters to be passed to funstr
% where:
% options(1) = m (size of generation, must be even integer)
% options(2) = eta (crossover rate in (0,1); use Booker's VCO if < 0)
% options(3) = gamma (mutation rate in (0,1))
% options(4) = printcnt (print status once every printcnt iterations)
%                Set printcnt to zero to suppress printout.
% options(5) = maxiter (maximum number of iterations)
% options(6) = stopiter (minimum number of gains < epsln before stop)
% options(7) = epsln (smallest gain worth recognizing)
% options(8) = rplcbest (every rplcbest iterations, insert best-so-far)
% options(9) = 1 if function is vectorized (i.e., if the function
%                can simultaneously evaluate many parameter vectors).
%    Default option settings: [20,-1,0.12,10,20000,2000,1e-4,50,0]
% Note: 
%    The function is maximized with respect to its first parameter,
%    which is expressed as a row vector.
%    Example: 
%      Say we want to maximize function f with respect to vector p,
%      and need also to pass to f data matrices x,y,z.  Then,
%      write the function f so it is called as f(p,x,y,z).  GA will
%      assume that p is a row vector.

months = ['Jan';'Feb';'Mar';'Apr';'May';'Jun';...

if nargin>2
   if isempty(options)
m=options(1); eta=options(2); gam=options(3);
stopiter=options(6); epsln=options(7);

% Use Booker's VCO if eta==-1
vco=(eta<0);  eta=abs(eta);

% Cancel rplcbest if <=0
if rplcbest<=0, rplcbest=maxiter+1; end


% Draw initial Generation
if b0rows>0
  parspace=parspace([1 2],:);

% Initial 'best' holders
bestfun=-Inf; beta=zeros(1,K);

% Score for each of m vectors

% Setup function string for evaluations
evalstr = [funstr,'(G'];
if ~vecfun
        evalstr=[evalstr, '(i,:)'];
if nargin>3, evalstr=[evalstr,paramstr(1:3*(nargin-3))]; end
evalstr = [evalstr, ')'];

% Print header
if printcnt>0
   disp(['Maximization of function ',funstr])
   disp('i      = Current generation')
   disp('best_i = Best function value in generation i')
   disp('best   = Best function value so far')
   disp('miss   = Number of generations since last hit')
   disp('psi    = Proportion of unique genomes in generation')
   disp(sprintf(['\n',blanks(20),'i     best_i        best     miss   psi']))

iter=0;  stopcode=0;
oldpsi=1;  % for VCO option
while stopcode==0
   % Call function for each vector in G
   if vecfun
     for i=1:m
   bf=max([bf0 bestfun]);
   if fgain>epsln
   if fgain>0
   if printcnt>0 & rem(iter,printcnt)==1
        ckhr=int2str(ck(4)+100);  ckday=int2str(ck(3)+100);
        ckmin=int2str(ck(5)+100); cksec=int2str(ck(6)+100);
        timestamp=[ckday(2:3),months(ck(2),:),' ',...
           ckhr(2:3),':',ckmin(2:3),':',cksec(2:3),' '];
        disp([timestamp,sprintf('%6.0f %8.5e %8.5e %5.0f %5.3f',...
                [iter bf0 bestfun inarow psi])])
        save gabest beta timestamp iter funstr
   % Reproduction
   r=rand(1,m); r=sum(r(ones(m,1),:)>pcum(:,ones(1,m)))+1;
   % Crossover
   if vco
        eta=max([0.2 min([1,eta-psi+oldpsi])]);
   if y>0
     % choose crossover point
     for i=1:y
   % Mutation
   % Once every rplcbest iterations, re-insert best beta
   if rem(iter,rplcbest)==0

if printcnt>0
   if stopcode==1
        disp(sprintf('GA: No improvement in %5.0f generations.\n',stopiter))
        disp(sprintf('GA: Maximum number of iterations exceeded.\n'))
% end of GA.M
function [gain_t]=Yagi(p)
% This program is used with a getic optimization code.
% It creates an NEC input file given the parameters for a 3 element YAGI
% antenna.  Then, it runs NEC and reads in the parameters(Gain, Impedance, etc) generated by NEC.
%=========Create NEC input file========================================
D=0.0085; % diamter of elements in wavelengths
Lr=p(1); %perimeter of reflector
Ls=p(2); %perimeter of driven element
Ld=p(3); %perimeter of director
Sr=-p(4); %location of reflector
Sd=p(5);  %location of director
%    Geometry input for NEC

rsl = Lr / 4; %reflector single side length: square loop, one side = permiter / 4
ssl = Ls / 4; %driven single side length: square loop, one side = permiter / 4
dsl = Lr / 4; %director single side length: square loop, one side = permiter / 4
num_segments_per_side = 7;

fprintf(FID_nec,strcat('CM UDA-YAGI SQUARE LOOP ANTENNA','\n'));
fprintf(FID_nec,strcat('CE FIle Generated by MatLab','\n'));
fprintf(FID_nec,'GW %3i %3i %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f\n',1,num_segments_per_side,Sr,0,-Lr/2,Sr,0,Lr/2,R); %Reflector
fprintf(FID_nec,'GW %3i %3i %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f\n',2,num_segments_per_side,0 ,0,-Ls/2, 0,0,Ls/2,R); %Driven Element
fprintf(FID_nec,'GW %3i %3i %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f\n',3,num_segments_per_side,Sd,0,-Ld/2,Sd,0,Ld/2,R); %Director
%    Program Control Commands for NEC
fprintf(FID_nec,'EX %3i %3i %3i %3i %8.4f %8.4f\n',0,2,4,0,1,0);%Exitation Command wire 2 segment 4
fprintf(FID_nec,'FR %3i %3i %3i %3i %8.4f %8.4f\n',0,1,0,0,2400,0);%set freq to 299.8 MHz so wavelength will be 1m
fprintf(FID_nec,'RP %3i %3i %3i %3i %8.4f %8.4f %8.4f %8.4f\n',0,1,1,1000,90,0,0,0);%calculate gain at boresite
%=======Create file to pipe to NEC ===================================
%=======Run NEC======================================================
!NEC2Dx500 < input_CMD >tmp;
%=======Read Data form NEC output file===============================
[freq,Z,gain_t,E_theta,E_phi,n_freq_meas,run_time] = nec_read('NEC.out');