
My wrap up of interesting/practical machine learning of 2020

As part of my work (but also out of personal interest), I try to stay on top of not just the research side of machine learning (ML, which many folks think of as part of artificial intelligence) but also practical and interesting examples. On the professional side, this is because a lot of the software I’ve worked to create over the past 13 years has directly built in elements of machine learning. On a personal level, I’m particularly interested in the economic impacts of some of what machine learning will bring: which jobs will be automated away entirely, and which will have significant portions automated (see: “Humans Need Not Apply”).

To some extent, this is a sort of dystopian view, and I won’t get into my thoughts on what can/should be done about it, but I do want to point out that it’s not just the simple, repetitive, or labor-intensive jobs that can be automated. Some of the most interesting developments in machine learning over the past couple of years have been in creative tasks and tasks that most people associate with the kind of thinking only a human could do.

In this blog, I’m going to outline some of the most interesting projects in ML/AI that fit the bill of doing creative tasks or logical reasoning and which have online demos or videos of the demos, most of which have launched roughly in the past 1-2 years. You can actually go play with many of these things yourself to get a sense of where certain aspects of ML are at the start of 2021.

Image Generation

One of the things that most people associate with “a thing only a human could do” is to generate art. Maybe it’s taking a lovely photo or painting something very creative. Here are some online demos that show that machines can now do this too (and how well they do so):

DeepArt allows you to upload a photo that you take and then apply a style. For example, here I am “painted” automatically by a machine in the style of Vincent Van Gogh in just minutes from a photo I took in seconds. There are a number of interesting implications to this, ranging from forgery to novel artwork creations to “allowing anyone to become an artist.”

GauGAN allows you to create photo-realistic images by just drawing an image like you would in MS Paint. Here’s an image I drew of a mountain and a hill in the ocean next to a beach with a cloud in just a few minutes and an example output:

It doesn’t take much to imagine how you could use something like this to create art of places that don’t/can’t exist, and you can imagine combining strategies from something like this with something like DeepArt to create paintings that require very little skill and only a good imagination.

Dall-E: Taking the previous examples a step further, what if you could just type up what you want an image of? That’s what Dall-E does (fresh off the presses as of January 5, 2021). Dall-E can take text that you type and generate an image for it. Their examples on the blog do a lot to spark imagination and you can play around with a few examples. You can go to this link to see how something like this might generate an image of “an armchair in the shape of an avocado” or “a professional high quality emoji of a happy alpaca” or my favorite: “an illustration of a baby daikon radish in a tutu walking a dog.” This type of thing has the potential to radically change illustration and design work.

Audio/Music Generation

It’s not just visual art/artists that are going to be under the ML gun. ML can now make music too.

MuseNet allows computers to dynamically generate new music from a text prompt. For example, “use the first 5 notes of Chopin Op 1, No 9 as a basis to generate an entirely new song.”

The original piece
Computer generated piece

If you follow through to the MuseNet blog, you’ll see it can combine musical styles, generate new music from a few starting notes, or just give it a prompt like “Bluegrass piano-guitar-bass-drums.”

GPT-3 lyric generation. It doesn’t stop at tone generation: ML can generate lyrics now too. Here’s a song with lyrics written entirely by a machine:

Oh yeah, and ML can even sing your song for you. Here are over 7,000 songs that are generated/sung entirely by machines. Are they perfect? No, especially not the rhyme schemes or some of the voice impersonations. But those are getting better too…

Impersonation

There are now a series of “this _____ does not exist” generators that you can explore. This person doesn’t exist, this cat doesn’t exist, this horse doesn’t exist, this artwork doesn’t exist, and hey, even this chemical doesn’t exist, because why not. Reload each page to see a new one of these things that don’t exist. Don’t see a category of thing you want to create? There’s a way to generate a new category here if you have some software know-how. These seem fairly benign on the surface (who cares that a fake person/cat/horse/… image can be generated), but the implications of this type of thing go far beyond the amusing.

Want to impersonate another person’s voice? Generate your own audio as Dr Who or HAL 9000 at 15.ai.

Want to impersonate another person entirely as a video? All of the following are fabricated by having ML figure out how to generate a person’s lookalike with explicit facial expressions.

This is over 2-year-old technology
https://www.youtube.com/watch?v=VhFSIR7r7Yo
Now it’s being used in entirely new ways, from generating satire to much more nefarious purposes
Definitely not the most PC, but that’s part of the craziness of ML-generated audio/video

The visual artifacts you see in these videos are going to disappear over time as computational power increases. Now imagine combining these two techniques: fake speech generated by an ML model of a famous person, paired with a fake video of that person’s facial expressions and movements. You don’t even need to hire a voice actor to create really serious challenges in categories like fake news and legal disputes over verbal contracts.

Games

The classic example of ML beating a human lies in the realm of chess, and more recently Go, a game computers were long thought unable to play competitively. But there are other games you may not think of.

ML can play pictionary against you, for example, live in your browser. Or hey, need help drawing your pictionary item? ML can help you complete your sketch. It can answer trivia questions. Or it can make up a Dungeons & Dragons game on the fly for you. Or check out these 3 videos of ML playing games you’ve probably played, and doing so better than you.

ML playing Mario
ML playing pool
My favorite: AI playing hide and seek

There are a number of interesting implications to this type of thing. One is that — if you play games — I imagine we’ll see much more complicated AI bots that play against you. But the AI playing hide and seek in particular is interesting because it involves some lightweight construction with specific goals. There are far more advanced versions of engineering and behavioral optimizations that exist outside of these demos. For example, in the past year, an AI pilot beat the top Air Force fighter pilots 5-0 in a dogfight simulation. You can see where “games” can quickly apply to real-world situations.

Other Professions

There are already entire companies set up to reduce time, improve the quality of output, or entirely remove people from the process in certain professions. Here are a few recent examples:

And there are a variety of other professions which already have working demos or systems in place to help.

This is not comprehensive: “academic” applications and certain other types that are still too new to be available to the public in demo form aren’t here. But I hope this helps show a bit of what’s come around in the past year or so in the world of ML, in ways you can go exploring yourself!

Converting shapefiles into GeoJSON (for Elasticsearch)

This blog is all about finding, converting, and adding geo data from a commonly found format (shapefiles) to a format that can be used by json engines like Elasticsearch.

Finding Data

Open geo data can be found in a lot of places. Open city data is a great source of geo data in many jurisdictions. Searching for “open data <cityname>” can yield a lot of results. For example, https://datasf.org/opendata/ is San Francisco’s open data portal. Some jurisdictions will have dedicated GIS portals.

You’ll often find geo data in a few formats:

  1. A CSV of geo points
  2. Shapefiles for geo points
  3. Shapefiles for geo shapes
  4. WKT (Well-known text)
  5. GeoJSON

Elasticsearch natively supports WKT and GeoJSON, and I’ll leave the work of importing CSVs as an exercise for the reader for now. I’m going to focus on how to convert/import shapefiles. Sometimes GeoJSON has a full FeatureCollection, which is essentially a full layer of features that, at least for Elasticsearch, needs to be converted to a list of individual Features (points and shapes). I will cover that here in “Breaking a GeoJSON FeatureCollection Up.”

In this example, we’ll use the counties in the Atlanta region, which can be found at http://gisdata.fultoncountyga.gov/datasets/53ca7db14b8f4a9193c1883247886459_67. You can go to Download -> Shapefile to get the shapefile zip file. In this counties example, here’s what it looks like once I’ve unzipped:

$ ls  
Counties_Atlanta_Region.cpg Counties_Atlanta_Region.dbf Counties_Atlanta_Region.prj Counties_Atlanta_Region.shp Counties_Atlanta_Region.shx

The Wikipedia article on shapefiles has a breakdown of what all of these files actually contain, but we only need 1 file for the rest of this exercise: the .shp file, which contains the feature geometries themselves.

Converting Shapefiles to GeoJSON

After you have a shapefile, the next step is to get the data into GeoJSON format.

Looking at the Atlanta counties again, the Counties_Atlanta_Region.shp file is the one that’s interesting to us. We’ll use a tool called ogr2ogr to convert .shp files to GeoJSON. ogr2ogr is part of GDAL and can be installed on a Mac if you have homebrew installed by using:

brew install gdal

Alternatively, you can install it manually. ogr2ogr is a wonderful tool to have on your laptop for using/testing geo data. Once you have it, continuing with our example, you should be able to run:

ogr2ogr -f GeoJSON -t_srs crs:84 output_counties.json Counties_Atlanta_Region.shp

This means:

  • -f GeoJSON: Output to GeoJSON format
  • -t_srs crs:84: There are a lot of coordinate reference systems. If you know you need data in a different coordinate system, you can override this with something else, though that’s going to generally be a highly specialized case. We’re telling ogr2ogr to use WGS84 on the output, which is the same system that GPS uses.
  • output_counties.json is the output file
  • Counties_Atlanta_Region.shp is the input file.

After you run this, you now have a GeoJSON file! If you’ve never had a look at one before, you can inspect the file to see what GeoJSON looks like.

Breaking a GeoJSON FeatureCollection Up

If we look at the resulting GeoJSON file in the previous step, we see at the top of it:

"type": "FeatureCollection", "name": "Counties_Atlanta_Region", "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [ …

Elasticsearch, like several systems that handle geo data, handles most GeoJSON, but a FeatureCollection is essentially an array of objects that we want to index separately for most purposes. FeatureCollections are sort of like a “bulk” dataset, and we need to get individual points/shapes (Features) so that we can search, filter, and use them. In this example, the individual features are individual counties in the Atlanta region. This is where jq comes in handy.

jq can also be installed via homebrew:

brew install jq

Afterwards, you can “select the array of features[] from output_counties.json and output 1 feature per line” with:

jq -c '.features[]' output_counties.json

The -c flag means “compact” — it outputs 1 feature per line, which can be useful for what we’re about to do next…

Simultaneously Extracting Features and Converting to Bulk Format

We can do 1 step better than just extracting the features array by simultaneously converting the output to Elasticsearch’s bulk format with sed:

jq -c '.features[]' output_counties.json | sed -e 's/^/{ "index" : { "_index" : "geodata", "_type" : "_doc" } }\
/' > output_counties_bulk.json && echo "" >> output_counties_bulk.json

The sed bit just adds a bulk header line (and a newline) per record, and the echo "" >> output_counties_bulk.json makes sure the file ends in a newline, which Elasticsearch requires.

Change geodata to an Elasticsearch index name of your choosing.
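If you’d rather not juggle shell quoting in sed, the same conversion can be sketched in a few lines of Python. This is a hypothetical helper (not part of the pipeline above); the function and variable names are mine:

```python
import json

def featurecollection_to_bulk(collection, index_name="geodata"):
    """Convert a parsed GeoJSON FeatureCollection into Elasticsearch bulk NDJSON."""
    header = json.dumps({"index": {"_index": index_name, "_type": "_doc"}})
    lines = []
    for feature in collection["features"]:
        lines.append(header)               # bulk action line per record
        lines.append(json.dumps(feature))  # the Feature document itself
    return "\n".join(lines) + "\n"         # bulk payloads must end with a newline
```

You’d json.load() the ogr2ogr output and write the returned string to a file for curl to post.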

Set Up Elasticsearch Mappings

At this point, I’d set up the Elasticsearch mappings for this “geodata” index (or whatever name you want to give it). Metadata related to the shape is often in .properties and geo shape data is often in .geometry. The county data here looks typical:

jq -c '.features[].properties' output_counties.json

Shows us a list of properties like:

{"OBJECTID":28,"STATEFP10":"13","COUNTYFP10":"013","GEOID10":"13013","NAME10":"Barrow","NAMELSAD10":"Barrow County","totpop10":69367,"WFD":"N","RDC_AAA":"N","MNGWPD":"N","MPO":"Partial","MSA":"Y","F1HR_NA":"N","F8HR_NA":"N","Reg_Comm":"Northeast Georgia","Acres":104266,"Sq_Miles":162.914993,"Label":"BARROW","GlobalID":"{36E2EA48-1481-44D7-91C9-7C51AC8AB9E9}","last_edite":"2015-10-14T17:19:34.000Z"}

At this point, you can add any mappings around these fields and/or use an ingest node pipeline to manipulate the data prior to indexing. For now, I’m just going to set up the geo_shape field, but you can add extras.

PUT /geodata  
{  
  "settings": {  
    "number_of_shards": 1  
  },  
  "mappings": {  
    "_doc": {  
      "properties": {  
        "geometry": {  
          "type": "geo_shape"  
        }  
      }  
    }  
  }  
}

Bulk Loading Data to Elasticsearch

And at this point, you can bulk-load the data:

curl -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@output_counties_bulk.json"

And then you can set up or reload Kibana index patterns for your index to make sure it shows up. Make sure to change any time filters to be appropriate for any visualizations you use. I often turn off the “time” field for quick demos, as it can often have inconsistent/missing dates (as I found this Atlanta county data to have).

Recap / TL;DR

Get a shapefile

ogr2ogr -f GeoJSON -t_srs crs:84 your_geojson.json your_shapefile.shp

jq -c '.features[]' your_geojson.json | sed -e 's/^/{ "index" : { "_index" : "your_index", "_type" : "_doc" } }\
/' > your_geojson_bulk.json && echo "" >> your_geojson_bulk.json

Set up your mappings. Often the following works, but you may need to check field names:

PUT /your_index  
{  
  "settings": {  
    "number_of_shards": 1  
  },  
  "mappings": {  
    "_doc": {  
      "properties": {  
        "geometry": {  
          "type": "geo_shape"  
        }  
      }  
    }  
  }  
}

curl -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@your_geojson_bulk.json"

Set up (or refresh) Kibana index patterns to include your_index.

Voila!

Baby Naming Project

tl;dr: you can get a new person/car/pet/whatever naming app now at https://github.com/eskibars/lentil-namer

As a first-time dad-to-be, one of the things I’ve noticed is that the first thing anyone asks about your baby is whether you know the gender. The second thing they ask is whether you have names picked out. The first one is kind of easy: you’ve either decided to learn about the baby via e.g. an ultrasound or you’ve chosen not to, but ultimately it’s not really up to you. The second one is harder: you’re choosing a name that someone will probably live with for the rest of their life.

One of the things I’ve learned in my time as a product manager is that naming things is among the hardest problems in virtually any profession. Naming a human is doubly so. Agreeing on a name for a human between 2 separate humans… that’s the hardest: you have no chance for quorum without unanimity; the number of reasonable options for you both to agree on is virtually limitless; and it’s difficult to feel you each had an “equal” part in the name, given that one of you is likely to propose a given name first. This is where baby naming apps have come up: swipe left/right to indicate whether you like a name, and your partner swipes as well. If it’s a match, ta-da! You’ve found one you both like. No bias, no judgement for proposing “silly” names, and no feeling bad about shooting down a name idea of your partner’s that you think is silly.

However, I got frustrated with the options that were out there for this problem. They all require installing an application, with sign-ups and unknown data sharing practices, for what felt like a pretty simple problem. They also often didn’t work well on both iOS and Android, or didn’t let you participate across the two. And finally, many didn’t give you options like choosing the origins of a name or showing you its meaning. Perhaps most importantly, I was interested in trying my hand at writing a small application that took advantage of the block-max weak-and (WAND) support we released as part of Elasticsearch 7. It’s quite nifty in that it lets you bias certain results and return them with speeds that can be orders of magnitude faster than 6.x or earlier, as long as you don’t need a total count of hits at result time. (In fact, there is no bound on how much faster this algorithm can be.)

So, with a baby on the way, I set about trying my hand at creating something useful with block-max WAND: a way for my wife and me to name our future child without bias between ourselves!

“Hey Michelle, I’ve got the best name for our child’s name!” – me

The software

I wrote a bit of software which attempts to overcome some of the limitations I’ve seen in these apps. It:

  1. Runs as a web application, which means you can load it in any browser (including mobile) and swipe left/right. No app install, and you control your own data
  2. Requires no sign-up. You start it up, start a project, and you’re good to go. No data-sharing with unknown 3rd-party companies. You can host it yourself
  3. Lets you bias the origin and the gender of the name
  4. Lets you review and edit any of your past answers

The flow looks something like this:

Then, you can set the settings you like and share the project with your significant other
Finally, you get to swiping: swipe left if you dislike the name and right if you like it!

Once you swipe left/right, you’ll find the names in the answer review / mutual matches view and can switch your answers if you so please. If you find a name in the mutual matches, that may be a good one to discuss with your partner!

The primary things that make this stand out from previous applications I’ve written using Elasticsearch are that it:

  1. Uses rank_feature fields, which are new to Elasticsearch 7 and really make the performance scream
  2. Is built on Flask, which I had no experience with until now. I also have virtually no Python experience, so that was perhaps an even bigger change
  3. Is licensed as AGPL and released on GitHub
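For the curious, a rank_feature query looks something like the following. The index and field names here are hypothetical placeholders, not the app’s actual schema; the point is that scoring on a rank_feature field lets Elasticsearch 7 skip non-competitive hits via block-max WAND:

```
GET /names/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity"
    }
  }
}
```

Documents with higher values in the popularity field score higher, and because total hit counts are tracked only up to a threshold by default, the engine can skip whole blocks of documents that can’t make the top results.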

What’s in a name?

The name for this comes from yet another thing we found: apps that tell you the size of your baby as it progresses. For example, “this week, your baby is the size of a grape” or “this week, your baby’s foot is the size of a raspberry.” I came up with the idea for this right around the time we found out that our baby was the size of a lentil. So it’s a lentil namer, and hence the GitHub project name 🙂

What’s next?

We’ve used this (and some old-fashioned suggestions) to decide on a number of names that we both like, and now we’re down to picking the name. We don’t plan to reveal it until after the baby is born. Sorry, friends & family 🙂

Tracking Nearby Ships with SDR, Elasticsearch, Kibana, and Vega Visualizations

Intro

A year ago, I moved nearer to the Golden Gate Bridge, and seeing the ships come in made me wonder how they’re tracked.  I had seen websites like this one and was curious how they worked.  It turns out the key is in the title of the page: “AIS Marine Traffic.”  The traffic comes from the Automatic Identification System, or AIS.  Ships broadcast their positions on frequencies around 162 MHz.  You can go out and buy a dedicated receiver and decoder for this data, but I recently had a renewed interest in software-defined radio (SDR) and was looking to do a project with it.  If you haven’t heard of SDR before, it’s super neat!  The idea is this: historically, in many cases, somebody working with radio signals has needed to build a specific antenna and a specific modulator/demodulator in hardware.  Take, for example, a traditional car radio that can pick up AM and FM radio signals:

RF Bands

Image source: GSU

As shown here, FM radio is broken into 200 kHz channels, starting at 88.1 and going to 107.9 MHz in the United States.  When you “tune into” a radio station like “station 98.1,” you’re tuning into 98.1 MHz.  You may have noticed that radio stations are in increments of 0.2 (98.1, 98.3, 98.5, …) in the US; there are no FM radio stations that end in even numbers, because each station gets 200 kHz, or 0.2 MHz, so you see 88.1 + 0.2 + 0.2 + 0.2 + … and the frequencies will thus always be odd.  Also worth noting: 98.1 is technically more like everything from 98.0 to 98.2; the dial just names the “center frequency” of the transmission.

When you turned old radio knobs, you were generally changing the electro/physical properties of a tuning capacitor, which caused a circuit in your car or home to move the center frequency as you desired.  An electrical engineer designing such a radio and circuit needed to know they were designing an antenna for 88.1–107.9 MHz, what the modulation characteristics of the signal were, and so on.  They’d design a physical circuit that met those properties, and you’d get it in your car or home.  If they were designing for TV instead of FM radio, there would be a different set of frequencies and rules to design for.

Nowadays, though, you can buy a single integrated circuit that can listen to a huge range of frequencies and tune the bandwidth and demodulation all through a software interface!  What’s really neat is that you can get the whole thing wrapped into a single USB stick that works with Linux, Windows, or Mac for about $10–50, depending on what you buy and where you buy it.  So that’s what I did!
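The 200 kHz channel arithmetic is easy to sanity-check in a couple of lines of Python (purely an illustration, not part of any radio code):

```python
# US FM broadcast: 200 kHz (0.2 MHz) channel spacing, centers 88.1 to 107.9 MHz.
stations = [round(88.1 + 0.2 * n, 1) for n in range(100)]

# Every center frequency ends in an odd digit, as described above.
assert all(int(round(f * 10)) % 2 == 1 for f in stations)
```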

This post is my attempt to recreate a website like marinetraffic.com using a $20 piece of hardware, Elasticsearch, Kibana, and a few hundred lines of code.

Tuning In

I poked around and found a variety of AIS-related projects, including a C project that can directly decode AIS packets from an SDR.  I pulled that down and compiled it to see what it did.  That project publishes the data as a TCP or UDP stream of messages with special formatting.  You can also get it to output to the shell.  For example:

./rtl_ais -n
Edge tuning disabled.
DC filter enabled.
RTL AGC disabled.
Internal AIS decoder enabled.
Buffer size: 163.84 mS
Downsample factor: 64
Low pass: 25000 Hz
Output: 48000 Hz
Found 1 device(s):
0: Realtek, RTL2838UHIDIR, SN: 00000001

Using device 0: Generic RTL2832U OEM
Detached kernel driver
Found Rafael Micro R820T tuner
Log NMEA sentences to console ON
AIS data will be sent to 127.0.0.1 port 10110
Tuner gain set to automatic.
Tuned to 162000000 Hz.
Sampling at 1600000 S/s.
!AIVDM,1,1,,A,15N0;wS002G?MS6E`tCs3V3N0<2D,0*22

The first lines are all hardware initialization.  The last line is a message from a nearby ship!  It’s not super consumable as-is: the message is in AIVDM format, so I need something to decode that.  From this example, we can see in the comma-separated output:

  • The !AIVDM header
  • The number of fragments (in this case, 1)
  • Which fragment number this message is (in this case, 1)
  • A sentence ID for multi-sequence messages (in this case, it’s empty — there’s only 1 sentence)
  • The radio channel code (A in this case — rtl_ais listens to both A and B channels)
  • The data payload (15N0;wS002G?MS6E`tCs3V3N0<2D in this case)
  • A checksum (0*22 in this case); there are some rules around what the checksum looks like
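That field breakdown maps directly onto a split on commas. Using the sample sentence from the console output above (a quick illustrative snippet; the variable names are mine):

```python
sentence = "!AIVDM,1,1,,A,15N0;wS002G?MS6E`tCs3V3N0<2D,0*22"

# Split the NMEA-style sentence into its seven comma-separated fields.
header, fragments, fragment_no, sentence_id, channel, payload, checksum = sentence.split(",")
```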

That data payload is probably the most interesting part of the message.  In order to process more complex messages, we’ll need something that can parse fragment numbers and combine fragments together, but we can get started on writing a parser with just this.  According to the spec:

The data payload is an ASCII-encoded bit vector. Each character represents six bits of data. To recover the six bits, subtract 48 from the ASCII character value; if the result is greater than 40 subtract 8. According to [IEC-PAS], the valid ASCII characters for this encoding begin with “0” (64) and end with “w” (87); however, the intermediate range “X” (88) to “_” (95) is not used.
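That de-armoring rule is small enough to sketch directly. This is a toy illustration of the quoted rule only (the decoding library mentioned in this post handles this, and the actual bit-field parsing, for real):

```python
def payload_to_bits(payload):
    """Recover the 6-bit groups from an AIVDM ASCII payload, per the rule above."""
    bits = ""
    for ch in payload:
        value = ord(ch) - 48          # subtract 48 from the ASCII value...
        if value > 40:
            value -= 8                # ...and 8 more if the result exceeds 40
        bits += format(value, "06b")  # each character carries six bits
    return bits
```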

Fortunately, there’s even an existing Python library for doing AIS message decoding.  A quick test with this message shows:

$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170124] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ais
>>> from pprint import pprint
>>> pprint(ais.decode('15N0;wS002G?MS6E`tCs3V3N0<2D',0))
{u'cog': 283.0,
u'id': 1L,
u'mmsi': 367004670L,
u'nav_status': 3L,
u'position_accuracy': 0L,
u'raim': False,
u'received_stations': 148L,
u'repeat_indicator': 0L,
u'rot': 0.0,
u'rot_over_range': False,
u'slot_timeout': 3L,
u'sog': 0.20000000298023224,
u'spare': 0L,
u'special_manoeuvre': 0L,
u'sync_state': 0L,
u'timestamp': 47L,
u'true_heading': 193L,
u'x': -122.45146166666666,
u'y': 37.818158333333336}

There are a lot of ways to get that data into a consumable location.  You can run one of a variety of applications that can show you the live output.  I like Elasticsearch, so I wanted to index my data there, and I didn’t see any projects that did.  The best way would probably be to modify one of these programs to output to Elasticsearch instead of UDP packets or wherever else they shovel data, but I figured it would be easy to just write a little UDP server that listens on port 10110 (the default rtl_ais delivery port), formats the messages, and ships (get it, ships?) them off.  First, we need to see how rtl_ais sends the data.  I could read the source code, but it’s probably faster to write a simple UDP server in Python that listens for a buffer and prints what we get:

import socketserver
from pprint import pprint

class AISHandler(socketserver.DatagramRequestHandler):
  def handle(self):
    data = self.rfile.readline()
    pprint(data)

if __name__ == "__main__":
  server = socketserver.UDPServer(('', 10110), AISHandler)
  server.serve_forever()

Which we can run and see:

$ python3 rtl_ais_server.py
b'!AIVDM,1,1,,A,15MvK40P1go?WQ`E`eL<Ngwh28Pu,0*4F\r\n'
b'!AIVDM,1,1,,B,15Ngp6PPCpG?bs<E`oN:v8Op0hQp,0*1E\r\n'

So basically, rtl_ais sends the raw text followed by a carriage return and newline.  That’s easy enough.  I just need to keep track of multi-packet sequences, decode them, format them how I want, and send them off to Elasticsearch so we can visualize them in Kibana.  I’ll skip past the boring interim code and give you the final result.  The one tricky bit is keeping track of multi-sentence messages.  A few situations complicate multi-sentence messages:

  • At least in theory, I suppose a message could arrive out of order.  I don’t know if that happens in practice or not: I haven’t seen it yet.  The only explanation I could see for why it would is faulty encoding/decoding software producing non-sequential packets.  Or a message traveling backwards in time (cue spooky ghost ship music).
  • It’s likely that you’ll receive only parts of messages, and that certainly needs to be accounted for.  For example, maybe my antenna hears sentences 1, 2, and 4 of 4 total sentences.
  • It’s possible you’ll receive messages from different AIS radios broadcast in an interleaved message set.  That is, if you have boats 1 and 2 (B1, B2) and sentences 1 and 2 (S1, S2), it’s possible you’ll receive B1S1, B2S1, B1S2, B2S2.  Because this is rare, some software I’ve seen ignores it.  Not me though!
  • Ships report different data components at different times.  For example, a ship may report its position and heading in one packet and it may report its destination and ETA in a different packet minutes later.  If we want to associate all of the data surrounding a ship, we’ll have to keep track of some kind of “ship” record that will need updating.

A simple solution for the second and third of these is just to keep a relatively small queue of messages keyed by sentence ID, and to evict messages if you get too many, to keep the queue from growing huge.  Python has a convenient OrderedDict class that shares some properties of a queue (namely, that you can evict/pop items at will in a FIFO fashion).  For the fourth, since I’m storing the data in Elasticsearch, I can use doc_as_upsert to merge ship information together into a single ship object.  Altogether, the code looks like this:

import socketserver
import ais
from elasticsearch import Elasticsearch
from datetime import datetime
from collections import OrderedDict
from pprint import pprint

class AISHandler(socketserver.DatagramRequestHandler):
  _messagemap = OrderedDict()
  _es = Elasticsearch()

  def indexDoc(self, doc):
    timestamp = datetime.utcnow()
    dmy = timestamp.strftime('%Y-%m-%d')
    body = doc
    body['curtime'] = timestamp
    print(body)
    if ('x' in body and 'y' in body):
      body['location'] = {'lon': body['x'], 'lat': body['y']}
    self._es.index(index="ais-" + dmy, doc_type="_doc", body=body)
    self._es.update(index='ships-seen',doc_type='_doc',id=body['mmsi'],
                body={ "doc": body, "doc_as_upsert": True })

  def handle(self):
    data = self.rfile.readline()
    while data:
      data = data.strip().decode("utf-8")
      ais_arr = data.split(',')
      num_fragments = int(ais_arr[1])
      fragment_number = int(ais_arr[2])
      sentence_id = ais_arr[3]
      channel = ais_arr[4]
      data_payload = ais_arr[5]
      if num_fragments == 1:
        # this is a single-fragment payload, so we can decode it straight away
        decoded_message = ais.decode(data_payload,0)
        self.indexDoc(doc=decoded_message)
      elif fragment_number < num_fragments:
        # this is a multi-fragment payload and we haven't yet received them all.  Add this to any existing sentences
        if sentence_id in self._messagemap:
          self._messagemap[sentence_id] = self._messagemap[sentence_id] + data_payload
        else:
          # if we have too many items in the queue, force the oldest one out (FIFO)
          if (len(self._messagemap) > 100):
            self._messagemap.popitem(last=False)
          self._messagemap[sentence_id] = data_payload
      else:
        # we've hit the end of the multi-fragment data
        self._messagemap[sentence_id] = self._messagemap[sentence_id] + data_payload
        decoded_message = ais.decode(self._messagemap[sentence_id], num_fragments)
        self._messagemap.pop(sentence_id, None)
        self.indexDoc(doc=decoded_message)
      data = self.rfile.readline()


if __name__ == "__main__":
  server = socketserver.UDPServer(('', 10110), AISHandler)
  server.serve_forever()

Before we run this program, we first need to set up our Elasticsearch index templates.  I don’t know a lot about all of the message types that the AIS system could send, but fortunately I didn’t need to.  All of the fields I’ve seen seem to be interpreted correctly via Elasticsearch’s dynamic mapping other than the location (geo_point needs to be manually defined), but we should probably define at least 3 key ones just in case:

PUT /_template/ais
{
  "index_patterns": ["ais-*", "ships-*"],
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        },
        "location": {
          "type": "geo_point"
        },
        "curtime": {
          "type": "date"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1
  }
}

Once I start up rtl_ais_server.py I see some documents flow into Elasticsearch:

{
  "_index": "ais-2018-08-17",
  "_type": "_doc",
  "_id": "1-oPRmUBdPKlebHLixCO",
  "_version": 1,
  "_score": 2,
  "_source": {
    "timestamp": 61,
    "raim": false,
    "assigned_mode": false,
    "aton_status": 0,
    "y": 37.80078833333334,
    "off_pos": false,
    "fix_type": 7,
    "x": -122.375045,
    "position_accuracy": 0,
    "dim_d": 0,
    "id": 21,
    "aton_type": 19,
    "spare": 0,
    "dim_b": 0,
    "location": {
      "lon": -122.375045,
      "lat": 37.80078833333334
    },
    "repeat_indicator": 0,
    "curtime": "2018-08-17T04:05:47.615296",
    "virtual_aton": true,
    "mmsi": 993692027,
    "dim_a": 0,
    "dim_c": 0,
    "name": "SF OAK BAY BR VAIS D@"
  },
  "fields": {
    "curtime": [
      "2018-08-17T04:05:47.615Z"
    ]
  }
}

The last step I wanted to do here is to build a visualization.  There are a few ways to do this, but I thought it would be interesting to try my hand at a Vega visualization, which was released in version 6.2 of Kibana.  Coming into Vega cold (clearly I’m really not in tune with front-end frameworks these days!), a few things became clear to me after working on this:

  • The Vega declaration syntax is definitely not an easy thing to get the hang of!
  • Vega is super powerful!
  • Vega is super difficult to debug if you don’t understand the structure of how visualizations are executed.
  • There seem to be a variety of limitations.  For example, it appears that most of the marks are not rotatable, which I ran into (so I just chose a UTF-8 character that already pointed the way I wanted).  I’ve also had some difficulty changing the cursor on hover, and I’m not sure what it takes to get non-system tooltips.  I think you can probably attach the visibility of some text to another mark, but I’ve spent almost no time trying to figure that out yet.

Anyway, after a lot of fiddling, the following works.  It seems that text is the only rotatable “mark” right now, so I had to choose a character that looks roughly like how I imagined a ship would look on a map.

{
   "$schema": "https://vega.github.io/schema/vega/v3.0.json",
   "config": {
     "kibana": {
       "type": "map",
       "latitude": 37.77,
       "longitude": -122.45,
       "zoom": 12,
       "mapStyle": "default",
       "minZoom": 5,
       "maxZoom": 17,
       "zoomControl": true,
       "delayRepaint": true,
       "controlsDirection": "horizontal"
     }
   },
   "data": [
     {
       "name": "ships",
       "url": {
         "index": "ships-*",
         "body": {
           "query": {
            "bool": {
              "must": [
                { "exists": { "field": "location" } },
                { "exists": { "field": "mmsi" } },
                { "range": {"curtime": {"%timefilter%": true}} }
              ]
            }
           },
           "size": 10000
         }
        },
        "format": { "type": "json", "property": "hits.hits" },
       "transform": [
          {
            "type": "geopoint",
            "projection": "projection",
            "fields": [
              {"expr": "datum._source.location.lon"},
              {"expr": "datum._source.location.lat"}
            ],
            "as": ["x", "y"]
          },
          {
            "type": "formula",
             "expr": "if (datum._source.dim_a != null && datum._source.dim_b != null && datum._source.dim_c != null, log(1 + datum._source.dim_a + datum._source.dim_b + datum._source.dim_c) * 10, 100)",
             "as": "shipsize"
          },
          {
            "type": "formula",
             "expr": "datum._source.mmsi % 20",
             "as": "shipcolor"
          },
           {
             "type": "formula",
             "expr": "if (datum._source.name != null, datum._source.name, 'Unnamed')",
             "as": "shipname"
           },
           {
             "type": "formula",
             "expr": "if (datum._source.true_heading != null, datum._source.true_heading, 0)",
             "as": "heading"
           }
        ]
      }
    ],
   "scales": [
    {
      "name": "shipcolorscale",
      "type": "ordinal",
      "domain": {"data": "ships", "field": "shipcolor"},
      "range": { "scheme": "category20" }
    }
  ],
    "marks": [
      {
          "type": "text",
          "from": {"data": "ships"},
          "encode": {
            "update": {
              "x": {"signal": "datum.x"},
              "y": {"signal": "datum.y"},
              "text": { "value": "⏏" },
              "fontSize": { "signal": "datum.shipsize" },
              "fill": { "scale": "shipcolorscale", "field": "shipcolor" },
              "align": { "value": "center" },
              "fontWeight": { "value": "bold" },
              "angle": { "signal": "datum.heading" },
              "tooltip": { "signal": "datum.shipname" }
            },
            "hover": {
              "fill": {"value": "red"},
              "tooltip": { "signal": "datum.shipname" }
            }
          }
      }
    ]
}

And it looks pretty good!  On hovering, I can see the S.S. Jeremiah O’Brien is safely docked in the bay.

And I can also see that one of our pilot boats is heading out to sea to guide a ship in.

A couple notes about the visualization:

  • I’ve sized the icons by the size of the ship.  Big ship = big icon.
  • The colors of the ships are a hash of the ship’s ID (MMSI) modulo 20, and clearly there are some collisions in such a small space.  I couldn’t find an easy way to derive a deterministic color from a ship’s MMSI in a larger space than 20, though I suspect I’m just missing something.
  • This uses the ships-seen index, which only keeps the most recent position/heading of the ship due to the upsert.
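For the curious, the kind of deterministic MMSI-to-color mapping I was after is easy to sketch outside of Vega; something like this hypothetical Python helper, which hashes the MMSI onto a hue:

```python
import colorsys

def mmsi_color(mmsi: int) -> str:
    """Map an MMSI deterministically onto a hex color by hashing it to a hue."""
    hue = (mmsi * 2654435761 % 360) / 360.0  # multiplicative hash -> [0, 1)
    r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 0.9)
    return "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))
```

The same ship always gets the same color, with far fewer collisions than a modulo-20 palette.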

A visualization of ais-* shows us all of the events of the incoming/outgoing vessels, which is also interesting, because it allows you to see the sort of trail of the boats.

I’m interested in exploring further in two directions: other types of digital data radio signals, and entirely different kinds of signals altogether.

That about wraps it up.  You can grab all the code for this project on GitHub.  Sea you next time!

Combining Tasker with Elasticsearch and Kibana for Personal Dashboards

Tasker is probably my favorite Android app.  It gives you total control over your device.  Not to mention there’s a great community that develops all sorts of interesting tasks/integrations.

I was recently on a 10+ hour flight where the entertainment system wasn’t working and my laptop was inaccessible, so I was kind of bored.  But I had my phone and Tasker, and I got the idea to write a little IoT-type integration while I was on the plane.  Basically, I wanted to ship the sensor data from my phone off to an Elasticsearch cluster.  With this, I could:

  • Keep track of where I’ve been, including my run paths and any hikes
  • Track and visualize my pedometer over days
  • Remember what that song was I was listening to on Play Music/Pandora/Spotify/etc. but forgot the name of
  • See graphs of my battery usage over days so I can see how the profile changes as I add/update/remove apps, as I tether, and as my phone ages
  • Not be bored for a while on the plane

Building Kibana Dashboards

Before I walk through how I created this and give you the downloadable task, I just want to show these dashboards, which I really liked:

Here, I’ve plotted the current battery % on the left, and using a derivative pipeline aggregation (available in Elasticsearch and Kibana), we can see how much the device charged or drew down.  We can use this to plot the charge speed of different chargers or see how much battery individual apps use.  Because I also have things like the screen brightness and network information in the same index, I can plot them against one another to see how much screen-on time/brightness costs in battery over a multi-day cycle.
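For reference, the query behind a panel like that boils down to a date_histogram with a derivative pipeline aggregation over the average battery level; roughly this (the field names match the Tasker index below, and the interval is just an example):

```python
# Sketch of the aggregation behind the battery-drain panel.
battery_drain_query = {
    "size": 0,
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "interval": "1h"},
            "aggs": {
                # Average battery % within each hourly bucket...
                "avg_battery": {"avg": {"field": "battery"}},
                # ...and the bucket-to-bucket change: negative = drain,
                # positive = charging.
                "battery_delta": {"derivative": {"buckets_path": "avg_battery"}},
            },
        }
    },
}
```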

I can plot my runs and corresponding step counts:

Or where I’ve been the last 7 days:

Or even the last 90 days:

I was hoping to get a signal strength map of my cellphone signal, but it looks like Tasker may always be returning “0” even when I have full strength.  I may have set that up wrong, so if you have info on how to get it to return non-zero values, let me know!

Anyway, on to the code.

Tasker Javascript Tasks: The Framework

Tasker has Javascript tasks which are quite interesting: you can access variables and you have access to things like XMLHttpRequest.  That makes it pretty easy to get the Tasker variables and then send them across to a RESTful engine like Elasticsearch.

A few hundred miles across the Atlantic later, I had something like this:

var esServer = par[0];
if (esServer.substr(esServer.length - 1) !== "/") {
	esServer = esServer + "/";
}
setLocal('%esserver', esServer);
setLocal('%authinfo', par[1]);
var indexPrefix = "tasker";

var d = new Date();
var dateStr = d.toISOString().slice(0,10);

var intVars = { battery: '%BATT', cell_signal_strength: '%CELLSIG',
									  display_brightness: '%BRIGHT', light_level: '%LIGHT',
										uptime: '%UPS', free_memory: '%MEMF', pedometer: '%STEPSTAKEN'
									};
var doubleVars = { altitude: '%LOCALT', magnetic_strength: '%MFIELD', temperature: '%TEMP' };
var booleanVars = { bluetooth_on: '%BLUE', locked: '%KEYG', muted: '%MUTED', speakerphone: '%SPHONE',
									  wifi_enabled: '%WIFI', wimax_enabled: '%WIMAX', screen_on: '%SCREEN', roaming: '%ROAM',
airplane_mode: '%AIR'
									};
var keywordVars = { bluetooth_on: '%BLUE', cell_network: '%TNET', device: '%DEVID', device_id: '%DEVTID' };

postData = { timestamp: d.toISOString() };
template = { timestamp: { type: 'date' } };

for (var i in keywordVars) {
  var res = global(keywordVars[i]);
  if (typeof(res) !== 'undefined') {
  	postData[i] = res;
  }
  template[i] = { type: 'keyword' };
}

for (var i in booleanVars) {
  var res = global(booleanVars[i]);
  if (typeof(res) !== 'undefined' && res !== '') {
  	postData[i] = (res === 'true' || res === 'on');
  }
  template[i] = { type: 'boolean' };
}

for (var i in intVars) {
  var res = global(intVars[i]);
  if (typeof(res) !== 'undefined' && res !== '') {
  	if (res >= 0) {
    	postData[i] = parseInt(res);
    }
  }
  template[i] = { type: 'long' };
}

for (var i in doubleVars) {
  var res = global(doubleVars[i]);
  if (typeof(res) !== 'undefined' && res !== '') {
  	if (res !== 0) {
    	postData[i] = parseFloat(res);
    }
  }
  template[i] = { type: 'double' };
}

template['location'] = { type: 'geo_point' };
if (getLocation('any', true, 30)) {
  var loc = global('%LOC');
	var latlon = loc.split(',');
	postData['location'] = { 'lat': parseFloat(latlon[0]), 'lon': parseFloat(latlon[1]) };
}

template['song'] = { type: 'text' };
template['music_playing'] = { type: 'boolean' };
if (global('%ARTIST') !== '' && global('%TRACK') !== ''
   ) {
	  postData['music_playing'] = (global('%ISPLAYING') === 'true' || global('%ISPLAYING') === true);
	  if (postData['music_playing'] === true) {
		  postData['song'] = global('%ARTIST') + ' - ' + global('%TRACK');
		}
} else {
	if (global('%ISPLAYING') === 'false' || global('%ISPLAYING') === false) {
	  postData['music_playing'] = false;
	}
}

var wifii = global('%WIFII');
if (typeof(wifii) !== 'undefined' && wifii !== '') {
	var wifiarr = wifii.split("\n");
	if (wifiarr[0] === '>>> CONNECTION <<<') {
		postData['wifi_name'] = (wifiarr[2]).slice(1,-1);
	}
	template['wifi_name'] = { type: 'keyword' };
}

var jsondocstring = JSON.stringify(postData);
var indexName = indexPrefix + '-' + dateStr;
indexType = 'doc';
setLocal('%jsondocbulkheader', JSON.stringify({ "_index": indexName, "_type": indexType}));
setLocal('%jsondocstring',jsondocstring);

var xhrTemplate = new XMLHttpRequest();
xhrTemplate.open("PUT", esServer + "_template/" + indexPrefix, false);
xhrTemplate.setRequestHeader("Content-type", "application/json");
if (typeof(par[1]) !== 'undefined') {
	xhrTemplate.setRequestHeader("Authorization", "Basic " + btoa(par[1]));
}
var templateString = JSON.stringify({ template: indexPrefix + '-*', mappings: { doc: { properties: template } } });

try {
	xhrTemplate.send(templateString);
} catch (e) { }
try {
	var xhrDoc = new XMLHttpRequest();
	xhrDoc.open("POST", esServer + indexName + '/' + indexType, false);
	xhrDoc.setRequestHeader("Content-type", "application/json");
	if (typeof(par[1]) !== 'undefined') {
		xhrDoc.setRequestHeader("Authorization", "Basic " + btoa(par[1]));
	}

	xhrDoc.send(jsondocstring);
	setLocal('%sentdoc','1');
} catch (e) { }
exit();

Let’s walk through what this does.

In the beginning:

var esServer = par[0];
if (esServer.substr(esServer.length - 1) !== "/") {
	esServer = esServer + "/";
}
setLocal('%esserver', esServer);
setLocal('%authinfo', par[1]);
var indexPrefix = "tasker";

var d = new Date();
var dateStr = d.toISOString().slice(0,10);

Here, we set up an index prefix (“tasker”) and receive 2 parameters: the Elasticsearch server host/port (in par[0], which goes into %esserver) and any authentication in the form of username:password (in par[1], which goes into %authinfo).  So as a reminder: when we add this JavaScript task, we’ll need to pass those bits in.

Next:

var intVars = { battery: '%BATT', cell_signal_strength: '%CELLSIG',
									  display_brightness: '%BRIGHT', light_level: '%LIGHT',
										uptime: '%UPS', free_memory: '%MEMF', pedometer: '%STEPSTAKEN'
									};
var doubleVars = { altitude: '%LOCALT', magnetic_strength: '%MFIELD', temperature: '%TEMP' };
var booleanVars = { bluetooth_on: '%BLUE', locked: '%KEYG', muted: '%MUTED', speakerphone: '%SPHONE',
									  wifi_enabled: '%WIFI', wimax_enabled: '%WIMAX', screen_on: '%SCREEN', roaming: '%ROAM',
airplane_mode: '%AIR'
									};
var keywordVars = { bluetooth_on: '%BLUE', cell_network: '%TNET', device: '%DEVID', device_id: '%DEVTID' };

Here, we gather the hardware/software variables that Tasker defines into groups by type.  For example, we put %TEMP into a JSON object with the other double variables.  This matters because we need to know the types so we can make some assumptions about how to process each variable.  For example, looking at Tasker’s variables page, we can see that the variable representing whether the device is in airplane mode (%AIR) returns “on” or “off”, while we really want to process it as a boolean (true or false).  Many numbers in Tasker return -1 if the value is not defined or cannot be determined, where we really want an empty value in storage.  So we store our expected types in a JSON object to check later.

The next set of for loops does 2 things:

for (var i in keywordVars) {
  var res = global(keywordVars[i]);
  if (typeof(res) !== 'undefined') {
  	postData[i] = res;
  }
  template[i] = { type: 'keyword' };
}

for (var i in booleanVars) {
  var res = global(booleanVars[i]);
  if (typeof(res) !== 'undefined' && res !== '') {
  	postData[i] = (res === 'true' || res === 'on');
  }
  template[i] = { type: 'boolean' };
}

for (var i in intVars) {
  var res = global(intVars[i]);
  if (typeof(res) !== 'undefined' && res !== '') {
  	if (res >= 0) {
    	postData[i] = parseInt(res);
    }
  }
  template[i] = { type: 'long' };
}

for (var i in doubleVars) {
  var res = global(doubleVars[i]);
  if (typeof(res) !== 'undefined' && res !== '') {
  	if (res !== 0) {
    	postData[i] = parseFloat(res);
    }
  }
  template[i] = { type: 'double' };
}

First, it tries to convert, parse, or drop the value based upon the aforementioned logic.  Next, it adds the variables it finds to an Elasticsearch template variable we use later.

Next, we have a bit of custom logic for parsing the current location (which comes in a single string of the form “Lat,Lon”), the currently playing song (which comes through Media Utilities), and the WiFi connection info to get the current network name (the %WIFII variable has a rather bizarre format: the 3rd line contains the network name if the first line is “>>> CONNECTION <<<“):

template['location'] = { type: 'geo_point' };
if (getLocation('any', true, 30)) {
  var loc = global('%LOC');
	var latlon = loc.split(',');
	postData['location'] = { 'lat': parseFloat(latlon[0]), 'lon': parseFloat(latlon[1]) };
}

template['song'] = { type: 'text' };
template['music_playing'] = { type: 'boolean' };
if (global('%ARTIST') !== '' && global('%TRACK') !== ''
   ) {
	  postData['music_playing'] = (global('%ISPLAYING') === 'true' || global('%ISPLAYING') === true);
	  if (postData['music_playing'] === true) {
		  postData['song'] = global('%ARTIST') + ' - ' + global('%TRACK');
		}
} else {
	if (global('%ISPLAYING') === 'false' || global('%ISPLAYING') === false) {
	  postData['music_playing'] = false;
	}
}

var wifii = global('%WIFII');
if (typeof(wifii) !== 'undefined' && wifii !== '') {
	var wifiarr = wifii.split("\n");
	if (wifiarr[0] === '>>> CONNECTION <<<') {
		postData['wifi_name'] = (wifiarr[2]).slice(1,-1);
	}
	template['wifi_name'] = { type: 'keyword' };
}

Finally:

var jsondocstring = JSON.stringify(postData);
var indexName = indexPrefix + '-' + dateStr;
indexType = 'doc';
setLocal('%jsondocbulkheader', JSON.stringify({ "_index": indexName, "_type": indexType}));
setLocal('%jsondocstring',jsondocstring);

var xhrTemplate = new XMLHttpRequest();
xhrTemplate.open("PUT", esServer + "_template/" + indexPrefix, false);
xhrTemplate.setRequestHeader("Content-type", "application/json");
if (typeof(par[1]) !== 'undefined') {
	xhrTemplate.setRequestHeader("Authorization", "Basic " + btoa(par[1]));
}
var templateString = JSON.stringify({ template: indexPrefix + '-*', mappings: { doc: { properties: template } } });

try {
	xhrTemplate.send(templateString);
} catch (e) { }
try {
	var xhrDoc = new XMLHttpRequest();
	xhrDoc.open("POST", esServer + indexName + '/' + indexType, false);
	xhrDoc.setRequestHeader("Content-type", "application/json");
	if (typeof(par[1]) !== 'undefined') {
		xhrDoc.setRequestHeader("Authorization", "Basic " + btoa(par[1]));
	}

	xhrDoc.send(jsondocstring);
	setLocal('%sentdoc','1');
} catch (e) { }
exit();

In this block, we turn the document (and the index template!) into strings via JSON.stringify() and upload them both to Elasticsearch.  Tasker seemed to have some issues with the asynchronous nature of XMLHttpRequest (though it’s possible I was doing something wrong), so I converted the calls to synchronous ones.  At the very end, we set “%sentdoc” to 1 to indicate to the rest of the task that we’ve successfully sent the document.  If this isn’t set to 1, we know we didn’t have Internet access or hit some other barrier to uploading the metrics.  In a subsequent action in the task, we’ll write out the bulk header and document to a file on disk so we can later upload the data we missed.

Putting It Into A Project

This project has 4 profiles:

  1. Every 10 steps, run a task that sets %STEPSTAKEN to 10+%STEPSTAKEN
  2. Every night at midnight, run a task that sets %STEPSTAKEN to 0
  3. Every 5 minutes, run the “Update Info” task
  4. Every time Media Utilities has a new track, run the “Update Info” task

That brings me to the “Update Info” task.  That simply grabs the current values of %ISPLAYING, %TRACK, and %ARTIST, and then runs the “Send to Elasticsearch” task.

The “Send to Elasticsearch” task is a bit more complicated.  It:

  • Sets %sentdoc to 0 (we’ll use this later!)
  • Runs the JavaScriptlet above
  • If %sentdoc isn’t set to 1 at the end of the JavaScript, we write out %jsondocbulkheader and then %jsondocstring to a file Tasker/tasks/taskQueue.json
  • Otherwise, if %WIFII contains >>> CONNECTION <<<, we’re on a wifi network and can bulk upload the taskQueue contents.  So we read that into %jsonqueue and send it up with another (much shorter) JavaScript snippet (and then delete the taskQueue.json file so we don’t resend it):
var jsondocstring = jsonqueue;

var xhrDoc = new XMLHttpRequest();
xhrDoc.open("POST", esServer + '_bulk', false);
xhrDoc.setRequestHeader("Content-type", "application/json");
if (typeof(authinfo) !== 'undefined') {
	xhrDoc.setRequestHeader("Authorization", "Basic " + btoa(authinfo));
}

xhrDoc.send(jsondocstring);

setLocal('%jsonqueue','✓');
exit();

And that’s it!

Putting It All Together

I’ve uploaded the entire project here.  You can download it directly and import the project, or just look over the output to see how it’s tied together and make your own.  It does need the %ELASTICSEARCH and %ELASTICSEARCH_AUTH variables to be set, and you’ll see it needs Media Utilities to be installed as well.  I had fun making this and I’m hoping to add more to it in the future!

Connecting Domoticz and Elasticsearch

I use Domoticz for my home automation needs.  I’ll just start off by saying it has its pros and cons.

Pros:

  • It seems to have pretty wide device support
  • It allows for a completely offline home-automation setup (for those of us that don’t want our every home action to go through a 3rd party server, some of which have proven…problematic)
  • It has an API (though the output format is…not so great — more on that later)
  • You can create your own virtual devices (e.g. I use this to create a virtual device for when my phone connects to my home router)
  • It can run natively on a RaspberryPi (or a BananaPi or BananaPro if you fork the project and make some minor modifications).  Together with a RazBerry, this makes for a great self-contained unit.
  • It’s pretty scriptable.  They’ve embedded Blockly and both a Lua and Python API

Cons:

  • The default UI is pretty ugly
  • Not a lot of great mobile support
  • Integration with 3rd parties is limited.  For example, the community has made an InfluxDB/Grafana integration, but there doesn’t seem to be any Elasticsearch integration at this time
  • The API is very string-y: many components return a string like “3.8 A” instead of value: 3.8 + unit: “A”.  This makes many things a bit challenging to parse.
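To make that last point concrete, pulling the number back out of one of those strings takes something like this (an illustrative helper, not part of Domoticz):

```python
import re

def split_value_unit(s: str):
    """Split a Domoticz-style string like '3.8 A' or '250 Watt' into (value, unit)."""
    m = re.match(r"^\s*(-?\d+(?:\.\d+)?)\s*(.*)$", s)
    return (float(m.group(1)), m.group(2).strip()) if m else (None, s)

print(split_value_unit("3.8 A"))  # -> (3.8, 'A')
```

And you have to be ready for non-numeric placeholders like “-”, which is exactly the kind of busywork a typed API would avoid.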

This post is about trying to address a few of the cons by uploading data to Elasticsearch and using Kibana for visualizations.

Understanding the Domoticz API

First, it’s important to understand the output of the Domoticz API.  It is largely documented here, but for this demo we’re going to focus on the type=devices component of the API.

Let’s have a look at the result:

{
   "ActTime" : 1510530122,
   "ServerTime" : "2017-11-12 15:42:02",
   "Sunrise" : "06:48",
   "Sunset" : "16:59",
   "result" : [
      {
         "AddjMulti" : 1.0,
         "AddjMulti2" : 1.0,
         "AddjValue" : 0.0,
         "AddjValue2" : 0.0,
         "BatteryLevel" : 100,
         "CustomImage" : 0,
         "Data" : "74.3 F, 41 %",
         "Description" : "",
         "DewPoint" : "49.10",
         "Favorite" : 0,
         "HardwareID" : 6,
         "HardwareName" : "RaZberry",
         "HardwareType" : "OpenZWave USB",
         "HardwareTypeVal" : 21,
         "HaveTimeout" : false,
         "Humidity" : 41,
         "HumidityStatus" : "Comfortable",
         "ID" : "0201",
         "LastUpdate" : "2017-11-12 15:41:28",
         "Name" : "Office Temperature",
         "Notifications" : "false",
         "PlanID" : "0",
         "PlanIDs" : [ 0 ],
         "Protected" : false,
         "ShowNotifications" : true,
         "SignalLevel" : "-",
         "SubType" : "WTGR800",
         "Temp" : 74.299999999999997,
         "Timers" : "false",
         "Type" : "Temp + Humidity",
         "TypeImg" : "temperature",
         "Unit" : 0,
         "Used" : 1,
         "XOffset" : "0",
         "YOffset" : "0",
         "idx" : "65"
      }
   ]
}

This gives us the basis for what we’ll need to do!

A Few Bits About the API

There are a few things about the API that are a bit of a pain, as you can see from the above:

  1. Domoticz treats everything as Celsius even if the devices report in Fahrenheit.  As a result, there can be multiple conversions back and forth, which can lead to floating-point rounding trouble: as you can see above, the temperature is 74.29999… instead of the 74.3 the device reported.  I haven’t delved into every API or component, but I assume this applies to other metric/imperial conversions beyond Celsius/Fahrenheit.
  2. Devices tend to return multiple values in the same device object, and those may have varying types.  For example, notice that the “Data” above is “74.3 F, 41 %”, which is temperature + humidity combined in a single string, while we also get a separate “Humidity” value (integer), a separate “Temp” value (floating point), and a separate “DewPoint” value (floating point, but JSON-typed as a string and not in the Data block).  Other values you’d expect to be numbers (“SignalLevel”) may return strings (e.g. “-”) if no value is provided, “true”/“false” may come back as strings (as you can see in “Timers”), etc.
  3. Continuing from #2: if every type of device returned all of its values in a separate JSON object that was at least internally consistent, this would just be mildly annoying, but alas that’s not the case.  For example, energy-monitoring devices only seem to return their data in the “Data” string (there is no “Amperage” or “Wattage” attribute).

Also, Domoticz doesn’t seem to have any internal scheduled scripting (that I’ve found) that runs more often than once a minute, so we have to poll this API to get the data off the device instead of using those lovely internals.  So this is what we have to work around.  Let’s get started!

Using Elasticsearch’s Ingest Nodes

Elasticsearch has Ingest Nodes, which allow us to transform this type of data directly in the Elasticsearch cluster without setting up any external dependencies.  First, we’ll need to create an ingest pipeline:

I set up the following:

curl -XPUT "https://MY_ES_USER:MY_ES_PASSWORD@MY_ES_HOST:MY_ES_PORT/_ingest/pipeline/domoticz" -H 'Content-Type: application/json' -d'
{
  "description": "Converts domoticz API to ES",
  "processors": [
    {
      "date": {
        "field": "LastUpdate",
        "target_field": "timestamp",
        "formats": [
          "yyyy-MM-dd HH:mm:ss"
        ],
        "timezone": "America/Los_Angeles"
      }
    },
    {
      "remove": {
        "field": "LastUpdate"
      }
    },
    {
      "set": {
        "field": "IngestTime",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "rename": {
        "field": "Data",
        "target_field": "DataString"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.Type == \"Current\") { ctx.Current = ctx.DataString.replace(\" A\",\"\") } else if (ctx.Type == \"General\") { if (ctx.Usage != null) { ctx.Wattage = ctx.Usage.replace(\" Watt\",\"\"); } else if (ctx.CounterToday != null) { ctx.EnergyToday = ctx.CounterToday.replace(\" kWh\",\"\"); } else if (ctx.DataString.endsWith(\" SNR\")) { ctx.SNR = ctx.DataString.replace(\" SNR\",\"\"); } } else if (ctx.Type == \"Lux\") { ctx.Lux = ctx.DataString.replace(\" Lux\",\"\") }"
      }
    },
    {
      "date_index_name": {
        "field": "IngestTime",
        "index_name_format": "yyyy-MM-dd",
        "index_name_prefix": "devices-",
        "date_rounding": "d"
      }
    }
  ]
}'

This does a few things: it moves the “LastUpdate” field into a “timestamp” field, sets an “IngestTime” field to the time the document was received, adds named “Current”, “Wattage”, “EnergyToday”, “Lux”, and “SNR” fields based on the “Type” and the Data field string, and finally routes each document to a date-based index.
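Before wiring up any polling, a pipeline like this can be sanity-checked against a sample document with Elasticsearch’s _simulate endpoint; a rough sketch using urllib (placeholder host, sample document):

```python
import json
import urllib.request

# A sample document shaped like one entry of the Domoticz devices result.
doc = {"Type": "Lux", "Data": "153 Lux", "LastUpdate": "2017-11-12 15:41:28"}
payload = json.dumps({"docs": [{"_source": doc}]}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:9200/_ingest/pipeline/domoticz/_simulate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) returns the transformed document, so you can
# verify the Lux and timestamp fields come out as expected before going live.
```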

Great.  Now we have an ingest pipeline that can process the results of each of the devices.  This will place the data into an index named devices-<date>.  We’ll also want an index template to store the data to:

curl -XPUT "https://MY_ES_USER:MY_ES_PASSWORD@MY_ES_HOST:MY_ES_PORT/_template/domoticz" -H 'Content-Type: application/json' -d'
{
  "order": 0,
  "template": "devices-*",
  "settings": {
    "index": {
      "number_of_shards": "1"
    }
  },
  "mappings": {
    "devices": {
      "properties": {
        "BatteryLevel": {
          "type": "scaled_float",
          "scaling_factor": 100
        },
        "DataString": {
          "type": "text"
        },
        "HardwareName": {
          "type": "keyword"
        },
        "IngestTime": {
          "type": "date"
        },
        "Level": {
          "type": "double"
        },
        "LevelInt": {
          "type": "scaled_float",
          "scaling_factor": 100
        },
        "Name": {
          "type": "text"
        },
        "Status": {
          "type": "keyword"
        },
        "Type": {
          "type": "keyword"
        },
        "Current": {
          "type": "double"
        },
        "DewPoint": {
          "type": "double"
        },
        "Gust": {
          "type": "double"
        },
        "EnergyToday": {
          "type": "double"
        },
        "Rain": {
          "type": "double"
        },
        "RainRate": {
          "type": "double"
        },
        "Speed": {
          "type": "double"
        },
        "Chill": {
          "type": "double"
        },
        "UVI": {
          "type": "double"
        },
        "Direction": {
          "type": "double"
        },
        "Lux": {
          "type": "integer"
        },
        "Wattage": {
          "type": "double"
        },
        "CounterToday": {
          "type": "text"
        },
        "HumidityStatus": {
          "type": "text"
        },
        "Usage": {
          "type": "text"
        },
        "idx": {
          "type": "integer"
        },
        "timestamp": {
          "type": "date"
        }
      }
    }
  }
}'

Most of this template isn’t strictly necessary (Elasticsearch’s dynamic mapping would handle most if not all of these fields correctly), but it sets up efficient mappings and guards against mistakes in our data model.

Uploading the API to Elasticsearch

We need to push the data to Elasticsearch at some regular interval.  I’ve used jq to do some lightweight filtering and reformatting of the data before ultimately pushing it up to Elasticsearch.  We can add something like the following to the crontab to launch it at boot.

#!/bin/bash

while true
  do curl "http://127.0.0.1:8080/json.htm?type=devices&filter=all&used=true&order=Name" | jq -c '.result[] | with_entries(select( .key as $k | $keys | index($k)))' --argjson keys '["idx","HardwareName","Name","BatteryLevel","Gust","Direction","Chill","Speed","Temp","UVI","Rain","RainRate","Visibility","Radiation","Level","LevelInt","Status","Humidity","DewPoint","HumidityStatus","CounterToday","Usage","Voltage","Type","Data","LastUpdate"]' | sed -e 's/^/{ "index" : { "_index" : "devices", "_type" : "devices" } }\
/' > /tmp/devices_bulk.json && curl -H 'Content-type: application/json' -XPOST 'https://MY_ES_USER:MY_ES_PASSWORD@MY_ES_HOST:MY_ES_PORT/_bulk?pipeline=domoticz' --data-binary @/tmp/devices_bulk.json > /dev/null && sleep 5
done

This script does a few things: it selects the relevant pieces of information I want indexed (Temp, HardwareName, etc.) from the response, formats a _bulk index request for Elasticsearch, and POSTs it.
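In other words, the sed trick is just producing an NDJSON _bulk body; built by hand it would look something like this (sample document for illustration):

```python
import json

# One action line + one source line per document, newline-delimited.
docs = [{"idx": "65", "Name": "Office Temperature", "Temp": 74.3}]
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "devices", "_type": "devices"}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```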

In my case, I’ve used an Elastic Cloud instance to push the data to.  This allows me to externalize the data in a secure fashion and make it available no matter where I’m at, but only to me.  It also lets me fire off alerts, e.g. if one of my security devices goes off and I’m not home.

Drop this bash script into /opt/domoticz/scripts/ (where I’ve installed Domoticz) and in crontab, add:

@reboot bash /opt/domoticz/scripts/uploadES.sh
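The _bulk request above routes documents through an ingest pipeline named `domoticz`. Its definition isn’t shown in this post, but a minimal version that just stamps each document with the time it was ingested could look like this sketch (the processor layout is my assumption, not the author’s actual pipeline):

```python
import json

# Hypothetical definition of the "domoticz" ingest pipeline: a single `set`
# processor that writes the ingest time into the `timestamp` field mapped
# in the index template earlier.
pipeline = {
    "description": "Stamp Domoticz documents with an ingest timestamp",
    "processors": [
        {"set": {"field": "timestamp", "value": "{{_ingest.timestamp}}"}}
    ],
}

# This JSON would be PUT to /_ingest/pipeline/domoticz, e.g. with curl:
#   curl -H 'Content-Type: application/json' -XPUT \
#     'https://<es-host>/_ingest/pipeline/domoticz' -d '<the JSON above>'
print(json.dumps(pipeline, indent=2))
```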

Setting Up Kibana

The two biggest quirks for setting up Kibana with this data are:

  1. Domoticz reports the time as the last time the device had an update, but for most dashboards what you really want is to see the state of all devices at a given point in time.  This means we really want the ingest time, not the device update time (hence adding that to the ingest template).
  2. Using device IDs to graph multiple documents of similar types.  For example, showing all motion sensors or door sensors on the same graph.  There are a few approaches for this, but I used a sub-aggregation to split the charts, filtering on specific device IDs (“idx” in the data).  This looks like the following in Kibana:

Example Dashboard

Once we tie it all together, we get quite a nice dashboard!

We can see the inside and outside temperatures, when there was motion in various spaces (in red), when doors were opened (in red), and when devices were turned on and off.  You can even see in the “office” and “living room” temperature charts when the heat kicked on in my apartment.  We can also fire off alerts for security (e.g. “front door is opened but nobody’s home”).

Future Work

You’ll likely see some of the following in future posts or as future work from me:

  • The bottom right graph hides additional detail of something I’ve built out, which is tracking the wifi strength of connected/roaming devices.  I think this is quite useful as a virtual presence detection device.  I’ll add more detail in the future about how I’ve done this.
  • I’m adding more devices.  One of the things I really like about having the data in a place like this is I can start to gather and visualize other external inputs to the system (e.g. my phone’s sensors and location).  I’ve got some code written for this already.
  • I’d like to make a native Elasticsearch exporter and timer for Domoticz and fix some of this API weirdness, but I’m not yet familiar enough with the Domoticz C++ codebase to do this.
  • I think my habits could be fairly well modeled with Elastic’s machine learning.  I’d like to build/use machine learning models on top of this data to automatically flag and alert on anomalies in the data rather than hard-code rules.

JSparks

I’ve released my first Android application! It’s a full vertical stack designed to teach kids how to program using the excitement of robots, while eliminating many of the headaches (like syntax errors) associated with beginning to learn to write software.

You can find full details at sparki.eskibars.com

Populating users in Active Directory using Perl with OLE

Occasionally, I’ve had the need to populate sample users into an Active Directory for projects.  There are, no doubt, simpler and more direct ways to do this.  For example, it’s likely quite easy in PowerShell (though I’ve never been a PowerShell person), and even in the Perl world there are modules to interact with LDAP directly.  So OLE may not seem like the ideal route, and that’s probably right, but OLE comes with virtually every Windows Perl distribution I’ve seen (unlike the LDAP modules), and it isn’t terribly difficult.  In general, I haven’t seen many examples of using OLE to access this type of data on Windows systems, so I wanted to write a brief post about how it can be used, for those who stumble upon it.  It’s also nice that, by virtue of running through OLE, you don’t need anything special for authentication here: credentials are automatically inherited from your active running user.

First things first

In this example, I’ll assume you’ve at least got an Active Directory set up and that it’s generally fairly vanilla.  We need to import the Win32::OLE bits into our script, which is relatively straightforward

use Win32::OLE 'in';

Next, we need to know a bit about the Active Directory that’s been set up.  If you don’t know the structure of it, you can use a variety of tools including the open source LDAP Explorer or Microsoft’s Sysinternals Active Directory Explorer.  You’ll need:

  • The domain to put things in, in LDAP format
  • The base path to put the new users/groups in
  • To be a user that has access to create users (and groups, if you’re looking to do so) at the given base path
  • To create or use some format of e-mail addresses for the new users
  • An LDAP URL to access the system

So in this example, we’ll assume I’ll be putting users into customsubdomain.sfdemo.mysite.com (LDAP format of “DC=customsubdomain,DC=sfdemo,DC=mysite,DC=com”) and giving them e-mail addresses of the form “firstname.lastname@mysite.com”.  The base path will be “OU=UserGroup,OU=DemoUsers,DC=customsubdomain,DC=sfdemo,DC=mysite,DC=com”.

So let’s go define these in Perl

my $domain = 'DC=customsubdomain,DC=sfdemo,DC=mysite,DC=com';
my $newpassword = 'pa55w0rd!';
my $emaildomain = 'mysite.com';
my $basepath = 'OU=UserGroup,OU=DemoUsers,' . $domain;
my $ldaphost = '127.0.0.1';
my $ADsPath = 'LDAP://' . $ldaphost . '/' . $basepath;

Once we have this all defined, we can go grab the OU object and use it later

my $ou = Win32::OLE->GetObject($ADsPath) or die "Unable to get $ADsPath\n";

So to start out with, we’ll do something pretty easy: adding a security group.  Let’s say we want to call this group “LegalUsers”.  At its simplest, adding the group is just

  1. Creating the group
  2. Assigning the “sAMAccountName” property to it for Active Directory
  3. Updating (saving) the object:
my $securitygroup = 'LegalUsers';
my $newgroup;
$newgroup = $ou->Create('group','CN=' . $securitygroup);
$newgroup->{sAMAccountName} = $securitygroup; #add the sAMAccountName property
$newgroup->SetInfo( ); #save the object
$newgroup = Win32::OLE->GetObject('LDAP://127.0.0.1/cn=' . $securitygroup . ',' . $basepath) or die "Unable to find group\n"; #retrieve the updated object for later reference

More complex: Adding a user

The only thing that makes adding users more complex is that there are more fields to add.  We need to add the name, set the user’s path in AD, and add an e-mail address, but we may also want to give them a default password, change their account expiration properties, assign them to groups, etc.  Let’s walk through an example of this…

#For this example, we'll show 1 particular username.  The bulk import later will make more sense with this
#Although the bulk import will just take a flat 1-line-per-username format, a CSV could be devised and used instead
my $username = 'shane.connelly';
my $email = $username . "\@$emaildomain"; #our new e-mail, as in shane.connelly@mysite.com
my @namearray = split(/\./, $username);

my $realname = $username;
my $fullname = join(' ', map (ucfirst, @namearray)); #name will now be upper-cased first letters with a space as in "Shane Connelly"
my $initials = uc(join('', map (substr($_,0,1), @namearray))); #initials will now be the first letters of each word, as in "SC"
$initials = substr($initials,1,length($initials) - 2); #strip the first and last letters, keeping only middle initials (empty for simple first.last usernames)
my $lastname = ucfirst($namearray[$#namearray]);

print "Adding $fullname\n";

#As with the groups, simply creating a user is fairly easy.
#The only thing that really makes this different than any other LDAP import is that most AD systems have some specific schema as to the field names you use
#The below works with that schema
my $u = $ou->Create("user", 'cn=' . $fullname) or die "Unable to create user\n";
$u->{sAMAccountName} = $username; #as with the groups, we need a sAMAccountName and this needs to be unique
$u->{company} = 'XYZ Corporation';
$u->{initials} = $initials if ($initials);
$u->{displayName} = $fullname;
$u->{name} = $fullname;
$u->{givenName} = ucfirst($namearray[0]);
$u->{sn} = $lastname; #sn = surname
$u->{mail} = $email;
$u->{mobile} = ('(' .(100 + int(rand(800))) . ') ' . (100 + int(rand(800))) . '-' . (1000 + int(rand(8000)))); #create some random phone number for demonstration purposes
$u->{co} = 'United States'; #co = country
$u->{c} = 'US'; #c = country code
$u->SetInfo(  );

#We'll give the user a default password.  For this example, every user will have the same password, which is obviously quite insecure
$u = Win32::OLE->GetObject('LDAP://127.0.0.1/cn=' . $fullname . ',' . $basepath) or die "Unable to find last added user\n";
$u->ChangePassword("",$newpassword);
$u->SetInfo(  );

#Set some flags for the user
#For this example, we'll say "Password doesn't expire" (0x10000 or 65536 in decimal) and make it a "Normal account" (0x0200 or 512 in decimal)
#Just sum up the flags (0x10000 + 0x0200 = 0x10200 or 65536+512 = 66048 in decimal)
#There are a number of flags you can add to a user account, available at http://support.microsoft.com/kb/305144
$u->{userAccountControl} = '66048';
$u->SetInfo(  );

#perhaps we want to also add the user to the "Remote Desktop Users" group
my $g = Win32::OLE->GetObject('LDAP://127.0.0.1/cn=Remote Desktop Users,CN=Builtin,' . $domain) or die "Unable to find Remote Desktop Users group\n";
$g->Add('LDAP://cn=' . $fullname . ',' . $basepath);
$newgroup->Add('LDAP://cn=' . $fullname . ',' . $basepath) if ($newgroup);
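The userAccountControl value above is just a bitwise combination of documented flag values. A quick sanity check of that arithmetic (flag names are mine; values are from the Microsoft KB article linked in the comments):

```python
# Two of the userAccountControl flags from the KB article:
# "password doesn't expire" and "normal account"
DONT_EXPIRE_PASSWORD = 0x10000  # 65536 in decimal
NORMAL_ACCOUNT = 0x0200         # 512 in decimal

# Combining flags is a bitwise OR (equivalent to addition for
# non-overlapping flags, as in the Perl comments above)
user_account_control = DONT_EXPIRE_PASSWORD | NORMAL_ACCOUNT
print(user_account_control)  # 66048
```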

Tying it all together

For creating demo users, I’d often simply create a flat file with usernames in it and read through them.  You can also add managers, cities, and addresses for a more “real-looking” scenario.

use strict;
use Win32::OLE 'in';

##### CONFIG #####

my $domain = 'DC=customsubdomain,DC=sfdemo,DC=mysite,DC=com';
my $newpassword = 'pa55w0rd!';
my $emaildomain = 'mysite.com';
my $basepath = 'OU=UserGroup,OU=DemoUsers,' . $domain;
my $securitygroup = 'LegalUsers';
my @cities = ('New York:New York','Los Angeles:California','Houston:Texas','Philadelphia:Pennsylvania','Phoenix:Arizona','San Antonio:Texas',
			  'San Diego:California','Dallas:Texas','San Jose:California','Jacksonville:Florida','Indianapolis:Indiana','San Francisco:California',
			  'Austin:Texas','Columbus:Ohio','Fort Worth:Texas','Charlotte:North Carolina','Detroit:Michigan');
my @streettypes = ('Avenue','Street','Circle');
my @streetnames = ('Second','Third','First','Fourth','Park','Fifth','Main','Sixth','Oak','Pine','Maple','View','Washington','Lake','Hill','Cedar','Stuart','Smith');

my @managers = ('CN=Ally Mcbeal,OU=UserGroup,OU=DemoUsers,DC=customsubdomain,DC=SFDemo,DC=MySite,DC=com',
				'CN=Shane Connelly,OU=UserGroup,OU=DemoUsers,DC=customsubdomain,DC=SFDemo,DC=MySite,DC=com');
##### END CONFIG #####

my $ADsPath = 'LDAP://127.0.0.1/' . $basepath;
my $ou = Win32::OLE->GetObject($ADsPath) or die "Unable to get $ADsPath\n";

my $newgroup;
if ($securitygroup)
{
  $newgroup = $ou->Create('group','CN=' . $securitygroup);
  $newgroup->{sAMAccountName} = $securitygroup;
  $newgroup->SetInfo( );
  $newgroup = Win32::OLE->GetObject('LDAP://127.0.0.1/cn=' . $securitygroup . ',' . $basepath) or die "Unable to find group\n";
}

open(USRS, '<', 'users.txt') or die "Unable to open users.txt\n";
while (<USRS>)
{
	chomp;
	my $username = $_;
	$username =~ s/\_/\./g;
	my $email = $username . "\@$emaildomain";
	my %altemails = ();
	my @namearray = split(/\./, $username);
	
	my $realname = $username;
	my $fullname = join(' ', map (ucfirst, @namearray));
	my $initials = uc(join('', map (substr($_,0,1), @namearray)));
	$initials = substr($initials,1,length($initials) - 2);
	my $lastname = ucfirst($namearray[$#namearray]);
	
	print "Adding $fullname\n";
	
	my $u = $ou->Create("user", 'cn=' . $fullname) or die "Unable to create user\n";
	$u->{sAMAccountName} = $username;
	$u->{company} = 'XYZ Corporation';
	$u->{initials} = $initials if ($initials);
	$u->{displayName} = $fullname;
	$u->{name} = $fullname;
	$u->{givenName} = ucfirst($namearray[0]);
	$u->{sn} = $lastname;
	$u->{mail} = $email;
	$u->{mobile} = ('(' .(100 + int(rand(800))) . ') ' . (100 + int(rand(800))) . '-' . (1000 + int(rand(8000))));
	$u->{co} = 'United States';
	$u->{c} = 'US';
	
	my $userlocation = $cities[rand @cities];
	my ($city, $state) = split(":", $userlocation);
	$u->{st} = $state;
	$u->{l} = $city;
	$u->{streetAddress} = int(rand(9999)) . ' ' . $streetnames[rand @streetnames] . ' ' . $streettypes[rand @streettypes];
	$u->{postalCode} = int(rand(99999));
	$u->{userPrincipalName} = $email;
	$u->{manager} = $managers[rand @managers];
	$u->SetInfo(  );
	
	$u = Win32::OLE->GetObject('LDAP://127.0.0.1/cn=' . $fullname . ',' . $basepath) or die "Unable to find last added user\n";
	$u->ChangePassword("",$newpassword);
	$u->SetInfo(  );
	
	$u->{userAccountControl} = '66048';
	$u->SetInfo(  );
	
	my $g = Win32::OLE->GetObject('LDAP://127.0.0.1/cn=Remote Desktop Users,CN=Builtin,' . $domain) or die "Unable to find Remote Desktop Users group\n";
	$g->Add('LDAP://cn=' . $fullname . ',' . $basepath);
	$newgroup->Add('LDAP://cn=' . $fullname . ',' . $basepath) if ($newgroup);
}
close(USRS);

Antenna Design Genetic Algorithm

Background

For certain classes of antennas, e.g. Yagi-Uda antennas, the design characteristics have no known closed-form “best case” values.  That is, if you want to design a Yagi-Uda antenna for a particular frequency, there is no known analytic solution for the width of the dipoles, the number of dipoles, and the distance between each dipole that achieves the highest gain.  Instead, people rely on experimental evidence: designs for a number of common frequencies have been tested in the field to produce a certain amount of gain, so if you know what frequency you’re designing for, you look up tables based on what others have tested.  If you’re looking at an entirely unique frequency, you have to go experiment yourself.

New Approach

I wrote a MATLAB program that uses a genetic algorithm to modify the parameters of an antenna and eventually “give birth” to a “best known case” antenna based upon forward gain and other criteria. Currently, it uses NEC2 (Numerical Electromagnetics Code 2) as the processing engine: it writes a text file describing the antenna to disk, then calls NEC2 to evaluate that file.  This allows us to try a number of unknown antenna designs and permute toward possible solutions.  It can be run in a distributed fashion, with each machine “phoning home” to a central database, which then redistributes the best-known designs to the worker machines.
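To make the shape of that loop concrete, here is a toy Python sketch of the same selection/crossover/mutation cycle applied to a trivial fitness function. This is not the MATLAB code from this project (which follows in full below), just an illustration of the technique; all names and parameter values are mine.

```python
import random

def ga_maximize(fitness, bounds, pop=30, generations=60,
                crossover_rate=0.7, mutation_rate=0.1, seed=0):
    """Tiny GA: roulette-wheel selection, one-point crossover,
    uniform mutation within per-parameter (lo, hi) bounds."""
    rng = random.Random(seed)
    K = len(bounds)
    # initial generation drawn uniformly from the parameter space
    G = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    best, best_f = None, float("-inf")
    for _ in range(generations):
        f = [fitness(g) for g in G]
        for g, fi in zip(G, f):
            if fi > best_f:
                best, best_f = g[:], fi
        # roulette-wheel selection on shifted (non-negative) fitness
        m = min(f)
        w = [fi - m + 1e-9 for fi in f]
        G = [rng.choices(G, weights=w)[0][:] for _ in range(pop)]
        # one-point crossover between pairs
        for i in range(0, pop - 1, 2):
            if rng.random() < crossover_rate and K > 1:
                x = rng.randrange(1, K)
                G[i][x:], G[i + 1][x:] = G[i + 1][x:], G[i][x:]
        # mutation: redraw a gene uniformly within its bounds
        for g in G:
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    g[j] = rng.uniform(lo, hi)
    return best, best_f

# Toy check: maximize -(x - 3)^2 over (0, 10); the optimum is at x = 3
beta, score = ga_maximize(lambda p: -(p[0] - 3) ** 2, [(0, 10)])
```

In the antenna version, the fitness function is the expensive part: each evaluation writes an NEC2 input deck and reads back the computed forward gain.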

The output is something like the following, which shows the forward gain of a given design across a set of frequencies:

Output from a forward gain analysis

Code

Genetic Algorithm:

function [beta,stopcode]=ga(funstr,parspace,options,p1,p2,p3,p4,p5,p6,p7,p8,p9)
%[beta,stopcode]=ga(funstr,parspace,options,p1,p2,p3,p4,p5,p6,p7,p8,p9)
% Genetic Algorithm for function maximization.
%
% OUTPUTS:
%  beta       = (1 x K) parameter vector maximizing funstr
%  stopcode   = code for terminating condition
%                == 1 if terminated normally
%                == 2 if maximum number of iterations exceeded
%
% INPUTS:
%  funstr     = name of function to be maximized (string).
%  parspace   = (2 x K) matrix is [min; max] of parameter space dimensions
%               or, if (3 x K), then bottom row is a good starting value
%  options    = vector of option settings
%  p1,p2,...,p9 are optional parameters to be passed to funstr
%
% where:
% options(1) = m (size of generation, must be even integer)
% options(2) = eta (crossover rate in (0,1); use Booker's VCO if < 0)
% options(3) = gamma (mutation rate in (0,1))
% options(4) = printcnt (print status once every printcnt iterations)
%                Set printcnt to zero to suppress printout.
% options(5) = maxiter (maximum number of iterations)
% options(6) = stopiter (minimum number of gains < epsln before stop)
% options(7) = epsln (smallest gain worth recognizing)
% options(8) = rplcbest (every rplcbest iterations, insert best-so-far)
% options(9) = 1 if function is vectorized (i.e., if the function
%                can simultaneously evaluate many parameter vectors).
%    Default option settings: [200,-1,0.12,10,20000,2000,1e-4,50,0]
%
% Note: 
%    The function is maximized with respect to its first parameter,
%    which is expressed as a row vector.
%    Example: 
%      Say we want to maximize function f with respect to vector p,
%      and need also to pass to f data matrices x,y,z.  Then,
%      write the function f so it is called as f(p,x,y,z).  GA will
%      assume that p is a row vector.

defopt=[200,-1,0.12,10,20000,2000,1e-4,50,0];
months = ['Jan';'Feb';'Mar';'Apr';'May';'Jun';...
          'Jul';'Aug';'Sep';'Oct';'Nov';'Dec'];

if nargin>2
   if isempty(options)
        options=defopt;
   end
else
   options=defopt;
end
m=options(1); eta=options(2); gam=options(3);
printcnt=options(4);
maxiter=options(5);
stopiter=options(6); epsln=options(7);
rplcbest=options(8);
vecfun=options(9);

% Use Booker's VCO if eta==-1
vco=(eta<0);  eta=abs(eta);

% Cancel rplcbest if <=0
if rplcbest<=0, rplcbest=maxiter+1; end

K=size(parspace,2);

% Draw initial Generation
G=rand(m,K).*(parspace(2*ones(m,1),:)-parspace(ones(m,1),:))...
       +parspace(ones(m,1),:);
b0rows=size(parspace,1)-2;
if b0rows>0
  G(1:b0rows,:)=parspace(3:b0rows+2,:);
  parspace=parspace([1 2],:);
end

% Initial 'best' holders
inarow=0;
bestfun=-Inf; beta=zeros(1,K);

% Score for each of m vectors
f=zeros(m,1);

% Setup function string for evaluations
paramstr=',p1,p2,p3,p4,p5,p6,p7,p8,p9';
evalstr = [funstr,'(G'];
if ~vecfun
        evalstr=[evalstr, '(i,:)'];
end
if nargin>3, evalstr=[evalstr,paramstr(1:3*(nargin-3))]; end
evalstr = [evalstr, ')'];

% Print header
if printcnt>0
   disp(['Maximization of function ',funstr])
   disp('i      = Current generation')
   disp('best_i = Best function value in generation i')
   disp('best   = Best function value so far')
   disp('miss   = Number of generations since last hit')
   disp('psi    = Proportion of unique genomes in generation')
   disp(sprintf(['\n',blanks(20),'i     best_i        best     miss   psi']))
end

iter=0;  stopcode=0;
oldpsi=1;  % for VCO option
while stopcode==0
   iter=iter+1;
   % Call function for each vector in G
   if vecfun
        f=eval(evalstr);
   else
     for i=1:m
        f(i)=eval(evalstr);
     end
   end
   f0=f;
   [bf0,bx]=max(f);
   bf=max([bf0 bestfun]);
   fgain=(bf-bestfun);
   if fgain>epsln
        inarow=0;
   else
        inarow=inarow+1;
   end
   if fgain>0
        bestfun=bf;
        beta=G(bx(1),:);
   end
   if printcnt>0 && rem(iter,printcnt)==1
        psi=length(unique(G))/m;
        ck=clock;
        ckhr=int2str(ck(4)+100);  ckday=int2str(ck(3)+100);
        ckmin=int2str(ck(5)+100); cksec=int2str(ck(6)+100);
        timestamp=[ckday(2:3),months(ck(2),:),' ',...
           ckhr(2:3),':',ckmin(2:3),':',cksec(2:3),' '];
        disp([timestamp,sprintf('%6.0f %8.5e %8.5e %5.0f %5.3f',...
                [iter bf0 bestfun inarow psi])])
        disp(beta)
        save gabest beta timestamp iter funstr
   end
   % Reproduction
   f=(f-min(f)).^(1+log(iter)/100);
   pcum=cumsum(f)/sum(f);
   r=rand(1,m); r=sum(r(ones(m,1),:)>pcum(:,ones(1,m)))+1;
   G=G(r,:);
   % Crossover
   if vco
        psi=length(unique(G))/m;
        eta=max([0.2 min([1,eta-psi+oldpsi])]);
        oldpsi=psi;
   end   
   y=sum(rand(m/2,1)<eta);
   if y>0
     % choose crossover point
     x=floor(rand(y,1)*(K-1))+1;
     for i=1:y
        tmp=G(i,x(i)+1:K);
        G(i,x(i)+1:K)=G(i+m/2,x(i)+1:K);
        G(i+m/2,x(i)+1:K)=tmp;
     end
   end
   % Mutation
   M=rand(m,K).*(parspace(2*ones(m,1),:)-parspace(ones(m,1),:))...
       +parspace(ones(m,1),:);
   domuta=find(rand(m,K)<gam);
   G(domuta)=M(domuta);
   % Once every rplcbest iterations, re-insert best beta
   if rem(iter,rplcbest)==0
        G(m,:)=beta;
   end
   stopcode=(inarow>stopiter)+2*(iter>maxiter);
end

if printcnt>0
   if stopcode==1
        disp(sprintf('GA: No improvement in %5.0f generations.\n',stopiter))
   else
        disp(sprintf('GA: Maximum number of iterations exceeded.\n'))
   end
end
% end of GA.M
function [gain_t]=Yagi(p)
% This function is used with the genetic optimization code above.
% It creates an NEC input file given the parameters for a 3-element Yagi
% antenna, then runs NEC and reads back the parameters (gain, impedance, etc.) generated by NEC.
%=========Create NEC input file========================================
Fname_nec='Yagi.nec';
FID_nec=fopen(Fname_nec,'wt');
D=0.0085; % diameter of elements in wavelengths
R=D/2;
Lr=p(1); %perimeter of reflector
Ls=p(2); %perimeter of driven element
Ld=p(3); %perimeter of director
Sr=-p(4); %location of reflector
Sd=p(5);  %location of director
%    Geometry input for NEC

rsl = Lr / 4; %reflector single side length: square loop, one side = perimeter / 4
ssl = Ls / 4; %driven single side length: square loop, one side = perimeter / 4
dsl = Ld / 4; %director single side length: square loop, one side = perimeter / 4
num_segments_per_side = 7;

fprintf(FID_nec,strcat('CM UDA-YAGI SQUARE LOOP ANTENNA','\n'));
fprintf(FID_nec,strcat('CE File Generated by MATLAB','\n'));
fprintf(FID_nec,'GW %3i %3i %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f\n',1,num_segments_per_side,Sr,0,-Lr/2,Sr,0,Lr/2,R); %Reflector
fprintf(FID_nec,'GW %3i %3i %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f\n',2,num_segments_per_side,0 ,0,-Ls/2, 0,0,Ls/2,R); %Driven Element
fprintf(FID_nec,'GW %3i %3i %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f\n',3,num_segments_per_side,Sd,0,-Ld/2,Sd,0,Ld/2,R); %Director
%    Program Control Commands for NEC
fprintf(FID_nec,strcat('GE','\n'));
fprintf(FID_nec,'EX %3i %3i %3i %3i %8.4f %8.4f\n',0,2,4,0,1,0);%Excitation command: wire 2, segment 4
fprintf(FID_nec,'FR %3i %3i %3i %3i %8.4f %8.4f\n',0,1,0,0,2400,0);%set frequency to 2400 MHz
fprintf(FID_nec,'RP %3i %3i %3i %3i %8.4f %8.4f %8.4f %8.4f\n',0,1,1,1000,90,0,0,0);%calculate gain at boresight
fprintf(FID_nec,strcat('EN','\n'));
fclose(FID_nec);
%=======Create file to pipe to NEC ===================================
FID_input=fopen('input_CMD','wt');
fprintf(FID_input,strcat(Fname_nec,'\n'));
fprintf(FID_input,strcat('NEC.out','\n'));
fclose(FID_input);
%=======Run NEC======================================================
!NEC2Dx500 < input_CMD >tmp;
%=======Read Data from NEC output file===============================
[freq,Z,gain_t,E_theta,E_phi,n_freq_meas,run_time] = nec_read('NEC.out');