Internet Of Chuffs

This page last updated: 30 June 2018.

Back in 2017, at a Cambridge Wireless meeting, some colleagues of mine happened to be talking to Rob Morland, who is involved in something called the A1 Steam Locomotive Trust.  The trust has built a brand new £3m steam locomotive, the Tornado, which runs on the national rail network.  Rob was wondering whether it was possible to stream the live sound of the steam engine to those at the track side etc. waiting for it to come past.  For some reason my colleagues pointed him at me and hence a project was born.

It began as a work (u-blox Ltd) project which could be used as a model to show an Internet of Things architecture, hence digital end to end.  To reduce the probability of there being analogue audio issues I decided to use an I2S MEMS microphone (InvenSense ICS43434).  However, though the work board I was using supported I2S, it only did so with hardware modifications which, in the end, turned out to be impractical.  So, after a few months of initial development, I moved the implementation to a Raspberry Pi, where I2S audio is simply a matter of connecting wires, and in 2018 it became a hobby project rather than a work project.


Requirements

The design is dictated by these requirements:


Architecture

The architecture of the system looks something like this:

IoC architecture

The microphone picks up the chuffs and passes them into the Raspberry Pi.  There the audio is compressed into a stream and passed over the cellular uplink, through the cellular network and onto the public internet, where the stream terminates at a server.  The server decodes the stream and buffers an appropriate amount of audio.  Users then access this buffered audio on their mobile devices over a cellular network (which may or may not be the same one the chuffs passed through), provided they have been granted access by the server.

The architecture raises a number of issues:

How Is The Audio Input Captured?

I'm a software engineer and so I wanted to, as far as possible, avoid any potential issues with hardware design.  By far the simplest approach, especially since there was an intention to show Internet Of Things behaviours, is to use an I2S microphone such as the InvenSense ICS43434. Not much bigger than a grain of rice, this microphone can be powered from 1.8 to 3.6 Volts and provides a completely standard Philips-format I2S digital output that can be read by any microcontroller with an I2S interface.  Audio is 24 bit and capture rates can be at least 44 kHz.

How Is The Uplink Audio Stream Coded?

I experimented with various bit depths, capture frequencies and coding schemes.  From a capture point of view, 24 bit is somewhat high so I compromised at 16 bit.  In terms of capture frequencies, 44 kHz is CD audio quality but that is going to be too high for a cellular network and so I compromised at 16 kHz.  With a raw PCM transport, ignoring overheads, this would require a constant 256 kbits/s uplink on the cellular interface.  This is definitely on the large side: cellular networks may offer links of this bandwidth on the downlink, but the uplink is an entirely different matter.  However, I didn't want to go any lower than this in quality terms and so the next variable is the audio coding scheme.
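As a sanity check, the raw PCM figure above follows directly from the bit depth and sample rate; a trivial sketch:

```python
# Raw PCM uplink rate for the chosen compromise: 16-bit samples at 16 kHz,
# ignoring any framing or transport overheads.
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 16000

raw_rate_bits_per_s = BITS_PER_SAMPLE * SAMPLE_RATE_HZ
print(raw_rate_bits_per_s)  # 256000, i.e. 256 kbits/s
```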

While it would be theoretically possible to MP3 encode at source, that is a processor-intensive operation and MP3 is neither stream oriented nor loss tolerant; it is coded in blocks of 1152 samples and the audio content is interleaved across many blocks, so losing a single block has a large effect.

Jonathan Perkins at work suggested I adopt a NICAM-like approach. NICAM was the first scheme used by the BBC for broadcasting digital multi-channel audio at a controlled quality, allowing stereo audio to be broadcast for the first time.  It also happens to be very suited to embedded systems.  Basically a chunk of samples is taken and the peak is worked out.  Then all the samples are shifted down so that every sample in the block fits into the desired NICAM bit-width.  The amount of shifting that was performed is included with the coded block.  At the far end the block is reconstructed; any loss will always be in the lower bits of the block.  With a relatively short block the "gain window" moves such that the loss is not noticeable.  I chose an 8 bit NICAM width and a block duration of 1 ms (16 samples).  For a 16 kHz sampling rate this results in an uplink rate of 132 kbits/s, which (by experiment) is bearable.
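The block-companding idea can be sketched in a few lines; this is my own toy illustration, not the ioc-client code:

```python
# Sketch of NICAM-style block companding: 16-bit samples are shifted down
# so that the loudest sample in a 16-sample (1 ms at 16 kHz) block fits
# into 8 signed bits; the shift used travels alongside the coded block.
BLOCK_SAMPLES = 16
CODED_BITS = 8

def encode_block(samples):
    peak = max(abs(s) for s in samples)
    shift = 0
    # Shift down until the peak fits into a signed CODED_BITS value.
    while (peak >> shift) > (1 << (CODED_BITS - 1)) - 1:
        shift += 1
    return shift, [s >> shift for s in samples]

def decode_block(shift, coded):
    # Reconstruction: any loss is confined to the low 'shift' bits.
    return [c << shift for c in coded]

shift, coded = encode_block([-32768, 12345, -100, 7] + [0] * 12)
# shift is 9 here: a peak of 32768 needs 9 places to fit in 8 signed bits.
assert decode_block(shift, coded)[0] == -32768
```

A quiet block needs no shift at all, so quiet passages keep full precision, which is what makes the moving "gain window" inaudible.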

In addition to the audio stream itself I borrowed from the likes of RTP and included a sequence number, microsecond timestamp and coding scheme indicator in the block header; I called this URTP (u-blox Real Time Protocol).
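For illustration, a URTP-style header might be packed like this; the field widths and byte order here are my own guesses, not the actual URTP wire format:

```python
import struct

# Hypothetical URTP-style block header: a 1-byte coding-scheme indicator,
# a 2-byte sequence number and an 8-byte microsecond timestamp, big-endian.
# The real ioc-client layout may well differ.
HEADER_FORMAT = ">BHQ"

def make_header(scheme, seq, timestamp_us):
    return struct.pack(HEADER_FORMAT, scheme, seq, timestamp_us)

def parse_header(data):
    return struct.unpack(HEADER_FORMAT, data[:struct.calcsize(HEADER_FORMAT)])

header = make_header(1, 42, 1234567890)
assert parse_header(header) == (1, 42, 1234567890)
```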

How Is The Audio Stream Served To Users?

I initially thought about using RTP or something similar but I really did NOT want to have to write a mobile application for this; it had to work out of the box with existing mobile devices.  The answer turns out to be HTTP Live Streaming (HLS).  This protocol, originally developed by Apple, chops an audio stream into segment files, each a few seconds long, which are MP3 encoded but with a very specific header added so that the browser can reconstruct them.  There is then an index file which lists the segments for the browser.  No client application is required, just a browser: the browsers of all Apple phones include native HLS support, while all Android phones and desktop browsers can be supported with the marvellous hls.js.
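An HLS index file of the kind described is just a short text file; something along these lines (file names, durations and sequence numbers illustrative):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:1
#EXT-X-MEDIA-SEQUENCE:1234
#EXTINF:1.0,
segment1234.mp3
#EXTINF:1.0,
segment1235.mp3
#EXTINF:1.0,
segment1236.mp3
```

The browser re-fetches the index periodically and pulls down whichever segment files it hasn't seen yet.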

How Is The Link Over The Public Internet To The Server Secured?

In the original Internet Of Things plan I had assumed that DTLS was going to be the security scheme of choice.  However, I experimented with sending the uplink audio stream over UDP and found relatively significant losses, several percent.  Hence I decided that TCP was a better bet for the audio stream.  Then there's also the issue of cellular networks: they sometimes perform deep packet inspection and deny service to things they decide don't meet their tariff model, they have quite active and unpredictable firewalls, and they don't allow incoming TCP connections (which would be needed for control operations).

Jonathan came to my rescue again here with the answer to all of these problems: SSH.  SSH comes built into all Linux platforms and allows the setting up of secure tunnels between servers, even multi-hop, provided that you have an account on each of the machines, which can be certificate-based.  You generate an SSH key on the Raspberry Pi and then push it to the server.  The Raspberry Pi can then use SSH to set up tunnels from its port X to port Y on the server and, also, set up tunnels in the reverse direction, from port A on the server to port B on the Raspberry Pi.  The tunnels are secure, can be configured to include keep-alives and restarts, and, should the private key on the Raspberry Pi ever be exposed, the server can simply remove the public key from its list.
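The tunnel set-up can be sketched with a single ssh invocation from the Raspberry Pi; the port numbers, user and host names below are placeholders, not the real configuration:

```
# Forward local port 5060 to port 5070 on the server (-L) and, in the
# reverse direction, server port 5080 back to local port 5090 (-R),
# with a keep-alive every 30 seconds; -N means no remote command.
ssh -N -o ServerAliveInterval=30 \
    -L 5060:localhost:5070 \
    -R 5080:localhost:5090 \
    ioc@server.example.com
```

Wrapping this in a supervisor (or autossh) gives the restart-on-failure behaviour mentioned above.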

How Is The Downlink To The Users Authenticated/Secured?

At the simplest level the server can include a certificate so that an HTTPS connection is made, but that doesn't answer the problem of how permissions might be encoded for a paid-for service.  This needs some thinking.

What Is The System Latency?

There are a few sources of latency:

Hence, in the case where there are no cellular outages the delay is largely dependent upon the duration of an MP3 segment file plus some browser/HTTP behaviour uncertainty.  By experiment, with S set at 1 second (a 3.3 kbyte MP3 segment file, about the same length as the HLS index file in fact), a best case end to end latency of around 3 seconds can be achieved (tested using Chrome on a PC as the receiving browser).

As soon as there is a cellular outage the effect is to increase the latency; however, testing (see below) has shown that hls.js is sufficiently clever to re-sync the stream using the timestamps inside each MP3 segment file, so this is not an issue.


Hardware

For initial testing, the hardware consists of a Raspberry Pi B+ (which I happened to have in my cupboard), a microphone on a flexible strip evaluation board connected via a break-out board, and a u-blox 2G/3G modem board from Hologram called the Nova.  A 2G/3G modem draws more current than the Pi can provide (close to 3 Amps peak) and so I used a Y cable that allows me to provide separate power to the modem while testing.  Then I moved all of this to a Pi Zero W since that should have sufficient processing power but is smaller and more robust.  Here I used a USB hub with an Ethernet connector built in as I wanted the flexibility of being able to switch on/off an auxiliary network connection to take over from cellular (and there's no physical switch to disable Wifi on the Pi Zero W).

I used a Giff Gaff (Telefonica network) SIM: they offer an unlimited pay-as-you-go data package for £20 per month, which works out at about 3 pence per hour of audio if streaming constantly; after 9 Gbytes have been consumed the bandwidth is limited to 384 kbits/s from 8 a.m. to midnight, which is still more throughput than I need.
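The cost-per-hour figure is simple arithmetic; a sketch:

```python
# Cost per hour of constant streaming on a £20/month unlimited package,
# assuming a 30-day month.
monthly_cost_pounds = 20
hours_per_month = 30 * 24  # 720

pence_per_hour = monthly_cost_pounds * 100 / hours_per_month
print(round(pence_per_hour, 1))  # 2.8, i.e. about 3 pence per hour
```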

Raspberry Pi, modem and
                microphone
Microphone, back Microphone, front
Additional power to modem Pi Zero W

On the server side, I simply used a Digital Ocean Ubuntu server, the cheapest one at $5 per month.

Software

The software comes in three parts (available on github with comprehensive READMEs):

There is, of course, quite a lot of configuration required on the Raspberry Pi side (setting up SSH tunnels etc.), all of which is covered in the README.md.  In order to meet the requirement that the only control of the recording device is power on/off, the Raspberry Pi is also configured to run from a read-only file system, preventing potential SD card corruption from a disorganised shut-down.


Testing

For testing I wanted to be free of the need for a power supply and so I powered everything from a Tracer 22 Ah LiPo battery that I had lying around for other purposes; this only needs recharging every few days even under heavy use.

Tracer
        lithium battery

I also needed audio capabilities as follows:

I began by testing the stability of the connection and the worst case latency in the following scenarios (where the early rows took a few weeks of constant testing and debugging to get right); testing began with hls.js version 0.9.1.

Scenario (all rows): overnight (8 hours) constant streaming, ioc-client stationary.

| ioc-client connectivity | Browsing device | Outcome |
| --- | --- | --- |
| Ethernet | PC (Chrome 66.0.3359.139) over Ethernet | With 333 ms segment files, HLS liveSyncDurationCount = 1 and liveMaxLatencyDurationCount = 3, the browser generally maintained a ~3 second latency over the period, though I have seen it extend up to 6 seconds on some occasions. |
| Cellular | PC (Chrome 66.0.3359.139) over Ethernet | As above, though I think that the latency was at ~6 seconds for more of the time. |
| Ethernet | Android phone (Samsung Galaxy A3, Chrome 66.0.3359.126) over Wifi | With 333 ms segment files the mobile browser failed to keep up: after some time, handset dependent (maybe 20 to 40 minutes), the browser started cancelling downloads because it perceived that they would be out of date, resulting in gaps in the buffered stream, which also fragmented browser memory.  Switching to 1 second segment files, however, the stream was maintained with a ~3 second latency; my suspicion is that the HTTP overhead was too large on such short-duration fetches and the larger segment file doesn't actually take any longer to get.  At this time I also switched to using the Openfresh modified HLS (described here) and, with the #EXT-X-FRESH-IS-COMING tag added, this kept the maximum latency down to ~5 seconds. |
| Ethernet | Android phone over cellular | As above: ~3 second delay using 1 second segment files, sometimes falling back to ~5 seconds.  The browser shows 125 kbytes downloaded per minute so, for 1 Mbyte of mobile data, you get 8 minutes of listening time.  It is interesting to compare this with the uplink data volume which, at 140 kbits/s, is just over 1 Mbyte per minute: a clear gain from the very complex processing behind MP3.  hls.js maintained the audio stream down as far as E-GPRS coverage, recovering from gaps in the stream without incurring delays, until the issue reported here was encountered. |
| Cellular | Android phone over Wifi | As above. |
| Ethernet | Apple phone over Wifi | As above, interestingly showing a similar issue with the break-up of audio at the browser end after an overnight run (hls.js not being used in the case of Safari, as it has native HLS support), though in apparently good Wifi coverage.  This was with the kind help of my sister in south Wales, as I don't possess an iPhone, so I am unable to testify as to the nature of the breaking-up of the audio directly. |

I also did some ad hoc desktop browser testing and streaming was shown to work for an ~8 hour run on the following browsers:

I did some bench-based testing of the cellular uplink behaviour under various radio conditions.  These tests were carried out with the chuff box assembled as below, using Chrome on a PC as the listener, and over 10ish runs of each scenario.

| Scenario | Outcome |
| --- | --- |
| Time to begin streaming from power-on in good coverage conditions. | Approximately 65 seconds; the ioc-client LED begins to flash at 0.5 Hz after about 15 seconds (indicating boot), the Hologram Nova blue LED goes solid blue (indicating a data connection) at about 25 seconds, the ioc-client LED begins 2 Hz flashing (meant to indicate network up) at about 55 seconds and streaming begins 10 seconds after that. |
| Time to begin streaming after powering-on in a no-service condition: power on with the antenna unscrewed, wait 60 seconds, screw the antenna on. | The Hologram Nova blue LED goes solid blue (indicating a data connection) within 10 to 20 seconds of screwing on the antenna and streaming begins about 10 to 20 seconds after that. |
| Dropping off the network: unscrew the antenna, check that streaming stops by watching the ioc-client LED indicator, screw the antenna back on again. | The Hologram Nova blue LED goes solid blue (indicating a data connection) within 5 seconds of screwing the antenna back into place and streaming begins 10 to 12 seconds after that. |
| SSH tunnel outage: unscrew the antenna for more than 60 seconds. | Streaming recovers within 30 to 60 seconds of screwing the antenna back into place again. |


Chuff Box

The unit was to be mounted on a pre-existing flat metal plate covered by a plastic cowl on top of the Tornado's water tank in the tender: the cowl can be seen above the word "British" in this picture.  The cowl is only 40 mm high, the main constraint on the enclosure design, and provides water-proofing.  Power is 5 Volts and Rob Morland verified that the single feedback LED will receive enough drive from a Raspberry Pi GPIO pin without additional buffering; just a 150 Ohm series resistor is required.  I've no idea about audio design and so the "Mark 1" box has a small hole in the bottom to which the MEMS microphone is pressed and that is aligned with a similar hole in the flat metal base-plate.
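A quick Ohm's-law check on that series resistor; the forward voltage of ~2.9 V is my assumption for a blue LED, not a measured value:

```python
# LED drive current from a 3.3 V Raspberry Pi GPIO pin through the
# 150 Ohm series resistor, assuming ~2.9 V forward voltage (blue LED).
gpio_volts = 3.3
led_forward_volts = 2.9
resistor_ohms = 150

current_ma = (gpio_volts - led_forward_volts) / resistor_ohms * 1000
print(round(current_ma, 1))  # 2.7 mA, well within a GPIO pin's drive ability
```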

For once I managed to get a box of the optimal size on my first attempt: an ABS flanged box 105 x 70 x 35 mm from RS; flanged boxes are preferred as the engine gets very dirty so it is useful to be able to fix/remove a box without having to open it.  I threaded the PCB mounting holes inside the box with an M2.5 plug tap to take 6 mm long nylon bolts.  Continuing my charmed life, this box exactly fitted the width of the 160 x 100 mm vero board, also from RS, so it only remained for me to slice off the length I needed and arrange the components/holes.  I used a scalpel to cut slivers of copper track away appropriately before soldering.

Vero board layout
Bottom of vero
                board with cuts
The wired board
The populated board

The USB connection of the Pi Zero doesn't come out on a header so I soldered wires directly to the test pads, adding a blob of Araldite as strain relief; green is USB+, white is USB-.

Pi Zero USB data
          connection

In order to follow some form of convention on the D-sub I wired ground to pin 5 and the control LED output to pin 3, matching ground and Tx data for a serial connection; that's Chuff Box Mark I completed.

Chuff box closed
Chuff box open


Audio Quality

After constructing Chuff Box Mark I, I began to look at audio quality.  It turns out that it is relatively easy to leave bugs in audio processing: the ear/brain system is a very forgiving thing.  I gave the whole system a thorough shakedown, in the process writing a PC-based URTP decoder in C++.  I also added compilation switches to the ioc-client to generate a ramping value that could be verified easily and the ability to dump both the raw audio and the URTP encoded output to file on the Raspberry Pi.  Then I spent a few days listening to my music collection on random play through the system to check it out, using a pair of headphones plugged into my mobile phone as the receiver.

I tweaked the AGC to stop the audio levels being fiddled with too much: 3 bits of hysteresis, and each bit of uplift in gain must be needed continuously for 10 seconds before it is applied; downshifts in gain are applied immediately to prevent clipping. I added an option to allow the maximum gain (which in my case is a bit shift) to be set on the command line; the maximum value of 12 (Spinal Tap pah) tended to result in quite a loud hiss; 10 turned out to be better.  In order to avoid (or reduce the effect of) a modulated noise floor, I added pre-emphasis and de-emphasis FIR filtering to the NICAM codec, using the wonderful http://t-filter.engineerjs.com/ to determine coefficients that approximate a CCITT J17 filter:

Pre-emphasis filter
De-emphasis
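The gain-control behaviour described above can be sketched as follows; this is my own illustration (a hypothetical class, ignoring the 3 bits of hysteresis), not the ioc-client implementation:

```python
# Sketch of the AGC behaviour: gain is a bit shift; an increase must be
# wanted continuously for HOLD_SECONDS before it is applied, while a
# decrease is applied at once to prevent clipping.
HOLD_SECONDS = 10
MAX_GAIN_SHIFT = 10  # a maximum of 12 proved too hissy; 10 was better

class Agc:
    def __init__(self):
        self.gain_shift = 0
        self.uplift_wanted_for = 0.0

    def update(self, uplift_wanted, downshift_wanted, elapsed_seconds):
        if downshift_wanted and self.gain_shift > 0:
            self.gain_shift -= 1          # immediate, to prevent clipping
            self.uplift_wanted_for = 0.0
        elif uplift_wanted and self.gain_shift < MAX_GAIN_SHIFT:
            self.uplift_wanted_for += elapsed_seconds
            if self.uplift_wanted_for >= HOLD_SECONDS:
                self.gain_shift += 1      # only after 10 s of continuous need
                self.uplift_wanted_for = 0.0
        else:
            self.uplift_wanted_for = 0.0  # need was not continuous: restart
        return self.gain_shift
```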

Then I switched from testing over an Ethernet uplink to using the Hologram Nova modem proper inside the closed Chuff Box and I found another complication: the Hologram Nova, or probably the DC to DC converters on that board, was emitting a very audible 5 kHz squeal.  The very helpful people at Hologram confirmed that I could supply 3.8 V directly to a pad on the PCB from my own power source.  While this helped, it didn't fix the problem and added another component to the Chuff Box, so instead I did three things:

This did the trick without noticeably affecting the quality of the wanted audio.

Squeal Foam packing over microphone
Desqueal filter
Spectrum after shielding and
                  filtering

However, I later noticed that, at highest gain, when all is quiet, there was still a "clickety-click" type low-level audio noise from the DC to DC converter, picked up by the microphone.  This was shown to be acoustic rather than electrical noise, since my mobile phone, with its microphone placed near the box, was able to pick it up.  And there's not much I can do about that kind of noise.  Best now to just install the Chuff Box and see how it behaves.


Installation

Exciting!  Made me feel like a kid, a kid who gets to clamber on an enormous steam engine.

As of mid 2018 the Tornado was housed at the Nene Valley Railway, not far from me, for repairs. Rob Morland took my Chuff Box, improved it with some additional stand-offs to make sure the Pi was secured and the USB modem was not going to shake loose, mounted the low-profile high-gain antenna I had purchased on an aluminium bracket and fitted the lot to an aluminium plate which, in turn, mounts on a steel plate on top of the locomotive's tender.

On a very sunny Saturday, 30th June, Rob was there to install the thing and so I joined him to test it and take some photographs.

The loco
The A1 Tornado.
The loco's name-plate
Name plate.
The installation location
The installation location on the tender (red arrow).
The "audio
                streamer" box and antenna mounted on the tender
The chuff box (bottom right) and antenna (bottom left) mounted on the aluminium plate, which is mounted on a steel plate.  The microphone has an exit hole through both plates.
A view to the
                chuffing-end
A view towards the chuffing end; the grey plastic cover for the aluminium plate can be seen middle-bottom of the picture. The locomotive's cabin is enclosed; its roof is just south of the green engine body.

 
Rob has mounted the tiny blue feedback LED and the on/off switch in the cab. A short video of it flashing too rapidly for you to see is included above; refresh this page if no YouTube video image appears above this text, as sometimes it doesn't load on the first attempt.

Now the Nene Valley Railway is not that far from the A1 and we found that there was a constant traffic rumble in the distance, sufficiently strong that the gain of the system was never high enough to hear the "clickety-click" of the DC to DC converter.  We tried shouting inside the cab and around the tender but all we got over the chuff box stream was indistinct mumbling; this is good, we don't want to overhear conversations.  There was a small amount of chuffing on the Nene Valley Railway, about 20 metres to the left of the "view towards the chuffing-end" picture above, and this was definitely audible, but I think that until we get some chuffing that is not near a main road we won't be able to tell how the system really behaves.

While I was there, Rob gave me a tour of the rest of the engine and the support coach (which you can see on the left of the "view towards the chuffing end" picture above).

The cab, fire-box
                front removed.
The cabin, fire-box front removed for maintenance.
Going down
Going down-under.
Rob Morland
Rob Morland explains.
The cam of the third
                (middle) engine, arrow indicating where a Bluetooth
                temperature sensor is installed.
I thought this was interesting: the cam of the middle engine has a Bluetooth temperature sensor installed in it (the yellow arrow).
The Bluetooth
                receiver with USB cable heading off to a Pi in the cab
The Bluetooth receiver, just behind the cam, with USB cable heading off to a Raspberry Pi mounted in the cab which then offers temperature readings over Wifi.

There's a fiendish amount of electronics on both the loco and the support coach: LED strip lighting, battery banks, sensors, multiple safety systems and generators of various forms.  The Bluetooth temperature sensor in the cam that you can see above has to stand more than 20 G of acceleration and continue to provide readings without fail at all times.  Quite a thing.


Back to Meades Family Homepage