Internet Of Chuffs

This page last updated: 3 March 2018.

Back in 2017, at a Cambridge Wireless meeting, some colleagues of mine happened to be talking to Rob Morland, who is involved in something called the A1 Steam Locomotive Trust.  The trust has built a brand new £3m steam locomotive, the Tornado, which runs on the national rail network.  Rob was wondering whether it was possible to stream the live sound of the steam engine to those waiting at the track side for it to come past.  For some reason my colleagues pointed him at me and hence a project was born.

It began as a work (u-blox Ltd) project which could be used as a model to show an Internet of Things architecture, hence digital end to end.  To reduce the probability of analogue audio issues I decided to use an I2S MEMS microphone (InvenSense ICS43434).  However, though the work board I was using supported I2S, it only did so with hardware modifications which, in the end, turned out to be impractical.  So, after a few months of initial development, I moved the implementation to a Raspberry Pi, where I2S audio is simply a matter of connecting wires, and in 2018 it became a hobby project rather than a work project.


The design is dictated by these requirements:


The architecture of the system looks something like this:

[Diagram: IoC architecture]

The microphone picks up the chuffs and passes them to the Raspberry Pi.  There the audio is compressed into a stream and passed over the cellular uplink, through the cellular network and onto the public internet, where the stream terminates at a server.  The server decodes the stream and buffers an appropriate amount of audio.  Users then access this buffered audio on their mobile devices over a cellular network (which may or may not be the same one the chuffs passed through), provided they have been granted access by the server.

The architecture raises a number of issues:

How Is The Audio Input Captured?

I'm a software engineer and so I wanted to avoid, as far as possible, any potential issues with hardware design.  By far the simplest approach, especially since there was an intention to show Internet Of Things behaviours, is to use an I2S microphone such as the InvenSense ICS43434.  Not much bigger than a grain of rice, this microphone can be powered from 1.8 to 3.6 Volts and provides a completely standard Philips format I2S digital output that can be read by any microcontroller with an I2S interface.  Audio is 24 bit and sample rates of 44 kHz or more are supported.
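As a sketch of what the capture side might look like (the device name, buffer sizes and error handling here are illustrative assumptions, not the real code, which is on github), the microphone can be read on the Raspberry Pi through ALSA once the I2S overlay is configured:

    #include <stdio.h>
    #include <stdint.h>
    #include <alsa/asoundlib.h>

    int main(void)
    {
        snd_pcm_t *pcm;
        int16_t buffer[16 * 16];  /* 16 ms of 16 kHz mono samples */
        int err;

        /* Open the capture device; "hw:1" stands in for whatever card
         * the I2S microphone appears as on this particular Pi */
        err = snd_pcm_open(&pcm, "hw:1", SND_PCM_STREAM_CAPTURE, 0);
        if (err < 0) {
            fprintf(stderr, "open failed: %s\n", snd_strerror(err));
            return 1;
        }

        /* 16 bit, mono, 16 kHz: the format used for the uplink stream
         * (see the coding section below) */
        err = snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                                 SND_PCM_ACCESS_RW_INTERLEAVED,
                                 1, 16000, 1, 500000);
        if (err < 0) {
            fprintf(stderr, "set_params failed: %s\n", snd_strerror(err));
            return 1;
        }

        for (;;) {
            snd_pcm_sframes_t frames = snd_pcm_readi(pcm, buffer, 16 * 16);
            if (frames < 0) {
                /* Try to recover from overruns etc. */
                if (snd_pcm_recover(pcm, (int) frames, 0) < 0) {
                    break;
                }
                continue;
            }
            /* ...pass the samples on to the encoder from here... */
        }

        snd_pcm_close(pcm);
        return 0;
    }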

How Is The Uplink Audio Stream Coded?

I experimented with various bit depths, capture frequencies and coding schemes.  From a capture point of view, 24 bit is somewhat high so I compromised at 16 bit.  In terms of capture frequencies, while 44 kHz is CD audio quality, that is again going to be too high for a cellular network and so I compromised at 16 kHz.  With a raw PCM transport, ignoring overheads, this would require a constant 256 kbits/s uplink on the cellular interface.  This is definitely on the large side: cellular networks may offer links of this bandwidth on the downlink but the uplink is an entirely different matter.  However, I didn't want to go any lower than this in quality terms and so the next variable is the audio coding scheme.
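(For reference, that figure is simply 16 bits x 16,000 samples per second = 256,000 bits per second, before any packet overhead is added.)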

While it would be theoretically possible to MP3 encode at source, that is a processor intensive operation and MP3 is neither stream oriented nor loss tolerant: it is coded in blocks of 1152 samples and the audio content is interleaved across many blocks, so losing a single block has a large effect.

Jonathan Perkins at work suggested I adopt a NICAM-like approach.  NICAM was the first scheme used by the BBC for broadcasting digital multi-channel audio at a controlled quality, allowing stereo television sound to be broadcast for the first time.  It also happens to be very well suited to embedded systems.  Basically a chunk of samples is taken and the peak is worked out.  Then all the samples are shifted down so that every sample in the block fits into the desired NICAM bit-width.  The amount of shifting that was performed is included with the coded block.  At the far end the block is reconstructed; any loss will always be in the lower bits of the block.  With a relatively short block the "gain window" moves such that the loss is not noticeable.  I chose an 8 bit NICAM width and a block duration of 1 ms (16 samples).  For a 16 kHz sampling rate this results in an uplink rate of 132 kbits/s, which (by experiment) is bearable.
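As a minimal sketch of that companding step (the real code is on github; the details here, such as how the shift value is handled, are my assumptions), the encoder for one 1 ms block might look like this:

    #include <stdint.h>

    #define SAMPLES_PER_BLOCK 16   /* 1 ms at a 16 kHz sampling rate */

    /* Compand one block of 16 bit samples down to 8 bit values plus a
     * shift count, NICAM style.  Returns the shift that was applied;
     * the far end shifts the samples back up by the same amount.
     * (Assumes arithmetic right shift of negative values, which is
     * what gcc on the Pi does.) */
    unsigned int nicamEncodeBlock(const int16_t *in, int8_t *out)
    {
        int peak = 0;
        unsigned int shift = 0;

        /* Find the peak absolute value in the block */
        for (int i = 0; i < SAMPLES_PER_BLOCK; i++) {
            int mag = in[i];
            if (mag < 0) {
                mag = -mag;
            }
            if (mag > peak) {
                peak = mag;
            }
        }

        /* Work out how far the samples must be shifted down for the
         * peak to fit into the 8 bit NICAM width, sign bit included */
        while ((peak >> shift) > INT8_MAX) {
            shift++;
        }

        /* Shift every sample down; any loss is in the low bits only */
        for (int i = 0; i < SAMPLES_PER_BLOCK; i++) {
            out[i] = (int8_t) (in[i] >> shift);
        }

        return shift;
    }

Sending 16 samples of 8 bits, plus a few bits of shift information, every millisecond is where the 132 kbits/s figure above comes from.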

In addition to the audio stream itself I borrowed from the likes of RTP and included a sequence number, microsecond timestamp and coding scheme indicator in the block header; I called this URTP (u-blox Real Time Protocol).
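By way of illustration only (the actual URTP field widths and ordering are defined in the code on github, not here), the block header carries something along these lines:

    #include <stdint.h>

    /* Sketch of a URTP block header; the field sizes below are my
     * assumptions, made just to show the RTP-like idea */
    typedef struct {
        uint8_t  codingScheme;   /* e.g. raw PCM or the NICAM-like coding */
        uint16_t sequenceNumber; /* increments by one per block, wraps */
        uint64_t timestamp;      /* microseconds, capture time of the
                                    first sample in the block */
        uint16_t payloadLength;  /* number of coded audio bytes following */
    } urtpBlockHeader;

The sequence number and timestamp allow the server to spot gaps and keep the reconstructed audio correctly timed, much as RTP does.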

How Is The Audio Stream Served To Users?

I initially thought about using RTP or something similar but I really did NOT want to have to write a mobile application for this; it had to work out of the box with existing mobile devices.  The answer turns out to be HTTP Live Streaming (HLS).  This protocol, originally developed by Apple, chops up an audio stream into segment files, each a few seconds long, which are MP3 encoded but with a very specific header added so that the browser can reconstruct the stream.  There is then an index file which lists the segments for the browser.  No client application is required, just a browser; the browsers of all Apple and Android phones include HLS support.
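For illustration, the index file is just a short playlist; with one second segments it looks something like this (the segment names and numbers here are only examples):

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:1
    #EXT-X-MEDIA-SEQUENCE:2038
    #EXTINF:1.0,
    segment2038.mp3
    #EXTINF:1.0,
    segment2039.mp3
    #EXTINF:1.0,
    segment2040.mp3

The server rewrites this file as new segments are encoded; because there is no #EXT-X-ENDLIST tag the browser treats the stream as live and keeps polling for the next version of the index.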

How Is The Link Over The Public Internet To The Server Secured?

In the original Internet Of Things plan I had assumed that DTLS was going to be the security scheme of choice.  However, I experimented with sending the uplink audio stream over UDP and found that there were relatively significant losses, several percent, and hence decided that TCP was a better bet for the audio stream.  Then there's also the issue of cellular networks: they sometimes perform deep packet inspection and deny service to things they decide don't meet their tariff model, they have quite active and unpredictable firewalls, and they don't allow incoming TCP connections (which are needed for control operations).

Jonathan came to my rescue again here with the answer to all of these problems: SSH.  SSH comes built into all Linux platforms and allows secure tunnels to be set up between machines, even multi-hop, provided that you have an account on each of the machines, which can be certificate based.  You generate an SSH key on the Raspberry Pi and then push it to the server.  The Raspberry Pi can then use SSH to set up tunnels from its port X to port Y on the server and, also, set up tunnels in the reverse direction, from port A on the server to port B on the Raspberry Pi.  The tunnels are secure, can be configured to include keep-alives and restarts, and, should the private key on the Raspberry Pi ever be exposed, the server can simply remove the public key from its list.
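As an illustration (the user name, server name and port numbers below are placeholders, not the real configuration), the two kinds of tunnel look like this from the Raspberry Pi's end:

    # One-off setup: generate a key on the Pi and push the public half
    # to the server
    ssh-keygen -t rsa
    ssh-copy-id ioc@server.example.com

    # Forward tunnel: anything sent to port 5000 on the Pi pops out at
    # port 5000 on the server (used here for the uplink audio stream)
    ssh -N -L 5000:localhost:5000 ioc@server.example.com

    # Reverse tunnel: anything sent to port 6000 on the server pops out
    # at port 6000 on the Pi (used here for incoming control operations)
    ssh -N -R 6000:localhost:6000 ioc@server.example.com

Adding -o ServerAliveInterval=30 to these commands, and running them under something like autossh or a systemd service with automatic restart, provides the keep-alive and restart behaviour mentioned above.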

How Is The Downlink To The Users Authenticated/Secured?

At the simplest level the server can include a certificate so that an HTTPS connection is made, but that doesn't answer the problem of how permissions might be encoded for a paid-for service.  This needs some thinking.

What Is The System Latency?

There are a few sources of latency:

Hence, in the case where there are no cellular outages, the delay is largely dependent upon the duration of an MP3 segment file plus some browser/HTTP behaviour uncertainty.  By experiment, with the segment duration set at 1 second (a 3.3 kbyte MP3 segment file, about the same size as the HLS index file in fact), a best case end to end latency of around 2 seconds can be achieved (tested using Chrome on a PC as the receiving browser), though such short durations are likely to be unusual and warrant further testing.

As soon as there is a cellular outage, the effect is to increase the latency.  Buffers exist in the Raspberry Pi (5 seconds) and in the HLS MP3 files (4 seconds).  The problem is that there is no easy way to reduce the latency once there has been a build-up, at least not without slipping the audio stream to catch up, which would be perceptible to users.  This needs more thought.


For initial testing, the hardware consists of a Raspberry Pi B+ (which I happened to have in my cupboard), a microphone on a flexible-strip evaluation board connected via a break-out board, and a u-blox 2G/3G modem board from Hologram called the Nova.  A 2G/3G modem draws more current than the Pi can provide (close to 3 Amps peak) and so I used a Y cable that allows me to provide separate power to the modem while testing.  Then I moved all of this to a Pi Zero W since that should have sufficient processing power but is smaller and more robust.  Here I used a USB hub with an Ethernet connector built in, as I wanted the flexibility of being able to switch an auxiliary network connection on and off to take over from cellular (and there's no physical switch to disable WiFi on the Pi Zero W).

I used a giffgaff (Telefonica network) SIM, since they offer an all-you-can-eat, pay-as-you-go data package for £20 per month (which works out at about 15p per hour of audio streaming, with about 140 hours of streaming available per month).

[Photos: Raspberry Pi, modem and microphone; microphone, back; microphone, front; additional power to modem; Pi Zero W]

On the server side, I simply used a Digital Ocean Ubuntu server, the cheapest one at $5 per month.


The software comes in three parts (available on github with comprehensive READMEs):

There is, of course, quite a lot of configuration required on the Raspberry Pi side (setting up SSH tunnels etc.), all of which is covered in the READMEs.  In order to meet the requirement that the only control of the recording device is power on/off, the Raspberry Pi is also configured to run from a read-only file system, preventing potential SD card corruption from a disorderly shut-down.
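For example (this is just one common way of doing it, not necessarily exactly what the configuration on github does), the root file system can be mounted read-only in /etc/fstab, with the directories that must stay writeable placed on tmpfs in RAM:

    # /etc/fstab (sketch): SD card mounted read-only, volatile
    # directories held in RAM instead
    /dev/mmcblk0p2  /         ext4   defaults,ro   0  1
    /dev/mmcblk0p1  /boot     vfat   defaults,ro   0  2
    tmpfs           /tmp      tmpfs  nosuid,nodev  0  0
    tmpfs           /var/log  tmpfs  nosuid,nodev  0  0
    tmpfs           /var/tmp  tmpfs  nosuid,nodev  0  0

Nothing is then written to the SD card in normal operation, so pulling the power is safe.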

Initial Testing

To be completed.


To be completed.
