This page last updated: 24 May 2018.

Back in 2017, at a Cambridge Wireless meeting, some colleagues of mine happened to be talking to Rob Morland, who is involved in the A1 Steam Locomotive Trust. The trust has built a brand new £3m steam locomotive, Tornado, which runs on the national rail network. Rob was wondering whether it was possible to stream the live sound of the steam engine to those waiting at the track side for it to come past. For some reason my colleagues pointed him at me, and hence a project was born.
The design is dictated by these requirements:
The architecture of the system looks something like this:
I'm a software engineer and so I wanted to, as far as
possible, avoid any potential issues with hardware
design. By far the simplest approach, especially since
there was an intention to show Internet Of Things behaviours,
is to use an I2S microphone such as the InvenSense
ICS43434. Not much bigger than a grain of rice, this
microphone can be powered from 1.8 to 3.6 Volts and
provides a completely standard Philips-format I2S digital
output that can be read by any microcontroller with an I2S
interface. Audio is 24 bit and sample rates of
at least 44 kHz are supported.
I experimented with various bit depths, capture frequencies
and coding schemes. From a capture point of view,
24 bit is somewhat high so I compromised at
16 bit. In terms of capture frequencies,
44 kHz is CD audio quality but that is going to be too
high for a cellular network and so I compromised at
16 kHz. With a raw PCM transport, ignoring
overheads, this would require a constant 256 kbits/s
uplink on the cellular interface. This is definitely on
the large side: cellular networks may offer links of this
bandwidth on the downlink but the uplink is an entirely
different matter. However, I didn't want to go any lower than
this in quality terms and so the next variable is the audio
coding scheme.
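As a sanity check on the raw-PCM arithmetic above:

```python
# Raw PCM uplink rate for mono 16-bit samples at 16 kHz,
# ignoring any protocol overheads
sample_rate_hz = 16000
bits_per_sample = 16
uplink_bits_per_s = sample_rate_hz * bits_per_sample
print(uplink_bits_per_s)  # 256000, i.e. 256 kbits/s
```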
While it would be theoretically possible to MP3 encode at
source, that is a processor intensive operation and MP3 is
neither stream oriented nor loss tolerant; it is coded in
blocks of 1152 samples and the audio content is
interleaved across many blocks, so losing a single block has a
disproportionate effect on the decoded audio.
Jonathan Perkins at work suggested I adopt a NICAM-like
coding scheme instead. NICAM
was the first scheme used by the BBC for broadcasting digital
multi-channel audio at a controlled quality, allowing stereo
audio to be broadcast for the first time. It also
happens to be very suited to embedded systems. Basically,
a chunk of samples is taken and the peak value is found.
Then all the samples are shifted down so that every sample in
the block fits into the desired NICAM bit-width. The
amount of shifting performed is included with the
coded block. At the far end the block is reconstructed;
any loss will always be in the lower bits of the block.
With a relatively short block the "gain window" moves such
that the loss is not noticeable. I chose an 8 bit
NICAM width and a block duration of 1 ms
(16 samples). For a 16 kHz sampling rate this
results in an uplink rate of 132 kbits/s, which (by
experiment) is bearable.
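The scheme is simple enough to sketch in a few lines. This is illustrative only, not the actual ioc-client source; it just demonstrates the shift-down/shift-up idea and the resulting bit rate:

```python
# Sketch of a NICAM-like block coder: 16-bit signed samples in,
# 8-bit samples plus a per-block shift value out. The decoder
# shifts back up, so any loss is confined to the low bits.

BLOCK_SAMPLES = 16   # 1 ms at a 16 kHz sample rate
NICAM_BITS = 8

def nicam_encode(block, out_bits=NICAM_BITS):
    """Return (shift, coded) for a block of signed 16-bit samples."""
    peak = max(abs(s) for s in block)
    shift = 0
    # Find the smallest shift that makes every sample fit in out_bits
    while (peak >> shift) > (1 << (out_bits - 1)) - 1:
        shift += 1
    # Python's >> on negative numbers is an arithmetic shift,
    # which preserves the sign as required here
    return shift, [s >> shift for s in block]

def nicam_decode(shift, coded):
    """Reconstruct the block; the error is at most (1 << shift) - 1."""
    return [c << shift for c in coded]

# 16 samples x 8 bits plus a 4-bit shift value every millisecond:
# (16 * 8 + 4) * 1000 = 132000 bits/s, the 132 kbits/s uplink rate.
```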
In addition to the audio stream itself I borrowed from the
likes of RTP and included a sequence number, microsecond
timestamp and coding scheme indicator in the block header; I
called this URTP (u-blox Real Time Protocol).
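A block header of that kind might be packed as follows; the field widths and ordering here are my own invention for illustration, not the layout the ioc-client actually uses:

```python
# Hypothetical URTP block header: coding scheme indicator,
# sequence number and microsecond timestamp, big-endian,
# followed by the coded audio payload
import struct

URTP_HEADER = struct.Struct(">BIQ")  # scheme (1), sequence (4), timestamp us (8)

def pack_urtp(coding_scheme, seq, timestamp_us, payload):
    """Prefix a coded audio block with its URTP-style header."""
    return URTP_HEADER.pack(coding_scheme, seq, timestamp_us) + payload

def unpack_urtp(datagram):
    """Split a received block back into header fields and payload."""
    scheme, seq, ts = URTP_HEADER.unpack_from(datagram)
    return scheme, seq, ts, datagram[URTP_HEADER.size:]
```

The sequence number lets the server spot gaps and the timestamp lets it re-align audio after an outage, much as RTP does.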
I initially thought about using RTP or something similar but
I really did NOT want to have to write a mobile application
for this; it had to work out of the box with existing mobile
devices. The answer to this turns out to be HTTP
Live Streaming. This protocol, originally
developed by Apple, chops up an audio stream into segment
files each a few seconds long, which are MP3 encoded but with
a very specific header added so that the browser can
reconstruct them. There is then an index file which
lists the segments to the browser. No client application
is required, just a browser; the browsers of all Apple phones
include native HLS support while all Android phones and
desktop browsers can be supported with the marvellous hls.js.
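For illustration, a live HLS index file for 1 second MP3 segments might look something like this (the segment names and sequence numbers are invented):

```text
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:1
#EXT-X-MEDIA-SEQUENCE:1234
#EXTINF:1.0,
segment_1234.mp3
#EXTINF:1.0,
segment_1235.mp3
#EXTINF:1.0,
segment_1236.mp3
```

The browser re-fetches the index as new segments appear; because this is a live stream there is no #EXT-X-ENDLIST tag to mark an end point.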
In the original Internet Of Things plan I had assumed that
DTLS was going to be the security scheme of choice.
However, I experimented with sending the uplink audio stream over
UDP and found that there were relatively significant losses of
several percent. Hence I decided that TCP was a better
bet for the audio stream. Then there's also the issue of
cellular networks: they sometimes perform deep packet
inspection and deny service to things they decide don't meet
their tariff model, they have quite active and unpredictable
firewalls and they don't allow incoming TCP connections (which
will be needed for control operations).
Jonathan came to my rescue again here with the answer to all
of these problems: SSH. SSH comes built into all Linux
platforms and allows the setting up of secure tunnels between
servers, even multi-hop, provided that you have an account on
each of the machines, which can be certificates based.
You generate an SSH key on the Raspberry Pi and then push it
to the server. The Raspberry Pi can then use SSH to set
up tunnels from its port X to port Y on the server and, also,
set up tunnels in the reverse direction, from port A on the
server to port B on the Raspberry Pi. The tunnels are
secure, can be configured to include keep-alives and restarts,
and, should the private key on the Raspberry Pi ever be
exposed, the server can simply remove the public key from its
authorized_keys file.
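The tunnel setup can be sketched with standard OpenSSH commands; the port numbers and account/host names here are placeholders, not the ones the project actually uses:

```shell
# One-off setup: generate a key on the Pi and push it to the server
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
ssh-copy-id ioc@server.example.com

# Forward port 5000 on the Pi to port 5000 on the server (the
# uplink audio stream), with keep-alives so a dead tunnel is
# noticed promptly if the cellular link drops
ssh -N -o ServerAliveInterval=10 -o ServerAliveCountMax=3 \
    -L 5000:localhost:5000 ioc@server.example.com &

# Reverse tunnel: port 2222 on the server reaches the Pi's SSH
# port 22, giving the server an incoming control path despite the
# cellular network blocking incoming TCP connections
ssh -N -o ServerAliveInterval=10 -o ServerAliveCountMax=3 \
    -R 2222:localhost:22 ioc@server.example.com &
```

In practice something like autossh, or a systemd service with restart-on-failure, would be used to keep the tunnels up unattended.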
At the simplest level the server can include a certificate
so that an HTTPS connection is made, but that doesn't answer the
problem of how permissions may be encoded for a paid-for
service. This needs some thinking.
There are a few sources of latency:
Hence, in the case where there are no cellular outages the
delay is largely dependent upon the duration of an MP3 segment
file plus some browser/HTTP behaviour uncertainty. By
experiment, with S set at 1 second (a 3.3 kbyte MP3
segment file, about the same length as the HLS index file in
fact), a best case end to end latency of around 3 seconds can
be achieved (tested using Chrome on a PC as the receiving device).
As soon as there is a cellular outage the effect is to
increase the latency; however, testing (see below) has shown
that hls.js is sufficiently clever to re-sync the stream using
the timestamps inside each MP3 segment file, so this is not an
issue in practice.
For initial testing, the hardware consists of a Raspberry Pi
B+ (which I happened to have in my cupboard), a microphone on a
flexible strip evaluation board connected via a break-out board,
and a u-blox 2G/3G modem board from Hologram called the
Nova. A 2G/3G modem draws more current than the Pi can
provide (close to 3 Amps peak) and so I used a Y cable that
allows me to provide separate power to the modem while
testing. Then I moved all of this to a Pi Zero W since
that should have sufficient processing power but is smaller and
more robust. Here I used a USB hub with an Ethernet
connector built in as I wanted the flexibility of being able to
switch on/off an auxiliary network connection to take over from
cellular (and there's no physical switch to disable Wifi on the
Pi Zero W).
I used a giffgaff (Telefonica network) SIM: they offer an
unlimited pay-as-you-go data package for £20 per month, which
works out at about 3 pence per hour of audio streaming if
streaming constantly. Bandwidth is limited to 384 kbits/s
from 8 a.m. to midnight once 9 Gbytes have been consumed,
but that is still more throughput than I need.
The software comes in three parts (available on GitHub).
There is, of course, quite a lot of configuration required on
the Raspberry Pi side (setting up SSH tunnels etc.), all of
which is covered in the README.md.
In order to meet the requirement that the only control of the
recording device is power on/off, the Raspberry Pi is also
configured to run from a read-only file system, preventing
potential SD card corruption from a disorganised shut-down.
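One common way to achieve a read-only Raspberry Pi is sketched below; this is illustrative only, the project's actual configuration is described in its README.md:

```text
# /etc/fstab: mount the root file system read-only and put the
# few paths that must be writable on tmpfs (lost at power-off,
# which is fine for a device controlled only by its power switch)
/dev/mmcblk0p2  /         ext4   defaults,ro,noatime  0  1
tmpfs           /tmp      tmpfs  nosuid,nodev         0  0
tmpfs           /var/log  tmpfs  nosuid,nodev         0  0
```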
For testing I wanted to be free of the need for a power supply
and so I powered everything from a Tracer 22 Ah LiPo battery
that I had lying around for other purposes; this only needs
recharging every few days even under heavy use.
I also needed audio capabilities as follows:
I began by testing the stability of the connection and the
worst case latency in the following scenarios (where the early
rows took a few weeks of constant testing and debugging to get
right); testing was performed with hls.js version 0.9.1:
|Scenario|ioc-client connectivity|Browsing device (Samsung Galaxy A3, Chrome 66.0.3359.126)|Outcome|
|Constant streaming, ioc-client stationary.|x|x|With 333 ms segment files, HLS liveSyncDurationCount = 1 and liveMaxLatencyDurationCount = 3, the browser generally maintained a ~3 second latency over the period, though I have seen it extend up to 6 seconds on some occasions.|
| |x|x|As above, though I think that the latency was at ~6 seconds for more of the time.|
| |x| |With 333 ms segment files the mobile browser failed to keep up: after some time, handset dependent (maybe 20 to 40 minutes), the browser started cancelling downloads because it perceived that they would be out of date, resulting in gaps in the buffered stream, which also fragmented browser memory. Switching to 1 second segment files, however, the stream was maintained with a ~3 second latency; my suspicion is that the HTTP overhead was too large on such short-duration fetches and the larger segment file doesn't actually take any longer to get. At this time I also switched to using the Openfresh modified HLS (described here) and, with the #EXT-X-FRESH-IS-COMING tag added, this kept the maximum latency down to ~5 seconds.|
| |x| |As above: ~3 second delay using 1 second segment files, sometimes falling back to ~5 seconds. The browser shows 125 kbytes downloaded per minute so, for 1 Mbyte of mobile data, you get 8 minutes of listening time. It is interesting to compare this with the uplink data volume which, at 140 kbits/s, is uploading just over 1 Mbyte per minute: a clear gain from the very complex processing behind MP3. hls.js maintained the audio stream down as far as E-GPRS coverage, recovering from gaps in the stream without incurring delays, until the issue reported here.|
| | | |As above, interestingly showing a similar issue with the break-up of audio at the browser end after an overnight run (hls.js not being used in the case of Safari, as it has native HLS support), though in apparently good Wifi coverage. This was with the kind help of my sister in south Wales; as I don't possess an iPhone I am unable to testify to the nature of the break-up of the audio directly.|
|Time to begin streaming from power-on in good coverage conditions.| | |Approximately 65 seconds; the ioc-client LED begins to flash at 0.5 Hz after about 15 seconds (indicating boot), the Hologram Nova blue LED goes solid blue (indicating a data connection) at about 25 seconds, the ioc-client LED begins 2 Hz flashing (indicating network up) at about 55 seconds and streaming begins 10 seconds after that.|
|Time to begin streaming after powering-on in a no-service condition; power on with the antenna unscrewed, wait 60 seconds, screw the antenna on.| | |The Hologram Nova blue LED goes solid blue (indicating a data connection) within 10 to 20 seconds of screwing on the antenna and streaming begins about 10 to 20 seconds after that.|
|Dropping off the network; unscrew the antenna, check that streaming stops by watching the ioc-client LED indicator, screw the antenna back on again.| | |The link drops and recovers within 10 to 30 seconds of screwing the antenna back into place about 50% of the time. The rest of the time, even though the cellular link apparently recovered (i.e. the Hologram Nova blue LED was solid blue), streaming did not.|
|SSH tunnel outage; unscrew the antenna for greater than 60 seconds.| | |The link drops but recovers within 30 to 60 seconds of screwing the antenna back into place.|