A page about a soundboard extension I made for the e-puck robots to improve their sound signalling capabilities.
An e-puck with the finished soundboard.
For my PhD research I wanted to experiment with robots that are capable of communicating information about their environment with each other.
At the university we have 13 e-puck robots for swarm robotics research. These robots can communicate with each other using: LEDs, a camera, Bluetooth, microphones, a speaker and infra-red sensor messaging.
One of the things I needed the robots to be able to do was tell each other about the location of a food source and then express some information about it. This is so that other robots can decide whether or not to go towards the food source and which direction to go in.
I chose to use sound as the communication medium for a few reasons:
The e-pucks do already have a speaker (as you can see in the picture) and microphones, but these all point upwards, so they aren't well suited to robot-to-robot communication. (I also needed to use the lpuck extension, which allows the robots to run Linux and Player/Stage, but which completely covers all the microphones.) Maybe you're thinking "but what about the Bluetooth?" Well, firstly, you can't tell the location of the source from the Bluetooth signal. Also, the e-pucks' Bluetooth is set up in slave mode, so e-pucks can't initiate a connection with each other. It is technically possible to change this, I think, but it can easily brick the e-pucks (not good).
There are three functions the soundboard needs to have: playing tones, identifying the frequency of a received tone, and estimating the direction a tone came from.
The soundboard replaces the board on the e-puck that has a speaker on it. To get it to play a tone I just put a better speaker on the soundboard and connected it to the main e-puck body through the existing speaker connection.
The standard e-puck code library for sound generation doesn't allow you to just play a simple tone: you have to convert a wav file (or similar) to a sequence of hexadecimal numbers which the sound generator reads in and plays. The Glowbots project, however, created a really good, stripped-down version of the original e-puck library, called the e-puck economy library. This includes a really cool little wavetable synthesiser that can generate, play and combine 3 different tones at once. The original Glowbots project page doesn't seem to be available anymore, but there is a sourceforge site for the e-puck economy library.
An e-puck.
To do this I used a Fast Fourier Transform (FFT) library I found and applied it to the signal from the microphones. The library I used was the public domain FFTpic18 library by Alciom.
This library was specifically written for use on the PIC18 family of microcontrollers. It's got a small codebase and is very simple to use; you just write your data into a buffer and then call one of two FFT functions to turn that buffer data into FFT'd data. It can do either a 128 point FFT on complex data (i.e. data that has a real and imaginary element to it), or a 256 point FFT on real (i.e. not complex) data.
The signals from the microphones are not complex, so I used the 256-point real data FFT. For reasons I'll cover in the next section there are 3 microphones on the soundboard, so that needs 3 * 256 int buffers for the microphones' signals, plus another 256 ints for the FFT buffer. That's 1024 ints, or 2048 bytes, which is more RAM than I had available on the PIC I was using. To fix this I edited the library code to run a 128-point FFT instead of 256, which meant I only needed 128 data readings per microphone (instead of 256). To reduce the memory use even more, I stored each mic reading as one byte instead of a 16-bit int, saving another 128 bytes per mic. The total memory usage for the microphone buffering and FFT is now 3 microphone buffers * 128 bytes each + 128 ints for the FFT buffer = 640 bytes.
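The memory arithmetic above can be double-checked with a little back-of-envelope sketch (the function name is my own; on the PIC an int is 2 bytes):

```python
def fft_memory_bytes(n_points, n_mics, mic_sample_bytes, int_bytes=2):
    """RAM needed for the per-mic sample buffers plus one shared FFT buffer."""
    mic_buffers = n_mics * n_points * mic_sample_bytes
    fft_buffer = n_points * int_bytes
    return mic_buffers + fft_buffer

# Original plan: 256-point FFT with 16-bit samples -> 2048 bytes, too much RAM.
original = fft_memory_bytes(256, 3, mic_sample_bytes=2)
# Modified library: 128-point FFT with 8-bit mic samples -> 640 bytes.
reduced = fft_memory_bytes(128, 3, mic_sample_bytes=1)
```

Halving the FFT length and storing samples as bytes together cut the footprint to less than a third of the original requirement.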
The modified FFT library code is available in the downloads section of this page.
The robots need to be able to move towards a sound source, so a robot needs a measure of how much to the left or right it should turn in order to be facing the sound source. This is called Direction of Arrival (DOA) estimation. I tried two different methods of DOA estimation: delay and sum beamforming, and simply comparing how loud a tone was in the left and right microphones. Sadly neither method was particularly reliable, and I ran out of time to work on the soundboard. However, the methodology for the beamforming is pretty interesting, and maybe the results can be improved upon with further work.
The first DOA estimation method I tried was delay and sum beamforming. In this method, you have several microphones, evenly spaced on a flat plane. The time difference between the sound reaching each microphone is used to guess the DOA.
As sound travels, it reaches each microphone at a different time. The resulting sum is noisy and has a low amplitude.
By delaying the data we can align the signals in time, so the sum has a large amplitude and there is less noise.
The amount of time the sound spends travelling between microphones depends on the DOA of the sound. If the sound source is straight ahead there is no delay, if it is fully to the left or right there is the maximum delay. By applying a delay to the microphone signals, we can effectively "steer" the microphones so that they listen in one direction at a time. By comparing the results of this steering we can estimate a DOA for the sound source. For more information about how delay and sum beamforming works, I would recommend Andy Greensted's lab book pages. That site includes equations for working out the (idealised) frequency response of any microphone array, along with very helpful and easy to follow derivations of the equations.
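As a rough illustration of the steering idea, here is a host-side Python sketch (not the PIC code; the speed of sound, candidate angles and helper names are my own assumptions):

```python
import math

C = 343.0        # speed of sound, m/s (assumed)
FS = 38000.0     # sample rate, Hz (matches the soundboard)
SPACING = 0.05   # mic spacing, m
N_MICS = 3

def mic_signals(freq, doa_deg, n=256):
    """Simulate a tone arriving at a 3-mic line array from doa_deg."""
    delay = SPACING * math.sin(math.radians(doa_deg)) / C  # seconds between adjacent mics
    return [[math.sin(2 * math.pi * freq * (t / FS - m * delay)) for t in range(n)]
            for m in range(N_MICS)]

def steered_power(sigs, steer_deg):
    """Delay each mic by the steering delay (nearest sample) and sum."""
    d = SPACING * math.sin(math.radians(steer_deg)) / C * FS  # delay in samples
    n = len(sigs[0])
    total = 0.0
    for t in range(n):
        s = 0.0
        for m, sig in enumerate(sigs):
            idx = t + round(m * d)
            if 0 <= idx < n:       # samples shifted off the ends are dropped
                s += sig[idx]
        total += s * s
    return total / n

def estimate_doa(sigs, angles=(-90, -45, 0, 45, 90)):
    """The steering angle with the largest summed power is the DOA guess."""
    return max(angles, key=lambda a: steered_power(sigs, a))
```

Steering the array at each candidate angle and keeping the one with the largest output power is the comparison described above; the nearest-sample rounding in `steered_power` is one source of the imprecision that a low sample rate causes.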
Essentially, you would ideally want as fast a sample rate as possible, so that you can delay the signals more precisely. The microphones should be as far apart as possible so that the delay between them is larger, and there should be as many microphones as possible to make the delay and sum more accurate. For this project, though, the soundboard is limited by the size of the e-pucks and the speed of the microprocessor on the soundboard. Consequently, the total microphone spacing shouldn't really exceed 7cm (the diameter of the e-pucks), and since the processor can only take so many readings each second, adding more microphones reduces the overall sample rate. After running the numbers I decided that 3 microphones spaced 5cm apart would give the best results within these constraints. The e-pucks have a 7cm diameter and the microphone array has a total width of 10cm, so there is only a small overshoot. With 3 microphones the sample rate is 38kHz, which is already quite low, and adding a fourth microphone would reduce it to 28.5kHz.
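To make the constraint concrete, a quick calculation (assuming a 343m/s speed of sound; the function name is mine) shows how little delay resolution the steering actually has to work with, and how the per-mic rates follow from sharing a fixed ADC throughput:

```python
C = 343.0  # speed of sound in air, m/s (assumed)

def max_delay_samples(spacing_m, fs_hz):
    """Largest inter-mic delay, for sound arriving end-on (DOA = 90 degrees)."""
    return spacing_m / C * fs_hz

# 5cm spacing at 38kHz gives an end-on delay of only ~5.5 samples,
# so the steering delays only have a handful of whole samples to play with.
worst_case = max_delay_samples(0.05, 38000)

# The ADC throughput is shared between mics: 3 mics * 38kHz = 114k samples/s,
# so a 4th mic would drop the per-mic rate to 114000 / 4 = 28.5kHz.
four_mic_rate = 3 * 38000 / 4
```

This is why adding a fourth microphone hurts: it shrinks both the sample rate and, with it, the already-small delay resolution.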
I built the soundboard and tested the DOA estimation with the delay and sum beamforming. To reduce processing and to limit the amount of data being sent from the soundboard to the e-puck, I only measured the steering angles of ±90°, ±45° and 0°. A tone was classed as "detected" if the power spectral density at that frequency was above a threshold, for 3 out of the 5 measured angles. If a tone was detected then the measured angle that gave the largest power spectral density was used as the DOA estimate.
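The detection rule can be sketched in a few lines of Python (the threshold value and function name here are illustrative, not the actual values used on the soundboard):

```python
THRESHOLD = 100.0  # power spectral density threshold (illustrative value)
ANGLES = (-90, -45, 0, 45, 90)

def detect_tone(psd_by_angle):
    """psd_by_angle: the PSD at the tone's frequency for each steering angle.
    Returns the DOA estimate, or None if the tone isn't classed as detected."""
    above = [a for a, p in zip(ANGLES, psd_by_angle) if p > THRESHOLD]
    if len(above) < 3:   # must clear the threshold at 3 of the 5 angles
        return None
    best = max(range(len(ANGLES)), key=lambda i: psd_by_angle[i])
    return ANGLES[best]
```

Requiring 3 of 5 angles to clear the threshold filters out spurious single-angle spikes before the argmax picks the DOA.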
To test the soundboard's frequency and DOA estimation, I played all the test tones at it using another e-puck with a soundboard. The signalling e-puck was at a distance of 20cm and at DOAs of ±90°, ±45° and 0°. The receiving soundboard was able to correctly guess the frequency approximately 66% of the time, but could only correctly estimate the DOA 17% of the time. Since there were only 5 possibilities for the DOA estimation this was worse than randomly guessing. I tried a few different approaches to determining the frequency and DOA but nothing really improved on this success rate.
After being unsuccessful with the beamforming approach I experimented with comparing the amplitude of the left and right microphones to get an inexact idea of whether a sound source was to the left or right of the robot.
Without the constraints of beamforming we could reduce the microphone sample rate down to 8kHz. This had 2 benefits. Firstly, the maximum frequency the economy library's wavetable synthesiser can produce is 3.5kHz, so a sample rate of 38kHz is somewhat excessive: by the Nyquist criterion, only 7kHz is needed to detect this frequency. Secondly, a smaller sample rate means that the FFT divides the spectrum into narrower bands. An FFT band's width on the frequency spectrum (its bandwidth) is the sample rate divided by the number of FFT points (in our case 128). With a 38kHz sample rate, each band is about 297Hz wide, but at 8kHz the bands are only 62.5Hz wide. Consequently, we can detect more distinct frequencies, up to the maximum signal frequency of 3.5kHz, with the lower sample rate.
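The bandwidth relationship is just one division, but it's worth seeing the numbers side by side (the function name is mine):

```python
def fft_bin_width_hz(fs_hz, n_points=128):
    """Width of each FFT frequency bin: sample rate / FFT length."""
    return fs_hz / n_points

# At 38kHz each of the 128 bins spans ~297Hz; at 8kHz only 62.5Hz.
# Narrower bins mean more distinguishable tones below the 3.5kHz
# limit of the wavetable synthesiser.
wide = fft_bin_width_hz(38000)
narrow = fft_bin_width_hz(8000)
```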
With this method of DOA estimation, an FFT was done on the raw microphone data (instead of the delayed and summed microphone readings). To measure the amplitude of the received signal the FFT'd signals from all the microphones were summed. To measure the DOA the FFT of the right microphone's data was subtracted from the FFT of the left microphone's, giving a left-right differential that provides a rough indication of the DOA.
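The amplitude-and-differential idea can be sketched like this (a Python illustration using a single-bin DFT as a stand-in for the FFT'd mic data; the function names are my own):

```python
import math

def dft_magnitude(samples, bin_index, n=128):
    """Magnitude of one DFT bin, standing in for one band of the FFT output."""
    re = sum(s * math.cos(2 * math.pi * bin_index * t / n)
             for t, s in enumerate(samples))
    im = -sum(s * math.sin(2 * math.pi * bin_index * t / n)
              for t, s in enumerate(samples))
    return math.hypot(re, im)

def amplitude_and_differential(left, centre, right, bin_index):
    """Sum of all mic magnitudes = amplitude; left minus right = DOA hint."""
    l = dft_magnitude(left, bin_index)
    c = dft_magnitude(centre, bin_index)
    r = dft_magnitude(right, bin_index)
    return l + c + r, l - r
```

A positive differential suggests the source is to the left, a negative one to the right, with the magnitude giving a rough sense of how far off-axis it is.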
The picture below shows some results of me measuring the amplitude and differential of a signal as I move the receiving robot further away from the sound source at different DOAs. From this data I did some thresholding in order to estimate the frequency and DOA of incoming signals. This gave overall successful frequency and DOA estimation rates of up to 93% and 39% respectively. Whilst this isn't perfect, it is an improvement on the delay and sum beamforming.
Amplitude comparison data from the soundboard. The differential should be negative with the negative DOA angles. Each data point shows the mean of 40 readings. You can see that there are definite trends in terms of how the differential changes with the actual DOA, but the data is quite erratic and sensitive to environmental conditions.