Arduino_Raspberry PI_Robot_Speech recognition_OpenCV_Julius_

A robot powered by a Raspberry Pi and an Arduino Mega. It takes voice commands using Julius4, responds by voice using espeak, and detects faces with OpenCV.

It's really amazing what you can do with the different development boards available today. This is my third attempt at making an autonomous robot, well, more or less autonomous. RikonV2 is the successor of my previous robot, RikonV1.

This time I combined an Arduino Mega 2560, which manages all the sensors and the motor control, with a Raspberry Pi B+ for tasks that require more computing power and multimedia handling. This gave me a good robot platform to play with, one that leaves plenty of room for adding features.

The idea behind using both an Arduino and an RPI is to sort of simulate a spinal column and a brain. The Arduino manages the sensors (accelerometer, ultrasonic distance sensors, etc.) and the actuators (motors, etc.), while the RPI is in charge of image acquisition and processing and of voice recognition.

A communication interface between the two dev boards is required. Several options are available, such as I2C, serial or USB. I chose serial communication for its simplicity; however, because of the different logic levels of the Arduino (5V) and the RPI (3.3V), a voltage level shifter is necessary. I used an Adafruit voltage level shifter:

On the RPI side, the following changes need to be applied for the serial communication to work properly.
In the file /boot/cmdline.txt, replace the line:

dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p6 rootfstype=ext4 elevator=deadline rootwait

with:

dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p6 rootfstype=ext4 elevator=deadline rootwait

and comment out the following line in the file /etc/inittab so that it looks like this:

#T0:23:respawn:/sbin/getty -L ttyAMA0 115200 vt100

Then reboot the RPI so the modifications take effect.
Once done, it should be possible to send and receive serial strings using the following Python code:

import serial
# The baud rate must match the one configured on the Arduino
ser = serial.Serial('/dev/ttyAMA0', 9600, timeout=1)
ser.write(b"Hello RikonV2\n")  # send to arduino
print(ser.readline())  # receive from arduino

On the Arduino side, the code handling the serial communication can look like this:

String inputString = "";
boolean stringComplete = false;

void setup() {
  Serial1.begin(9600); // must match the baud rate used on the RPI side
}

void loop() {
  if (stringComplete) { // Once the string is received completely
    Serial1.println(inputString + "_OK"); // Send back received string + "_OK"

    /* Processing received strings here */

    inputString = "";
    stringComplete = false;
  }
}

void serialEvent1() {
  while (Serial1.available()) {
    // get the new byte:
    char inChar = (char)Serial1.read();
    // a newline marks the end of the string:
    if (inChar == '\n') {
      stringComplete = true;
      return;
    }
    // otherwise add it to the inputString:
    inputString += inChar;
  }
}

Notice that I used Serial1 because I connected the voltage level shifter to Arduino pins 18 and 19.
Now we can exchange data between the Arduino and the RPI, which opens the door to a wide range of features.
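
Since the Arduino sketch echoes each received line back with "_OK" appended, the RPI side can use that echo as a simple acknowledgement. The following is only a minimal sketch of that idea in Python 3: the names send_command and FakePort are made up for illustration, and FakePort is a stand-in that imitates the echo so the helper can be tried without any hardware. On the robot, a serial.Serial object from pyserial would be passed instead.

```python
def send_command(port, command, suffix="_OK"):
    """Send one newline-terminated command and check the echoed reply.

    `port` is any object with write() and readline(), for example a
    serial.Serial instance from pyserial (assumed, not imported here).
    Returns True when the reply is the command followed by `suffix`.
    """
    port.write((command + "\n").encode("ascii"))
    reply = port.readline().decode("ascii", errors="replace").strip()
    return reply == command + suffix


class FakePort(object):
    """Hypothetical stand-in for the serial port, for trying the helper."""

    def __init__(self):
        self._reply = b""

    def write(self, data):
        # Imitate the Arduino sketch: echo the line with "_OK" appended.
        # println() on the Arduino terminates the line with "\r\n".
        self._reply = data.rstrip(b"\n") + b"_OK\r\n"

    def readline(self):
        return self._reply


if __name__ == "__main__":
    print(send_command(FakePort(), "MOVE FORWARD"))
```

If the read times out, readline() returns an empty string and the helper reports False, which the caller could use to retry the command.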

The base frame of the robot is made of plywood, cut, shaped, then glued together with wood glue:

At first I was planning to make the robot balance on two wheels, but the servomotors weren't precise enough after I modified them for continuous rotation, so I decided to use them as DC motors with an L298N H-bridge IC as the motor controller:

The motors are controlled by the Arduino using the following code:

const int mLeft1 = 6;
const int mLeft2 = 5;
const int mRight1 = 4;
const int mRight2 = 3;
const int mSpeed = 150; // PWM duty cycle (0-255)

void setup() {
  pinMode(mLeft1, OUTPUT);
  pinMode(mLeft2, OUTPUT);
  pinMode(mRight1, OUTPUT);
  pinMode(mRight2, OUTPUT);
}

// Stop both motors by setting every H-bridge input to 0
void mStop() {
  analogWrite(mRight1, 0);
  analogWrite(mLeft2, 0);
  analogWrite(mRight2, 0);
  analogWrite(mLeft1, 0);
}

void mForward() {
  analogWrite(mLeft2, mSpeed);
  analogWrite(mRight2, mSpeed);
}

void mBackward() {
  analogWrite(mRight1, mSpeed);
  analogWrite(mLeft1, mSpeed);
}

// Turn left: the two motors run in opposite directions
void mTLeft() {
  analogWrite(mRight1, mSpeed);
  analogWrite(mLeft2, mSpeed);
}

// Turn right: the two motors run in opposite directions
void mTRight() {
  analogWrite(mRight2, mSpeed);
  analogWrite(mLeft1, mSpeed);
}

void loop() {
  // call the movement functions from here
}

I found that it's better to build the electronics as shields for the Arduino; it makes it easy to switch the Arduino between different projects:

As I mentioned before, the plan was to make a balancing robot, but it turned out R2D2 style instead, well, sort of... :)

I placed the RPI on the back, the Arduino in the front and the motor driver on the bottom:

The head is a public phone card, cut and shaped using a heat gun:

Then I drilled some holes for the eyes and the mouth, and one in the forehead for the RPI camera module, and finally placed some LEDs salvaged from an old mobile phone:

The RPI doesn't have an analogue microphone input, so a USB audio card is required:

To use it with the RPI, we need to set it as the default sound card, but first we make sure it is detected by the kernel using the command lsusb, which should list the audio adapter:

Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 0d8c:000e C-Media Electronics, Inc. Audio Adapter (Planet UP-100, Genius G-Talk)

Then in the file /etc/modprobe.d/alsa-base.conf we add a "#" at the beginning of the line options snd-usb-audio index=-2 so it looks like:

#options snd-usb-audio index=-2

Save the file and reboot, then run the command alsamixer. If everything is correct, it should show Generic USB Audio Device as the card:

If it still shows bcm2835 ALSA, try re-editing the file /etc/modprobe.d/alsa-base.conf by removing the "#", changing the index value to 0, and adding the line options snd_bcm2835 index=1 if it's not there:

options snd-usb-audio index=0
options snd_bcm2835 index=1

Save, reboot and verify again using alsamixer.
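
The same check can also be scripted instead of opening alsamixer. The sketch below is only an illustration in Python 3. It assumes the usual layout of /proc/asound/cards, where each card starts with a line of the form "index [id]: driver - name". The function names and the SAMPLE text are made up for the example and were not captured from the robot.

```python
import re

# Matches lines such as " 0 [Device         ]: USB-Audio - Generic USB Audio Device"
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.*?)\s*\]:\s+(\S+)\s+-\s+(.*)$")

# Illustrative listing only (assumed format, not real output from the robot)
SAMPLE = (
    " 0 [Device         ]: USB-Audio - Generic USB Audio Device\n"
    "                      Generic USB Audio Device\n"
    " 1 [ALSA           ]: bcm2835 - bcm2835 ALSA\n"
    "                      bcm2835 ALSA\n"
)


def parse_cards(text):
    """Return a list of (index, driver, name) tuples from a cards listing."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            index, _card_id, driver, name = match.groups()
            cards.append((int(index), driver, name.strip()))
    return cards


def usb_card_is_default(text):
    """True when card 0 is handled by the USB audio driver."""
    return any(index == 0 and driver == "USB-Audio"
               for index, driver, _name in parse_cards(text))


if __name__ == "__main__":
    # On the RPI, the text would come from open("/proc/asound/cards").read()
    print(parse_cards(SAMPLE))
    print(usb_card_is_default(SAMPLE))
```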

Finally I added a little front cover made from a plastic bottle, cut and painted with white spray paint:

After assembling everything, it looked like this:

So far, the hardware implementation and the basic communication interface between the RPI and the Arduino are complete. Now we can start giving the robot a few features.

Voice recognition:

I struggled for a while to find good, easy-to-use open source voice recognition software, especially one that does not require internet connectivity, until I found Julius, which is a continuous speech recognition decoder.

Compile Julius:

  • Install the packages needed to fetch the Julius source code and to compile it:
sudo apt-get install alsa-tools alsa-oss flex zlib1g-dev libc-bin libc-dev-bin python-pexpect libasound2 libasound2-dev cvs
  • Set the environment variable used during compilation to build Julius for the ARM processor architecture:
export CFLAGS="-O2 -mcpu=arm1176jzf-s -mfpu=vfp -mfloat-abi=hard -pipe -fomit-frame-pointer"
  • Then run the following commands to configure, compile and install Julius:

./configure --with-mictype=alsa
sudo make
sudo make install

The installation is now complete, but before we can use it we need to download the acoustic model and extract it:

cd /home/pi/
mkdir julius
cd julius
wget http://www.repository.voxforge1.org/downloads/Main/Tags/Releases/0.9.0/Julius_AcousticModels_16kHz-16bit_MFCC_O_D_\(0.9.0\).tgz
tar xvf Julius_AcousticModels_16kHz-16bit_MFCC_O_D_\(0.9.0\).tgz
rm Julius_AcousticModels_16kHz-16bit_MFCC_O_D_\(0.9.0\).tgz

We can now start configuring the grammar for the sentences we want to detect. For that we need to create two files: the first contains the words and word groups we will use and must have the .voca extension; the second contains the phrase grammar and must have the .grammar extension:

root@raspberrypi:/home/pi/julius/grammar# tree
├── sample.grammar
└── sample.voca

The .voca file contains all the words you want to be detected, with their phonetic pronunciations:

% NS_B
<\s> sil
% NS_E

FIVE f ay v
FOUR f ao r
NINE n ay n

MOVE m uw v

TURN t er n

LEFT l eh f t
RIGHT r ay t
FORWARD f ao r w er d
BACKWARD b ae k w er d

The words are grouped, which simplifies the task of defining the syntax. The first two groups, NS_B and NS_E, represent the silence at the start and at the end of the phrase. A file named dict, which came with the acoustic model we extracted, contains the phonetics for all the supported words.

Next we need to define the syntax, which is done in the file with the .grammar extension. It looks something like this:



The first line indicates that the phrase starts with a silence NS_B, continues with a rule called MOVE_, and ends with a silence NS_E. MOVE_ can be either (2nd line) a word from the word group MOVE, then a word from the word group DIRECTION, then a digit from the word group DIGIT, then the word UNITS, or (3rd line) only a word from the group MOVE followed by a word from the group DIRECTION.
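
To make the effect of these two alternatives concrete, here is a small Python 3 sketch that enumerates every phrase the MOVE_ rule would accept. It is not part of Julius and does not read the real .voca or .grammar files. The word lists are copied from the vocabulary shown earlier, while the assignment of words to groups and the single UNITS entry are assumptions based on the description.

```python
from itertools import product

# Word groups. The grouping and the UNITS entry are assumptions
# based on the description of the grammar, not read from sample.voca.
VOCA = {
    "MOVE": ["MOVE"],
    "DIRECTION": ["LEFT", "RIGHT", "FORWARD", "BACKWARD"],
    "DIGIT": ["FIVE", "FOUR", "NINE"],
    "UNITS": ["UNITS"],
}

# The two alternatives of the MOVE_ rule, as sequences of group names.
MOVE_RULES = [
    ["MOVE", "DIRECTION", "DIGIT", "UNITS"],
    ["MOVE", "DIRECTION"],
]


def expand(rules, voca):
    """List every phrase obtained by picking one word per group."""
    phrases = []
    for rule in rules:
        for words in product(*(voca[group] for group in rule)):
            phrases.append(" ".join(words))
    return phrases


if __name__ == "__main__":
    for phrase in expand(MOVE_RULES, VOCA):
        print(phrase)
```

Listing the phrases this way is a quick check that the grammar accepts what is intended, and nothing more, before compiling it.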

Using this syntax we can compose, for example, the following phrases:


Once the two files are defined as desired, we run the following command. Make sure you are in the folder containing the .voca and .grammar files:

mkdfa.pl sample

Here sample is the name of the .voca and .grammar files. If there are no errors, it should output something like:

sample.grammar has 12 rules
sample.voca has 17 categories and 30 words
Now parsing grammar file
Now modifying grammar to minimize states[6]
Now parsing vocabulary file
Now making nondeterministic finite automaton[25/25]
Now making deterministic finite automaton[19/19]
Now making triplet list[19/19]
17 categories, 19 nodes, 27 arcs
-> minimized: 11 nodes, 18 arcs
generated: sample.dfa sample.term sample.dict

Notice that three files were generated: sample.dfa, sample.term and sample.dict. If that isn't the case, something went wrong.

Finally we can run Julius.

Make sure you are in the folder where the acoustic model was extracted and run the following command:

julius -input mic  -nocharconv -quiet -h hmmdefs  -hlist tiedlist -dfa grammar/sample.dfa -v grammar/sample.dict

A phrase spoken into the microphone will show up in the verbose output:

pass1_best: <\s> MOVE
sentence1: <\s> STOP
pass1_best: <\s> MOVE
sentence1: <\s> STOP
pass1_best: <\s> MOVE FORWARD FIVE
sentence1: <\s> MOVE FORWARD

Now we have Julius configured to detect our phrases, but we want it to do something when a phrase matches, so the next step is to link each word or phrase to the piece of code we want to run. To do that, Julius provides a server mode, so we can use a program in whatever programming language we like to get the data and process it as we wish.

To run Julius in server mode, we add the option -module 10500 to the previous command, where 10500 is the TCP listening port:

julius -input mic  -nocharconv -quiet -h hmmdefs  -hlist tiedlist -dfa grammar/sample.dfa -v grammar/sample.dict -module 10500

Basic client programs are available in different programming languages. I chose Python, which has a library dedicated to connecting to the Julius server.

To install it on the RPI, use the following steps:

  • Install the Python setuptools using:
apt-get install python-setuptools
  • Download, extract and install the library using:
wget https://github.com/Diaoul/pyjulius/archive/master.zip
unzip master.zip -d pyjulius
cd pyjulius/pyjulius-master/
python setup.py install

A simple example of the Python client code can look like this:

#!/usr/bin/env python
# Python 2 script (uses the Queue module)
import sys
import pyjulius
import Queue

# Initialize and try to connect
client = pyjulius.Client('localhost', 10500)
try:
    client.connect()
except pyjulius.ConnectionError:
    print('Start julius as module first!')
    sys.exit(1)

# Start listening to the server
client.start()
try:
    while 1:
        try:
            result = client.results.get(False)
        except Queue.Empty:
            continue
        if isinstance(result, pyjulius.Sentence):
            #print 'Sentence "%s" recognized with score %.2f' % (result, result.score)
            #print result.words
            # Require at least two words before indexing into the list
            if len(result.words) > 1 and result.words[0].word == "MOVE" and result.words[0].confidence > 0.9:
                # Check the confidence of the direction word itself
                if result.words[1].word == "FORWARD" and result.words[1].confidence > 0.9:
                    print("forward")
                if result.words[1].word == "BACKWARD" and result.words[1].confidence > 0.9:
                    print("backward")
        #print repr(result)
except KeyboardInterrupt:
    print('Exiting...')
    client.stop()  # send the stop signal
    client.join()  # wait for the thread to die
    client.disconnect()  # disconnect from julius

Julius in server mode sends the matched phrases to the Python client, which stores them in the variable result in two parts: the words and their confidence scores. The scores represent the accuracy of each detected word.
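
The word and confidence checks can be pulled out into a small function, which makes them easy to try without Julius running. The following is only a sketch in Python 3. The function name to_command is made up, it takes plain (word, confidence) pairs rather than pyjulius objects, and the 0.9 threshold is the one used in the client. In the client, the pairs could be built with something like [(w.word, w.confidence) for w in result.words].

```python
def to_command(words, threshold=0.9):
    """Map recognized (word, confidence) pairs to a movement command.

    Returns "forward" or "backward" when the phrase is MOVE followed by
    a known direction and both words exceed the confidence threshold,
    otherwise None.
    """
    if len(words) < 2:
        return None
    (verb, verb_conf), (direction, dir_conf) = words[0], words[1]
    if verb != "MOVE" or verb_conf <= threshold:
        return None
    if direction not in ("FORWARD", "BACKWARD") or dir_conf <= threshold:
        return None
    return direction.lower()


if __name__ == "__main__":
    print(to_command([("MOVE", 0.95), ("FORWARD", 0.97)]))
    print(to_command([("MOVE", 0.95), ("FORWARD", 0.40)]))
```

The returned string could then be written to the serial port so that the Arduino runs the matching motor function.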

Speech synthesis:

For speech synthesis I used espeak, which can be easily installed using the package manager:

apt-get install espeak

And now you can simply use the following command to do text-to-speech conversion:

espeak -p20 -ven+m1 -k3 -s140   "Yes Master"

The option -p20 changes the pitch, -ven+m1 selects the English male voice, -k3 marks capital letters with a slight pitch increase instead of a sound, -s140 sets the speed in words per minute, and the text between quotes is the text you want to synthesize.
From the Python script it can be called like this:

import os
os.system('espeak -p20 -ven+m1 -k3 -s140 "Yes Master"')
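
When the text to speak comes from a variable, building the command as a list of arguments avoids having to quote it for the shell. The following is a minimal sketch in Python 3 using the standard subprocess module. The function names are made up for the example, and the default values simply repeat the options used above.

```python
import subprocess


def build_espeak_command(text, pitch=20, voice="en+m1", capitals=3, speed=140):
    """Return the espeak argument list for the given text and options."""
    return [
        "espeak",
        "-p", str(pitch),     # pitch
        "-v", voice,          # voice
        "-k", str(capitals),  # capital letter indication
        "-s", str(speed),     # speed in words per minute
        text,
    ]


def say(text):
    """Speak the text; requires espeak to be installed."""
    return subprocess.call(build_espeak_command(text))


if __name__ == "__main__":
    print(build_espeak_command("Yes Master"))
```

Calling say("Yes Master") on the RPI should then speak the phrase with the same settings.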
