Using PocketSphinx within Python Code
Looks like my installation records will have to be updated to account for a different installation source, and maybe a different version of the source code.
Ok, here's the process so far. Install sphinxbase and pocketsphinx from GitHub. This means using the bleeding-edge versions rather than the tried-and-true alpha5 versions that I talked about in previous posts. This just seems to work better. Once this is all figured out, I'll go back and clean those posts up.
cd ~/tools
git clone https://github.com/cmusphinx/sphinxbase.git
cd ./sphinxbase
./autogen.sh
./configure
make
make check
sudo make install
cd ~/tools
git clone https://github.com/cmusphinx/pocketsphinx.git
cd ./pocketsphinx
./autogen.sh
./configure
make clean all
make check
sudo make install
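Before going further, it's worth a quick check that Python can actually see both packages. This is only a rough sketch: the helper name check_modules is my own invention, and I'm assuming the install put the bindings somewhere on your Python path under the names sphinxbase and pocketsphinx (the same names the script further down imports from).

```python
import importlib.util


def check_modules(names=("sphinxbase", "pocketsphinx")):
    """Return {module_name: bool}, True when Python can locate the module."""
    # find_spec returns None for a top-level module that is not on the path.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(name, "found" if found else "NOT found")
```

If either one comes back as not found, the usual suspects are a Python version mismatch or an install prefix that isn't on the path; I can't tell you which from here.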
Now look inside the pocketsphinx directory:
cd ~/tools/pocketsphinx/swig/python/test
There's a whole bunch of test scripts that walk you through the implementation of pocketsphinx in Python. It's basically done for you. Check the one called kws_test.py: that's the one that will wait to hear a keyword, run a command when it does, then resume listening. Perfect!
I'm going to assume that you've already created your own voice model based on the other posts in this blog, and that you've got a directory dedicated to command and control experiments.
If that's not true, then just edit the script in place, without moving it (make a backup first). The only practical difference is that detection will be less accurate, since you'll be using the stock model rather than one trained on your voice. For the purposes of this tutorial, skip the rest of the instructions down to where I've pasted my copy of the Python script. The only thing you should change is the audio input: read from the microphone rather than from an audio file, so the script matches what I've got here. After that, you're done. The rest of this tutorial is for those who have already created their own voice model; see my other posts for how to do that.
# Open file to read the data
# stream = open(os.path.join(datadir, "test-file.wav"), "rb")
# Alternatively you can read from microphone
import pyaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()
Ok. For the rest of us, let's get back to messing with this script. While still in the test directory:
mkdir ~/tools/cc_ex
cp ./kws_test.py ~/tools/cc_ex/kws_test.py
cd ~/tools/cc_ex/
gedit kws_test.py
There are a few changes to make in the Python script. Make sure the model directory points at your own model. Also, by default the script looks for the keyword in a .raw audio file: comment out the line that opens the file and uncomment the pyaudio lines so the script records from the microphone instead. The full text of my version of the script is below. Note that the keyphrase it's looking for is the word 'and'. Pretty simple, and very likely to have been covered a lot in the voice training.
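Since a wrong model directory is the easiest thing to get wrong here, a small path check can save some head-scratching. This is a minimal sketch, not part of the original test script: missing_model_paths is a name I made up, and the two relative paths are simply the ones my script uses for the acoustic model and the dictionary, so swap in your own.

```python
import os


def missing_model_paths(modeldir,
                        relative_paths=("neo-en/en-us",
                                        "neo-en/cmudict-en-us.dict")):
    """Return the full paths under modeldir that do not exist."""
    # Expand a leading ~ so the check sees the real home directory.
    base = os.path.expanduser(modeldir)
    candidates = [os.path.join(base, rel) for rel in relative_paths]
    return [path for path in candidates if not os.path.exists(path)]


if __name__ == "__main__":
    for path in missing_model_paths("~/tools/train-voice-data-pocketsphinx"):
        print("missing:", path)
```

An empty list means both paths were found. It says nothing about whether the model inside is any good, only that the files are where the script expects them.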
Note also that there's a weird quirk in the detection - you have to speak quickly. I tried for a long time making long, sonorous 'aaaannnnddd' noises at my microphone, and it didn't pick up. Finally gave a short, staccato 'and' - it detected me right away. Did it five more times, and it picked me up each time. I don't see a way to get around that - I think it's built into the buffer, so it won't even hear the whole thing otherwise. Or maybe I just said 'and' in the training really fast each time, though I don't think that's likely.
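For what it's worth, here is the arithmetic on the buffer, as a rough sketch. The numbers come straight from the pyaudio settings in the script (16000 samples per second, 1024 frames per read); the helper name chunk_ms is mine. It doesn't explain the quirk, it only shows how much audio each read hands to the decoder.

```python
RATE = 16000              # samples per second, as passed to p.open()
FRAMES_PER_BUFFER = 1024  # frames returned by each stream.read(1024)


def chunk_ms(frames=FRAMES_PER_BUFFER, rate=RATE):
    """Length of one audio chunk in milliseconds."""
    return 1000.0 * frames / rate


if __name__ == "__main__":
    print(chunk_ms())  # 64.0
```

So each chunk is about 64 ms of audio, shorter than even a quick 'and'. The loop feeds chunk after chunk into the same utterance, so a single buffer doesn't have to hold the whole word, which makes me less sure the buffer is the culprit. I haven't tested that, though. The '-kws_threshold' setting might be another knob worth experimenting with.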
#!/usr/bin/python
import sys, os
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
# Python does not expand "~" on its own, so expand it explicitly
modeldir = os.path.expanduser("~/tools/train-voice-data-pocketsphinx")
# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'neo-en/en-us'))
config.set_string('-dict', os.path.join(modeldir, 'neo-en/cmudict-en-us.dict'))
config.set_string('-keyphrase', 'and')
config.set_float('-kws_threshold', 1e+1)
#config.set_string('-logfn', '/dev/null')
# Open file to read the data
# stream = open(os.path.join(datadir, "test-file.wav"), "rb")
# Alternatively you can read from microphone
import pyaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()
# Process audio chunk by chunk. On keyphrase detected perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        print ([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print ("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()
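As written, the script only prints when it hears the keyphrase. If you want it to actually run something, one way is to call out with subprocess at the point where the detection is printed. This is just a sketch under my own assumptions: run_command is a made-up helper, and the command shown is a harmless placeholder (it has Python print a line), not anything from the pocketsphinx test script.

```python
import subprocess
import sys


def run_command(argv):
    """Run a command given as a list of arguments and return its exit code."""
    # check=False: a failing command should not crash the listening loop.
    return subprocess.run(argv, check=False).returncode


# Placeholder command; replace with whatever you want the keyword to trigger.
PLACEHOLDER_COMMAND = [sys.executable, "-c", "print('keyword heard')"]

if __name__ == "__main__":
    # In the real script this call would sit inside the
    # "if decoder.hyp() is not None:" branch, next to the print statements.
    exit_code = run_command(PLACEHOLDER_COMMAND)
    print("command exited with", exit_code)
```

Note that subprocess.run blocks until the command finishes, so the script isn't listening while a long-running command is going. For a quick action that's probably fine.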
Anyway, that's all. If it doesn't work, don't blame me. That's as dead simple as I know how to make it.