CMU Sphinx is advanced enough to use its understanding of grammar to help it estimate the likelihood that a particular word was spoken.  To do this, it needs a predefined notion of which words tend to follow each other -- it needs to understand the format of what is spoken to it.  The context of a 'command and control' AI involves a very specific type of grammar, where the input is predominantly commands and statements.

If CMU Sphinx has been trained on that kind of grammar, it will be able to filter out words that don't make sense in that context and weight more heavily the words that do make sense as control words: it will know that 'play music' is more likely than 'pink music', and that 'shutdown' is more likely to be a command than 'showdown'.

For now, here are the primary sources:
http://cmusphinx.sourceforge.net/wiki/tutoriallm
http://www.speech.cs.cmu.edu/tools/lmtool-new.html

Using these, it should be possible to create the grammar language model from a big list of sentences; the only problem is that I don't have a sentence list yet.  Once that LM has been created, the voice data I've created should be retrained -- even acoustic training draws on grammar statistics.
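Once a sentence list exists, preparing it for lmtool is straightforward: lmtool takes a plain-text corpus with one sentence per line and generates a language model plus a matching pronunciation dictionary. A sketch of generating that corpus file (the command phrases here are hypothetical placeholders for the real command set):

```python
# Hypothetical command set -- the real list would come from whatever
# commands the assistant is supposed to recognize.
commands = [
    "play music",
    "stop music",
    "shutdown",
    "what time is it",
]

# lmtool expects a plain-text corpus, one sentence per line.
with open("corpus.txt", "w") as f:
    for sentence in commands:
        f.write(sentence + "\n")
```

The resulting corpus.txt can then be uploaded to the lmtool page linked above, which returns the generated model files for use with Sphinx.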