Final strategy: I'm going to use CMU Sphinx, with a small vocabulary trained to my voice, for most commands. For longer, arbitrary stretches of speech (stuff I can't predict), I'll use the kaldi-gstreamer-server, or maybe even an online service.

That means I'll have two separate, behemoth systems installed on the computer. Ouch. At least the Kaldi server can run on a different computer, with the audio streamed to it over the network. Sphinx should be small enough not to be a problem.
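The two-tier split could be wired up roughly like this. It is a minimal sketch, not a real Sphinx or Kaldi binding. It assumes each recognizer is wrapped as a plain Python callable, that the command recognizer returns a text hypothesis plus a confidence normalized to 0.0 to 1.0 (Sphinx's raw scores are not on that scale, so some normalization is assumed), and that the command table, action names, and the 0.6 threshold are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    text: str
    confidence: float  # hypothetical normalized score, 0.0 to 1.0


# Hypothetical command table: the phrases the small Sphinx vocabulary covers,
# mapped to hypothetical action names.
COMMANDS: dict[str, str] = {
    "open browser": "launch_browser",
    "close window": "close_window",
    "start dictation": "dictate",
}


def route(
    audio: bytes,
    command_recognizer: Callable[[bytes], Optional[Hypothesis]],
    dictation_recognizer: Callable[[bytes], str],
    threshold: float = 0.6,
) -> tuple[str, str]:
    """Return ("command", action) or ("dictation", transcript)."""
    # First pass: the small, local, voice-specific command recognizer.
    hyp = command_recognizer(audio)
    if hyp is not None and hyp.confidence >= threshold and hyp.text in COMMANDS:
        return ("command", COMMANDS[hyp.text])
    # Anything the small vocabulary can't confidently match falls through to
    # the large-vocabulary recognizer (e.g. a remote Kaldi server or online service).
    return ("dictation", dictation_recognizer(audio))


if __name__ == "__main__":
    # Stub recognizers standing in for the real engines.
    sure = lambda audio: Hypothesis("open browser", 0.9)
    unsure = lambda audio: Hypothesis("open browser", 0.2)
    remote = lambda audio: "some arbitrary sentence"
    print(route(b"", sure, remote))    # ('command', 'launch_browser')
    print(route(b"", unsure, remote))  # ('dictation', 'some arbitrary sentence')
```

Keeping the recognizers as injected callables means the expensive large-vocabulary engine only has to be reachable, not installed locally, which fits running Kaldi on a different machine.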

Here's what I need to be able to train the command and control language model.