MOUNTAIN VIEW, CALIF. — Pete Warden’s startup, Useful Sensors, launched a new crowdfunding campaign this week for its natural language processing (NLP) system. The “AI in a box” module, built on an off-the-shelf Rockchip single-board computer, understands natural language, answers queries and generates text entirely at the edge. The company can run OpenAI’s Whisper twice as fast as Faster-Whisper, previously the fastest implementation of Whisper, which uses a specialized inference engine.
Useful Sensors’ “AI in a box” NLP module combines off-the-shelf hardware components with the company’s own models and software, including its Useful Transformers framework and optimizations for the hardware it runs on.
The startup chose the Rockchip board for its RK3588S SoC, which pairs four powerful Arm Cortex-A76 cores and four Arm Cortex-A55 cores with Rockchip’s in-house 6 TOPS (INT4) neural processing unit (NPU).
“The NPU helps us run the speech-to-text twice as fast as anyone else on this board,” Warden told EE Times. “There are a bunch of fun use cases you can build on top of this. Once you have speech-to-text, there’s all these other wonderful things you can build, including captions for live events, for which we will work with some conferences in the next couple of months to provide live closed captions from our boxes.”
As well as providing captions and transcriptions, the box can translate from 15 major languages to English on the fly and run a large language model to generate responses to questions. Use cases might include captioning or serving as a voice keyboard.
The crowdfunding campaign, hosted on Crowd Supply, is aimed at the maker community, Warden said.
“[The crowdfunding campaign] is a chance to get this into the hands of makers and prototypers—we are working with larger companies to get this into larger products, but we think this can be a really great platform for people to build their own applications on top of the speech to text capability,” he said.
The crowdfunded version of the system will come in a custom enclosure so it can be “placed on the kitchen table to caption the conversation, or bring to an event and plug in to an HDMI output for event captioning, or bring to a meeting so you can have translation real time,” Warden said, adding that anyone who can write Python can take the stream of text the box produces and use it in other ways.
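As a rough illustration of what working with such a text stream in Python might look like, the sketch below assumes, purely hypothetically, that transcribed text arrives as an iterable of plain text lines. The article does not describe the box’s actual interface, so the function name `caption_lines` and the 32-character width are illustrative choices, not part of Useful Sensors’ software.

```python
import textwrap


def caption_lines(stream, width=32):
    """Yield display-ready caption lines from an iterable of text lines.

    `stream` is a hypothetical stand-in for whatever text the box emits;
    any iterable of strings works, including an open file or sys.stdin.
    """
    for raw in stream:
        text = raw.strip()
        if not text:
            # Skip blank lines, e.g. pauses between utterances.
            continue
        # Break long utterances into lines that fit the caption display.
        yield from textwrap.wrap(text, width=width)


# Example with a fixed list standing in for the live text stream.
sample = ["welcome everyone to the opening keynote of the conference\n", "\n"]
for line in caption_lines(sample):
    print(line)
```

Passing `sys.stdin` instead of the sample list would let the same function caption text piped in from another process, if the text were delivered that way.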
Warden’s goal is to get a system like this down to a tiny form factor and a 50-cent price tag within the next few years to enable widespread AI applications at the edge. In the meantime, the current system can provide a voice-based user interface for products entirely at the edge.
Useful Transformers’ implementation of Whisper relies on custom C++ code that calls Rockchip’s matrix multiplication library. The parts of the stack that are not hardware-specific are open source. “Rockchip’s libraries are closed source but available,” Useful Sensors’ co-founder and CTO Manjunath Kudlur told EE Times. “That’s the only part that needs replacing [for the framework] to run on other hardware. The goal is to make the rest of the firmware so lightweight that it runs as fast as possible.”
While the open-source framework, Useful Transformers, works only on the Rockchip device today and runs only Whisper, the plan is to expand to other transformer models and other hardware in the future.
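The idea of keeping the matrix multiplication routine as the only hardware-specific piece can be sketched in a few lines of Python. This is not Useful Transformers’ code, which is written in C++; the names `reference_matmul` and `LinearLayer` are hypothetical, and the pure-Python multiply merely stands in for a vendor library such as Rockchip’s.

```python
from typing import Callable, List

Matrix = List[List[float]]
MatMul = Callable[[Matrix, Matrix], Matrix]


def reference_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Portable matrix multiply; a stand-in for a hardware-specific library."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    columns = list(zip(*b))  # transpose b so each column is one tuple
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


class LinearLayer:
    """A model layer that only touches hardware through `matmul`."""

    def __init__(self, weights: Matrix, matmul: MatMul = reference_matmul):
        self.weights = weights
        self.matmul = matmul  # swap this to target different hardware

    def forward(self, inputs: Matrix) -> Matrix:
        return self.matmul(inputs, self.weights)


layer = LinearLayer([[1.0, 0.0], [0.0, 2.0]])
print(layer.forward([[3.0, 4.0]]))
```

Porting a design like this to new hardware would mean supplying a different `matmul` callable while the layer code stays unchanged, which is the kind of separation Kudlur describes.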
Useful Sensors is also working on building sensor modules for its proprietary model implementations. The company’s current-generation person sensor is a postage-stamp-sized board with a camera on the front and an Espressif Systems ESP32-S3 wireless-enabled microcontroller on the back. A second, similar board is based on the Raspberry Pi RP2040 microcontroller with an LG camera and is intended for reading QR codes.