SmartAssistant Blueprint (RFC)

Hi,

Let me open a thread to collect info and ideas about a voice recognition feature that would work at the edge.

What do you think of this feature? Is it desirable? It could fit well with our privacy-minded scenarios.

Looking at Mycroft, one of the most popular open-source frameworks:

DeepSpeech is not yet ready for production use and Mycroft currently uses Google STT as the default STT engine.

I’ll try to share the latest updates about Mozilla’s DeepSpeech with you,
since I know one of its developers.

I also remember that webthings.io used to support Snips (snips.ai), which also worked offline, but it is no longer open (the company has since been acquired).

Until an offline engine is ready, cloud-based backends can be used, as in Home Assistant:

Updates:

With privacy/security in mind, a voice assistant (like a camera) is pretty hard to set up correctly. First, it should clearly indicate whether it is recording or sending anything outside. Ideally, it should also have a user switch for online/offline. Then, all other applications connecting to the stream would need authentication, IMO.
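
To make those three points concrete, here is a rough Python sketch of a gate sitting in front of the microphone stream. Everything in it is an assumption for illustration: `AudioStreamGate`, its methods, and the token scheme are hypothetical names, not part of Mycroft or any other framework mentioned in this thread.

```python
import hmac
import secrets


class AudioStreamGate:
    """Hypothetical gate in front of a microphone stream (illustration only)."""

    def __init__(self, online=False):
        self.online = online      # user switch: may audio leave the device?
        self.recording = False    # would drive a visible indicator (LED, icon)
        self._tokens = {}         # application name -> access token

    def start(self):
        self.recording = True

    def stop(self):
        self.recording = False

    def register_app(self, name):
        """Issue a random token that the application must present later."""
        token = secrets.token_hex(16)
        self._tokens[name] = token
        return token

    def is_authorized(self, name, token):
        expected = self._tokens.get(name)
        # Constant-time comparison avoids leaking the token through timing.
        return expected is not None and hmac.compare_digest(expected, token)

    def status(self):
        """Text shown to the user so the device state is never ambiguous."""
        rec = "recording" if self.recording else "idle"
        mode = ("online (audio may be sent out)" if self.online
                else "offline (audio stays local)")
        return f"{rec}, {mode}"

    def read(self, name, token, chunk, remote=False):
        """Hand an audio chunk to an application if policy allows it."""
        if not self.is_authorized(name, token):
            raise PermissionError(f"{name} is not authorized")
        if not self.recording:
            raise RuntimeError("microphone is not recording")
        if remote and not self.online:
            raise PermissionError("offline mode: audio may not leave the device")
        return chunk
```

A local STT engine would call `read(...)` with `remote=False`, while a cloud backend would pass `remote=True` and be refused whenever the user switch is set to offline.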


Hi, thanks for the feedback. BTW, I had not noticed that the speaker blueprint was already mentioned here:

https://project.ostc-eu.org/projects/open-source-technology-center-eu/work_packages/284/activity?query_id=100

But I’ll use this forum to share updates if anyone is interested.

Yesterday, I managed to build Mycroft (in core-image), but it will require more fixes to be usable. I’ll try to upstream them:

Note that the bitbake recipe uses the PocketSphinx STT backend (cmusphinx/pocketsphinx on GitHub, a lightweight speech recognition engine tuned for handheld and mobile devices), which is less accurate than the Mozilla DeepSpeech engine I mentioned earlier. On the other hand, PocketSphinx consumes fewer resources, which is good for our class of low-end IoT devices.

For cameras, this can be done locally using the ONVIF protocol, but maybe you had in mind the “recognition part”, which tends to happen in the cloud. The motion project provides basic functionality, though.

Relates to:

https://project.ostc-eu.org/projects/open-source-technology-center-eu/work_packages/287/activity?query_id=100


I’m interested in a runtime comparison of the DeepSpeech and PocketSphinx STT backends. In the past, I’ve heard that PocketSphinx was barely usable in terms of accuracy.
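
For such a comparison, a small backend-agnostic harness might be enough to start with. The sketch below is only an assumption of how it could look: it measures the real-time factor (processing time divided by audio duration) and the word error rate, and it deliberately does not call DeepSpeech or PocketSphinx directly. `transcribe` is a placeholder for whatever wrapper function each backend ends up needing, and `benchmark` and `word_error_rate` are hypothetical names.

```python
import time


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # reference word dropped
                           cur[j - 1] + 1,            # extra word inserted
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def benchmark(transcribe, samples):
    """Run one backend over (audio, duration_s, reference) samples.

    `transcribe` is a placeholder: any callable taking audio and returning text.
    """
    elapsed = audio_s = wer_sum = 0.0
    for audio, duration_s, reference in samples:
        start = time.perf_counter()
        text = transcribe(audio)
        elapsed += time.perf_counter() - start
        audio_s += duration_s
        wer_sum += word_error_rate(reference, text)
    return {
        "real_time_factor": elapsed / audio_s,  # below 1.0 is faster than real time
        "mean_wer": wer_sum / len(samples),
    }
```

Running `benchmark` once per backend on the same samples, on the actual target board, would give numbers that can be compared directly.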