Week 21: Implementing voice queries over HTTP & the skills viewer

AskBob voice assistant

New /voicequery endpoint

We have added an additional /voicequery endpoint to enable users to upload 16kHz single-channel RIFF WAV files encoded as multipart/form-data to AskBob for interpretation as an alternative to sending transcribed text to /query.

POST requests to /voicequery must specify a sender, just as with textual queries, and also upload the raw binary data of a valid WAV file under the speech key.

A form that could upload a valid WAV file to AskBob for a /query-like response is the following:

<form action="http://localhost:8000/voicequery" method="post" enctype="multipart/form-data">
    <input type="text" name="sender" value="askbob"><br>
    <input type="file" name="speech"><br>
    <input type="submit">
</form>

This functionality is particularly important to the FISE Lounge team, who will be using this endpoint to allow the AskBob speech transcriber based on Mozilla DeepSpeech to take over speech transcription from IBM Watson when ‘cloud features’ are disabled from their dashboard.

To enable voice functionality, the AskBob web server must also be started with the additional -v flag, as in the following:

$ python -m askbob -s -v

Furthermore, a Mozilla DeepSpeech model and external scorer must be properly configured to be used by AskBob with the additional voice dependencies installed for the new voice-enabled server to function properly.

CORS support

We have also added support for cross-origin resource sharing to AskBob, should anyone wish to configure this directly in the application.

We have provided the following option in the runtime config.ini configuration file to set the output of the Access-Control-Allow-Origin header used to enable CORS:

[Server]
host = 0.0.0.0
port = 8000
cors_origins = *

This is important for deployment; it would allow AskBob to be installed and run on a web server accessed with a domain name (potentially over a local or public network depending on how users wish to deploy AskBob). It is also important to allow the skills viewer to access data from the /skills endpoint.

Updated /skills endpoint format

The JSON response format of the /skills endpoint was updated to be of the following format:

{
   "plugins": [
      {
         "plugin":"plugin_name",
         "description":"Human-readable description.",
         "author":"Author name.",
         "icon":"Icon URL (optional)"
      }
   ],
   "skills": [
      {
         "plugin": "plugin_name",
         "category": "miscellaneous",
         "description": "Plugin description.",
         "examples": [
            "Intent example 1",
            "Intent example 2",
            "Intent example 3"
         ]
      }
   ]
}

This allows us to provide further information to our skills viewer web app.

Improved testing

We also added addtional tests for the AskBob responder, transcriber, plugin decorator and interactive loop components to ensure the application is functioning as expected. While writing unit tests, we also made minor refactorings and stability improvements.

Skills viewer web app

Also this week, we outlined the basic structure of the skills viewer web app, which will allow users to visually inspect the skills installed on an AskBob server (be it voice-enabled or voiceless) sorted by plugin name or category. It will show intent examples that users could query the voice assistant with to access that particular ‘skill’. It will also view per-plugin icons, should they be specified by plugin authors.

Similarly to the configuration generator web app, it is being built with React as a front-end framework and Material-UI for styling. Currently, an example set of skills data is hard-coded into the app; however, next week, we will move to allow users to provide an endpoint URL from which the skills viewer will read skills.