Week 17: Laying the foundations of a plugin system for AskBob

This week, I outlined a new plugin system for AskBob and made it easier to deploy using Docker.

Designing a plugin system for AskBob

I had a few design goals and principles in mind when setting out how such a plugin system could work:

  • extensibility – AskBob should expose a clearly defined API to facilitate the creation of new plugins;
  • modularity – plugins should be self-contained packages with all files required for training and use by AskBob;
  • flexibility – plugins should have access to the vast majority of AskBob’s capabilities so that third-party developers may use AskBob creatively in their own projects;
  • simple installation – it should be no more difficult than dragging new plugins into a folder and running a setup utility; and
  • conflict avoidance – naming conflicts between plugins ought to be minimised.

With these in mind, I arrived at an experimental design for an AskBob plugin system. A plugin is composed of a config.json file (of the sort that may be generated by our configuration generator web app) containing the necessary data required to train our Rasa model as well as Python files defining actions more complex than fixed responses. The folders containing enabled plugins are stored at a configurable plugins location, which by default is a folder called plugins.
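As an illustration, the folder layout for a hypothetical time plugin might look as follows (the file names other than config.json are up to the plugin author):

```
plugins/
└── time/
    ├── config.json    # training data: intents, actions and skills
    ├── __init__.py    # declares which Python files contain actions
    └── actions.py     # action code run by the Rasa action server
```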

The setup utility was modified to only take a “main” configuration file as an optional argument so that it could train entirely from the data provided by installed plugins. At “build-time” when python -m askbob --setup [config.json] is run, the config.json files of all plugins in the plugins folder are loaded, parsed and then translated into Rasa configuration YAML files, which are used to train and generate a new Rasa model supporting the voice assistant skills added by the installed plugins.
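For instance, an intent such as ask_time from a plugin's config.json might be translated into Rasa NLU training data of roughly the following shape (the exact YAML emitted by the setup utility is an assumption here):

```yaml
nlu:
  - intent: ask_time
    examples: |
      - What time is it?
      - Tell me the time
```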

We are also able to take advantage of the existing Rasa SDK and Rasa action server for writing and running action code. The RasaResponseService was modified to automatically start the Rasa actions server on object construction and shut it down on destruction (to avoid leaving the process running once AskBob is shut down).

When a plugin supports actions written in Python, there must be at least two files: an __init__.py file with an __all__ variable specifying the names of the Python files containing action code; and at least one other Python file (typically actions.py) containing that action code. Plugin actions must extend the rasa_sdk.Action class and also be decorated with an AskBob decorator, which names the actions properly.
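A minimal sketch of what such a decorator could look like is given below; the exact naming scheme used by askbob.plugin.action is an assumption, but prefixing action names with the plugin name is one way to achieve the conflict-avoidance design goal:

```python
def action(plugin_name: str, action_name: str):
    """Hypothetical sketch: give the decorated Action class a name()
    method returning a plugin-prefixed action name, so that two
    plugins defining an action with the same name cannot collide."""
    def decorator(cls):
        cls.name = lambda self: "action_{0}_{1}".format(plugin_name, action_name)
        return cls
    return decorator
```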

Example plugin structure

# plugins/time/config.json

{
    "plugin": "time",
    "intents": [
        {
            "intent_id": "ask_time",
            "examples": [
                "What time is it?",
                "What time is it right now?",
                "What time is it now?",
                "Tell me the time",
                "Tell me the time right now",
                "Tell me the time now"
            ]
        }
    ],
    "actions": [
        "fetch_time"
    ],
    "skills": [
        {
            "description": "give the system time",
            "intent": "ask_time",
            "actions": [
                "fetch_time"
            ]
        }
    ]
}
# plugins/time/__init__.py

__all__ = ["actions"]

# plugins/time/actions.py

from typing import Any, Text, Dict, List
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

import askbob.plugin


@askbob.plugin.action("time", "fetch_time")
class ActionFetchTime(Action):

    def run(self, dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        from datetime import datetime

        dispatcher.utter_message(text="The time is {0}.".format(
            datetime.now().strftime("%H:%M")))

        return []

Action response format changes

Previously, actions could only return text-based messages to be output via TTS, as below:

{
  "messages": [
    "Hello, world!",
    "I'm Bob, how do you do?"
  ]
}

The response format has been modified to be more flexible: when AskBob is run in server mode, actions may now return text, image and custom JSON responses, which third parties can take advantage of.

A mixture of different response types is allowed. For example, a developer implementing a ‘call’ skill within an AskBob plugin may return the following messages in response to the query “call Tim”:

{
  "messages": [
    {
      "text": "Calling Tim."
    },
    {
      "custom": {
        "type": "call_user",
        "callee": "Tim"
      }
    }
  ]
}

On their front end, they would then be able to both output “Calling Tim.” using text-to-speech and execute the necessary logic to actually call Tim. Indeed, this is the strategy used to build the video conferencing plugin.
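The front-end handling might be sketched as follows; handle_messages and the action tuples it returns are hypothetical, and a real front end would call into its text-to-speech and calling logic instead of collecting actions:

```python
def handle_messages(response: dict) -> list:
    """Hypothetical front-end dispatcher for AskBob server responses:
    text messages are sent to TTS, while custom messages trigger
    app-specific logic such as starting a call."""
    actions = []
    for message in response.get("messages", []):
        if "text" in message:
            actions.append(("speak", message["text"]))     # send to TTS
        elif "custom" in message:
            actions.append(("custom", message["custom"]))  # app-specific logic
    return actions
```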

FISE ecosystem integration

With a plugin system to build on, I had further meetings with the video conferencing team to discuss how AskBob could be used to trigger actions on their video conferencing front end. Using the strategy mentioned in the previous subsection, I was able to implement commands for calling other users and changing their video conferencing lobby background for them to use on their front end.

Calling users uses the format shown in the previous subsection, while changing the background returns the following response:

{
    "messages": [
        {
            "custom": {
                "type": "change_background"
            }
        }
    ]
}

Improvements

This week, the AskBob codebase was made more versatile: AskBob can now be run solely in server mode without having to install any of the DeepSpeech and pyttsx3 dependencies required for interactive mode; compatibility with Linux was improved; and support for Docker was added.

Docker

From the root of the AskBob repository, it is now possible to build a new Docker container using the following command:

$ docker build --build-arg ASKBOB_SETUP_CONFIG=default_config.json -t askbob .

Note: Here, the --build-arg flag and argument are optional if the assistant is to be set up only with installed plugins and no additional “main” configuration file.

AskBob may then be launched and operated in server mode using the following command:

$ docker run -it --rm -p 8000:8000 askbob

There is currently no support for running AskBob in interactive mode from within a Docker container.

Other improvements

The logging output from the AskBob application was cleaned up and more beautifully formatted using coloredlogs. Additional documentation on installation and usage was added to the README.

The natural language understanding classifier threshold was increased to reduce unexpected responses when a query beyond the scope of a particular AskBob installation is asked.
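In Rasa, this kind of threshold is typically configured via the FallbackClassifier component in the NLU pipeline; the value below is illustrative rather than the one actually used:

```yaml
pipeline:
  # ... other NLU pipeline components ...
  - name: FallbackClassifier
    threshold: 0.7
```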

AskBob Config app

This week, we added skills and stories pages to the AskBob config web app. The skills page allows users to create a skill, which is an intent paired with a list of responses. On the backend, this will be converted into a simple rule so that the voice assistant can provide a range of different responses to an intent.
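For example, a skill pairing the ask_time intent with a set of responses might be converted into a Rasa rule of roughly this shape (the utterance name is an assumption):

```yaml
rules:
  - rule: respond to ask_time
    steps:
      - intent: ask_time
      - action: utter_ask_time
```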

The stories page allows users to add stories. A story is a list of intents and responses. On the backend, this will be converted into a Rasa story, which is used to direct the flow of the conversation. Example stories help make conversations more natural.
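A story might similarly be converted into Rasa story YAML along these lines (the intent and response names are illustrative):

```yaml
stories:
  - story: greeting followed by asking the time
    steps:
      - intent: greet
      - action: utter_greet
      - intent: ask_time
      - action: utter_ask_time
```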

Next, we could implement the ability to add actions and entities to stories. The ability to add forms would also be useful for actions that need multiple pieces of data.

Client meetings

On Monday, I held a meeting with Dr Joseph Connor to discuss our progress with AskBob, deployment to a Raspberry Pi device and potential uses for the voice assistant in a clinical setting. We will start looking into the hardware available to us to use.

On Friday, I also held a meeting with Dr Dean Mohamedally to entertain the idea of building an Echo Show-style device (a voice assistant with a digital display) using the work from the three FISE teams fronted by the front-end interface of the video conferencing group. Given that AskBob works on (and indeed, is mostly developed on) Windows, he suggested we investigate loading AskBob onto an Intel NUC device – a small low-power Windows 10 computer, potentially with a Celeron or Atom processor.

Team changes

On Tuesday afternoon, Felipe informed us that he had interrupted his studies and would be leaving the team with immediate effect as a result.

Next steps

Next week, we will look to improve our documentation; make it easier to create new, interesting AskBob plugins in repositories outside the main AskBob repository and build an ecosystem around AskBob; and further develop our integrations with the other FISE projects.