Where Is the Controller for an MVC VoiceXML Application?

In a previous post we created a simple "Hello World" application to demonstrate how to use the VoiceModel library. That example did not show how to control the flow of a voice application, so in this post I will explore how we can add a controller to our MVC VoiceXML application. Let's take a simple example: a voice application that lets the caller get the weather conditions for an area based on zip code. Here is the call flow for our application.

If we were developing this application in straight VoiceXML, the first block in our call flow could be handled something like this.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd">
  <form id="greeting">
    <block>
      <prompt>Welcome to Weather Voice.</prompt>
      <submit next="GetZip"/>
    </block>
  </form>
</vxml>

After voicing the prompt, the IVR would execute the submit tag, which would call the GetZip action in the Controller. We could create an Output object from the VoiceModel to do this, but I do not like this approach because it couples control of the application flow to the objects that represent the actions performed in our application, such as playing voice prompts and getting user input. This coupling reduces the reusability of our voice objects and leads to rather large and complex controllers in MVC.

My goal for VoiceModel is to decouple the voice objects from the controllers by using a state machine that can itself be represented as objects. This has the added advantage that the controller can be persisted, which in turn would allow tools that define the call flow to generate the controller, rather than hand-coding it for every application. It also fits with the direction of the Voice Browser Working Group, which is in charge of the VoiceXML standard. They have come up with a new standard called State Chart XML (SCXML), which allows a state machine to be defined in an XML-based language. The reason for adding this is to remove control of the application from the VoiceXML and put it into a separate controller defined by SCXML. I am taking the same approach with VoiceModel so that in the future the application call flow could be defined by something like SCXML. It also leaves open the option to use something like Windows Workflow Foundation to define the call flow.
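To make the idea concrete, here is a minimal sketch of a call flow held as data (a table of transitions) rather than as branching logic inside the voice objects. It is not VoiceModel's implementation: the `SimpleFlow` class, its methods, and the demo class are hypothetical names invented for illustration, and the state names are borrowed from the weather example in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, illustration-only state machine. Not part of VoiceModel.
public class SimpleFlow
{
    // Maps (current state, event) to the id of the next state.
    private readonly Dictionary<(string State, string Event), string> _transitions =
        new Dictionary<(string State, string Event), string>();

    public void AddTransition(string from, string evt, string to)
    {
        _transitions[(from, evt)] = to;
    }

    // Returns the next state id, or null when no transition is defined
    // (for example, when the flow has reached its final state).
    public string Next(string current, string evt)
    {
        return _transitions.TryGetValue((current, evt), out var next) ? next : null;
    }
}

public static class SimpleFlowDemo
{
    // Builds the weather call flow as plain data.
    public static SimpleFlow BuildWeatherFlow()
    {
        var flow = new SimpleFlow();
        flow.AddTransition("greeting", "continue", "getZip");
        flow.AddTransition("getZip", "continue", "getWeather");
        flow.AddTransition("getWeather", "continue", "voiceWeather");
        flow.AddTransition("voiceWeather", "continue", "goodbye");
        return flow;
    }
}
```

Because the flow is only a table, it could in principle be saved, loaded, or generated by a tool without touching the prompts and inputs themselves, which is the decoupling described above.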

With this in mind, I have created a simple state machine in VoiceModel that you can use. But as previously stated, you can use whatever you like to control your call flow, such as Windows Workflow Foundation, an open source state machine like Stateless, or something based on SCXML. To make this as easy as possible, I have created a framework in VoiceModel that starts with a derived Controller to use in your MVC application. I have created a sample of the weather application described above, which you can get on the VoiceModel Project website. This example uses the concepts described in this post. The derived Controller is called VoiceController. You will derive your controller from this as shown in the code below.

using System.Web.Mvc;
using VoiceModel;
using VoiceModel.CallFlow;
using WeatherVoice.Models;
using WeatherVoice.Builders;

namespace WeatherVoice.Controllers
{
    public class WeatherController : VoiceController
    {
        // Single action that runs the state machine for every request.
        public ActionResult StateMachine(string vm_id, string vm_event, string vm_result)
        {
            return VoiceView(vm_id, vm_event, vm_result,
                new WeatherVoiceModelBuilder(),
                new WeatherCallFlowBuilder());
        }
    }
}


You will want to create an action method as shown in this code as well. This action expects three parameters, which are passed from the VoiceXML that VoiceModel generates automatically. The VoiceView method takes these three parameters, determines what actions to perform, and returns the next view for the VoiceXML browser based on our state machine. In addition to these parameters, it needs to know how to access the voice objects and the state machine. This is done by passing in an instance of a VoiceModelBuilder for the VoiceModel and a CallFlowBuilder for the state machine or controller. For this example I am just building them each time, but in future posts I will explore how we can persist and cache this information for better performance. Here is our custom VoiceModelBuilder, which creates the voice objects needed for our Weather application.

public class WeatherVoiceModelBuilder : IVoiceModelBuilder
{
    public IVoiceModels Build()
    {
        VoiceModels views = new VoiceModels();
        //Create a base document that will be used by all of the objects
        //that specifies what action to use in the Controller.  This action
        //will be used for every object to run our state machine.
        VxmlDocument doc = new VxmlDocument();
        doc.controllerName = "StateMachine";
        views.Add(new Output(doc, "greeting", "Welcome to Weather Voice."));
        views.Add(new Input(doc, "getZip",
            "Enter the five digit zip code for the area where you would like the " +
            "weather report on.",
            new Grammar("digits?minlength=5")));
        Prompt weatherPrompt = new Prompt();
        weatherPrompt.audios.Add(new TtsMessage("The temperature today is "));
        weatherPrompt.audios.Add(new TtsVariable("d.temp"));
        weatherPrompt.audios.Add(new Silence(1000));
        weatherPrompt.audios.Add(new TtsMessage("The conditions are "));
        weatherPrompt.audios.Add(new TtsVariable("d.conditions"));
        views.Add(new Output(doc, "voiceWeather", weatherPrompt));
        views.Add(new Exit("goodbye", "Thank you for using Weather Voice. Goodbye."));

        return views;
    }
}

And here is what our custom CallFlowBuilder looks like for this application.

public class WeatherCallFlowBuilder : ICallFlowBuilder
{
    public ICallFlow Build()
    {
        CallFlow flow = new CallFlow();
        flow.AddStartState(new State("greeting", "getZip"));
        flow.AddState(new State("getZip", "getWeather"));
        flow.AddState(new GetWeather("getWeather", "voiceWeather",
           new DAL.WeatherServiceMockup()));
        flow.AddState(new State("voiceWeather", "goodbye"));
        flow.AddState(new State("goodbye"));
        return flow;
    }
}


For this state machine to work properly, each state must have the same name as the voice object it represents. For example, the Output object named "greeting" is represented by the State that is also named "greeting". I do not care much for the magic strings we are using here, but that can be solved by putting some type of tool on top of this to generate these objects, or by using the same T4 template concepts used in T4MVC.
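Until such tooling exists, a mismatch between a state name and a voice object name is easy to introduce with a typo. One way to catch it early is a small check that compares the two sets of names. The sketch below is a hypothetical helper, not part of VoiceModel; it works on plain lists of ids because I am not assuming anything about how the library exposes its states or voice objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, illustration only. Not part of VoiceModel.
public static class FlowNameCheck
{
    // Returns the ids of view-backed states that have no voice object
    // with exactly the same id (the comparison is case-sensitive).
    public static List<string> MissingViews(
        IEnumerable<string> viewStateIds,
        IEnumerable<string> voiceObjectIds)
    {
        var known = new HashSet<string>(voiceObjectIds);
        return viewStateIds.Where(id => !known.Contains(id)).ToList();
    }
}
```

For the weather example you would pass in "greeting", "getZip", "voiceWeather" and "goodbye" as the state ids, leaving out any state that does not represent a view. If a voice object had been registered as "getzip" by mistake, the returned list would contain "getZip".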

You will notice that this example does not really use a service to get the weather. To test these concepts out quickly, I created a mockup of this service. The beauty of creating our voice application in MVC is that we can now use development methods like test driven development (TDD), dependency injection and mockups to increase the productivity and quality of our voice applications. In a later post I will provide an example that uses a service like Weather Underground to get real weather conditions.

Notice that there is one state that does not represent a VoiceXML view, which is our state for accessing the weather service to get the weather information.  This state is derived from the State class and fires its own events to transition to the next state that represents a view.  Below is the definition of this state.

using System.Web.Script.Serialization;

public class GetWeather : State
{
    IWeatherService _service;

    public override void OnEntry()
    {
        Weather currWeather = _service.getWeather(this.results);
        // Serialize the weather data as JSON and fire the event
        // that transitions to the next state.
        JavaScriptSerializer serializer = new JavaScriptSerializer();
        string jsonWeather = serializer.Serialize(currWeather);
        this.Flows.FireEvent(this.Id, "continue", jsonWeather);
    }

    public GetWeather(string id, string target, IWeatherService service)
        : base(id)
    {
        this.AddTransition("continue", target, null);
        this._service = service;
    }
}

Data is passed between states by serializing it as JSON, which provides a lot of flexibility in the type of data that can be passed around. If you reference the data in your VoiceModel, it needs to be accessed through an object called d. You will notice in the VoiceModel for the object voiceWeather that there is a TtsVariable defined as d.temp. This references the JSON generated in OnEntry for the getWeather state.
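To see what d.temp and d.conditions point at, it helps to look at the JSON itself. The sketch below is illustrative only: `WeatherSketch` is a hypothetical stand-in for the sample's Weather model (the real class may have different members and types), and it uses System.Text.Json rather than the JavaScriptSerializer from the sample so that it runs on current .NET without System.Web.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for the sample's Weather model. The member names
// are taken from the d.temp and d.conditions references in the post;
// the string types are an assumption.
public class WeatherSketch
{
    public string temp { get; set; }
    public string conditions { get; set; }
}

public static class JsonSketch
{
    // Serializes the weather data to a JSON string. System.Text.Json is
    // swapped in here for JavaScriptSerializer; by default it writes
    // property names exactly as declared.
    public static string ToJson(WeatherSketch weather)
    {
        return JsonSerializer.Serialize(weather);
    }
}
```

Calling `JsonSketch.ToJson(new WeatherSketch { temp = "72", conditions = "sunny" })` produces `{"temp":"72","conditions":"sunny"}`. Presumably this is the object that the generated VoiceXML sees as d, so d.temp reads the temp member and d.conditions reads the conditions member.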

So get this example from the VoiceModel Project website and play around with it. There is a description of how you can test it on Prophecy, a free IVR provided by Voxeo. I am very interested in your comments on the approach taken in designing VoiceModel and on how it can be extended to make it even more robust and easier to use. You can also contribute to this project on CodePlex.
