Voice Application Call Flows: State Machine, Workflow, or Rules Engine

I have been doing some work with workflow engines lately and it got me thinking: what is the difference between a workflow and a state machine? Further reflection had me wondering whether a voice application call flow is best represented by a state machine or a workflow. Discussions I had while working with the VoiceXML Forum's Tools Committee raised a third option: using rules engines for more complex applications. So what is the best option for implementing a voice application? To determine this, let's examine what each method of modeling application logic actually is. But first, let's make sure we are on the same page as to what a voice application call flow is.

Voice/IVR application call flows are usually designed on paper prior to implementation. This design shows the possible paths a user can take through the voice application, depending upon their responses and the data returned by external and internal systems. A common way to depict a call flow is to diagram it as a flowchart in a tool like Visio. Below is a simple call flow for the VoiceWeather example in the VoiceModel Project.

This diagram shows the sequence of activities that can occur in the voice application: which prompts are played and when, when information is collected from the user, and when the application gets information from external systems. There are no decision points in this simple application, but they are usually depicted as diamonds with branches based on the evaluation of a condition, just like a typical flowchart. I have seen many representations of call flows by various vendors, but most of them revolve around this simple flowchart method. These call flows are presented to the business owner of the application for approval and have enough detail that a developer can implement them to exacting specifications. Not a very lean process, but that is a topic for another discussion.

A state machine is a common method of modeling and implementing the dynamics of a software system. The Unified Modeling Language (UML) provides a way to model state machines in State Charts. State Charts in UML are based on what are commonly referred to as Harel State Machines. David Harel introduced them in his 1987 paper "Statecharts: A Visual Formalism for Complex Systems", which added concepts to finite state machines (FSMs) such as composite or nested state machines, parallel state machines, and actions in states. Another group that has adopted Harel State Machines is the W3C Voice Browser Working Group, with the State Chart XML (SCXML) specification. Below is an example state chart using UML that shows how to bake cookies.

This chart represents the various states of baking cookies and the events that cause a transition to the next state. In this example there are also parallel state machines, as depicted by the bars. This shows that you can get the oven ready at the same time that you are preparing the batter and the trays. It also shows a composite/nested state machine for the state of Batter Preparation, although the details are not shown in this chart. With a composite state, the child state machine must either reach its final state or raise an event that only the parent can catch before the parent state can transition on. This provides a lot of power in managing the complexities of an FSM.
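To make the event-driven idea concrete, here is a minimal sketch of a flat state machine for baking cookies. It is a simplification: the state and event names are my own assumptions, and it leaves out the parallel and composite states from the chart.

```python
# Hypothetical transition table: (current state, event) -> next state.
# State and event names are illustrative, not taken from the chart.
TRANSITIONS = {
    ("preparing", "batter_ready"): "filling_trays",
    ("filling_trays", "trays_ready"): "baking",
    ("baking", "cookies_golden"): "done",
}


def fire(state, event):
    """Return the next state, or stay put if the event is not handled here."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="preparing"):
    """Feed a sequence of events to the machine and return the final state."""
    for event in events:
        state = fire(state, event)
    return state
```

Note that nothing happens until an event arrives, and an event that the current state does not handle is simply ignored.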

So what is a workflow?  A workflow represents a sequence of activities.  The transition between each activity, or step, occurs when a previous activity is completed.  Workflows can have decisions on transitions that can cause branching to other activities.  Workflows are commonly used to depict business processes and can be implemented using workflow engines like Microsoft's Windows Workflow Foundation.   Workflows can be diagrammed in UML using Activity Diagrams.  Here is an activity diagram for our baking cookies example.

You will notice that with activity diagrams you can also have parallel activities.  The main difference between our baking cookie state machine and activity diagram (i.e. workflow) is that the focus is on actions instead of states and the transitions occur when an action is completed, instead of when an event occurs.  And therein lies the main difference between state machines and workflows.
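For contrast, a workflow can be sketched as a list of activities where the next one starts as soon as the previous one completes. This is only an illustration: the activity names are assumptions, and the steps run strictly in sequence rather than in parallel.

```python
# Each activity is a plain function; returning from it means "completed".
def preheat_oven():
    return "oven preheated"


def prepare_batter():
    return "batter prepared"


def prepare_trays():
    return "trays prepared"


def bake():
    return "cookies baked"


# The workflow is just the ordered list of activities (hypothetical names).
COOKIE_WORKFLOW = [preheat_oven, prepare_batter, prepare_trays, bake]


def run_workflow(activities):
    """Run each activity in order; completion of one triggers the next."""
    results = []
    for activity in activities:
        results.append(activity())
    return results
```

There are no events here at all: the only thing that moves the process forward is an activity finishing.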

So is a voice application call flow more like a workflow or a state machine? I would have to say the typical way they are modeled, as shown in the previous flowchart example, is more like a workflow. The system just transitions to the next prompt when the current prompt has finished playing or when the user has entered or spoken their input. It is usually modeled to perform an action/step and transition to the next action/step when the current one is completed.

So should we use a workflow engine or a finite state machine to implement a voice application? The Voice Browser Working Group seems to have decided on state machines with the introduction of SCXML. Surprisingly, I think part of the answer is in the UML specification. And I quote:
"By defining a small set of additional subtypes to the basic state machine concepts, the well-formedness of activity graphs can be defined formally, and subsequently mapped to the dynamic semantics of state machines."
That is a fancy way of saying you can implement a workflow with a state machine. Basically, the events in a "workflow" state machine are the completion of an activity, and the states are performing the activities. Remember that a Harel State Machine introduced the concept of actions while in states. This is represented in SCXML with the onentry tag, which holds the actions performed when entering a state, and the onexit tag, which holds the actions performed when leaving a state.
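Here is a rough sketch of that mapping, assuming a hypothetical three-step call flow. Each state runs its activity as an entry action (in the spirit of onentry), and the completion of that activity is the only event, which moves the machine to the next state. The state names and the stand-in caller input are invented for illustration and are not VoiceModel or SCXML APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    on_entry: Callable[[dict], None]  # activity performed on entering the state
    on_done: Optional[str]            # target of the completion event; None ends


def play_greeting(ctx):
    ctx["prompts"].append("greeting")


def collect_input(ctx):
    ctx["input"] = "12345"  # stand-in for what the caller enters or speaks


def play_result(ctx):
    ctx["prompts"].append("result for " + ctx["input"])


# Hypothetical call flow expressed as states with entry actions.
CALL_FLOW: Dict[str, State] = {
    "greeting": State(play_greeting, "get_input"),
    "get_input": State(collect_input, "play_result"),
    "play_result": State(play_result, None),
}


def run_call_flow(states: Dict[str, State], start: str, ctx: dict) -> List[str]:
    """Enter each state, run its activity, then follow the completion event."""
    visited = []
    current: Optional[str] = start
    while current is not None:
        state = states[current]
        state.on_entry(ctx)
        visited.append(current)
        current = state.on_done
    return visited
```

Calling run_call_flow(CALL_FLOW, "greeting", {"prompts": []}) walks the three states in order, which is exactly the behavior of the equivalent workflow.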

But before you make up your mind, let's look at a third option for implementing voice applications: a rules engine. A rules engine evaluates a set of conditions to determine which actions to perform next. These conditions can be written in a domain-specific language or a common programming language such as ECMAScript. Some rules engines have more advanced concepts, like forward chaining, which provide even more power and versatility in implementing application logic. Here is how we might write the rules for our cookie baking example.
  • If Oven is Off Then Turn Oven On
  • If Batter Not Prepared Then Prepare Batter
  • If Batter Prepared And Cookie Trays Not Prepared Then Prepare Trays
  • If Trays Prepared And Oven Temp is 350 Then Put Trays in Oven
  • If Cookie Color is Golden Brown Then Take Cookies Out Of Oven
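These rules can be sketched as a simple match-and-fire loop: on each pass, the first rule whose condition holds is fired, and the loop stops when no rule applies. This is a toy, not a real rules engine. The fact names are assumptions, the oven heats and the cookies brown instantly, and I added "not in oven" and "not done" guards so that rules do not fire repeatedly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]
    action: Callable[[Facts], None]


# Hypothetical starting facts for the cookie example.
INITIAL_FACTS: Facts = {
    "oven_on": False, "oven_temp": 70, "batter_prepared": False,
    "trays_prepared": False, "in_oven": False, "color": "pale", "done": False,
}

RULES: List[Rule] = [
    Rule("Turn Oven On",
         lambda f: not f["oven_on"],
         lambda f: f.update(oven_on=True, oven_temp=350)),  # heats instantly
    Rule("Prepare Batter",
         lambda f: not f["batter_prepared"],
         lambda f: f.update(batter_prepared=True)),
    Rule("Prepare Trays",
         lambda f: f["batter_prepared"] and not f["trays_prepared"],
         lambda f: f.update(trays_prepared=True)),
    Rule("Put Trays in Oven",
         lambda f: (f["trays_prepared"] and f["oven_temp"] == 350
                    and not f["in_oven"] and not f["done"]),  # added guards
         lambda f: f.update(in_oven=True, color="golden brown")),  # bakes instantly
    Rule("Take Cookies Out Of Oven",
         lambda f: f["in_oven"] and f["color"] == "golden brown",
         lambda f: f.update(in_oven=False, done=True)),
]


def run_rules(rules: List[Rule], facts: Facts, max_passes: int = 20) -> List[str]:
    """Fire the first matching rule on each pass until none match."""
    fired = []
    for _ in range(max_passes):  # safety limit against rules that never settle
        rule = next((r for r in rules if r.condition(facts)), None)
        if rule is None:
            break
        rule.action(facts)
        fired.append(rule.name)
    return fired
```

Notice that the order of steps is never written down anywhere. It emerges from the conditions, which is both the appeal of this approach and the reason it can be hard to verify as the rule set grows.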

We could use a rules engine to define rules for which prompts to play, when to ask users for input, and when to get data from external systems. This approach is seen by some in the industry as the best fit for extremely complex voice applications, where an FSM would quickly get unwieldy and you would end up with a spaghetti of transitions going to a multitude of states. So should we use a workflow, a state machine, or a rules engine to implement voice applications? I think it depends. Here are some guidelines for when to use each.

  • Workflow - A well defined flow through the process. Not event driven.
  • State Machine - Not a well defined flow through the process. Event driven.
  • Rules - Business logic changes often, or conditions need to change dynamically.

It is interesting to note that the Windows Workflow Foundation supports all three methods, and I hope to explore using it in the VoiceModel Project. The VoiceModel Project is an open source project to ease development of VoiceXML applications using ASP.NET MVC. VoiceModel decouples the views (VoiceXML presented to the IVR browser) from the controller (call flow), so we could plug in any of these methods to drive the application logic. Currently VoiceModel provides a state machine to drive the application logic, and we have been making some pretty big changes so that it can support composite state machines and be easier to persist to something like SCXML. The composite state machines will make it much easier to develop and use Reusable Dialog Components. Look for future posts that detail some of these changes. We initially went with state machines because we can implement workflows/activities in a state machine and because of the Voice Browser Working Group's work with SCXML.

I think there is a difference between how you want to model the system at design time and how you actually implement it. I would not try to use a state chart to explain to the business owners of the application how their system will work, but it works great for implementing the application. Activity diagrams and flowcharts are easier for them to understand, and they can be translated to FSMs. I have taken the approach of modeling the system with rules with some success, but as the system grows it can get harder to verify that all of the rules are in place and that they fire all of the actions required.

So what methods do you use to model and implement the dynamics of voice applications? I would be interested in hearing about other approaches and experiences.

