Published on Nov 10, 2015
A Voice Browser is a "device which interprets a (voice) markup language and is capable of generating voice output and/or interpreting voice input, and possibly other input/output modalities." The definition of a voice browser, above, is a broad one.
The fact that the system deals with speech is obvious given the first word of the name, but what makes a software system that interacts with the user via speech a "browser"?
The information that the system uses (for either domain data or dialog flow) is dynamic and comes from somewhere on the Internet. From an end-user's perspective, the impetus is to provide a service similar to what graphical browsers of HTML and related technologies do today, but on devices that are not equipped with full browsers or even the screens to support them. This situation is only exacerbated by the fact that much of today's content depends on the ability to run scripting languages and third-party plug-ins to work correctly.
Much of the effort concentrates on using the telephone as the first voice browsing device. This is not to say that it is the preferred embodiment for a voice browser, only that the number of access devices is huge and that the telephone sits at the opposite end of the graphical-browser continuum, which highlights the requirements that make a speech interface viable. By the first meeting it was clear that this scope-limiting was also needed in order to make progress, given that there are significant challenges in designing a system that uses or integrates with existing content, or that automatically scales to the features of various access devices.
It defines a speech recognition grammar specification language that will be generally useful across a variety of speech platforms used in the context of a dialog and synthesis markup environment.
When the system or application needs to describe to the speech-recognizer what to listen for, one way it can do so is via a format that is both human and machine-readable.
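As a rough illustration of what a format that is both human and machine-readable might look like, the sketch below writes a tiny grammar as a Python dictionary and checks whether an utterance is covered by it. The rule names (`$order`, `$size`, `$drink`), the `$` convention for rule references, and the helper functions `expand` and `accepts` are all hypothetical and invented for this example; they are not taken from any W3C grammar format.

```python
from itertools import product

# Hypothetical grammar: each rule maps to a list of alternatives, and each
# alternative is a sequence of tokens. Tokens starting with "$" refer to
# other rules; everything else is a literal word the recognizer listens for.
GRAMMAR = {
    "$order": [["i", "want", "$size", "$drink"], ["$size", "$drink", "please"]],
    "$size": [["small"], ["large"]],
    "$drink": [["coffee"], ["tea"]],
}


def expand(symbol, grammar):
    """Return every word sequence a symbol can produce.

    Assumes the grammar is non-recursive, so the expansion is finite.
    """
    if symbol not in grammar:
        return [[symbol]]  # a literal word expands to itself
    phrases = []
    for alternative in grammar[symbol]:
        # Expand each token, then combine one choice per token position.
        parts = [expand(token, grammar) for token in alternative]
        for combo in product(*parts):
            phrases.append([word for part in combo for word in part])
    return phrases


def accepts(utterance, grammar, root="$order"):
    """Check whether the utterance is one of the phrases the grammar covers."""
    words = utterance.lower().split()
    return words in expand(root, grammar)


print(accepts("I want large tea", GRAMMAR))
print(accepts("I want huge tea", GRAMMAR))
```

A real recognizer would use the grammar to constrain its search rather than enumerate every phrase, but the enumeration keeps the sketch short and shows that the same text a person can read is also something a program can interpret.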
"To assist in clarifying the scope of charters of each of the several subgroups of the W3C Voice Browser Working Group, a representative or model architecture for a typical voice browser application has been developed. This architecture illustrates one possible arrangement of the main components of a typical system, and should not be construed as a recommendation."
It establishes a prioritized list of requirements for natural language processing in a voice browser environment. The data that a voice browser uses to create a dialog can vary from a rigid set of instructions and state transitions, whether declaratively and/or procedurally stated, to a dialog that is created dynamically from information and constraints about the dialog itself. The NLP requirements document describes the requirements of a system that takes the latter approach, using an example paradigm of a set of tasks operating on a frame-based model. Slots in the frame that are optionally filled guide the dialog and provide contextual information used for task-selection.
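The frame-based paradigm described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the task names, slot names, and the functions `select_task` and `next_prompt` are hypothetical, and the overlap-counting heuristic for task selection is one simple choice, not something the requirements document prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A hypothetical task defined by the slots it needs filled."""
    name: str
    slots: tuple


# Illustrative tasks; the names and slots are assumptions for this sketch.
TASKS = [
    Task("book_flight", ("origin", "destination", "date")),
    Task("check_weather", ("city",)),
]


def select_task(frame, tasks):
    """Pick the task whose slots overlap most with what is already filled.

    On a tie, the task listed first wins.
    """
    return max(tasks, key=lambda task: sum(1 for s in task.slots if s in frame))


def next_prompt(frame, task):
    """Return a prompt for the first unfilled slot, or None when complete."""
    for slot in task.slots:
        if slot not in frame:
            return f"Please tell me the {slot}."
    return None


# The frame is a plain dict of slot name to value; slots fill in any order.
frame = {"destination": "Paris"}
task = select_task(frame, TASKS)
print(task.name)
print(next_prompt(frame, task))
```

Here the dialog is not scripted as a fixed sequence of states: the filled slots provide the context for choosing a task, and the missing slots determine what the system asks next.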
It establishes a prioritized list of requirements for speech synthesis markup which any proposed markup language should address. A text-to-speech system, which is usually a stand-alone module that does not actually "understand the meaning" of what is spoken, must rely on hints to produce an utterance that is natural and easy to understand, and moreover, evokes the desired meaning in the listener. In addition to these prosodic elements, the document also describes issues such as multi-lingual capability, pronunciation issues for words not in the lexicon, time-synchronization, and textual items that require special preprocessing before they can be spoken properly.
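To make the idea of textual items that require special preprocessing more concrete, the following sketch expands a few abbreviations and reads digit strings one digit at a time before the text would be handed to a synthesizer. The abbreviation table, the digit-by-digit reading, and the function names `spell_digits` and `normalize` are assumptions chosen for illustration; a real system would need context to decide, for example, whether "St." means "Street" or "Saint".

```python
import re

# Hypothetical lookup tables for this sketch only.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "approx.": "approximately"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Read a run of digits one digit at a time, e.g. '42' as 'four two'."""
    return " ".join(DIGITS[int(d)] for d in match.group())


def normalize(text):
    """Expand known abbreviations, then spell out any digit runs."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    return re.sub(r"\d+", spell_digits, text)


print(normalize("Dr. Smith lives at 42 Elm St."))
```

Because the synthesizer does not understand the text, choices like these (digit by digit versus "forty-two", "Street" versus "Saint") are exactly the kind of hint a markup language would let the author state explicitly.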