
Overview

Purpose:
Host agents using our Indian phone number.
Use cases:
  • Outbound calls: sales, reminders, scheduling, confirmations, surveys
  • Inbound calls: service, support
  • Bi-directional calls: combined inbound and outbound flows

Setup

Before using this feature, make sure you:
  1. Have your agent ready and running on a publicly accessible WebSocket endpoint.
  2. Have an Awaaz AI account with:
    • A template set up for calls
    • An Indian phone number assigned
    • The WebSocket URL configured
Create a free Awaaz AI account at https://app.awaaz.de. Once you sign up, our team will get in touch to collect your details and configure your account.

How to Use

Once setup is complete, using Awaaz AI Telephony involves these steps:

1. Upload Messages

  • Log in to the Awaaz AI Portal UI
  • Upload the messages you want to deliver via the XFin API
  • The Awaaz AI scheduler queues the messages for the outbound calls

2. Outgoing Calls

  • Calls are triggered to end users via your Indian phone number.
  • During each call, Awaaz AI opens a WebSocket connection to your agent.

3. Real-Time Audio Exchange

  • Once connected:
    • User → Awaaz AI → Agent (user’s audio is sent as encoded JSON messages).
    • Agent → Awaaz AI → User (agent’s audio is returned the same way).

Message Formats

Awaaz AI uses pre-defined JSON formats to exchange audio and events.

Messages from Awaaz AI

  1. Start Message:
  • Sent immediately after the WebSocket connection is established.
  • Contains audio formatting information and any data the agent might need to execute the call.
{
  "event": "start",
  "sequence_number": 1,
  "stream_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "start": {
    "stream_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "call_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "media_format": {
      "encoding": "base64",
      "sample_rate": 8000,
      "bit_rate": ""
    },
    "custom_parameters": {
      "FirstName": "Jane",
      "LastName": "Doe",
      "RemoteParty": "Bob"
    }
  }
}
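A receiving agent will typically pull the media format and custom parameters out of this start message before any audio arrives. The helper below is a hypothetical sketch (there is no official parser); the field names match the example above.

```python
import json


def parse_start(raw: str) -> dict:
    """Extract the fields an agent usually needs from a start message.

    Hypothetical helper, not part of an official SDK.
    """
    msg = json.loads(raw)
    if msg.get("event") != "start":
        raise ValueError("expected a start message")
    start = msg["start"]
    return {
        "stream_sid": start["stream_sid"],
        "call_sid": start["call_sid"],
        "sample_rate": start["media_format"]["sample_rate"],
        # Custom parameters carry per-call data such as the caller's name.
        "params": start.get("custom_parameters", {}),
    }
```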
  2. Media Message:
  • Awaaz AI continuously sends call audio in the JSON-encoded message format below.
  • Each payload contains 320 bytes of base64-encoded, 16-bit PCM, 8 kHz audio.
{
  "event": "media",
  "sequence_number": 2,
  "stream_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "media": {
    "chunk": 1,
    "timestamp": "5",
    "payload": "<>"
  } 
}
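Note that 320 bytes of 16-bit PCM at 8 kHz works out to 160 samples, i.e. 20 ms of audio per chunk. A minimal decoder, assuming the format above (`decode_media` is our name, not an SDK function):

```python
import base64
import json

SAMPLE_RATE = 8000       # per the start message's media_format
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_BYTES = 320        # per the spec above


def decode_media(raw: str) -> bytes:
    """Return the raw PCM bytes carried by one media message."""
    msg = json.loads(raw)
    return base64.b64decode(msg["media"]["payload"])


# 320 bytes -> 160 samples -> 20 ms of audio per chunk
chunk_ms = CHUNK_BYTES / BYTES_PER_SAMPLE / SAMPLE_RATE * 1000
```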
  3. DTMF Message:
  • Sent when the user presses a digit during the call.
{
 "event": "dtmf",
 "stream_sid":"MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
 "sequence_number":3,
 "dtmf": {
     "digit": "<>",
     "duration":"<duration in ms>"
 }
}
  4. Mark Message:
  • Sent to notify you when media processing is complete.
{
  "event": "mark",
  "sequence_number": 4,
  "stream_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "mark": {
    "name": "my label"
  }
}
  5. Stop Message:
  • Sent when the call has ended.
{
  "event": "stop",
  "sequence_number": 5,
  "stream_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "stop": {
    "call_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
  }
}
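Putting the five inbound event types together, an agent can track call state with a small dispatcher. The `CallTracker` class below is an illustrative sketch, not an official client; a real agent would feed media chunks into its speech pipeline rather than just counting them.

```python
import json


class CallTracker:
    """Minimal state machine over the five Awaaz AI inbound event types.

    Illustrative only -- replace the bodies with real agent logic.
    """

    def __init__(self):
        self.call_sid = None
        self.active = False
        self.chunks = 0      # media messages received
        self.digits = []     # DTMF digits pressed by the user
        self.marks = []      # mark names confirmed by Awaaz AI

    def on_message(self, raw: str) -> None:
        msg = json.loads(raw)
        event = msg.get("event")
        if event == "start":
            self.call_sid = msg["start"]["call_sid"]
            self.active = True
        elif event == "media":
            self.chunks += 1  # a real agent would decode and buffer the audio
        elif event == "dtmf":
            self.digits.append(msg["dtmf"]["digit"])
        elif event == "mark":
            self.marks.append(msg["mark"]["name"])
        elif event == "stop":
            self.active = False
```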

Messages to Awaaz AI

All messages sent to Awaaz AI must follow the formats below.
  1. Media Message:
  • Send this message over your WebSocket with the audio to be played in the call.
{
  "event": "media",
  "sequence_number": 1,
  "stream_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "media": {
    "chunk": 1,
    "timestamp": "5",
    "payload": "<>"
  } 
}
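To stream audio back, your agent base64-encodes its PCM and wraps it in media messages. The generator below splits a PCM buffer into 320-byte chunks to mirror the inbound format; whether outbound chunks must be exactly 320 bytes, and the timestamp units, are our assumptions rather than stated requirements.

```python
import base64
import json

CHUNK_BYTES = 320  # mirrors the 320-byte inbound chunks (assumed for outbound)


def media_messages(pcm: bytes, stream_sid: str, start_seq: int = 1):
    """Yield JSON media messages for a raw 16-bit PCM buffer.

    Hypothetical helper; chunk size and timestamp units are assumptions.
    """
    for n, offset in enumerate(range(0, len(pcm), CHUNK_BYTES)):
        chunk = pcm[offset:offset + CHUNK_BYTES]
        yield json.dumps({
            "event": "media",
            "sequence_number": start_seq + n,
            "stream_sid": stream_sid,
            "media": {
                "chunk": n + 1,
                "timestamp": str(n * 20),  # 20 ms per 320-byte chunk at 8 kHz
                "payload": base64.b64encode(chunk).decode("ascii"),
            },
        })
```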
  2. Mark Message:
  • Send this message from your WebSocket to mark a point in the media stream. Once the audio up to that point has been processed, you will receive a mark event message from Awaaz AI with the matching name.
{
  "event": "mark",
  "sequence_number": 2,
  "stream_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "mark": {
    "name": "my label"
  }
}
  3. Clear Message:
  • Send this message to clear audio that has been sent but not yet played.
{
  "event": "clear",
  "stream_sid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
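A common pattern combining these two outbound messages is barge-in: when the caller starts speaking over the agent, send a clear message to drop queued audio, and use marks to learn how far playback actually got. The small builders below are a sketch; the function names and the example mark label are ours.

```python
import json


def clear_message(stream_sid: str) -> str:
    """Build a clear message to discard audio that has not yet played."""
    return json.dumps({"event": "clear", "stream_sid": stream_sid})


def mark_message(stream_sid: str, name: str, seq: int) -> str:
    """Build a mark message; Awaaz AI echoes the name back once playback
    reaches this point, so `name` should identify the audio segment."""
    return json.dumps({
        "event": "mark",
        "sequence_number": seq,
        "stream_sid": stream_sid,
        "mark": {"name": name},
    })
```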

Example Client Code for Your Application

We provide a sample client implementation in our GitHub repository. This boilerplate demonstrates how to interact with the application by:
  1. Receiving audio chunks in JSON messages.
  2. Accumulating a certain number of chunks.
  3. Sending audio from an audio file in chunks.
Note: This example is intended as a reference. You should refactor it to fit the specific requirements of your agent or application.
You can explore the full sample and get started quickly by cloning the repo:
View Example Client Code