ZRTP: The Steel Wall of VoIP Encryption
09 January 2017

Author: Melih KÜÇÜKERDÖNMEZ, SW Consultant – Embedded Systems


In the world of verbal communication, ultimate security is out of reach (unless telepathy is involved). Even if you and the person you are talking to sit in a room built with 10-inch steel walls, can you be confident that no one is listening to your conversation? With today’s technology, surely not. But a secure chat behind closed doors is not our concern here. Let’s look instead at the steel walls of Voice-over-IP (VoIP), one of the most common forms of communication, especially in business.

When VoIP started blooming in the early 2000s, everyone began to realize one major problem: merging voice and data networks certainly cuts costs, but it makes voice communication susceptible to attacks from the Internet. Voice media travels over RTP, so the solution was to encrypt the RTP session, resulting in Secure RTP (SRTP). SRTP still needs keys, and MIKEY, SDES and DTLS-SRTP (we will talk about the last one later) are a few of the key exchange methods used to provide them. MIKEY and SDES need the help of another channel for key management.

A key exchange is always open to interference by malicious parties, specifically the so-called Man-in-the-Middle (MitM). These attackers sniff the network, look for key exchanges, intercept them, and use them to impersonate the party that someone wants to talk to. If the entire security of a connection depends on third parties, as in the case of Public Key Infrastructures (PKI) [1], the risks can increase.

So we need to free SRTP from relying on other protocols or channels for key exchange. Here comes Zimmermann RTP (ZRTP), which shares its creator, Phil Zimmermann, with Pretty Good Privacy (PGP) [2]. It is an open specification, published as RFC 6189 [3], and created for real P2P encryption of VoIP. This means that not even the servers required to route the call can decrypt the media and listen to what is being said.


How does it work?

ZRTP is a key exchange method intended for VoIP communication. It runs on the same channel that the RTP session uses, so it does not rely on another protocol. One of the good parts is that you do not have to secure the session establishment channel for the key exchange to be safe (though it escapes me who would leave it unsecured when ZRTP is preferred). Having got rid of that dependency, ZRTP also removes the need for a PKI, because it does not rely on certificates or persistent public keys. The logic of ZRTP is that the two entities just have to prove to each other that they are who they say they are. No authorities required.

Now here comes the best part: how does ZRTP deal with MitM attacks? By verbal confirmation between the two parties. It introduces the concept of a Short Authentication String (SAS), typically displayed as four characters or two words. This short string, generated during the session negotiation, is presented on each party’s device or application so that they can read it to each other. If your string does not match the other one, it would be wise not to speak any company secrets, because it means someone has intervened. I like to compare this with two-factor authentication on mobile devices. ZRTP uses Diffie-Hellman, the key exchange algorithm that generates the keys, which could be thought of as something you know. And the SAS, which only shows up on the device you are using, could be the something you have.
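To make the SAS idea concrete, here is a minimal sketch in Python. It is not the key derivation from RFC 6189: the tiny Diffie-Hellman group, the plain SHA-256 hash, and the helper names (dh_keypair, shared_secret, render_sas, sas_pair) are all assumptions made for illustration. It only shows why both phones display the same string on a direct call, and why they will almost certainly display different strings when an attacker swaps in their own public value.

```python
import hashlib
import secrets

# Toy Diffie-Hellman group for illustration only (an assumption, not ZRTP's):
# real deployments use far larger standardized groups.
P = 2**127 - 1
G = 3

# z-base-32 alphabet, used here to show 20 bits as four characters.
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def dh_keypair():
    """Return a random (private, public) Diffie-Hellman pair."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(private, peer_public):
    """Combine our private value with the public value we received."""
    return pow(peer_public, private, P)


def render_sas(secret):
    """Hash the shared secret and show its leftmost 20 bits as 4 characters."""
    digest = hashlib.sha256(secret.to_bytes(16, "big")).digest()
    bits = int.from_bytes(digest[:4], "big") >> 12
    return "".join(ZBASE32[(bits >> shift) & 0x1F] for shift in (15, 10, 5, 0))


def sas_pair(alice_receives, bob_receives, alice_private, bob_private):
    """SAS shown on each phone, given the public value each side received."""
    return (render_sas(shared_secret(alice_private, alice_receives)),
            render_sas(shared_secret(bob_private, bob_receives)))


if __name__ == "__main__":
    a_priv, a_pub = dh_keypair()
    b_priv, b_pub = dh_keypair()
    m_priv, m_pub = dh_keypair()  # the attacker's own pair

    # Direct call: each side receives the other's genuine public value.
    print("direct call:     ", sas_pair(b_pub, a_pub, a_priv, b_priv))
    # Intercepted call: both sides receive the attacker's public value instead.
    print("intercepted call:", sas_pair(m_pub, m_pub, a_priv, b_priv))
```

On the direct call the two strings are always identical. On the intercepted call each side has unknowingly agreed on a different secret with the attacker, so the strings read out loud would not match except by a rare coincidence.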

Even if both parties are too eager to start chatting about the next big thing and do not bother to read the SAS out loud, ZRTP specifies that some of the key material used to secure the channel is kept for future use. Part of it is cached and used to derive the new keys for the next session. This is called Key Continuity. Mind you, ZRTP also has Perfect Forward Secrecy, a fancy term that just says the session keys themselves are destroyed at the end of each session.
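Key continuity can be pictured with a few lines of Python. This is a simplified sketch, not the derivation from RFC 6189: the function names (session_secret, next_retained, simulate), the label b"retained secret" as used here, and the way the values are concatenated are assumptions for illustration. The point is only that a secret cached from an earlier call is mixed into the next call's keys, so an attacker who missed the first call cannot reproduce them.

```python
import hashlib
import hmac
import secrets


def session_secret(dh_result, retained):
    """Mix this call's DH result with the secret cached from the last call."""
    return hashlib.sha256(dh_result + retained).digest()


def next_retained(secret):
    """Derive the value to cache for the next call from this call's secret."""
    return hmac.new(secret, b"retained secret", hashlib.sha256).digest()


def simulate():
    # Call 1: no attacker. Both ends share the same DH result and start
    # with an empty cache, so both end up caching the same value.
    dh_1 = secrets.token_bytes(32)
    cached = next_retained(session_secret(dh_1, b""))

    # Call 2: an attacker intercepts the exchange and therefore knows the
    # DH result on Alice's leg, but was absent from call 1 and has nothing
    # cached.
    dh_2_alice_leg = secrets.token_bytes(32)
    alice = session_secret(dh_2_alice_leg, cached)
    attacker = session_secret(dh_2_alice_leg, b"")
    return alice, attacker


if __name__ == "__main__":
    alice_key, attacker_key = simulate()
    print("attacker derived Alice's key:", alice_key == attacker_key)
```

Because the attacker's result differs from Alice's, the interception does not go through silently, even though nobody read the SAS on the second call.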


The possibility of a MitM attack on ZRTP is not zero, but it is very low. The attacker has to intercept the very first session between the two parties and, because of key continuity, stay in the middle of every session after that. Even then, the SAS has to come out matching on both sides, which for a 16-bit SAS has a probability of 1 in 65536. Surely there are other P2P encryption methods, but none of them is an open standard, so they only work between their own endpoints. DTLS-SRTP has similarities with ZRTP, but it does not deal with MitM attacks on its own; it relies on the signaling path to protect its key fingerprints. ZRTP is the only method that tries to solve every security gap in one place. And having that freely available is great for securing voice communications.
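The 1 in 65536 figure is simply the chance of hitting one particular 16-bit value with a single guess. The short Python sketch below spells that out; the function name undetected_chance is hypothetical, and it assumes the SAS values are evenly distributed and that the attacker gets exactly one attempt per call.

```python
from fractions import Fraction


def undetected_chance(sas_bits):
    """Chance that a single blind attempt produces a matching SAS."""
    return Fraction(1, 2 ** sas_bits)


print(undetected_chance(16))  # 1/65536, a 16-bit SAS shown as two words
print(undetected_chance(20))  # 1/1048576, a 20-bit SAS shown as four characters
```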

[1] PKI: https://en.wikipedia.org/wiki/Public_key_infrastructure
[2] PGP: https://en.wikipedia.org/wiki/Pretty_Good_Privacy
[3] RFC 6189: https://tools.ietf.org/html/rfc6189
