Towards a Secure Telephone Network

by Dave D'Rave

Analog telephone systems were invented in the 1870s.

First-generation digital telephone systems (T-1) were developed in the late 1950s.  Neither was specified with any thought to the requirements of security or privacy.  We are living with the results of those decisions today.

It should be possible to build telephones which are compatible with the current infrastructure, which can do the following:

  • Originate and receive calls with high-quality end-to-end encryption.
  • Originate calls using secure signals, such that third parties cannot read the metadata.
  • Receive calls using authenticated signals, such that Caller ID cannot be spoofed.
  • Originate (non-secure) calls to legacy telephones, with or without Caller ID.
  • Receive calls from legacy telephones.

Encryption of the Voice Channel

There are a variety of methods which allow secure voice communication, if you have good key management.  Modern microprocessors are very powerful, and good quality crypto can be implemented without excessive battery use.  You can read about the details of modern crypto and "man-in-the-middle" attacks on your favorite website.

Let's just say that good crypto exists, but that many exploits exist.  One of the best exploits is the plain old-fashioned "bug the place in which you are using the phone" method, which defeats all of the other crypto you may be using.

The main idea behind these proposals is that telephone calls should normally operate in an end-to-end encrypted mode, that the two directions should use different encryption keysets, and that the keysets should change for every phone call.  Each phone call should generate unique session keys using some kind of hardware random number generator.
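As a rough illustration of the per-call keyset idea, here is a minimal Python sketch.  The names (CallKeys, new_call_keys) and the 256-bit key size are assumptions made for the example, not part of the proposal.  It relies on the operating system's random source, which is typically fed by hardware entropy where the platform provides it.

```python
import secrets
from dataclasses import dataclass

KEY_BYTES = 32  # assumed 256-bit keys; the article does not specify a size


@dataclass(frozen=True)
class CallKeys:
    """Session keys for a single call, one independent key per direction."""
    caller_to_callee: bytes
    callee_to_caller: bytes


def new_call_keys() -> CallKeys:
    # secrets draws from the operating system's cryptographic random source.
    # Each direction gets its own key, and nothing is reused between calls.
    return CallKeys(
        caller_to_callee=secrets.token_bytes(KEY_BYTES),
        callee_to_caller=secrets.token_bytes(KEY_BYTES),
    )
```

Calling new_call_keys() once per call and discarding the result at hangup would give the "new keyset for every call" behavior described above.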

The proposed crypto method for voice is to use 14-bit linear encoding sampled at 40 kHz, compressed using some kind of lossy algorithm (wavelets are a good choice), packetized, and then encrypted using one of the Rijndael family of algorithms.  This will probably require 100 kbps of physical bandwidth for high-grade voice quality.
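To see how much work the lossy compression step has to do, the figures above can be checked with a little arithmetic.  This is only a back-of-the-envelope sketch; the function names are made up for the example, and it ignores packet headers and encryption overhead, which would also have to fit inside the 100 kbps budget.

```python
SAMPLE_BITS = 14          # bits per sample, from the proposal
SAMPLE_RATE_HZ = 40_000   # samples per second, from the proposal
TARGET_KBPS = 100         # physical bandwidth estimate, from the proposal


def raw_bitrate_kbps(bits: int, rate_hz: int) -> float:
    """Uncompressed bit rate of linear PCM audio, in kbps."""
    return bits * rate_hz / 1000


def required_compression(raw_kbps: float, target_kbps: float) -> float:
    """Minimum compression ratio needed to fit the raw stream in the target."""
    return raw_kbps / target_kbps


raw = raw_bitrate_kbps(SAMPLE_BITS, SAMPLE_RATE_HZ)
ratio = required_compression(raw, TARGET_KBPS)
print(f"raw: {raw:.0f} kbps, compression needed: at least {ratio:.1f}:1")
```

The raw stream is 560 kbps, so the codec needs to deliver at least about 5.6:1 compression before any protocol overhead is counted.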

Secure and Authenticated Metadata

To make a phone call to someone without delivering any metadata to the phone system, you will need a telephone session server.  Most IP phone systems can support this.

The proposed method is as follows:

  • You enter the number you wish to call.
  • Your local phone then connects to the server and transfers the various information needed over an encrypted command channel.
  • The server then calls the person you wish to talk to.  If they have a standard phone, then the Caller ID says whatever you want ("Santa Claus 800-NOR-POLE"), or it says nothing.  If they have a compatible phone, then the server will deliver an authenticated packet, which contains the Caller ID to be reported to the user.
  • Assuming that the person you want has a compatible secure phone and answers the call, the server will authenticate them, transfer the session keys for voice encryption, and you can start talking.

This procedure means that, as far as the phone company is concerned, you made a call to the server, and the server made a call to your friend, but there is no connection between the two of you.  Being able to make calls without giving metadata to the switching network is an important first step towards secure and private communications.
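The authenticated Caller ID packet in the third step could look something like the following Python sketch.  Note the assumptions: it uses an HMAC with a key shared between the server and the receiving phone as a stand-in, because the Python standard library has no public-key signatures; a real design would more likely use a signature made with the server's private key.  The packet layout, field names, and function names are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def make_caller_id_packet(key: bytes, caller_id: str, call_nonce: str) -> bytes:
    """Server side: bind the Caller ID to this call and authenticate it."""
    body = json.dumps(
        {"caller_id": caller_id, "nonce": call_nonce}, sort_keys=True
    ).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def verify_caller_id_packet(key: bytes, packet: bytes) -> Optional[str]:
    """Phone side: return the Caller ID only if the tag checks out."""
    tag, body = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted: do not display anything
    return json.loads(body)["caller_id"]
```

A phone that gets None back would treat the call as unauthenticated rather than show a Caller ID it cannot vouch for.  The per-call nonce is there so that an old packet cannot simply be replayed on a later call, provided the phone also checks it against the call in progress.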

Various methods could be used to further obscure the metadata.  One idea is to use callback: When you originate a call, the initial setup procedure results in a very short call, followed by the server calling you back using a different number (typically one with no Caller ID).

Another method is to have the server periodically change modes from "cell phone dataplan" mode to "digital voice mode," which would appear to the unwanted observer to be an incoming fax or something.

Degrading Traffic Analysis

An important category of hostile data collection is traffic analysis.

By observing how many packets go from the person originating the call versus how many packets go from the person receiving the call, some idea can be gained about the call contents.

The solution is to send random packets at random intervals, so as to balance out the apparent data flow.
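One way to picture this is a sender that pads every packet to the same size and slips in dummy packets at random intervals.  The Python sketch below is only an illustration: the 256-byte packet size, the 2-byte length prefix, and the function names are assumptions, and it presumes the padded packet is encrypted afterwards so that an observer cannot tell filler from voice.

```python
import secrets

PACKET_SIZE = 256  # assumed fixed packet size in bytes (hypothetical)
_rng = secrets.SystemRandom()  # random numbers from the OS entropy source


def pad_packet(payload: bytes) -> bytes:
    """Prefix a 2-byte length, then fill with random bytes up to PACKET_SIZE."""
    if len(payload) > PACKET_SIZE - 2:
        raise ValueError("payload too large for one packet")
    header = len(payload).to_bytes(2, "big")
    filler = secrets.token_bytes(PACKET_SIZE - 2 - len(payload))
    return header + payload + filler


def unpad_packet(packet: bytes) -> bytes:
    """Recover the real payload; a dummy packet yields empty bytes."""
    length = int.from_bytes(packet[:2], "big")
    return packet[2:2 + length]


def dummy_packet() -> bytes:
    """A cover packet: zero-length payload, all filler."""
    return pad_packet(b"")


def next_dummy_delay(mean_seconds: float = 0.05) -> float:
    """Random wait before the next dummy packet (exponential distribution)."""
    return _rng.expovariate(1.0 / mean_seconds)
```

The receiver drops any packet whose payload comes back empty.  Because every packet is the same size on the wire, the quiet side of the conversation looks much like the talking side.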

It also may be a good idea to send dummy traffic on the command channel to obscure, for example, which time zone a phone is operating in.

Key Management and Authenticated Data Transfer

The usual way to attack these kinds of systems is to engage in a "man-in-the-middle" attack.  The typical way to prevent such attacks is to use public-key cryptography, with authorized servers and their authorized public keys pre-registered.  As a practical matter, the server's public key needs to be installed at the factory.
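The "installed at the factory" check can be sketched as key pinning: the phone stores fingerprints of the authorized server keys and refuses to talk to anything else.  The Python below is a minimal sketch under assumptions: the function names are hypothetical, SHA-256 is just one reasonable choice of fingerprint, and the key bytes are treated as opaque.  Matching a fingerprint is only half the job; the server must still prove it holds the matching private key (for example, by signing a challenge), which is outside what the standard library offers.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint(public_key_bytes: bytes) -> bytes:
    """SHA-256 digest of the encoded public key."""
    return hashlib.sha256(public_key_bytes).digest()


def is_authorized_server(presented_key: bytes,
                         pinned_fingerprints: Iterable[bytes]) -> bool:
    """True only if the presented key matches a factory-installed fingerprint."""
    fp = fingerprint(presented_key)
    # compare_digest avoids leaking how many leading bytes matched.
    return any(hmac.compare_digest(fp, pinned) for pinned in pinned_fingerprints)
```

A man-in-the-middle who presents his own key fails this check, because his key's fingerprint is not in the pinned set.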

There are several levels of secure communications in the system.

First, there is an authenticated/encrypted channel between the originating phone and the server.

Second, there is an authenticated/encrypted channel between the server and the receiving phone.

Third, there is end-to-end bidirectional encryption between the two phones.

Compatibility with 802.11 and IP Phone Systems

This system of telephony can be used with conventional voice channels, and it can also be used with packet voice or data channels.

The physical medium could be anything with sufficient bandwidth/latency, which obviously includes the Internet.

One interesting feature is the ability of this system to run the control packets over a different channel from the actual voice, which enhances metadata security.

For example, you could run the voice packets over a cell phone data network, but send the setup, authentication, and management packets over a non-related 802.11 connection.

Compatibility with Non-Secure Phones

While phones which do not support end-to-end encryption will not be as secure, there are still certain advantages to using this system.

Metadata will be partly obscured, and incoming calls can be scrubbed against spam much more efficiently if a phone server is handling the routing.

In addition, a smart phone could be programmed to give a red flash or a distinctive ring when a non-secure call is incoming, which would give the operator the option of declining the call based on its security level.
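On the phone itself this could be as simple as a lookup from the call's security level to an alert style.  The Python sketch below uses made-up names (CallSecurity, alert_for) and made-up ring and flash values purely for illustration.

```python
from enum import Enum


class CallSecurity(Enum):
    END_TO_END = "end-to-end encrypted"
    LEGACY = "non-secure"


def alert_for(security: CallSecurity) -> dict:
    """Pick the ring and flash used to announce an incoming call."""
    if security is CallSecurity.END_TO_END:
        return {"ring": "standard", "flash": None}
    # Anything that is not end-to-end encrypted gets the warning treatment.
    return {"ring": "distinctive", "flash": "red"}
```

The operator then sees or hears the difference before deciding whether to pick up.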

More Advanced Features

Secure conference calls require a server that supports individual encrypted links.

They also require that the conference device itself has access to the unencrypted voice data.  Conference calls inherently operate at a reduced level of trust.

A somewhat better secure conference call server would consist of multiple inbound (receive-only) voice channels, along with a standard conference call device which is in a secure location.  The effective security of such a system would depend critically on the physical security of the server, and on the use of VPN technology to disguise the physical location from IP scanning.

Tor is probably not a good idea.

For one thing, it seems that more than half of the Tor gateways are controlled by national intelligence agencies.  For another thing, Tor has poor latency characteristics.

Technical Details of Layer 1

The usual operating mode for this type of device is some kind of packet-oriented low-latency network, such as the CDMA technology used by cell phones.

Generally speaking, any fast Ethernet-type network will work.

When connecting over an analog network, such as POTS or "Analog Cell Phone," the voice traffic must be converted into digital signals using an analog modem.  (This is 1980s technology.)  While the connection will work, voice quality may be degraded substantially.

When connecting to a legacy (non-encrypted) phone, voice quality will be limited by whatever the non-encrypted channel supports.

Technical Details of Layer 2 and Layer 3

The usual issues of MAC address spoofing and VPN setup apply.

If you are using a VoIP system, it is probably a good idea to identify calls as "fax data" or "compressed video."

If you are using generic Internet connections, packets can be identified as "HTTP traffic" or "FTP traffic."
