iXBT Labs - Computer Hardware in Detail






IP-telephony problems


One of the leading trends in telecommunications is its convergence with information science. This became possible because a computer does not just process data but can also transmit and receive it. In this context, the network is regarded as a universal medium for distributing information.

Modern networks are based on packet transmission and switching. They rely on a simple idea: any kind of information (data, images, speech, sound, service and control messages) is represented as a digital sequence that is divided into small parts called packets, each carrying the information needed for its identification, routing, error correction and so on. This approach makes it possible to transfer all kinds of information, to use different transmission media, and to use universal switching systems.
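The packet idea can be illustrated with a minimal sketch. The field names (stream_id, seq) and the functions packetize and reassemble are hypothetical and chosen only for this illustration; real protocols carry more header information.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    stream_id: int   # hypothetical field: which stream the packet belongs to
    seq: int         # position of the packet within the stream
    payload: bytes   # slice of the original digital sequence

def packetize(data, stream_id, size):
    """Split a byte sequence into packets carrying at most `size` payload bytes."""
    return [Packet(stream_id, i, data[off:off + size])
            for i, off in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Restore the original sequence, whatever order the packets arrived in."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))

data = bytes(range(10))
packets = packetize(data, stream_id=1, size=4)
restored = reassemble(list(reversed(packets)))
```

Because each packet carries its own sequence number, the receiver can restore the sequence even when the packets arrive in a different order.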

With unlimited networks and unlimited channel capacity, developing such networks would be a purely engineering problem. The scientific and technical problems emerge when resources are limited. Moreover, these problems differ depending on the kind of information, and they require specialists in each area.

In this article we consider some issues of telephone conversations over packet-switched networks when resources are scarce. These issues have legal, economic, scientific, technical and organizational sides. Since the scientific and technical aspects receive practically no attention in domestic publications, we will concentrate on them.

Essentially, the basic task of the telephony variant under discussion is to provide voice communication between two or more subscribers of different networks over a single network.

Since the majority of networks, and above all the Internet, use IP (Internet Protocol) to form packets, it is reasonable to use the name IP-telephony for telephony over the Internet and intranets instead of "Internet-telephony", the term usually encountered in domestic and foreign publications.

This article is organized as follows. First we consider the variants of Internet-based telephony, which lets us single out the basic elements of IP-telephony. Then we move on to voice signal representation in telephony. We discuss in detail the features of the Internet channel and look at experimental data obtained when studying its key characteristics. Taking into account the channel characteristics and the ways telephone connections are organized, we formulate the requirements for vocoders and the approaches to building them. We discuss the architecture of the gateway as the basic element of IP-telephony systems, and the problems of signal processing in the gateway. We also touch upon voice transfer over the Internet and assess whether the existing protocols are suitable for it. In conclusion we discuss the ways of implementing the gateway in hardware and software.

Different models of IP-telephony

There are two basic schemes of IP-telephony that are widely adopted. The first one (fig. 1) organizes telephone conversations between PC users. The computers must have multimedia equipment and/or special software and hardware that support duplex telephone conversations and provide the required service and control functions. The users' PCs must be connected to a local network, have their own IP address, or connect to the Internet via a modem.

Fig. 1. Structure chart of organization of telephone connection via Internet

The second scheme (fig. 2) includes special multifunction devices called gateways. The gateway represents analog voice and service signals as a digital sequence, assembles Internet packets from this sequence, sends them to the Internet, receives packets, and converts the digital signal back into analog. It is also used to provide the interface, generate and detect subscriber signaling, control the modes of telephone conversations etc.

Gateways can be installed on the servers of Internet providers, at city telephone exchanges, at private branch exchanges, on local network servers, on the Web servers of companies that need voice hot-lines or technical support services, and on routers.

The gateway architecture differs depending on the connection scheme, that is, some functions and interfaces can change. However, it always implements the basic functions: high-quality duplex transmission and switching of packetized digital signals.

The basic schemes described above can be combined. There are different ways of organizing IP-telephony using gateways located at different points of the network. However, judging by many reviews and by the advertisements of most companies working in the field of IP-telephony, the use of gateways is the mainstream approach today, and the gateway itself is the key element.

Fig. 2. Structure chart of organization of telephone connection via Internet using gateways

Voice signal representation

Let's consider a voice dialogue over the Internet. This process has three stages:


  • connection (of the subscribers)
  • information exchange
  • disconnection

At the first and third stages only service data are transferred, while at the second stage the subscribers exchange both service data and information.

The source of the information data is a voice signal. It consists of segments of different types: voiced, unvoiced, transitional, and pauses. Segments of different types need a different number of bits for encoding and transfer in digital form, so the transmission rates of different segments can also differ. That is why voice data transmission in each direction of the duplex channel is treated as the transmission of anisochronous segments (transactions) with block synchronization.

The described model is the basis for the analysis and synthesis of IP-telephony systems. The anisochronous nature of the transactions makes it possible, on the one hand, to optimize the traffic by lowering the average transmission rate and, on the other hand, to compensate for fluctuations in the channel, because each transaction can be played out relatively freely in time. The model therefore changes the standard problem statement for voice codec design in IP-telephony systems: such codecs should be built with a variable rate. We will consider this issue later.

Internet channel features

The Internet channels are characterized by:


  • real bandwidth, defined by the "bottleneck" in the virtual channel at the given moment;
  • traffic, which depends on time;
  • packet latency, which depends on the traffic, the number of routers, the physical characteristics of the channels, and the signal processing delays in voice codecs and other gateway devices; all of these also depend on time;
  • packet losses, which depend on the "bottleneck" and on queues;
  • reordering of packets that are delivered by different routes.

These effects can be illustrated with graphs. Fig. 3 shows packet latency histograms, which give the empirical probability distribution of delays. The abscissa axis indicates the delay of the real packet relative to the ideal one, per time unit.

Fig. 3. Packet latency histograms

Packet delays depend strongly on time. The delay curve has a large dynamic range and a high rate of change. A noticeable change in transmission time can occur within one short communication session, and the transmission time can fluctuate from 10 ms to 1 s.

Fig. 4. Packet losses histograms

Fig. 3 shows the values of the delays and their probabilities. These data help to organize the processing procedure and choose the processing parameters. Because of the delays, the time structure of the voice packet stream changes. A buffer is therefore needed to convert the packetized voice signal, which has been subjected to channel delays and packet reordering, into a continuous, natural, real-time voice signal. The buffer parameters depend on the signal latency in duplex mode and on the packet loss percentage. Packet loss is another negative factor in IP-telephony.
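The role of such a buffer can be shown with a minimal sketch. The class name JitterBuffer and the depth parameter are assumptions made for this illustration; a real gateway would choose the depth from measured delay statistics such as those in fig. 3.

```python
class JitterBuffer:
    """Reorders packets by sequence number and releases one per frame interval."""

    def __init__(self, depth):
        self.depth = depth      # packets to collect before playout starts (>= 1)
        self.pending = {}       # sequence number -> payload
        self.next_seq = None    # next sequence number to play
        self.started = False

    def push(self, seq, payload):
        if self.started and seq < self.next_seq:
            return              # too late: this slot has already been played
        self.pending[seq] = payload
        if not self.started:
            self.next_seq = seq if self.next_seq is None else min(self.next_seq, seq)

    def pop(self):
        """Return the next payload in order, or None if it has not arrived."""
        if not self.started:
            if len(self.pending) < self.depth:
                return None     # still filling the buffer
            self.started = True
        payload = self.pending.pop(self.next_seq, None)
        self.next_seq += 1
        return payload

jb = JitterBuffer(depth=2)
for seq, frame in [(1, "b"), (0, "a"), (3, "d")]:   # packets 0 and 1 swapped, 2 lost
    jb.push(seq, frame)
played = [jb.pop() for _ in range(4)]
```

A deeper buffer tolerates larger delay fluctuations but adds its own delay to the conversation, which is why the buffer parameters have to be matched to the channel.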

Fig. 4 shows packet loss histograms. The abscissa axis indicates the number of packets lost in succession. The histogram shows that losses of one, two or three packets are more probable than losses of long runs of packets.
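A histogram of this kind can be computed from the sequence numbers of the packets that did arrive. The sketch below is only an illustration; the function name and the sample data are hypothetical and are not the measurements behind fig. 4.

```python
from collections import Counter

def loss_run_histogram(received_seqs):
    """Map the length of a run of consecutively lost packets to the number of such runs."""
    hist = Counter()
    seqs = sorted(set(received_seqs))
    for prev, cur in zip(seqs, seqs[1:]):
        gap = cur - prev - 1          # packets missing between two received ones
        if gap > 0:
            hist[gap] += 1
    return dict(hist)

# Packets 2, 4, 7 and 8 never arrived: two single losses and one double loss.
hist = loss_run_histogram([0, 1, 3, 5, 6, 9])
```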

It is essential that the loss of a long run of packets can lead to irreversible local distortion of the voice, whereas the loss of 1-3 packets can be compensated.
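One simple way to compensate short losses is to repeat the last good frame. This is a swapped-in illustrative technique, not the method of any particular vocoder; the function name, the max_repeat limit and the silence placeholder are assumptions.

```python
def conceal(frames, max_repeat=3, silence=b""):
    """Replace lost frames (None) by the last good frame for short gaps only."""
    out, last, run = [], None, 0
    for frame in frames:
        if frame is not None:
            last, run = frame, 0
            out.append(frame)
        else:
            run += 1
            # Short gaps are bridged by repetition; longer ones fall back to silence.
            out.append(last if last is not None and run <= max_repeat else silence)
    return out

concealed = conceal([b"A", None, None, None, None, b"B"], silence=b"-")
```

In the example the first three lost frames are bridged, while the fourth is given up as silence, mirroring the observation that only short runs of losses can be masked.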

An increase in traffic causes delays and losses in the channel. Since the bandwidth is limited, this occurs under heavy traffic, both overall and local. The curves (fig. 3, 4) obtained at different transmission rates indicate that lower voice transmission rates are needed to achieve the desired telephony quality.

IP telephony vocoders

The features of voice transmission channels (mainly on the Internet) and the possible models of Internet-based telephony impose a set of requirements on vocoders. Since voice data are transferred in packets, there is no need to encode and synchronously transfer voice segments of equal duration. As we have already said, the most natural and rational approach for IP-telephony is to use codecs with a variable voice encoding rate. A variable-rate vocoder is built around an input signal classifier, which determines the amount of information in the signal and accordingly chooses the encoding method and the voice data transmission rate. One of the simplest classifiers is the Voice Activity Detector (VAD), which separates the input signal into active speech and pauses. The "active speech" signal is encoded by one of the popular algorithms (as a rule, based on the Code Excited Linear Prediction (CELP) method) at a rate of 4-8 kbit/s. The "pause" signal is encoded and transferred at a low rate (0.1-0.2 kbit/s) or not transferred at all. The first option is preferable.
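The simplest form of such a detector compares the short-term energy of each frame with a threshold. The sketch below assumes a fixed, hand-picked threshold and hypothetical function names; practical detectors adapt the threshold to the background noise.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in samples) / len(samples)

def vad(frames, threshold):
    """Return True for frames classified as active speech, False for pauses."""
    return [frame_energy(frame) > threshold for frame in frames]

quiet = [0, 0, 1, -1]                 # near-silent frame
loud = [1000, -1200, 900, -800]       # frame with speech-level amplitude
flags = vad([quiet, loud], threshold=100.0)
```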

More effective input signal classifiers make it possible to optimize the choice of encoding strategy (data transmission rate), so that the signals that matter more for speech quality get a higher rate than the less important ones. This model makes it possible to reach low average rates (2-4 kbit/s) with high quality of synthesized voice.
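The average rate of a variable-rate vocoder is the rate of each signal class weighted by the share of time that class occupies. The class shares and per-class rates below are illustrative assumptions only, not measured values.

```python
def average_rate(shares, rates):
    """Time-weighted average rate; `shares` must sum to 1, rates are in kbit/s."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9
    return sum(shares[name] * rates[name] for name in shares)

# Hypothetical split of a conversation as heard in one direction.
shares = {"voiced": 0.3, "unvoiced": 0.1, "pause": 0.6}
rates = {"voiced": 8.0, "unvoiced": 4.0, "pause": 0.2}    # kbit/s
avg = average_rate(shares, rates)   # 0.3*8 + 0.1*4 + 0.6*0.2 kbit/s
```

With these assumed figures the average comes out just under 3 kbit/s, which shows how a codec whose peak rate is 8 kbit/s can land in the 2-4 kbit/s range on average.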

Note that for the applications considered here, the traditional vocoder problem of reducing the delay of the signal in the codec is not critical, because the total delay in IP-telephony depends mainly on the delays the signal experiences in the Internet channels. Nevertheless, solutions that reduce the vocoder delay are of practical interest.

Analysis of voice quality shows that the main source of artifacts and of the loss of quality and intelligibility of synthesized speech is interruption of the voice stream caused by packet losses or by exceeding the maximum permissible voice packet delivery time. Fig. 4 shows that the probability of losing a single packet is higher than the probability of losing several packets in succession. We expect that in the future, with growing bandwidth and optimized routers and protocols, single-packet losses will dominate. Note that when a packet is delivered, its data, as a rule, contain no errors. Under these conditions error-correcting coding is not rational.

So, one of the central problems of vocoder development for IP-telephony is creating voice compression algorithms that are tolerant to packet losses.

To serve a wide range of subscribers, gateway-based IP-telephony must include subscriber lines with analog terminations. This means that the analog voice signal synthesized in the gateway travels over the subscriber line to the subscriber's telephone, and a similar signal travels from the microphone of the subscriber's telephone over the analog line to the vocoder in the gateway. The classic low-rate voice compression algorithms are sensitive to the amplitude-frequency distortions that can occur in subscriber lines and acoustic paths. This must be taken into account when designing low-rate vocoder algorithms.

What are the prospects for vocoder development in IP-telephony? What do we have now and what do we expect to achieve in the near future? According to various published sources, there are so far no developments made specifically for Internet-telephony that have been recommended by ITU-T. Among the international standards recommended for systems of this kind, the following are mentioned most often: G.723.1, with voice rates of 5.3 and 6.3 kbit/s, and G.729, with a rate of 8 kbit/s.

These standards ensure quite high quality of voice transmission under ideal conditions. They were originally developed for channels other than the Internet and were later partly adapted to packet losses. Implementations of these standards include a Voice Activity Detector and elements that synthesize the voice signal in the segments corresponding to lost voice data. Today the leading telecommunications companies and universities are developing vocoder algorithms for Internet-telephony. Based on promotional publications and our own research, we expect compression algorithms with average rates of 2-4 kbit/s and lower, with permissible distortion of the synthesized voice at 20% voice packet loss.

Now let's turn to the promising directions in the development of low-rate, variable-rate vocoders. In every case, methods based on linear prediction are preferable. CELP algorithms are best for rates above 3 kbit/s. For lower rates, the algorithms will be based on accurate classification of the voice signal followed by rational encoding.

The gateway and its architecture

The gateway is the basis of IP-telephony. It converts service signals and data from one network (e.g. the PSTN) into Internet packets and back. The conversion must not distort the voice signal significantly, and the transmission mode must keep the exchange of information between subscribers in real time.

The functions of the gateway in a point-to-point connection:


  • Implementation of the physical interface with the network.
  • Detection and generation of subscriber signaling.
  • Conversion of subscriber signaling into data packets and back.
  • Connection of subscribers.
  • Transmission of signaling and voice packets.
  • Disconnection of subscribers.

Most functions of gateways with the TCP/IP architecture are carried out in application-level processes.

The diversity of these functions raises the problem of implementing them in hardware and software. A rational solution is a distributed system in which service tasks and the network connection are handled by a general-purpose processor, while signal processing and the telephony interface are handled by a digital signal processor.

Signal processing in the gateway

Fig. 6 shows the signal processing in a gateway connected to an analog 2-wire PSTN channel.

Fig. 6. Scheme of signal processing in the gateway

The telephony signal passes from the 2-wire trunk to the differential system (hybrid), which separates the receive and transmit parts of the channel. Then the output signal, together with a small part of the input signal (the echo), is fed to the ADC, where it is converted into a 12- or 8-bit signal. In the echo canceller this part of the input signal is removed. The echo canceller is an adaptive nonrecursive filter whose memory length and adaptation mechanism are chosen to meet the requirements of CCITT (ITU-T) Recommendation G.165. MF signals, DTMF signals and pulse dialing are detected by the corresponding detectors. Further processing of the input signal is carried out in the voice coder in session mode, where the signal is divided into separate segments (30 ms each) and each input block is mapped to an I-frame (137 bits).
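An adaptive nonrecursive (FIR) echo canceller can be sketched with the normalized LMS algorithm. NLMS is used here only as a common example of an adaptation mechanism; the number of taps, the step size and the function name are assumptions, and this sketch makes no claim to meet G.165.

```python
import random

def nlms_echo_cancel(far, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive FIR estimate of the far-end echo from the line signal."""
    w = [0.0] * taps                    # adaptive filter coefficients
    x = [0.0] * taps                    # recent far-end samples, newest first
    out = []
    for f, m in zip(far, mic):
        x = [f] + x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = m - echo_est                # residual after echo removal
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out

rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]       # far-end signal
mic = [0.0, 0.0] + [0.5 * s for s in far[:-2]]            # pure echo: delayed, attenuated
residual = nlms_echo_cancel(far, mic)
```

In this synthetic test the line signal contains only echo, so the residual should shrink towards zero as the filter converges. With near-end speech present, adaptation would normally be frozen during double talk, which the sketch does not handle.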

The VAD (voice activity detector) distinguishes pauses from voice. When a pause occurs, the I-frame need not be passed to the virtual channel service. Consider the pause frame transmission mode. Only every fifth frame of the same type is passed to the session level. In the absence of voice, the current spectral parameters take 27 bits to encode. The logical channel receives either an I-frame (137 or 227 bits) or a pause confirmation. On pause frames, a comfort noise generator reproduces the spectral distribution of the pause signal. When a pause I-frame is received, the parameters of the generator are updated. A 137-bit I-frame activates the voice decoder, which forms a 12-bit voice signal. For the echo canceller this is the signal of the distant subscriber; filtering it yields the electrical echo component in the transmitted signal.
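The pause frame transmission mode can be sketched as follows. The interpretation that the first frame of a pause is sent and then every fifth one after it is an assumption, as are the function and label names.

```python
def frames_to_send(vad_flags, pause_interval=5):
    """Return (frame index, kind) for the frames that are actually transmitted."""
    sent, pause_run = [], 0
    for i, active in enumerate(vad_flags):
        if active:
            pause_run = 0
            sent.append((i, "voice"))
        else:
            # Assumed rule: send the first pause frame of a run, then every fifth.
            if pause_run % pause_interval == 0:
                sent.append((i, "pause"))
            pause_run += 1
    return sent

flags = [True] + [False] * 7 + [True]
sent = frames_to_send(flags)
```

Between the transmitted pause frames the receiver keeps its comfort noise generator running with the last parameters it received.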

The analysis of the signal processing scheme, together with practical experience, allows us to identify the following problems of digital signal processing in the gateway.

With 2-wire trunks, the pressing problem is echo cancellation, which must compensate for the echo of both speech and telephony signaling. Another important problem is the detection of telephony signaling, since service and voice signals can be interleaved.
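As an illustration of signaling detection, DTMF digits are commonly detected with the Goertzel algorithm, which measures signal power at the eight DTMF frequencies. The sketch below assumes an 8000 Hz sampling rate and simply picks the strongest row and column tone; it has no thresholds for rejecting speech or noise, which a real detector needs. The function names are hypothetical.

```python
import math

ROWS = [697, 770, 852, 941]        # DTMF low-group frequencies, Hz
COLS = [1209, 1336, 1477, 1633]    # DTMF high-group frequencies, Hz
KEYS = ["123A", "456B", "789C", "*0#D"]

def goertzel_power(samples, freq, sample_rate):
    """Power of `samples` at `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_dtmf(samples, sample_rate=8000):
    """Return the key whose row and column tones carry the most power."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, COLS[i], sample_rate))
    return KEYS[row][col]

# Synthetic 50 ms tone for the key "5" (770 Hz + 1336 Hz).
tone = [math.sin(2 * math.pi * 770 * n / 8000) + math.sin(2 * math.pi * 1336 * n / 8000)
        for n in range(400)]
key = detect_dtmf(tone)
```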

The key problem of vocoder development was discussed in the section "IP telephony vocoders". A closely related problem is VAD design, where pauses must be detected against a background of fairly intense noise (offices, streets, cars etc).

Net protocols

When organizing telephone conversations over the network, two types of information must be transferred: service and voice. The first includes call signals, disconnection signals and other service messages.

The foundation of the Internet is the Internet Protocol (IP). It is a network-layer protocol that provides packet routing across the network. However, it does not guarantee reliable delivery of packets: packets can be corrupted, delayed, or take different routes (and hence have different delivery times). On top of IP there are the transport-layer protocols Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).

The basic requirement for the transmission of command information is the absence of errors, so a reliable message delivery protocol must be used. One such protocol is TCP, which provides guaranteed message delivery. The delivery time is of great importance as well, but with TCP it is unstable, because when errors occur the message is retransmitted until it is delivered successfully. The duration of service procedures can therefore grow without bound, which is inadmissible for the connection stage and some other procedures. That is why the problem of creating a reliable transmission mechanism remains: it must both guarantee error-free delivery of information and minimize the delivery time when errors occur.

The voice packet delivery time is the central problem. The subscribers' conversation must be maintained in real time, so delays must not exceed 250-300 ms. Under such conditions messages must not be retransmitted. Therefore unreliable transport protocols such as UDP are used: if a transmission error occurs, it is registered but nothing is retransmitted. Packets sent over UDP can be lost, either because of equipment problems or because the "lifetime" of the packet has expired and it was discarded by one of the routers. In the second case no retransmissions take place. Both packet reordering and packet corruption can happen during transmission, though the latter is rare.

The voice stream must be restored before it reaches the decoder, and a real-time protocol is used for this. The header of such a protocol contains a timestamp and a packet sequence number. These parameters make it possible to determine not only the order of the packets in the stream but also the moment at which each packet should be decoded, that is, to restore the stream. The most widespread protocol of this kind is the Real-time Transport Protocol (RTP), which is recommended for use in the H.323 standard for real-time systems.
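The fixed 12-byte RTP header can be built and parsed with Python's struct module. The sketch below covers only the fixed header without CSRC entries or extensions; the function names are hypothetical, and the payload type value in the example (4, the static type assigned to G.723) is just one possible choice.

```python
import struct

# version/flags, marker/payload type, sequence number, timestamp, SSRC
RTP_HEADER = struct.Struct("!BBHII")     # 12 bytes, network byte order

def pack_rtp(seq, timestamp, ssrc, payload_type, payload, marker=False):
    byte0 = 2 << 6                       # version 2, no padding, extension or CSRCs
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

def unpack_rtp(packet):
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc,
            "payload": packet[RTP_HEADER.size:]}

# One 30 ms frame at an 8000 Hz clock advances the timestamp by 240 units.
packet = pack_rtp(seq=7, timestamp=240, ssrc=0x1234, payload_type=4, payload=bytes(20))
fields = unpack_rtp(packet)
```

The receiver sorts packets by the sequence number and schedules decoding by the timestamp, which is exactly the information needed to restore the stream.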

Distortions of the packet stream are related to the network load. A voice packet stream can load the network considerably, especially in multichannel systems. This is due to the high intensity of the stream (small frames of 20 bytes every 30 ms) and the large amount of service information transmitted. The total header length of a voice packet is twice the size of its payload. Transmitting this much service data is unacceptable, especially in multichannel systems, so methods of reducing the amount of service data are needed. There are two possible solutions. The first is to build special transport protocols for IP-telephony that would reduce the transport-level header. The second is to multiplex channels in multichannel systems, so that voice packets from different channels are transmitted under one network header. This solution also reduces the intensity of the stream.
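The overhead can be checked with simple arithmetic. The sketch assumes IPv4, UDP and RTP headers without options (20 + 8 + 12 = 40 bytes) and, for the multiplexed case, that all 40 bytes are shared by the channels; a practical multiplexing scheme would add a small per-channel header that is ignored here.

```python
IP_HEADER, UDP_HEADER, RTP_HEADER_LEN = 20, 8, 12       # bytes, no options
HEADERS = IP_HEADER + UDP_HEADER + RTP_HEADER_LEN       # 40 bytes per packet

def per_channel_bitrate(payload_bytes, frame_s, channels=1):
    """Network bit rate per voice channel when `channels` frames share one header."""
    packet_bytes = HEADERS + payload_bytes * channels
    return packet_bytes * 8 / frame_s / channels

overhead_ratio = HEADERS / 20                            # headers vs. a 20-byte frame
single = per_channel_bitrate(20, 0.030)                  # one channel per packet
shared = per_channel_bitrate(20, 0.030, channels=8)      # eight channels per packet
```

Under these assumptions a single channel needs about 16 kbit/s on the network for roughly 5.3 kbit/s of voice data, while eight multiplexed channels need about 6.7 kbit/s each.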

The primary problem of IP-telephony is bringing the quality of service close to that of the telephone service. This requires developing transport mechanisms that minimize the delivery time of both service and voice information.

Gateway realization for IP-telephony

As we mentioned at the beginning, all IP-telephony systems fall into two basic schemes: for PC users, and for users of a telephone network (via the Internet without a PC).

The first scheme has two implementation variants: software (when all procedures are carried out by a PC with a built-in sound card) and hardware-software (when a DSP card installed in the PC performs the basic functions and frees the PC for other work). Software companies prefer the first variant. The most widespread product is Microsoft NetMeeting.

The second scheme is also fairly widespread. Only a hardware-software implementation is possible here: the system consists of specialized DSP cards or modules working under the control of a CPU module. The first products of this kind appeared approximately 1.5-2 years ago and were built on boards from Dialogic and software from VocalTec. The gateway was called VocalTec Gateway and is still available. A similar product, V/IP, was made by Micom; it is based on a DSP board installed in an IBM PC and running under the control of special software.

Such methods of building gateways are quite convenient for office and, probably, corporate applications, but not for large Internet providers and telecommunication companies. These need multichannel systems, and with this approach a large number of channels is unreliable in operation and complex to maintain. These problems should be solved when developing the gateway hardware, taking into account the limit on the cost per channel. Modern components and the standardization in the PC industry make it possible to solve these problems quite effectively.

The basic component of the gateway hardware is the Digital Signal Processor (DSP). In recent years we have witnessed rapid growth in the range of devices, in their performance, and in chip functionality. DSPs designed for multichannel processing deserve special mention, since they reduce the cost of equipment per channel. The first and most powerful DSP of this class is the TMS320C6201 from Texas Instruments (up to 1600 MIPS), on which 16 or more voice-over-IP channels can be implemented. Analog Devices has recently joined the race with its 600 MIPS floating-point DSP, the ADSP-21160, which despite lower performance has a larger memory and an improved architecture.

One of the most popular platforms is based on the CompactPCI bus, which offers high speed (necessary for multichannel systems), widespread and cheap software (it is a complete electrical and functional analog of the PCI bus), and strong support from the manufacturers of industrial systems. Note that there are also standardized auxiliary buses for telecommunication applications. The first such bus was the SCbus, developed by Dialogic. About a year ago the CTbus appeared as a development of the SCbus.

For all the buses mentioned there are specialized chips for building bus adapters, which simplifies hardware production and makes it cheaper.

Large manufacturers of telecommunication equipment, such as Siemens, Lucent, Motorola and Nokia, are actively developing this promising segment of the IP-telephony market. As a rule, each manufacturer offers its own architecture, internal bus, and control and monitoring methods. Small and medium-sized companies are also competing with the giants thanks to the rapid standardization of industrial computers and the availability and low cost of components (from the system unit to any other part) that provide all the necessary features of industrial systems.

The problems that arise in gateway development are in many respects similar to those solved in the development of modern exchange equipment. At the same time, there are specific features determined by the wide use of DSPs (ten or more on one board) and by the algorithms used.

If the analog and digital parts share the gateway board, a problem of electromagnetic compatibility arises. If the analog and digital parts are placed apart, there is a problem of interconnecting them.

If a large number of powerful DSPs, e.g. the TMS320C6201, are placed on one board, high power consumption becomes a problem.

When building a gateway it is important to match the algorithm and the hardware. The hardware should serve the gateway's operating algorithm rationally, which is not always easy to achieve while using the equipment economically. At the same time, admissible modifications of the algorithm (parallelization of calculations, optimization of resource control, a rational order of calculations etc) can influence the structure of the hardware implementation and, as a whole, give the best solution.


The purpose of this article is to show not only the technical implementation of IP-telephony but also its particular features, the scientific and technical problems involved, the reasons they arise, and possible solutions. We wanted to show the reader that IP-telephony is a distinct area of telephone communication that integrates the methods and means of digital signal processing, voice technology, and the control of computing resources on the basis of advanced technology. It is not only of great commercial interest, as you can often read in newspapers and magazines; it also leads to fascinating scientific research and engineering developments, and it is a rewarding field for students and young engineers.

Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.