IP-telephony problems
Introduction
One of the leading trends in telecommunications is the convergence of this
field of science and engineering with computer science. This became possible
because a computer does not merely process data but can also send and
receive it. In this context the network is regarded as a universal medium
for distributing information.
Modern networks are based on packet transmission and packet switching.
They rely on a simple idea: any kind of information (data, images, speech,
sound, service and control messages) is represented as a digital sequence
that is divided into small parts called packets, each of which carries the
information needed for its identification, routing, error correction and
so on. This approach makes it possible to carry all kinds of information,
to use different transmission media and to use universal switching systems.
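The packet idea can be illustrated with a minimal sketch. The header layout
used here (stream identifier, sequence number, payload length) and the
function names are hypothetical and chosen only for illustration; real
protocols define their own headers.

```python
import struct

# Hypothetical header: stream id (2 bytes), sequence number (4 bytes),
# payload length (2 bytes), all in network byte order.
HEADER = struct.Struct("!HIH")

def packetize(stream_id, data, payload_size):
    """Split a byte string into packets, each prefixed with a header."""
    packets = []
    for seq, offset in enumerate(range(0, len(data), payload_size)):
        chunk = data[offset:offset + payload_size]
        packets.append(HEADER.pack(stream_id, seq, len(chunk)) + chunk)
    return packets

def reassemble(packets):
    """Rebuild the byte string from packets received in any order."""
    parts = {}
    for pkt in packets:
        _stream_id, seq, length = HEADER.unpack_from(pkt)
        parts[seq] = pkt[HEADER.size:HEADER.size + length]
    return b"".join(parts[seq] for seq in sorted(parts))
```

Because every packet carries its own sequence number, the receiver can
restore the original order even when packets arrive out of order.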
With unlimited networks and unlimited channel capacity, building such
networks would be a purely engineering problem. The scientific and
technical problems emerge when resources are limited. Moreover, these
problems differ depending on the kind of information, and they require
specialists in the respective field.
In this article we consider some issues of telephone conversations over
packet-switched networks under conditions of scarce resources. These
problems have legal, economic, scientific, technical and organizational
aspects. Since the technical and scientific aspects receive practically no
attention in domestic publications, we concentrate on them.
Essentially, the basic task of the kind of telephony under discussion is
to provide voice communication between two or more subscribers of
different networks over a single network.
Since most networks, and above all the Internet, use IP (Internet
Protocol) to form packets, it is reasonable to use the name IP-telephony
for telephony over the Internet and intranets instead of the name
"Internet-telephony" commonly found in domestic and foreign publications.
The article is organized as follows. First we consider the variants of
telephony based on the Internet, which allows us to single out the basic
elements of IP-telephony. Then we turn to voice signal representation in
telephony. We discuss in detail the features of the Internet channel and
look at experimental data obtained when studying its key characteristics.
Taking into account the channel characteristics and the ways telephone
connections are organized, we formulate the principles and approaches for
building vocoders. We discuss the architecture of the gateway as the basic
element of IP-telephony systems, and the problems of signal processing in
the gateway. We also touch upon voice transfer over the Internet and assess
whether the use of the existing protocols is feasible and reasonable. In
conclusion we discuss the ways of implementing the gateway in hardware and
software.
Different models of IP-telephony
There are two basic, widely adopted schemes of IP-telephony. The first one
(fig. 1) organizes telephone conversations between PC users. The computers
must have multimedia equipment and/or special software and hardware that
supports duplex telephone conversations and provides the required service
and control functions. The users' PCs must either be connected to a local
network and have their own IP address, or connect to the Internet via a
modem.
Fig. 1. Block diagram of a telephone connection via the Internet
The second scheme (fig. 2) adds special multifunction devices, gateways.
The gateway converts analog voice and service signals into a digital
sequence, assembles Internet packets from this sequence, sends them to the
Internet, receives packets and converts the digital signal back to analog.
It also implements the interfaces, generates and detects subscriber
signaling, controls the telephone conversation modes and so on.
Gateways can be installed on the servers of Internet providers, at city
telephone exchanges, at private branch exchanges, on local network servers,
on the Web servers of companies that need voice hot lines or technical
support services, and on routers.
The gateway architecture depends on how the connection is organized; that
is, some functions and interfaces can change. However, the gateway always
implements the basic functions: high-quality duplex packet transmission
and switching of digital signals.
The basic schemes described above can be combined. There are different
ways of organizing IP-telephony using gateways located at different points
of the network. However, according to many reviews and the advertisements
of most companies working in the field of IP-telephony, gateway-based
solutions are the mainstream today, and the gateway itself is the key
element.
Fig. 2. Block diagram of a telephone connection via the Internet using
gateways
Voice signal representation
Let's consider a voice dialogue over the Internet. This process has three
stages:
- connection (of the subscribers)
- information exchange
- disconnection
At the first and third stages only service data are transferred; at the
second stage the subscribers exchange both service data and voice
information.
The source of the information data is the voice signal. It contains
segments of different types: voiced, unvoiced, transitional and pauses.
Segments of different types require different numbers of bits for encoding
and transfer in digital form, so their transmission rates can also differ.
That is why voice data transmission in each direction of the duplex channel
is treated as the transmission of anisochronous transaction segments with
block synchronization.
The described model is the basis for the analysis and synthesis of
IP-telephony systems. The anisochronous nature of the transactions makes
it possible, on the one hand, to optimize traffic by decreasing the average
transmission rate and, on the other hand, to compensate for fluctuations in
the channel thanks to the relatively free playback timing of each
transaction. The described voice signal model therefore changes the
standard problem statement for designing voice codecs for IP-telephony
systems: such codecs should be built with a variable rate. We consider this
issue later.
Internet channel features
Internet channels are characterized by:
- the real bandwidth, determined by the "bottleneck" of the virtual channel
at the given moment;
- traffic, which varies with time;
- packet latency, which depends on the traffic, the number of routers, the
physical characteristics of the channels, and the signal processing delays
in voice codecs and other gateway devices; all of these also vary with
time;
- packet losses, which are caused by the "bottleneck" and by queues;
- reordering of packets that are delivered over different routes.
These effects can be illustrated with plots. Fig. 3 shows packet latency
histograms, which give the empirical probability distribution of delays.
The abscissa is the delay of the real packet relative to the ideal one,
per time unit.
Fig. 3. Packet latency histograms
Packet delays depend strongly on time. The delay-versus-time plot has a
large dynamic range and a high rate of change. A noticeable change in
transmission time can occur within one short communication session, and the
fluctuation of the transmission time can range from 10 ms to 1 s.
Fig. 4. Packet losses histograms
Fig. 3 shows the delay values and their probabilities. These data help to
organize the processing procedure and choose its parameters. Because of
the delays, the temporal structure of the voice packet stream changes. A
buffer is therefore needed to convert a packetized voice signal that has
been delayed in the channel or whose packets have been reordered into a
continuous, natural, real-time voice signal. The buffer parameters depend
on the signal latency in duplex mode and on the packet loss percentage.
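The role of such a buffer can be shown with a minimal sketch of a
fixed-delay playout buffer. The function name, the fixed buffering delay
and the three outcome labels are assumptions made for illustration; the
article does not prescribe a particular buffer design.

```python
def playout_schedule(arrivals, frame_ms, buffer_ms):
    """Decide the fate of each packet in a fixed-delay playout buffer.

    arrivals maps a packet sequence number to its arrival time in ms;
    a missing sequence number means the packet never arrived.
    Playback of the first packet is postponed by buffer_ms, and each
    following packet is due one frame_ms later than the previous one.
    """
    if not arrivals:
        return {}
    first_seq = min(arrivals)
    start = arrivals[first_seq] + buffer_ms
    result = {}
    for seq in range(first_seq, max(arrivals) + 1):
        deadline = start + (seq - first_seq) * frame_ms
        arrived_at = arrivals.get(seq)
        if arrived_at is None:
            result[seq] = "lost"      # never arrived
        elif arrived_at <= deadline:
            result[seq] = "played"    # arrived in time for its slot
        else:
            result[seq] = "late"      # arrived after its slot, unusable
    return result
```

A larger buffer_ms turns more "late" packets into "played" ones, but adds
the same amount to the end-to-end delay of the conversation, which is the
trade-off behind the choice of buffer parameters.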
Packet loss is another negative factor in IP-telephony.
Fig. 4 shows packet loss histograms. The abscissa is the number of packets
lost in succession. The histogram shows that losses of one, two or three
packets are more probable than losses of long runs of packets.
It is essential that the loss of a long run of packets can lead to
irreversible local distortion of the voice, whereas the loss of 1-3 packets
can be compensated.
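A histogram of this kind can be computed from the sequence numbers of the
packets that did arrive. The sketch below is only an illustration under
the assumption that sequence numbers increase by one per packet and do not
wrap around; the function name is hypothetical.

```python
from collections import Counter

def loss_burst_histogram(received_seqs):
    """Count how often runs of 1, 2, 3, ... consecutive packets were lost.

    A gap of k missing sequence numbers between two received packets is
    counted as one loss burst of length k.
    """
    seqs = sorted(set(received_seqs))
    hist = Counter()
    for prev, cur in zip(seqs, seqs[1:]):
        gap = cur - prev - 1
        if gap > 0:
            hist[gap] += 1
    return dict(hist)
```

For example, if packets 0, 1, 3, 4, 7, 8, 9 and 11 arrive, the result
contains two bursts of length 1 (packets 2 and 10) and one burst of
length 2 (packets 5 and 6).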
An increase in traffic causes delays and losses in the channel. Since the
bandwidth is limited, this happens when traffic is heavy, either overall or
locally. The curves (fig. 3, 4), obtained at different transmission rates,
indicate that lower voice transmission rates are needed to obtain the
desired telephony quality.
IP telephony vocoders
The features of voice transmission channels (mainly on the Internet) and
the possible models of Internet-based telephony impose a set of
requirements on vocoders. Since the voice data are transferred in packets,
there is no need to encode and synchronously transfer voice segments of
equal duration. As already mentioned, the most natural and rational choice
for IP-telephony is a codec with a variable voice encoding rate. A
variable-rate vocoder is based on an input signal classifier that
determines the amount of information in the signal and thus chooses the
encoding method and the voice data transmission rate. One of the simplest
voice signal classifiers is the Voice Activity Detector (VAD), which
separates active speech from pauses in the input signal. The "active
speech" signal is encoded with one of the well-known algorithms (as a rule,
based on the Code Excited Linear Prediction (CELP) method) at a rate of
4-8 kbit/s. The "pause" signal is encoded and transferred at a low rate
(0.1-0.2 kbit/s) or not transferred at all. The first option is
preferable.
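The simplest possible classifier of this kind compares the energy of each
frame with a threshold. The sketch below is an illustration only: the
fixed threshold and the function names are assumptions, and practical
detectors use adaptive thresholds and additional features.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def classify_frames(samples, frame_len, threshold):
    """Label each full frame as "speech" or "pause" by its energy.

    samples is a sequence of PCM sample values; trailing samples that do
    not fill a whole frame are ignored.
    """
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        is_speech = frame_energy(frame) > threshold
        labels.append("speech" if is_speech else "pause")
    return labels
```

With 8 kHz sampling, a frame_len of 240 samples corresponds to 30 ms. The
labels can then drive the choice of encoding method and rate for each
frame.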
More capable classifiers of the input signal make it possible to optimize
the choice of encoding strategy (data transmission rate), so that the
segments that matter more for speech quality get a higher rate than the
less important ones. This model makes it possible to reach low average
rates (2-4 kbit/s) with high quality of the synthesized voice.
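The effect of variable-rate coding on the average rate is simple
arithmetic, sketched below. The class names, the per-class rates and the
share of speech frames in the example are assumed values for illustration,
not measurements.

```python
def average_rate_kbps(labels, rate_kbps):
    """Average rate of a stream of equal-length frames.

    labels holds one class name per frame; rate_kbps maps each class
    name to the encoding rate (kbit/s) used for frames of that class.
    """
    return sum(rate_kbps[label] for label in labels) / len(labels)

# Assumed example: 40% of frames are active speech coded at 6 kbit/s,
# 60% are pauses coded at 0.15 kbit/s.
example_labels = ["speech"] * 4 + ["pause"] * 6
example_rates = {"speech": 6.0, "pause": 0.15}
example_average = average_rate_kbps(example_labels, example_rates)
```

Under these assumptions the average is 0.4 * 6 + 0.6 * 0.15 = 2.49 kbit/s,
well below the rate used for active speech alone.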
Note that for these applications the traditional vocoder problem of
decreasing the delay of the signal in the codec is not critical, because
the total delay in IP-telephony is dominated by the delays in the Internet
channels. Nevertheless, solutions that decrease the delay in the vocoder
are of practical interest.
Analysis of voice quality shows that the main source of artifacts and of
reduced quality and intelligibility of the synthesized speech is
interruption of the voice stream caused by packet losses or by exceeding
the maximum permissible delivery time of a voice packet. Fig. 4 shows that
the probability of losing a single packet is higher than the probability of
losing several packets in succession. We expect that in the future, with
growing bandwidth and optimized routers and protocols, single-packet losses
will play the leading role. Note that when a packet is delivered, its data
are, as a rule, free of errors. Under these conditions error-correcting
coding is not rational.
So, one of the central problems of vocoder development for IP-telephony is
the creation of voice compression algorithms that are tolerant to packet
losses.
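One very simple way to tolerate short losses on the receiving side is to
repeat the last good frame with decreasing amplitude and to fall back to
silence after a few repetitions. The sketch below illustrates only this
generic idea; it is not the method of any particular vocoder, and the
function name and default parameters are assumptions.

```python
def conceal_losses(frames, frame_len, attenuation=0.5, max_repeats=3):
    """Fill gaps in a sequence of decoded frames.

    frames is a list in playout order where a lost frame is None.
    A lost frame is replaced by the previous output frame scaled by
    attenuation; after max_repeats consecutive losses (or if nothing
    has been received yet) silence is produced instead.
    """
    out, last, repeats = [], None, 0
    for frame in frames:
        if frame is not None:
            last, repeats = list(frame), 0
            out.append(list(frame))
        elif last is not None and repeats < max_repeats:
            repeats += 1
            last = [s * attenuation for s in last]
            out.append(list(last))
        else:
            out.append([0.0] * frame_len)
    return out
```

This masks the loss of 1-3 packets reasonably well but cannot restore
long runs of lost packets, which is consistent with the observation that
only short losses can be compensated.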
To serve a wide network of subscribers, gateway-based IP-telephony must
include subscriber lines with analog terminations. This means that the
analog voice signal synthesized in the gateway travels over the subscriber
line to the subscriber's telephone, and a similar signal travels from the
microphone of the subscriber's telephone over the analog line to the
vocoder in the gateway. The classic low-rate voice compression algorithms
are sensitive to the amplitude-frequency distortions that can occur in the
lines and acoustic paths. This must be taken into account when creating
algorithms for low-rate vocoders.
What are the prospects for vocoder development for IP-telephony? What do
we have now and what do we expect to achieve in the near future? According
to various published sources, so far there are no vocoders developed
specifically for Internet-telephony and recommended by the ITU-T. Among the
international standards recommended for systems of this kind, the following
are mentioned most often: G.723.1 for voice rates of 5.3 and 6.3 kbit/s,
and G.729 for a rate of 8 kbit/s.
These standards ensure quite high quality of voice transmission under
ideal conditions. They were first developed for channels other than the
Internet and somewhat later were partly adapted to packet losses. The
extensions of these standards include a Voice Activity Detector and
elements that synthesize the voice signal in the segments corresponding to
lost voice data. Today the leading firms and universities in
telecommunications are developing vocoder algorithms for
Internet-telephony. According to promotional publications and our own
research, we expect compression algorithms with an average rate of
2-4 kbit/s and lower, and synthesized voice quality with permissible
distortions when 20% of voice packets are lost.
Now let us turn to the promising directions in the development of
low-rate variable-rate vocoders. In every case, methods that use linear
prediction are preferable. CELP algorithms are best for rates above
3 kbit/s. For lower rates, the algorithms will be built on accurate
classification of the voice signal followed by rational encoding.
The gateway and its architecture
The gateway is the basis of IP-telephony. It converts service signals and
data from one network (e.g. the PSTN) into Internet packets and back. The
conversion must not distort the voice signal significantly, and the
transmission mode must keep the exchange of information between subscribers
in real time.
The functions of the gateway in a point-to-point connection:
- Implementation of the physical interface to the network.
- Detection and generation of subscriber signaling.
- Conversion of subscriber signaling into data packets and back.
- Connection of subscribers.
- Transmission of signaling and voice packets.
- Disconnection of subscribers.
Most functions of gateways with the TCP/IP architecture are carried out in
application-level processes.
The diversity of these functions raises the problem of their hardware and
software implementation. A rational solution is a distributed system in
which service tasks and network connection are handled by a general-purpose
processor, while signal processing and the telephony interface are handled
by a digital signal processor.
Signal processing in the gateway
Fig. 6 shows the signal processing in a gateway connected to an analog
2-wire PSTN channel.
Fig. 6. Scheme of signal processing in the gateway
The telephone signal passes from the 2-wire trunk to the hybrid
(differential system), which separates the receive and transmit parts of
the channel. Then the output signal, together with a small part of the
input signal, is fed to the ADC, where it is converted into a 12- or 8-bit
signal. In the echo canceller this part of the input signal is removed. The
echo canceller is an adaptive non-recursive filter whose memory length and
adaptation mechanism are chosen to meet the requirements of CCITT (ITU-T)
Recommendation G.165. MF signals, DTMF signals and pulse dialing are
detected by the corresponding detectors. Further processing of the input
signal is carried out in the voice coder in session mode, where the signal
is divided into separate segments (30 ms each), and each input block is
mapped to an I-frame (137 bits).
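An adaptive non-recursive (FIR) echo canceller can be sketched with the
normalized LMS (NLMS) algorithm. NLMS is used here as a common textbook
choice; the article does not state which adaptation mechanism the authors
use. The tap count, step size and the synthetic test signals below are
assumptions for illustration.

```python
import random

def nlms_echo_cancel(far_end, line_in, taps=8, mu=0.5, eps=1e-6):
    """Remove the echo of far_end from line_in with an adaptive FIR filter.

    far_end: samples sent towards the line (the reference signal).
    line_in: samples picked up from the line, containing the echo.
    Returns the echo-reduced samples and the final filter weights.
    """
    w = [0.0] * taps
    history = [0.0] * taps               # newest far-end sample first
    out = []
    for x, d in zip(far_end, line_in):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate            # residual after cancellation
        norm = sum(h * h for h in history) + eps
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out, w

# Synthetic check: the "echo path" delays the far-end signal by two
# samples and attenuates it to 0.6; there is no near-end speech.
rng = random.Random(1)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echoed = [0.6 * far[n - 2] if n >= 2 else 0.0 for n in range(len(far))]
residual, weights = nlms_echo_cancel(far, echoed)
```

In this synthetic case the filter weight at delay 2 approaches 0.6 and the
residual decays towards zero. A practical canceller also has to stop
adapting while the near-end subscriber is talking, which this sketch does
not handle.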
The VAD (voice activity detector) distinguishes pauses from voice. If a
pause is detected, the I-frame need not be passed to the virtual channel
service. Consider the pause frame transmission mode. Only every fifth frame
of the same type is passed to the session level. In the absence of voice,
the current spectral parameters take 27 bits to encode. The logical channel
receives either an I-frame (137 or 227 bits) or a pause confirmation. For
pause frames, a comfort noise generator reproduces the spectral
distribution of the pause signal. On receiving a pause I-frame, the
generator parameters are updated. A 137-bit I-frame switches on the voice
decoder, which forms a 12-bit voice signal. For the echo canceller this
signal is the signal of the distant subscriber; filtering it gives the
electrical echo component in the transmitted signal.
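The pause frame transmission mode can be sketched as a per-frame decision
whether to transmit. The sketch assumes one possible reading of the rule
above: the first pause frame of a run is sent, and then every fifth pause
frame after it. The function name and the label strings are hypothetical.

```python
def select_frames_to_send(labels, pause_interval=5):
    """Return one True/False send decision per frame.

    labels holds "speech" or "pause" for each frame. Speech frames are
    always sent. Within a run of pause frames, only the first frame and
    every pause_interval-th frame after it are sent, so the receiver's
    comfort noise generator is refreshed from time to time.
    """
    decisions = []
    pause_run = 0
    for label in labels:
        if label == "speech":
            pause_run = 0
            decisions.append(True)
        else:
            decisions.append(pause_run % pause_interval == 0)
            pause_run += 1
    return decisions
```

For a long pause this reduces the number of transmitted frames roughly
fivefold compared with sending every pause frame.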
Analysis of the signal processing scheme and practical experience make it
possible to identify the following problems of digital signal processing in
the gateway.
With 2-wire trunks, the pressing problem is echo cancellation, where both
speech (voice) and telephone signaling must be compensated. Another
important problem is the detection of telephone signaling, since service
and voice signals can be interleaved.
The key problem of vocoder development was discussed in the section "IP
telephony vocoders". A closely related problem is VAD design, where pauses
must be detected against a background of fairly intense noise (offices,
streets, cars etc).
Network protocols
When organizing telephone conversations over the network, two types of
information must be transferred: service and voice. The first includes
call signals, disconnection signals and other service messages.
The cornerstone of the Internet is the Internet Protocol (IP). It is a
network-level protocol that provides packet routing in the network.
However, it does not guarantee reliable delivery of packets: packets can be
corrupted, delayed, or take different routes (and hence have different
delivery times). On top of IP run the transport-level protocols:
Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).
The basic requirement for the transmission of command information is the
absence of errors, so a reliable message delivery protocol is needed. One
such protocol is TCP, which provides guaranteed message delivery. The
delivery time is also of great importance, but it is unstable: if errors
occur, the message is retransmitted again and again until it gets through.
The duration of service procedures can then grow without bound, which is
unacceptable for the connection stage and some other procedures. That is
why the problem of creating a reliable transmission mechanism remains. Such
a mechanism must both guarantee error-free delivery of information and
minimize the delivery time when errors occur.
The delivery time of voice packets is the central problem. The
conversation between subscribers must be maintained in real time, for which
delays must not exceed 250-300 ms. Under such conditions retransmission of
messages is ruled out. Therefore unreliable transport protocols, for
example UDP, are used. If a transmission error occurs, it is registered
without any retransmission. Packets transmitted over UDP can be lost. The
loss is caused either by the equipment or by the fact that the "lifetime"
of the packet expired and it was discarded by one of the routers. In the
second case no retransmission is organized. Both packet reordering and
packet corruption can happen during transmission, though the latter is
rare.
The voice stream must be restored before it reaches the decoder, for which
a real-time protocol is used. The header of this protocol contains a
timestamp and a packet sequence number. These parameters define not only
the order of the packets in the stream but also the moment at which each
packet should be decoded; that is, they make it possible to restore the
stream. The most widespread protocol of this kind is the Real-time
Transport Protocol (RTP), which the H.323 standard recommends for building
real-time systems.
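The timestamp and sequence number can be read from the 12-byte fixed RTP
header, whose layout is defined in RFC 3550. The sketch below parses that
fixed header only; the function names are hypothetical, header extensions
and CSRC entries are not handled, and the ordering helper ignores
wraparound of the 16-bit sequence number.

```python
import struct

def parse_rtp_header(packet):
    """Parse the 12-byte fixed RTP header (RFC 3550) of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,         # restores the order of the packets
        "timestamp": timestamp,  # defines when the payload is played
        "ssrc": ssrc,
    }

def order_for_playout(packets):
    """Sort packets by sequence number (wraparound is not handled)."""
    return sorted(packets, key=lambda p: parse_rtp_header(p)["sequence"])
```

The receiver would pass the reordered packets to the decoder at the
moments given by their timestamps.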
Distortions of the packet stream are related to the network load. A voice
packet stream can load the network considerably, especially in multichannel
systems. This is due to the high intensity of the stream (small frames at a
rate of 20 bytes per 30 ms) and the large size of the transmitted service
information. The total header length of a voice packet is twice the size of
the voice data it carries. Transmitting that much service data is
unacceptable, especially in multichannel systems. Thus, methods of reducing
the amount of service data must be sought. There are two possible
solutions. The first is to build special transport protocols for
IP-telephony that would shrink the transport-level protocol header. The
second is to multiplex channels in multichannel systems; in this case voice
packets from different channels are transmitted under one network header.
This solution also reduces the stream intensity.
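The overhead can be checked with a short calculation. It assumes the usual
IPv4 + UDP + RTP encapsulation without options (20 + 8 + 12 = 40 bytes of
headers), which matches a header twice the size of a 20-byte voice frame.
The multiplexing case is simplified: it ignores the small per-channel
identifiers that a real multiplexing scheme would add.

```python
IP_HEADER = 20    # IPv4 header without options, bytes
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # RTP fixed header, bytes

def stream_rate_kbps(payload_bytes, frame_ms, channels_per_packet=1):
    """Network rate per voice channel, headers included, in kbit/s.

    payload_bytes: coded voice bytes produced per channel per frame.
    frame_ms: frame duration in milliseconds.
    channels_per_packet: frames of this many channels share one header.
    """
    headers = IP_HEADER + UDP_HEADER + RTP_HEADER
    packet_bits = 8 * (headers + payload_bytes * channels_per_packet)
    # bits per millisecond is numerically equal to kbit/s
    return packet_bits / frame_ms / channels_per_packet

single_channel = stream_rate_kbps(20, 30)          # one channel per packet
multiplexed = stream_rate_kbps(20, 30, 8)          # eight channels per packet
```

With one channel per packet, a voice stream of about 5.3 kbit/s occupies
16 kbit/s on the network. Sharing one header among eight channels brings
the per-channel rate down to about 6.7 kbit/s and sends one packet instead
of eight per frame interval.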
The primary problem of IP-telephony is to bring the quality of service
close to that of the telephone service. This requires transport mechanisms
that minimize the delivery time of both service and voice information.
Gateway realization for IP-telephony
As mentioned at the beginning, all IP-telephony systems can be divided
into two basic schemes: for PC users and for users of a telephone network
(via the Internet without a PC).
The first scheme has two implementation variants: software-only (all
procedures are carried out by a PC with a built-in sound card) and combined
hardware and software (a DSP card installed in the PC performs the basic
functions and frees the PC for other work). Software companies prefer the
first variant. The most widespread product is Microsoft NetMeeting.
The second scheme is also fairly widespread. For it, only a combined
hardware and software implementation is possible, in which the system
consists of specialized DSP cards or modules working under the control of a
CPU module. The first products of this kind appeared approximately 1.5-2
years ago and were built on boards from Dialogic and software from
VocalTec. The gateway was called VocalTec Gateway and is still available.
A similar product, V/IP, was made by Micom; it is based on a DSP board
installed in an IBM PC and working under the control of special software.
Such methods of gateway construction are quite convenient for office and,
probably, corporate applications, but not for large Internet providers and
telecommunication companies, which need multichannel systems, because of
unreliable operation and the complexity of maintaining a huge number of
channels. These problems should be solved during gateway hardware
development, taking into account the limit on cost per channel. Modern
progress in components and standardization in the PC industry makes it
possible to solve these problems quite effectively.
The basic component of the gateway hardware is the Digital Signal
Processor (DSP). In recent years we have witnessed rapid growth in the
range of devices, in their performance and in chip functionality. Worth
singling out are DSPs designed for multichannel processing, which reduce
the cost of equipment per channel. The first and most powerful DSP of this
class is the TMS320C6201 from Texas Instruments (up to 1600 MIPS), on which
16 or more voice-over-IP channels can be implemented. Analog Devices has
recently joined the race with its 600 MIPS floating-point DSP, the
ADSP-21160, which despite lower performance has a larger memory and an
improved architecture.
One of the most popular platforms is based on the CompactPCI bus, which
offers high speed (necessary for multichannel systems), widespread and
cheap software (the bus is a complete electrical and functional analog of
the PCI bus), and strong support from manufacturers of industrial systems.
Note that there are standardized auxiliary buses for telecommunication
applications. The first such bus was the SCbus, developed by Dialogic.
About a year ago the CTbus appeared as a development of the SCbus.
For all the buses mentioned there are specialized chips for building bus
adapters, which simplifies hardware production and makes it cheaper.
The big manufacturers of telecommunication equipment, such as Siemens,
Lucent, Motorola and Nokia, are actively developing this promising segment
of the IP-telephony market. As a rule, each manufacturer offers its own
architecture, internal bus, and control and monitoring methods. Small and
medium-sized companies also compete with the giants thanks to the rapid
standardization of industrial computers and the availability and low cost
of components (from the system unit to any other component) that provide
all the necessary features of industrial systems.
The problems that arise in gateway development are in many respects
similar to those solved in the development of modern exchange equipment. At
the same time, there are specifics determined by the extensive use of DSPs
(ten or more on one board) and by the features of the algorithms used.
If the analog and digital parts share the gateway board, a problem of
electromagnetic compatibility arises. If the analog and digital parts are
placed apart, there is the problem of interconnecting them.
If a large number of powerful DSPs, e.g. the TMS320C6201, occupy one board,
a problem of high power consumption appears.
When building a gateway it is important to match the algorithm and the
hardware. The hardware should serve the gateway's operating algorithm
rationally, which is not always easy to achieve while using the equipment
economically. At the same time, admissible modifications of the algorithm
(parallelization of calculations, optimization of resource control, a
rational order of calculations etc) can influence the structure of the
hardware implementation and, on the whole, give the best solution.
Conclusion
The purpose of this article is to show not only the technical
implementation of IP-telephony but also its particular features, the
scientific and technical problems, the reasons they arise and possible
solutions. We wanted to show the reader that IP-telephony is a distinct
area of telephone communication that integrates the methods and means of
digital signal processing, voice technology and control of computing
resources on the basis of advanced technology. It is not only of great
commercial interest, as one can often read in newspapers and magazines; it
also leads to fascinating research and engineering developments, and it is
a rewarding field for students and young engineers.