|
|
|
|
|
|
|
|
|
| This white paper describes the protocols involved in the transmission of
voice samples through an IP based network. This document aims to give the reader the
basic grounding that is required to further investigate the bandwidth requirements of
voice over IP. This paper does not discuss header compression schemes, and does not
discuss layer 2 protocols. Furthermore, this paper only considers IPv4 and not
IPv6. |
|
|
| In common with many communications systems, the protocols involved in
Voice over IP (VoIP) follow a layered hierarchy which can be compared with the theoretical
model developed by the International Organization for Standardization (the OSI seven layer model).
Breaking a system into defined layers can make that system more manageable and
flexible. Each layer has its job, and does not need a detailed understanding of the
layers around it. For example, IP datagrams can be transported across a variety of link
layer systems including serial lines (using PPP), Ethernet and Token Ring. The link
layer protocol is for the most part irrelevant to IP (unless that protocol limits the size
of its datagrams), and need not be the same for the first link of a Voice over IP call and
the final link of a VoIP call.
As always there are exceptions (such as IP over ATM), but the simple discrete layered
model will be considered in this document.
The effect of each layer's contribution to the communication process is an additional
header preceding the information being transmitted. The complete packet which a
layer creates (header and data) becomes the data passed to the next level for
processing. That layer will then add a header portion, and so on.
Each layer, starting at the Network (or Internet) Layer, is considered in the sections
which follow. |
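The layering described above can be illustrated with a minimal sketch. It assumes the basic header sizes given later in this paper (12 octets for RTP, 8 for UDP, 20 for IP) and uses zero-filled placeholders rather than real headers; the function name and the 160 octet payload are hypothetical and chosen only for illustration.

```python
def encapsulate(payload: bytes, header_sizes: dict) -> bytes:
    """Prepend one placeholder header per layer, highest layer first."""
    packet = payload
    for _name, size in header_sizes.items():
        header = bytes(size)      # zero-filled stand-in, not a real header
        packet = header + packet  # this layer's output is the next layer's data
    return packet


# Basic header sizes in octets, listed from the highest layer down to IP.
LAYERS = {"RTP": 12, "UDP": 8, "IP": 20}

voice = bytes(160)                # hypothetical 160 octet voice payload
packet = encapsulate(voice, LAYERS)
print(len(packet))                # 200
```

Each pass through the loop treats everything built so far as opaque data, which is the property that lets a layer ignore the details of the layers around it.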
|
|
| The Internet Protocol is the lowest level protocol considered in this
document. It is responsible for the delivery of packets (or datagrams)
between host computers. IP is a connectionless protocol; that is, it does not
establish a virtual connection through a network prior to commencing transmission. That is
a job for higher level protocols. IP makes no guarantees concerning reliability, flow
control, or the detection and correction of errors in the data it carries. The result is
that datagrams could arrive at the destination computer out of sequence, with errors, or
not arrive at all. Nevertheless, IP succeeds in making the network transparent to the upper layers
involved in voice transmission through an IP based network.
Any Voice over IP transmission must use IP (by definition). IP is not well
suited to voice transmission. Real time applications such as voice and video require
a guaranteed connection with consistent delay characteristics. Higher layer protocols
address these issues (to a certain extent).
The diagram below shows the header that precedes the data payload to be transmitted.
In its most basic form, the header comprises 20 octets. There are optional
fields which can be appended to the basic header, but these offer additional capabilities
which are not necessary for VoIP transmission as described in this document.
The fields shown are briefly described below:
- Version
- The version of IP being used. For this format header, the version would be
4.
- IHL
- The length of the IP header in units of four octets (32 bits). For the
basic header shown in this diagram, the value would be 5 (each line in the diagram
represents four octets).
- Type of service
- Specifies the quality of service requested by the host computer sending the
datagram. This is not always effectively supported by routers or Internet Service
Providers.
- Total length
- The length of the datagram, measured in octets, including the header and payload.
- Identification
- As well as handling the addressing of datagrams between two computers (or hosts),
IP needs to handle the splitting of data payloads into smaller packages. This
process, known as fragmentation, is required because, although a single IP
datagram can carry a theoretical maximum payload of 65,515 octets (65,535 octets in
total, less the 20 octet basic header), lower layer protocols such as Ethernet cannot
always handle these large packet sizes. This field
is a unique reference number assigned by the sending host to aid in the reassembly of a
fragmented datagram.
- Flags
- These flags indicate whether the datagram may be fragmented, and, if it has been
fragmented, whether further fragments follow this one.
- Fragment offset
- This field indicates where in the datagram this fragment belongs. It is
measured in units of 8 octets (64 bits).
- Time to live
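- This field indicates the maximum time the datagram is permitted to remain in the
internet system. In practice each router that forwards the datagram decrements the
value by one, so the field acts as a hop limit. This parameter ensures that a datagram
which cannot reach its destination host is given a finite lifetime.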
- Protocol
- This indicates the higher level protocol in use for this datagram. Numbers
have been assigned for use with this field to represent such transport layer protocols as
TCP and UDP.
- Header checksum
- This is a checksum covering the header only.
- Source address
- The IP address of the host which generated this datagram. IPv4 addresses
are 32 bits in length and, when written or spoken, a dotted decimal notation is
used (e.g.: 192.168.0.1).
- Destination address
- The IP address of the destination host.
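The field layout described above can be sketched as follows. This is an illustrative packing of the basic 20 octet header only, assuming no options, no fragmentation and the "don't fragment" flag set; the function names, default values and example addresses are hypothetical. Protocol number 17 is the assigned value for UDP.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the 16 bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      ident: int = 0, ttl: int = 64, protocol: int = 17) -> bytes:
    version, ihl = 4, 5                       # IHL of 5 means 5 x 4 = 20 octets
    fields = [
        (version << 4) | ihl,                 # Version and IHL share one octet
        0,                                    # Type of service
        ihl * 4 + payload_len,                # Total length: header plus payload
        ident,                                # Identification
        0x4000,                               # Flags: don't fragment; offset 0
        ttl,                                  # Time to live
        protocol,                             # Protocol (17 = UDP)
        0,                                    # Header checksum, filled in below
        socket.inet_aton(src),                # Source address
        socket.inet_aton(dst),                # Destination address
    ]
    fmt = "!BBHHHBBH4s4s"                     # network byte order, 20 octets
    fields[7] = ipv4_checksum(struct.pack(fmt, *fields))
    return struct.pack(fmt, *fields)


header = build_ipv4_header("192.168.0.1", "192.168.0.2", payload_len=180)
print(len(header))                            # 20
```

Running the checksum over a completed header returns zero, which is the check a receiving host can apply.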
|
 |
 |
UDP (User Datagram Protocol)
|
 |
 |
|
| Generally, there are two protocols available at the transport layer when
transmitting information through an IP network. These are TCP (Transmission Control
Protocol) and UDP (User Datagram Protocol). Both protocols enable the transmission
of information between the correct processes (or applications) on host computers.
These processes are associated with unique port numbers (for example, the HTTP application
is usually associated with port 80). TCP is a connection oriented protocol; that is, it
establishes a communications path prior to transmitting data. It handles sequencing
and error detection, ensuring that a reliable stream of data is received by the
destination application.
Voice is a real-time application, and mechanisms must be in place which ensure that
information is received in the correct sequence, reliably and with predictable delay
characteristics. Although TCP would address these requirements to a certain extent,
there are some functions which are reserved for the layer above TCP. Therefore, for
the transport layer, TCP is not used, and the alternative protocol, UDP, is commonly used.
In common with IP, UDP is a connectionless protocol. UDP routes data to its
correct destination port, but does not attempt to perform any sequencing, or to ensure
data reliability.
The fields shown are briefly described below:
- Source port
- Identifies the higher layer process which originated the data.
- Destination port
- Identifies the higher layer process to which this data is being transmitted.
- Length
- The length in octets of the UDP header and payload (minimum 8).
- Checksum
- Optional field supporting error detection.
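The four fields above can be packed into the 8 octet UDP header as in this minimal sketch. The function name and the port numbers are hypothetical. The checksum is left at zero, which in UDP over IPv4 signals that no checksum was computed; a real sender would normally calculate it.

```python
import struct


def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)   # header (8 octets) plus payload
    checksum = 0                # zero: checksum not computed (allowed over IPv4)
    # Four 16 bit fields in network byte order.
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


udp_header = build_udp_header(5004, 5004, bytes(172))
print(len(udp_header))          # 8
```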
|
 |
 |
RTP (Real-time Transport Protocol)
|
 |
 |
|
| Real time applications require mechanisms to be in place to ensure that a stream of
data can be reconstructed accurately. Datagrams must be reconstructed in the
correct order, and a means of detecting network delays must be in place. Jitter
is the variation in delay times experienced by the individual packets making up the data
stream. In order to reduce the effects of jitter, data must be buffered at the
receiving end of the link so that it can be played out at a constant rate. To
support this requirement, two protocols have been developed. These are RTP
(Real-time Transport Protocol) and RTCP (RTP Control Protocol).
RTCP provides feedback on the quality of the transmission link. RTP transports
the digitised samples of real time information. RTP and RTCP do not reduce the
overall delay of the real time information. Nor do they make any guarantees
concerning quality of service.
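Jitter can be estimated at the receiver from the send and arrival times of successive packets. The sketch below uses the running estimate defined in the RTP specification (RFC 3550), in which each new difference in transit time moves the estimate by one sixteenth of the gap. The function name and the sample times (in milliseconds) are hypothetical.

```python
def update_jitter(jitter, prev_transit, send_time, recv_time):
    """Return the updated jitter estimate and this packet's transit time."""
    transit = recv_time - send_time
    if prev_transit is not None:
        d = abs(transit - prev_transit)   # change in delay between packets
        jitter += (d - jitter) / 16.0     # smoothed running estimate
    return jitter, transit


# Hypothetical packets sent every 20 ms and received with varying delay.
send_times = [0, 20, 40, 60]
recv_times = [50, 72, 88, 110]

jitter, prev = 0.0, None
for s, r in zip(send_times, recv_times):
    jitter, prev = update_jitter(jitter, prev, s, r)
print(jitter)
```

A receiver could use an estimate of this kind to decide how much data to hold in its play-out buffer, although the choice of buffer depth is left to the application.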
The RTP header, which precedes the data payload, is shown in the diagram below:
- Version
- Identifies the version of RTP (currently 2).
- Padding
- A flag which indicates whether the packet has been appended with padding octets
after the payload data.
- X (Header extension)
- Indicates whether the fixed RTP header is followed by an optional header
extension.
- CC (CSRC count)
- The number of contributing source (CSRC) identifiers which follow the fixed header.
Although not shown on this header diagram, the 12 octet header can optionally be
expanded to include a list of up to 15 contributing sources. Contributing sources are
added by mixers, and are only relevant for conferencing applications where elements of the
data payload have originated from different computers. For point to point
communications, CSRCs are not required.
- M (Marker)
- Allows significant events such as frame boundaries to be marked in the packet
stream.
- PT (Payload type)
- This field identifies the format of the RTP payload and determines its
interpretation by the application.
- Sequence number
- A unique reference number which increments by one for each RTP packet sent.
It allows the receiver to reconstruct the sender's packet sequence.
- Timestamp
- The instant at which the first sample in this packet was taken, counted in ticks of
the media sampling clock. This field allows the receiver
to buffer and play out the data in a continuous stream.
- Synchronisation source (SSRC) number
- A randomly chosen number which identifies the source of the data stream.
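The 12 octet RTP header described above can be assembled as in the following sketch, which assumes no contributing sources and no header extension. The function name and the example values are hypothetical. Payload type 0 is used here because the RTP audio/video profile assigns it to G.711 mu-law audio.

```python
import struct


def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False, padding: bool = False,
                     extension: bool = False, csrc_count: int = 0) -> bytes:
    version = 2
    # First octet: Version (2 bits), Padding, X, CC (4 bits).
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | (csrc_count & 0x0F)
    # Second octet: Marker (1 bit), Payload type (7 bits).
    byte1 = (marker << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF,             # 16 bit sequence number
                       timestamp & 0xFFFFFFFF,   # 32 bit timestamp
                       ssrc & 0xFFFFFFFF)        # 32 bit SSRC


rtp_header = build_rtp_header(payload_type=0, seq=1, timestamp=160,
                              ssrc=0x12345678)
print(len(rtp_header))                           # 12
```

The masks on the sequence number and timestamp reflect the fact that both fields wrap around once they reach the limit of their width.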
|
|
|
| The headers of the three protocols discussed are sent
sequentially before the digitised voice or video samples, which form the payload of
the RTP header. The result is a 40 octet overhead (20 for IP, 8 for UDP and 12 for RTP) for every packet of data:
|
|
|
| The IP, UDP and RTP headers are followed by the data payload of the RTP
header. This comprises digitised samples of voice and video. The length of
these samples can vary, but for voice, samples representing 20 ms are considered the
maximum duration for the payload. The selection of this payload duration is a compromise
between bandwidth requirements and quality. Smaller payloads demand higher bandwidth
per channel, because the header length remains at forty octets however little data each
packet carries. However, if
payloads are increased, the overall delay of the system will increase, and the system will
be more susceptible to the loss of individual packets by the network.
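The trade-off can be made concrete with a short calculation. The sketch below assumes a hypothetical 64 kbit/s voice codec and counts only the 40 octets of IP, UDP and RTP headers; layer 2 overhead is excluded, as it is throughout this paper. The function name is illustrative.

```python
HEADER_OCTETS = 20 + 8 + 12   # IP + UDP + RTP basic headers


def ip_bandwidth_bps(codec_bps: int, payload_ms: int) -> float:
    """Bandwidth at the IP layer for one voice channel, in bits per second."""
    payload_octets = codec_bps / 8 * payload_ms / 1000
    packets_per_second = 1000 / payload_ms
    return (payload_octets + HEADER_OCTETS) * 8 * packets_per_second


for ms in (10, 20):
    print(ms, round(ip_bandwidth_bps(64000, ms)))
# 10 96000
# 20 80000
```

Under these assumptions, halving the payload duration from 20 ms to 10 ms raises the per channel requirement from 80 kbit/s to 96 kbit/s, because twice as many headers are sent each second.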
This subject is discussed in more detail in the white paper Bandwidth requirements for Voice over IP transmission. |
|
|
| This document has detailed a common set of protocols used for the
transmission of voice over IP through a local or wide area network. It should be
borne in mind that there are other methods of transmitting voice through an IP based
network. Some of these are vendor specific, and some are still under development by
the Internet Engineering Task Force. Specifically, header compression and multiplexing
techniques can go some way towards reducing the bandwidth requirement across a WAN.
 |
| This document
should not be viewed as a consultative document. It is the reader's responsibility to
ensure that the most appropriate telecommunications strategy is applied to his or her
business. No liability is accepted by the authors for omission or error. |
|
|
|