Video Transcoding in DiaStar
From ProjectDiaStar
Why Is Transcoding Needed?
The Advanced Video License in DiaStar provides the ability to play and record multimedia - audio and video - and while doing so, to convert from one format of audio or video stream to another. While this tech brief is mainly concerned with video transcoding, audio is also mentioned.
So, why are these conversions - usually referred to as transcoding - needed? Let's look at the following scenario:
The first caller, on a mobile device, requires MPEG4/QCIF (176x144) for his 3G-324M connection. The second has a SIP desktop phone that supports the H.264 video codec in CIF (352x288) format. The third caller, using a soft SIP phone on his PC, needs H.263/CIF. Without the ability to transcode, every clip would have to be recorded in all 3 video codecs and in every supported format. This may be feasible if there are only static, pre-produced video clips. But if callers record video that is then played to other callers, some codec requests could not be satisfied. Some sort of near-real-time, automated transcoding service would have to be in place to make sure all users and codecs could be handled. And, if support for a new codec were added, a new set of recordings would need to be made for it and its supported formats.
It is much more desirable to transcode "on the fly", as needed, when needed. DiaStar offers that capability.
For video conferencing, transcoding is a necessity. Consider the following:
Four conferees call into a video conference, and the resulting conference needs to be presented as a 4-tile picture, with a live image of each caller shown in one of the tiles. Mixing the 4 separate video streams into a single conference output requires that the streams be decoded into their basic YUV components. In a YUV color system, a black and white image (luminance - Y) is "colored" by 2 chrominance values - the blue (U) and red (V) color difference components. Once the video frames have been decoded into this "lowest common denominator" format, they can be easily manipulated - combined or overlaid with a text or graphic image - and then re-encoded into the desired video codec.
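The tiling step can be illustrated with a minimal sketch. It assumes four decoded QCIF luminance (Y) planes, each held as a list of pixel rows, and joins them into one CIF-sized 2x2 mosaic. The names `tile_2x2` and `solid_plane` are hypothetical helpers for illustration only and are not part of DiaStar; a real mixer would also have to tile the chrominance planes and scale inputs that are not QCIF.

```python
QCIF_W, QCIF_H = 176, 144  # QCIF frame size in pixels

def tile_2x2(planes):
    """Join four equally sized planes (lists of rows) into one 2x2 mosaic."""
    if len(planes) != 4:
        raise ValueError("expected exactly four planes")
    top_left, top_right, bottom_left, bottom_right = planes
    # Concatenate rows side by side, then stack the two halves vertically.
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom

def solid_plane(value, width=QCIF_W, height=QCIF_H):
    """Build a test plane filled with a single luminance value."""
    return [[value] * width for _ in range(height)]

# Four flat gray test images stand in for the decoded caller frames.
planes = [solid_plane(v) for v in (16, 80, 160, 235)]
mosaic = tile_2x2(planes)
print(len(mosaic[0]), "x", len(mosaic))  # width x height of the mosaic
```

Four QCIF tiles arranged 2x2 give a 352x288 picture, which is exactly CIF, so the mosaic can be re-encoded at CIF without further scaling.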
This is the same as is done with audio for conferencing - compressed audio streams are decoded to linear PCM (pulse code modulation - digitized, uncompressed audio), mixed, and then re-encoded into the desired audio codec.
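The audio case can be sketched in a few lines. The example below assumes G.711 Mu-law input, decodes each byte to a signed 16-bit linear sample, and mixes two streams by summing and clamping. The function names are hypothetical, and re-encoding the mixed result is left out for brevity; this is an illustration of the decode-mix principle, not DiaStar's implementation.

```python
def ulaw_to_linear(byte):
    """Decode one G.711 Mu-law byte to a signed 16-bit linear PCM sample."""
    byte = ~byte & 0xFF              # Mu-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude

def mix(streams):
    """Sum time-aligned linear samples and clamp the result to 16 bits."""
    return [max(-32768, min(32767, sum(samples))) for samples in zip(*streams)]

caller_a = [ulaw_to_linear(b) for b in (0xFF, 0x80, 0x80)]
caller_b = [ulaw_to_linear(b) for b in (0xFF, 0xFF, 0x80)]
mixed = mix([caller_a, caller_b])
print(mixed)
```

Clamping is the simplest way to handle overflow when two loud samples coincide; a production mixer might apply gain control instead.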
So, not only may callers arrive at a conference using different video codecs - the video conference mixing itself requires transcoding.
Media Streams in DiaStar
Several sets of media streams are in play between a DiaStar server and its client. For the following examples, we will use an Asterisk client. Understanding the streams will help in deciding how to use transcoding.
Streams for Play and Record
The simplest scenario is when SIP video calls terminate on the DiaStar server. There are two bidirectional RTP streams - one audio and one video - set up between the calling SIP endpoint and the DiaStar Media Engine. Characteristics for the streams such as codecs, format, framerate, etc. are obtained from the SDP parts of the SIP messages that are used in setting up the call.
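As an illustration of how stream characteristics can be read from SDP, the following sketch extracts the media lines and their `a=rtpmap` codec mappings from an SDP body. The sample SDP, the addresses, and the function name `parse_sdp_media` are made up for this example; DiaStar's own SDP handling is not shown in this brief.

```python
SAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49172 RTP/AVP 96
a=rtpmap:96 H264/90000
"""

def parse_sdp_media(sdp_text):
    """Return one dict per m= line, with the rtpmap codecs that follow it."""
    media = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("m="):
            kind, port, proto, *payloads = line[2:].split()
            media.append({"kind": kind, "port": int(port), "proto": proto,
                          "payloads": payloads, "codecs": {}})
        elif line.startswith("a=rtpmap:") and media:
            # Format: a=rtpmap:<payload type> <name>/<clock rate>[/<params>]
            payload, encoding = line[len("a=rtpmap:"):].split(None, 1)
            name, _, rest = encoding.partition("/")
            media[-1]["codecs"][payload] = (name, int(rest.split("/")[0]))
    return media

streams = parse_sdp_media(SAMPLE_SDP)
for stream in streams:
    print(stream["kind"], stream["port"], stream["codecs"])
```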
A second set of streams is established between the DiaStar server and the Asterisk client as part of the Woomera signaling between the two entities.
This second set of streams makes the media from the outside call available to the Asterisk application for operations such as bridging to another Asterisk technology, for example a PSTN call. The media may also be used by another Asterisk application such as audio voicemail.
Keep in mind that Asterisk does not transcode video. So whatever video codec is used by Asterisk SIP must also be used in the video stream to/from DiaStar. As there is presently limited video support in Asterisk, the video codec setting for this stream is best set to H.263, the most likely choice for Asterisk video.
Thus, if an H.264 video clip is being played to a SIP caller who has requested H.264 and an MPEG4 clip is being played to a caller who has requested MPEG4, transcoded H.263 video of each will be made available to Asterisk for its use.
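The rule described above can be summarized in a small sketch: for each leg (the caller and Asterisk), transcoding is needed only when the leg's codec differs from the codec of the clip being played. The function `transcode_plan` and its return shape are hypothetical, and the codec names reuse the values listed for `video_format` in the configuration file.

```python
def transcode_plan(clip_codec, caller_codec, asterisk_codec="h263"):
    """Map each leg to None (pass-through) or a (source, target) codec pair."""
    plan = {}
    for leg, wanted in (("caller", caller_codec), ("asterisk", asterisk_codec)):
        plan[leg] = None if wanted == clip_codec else (clip_codec, wanted)
    return plan

# The two cases from the text: the caller leg needs no transcoding,
# while the Asterisk leg always receives H.263.
print(transcode_plan("h264", "h264"))
print(transcode_plan("mpv4-es", "mpv4-es"))
```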
Streams for Video Conferencing
Video conferencing is somewhat more complicated. The conference is mixed on DiaStar. But for a call that also terminates on DiaStar, the streams "loop through" Asterisk. In order to do this, video conferences on DiaStar are set up as a separate "protocol" from Asterisk. The call termination protocols used by DiaStar are sip, isdn and ss7, while the conference setup protocol is conf.
This means that a call coming into DiaStar as part of a video conference does so using the conf protocol, along with various parameters that describe the conference. The call enters the conference from the Asterisk platform. Here is a typical dialplan command to create conference #1, with 4 video tiles and a timeout of 30 seconds on the Dial if the conference cannot be reached:
Dial(WOOMERA/conf:1/tiles=4,30)
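For applications that generate dialplan entries, the argument string can be assembled programmatically. The helper below is a hypothetical sketch that reproduces only the pattern of the example above (conference number, `tiles` parameter, Dial timeout); any other conference parameters and their syntax are not covered here.

```python
def conference_dial_args(conference_id, tiles, timeout_s):
    """Build the Dial() argument for a DiaStar conference, per the example above."""
    if tiles < 1:
        raise ValueError("a conference needs at least one tile")
    return f"WOOMERA/conf:{conference_id}/tiles={tiles},{timeout_s}"

dial_line = f"Dial({conference_dial_args(1, 4, 30)})"
print(dial_line)
```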
A conference with two DiaStar callers and an audio-only Asterisk caller results in the following media streams:
Notice the video transcoding taking place to and from the video conferees.
Setting Asterisk to DiaStar Codecs
Codecs for these streams are set in the DiaStar configuration file (default name/location /etc/diastar/diastar.conf):
; The following section is used to configure the Woomera protocol.
[woomera]
; audio_format defines the codec used for audio rtp [pcmu | pcma].
audio_format = pcmu
; video_format defines the codec used for video rtp [h263 | h263-1998 | mpv4-es | h264].
video_format = h263
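A script can sanity-check these two settings before the server is restarted. The sketch below assumes the `[woomera]` section is plain INI syntax that Python's `configparser` can read, which may not hold for every construct in a real diastar.conf. The function name and the embedded sample text are illustrative only.

```python
import configparser

SAMPLE_CONF = """\
; The following section is used to configure the Woomera protocol.
[woomera]
; audio_format defines the codec used for audio rtp [pcmu | pcma].
audio_format = pcmu
; video_format defines the codec used for video rtp [h263 | h263-1998 | mpv4-es | h264].
video_format = h263
"""

AUDIO_FORMATS = {"pcmu", "pcma"}
VIDEO_FORMATS = {"h263", "h263-1998", "mpv4-es", "h264"}

def read_woomera_codecs(text):
    """Return (audio_format, video_format), rejecting values not listed above."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["woomera"]
    audio = section["audio_format"].strip().lower()
    video = section["video_format"].strip().lower()
    if audio not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio}")
    if video not in VIDEO_FORMATS:
        raise ValueError(f"unsupported video_format: {video}")
    return audio, video

codecs = read_woomera_codecs(SAMPLE_CONF)
print(codecs)
```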
Choosing the Asterisk-DiaStar video codec depends on the expected users. If most or all endpoints will use H.263, and some H.263 video may come in from the Asterisk side, H.263 is the obvious choice. On the other hand, for a conferencing application using H.264 endpoints, choosing the higher quality H.264 for streaming between Asterisk and DiaStar avoids transcoding, which results in better quality video and fewer CPU cycles used.
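One simple way to apply this reasoning is to pick the codec that most of the expected endpoints already use, so that the fewest streams need transcoding. The heuristic and the name `suggest_video_format` are assumptions made for illustration; they are not a DiaStar feature, and real deployments may weigh video quality or CPU cost differently.

```python
from collections import Counter

ALLOWED_VIDEO_FORMATS = ("h263", "h263-1998", "mpv4-es", "h264")

def suggest_video_format(expected_endpoint_codecs, default="h263"):
    """Suggest the video_format shared by the largest number of endpoints."""
    counts = Counter(c for c in expected_endpoint_codecs
                     if c in ALLOWED_VIDEO_FORMATS)
    if not counts:
        return default  # H.263 is the fallback recommended for Asterisk video
    return counts.most_common(1)[0][0]

print(suggest_video_format(["h264", "h264", "h263"]))
print(suggest_video_format([]))
```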
The audio codec choice is simpler. The two options differ very little - both are PCM, using either Mu-law (PCMU) or A-law (PCMA) companding. Choose PCMU or PCMA according to which is more likely to be used by the SIP or PSTN endpoints. In the PSTN, PCMU is used in Japan and the US (T1) and PCMA (E1) elsewhere.




