Science of IP: Making Sense of Ethernet Packets for Studio Video-Over-IP Networks

by Scott Barella, Utah Scientific CTO and VP of Engineering

We recently wrapped up my white paper and webinar on IT Essentials, a base-level overview for broadcasters who are new to IP and need an elementary introduction. Now we move on to more system-level architectures.

Video has been carried over Ethernet for decades, and the most common means of transporting it is the User Datagram Protocol (UDP). Since UDP doesn't require any connection management, it's well suited to data such as video essence. But here's the catch: UDP datagrams carry no sequence numbers, and that's a problem for video, which plays out as an ordered series of frames per second. It's critical that those frames arrive in the correct order, and that's where the Real-time Transport Protocol (RTP) comes in.
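To see why ordering is a problem, it helps to look at what a UDP header actually contains. A minimal sketch (the port numbers are just illustrative values) packs the 8-byte UDP header: source port, destination port, length, and checksum. Notice there is no field a receiver could use to detect a reordered or missing datagram.

```python
import struct

# A UDP header is only 8 bytes: source port, destination port,
# length, and checksum. There is no sequence number field, so a
# receiver cannot tell that datagrams arrived out of order.
def build_udp_header(src_port, dst_port, payload_len, checksum=0):
    length = 8 + payload_len  # header plus payload, in bytes
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

header = build_udp_header(5004, 5004, 1400)
src, dst, length, csum = struct.unpack("!HHHH", header)
```

Every field is accounted for in those 8 bytes; any ordering information has to come from a higher layer.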

RTP is ideal for video because each packet carries a sequence number and a timestamp, which solves the vital task of putting the packets back in order. Best of all, that timing information lives in the RTP header, entirely separate from the video payload, rather than as a marker embedded in the picture itself. That means we can finally ditch the old-school sync pulse that's existed since the beginning of video.
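That higher layer is the 12-byte fixed RTP header defined in RFC 3550. A minimal sketch of packing and parsing it (payload type 96 is an arbitrary dynamic-range value chosen here for illustration) shows where the sequence number and timestamp sit:

```python
import struct

RTP_VERSION = 2  # RFC 3550 fixed header version

def build_rtp_header(seq, timestamp, ssrc, payload_type=96, marker=0):
    """Pack the 12-byte fixed RTP header: V/P/X/CC, M/PT, seq, timestamp, SSRC."""
    byte0 = RTP_VERSION << 6                # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type    # marker bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)

def parse_rtp_header(data):
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": b0 >> 6, "marker": b1 >> 7,
            "payload_type": b1 & 0x7F, "seq": seq,
            "timestamp": ts, "ssrc": ssrc}

hdr = parse_rtp_header(build_rtp_header(seq=7, timestamp=123456, ssrc=0xDEADBEEF))
```

The receiver reorders on the 16-bit sequence number and schedules playout from the 32-bit timestamp; none of this touches the video payload bytes.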

The method used to time-stamp the RTP packets is Precision Time Protocol (PTP, IEEE 1588). The idea is that the transmitting device is responsible for reading the PTP messages on the network and stamping the RTP packets as they are emitted onto the Ethernet network. In this manner, PTP serves as the synchronization source. PTP, as a pure Ethernet timing method, is the ideal synchronization centerpiece of the new SMPTE ST 2110 standard — itself built from the ground up on Ethernet.

But what about audio? In their wisdom, the SMPTE ST 2110 authors saw no need to re-invent the wheel and adopted AES67 audio. Since AES67 is also based on time-stamped RTP packets, it's a near-perfect companion to the video RTP packets. Because an audio stream address will only carry one to eight channels at a time, it's likely that most systems will carry multiple audio addresses to accommodate, for instance, a multichannel group, a stereo group, and a language group, as well as compressed audio groups such as Dolby E or Dolby Digital.
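The one-to-eight-channel limit follows from packet sizing. With AES67's common defaults (48 kHz sampling, 24-bit samples, 1 ms packet time — assumptions stated here, not mandates from the article), each packet carries 48 samples per channel, and the payload grows linearly with channel count. A quick back-of-the-envelope sketch:

```python
# AES67 common case: 48 kHz, 24-bit (L24) samples, 1 ms packets.
# Payload size = samples per packet * channels * bytes per sample.
def aes67_payload_bytes(channels, sample_rate=48_000,
                        packet_time_s=0.001, bytes_per_sample=3):
    samples = int(sample_rate * packet_time_s)  # 48 samples at 1 ms
    return samples * channels * bytes_per_sample

stereo = aes67_payload_bytes(2)   # a stereo pair
group8 = aes67_payload_bytes(8)   # a full 8-channel group
```

An 8-channel group already approaches the payload budget of a standard Ethernet frame once RTP/UDP/IP headers are added, which is why larger channel counts are split across multiple stream addresses.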

Another data stream is ancillary data, traditionally carried within the video vertical blanking interval (lines 10-21). This stream carries closed captioning, AFD, and a number of other data services typically distributed by broadcast, cable, and satellite plants. So if the audio and video use RTP time-stamped packets, it makes perfect sense for the ancillary data to do likewise.
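On the wire, each ancillary data packet is identified by a DID/SDID pair (per SMPTE ST 291), so a receiver can route captions, AFD, and other services without parsing their contents. A minimal sketch with a few well-known pairs (the table is a small illustrative subset, not a complete registry):

```python
# Ancillary data services are identified by DID/SDID pairs (SMPTE ST 291).
# A few common services seen in broadcast plants:
ANC_SERVICES = {
    (0x61, 0x01): "CEA-708 closed captions (CDP)",   # SMPTE ST 334
    (0x61, 0x02): "CEA-608 closed captions",          # SMPTE ST 334
    (0x41, 0x05): "AFD and bar data",                 # SMPTE ST 2016
}

def identify_anc(did, sdid):
    return ANC_SERVICES.get((did, sdid), "unknown service")

service = identify_anc(0x61, 0x01)
```

In an ST 2110-40 stream, these same DID/SDID identifiers travel in RTP packets alongside the video and audio essence instead of hiding in blanking lines.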

Now our SMPTE ST 2110 soup contains PTP timing (ST 2110-10), video RTP packets (ST 2110-20), audio RTP packets (ST 2110-30), and ancillary data RTP packets (ST 2110-40), and it's a tasty soup indeed. The next challenge – which we'll address in a future post – is putting it all together in a workable combination.

Watch for my full white paper on this topic coming soon!