WebRTC for New Users: Terminology and How It Works
WebRTC stands for Web Real-Time Communication. It’s a low-latency technology that uses JavaScript APIs to access your computer’s camera and microphone, thereby enabling media to be sent back and forth directly between two peers. With WebRTC, you can live stream via a browser (like Chrome or Firefox) without using a plugin or app (like Zoom). And because WebRTC transports data in a matter of milliseconds, it’s perfect for interactive use cases like video chats.
WebRTC is a powerful tool that’s quickly becoming the technology of choice for real-time streaming. It also works across all browsers and mobile operating systems that support the WebRTC APIs.
So, without drowning in all the terminology, how does WebRTC work? Let’s break it down into digestible chunks.
WebRTC Defined
You will typically see this description for WebRTC:
- WebRTC provides real-time, peer-to-peer communication between web applications.
Translation:
- WebRTC lets your browser talk directly to your friends’ browsers.
You might be wondering: Doesn’t that already happen when two people are communicating through their computers? To clarify, let’s review what a typical exchange over the public internet looks like.
HTTP Requests
Say you’re checking your Gmail and start by clicking the Gmail icon. This action sends a request to open the Gmail login page. Google receives the request, processes it, and returns the login page. At that point, you’re prompted to enter your login info. Once you do so and press Enter, you send a new request to Google to open the page containing your emails. Google must receive this request and verify your login credentials before opening your inbox.
This is how common HTTP requests work, and the same request-and-response chain repeats for each action you take on the site.
With WebRTC, you’re able to skip the request and response chain entirely. Once the initial connection is established between the two computers, WebRTC lets users freely send media back and forth in real time.
A common misunderstanding for those new to WebRTC is that with a peer-to-peer connection, a server isn’t required. In reality, a server is essential to create the initial connection — but you’re able to communicate directly after that. We’ll discuss this later on.
WebRTC APIs
WebRTC uses three APIs:
- getUserMedia
- RTCPeerConnection
- RTCDataChannel
The getUserMedia API is built into Chrome and Firefox and allows browsers to capture the output from the camera and microphone to stream voice and video. This is what makes WebRTC so powerful: it enables browsers to take video and audio information and expose it as JavaScript objects.
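Here is a minimal sketch of a camera and microphone capture request, assuming a browser that exposes `navigator.mediaDevices.getUserMedia`. The `CaptureConstraints` and `MediaDevicesLike` interfaces and the `captureCameraAndMic` helper are illustrative names, not part of WebRTC; they exist so the sketch can be read and exercised outside a browser. The 1280x720 resolution is an arbitrary choice.

```typescript
// Constraints passed to getUserMedia: which kinds of tracks to request.
interface CaptureConstraints {
  audio: boolean;
  video: boolean | { width: number; height: number };
}

// The slice of navigator.mediaDevices this sketch relies on (illustrative).
interface MediaDevicesLike<TStream> {
  getUserMedia(constraints: CaptureConstraints): Promise<TStream>;
}

// Ask for microphone audio plus 1280x720 video. In a browser the user is
// prompted for permission, and the promise rejects if they decline.
async function captureCameraAndMic<TStream>(
  devices: MediaDevicesLike<TStream>,
): Promise<TStream> {
  const constraints: CaptureConstraints = {
    audio: true,
    video: { width: 1280, height: 720 },
  };
  return devices.getUserMedia(constraints);
}

// Assumed browser usage:
//   const stream = await captureCameraAndMic(navigator.mediaDevices);
//   videoElement.srcObject = stream;
```

The resolved stream is the JavaScript object that later steps hand to the peer connection.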
The RTCPeerConnection API connects your local computer with the remote peer and maintains a stable connection for efficient communication.
Finally, the RTCDataChannel API enables the exchange of all non-audiovisual data (such as text-based chats and image sharing).
Use Case: Video Chat Without Using an App
To better demonstrate where each API comes into play, we’ll now walk through the step-by-step process for a common use case: video-based chat.
Step 1: Get Your Browser Info
The first thing you need to do to start a WebRTC-powered video chat is reach out to your friend’s computer to say, “hey, here’s my IP address, let’s have a video chat!” Of course, in order to do that, you’ll actually need to know your computer’s contact info. This is no different from the way you would need to know your phone number if you wanted someone to call you. And similar to telephones, every internet-connected computer has a unique number known as the IP address.
In order to keep the IP address safe from potential hackers, it’s typically protected by a firewall. To describe this in more technical terms, your computer sits behind a Network Address Translation (NAT) device. A NAT device does not expose a computer’s private IP address to the outside world. Instead, it presents a public-facing IP address and translates between the public and private addresses. If you have a wireless router in your home or office, then you’re most likely using NAT.
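To make the translation idea concrete, here is a toy model of a NAT table. It is a deliberately simplified sketch: the `ToyNat` class, the starting port of 40000, and the addresses (taken from ranges reserved for documentation and private networks) are all assumptions for illustration, and real NAT devices vary in how they allocate and filter mappings.

```typescript
// A toy NAT: maps private "ip:port" endpoints to ports on one public IP.
class ToyNat {
  private publicIp: string;
  private nextPort = 40000; // arbitrary starting port for this sketch
  private outbound = new Map<string, number>(); // private endpoint -> public port
  private inbound = new Map<number, string>(); // public port -> private endpoint

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // An inside host sends a packet out: allocate a mapping, or reuse one.
  translateOutbound(privateIp: string, privatePort: number): string {
    const key = `${privateIp}:${privatePort}`;
    let port = this.outbound.get(key);
    if (port === undefined) {
      port = this.nextPort++;
      this.outbound.set(key, port);
      this.inbound.set(port, key);
    }
    return `${this.publicIp}:${port}`;
  }

  // A packet arrives from outside: only ports with a known mapping resolve.
  translateInbound(publicPort: number): string | undefined {
    return this.inbound.get(publicPort);
  }
}

const nat = new ToyNat("203.0.113.7");
const publicEndpoint = nat.translateOutbound("192.168.1.20", 5000);
const insideEndpoint = nat.translateInbound(40000);
```

In this model the outside world only ever sees `publicEndpoint`, which is why your browser cannot simply read its own private address and hand it to a friend.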
So, here you are. You want to have a video chat with your friend using WebRTC, but you need to know the public IP address for both your computer and your friend’s in order to connect the two browsers.
How do you get it? This is where signaling enters the picture.
Step 2: Signaling
You will typically see this description for signaling:
- Signaling most often uses the ICE protocol to generate media traversal candidates, which can then be used in WebRTC applications.
Translation:
- Signaling allows you to get the public IP address that the firewall was blocking access to.
- Signaling is not a standard included in WebRTC and therefore includes many protocol options.
- To share and view the necessary information to create a secure connection, signaling most often uses the ICE, STUN, and TURN protocols.
These acronyms will all make sense soon, so don’t you worry! Let’s start by spelling them out.
ICE Candidates
Signaling uses a protocol called Interactive Connectivity Establishment (ICE). The ICE protocol, as mentioned above, enables two browsers to connect and agree on the best way to create a secure connection. ICE makes establishing a connection between peers very efficient, but in order to do so, it requires quick access to a peer’s ICE candidates. These ICE candidates are the methods that one peer can use to connect and exchange data. A list of ICE candidates usually includes information like the IP address, port, and transport protocols that will be used in the secure WebRTC application.
There are different types of candidates, but ICE can obtain this information with the help of the STUN and TURN protocols.
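To see what a candidate actually contains, the sketch below parses a single candidate line into its fields. The `parseIceCandidate` helper and its types are hypothetical names, and the sample line in the usage is hand-written with documentation addresses rather than captured from a real session. It only reads the leading fields and ignores trailing extensions such as `raddr` and `rport`.

```typescript
// Candidate types defined by ICE: a local address (host), an address learned
// via STUN (srflx), one discovered during checks (prflx), or a TURN relay.
type IceCandidateKind = "host" | "srflx" | "prflx" | "relay";

interface ParsedIceCandidate {
  foundation: string;
  component: number; // 1 = RTP, 2 = RTCP
  transport: string; // "udp" or "tcp"
  priority: number;
  address: string;
  port: number;
  kind: IceCandidateKind;
}

function isIceCandidateKind(value: string): value is IceCandidateKind {
  return value === "host" || value === "srflx" || value === "prflx" || value === "relay";
}

// Expected shape:
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseIceCandidate(line: string): ParsedIceCandidate {
  const fields = line.trim().replace(/^a=/, "").replace(/^candidate:/, "").split(/\s+/);
  const [
    foundation = "", component = "", transport = "", priority = "",
    address = "", port = "", typKeyword = "", kind = "",
  ] = fields;
  if (typKeyword !== "typ" || !isIceCandidateKind(kind)) {
    throw new Error(`not a recognizable ICE candidate: ${line}`);
  }
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    kind,
  };
}

const sampleCandidate = parseIceCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.20 rport 46154",
);
```

Here `sampleCandidate.kind` is `"srflx"`, meaning the address was learned with the help of a STUN server, which is the case covered next.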
STUN Server
A Session Traversal Utilities for NAT (STUN) server is located on the public internet and is capable of seeing the public IP address and port for your browser. For example, you can use Google’s test STUN server to reveal your public IP address, and then use that information to reach out or “signal” your friend’s computer to start your video chat connection. Sometimes a STUN server fails to get around the NAT firewall and is unable to obtain the public IP address. In these cases, TURN servers come in handy.
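Under the hood, the browser does this by sending the STUN server a small Binding Request and reading the address the server reports back. The sketch below builds the 20-byte request header and decodes an IPv4 XOR-MAPPED-ADDRESS value, following the message layout in the STUN specification. The function names are illustrative, the sample bytes are hand-computed, and the UDP send and receive are left out because the browser handles them for you.

```typescript
const STUN_MAGIC_COOKIE = 0x2112a442;

// Build a STUN Binding Request with no attributes: 2-byte type, 2-byte length,
// 4-byte magic cookie, then a 12-byte transaction ID chosen by the caller.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const packet = new Uint8Array(20);
  const view = new DataView(packet.buffer);
  view.setUint16(0, 0x0001); // message type: Binding Request
  view.setUint16(2, 0); // message length: zero bytes of attributes
  view.setUint32(4, STUN_MAGIC_COOKIE);
  packet.set(transactionId, 8);
  return packet;
}

// Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute. The server XORs
// the port and address with the magic cookie before sending them back.
function decodeXorMappedAddress(value: Uint8Array): { address: string; port: number } {
  if (value.length < 8 || value[1] !== 0x01) {
    throw new Error("expected an IPv4 XOR-MAPPED-ADDRESS value");
  }
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  const port = view.getUint16(2) ^ (STUN_MAGIC_COOKIE >>> 16);
  const raw = (view.getUint32(4) ^ STUN_MAGIC_COOKIE) >>> 0;
  const address = [raw >>> 24, (raw >>> 16) & 0xff, (raw >>> 8) & 0xff, raw & 0xff].join(".");
  return { address, port };
}

const stunRequest = buildBindingRequest(new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]));
const stunMapped = decodeXorMappedAddress(
  new Uint8Array([0x00, 0x01, 0x95, 0x58, 0xea, 0x12, 0xd5, 0x45]),
);
```

The decoded `stunMapped` value is the public address and port that the STUN server saw, which is the information you pass along when you signal your friend.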
TURN Server
When a STUN server fails to retrieve the public IP address due to an especially tricky NAT firewall, a Traversal Using Relays around NAT (TURN) server can be used as an extension of STUN. The primary job of the TURN server is to act as a relay: it receives your media (video and audio) and promptly delivers it to your friend’s computer. This is different from the direct browser-to-browser exchange set up with the help of a STUN server and is not a true peer-to-peer connection. A TURN server can also be used when you have a very large amount of data to send. That said, relaying often comes with much larger overhead as far as bandwidth requirements and costs go.
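In practice, you tell the browser which STUN and TURN servers to try through the configuration passed to `RTCPeerConnection`. The sketch below builds that configuration. The `buildIceConfig` helper is a hypothetical name, `stun:stun.l.google.com:19302` is a commonly used public Google STUN address (the article does not name one, so treat it as an assumption), and the TURN URL and credentials in the usage are placeholders.

```typescript
// Shape of one entry in the iceServers list of an RTCPeerConnection config.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface TurnSettings {
  url: string;
  username: string;
  credential: string;
}

// Always offer a STUN server; add a TURN relay as a fallback when one is given.
function buildIceConfig(turn?: TurnSettings): { iceServers: IceServerEntry[] } {
  const iceServers: IceServerEntry[] = [
    { urls: "stun:stun.l.google.com:19302" }, // assumed public STUN server
  ];
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const iceConfig = buildIceConfig({
  url: "turn:turn.example.com:3478", // placeholder TURN server
  username: "demo-user",
  credential: "demo-secret",
});

// Assumed browser usage:
//   const pc = new RTCPeerConnection(iceConfig);
```

With both entries present, ICE can prefer a direct path and fall back to the relay only when the direct path fails.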
For our use case, let’s assume the STUN server was successful in obtaining your computer’s public-facing IP address and move on.
Step 3: Session Description Protocol (SDP)
You will typically see this description for SDP:
- The Session Description Protocol is a standard for defining the parameters for the exchange of media (often streaming media) between two endpoints.
Translation:
- SDP negotiates and creates a secure channel with one or more peers by declaring a compatible set of parameters.
If we examine the process more closely, we would see two endpoints participating in a session, with each endpoint sending an SDP declaration to inform the other endpoint of its specifications and capabilities. SDP does not, in itself, deliver any media. Rather, its role is to communicate and negotiate a compatible set of media exchange parameters.
A typical SDP declaration would tell us:
- Which IP address is prepared to receive the incoming media stream.
- Which port number is listening for the incoming media stream.
- What media type the endpoint is expecting to receive.
- Which protocol the endpoint is expecting to exchange information in.
- Which codec the endpoint is capable of decoding.
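The items in this list map onto individual lines of an SDP declaration. The sketch below pulls them out of a hand-written, heavily simplified SDP with a single media section. The `summarizeSdp` helper and the sample text are illustrative assumptions; SDP generated by a real browser is much longer and often carries placeholder address and port values that ICE fills in later.

```typescript
interface SdpSummary {
  address: string; // from the c= line
  port: number; // from the m= line
  mediaType: string; // from the m= line
  protocol: string; // from the m= line
  codecs: string[]; // from a=rtpmap lines
}

function summarizeSdp(sdp: string): SdpSummary {
  const summary: SdpSummary = { address: "", port: 0, mediaType: "", protocol: "", codecs: [] };
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("c=")) {
      // c=<network type> <address type> <connection address>
      const [, , address = ""] = line.slice(2).split(" ");
      summary.address = address;
    } else if (line.startsWith("m=")) {
      // m=<media> <port> <protocol> <payload types...>
      const [mediaType = "", port = "0", protocol = ""] = line.slice(2).split(" ");
      summary.mediaType = mediaType;
      summary.port = Number(port);
      summary.protocol = protocol;
    } else if (line.startsWith("a=rtpmap:")) {
      // a=rtpmap:<payload type> <codec name>/<clock rate>
      const [, encoding = ""] = line.slice("a=rtpmap:".length).split(" ");
      const [codec = ""] = encoding.split("/");
      summary.codecs.push(codec);
    }
  }
  return summary;
}

const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=video 46154 UDP/TLS/RTP/SAVPF 96",
  "c=IN IP4 203.0.113.7",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const sdpSummary = summarizeSdp(sampleSdp);
```

For this sample, `sdpSummary` reports a video stream expected at 203.0.113.7 on port 46154, carried over `UDP/TLS/RTP/SAVPF`, with VP8 as the codec.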
In the previous signaling step, we assumed that you successfully used a STUN server to retrieve your ICE candidates. ICE can now pass that information to the RTCPeerConnection API, which negotiates and establishes the secure session with your peer. Remember that encryption is mandatory for any WebRTC data exchange, and browsers generally only allow camera and microphone capture on secure (HTTPS) pages, so be sure to already have SSL/TLS configured.
By describing your session with SDP, you can create an offer using the RTCPeerConnection API and send it to your friend over your signaling channel. Your offer is basically saying to your friend’s browser, “hey, let’s create a secure channel to share our media, here are my ICE candidates.” This information is organized and displayed using SDP.
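The calling side of that exchange can be sketched as follows. The `OfferingPeer` interface mirrors the `createOffer` and `setLocalDescription` method names on `RTCPeerConnection`, but it is a simplified stand-in so the sketch runs outside a browser. The `send` callback and the JSON message shape are assumptions, since WebRTC leaves the signaling transport (a WebSocket, for example) up to you.

```typescript
// Simplified stand-in for the session description objects WebRTC passes around.
interface OfferDescription {
  type: string;
  sdp?: string;
}

// The two RTCPeerConnection methods the caller needs for this step.
interface OfferingPeer {
  createOffer(): Promise<OfferDescription>;
  setLocalDescription(description: OfferDescription): Promise<void>;
}

// Create an offer, record it as our local description, then hand the SDP to
// whatever signaling transport the application uses (hypothetical callback).
async function sendOffer(
  pc: OfferingPeer,
  send: (message: string) => void,
): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send(JSON.stringify({ kind: "offer", sdp: offer.sdp }));
}

// Assumed browser usage, with `socket` as an already-open WebSocket:
//   const pc = new RTCPeerConnection();
//   await sendOffer(pc, (message) => socket.send(message));
```

Setting the local description before sending matters: it is the step that lets the browser start gathering its ICE candidates.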
Step 4: Exchanging Media
Once your friend’s computer receives your offer, it can respond with a list of their ICE candidates. Remember, ICE candidates are things like a computer’s IP address, port, etc. Your computer and its peer will review each other’s browser information and agree upon which of the candidates both support and want to use. There are more steps involved that are outside the scope of this article, but know that the RTCPeerConnection API also encodes, decodes, handles network issues, and sends the media across the network.
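On the receiving side, the response can be sketched like this. The `AnsweringPeer` interface mirrors the `setRemoteDescription`, `createAnswer`, and `setLocalDescription` method names on `RTCPeerConnection` in simplified form, and the `send` callback and message shape are hypothetical, standing in for whatever signaling transport the application uses.

```typescript
// Simplified stand-in for the session description objects WebRTC passes around.
interface AnswerDescription {
  type: string;
  sdp?: string;
}

// The RTCPeerConnection methods the answering side needs for this step.
interface AnsweringPeer {
  setRemoteDescription(description: AnswerDescription): Promise<void>;
  createAnswer(): Promise<AnswerDescription>;
  setLocalDescription(description: AnswerDescription): Promise<void>;
}

// Apply the caller's offer, generate a matching answer, store it locally,
// and return the answer's SDP over the signaling transport.
async function answerOffer(
  pc: AnsweringPeer,
  offerSdp: string,
  send: (message: string) => void,
): Promise<void> {
  await pc.setRemoteDescription({ type: "offer", sdp: offerSdp });
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  send(JSON.stringify({ kind: "answer", sdp: answer.sdp }));
}

// Assumed browser usage, with `socket` as an already-open WebSocket:
//   await answerOffer(pc, receivedOfferSdp, (message) => socket.send(message));
```

The order is the important part: the remote offer has to be applied before an answer can be created, because the answer only lists what both sides support.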
Once the agreement is complete between you (the agent) and your friend (the peer), a secure channel will be created. At this point, you’re ready to begin sharing your video and audio media. You’re also welcome to have other peers join in, which would require following the same steps.
But what if you’d like to type messages back and forth during your video chat? In addition to audio and video, WebRTC supports real-time communication for other types of data like text chat and gaming graphics. The RTCDataChannel API is used to share this type of information across the secure channel, and it’s kept protected via the mandatory encryption requirement for WebRTC.
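A text chat over a data channel can be sketched as below. The `ChatChannel` interface covers only the `send` method and `onmessage` handler that the sketch uses; a real `RTCDataChannel` offers the same members with a richer event type, so a small adapter or cast may be needed in a browser. The `ChatMessage` shape and helper names are assumptions made for illustration.

```typescript
interface ChatMessage {
  from: string;
  text: string;
  sentAt: number; // milliseconds since the Unix epoch
}

// The slice of RTCDataChannel this sketch relies on (illustrative interface).
interface ChatChannel {
  send(data: string): void;
  onmessage: ((event: { data: unknown }) => void) | null;
}

// Serialize a chat message as JSON and push it through the channel.
function sendChat(channel: ChatChannel, from: string, text: string): void {
  const message: ChatMessage = { from, text, sentAt: Date.now() };
  channel.send(JSON.stringify(message));
}

// Register a handler that decodes incoming JSON text messages.
function listenForChat(
  channel: ChatChannel,
  handler: (message: ChatMessage) => void,
): void {
  channel.onmessage = (event) => {
    if (typeof event.data !== "string") return; // ignore binary payloads
    handler(JSON.parse(event.data) as ChatMessage);
  };
}

// Assumed browser usage:
//   const channel = pc.createDataChannel("chat");
//   channel.onopen = () => channel.send("hello");
```

Because the channel rides on the same peer connection as the media, chat messages take the same encrypted path as your video and audio.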
WebRTC Recap
Whew! That was a lot. But you made it and now have a high-level overview of how WebRTC works.
To recap:
- When initiating a video chat, you and your peer need a STUN or TURN server to get around any NAT firewalls and obtain each other’s browser info.
- You create an offer with your ICE candidates and send it to your peer through SDP.
- Your peer will respond with their ICE candidates.
- The two browsers will negotiate and create a secure, encrypted channel.
- At the completion of these steps, WebRTC will allow you to share video, audio, and text data across that channel for real-time communication.
Using Wowza to Build WebRTC Applications
Wowza Streaming Engine
With Wowza Streaming Engine, you can ingest and play WebRTC streams with all major desktop and mobile browsers that support WebRTC APIs.
Wowza Streaming Engine supports the following codecs for WebRTC:
Video | Audio |
|
See the complete list of WebRTC workflows with Streaming Engine here.
Wowza Video
The Wowza Video platform supports WebRTC ingest and will transcode the WebRTC stream through the live stream or transcoder workflow.
For WebRTC streams created through the live stream workflow, Wowza Video provides a hosted publish page that automatically applies your live stream settings and allows you to start streaming right away.
Wowza Video also delivers Real-Time Streaming at Scale, a WebRTC-based solution for sub-second streaming to a million viewers.
See all WebRTC workflows available with Wowza Video here.