WebRTC App Development: Building Real-time Video/Audio Chats
Implementing WebRTC requires expertise in building real-time video/audio streaming apps, which can be a blocker for companies that work in this field but lack the appropriate engineering resources. It's easy to make mistakes and end up with an architecture that negates the benefits of WebRTC.
This guide aims to bridge that expertise gap and provide you with valuable insights into the practical aspects of building WebRTC solutions. Additionally, we’ll address common questions about WebRTC app development.
WebRTC in a Nutshell
Adding real-time communication capabilities to web apps has been gaining traction for a while. In this context, WebRTC emerged as a silver-bullet solution for connecting and communicating between modern browsers. Its primary objective was to enable real-time audio and video communication among users who start conversations in their web browsers.
Software engineers often see WebRTC as a straightforward concept: you open a UDP port, find out the peer's IP address and port, and encapsulate the traffic in RTP.
Now, let's look at what happens between capturing an image from the camera and playing the video back on the screen. The process unfolds in seven fundamental steps:
1. Camera capture
First, the browser needs access to the camera or microphone, which it requests through navigator.mediaDevices.getUserMedia() (the older navigator.getUserMedia is deprecated). The call resolves to a MediaStream once the user grants permission to use their camera and microphone. However, these raw media streams can't be sent to the other party as-is, because they are far too large without compression.
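The sketch below shows the two things an app typically prepares around getUserMedia(): a constraints object and a mapping from its failure modes to user-facing messages. The function names are hypothetical; the error names (NotAllowedError, NotFoundError, and so on) are the DOMException names the browser API rejects with. The actual browser call is left as a comment so the logic can run outside a browser.

```typescript
// Subset of the MediaStreamConstraints shape accepted by getUserMedia().
interface CaptureConstraints {
  audio: boolean;
  video: {
    width: { ideal: number };
    height: { ideal: number };
    frameRate: { ideal: number };
  };
}

// "ideal" values let the browser pick the closest supported mode instead of failing.
function buildCaptureConstraints(width: number, height: number, fps: number): CaptureConstraints {
  return {
    audio: true,
    video: {
      width: { ideal: width },
      height: { ideal: height },
      frameRate: { ideal: fps },
    },
  };
}

// Maps DOMException names that getUserMedia() can reject with to user-facing hints.
function explainCaptureError(err: { name: string }): string {
  switch (err.name) {
    case "NotAllowedError":
      return "The user or browser denied camera/microphone permission.";
    case "NotFoundError":
      return "No camera or microphone matching the constraints was found.";
    case "NotReadableError":
      return "The device is in use by another application or failed to start.";
    case "OverconstrainedError":
      return "No device can satisfy the requested constraints.";
    default:
      return `Capture failed: ${err.name}`;
  }
}

// In a browser (not executed in this sketch):
// try {
//   const stream = await navigator.mediaDevices.getUserMedia(buildCaptureConstraints(640, 480, 30));
// } catch (err) {
//   console.error(explainCaptureError(err as DOMException));
// }
```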
For instance, a single 640×480 frame stored as an uncompressed BMP with 32 bits per pixel is about 1.2 MB. At 30 frames per second, one second of video is about 36 MB, which means a required bit rate of about 288 Mbps (36 MB × 8 bits per byte). Real bitrate calculations are more complicated, but these approximations make the point: to make the data manageable for transfer, it must be compressed. Therefore, the next essential step is encoding.
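The arithmetic above can be reproduced in a few lines. This sketch assumes 4 bytes (32 bits) per pixel, which is what makes one 640×480 frame about 1.2 MB, and uses decimal megabits, as network rates are usually quoted.

```typescript
// Size in bytes of one uncompressed frame.
function rawFrameBytes(width: number, height: number, bytesPerPixel: number): number {
  return width * height * bytesPerPixel;
}

// Megabits per second needed to send every frame uncompressed.
function rawBitrateMbps(width: number, height: number, bytesPerPixel: number, fps: number): number {
  const bitsPerSecond = rawFrameBytes(width, height, bytesPerPixel) * fps * 8;
  return bitsPerSecond / 1_000_000;
}

const frameBytes = rawFrameBytes(640, 480, 4); // 1,228,800 bytes, about 1.2 MB
const mbps = rawBitrateMbps(640, 480, 4, 30); // about 294.9 Mbps
console.log(frameBytes, mbps.toFixed(1));
```

The exact figure, about 295 Mbps, is slightly higher than 288 Mbps only because the prose rounds the frame size down to 1.2 MB before multiplying.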
2. Encoding
Codecs compress audio and video streams. WebRTC supports several video codecs: VP8 and H.264 are mandatory to implement in browsers, and VP9 is widely supported. VP9 can encode 1280×720 video at 30 frames per second in only about 1.5 Mbps. It achieves this by sending a full keyframe and encoding subsequent frames as differences from previously decoded frames, so the more the picture changes between frames, the more data each encoded frame requires.
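To illustrate why inter-frame coding saves so much and why motion costs bits, here is a toy delta coder: the first frame is sent whole as a keyframe, and each later frame lists only the pixels that changed since the previous one. This is a deliberately simplified stand-in, not VP9's algorithm (real codecs use motion-compensated prediction and transform coding); frames are assumed to be flat arrays of grayscale values.

```typescript
type Frame = number[]; // one grayscale value per pixel, for simplicity

type EncodedFrame =
  | { kind: "key"; pixels: number[] }
  | { kind: "delta"; changes: Array<[number, number]> }; // [pixel index, new value]

// Keyframe first, then only the pixels that differ from the previous frame.
function encodeSequence(frames: Frame[]): EncodedFrame[] {
  const out: EncodedFrame[] = [];
  frames.forEach((frame, i) => {
    if (i === 0) {
      out.push({ kind: "key", pixels: [...frame] });
      return;
    }
    const prev = frames[i - 1];
    const changes: Array<[number, number]> = [];
    frame.forEach((value, idx) => {
      if (value !== prev[idx]) changes.push([idx, value]);
    });
    out.push({ kind: "delta", changes });
  });
  return out;
}

// Rebuilds frames by applying each delta to the previously decoded frame.
function decodeSequence(encoded: EncodedFrame[]): Frame[] {
  const frames: Frame[] = [];
  for (const ef of encoded) {
    if (ef.kind === "key") {
      frames.push([...ef.pixels]);
    } else {
      const next = [...frames[frames.length - 1]];
      for (const [idx, value] of ef.changes) next[idx] = value;
      frames.push(next);
    }
  }
  return frames;
}
```

A frame where one pixel changes produces a one-entry delta, while a frame where everything changes costs as much as a keyframe.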
3. Packing in RTP
Data is packed into RTP (Real-time Transport Protocol) packets, each of which carries a sequence number. This step is crucial because packets can arrive out of order or get lost, and the sequence numbers let the receiver reproduce the stream correctly. RTP also carries timestamps used to synchronize audio and video. The RTP header adds only about 5% overhead.
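Here is a minimal sketch of the 12-byte fixed RTP header from RFC 3550, showing where the sequence number and timestamp live. It omits CSRC lists, header extensions, and padding on the sending side, and it ignores SRTP encryption, which real WebRTC stacks always apply; the function names are hypothetical.

```typescript
interface RtpHeader {
  marker: boolean; // e.g. set on the last packet of a video frame
  payloadType: number; // 7 bits, mapped to a codec during SDP negotiation
  sequenceNumber: number; // 16 bits, +1 per packet, used to detect loss and reordering
  timestamp: number; // 32 bits, media clock (90 kHz for video)
  ssrc: number; // 32 bits, identifies the stream
}

function packRtp(h: RtpHeader, payload: Uint8Array): Uint8Array {
  const packet = new Uint8Array(12 + payload.length);
  const view = new DataView(packet.buffer);
  view.setUint8(0, 2 << 6); // version 2, no padding, no extension, no CSRCs
  view.setUint8(1, (h.marker ? 0x80 : 0) | (h.payloadType & 0x7f));
  view.setUint16(2, h.sequenceNumber & 0xffff); // DataView is big-endian (network order) by default
  view.setUint32(4, h.timestamp >>> 0);
  view.setUint32(8, h.ssrc >>> 0);
  packet.set(payload, 12);
  return packet;
}

function unpackRtp(packet: Uint8Array): { header: RtpHeader; payload: Uint8Array } {
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  if (packet.length < 12 || view.getUint8(0) >> 6 !== 2) throw new Error("not an RTP v2 packet");
  const csrcCount = view.getUint8(0) & 0x0f; // CSRC identifiers follow the fixed header
  const b1 = view.getUint8(1);
  return {
    header: {
      marker: (b1 & 0x80) !== 0,
      payloadType: b1 & 0x7f,
      sequenceNumber: view.getUint16(2),
      timestamp: view.getUint32(4),
      ssrc: view.getUint32(8),
    },
    payload: packet.subarray(12 + 4 * csrcCount), // header extensions and padding are not handled
  };
}
```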
4. Network Transmission over UDP
Data is sent in UDP packets, which keep delivery delay to a minimum. However, UDP has drawbacks: packets may be lost, arrive late, or arrive out of order.
5. Unpacking RTP
This step restores the packet order so that the video data can be passed to the decoder.
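The reordering step can be sketched as follows. RTP sequence numbers are 16-bit and wrap from 65535 to 0, so a plain numeric sort breaks near the wrap point; this sketch uses serial-number (modulo 2^16) arithmetic instead and also lists the gaps a receiver could request again with a NACK. The packet shape and function names are hypothetical, and the sketch assumes a buffered window far smaller than 32,768 packets.

```typescript
interface ReceivedPacket {
  seq: number; // 16-bit RTP sequence number
  payload: string;
}

// Signed distance from a to b modulo 2^16, in the range [-32768, 32767].
function seqDiff(a: number, b: number): number {
  const d = (b - a) & 0xffff;
  return d >= 0x8000 ? d - 0x10000 : d;
}

// Sorts packets by their distance from the first packet received.
function reorder(packets: ReceivedPacket[]): ReceivedPacket[] {
  if (packets.length === 0) return [];
  const base = packets[0].seq;
  return [...packets].sort((x, y) => seqDiff(base, x.seq) - seqDiff(base, y.seq));
}

// Sequence numbers missing between consecutive packets of an ordered run.
function findMissing(ordered: ReceivedPacket[]): number[] {
  const missing: number[] = [];
  for (let i = 1; i < ordered.length; i++) {
    const gap = seqDiff(ordered[i - 1].seq, ordered[i].seq);
    for (let k = 1; k < gap; k++) missing.push((ordered[i - 1].seq + k) & 0xffff);
  }
  return missing;
}
```

For packets received as 65534, 1, 65535, 0, 3, the reordered run is 65534, 65535, 0, 1, 3, and sequence number 2 is reported as missing.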
6. Decoding
The decoder receives the data in the correct order and decompresses it back into frames, producing a clean video stream, or MediaStream.
7. Drawing on the screen
The video stream is attached to a video element to display the image. During peer-to-peer browser communication, you may see video issues such as blocky artifacts ("squares") or freezes. These are often caused by packet loss, whether random loss on a lossy network, packets dropped along the way, or network congestion.
Building Real-time Video/Audio Chats & Other Web Apps with WebRTC
Let's dive into the process of WebRTC app development and look at the components an effective real-time video system requires.
There are several ways to implement WebRTC peer functionality outside of a browser environment:
- WebRTC Native APIs: The official WebRTC library, based on existing standards, provides a low-level approach to WebRTC. It’s written in C++, developed by Google, and covered by the BSD 3-clause license.
WebRTC Native APIs are powerful but can be challenging for beginners (see the simplest example for reference). The project offers separate SDKs for Android and iOS, so application logic may need to be duplicated across the two platforms.
- GStreamer: GStreamer is a dynamic, plugin-based media pipeline framework licensed under the LGPL. Written in C and built on the GObject type system, it offers an object-oriented model that is relatively easy to use from various programming languages through bindings. Since version 1.14, it has included a limited WebRTC implementation (the webrtcbin element), although some features, such as data channels, may be missing depending on the version. The LGPL's obligations around linking and distributing modifications make some companies wary of it for commercial products, so it is used less often in practice.
Additionally, there are abandoned repositories implementing WebRTC for various platforms based on libjingle, Google's earlier peer-to-peer library. However, much of its functionality has since been incorporated into the WebRTC codebase itself.
Despite the initial excitement surrounding WebRTC when it emerged in 2011, developers quickly realized that working with this technology was more complex than it appeared. WebRTC demands a sophisticated architecture.
The complexity of WebRTC stems from the need to adapt it to different browsers and the difficulty of configuring it when issues arise. To achieve the desired results, familiarity with STUN, TURN, and NAT traversal is essential.
At its core, WebRTC operates as a peer-to-peer technology, enabling direct client-to-client connections. Routing is negotiated through the ICE (Interactive Connectivity Establishment) protocol, which relies on additional STUN and/or TURN servers. Even in the worst-case scenario (two browsers behind symmetric NATs, which makes a direct connection impossible), all traffic can be rerouted through a TURN server.
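When connectivity fails, a useful first check is which ICE candidate types were gathered: host (local interface), srflx (public address learned via STUN), prflx (learned during connectivity checks), or relay (allocated on a TURN server). This sketch parses the standard candidate attribute format; in a browser the strings come from RTCPeerConnection "icecandidate" events. The function names are hypothetical, and the example addresses are documentation-range IPs.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  protocol: string;
  address: string;
  port: number;
  type: CandidateType;
}

const CANDIDATE_TYPES: string[] = ["host", "srflx", "prflx", "relay"];

function isCandidateType(t: string): t is CandidateType {
  return CANDIDATE_TYPES.indexOf(t) !== -1;
}

// Format: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(line: string): ParsedCandidate | null {
  const parts = line.replace(/^a=/, "").replace(/^candidate:/, "").trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ" || !isCandidateType(parts[7])) return null;
  return { protocol: parts[2].toLowerCase(), address: parts[4], port: Number(parts[5]), type: parts[7] };
}

// No srflx candidates often means STUN is unreachable; no relay candidates means TURN is not working.
function summarize(candidates: string[]): Record<CandidateType, number> {
  const counts: Record<CandidateType, number> = { host: 0, srflx: 0, prflx: 0, relay: 0 };
  for (const c of candidates) {
    const parsed = parseCandidate(c);
    if (parsed) counts[parsed.type]++;
  }
  return counts;
}
```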
On the backend side, three essential components are required:
- Multimedia server: This component’s necessity and usage are explained in a separate section.
- Signalling server: Responsible for implementing business logic, relaying session descriptions (SDP offers and answers) and ICE candidates between peers, and connecting clients to dedicated rooms on the media server.
- TURN/STUN server: Necessary for establishing connectivity between peers.
- A STUN server lets a peer discover its public IP address and port, which helps open a path through its NAT or firewall for multimedia traffic. In many cases where at least one peer isn't behind a symmetric NAT, STUN is enough to establish direct connectivity between peers. This is usually the case when a multimedia server is involved, because the project team controls that server's network setup.
- A TURN server, on the other hand, relays all traffic between peers when direct connections cannot be established, such as when both peers are behind symmetric NAT.
STUN servers need minimal resources per client, and there are public servers, such as Google-hosted stun.l.google.com, that can be used instead of running your own. However, relying on third-party services carries risks (availability, rate limits, privacy), so it's advisable to include a STUN server in your infrastructure from the outset.
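In the browser, STUN and TURN servers are passed to RTCPeerConnection through its configuration object. The sketch below builds such an object. The STUN URL is Google's public server mentioned above; "turn.example.com" and the credentials are placeholders for your own TURN deployment (for example, a coturn instance), and the builder function is hypothetical.

```typescript
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy: "all" | "relay";
}

function buildPeerConfig(turnUser: string, turnPassword: string, forceRelay = false): PeerConfig {
  return {
    iceServers: [
      { urls: "stun:stun.l.google.com:19302" },
      {
        // UDP first; TURN over TLS on port 443 helps clients behind strict firewalls.
        urls: ["turn:turn.example.com:3478?transport=udp", "turns:turn.example.com:443?transport=tcp"],
        username: turnUser,
        credential: turnPassword,
      },
    ],
    // "relay" forces all traffic through TURN, which is handy for testing the worst case.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

// In a browser (not executed here):
// const pc = new RTCPeerConnection(buildPeerConfig("user", "secret"));
```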
Additionally, the client can be either a statically served SPA or a feature-rich web client provided by a separate server. All these servers can be scaled independently. Multimedia servers are the most resource-intensive component, so their instance types should be chosen for the specific use case; TURN servers are the second most resource-intensive, although they are needed less often.
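The core job of the signalling server can be sketched as an in-memory room registry that relays offers, answers, and ICE candidates to their addressees. Transport (for example, WebSockets), authentication, persistence, and media-server integration are deliberately left out, and all names here are hypothetical.

```typescript
type SignalMessage =
  | { type: "offer" | "answer"; from: string; to: string; sdp: string }
  | { type: "candidate"; from: string; to: string; candidate: string };

type Send = (msg: SignalMessage) => void;

class SignalingRooms {
  private rooms = new Map<string, Map<string, Send>>();

  // Registers a peer and returns the peers already in the room.
  join(roomId: string, peerId: string, send: Send): string[] {
    let room = this.rooms.get(roomId);
    if (!room) {
      room = new Map<string, Send>();
      this.rooms.set(roomId, room);
    }
    const others = Array.from(room.keys());
    room.set(peerId, send);
    return others; // the newcomer can send offers to these peers
  }

  leave(roomId: string, peerId: string): void {
    const room = this.rooms.get(roomId);
    if (!room) return;
    room.delete(peerId);
    if (room.size === 0) this.rooms.delete(roomId);
  }

  // Forwards a message only to its addressee, and only between members of the same room.
  relay(roomId: string, msg: SignalMessage): boolean {
    const room = this.rooms.get(roomId);
    if (!room || !room.has(msg.from)) return false;
    const target = room.get(msg.to);
    if (!target) return false;
    target(msg);
    return true;
  }
}
```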
Best Practices for Building a Video Chat Web App with WebRTC
We will share some key points to pay special attention to when developing solutions with WebRTC. These insights will help ensure a successful project implementation.
Different browsers may offer varying levels of support for WebRTC features, leading to potential compatibility issues. For instance, support for advanced features like screen sharing or data channels can vary across browsers. To address this, it’s important to assess the feature support matrix for your target browsers and implement fallback mechanisms when needed.
WebRTC relies on direct peer-to-peer connections, but network configurations like firewalls and NATs can pose challenges. WebRTC handles network traversal with ICE (Interactive Connectivity Establishment), using STUN and TURN (Traversal Using Relays around NAT) servers, but implementation details differ between browsers. Adapting your app to these differences is crucial for smooth operation in various network environments.
Browser support for audio and video codecs may also differ, impacting the quality and compatibility of media streams. Using widely supported codecs and considering transcoding or fallback options can mitigate codec compatibility issues.
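One common fallback pattern is to prefer a codec while keeping every other supported codec as a fallback. In a browser, the codec list comes from RTCRtpReceiver.getCapabilities("video").codecs, and the reordered list is passed to the transceiver's setCodecPreferences(). The sketch below uses a minimal codec shape so that it runs standalone; the helper name is hypothetical.

```typescript
interface CodecCapability {
  mimeType: string; // e.g. "video/VP9"
  clockRate: number;
  sdpFmtpLine?: string;
}

// Moves codecs whose mimeType matches `preferred` (case-insensitively) to the front,
// keeping the browser's original order within each group.
function preferCodec(codecs: CodecCapability[], preferred: string): CodecCapability[] {
  const want = preferred.toLowerCase();
  const first = codecs.filter((c) => c.mimeType.toLowerCase() === want);
  const rest = codecs.filter((c) => c.mimeType.toLowerCase() !== want);
  return first.concat(rest);
}

// In a browser (not executed here):
// const caps = RTCRtpReceiver.getCapabilities("video");
// if (caps) transceiver.setCodecPreferences(preferCodec(caps.codecs, "video/VP9"));
```

If the preferred codec isn't supported, the list comes back unchanged, so negotiation simply falls back to whatever both sides support.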
To tackle compatibility challenges, we recommend considering the following practices:
- Rigorously test your app on target browsers and devices to identify and resolve compatibility issues.
- Employ feature detection to gracefully handle situations where specific APIs or features are unavailable.
- Use WebRTC libraries or frameworks with abstraction layers to manage browser inconsistencies.
- Stay updated on browser changes and WebRTC standards to adapt your app accordingly.
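Feature detection can be as simple as checking for the relevant APIs before enabling the UI that depends on them. In this sketch, the environment is passed in so that the logic can be tested outside a browser; in a page, you would pass globalThis. The interface and function names are hypothetical, while the checked APIs (RTCPeerConnection, getUserMedia, getDisplayMedia, createDataChannel) are the standard ones.

```typescript
interface WebRtcFeatures {
  peerConnection: boolean;
  camera: boolean; // navigator.mediaDevices.getUserMedia
  screenSharing: boolean; // navigator.mediaDevices.getDisplayMedia
  dataChannels: boolean; // RTCPeerConnection.prototype.createDataChannel
}

function detectWebRtcFeatures(env: any): WebRtcFeatures {
  const pc = env?.RTCPeerConnection;
  const devices = env?.navigator?.mediaDevices;
  return {
    peerConnection: typeof pc === "function",
    camera: typeof devices?.getUserMedia === "function",
    screenSharing: typeof devices?.getDisplayMedia === "function",
    dataChannels: typeof pc?.prototype?.createDataChannel === "function",
  };
}

// In a page (not executed here):
// const features = detectWebRtcFeatures(globalThis);
// if (!features.screenSharing) hideScreenShareButton(); // hypothetical UI fallback
```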
Ensuring Stable Video Communication
For stable video communication, addressing packet loss is essential, and there are four primary solutions:
- Jitter buffer: Delaying playout by about one round-trip time (RTT) leaves time to request retransmission of missing packets, reducing freezes when loss is significant. However, this adds a constant delay.
- Bitrate adjustment: Lowering or raising the bitrate to match available bandwidth by changing the frame rate, encoding quality, or resolution.
- Forward Error Correction: The sender (or the codec itself, as with Opus in-band FEC) adds redundant data to increase the chance of successful delivery, though the extra traffic can contribute to network congestion.
- Network optimization: Creating optimal network routes and configuring servers and routers can improve performance.
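Bitrate adjustment can be illustrated with a loss-based rule loosely modeled on the loss-based controller described in the Google Congestion Control (GCC) IETF draft: cut the rate when more than 10% of packets are lost, probe upward slowly when loss is under 2%, and hold otherwise. Browsers combine this with delay-based estimation, so treat the thresholds and limits below as illustrative; the function and interface names are hypothetical.

```typescript
interface BitrateLimits {
  minKbps: number;
  maxKbps: number;
}

function nextBitrateKbps(currentKbps: number, fractionLost: number, limits: BitrateLimits): number {
  let next = currentKbps;
  if (fractionLost > 0.1) {
    next = currentKbps * (1 - 0.5 * fractionLost); // heavy loss: cut in proportion to loss
  } else if (fractionLost < 0.02) {
    next = currentKbps * 1.05; // negligible loss: increase by 5%
  }
  // Between 2% and 10% loss the rate is held.
  return Math.min(limits.maxKbps, Math.max(limits.minKbps, next));
}

// In a browser, a chosen target can be applied to a sender (not executed here):
// const params = sender.getParameters();
// params.encodings[0].maxBitrate = Math.round(targetKbps * 1000); // bits per second
// await sender.setParameters(params);
```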
Security and Privacy
Securing real-time audio and video data transmission is critical due to sensitive user information being sent over the network. Implement encryption, authentication, and secure key exchange mechanisms to protect against eavesdropping, tampering, and unauthorized access.
WebRTC encrypts media with the Secure Real-time Transport Protocol (SRTP), with keys negotiated over DTLS (DTLS-SRTP), and this encryption is mandatory in the standard. Note that a media server decrypts and re-encrypts the streams it forwards, so it must be secured as carefully as the clients to maintain data confidentiality and integrity. Strong user authentication mechanisms should also be in place, such as secure tokens, standard authentication protocols, and access control, to prevent unauthorized access to WebRTC sessions.
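One practical check related to encryption: with DTLS-SRTP, each side's certificate fingerprint is carried in "a=fingerprint:" lines of the SDP. Extracting these fingerprints lets an app reject malformed session descriptions or show the fingerprints to both users for out-of-band comparison, which can reveal a tampered signalling path. This is a minimal sketch with hypothetical function names, not a replacement for the browser's own DTLS verification.

```typescript
interface Fingerprint {
  algorithm: string; // e.g. "sha-256"
  value: string; // colon-separated hex bytes
}

function extractFingerprints(sdp: string): Fingerprint[] {
  const prefix = "a=fingerprint:";
  const result: Fingerprint[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.indexOf(prefix) !== 0) continue;
    const [algorithm, value] = line.slice(prefix.length).split(" ");
    if (algorithm && value) result.push({ algorithm: algorithm.toLowerCase(), value: value.toUpperCase() });
  }
  return result;
}

// Rejects SDP that carries no DTLS fingerprint, since DTLS-SRTP keying could not work without it.
function assertEncryptedSdp(sdp: string): void {
  if (extractFingerprints(sdp).length === 0) {
    throw new Error("SDP has no a=fingerprint line; refusing the session");
  }
}
```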
WebRTC apps comprise various components, including signalling servers, media servers, and client-side code, making testing and debugging complex. Thorough testing, encompassing unit testing, integration testing, and real-world scenario testing, is vital to ensure the reliability and stability of WebRTC apps.
MobiDev Case Studies for WebRTC Development
We will illustrate practical business use cases of WebRTC through MobiDev’s case studies. These real-world examples will showcase how WebRTC can be applied to address specific business needs and enhance communication and collaboration solutions.
Case Study #1: ML-based single sign-on solution with WebRTC
Our client, a US-based company, needed a Face & Voice recognition authentication solution. The task was to develop an enterprise verification-as-a-service (EVaaS) tool to secure access to sensitive data.
Microservice-Based Architecture and WebRTC:
- Used microservices to compartmentalize functionality for easier development, support, and enhancements.
- Integrated WebRTC for real-time media data processing, essential for biometric authentication.
Machine Learning for Biometric Recognition:
- Developed the capability to identify users based on voice, photos, and questions.
- Collected initial voice and photo data sets and evaluated over 10 available solutions for validating US driver’s licenses, ultimately integrating Google Vision for its accuracy in real-world scenarios.
Development Team’s Contribution to Product Ideas:
- Fostered a startup-like spirit within the development team, encouraging creative thinking.
- Introduced features such as face anti-spoofing, question generation using NLP (Natural Language Processing), and additional voice and photo datasets, enriching the product.
Case Study #2: Remote assistance app with augmented reality and WebRTC
In this case study, we'll show how the combination of AR and WebRTC can take peer-to-peer communication beyond traditional 2D video calls. We worked on a remote assistance app enhanced by WebRTC.
Augmented Reality Remote Assistance in Action
Video chat technology has transformed remote assistance efficiency, but it lacks interactivity and hands-on guidance. Augmented reality bridges this gap by enabling more engaging remote interactions. Professionals can remotely guide users in a 3D space, enhancing troubleshooting and support.
For instance, imagine an equipment operator in a factory seeking assistance from a field technician. If an in-person visit isn’t possible or more information is needed, the field technician uses AR to pinpoint specific machine parts in a 3D space. This allows for detailed discussions and problem-solving, leading to effective solutions.
Implementing Remote Assistance with AR
The product relies on three key technologies:
- WebRTC: Facilitates real-time two-way peer-to-peer communication.
- ARKit (or ARCore for Android): Powers augmented reality experiences.
- Swift (iOS) or Kotlin (Android): Used for app development.
One limitation is that the augmented scene must remain static, making it unsuitable for applications in moving vehicles.
Building an AR Remote Assistance App
While the fundamental elements include real-time communication, AR frameworks, and app structure, the development process’s quality and vision determine the final product’s effectiveness. Consider platform compatibility, as native development may be preferable for iOS and Android. Integration with existing business structures and software, such as CRM or ERP systems, can be crucial for seamless data transfer and a personalized user experience.
WebRTC for IoT Solutions
Here are two notable examples of WebRTC applications in IoT products:
Remote Delivery Solution for Online Shopping and Logistics
In this project, the goal was to create an IoT product for a company deeply entrenched in online shopping and logistics. The primary aim was to implement a solution that could facilitate remote package delivery for both delivery companies and suburban residents.
A device was installed outside users’ homes, and we developed a dedicated mobile application for this purpose. Through this mobile app, residents living in suburban areas, who often spend the day commuting or at work, gained the ability to remotely unlock their smart mailboxes. This feature enabled them to send and receive deliveries with the assistance of couriers, regardless of the time of day. The real-time communication between users and couriers was made possible by the integration of WebRTC.
Smart Intercoms for Large Office Buildings
In this project, the focus was on smart intercoms designed for large office buildings equipped with a centralized access system at their entrances. These intercoms included screens and were intended to streamline access for office workers.
When external visitors arrived, they could enter the relevant office number, which initiated a connection with that office's terminal. Employees in the office could then see the visitor, grant access, and admit them into the building. The system recorded these events for security and monitoring purposes.
WebRTC played a pivotal role in both projects, as it provided users with the capability to engage in real-time communication, enhancing the functionality of these IoT solutions. If you want to learn more about the use of WebRTC with IoT, check our dedicated article that contains insights on the topic.
FAQ on WebRTC development
- What is the limit of users in a WebRTC call?
Different resources provide varying figures based on their experiences. From our practical experience, it’s advisable to focus on group calls with a maximum of 8 users, but the number of streams can be scaled to handle more users. The limitations arise from considerations related to server architecture and available bandwidth.
Remember that the architecture you develop directly influences the system’s bandwidth capabilities, and you can significantly reduce server load by taking business requirements into account. In cases involving a larger number of users in a single session, custom solutions are often developed to address specific needs.
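Rough stream counting shows why architecture matters so much here. In a mesh, every participant sends its stream directly to each of the others; with an SFU (Selective Forwarding Unit, a common kind of media server), each participant uploads one stream and the server forwards it to everyone else. This sketch ignores simulcast, audio-only participants, and active-speaker switching, and its function names are hypothetical.

```typescript
interface StreamCounts {
  uploadsPerClient: number;
  downloadsPerClient: number;
  streamsThroughServer: number; // inbound + outbound at the media server
}

function meshCounts(n: number): StreamCounts {
  return { uploadsPerClient: n - 1, downloadsPerClient: n - 1, streamsThroughServer: 0 };
}

function sfuCounts(n: number): StreamCounts {
  return { uploadsPerClient: 1, downloadsPerClient: n - 1, streamsThroughServer: n + n * (n - 1) };
}

console.log(meshCounts(8)); // each client uploads 7 streams
console.log(sfuCounts(8)); // the server handles 8 inbound + 56 outbound = 64 streams
```

The mesh puts the load on each client's uplink, while the SFU keeps client upload constant but makes server load grow roughly with the square of the group size.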
- What affects the price of developing a WebRTC solution?
The cost of developing and running a WebRTC solution is influenced by various factors. For simple one-to-one calls, a media server may not be necessary; only a signaling server is required. If the solution is intended for use across mobile networks and Wi-Fi, STUN/TURN servers may also be necessary. Because STUN/TURN and media servers are resource-constrained, the primary evaluation criterion for a project is the number of CONCURRENT users being served. This refers specifically to users who are actively streaming or viewing content, not just those who are online.
While a signaling server can handle hundreds of thousands of concurrent sessions, a media or TURN server can typically handle at most around 150-200 concurrent streams. Therefore, media and TURN servers often require as many server resources as the signaling server, or considerably more.
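The 150-200 streams-per-server rule of thumb turns into a quick capacity estimate. In this sketch, the per-server figure is an input rather than a guarantee (measure it for your own codecs, resolutions, and hardware), and the 20% headroom default is an assumption for absorbing spikes; the function name is hypothetical.

```typescript
function serversNeeded(concurrentStreams: number, streamsPerServer: number, headroom = 0.2): number {
  if (streamsPerServer <= 0) throw new Error("streamsPerServer must be positive");
  // Reserve a fraction of each server's capacity for traffic spikes.
  const usable = streamsPerServer * (1 - headroom);
  return Math.ceil(concurrentStreams / usable);
}

// 1,000 concurrent streams at a conservative 150 per server with 20% headroom:
// 1000 / (150 * 0.8) = 8.33..., so 9 servers.
console.log(serversNeeded(1000, 150));
```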
- Does WebRTC pose any security risks?
When properly implemented, WebRTC connections are inherently secure. WebRTC has native built-in features to address security concerns, including encryption and secure communication protocols. While WebRTC provides these inherent security features, it’s crucial for developers to have a comprehensive understanding of potential security risks specific to their applications. By implementing appropriate security measures, adhering to best practices, and regularly updating and testing the application for security vulnerabilities, developers can create secure WebRTC applications that safeguard user data and communications.
Onboarding WebRTC App Development Team
WebRTC app development is quite specific, and the technology is always evolving, making it essential to work with experts who have practical experience. Hiring an external WebRTC development company can bridge the expertise gap in your team and lay the foundation for effective real-time video communication in your product.
At MobiDev, our in-house experts are highly experienced in video streaming and WebRTC technologies. We have engineers with a strong background in WebRTC and solution architects who can help you create a robust product architecture and ensure smooth real-time video and audio communication.
WebRTC development services and solutions we provide:
- Video streaming libraries and frameworks integration
- Video conferencing platforms
- Multimedia file-sharing applications
- Live streaming and broadcasting solutions
- Integrating third-party video solutions like Zoom
- Custom API development
The MobiDev team will carefully assess your needs and define a solution that meets your tech and business requirements and may support you throughout your product development journey.
Feel free to get in touch with us and book a call with a MobiDev representative:
+48 790 675 136 (EU)
+1 267 944 6127 (USA/Canada)
Call us 10 a.m. – 7 p.m.