This article presents performance benchmarks captured for software CPU (MainConcept) transcoding, NVIDIA GPU accelerated transcoding, and AMD Xilinx accelerated transcoding. These numbers are for guidance and reference only. They represent the hardware's performance under basic conditions, measured on similarly sized AWS instances.
Our tests address a simple RTMP to HLS streaming scenario. More complicated use cases, such as those involving DVR recording, WebRTC setups, or transcoding CMAF HLS over HEVC, can drastically change these benchmarks. Your results may also vary depending on network traffic, source file composition, configuration, and overall operating system overhead.
Note: For information about how to measure the transcoding benchmarks for your Wowza Streaming Engine configuration, see Capture Transcoder benchmark statistics in Wowza Streaming Engine.
Recommendations
CPU transcoding is a good fit for simple workflows with smaller viewership and limited distribution. For example, this workflow may be more cost-efficient when streaming a weekly video with a single adaptive bitrate (ABR) transcoding to 200 viewers within a smaller geographic area.
CPU transcoding at scale is not suggested due to inefficiency and potentially higher server hardware and maintenance costs. GPU-accelerated encoding may provide a better, more cost-efficient alternative for more complex use cases.
Testing methodology
When running Wowza Streaming Engine at load, we recommend that operations don't exceed 85 percent of the total CPU usage. This leaves sufficient headroom for network interruptions or other unexpected issues that may occur when streaming.
Before executing the tests, the test servers were tuned using the guidelines in Tune Wowza Streaming Engine for optimal performance.
All tests were conducted using the following guidelines and steps:
- We loaded the hardware with as many incoming streams as possible until Wowza Streaming Engine reached approximately 90 percent CPU utilization. At that point, we collected benchmarks for 10 minutes while ensuring minimal skipped frames.
- When determining if a server can handle an additional stream, our approach differed based on the hardware:
  - For NVIDIA GPU transcoding, we used the NVIDIA System Management Interface (nvidia-smi) command line utility to display the GPU load. The 90 percent threshold served as an arbitrary value, high enough for processing while not completely overloading the GPU. Above this load, we experienced more skipped frames.
  - For CPU (MainConcept) transcoding, we used the same approach as the NVIDIA GPU tests, aiming for 90 percent usage while checking the console commands.
  - For AMD Xilinx transcoding, the card determines the amount of resources it can work with, measured in pixels per second. Each card can handle two 4K60fps streams (2 x 3840 x 2160 x 60 pixels per second) or eight 1080p60fps streams (8 x 1920 x 1080 x 60 pixels per second). We ran this hardware up to 100 percent utilization without encountering any issues. The number of streams you can handle is a direct calculation based on the resolution and frames per second.
- We monitored skipped frames and ensured the server didn't crash after reaching 90 percent usage, or the 100 percent threshold for the AMD Xilinx U30 card. Some skipped frames are expected and tolerated at that load, but we confirmed that they stayed within acceptable levels and close to zero. We also checked that the ALLFRAMESOFF message didn't appear.
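The GPU load check described for the NVIDIA tests can be scripted. The following is a minimal sketch, not the tooling used for these benchmarks: it assumes `nvidia-smi` is on the PATH and reads one sample from its `dmon -s u` device-monitoring mode. The function names and the choice of the `enc` column are illustrative assumptions; the 90 percent default mirrors the threshold mentioned above.

```python
import subprocess


def parse_dmon(text):
    """Parse `nvidia-smi dmon` text into one {column: value} dict per GPU row."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first comment line names the columns (gpu, sm, mem, enc, dec, ...).
            # Later comment lines (units, repeated headers) are skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        values = line.split()
        if columns is None or len(values) != len(columns):
            continue
        # Unsupported metrics are printed as "-", which is mapped to None here.
        rows.append({name: (int(v) if v.isdigit() else None)
                     for name, v in zip(columns, values)})
    return rows


def gpus_at_or_above(rows, threshold=90, column="enc"):
    """Return the indexes of GPUs whose `column` utilization is >= threshold."""
    return [r["gpu"] for r in rows
            if r.get(column) is not None and r[column] >= threshold]


def sample_gpu_load():
    """Take a single utilization sample from every GPU (requires nvidia-smi)."""
    result = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "u", "-c", "1"],
        capture_output=True, text=True, check=True,
    )
    return parse_dmon(result.stdout)
```

On a GPU instance, `gpus_at_or_above(sample_gpu_load())` would list the GPUs whose encoder load has reached the threshold, which is one possible signal to stop adding streams. Because the parser reads column names from the header, it should also tolerate driver versions that print additional columns.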
Input test streams
When measuring and analyzing performance benchmarks for the Transcoder in Wowza Streaming Engine, we used the following source files. You can use these files to standardize your process and benchmark your own instances:
All metadata is available when you view the file properties. Benchmarks were captured in August 2023 using an H.264/AAC 1080p source with 720p, 360p, and 140p transcoded renditions. These benchmarks don't account for hardware changes to any of the mentioned instance types after this time.
CPU (MainConcept) benchmarks
The following table summarizes benchmark test results for three Amazon EC2 C5 instances using CPU (MainConcept) transcoding. Download CPU (MainConcept) testing benchmarks
Wowza Streaming Engine release: 4.8.24+4
Implementation: CPU (MainConcept)
Operating system: Ubuntu 22.04.2
EC2 Instance Type | c5.4xlarge | c5.4xlarge | c5.4xlarge | c5.12xlarge | c5.12xlarge | c5.12xlarge | c5.24xlarge | c5.24xlarge | c5.24xlarge |
1080p Source FPS | 60 | 30 | 24 | 60 | 30 | 24 | 60 | 30 | 24 |
Number of vCPUs | 16 | 16 | 16 | 48 | 48 | 48 | 96 | 96 | 96 |
Instance Memory GB | 32 | 32 | 32 | 96 | 96 | 96 | 192 | 192 | 192 |
CPU Ingests | 2 | 4 | 5 | 6 | 10 | 13 | 10 | 18 | 20 |
CPU Utilization % | 99 | 99 | 99 | 96 | 97 | 99 | 88 | 93 | 90 |
Number of Transcoded Streams | 6 | 12 | 15 | 18 | 30 | 39 | 30 | 54 | 60 |
Hardware | Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz |
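As a rough aid for reading the table, the sketch below derives two figures from it: the transcoded stream count (each ingest produces three renditions, per the input test stream description) and the number of ingests sustained per vCPU. The data is copied from the table above; the dictionary layout and function names are illustrative assumptions.

```python
# Ingest counts copied from the CPU (MainConcept) table, keyed by source FPS.
CPU_RESULTS = {
    "c5.4xlarge": {"vcpus": 16, "ingests": {60: 2, 30: 4, 24: 5}},
    "c5.12xlarge": {"vcpus": 48, "ingests": {60: 6, 30: 10, 24: 13}},
    "c5.24xlarge": {"vcpus": 96, "ingests": {60: 10, 30: 18, 24: 20}},
}

# Each 1080p source was transcoded to three renditions in these tests.
RENDITIONS_PER_INGEST = 3


def transcoded_streams(ingests, renditions=RENDITIONS_PER_INGEST):
    """Number of output renditions produced for a given number of ingests."""
    return ingests * renditions


def ingests_per_vcpu(instance, fps):
    """Ingests sustained per vCPU for one instance type and source frame rate."""
    entry = CPU_RESULTS[instance]
    return entry["ingests"][fps] / entry["vcpus"]


if __name__ == "__main__":
    for name, entry in CPU_RESULTS.items():
        for fps, ingests in entry["ingests"].items():
            print(name, fps, transcoded_streams(ingests),
                  round(ingests_per_vcpu(name, fps), 3))
```

Comparing the per-vCPU figure across instance sizes gives a quick sense of how ingest capacity scales with instance size in these particular tests. It is not a sizing formula.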
NVIDIA GPU benchmarks
The following table summarizes the benchmark test results for five Amazon EC2 G4 instances using NVIDIA GPU accelerated transcoding. Download NVIDIA GPU testing benchmarks
Wowza Streaming Engine release: 4.8.24+4
Implementation: NVIDIA GPU
Operating system: Ubuntu 22.04.2
Driver Version: 525.85.05, CUDA Version: 12.0, Ubuntu 22.04.2 |
CPU | 2nd Generation Intel Xeon Scalable Processors (Cascade Lake P-8259L) |
EC2 Instance Type | g4dn.12xlarge | g4dn.12xlarge | g4dn.12xlarge | g4dn.xlarge | g4dn.xlarge | g4dn.xlarge | g4dn.2xlarge | g4dn.2xlarge | g4dn.2xlarge | g4dn.8xlarge | g4dn.8xlarge | g4dn.8xlarge | g4dn.16xlarge | g4dn.16xlarge | g4dn.16xlarge |
1080p Source FPS | 60 | 30 | 24 | 60 | 30 | 24 | 60 | 30 | 24 | 60 | 30 | 24 | 60 | 30 | 24 |
Number of GPUs | 4 | 4 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Number of vCPUs | 48 | 48 | 48 | 4 | 4 | 4 | 8 | 8 | 8 | 32 | 32 | 32 | 64 | 64 | 64 |
NVIDIA GPU | Tesla T4 |
Instance Memory GB | 192 | 192 | 192 | 16 | 16 | 16 | 32 | 32 | 32 | 128 | 128 | 128 | 256 | 256 | 256 |
Total Stream Ingests | 36 | 70 | 90 | 10 | 20 | 25 | 10 | 20 | 27 | 10 | 20 | 28 | 11 | 22 | 28 |
NVENC Encoding Utilization % | 90 | 90 | 90 | 99 | 99 | 99 | 90 | 99 | 99 | 90 | 90 | 99 | 99 | 99 | 100 |
NVDEC Decoding Utilization % | 50 | 50 | 40 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 55 | 55 | 54 |
CPU Utilization % | 20 | 20 | 20 | 50 | 60 | 60 | 30 | 30 | 30 | 10 | 10 | 10 | 6 | 6 | 6 |
GPU Utilization % | 20 | 20 | 20 | 25 | 25 | 25 | 20 | 20 | 20 | 20 | 20 | 22 | 24 | 25 | 24 |
Number of Transcoded Streams | 108 | 210 | 270 | 30 | 60 | 75 | 30 | 60 | 81 | 30 | 60 | 84 | 33 | 66 | 84 |
AMD Xilinx benchmarks
The following table summarizes the benchmark test results for two Amazon EC2 VT1 instances using AMD Xilinx accelerated transcoding. Download AMD Xilinx testing benchmarks
Wowza Streaming Engine release: 4.8.24+4
Implementation: AMD Xilinx U30 media accelerator card
Operating system: Ubuntu 22.04
Driver Version: 3.0.1 |
EC2 Instance Type | vt1.6xlarge | vt1.6xlarge | vt1.6xlarge | vt1.3xlarge | vt1.3xlarge | vt1.3xlarge |
1080p Source FPS | 60 | 30 | 24 | 60 | 30 | 24 |
Number of GPUs | 2 | 2 | 2 | 1 | 1 | 1 |
Number of vCPUs | 24 | 24 | 24 | 12 | 12 | 12 |
Instance Memory GB | 48 | 48 | 48 | 24 | 24 | 24 |
Total Stream Ingests | 16 | 32 | 36 | 8 | 16 | 18 |
Encoding Utilization % | 57.7 | 57.7 | 52 | 57.7 | 57.7 | 52 |
Decoding Utilization % | 100 | 100 | 90 | 100 | 100 | 90 |
CPU Utilization % | 16 | 20 | 16 | 15 | 17 | 14 |
GPU Utilization % | N/A | N/A | N/A | N/A | N/A | N/A |
Number of Transcoded Streams | 48 | 96 | 108 | 24 | 48 | 54 |
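The pixels-per-second budget described in the testing methodology can be turned into a small calculation. The sketch below assumes that the per-card budget is exactly the two 4K60fps streams stated earlier and that it is applied to the incoming 1080p streams. Under that assumption, the results coincide with the ingest counts and the Decoding Utilization row in the table above. The function names are illustrative and not part of any AMD Xilinx or Wowza tooling.

```python
# Per-card budget in pixels per second, taken from the stated capacity of
# two 4K60fps streams per U30 card.
CARD_BUDGET = 2 * 3840 * 2160 * 60


def pixel_rate(width, height, fps):
    """Pixels per second consumed by one stream."""
    return width * height * fps


def max_streams(width, height, fps, cards=1):
    """Largest whole number of identical streams that fits in the budget."""
    return (cards * CARD_BUDGET) // pixel_rate(width, height, fps)


def utilization_percent(streams, width, height, fps, cards=1):
    """Share of the pixel budget used by `streams` identical streams."""
    return 100 * streams * pixel_rate(width, height, fps) / (cards * CARD_BUDGET)


if __name__ == "__main__":
    print(max_streams(1920, 1080, 60))              # one card, 1080p60
    print(max_streams(1920, 1080, 30, cards=2))     # two cards, 1080p30
    print(utilization_percent(18, 1920, 1080, 24))  # 18 ingests at 1080p24
```

For example, 18 ingests at 1080p24 on a single card use 90 percent of the budget under this model, which lines up with the vt1.3xlarge column at 24 FPS. The model covers only the pixel budget; it says nothing about CPU, memory, or network limits.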