This article explains methods for tuning Wowza Streaming Engine™ media server software for optimal performance on your hardware configuration.
Access Wowza Streaming Engine performance tuning
In Wowza Streaming Engine Manager, click the Server tab at the top of the page, and then click Performance Tuning in the contents panel.
The Performance Tuning page shows the server's OS architecture, the amount of memory available to Wowza Streaming Engine, the number of core processors on the system, and the Java version and architecture (bitness) in use.
The contents panel also provides access to various performance tuning options, including Java settings, server thread pools, and virtual hosts.
Tune Java settings
By default, Wowza Streaming Engine 4.2.0 (and later) installs with a supported version of Java and is tuned to a development state optimized for the hardware on which it's running. The server can be tuned to a production level by adjusting the Java settings. Knowing that the server is running to the best of its ability makes Wowza Streaming Engine easy to deploy in a production environment. Advanced users can alter the tuning further if required.
Note: If you're running Wowza Streaming Engine 4.1.2 or earlier, you'll get best results if you run the most recent Oracle Java Development Kit (JDK). The best option is to run a 64-bit OS with the 64-bit Java VM, which enables Java heap sizes greater than 2 GB. For details, see Manually install and troubleshoot Java on Wowza Streaming Engine.
- Click Java Settings in the contents panel. The Java Settings page shows the current Java settings, including the Java Heap Size, which is the amount of memory allocated to Wowza Streaming Engine, and the Java Garbage Collection Settings.
- To change these settings, click Edit. Click Save after making changes to any of the following settings.
Java heap size
The default for the Java Heap Size is the Development level. To run a dedicated server in a production environment, change the Java Heap Size setting to the Production level.
If your server runs other services that have high memory consumption or you encounter OutOfMemory errors while running the server, you may want to change the Java Heap Size setting to the Custom level and enter a specific value. If you're running the 64-bit version of the Java VM and have 4 GB or more of RAM in your computer, set your Java heap size to between 3000 MB and 5000 MB. If you have at least 16 GB of RAM, set your heap size to 8000 MB. In Wowza Streaming Engine Manager, you can specify a maximum of 100000 MB, which should be sufficient for most streaming needs.
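The Custom heap guidance above can be sketched as a small lookup. This is only an illustration of the figures quoted in this article, assuming a 64-bit Java VM; the function name `suggest_heap_range_mb` and the constant are hypothetical and are not part of Wowza Streaming Engine.

```python
from typing import Optional, Tuple

# Largest value the Manager accepts for a custom heap, per the text above.
MANAGER_MAX_HEAP_MB = 100000


def suggest_heap_range_mb(ram_gb: float) -> Optional[Tuple[int, int]]:
    """Return a (low_mb, high_mb) custom heap suggestion for a 64-bit Java VM.

    Returns None below 4 GB of RAM, where this article gives no figure.
    """
    if ram_gb >= 16:
        return (8000, 8000)
    if ram_gb >= 4:
        return (3000, 5000)
    return None


for ram in (2, 8, 32):
    print(f"{ram} GB RAM -> {suggest_heap_range_mb(ram)}")
```

Treat the result as a starting point; servers that run other memory-hungry services may need a smaller value.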
Java garbage collection settings
Garbage collection (GC) tuning in Java can be challenging because settings on one server may not function the same across servers. Through trial and error and customer feedback, we recommend the configurations outlined in this section.
Garbage-First (G1) Garbage Collector
The Garbage-First (G1) Garbage Collector is the default for Wowza Streaming Engine versions 4.8.27 and earlier. If you update to Wowza Streaming Engine 4.8.28 or later, you'll keep the G1 garbage collection settings from your earlier version. New installations of Wowza Streaming Engine 4.8.28 and later include support for the Generational Z Garbage Collector (ZGC).
We recommend using the default G1 tuning, which works well for many streaming situations and doesn't require modification. The G1 collector is designed for low pause time and high-throughput applications. It's a server-style garbage collector targeted for multi-processor computers with large memories, and it's fully supported in Oracle JDK 7 Update 4 and later releases.
Note: You can adjust the pause time for the G1 collector in Custom collector settings. The following is an example of a suggested custom setting:
-XX:+UseG1GC -XX:MaxGCPauseMillis=100
Generational Z Garbage Collector (ZGC)
The Generational Z Garbage Collector (ZGC) is fully supported, starting with Java 21. It serves as the default for new installations of Wowza Streaming Engine 4.8.28 and later that add support for Java 21. If you manually roll back the Wowza Streaming Engine 4.8.28 installer to use an earlier Java version, you must also update the garbage collection settings to avoid startup errors. For compatibility information, see the OpenJDK ZGC wiki.
Note: To use Java 21 with Generational ZGC on Windows, ensure you're running a 64-bit version of Windows 10 (version 1803 or later) or Windows Server 2019 or later. You also need a 64-bit JVM.
Generational ZGC extends the Z Garbage Collector and improves performance and efficiency, especially for applications with long-running processes and large heaps. Generational ZGC splits the heap into generations for young (short-lived) objects and old (long-lived) objects. Each generation is collected independently. ZGC can then focus on gathering profitable young objects that require fewer resources and yield more memory.
To optimize performance, we recommend using the default Generational ZGC tuning. This configuration helps to manage heap memory efficiently and aims to keep garbage collection pauses minimal, which is critical for streaming services that require low latency and high performance:
-XX:+UseZGC -XX:+ZGenerational -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 -XX:InitiatingHeapOccupancyPercent=70
- -XX:+UseZGC: Activates the Z Garbage Collector for low-latency memory management.
- -XX:+ZGenerational: Enables generational collection within ZGC to better handle short-lived objects.
- -XX:MaxGCPauseMillis=200: Aims to keep garbage collection pause times below 200 milliseconds.
- -XX:ParallelGCThreads=4: Uses four threads for parallel garbage collection tasks.
- -XX:ConcGCThreads=2: Uses two threads for concurrent garbage collection tasks.
- -XX:InitiatingHeapOccupancyPercent=70: Triggers garbage collection when heap occupancy reaches 70 percent.
NUMA-aware allocator (Custom collector settings)
A NUMA-aware allocator enables application performance optimization on computers with non-uniform memory architecture (NUMA) by increasing the application's use of lower latency memory. These are typically computers with multiple physical CPU sockets. By default, this option is disabled, and no optimization for NUMA is made. The option is only available when the parallel garbage collector is used (-XX:+UseParallelGC).
For example, to enable NUMA optimization on a multi-CPU socket system, enter the following in Custom collector settings:
-XX:+UseParallelGC -XX:+UseNUMA
Monitor GC pause times
To monitor GC pause times, open [install-dir]/conf/Tune.xml in a text editor and then add the following <VMOption> property:
For Linux:
<VMOptions>
    <VMOption>-server</VMOption>
    <VMOption>-Djava.net.preferIPv4Stack=true</VMOption>
    <VMOption>-Xlog:gc*,gc+heap=trace,safepoint*:file=${com.wowza.wms.AppHome}/logs/gc_${com.wowza.wms.StartupDateTime}.log:time,level,tags</VMOption>
</VMOptions>
For Windows:
<VMOptions>
    <VMOption>-server</VMOption>
    <VMOption>-Djava.net.preferIPv4Stack=true</VMOption>
    <VMOption>-Xlog:gc*,gc+heap=trace,safepoint*:file="\"${com.wowza.wms.AppHome}/logs/gc_${com.wowza.wms.StartupDateTime}.log\"":time,level,tags</VMOption>
</VMOptions>
Then restart Wowza Streaming Engine to apply the changes.
Note:
- On Windows, this property creates a log file for debugging purposes and shouldn't be left running. The gc_${com.wowza.wms.StartupDateTime}.log file is only created when starting Wowza Streaming Engine in service mode or if using the command line.
Tune server thread pools
Click Server Thread Pools in the contents panel. The Server Thread Pools page shows the current Handler Thread Pool Size and Transport Thread Pool Size.
To change these settings, click Edit. When left at Set automatically, Wowza Streaming Engine calculates the Handler Thread Pool Size and Transport Thread Pool Size as follows:
- Handler Thread Pool Size = 60 x Processor Cores
- Transport Thread Pool Size = 40 x Processor Cores
To use the default number of server-level threads in the handler and transport thread pools, select Set automatically. Otherwise, select the alternate option button and enter a specific number of threads (between 10 and 4096) in the box.
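As a rough sketch, the automatic calculation and the manual range check described above might look like the following. The function names are hypothetical, and using `os.cpu_count()` (which reports logical processors) as the core count is an assumption; the value Wowza Streaming Engine actually uses is the one shown on the Performance Tuning page.

```python
import os


def auto_thread_pools(cores: int) -> dict:
    """Automatic server thread pool sizes, per the formulas in this section."""
    return {
        "handler_thread_pool_size": 60 * cores,
        "transport_thread_pool_size": 40 * cores,
    }


def validate_manual_pool_size(size: int) -> int:
    """Accept a manually entered pool size only if it is within 10..4096."""
    if not 10 <= size <= 4096:
        raise ValueError("pool size must be between 10 and 4096")
    return size


cores = os.cpu_count() or 1  # assumption: logical processors stand in for cores
print(auto_thread_pools(cores))
```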
Tune Media Cache
Click Media Cache Tuning in the contents panel. Media Cache Tuning settings apply to VOD Edge applications. They include the current Writer Thread Pool, Readahead Thread Pool, Maximum Pending Write Request Size, and Maximum Pending Readahead Request Size.
To change these settings, click Edit. When left at Set automatically, Wowza Streaming Engine calculates the Writer Thread Pool and Readahead Thread Pool as follows:
- Writer Thread Pool = 2 x Processor Cores
- Readahead Thread Pool = 1 x Processor Cores
Maximum Pending Write Request Size and Maximum Pending Readahead Request Size are calculated based on the Java Heap Size.
Java Heap Size | 1200 MB to 3999 MB | 4000 MB to 7999 MB | 8000 MB or greater |
Maximum Pending Write Request Size | 160 MB | 500 MB | 1000 MB |
Maximum Pending Readahead Request Size | 80 MB | 250 MB | 500 MB |
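The formulas and table above can be combined into one lookup, sketched here for illustration. The function name and dictionary keys are hypothetical; heap sizes below 1200 MB are rejected because the table does not cover them.

```python
def media_cache_auto_settings(cores: int, heap_mb: int) -> dict:
    """Automatic Media Cache values derived from core count and Java heap size."""
    if heap_mb >= 8000:
        write_mb, readahead_mb = 1000, 500
    elif heap_mb >= 4000:
        write_mb, readahead_mb = 500, 250
    elif heap_mb >= 1200:
        write_mb, readahead_mb = 160, 80
    else:
        raise ValueError("the table does not cover heap sizes under 1200 MB")
    return {
        "writer_thread_pool": 2 * cores,
        "readahead_thread_pool": 1 * cores,
        "max_pending_write_request_mb": write_mb,
        "max_pending_readahead_request_mb": readahead_mb,
    }


# Example: 8 cores with a 5000 MB heap falls in the middle column.
print(media_cache_auto_settings(8, 5000))
```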
Tune virtual host processors
Click Virtual Host Processors in the contents panel. The Virtual Host Processors page shows the number of threads used at the VHost level to service various connection types.
To change the settings, click Edit. When left at Set automatically, Wowza Streaming Engine calculates the values as follows:
- Net Connections Processor Count = 4 x Processor Cores
- Media Caster Processor Count = 4 x Processor Cores
- Idle Worker Count = 2 x Processor Cores
- Unicast Incoming Processor Count = 2 x Processor Cores
- Unicast Outgoing Processor Count = 2 x Processor Cores
- Multicast Incoming Processor Count = 2 x Processor Cores
- Multicast Outgoing Processor Count = 2 x Processor Cores
Notes:
- For the Net Connections Processor Count and Media Caster Processor Count:
If the product of 4 times the number of processor cores is less than or equal to 6, then the value displayed is 6; if the product is greater than or equal to 32, the value displayed is 32.
- For the Idle Worker Count, Unicast Incoming Processor Count, Unicast Outgoing Processor Count, Multicast Incoming Processor Count, and Multicast Outgoing Processor Count:
If the product of 2 times the number of processor cores is less than or equal to 4, the value displayed is 4; if the product is greater than or equal to 24, the value displayed is 24.
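The formulas and the limits in the notes above amount to a clamped multiplication. The following is a minimal sketch of that arithmetic; the function names and dictionary keys are hypothetical and only mirror the labels on the Virtual Host Processors page.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Keep value within the inclusive range low..high."""
    return max(low, min(value, high))


def auto_vhost_processors(cores: int) -> dict:
    """Automatic VHost processor counts with the displayed limits applied."""
    four_x = clamp(4 * cores, 6, 32)  # Net Connections, Media Caster
    two_x = clamp(2 * cores, 4, 24)   # Idle Worker and the RTP processors
    return {
        "net_connections": four_x,
        "media_caster": four_x,
        "idle_worker": two_x,
        "unicast_incoming": two_x,
        "unicast_outgoing": two_x,
        "multicast_incoming": two_x,
        "multicast_outgoing": two_x,
    }


for cores in (1, 4, 16):
    print(cores, auto_vhost_processors(cores))
```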
The Client Idle Frequency is the time (in milliseconds) between idle events for Adobe Flash client connections. For basic on-demand streaming, 250 ms provides the best reliability-to-performance ratio. For live, low-latency streaming, 125 to 250 ms is better. If you aren't doing low-latency streaming, Client Idle Frequency can be increased to 500, which reduces CPU usage and allows more concurrent connections. Values between 1 and 1000 are supported.
The RTP Idle Frequency value is the time (in milliseconds) between idle events for RTP connections. Values between 1 and 1000 are supported.
Tune virtual host ports
The Virtual Host Ports page shows the current open ports and the processor count associated with each port.
To change the settings, click Edit. When left at Set automatically, Wowza Streaming Engine sets the default processor counts as follows:
- Port 1935 Processor Count = 2 x Processor Cores
- Port 8086 Processor Count = 2 x Processor Cores
Tune virtual host thread pools
Click Virtual Host Thread Pools in the contents panel. Virtual Host Thread Pools are required when running multiple virtual hosts (VHosts) on the server.
If you're running multiple virtual hosts, configure the Handler Thread Pool Size and Transport Thread Pool Size to Use Server Settings. When configured to Use Server Settings, the handler and transport threads are divided equally between all of the active VHosts that are running on the server.
Additional tuning options
Optimize Transcoder Memory Utilization (Linux Only)
When using Wowza Streaming Engine to perform transcoding, memory utilization may be higher than expected. Under certain circumstances, this can cause out-of-memory issues and may result in server crashes. The most common cause is repeatedly publishing and unpublishing streams, for example, if you frequently publish and unpublish while streaming WebRTC.
To prevent memory overutilization, configure the MALLOC_ARENA_MAX parameter to limit the number of memory arenas the process can create. By default, the limit is 8 x Processor Cores. Limiting the number of arenas could impact performance; we recommend starting with a limit of 4 x Processor Cores and then lowering it further if needed.
For example:
- For a server with 8 processor cores, start by limiting the number of arenas to 32 (4 x 8):
export MALLOC_ARENA_MAX=32
- If the memory utilization is still high, limit the number of arenas further until you prevent memory overutilization:
export MALLOC_ARENA_MAX=4
Note: MALLOC_ARENA_MAX can be set to as low as 1, but performance could be impacted.
Add the parameter as an entry at the end of the setenv.sh file found in [install-dir]/bin/ as shown in the preceding examples. After you save your changes, restart Wowza Streaming Engine.
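To plan which values to try, a short helper can list candidates starting at 4 x Processor Cores and working down to 1. Halving at each step is an assumption made for illustration; this article only says to limit further as needed. The function names are hypothetical.

```python
def arena_max_candidates(cores: int) -> list:
    """Values to try in order: 4 x cores first, then halving down to 1."""
    value = 4 * cores
    candidates = []
    while value >= 1:
        candidates.append(value)
        value //= 2  # assumed step; any decreasing sequence would do
    return candidates


def setenv_line(arena_max: int) -> str:
    """Format the line to append to [install-dir]/bin/setenv.sh."""
    return f"export MALLOC_ARENA_MAX={arena_max}"


for candidate in arena_max_candidates(8):
    print(setenv_line(candidate))
```

Apply one value at a time, restart Wowza Streaming Engine, and observe memory utilization before moving to the next lower value.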
Optimize Transcoder performance
In addition to making sure that the deployed server that is running Wowza Streaming Engine is tuned properly, refer to the following guidance to optimize transcoder performance:
- Determine the available server-to-client bandwidth. Increasing the target bitrate will increase the quality. When you make this kind of change, keep in mind that clients must have enough bandwidth available to play the higher bitrate stream.
- Hardware acceleration is recommended but not required for transcoding. For information about the hardware acceleration resources Wowza Streaming Engine is using, see Verify how Transcoder is running in Wowza Streaming Engine.
- Use the performance benchmark numbers for software (default MainConcept) encoding, NVIDIA NVENC accelerated transcoding, and AMD Xilinx accelerated transcoding in the Wowza Streaming Engine Transcoder performance benchmark article as a guideline for estimating your transcoding performance.
Note: Whether you have multiple encoding presets in one template or multiple templates, performance isn't affected given the same number of incoming live streams and the same number of encoded output renditions.
Optimize for low-latency chat applications
For low-latency chat applications, use smaller socket buffer sizes (16000 bytes for read and write). Socket buffer sizes are configured in [install-dir]/conf/VHost.xml in the <NetConnections>/<SocketConfiguration> property:
<ReceiveBufferSize>16000</ReceiveBufferSize> <SendBufferSize>16000</SendBufferSize>
- Click the Server tab at the top of the page.
- In the Server contents panel, click Virtual Host Setup.
- On the Virtual Host Setup page Properties tab, click Net Connections in the Quick Links bar.
Note: Access to the Properties tab is limited to administrators with advanced permissions. For more information, see Manage credentials.
- In the Net Connections properties area, click Edit, adjust the receiveBufferSize and sendBufferSize property values, and then click Save.
- Restart the virtual host when prompted to apply the changes.
Optimize network socket performance
For optimal network socket performance, set the receive and send socket buffer sizes to 0. Socket buffer sizes are configured in [install-dir]/conf/VHost.xml in the <NetConnections>/<SocketConfiguration> property:
<ReceiveBufferSize>0</ReceiveBufferSize> <SendBufferSize>0</SendBufferSize>
- Click the Server tab at the top of the page.
- In the Server contents panel, click Virtual Host Setup.
- On the Virtual Host Setup page Properties tab, click Net Connections in the Quick Links bar.
Note: Access to the Properties tab is limited to administrators with advanced permissions. For more information, see Manage credentials.
- In the Net Connections properties area, click Edit, adjust the receiveBufferSize and sendBufferSize property values, and then click Save.
- Restart the virtual host when prompted to apply the changes.
CPU resource-based tuning
To tune Wowza Streaming Engine based on the available CPU resources of your server, use the following guidelines:
The [total-core-count] refers to the total number of CPU cores in your server. For example, if you have dual quad-core processors (two quad-core processors), your [total-core-count] is 8.
If your server supports hyper-threading, then use the total number of threads. If hyper-threading is available on a system with dual quad-core processors, for example, the total number of threads is:
2 processors x 4 cores x 2 threads per core = 16
With the number of cores and threads per physical processor continually growing, we suggest a maximum number of threads for each value below, which you can set in the configuration file [install-dir]/conf/VHost.xml:
- HostPort/ProcessorCount – 2 x [total-core-count] (maximum of 32)
- IdleWorkers/WorkerCount – 2 x [total-core-count] (maximum of 32)
- NetConnections/ProcessorCount – 4 x [total-core-count] (maximum of 32)
- RTP/UnicastIncoming/ProcessorCount – 2 x [total-core-count] (maximum of 12)
- RTP/UnicastOutgoing/ProcessorCount – 2 x [total-core-count] (maximum of 24)
- RTP/MulticastIncoming/ProcessorCount – 2 x [total-core-count] (maximum of 12)
- RTP/MulticastOutgoing/ProcessorCount – 2 x [total-core-count] (maximum of 12)
- HandlerThreadPool/PoolSize – (60 x [total-core-count]) (maximum of 750)
- TransportThreadPool/PoolSize – (40 x [total-core-count]) (maximum of 500)
These configuration calculations assume that you have at least 1 GB of memory per core or, if you have four or more total cores, that you're running the 64-bit Java VM, and that you're using the suggested memory settings above.
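The guidelines above reduce to a multiplier and a cap for each VHost.xml setting. The following sketch tabulates them exactly as listed in this section; the dictionary and function names are hypothetical, and the output is meant as a worksheet for editing [install-dir]/conf/VHost.xml by hand, not as something the server reads.

```python
# setting path -> (multiplier of total core count, suggested maximum)
VHOST_TUNING = {
    "HostPort/ProcessorCount": (2, 32),
    "IdleWorkers/WorkerCount": (2, 32),
    "NetConnections/ProcessorCount": (4, 32),
    "RTP/UnicastIncoming/ProcessorCount": (2, 12),
    "RTP/UnicastOutgoing/ProcessorCount": (2, 24),
    "RTP/MulticastIncoming/ProcessorCount": (2, 12),
    "RTP/MulticastOutgoing/ProcessorCount": (2, 12),
    "HandlerThreadPool/PoolSize": (60, 750),
    "TransportThreadPool/PoolSize": (40, 500),
}


def total_core_count(processors: int, cores_each: int, threads_per_core: int = 1) -> int:
    """Total cores, or total threads when hyper-threading is available."""
    return processors * cores_each * threads_per_core


def suggested_values(total_cores: int) -> dict:
    """Apply each multiplier, then cap at the suggested maximum."""
    return {
        path: min(multiplier * total_cores, cap)
        for path, (multiplier, cap) in VHOST_TUNING.items()
    }


# Dual quad-core processors with hyper-threading: 2 x 4 x 2 = 16.
cores = total_core_count(2, 4, 2)
for path, value in suggested_values(cores).items():
    print(f"{path}: {value}")
```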
Tune for multiple virtual hosts
If you're running more than one virtual host (VHost), resource allocations must be distributed among the VHosts. The simplest approach is to divide the settings listed in "CPU resource-based tuning," which are intended for a single VHost, and distribute the resources across the VHosts on your system. The settings don't have to be evenly divided; however, the total should equal what you would allocate if you were configuring a single VHost. If one of your VHosts is idle most of the time, you may allocate more than the combined total. Be careful when doing so because excessive allocations are risky: an out-of-memory error can occur if the VHosts together exceed the available resources.
Alternatively, in [install-dir]/conf/VHost.xml, you can set each VHost/HandlerThreadPool/PoolSize and VHost/TransportThreadPool/PoolSize property to 0, which causes the [install-dir]/conf/Server.xml settings for these properties to be used instead. This instructs Wowza Streaming Engine to manage the pool size across all VHosts.
A mixed approach can also be used with the [install-dir]/conf/VHost.xml file by setting the PoolSize properties to 0 for idle/minimal-use VHosts while using higher values for busy/high-performance VHosts with higher resource requirements.
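One way to divide a single-VHost value unevenly while keeping the total unchanged is to split it by integer weights. This is a minimal sketch using the largest-remainder method; the function name, the VHost names, and the weights are all hypothetical.

```python
def split_by_weight(total: int, weights: dict) -> dict:
    """Split total across VHosts in proportion to integer weights.

    The shares always add up to total; units left over after rounding
    down go to the VHosts with the largest remainders.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must add up to a positive number")
    shares = {name: total * w // weight_sum for name, w in weights.items()}
    remainders = {name: total * w % weight_sum for name, w in weights.items()}
    leftover = total - sum(shares.values())
    ranked = sorted(weights, key=lambda name: remainders[name], reverse=True)
    for name in ranked[:leftover]:
        shares[name] += 1
    return shares


# A HandlerThreadPool/PoolSize of 480, weighted 3:1 toward the busy VHost.
print(split_by_weight(480, {"busy_vhost": 3, "quiet_vhost": 1}))
```

Because a PoolSize of 0 has the special meaning described above, check that no VHost ends up with a share of 0 unless that is what you intend.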
Tune idle client system checks
If you aren't doing low-latency streaming and you have a client-side buffer of 3 or more seconds (NetStream.bufferTime), you can reduce the CPU load on the server and handle more concurrent sessions by changing the following values in [install-dir]/conf/VHost.xml:
- IdleWorkers/CheckFrequency – 50
- Client/IdleFrequency – 250