NavTalk’s digital human lip-sync and real-time audio/video capabilities are fully supported for deployment and operation on Linux servers equipped with NVIDIA RTX 5090. End-to-end adaptation and validation—from drivers and frameworks to the inference engine—have been completed for the latest generation (Blackwell architecture and corresponding NVIDIA drivers and libraries), ensuring a stable, high-performance real-time digital human experience on current hardware.
This document describes NavTalk’s official support for RTX 5090 on Linux in terms of technology stack, adaptation work, and product value, and provides recommended concurrent real-time chat Session counts for RTX 5090 / 4090 / 3090 based on measured results, for evaluation and sizing reference.
1. Why RTX 5090 and Linux Matter
▪️ Compute upgrade: RTX 5090 is built on the Blackwell architecture, with significantly more memory and compute than previous generations, making it well suited to real-time high-resolution lip-sync and multi-session concurrency.
▪️ Linux first: Most production and cloud environments run Linux; NavTalk offers a full set of services on Linux (including real-time lip-sync, video lip-sync, and other APIs), making integration and scaling straightforward.
▪️ Long-term compatibility: Adaptation has been completed for the latest NVIDIA drivers and AI runtime (e.g. CUDA 12.8, PyTorch 2.7), keeping NavTalk aligned with the official software stack for the foreseeable future and reducing upgrade cost.
Support for RTX 5090 on Linux that is deployable, operable, and scalable is therefore a clear commitment from NavTalk for production and high-end compute scenarios. We recommend an NVIDIA driver that supports RTX 5090 (e.g. the 570 series or newer) and a common Linux distribution (e.g. Ubuntu 22.04 LTS or newer).
2. Technology Stack and Adaptation
NavTalk’s runtime on RTX 5090 Linux is selected and validated separately from the environments used for older GPUs (e.g. CUDA 11.8), and is maintained independently to avoid incorrect or mixed installations and to simplify environment isolation and issue reproduction.
2.1 Core Runtime (5090-specific)
The table below lists officially verified software versions for NavTalk on RTX 5090, for operations and integration reference. Python is the runtime; CUDA is the NVIDIA compute platform; PyTorch is the main framework for AI models; mmcv / mmdet / mmpose are the vision libraries used for face and pose, etc.
| Component | 5090 Linux recommended version | Notes |
|---|---|---|
| Python | 3.10.11 | Runtime version |
| CUDA | 12.8 | NVIDIA compute platform for RTX 5090 |
| PyTorch | 2.7.0+cu128 | AI model framework (vision, audio, etc.) |
| TensorFlow | ≥2.16.0 | Required when enabling related features |
| NumPy | 1.26.0 | Numerical library, compatible with image processing |
| mmcv | 2.1.0 | Computer vision base (face, image processing, etc.) |
| mmdet | 3.2.0 | Detection library paired with mmcv |
| mmpose | 1.2.0 | Pose library paired with mmcv |
NavTalk maintains a dedicated dependency list for the 5090 environment, including the above components and versions, with notes on TensorFlow, CUDA 12.8, NumPy, etc., separate from older GPU environments, reflecting 5090-specific adaptation and maintainability.
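The pins in the table above can be enforced with a simple startup gate. The sketch below is illustrative, not NavTalk’s actual tooling; the `PINS` dict mirrors the table (exact pins, except TensorFlow, which the table gives as a minimum), and the names `parse` and `satisfies` are assumptions for this example.

```python
# Illustrative version gate for the 5090 environment (not NavTalk's actual tooling).

def parse(version: str) -> tuple:
    """Parse a version string like '2.7.0+cu128' into a comparable tuple (2, 7, 0)."""
    core = version.split("+")[0]  # drop local build suffixes such as '+cu128'
    return tuple(int(p) for p in core.split(".") if p.isdigit())

# component -> (required version, True if an exact match is required)
PINS = {
    "torch": ("2.7.0", True),
    "tensorflow": ("2.16.0", False),  # >= 2.16.0 per the table
    "numpy": ("1.26.0", True),
    "mmcv": ("2.1.0", True),
    "mmdet": ("3.2.0", True),
    "mmpose": ("1.2.0", True),
}

def satisfies(installed: str, required: str, exact: bool) -> bool:
    """Check one installed version against its pin."""
    if exact:
        return parse(installed) == parse(required)
    return parse(installed) >= parse(required)
```

At service startup, each installed version (e.g. from `importlib.metadata.version`) would be checked against its pin and a mismatch reported before loading any models.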
2.2 5090 Architecture Compatibility
RTX 5090 uses the new Blackwell architecture (compute capability 12.0, sm_120). Some vision libraries do not yet ship prebuilt packages for this architecture. Compatibility has been verified and adapted for the 5090 so that face, pose, and related capabilities run correctly, ensuring full functionality.
2.3 Inference and Model Management
▪️ NavTalk’s lip-sync core is based on MuseTalk 1.5 (a widely used high-quality lip-sync model) and runs on 5090 with the PyTorch 2.7 + CUDA 12.8 stack above.
▪️ NavTalk provides unified GPU and model management: models are loaded on demand, and multi-task contention for the GPU is avoided, improving stability in multi-service or multi-GPU setups and long-term operation on 5090.
All versions and adaptation work above have been verified, representing reproducible, deliverable engineering support, not just “theoretical” compatibility.
3. Product Value and Use Cases
▪️ Latency and quality: On 5090, NavTalk can leverage the new generation’s compute for real-time lip-sync at 30+ fps and higher resolution with multi-session concurrency, suitable for digital humans, virtual hosts, and live interaction where latency and quality matter.
▪️ Service forms: On 5090 Linux, NavTalk offers real-time lip API, video lip API, digital human avatar API, and other interfaces for live, recorded, and interactive use; the real-time lip API is optimized for low latency and streaming.
▪️ Production-ready: Concurrency, quality enhancements (e.g. face enhancement, mouth sharpening), GPU options, and output directories are configurable, easing integration with your existing business systems, storage, and monitoring.
Thus, NavTalk on 5090 Linux does not merely “run”: it offers full production support on the latest compute, ready for evaluation and rollout.
4. RTX 5090 / 4090 / 3090 Concurrency and Responsiveness
The conclusions in this section are based on GPU memory usage measured on a single node with a single GPU (service port 8800, real-time chat WebSocket call scenario). The following gives memory-based concurrent Session recommendations for RTX 5090, 4090, and 3090; if the GPU is shared with other processes (e.g. LLM services), recalculate using the available memory.
4.1 Memory and Single-Session Peak (Measured)
| Item | Value | Notes |
|---|---|---|
| RTX 5090 total memory | 32,607 MiB (~31.8 GiB) | Single-GPU physical memory; even after minor desktop usage, plan on ~32 GiB. |
| Single-session real-time chat peak | 10,410 MiB (~10.2 GiB) | NavTalk process group usage when one real-time chat Session is inferring. |
Composition of the single-session peak (measured): ~8,746 MiB for the main process during inference, plus two worker processes at 832 MiB each, i.e. 8,746 + 832 × 2 = 10,410 MiB. In the current deployment, each real-time chat Session starts its own service process set (no multi-threaded sharing), so each additional Session adds ~10.2 GiB of memory; this peak is used for sizing.
Share of total capacity: 10,410 MiB ÷ 32,607 MiB ≈ 31.9%.
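The peak composition and capacity share above can be reproduced directly. A minimal sketch with the measured figures from the table hard-coded:

```python
# Reproduce the measured single-session peak and its share of total GPU memory.
MAIN_MIB = 8746      # main process during inference (measured)
WORKER_MIB = 832     # each of two worker processes (measured)
TOTAL_MIB = 32607    # RTX 5090 physical memory

session_peak = MAIN_MIB + 2 * WORKER_MIB   # 10,410 MiB (~10.2 GiB)
share = session_peak / TOTAL_MIB           # ~0.319, i.e. ~31.9% of capacity
```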
4.2 Concurrent Session Count (Memory-Based)
When NavTalk has exclusive use of RTX 5090:
▪️ Usable memory for NavTalk is 32,607 MiB (close to 32 GiB even after desktop usage, etc.).
▪️ Dividing by the single-session peak of 10,410 MiB: 32,607 ÷ 10,410 ≈ 3.13, which floors to 3 concurrent real-time chat Sessions.
▪️ Check: 3 × 10,410 = 31,230 MiB < 32,607 MiB; ~1,377 MiB headroom for fragmentation and short-term spikes.
When other processes use GPU memory (e.g. LLM inference, other services):
▪️ Available memory = 32,607 MiB − other process usage;
▪️ Concurrent Sessions = ⌊ available memory ÷ 10,410 ⌋ (floor).
▪️ Actual concurrency limits also depend on system RAM, CPU, and network; we recommend load testing in the target environment (including whether the GPU is shared).
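The exclusive and shared cases above reduce to one floor-division formula. A minimal sketch; the function name and defaults are illustrative, with the per-session peak taken from the measurements above:

```python
def max_sessions(total_mib: int, other_mib: int = 0, peak_mib: int = 10410) -> int:
    """Memory-based concurrent-Session ceiling: floor(available / per-session peak)."""
    available = total_mib - other_mib   # subtract memory held by other processes
    return max(available // peak_mib, 0)

# Exclusive RTX 5090: floor(32607 / 10410) = 3 Sessions
# Shared with a 12 GiB (12,288 MiB) LLM service: floor(20319 / 10410) = 1 Session
```

As the document notes, treat this as an upper bound: system RAM, CPU, and network can lower the practical limit, so load-test in the target environment.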
4.3 Three-GPU Concurrent Session Recommendations (Measured and Inferred)
The single-session real-time chat peak is taken from the 5090 measurements: 10,410 MiB (~10.2 GiB). The 5090 was tested on Linux; the 4090 and 3090 on Windows. Using the memory-floor method and measured results, the recommended planning is:
| GPU | Total memory | Environment | Recommended concurrent real-time chat Sessions |
|---|---|---|---|
| RTX 5090 | 32,607 MiB (~31.8 GiB) | Linux | 3 |
| RTX 4090 | 24,564 MiB (~24.0 GiB) | Windows | 2 |
| RTX 3090 | 24,576 MiB (~24.0 GiB) | Windows | 1 |
The above are memory-based recommendations; actual limits also depend on system memory, CPU, and network. We recommend load testing in the target environment.
5. Summary
▪️ NavTalk is officially supported and runs fully on NVIDIA RTX 5090 + Linux. For 5090, NavTalk specifies runtime versions (e.g. Python 3.10, CUDA 12.8, PyTorch 2.7), a 5090-specific dependency list, and recommended versions for face/pose and related libraries, with end-to-end adaptation and validation from drivers through the inference engine.
▪️ Compatibility has been addressed for the 5090 architecture; where prebuilt packages are unavailable, building from source and similar approaches are supported to run correctly on 5090.
▪️ Concurrency and responsiveness: Based on measurements and memory sizing, RTX 5090 (Linux, exclusive) supports 3 concurrent real-time chat Sessions, RTX 4090 (Windows) is recommended at 2, and RTX 3090 (Windows, with system/desktop usage) at 1; if the GPU is shared with other processes, recalculate from available memory. Higher compute improves low-latency and real-time lip experience.
This document describes product-level support for RTX 5090 on Linux, for external communication and technical evaluation.