
Cluster Configuration

Single-process deployments usually do not need cluster mode.

Enable it when:

  • Multiple SyncTV replicas serve the same instance.
  • Multiple servers share one PostgreSQL and Redis backend.
  • Cross-node room sync, kicks, cache invalidation, livestream coordination, or leader election are required.
cluster:
  enabled: true

With cluster mode enabled, Redis and server.cluster_secret become required.
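
For orientation, a minimal sketch combining the two confirmed requirements; the Redis connection itself is configured separately and its keys are not shown on this page.

server:
  cluster_secret: "a-long-random-shared-string"  # identical on every replica, never sent to clients
cluster:
  enabled: true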

Cluster mode solves runtime consistency when multiple SyncTV processes serve the same instance. It is not just a load-balancing switch, and it does not replace PostgreSQL, Redis, Ingress, or a deliberate livestream storage/proxy model.

Single Source of Truth

PostgreSQL stores durable state: users, rooms, permissions, providers, preferences, and audit data. Every node must use the same database.

Runtime Coordination

Redis stores ephemeral shared state: node registration, pub/sub, Redis Stream catch-up, leader election, rate limits, and short-lived auth state.

Inter-Node Trust

server.cluster_secret authenticates inter-node gRPC calls. It must be identical across replicas and must not be exposed to clients.

Livestream Reachability

RTMP publishers can land on any node, and HLS segments can be requested from any node; livestreaming therefore needs a publisher registry and a deliberate choice between local and shared HLS backends.

[Figure: SyncTV cluster runtime architecture. Clients reach multiple nodes through HTTP/gRPC; nodes share PostgreSQL and Redis and read livestream segments through an HLS backend or the publisher-node proxy.]
In cluster mode, durable business state goes to PostgreSQL and cross-node runtime state goes to Redis. HLS can use local backends with publisher-node proxying, or shared filesystem/OSS backends for direct segment reads from every replica.
  1. Each node starts with the same PostgreSQL and Redis backends and derives or reads its node identity.
  2. Nodes register and discover peers through cluster.discovery_mode. The default redis mode fits most environments; Kubernetes deployments can use k8s_dns to assist Pod discovery.
  3. Room events, permission changes, cache invalidations, and kicks are distributed through Redis pub/sub. After a short disconnect, nodes replay recent Redis Stream events within cluster.catchup_window_secs.
  4. Background work uses cluster.leader_election_mode so only one replica performs global tasks at a time.
  5. Livestream publishers are recorded in a shared registry so another node can determine which node owns a room/media publisher.
  6. If an HLS request lands on a non-publisher node and the segment is not directly readable there, the node proxies playlist/segment reads to the publisher node through the HLS gRPC proxy.
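
The sketch below maps these steps to the configuration keys they name; the values shown are the documented defaults.

cluster:
  enabled: true
  discovery_mode: "redis"        # step 2: node registration and peer discovery
  catchup_window_secs: 300       # step 3: Redis Stream replay window after a short disconnect
  leader_election_mode: "redis"  # step 4: one replica runs global background tasks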
Shape | Use case | Recommended configuration
Single process | Small self-hosted instance, development, testing | cluster.enabled=false; Redis is optional but recommended in production
Fixed multi-node | Stable VM or bare-metal node count | cluster.enabled=true, discovery_mode=static or redis
Kubernetes replicas | Horizontal scaling, rolling updates, Ingress exposure | cluster.enabled=true, discovery_mode=redis or k8s_dns, separate HTTP/gRPC Services
Low-traffic multi-replica livestream | HLS/FLV playback must survive cross-node routing, but segment request volume is low | cluster.enabled=true; memory or local file can rely on publisher-node proxying
High-traffic multi-replica livestream | High HLS request volume or cleaner rolling-upgrade boundaries | Add file shared storage or the oss HLS backend on top of cluster configuration

All nodes must:

  • Connect to the same PostgreSQL database.
  • Connect to the same Redis deployment.
  • Use the same server.cluster_secret.
  • Be able to reach each other’s API/gRPC address.
Field | Default | Purpose
cluster.critical_channel_capacity | 1000 | High-priority events such as kicks and permission changes
cluster.publish_channel_capacity | 10000 | Normal Redis publish events

Critical events apply backpressure when their channel is full. Normal events may be dropped under extreme pressure, with a warning, to protect the main flow.
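
If critical events back up under load, both capacities can be raised; the doubled values below are illustrative, not tuned recommendations.

cluster:
  critical_channel_capacity: 2000   # default 1000; applies backpressure when full
  publish_channel_capacity: 20000   # default 10000; may drop events under extreme pressure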

redis is the default discovery mode: nodes register and discover each other through Redis.

Use it for most deployments because it works in Docker, servers, and Kubernetes.
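
Spelling the default out explicitly:

cluster:
  enabled: true
  discovery_mode: "redis"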

static mode uses explicit peer addresses:

cluster:
  discovery_mode: "static"
  peers:
    - "node2.example.com:8080"
    - "node3.example.com"

This works for small fixed-size clusters. If a peer omits the port, SyncTV tries server.port.

k8s_dns mode uses Kubernetes headless Service DNS to discover Pods.

Required environment variables:

  • HEADLESS_SERVICE_NAME
  • POD_NAMESPACE

Kubernetes DNS does not replace Redis. Redis is still required for health monitoring, load balancing state, pub/sub, and catch-up.
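
A sketch of how the two required variables might be injected in a Deployment Pod template; the Service name is an assumption, and POD_NAMESPACE uses the standard downward API.

env:
  - name: HEADLESS_SERVICE_NAME
    value: "synctv-headless"          # assumed headless Service name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace # injected by the Kubernetes downward API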

cluster.leader_election_mode options:

Mode | Use case
redis | Default; works across Docker, servers, and Kubernetes
k8s_lease | Kubernetes-native Lease resource

k8s_lease requires:

  • POD_NAME
  • POD_NAMESPACE
  • RBAC permissions for coordination.k8s.io/v1 Lease resources
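
A sketch of a Role granting those Lease permissions; the verb set is an assumption based on typical Lease-based election.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: synctv-leader-election          # assumed name
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "create", "update"]  # typical for Lease election; confirm against the chart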

The Helm chart can render the required Kubernetes resources when configured.

cluster.catchup_window_secs default: 300.

When a node joins or reconnects, it replays recent Redis Stream events within this window. Increase it if nodes are slow to start or you want more conservative replay; decrease it if event volume is high and fast startup matters more.

cluster.stream_max_length default: 10000.

This controls approximate Redis Stream retention. If traffic is high and nodes disconnect, too small a value can trim events before a node catches up. Larger values use more Redis memory.
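
The two knobs interact: the stream must retain at least catchup_window_secs worth of events at your peak rate, or reconnecting nodes will find their window already trimmed. An illustrative sizing for a busier deployment:

cluster:
  catchup_window_secs: 600    # default 300; wider window for slow restarts
  stream_max_length: 50000    # default 10000; must cover the window at peak event rate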

Clustered HLS has two supported models. SyncTV records the publisher owner in the publisher registry and can proxy playlist/segment reads from non-publisher nodes to the publisher node through the HLS gRPC proxy.

Publisher-node proxy model. Applicable backends:

  • memory
  • file with hls_shared_storage=false

This model is simple and does not require a shared segment directory. The tradeoff is that remote HLS segment requests go through the publisher node; if that node restarts, becomes unreachable, or is partitioned, remote nodes may be unable to read the stream’s segments.

Example:

livestream:
  hls_storage_backend: "memory"

or:

livestream:
  hls_storage_backend: "file"
  hls_shared_storage: false
  hls_storage_path: "/var/lib/synctv/hls"

Shared-backend model. Filesystem option:

livestream:
  hls_storage_backend: "file"
  hls_shared_storage: true
  hls_storage_path: "/var/lib/synctv/hls"

All replicas must read and write the same path, for example through NFS, an RWX PVC, or a CSI volume.
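
On Kubernetes, that usually means an RWX PersistentVolumeClaim mounted at the same path in every replica; a minimal sketch, with name and size illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: synctv-hls              # assumed name
spec:
  accessModes:
    - ReadWriteMany             # every replica reads and writes the same segments
  resources:
    requests:
      storage: 20Gi             # illustrative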

Object storage option:

livestream:
  hls_storage_backend: "oss"
  hls_oss:
    endpoint: "https://s3.example.com"
    bucket: "synctv-hls"
    base_path: "synctv/hls/"

The oss backend uses S3-compatible object storage and does not use hls_shared_storage. hls_shared_storage=true is valid only with hls_storage_backend=file; configuration validation rejects it with memory or oss.

The Helm chart does not enable cluster mode by default. Before scaling replicas, explicitly set config.cluster.enabled=true. HLS can start with the publisher-node proxy model; for high-traffic production, configure the shared file backend or the OSS backend.
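
As a values-file sketch: config.cluster.enabled is the key named above, while replicaCount is the conventional chart key and an assumption here.

config:
  cluster:
    enabled: true   # set before scaling past one replica
replicaCount: 3     # assumed key name; scale only after cluster mode is on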

Before scaling replicas, verify that:

  • Redis is configured and reachable.
  • server.cluster_secret is stable and shared by every replica.
  • The HLS model is explicit: a local backend with publisher-node proxying for small deployments, or file with hls_shared_storage=true on RWX/shared storage, or the oss backend, for high-traffic production.
  • HTTP and gRPC Services/Ingresses match your network design.
  • Leader election mode matches your platform.
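
Putting the checklist together, a hedged end-to-end sketch for a high-traffic Kubernetes deployment, built only from keys documented on this page:

server:
  cluster_secret: "a-long-random-shared-string"  # identical on every replica
cluster:
  enabled: true
  discovery_mode: "k8s_dns"
  leader_election_mode: "k8s_lease"
livestream:
  hls_storage_backend: "file"
  hls_shared_storage: true
  hls_storage_path: "/var/lib/synctv/hls"        # backed by an RWX volume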