Capacity Planning

Planning Dimensions

Resource	Driven by	Main signals
HTTP/gRPC	Login, lists, Provider browse, polling	`http_requests_total`, latency, errors
WebSocket	Online users, rooms, multi-tab clients	`websocket_connections_active`, message rate
PostgreSQL	Rooms, members, playlists, permissions	Pool usage, query latency, waiting connections
Redis	Rate limits, OAuth2/WebAuthn state, cache, cluster	Latency, errors, pub/sub health
Provider/proxy	Upstream requests, Range, bandwidth, cache	Provider errors, proxy latency, cache hit ratio
Livestream	RTMP publish, HLS/FLV, storage	Publishers, viewers, bytes, pull errors

Shape	Fits	Required action
Single SyncTV + PostgreSQL	Small evaluation or personal production	Persistent secrets, DB backups, TLS
Single node + Redis	Recommended production baseline	Redis HA, or accept loss of short-lived state
Multi-replica SyncTV	Rolling update or horizontal scale	Shared PostgreSQL, Redis, cluster secret, drain
Multi-replica + livestream	Large live audiences or a highly available entry point	A clearly defined HLS backend, plus validated publisher proxy and storage

Do not add replicas before database, Redis, secrets, and HLS storage boundaries are clear.

Estimate concurrent users and connections per user.
Estimate active rooms, members per room, and message frequency.
Estimate login, refresh, Provider browse, playback info, and list requests per minute.
Estimate the media mode mix (direct, proxy, livestream) and the average bitrate.
Calculate proxy egress bandwidth: concurrent proxy viewers times average bitrate times peak factor.
Calculate database pool total: replicas times `database.max_connections`.
Set independent alerts for Redis, PostgreSQL, Ingress, and SyncTV.
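
The first estimation steps above can be sketched as a small calculation. This is a rough illustration, not part of SyncTV: the `LoadEstimate` class, its field names, and the sample numbers are hypothetical, and the fan-out model (each room message delivered once to every other member) is an assumption about broadcast behavior.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadEstimate:
    """Hypothetical planning inputs; none of these names come from SyncTV."""
    concurrent_users: int
    connections_per_user: float          # above 1.0 for multi-tab clients
    active_rooms: int
    members_per_room: int
    messages_per_member_per_minute: float

    def websocket_connections(self) -> int:
        # Round up so the estimate never undercounts connections.
        return math.ceil(self.concurrent_users * self.connections_per_user)

    def inbound_messages_per_second(self) -> float:
        per_minute = (self.active_rooms
                      * self.members_per_room
                      * self.messages_per_member_per_minute)
        return per_minute / 60

    def outbound_deliveries_per_second(self) -> float:
        # Assumption: every message is delivered to every other room member.
        fan_out = max(self.members_per_room - 1, 0)
        return self.inbound_messages_per_second() * fan_out


estimate = LoadEstimate(
    concurrent_users=500,
    connections_per_user=1.5,
    active_rooms=50,
    members_per_room=10,
    messages_per_member_per_minute=6.0,
)
print(estimate.websocket_connections())
print(estimate.inbound_messages_per_second())
print(estimate.outbound_deliveries_per_second())
```

Comparing the connection estimate with the configured `connection_limits.*` values shows how much margin remains before limits start rejecting clients.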

Configuration	Purpose
`database.max_connections`	Per-replica database pool limit
`redis.*`	L2 cache, rate limits, short-lived state, cluster coordination
`connection_limits.*`	WebSocket user, room, global, lifetime, and per-connection message limits
`request_rate_limits.websocket_*`	WebSocket connection-attempt limits
`request_rate_limits.*`	HTTP login, API, media, admin, and streaming limits; gRPC API and verification limits
`messaging_rate_limits.*`	Chat message limits
`proxy_slice_cache.*`	Range slice cache and file backend
`server.shutdown_drain_timeout_seconds`	Connection drain during rolling updates
`cluster.*`	Discovery, leader election, catch-up
`livestream.*`	RTMP/FLV/HLS and backend

Database guidance:

Keep total pool size below the database limit, leaving room for migrations and operations.
In multi-replica mode, multiply pool size by replica count.
Use pagination for large lists.
Validate migrations against production-like data.
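
The pool sizing rule can be checked with a few lines of arithmetic. The sketch below is illustrative only: `pool_headroom` and its parameters are hypothetical names, the reserved connection count is a placeholder you should choose for your own migrations and operations, and `surge_replicas` assumes that a rolling update may briefly run extra replicas alongside the old ones.

```python
def pool_headroom(
    replicas: int,
    max_connections_per_replica: int,   # value of database.max_connections
    postgres_max_connections: int,      # PostgreSQL server-side max_connections
    reserved_connections: int,          # kept free for migrations and operations
    surge_replicas: int = 0,            # assumption: extra replicas during rollout
) -> int:
    """Return spare connections; a negative result means the pools can exhaust the database."""
    peak_replicas = replicas + surge_replicas
    total_pool = peak_replicas * max_connections_per_replica
    usable = postgres_max_connections - reserved_connections
    return usable - total_pool


# 3 replicas plus 1 surge replica, 20 connections each, against a limit of 100.
headroom = pool_headroom(3, 20, 100, 15, surge_replicas=1)
print(headroom)
```

A negative result suggests lowering `database.max_connections`, raising the database limit, or placing a connection pooler in front of PostgreSQL before adding replicas.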

Redis guidance:

Multi-replica mode requires a shared Redis and a consistent `redis.key_prefix` across replicas.
Redis loss affects OAuth2 state, WebAuthn challenges, email codes, rate limits, the token blacklist, and short-lived cluster state.
If strong token revocation matters, use Redis HA and a shorter JWT access token lifetime.

Proxy playback puts SyncTV in the media data path:

proxy egress bandwidth = concurrent_proxy_viewers * average_bitrate * peak_factor

Example:

40 * 6 Mbps * 1.3 = 312 Mbps

Slice cache can reduce upstream fetches for shared content, but it does not reduce SyncTV-to-client egress bandwidth.
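
The bandwidth formula and the slice cache remark can be expressed as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, and the upstream estimate assumes that cache hits translate directly into avoided upstream bytes, which is a simplification.

```python
def proxy_egress_mbps(viewers: int, average_bitrate_mbps: float, peak_factor: float) -> float:
    """SyncTV-to-client bandwidth for proxied playback, in Mbps."""
    return viewers * average_bitrate_mbps * peak_factor


def upstream_fetch_mbps(egress_mbps: float, cache_hit_ratio: float) -> float:
    """Assumption: only cache misses are fetched from the upstream Provider."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return egress_mbps * (1.0 - cache_hit_ratio)


egress = proxy_egress_mbps(viewers=40, average_bitrate_mbps=6.0, peak_factor=1.3)
upstream = upstream_fetch_mbps(egress, cache_hit_ratio=0.5)
print(round(egress, 1), round(upstream, 1))
```

Note that the cache hit ratio appears only in the upstream estimate; client-facing egress stays the same regardless of how well the cache performs.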

Alert	Threshold idea
Sustained HTTP 5xx	Per-route growth sustained for 5-10 minutes
p95 HTTP latency	Track API, Provider, and proxy paths separately
WebSocket active near limit	Warn around 70%-80%
DB connection waiting	Sustained nonzero waiting is actionable
Redis pub/sub unhealthy	Multi-replica realtime risk
Provider timeout/5xx	Distinguish upstream failures from local network problems
Livestream pull errors	Check publisher, HLS backend, slow clients
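
Two of the threshold ideas above, "near limit" and "sustained nonzero", can be sketched as simple checks. The function names, the 75% default, and the sample values are hypothetical; in practice these conditions would normally live in your monitoring system's alert rules rather than in application code.

```python
from typing import Sequence


def websocket_near_limit(active: int, limit: int, warn_ratio: float = 0.75) -> bool:
    """Warn when active connections reach the chosen share of the configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return active / limit >= warn_ratio


def sustained_nonzero(samples: Sequence[float], min_consecutive: int) -> bool:
    """True when the most recent `min_consecutive` samples are all above zero."""
    if min_consecutive <= 0 or len(samples) < min_consecutive:
        return False
    return all(value > 0 for value in samples[-min_consecutive:])


# Hypothetical samples of "connections waiting for a pool slot", one per scrape.
waiting = [0, 2, 1, 3]
print(websocket_near_limit(active=800, limit=1000))
print(sustained_nonzero(waiting, min_consecutive=3))
```

Requiring several consecutive nonzero samples avoids paging on a single brief spike while still catching a pool that stays saturated.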

Metric names are listed in the Metrics Catalog.

Test real login, room list, member list, playback info, and Provider browse paths. Do not load test only `/health/ready`.
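
One way to keep a load test representative is to draw requests from a weighted mix of the real paths. The sketch below only builds the plan; it sends no traffic. The scenario names and weights are hypothetical placeholders and should be replaced with ratios measured from your own traffic.

```python
import random
from collections import Counter

# Hypothetical traffic mix (relative weights), not measured SyncTV data.
SCENARIO_WEIGHTS = {
    "login": 5,
    "room_list": 30,
    "member_list": 20,
    "playback_info": 30,
    "provider_browse": 15,
}


def build_plan(total_requests: int, seed: int = 0) -> Counter:
    """Return how many requests each scenario receives in one test run."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    names = list(SCENARIO_WEIGHTS)
    weights = [SCENARIO_WEIGHTS[name] for name in names]
    picks = rng.choices(names, weights=weights, k=total_requests)
    return Counter(picks)


plan = build_plan(1000)
for scenario, count in plan.most_common():
    print(scenario, count)
```

Each scenario name would then map to a real authenticated request in your load testing tool, so the database, Redis, and Provider paths are all exercised.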