How to Build Highly Available Discord Bots in 2026
Your Discord bot is growing. Slash commands stop responding during deploys. Users complain about missed events. And somewhere around 2,500 guilds, Discord forces you to shard whether you’re ready or not.
Most sharding tutorials stop at “split your bot into pieces.” They don’t cover what happens when a shard crashes at 3 AM, when a deployment drops events, or when your message queue loses data because no consumer was listening. High availability requires more than just sharding. It demands orchestration and durable messaging working together.
This guide walks through a production-tested architecture that combines Discord gateway sharding, container orchestration (both Kubernetes and Docker Swarm), and NATS JetStream for durable inter-service communication. Every code example is copy-pasteable. Every architectural decision comes from real production experience. And if you’d rather skip the infrastructure management, we’ll show you how A-Line Cloud handles all of this out of the box.
TL;DR: Building a highly available Discord bot requires three layers: gateway shards for connection management, container orchestration (Kubernetes or Docker Swarm) for resilience, and NATS JetStream for durable messaging. Discord mandates sharding at 2,500 guilds (Discord Official Docs), but sharding alone doesn’t prevent downtime. You need all three.
Why Do Discord Bots Need High Availability?
Discord mandates sharding once a bot joins 2,500 guilds, and the gateway enforces a rate limit of 120 events per connection every 60 seconds (Discord Official Docs). With over 259 million monthly active users on the platform (SQ Magazine, 2026), bots serving large communities can’t afford dropped connections or missed events.
But sharding by itself doesn’t solve availability. A sharded bot without orchestration still goes fully offline during deployments. A sharded bot without durable messaging loses events whenever a worker restarts.
True high availability rests on three pillars. First, sharding distributes gateway connections across processes so no single connection handles too many guilds. Second, orchestration (Kubernetes or Docker Swarm) ensures crashed shards restart automatically and deployments roll out without downtime. Third, durable messaging via NATS JetStream guarantees that events persist even when consumers are temporarily offline.
Why does this matter? Because 84% of organizations have adopted microservices architectures (DevOps.com, 2025). Discord bots aren’t exempt from that trend. A bot handling moderation, music, analytics, and notifications is a distributed system whether you designed it that way or not.
Discord requires sharding at 2,500 guilds and enforces a 120-event-per-60-second rate limit per gateway connection (Discord Official Docs). High availability requires combining sharding with container orchestration and durable messaging. Sharding alone doesn’t prevent downtime during deployments or crashes.
How Does Discord Bot Sharding Work Under the Hood?
Discord assigns each guild to a shard using the formula shard_id = (guild_id >> 22) % shard_count, distributing roughly 2,500 guilds per shard (Discord Official Docs). Each shard maintains its own WebSocket connection to the Discord gateway, receiving events only for the guilds assigned to it.
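The routing formula is easy to verify in a few lines. A minimal sketch (the helper name and sample IDs are illustrative, not from any library):

```python
def shard_for_guild(guild_id: int, shard_count: int) -> int:
    # Discord's documented routing: shift away the low 22 bits of the
    # snowflake (worker/process/increment fields), then bucket by count.
    return (guild_id >> 22) % shard_count

# A snowflake whose high bits are 5 lands on shard 5 % 4 = 1
guild_id = (5 << 22) | 1234
print(shard_for_guild(guild_id, 4))  # → 1
```

Because the high bits of a snowflake encode its creation timestamp, guilds created over time spread roughly evenly across shards.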
When your bot connects, it tells Discord how many total shards exist and which shard ID this connection represents. Discord then routes guild events (messages, reactions, voice state updates) only to the correct shard’s WebSocket. Each connection maintains its own heartbeat, session ID, and sequence number.
The shard count isn’t arbitrary. Discord’s /gateway/bot endpoint returns a shards field with the recommended count. Bots over 250,000 guilds get a higher max_concurrency value, allowing multiple shards to identify simultaneously instead of one every five seconds.
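Turning that payload into an identify schedule is mechanical. The response shape below follows Discord's documented GET /gateway/bot payload; fetching and authentication are left out, so this sketch (function name is ours) just groups shard IDs into identify buckets:

```python
def identify_buckets(gateway_bot: dict) -> dict:
    """Group shard IDs into identify buckets from a /gateway/bot payload.

    Shards sharing a bucket (shard_id % max_concurrency) must identify
    sequentially; different buckets may identify in parallel.
    """
    shard_count = gateway_bot["shards"]
    max_concurrency = gateway_bot["session_start_limit"]["max_concurrency"]
    buckets = {}
    for shard_id in range(shard_count):
        buckets.setdefault(shard_id % max_concurrency, []).append(shard_id)
    return buckets

# A large bot: 16 recommended shards, max_concurrency of 4
sample = {"shards": 16, "session_start_limit": {"max_concurrency": 4}}
print(identify_buckets(sample)[0])  # → [0, 4, 8, 12]
```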
So how do you actually run those shards? That depends on your sharding strategy.
Internal vs External vs Hybrid Sharding
The choice between sharding strategies affects memory usage, deployment complexity, and recovery time. Here’s how they compare in practice.
| Strategy | How It Works | Memory (4K guilds) | Deployment Complexity | Recovery Speed |
|---|---|---|---|---|
| Internal | One process, multiple gateway connections | ~200 MB | Low | Fast (single restart) |
| External | One process per shard, separate OS processes | ~800 MB (4 × 200 MB) | Medium | Slow (restart each) |
| Hybrid | Multiple shards per process, multiple processes | ~400 MB (2 × 200 MB) | Medium | Medium |
The discord-hybrid-sharding library demonstrates these savings concretely: in real deployments it reduced memory from 4 GB to roughly 700 MB, and it has been tested up to 600,000 guilds (discord-hybrid-sharding GitHub).
Internal sharding is simplest but creates a single point of failure. External sharding is most resilient but wastes memory. Hybrid sharding balances both, and it’s what most production bots at scale use.
Hybrid sharding typically reduces Discord bot memory usage by 40-60% compared to standard external sharding. In one real deployment, memory dropped from 4 GB to approximately 700 MB, with testing validated at 600,000 guilds (discord-hybrid-sharding GitHub).
What Architecture Enables Zero-Downtime Discord Bots?
With 84% of organizations now using microservices (DevOps.com, 2025), the pattern for highly available Discord bots follows a proven three-layer design: gateway shards handle Discord connections, a message bus decouples event processing, and stateless workers consume events independently.
This separation means you can restart workers without touching gateway connections. You can scale event processing without adding more shards. And you can deploy new features without any downtime.
Gateway Layer
The gateway layer owns Discord WebSocket connections and nothing else. Each shard connects to Discord, receives events, and immediately publishes them to NATS JetStream. No business logic lives here.
This constraint is critical. Gateway connections are expensive to re-establish (Discord’s identify rate limit is one per five seconds per shard). By keeping the gateway layer thin, you minimize reasons to restart it.
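In practice the gateway loop reduces to "serialize, publish, move on." The envelope below is a sketch — the subject scheme and field names are our assumptions built on the `discord.events.*` subjects used later in this guide, and the commented publish call uses the nats-py client:

```python
import json

def event_envelope(event_type: str, shard_id: int, data: dict):
    # One subject per event type lets workers filter server-side,
    # e.g. "discord.events.message_create"
    subject = f"discord.events.{event_type.lower()}"
    body = json.dumps({"t": event_type, "shard": shard_id, "d": data}).encode()
    return subject, body

# In the shard's dispatch handler (nats-py), roughly:
#   js = nc.jetstream()
#   subject, body = event_envelope("MESSAGE_CREATE", 0, raw_event)
#   await js.publish(subject, body)

subject, body = event_envelope("MESSAGE_CREATE", 0, {"content": "hi"})
print(subject)  # → discord.events.message_create
```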
Message Bus
NATS JetStream sits between the gateway and workers. It persists every event to streams with configurable retention. If a worker crashes, events queue up. When the worker recovers, it picks up exactly where it left off.
Why not Redis Pub/Sub? Because Redis Pub/Sub is fire-and-forget. If no subscriber is listening, the message vanishes. NATS JetStream delivers 200,000-400,000 messages per second with persistence at 1-5ms latency (Onidel 2025 Benchmarks). That’s fast enough for real-time Discord events and durable enough for reliability.
Worker Layer
Workers subscribe to NATS subjects and process events. They’re stateless, so any worker instance can handle any event. This makes horizontal scaling trivial: need more moderation throughput? Add more moderation workers.
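Statelessness is easier to keep honest when handlers are pure functions keyed by subject. A hypothetical registry (the decorator and dispatch names are ours, not from any library):

```python
HANDLERS = {}

def handles(subject: str):
    """Register a handler function for one NATS subject."""
    def wrap(fn):
        HANDLERS[subject] = fn
        return fn
    return wrap

@handles("discord.events.message_create")
def moderate(event: dict) -> str:
    # Purely a function of the event: any worker replica gives the same answer
    return "flag" if "spam" in event.get("content", "") else "allow"

def dispatch(subject: str, event: dict):
    return HANDLERS[subject](event)

print(dispatch("discord.events.message_create", {"content": "buy spam now"}))  # → flag
```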
Most Discord bot architectures couple gateway connections with business logic. This three-layer design treats the gateway as infrastructure and business logic as independently deployable services. The result? We’ve deployed new features to workers while gateway shards maintained 100% uptime.
At Hylist.io, we run this exact architecture in production. Our Rust/Tokio bot uses 2 replicas on Docker Swarm with PostgreSQL-based shard coordination. NATS JetStream handles 10 production streams feeding 11+ background worker types. Each worker has a dedicated health port (9001-9011), and Docker Swarm monitors them with 10-second health check intervals. The bot processes votes, server monitoring, media transcoding, notifications, telemetry, and auction events, all through decoupled NATS subjects like votes.cast, servers.check, and tasks.media.
NATS JetStream delivers 200,000-400,000 messages per second with persistence at 1-5ms latency, compared to Kafka’s 500,000-1,000,000+ msg/s at 10-50ms latency (Onidel 2025 Benchmarks). For Discord bots, JetStream’s lower latency and simpler operations make it the stronger choice.
How Do You Deploy Discord Shards on Kubernetes?
Kubernetes dominates container orchestration with roughly 92% market share, and 82% of container users run it in production (CNCF Annual Survey). For Discord bot sharding, Kubernetes StatefulSets provide the predictable pod naming that makes shard ID assignment straightforward.
StatefulSets give each pod a stable hostname: discord-shard-0, discord-shard-1, discord-shard-2. The ordinal suffix maps directly to a shard ID. No coordination service needed. No race conditions.
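If you target clusters that predate the pod-index label, the ordinal can also be parsed from the hostname itself. A small sketch (function name is ours):

```python
import os
import re

def shard_id_from_hostname(hostname: str) -> int:
    # StatefulSet pods are named <name>-<ordinal>; the trailing
    # integer doubles as the shard ID.
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no StatefulSet ordinal in {hostname!r}")
    return int(match.group(1))

# Inside the pod: shard_id_from_hostname(os.environ["HOSTNAME"])
print(shard_id_from_hostname("discord-shard-2"))  # → 2
```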
StatefulSet Configuration
This YAML deploys three Discord bot shards with proper readiness probes and rolling update strategy.
```yaml
# discord-shards-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: discord-shard
  namespace: discord-bot
spec:
  serviceName: discord-shard
  replicas: 3  # One replica per shard
  podManagementPolicy: Parallel  # Start all shards simultaneously
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Only one shard offline during updates
  selector:
    matchLabels:
      app: discord-shard
  template:
    metadata:
      labels:
        app: discord-shard
    spec:
      terminationGracePeriodSeconds: 30  # Time to drain NATS consumers
      containers:
        - name: shard
          image: your-registry/discord-bot:latest
          env:
            # Extract shard ID from pod hostname ordinal
            - name: SHARD_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['apps.kubernetes.io/pod-index']
            - name: SHARD_COUNT
              value: "3"
            - name: NATS_URL
              value: "nats://nats.discord-bot.svc.cluster.local:4222"
            - name: DISCORD_TOKEN
              valueFrom:
                secretKeyRef:
                  name: discord-secrets
                  key: bot-token
          ports:
            - containerPort: 8080
              name: health
          # Readiness probe: only route traffic when gateway is READY
          readinessProbe:
            httpGet:
              path: /health/ready
              port: health
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          # Liveness probe: restart if shard is stuck
          livenessProbe:
            httpGet:
              path: /health/live
              port: health
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 5
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```

The key detail is `maxUnavailable: 1`. During rolling updates, Kubernetes takes down one shard at a time. Discord's RESUME protocol lets the restarting shard reconnect without losing its session, provided it resumes within the session timeout window.
Auto-Scaling Shards
Discord’s /gateway/bot endpoint tells you the recommended shard count. Tools like Marver (Marver GitHub) auto-scale Kubernetes StatefulSets based on this value. The Kubecord framework takes it further by combining NATS and Redis for full microservice Discord bot deployments on Kubernetes (Kubecord GitHub).
But should you auto-scale shards aggressively? Probably not. Each new shard needs to identify with Discord’s gateway (one per five seconds). Adding 10 shards takes nearly a minute of identify rate limiting. Scale proactively: check the recommended count weekly and adjust during low-traffic windows.
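The arithmetic behind that warning is straightforward. A sketch (function name is ours):

```python
import math

def identify_window_seconds(new_shards: int, max_concurrency: int = 1) -> int:
    # One identify per bucket every 5 seconds; buckets of size
    # max_concurrency may identify in parallel.
    return math.ceil(new_shards / max_concurrency) * 5

print(identify_window_seconds(10))      # → 50 (nearly a minute, as above)
print(identify_window_seconds(10, 16))  # → 5  (large bots with higher max_concurrency)
```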
Kubernetes holds approximately 92% of the container orchestration market, with 82% of container users running it in production according to the CNCF 2025 Annual Survey (CNCF). StatefulSets map pod ordinals directly to shard IDs, eliminating coordination complexity.
Can You Use Docker Swarm Instead of Kubernetes?
Docker Swarm holds only 2.5-5% of the container orchestration market, but Mirantis has committed to Swarm LTS support through 2030 (The Decipherist). For small-to-medium Discord bots, Swarm’s simplicity is a legitimate advantage, especially when you don’t need Kubernetes’ full feature set.
Swarm doesn’t have StatefulSets. That’s the main trade-off. You can’t derive shard IDs from pod ordinals. Instead, you need an external coordination mechanism: PostgreSQL advisory locks, NATS KV, or Redis-based lease acquisition.
When is Swarm enough? If your bot runs under 20 shards, you don’t need auto-scaling, and your team doesn’t already know Kubernetes. The operational overhead of a Kubernetes cluster (etcd, control plane, RBAC, networking plugins) is significant for a single bot deployment.
When do you need Kubernetes? When you’re running 50+ shards, need automatic shard scaling, or already have Kubernetes infrastructure for other services.
Swarm Stack Configuration
This Docker Compose file deploys a two-replica Discord bot with NATS and PostgreSQL for shard coordination.
```yaml
# docker-compose.swarm.yml
# Deploy with: docker stack deploy -c docker-compose.swarm.yml discord-bot
version: "3.8"

services:
  discord-bot:
    image: your-registry/discord-bot:latest
    deploy:
      replicas: 2
      update_config:
        parallelism: 1  # Rolling update: one replica at a time
        delay: 30s      # Wait for shard RESUME before updating next
        order: stop-first
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
    environment:
      NATS_URL: "nats://nats:4222"
      DATABASE_URL: "postgres://bot:secret@postgres:5432/discord_bot"
      SHARD_COUNT: "4"
      # Each replica claims shards dynamically from PostgreSQL
      SHARD_STRATEGY: "postgres_coordination"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9001/health"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - bot-internal

  nats:
    image: nats:latest  # Under 10 MB image
    command: ["--jetstream", "--store_dir", "/data"]
    deploy:
      replicas: 1
      resources:
        limits:
          memory: 128M  # NATS needs only 32 MiB minimum
    volumes:
      - nats-data:/data
    networks:
      - bot-internal

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: discord_bot
      POSTGRES_USER: bot
      POSTGRES_PASSWORD: secret
    volumes:
      - pg-data:/var/lib/postgresql/data
    networks:
      - bot-internal

volumes:
  nats-data:
  pg-data:

networks:
  bot-internal:
    driver: overlay
```

At Hylist.io, we chose Docker Swarm over Kubernetes specifically because our bot runs 2 replicas with 4 shards, not enough scale to justify Kubernetes. Our PostgreSQL-based shard coordination uses a discord_bot_shards table where each replica heartbeats every 10 seconds. If a replica misses three heartbeats (30-second stale threshold), its shards become reclaimable. PostgreSQL advisory locks via pg_try_advisory_xact_lock prevent race conditions during shard claiming, and graceful shutdown calls release_all_shards() to return shards immediately.
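Stripped of the SQL, the reclaim rule is a timestamp comparison. A minimal sketch of the logic (the row shape mirrors what a query against a discord_bot_shards table might return; the real implementation wraps this in a transaction holding a pg_try_advisory_xact_lock):

```python
def reclaimable_shards(rows, now: float, stale_after: float = 30.0):
    """Shards whose owner missed three 10-second heartbeats.

    rows: iterable of (shard_id, last_heartbeat_unix) pairs.
    """
    return [shard_id for shard_id, heartbeat in rows if now - heartbeat > stale_after]

# Shard 2's owner last heartbeated 50 seconds ago — past the 30s threshold
rows = [(0, 1000.0), (1, 980.0), (2, 955.0)]
print(reclaimable_shards(rows, now=1005.0))  # → [2]
```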
The NATS server Docker image is under 10 MB and needs only 1 vCPU and 32 MiB of RAM minimum (NATS Official Docs). It’s trivially lightweight even on a small VPS.
Docker Swarm holds 2.5-5% market share versus Kubernetes’ approximately 92%, but Mirantis committed to Swarm LTS through 2030 (The Decipherist). For Discord bots under 20 shards, Swarm’s operational simplicity often outweighs Kubernetes’ orchestration features.
Why Use NATS JetStream for Inter-Shard Communication?
NATS JetStream delivers 200,000-400,000 messages per second with persistence at 1-5ms latency, while its core protocol handles 11-12 million messages per second without persistence (Onidel 2025 Benchmarks). For Discord bots, that combination of speed and durability solves the biggest pain point: losing events when consumers restart.
Redis Pub/Sub is fast, but it’s fire-and-forget. Subscriber offline? Message gone. RabbitMQ persists messages but maxes out around 50,000-100,000 messages per second. Kafka handles massive throughput but adds serious operational complexity: ZooKeeper (or KRaft), topic partitions, consumer group rebalancing.
NATS JetStream hits the sweet spot. Persistent delivery. Simple operations. Low latency. And its server image is under 10 MB (NATS Official Docs).
Stream and Consumer Configuration
Here’s how to set up NATS JetStream streams and consumers for a Discord bot. This example uses the NATS CLI, but you can do the same programmatically in any language with a NATS client.
```bash
# nats-stream-setup.sh

# Create a stream for Discord gateway events
nats stream add DISCORD_EVENTS \
  --subjects "discord.events.>" \
  --storage file \
  --retention limits \
  --max-msgs -1 \
  --max-bytes 1GB \
  --max-age 24h \
  --discard old \
  --replicas 1  # Use 3 for production NATS clusters

# Create a durable consumer for the moderation worker
nats consumer add DISCORD_EVENTS moderation-worker \
  --filter "discord.events.message_create" \
  --deliver all \
  --ack explicit \
  --max-deliver 3 \
  --ack-wait 30s \
  --pull \
  --max-pending 100

# Create a consumer for voice state tracking
nats consumer add DISCORD_EVENTS voice-tracker \
  --filter "discord.events.voice_state_update" \
  --deliver all \
  --ack explicit \
  --max-deliver 5 \
  --ack-wait 10s \
  --pull \
  --max-pending 50
```

The `--ack-wait` flag is especially important. It defines how long JetStream waits for an acknowledgment before redelivering the message. Set it too low and you'll get duplicate processing during slow operations. Set it too high and recovery after crashes takes longer.
In our Hylist.io production system, we run 10 NATS JetStream streams with purpose-specific subjects. The TASKS stream handles three subjects (tasks.process, tasks.media, tasks.dead) with separate consumer groups for CPU-bound and I/O-bound work. Our SERVERS stream runs 10 consumer replicas for the status-checking worker because server monitoring is both latency-sensitive and high-volume. Each stream’s retention and ack-wait values were tuned through production load testing, not guesswork.
NATS JetStream provides durable message delivery at 200,000-400,000 msg/s with 1-5ms latency, while requiring a server image under 10 MB and minimum resources of 1 vCPU and 32 MiB RAM (NATS Official Docs, Onidel 2025 Benchmarks). This makes it viable even on small VPS deployments.
How Do You Handle Failover and Recovery?
Discord’s gateway supports session resumption: a reconnecting shard sends its session_id and last sequence number, and Discord replays missed events without requiring a full re-identify (Discord Official Docs). Combined with NATS JetStream’s durable consumers, this creates a two-layer recovery mechanism that handles both gateway disconnects and worker failures.
Here’s how failover works at each layer.
Gateway layer recovery. When a shard disconnects, it first attempts RESUME with its cached session ID and sequence number. Discord replays events from the sequence gap. If the session expired (usually after 30-60 seconds of disconnection), the shard falls back to a full IDENTIFY. Slower, but still automatic.
Worker layer recovery. NATS JetStream durable consumers track the last acknowledged message per consumer. When a worker crashes and restarts, it resumes from the last unacknowledged message. No events are lost. The --max-deliver flag prevents poison messages from crashing workers in an infinite loop.
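One practical consequence: a worker can detect the final delivery attempt and divert the payload to a dead-letter subject itself (the subject name follows the tasks.dead convention from our stream setup; the counter mirrors JetStream's per-message delivery metadata). A sketch:

```python
def on_delivery(num_delivered: int, max_deliver: int) -> str:
    """Decide how to handle a failing message on this delivery attempt."""
    # JetStream stops redelivering once num_delivered reaches max_deliver,
    # so the last attempt is the only chance to park the payload elsewhere.
    if num_delivered >= max_deliver:
        return "publish-to-dead-letter-then-ack"
    return "nak-for-redelivery"

print(on_delivery(1, 3))  # → nak-for-redelivery
print(on_delivery(3, 3))  # → publish-to-dead-letter-then-ack
```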
Orchestrator recovery. Kubernetes liveness probes detect stuck shards and restart them. Readiness probes prevent traffic routing to shards that haven’t completed their gateway handshake. Docker Swarm health checks serve the same function with simpler configuration.
What about graceful shutdown? This is where most teams get tripped up. When a pod receives SIGTERM, the shard should drain its NATS consumers first: finish processing in-flight messages and acknowledge them. Only then should it close the Discord gateway connection. The terminationGracePeriodSeconds in Kubernetes (or stop_grace_period in Swarm) must be long enough for this drain to complete.
```python
# graceful_shutdown.py - Example shutdown handler (Python with py-cord)
import signal
import asyncio


class GracefulShutdown:
    def __init__(self, bot, nats_client):
        self.bot = bot
        self.nats = nats_client
        self.shutting_down = False

    def register(self):
        """Register signal handlers for graceful shutdown."""
        loop = asyncio.get_event_loop()
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.add_signal_handler(sig, lambda: asyncio.create_task(self.shutdown()))

    async def shutdown(self):
        """Drain NATS consumers, then close Discord gateway."""
        if self.shutting_down:
            return
        self.shutting_down = True
        print("Received shutdown signal, draining NATS consumers...")
        # Step 1: Drain NATS - finish in-flight messages
        await self.nats.drain()
        print("NATS consumers drained.")
        # Step 2: Close Discord gateway connection
        print("Closing Discord gateway...")
        await self.bot.close()
        print("Shutdown complete.")
```

In our Hylist.io deployment, each worker has a unique health port (9001 through 9011). Docker Swarm checks GET /health every 10 seconds with a 5-second timeout and 3 retries before marking a container unhealthy. We also use Sentry for error tracking at a 20% trace sample rate, and structured logging via Rust's tracing crate with a non-blocking writer, because a blocking log writer on an async Tokio runtime would defeat the purpose of async entirely.
Discord’s gateway RESUME protocol replays missed events using a cached session ID and sequence number, while NATS JetStream durable consumers automatically replay unacknowledged messages after worker restarts (Discord Official Docs). Together, they create two-layer recovery that handles both network disconnects and process crashes.
What Should You Monitor in a Sharded Bot?
With over 12 million active Discord bots serving 259.2 million monthly active users (SQ Magazine, 2026), operational visibility isn’t optional. It’s the difference between catching a failing shard at 3 AM and waking up to a flood of user complaints.
Monitor these metrics at three levels.
Per-shard metrics tell you whether individual gateway connections are healthy. Track heartbeat ACK latency (should stay under 500ms), guild count per shard (should be roughly equal), and event throughput (sudden drops indicate a stuck shard). If one shard’s heartbeat latency spikes while others stay flat, that shard’s connection is degraded.
NATS JetStream metrics reveal backpressure and processing failures. The most critical metric is consumer pending count, meaning how many messages are waiting to be delivered. A growing pending count means workers can’t keep up. Also watch ack-wait expirations (messages redelivered because workers didn’t acknowledge in time) and redelivery rate (high values suggest poison messages or resource starvation).
Infrastructure metrics cover the basics: CPU, memory, network I/O per container. Set alerts on memory growth trends, not just absolute thresholds. A shard leaking 1 MB per hour won’t trigger a 512 MB alert for three weeks, but a trend alert catches it in days.
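The three-week figure falls out of simple arithmetic, which is exactly what a trend alert short-circuits. A sketch (function name is ours):

```python
def hours_until_limit(current_mb: float, limit_mb: float, growth_mb_per_hour: float) -> float:
    """How long until a container leaking memory hits its limit."""
    if growth_mb_per_hour <= 0:
        return float("inf")  # no growth: the threshold alert never fires
    return (limit_mb - current_mb) / growth_mb_per_hour

# 1 MB/hour against a 512 MB limit: the absolute threshold stays quiet
# for roughly three weeks, while a trend alert fires within days.
print(round(hours_until_limit(0, 512, 1.0) / 24))  # → 21
```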
Here are practical alerting thresholds to start with:
| Metric | Warning | Critical |
|---|---|---|
| Heartbeat ACK latency | > 1,000 ms | > 5,000 ms |
| Consumer pending count | > 1,000 | > 10,000 |
| Redelivery rate | > 5% | > 15% |
| Shard event gap | > 30 seconds | > 120 seconds |
| Container memory | > 80% limit | > 95% limit |
Don’t over-alert. Start with five to ten alerts and expand based on incidents. Every alert that doesn’t require action trains your team to ignore alerts, and that’s worse than having no alerts at all.
Over 12 million active Discord bots serve a platform of 259.2 million monthly active users (SQ Magazine, 2026). Key monitoring targets for sharded bots include per-shard heartbeat latency (under 500ms), NATS consumer pending counts, and redelivery rates exceeding 5%.
What If You Don’t Want to Manage All This?
Let’s be honest: the architecture above works, but it’s a lot of moving parts. You’re writing Kubernetes manifests, managing NATS clusters, configuring health probes, tuning auto-scaling thresholds, and maintaining the entire stack yourself. That’s before you’ve written a single line of bot logic.
This is exactly why we built A-Line Cloud.
A-Line Cloud is a managed Kubernetes platform with NATS JetStream built in. All three pillars from this guide (orchestration, durable messaging, and auto-scaling) come out of the box. Here’s what changes:
| DIY (This Guide) | A-Line Cloud |
|---|---|
| Write StatefulSet YAML manifests | Push code, we generate the deployment |
| Provision and manage NATS clusters | NATS JetStream included, streams auto-configured |
| Configure KEDA scalers manually | Set auto-scale triggers from the dashboard (including NATS consumer pending count) |
| Maintain etcd, kubelet, certificates | Fully managed control plane |
| Build CI/CD pipelines for rolling updates | Git push → zero-downtime deploy |
| Debug node pool scaling | Auto-scaling with 7-day utilization averages and buffer zones |
| Manage TLS certificates and load balancers | Automatic TLS, one-click domain routing |
The NATS-based auto-scaling is particularly relevant for Discord bots. When your consumer pending count grows, meaning events are arriving faster than workers can process them, A-Line Cloud’s KEDA integration automatically spins up more worker replicas. When the queue drains, it scales back down. No YAML. No threshold tuning. Just a slider in the dashboard.
You still own your architecture. Your gateway shards, your NATS subjects, your worker logic. All the patterns from this guide apply. A-Line Cloud just handles the infrastructure underneath so you can focus on building the bot instead of operating the platform.
We’re currently in early access. Join the waitlist to get notified when spots open up.
Frequently Asked Questions
How many guilds before Discord requires sharding?
Discord mandates sharding at 2,500 guilds. You literally can’t connect without it past that threshold. The gateway rate limit is 120 events per connection per 60 seconds (Discord Official Docs). Plan your sharding strategy before you reach 2,000 guilds so the transition isn’t an emergency.
Is NATS JetStream fast enough for real-time Discord events?
Yes. NATS JetStream handles 200,000-400,000 messages per second with persistence at 1-5ms latency (Onidel 2025 Benchmarks). Discord’s gateway sends far fewer events per second than that, even for bots in thousands of guilds. The bottleneck won’t be your message broker.
Can I run a highly available Discord bot without Kubernetes?
Absolutely. Docker Swarm handles orchestration for bots under 20 shards with far less operational complexity. Mirantis committed to Swarm LTS through 2030 (The Decipherist). Use PostgreSQL advisory locks or NATS KV for shard coordination instead of StatefulSets.
How does hybrid sharding reduce memory usage?
Hybrid sharding runs multiple gateway connections per OS process, sharing the Node.js runtime and cached data. The discord-hybrid-sharding library typically reduces memory by 40-60%; in one tested deployment of up to 600,000 guilds, memory dropped from 4 GB to roughly 700 MB (discord-hybrid-sharding GitHub). Each process handles a cluster of shards instead of just one.
What happens to messages when a NATS consumer goes offline?
Nothing is lost. JetStream durable consumers track their position in the stream. When the consumer reconnects, it receives all unacknowledged messages starting from where it left off. The --max-deliver flag (typically set to 3-5) prevents poison messages from being redelivered infinitely. Undeliverable messages can be routed to a dead-letter subject.
Conclusion
Building a highly available Discord bot isn’t about any single technology. It’s about three layers working together: sharding for connection distribution, orchestration for automated recovery, and durable messaging for guaranteed event delivery.
Start with the architecture that matches your scale. A bot with 5,000 guilds doesn’t need Kubernetes. Docker Swarm with PostgreSQL shard coordination works. A bot approaching 100,000 guilds should invest in Kubernetes StatefulSets and auto-scaling. Or skip the infrastructure management entirely and let A-Line Cloud handle it.
The key insight from running this in production? Keep your gateway layer thin. Push all business logic into stateless workers behind NATS JetStream. That separation is what makes zero-downtime deployments possible. You can update workers without touching a single gateway connection.
Every code example in this guide is production-tested. Start with the architecture, implement the layer that solves your biggest pain point first, and iterate from there.