Advanced Kafka Resilience: Dead-Letter Queues, Circuit Breakers, and Exactly-Once Delivery

Introduction

In distributed systems, failures are inevitable—network partitions, broker crashes, or consumer lag can disrupt data flow. While retries help recover from transient issues, you need stronger guarantees for mission-critical systems.

This guide covers three advanced Kafka resilience patterns:

  1. Dead-Letter Queues (DLQs) – Handle poison pills and unprocessable messages.
  2. Circuit Breakers – Prevent cascading failures when Kafka is unhealthy.
  3. Exactly-Once Delivery – Avoid duplicates in financial/transactional systems.

Let's dive in!


1. Dead-Letter Queues (DLQs) in Kafka

What is a DLQ?

A dedicated Kafka topic where "failed" messages are sent after max retries (e.g., malformed payloads, unrecoverable errors).

Why Use DLQs?

  • Isolate bad messages instead of blocking the partition with endless retries.
  • Audit failures for debugging.
  • Reprocess later (e.g., after fixing a bug).

Implementation (Spring Kafka)

Step 1: Configure a DLQ Topic

bash
kafka-topics --create --topic orders-dlq \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 2

Step 2: Route Failures to DLQ

java
@KafkaListener(topics = "orders")
public void listen(Order order, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
    try {
        processOrder(order); // May throw UnprocessableOrderException
    } catch (Exception ex) {
        // Route the unprocessable record to the DLQ so it doesn't block the partition
        kafkaTemplate.send("orders-dlq", order.getKey(), order);
    }
}
}

Step 3: Add Retry + DLQ Logic

Spring Kafka (2.8+) wires retries and automatic DLQ routing through a DefaultErrorHandler bean:

java
@Bean
public DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
    // After retries are exhausted, publish the failed record to a DLQ topic.
    // By default the recoverer targets "<original-topic>.DLT"; pass a destination
    // resolver to route to a custom topic such as "orders-dlq".
    var recoverer = new DeadLetterPublishingRecoverer(template);
    return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 2));
}

Key Properties:

  • FixedBackOff(1000L, 2): 1-second delay between retries, 2 retries (3 attempts in total) before the record goes to the DLQ.
  • DeadLetterPublishingRecoverer: publishes the exhausted record, preserving the original headers for debugging.
  • Register the handler on your ConcurrentKafkaListenerContainerFactory via setCommonErrorHandler.
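
The retry-then-DLQ flow can be sketched in plain Java, with no Kafka dependency (hypothetical names; this is an illustration of the control flow, not the Spring Kafka implementation):

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Minimal sketch: retry an action with fixed backoff, then hand the payload to a DLQ hook.
public class RetryWithDlq {
    /**
     * Tries the action up to maxAttempts times with a fixed backoff.
     * Returns true on success; on exhaustion, hands the payload to the DLQ hook.
     */
    public static <T> boolean process(T payload, Supplier<Boolean> action,
                                      int maxAttempts, long backoffMs,
                                      Consumer<T> dlq) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (action.get()) return true;           // processed successfully
            } catch (Exception ignored) { /* treat as a failed attempt */ }
            if (attempt < maxAttempts) {
                try { Thread.sleep(backoffMs); }         // fixed backoff between retries
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
        }
        dlq.accept(payload);                             // retries exhausted -> DLQ
        return false;
    }
}
```

A poison pill exhausts all attempts and lands in the DLQ, while healthy messages succeed on the first try.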

2. Circuit Breakers for Kafka Producers

What is a Circuit Breaker?

A pattern that stops sending requests to a failing service (e.g., Kafka) to avoid cascading failures.

Why Use It?

  • Prevent thread pool exhaustion from endless retries.
  • Fail fast when Kafka is down for minutes/hours.

Implementation (Resilience4j)

Step 1: Add Dependencies

xml
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
</dependency>

Step 2: Annotate Kafka Producer

java
@CircuitBreaker(name = "kafkaProducer", fallbackMethod = "fallbackSend")
public void sendOrder(Order order) {
    kafkaTemplate.send("orders", order.getId(), order);
}

public void fallbackSend(Order order, Exception ex) {
    // Store in DB or local queue
    deadLetterQueue.save(order); 
}

Step 3: Configure Circuit Breaker

yaml
resilience4j:
  circuitbreaker:
    instances:
      kafkaProducer:
        failure-rate-threshold: 50    # Trip after 50% failures
        wait-duration-in-open-state: 30s
        sliding-window-size: 10       # Last 10 calls

Behavior:

  • If Kafka fails 5/10 times, the circuit opens for 30 seconds.
  • All requests skip Kafka and go to fallbackSend.
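
The open/closed cycle described above can be sketched as a tiny state machine (illustrative names only; Resilience4j implements this for you, plus a HALF_OPEN probe state):

```java
import java.util.ArrayDeque;

// Minimal sketch of the CLOSED -> OPEN -> (wait) -> CLOSED cycle of a circuit breaker.
public class BreakerSketch {
    private final int windowSize;
    private final int failureThresholdPct;
    private final long openDurationMs;
    private final ArrayDeque<Boolean> window = new ArrayDeque<>();
    private long openedAt = -1;

    public BreakerSketch(int windowSize, int failureThresholdPct, long openDurationMs) {
        this.windowSize = windowSize;
        this.failureThresholdPct = failureThresholdPct;
        this.openDurationMs = openDurationMs;
    }

    /** True while the breaker is open (calls should go straight to the fallback). */
    public boolean isOpen(long nowMs) {
        if (openedAt >= 0 && nowMs - openedAt < openDurationMs) return true;
        openedAt = -1; // wait duration elapsed: allow calls again
        return false;
    }

    /** Record a call result; trips the breaker when the failure rate hits the threshold. */
    public void record(boolean success, long nowMs) {
        if (window.size() == windowSize) window.removeFirst(); // sliding window
        window.addLast(success);
        long failures = window.stream().filter(s -> !s).count();
        if (window.size() == windowSize
                && failures * 100 / windowSize >= failureThresholdPct) {
            openedAt = nowMs;  // trip: subsequent calls skip Kafka
            window.clear();
        }
    }
}
```

With a window of 10, a 50% threshold, and a 30s open duration, 5 failures out of 10 calls trip the breaker, and it closes again once the wait elapses.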

3. Exactly-Once Delivery Patterns

The Problem: Duplicate Messages

Without idempotence, Kafka may redeliver messages during:

  • Producer retries
  • Consumer rebalances

Solution 1: Idempotent Producers

yaml
spring:
  kafka:
    producer:
      properties:
        "[enable.idempotence]": true  # Exactly-once per partition (default since Kafka 3.0)

How It Works:

  • Kafka assigns each producer a producer ID (PID); every record batch carries a per-partition sequence number, and the broker drops batches it has already appended.
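
A toy model of that broker-side check (heavily simplified: real brokers track a window of recent batches per producer, not a single number):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of broker-side idempotence: remember the last sequence number seen
// per (producerId, partition) and drop anything already appended.
public class IdempotenceSketch {
    private final Map<String, Long> lastSeq = new HashMap<>();

    /** Returns true if the record is appended, false if it is a duplicate retry. */
    public boolean append(long producerId, int partition, long sequence) {
        String key = producerId + "-" + partition;
        Long last = lastSeq.get(key);
        if (last != null && sequence <= last) return false; // duplicate retry: drop
        lastSeq.put(key, sequence);
        return true;
    }
}
```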

Solution 2: Transactional Producers

java
// Requires spring.kafka.producer.transaction-id-prefix and a KafkaTransactionManager
@Transactional
public void processAndSend(Order order) {
    db.save(order);               // DB write (local transaction)
    kafkaTemplate.send("orders",  // Kafka write, committed with the Kafka transaction
        order.getId(), order);
}

Requirements:

  • Set spring.kafka.producer.transaction-id-prefix.
  • Consumers must read committed messages only (isolation.level=read_committed).
  • Note: the DB commit and the Kafka commit are still two separate commits; for strict atomicity across both systems, use the transactional outbox pattern.
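
Assuming Spring Boot, the two settings above might be wired like this (illustrative values):

yaml
spring:
  kafka:
    producer:
      transaction-id-prefix: tx-orders-      # enables transactional sends
    consumer:
      properties:
        "[isolation.level]": read_committed  # hide uncommitted/aborted records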

Solution 3: Consumer Deduplication

java
@KafkaListener(topics = "orders")
public void listen(Order order) {
    if (db.exists(order.getId())) {  // Skip duplicates
        return;
    }
    processOrder(order);
}

Use Case:

  • When producers can't guarantee idempotence.
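
The dedup check can be exercised without a database using an in-memory set of processed IDs (hypothetical sketch; production code would use a DB or a cache with a TTL, as the snippet above does with db.exists):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of consumer-side deduplication keyed on the business ID.
public class DedupConsumer {
    private final Set<String> processed = new HashSet<>();
    private final List<String> handled = new ArrayList<>();

    /** Processes the order only once; redeliveries of the same id are skipped. */
    public boolean handle(String orderId) {
        if (!processed.add(orderId)) return false; // already seen -> skip duplicate
        handled.add(orderId);                      // stand-in for processOrder(order)
        return true;
    }

    public List<String> handled() { return handled; }
}
```

A redelivery after a rebalance hits the `processed` set and is silently dropped, so each order is handled exactly once.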

Comparison Table

Technique          | Use Case                         | Pros                         | Cons
-------------------|----------------------------------|------------------------------|----------------------
Dead-Letter Queues | Malformed/unprocessable messages | Easy debugging               | Extra topic to manage
Circuit Breakers   | Prolonged Kafka downtime         | Prevents cascading failures  | Adds complexity
Exactly-Once Kafka | Financial/transactional systems  | No duplicates                | Higher latency

Best Practices

  1. Combine DLQs + Circuit Breakers:
    • Retry transient errors → Fallback to DLQ → Trip circuit if Kafka is down.
  2. Monitor DLQs:
    • Alert if DLQ volume spikes (e.g., Prometheus + Grafana).
  3. Test Failure Scenarios:
    • Simulate broker crashes during chaos testing.

Kafka Producer Internal Working Apache Kafka is known for its high-throughput, fault-tolerant message streaming system. At the heart of Kafka's data pipeline is the Producer —responsible for publishing data to Kafka topics. This blog dives deep into the internal workings of the Kafka Producer, especially what happens under the hood when send() is called. We'll also break down different delivery guarantees and transactional semantics with diagrams. 🧠 Table of Contents Kafka Producer Architecture Overview What Happens When send() is Called Delivery Semantics Kafka Transactions & Idempotence Error Handling and Retries Diagram: Kafka Producer Internals Conclusion 🏗️ Kafka Producer Architecture Overview Kafka Producer is composed of the following core components: Serializer : Converts key/value to bytes. Partitioner : Determines which partition a record should go to. Accumulator : Buffers the records in memory be...