

Showing posts from May, 2025

Advanced Kafka Resilience: Dead-Letter Queues, Circuit Breakers, and Exactly-Once Delivery

Introduction

In distributed systems, failures are inevitable: network partitions, broker crashes, or consumer lag can disrupt data flow. While retries help recover from transient issues, mission-critical systems need stronger guarantees. This guide covers three advanced Kafka resilience patterns:

Dead-Letter Queues (DLQs) – Handle poison pills and unprocessable messages.
Circuit Breakers – Prevent cascading failures when Kafka is unhealthy.
Exactly-Once Delivery – Avoid duplicates in financial/transactional systems.

Let's dive in!

1. Dead-Letter Queues (DLQs) in Kafka

What is a DLQ? A dedicated Kafka topic where "failed" messages are sent after the maximum number of retries (e.g., malformed payloads, unrecoverable errors).

Why Use DLQs?
Isolate bad messages instead of blocking retries.
Audit failures for debugging.
Reproce...
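As a rough sketch of the DLQ pattern described above, here is how a Spring Kafka consumer can route records that exhaust their retries to a dead-letter topic. The bean name, retry count, and backoff interval are illustrative assumptions, not settings taken from the post:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class DlqConfig {

    // Illustrative settings: after 3 failed retries (1 s apart), the record is published to
    // "<original-topic>.DLT", isolating the poison pill instead of blocking the partition.
    @Bean
    public DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template);
        return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3L));
    }
}
```

With Spring Boot's default listener container factory, an error handler bean like this is picked up automatically, so every @KafkaListener in the application gets the retry-then-DLQ behavior without further wiring.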

Handling Kafka Retries in Spring Boot: Blocking vs. Reactive Approaches

Introduction

Apache Kafka is designed for high availability, but failures still happen: network issues, broker crashes, or cluster downtime. To ensure message delivery, applications must implement retry mechanisms. However, retries behave differently in traditional (blocking) and reactive (non-blocking) Kafka producers. This guide covers:

✅ Kafka’s built-in retries (retries, retry.backoff.ms)
✅ Blocking vs. non-blocking retry strategies
✅ Reactive Kafka retries with backoff
✅ Fallback strategies for guaranteed delivery
✅ Real-world failure scenarios and fixes

1. Kafka Producer Retry Basics

When Do Retries Happen? Kafka producers automatically retry on:
Network errors (e.g., broker disconnect)
Leader election (e.g., broker restart)
Temporary errors (e.g., NOT_ENOUGH_REPLICAS)

Key Configuration Properties
retries (default: 2147483647 since Kafka 2.1; 0 in older clients) – Number of retries for transient failures.
retry.backoff...
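To ground these properties, here is a minimal plain-Java producer sketch; the broker address, topic name, and concrete values are illustrative assumptions rather than settings from the post:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // Retry transient failures such as broker disconnects, leader elections, NOT_ENOUGH_REPLICAS.
        props.put(ProducerConfig.RETRIES_CONFIG, 5);
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 500);        // pause between retry attempts
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000); // upper bound on send + retries
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);     // prevent duplicates caused by retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-1", "payload"), (metadata, exception) -> {
                if (exception != null) {
                    // Retries exhausted: hand off to a fallback (e.g., a dead-letter topic or local store).
                    System.err.println("Send failed after retries: " + exception.getMessage());
                }
            });
        }
    }
}
```

Note that delivery.timeout.ms bounds the total time spent sending and retrying, and enabling idempotence lets the broker deduplicate retried batches, so a high retry count does not risk duplicate writes.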