🔄 Kafka Producer Internals: send() Explained with Delivery Semantics and Transactions


Apache Kafka is known for high-throughput, fault-tolerant message streaming. At the heart of Kafka's data pipeline is the Producer, which is responsible for publishing data to Kafka topics. This blog dives deep into the internal workings of the Kafka Producer, especially what happens under the hood when send() is called. We'll also break down the different delivery guarantees and transactional semantics, with a diagram of the internal flow.

🧠 Table of Contents

  1. Kafka Producer Architecture Overview
  2. What Happens When send() is Called
  3. Delivery Semantics
  4. Kafka Transactions & Idempotence
  5. Error Handling and Retries
  6. Diagram: Kafka Producer Internals
  7. Conclusion

๐Ÿ—️ Kafka Producer Architecture Overview

The Kafka Producer is composed of the following core components:

  • Serializer: Converts the record key and value to bytes.
  • Partitioner: Determines which partition a record should go to.
  • Record Accumulator: Buffers records in memory, grouping them into per-partition batches.
  • Sender Thread: Drains completed batches and sends them to brokers asynchronously.
  • Metadata Manager: Keeps broker and partition metadata up to date.
  • NetworkClient: Handles the actual communication over TCP.
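
To make the wiring concrete, here is a minimal sketch of constructing a producer with the serializer and accumulator settings spelled out. The broker address is a placeholder, and the batch/linger values are illustrative, not recommendations:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");                      // placeholder broker
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());   // Serializer (key)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName()); // Serializer (value)
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // Accumulator: max batch size in bytes
props.put(ProducerConfig.LINGER_MS_CONFIG, 5);      // Accumulator: wait up to 5 ms to fill a batch

KafkaProducer<String, String> producer = new KafkaProducer<>(props);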

📤 What Happens When send() is Called?

producer.send(new ProducerRecord<>("topic", key, value));

Internal Flow (Simplified):

  1. Serialization – Key and value are serialized using configured serializers.
  2. Partition Selection – The partitioner chooses a partition.
  3. Buffering in Accumulator – The record is added to a batch buffer.
  4. Sender Thread Wakeup – A background thread sends the batch.
  5. Broker Acknowledgment – Broker persists and acknowledges.
  6. Callback Execution – Optional callback is executed after response.
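
Because send() returns immediately with a Future, the common pattern is to pass a callback that runs once the broker responds. A sketch reusing the producer above; the topic name, key, and value are illustrative:

producer.send(new ProducerRecord<>("topic", key, value), (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace(); // delivery failed after any configured retries
    } else {
        // the broker acknowledged: partition and offset are now known
        System.out.printf("partition=%d, offset=%d%n", metadata.partition(), metadata.offset());
    }
});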

🚦 Delivery Semantics

Kafka provides three main delivery guarantees:

✅ At Most Once

Each record is sent at most once, with no retries; data can be lost if an error occurs.

Config:

acks=0
retries=0

Use Case: Logs, analytics
Pros: Low latency
Cons: No durability guarantee
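
Applied to the producer setup sketched earlier, a fire-and-forget configuration might look like this:

props.put(ProducerConfig.ACKS_CONFIG, "0");  // do not wait for any broker acknowledgment
props.put(ProducerConfig.RETRIES_CONFIG, 0); // a failed send is simply dropped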

๐Ÿ” At Least Once

The producer retries when an acknowledgment is not received → duplicates are possible (for example, when the broker persisted the record but its response was lost).

Config:

acks=all
retries=3

Use Case: Payment events
Pros: Reliable delivery
Cons: Possible duplicates
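
One subtlety: with retries enabled but idempotence disabled, allowing more than one in-flight request per connection can reorder records on retry. A sketch of an at-least-once setup that preserves ordering:

props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait until all in-sync replicas have the record
props.put(ProducerConfig.RETRIES_CONFIG, 3);
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1); // avoid reordering on retry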

🧾 Exactly Once (EOS)

Guarantees no duplicates or loss, even during retries.

Config:

enable.idempotence=true
acks=all
retries=3
transactional.id=your-producer-id

Use Case: Financial systems
Pros: Precise semantics
Cons: Slightly higher latency
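
The same settings expressed in Java against the props object from the setup sketch above; the transactional.id value is a placeholder and must be stable and unique per producer instance:

props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, 3);
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "your-producer-id"); // placeholder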

๐Ÿ” Kafka Transactions & Idempotence

✅ Idempotent Producer

  • Each producer instance is assigned a unique Producer ID (PID)
  • Each record batch carries a per-partition sequence number
  • The broker deduplicates retried batches based on PID + sequence number
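
Enabling this is a single configuration switch; the client then requires compatible settings:

props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// requires acks=all, retries > 0, and max.in.flight.requests.per.connection <= 5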

🔄 Transactions

producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record1);
    producer.send(record2);
    producer.commitTransaction();
} catch (KafkaException e) {
    // roll back: consumers with read_committed never see the aborted records
    producer.abortTransaction();
}

Transactional state is maintained in Kafka's internal transaction log (the __transaction_state topic). Consumers will only see committed transactions if configured with isolation.level=read_committed.
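
On the consuming side, a minimal sketch of that isolation setting; the broker address and group id are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "eos-reader");              // placeholder group id
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");   // skip aborted records

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);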

🛡️ Error Handling & Retries

How the producer responds to common failure scenarios:

  • Network timeout → retries the request if retries are enabled
  • Leader not available → refreshes metadata and retries
  • Serialization failure → fails immediately (not retriable)
  • Record/batch too large → throws RecordTooLargeException
  • Idempotent sequence mismatch → throws OutOfOrderSequenceException; a zombie transactional producer is fenced with ProducerFencedException
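
In application code, one way to act on these differences is to check whether the callback exception is retriable. A hedged sketch; record is a placeholder ProducerRecord and the logging is illustrative:

import org.apache.kafka.common.errors.RetriableException;

producer.send(record, (metadata, exception) -> {
    if (exception == null) {
        return; // acknowledged successfully
    }
    if (exception instanceof RetriableException) {
        // transient failure (e.g. timeout, leader change): re-sending may succeed
        System.err.println("Transient send failure: " + exception);
    } else {
        // fatal failure: retrying will not help
        System.err.println("Fatal send failure: " + exception);
    }
});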

🧭 Diagram: Kafka Producer Internal Flow


+-------------------------+
|     Kafka Producer      |
+-------------------------+
          |
          v
+-------------------------+        +-------------+
|     Serializer          | -----> | Serialized  |
|  (Key + Value)          |        | Record      |
+-------------------------+        +-------------+
          |
          v
+-------------------------+
|     Partitioner         | -----> Select Partition
+-------------------------+
          |
          v
+-------------------------+
|     Record Accumulator  | -----> Batches by Partition
+-------------------------+
          |
          v
+-------------------------+
|     Sender Thread       | -----> Sends Batch
+-------------------------+
          |
          v
+-------------------------+
|     NetworkClient       | -----> Kafka Broker
+-------------------------+
          |
          v
+-------------------------+
|   Broker Acknowledges   |
+-------------------------+
          |
          v
+-------------------------+
|    Callback Executed    |
+-------------------------+

✅ Conclusion

Understanding Kafka Producer internals is essential for building reliable, performant data pipelines. Whether you’re dealing with logs, transactions, or real-time processing, choose the appropriate delivery semantics and configure retries/acks wisely.

Key Takeaways:

  • send() is asynchronous
  • Data flows through serialization → partitioning → buffering → batching
  • Kafka supports at-most-once, at-least-once, and exactly-once guarantees
  • Use idempotence and transactions for full exactly-once support

✍️ Feel free to share this blog or connect with me for more deep-dives into Kafka, streaming, and distributed systems!
