Skip to content

Glossary

This glossary defines every term used across the Kafka Streams documentation. Use it when you encounter unfamiliar terminology.

Terms are organized alphabetically. Each definition includes the module where the concept lives and links to deeper explanations. Other pages in this documentation link directly to specific glossary entries. For example, the Visibility page links to event time when first introducing the concept.


Atomicity, Consistency, Isolation, Durability - the four classical guarantees of a SQL database transaction. Streaming systems give you a different bundle of guarantees, and the Visibility versus ACID databases page works through the mismatch.

A Riffle concept: a set of watermark sources whose watermarks should not diverge by more than a configured bound. A fast source whose watermark out-paces the group’s slowest member by more than agBound is backpressured: the runtime pauses fetching from it until the slowest member catches up. Lives in Kafka.Streams.Watermark.AlignmentGroup.

The consumer-group identity and the prefix every internal topic (changelog, repartition) inherits. Changing it is a fresh-start deploy, not a rollout. See Topology evolution.

The component that decides which member of a consumer group owns which partition (and, in Streams, which task). The built-in default is cooperative-sticky: it prefers to keep tasks on their current owner across rebalances, and revocations are incremental rather than stop-the-world. See also KIP-848.

A Riffle Prim that runs the user’s IO action on a bounded worker pool instead of on the stream thread. Provides backpressure, EOS-correct offsets, ordered or unordered output, per-request timeout / retry, and explicit failure policy. Full walkthrough in Enrichment via external systems.

The default processing guarantee. Every record is processed at least once: and therefore possibly more than once on a rebalance or fault. External side effects should be idempotent.


The mechanism by which a slow downstream stage signals an upstream stage to slow down. In Kafka Streams the consumer poll loop is naturally backpressured by the processor it feeds: a full inbox blocks new fetches. In Riffle async I/O, a full in-flight queue blocks the stream thread on enqueue.

A node in a Kafka cluster: the server-side process that stores topic logs and serves produce / fetch requests. Multiple brokers form a cluster; one is the controller (under KRaft mode) that coordinates metadata.


A pattern (and protocol family) where each row-level change in an upstream database (insert, update, delete with before / after images) is captured and published downstream. Debezium and AWS DMS are the canonical implementations. Riffle’s Kafka.Streams.Sources.CDC ships a primitive for materialising CDC feeds into KTables with snapshot/streaming-phase awareness and key-aware compaction.

The internal Kafka topic the framework writes a state store’s updates to. On instance loss, the state is rebuilt by replaying this topic from offset 0 (or, with snapshot stores, from the last snapshot’s offset). Convention: <applicationId>-<storeName>-changelog.

The requirement that two topics being joined share the same partition count and the same key partitioner, so each (key, A) record and its matching (key, B) record land on the same task. Streams validates co-partitioning at startup; mismatches throw during topology validation.

A Kafka Streams DSL operator that lets you aggregate multiple input streams into one output table with a single aggregator function per stream and a single shared store. The Haskell port is cogroup / addCogrouped / aggregateCogrouped in Kafka.Streams.Cogroup.

The orchestrated sequence the runtime runs every commitIntervalMs (default 30 s) to make a batch of work durable. Under EOS-V2 with Riffle 2PC sinks the six steps are: beginTxn → flush → commitOffsets → preCommit2PC → commitTxn → commit2PC → storeCommit. See Kafka.Streams.Runtime.EOS.runCommitCycle and the Exactly-once page.

A set of Kafka consumers sharing the same group.id (in Streams, the applicationId) that the broker coordinator collectively assigns partitions to. Each partition is owned by at most one member at a time.

See Assignor.

CQRS (Command Query Responsibility Segregation)

Section titled “CQRS (Command Query Responsibility Segregation)”

An architectural pattern in which writes go to one model (commands) and reads come from another (queries), connected by an asynchronous projection. Kafka Streams + downstream query store is a natural CQRS implementation; the trade-off discussion lives in Visibility versus ACID databases.


A topic (or other sink) that receives records the runtime couldn’t process. Riffle’s bounded suppress supports a dead-letter overflow policy; processing.exception.handler (KIP-1033) supports a DLQ disposition for records that throw.

The Riffle StreamsConfig knob that picks which Kafka.Streams.Runtime.WorkerPool constructor the runtime uses. Three values: DispatchPartition (parity default: explicit per-worker partition ownership), DispatchHashed (parity hashing by (topic, partition)), DispatchKeyGroup (Riffle key-group routing). See Scaling.

Domain-specific language. In Kafka Streams, the typed combinator API (stream, mapValues, filter, groupBy, count, …) as opposed to the lower-level Processor API. The library exposes two DSL surfaces: the Free-Arrow Free DSL (Kafka.Streams.Topology.Free) and the imperative builder DSL (Kafka.Streams.KStream, etc.).


Exactly-once semantics. EOS-V2 (KIP-447, KIP-129) is the transactional-producer-plus-TxnOffsetCommit story that ensures records written by Streams and consumer offsets advance atomically. EOS-V3 / KIP-892 adds transactional state stores so a store’s writes commit in lockstep with the producer transaction. The processingGuarantee config selects between AtLeastOnceP and ExactlyOnceP.

Riffle generalisation of the JVM Streams EmitStrategy (KIP-825). A first-class EmitPolicy value any windowed / stateful operator can consume: EmitOnUpdate, EmitOnWindowClose, EmitOnCount n, or EmitCustom with a user predicate. Lives in Kafka.Streams.EmitPolicy.

The timestamp associated with a record by its producer: when the underlying business event happened. Contrast with processing time: the timestamp the runtime saw the record. The pair drives all of windowing, watermarks, and grace periods.

Riffle KV-store wrapper: state expires based on the coordinated watermark, not on wall-clock. Lives in Kafka.Streams.State.KeyValue.TTL. Pair with ttlClockFromCoordinator for the coordinator-driven clock.


A producer the broker has rejected because a newer producer with the same transactional.id was observed. Surfaces as ProducerFencedException / InvalidProducerEpochException. Almost always means two instances are running for the same (applicationId, taskId). See the zombie-producer runbook.

A KTable-KTable join keyed not on the record key but on a caller-supplied extractor of the left value. Streams handles the subscription-token bookkeeping. Haskell port: foreignKeyJoinKTable / leftForeignKeyJoinKTable in Kafka.Streams.ForeignKeyJoin.

A free construction over the Arrow typeclass: the typed DSL is represented as a value, then interpreted / optimised / compiled at the boundary. Kafka.Streams.Topology.Free builds a FreeArrow Prim AST that the compiler walks. The benefit: topologies are first-class values you can inspect, snapshot, and optimise before they run.


A Haskell ADT whose constructors can refine the type parameters of the resulting value. Used by Prim in Kafka.Streams.Topology.Free to encode the input / output types of each operator at the type level.

A KTable replicated in full on every instance, not partitioned across the group. Used for small reference data (currency rates, country lookups) where you want zero-network-cost lookups. Loaded from a Kafka topic via globalTable.

The amount of time a window stays open accepting late records past its end. A 1-hour tumbling window with a 10-minute grace closes 70 minutes after the window’s start in event time. Configured via withGracePeriod / withSessionGracePeriod.


The periodic message a consumer group member sends the coordinator to confirm liveness. Under KIP-848, heartbeats also carry subscription metadata and receive reconciliation deltas.

A fixed-size window that advances by a step smaller than its size: adjacent windows overlap. A 5-minute hopping window with a 1-minute advance produces a new window every minute, each containing the last 5 minutes’ worth of records.


An operation that has the same effect when invoked multiple times as when invoked once. PUT /resource/42 with the same body is idempotent; POST /create-order typically isn’t. Critical for external side effects under at-least-once or for the recovery path of a two-phase commit sink.

A stable per-record identifier (usually the upstream (topic, partition, offset) tuple) written to a state store to deduplicate side effects on replay. The pattern: on each record, check the store; if absent, fire the side effect, then write the token. See Enrichment.

A watermark source that hasn’t produced records for longer than its IdleAfter threshold. The coordinator excludes idle sources from the min-watermark computation so a quiet partition doesn’t stall downstream windows.

Read-only access to a live state store from outside the stream thread. Lives in Kafka.Streams.InteractiveQueries. The query layer is your code; the library exposes the typed handles + cross-instance discovery metadata (KIP-535).

A Kafka topic the framework auto-creates and owns: changelog topics for state stores, repartition topics for keyed shuffles. Their names derive from the applicationId and the stable name of the owning operator.


An operator that combines records from two streams (or a stream and a table). Five shapes: stream-stream (windowed), stream-table, table-table, stream-GlobalKTable, and foreign-key.


The runtime handle. newKafkaStreams constructs it from a topology + config; startKafkaStreams runs it. closeKafkaStreams drains and shuts down. Most live operations (setStateListener, addStreamThread, pauseKafkaStreams, etc.) take a KafkaStreams value.

A Riffle abstraction: a fixed routing space (default 128) into which record keys hash, decoupling parallelism from partition count. Lives in Kafka.Streams.Runtime.KeyGroup. See Scaling.

Intermediate types produced by groupBy / groupByKey on a stream / table. They’re the input to aggregation operators (count, reduce, aggregate).

Kafka Improvement Proposal: the Apache Kafka design- review process. Specific KIPs that come up by name across these docs:

KIPWhat
KIP-295topology.optimization config knob (reuse-source-topics, merge-repartitions, single-store-self-join)
KIP-307Stable, deterministic processor names
KIP-418Named branches (Branched.*)
KIP-441Probing rebalance for warmup-ready standbys
KIP-447EOS-V2 producer-per-instance with TxnOffsetCommit
KIP-535Cross-instance IQ discovery (StreamsMetadata, KeyQueryMetadata)
KIP-591default.dsl.store config
KIP-825First-class EmitStrategy
KIP-848Next-gen consumer-group protocol: broker-side incremental reconciliation
KIP-892EOS-V3 transactional state stores
KIP-924In-process TaskAssignor plug-in
KIP-925Rack-aware assignment strategy
KIP-1033First-class processing-exception handler with DLQ disposition

A Kafka Streams record stream: an append-only sequence of (key, value, timestamp, headers) tuples. Compare with KTable.

A Kafka Streams changelog stream interpreted as a key-value table: later records overwrite earlier ones for the same key. Materialised into a state store (in-memory or RocksDB). GlobalKTable is the fully-replicated variant.

Kafka’s Raft-based metadata protocol that replaced ZooKeeper for cluster coordination. The library’s integration tests spin up a KRaft-mode broker; the streams runtime is KRaft-agnostic at the client layer.


A consistency model in which every operation appears to take effect atomically at some point between its invocation and its response. The Kafka Streams in-memory stores’ single-key reads are linearisable; iterators are eager snapshots, not linearisable across mutations.

The relation L = λW: the average number of items in a system equals the arrival rate times the average time in the system. Used in Enrichment to size async-I/O worker pools: workers ≈ throughput × latency.

A Kafka topic-level retention policy that keeps the latest value per key indefinitely (rather than truncating by time or size). Changelog topics use compaction so state recovery from offset 0 remains bounded by unique-key count, not by total record count.


The DSL knob that controls which state store backend an aggregation, join, or table-source uses. Picks the store builder and the serdes. Mirrors the JVM Materialized builder.

KIP-848 versioning. Member epoch bumps every time a member’s owned assignment changes; rebalance epoch bumps every time the group-wide target changes. Stale-epoch heartbeats must reconcile before continuing.


Riffle’s abstraction over S3 / GCS / Azure Blob / a filesystem. Snapshot stores write their snapshot blobs through an ObjectStoreClient and read them back on recovery. Lives in Kafka.Streams.Runtime.ObjectStore.

The position of a record within a partition. The consumer commits offsets to the broker so a restart resumes from the right place. Under EOS, offsets advance only when the commit cycle commits.

A changelog or repartition topic on the broker that doesn’t correspond to any store / operator in the current topology. Usually a sign that a previous deploy renamed something. Detected by Kafka.Streams.Observability.OrphanTopics.detectOrphans. See the runbook.


The number of concurrent units of work the runtime can run. In parity Streams: bounded by numStreamThreads × instances, capped by the source-topic partition count. Under Riffle key-groups: bounded by kgcTotal (default 128), independent of partition count.

A subset of a Kafka topic’s data, distributed across brokers and consumed in parallel. Records with the same key always land on the same partition (under the default partitioner). The partition is the unit of consumer-group assignment.

A newtype in Kafka.Streams.Pipeline wrapping a -> IO b with Category, Arrow, ArrowChoice, Functor, and Applicative instances. Used to build reusable, named topology fragments that compose with (>>>). The ArrowChoice instance gives you the ROP (+++) / (|||) combinators over Either.

Riffle hook (ProcessorContext.ctxRegisterPreCommitDrain) that lets an async operator block the commit cycle until its in-flight queue is empty. Used by asyncMapValues etc. to make offset commits EOS-correct.

The GADT constructor type in Kafka.Streams.Topology.Free that represents one operator in the typed AST. Every DSL combinator (mapValues, filter, groupBy, …) corresponds to a Prim.

The cadence at which the runtime re-issues a rebalance to promote standby tasks that are within acceptableRecoveryLag. Lives in Kafka.Streams.Runtime.ProbingRebalance. Default cadence: 10 minutes (probingRebalanceIntervalMs).

AtLeastOnceP vs ExactlyOnceP in StreamsConfig.processingGuarantee. Determines whether the producer is transactional, whether offsets commit through TxnOffsetCommit, and whether state stores use the transactional buffer. See EOS.

The wall-clock time the runtime saw a record. Contrast with event time.

The lower-level Kafka Streams API: write a Processor / FixedKeyProcessor directly with process + init callbacks, access state stores by name, schedule punctuators. The DSL ultimately compiles to processor-API calls.

A scheduled callback inside a processor. Two clocks: WallClockTimePunctuation (every N ms of real time) and StreamTimePunctuation (every N ms of event time advance). Used for time-driven emits, idle-window detection, cache eviction.


The process by which a consumer group reassigns partition / task ownership across its members in response to a membership change. Under the classic protocol it was stop-the-world; under KIP-848 it’s incremental per-task with no double-ownership at any point.

KIP-848 term for the diff between a member’s currently-owned assignment and its target assignment. Reconciliation carries rAdd and rRemove sets; a losing member acknowledges its rRemove before the gaining member sees the task in rAdd. Defined in Kafka.Streams.Runtime.RebalanceProtocol.

A Riffle KV-store backend with no local state: every get / put is a network call against a remote store (FoundationDB / TiKV / DynamoDB shape). Node restart is a metadata operation. Lives in Kafka.Streams.State.KeyValue.Remote.

The operation of re-keying a stream and re-publishing it to an internal topic so downstream stateful operators see records co-partitioned. Performed by repartition / through / implicit auto-insert (optAutoInsertRepartition).

The internal topic a repartition operator writes to. Convention: <applicationId>-<nodeName>-repartition.

Re-processing records the runtime already saw, after a fault or rebalance rewinds the consumer to a prior committed offset. Under at-least-once this is the normal recovery path; under EOS it still happens but the duplicate output is aborted.

A pattern, popularised by Scott Wlaschin’s F# write-up, for modelling pipelines of fallible operations as two parallel tracks: a success track and a failure track. Each stage either advances the value on the success track or routes it to the failure track; failure short-circuits all downstream success stages cleanly. Kafka Streams’ DeserializationHandler / ProcessingExceptionHandler / ProductionHandler / AsyncFailurePolicy / SinkOutcome are all track-switch surfaces; the Pipeline newtype’s ArrowChoice instance is the explicit ROP combinator set. Full mapping in Railway-oriented programming with streams.

The orchestrator function in Kafka.Streams.Runtime.EOS that drives one commit cycle. Takes an EOSCoordinator

  • a flush body + a getter for offsets-to-commit; returns CommitSucceeded / CommitAborted / CommitFatal.

Confluent’s per-subject schema-versioning service. The library’s serdes (Kafka.Streams.Serde.Avro, Kafka.Streams.Serde.JsonSchema, Kafka.Streams.Serde.Protobuf) speak the Confluent envelope and the registrySerdeChecked wrapper enforces compatibility-mode checks at construction time.

Riffle KV-store wrapper that tags every write with a SchemaVersion and migrates reads forward through a chain of SchemaMigration callbacks. burnInMigrate rewrites older entries in-place with resumable progress. Lives in Kafka.Streams.State.KeyValue.SchemaVersioned.

The Riffle identifier for one in-flight transaction on a two-phase commit sink. Made stable across restarts (typically applicationId-instanceId-cycleCounter) so recovery can correlate prepared txns with their producer counterparts.

A Riffle KV-store backend that incrementally writes snapshots of its state to an object store. Recovery time becomes O(time-since-last-snapshot) rather than O(state-size). Lives in Kafka.Streams.State.KeyValue.Snapshot.

A deterministic, build-stable name for an operator node. Mirrors the JVM’s KIP-307 generator: KSTREAM-MAPVALUES-0000000007. The name is part of the deployment contract because internal topics inherit from it. Lives in Kafka.Streams.Topology.StableNames.

A warm replica of an active task’s state, maintained by tailing the active’s changelog. Promotion on instance loss becomes metadata-only if the standby is within acceptableRecoveryLag. Riffle’s SnapshotPointer mode lets a standby hold only (snapshotId, advancedTo) rather than a full replica.

The persistent (or in-memory) key-value, window, or session store attached to a stateful operator. Backends: in-memory + RocksDB (via +rocksdb); Riffle adds snapshot, tiered, remote, and versioned variants. Read-only IQ access via Kafka.Streams.InteractiveQueries.

The top-level runtime configuration record. Mirrors the Java StreamsConfig properties. Defined in Kafka.Streams.Config.

KIP-535 record: what each instance in the group looks like to its peers (host:port, owned partitions per store, standby partitions per store). Used by an external IQ proxy to route key-level queries to the right host.

The per-task event-time clock: running max of extracted timestamps on records this task has seen. Replaced by the coordinated watermark where Riffle opts in.

The OS thread that drives one consumer + N workers in this runtime. (Different from the JVM Streams “one stream thread per consumer” model: see the README for the comparison.) The thread that runs user-supplied processor code; nothing in user code should block it for long.

Bytes a consumer group member attaches to its JoinGroup request. Streams uses it to advertise application.server, owned standby tasks, and the assignor’s per-member state.

The DSL operator that holds emissions back until a condition is met: typically “until window closes” or “until time limit expires”. Riffle adds a bounded variant with explicit BufferOverflowPolicy (DropOldestSilently, ShutdownWhenFull, suppressWindowedShed to DLQ).


The unit of parallelism. One task owns one partition of one subtopology + its local state stores. TaskId identifies a (subtopology, partition) pair.

KIP-924 in-process plug-in for the leader-side assignment path. Replaces the built-in cooperative-sticky assignor. The runtime constructs an ApplicationState from the live view and invokes taAssign.

A Riffle KV-store backend with a hot tier (local in-memory or RocksDB) + a cold tier (S3 / GCS). Reads probe hot, fall through to cold, and promote. Eviction decides which entries demote when the hot tier exceeds its budget. Lives in Kafka.Streams.State.KeyValue.Tiered.

The compiled, validated graph the runtime executes. Topology i o is a type with two parameters, like a function i -> o.

  • Input type i: What stream type enters the topology. Void means the topology pulls from sources (Kafka topics), not from other code.
  • Output type o: What stream type exits the topology. () means the topology pushes to sinks (Kafka topics), not to other code.

Built from the DSL value of type Topology Void () (or other combinations), or directly via the imperative Kafka.Streams.Topology builder. Validated by validateTopology before the runtime starts.

KIP-303 sink-side dynamic-topic-name extractor. Lets one sink route records to different topics based on the record. The library exposes it via toExtracted + TopicNameExtractor. Useful when you want “dynamic topology” without actually mutating the compiled topology.

A Kafka producer with a transactional.id that supports beginTransaction / commitTransaction / abortTransaction / sendOffsetsToTransaction. The foundation of EOS-V2.

A fixed-size, non-overlapping window. A 5-minute tumbling window produces a new window every 5 minutes; each record belongs to exactly one window.

A Riffle sink interface that commits external-system writes atomically with the Kafka commit cycle. Five operations: tpsStage (per-record buffer), tpsPrepare (promote batch to “prepared”), tpsCommit (atomically make visible), tpsAbort (discard), tpsRecover (resolve half- committed txns on restart). Lives in Kafka.Streams.Sinks.TwoPhase.


A durable log of intended changes written before the changes are applied to the underlying store. Under Riffle snapshot stores the changelog topic is the WAL between snapshots: not the sole source of truth.

A timestamp that the runtime guarantees no later record will arrive before. Lets downstream operators decide when a window can close, when state can be expired, when a stream- stream join’s window is finalised. Derived from a WatermarkGenerator (MonotonicTimestamps, BoundedOutOfOrderness, CustomGenerator).

The Riffle per-StreamsApp component that aggregates per-source watermarks into the effective watermark (= min of live, non-idle sources), handles idle-source detection, and enforces alignment-group backpressure. Lives in Kafka.Streams.Watermark.

A bounded slice of event time over which a stateful operator aggregates. Four shapes: tumbling, hopping, sliding, and session. Each has its own builder in Kafka.Streams.Window.


If a term appears in the docs and isn’t defined here, add it. Prefer one-paragraph definitions; link out to deeper coverage rather than restating it.