Visibility versus ACID databases
A Postgres SELECT after INSERT ... COMMIT always returns the inserted row. A streams app’s view of its state is more nuanced. These differences are not bugs. They follow from using an append-only log as the source of truth and rebuilding derived state asynchronously.
This page maps ACID properties onto wireform-kafka-streams so you know which guarantees you get for free and which require explicit work.
A vs. ACID
| ACID property | Kafka Streams analogue |
|---|---|
| Atomicity | Per-record processing is atomic on a single task; per-commit-cycle atomicity is EOS |
| Consistency | Eventual: derived state catches up to the log |
| Isolation | Read-uncommitted by default; read-committed with isolation.level = read_committed; transactional stores buffer writes until commit |
| Durability | Kafka log is durable; derived state is durable after changelog flush |
Five differences matter operationally: the commit boundary, read path freshness, multi-instance distribution, event time versus processing time, and replay on failure.
1. The commit boundary
In SQL, the commit boundary is COMMIT. Reads after see the write.
In Kafka Streams, the commit boundary is commitIntervalMs (or sooner if cache fills). Between commits:
- Producer accumulates records in its transactional batch
- State store buffers writes in memory and (under EOS) in its transactional buffer
- Consumer position moves forward but offset commit hasn’t fired
A downstream consumer with isolation.level = read_committed sees nothing until commit. A consumer with read_uncommitted sees records as written, including potential aborts: READ UNCOMMITTED semantics.
Within a stream thread, a processor that writes then reads sees its own write immediately (read-your-writes-within-task). This does not extend to cross-task reads.
Why this matters operationally
- 30 s commitIntervalMs (default) means downstream read_committed consumers see up to 30 s staleness. If downstream SLAs are tight, shorten the interval or accept the staleness budget.
- Interactive queries read from the uncommitted store view by default. See “Read path freshness” below.

```mermaid
sequenceDiagram
    participant Up as Upstream producer
    participant Eng as Streams engine
    participant Store as State store
    participant IQ as Interactive query
    participant Down as read_committed consumer
    Up->>Eng: record at t=0
    Eng->>Store: put (buffered in txn store)
    IQ->>Store: get
    Store-->>IQ: pre-commit value
    Note over Eng: ...more records...
    Eng->>Eng: commit cycle at t=30s
    Eng->>Down: records become visible
    Eng->>Store: storeCommit drains txn buffer
    IQ->>Store: get
    Store-->>IQ: committed value
```
2. Read path freshness
Interactive queries hit IORef-backed maps for in-memory stores and the persistent shadow map for RocksDB. Writes use atomicModifyIORef'; reads use readIORef.
Semantics:
- Linearisable reads at readIORef time. A query thread sees a consistent snapshot.
- Eager iterators (queryRange / queryAll): materialized at creation time; concurrent writes are not visible.
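The snapshot behaviour described above can be sketched with a plain IORef holding an immutable map. This is only an illustration of the semantics, not the library's store implementation: `Store`, `putKV`, `getKV`, `rangeKV`, and `demo` are hypothetical names invented for the example.

```haskell
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for an in-memory store: one IORef holding an immutable Map.
type Store = IORef (Map.Map String Int)

-- Writer side: atomically replace the map with an updated copy.
putKV :: Store -> String -> Int -> IO ()
putKV store k v = atomicModifyIORef' store (\m -> (Map.insert k v m, ()))

-- Point read: a single readIORef, so it observes one complete version of the map.
getKV :: Store -> String -> IO (Maybe Int)
getKV store k = Map.lookup k <$> readIORef store

-- Eager "iterator": the result is derived from the map as it was at call time.
rangeKV :: Store -> String -> String -> IO [(String, Int)]
rangeKV store lo hi = do
  m <- readIORef store
  pure [ (k, v) | (k, v) <- Map.toAscList m, k >= lo, k <= hi ]

-- Take a range snapshot, then write; the snapshot does not change.
demo :: IO ([(String, Int)], Maybe Int)
demo = do
  store    <- newIORef (Map.fromList [("a", 1), ("b", 2)])
  snapshot <- rangeKV store "a" "z"
  putKV store "c" 3
  latest   <- getKV store "c"
  pure (snapshot, latest)
```

Evaluating `demo` in GHCi yields `([("a",1),("b",2)],Just 3)`: the range result taken before the write does not contain "c", while a later point read does.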
What you don’t get:
- Read-after-write across threads. A processor’s put is immediate, but query thread visibility depends on IORef ordering. In practice this is microseconds, but not a contract.
- Read-committed semantics under EOS. Transactional stores buffer writes until storeCommit. IQ through the underlying store sees pre-commit values; through the overlay sees buffered values. Default queryKVStore exposes the underlying store (committed-only).
For strict committed-only semantics, query through the transactional overlay’s getCommitted after confirming commitTotal advanced.
For fresh reads including in-flight writes, accept the EOS abort risk: the value might roll back on next cycle abort. Most consumers tolerate this.
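To make the difference between the two read paths concrete, here is a toy model of a store with a transactional buffer. It is a sketch under stated assumptions: `TxnStore`, `putTxn`, `readCommitted`, `readFresh`, `commitTxn`, and `abortTxn` are hypothetical names, deletes are not modelled, and the real overlay may be structured differently.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef', writeIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical model: committed data plus a buffer of uncommitted writes.
data TxnStore = TxnStore
  { committed :: IORef (Map.Map String Int)
  , pending   :: IORef (Map.Map String Int)
  }

newTxnStore :: IO TxnStore
newTxnStore = TxnStore <$> newIORef Map.empty <*> newIORef Map.empty

-- Writes land in the buffer only.
putTxn :: TxnStore -> String -> Int -> IO ()
putTxn s k v = modifyIORef' (pending s) (Map.insert k v)

-- Committed-only read: ignores the buffer.
readCommitted :: TxnStore -> String -> IO (Maybe Int)
readCommitted s k = Map.lookup k <$> readIORef (committed s)

-- Fresh read: a buffered value wins over the committed one (Map.union is left-biased).
readFresh :: TxnStore -> String -> IO (Maybe Int)
readFresh s k = do
  p <- readIORef (pending s)
  c <- readIORef (committed s)
  pure (Map.lookup k (Map.union p c))

-- Commit drains the buffer into the committed map; abort discards the buffer.
commitTxn, abortTxn :: TxnStore -> IO ()
commitTxn s = do
  p <- readIORef (pending s)
  modifyIORef' (committed s) (Map.union p)
  writeIORef (pending s) Map.empty
abortTxn s = writeIORef (pending s) Map.empty

demo :: IO [Maybe Int]
demo = do
  s <- newTxnStore
  putTxn s "k" 1
  a <- readCommitted s "k"   -- before commit
  b <- readFresh s "k"       -- sees the buffered write
  commitTxn s
  c <- readCommitted s "k"   -- after commit
  putTxn s "k" 2
  abortTxn s
  d <- readFresh s "k"       -- the aborted write has rolled back
  pure [a, b, c, d]
```

`demo` evaluates to `[Nothing,Just 1,Just 1,Just 1]`. The last element is the abort risk in miniature: a fresh read taken between `putTxn s "k" 2` and `abortTxn s` would have returned a value that never became committed.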
3. Multi-instance distribution
A SQL database is a logical singleton. You connect to it, and you read whatever the latest committed value is, regardless of which backend was the writer.
A Kafka Streams state store is partitioned and replicated. A key lives on exactly one instance’s active task; standbys hold a warm replica. A query for that key must route to the active instance.
The routing primitives:
- Kafka.Streams.Discovery.StreamsMetadata advertises per-instance ownership.
- Kafka.Streams.Discovery.KeyQueryMetadata answers “for this (store, key), which instance is the active and which are the standbys”.
- Kafka.Streams.Discovery.HostInfo is the host:port an instance advertises via application.server.
A query against the wrong instance returns StoreNotFound. There
is no automatic redirect within the library: the query proxy is
your code.
Standby reads
A standby’s state lags the active by up to acceptableRecoveryLag
records. Reading from a standby gives you:
- Lower contention with the stream thread on the active.
- Stale data bounded by lag.
Whether that trade-off is worth it depends on the use case. Read queries that are part of a user-visible request flow usually go to the active; bulk analytical scans usually go to standbys.
The library doesn’t enforce a routing policy: you decide.
KeyQueryMetadata.standbyHosts returns every standby for a key;
pick by your own policy (round-robin, latency-aware, freshness-
preferring).
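One way to express such a policy in your own query proxy is as an ordering over candidate hosts, tried one after another. The sketch below is an assumption-heavy illustration: `Host`, `KeyOwners`, `Policy`, `candidates`, and `firstAnswer` are hypothetical and only mirror the shape of the discovery types; the `ask` function stands in for whatever remote call your proxy makes.

```haskell
-- Hypothetical shapes; the library's HostInfo and KeyQueryMetadata may differ.
data Host = Host { hostName :: String, hostPort :: Int } deriving (Eq, Show)

data KeyOwners = KeyOwners { activeHost :: Host, standbyHosts :: [Host] }

data Policy = ActiveFirst | StandbyFirst

-- Order in which a query proxy would try hosts for one key.
candidates :: Policy -> KeyOwners -> [Host]
candidates ActiveFirst  o = activeHost o : standbyHosts o
candidates StandbyFirst o = standbyHosts o ++ [activeHost o]

-- Try each host in order and return the first successful answer.
-- `ask` is a placeholder for the remote call; Nothing means "could not answer".
firstAnswer :: (Host -> IO (Maybe a)) -> [Host] -> IO (Maybe a)
firstAnswer _   []       = pure Nothing
firstAnswer ask (h : hs) = do
  r <- ask h
  case r of
    Just v  -> pure (Just v)
    Nothing -> firstAnswer ask hs

demo :: IO (Maybe String)
demo = do
  let owners = KeyOwners (Host "a" 8080) [Host "b" 8080, Host "c" 8080]
      -- Pretend host "a" cannot serve the store right now.
      ask h = pure (if hostName h == "a" then Nothing else Just (hostName h))
  firstAnswer ask (candidates ActiveFirst owners)
```

Here `demo` returns `Just "b"`: the active host declines, so the first standby answers. A user-facing path would likely use `ActiveFirst`, while a bulk scan could use `StandbyFirst` and accept the lag.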
Rebalance-window unavailability
During a rebalance, the partition (and therefore its state) is in
flux. KIP-848’s incremental reconciliation guarantees no
double-ownership but does mean there’s a window where the task is
owned by no member. IQ during that window returns either
StoreNotFound or a stale read from a standby.
A query layer should handle this:
```
on IQ:
  fetch StreamsMetadata
  route to KeyQueryMetadata.activeHost
  on StoreNotFound or rebalance-error:
    fall back to KeyQueryMetadata.standbyHosts in order
  if all fail:
    backoff + refresh metadata
    retry up to N times
```
4. Event-time vs processing-time
A SQL INSERT has one timestamp: when the row was inserted. A
Kafka record has two: the event time (when the underlying
business event happened, carried in the record’s timestamp or
embedded in its value) and the processing time (when the
runtime saw the record).
This is the single biggest mental shift coming from databases. A
windowed aggregation is defined over event time. A late record
(event time before the current StreamTime) lands in a window
that may already have closed, depending on withGracePeriod. The
result is correct in event-time terms but surprising to a
caller who expected processing-time semantics.
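The interaction between stream time, a window, and its grace period can be sketched in a few lines. This is a simplified model, not the library's window implementation: `Window`, `admit`, and `run` are hypothetical, timestamps are milliseconds, and the closing rule assumed here is "a window stops admitting records once stream time reaches window end plus grace".

```haskell
import Data.List (mapAccumL)

-- Timestamps in milliseconds; all names here are illustrative.
type Millis = Integer

data Window = Window { windowStart :: Millis, windowEnd :: Millis }

-- Process one record: advance stream time (running max of event times) and
-- report whether the record is still admitted into window `w`.
admit :: Millis -> Window -> Millis -> Millis -> (Millis, Bool)
admit grace w streamTime eventTime =
  let streamTime' = max streamTime eventTime
      inWindow    = eventTime >= windowStart w && eventTime < windowEnd w
      stillOpen   = streamTime' < windowEnd w + grace   -- assumed closing rule
  in (streamTime', inWindow && stillOpen)

-- Run a sequence of event times through one window, starting at stream time 0.
run :: Millis -> Window -> [Millis] -> [Bool]
run grace w = snd . mapAccumL (admit grace w) 0
```

With a one-minute window and ten seconds of grace, `run 10000 (Window 0 60000) [1000, 65000, 59000, 75000, 58000]` gives `[True,False,True,False,False]`. The record at 59000 arrives late but within grace and is admitted; the record at 58000 arrives after stream time has passed 70000 and is not, even though its event time belongs to the window.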
The pieces:
| Concept | Where it lives |
|---|---|
| Record timestamp extraction | TimestampExtractor family in Kafka.Streams.Time |
| Per-task event-time clock | engineStreamTime: running max of extracted timestamps on this task |
| Cross-source coordination | Kafka.Streams.Watermark.WatermarkCoordinator |
| Window grace period | withGracePeriod / withSessionGracePeriod on window builders |
| Out-of-order handling | BoundedOutOfOrderness watermark generator |
| Idle-source handling | IdleAfter in IdlenessConfig |
Why per-task StreamTime is not enough
The Phase-1 engineStreamTime is the running max of extracted
timestamps on this task. A topology that joins two streams
whose source partitions are consumed by different tasks must wait
for the laggard, but per-task StreamTime doesn’t know about the
laggard. Three failure modes:
- Idle source stall. One partition stops receiving records; its StreamTime stops advancing; a downstream window stalls waiting for it.
- Cross-source skew. Source A is at event time t; source B is at t + 1h. A join’s window expiry uses one side’s StreamTime and misses or admits records depending on which side fires first.
- Replay race. During a rebalance the new owner’s StreamTime starts at zero until the first record arrives.
The Riffle WatermarkCoordinator fixes all three by computing the
effective watermark = min of live, non-idle sources, and by
optionally backpressuring fast sources via alignment groups. See
Kafka.Streams.Watermark for the API.
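The core rule, minimum over live non-idle sources, is small enough to sketch directly. The types below are illustrative only and are not the `WatermarkCoordinator` API; `Source` and `effectiveWatermark` are hypothetical names, and alignment groups are not modelled.

```haskell
-- Illustrative model only; not the WatermarkCoordinator API.
type Millis = Integer

data Source = Source
  { sourceName :: String
  , watermark  :: Millis   -- event time this source has advanced to
  , idle       :: Bool     -- marked idle after a quiet period
  }

-- Minimum over non-idle sources; Nothing when every source is idle.
effectiveWatermark :: [Source] -> Maybe Millis
effectiveWatermark sources =
  case [ watermark s | s <- sources, not (idle s) ] of
    [] -> Nothing
    ws -> Just (minimum ws)
```

For example, `effectiveWatermark [Source "a" 100 False, Source "b" 3700 False, Source "c" 5 True]` is `Just 100`: the idle source "c" no longer holds event time back, and the fast source "b" waits for the laggard "a".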
Operational reading: don’t conflate the two clocks
- A dashboard showing “records per second” is processing-time.
- A windowed aggregate emitting windowEnd is event-time.
- The two diverge during catch-up after an outage: processing rate is high (catching up), event time is also moving fast (replaying old records), but neither is wall-clock.
punctuateTotal on the wall-clock punctuator and on the
stream-time punctuator are different metrics. The wall-clock one
tells you “scheduled effects fired”; the stream-time one tells you
“event time advanced past this trigger”. A wall-clock punctuator
that fires at the expected rate while the stream-time one is
stalled is a sign that an idle source is holding back event time.
5. Replay-on-failure
A SQL transaction either commits or doesn’t. There is no scenario in which a row is committed twice.
Kafka Streams may replay records on failure:
- At-least-once: any unrecoverable error rewinds the consumer to the last committed offset and replays from there. Side effects in peek / foreach / mapValuesM happen again.
- Exactly-once-V2: the producer transaction either commits or aborts atomically. State-store writes drain in step with the producer commit. External side effects that aren’t inside a two-phase-commit sink replay on abort.
The alternative is dropping records on partial failure, but replay has consequences for code that thinks in database terms:
- No “commit-then-act” pattern. You cannot “commit, then call the external API knowing the commit is durable” because the commit boundary is implicit and the runtime can rewind.
- Idempotency is non-negotiable for external side effects. Either the side effect is naturally idempotent (a PUT to a versioned URL, an UPSERT keyed on a stable id) or you make it idempotent (token in a state store, 2PC sink).
- PROCESSING_GUARANTEE = AT_LEAST_ONCE is the default, and it does what it says: every record is processed at least once, which means possibly more than once. Plan for it.
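The "make it idempotent" option can be sketched as a token check around the side effect. This is a minimal illustration, assuming an in-memory set stands in for the state store (or for a dedup table owned by the external system); `Seen`, `once`, and `demo` are hypothetical names.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef', atomicModifyIORef')
import qualified Data.Set as Set

-- Hypothetical: a set of already-applied tokens standing in for a state store.
type Seen = IORef (Set.Set String)

-- Run the effect only the first time a token is seen.
-- Returns True when the effect actually ran.
once :: Seen -> String -> IO () -> IO Bool
once seen token effect = do
  fresh <- atomicModifyIORef' seen $ \s ->
    if Set.member token s then (s, False) else (Set.insert token s, True)
  if fresh then effect >> pure True else pure False

demo :: IO (Int, [Bool])
demo = do
  seen  <- newIORef Set.empty
  calls <- newIORef (0 :: Int)
  let charge = modifyIORef' calls (+ 1)   -- stand-in for an external API call
  -- "transfer-42" is delivered twice, as it would be after a replay.
  rs <- mapM (\t -> once seen t charge) ["transfer-42", "transfer-43", "transfer-42"]
  n  <- readIORef calls
  pure (n, rs)
```

`demo` returns `(2,[True,True,False])`: the replayed token does not trigger a second call. Note the gap this toy leaves open: the token is recorded before the effect runs, so a crash between the two would skip the effect entirely. Closing that gap is what pushes real designs toward dedup inside the external system or a 2PC sink.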
A worked comparison
The user-facing question “did this transfer go through?” mapped to both worlds:
Postgres
```sql
BEGIN;
INSERT INTO transfers (id, amount, status) VALUES (?, ?, 'pending');
UPDATE accounts SET balance = balance - ? WHERE id = ?;
UPDATE accounts SET balance = balance + ? WHERE id = ?;
UPDATE transfers SET status = 'complete' WHERE id = ?;
COMMIT;

-- Then:
SELECT status FROM transfers WHERE id = ?;
-- → 'complete'. Guaranteed.
```
Kafka Streams (EOS)
```
[transfers topic] → process → updates [accounts changelog] + [transfers-out topic]
                                                                       ↓
                              [external read service] ← IQ ← [transfers-out KTable]
```
A read on [transfers-out KTable] for the transfer id may return:
- The current value, if the commit cycle has fired since the record was processed.
- The previous value (or “not found”), if processing happened but the commit cycle hasn’t fired yet.
- The previous value, if a rebalance is in flight and the active is mid-transfer.
- Inconsistent values across instances during a brief rebalance window.
For “did the transfer go through?” semantics you typically:
- Process the transfer atomically in Streams.
- Sink the final state to a downstream system the user reads from: Postgres, a key-value store, or a separate IQ-fronted service.
- The downstream system is the source of truth for the user’s read; it just lags Kafka by a bounded amount.
This is the same architectural pattern as CQRS, with the same trade-off. You give up read-your-writes in exchange for unlimited horizontal write throughput, replayability, and the ability to materialise the same source data into many downstream views.
Mental-model tips
- Treat the Kafka log as the source of truth. Everything else is a materialised view.
- Treat the commit cycle as the consistency boundary. Within a cycle, things are in flux; between cycles, things are durable.
- Don’t expect a query to see what a write-then-read pair would return. Use IQ as a debugger or as a bounded-staleness cache, not as a replacement for a query layer.
- Event time and processing time are different clocks. Pick one per concern and stick with it.
- For external-system effects, design idempotently or use 2PC.
Related reading
- Exactly-once across Kafka and other systems - the commit cycle in detail.
- Enrichment via external systems - the idempotency-token pattern for external writes.
- Observability - IQ as a debugging surface and what its iterators actually show you.