Kafka Streams
wireform-kafka-streams is a Haskell library for building streaming applications on Apache Kafka. You write topologies as ordinary Haskell values and run them inside your service. Kafka handles durability and ordering; the library handles state stores, joins, windowing, exactly-once semantics, and rebalancing.
Start here
Section titled “Start here”New to Kafka Streams? Quickstart → What is Kafka Streams? → Your first topology
Have a specific problem?
- Calling external APIs from my topology → Enrichment guide
- Deploying to production → Going to production
- Exactly-once to Postgres/S3/etc → Exactly-once guide
- Something’s on fire → Runbooks
Coming from Java Kafka Streams? The API mirrors the Java client. Check the Riffle extensions for features the Java client doesn’t have.
The tutorial (30 minutes)
Section titled “The tutorial (30 minutes)”Five self-contained parts. Run against an in-process test driver. No external Kafka broker needed.
- What is Kafka Streams?: The mental model and vocabulary
- Your first topology: Read from one topic, write to another
- Stateful processing: Count words and query the results
- Joins and tables: Enrich a stream with reference data
- Going to production: Eight things to set up before deploying
Common tasks
Section titled “Common tasks”| When you need to… | Read this |
|---|---|
| Call external HTTP/SQL/GRPC APIs | Enrichment guide |
| Scale past your partition count | Scaling and rebalancing |
| Deploy in Kubernetes without losing state | Running in containers |
| Write to Postgres/S3 with exactly-once semantics | Exactly-once across systems |
| Roll out a new topology version | Topology evolution |
| Understand why IQ reads don’t match writes | Visibility versus ACID databases |
| Set up monitoring and alerts | Observability |
| Handle an incident | Runbooks |
Extended features (Riffle)
Section titled “Extended features (Riffle)”Optional extensions for advanced use cases. These solve problems that standard Kafka Streams doesn’t address.
-
Async I/O: Call slow external APIs without blocking processing. The operator handles concurrency, timeouts, retries, and exactly-once semantics automatically.
-
Snapshot stores: Recover large state stores quickly. Instead of replaying hours of changelog to rebuild state, restore from a recent checkpoint.
-
2PC sinks: Write to Postgres, S3, or HTTP endpoints with exactly-once semantics. Uses two-phase commit to keep Kafka and external systems in sync.
-
Watermark coordinator: Handle streams with very different data rates. Prevents windows from stalling when one source goes idle.
-
Key-group routing: Scale your application past your topic’s partition count. Useful when you need more parallelism than your input topics provide.
See Extended features for details.
Reference
Section titled “Reference”- Glossary: Definitions for all terminology
- Dynamic topology changes: What you can change at runtime versus what requires a restart
How it fits together
Section titled “How it fits together”- A topology is a typed Haskell value (
Topology input output) composed withControl.Category.(>>>). Like a functioninput -> output, it transforms streams. Common pattern:Topology Void ()for topologies that read from and write to Kafka topics. - The runtime is a library, not a cluster. Your service contains the topology; scaling means running more processes in the same consumer group.
- State stores live next to your service (local disk or memory), backed by Kafka changelog topics for durability and standby tasks for fast failover.