wireform-network
wireform-network is the receive-side infrastructure shared by every
networked wireform package. It owns the magic-ring transport: a
double-mapped pinned buffer that the kernel writes recv data into and that
the wireform parser reads decoded values out of, with no intermediate
ByteString allocation and no per-call malloc. Built on top of it:
- withRecvTransport / withRecvBufTransport / newRecvBufTransport: the magic-ring Wireform.Transport constructors used by wireform-kafka, wireform-http1, and wireform-http2 as the exclusive read path.
- Wireform.Network.TLS.OpenSSL: direct OpenSSL FFI that decrypts TLS records straight into a caller-supplied pointer (i.e. the magic ring’s backing memory), so the entire socket → TLS → parser pipeline runs copy-free.
Why a magic ring
The classic Haskell socket recv path allocates a fresh pinned
ByteString per recv() call, hands it to the parser, and lets the
GC free it later. Parsers that need more bytes than fit in one
recv() then concatenate chunks, another allocation and copy. For a
parser that processes hundreds of thousands of small frames per
second (Kafka pipelining, HTTP/2 streams, HTTP/1.1 keep-alive) those
allocations dominate.
A magic ring sidesteps the whole thing. It is a power-of-two-sized
region of pinned memory mapped twice into adjacent virtual addresses
(Linux: memfd_create + two mmap MAP_FIXED calls; macOS and
Windows have equivalents). Any read of up to N bytes starting
anywhere in the first mapping, [base, base + N), is contiguous in
virtual memory, because the MMU silently picks the second mapping
when the cursor crosses the wrap point. The parser never sees a
wrap-around boundary; the recv path never allocates a ByteString.

    virtual address:  |XXXXXXXXXXXXXXXX|XXXXXXXXXXXXXXXX|
                      ^                ^                ^
                      base             +N (wrap)        +2N
    physical memory:  shared FD page-mapped twice

The Wireform.Parser module (in wireform-core) is built around
this contract: takeBs n returns a zero-copy slice of the ring’s
backing memory, valid until the transport’s tail is advanced past it.
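
The address arithmetic behind the double mapping can be modelled in a few lines. This is only a sketch of the idea, not the library's code: it fakes the second mapping by concatenating the buffer with itself (which copies, unlike a real mmap), and the names virtualWindow, readContiguous and readWrapped are hypothetical. It shows that a read crossing the wrap point is one plain slice of the 2N window, where a single-mapped ring has to split and stitch.

```haskell
import qualified Data.ByteString as B

-- | Model of the double mapping: the same N physical bytes appear twice,
-- back to back, in a 2N-byte virtual window. (A real ring gets this from
-- the MMU; the concatenation here exists only to model the addressing.)
virtualWindow :: B.ByteString -> B.ByteString
virtualWindow physical = physical <> physical

-- | Contiguous read from the virtual window. Intended for 0 <= start < N
-- and 0 <= len <= N, so start + len never runs past 2N.
readContiguous :: B.ByteString -> Int -> Int -> B.ByteString
readContiguous physical start len =
  B.take len (B.drop start (virtualWindow physical))

-- | What a single-mapped ring has to do instead: split the read at the
-- wrap point and stitch the two pieces together (an allocation and a copy).
readWrapped :: B.ByteString -> Int -> Int -> B.ByteString
readWrapped physical start len =
  let firstPart = B.take len (B.drop start physical)
      rest      = B.take (len - B.length firstPart) physical
  in firstPart <> rest
```

For every in-range start and len the two functions return the same bytes; the difference in a real ring is that the contiguous read needs no stitching step at all.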
Transport constructors
The transport surface is symmetric: a ReceiveTransport carries
parser-side reads, a SendTransport carries encoder-side writes,
and DuplexTransport pairs them on one underlying byte stream.
Receive side
| Function | When to use |
|---|---|
| withReceiveTransport :: TransportConfig -> Socket -> (ReceiveTransport -> IO a) -> IO a | TCP socket recv, bracket-scoped. The straightforward path. |
| withReceiveBufTransport :: TransportConfig -> ReceiveFn -> (ReceiveTransport -> IO a) -> IO a | Wrap any Ptr Word8 -> Int -> IO Int recv callback (TLS, in-memory test pipe, mock socket). Bracket-scoped. |
| newReceiveBufTransport :: TransportConfig -> ReceiveFn -> IO ReceiveTransport | Same as above but lifetime-managed by the caller; receiveClose unmaps the ring. |
Send side (symmetric)
| Function | When to use |
|---|---|
| withSendTransport :: TransportConfig -> Socket -> (SendTransport -> IO a) -> IO a | TCP socket send, bracket-scoped. The dual of withReceiveTransport. |
| withSendBufTransport :: TransportConfig -> SendFn -> IO () -> (SendTransport -> IO a) -> IO a | Wrap any Ptr Word8 -> Int -> IO Int send callback (TLS, in-memory test sink). The IO () is the shutdown-write action. |
| newSendBufTransport :: TransportConfig -> SendFn -> IO () -> IO SendTransport | Lifetime-managed variant. |
Encoders interact with the send ring via the reservation API
exposed from Wireform.Transport.Send:

    reserveSend         :: SendTransport -> Int -> IO (Ptr Word8, Word64)
    withSendReservation :: SendTransport -> Int -> (Ptr Word8 -> Int -> IO Int) -> IO Int
    sendByteString      :: SendTransport -> ByteString -> IO ()
    sendByteStringMany  :: SendTransport -> [ByteString] -> IO ()
    sendBuilder         :: SendTransport -> Builder -> IO ()

Duplex (paired on one wire)
| Function | When to use |
|---|---|
| withDuplexTransport :: TransportConfig -> Socket -> (DuplexTransport -> IO a) -> IO a | The shape downstream Connection objects in wireform-http1 / wireform-http2 / wireform-kafka build on. |
| newDuplexTransport :: TransportConfig -> Socket -> IO DuplexTransport | Lifetime-managed variant. |
| newDuplexPipe :: TransportConfig -> IO (DuplexTransport, DuplexTransport) | In-memory paired duplex for tests; replaces the per-package mkPipeTransport variants. |
The accompanying TransportConfig selects the ring size (default
1 MiB; Pipeline callers in wireform-kafka configure 16 MiB to fit
typical Fetch responses) and an IO-manager wait policy.
chunkedReceiveFn :: [ByteString] -> IO ReceiveFn is a test fixture
that delivers a fixed chunk list one at a time then signals EOF.
The streaming-parser test suites in every downstream package use it
to drive the magic ring without a real socket pair.
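
To make the fixture's behaviour concrete, here is a minimal sketch of how a chunk-list receive callback could be written with nothing but base and bytestring. It is not the library's implementation. It assumes ReceiveFn is the Ptr Word8 -> Int -> IO Int callback shape described for withReceiveBufTransport, that EOF is signalled by returning 0 (recv-style), and that the caller always offers a buffer of at least one byte. The names chunkedReceiveFnSketch, popChunk and drainAll are hypothetical.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.IORef (atomicModifyIORef', newIORef)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr)

-- Assumed callback shape: destination pointer, capacity, bytes written.
type ReceiveFn = Ptr Word8 -> Int -> IO Int

-- | Deliver the chunks one call at a time, then return 0 (assumed EOF).
-- A chunk larger than the offered capacity is split across calls.
chunkedReceiveFnSketch :: [B.ByteString] -> IO ReceiveFn
chunkedReceiveFnSketch chunks = do
  ref <- newIORef (filter (not . B.null) chunks)
  pure $ \dst cap -> do
    next <- atomicModifyIORef' ref (popChunk cap)
    case next of
      Nothing -> pure 0
      Just bs -> unsafeUseAsCStringLen bs $ \(src, len) -> do
        copyBytes dst (castPtr src) len
        pure len

-- | Take the next chunk (or the part of it that fits in cap bytes).
-- Assumes cap >= 1; a zero capacity would be indistinguishable from EOF.
popChunk :: Int -> [B.ByteString] -> ([B.ByteString], Maybe B.ByteString)
popChunk _ [] = ([], Nothing)
popChunk cap (c : cs)
  | B.length c <= cap = (cs, Just c)
  | otherwise =
      let (now, later) = B.splitAt cap c
      in (later : cs, Just now)

-- | Drive a ReceiveFn with a cap-byte buffer until EOF, collecting each read.
drainAll :: Int -> ReceiveFn -> IO [B.ByteString]
drainAll cap recv = allocaBytes cap $ \buf ->
  let loop acc = do
        n <- recv buf cap
        if n <= 0
          then pure (reverse acc)
          else do
            bs <- B.packCStringLen (castPtr buf, n)
            loop (bs : acc)
  in loop []
```

Feeding short chunks through a small buffer this way exercises the same partial-read paths a slow socket would, which is what makes a fixture of this kind useful for streaming-parser tests.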
TLS-on-ring via OpenSSL
Wireform.Network.TLS.OpenSSL is the architecturally clean TLS path:
plaintext bytes flow from libssl straight into the magic ring’s
backing memory with zero intermediate ByteString allocations.

    libssl (cbits/wireform_openssl.c)
      │  SSL_read_ex(ssl, dst, dst_len, &n)   ← writes plaintext into dst
      ▼
    tlsRecvFn :: SslConn -> RecvFn   (Ptr Word8 -> Int -> IO Int)
      ▼
    newRecvBufTransport / withRecvBufTransport   (magic ring)
      ▼
    Wireform.Transport → StreamingReader / FrameParser

The surface mirrors the bits OpenSSL exposes: newClientCtx /
newServerCtx for SSL_CTX construction with PEM cert + key load,
setAlpnClient / setAlpnServer for ALPN negotiation,
newClient / newServer to drive SSL_connect / SSL_accept with
WANT_READ / WANT_WRITE parked on the GHC IO manager,
tlsRecvFn for the magic-ring direct read path,
tlsSend for the symmetric write side, plus
getAlpn / setClientHostnameVerify for the usual ergonomics.
withTlsRecvTransport :: TransportConfig -> SslConn -> (Transport -> IO a) -> IO a
glues it all together: hands you a magic-ring Transport that the
streaming-parser readers in any wireform package can drive.
OpenSSL is the only TLS implementation in the repo: wireform-kafka,
wireform-http1, wireform-http2 (both the new stack and the
vendored grapesy engine under Network.HTTP2.Engine.*), and
wireform-grpc all go through Wireform.Network.TLS.OpenSSL.
The pure-Haskell tls package and the crypton-x509-* family are
no longer dependencies anywhere. Compared with the previous
arrangement:
| Concern | tls bridge | OpenSSL direct |
|---|---|---|
| Plaintext into ring (no memcpy) | ✗ (one copy per record) | ✓ |
| Crypto implementation | tls (pure Haskell) | libssl (system) |
| Per-record allocation | 1 ByteString | 0 |
| External system dep | none | libssl |
| Auditability | pure Haskell | C |
See docs/tls-on-ring.md
in the repo for the detailed design notes.
Benchmarks: faster than the classic recv path
Three head-to-head benchmarks compared the classic recv-buffer +
parser path against the magic-ring + streaming-reader path on the
same workload, with the magic ring amortised outside the per-iteration
loop (rings are connection-scoped in production; a per-iteration
mmap would dwarf the parser cost we’re trying to measure). All
numbers are per-iteration with criterion --time-limit 2 on a
single x86_64 core:
HTTP/1
| Workload | Classic RecvBuffer + parseRequest | Magic-ring StreamingReader.readRequestHead | Time change (negative is faster) |
|---|---|---|---|
| Small request, whole chunk | 339 ns | 245 ns | −28 % |
| Big request (~1 KiB), whole chunk | 972 ns | 828 ns | −15 % |
| Big request, 64-byte recv chunks | 1.89 µs | 1.32 µs | −30 % |
| Big request, 4-byte recv chunks | 15.5 µs | 7.59 µs | −51 % |
The 4-byte-chunk case is where the gap is biggest. Every wireform
parser pass goes through the same SIMD CRLFCRLF scanner the classic
parser uses, but the magic ring’s double mapping means we never
compact the recv buffer, so the scanner picks up where the previous
round left off (a scanFrom argument plumbed through
findCRLFCRLF). The classic recv buffer compacts on every refill,
and its SIMD scan restarts from offset zero each time.
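
The resume-from-offset idea can be sketched without SIMD. This is an illustration, not the wireform scanner: findTerminatorFrom, resumeOffset and scanChunks are hypothetical names, the search uses the plain breakSubstring from bytestring, and the accumulating buffer is modelled by appending chunks (which copies here, unlike the ring). Only the bookkeeping of the scan offset is the point.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- | Offset just past the first "\r\n\r\n" found at or after scanFrom.
findTerminatorFrom :: Int -> B.ByteString -> Maybe Int
findTerminatorFrom scanFrom buf =
  let (before, match) =
        B.breakSubstring (BC.pack "\r\n\r\n") (B.drop scanFrom buf)
  in if B.null match
       then Nothing
       else Just (scanFrom + B.length before + 4)

-- | Where to resume after a failed scan over len bytes. The last three
-- bytes may be the start of a terminator, so back up by three.
resumeOffset :: Int -> Int
resumeOffset len = max 0 (len - 3)

-- | Feed chunks one at a time, rescanning only the unseen tail each round.
scanChunks :: [B.ByteString] -> Maybe Int
scanChunks = go B.empty 0
  where
    go _ _ [] = Nothing
    go buf scanFrom (c : cs) =
      let buf' = buf <> c
      in case findTerminatorFrom scanFrom buf' of
           Just end -> Just end
           Nothing  -> go buf' (resumeOffset (B.length buf')) cs
```

With 4-byte chunks, each round inspects at most seven bytes instead of the whole buffer, which is the effect the benchmark row above measures.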
HTTP/2
| Workload | Classic RecvBuffer + decodeFrameHeader/Payload | Magic-ring Frame.StreamingReader.readFrameFrom | Time change (negative is faster) |
|---|---|---|---|
| 100 small DATA frames (11 byte body) | 1.98 µs | 2.16 µs | +9 % |
| 1000 small DATA frames | 22.4 µs | 22.1 µs | −2 % |
| 100 big DATA frames (1 KiB body) | 5.47 µs | 4.18 µs | −24 % |
For HTTP/2 the per-frame cost is dominated by the (already cheap) 9-byte header decode + payload slice; the magic-ring path wins on large frames and is at parity on very small ones. The +9 % on 100 small frames is criterion’s per-batch overhead divided across a small constant cost; the 1000-frame number (where the per-frame cost is unambiguous) is a win of roughly 1 to 2 %.
Kafka
| Workload | Classic connectionGetExact + runGet | Magic-ring kafkaFrameParser | Time change (negative is faster) |
|---|---|---|---|
| 100 small frames (64 B body) | 15.0 µs | 5.59 µs | −63 % |
| 1000 small frames | 150 µs | 58.1 µs | −61 % |
| 100 big frames (4 KiB body) | 37.4 µs | 13.6 µs | −64 % |
Kafka shows the biggest win because the classic path
(connectionGetExact + Data.Binary.Get.runGet) allocates two
fresh ByteStrings per frame (one for the length prefix, one for
the body) and walks the body’s first 4 bytes through runGet to
extract the correlation id. The wireform pipeline parses the
length + correlation id with anyInt32be twice and returns a
zero-copy takeBs slice for the body. That’s about 2.5-2.8×
faster end-to-end.
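
The framing being parsed is small enough to sketch on a plain strict ByteString. This is not kafkaFrameParser or the wireform parser API: int32be, Frame and splitFrame are hypothetical names, and the sketch assumes a 4-byte big-endian length prefix whose count covers the 4-byte correlation id that opens the body. B.take, B.drop and B.splitAt return slices that share the original buffer, which is the same zero-copy property takeBs provides on the ring.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as B
import Data.Int (Int32)

-- | Decode a big-endian signed 32-bit integer from the first four bytes,
-- returning it together with the remaining bytes.
int32be :: B.ByteString -> Maybe (Int32, B.ByteString)
int32be bs
  | B.length bs < 4 = Nothing
  | otherwise =
      let step acc b = (acc `shiftL` 8) .|. fromIntegral b
      in Just (B.foldl' step 0 (B.take 4 bs), B.drop 4 bs)

data Frame = Frame
  { correlationId :: Int32
  , payload       :: B.ByteString  -- slice of the input, not a copy
  } deriving (Eq, Show)

-- | Split one frame off the front of the buffer.
-- Nothing covers both "need more bytes" and a malformed length below 4.
splitFrame :: B.ByteString -> Maybe (Frame, B.ByteString)
splitFrame buf = do
  (len, afterLen) <- int32be buf
  let n = fromIntegral len :: Int
  if n < 4 || B.length afterLen < n
    then Nothing
    else do
      let (frameBytes, rest) = B.splitAt n afterLen
      (cid, body) <- int32be frameBytes
      pure (Frame cid body, rest)
```

Whether the payload slice should include the correlation id is a design choice; this sketch strips it, which may differ from what the real parser hands to its caller.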
Reproducing the numbers
The benchmarks themselves were removed once the migration completed
(the magic-ring path is the only recv path now in
wireform-kafka, wireform-http1, and wireform-http2). To
reproduce the comparison you’d reintroduce the classic path locally;
the head-to-head sources lived at
wireform-{http1,http2,kafka}/bench/RecvVsTransport.hs before the
removal commit.
Magic-ring sizing
The ring’s size sets a hard cap on the largest single takeBs n
the parser can ask for: requesting more than the ring holds
deadlocks the wait loop. Defaults reflect the worst case in each
package:
| Package | Default ring size | Tuning knob | Rationale |
|---|---|---|---|
| wireform-network (raw socket) | 1 MiB | TransportConfig.ringSizeHint | Generic; large enough for typical message frames. |
| wireform-http1 Connection | 256 KiB | newConnectionFromTransportWithRingSize | h2o’s 32 KiB header-block cap + several chunked-TE body chunks + room. |
| wireform-http2 Connection | 1 MiB | (constant) | Well over the practical 16 KiB SETTINGS_MAX_FRAME_SIZE. |
| wireform-kafka Pipeline | 16 MiB | PipelineConfig.pipelineRingSize | Sized for typical Fetch responses; tune up to fetch.max.bytes for big workloads. |
Magic-ring virtual address space is cheap on Linux: only the pages the recv path actually touches are paged in, so over-provisioning has near-zero physical cost. A 16 MiB ring across 1000 idle Kafka connections is about 15.6 GiB of ring capacity (and twice that in virtual address space, since each ring is mapped twice) but ~0 RSS.
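
The sizing arithmetic is easy to check in code. The helpers below are a hypothetical sketch, not part of the library: nextPowerOfTwo shows one way a size hint could be rounded up to the power-of-two size the ring requires (whether ringSizeHint is rounded like this is an assumption), and virtualBytes counts address space with each ring mapped twice.

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, shiftL)

-- | One mebibyte in bytes.
mib :: Int
mib = 1024 * 1024

-- | Smallest power of two >= n. Assumes the result fits in an Int.
nextPowerOfTwo :: Int -> Int
nextPowerOfTwo n
  | n <= 1 = 1
  | otherwise = 1 `shiftL` (finiteBitSize n - countLeadingZeros (n - 1))

-- | Virtual address space reserved by a number of rings of the given
-- size, counting both mappings of each ring.
virtualBytes :: Int -> Int -> Int
virtualBytes connections ringSize = connections * 2 * ringSize
```

For example, a 3 MiB hint rounds up to 4 MiB, and 1000 rings of 16 MiB reserve 32000 MiB of address space while touching almost none of it.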
Stand-alone use
You don’t need any of the HTTP / Kafka packages to use the magic ring. The minimal idiom is:

    import Network.Socket (Socket)
    import Wireform.Network (withReceiveTransport, defaultTransportConfig)
    import Wireform.Parser (anyWord32be, takeBs)
    import Wireform.Parser.Driver (runParserLoop, LoopControl (..))

    -- Read length-prefixed frames off the socket until the loop stops.
    -- `handle` stands for your own frame handler (ByteString -> IO ()).
    drainFrames :: Socket -> IO ()
    drainFrames sock =
      withReceiveTransport defaultTransportConfig sock $ \t ->
        runParserLoop t lengthPrefixedFrame $ \body -> do
          handle body
          pure Continue
      where
        -- 4-byte big-endian length, then that many bytes as a ring slice.
        lengthPrefixedFrame = do
          len <- anyWord32be
          takeBs (fromIntegral len)

The handler receives body as a zero-copy slice of the ring; if
you need to retain it past the next loop iteration, call
Data.ByteString.copy first.
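
Why the copy matters can be shown without the ring at all. The sketch below is a stand-alone illustration, not wireform code: it uses a temporary 4-byte buffer in place of the ring, wraps it in a zero-copy ByteString with unsafePackCStringLen, and then overwrites the buffer the way a later recv would. The name retainDemo is hypothetical.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringLen)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (pokeArray)
import Foreign.Ptr (castPtr)

-- | Returns the first byte seen through the zero-copy view after the
-- buffer was overwritten, plus the copy that was taken before.
retainDemo :: IO (Word8, B.ByteString)
retainDemo = allocaBytes 4 $ \buf -> do
  pokeArray buf ([1, 2, 3, 4] :: [Word8])
  -- Zero-copy view: shares memory with buf, like a slice of the ring.
  view <- unsafePackCStringLen (castPtr buf, 4)
  -- Force the copy now, while the bytes are still the ones we want.
  kept <- evaluate (B.copy view)
  -- Simulate the next recv overwriting the same region.
  pokeArray buf ([9, 9, 9, 9] :: [Word8])
  firstByteNow <- evaluate (B.head view)
  pure (firstByteNow, kept)
```

The view reflects whatever the buffer holds at the moment it is read, while the copy keeps the original bytes; the same reasoning applies to a body slice once the transport's tail moves past it.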