Skip to content

wireform-core

wireform-core contains the C and SIMD-accelerated primitives shared by every wireform-* format package. You generally don’t import it directly. It exists so that the hot paths common to multiple formats (varint decoding, UTF-8 validation, byte scanning, JSON escaping, hashing) are implemented once and tuned in one place.

Serialization formats share a surprising amount of low-level machinery. Varints appear in Protocol Buffers, Avro, Thrift, and CBOR. UTF-8 validation matters everywhere that decodes text. JSON escaping shows up in any format that has a JSON bridge. Rather than reimplementing these per-format and optimizing them separately, wireform-core centralizes them behind a C FFI boundary with SWAR (SIMD Within A Register) and hardware SIMD implementations.

C-backed primitives from cbits/fast_decode.c and cbits/fast_scan.c. These use SWAR bit tricks and, where available, SSE/AVX/NEON intrinsics.

CategoryKey functions
Varint decodedecodeVarintSWAR, countPackedVarints, packedAllSingleByte
UTF-8 validationvalidateUtf8SWAR
Byte scanningfindByte, findNul, isAscii
JSON escapingfindJsonEscape, escapeJSONStringBS, escapeJSONText
Text decodedecodeTextFast (UTF-8 bytes to Text, fast path)
EndiannessreadBE16H .. readBE64H, readLE16H .. readLE64H (and write variants)
C encode helpersencodeLengthDelimitedC, encodeVarintFieldC

The ByteString variants (findByteBS, findNulBS, isAsciiBS, findJsonEscapeBS) operate directly on pinned byte arrays without copying.

An offset-based encoder that writes directly into a pre-allocated buffer. Format-specific encoders build on this to avoid intermediate Builder allocations when the output size is known or bounded.

FunctionWhat it writes
directEncodeTop-level entry: allocate N bytes, run a write callback, return ByteString
dVarintLEB128 varint
dWord32LE / dWord64LEFixed-width little-endian
dFloatLE / dDoubleLEIEEE 754
dBytes / dTextLength-prefixed byte/text blobs
dVarintField, dStringField, …Tag + value pairs (proto wire format)

Hashing and bitmap primitives from cbits/wireform_hash_simd.c, used by Parquet bloom filters, Iceberg partition transforms, and internal hash tables.

FunctionAlgorithm
murmur3_32Murmur3 (32-bit), used by Iceberg bucket transforms
xxh64XXHash64 with configurable seed
bucketLong / bucketBytesIceberg bucket partition hash
roaringDecodeArray / roaringDecodeBitsetRoaring bitmap container decode
roaringContainsPoint-query a decoded bitmap container
roaringEncodeArray / roaringEncodeBitsetEncode bitmap containers

The package ships three C source files:

FileContents
cbits/fast_decode.cVarint SWAR, UTF-8 validation, C encode helpers
cbits/fast_scan.cByte/NUL/ASCII/whitespace/JSON-escape scanners
cbits/wireform_hash_simd.cMurmur3, XXH64, Roaring codec

These use __attribute__((target(...))) for multi-versioning so the correct SIMD path is selected at runtime without requiring compile-time -msse4.2 or -mavx2 flags.

Almost never. The format packages (wireform-proto, wireform-cbor, etc.) depend on wireform-core internally and expose higher-level APIs. The main reasons to import wireform-core directly:

  1. You’re writing a new format package for wireform and need the shared encode buffer or scanning primitives.
  2. You need xxh64 or murmur3_32 for non-wireform purposes and want to avoid pulling in a separate hashing library.
  3. You’re debugging a performance issue in a format decoder and want to benchmark the underlying C primitives in isolation.