wireform-iceberg
wireform-iceberg implements the Apache Iceberg open table format. Iceberg
adds ACID transactions, hidden partitioning, schema evolution, and time
travel on top of object-storage data files. Use this package when you need
to read table metadata, plan scans with predicate pushdown, or integrate
with Iceberg REST, Glue, Hadoop, or SQL catalogs from Haskell.
Key features
- Table metadata as canonical JSON via Iceberg.JSON
- Manifest and manifest-list readers and writers (Avro-encoded)
- Scan planning with manifest pruning and file selection
- Schema evolution rules and compatibility checks
- Partition transforms (identity, bucket, truncate, time transforms)
- Deletion vectors and position/equality delete file handling
- Puffin statistics index format support
- Catalog clients for REST, AWS Glue, Hadoop filesystem, and SQL backends
- Time travel via snapshot refs, snapshot IDs, and timestamp lookup
- Interop-tested against pyiceberg
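The partition transforms in the list above follow fixed arithmetic defined by the Iceberg specification. The block below is a minimal sketch for long-valued inputs only, using nothing beyond base. The names murmur3Long, bucketLong, truncateLong, and dayFromMicros are hypothetical and are not taken from Iceberg.Transform, which may expose a different interface.

```haskell
import Data.Bits (rotateL, shiftR, xor, (.&.))
import Data.Int (Int32, Int64)
import Data.Word (Word32)

-- Hypothetical helper: Murmur3 x86 32-bit (seed 0) over the 8 little-endian
-- bytes of a long, which is how the Iceberg spec hashes int and long values.
murmur3Long :: Int64 -> Int32
murmur3Long v = fromIntegral (fmix (h2 `xor` 8))   -- 8 = input length in bytes
  where
    lo = fromIntegral v :: Word32                  -- first 4-byte block
    hi = fromIntegral (v `shiftR` 32) :: Word32    -- second 4-byte block
    h1 = mixBlock 0 lo
    h2 = mixBlock h1 hi

-- Mix one 4-byte block into the running hash state.
mixBlock :: Word32 -> Word32 -> Word32
mixBlock h k = (h' `rotateL` 13) * 5 + 0xe6546b64
  where
    k' = ((k * 0xcc9e2d51) `rotateL` 15) * 0x1b873593
    h' = h `xor` k'

-- Final avalanche step of Murmur3.
fmix :: Word32 -> Word32
fmix h0 = h4 `xor` (h4 `shiftR` 16)
  where
    h1 = h0 `xor` (h0 `shiftR` 16)
    h2 = h1 * 0x85ebca6b
    h3 = h2 `xor` (h2 `shiftR` 13)
    h4 = h3 * 0xc2b2ae35

-- bucket[N]: clear the sign bit of the hash, then reduce modulo N.
bucketLong :: Int -> Int64 -> Int
bucketLong n v = (fromIntegral (murmur3Long v) .&. 0x7fffffff) `mod` n

-- truncate[W]: round down to a multiple of W (floored, so -1 maps to -W).
truncateLong :: Int64 -> Int64 -> Int64
truncateLong w v = v - (v `mod` w)

-- day: whole days since 1970-01-01 for a timestamp in microseconds.
dayFromMicros :: Int64 -> Int64
dayFromMicros micros = micros `div` 86400000000
```

Haskell's mod and div already round toward negative infinity, so values before the epoch and negative truncation inputs need no extra correction in this sketch.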
Basic usage
Open a table by parsing its metadata JSON, then plan a scan over the current snapshot. The scan planner resolves the manifest list, reads each manifest, and collects the data file paths your reader should open:
```haskell
import qualified Data.Aeson as Aeson
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T
import qualified Iceberg.Expression as Expr
import qualified Iceberg.JSON as IJ
import qualified Iceberg.Read as IR
import Iceberg.Snapshot (currentSnapshot)
import qualified Iceberg.Types

-- Read the metadata file and decode it into TableMetadata.
openTableMetadata :: FilePath -> IO (Either String Iceberg.Types.TableMetadata)
openTableMetadata metadataPath = do
  jsonBytes <- BS.readFile metadataPath
  pure $ case Aeson.eitherDecodeStrict jsonBytes of
    Left err -> Left err
    Right val -> IJ.metadataFromJSON val

-- Plan a filtered scan using manifests already loaded into memory,
-- keyed by manifest path.
planScanWithLocalManifests
  :: Iceberg.Types.TableMetadata
  -> ByteString
  -> Map Text ByteString
  -> Either String IR.ScanPlan
planScanWithLocalManifests tm manifestListBytes manifests =
  let -- Literals are written as in the original example; enable
      -- OverloadedStrings if the library expects Text here.
      filterExpr =
        Expr.and_
          (Expr.greaterThanOrEq "event_time" (Expr.LLong 1700000000000))
          (Expr.equal "region" (Expr.LString "us-west"))
      readManifest path =
        maybe (Left ("missing manifest: " ++ T.unpack path)) Right (Map.lookup path manifests)
  in case currentSnapshot tm of
       Nothing -> Left "table has no current snapshot"
       Just _ -> IR.planScanWithFilter tm manifestListBytes readManifest filterExpr
```

The resulting ScanPlan carries the resolved snapshot, schema, manifest
paths, and data file paths. Pass each data file path to wireform-parquet
(or another format reader) to materialize rows.
For catalog-backed tables, use Iceberg.Catalog.REST.Client or
Iceberg.Catalog.Glue to load metadata before calling the scan planner.
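To make the pruning step concrete, the sketch below models how a planner can skip data files whose recorded column bounds cannot satisfy a filter. It is a simplified illustration under stated assumptions: a single long-valued column, and hypothetical FileStats and Pred types that stand in for, and are not, the types in Iceberg.Read or Iceberg.Expression.

```haskell
import Data.Int (Int64)

-- Hypothetical per-file statistics for the one column being filtered.
data FileStats = FileStats
  { filePath   :: String
  , lowerBound :: Int64   -- smallest value of the column in this file
  , upperBound :: Int64   -- largest value of the column in this file
  } deriving (Show, Eq)

-- Hypothetical predicate AST over that single column.
data Pred
  = GtEq Int64
  | EqualTo Int64
  | And Pred Pred
  deriving (Show)

-- True when the file might contain a matching row. A False result is a
-- proof that no row matches, so the file can be skipped safely.
mightMatch :: Pred -> FileStats -> Bool
mightMatch (GtEq x)    f = upperBound f >= x
mightMatch (EqualTo x) f = lowerBound f <= x && x <= upperBound f
mightMatch (And a b)   f = mightMatch a f && mightMatch b f

-- Keep only the paths of files that survive pruning.
selectFiles :: Pred -> [FileStats] -> [String]
selectFiles p = map filePath . filter (mightMatch p)
```

Pruning of this kind is conservative: a surviving file may still contain no matching rows, so the reader has to apply the filter again while materializing them.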
Performance
Hot-path microbenchmarks: C kernel vs pure Haskell
Deletion vector
| Operation | C kernel | pure Haskell | ratio |
|---|---|---|---|
| decode 1001 positions | 10081 ns | 27392 ns | 2.72x |
| contains check | 11.9 ns | 1054 ns | 88.30x |
Last run 2026-06-27 11:45:59 UTC. ghc-9.8.4 on darwin-aarch64, criterion 1.6.5.
Murmur3 hash
| Operation | C kernel | pure Haskell | ratio |
|---|---|---|---|
| 8 B | 11.5 ns | 20.7 ns | 1.80x |
| 64 B | 22.9 ns | 110 ns | 4.80x |
| 1 KiB | 391 ns | 1551 ns | 3.96x |
| 64 KiB | 26010 ns | 97717 ns | 3.76x |
Last run 2026-06-27 11:45:59 UTC. ghc-9.8.4 on darwin-aarch64, criterion 1.6.5.
The C kernels for deletion-vector bitmap operations and Murmur3 hashing outperform the pure Haskell fallbacks in every measured case: roughly 1.8x to 4.8x for Murmur3 hashing and 2.7x for deletion-vector decoding. The contains check shows the largest gap, about 88x: a single bitmap probe takes about 12 ns in C versus just over 1 µs in pure Haskell. Both kernels are used by default.
The charts and tables above are regenerated by wireform-stats from wireform-iceberg/bench-results/summary/iceberg-{deletion-vector,murmur3-c-vs-pure}.json, the same source the README charts are built from.
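For readers unfamiliar with what the contains check measures, the sketch below shows a bitmap probe in its simplest form: pick the 64-bit word that holds the position, then test one bit. It is an illustration only. It uses a flat list of words rather than the compressed bitmap layout that real deletion vectors use, it assumes non-negative positions, and fromPositions and contains are hypothetical names, not the Iceberg.DeletionVector API or the library's fallback implementation.

```haskell
import Data.Bits (setBit, shiftR, testBit, (.&.))
import Data.List (foldl')
import Data.Word (Word64)

-- Build a flat bitmap: word i holds positions 64*i through 64*i + 63.
-- Quadratic in the input size; acceptable for a sketch, not for production.
fromPositions :: [Int] -> [Word64]
fromPositions ps =
  [ foldl' setBit 0 [p .&. 63 | p <- ps, p `shiftR` 6 == i]
  | i <- [0 .. maxWord]
  ]
  where
    maxWord = maximum (0 : map (`shiftR` 6) ps)

-- Probe one position: the high bits select the word, the low 6 bits select
-- the bit inside that word. Positions past the end are reported as absent.
contains :: [Word64] -> Int -> Bool
contains ws pos =
  case drop (pos `shiftR` 6) ws of
    (w : _) -> testBit w (pos .&. 63)
    []      -> False
```

Walking a list makes this probe linear in the word index; an unboxed array or a C buffer makes the same two-step lookup constant time.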
Notable modules
| Module | Purpose |
|---|---|
| Iceberg.Types | TableMetadata, Schema, Snapshot, partition specs |
| Iceberg.JSON | metadataToJSON / metadataFromJSON |
| Iceberg.Snapshot | Snapshot lookup, refs, time travel, ancestry |
| Iceberg.Read | planScan, planScanWithFilter, manifest readers |
| Iceberg.Write | Snapshot and manifest emission |
| Iceberg.Expression | Predicate AST and manifest pruning evaluators |
| Iceberg.Partition / Iceberg.Transform | Partition spec evaluation |
| Iceberg.SchemaEvolution | Allowed schema changes |
| Iceberg.Delete / Iceberg.DeletionVector | Row-level delete handling |
| Iceberg.Puffin | Puffin auxiliary index files |
| Iceberg.Catalog.* | REST, Glue, Hadoop, and SQL catalog bindings |
| Iceberg.Parquet | Iceberg Parquet data file bridge |
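As an illustration of the timestamp-based time travel that Iceberg.Snapshot covers, the sketch below resolves a point in time to the snapshot that was current at that moment. It assumes a snapshot log ordered by ascending timestamp. LogEntry and snapshotAsOf are hypothetical names introduced for this example and may not match the module's actual types or functions.

```haskell
import Data.Int (Int64)

-- Hypothetical, simplified snapshot-log entry.
data LogEntry = LogEntry
  { timestampMs :: Int64   -- when the snapshot became current, in epoch millis
  , snapshotId  :: Int64
  } deriving (Show, Eq)

-- The snapshot current at time asOf is the last log entry whose timestamp
-- is at or before asOf. Returns Nothing when asOf predates the whole log.
-- Assumes the entries are sorted by ascending timestamp.
snapshotAsOf :: Int64 -> [LogEntry] -> Maybe Int64
snapshotAsOf asOf entries =
  case takeWhile ((<= asOf) . timestampMs) entries of
    [] -> Nothing
    es -> Just (snapshotId (last es))
```

A query timestamp that falls between two entries resolves to the earlier one, because that snapshot was still the current one at that time.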
Interop
The probe suite round-trips table metadata, manifest lists, and manifest files against pyiceberg and fastavro fixtures captured from real tables.