wireform-iceberg

wireform-iceberg implements the Apache Iceberg open table format. Iceberg adds ACID transactions, hidden partitioning, schema evolution, and time travel on top of object-storage data files. Use this package when you need to read table metadata, plan scans with predicate pushdown, or integrate with Iceberg REST, Glue, Hadoop, or SQL catalogs from Haskell.

  • Table metadata as canonical JSON via Iceberg.JSON
  • Manifest and manifest-list readers and writers (Avro-encoded)
  • Scan planning with manifest pruning and file selection
  • Schema evolution rules and compatibility checks
  • Partition transforms (identity, bucket, truncate, time transforms)
  • Deletion vectors and position/equality delete file handling
  • Puffin statistics index format support
  • Catalog clients for REST, AWS Glue, Hadoop filesystem, and SQL backends
  • Time travel via snapshot refs, snapshot IDs, and timestamp lookup
  • Interop-tested against pyiceberg
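
To make the partition transforms in the list above concrete, here is a minimal sketch of truncate and two time transforms as the Iceberg specification describes them. The function names are hypothetical and are not taken from Iceberg.Transform; the sketch assumes timestamps are microseconds since the Unix epoch.

```haskell
import Data.Int (Int64)

-- truncate[W] on integers: round down to a multiple of W.
-- Haskell's `mod` is floored for a positive width, so negative
-- values round toward negative infinity.
truncateInt :: Int64 -> Int64 -> Int64
truncateInt width v = v - (v `mod` width)

-- truncate[L] on strings: keep at most L code points.
truncateString :: Int -> String -> String
truncateString = take

microsPerHour, microsPerDay :: Int64
microsPerHour = 3600 * 1000000
microsPerDay = 24 * microsPerHour

-- hour and day transforms: whole hours or days since 1970-01-01,
-- using floored division so pre-epoch values stay ordered.
hourTransform :: Int64 -> Int64
hourTransform micros = micros `div` microsPerHour

dayTransform :: Int64 -> Int64
dayTransform micros = micros `div` microsPerDay
```

Because every transform is a pure function of the column value, a planner can apply the same function to a predicate literal and compare the result against partition values without reading any data files.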

Open a table by parsing its metadata JSON, then plan a scan over the current snapshot. The scan planner resolves the manifest list, reads each manifest, and collects the data file paths your reader should open:

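{-# LANGUAGE OverloadedStrings #-}
-- OverloadedStrings is harmless if the expression API takes String
-- and required if it takes Text.

import qualified Data.Aeson as Aeson
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T
import qualified Iceberg.Expression as Expr
import qualified Iceberg.JSON as IJ
import qualified Iceberg.Read as IR
import Iceberg.Snapshot (currentSnapshot)
import qualified Iceberg.Types

-- Read a metadata JSON file from disk and decode it into TableMetadata.
openTableMetadata :: FilePath -> IO (Either String Iceberg.Types.TableMetadata)
openTableMetadata metadataPath = do
  jsonBytes <- BS.readFile metadataPath
  pure $ case Aeson.eitherDecodeStrict jsonBytes of
    Left err -> Left err
    Right val -> IJ.metadataFromJSON val

-- Plan a filtered scan from manifest bytes that are already in memory,
-- keyed by manifest path.
planScanWithLocalManifests
  :: Iceberg.Types.TableMetadata
  -> ByteString            -- manifest list bytes
  -> Map Text ByteString   -- manifest path -> manifest bytes
  -> Either String IR.ScanPlan
planScanWithLocalManifests tm manifestListBytes manifests =
  let -- event_time >= 1700000000000 AND region = "us-west"
      filterExpr =
        Expr.and_
          (Expr.greaterThanOrEq "event_time" (Expr.LLong 1700000000000))
          (Expr.equal "region" (Expr.LString "us-west"))
      -- Resolve a manifest path to its bytes, failing on unknown paths.
      readManifest path =
        maybe (Left ("missing manifest: " ++ T.unpack path)) Right
          (Map.lookup path manifests)
  in case currentSnapshot tm of
       Nothing -> Left "table has no current snapshot"
       Just _ -> IR.planScanWithFilter tm manifestListBytes readManifest filterExpr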

The resulting ScanPlan carries the resolved snapshot, schema, manifest paths, and data file paths. Pass each data file path to wireform-parquet (or another format reader) to materialize rows.
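
The pruning that decides which data files end up in a plan can be pictured with a toy model. The sketch below is not the Iceberg.Read or Iceberg.Expression API: the types Bounds, FileEntry, and Pred are hypothetical, and it assumes each file carries lower and upper bounds for a single integer column. A file is kept only when its bounds show it might contain a matching row.

```haskell
-- Hypothetical per-file column statistics: inclusive min and max.
data Bounds = Bounds { lowerBound :: Integer, upperBound :: Integer }

-- Hypothetical manifest entry: a data file path plus bounds for one column.
data FileEntry = FileEntry { filePath :: String, columnBounds :: Bounds }

-- A tiny predicate language over that single column.
data Pred
  = GtEq Integer
  | Eq Integer
  | And Pred Pred

-- True when some value inside the bounds could satisfy the predicate.
-- Conservative: it may keep a file with no matching rows, but it never
-- drops a file that has one.
mightMatch :: Pred -> Bounds -> Bool
mightMatch (GtEq v) (Bounds _ hi) = hi >= v
mightMatch (Eq v) (Bounds lo hi) = lo <= v && v <= hi
mightMatch (And a b) bs = mightMatch a bs && mightMatch b bs

-- Keep only the files that might contain matching rows.
pruneFiles :: Pred -> [FileEntry] -> [FileEntry]
pruneFiles p = filter (mightMatch p . columnBounds)
```

The same test can be applied one level up, against partition summaries in the manifest list, so whole manifests are skipped before any of their entries are read.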

For catalog-backed tables, use Iceberg.Catalog.REST.Client or Iceberg.Catalog.Glue to load metadata before calling the scan planner.

Hot-path microbenchmarks: C kernel vs pure Haskell

[Chart: Iceberg deletion vector hot paths: C vs pure Haskell. Lower is better; values in ns; ghc-9.8.4 on darwin-aarch64, criterion 1.6.5. Series: C kernel, pure Haskell.]

| Operation | C kernel | pure Haskell | ratio |
|---|---|---|---|
| decode 1001 positions | 10081 ns | 27392 ns | 2.72x |
| contains check | 11.9 ns | 1054 ns | 88.30x |

Last run 2026-06-27 11:45:59 UTC. ghc-9.8.4 on darwin-aarch64, criterion 1.6.5.

[Chart: Iceberg Murmur3 hash: C kernel vs pure Haskell across input sizes. Lower is better; values in ns; ghc-9.8.4 on darwin-aarch64, criterion 1.6.5. Series: C kernel, pure Haskell.]

| Input size | C kernel | pure Haskell | ratio |
|---|---|---|---|
| 8 B | 11.5 ns | 20.7 ns | 1.80x |
| 64 B | 22.9 ns | 110 ns | 4.80x |
| 1 KiB | 391 ns | 1551 ns | 3.96x |
| 64 KiB | 26010 ns | 97717 ns | 3.76x |

Last run 2026-06-27 11:45:59 UTC. ghc-9.8.4 on darwin-aarch64, criterion 1.6.5.
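
For readers unfamiliar with what the Murmur3 benchmark measures, the following is a small pure Haskell sketch of 32-bit Murmur3 (x86 variant) and of the bucket transform built on it, following the Iceberg specification. It is an illustration only and not the package's C kernel or its pure Haskell fallback. All names here are hypothetical, input is a plain list of bytes for simplicity, and asciiBytes assumes ASCII-only strings (the specification hashes UTF-8 bytes).

```haskell
import Data.Bits (rotateL, shiftL, shiftR, xor, (.&.), (.|.))
import Data.Char (ord)
import Data.Int (Int64)
import Data.Word (Word32, Word8)

-- 32-bit Murmur3 (x86 variant) over a list of bytes.
murmur3_32 :: Word32 -> [Word8] -> Word32
murmur3_32 seed bytes = fmix (go seed bytes `xor` fromIntegral (length bytes))
  where
    go :: Word32 -> [Word8] -> Word32
    go h (a : b : c : d : rest) = go (mixBlock h (le [a, b, c, d])) rest
    go h [] = h
    go h tail_ = h `xor` scramble (le tail_)  -- 1 to 3 trailing bytes

    -- Assemble up to four bytes as a little-endian word.
    le :: [Word8] -> Word32
    le = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

    scramble :: Word32 -> Word32
    scramble k = ((k * 0xcc9e2d51) `rotateL` 15) * 0x1b873593

    mixBlock :: Word32 -> Word32 -> Word32
    mixBlock h k = ((h `xor` scramble k) `rotateL` 13) * 5 + 0xe6546b64

    -- Final avalanche step.
    fmix :: Word32 -> Word32
    fmix h0 =
      let h1 = (h0 `xor` (h0 `shiftR` 16)) * 0x85ebca6b
          h2 = (h1 `xor` (h1 `shiftR` 13)) * 0xc2b2ae35
      in h2 `xor` (h2 `shiftR` 16)

-- Integers are hashed as 8-byte little-endian two's-complement values.
longBytes :: Int64 -> [Word8]
longBytes v = [fromIntegral (v `shiftR` (8 * i)) | i <- [0 .. 7]]

-- ASCII-only stand-in for UTF-8 encoding.
asciiBytes :: String -> [Word8]
asciiBytes = map (fromIntegral . ord)

-- bucket[N]: clear the sign bit of the hash, then take it modulo N.
bucket :: Int -> [Word8] -> Int
bucket n bytes = fromIntegral (murmur3_32 0 bytes .&. 0x7fffffff) `mod` n
```

A list of bytes keeps the sketch dependency-free, but it is far slower than hashing a strict ByteString in place, which is one reason a real implementation works on contiguous buffers.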

The C kernels are faster than the pure Haskell fallbacks in every measured case: about 2.7x for deletion-vector decode and between 1.8x and 4.8x for Murmur3 hashing, depending on input size. The contains check shows the largest gap: a single bitmap probe takes about 12 ns in C versus just over 1 µs in pure Haskell, roughly 88x. Both kernels are used by default.
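
As a picture of what the contains check does, the sketch below models a deletion vector as a set of deleted row positions stored in a bitmap. It is a simplified stand-in: an unbounded Integer is used as the bitmap, whereas Iceberg deletion vectors are serialized as Roaring bitmaps, and none of these names come from Iceberg.DeletionVector.

```haskell
import Data.Bits (setBit, testBit)
import Data.List (foldl')

-- Simplified deletion vector: bit i is set when row position i is deleted.
newtype DeletionBitmap = DeletionBitmap Integer

-- Build a bitmap from deleted row positions; negative positions are ignored.
fromPositions :: [Int] -> DeletionBitmap
fromPositions = DeletionBitmap . foldl' setBit 0 . filter (>= 0)

-- The "contains check": is this row position marked as deleted?
contains :: DeletionBitmap -> Int -> Bool
contains (DeletionBitmap bits) pos = pos >= 0 && testBit bits pos

-- Drop deleted rows from a stream of (position, row) pairs.
liveRows :: DeletionBitmap -> [(Int, a)] -> [a]
liveRows dv rows = [row | (pos, row) <- rows, not (contains dv pos)]
```

A reader runs this probe once per row of every data file that has an associated deletion vector, which is why the per-probe cost in the table above matters.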

The charts and tables above are regenerated by wireform-stats from wireform-iceberg/bench-results/summary/iceberg-{deletion-vector,murmur3-c-vs-pure}.json — the same source the README charts are built from.

| Module | Purpose |
|---|---|
| Iceberg.Types | TableMetadata, Schema, Snapshot, partition specs |
| Iceberg.JSON | metadataToJSON / metadataFromJSON |
| Iceberg.Snapshot | Snapshot lookup, refs, time travel, ancestry |
| Iceberg.Read | planScan, planScanWithFilter, manifest readers |
| Iceberg.Write | Snapshot and manifest emission |
| Iceberg.Expression | Predicate AST and manifest pruning evaluators |
| Iceberg.Partition / Iceberg.Transform | Partition spec evaluation |
| Iceberg.SchemaEvolution | Allowed schema changes |
| Iceberg.Delete / Iceberg.DeletionVector | Row-level delete handling |
| Iceberg.Puffin | Puffin auxiliary index files |
| Iceberg.Catalog.* | REST, Glue, Hadoop, and SQL catalog bindings |
| Iceberg.Parquet | Iceberg Parquet data file bridge |
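
The timestamp lookup listed for Iceberg.Snapshot can be pictured as a search over the table's snapshot log. The sketch below is an assumption about the general technique and not the module's API: SnapshotLogEntry and snapshotAsOf are hypothetical names, modelled on the timestamp-ms and snapshot-id fields of the snapshot log in table metadata.

```haskell
import Data.Int (Int64)
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical mirror of one snapshot-log entry in table metadata.
data SnapshotLogEntry = SnapshotLogEntry
  { entryTimestampMs :: Int64  -- when the snapshot became current
  , entrySnapshotId :: Int64
  }

-- Time travel by timestamp: the snapshot that was current at the given
-- time is the latest log entry at or before it.
snapshotAsOf :: Int64 -> [SnapshotLogEntry] -> Maybe Int64
snapshotAsOf asOfMs entries =
  case [e | e <- entries, entryTimestampMs e <= asOfMs] of
    [] -> Nothing  -- the table did not exist yet at that time
    candidates ->
      Just (entrySnapshotId (maximumBy (comparing entryTimestampMs) candidates))
```

The resulting snapshot ID would then be used in place of the current snapshot when planning a scan.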

The probe suite round-trips table metadata, manifest lists, and manifest files against pyiceberg and fastavro fixtures captured from real tables.