wireform-iceberg

wireform-iceberg implements the Apache Iceberg open table format. Iceberg adds ACID transactions, hidden partitioning, schema evolution, and time travel on top of object-storage data files. Use this package when you need to read table metadata, plan scans with predicate pushdown, or integrate with Iceberg REST, Glue, Hadoop, or SQL catalogs from Haskell.

Key features

Table metadata as canonical JSON via Iceberg.JSON
Manifest and manifest-list readers and writers (Avro-encoded)
Scan planning with manifest pruning and file selection
Schema evolution rules and compatibility checks
Partition transforms (identity, bucket, truncate, time transforms)
Deletion vectors and position/equality delete file handling
Puffin statistics index format support
Catalog clients for REST, AWS Glue, Hadoop filesystem, and SQL backends
Time travel via snapshot refs, snapshot IDs, and timestamp lookup
Interop-tested against pyiceberg
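
The partition transforms listed above are small pure functions from a column value to a partition value. The following is a minimal sketch of two of them, written against the rules in the Iceberg spec rather than taken from this package; the names truncateInt and dayFromMicros are hypothetical and are not part of Iceberg.Transform.

```haskell
import Data.Int (Int64)

-- Hypothetical helpers for illustration; not the wireform-iceberg API.

-- | truncate[W] for integers: round down to the nearest multiple of W.
-- Haskell's `mod` is floored, so negative inputs round toward negative
-- infinity instead of toward zero.
truncateInt :: Int64 -> Int64 -> Int64
truncateInt width v = v - (v `mod` width)

-- | day transform: whole days since 1970-01-01 for a timestamp given in
-- microseconds. `div` is floored, so pre-epoch timestamps yield negative days.
dayFromMicros :: Int64 -> Int64
dayFromMicros micros = micros `div` microsPerDay
  where
    microsPerDay = 86400 * 1000000
```

Floored arithmetic is the design point here: with truncating division, values just below zero would land in the same partition as values just above it.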

Basic usage

Open a table by parsing its metadata JSON, then plan a scan over the current snapshot. The scan planner resolves the manifest list, reads each manifest, and collects the data file paths your reader should open:

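-- If the column-name arguments below are Text rather than String,
-- enable OverloadedStrings for the string literals.
import qualified Data.Aeson              as Aeson
import           Data.ByteString           (ByteString)
import qualified Data.ByteString         as BS
import           Data.Map.Strict           (Map)
import qualified Data.Map.Strict         as Map
import           Data.Text                 (Text)
import qualified Data.Text               as T
import qualified Iceberg.Expression      as Expr
import qualified Iceberg.JSON            as IJ
import qualified Iceberg.Read            as IR
import           Iceberg.Snapshot          (currentSnapshot)
import qualified Iceberg.Types

-- | Read a metadata JSON file from disk and decode it into TableMetadata.
openTableMetadata :: FilePath -> IO (Either String Iceberg.Types.TableMetadata)
openTableMetadata metadataPath = do
  jsonBytes <- BS.readFile metadataPath
  pure $ case Aeson.eitherDecodeStrict jsonBytes of
    Left err  -> Left err
    Right val -> IJ.metadataFromJSON val

-- | Plan a filtered scan from bytes already in memory: the manifest list
-- plus a map from manifest path to that manifest's bytes.
planScanWithLocalManifests
  :: Iceberg.Types.TableMetadata
  -> ByteString
  -> Map Text ByteString
  -> Either String IR.ScanPlan
planScanWithLocalManifests tm manifestListBytes manifests =
  let -- event_time >= 1700000000000 AND region = "us-west"
      filterExpr =
        Expr.and_
          (Expr.greaterThanOrEq "event_time" (Expr.LLong 1700000000000))
          (Expr.equal "region" (Expr.LString "us-west"))
      -- Look up a manifest by path; a missing entry is reported as an error.
      readManifest path =
        maybe (Left ("missing manifest: " ++ T.unpack path)) Right
          (Map.lookup path manifests)
  in case currentSnapshot tm of
       Nothing -> Left "table has no current snapshot"
       -- The snapshot value is unused; the lookup only rejects empty tables.
       Just _  -> IR.planScanWithFilter tm manifestListBytes readManifest filterExpr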

The resulting ScanPlan carries the resolved snapshot, schema, manifest paths, and data file paths. Pass each data file path to wireform-parquet (or another format reader) to materialize rows.
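
To make the pruning step concrete, here is a minimal sketch of bounds-based file selection for a single integer column. It is a simplified model, not the evaluator in Iceberg.Expression: the types Bounds, Pred, and DataFile and the functions mightMatch and selectFiles are all hypothetical names introduced for illustration.

```haskell
import Data.Int (Int64)

-- Hypothetical, simplified stand-ins; not the wireform-iceberg types.

-- | Lower and upper bound of one column within one data file.
data Bounds = Bounds { lowerBound :: Int64, upperBound :: Int64 }

-- | A tiny predicate language over that single column.
data Pred
  = GtEq Int64
  | Eq Int64
  | And Pred Pred

-- | True when some value inside the bounds could satisfy the predicate.
-- The check is conservative: it may keep a file that has no matching rows,
-- but it never rejects a file that could contain one.
mightMatch :: Bounds -> Pred -> Bool
mightMatch (Bounds _ hi)  (GtEq v)  = hi >= v
mightMatch (Bounds lo hi) (Eq v)    = lo <= v && v <= hi
mightMatch b              (And p q) = mightMatch b p && mightMatch b q

data DataFile = DataFile { filePath :: FilePath, fileBounds :: Bounds }

-- | Keep only the files whose bounds leave the predicate satisfiable.
selectFiles :: Pred -> [DataFile] -> [FilePath]
selectFiles p files = [filePath f | f <- files, mightMatch (fileBounds f) p]
```

Because the check only inspects per-file statistics, rows still have to be filtered again after the data files are read.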

For catalog-backed tables, use Iceberg.Catalog.REST.Client or Iceberg.Catalog.Glue to load metadata before calling the scan planner.

Performance

Hot-path microbenchmarks: C kernel vs pure Haskell

Deletion vector

Operation	C kernel	pure Haskell	ratio (pure Haskell / C)
decode 1001 positions	10081 ns	27392 ns	2.72x
contains check	11.9 ns	1054 ns	88.30x

Last run 2026-06-27 11:45:59 UTC. ghc-9.8.4 on darwin-aarch64, criterion 1.6.5.

Murmur3 hash

Input size	C kernel	pure Haskell	ratio (pure Haskell / C)
8 B	11.5 ns	20.7 ns	1.80x
64 B	22.9 ns	110 ns	4.80x
1 KiB	391 ns	1551 ns	3.96x
64 KiB	26010 ns	97717 ns	3.76x

Last run 2026-06-27 11:45:59 UTC. ghc-9.8.4 on darwin-aarch64, criterion 1.6.5.

The C kernels for deletion-vector bitmap operations and Murmur3 hashing are roughly 1.8x to 4.8x faster than the pure Haskell fallbacks for decoding and hashing. The contains check is the most dramatic: a single bitmap probe takes about 12 ns in C versus just over 1 µs in pure Haskell, a ratio of roughly 88x. Both kernels are used by default.
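
For readers unfamiliar with what a bitmap probe does, the sketch below shows the idea with a sparse map of 64-bit words. This is an assumption-laden illustration only: it is not the package's pure Haskell fallback, it does not use the Roaring layout that Iceberg deletion vectors are serialized in, and the names Bitmap, fromPositions, and contains are hypothetical.

```haskell
import           Data.Bits          (setBit, shiftR, testBit, (.&.), (.|.))
import qualified Data.IntMap.Strict as IntMap
import           Data.Word          (Word64)

-- Hypothetical flat bitmap for illustration; not the wireform-iceberg type.
-- Maps a word index (position / 64) to a 64-bit word of deleted positions.
type Bitmap = IntMap.IntMap Word64

-- | Build a bitmap from non-negative row positions.
fromPositions :: [Int] -> Bitmap
fromPositions = foldr insert IntMap.empty
  where
    insert pos =
      IntMap.insertWith (.|.) (pos `shiftR` 6) (setBit 0 (pos .&. 63))

-- | Probe one position: find its word, then test the bit inside it.
contains :: Bitmap -> Int -> Bool
contains bm pos =
  case IntMap.lookup (pos `shiftR` 6) bm of
    Nothing   -> False
    Just word -> testBit word (pos .&. 63)
```

A reader applying deletes would call contains once per row position, which is why the per-probe cost in the table matters more than the one-off decode cost.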

The charts and tables above are regenerated by wireform-stats from wireform-iceberg/bench-results/summary/iceberg-{deletion-vector,murmur3-c-vs-pure}.json — the same source the README charts are built from.

Notable modules

Module	Purpose
`Iceberg.Types`	`TableMetadata`, `Schema`, `Snapshot`, partition specs
`Iceberg.JSON`	`metadataToJSON` / `metadataFromJSON`
`Iceberg.Snapshot`	Snapshot lookup, refs, time travel, ancestry
`Iceberg.Read`	`planScan`, `planScanWithFilter`, manifest readers
`Iceberg.Write`	Snapshot and manifest emission
`Iceberg.Expression`	Predicate AST and manifest pruning evaluators
`Iceberg.Partition` / `Iceberg.Transform`	Partition spec evaluation
`Iceberg.SchemaEvolution`	Allowed schema changes
`Iceberg.Delete` / `Iceberg.DeletionVector`	Row-level delete handling
`Iceberg.Puffin`	Puffin auxiliary index files
`Iceberg.Catalog.*`	REST, Glue, Hadoop, and SQL catalog bindings
`Iceberg.Parquet`	Iceberg Parquet data file bridge
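
As an illustration of the timestamp lookup mentioned for Iceberg.Snapshot, the sketch below picks the snapshot that was current at a given time from a list of snapshot-log entries (the timestamp-ms and snapshot-id pairs stored in table metadata). The LogEntry type and the snapshotAsOf function are hypothetical names, not the module's actual exports.

```haskell
import Data.Int  (Int64)
import Data.List (maximumBy)
import Data.Ord  (comparing)

-- Hypothetical model of one snapshot-log entry; not the wireform-iceberg type.
data LogEntry = LogEntry
  { entryTimestampMs :: Int64  -- when the snapshot became current
  , entrySnapshotId  :: Int64
  }

-- | The snapshot that was current at the given time: the entry with the
-- latest timestamp that is not after the requested one, if any exists.
snapshotAsOf :: Int64 -> [LogEntry] -> Maybe Int64
snapshotAsOf ts entries =
  case [e | e <- entries, entryTimestampMs e <= ts] of
    []         -> Nothing
    candidates ->
      Just (entrySnapshotId (maximumBy (comparing entryTimestampMs) candidates))
```

A request earlier than the first log entry returns Nothing, which a caller would typically surface as "no snapshot existed at that time".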

Interop

The probe suite round-trips table metadata, manifest lists, and manifest files against pyiceberg and fastavro fixtures captured from real tables.