wireform-parquet
wireform-parquet implements the Apache Parquet columnar file format. Parquet
is the on-disk format behind most data warehouses and lakehouse table formats
(Iceberg, Delta Lake, Hudi). Use this package when you need to read or write
Parquet files directly in Haskell, with support for the encodings and
compression codecs that real-world writers emit.
Key features
- Full read and write via Parquet.HighLevel and lower-level page APIs
- All major encodings: PLAIN, dictionary, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT, and hybrid RLE index pages
- Compression codecs behind Cabal flags: Snappy, Zstd, LZ4, Gzip, and Brotli
- Bloom filters and page indexes for sub-row-group predicate pruning
- Modular column encryption (AES-GCM) per the Parquet Modular Encryption spec
- Predicate pushdown over footer statistics, page indexes, and bloom filters
- Nested columns (lists, maps, structs) via Parquet.Nested
- Arrow bridge for typed record batches through Parquet.Arrow
- Template Haskell deriver via Parquet.Derive
- Interop-tested against pyarrow
Basic usage
Most callers start with the high-level decode API for in-memory bytes, or
openParquetReader for mmap-aware streaming over large files on disk:

    -- ByteString is assumed here to be the strict Data.ByteString type.
    import Data.ByteString (ByteString)
    import qualified Data.Vector as V
    import qualified Parquet.HighLevel as PH
    import qualified Parquet.Read as PR
    import qualified Parquet.Types as PT

    -- Decode a Parquet file held in memory and print its footer summary.
    readParquetBytes :: ByteString -> IO ()
    readParquetBytes bytes =
      case PH.decodeParquet PH.defaultReadOptions bytes of
        Left err -> putStrLn err
        Right pf ->
          let fm = PR.pfFooter pf
          in putStrLn $ "rows=" ++ show (PT.fmNumRows fm)
               ++ " rowGroups=" ++ show (V.length (PT.fmRowGroups fm))

    -- Open a file on disk; the row-group iterator is ignored in this example.
    readParquetFile :: FilePath -> IO ()
    readParquetFile path = do
      result <- PR.openParquetReader path
      case result of
        Left err -> putStrLn err
        Right (pf, _rowGroupIter) ->
          let fm = PR.pfFooter pf
          in putStrLn $ "rows=" ++ show (PT.fmNumRows fm)
               ++ " rowGroups=" ++ show (V.length (PT.fmRowGroups fm))

For writing, pass an Arrow-shaped schema and column batches to
encodeParquet:
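
    -- ByteString is assumed here to be the strict Data.ByteString type.
    import Data.ByteString (ByteString)
    import qualified Data.Vector as V
    import qualified Parquet.HighLevel as PH

    -- Encode one vector of column data per row group with default options.
    writeParquet :: PH.Schema -> [V.Vector PH.ColumnData] -> ByteString
    writeParquet schema rowGroups =
      PH.encodeParquet PH.defaultWriteOptions schema rowGroups

When you need projection or filter pushdown without loading every column,
use the predicate and aggregate modules (Parquet.Predicate, Parquet.Aggregate)
together with the Arrow bridge or the cross-format Wireform.Columnar facade.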
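
As a rough illustration of what statistics-based filter pushdown does, the sketch below prunes row groups using per-column min/max values of the kind a Parquet footer records. It is self-contained and does not call the wireform-parquet API: `RowGroupStats`, `Predicate`, `mayMatch`, and `survivingGroups` are hypothetical names introduced only for this example, and the real Parquet.Predicate interface may look quite different.

```haskell
-- Hypothetical, self-contained sketch of min/max pruning.
-- None of these names come from Parquet.Predicate.
import Data.Int (Int64)

-- Footer-style statistics for one integer column in one row group.
data RowGroupStats = RowGroupStats
  { rgIndex :: Int    -- position of the row group in the file
  , rgMin   :: Int64  -- smallest value recorded for the column
  , rgMax   :: Int64  -- largest value recorded for the column
  } deriving (Show, Eq)

-- A tiny predicate language over a single column.
data Predicate
  = Equals Int64
  | GreaterThan Int64
  | Between Int64 Int64  -- inclusive lower and upper bounds
  deriving (Show, Eq)

-- True when the row group might contain a matching row.
-- False means the group can be skipped without reading its pages.
mayMatch :: Predicate -> RowGroupStats -> Bool
mayMatch (Equals v)      s = rgMin s <= v && v <= rgMax s
mayMatch (GreaterThan v) s = rgMax s > v
mayMatch (Between lo hi) s = rgMax s >= lo && rgMin s <= hi

-- Indexes of the row groups that still have to be decoded.
survivingGroups :: Predicate -> [RowGroupStats] -> [Int]
survivingGroups p = map rgIndex . filter (mayMatch p)
```

With three row groups covering 1..100, 101..200, and 201..300, `survivingGroups (Equals 150)` keeps only group 1, and `survivingGroups (Between 90 110)` keeps groups 0 and 1. Statistics can only rule groups out; a surviving group still has to be decoded and filtered row by row.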
Performance
XXH64 hash: C kernel vs pure Haskell
| Input size | C kernel | pure Haskell | ratio (pure / C) |
|---|---|---|---|
| 8 B | 12.1 ns | 10.3 ns | 0.85x |
| 64 B | 18.1 ns | 42.9 ns | 2.37x |
| 1 KiB | 79.9 ns | 457 ns | 5.71x |
| 64 KiB | 4387 ns | 28291 ns | 6.45x |
Last run 2026-06-27 11:45:59 UTC. ghc-9.8.4 on darwin-aarch64, criterion 1.6.5.
The C kernel pulls ahead of the pure Haskell fallback from 64 bytes up, and its advantage grows with input size to roughly 6.5x at 64 KiB. At 8 bytes the pure path wins slightly because the FFI call overhead outweighs the hashing work. The C path is the default when the FFI is available.

Page-level decode throughput depends heavily on encoding (PLAIN, DELTA_BINARY_PACKED, RLE/bitpacked) and compression codec.
The chart and table above are regenerated by wireform-stats from wireform-parquet/bench-results/summary/parquet-xxh64-c-vs-pure.json, the same source the README chart is built from.
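
The ratio column reads as pure Haskell time divided by C kernel time, so values above 1 favour the C kernel. A minimal sketch of that arithmetic, using only `base`; `slowdown` and `formatRatio` are hypothetical helper names and not part of wireform-stats:

```haskell
import Text.Printf (printf)

-- Ratio as used in the table: pure Haskell time divided by C kernel time.
-- Both arguments are in nanoseconds; values above 1 mean the C kernel is faster.
slowdown :: Double -> Double -> Double
slowdown cNs pureNs = pureNs / cNs

-- Format a ratio with two decimals and a trailing "x", as the table does.
formatRatio :: Double -> String
formatRatio r = printf "%.2fx" r
```

For the 8 B row, `formatRatio (slowdown 12.1 10.3)` gives `"0.85x"`, and for the 64 KiB row, `formatRatio (slowdown 4387 28291)` gives `"6.45x"`.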
Notable modules
| Module | Purpose |
|---|---|
| Parquet.HighLevel | encodeParquet, decodeParquet, WriteOptions, ReadOptions |
| Parquet.Read | loadParquetFilePath, openParquetReader, column chunk decoders |
| Parquet.Write | Page encoders, row group assembly, buildParquetFile |
| Parquet.Footer | Thrift-encoded footer parse and emit |
| Parquet.Page / Parquet.PageIndex | Data page headers and per-page statistics |
| Parquet.BloomFilter | Split-block bloom filter decode |
| Parquet.Encryption | Column-level and footer encryption (PME, AES-GCM) |
| Parquet.Predicate | Statistics and bloom-filter predicate evaluation |
| Parquet.Aggregate | count(*), count(col), min, max from footer stats |
| Parquet.Arrow | Parquet columns to Arrow ColumnArray bridge |
| Parquet.Derive | Template Haskell deriver with wireform-derive annotations |
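
To make the idea behind footer-only aggregates concrete, here is a small sketch of how count(*), count(col), min, and max can be answered from per-row-group statistics without decoding any pages. It assumes a flat (non-nested) integer column and uses hypothetical names (`ColumnChunkStats`, `countStar`, `countColumn`, `columnMin`, `columnMax`); it is not the Parquet.Aggregate API.

```haskell
-- Hypothetical sketch; these types and functions are illustrative only.
import Data.Int (Int64)

-- Statistics a footer typically records for one column chunk in one row group.
data ColumnChunkStats = ColumnChunkStats
  { ccNumRows   :: Int64        -- rows in the row group
  , ccNullCount :: Int64        -- null values in this column chunk
  , ccMin       :: Maybe Int64  -- Nothing when the chunk holds only nulls
  , ccMax       :: Maybe Int64
  } deriving (Show, Eq)

-- count(*): total rows across all row groups.
countStar :: [ColumnChunkStats] -> Int64
countStar = sum . map ccNumRows

-- count(col): non-null values, valid for a flat column.
countColumn :: [ColumnChunkStats] -> Int64
countColumn = sum . map (\s -> ccNumRows s - ccNullCount s)

-- min(col): smallest per-chunk minimum, ignoring all-null chunks.
columnMin :: [ColumnChunkStats] -> Maybe Int64
columnMin stats = case [m | Just m <- map ccMin stats] of
  [] -> Nothing
  ms -> Just (minimum ms)

-- max(col): largest per-chunk maximum, ignoring all-null chunks.
columnMax :: [ColumnChunkStats] -> Maybe Int64
columnMax stats = case [m | Just m <- map ccMax stats] of
  [] -> Nothing
  ms -> Just (maximum ms)
```

These answers are only as reliable as the statistics the writer stored; a file written without min/max statistics would need a fallback that scans the column.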
Interop
The reader handles files produced by pyarrow, parquet-cpp, and arrow-rs, including dictionary-encoded strings, delta-packed integers, and BYTE_STREAM_SPLIT floats. Cross-language round-trip tests live in the package probe suite.
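
For readers unfamiliar with BYTE_STREAM_SPLIT, the sketch below shows the transform for 32-bit floats: byte k of every value goes into stream k, and the four streams are concatenated. It is a simplified list-based illustration using only `base`, not the decoder this package ships; `encodeBSS`, `decodeBSS`, and the helpers are hypothetical names, and `decodeBSS` assumes the input length is a multiple of four.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.List (transpose)
import Data.Word (Word8)
import GHC.Float (castFloatToWord32, castWord32ToFloat)

-- Little-endian bytes of one 32-bit float.
floatBytes :: Float -> [Word8]
floatBytes x = [fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 3]]
  where
    w = castFloatToWord32 x

-- Rebuild a float from its four little-endian bytes.
bytesFloat :: [Word8] -> Float
bytesFloat bs = castWord32ToFloat (foldr step 0 bs)
  where
    step b acc = (acc `shiftL` 8) .|. fromIntegral b

-- Split a list into consecutive chunks of length k.
chunksOf :: Int -> [a] -> [[a]]
chunksOf k xs
  | k <= 0 || null xs = []
  | otherwise         = take k xs : chunksOf k (drop k xs)

-- Encode: stream k holds byte k of every value; streams are concatenated in order.
encodeBSS :: [Float] -> [Word8]
encodeBSS = concat . transpose . map floatBytes

-- Decode: cut the buffer into four equal streams and re-interleave them.
-- Assumes the buffer length is a multiple of four.
decodeBSS :: [Word8] -> [Float]
decodeBSS bytes = map bytesFloat (transpose (chunksOf n bytes))
  where
    n = length bytes `div` 4
```

Encoding `[1.0, 2.0]` yields the bytes `00 00 00 00 80 00 3F 40`: the low-order bytes, which are all zero here, end up next to each other. Grouping similar bytes this way does not shrink the data by itself, but it tends to help the compression codec applied afterwards.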