Skip to content

Minarrow

Arrow-compatible data that moves cleanly between Rust and Python, stays ready for SIMD, and plugs into the rest of the Python data ecosystem.

PyPI License Minarrow Rust crate

Minarrow is a compact Python interface over Minarrow Rust. It gives Rust and Python the same columnar data model, with Apache Arrow-compatible memory layouts and 64-byte-aligned buffers. Data produced in Rust can be exposed directly to Python, operated on by native SIMD kernels, and passed to PyArrow, Polars, DuckDB and other Arrow-aware libraries through the Arrow PyCapsule interface.

  • Fast


    Data crosses between Rust and Python zero-copy, with no serialisation step.

  • Compatible


    Buffers sit on SIMD 64-byte boundaries, ready for AVX2 and AVX-512 kernels.

  • Pluggable


    Hand data to Polars, DuckDB, pandas and PyArrow Ecosystem through the Arrow PyCapsule interface.

  • Simple


    Array, Table, ChunkedArray and ChunkedTable cover flat columnar data.

The same data model exists on both sides of the boundary.

import minarrow as ma

table = ma.Table(
    {"id": [1, 2, 3], "price": [9.5, 10.0, 11.2]},
    name="prices",
)

series = table["price"].to_polars()
relation = table.to_duckdb()
use minarrow::{Table, fa_i64, fa_f64};

let table = Table::new(
    "prices",
    Some(vec![
        fa_i64!("id", 1, 2, 3),
        fa_f64!("price", 9.5, 10.0, 11.2),
    ]),
);

let series = table.c[1].to_polars();
let batch = table.to_apache_arrow();

Overview

Minarrow provides four primary Python containers:

  • Array for typed columnar data
  • Table for named collections of equal-length arrays
  • ChunkedArray for one logical column split across multiple chunks
  • ChunkedTable for a sequence of table batches

The Python package is backed by the Minarrow Rust core, which provides the underlying array types, schemas and aligned buffers.

Minarrow is intended for:

  • Passing columnar data between Rust and Python
  • Building Python extensions that operate on Arrow-compatible data
  • Feeding Polars, DuckDB, PyArrow and other Arrow-aware libraries
  • Running SIMD-oriented native kernels over guaranteed aligned buffers
  • Keeping binary size and dependency footprint small

Installation

pip install minarrow

Basic usage

import minarrow as ma

table = ma.Table(
    {
        "id": [1, 2, 3],
        "price": [9.5, 10.0, 11.2],
    },
    name="prices",
)

print(table.columns)
# ['id', 'price']

print(table.dtypes)
# {'id': DType.Integer, 'price': DType.Float}

Create an individual array:

ids = ma.Array([1, 2, 3], name="id")

Pass data to another Arrow-aware library:

polars_series = ids.to_polars()
duckdb_relation = table.to_duckdb()

See the Quickstart for arrays, tables, indexing and schema operations.

Why Minarrow

Small typed API

Minarrow exposes a compact object model rather than reproducing the full Apache Arrow surface area.

Arrays retain a concrete data type, exposed through .dtype, while tables provide named access to their constituent arrays.

Arrow ecosystem integration

Array and Table implement the Arrow PyCapsule interface. Compatible libraries can import their schema and buffers without requiring an intermediate serialisation format.

This provides an interoperability path to:

  • Polars
  • DuckDB
  • PyArrow
  • pandas through an Arrow-compatible backend
  • Native Python extensions that consume Arrow capsules

The PyCapsule integration is zero-copy.

64-byte-aligned buffers

The Rust core stores supported data buffers with 64-byte alignment.

This is useful for native kernels using SIMD instruction sets such as AVX2 or AVX-512, as suitably aligned buffers can be processed without first copying them into a separate aligned allocation.

Alignment does not make an operation SIMD-accelerated by itself. It provides a predictable memory layout for native code that implements vectorised kernels.

Rust-backed implementation

The core data structures are implemented in Rust using concrete array types and enum-based dispatch. This allows type-specific paths to be compiled and optimised without requiring dynamic Python dispatch inside inner loops.

Python remains the orchestration layer, while storage and native operations are handled by the Rust implementation.

Interoperability

Minarrow is a columnar data container/bridge as opposed to a full dataframe execution engine.

Use it to construct or receive data, operate on it through native extensions, and pass it to an appropriate query or dataframe system zero-copy.

Requirement Example integration
Dataframe expressions Polars
SQL queries DuckDB
General Arrow interchange PyArrow
pandas workflows pandas with an Arrow-compatible path
Custom native computation Rust, C or C++ through Arrow capsules

The Rust crate does provide a minimal set of tabular data-processing capabilities but without full parallelism, which serves as a useful basis for foundational operations in that environment. It compiles in less than 2 seconds with the standard feature set to help you stay productive, and your project lightweight.

See Ecosystem interoperability for supported conversion paths and ownership behaviour.

Documentation

Support

Report issues through the Minarrow GitHub repository.

Minarrow is built and maintained by SpaceCell.