Quickstart¶

A runnable version of this guide is available at examples/quickstart.py.

Arrays¶

Create an Array from any supported Python sequence:

import minarrow as ma

array = ma.Array([1, 2, 3, None])

len(array)         # 4
array.dtype        # DType.Integer
array.dtype.group  # TypeClass.Numeric
array.bit_width    # 64
array.null_count   # 1

array[0]           # 1
array[3]           # None
array[1:3]         # Array containing [2, 3]

Minarrow infers the element type from the input. Python None values are represented as nulls.

Named arrays¶

An array may carry a field name:

ids = ma.Array([1, 2, 3], name="id")

ids.name  # "id"

Tables¶

A Table is a named collection of equal-length arrays:

table = ma.Table(
    {
        "id": [1, 2, 3],
        "price": [9.5, 10.0, 11.2],
    },
    name="prices",
)

table.name     # "prices"
table.n_rows   # 3
table.n_cols   # 2
table.columns  # ["id", "price"]
table.dtypes   # {"id": DType.Integer, "price": DType.Float}
table.schema   # Schema([id: Int64, price: Float64])

An unnamed table has None as its name.

A Minarrow Table corresponds to an Arrow record batch: every column has the same row count, and the table carries one schema.

Indexing¶

Tables support positional row and column selection:

table["price"]                  # Column by name -> Array
table[1:3]                      # Row slice -> Table
table[1:3, "price"]             # Row slice and one column -> Array
table[1:3, ["id", "price"]]     # Row slice and columns -> Table
table[:, 0]                     # Column by position -> Array

There is no row-label index.

Integers select one position.
Slices select a range.
Column lists select multiple columns.
Negative positions count from the end.
Invalid positions raise an exception.

Slices return views where the underlying representation permits it.

Fields and schemas¶

ArrowType, Field and Schema describe the table layout and its metadata.

amount = ma.Field(
    "amount",
    ma.ArrowType.Float64(),
    nullable=False,
    metadata={"unit": "USD"},
)

amount.name        # "amount"
amount.arrow_type  # Float64
amount.nullable    # False
amount.metadata    # {"unit": "USD"}

Build a schema from fields:

schema = ma.Schema(
    [
        ma.Field("id", ma.ArrowType.Int64(), nullable=False),
        amount,
    ]
)

schema.names      # ["id", "amount"]
schema["amount"]  # Field(name: amount, arrow_type: Float64, nullable: false)

See Types and schemas for the complete type system.

Chunked arrays¶

A ChunkedArray represents one logical column as an ordered sequence of arrays:

chunked = ma.ChunkedArray(
    [
        ma.Array([1, 2, 3]),
        ma.Array([4, 5]),
    ],
    name="id",
)

chunked.name      # "id"
chunked.n_chunks  # 2
len(chunked)      # 5
chunked.chunk(0)  # Array containing [1, 2, 3]

Chunked arrays avoid requiring all values to be combined into one contiguous allocation.

Chunked tables¶

A ChunkedTable represents a logical table as an ordered sequence of table batches:

chunked_table = ma.ChunkedTable(
    [
        ma.Table(
            {
                "id": [1, 2],
                "price": [9.5, 10.0],
            }
        ),
        ma.Table(
            {
                "id": [3],
                "price": [11.2],
            }
        ),
    ],
    name="prices",
)

chunked_table.name       # "prices"
chunked_table.n_batches  # 2
chunked_table.n_rows     # 3
chunked_table.batch(0)   # Table

Each batch must have a compatible schema.

Schema metadata¶

A ChunkedTable can carry an explicit schema with table-level and field-level metadata:

schema = ma.Schema(
    [
        ma.Field(
            "id",
            ma.ArrowType.Int64(),
            nullable=False,
            metadata={"role": "key"},
        ),
        ma.Field(
            "price",
            ma.ArrowType.Float64(),
        ),
    ],
    metadata={
        "source": "prices-feed",
        "version": "1",
    },
)

chunked_table = ma.ChunkedTable(
    [
        ma.Table(
            {
                "id": [1],
                "price": [9.5],
            }
        )
    ],
    name="prices",
    schema=schema,
)

chunked_table.schema.metadata
# {"source": "prices-feed", "version": "1"}

chunked_table.schema.fields[0].metadata
# {"role": "key"}

When an explicit schema is supplied, .schema returns it rather than deriving a schema from the batches.

Arrow interoperability¶

Minarrow imports and exports data through the Arrow PyCapsule interface.

import minarrow as ma
import pyarrow as pa

arrow_array = pa.array([1, 2, 3])
arrow_batch = pa.RecordBatch.from_pydict(
    {
        "id": [1, 2, 3],
        "price": [9.5, 10.0, 11.2],
    }
)

array = ma.Array.from_arrow(arrow_array)
table = ma.Table.from_arrow(arrow_batch)

Export to PyArrow:

pyarrow_array = array.to_arrow()
pyarrow_table = table.to_arrow()

Arrow-aware consumers can also use the capsule methods directly:

pyarrow_array = pa.array(array)
pyarrow_table = pa.table(table)

The PyCapsule integration is zero-copy, so the Arrow buffers are shared without serialisation.

ChunkedArray and ChunkedTable also expose the Arrow PyCapsule interface:

import polars as pl

frame = pl.from_arrow(chunked_table)

See Ecosystem interoperability for Polars, DuckDB, pandas, DataFusion, cuDF and ADBC integrations.