# Data Frame Conversion in miniextendr
miniextendr provides comprehensive support for converting between Rust types and R data frames, with three complementary approaches offering different trade-offs between ergonomics and flexibility.
## Overview
| Approach | Best For | Code Generation | Ease of Use |
|---|---|---|---|
| `#[derive(DataFrameRow)]` | Type-safe, ergonomic APIs | Yes (generates a DataFrame type) | Easy |
| `DataFrame<T>` | Generic, reusable code | No codegen | Moderate |
| `impl IntoDataFrame` | Full control, complex cases | No (manual impl) | Advanced |
## Core Traits
### IntoDataFrame
The foundational trait for converting Rust types to R data frames.
```rust
pub trait IntoDataFrame {
    fn into_data_frame(self) -> List;
}
```
Key points:
- Consumes `self` (owning conversion)
- Returns a `List` with data.frame attributes
- Used by all other approaches under the hood
Related:
- `AsDataFrame` (in the `as_coerce` module): S3 coercion methods for `as.data.frame()` on `ExternalPtr` types
- `IntoDataFrame` (this trait): direct conversion for return values
## Approach 1: Derive Macro (Recommended)
Use #[derive(DataFrameRow)] for the most ergonomic experience. The macro generates a companion DataFrame type and all necessary conversions.
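To make the idea of a companion type concrete, here is a hand-written, std-only sketch of the transposition it performs for a three-field row. `MeasurementColumns` is a hypothetical name used only for illustration; the code the derive actually generates (including its `IntoDataFrame` impl, which needs R) is not shown and will differ in detail.

```rust
/// A row type, mirroring the `Measurement` example in this section.
#[derive(Clone, Debug, PartialEq)]
pub struct Measurement {
    pub time: f64,
    pub value: f64,
    pub sensor: String,
}

/// Hypothetical stand-in for a generated companion type:
/// one `Vec` per field, all of the same length.
#[derive(Debug, Default, PartialEq)]
pub struct MeasurementColumns {
    pub time: Vec<f64>,
    pub value: Vec<f64>,
    pub sensor: Vec<String>,
}

impl From<Vec<Measurement>> for MeasurementColumns {
    /// Transpose rows into columns, pre-allocating each column.
    fn from(rows: Vec<Measurement>) -> Self {
        let n = rows.len();
        let mut cols = MeasurementColumns {
            time: Vec::with_capacity(n),
            value: Vec::with_capacity(n),
            sensor: Vec::with_capacity(n),
        };
        for row in rows {
            cols.time.push(row.time);
            cols.value.push(row.value);
            cols.sensor.push(row.sensor);
        }
        cols
    }
}

impl IntoIterator for MeasurementColumns {
    type Item = Measurement;
    type IntoIter = std::vec::IntoIter<Measurement>;

    /// Transpose columns back into rows by zipping the column vectors.
    fn into_iter(self) -> Self::IntoIter {
        let rows: Vec<Measurement> = self
            .time
            .into_iter()
            .zip(self.value)
            .zip(self.sensor)
            .map(|((time, value), sensor)| Measurement { time, value, sensor })
            .collect();
        rows.into_iter()
    }
}
```

The round trip rows → columns → rows is lossless here because every field maps to exactly one column. That is also why expanded fields, which change the column shape, do not get the reverse conversion.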
### Basic Usage
```rust
use miniextendr_api::{miniextendr, DataFrameRow, IntoList};

#[derive(Clone, IntoList, DataFrameRow)]
struct Measurement {
    time: f64,
    value: f64,
    sensor: String,
}

// Auto-generates:
// - struct MeasurementDataFrame { time: Vec<f64>, value: Vec<f64>, sensor: Vec<String> }
// - impl IntoDataFrame for MeasurementDataFrame
// - impl From<Vec<Measurement>> for MeasurementDataFrame
// - impl IntoIterator for MeasurementDataFrame (Item = Measurement)
// - Measurement::to_dataframe() and from_dataframe() methods

#[miniextendr]
fn get_measurements() -> MeasurementDataFrame {
    let rows = vec![
        Measurement { time: 1.0, value: 10.0, sensor: "A".into() },
        Measurement { time: 2.0, value: 20.0, sensor: "B".into() },
        Measurement { time: 3.0, value: 30.0, sensor: "C".into() },
    ];
    Measurement::to_dataframe(rows) // or: rows.into()
}
```

### Heterogeneous Types
The derive macro fully supports different types in different fields:
```rust
#[derive(Clone, IntoList, DataFrameRow)]
struct Person {
    name: String,     // character in R
    age: i32,         // integer in R
    height: f64,      // numeric in R
    is_student: bool, // logical in R
}
// Each field maintains its distinct type throughout conversion
```

### Collection Expansion
Fixed-size arrays [T; N] are automatically expanded into N suffixed columns.
Use #[dataframe(expand)] or #[dataframe(unnest)] explicitly if desired,
though arrays expand by default.
```rust
#[derive(Clone, DataFrameRow)]
struct Point3D {
    label: String,
    coords: [f64; 3], // → coords_1, coords_2, coords_3
}

// Generates:
// struct Point3DDataFrame {
//     label: Vec<String>,
//     coords_1: Vec<f64>,
//     coords_2: Vec<f64>,
//     coords_3: Vec<f64>,
// }
```
For Vec<T>, Box<[T]>, and &[T], two expansion modes are available:
Fixed width (width = N): Expands into exactly N columns at compile time.
```rust
#[derive(Clone, DataFrameRow)]
struct Scored {
    name: String,
    #[dataframe(width = 3)]
    scores: Vec<f64>, // → scores_1, scores_2, scores_3 as Option<f64>
}
```
- Shorter vecs: padded with `NA`
- Longer vecs: truncated to N (extra elements silently dropped)
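The padding and truncation rules can be modeled in plain Rust. This is a minimal sketch, not the macro's generated code: `expand_fixed` and `expanded_names` are hypothetical helpers, and `None` stands in for `NA`.

```rust
/// Hypothetical model of `#[dataframe(width = N)]`: split each row's `Vec<T>`
/// into exactly `n` columns of `Option<T>`. `row.get(col)` returns `None` past
/// the end of a short row (padding), and columns beyond `n` are never built
/// (truncation).
pub fn expand_fixed<T: Clone>(rows: &[Vec<T>], n: usize) -> Vec<Vec<Option<T>>> {
    (0..n)
        .map(|col| rows.iter().map(|row| row.get(col).cloned()).collect::<Vec<_>>())
        .collect()
}

/// Suffixed column names, 1-based: `scores_1`, `scores_2`, ...
pub fn expanded_names(field: &str, n: usize) -> Vec<String> {
    (1..=n).map(|i| format!("{field}_{i}")).collect()
}
```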
Auto-expand (expand or unnest): Column count determined at runtime
from the maximum length across all rows.
```rust
#[derive(Clone, DataFrameRow)]
struct Measured {
    name: String,
    #[dataframe(expand)] // or: #[dataframe(unnest)]
    readings: Vec<f64>,  // → readings_1, readings_2, ... (as many as needed)
}
```
- Shorter vecs: padded with `NA`
- All elements preserved (no truncation)
- If all vecs are empty: no expansion columns produced
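Auto-expansion differs from fixed width only in where the width comes from. In this hypothetical sketch (again with `None` standing in for `NA`), `expand_auto` takes the width from the longest row, so nothing is truncated, and an input where every row is empty yields no columns at all.

```rust
/// Hypothetical model of `#[dataframe(expand)]` / `#[dataframe(unnest)]`.
pub fn expand_auto<T: Clone>(rows: &[Vec<T>]) -> Vec<Vec<Option<T>>> {
    // Width is decided at runtime: the longest row wins (0 if all are empty).
    let width = rows.iter().map(Vec::len).max().unwrap_or(0);
    (0..width)
        // Shorter rows are padded with `None` (NA in R).
        .map(|col| rows.iter().map(|row| row.get(col).cloned()).collect::<Vec<_>>())
        .collect()
}
```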
Box<[T]> and &[T] work identically to Vec<T> for all expansion modes. They
share the same .get(), .len(), and indexing behavior.
Note: Using &[T] introduces a lifetime parameter on both the row struct and
the generated companion struct (e.g., FooDataFrame<'a>). This is zero-cost: &[T]
is Copy (just a fat pointer), so pushing into the companion struct copies only the
pointer, not the data.
Without width or expand/unnest, Vec<T>, Box<[T]>, and &[T] stay as opaque single columns (list columns in R).
### Field-Level Attributes
```rust
#[derive(Clone, DataFrameRow)]
struct Row {
    #[dataframe(skip)]           // Omit from DataFrame
    internal_id: u64,
    #[dataframe(rename = "lbl")] // Custom column name
    label: String,
    #[dataframe(as_list)]        // Suppress expansion (keep as single column)
    coords: [f64; 3],
    #[dataframe(width = 5)]      // Expand Vec to 5 columns
    scores: Vec<f64>,
}
```

| Attribute | Effect | Valid On |
|---|---|---|
| `skip` | Omit field from DataFrame | Any field |
| `rename = "name"` | Custom column name | Any field |
| `as_list` | Suppress expansion (keep as a single list-column) | `[T; N]`, `Vec<T>`, `Box<[T]>`, `&[T]`; also map, wrapper, and inner-enum fields (see below) |
| `expand` | Explicit expansion (default for `[T; N]`; auto-expand for `Vec<T>`/`Box<[T]>`/`&[T]`) | `[T; N]`, `Vec<T>`, `Box<[T]>`, `&[T]` |
| `unnest` | Alias for `expand` | `[T; N]`, `Vec<T>`, `Box<[T]>`, `&[T]` |
| `width = N` | Pin expansion width (truncates longer vecs/slices) | `Vec<T>`, `Box<[T]>`, `&[T]` |
Conflicts: as_list + expand/unnest, as_list + width are compile errors.
Note on round-tripping: Structs with expanded fields don't generate `IntoIterator` or `from_dataframe()`, since the companion struct's shape differs from the original. Use `to_dataframe()` only.
### Other Collection Types
Non-expanded collection fields work natively for both struct and enum DataFrameRows:
```rust
use std::collections::{BTreeSet, HashSet};

#[derive(Clone, DataFrameRow)]
struct ComplexRow {
    measurements: Vec<f64>,    // opaque list column
    data: Box<[i32]>,          // opaque list column
    tags: HashSet<String>,     // opaque list column
    categories: BTreeSet<i32>, // opaque list column
}
```
In struct DataFrameRows, these columns land as `Vec<C>` and convert to a `VECSXP` list-column. In enum DataFrameRows they land as `Vec<Option<C>>`, with `None` for variants that don't carry the field; these convert to a `VECSXP` list-column with `NULL` for absent rows. See `docs/CONVERSION_MATRIX.md` for the full set of supported `C`.
HashMap<K, V> / BTreeMap<K, V> variant fields are supported and expand to two parallel list-columns (see Map fields below). Struct-typed and nested-enum variant fields are covered in Nested enum fields below.
### Map fields: parallel list-column expansion
HashMap<K, V> and BTreeMap<K, V> fields on enum variants expand to two parallel list-columns named <field>_keys and <field>_values. Each cell holds a vector of K and a vector of V respectively, in the same entry order:
```rust
use std::collections::BTreeMap;

#[derive(Clone, DataFrameRow)]
#[dataframe(align, tag = "_type")]
enum Event {
    Tally { label: String, tally: BTreeMap<String, i32> },
    Empty { label: String },
}

// In R (BTreeMap, sorted key order):
// _type  label  tally_keys   tally_values
// Tally  "a"    c("a", "b")  c(1L, 2L)
// Empty  "b"    NULL         NULL
```
Absent-variant rows produce NULL in both columns (not NA). An empty map produces character(0) / integer(0), not NULL.
HashMap ordering: HashMap iteration order is non-deterministic. Keys and values are parallel within a single row, but the key order may differ across rows and across runs. Use setequal or sort-based comparison in R tests, never expect_equal on unsorted key vectors.
BTreeMap ordering: keys are always in sorted order per the BTreeMap contract. expect_equal is safe.
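Here is a rough std-only model of the map expansion, assuming a simplified copy of the `Event` enum above with the `label` fields dropped. `map_columns` is a hypothetical helper, `None` stands in for an R `NULL` cell, and an empty map becomes an empty vector (`character(0)` / `integer(0)` in R). Because `BTreeMap` iterates in sorted key order, keys and values come out parallel and deterministic.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the `Event` enum above.
pub enum Event {
    Tally { tally: BTreeMap<String, i32> },
    Empty,
}

/// Hypothetical model: split each row's map into parallel key/value cells.
pub fn map_columns(rows: &[Event]) -> (Vec<Option<Vec<String>>>, Vec<Option<Vec<i32>>>) {
    let mut keys: Vec<Option<Vec<String>>> = Vec::with_capacity(rows.len());
    let mut values: Vec<Option<Vec<i32>>> = Vec::with_capacity(rows.len());
    for row in rows {
        match row {
            Event::Tally { tally } => {
                // keys() and values() walk the map in the same (sorted) order.
                keys.push(Some(tally.keys().cloned().collect()));
                values.push(Some(tally.values().copied().collect()));
            }
            // Absent-variant row: NULL in both columns, not NA.
            Event::Empty => {
                keys.push(None);
                values.push(None);
            }
        }
    }
    (keys, values)
}
```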
as_list opt-out: annotate the field with #[dataframe(as_list)] to keep it as a single opaque named-list column (the pre-expansion behavior). Only use this when the named-list per-row shape is needed directly in R.
Detection caveats: classify_field_type detects HashMap / BTreeMap by matching the last path segment (HashMap or BTreeMap) and requiring exactly two generic type arguments. It also detects struct-typed fields by matching bare path types (single- or multi-segment, e.g. Point or crate::geom::Point) whose last segment has no generic arguments.
Rejected wrapper types: the following shapes produce a compile error (since #484) because they cannot be automatically expanded and would otherwise silently produce a confusing opaque list-column:
- `Option<T>`, including `Option<HashMap<K,V>>`, `Option<UserStruct>`, etc.
- `Cow<T>`, `Rc<T>`, `Arc<T>`, `RefCell<T>`, `Cell<T>`, `Mutex<T>`, `RwLock<T>`
For all of these, use #[dataframe(as_list)] to opt into an explicit opaque list-column, or unwrap to the inner type (e.g. store HashMap<K,V> directly and use an empty map for the absent case):
```rust
use std::collections::HashMap;

#[derive(Clone, DataFrameRow)]
struct Row {
    id: i32,
    // `counts: Option<HashMap<String, i32>>` alone is a compile error without `as_list`.
    #[dataframe(as_list)]
    counts: Option<HashMap<String, i32>>,
}
```
Type aliases are not automatically unwrapped: given `type Counts = HashMap<String, i32>;`, a field `field: Counts` has `Counts` as its last path segment, so map expansion is not triggered. Use the concrete type directly (`field: HashMap<String, i32>`), or annotate the field with `#[dataframe(as_list)]`. See #604 for tracking.
Note: multi-segment paths whose last segment does NOT implement DataFrameRow (e.g. std::ffi::CString) produce a clear compile-time error from the _assert_inner_is_dataframe_row assertion β this is intentional. Use #[dataframe(as_list)] on the field or an import alias to a newtype wrapper if a non-DataFrameRow stdlib type needs to be stored.
### Nested enum fields: flatten and opt-outs
A variant field whose type is itself a DataFrameRow enum flattens into prefixed columns by default. The inner enum must `#[derive(DataFrameRow)]`; the outer field's name acts as a prefix. The inner enum should use `#[dataframe(tag = "variant")]` so that its discriminant column merges cleanly as `<field>_variant`:
```rust
#[derive(Clone, DataFrameRow)]
#[dataframe(align, tag = "variant")] // inner enum's own discriminant is "variant"
enum Status { Ok, Err { code: i32 } }

#[derive(Clone, DataFrameRow)]
#[dataframe(align, tag = "_type")]
enum Event {
    Tracked { id: i32, status: Status },
    Other { id: i32 },
}

// Columns in R:
//   _type           character ("Tracked" / "Other")
//   id              integer
//   status_variant  character ("Ok" / "Err" / NA for Other rows)
//   status_code     integer (NA for Ok rows and Other rows; error code for Err rows)
```
Absent-variant rows (e.g. Other above, which has no status field) produce NA in all prefixed columns.
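The flattening rules can be sketched by hand for the `status` field. This is an illustration under simplifying assumptions, not the generated code: `FlatColumns` and `flatten` are hypothetical, the `id` fields are dropped, only the `status_*` columns are modeled, and `None` stands in for `NA`.

```rust
/// Simplified stand-ins for the `Status` and `Event` enums above.
pub enum Status {
    Ok,
    Err { code: i32 },
}

pub enum Event {
    Tracked { status: Status },
    Other,
}

/// Hypothetical flattened columns for the `status` field.
#[derive(Debug, Default, PartialEq)]
pub struct FlatColumns {
    pub status_variant: Vec<Option<&'static str>>,
    pub status_code: Vec<Option<i32>>,
}

pub fn flatten(rows: &[Event]) -> FlatColumns {
    let mut cols = FlatColumns::default();
    for row in rows {
        let (variant, code) = match row {
            // Ok carries no payload, so its code column is NA.
            Event::Tracked { status: Status::Ok } => (Some("Ok"), None),
            Event::Tracked { status: Status::Err { code } } => (Some("Err"), Some(*code)),
            // Absent-variant row: every prefixed column is NA.
            Event::Other => (None, None),
        };
        cols.status_variant.push(variant);
        cols.status_code.push(code);
    }
    cols
}
```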
Inner tag naming: use `#[dataframe(tag = "variant")]` on the inner enum, so the outer prefix produces `<field>_variant` (single underscore). Using `#[dataframe(tag = "_variant")]` (with a leading underscore) produces `<field>__variant` (double underscore). Avoid leading underscores on inner tags.
#### `as_factor`: unit-only inner enum
When the inner enum has only unit variants (no payload), annotate the field with #[dataframe(as_factor)] to emit a single R factor column instead of flattening. The inner enum does not need DataFrameRow for this path β only UnitEnumFactor, which is auto-emitted by #[derive(DataFrameRow)] for unit-only enums:
```rust
#[derive(Clone, Copy, DataFrameRow)]
#[dataframe(tag = "variant")]
enum Direction { North, South, East, West }

#[derive(Clone, DataFrameRow)]
#[dataframe(align, tag = "_type")]
enum Move {
    Step { id: i32, #[dataframe(as_factor)] dir: Direction },
    Stop { id: i32 },
}
// R column: dir → integer factor with levels c("North", "South", "East", "West")
// Stop rows have NA in dir.
```
Factor levels are the variant idents in declaration order. is.factor(df$dir) returns TRUE. Annotating a payload-bearing enum with as_factor is a compile error (missing UnitEnumFactor implementation).
Note on generic unit enums: #[derive(DataFrameRow)] auto-emits UnitEnumFactor only when the enum has no generic type parameters (impl_generics.is_empty()). Generic unit enums must implement UnitEnumFactor manually if as_factor is needed.
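In R, a factor is an integer vector of 1-based codes plus a `levels` attribute. The sketch below models that encoding for `Direction`. The `FactorLike` trait and the `factor_codes` helper are hypothetical stand-ins and do not reproduce the real `UnitEnumFactor` API. The sketch relies on R storing integer `NA` as `INT_MIN` (`i32::MIN`).

```rust
/// Hypothetical stand-in for a unit-enum-to-factor mapping.
pub trait FactorLike {
    /// Level labels in declaration order.
    const LEVELS: &'static [&'static str];
    /// 1-based factor code for this value.
    fn code(&self) -> i32;
}

#[derive(Clone, Copy)]
pub enum Direction { North, South, East, West }

impl FactorLike for Direction {
    const LEVELS: &'static [&'static str] = &["North", "South", "East", "West"];
    fn code(&self) -> i32 {
        // Discriminants start at 0; R factor codes start at 1.
        *self as i32 + 1
    }
}

/// R's integer NA is INT_MIN.
pub const NA_INTEGER: i32 = i32::MIN;

/// Encode an optional column; `None` (an absent-variant row) becomes NA.
pub fn factor_codes<F: FactorLike>(values: &[Option<F>]) -> Vec<i32> {
    values.iter().map(|v| v.as_ref().map_or(NA_INTEGER, F::code)).collect()
}
```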
#### `as_list`: opaque list-column
Use #[dataframe(as_list)] to keep any inner enum as a single opaque VECSXP list-column. Each present row gets a list cell; absent-variant rows get NULL:
```rust
#[derive(Clone, DataFrameRow)]
#[dataframe(align, tag = "_type")]
enum Event {
    Move { id: i32, #[dataframe(as_list)] dir: Direction },
    Stop { id: i32 },
}
// R column: dir → list-column; Move rows have a list cell, Stop rows have NULL.
```
as_list works for any inner type (unit-only or payload-bearing, with or without DataFrameRow).
#### `<field>_variant` collision detection
When a field `kind: Inner` is flattened, the macro reports a compile-time error if any sibling field in the same variant produces a column named `kind_variant` (the name the inner enum's discriminant column receives after prefixing). Rename the colliding field or change the inner enum's tag:
```rust
// ERROR: kind_variant is both the flatten discriminant and a sibling field name.
enum Bad {
    Wrap { kind: Inner, kind_variant: String },
}

// OK: rename sibling field, or change inner tag.
enum Good {
    Wrap { kind: Inner, #[dataframe(rename = "kind_type")] kind_type: String },
}
```

### Enum Align Mode
Enums derive a companion DataFrame in which each variant's fields contribute to a unified schema. Fields absent in a variant are filled with `None` (`NA` in R):
```rust
#[derive(Clone, DataFrameRow)]
#[dataframe(tag = "_type")]
enum Event {
    Click { id: i64, x: f64, y: f64 },
    Impression { id: i64, slot: String },
    Error { id: i64, code: i32, message: String },
}

// In R:
// _type       id  x    y    slot        code  message
// Click       1   1.5  2.5  NA          NA    NA
// Impression  2   NA   NA   top_banner  NA    NA
// Error       3   NA   NA   NA          404   not found
```
Key points:
- All enum columns are `Vec<Option<T>>` (absent fields get `None`)
- `tag = "col"` adds a variant discriminator column
- `align` is implicit for enums (accepted but not required)
- Borrowed fields (`&'a str`, `&'a [T]`) work in enum variants; the same lifetime is propagated through the companion struct. Explicit lifetime params on `#[miniextendr]` fns/impls are still rejected (MXL112); see CLAUDE.md.
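A hypothetical hand-written version of the aligned companion for the `Event` example above shows how this works: every variant field becomes an `Option` column, and each pushed row fills only the fields its variant carries. `EventColumns` and its `push` method are illustrative names, not generated API, and the tag column is modeled as a plain `&str`.

```rust
pub enum Event {
    Click { id: i64, x: f64, y: f64 },
    Impression { id: i64, slot: String },
    Error { id: i64, code: i32, message: String },
}

/// Union of all variant fields, each as `Vec<Option<T>>`, plus the tag column.
#[derive(Debug, Default)]
pub struct EventColumns {
    pub tag: Vec<&'static str>,
    pub id: Vec<Option<i64>>,
    pub x: Vec<Option<f64>>,
    pub y: Vec<Option<f64>>,
    pub slot: Vec<Option<String>>,
    pub code: Vec<Option<i32>>,
    pub message: Vec<Option<String>>,
}

impl EventColumns {
    pub fn push(&mut self, row: Event) {
        // Every optional column starts as None (NA); the match fills what this variant carries.
        let (mut x, mut y, mut slot, mut code, mut message) = (None, None, None, None, None);
        let (tag, id) = match row {
            Event::Click { id, x: cx, y: cy } => {
                x = Some(cx);
                y = Some(cy);
                ("Click", id)
            }
            Event::Impression { id, slot: s } => {
                slot = Some(s);
                ("Impression", id)
            }
            Event::Error { id, code: c, message: m } => {
                code = Some(c);
                message = Some(m);
                ("Error", id)
            }
        };
        self.tag.push(tag);
        self.id.push(Some(id));
        self.x.push(x);
        self.y.push(y);
        self.slot.push(slot);
        self.code.push(code);
        self.message.push(message);
    }
}
```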
#### Type Conflicts Across Variants
If two variants use the same field name with different types, the derive fails by default. Use conflicts = "string" to coerce all conflicting columns to String:
```rust
#[derive(Clone, DataFrameRow)]
#[dataframe(conflicts = "string")]
enum Mixed {
    A { value: f64 },
    B { value: String }, // value column becomes String for all variants
}
```

#### Enum Field Attributes
All field-level attributes (skip, rename, as_list, width) work in enum variants too:
```rust
#[derive(Clone, DataFrameRow)]
#[dataframe(tag = "_type")]
enum Observation {
    Point { id: i32, coords: [f64; 2] }, // coords → coords_1, coords_2
    Measurement { id: i32, #[dataframe(width = 3)] readings: Vec<f64> },
}
```

### Enum Split Mode (`to_dataframe_split`)
Alongside `to_dataframe` (which produces a single aligned data.frame with `NA`/`NULL` fill for variants that don't carry a field), enums also expose `to_dataframe_split`, which partitions the rows by variant. Each partition is a data.frame with only that variant's own columns, with no NA-filled columns from sibling variants.
| Input (variants × rows) | Return type |
|---|---|
| Single-variant enum, any number of rows | bare `data.frame` |
| Multi-variant enum, mixed rows | named `list` of data.frames, one per variant, named in snake_case |
```rust
let rows = vec![
    Event::Click { id: 1, x: 1.5, y: 2.5 },
    Event::Impression { id: 2, slot: "top_banner".to_string() },
    Event::Error { id: 3, code: 404, message: "not found".to_string() },
];
Event::to_dataframe_split(rows)
// In R: list(click      = <1-row df with id, x, y>,
//            impression = <1-row df with id, slot>,
//            error      = <1-row df with id, code, message>)
```
Variants absent from the input still appear in the result as 0-row data.frames carrying that variant's column shape. Unit variants produce a 0-column data.frame with the correct row count. Tuple variants name positional columns `_0`, `_1`, and so on. See the cardinality matrix in `rpkg/tests/testthat/test-dataframe-enum-payload-matrix.R` for the full set of guarantees (PR #463).
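For contrast with align mode, here is a sketch of the split shape using a two-variant cut of `Event`. All names (`ClickFrame`, `ImpressionFrame`, `EventSplit`, `split`) are hypothetical. Each partition holds only its own non-optional columns, and all partitions are created up front, so a variant absent from the input still has an empty frame.

```rust
pub enum Event {
    Click { id: i64, x: f64, y: f64 },
    Impression { id: i64, slot: String },
}

/// One frame per variant, holding only that variant's own columns.
#[derive(Debug, Default)]
pub struct ClickFrame {
    pub id: Vec<i64>,
    pub x: Vec<f64>,
    pub y: Vec<f64>,
}

#[derive(Debug, Default)]
pub struct ImpressionFrame {
    pub id: Vec<i64>,
    pub slot: Vec<String>,
}

/// Snake_case field names mirror the names of the returned R list.
#[derive(Debug, Default)]
pub struct EventSplit {
    pub click: ClickFrame,
    pub impression: ImpressionFrame,
}

pub fn split(rows: Vec<Event>) -> EventSplit {
    // Every partition exists before any row is seen, so absent variants
    // come out as 0-row frames that still carry their column shape.
    let mut out = EventSplit::default();
    for row in rows {
        match row {
            Event::Click { id, x, y } => {
                out.click.id.push(id);
                out.click.x.push(x);
                out.click.y.push(y);
            }
            Event::Impression { id, slot } => {
                out.impression.id.push(id);
                out.impression.slot.push(slot);
            }
        }
    }
    out
}
```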
### With Serde (when the `serde` feature is enabled)
```rust
use miniextendr_api::{miniextendr, DataFrameRow};
use serde::Serialize;

#[derive(Serialize, DataFrameRow)] // Serialize implies IntoList!
struct Reading {
    timestamp: f64,
    temperature: f64,
    humidity: f64,
}

#[miniextendr]
fn get_readings() -> ReadingDataFrame {
    Reading::to_dataframe(vec![
        Reading { timestamp: 1.0, temperature: 20.5, humidity: 65.0 },
        Reading { timestamp: 2.0, temperature: 21.0, humidity: 63.0 },
    ])
}
```

### Generated Methods
The derive macro adds these methods to your row type:
```rust
impl Measurement {
    /// Name of the generated companion DataFrame type
    pub const DATAFRAME_TYPE_NAME: &'static str = "MeasurementDataFrame";

    /// Transpose rows to columns
    pub fn to_dataframe(rows: Vec<Self>) -> MeasurementDataFrame;

    /// Transpose columns back to rows
    pub fn from_dataframe(df: MeasurementDataFrame) -> Vec<Self>;
}
```
For enums, the derive additionally generates:
```rust
impl Event {
    /// Partition rows by variant. Returns `data.frame` for single-variant enums,
    /// or a named `list` of per-variant data.frames otherwise. See "Enum Split Mode".
    pub fn to_dataframe_split(rows: Vec<Self>) -> miniextendr_api::List;
}
```

### Iterating Over Rows
The generated DataFrame type implements IntoIterator:
```rust
let df = get_measurements();

// Iterate over rows (this consumes `df`)
for measurement in df {
    println!("Time: {}, Value: {}", measurement.time, measurement.value);
}

// Or collect back to a Vec (from a fresh DataFrame, since the loop above consumed `df`)
let rows: Vec<Measurement> = get_measurements().into_iter().collect();
```

### Requirements
The row type must implement `IntoList`:
- Automatically via `#[derive(IntoList)]`
- Via `#[derive(Serialize)]` when the `serde` feature is enabled
- Via a manual implementation using `List::from_raw_pairs()` (for heterogeneous fields)
### Container Attributes
```rust
#[derive(DataFrameRow)]
#[dataframe(
    name = "Measurements", // Custom DataFrame name (default: {StructName}DataFrame)
    tag = "_type",         // Add variant discriminator column (enums)
    parallel,              // Enable rayon parallel fill (requires `rayon` feature)
    conflicts = "string",  // Coerce type conflicts to String (enums)
)]
struct Measurement { /* ... */ }
```

### Parallel Fill with Rayon
Every DataFrameRow companion type gets explicit sequential and parallel constructors.
The parallel path requires the rayon feature.
```toml
# Cargo.toml
[dependencies]
miniextendr-api = { version = "0.1", features = ["rayon"] }
```

```rust
#[derive(Clone, IntoList, DataFrameRow)]
pub struct Point {
    pub x: f64,
    pub y: f64,
    pub label: String,
}

#[miniextendr]
pub fn big_points() -> PointDataFrame {
    let points: Vec<Point> = (0..100_000)
        .map(|i| Point { x: i as f64, y: (i * 2) as f64, label: format!("p{}", i) })
        .collect();
    // Explicit parallel: always uses rayon, no threshold check
    PointDataFrame::from_rows_par(points)
}
```
Generated methods on every companion type:
- `DfType::from_rows(rows)`: sequential push-based fill (always available)
- `DfType::from_rows_par(rows)`: parallel scatter-write via `ColumnWriter` (`#[cfg(feature = "rayon")]`)
- `From<Vec<Row>>` / `RowType::to_dataframe(rows)`: sequential (unchanged)
How `from_rows_par` works:
- Pre-allocates column vectors to exact size, then fills indices in parallel
- Uses `rayon::par_iter()` with `ColumnWriter<T>` for safe concurrent writes to disjoint indices
- No threshold: the caller explicitly opts in to parallelism
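The pre-allocate-then-scatter pattern can be sketched with the standard library alone. This is an analogue, not the real implementation: `std::thread::scope` and `chunks_mut` stand in for rayon and `ColumnWriter`, and `PointColumns` / `from_rows_par` here are hypothetical. Splitting each column into disjoint mutable chunks is what lets the borrow checker prove that the writes never overlap.

```rust
use std::thread;

pub struct Point {
    pub x: f64,
    pub y: f64,
    pub label: String,
}

#[derive(Debug, Default, PartialEq)]
pub struct PointColumns {
    pub x: Vec<f64>,
    pub y: Vec<f64>,
    pub label: Vec<String>,
}

pub fn from_rows_par(rows: &[Point], n_threads: usize) -> PointColumns {
    let n = rows.len();
    // 1. Pre-allocate every column to its final length.
    let mut cols = PointColumns {
        x: vec![0.0; n],
        y: vec![0.0; n],
        label: vec![String::new(); n],
    };
    // 2. Split rows and columns into matching, disjoint index ranges.
    let threads = n_threads.max(1);
    let chunk = ((n + threads - 1) / threads).max(1); // chunks(0) would panic
    thread::scope(|s| {
        let ranges = rows
            .chunks(chunk)
            .zip(cols.x.chunks_mut(chunk))
            .zip(cols.y.chunks_mut(chunk))
            .zip(cols.label.chunks_mut(chunk));
        for (((row_chunk, xs), ys), labels) in ranges {
            // 3. Each thread writes only into its own slices.
            s.spawn(move || {
                for (i, row) in row_chunk.iter().enumerate() {
                    xs[i] = row.x;
                    ys[i] = row.y;
                    labels[i] = row.label.clone();
                }
            });
        }
    });
    cols
}
```

Because every row maps to a fixed index, the result is identical to a sequential fill. Only the order in which the indices are written changes.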
Enum support: Parallel fill also works with enum DataFrameRow types:
```rust
#[derive(Clone, DataFrameRow)]
#[dataframe(tag = "_kind")]
pub enum Event {
    Click { id: i32, x: f64, y: f64 },
    Impression { id: i32, slot: String },
}

// Use the parallel path (`events: Vec<Event>`):
let df = EventDataFrame::from_rows_par(events);
```
Performance: Parallel fill is most beneficial for:
- Large row counts (10k+)
- Structs with many fields (wide data frames)
- Expensive `Clone`/conversion per field
For small data frames, use from_rows to avoid rayon overhead.
### Columnar Serialization via Serde
When you have types that already implement serde::Serialize, you can convert them
directly to R data frames without deriving DataFrameRow:
```rust
use miniextendr_api::miniextendr;
use miniextendr_api::serde::ColumnarDataFrame;
use serde::Serialize;

#[derive(Serialize)]
struct LogEntry {
    timestamp: f64,
    level: String,
    message: String,
}

#[miniextendr]
fn get_logs() -> miniextendr_api::ffi::SEXP {
    let logs = vec![
        LogEntry { timestamp: 1.0, level: "INFO".into(), message: "started".into() },
        LogEntry { timestamp: 2.0, level: "ERROR".into(), message: "failed".into() },
    ];
    ColumnarDataFrame::from_rows(&logs).expect("serialization failed")
}
```
Requires the `serde` feature. Column types are inferred from serde field types:

| Rust Type | R Column |
|---|---|
| `bool` | logical |
| `i8` / `i16` / `i32` | integer |
| `i64` / `u64` / `f32` / `f64` | numeric |
| `String` / `&str` | character |
| `Option<T>` | Same type, with `NA` for `None` |
This is useful when you already have serde-serializable types and don't want to add `IntoList` + `DataFrameRow` derives. For new types, prefer `#[derive(DataFrameRow)]`, which gives you a typed companion type and better ergonomics.
## Approach 2: `DataFrame<T>`
Generic type for transposing row-oriented data. Works with any T: IntoList.
### With `IntoList` Types
```rust
#[derive(IntoList)]
struct Point {
    x: f64,
    y: f64,
}

#[miniextendr]
fn points() -> DataFrame<Point> {
    DataFrame::from_rows(vec![
        Point { x: 1.0, y: 2.0 },
        Point { x: 3.0, y: 4.0 },
    ])
}
```

### With `Serialize` Types
When the serde feature is enabled, use from_serialize() for the simplest experience:
```rust
use serde::Serialize;
use miniextendr_api::SerializeDataFrame;

#[derive(Serialize)]
struct Event {
    timestamp: f64,
    message: String,
}

#[miniextendr]
fn events() -> SerializeDataFrame<Event> {
    let events = vec![
        Event { timestamp: 1.0, message: "start".into() },
        Event { timestamp: 2.0, message: "end".into() },
    ];
    SerializeDataFrame::from_serialize(events)
}
```
SerializeDataFrame<T> is a type alias for DataFrame<AsSerializeRow<T>>, and from_serialize() handles wrapping each row automatically.
Alternative (explicit wrapping):
If you prefer the explicit form or need more control:
```rust
#[miniextendr]
fn events() -> DataFrame<AsSerializeRow<Event>> {
    DataFrame::from_rows(vec![
        AsSerializeRow(Event { timestamp: 1.0, message: "start".into() }),
        AsSerializeRow(Event { timestamp: 2.0, message: "end".into() }),
    ])
}
```

### Methods
```rust
impl<T: IntoList> DataFrame<T> {
    pub fn new() -> Self;
    pub fn from_rows(rows: Vec<T>) -> Self;
    pub fn push(&mut self, row: T);
    pub fn len(&self) -> usize;
    pub fn is_empty(&self) -> bool;
}

// Also implements FromIterator
let df: DataFrame<Point> = points.into_iter().collect();
```
## Approach 3: Manual Implementation
For full control or complex scenarios, implement IntoDataFrame manually.
### Column-Oriented Data (Homogeneous Types)
For data frames where all columns have the same element type, use List::from_pairs():
```rust
struct TimeSeries {
    timestamps: Vec<f64>,
    values: Vec<f64>,
}

impl IntoDataFrame for TimeSeries {
    fn into_data_frame(self) -> List {
        // Read the row count before the columns are moved into the list.
        let n_rows = self.timestamps.len();
        List::from_pairs(vec![
            ("timestamp", self.timestamps),
            ("value", self.values),
        ])
        .set_class_str(&["data.frame"])
        .set_row_names_int(n_rows)
    }
}

#[miniextendr]
fn time_series() -> TimeSeries {
    TimeSeries {
        timestamps: vec![1.0, 2.0, 3.0],
        values: vec![10.0, 20.0, 30.0],
    }
}
// Automatically converts to data.frame via IntoR
```

### Column-Oriented Data (Heterogeneous Types)
Important: For data frames with different column types, use List::from_raw_pairs() instead of from_pairs():
```rust
use miniextendr_api::IntoR;

struct MixedData {
    names: Vec<String>,
    ages: Vec<i32>,
    heights: Vec<f64>,
}

impl IntoDataFrame for MixedData {
    fn into_data_frame(self) -> List {
        // Capture the row count before `into_sexp()` consumes the vectors.
        let n_rows = self.names.len();
        List::from_raw_pairs(vec![
            ("name", self.names.into_sexp()),
            ("age", self.ages.into_sexp()),
            ("height", self.heights.into_sexp()),
        ])
        .set_class_str(&["data.frame"])
        .set_row_names_int(n_rows)
    }
}
```
Why? from_pairs() is generic over a single type T: IntoR, so all columns must have the same type. from_raw_pairs() accepts pre-converted SEXP values, allowing heterogeneous columns.
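The same constraint can be reproduced without R. In this hypothetical sketch, a small `Column` enum plays the role that pre-converted `SEXP` values play for `from_raw_pairs()`: once each column is wrapped, differently typed columns fit in one `Vec`. A single generic `Vec<(&str, Vec<T>)>` could not hold a `Vec<String>` and a `Vec<i32>` side by side. `Column` and `mixed_pairs` are illustrative names only.

```rust
/// Hypothetical stand-in for a pre-converted R vector.
#[derive(Debug, PartialEq)]
pub enum Column {
    Character(Vec<String>),
    Integer(Vec<i32>),
    Double(Vec<f64>),
}

impl Column {
    /// The type R's `typeof()` would report for each variant.
    pub fn r_type(&self) -> &'static str {
        match self {
            Column::Character(_) => "character",
            Column::Integer(_) => "integer",
            Column::Double(_) => "double",
        }
    }
}

/// Convert each column first, then combine: the erased element type is what
/// allows mixed column types in a single list of pairs.
pub fn mixed_pairs(
    names: Vec<String>,
    ages: Vec<i32>,
    heights: Vec<f64>,
) -> Vec<(&'static str, Column)> {
    vec![
        ("name", Column::Character(names)),
        ("age", Column::Integer(ages)),
        ("height", Column::Double(heights)),
    ]
}
```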
### Call-Site Control with Wrappers
Force conversion for a specific return without changing the type's default:
```rust
#[miniextendr]
fn as_dataframe() -> ToDataFrame<TimeSeries> {
    ToDataFrame(TimeSeries { /* ... */ })
}

// Or use the extension trait
#[miniextendr]
fn with_extension() -> ToDataFrame<TimeSeries> {
    TimeSeries { /* ... */ }.to_data_frame()
}
```

### Type-Level Default with `PreferDataFrame`
Make a type always convert to data.frame when returned:
```rust
#[derive(PreferDataFrame)]
struct MyData {
    // ... fields ...
}

impl IntoDataFrame for MyData {
    fn into_data_frame(self) -> List {
        // ... implementation ...
    }
}

#[miniextendr]
fn get_data() -> MyData { // Automatically becomes data.frame in R
    MyData { /* ... */ }
}
```
## Comparison: Row vs Column Oriented
### Row-Oriented (Vec of structs)
```rust
vec![
    Measurement { time: 1.0, value: 10.0 },
    Measurement { time: 2.0, value: 20.0 },
]
```
Pros:
- Natural Rust data structure
- Easy to work with in Rust code
- Type-safe field access
Cons:
- Needs transposition for R
- Memory layout not optimal for R
### Column-Oriented (Struct of Vecs)
```rust
MeasurementDataFrame {
    time: vec![1.0, 2.0],
    value: vec![10.0, 20.0],
}
```
Pros:
- Direct R data.frame representation
- No transposition needed
- Memory efficient for R
Cons:
- Less ergonomic in Rust
- Easy to create invalid data (mismatched lengths)
## Best Practices
### Choosing an Approach
- Use `#[derive(DataFrameRow)]` when:
  - You have row-oriented data in Rust
  - You want type-safe field access
  - You want automatic conversions
- Use `DataFrame<T>` when:
  - You need generic code over many row types
  - You're working with existing `IntoList` types
  - You want runtime flexibility
- Use a manual `impl IntoDataFrame` when:
  - You already have column-oriented data
  - You need custom data.frame attributes
  - You're handling complex validation
### Handling Missing Data
Use Option<T> for nullable fields:
```rust
#[derive(IntoList, DataFrameRow)]
struct Record {
    id: i32,
    value: Option<f64>, // Becomes NA in R when None
}
```

### Validation
Always validate column lengths when manually constructing data frames:
```rust
impl IntoDataFrame for MyData {
    fn into_data_frame(self) -> List {
        assert_eq!(self.col1.len(), self.col2.len(), "Column length mismatch");
        // Read the row count before the columns are moved into the list.
        let n_rows = self.col1.len();
        List::from_pairs(vec![
            ("col1", self.col1),
            ("col2", self.col2),
        ])
        .set_class_str(&["data.frame"])
        .set_row_names_int(n_rows)
    }
}
```
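If a panic is not the failure mode you want, you can move the length check into a helper that reports which column is wrong. `common_len` below is a hypothetical std-only helper, not part of miniextendr. How you then surface the `Err` to R depends on your function's error handling, which is not shown.

```rust
/// Hypothetical helper: given `(name, length)` pairs, return the shared length
/// (the row count) or an error naming the first column that disagrees.
pub fn common_len(columns: &[(&str, usize)]) -> Result<usize, String> {
    let mut iter = columns.iter();
    // No columns means zero rows.
    let Some(&(first_name, n)) = iter.next() else {
        return Ok(0);
    };
    for &(name, len) in iter {
        if len != n {
            return Err(format!("column `{name}` has {len} rows, but `{first_name}` has {n}"));
        }
    }
    Ok(n)
}
```

Inside `into_data_frame`, you could call it as `common_len(&[("col1", self.col1.len()), ("col2", self.col2.len())])` before moving the columns. The returned count can then feed `set_row_names_int`.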
## Implementation Notes
### Row Names
R data frames require row names. miniextendr provides two helpers:
```rust
list.set_row_names_int(n)     // Compact: c(NA, -n) form
list.set_row_names(names_vec) // Explicit: character vector
```

### Class Attribute
Data frames need the "data.frame" class:
```rust
list.set_class_str(&["data.frame"])
```
For subclasses (e.g., tibbles):
```rust
list.set_class_str(&["tbl_df", "tbl", "data.frame"])
```

### Empty Data Frames
```rust
List::from_raw_pairs(Vec::<(&str, SEXP)>::new())
    .set_class_str(&["data.frame"])
    .set_row_names_int(0)
```
## Feature Flags
- Base functionality: no features required
- Serde integration: requires the `serde` feature
  - Enables `impl IntoList for T: Serialize`
  - Enables the `AsSerializeRow<T>` wrapper
  - Allows `#[derive(Serialize, DataFrameRow)]`
- Parallel fill: requires the `rayon` feature (enables `from_rows_par`)
## Examples
See rpkg/src/rust/dataframe_examples.rs for complete working examples.