pub unsafe fn try_recover_r_sexp(
    data_ptr: *const u8,
    expected_type: SEXPTYPE,
    expected_len: usize,
) -> Option<SEXP>
Try to recover the source R SEXP from a data pointer.
Given a pointer that may point into an R vector’s data area, this subtracts the known SEXPREC header size to get a candidate SEXP, then verifies it:
- The SEXP type tag (bits 0-4 of sxpinfo) matches expected_type
- ALTREP(candidate) is false (only non-ALTREP vectors have fixed-offset data)
- XLENGTH(candidate) matches expected_len (safe for non-ALTREP)
Returns None if:
- The offset hasn’t been initialized yet
- The pointer doesn’t come from an R vector
- The candidate SEXP has the wrong type or length
- The candidate is an ALTREP vector (data not at fixed offset from SEXP)
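The checks and the None cases above can be sketched against a mock layout rather than R's real SEXPREC. Everything here is hypothetical and for illustration only: `MockVector`, `MOCK_HEADER_BYTES`, `MOCK_ALT_BIT`, `mock_data_ptr`, and `mock_probe` are invented names, and the 16-byte header and the flag position are assumptions, not values taken from this crate. Unlike the real probe, the data pointer is derived from the whole mock allocation, so the header read stays inside one allocation.

```rust
/// Bits 0-4 of sxpinfo hold the type tag (as described above).
const TYPE_MASK: u32 = 0b1_1111;
/// Assumed position of the ALTREP flag in the mock header (illustrative).
const MOCK_ALT_BIT: u32 = 1 << 7;
/// Assumed header size of the mock layout; the real offset is owned by R.
const MOCK_HEADER_BYTES: usize = 16;

/// Hypothetical stand-in for "header followed by vector data".
#[repr(C)]
struct MockVector {
    sxpinfo: u32,
    _pad: u32,
    length: u64,
    data: [f64; 4],
}

/// Pointer to the data area, derived from the whole allocation.
fn mock_data_ptr(v: &MockVector) -> *const u8 {
    (v as *const MockVector).cast::<u8>().wrapping_add(MOCK_HEADER_BYTES)
}

/// Subtract the header size, then verify type tag, ALTREP flag, and length.
///
/// # Safety
/// `data_ptr` must come from `mock_data_ptr`, so that the candidate header
/// is readable and 8-byte aligned.
unsafe fn mock_probe(
    data_ptr: *const u8,
    expected_type: u32,
    expected_len: usize,
) -> Option<*const MockVector> {
    let candidate = data_ptr.wrapping_byte_sub(MOCK_HEADER_BYTES);
    // Read the first 4 bytes: the sxpinfo bits.
    let sxpinfo = unsafe { candidate.cast::<u32>().read() };
    if sxpinfo & TYPE_MASK != expected_type {
        return None; // wrong type
    }
    if sxpinfo & MOCK_ALT_BIT != 0 {
        return None; // ALTREP: data is not at a fixed offset
    }
    let length = unsafe { candidate.add(8).cast::<u64>().read() };
    if length != expected_len as u64 {
        return None; // wrong length
    }
    Some(candidate.cast::<MockVector>())
}
```

The checks run in the same order as the list: the length is only read after the ALTREP flag has been ruled out.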
§Why this is outside Rust’s memory model (see #63)
This is a conservative-GC-style probe, analogous to Boehm GC scanning
the heap without allocation provenance. We compute a speculative pointer
via wrapping_byte_sub (well-defined pointer arithmetic) and read the
first 4 bytes (sxpinfo bits) to check whether the address looks like the
start of a SEXPREC. For pointers that did not come from an R SEXP, that
read has no valid allocation provenance under Rust’s Stacked Borrows or
Tree Borrows models. It is defined behavior at the hardware level (the
heap is contiguous mapped memory), but Miri correctly flags it as UB.
We guard the read with a 4096-byte address floor (below which the candidate would cross into unmapped memory), the ALTREP bit check (prevents calling dispatch fns on garbage), and the length check (filters random garbage with high probability). Callers that cannot tolerate a false positive must not rely on this path alone.
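As a rough sketch of the address-floor guard alone: the helper below rejects a pointer before any read happens. `ADDRESS_FLOOR` mirrors the 4096-byte figure in the text; `guarded_candidate` and its `header_bytes` parameter are hypothetical, and applying the floor to the candidate address (rather than to the data pointer) is an assumption about how the guard is wired up.

```rust
/// Floor from the text: candidates below this address are never read.
const ADDRESS_FLOOR: usize = 4096;

/// Hypothetical helper: compute the candidate header address, or None if it
/// would underflow or fall below the floor. Nothing is dereferenced here.
fn guarded_candidate(data_ptr: *const u8, header_bytes: usize) -> Option<*const u8> {
    // checked_sub catches pointers smaller than the header itself.
    let candidate_addr = (data_ptr as usize).checked_sub(header_bytes)?;
    if candidate_addr < ADDRESS_FLOOR {
        return None;
    }
    // Well-defined pointer arithmetic; the result is only a candidate.
    Some(data_ptr.wrapping_byte_sub(header_bytes))
}
```

The guard only filters obviously bad addresses; the type, ALTREP, and length checks still have to run on whatever passes it.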
To keep Miri green, the whole recovery is a no-op under #[cfg(miri)]:
we always return None, and callers fall back to the copy path. This
is not a correctness change — the copy path is always a valid alternative.
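One way to picture the Miri no-op and the fallback, as a sketch under stated assumptions: `Transfer`, `probe_unless_miri`, and `recover_or_copy` are hypothetical names, and the probe is passed in as a closure so the example does not depend on R. Under `#[cfg(miri)]` the probe is never called.

```rust
/// Hypothetical result of handing data back: reuse the source, or copy.
#[derive(Debug, PartialEq)]
enum Transfer<H> {
    Reused(H),
    Copied(Vec<f64>),
}

/// Under Miri the recovery is a no-op: the probe is never invoked.
#[cfg(miri)]
fn probe_unless_miri<H>(
    _data_ptr: *const u8,
    _probe: impl Fn(*const u8) -> Option<H>,
) -> Option<H> {
    None
}

/// Outside Miri, run the (caller-supplied, illustrative) probe.
#[cfg(not(miri))]
fn probe_unless_miri<H>(
    data_ptr: *const u8,
    probe: impl Fn(*const u8) -> Option<H>,
) -> Option<H> {
    probe(data_ptr)
}

/// Try recovery first; fall back to the copy path whenever it yields None.
fn recover_or_copy<H>(
    data: &[f64],
    probe: impl Fn(*const u8) -> Option<H>,
) -> Transfer<H> {
    match probe_unless_miri(data.as_ptr().cast::<u8>(), probe) {
        Some(handle) => Transfer::Reused(handle),
        None => Transfer::Copied(data.to_vec()),
    }
}
```

Because the copy branch handles every None, disabling the probe changes which branch runs but not the data the caller receives.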
§Safety
Must be called on R’s main thread. The data pointer must be valid
(i.e., it must point to readable memory for at least expected_len
elements, which is guaranteed if it came from an Arrow buffer).