try_recover_r_sexp

Function try_recover_r_sexp 

pub unsafe fn try_recover_r_sexp(
    data_ptr: *const u8,
    expected_type: SEXPTYPE,
    expected_len: usize,
) -> Option<SEXP>

Try to recover the source R SEXP from a data pointer.

Given a pointer that may point into an R vector’s data area, this subtracts the known SEXPREC header size to get a candidate SEXP, then verifies it:

  1. The SEXP type tag (bits 0-4 of sxpinfo) matches expected_type
  2. ALTREP(candidate) is false (only non-ALTREP vectors have fixed-offset data)
  3. XLENGTH(candidate) matches expected_len (safe for non-ALTREP)
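The three checks can be sketched on a mock header. This is only an illustration of the filtering logic, not the crate's implementation: `MockHeader`, `TYPE_MASK`, and `looks_like_source` are hypothetical names, and the struct does not mirror R's real SEXPREC layout. The type codes 13 (INTSXP) and 14 (REALSXP) are R's actual values.

```rust
/// Hypothetical stand-in for the fields the probe inspects.
/// This is NOT the real SEXPREC layout; it only models the three checks.
#[derive(Clone, Copy)]
struct MockHeader {
    /// Packed info word; the type tag lives in bits 0-4.
    sxpinfo: u32,
    /// Stand-in for the result of ALTREP(candidate).
    is_altrep: bool,
    /// Stand-in for the result of XLENGTH(candidate).
    length: usize,
}

/// Mask selecting bits 0-4 of the info word (the type tag).
const TYPE_MASK: u32 = 0b1_1111;

/// R's type codes for integer and double vectors.
const INTSXP: u32 = 13;
const REALSXP: u32 = 14;

/// Applies the checks in the documented order: type tag, ALTREP, length.
/// The length check runs last because XLENGTH is only safe for non-ALTREP vectors.
fn looks_like_source(h: &MockHeader, expected_type: u32, expected_len: usize) -> bool {
    if h.sxpinfo & TYPE_MASK != expected_type {
        return false;
    }
    if h.is_altrep {
        return false;
    }
    h.length == expected_len
}
```

Only a candidate that passes all three checks would be returned as `Some`; any failure maps to `None`.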

Returns None if:

  • The data offset (the SEXPREC header size) hasn’t been initialized yet
  • The pointer doesn’t come from an R vector
  • The candidate SEXP has the wrong type or length
  • The candidate is an ALTREP vector (data not at fixed offset from SEXP)

§Why this is outside Rust’s memory model (see #63)

This is a conservative-GC-style probe, analogous to Boehm GC scanning the heap without allocation provenance. We compute a speculative pointer via wrapping_byte_sub (well-defined pointer arithmetic) and read the first 4 bytes (the sxpinfo bits) to check whether the address looks like the start of a SEXPREC. For pointers that did not come from an R SEXP, that read has no valid allocation provenance under Rust’s Stacked Borrows or Tree Borrows models. The read is well behaved at the hardware level (the heap is contiguous mapped memory), but Miri correctly flags it as UB.

We guard the read with a 4096-byte address floor (below which the candidate would cross into unmapped memory), the ALTREP bit check (prevents calling dispatch fns on garbage), and the length check (filters random garbage with high probability). Callers that cannot tolerate a false positive must not rely on this path alone.
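The pointer arithmetic and the address floor can be sketched as follows. This is a minimal illustration under stated assumptions: `candidate_header` and `ADDRESS_FLOOR` are hypothetical names, and the header size is passed in as a parameter because the real offset is determined at runtime. The sketch only computes and filters the candidate address; it never dereferences it.

```rust
/// Candidates below this address are rejected (the 4096-byte floor described above).
const ADDRESS_FLOOR: usize = 4096;

/// Computes the speculative header address for `data_ptr`, or `None` if the
/// subtraction would wrap around or land below the address floor.
/// `header_size` is a caller-supplied assumption, not R's real header size.
fn candidate_header(data_ptr: *const u8, header_size: usize) -> Option<*const u8> {
    // A data address smaller than the header size would wrap to a huge address.
    if (data_ptr as usize) < header_size {
        return None;
    }
    // wrapping_byte_sub is well-defined arithmetic; no memory is accessed here.
    let candidate = data_ptr.wrapping_byte_sub(header_size);
    if (candidate as usize) < ADDRESS_FLOOR {
        return None;
    }
    Some(candidate)
}
```

Passing the floor says nothing about whether the candidate really is a SEXPREC; it only rules out addresses that cannot be mapped, so the type, ALTREP, and length checks still have to follow.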

To keep Miri green, the whole recovery is a no-op under #[cfg(miri)]: we always return None, and callers fall back to the copy path. This does not change correctness, because the copy path is always a valid alternative.
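The fallback pattern might look like the sketch below. Everything here is hypothetical: `Imported`, `recover_or_none`, and `import` are illustrative names, the `usize` handle stands in for a recovered SEXP, and the probe is passed as a closure so the example does not depend on R. It uses the `cfg!(miri)` macro rather than the `#[cfg(miri)]` attribute, which has the same effect of skipping the probe under Miri.

```rust
/// Outcome of importing a buffer: reuse the source vector or copy the data.
#[derive(Debug, PartialEq)]
enum Imported {
    /// Hypothetical handle standing in for a recovered SEXP.
    ZeroCopy(usize),
    /// The always-valid copy path.
    Copied(Vec<f64>),
}

/// Runs the probe, except under Miri, where recovery is a no-op.
fn recover_or_none(probe: impl Fn() -> Option<usize>) -> Option<usize> {
    if cfg!(miri) {
        return None;
    }
    probe()
}

/// Prefers the recovered handle and falls back to copying when recovery fails.
fn import(data: &[f64], probe: impl Fn() -> Option<usize>) -> Imported {
    match recover_or_none(probe) {
        Some(handle) => Imported::ZeroCopy(handle),
        None => Imported::Copied(data.to_vec()),
    }
}
```

Because the caller treats `None` as "copy instead", disabling the probe only affects performance, never the result.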

§Safety

Must be called on R’s main thread. The data pointer must be valid (i.e., it must point to readable memory for at least expected_len elements, which is guaranteed if it came from an Arrow buffer).