dvs
Independent data version control for large or sensitive datasets, with or without Git.
dvs versions large or sensitive datasets, common in pharma and
other data-intensive work, without committing their contents to your source
tree. It works alongside Git or on its own. Add, restore, and check the
status of versioned data from R or the command line.
What dvs is
dvs (data version system) keeps file contents in a content-addressed blob store (typically a shared drive) instead of your source tree. Each tracked file gets a small text meta file that lives next to your code.
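The storage model described here (a content-addressed blob store plus a small meta file beside each tracked file) can be illustrated with a minimal Python sketch. This is not dvs's implementation or on-disk format: the SHA-256 hash, the two-level blob directory layout, the `.meta.json` suffix, and the function names `add_file`, `get_file`, and `is_current` are all assumptions chosen for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path

META_SUFFIX = ".meta.json"  # hypothetical suffix, not dvs's real meta file name


def file_digest(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are never loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def blob_path(store: Path, digest: str) -> Path:
    """Map a digest to its location in the store (assumed layout: ab/cdef...)."""
    return store / digest[:2] / digest[2:]


def add_file(path: Path, store: Path) -> Path:
    """Copy the file into the store under its hash and write a meta file beside it."""
    digest = file_digest(path)
    blob = blob_path(store, digest)
    blob.parent.mkdir(parents=True, exist_ok=True)
    if not blob.exists():  # identical content is stored only once
        shutil.copyfile(path, blob)
    meta = path.with_name(path.name + META_SUFFIX)
    meta.write_text(json.dumps({"hash": digest, "size": path.stat().st_size}, indent=2))
    return meta


def get_file(meta: Path, store: Path) -> Path:
    """Restore the data file named by a meta file from the store."""
    digest = json.loads(meta.read_text())["hash"]
    target = meta.with_name(meta.name[: -len(META_SUFFIX)])
    shutil.copyfile(blob_path(store, digest), target)
    return target


def is_current(meta: Path) -> bool:
    """True if the data file exists locally and matches the hash in its meta file."""
    target = meta.with_name(meta.name[: -len(META_SUFFIX)])
    recorded = json.loads(meta.read_text())["hash"]
    return target.exists() and file_digest(target) == recorded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "store"
        data = Path(tmp) / "data.csv"
        data.write_text("id,value\n1,10\n")
        meta = add_file(data, store)
        data.unlink()            # the large file can be absent locally
        get_file(meta, store)    # and restored from the meta file alone
        print(is_current(meta))
```

In a layout like this, only the small meta file needs to travel with the code; the data itself stays in the store and is addressed by its hash, so the same content is never stored twice.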
dvs is an independent version control system. Alongside Git, it keeps
multi-gigabyte data out of your history while the meta files travel with
your commits. It also works on its own, without Git. Either way, the four
verbs are the same: init, add, status, get.
You can drive it from the R package (library(dvs)) or the CLI (the
dvs binary). This guide covers both, side by side.
Where to start
Getting Started has the install steps and a short walkthrough of the core workflow on a small dataset, for both the CLI and R. From there the R Package and CLI sections document every function and command (the R Package section also covers the R-only helper utilities), and Internals goes deeper on storage, configuration, the audit log, and the error surface.
Sections
Getting Started
Run the core dvs workflow on a small dataset, from R or the CLI.
R Package
Version data from R with library(dvs): init, add, status, get.
CLI
Version data from the terminal with the dvs binary.
Internals
Storage layout, the dvs.toml file, the audit log, and the error surface.