dvs

Independent data version control for large or sensitive datasets, with or without Git.

dvs versions large or sensitive datasets, common in pharma and other data-intensive work, without committing their contents to your source tree. It works alongside Git or on its own. Add, restore, and check the status of versioned data from R or the command line.

What dvs is

dvs (data version system) stores file contents in a content-addressed blob store (typically a shared drive) instead of your source tree. Each tracked file gets a small text meta file that lives next to your code.
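To make the blob store and meta file idea concrete, here is a minimal conceptual sketch in Python. It is not dvs's implementation or API: the SHA-256 hash, the two-level directory layout, the `.meta.json` suffix, and the function names `add_to_store` and `restore_from_store` are all assumptions chosen for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def add_to_store(data_file: Path, store_dir: Path) -> Path:
    """Copy data_file into a content-addressed store and write a meta file beside it."""
    # Assumption: SHA-256 as the content hash; dvs may use a different algorithm.
    hasher = hashlib.sha256()
    with data_file.open("rb") as fh:
        # Read in 1 MiB chunks so large files are never held in memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    digest = hasher.hexdigest()

    # The blob path is derived only from the contents, so identical files
    # map to the same blob and are stored once.
    blob_path = store_dir / digest[:2] / digest[2:]
    blob_path.parent.mkdir(parents=True, exist_ok=True)
    if not blob_path.exists():
        shutil.copyfile(data_file, blob_path)

    # Hypothetical meta file layout: the suffix and field names are illustrative.
    meta_path = data_file.with_name(data_file.name + ".meta.json")
    meta = {"hash": digest, "size": data_file.stat().st_size}
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta_path


def restore_from_store(meta_path: Path, store_dir: Path, target: Path) -> None:
    """Recreate a data file from the blob its meta file points to."""
    digest = json.loads(meta_path.read_text())["hash"]
    shutil.copyfile(store_dir / digest[:2] / digest[2:], target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store = root / "store"
        data = root / "pk_data.csv"
        data.write_text("id,conc\n1,0.5\n")

        meta = add_to_store(data, store)
        data.unlink()  # simulate a checkout that carries only the meta file
        restore_from_store(meta, store, data)
        print(data.read_text())
```

In this sketch only the small meta file would be committed alongside the code; the data itself stays in the store and is copied back on demand.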

dvs is an independent version control system. Alongside Git, it keeps multi-gigabyte data out of your history while the meta files travel with your commits. It also works on its own, without Git. Either way, the four verbs are the same: init, add, status, get.

You can drive it from the R package (library(dvs)) or the CLI (the dvs binary). This guide covers both, side by side.

Where to start

Getting Started has the install steps and a short walkthrough of the core workflow on a small dataset, for both the CLI and R. From there, the R Package and CLI sections document every function and command; the R Package section also covers the R-only helper utilities. Internals goes deeper on storage, configuration, the audit log, and the error surface.

Sections