Transparent Query Rewriting

SQL queries are automatically rewritten to track provenance circuits – across inner and outer joins, subqueries anywhere (EXISTS, IN, quantified and scalar comparisons), aggregation, set operations, CTEs, and recursive queries (WITH RECURSIVE). No changes to your schema or application code required.

Learn more

Semiring Provenance

A unified semiring API for Boolean, counting, why-, how-, and which-provenance, symbolic formulas, tropical, Viterbi, Łukasiewicz, and min-max / max-min evaluations – all through one compiled-evaluation path.

Semiring docs

Aggregate Provenance

Aggregation results carry provenance too: agg_token values record symbolically how a SUM, COUNT, MIN/MAX, or AVG depends on base tuples, support further arithmetic, evaluate in any m-semiring, and give exact probabilities to HAVING predicates – with SQL-faithful empty-group and NULL semantics.

Aggregation docs

Probabilistic Databases

Attach probabilities to input tuples and get the probability of every query answer. Ask for a guarantee – exact, or additive / relative (ε, δ) – and a cost-based chooser runs the cheapest method that meets it: independent evaluation, tree decomposition, d-DNNF knowledge compilation, certified-bounds d-trees, sieve, Monte Carlo, Karp-Luby, or weighted model counting. Tractable classes (hierarchical, FD-aware, inversion-free queries) are recognised at the planner and evaluated in linear time.

Probability docs

Continuous Distributions

First-class random-variable columns. Build queries with Normal, Uniform, Exponential, Erlang, Mixture, and Categorical distributions; evaluate expectations and moments analytically or by Monte Carlo; condition on filter predicates inline.

Continuous-distribution docs

Shapley & Banzhaf Values

Quantify each input tuple’s contribution to a query answer through Shapley and Banzhaf values, computed in a single circuit traversal.

Shapley docs

Where-Provenance

Column-level provenance: track which source cells – not just which rows – each output value was copied or derived from, through projections and equijoins. Studio’s Where mode shows it live: hover an output cell to highlight its origins.

Where-provenance docs

Temporal & Update Provenance

Provenance of data modifications: INSERT / UPDATE / DELETE are tracked as update gates, enabling audit and undo. Combined with the interval-union semiring, validity timestamps turn a provenance-tracked database into a temporal one – time-travel queries included.

Temporal docs

Lean Formalization

The algebraic core – m-semirings, annotated databases, relational-algebra and aggregation semantics, query rewriting, circuits and probability – is formally verified in Lean 4, with machine-checked theorems backing ProvSQL’s architecture.

Lean formalization

ProvSQL Studio

A web UI for provenance inspection, in three modes: render the circuit DAG behind any result token and evaluate any compiled semiring on a pinned subnode, hover output cells to highlight the source rows that produced them, or work in notebooks. Available on PyPI as provsql-studio.

Studio docs

Notebooks

Jupyter-style notebooks over a ProvSQL database: SQL, Markdown, circuit, and evaluation cells, per-cell provenance schemes, saved and loaded as standard .ipynb files. The tutorial and the case studies ship as runnable example notebooks.

Notebook docs

ProvSQL Playground

The whole system in your browser: PostgreSQL with ProvSQL compiled to WebAssembly, Studio running on top – no install, no server, nothing leaves the page. Pre-loaded databases and runnable notebooks for the tutorial and the case studies.

Open the Playground

SQL API

Full SQL-level API for managing provenance tokens and circuit gates.

SQL API docs

C/C++ API

Internal C/C++ API for extending ProvSQL with new semirings and gate types.

C/C++ API docs

Publications

Research papers describing the theory and implementation of ProvSQL.

See publications