Transparent Query Rewriting
SQL queries are automatically rewritten to track provenance circuits – across inner and outer joins, subqueries anywhere (EXISTS, IN, quantified and scalar comparisons), aggregation, set operations, CTEs, and recursive queries (WITH RECURSIVE). No changes to your schema or application code required.
Semiring Provenance
A unified semiring API for Boolean, counting, why-, how-, and which-provenance, symbolic formulas, tropical, Viterbi, Łukasiewicz, and min-max / max-min evaluations – all through one compiled-evaluation path.
Aggregate Provenance
Aggregation results carry provenance too: agg_token values record symbolically how a SUM, COUNT, MIN/MAX, or AVG depends on base tuples, support further arithmetic, evaluate in any m-semiring, and give exact probabilities to HAVING predicates – with SQL-faithful empty-group and NULL semantics.
Probabilistic Databases
Attach probabilities to input tuples and get the probability of every query answer. Ask for a guarantee – exact, or additive / relative (ε, δ) – and a cost-based chooser runs the cheapest method that meets it: independent evaluation, tree decomposition, d-DNNF knowledge compilation, certified-bounds d-trees, sieve, Monte Carlo, Karp-Luby, or weighted model counting. Tractable classes (hierarchical, FD-aware, inversion-free queries) are recognised at the planner and evaluated in linear time.
Continuous Distributions
First-class random-variable columns. Build queries with Normal, Uniform, Exponential, Erlang, Mixture, and Categorical distributions; evaluate expectations and moments analytically or by Monte Carlo; condition on filter predicates inline.
Shapley & Banzhaf Values
Quantify each input tuple’s contribution to a query answer through Shapley and Banzhaf values, computed in a single circuit traversal.
Where-Provenance
Column-level provenance: track which source cells – not just which rows – each output value was copied or derived from, through projections and equijoins. Studio’s Where mode shows it live: hover an output cell to highlight its origins.
Temporal & Update Provenance
Provenance of data modifications: INSERT / UPDATE / DELETE are tracked as update gates, enabling audit and undo. Combined with the interval-union semiring, validity timestamps turn a provenance-tracked database into a temporal one – time-travel queries included.
Lean Formalization
The algebraic core – m-semirings, annotated databases, relational-algebra and aggregation semantics, query rewriting, circuits and probability – is formally verified in Lean 4, with machine-checked theorems backing ProvSQL’s architecture.
ProvSQL Studio
A web UI for provenance inspection, in three modes: render the circuit DAG behind any result token and evaluate any compiled semiring on a pinned subnode, hover output cells to highlight the source rows that produced them, or work in notebooks. Available on PyPI as provsql-studio.
Notebooks
Jupyter-style notebooks over a ProvSQL database: SQL, Markdown, circuit, and evaluation cells, per-cell provenance schemes, saved and loaded as standard .ipynb files. The tutorial and the case studies ship as runnable example notebooks.
ProvSQL Playground
The whole system in your browser: PostgreSQL with ProvSQL compiled to WebAssembly, Studio running on top – no install, no server, nothing leaves the page. Pre-loaded databases and runnable notebooks for the tutorial and the case studies.
C/C++ API
Internal C/C++ API for extending ProvSQL with new semirings and gate types.