Julien Corman, Juan L. Reutter and Ognjen Savkovic.
Abstract: With the popularity of RDF as an independent data model came the need for specifying constraints on RDF graphs, and for mechanisms to detect violations of such constraints. One of the most promising schema languages for RDF is SHACL, a recent W3C recommendation. Unfortunately, the specification of SHACL leaves open the problem of validation against recursive constraints. This omission is important because SHACL by design favors constraints that reference other ones, which in practice may easily yield reference cycles.
In this paper, we propose a concise formal semantics for the so-called "core constraint components" of SHACL.
This semantics handles arbitrary recursion, while being compliant with the current standard. Graph validation is based on the existence of an assignment of SHACL "shapes" to nodes in the graph under validation that states which shapes are verified or violated and that satisfies the targets of the validation process. We show in particular that the nature of SHACL forces us to consider cases in which these assignments are partial, or, in other words, where the truth value of a constraint at some nodes of a graph may be left unknown.
Dealing with recursion comes at a price: validating an RDF graph against SHACL constraints is NP-hard in the size of the graph, and this lower bound still holds for a fragment of SHACL using stratified negation. Therefore, we also propose a tractable approximation to the validation problem.
Keywords: SHACL; RDF Graph Validation; Graph constraints; RDF; Recursive constraints