Daniel Hernandez, Claudio Gutierrez and Aidan Hogan.
Abstract: Blank nodes in RDF graphs can be used to represent values that are known to exist but whose identity remains unknown. A prominent example of such usage can be found in the Wikidata dataset where, e.g., the author of Beowulf is given as a blank node. However, while SPARQL treats blank nodes in a query as existentials, it treats blank nodes in data more like constants. Running SPARQL queries over datasets with unknown values may thus lead to uncertain results, which may make the standard SPARQL semantics unsuitable for datasets with existential blank nodes. We thus explore the feasibility of an alternative SPARQL semantics based on certain answers. In order to estimate the performance costs that such a change in semantics would incur for current implementations, we adapt approximation techniques proposed in a relational database setting to a core fragment of SPARQL and evaluate them. To further understand the impact that such a change in semantics may have on query solutions, we analyse how it would affect the results of user queries over Wikidata.
Keywords: SPARQL; blank nodes; semantics; query