The conference hosts an annual challenge that aims to promote innovative approaches to the creation and use of the Semantic Web. This year’s challenge focuses on knowledge graphs. Both publicly and privately owned, knowledge graphs are currently among the most prominent implementations of Semantic Web technologies.
This year’s challenge is centered on fact extraction from Internet sources to create new relationships within a knowledge graph. The graph in question is the Thomson Reuters permid.org open dataset of organizations, people and financial entities. The relationships to find are supply chain relationships, indicating a supplier/customer relationship between two organizations.
The evaluation of challenge participants will be carried out on the Knowledge Graph owned by Thomson Reuters (TR). The graph has a public and a private part; the public part can be used for building and training the candidate systems, the private part will be used for evaluation.
The core dataset for the challenge will be the open data exposed at permid.org. This dataset consists of an authoritative graph of entities of interest to and mastered by TR. A ground truth of supply chain relationships owned by Thomson Reuters will be used for scoring the submissions.
Each organization in the permid.org dataset has a unique identifier, its “permanent identifier” (PermID). For example, the Hankook tire company has ID 4295881024.
The task is to identify supplier/customer relationships defined as pairs of these Perm IDs.
For example, Hankook tire is a supplier of VW, as described in Hankook’s press releases.
This relationship would be expressed in RDF as follows:
<http://data.thomsonreuters.com/sc/supplychain_agreement/4295881024_4295869244> <http://ontology.thomsonreuters.com/supplyChain#customer> <https://permid.org/1-4295869244> .
<http://data.thomsonreuters.com/sc/supplychain_agreement/4295881024_4295869244> <http://ontology.thomsonreuters.com/supplyChain#supplier> <https://permid.org/1-4295881024> .
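The pair of triples above can be generated mechanically from two PermIDs. The sketch below, using only the standard library, follows the agreement-URI pattern “supplierID_customerID” seen in the example; the function name is our own.

```python
# Sketch: emit the two scoring triples for one supplier/customer pair of
# PermIDs, in N-Triples syntax. The agreement-URI naming convention
# "<supplierID>_<customerID>" is taken from the example above.
SC = "http://ontology.thomsonreuters.com/supplyChain#"
AGREEMENT = "http://data.thomsonreuters.com/sc/supplychain_agreement/"
PERMID = "https://permid.org/1-"

def relationship_triples(supplier_id: str, customer_id: str) -> list[str]:
    """Return the customer and supplier triples for one relationship."""
    agreement = f"<{AGREEMENT}{supplier_id}_{customer_id}>"
    return [
        f"{agreement} <{SC}customer> <{PERMID}{customer_id}> .",
        f"{agreement} <{SC}supplier> <{PERMID}{supplier_id}> .",
    ]

# Reproduces the Hankook/VW example above.
for triple in relationship_triples("4295881024", "4295869244"):
    print(triple)
```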
The challenge is to use web-based structured and unstructured data sources to create these statements. Sources might include press releases, filings, and data sources such as Wikipedia, Wikidata, DBpedia, or Common Crawl (the organization URL predicate from permid.org may be useful in all of these cases).
For each submitted relationship, challengers should create a single pair of triples, one each for the supplier and customer predicates. Each relationship should be supported by one or more snippets identifying the source(s) of the relationship assertion.
On submission to Gerbil we will compute recall (the fraction of ground-truth relationships that were predicted) and precision (the fraction of predictions that were correct); solutions will be scored by their F-measure.
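The scoring arithmetic can be illustrated over sets of (supplier PermID, customer PermID) pairs. This is only a sketch of the measures named above; the exact Gerbil configuration may differ.

```python
# Sketch: precision, recall and F-measure over sets of
# (supplier PermID, customer PermID) pairs.
def f_measure(predicted: set[tuple[str, str]],
              gold: set[tuple[str, str]]) -> tuple[float, float, float]:
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative data only: one correct prediction, one miss, one false positive.
gold = {("4295881024", "4295869244"), ("4295894743", "4295869244")}
pred = {("4295881024", "4295869244"), ("4295904406", "4295869244")}
print(f_measure(pred, gold))  # → (0.5, 0.5, 0.5)
```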
To aid challengers, a set of candidate customers will be provided as a list of URIs.
The following predicates will be used in scoring the challenge:

http://ontology.thomsonreuters.com/supplyChain#supplier (PermID URI of the supplier)
http://ontology.thomsonreuters.com/supplyChain#customer (PermID URI of the customer)

Additional predicates should be used to clarify the source(s) of the relationship:

http://ontology.thomsonreuters.com/supplyChain#aggregatedConfidenceScore (float between 0 and 1 expressing aggregate confidence in the relationship)
http://www.w3.org/ns/prov#wasQuotedFrom (URI – where the proof was derived from)
http://www.w3.org/ns/prov#value (string – snippet from the full text of the proof point, if applicable)
http://ontology.thomsonreuters.com/supplyChain#field (string – field(s) used, if from structured data)
http://www.w3.org/ns/prov#wasDerivedFrom (URI of the aggregate relationship)
http://ontology.thomsonreuters.com/supplyChain#confidenceScore (float between 0 and 1 expressing confidence in the proof; this might reflect confidence of the entity match, freshness of the information, and trustworthiness of the source)

While the snippet and source predicates won’t be used directly for scoring, they will be used for spot-checking submissions.
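One way to produce the aggregated confidence score is to combine the per-snippet confidences. In the sample relationships below, the aggregate happens to equal the arithmetic mean of the snippet scores ((0.834723 + 0.763749) / 2 = 0.799236); the choice of aggregation function is, as far as we can tell, left to challengers, and the mean is used here only as an illustration.

```python
# Sketch: aggregate per-snippet confidence scores into a single
# aggregatedConfidenceScore. Arithmetic mean is an assumption, chosen
# because it reproduces the sample data; other schemes (max, noisy-OR)
# would be equally valid submissions.
def aggregate_confidence(snippet_scores: list[float]) -> float:
    """Arithmetic mean of the individual proof-point confidences."""
    return sum(snippet_scores) / len(snippet_scores)

# The two GKN proof points from the sample below.
print(round(aggregate_confidence([0.834723, 0.763749]), 6))  # → 0.799236
```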
Candidate companies are provided as a set of triples with org name for convenience:
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .

<https://permid.org/1-4295869244> vcard:organization-name "Volkswagen AG" .
<https://permid.org/1-4295894743> vcard:organization-name "GKN PLC" .
<https://permid.org/1-4295904406> vcard:organization-name "Lear Corp" .
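A lightweight way to load the candidate file is to extract (PermID, name) pairs directly. A full Turtle parser (e.g. rdflib) would be more robust; the regex sketch below, using only the standard library, handles the simple one-triple-per-statement shape shown above.

```python
import re

# Sketch: pull (PermID, organization name) pairs out of the candidate
# companies file. Assumes the simple vcard:organization-name triples
# shown above; not a general Turtle parser.
CANDIDATE_RE = re.compile(
    r'<https://permid\.org/1-(\d+)>\s+vcard:organization-name\s+"([^"]+)"')

def parse_candidates(turtle_text: str) -> dict[str, str]:
    """Map each candidate's PermID to its organization name."""
    return dict(CANDIDATE_RE.findall(turtle_text))

sample = '<https://permid.org/1-4295869244> vcard:organization-name "Volkswagen AG" .'
print(parse_candidates(sample))  # → {'4295869244': 'Volkswagen AG'}
```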
Here are two sample relationships for suppliers of VW. The first has one proof point, the second two:
@prefix sc: <http://ontology.thomsonreuters.com/supplyChain#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

<http://data.thomsonreuters.com/sc/supplychain_agreement/4295904406_4295869244>
    sc:aggregatedConfidenceScore 2.996760e-1 ;
    sc:supplier <https://permid.org/1-4295904406> ;
    sc:customer <https://permid.org/1-4295869244> .

<http://data.thomsonreuters.com/sc/snippet/123>
    prov:wasQuotedFrom <http://www.lear.com/blog/2015/03/changchun-lear-fawsn-seating-wins-faw-vw-award/> ;
    prov:wasDerivedFrom <http://data.thomsonreuters.com/sc/supplychain_agreement/4295904406_4295869244> ;
    prov:value "As Changchun Lear FAWSN Seating’s primary customer, FAW-VW..." ;
    sc:confidenceScore 2.996760e-1 .

<http://data.thomsonreuters.com/sc/supplychain_agreement/4295894743_4295869244>
    sc:aggregatedConfidenceScore 7.992360e-1 ;
    sc:supplier <https://permid.org/1-4295894743> ;
    sc:customer <https://permid.org/1-4295869244> .

<http://data.thomsonreuters.com/sc/snippet/456>
    prov:wasQuotedFrom <https://www.gkn.com/en/newsroom/events/auto-shanghai-2017/features-and-insights/strong-partnership/> ;
    prov:wasDerivedFrom <http://data.thomsonreuters.com/sc/supplychain_agreement/4295894743_4295869244> ;
    prov:value "Volkswagen, one of GKNs biggest customers..." ;
    sc:confidenceScore 8.347230e-1 .

<http://data.thomsonreuters.com/sc/snippet/789>
    prov:wasQuotedFrom <http://www.iii.co.uk/articles/268419/uk-suppliers-caught-vw-fallout> ;
    prov:wasDerivedFrom <http://data.thomsonreuters.com/sc/supplychain_agreement/4295894743_4295869244> ;
    prov:value "...unsurprising given that VW is the largest customer at GKN's division..." ;
    sc:confidenceScore 7.637490e-1 .
Training data is subject to license and available on request. Please contact dan.bennett _at_ tr.com if you’d like a copy.
The candidate companies file contains all companies that are included in the ground truth that challengers will be scored against.
As last year, we will be using the Gerbil system for challenge submission and scoring. Challengers may submit their solutions as frequently as they like, but only submissions published to the leaderboard will be accepted into the competition.
The winning team of each task will receive a monetary award courtesy of our sponsors. If several teams are declared winners, the sum will be shared among them.
Winning teams will be invited to present their implementations at the conference. Any submitting team may also provide posters for display at the conference, accompanied by papers to be published in a special issue of the Journal of Web Semantics.
Heiko Paulheim, Professor at University of Mannheim, Germany
Axel-C. Ngonga Ngomo, Professor at Paderborn University, Germany
Dan Bennett, VP, Enterprise Data Services at Thomson Reuters, USA