The conference hosts an annual challenge that aims to promote innovative approaches to the creation and use of the Semantic Web. This year’s challenge focuses on knowledge graphs. Both publicly and privately owned, knowledge graphs are currently among the most prominent implementations of Semantic Web technologies.
This year’s challenge is centered on fact extraction from Internet sources to create new relationships within a knowledge graph. The graph in question is the Thomson Reuters permid.org open dataset of organizations, people and financial entities. The relationships to find are supply chain relationships, indicating a supplier/customer relationship between two organizations.
The evaluation of challenge participants will be carried out on the Knowledge Graph owned by Thomson Reuters (TR). The graph has a public and a private part; the public part can be used for building and training the candidate systems, the private part will be used for evaluation.
The core dataset for the challenge will be the open data exposed at permid.org. This dataset consists of an authoritative graph of entities of interest to and mastered by TR. A ground truth of supply chain relationships owned by Thomson Reuters will be used for scoring the submissions.
Each organization in the permid.org dataset has a unique identifier, its “permanent identifier” (PermID). For example, the Hankook tire company has ID 4295881024.
The task is to identify supplier/customer relationships defined as pairs of these Perm IDs.
For example, Hankook tire is a supplier of VW, as described in Hankook’s press releases.
This relationship would be expressed in RDF as follows:
<http://data.thomsonreuters.com/sc/supplychain_agreement/4295881024_4295869244> <http://ontology.thomsonreuters.com/supplyChain#customer> <https://permid.org/1-4295869244> .
<http://data.thomsonreuters.com/sc/supplychain_agreement/4295881024_4295869244> <http://ontology.thomsonreuters.com/supplyChain#supplier> <https://permid.org/1-4295881024> .
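The pair of triples above can be generated mechanically from two PermIDs. The sketch below, using only the standard library, follows the agreement-URI pattern “supplierID_customerID” seen in the example; the function name is our own.

```python
# Sketch: emit the two scoring triples for one supplier/customer pair of
# PermIDs, in N-Triples syntax. The agreement-URI naming convention
# "<supplierID>_<customerID>" is taken from the example above.
SC = "http://ontology.thomsonreuters.com/supplyChain#"
AGREEMENT = "http://data.thomsonreuters.com/sc/supplychain_agreement/"
PERMID = "https://permid.org/1-"

def relationship_triples(supplier_id: str, customer_id: str) -> list[str]:
    """Return the customer and supplier triples for one relationship."""
    agreement = f"<{AGREEMENT}{supplier_id}_{customer_id}>"
    return [
        f"{agreement} <{SC}customer> <{PERMID}{customer_id}> .",
        f"{agreement} <{SC}supplier> <{PERMID}{supplier_id}> .",
    ]

# Reproduces the Hankook/VW example above.
for triple in relationship_triples("4295881024", "4295869244"):
    print(triple)
```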
The challenge is to use web-based structured and unstructured data sources to create these statements. Sources might include press releases, filings, and data sources such as Wikipedia, Wikidata, DBpedia, or Common Crawl (the organization URL predicate from permid.org may be useful in all of these cases).
For each submitted relationship, challengers should create a single pair of triples, one each for the supplier and customer predicates. Each relationship should be supported by one or more snippets identifying the source(s) of the relationship assertion.
On submission to Gerbil we will compute recall (the fraction of ground-truth relationships that were predicted) and precision (the fraction of predictions that were correct); solutions will be scored by their F-measure.
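The scoring arithmetic can be illustrated over sets of (supplier PermID, customer PermID) pairs. This is only a sketch of the measures named above; the exact Gerbil configuration may differ.

```python
# Sketch: precision, recall and F-measure over sets of
# (supplier PermID, customer PermID) pairs.
def f_measure(predicted: set[tuple[str, str]],
              gold: set[tuple[str, str]]) -> tuple[float, float, float]:
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative data only: one correct prediction, one miss, one false positive.
gold = {("4295881024", "4295869244"), ("4295894743", "4295869244")}
pred = {("4295881024", "4295869244"), ("4295904406", "4295869244")}
print(f_measure(pred, gold))  # → (0.5, 0.5, 0.5)
```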
To aid challengers, a set of candidate customers will be provided as a list of URIs.
The following predicates will be used in scoring the challenge:

http://ontology.thomsonreuters.com/supplyChain#supplier (PermID URI of the supplier)
http://ontology.thomsonreuters.com/supplyChain#customer (PermID URI of the customer)

Additional predicates should be used to clarify the source(s) of the relationship:

http://ontology.thomsonreuters.com/supplyChain#aggregatedConfidenceScore (float between 0 and 1 expressing aggregate confidence in the relationship)
http://www.w3.org/ns/prov#wasQuotedFrom (URI – where the proof was derived from)
http://www.w3.org/ns/prov#value (string – snippet from the full text of the proof point, if applicable)
http://ontology.thomsonreuters.com/supplyChain#field (string – field(s) used, if from structured data)
http://www.w3.org/ns/prov#wasDerivedFrom (URI of the aggregate relationship)
http://ontology.thomsonreuters.com/supplyChain#confidenceScore (float between 0 and 1 expressing confidence in the proof; this might reflect confidence of the entity match, freshness of the information, and trustworthiness of the source)

While the snippet and source predicates won’t be used directly for scoring, they will be used for spot-checking submissions.
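One way to produce the aggregated confidence score is to combine the per-snippet confidences. In the sample relationships below, the aggregate happens to equal the arithmetic mean of the snippet scores ((0.834723 + 0.763749) / 2 = 0.799236); the choice of aggregation function is, as far as we can tell, left to challengers, and the mean is used here only as an illustration.

```python
# Sketch: aggregate per-snippet confidence scores into a single
# aggregatedConfidenceScore. Arithmetic mean is an assumption, chosen
# because it reproduces the sample data; other schemes (max, noisy-OR)
# would be equally valid submissions.
def aggregate_confidence(snippet_scores: list[float]) -> float:
    """Arithmetic mean of the individual proof-point confidences."""
    return sum(snippet_scores) / len(snippet_scores)

# The two GKN proof points from the sample below.
print(round(aggregate_confidence([0.834723, 0.763749]), 6))  # → 0.799236
```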
Candidate companies are provided as a set of triples with org name for convenience:
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .

<https://permid.org/1-4295869244> vcard:organization-name "Volkswagen AG" .
<https://permid.org/1-4295894743> vcard:organization-name "GKN PLC" .
<https://permid.org/1-4295904406> vcard:organization-name "Lear Corp" .
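A lightweight way to load the candidate file is to extract (PermID, name) pairs directly. A full Turtle parser (e.g. rdflib) would be more robust; the regex sketch below, using only the standard library, handles the simple one-triple-per-statement shape shown above.

```python
import re

# Sketch: pull (PermID, organization name) pairs out of the candidate
# companies file. Assumes the simple vcard:organization-name triples
# shown above; not a general Turtle parser.
CANDIDATE_RE = re.compile(
    r'<https://permid\.org/1-(\d+)>\s+vcard:organization-name\s+"([^"]+)"')

def parse_candidates(turtle_text: str) -> dict[str, str]:
    """Map each candidate's PermID to its organization name."""
    return dict(CANDIDATE_RE.findall(turtle_text))

sample = '<https://permid.org/1-4295869244> vcard:organization-name "Volkswagen AG" .'
print(parse_candidates(sample))  # → {'4295869244': 'Volkswagen AG'}
```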
Here are two sample relationships for suppliers of VW. The first has one proof point, the second two:
@prefix sc: <http://ontology.thomsonreuters.com/supplyChain#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

<http://data.thomsonreuters.com/sc/supplychain_agreement/4295904406_4295869244>
    sc:aggregatedConfidenceScore 2.996760e-1 ;
    sc:supplier <https://permid.org/1-4295904406> ;
    sc:customer <https://permid.org/1-4295869244> .

<http://data.thomsonreuters.com/sc/snippet/123>
    prov:wasQuotedFrom <http://www.lear.com/blog/2015/03/changchun-lear-fawsn-seating-wins-faw-vw-award/> ;
    prov:wasDerivedFrom <http://data.thomsonreuters.com/sc/supplychain_agreement/4295904406_4295869244> ;
    prov:value "As Changchun Lear FAWSN Seating’s primary customer, FAW-VW..." ;
    sc:confidenceScore 2.996760e-1 .

<http://data.thomsonreuters.com/sc/supplychain_agreement/4295894743_4295869244>
    sc:aggregatedConfidenceScore 7.992360e-1 ;
    sc:supplier <https://permid.org/1-4295894743> ;
    sc:customer <https://permid.org/1-4295869244> .

<http://data.thomsonreuters.com/sc/snippet/456>
    prov:wasQuotedFrom <https://www.gkn.com/en/newsroom/events/auto-shanghai-2017/features-and-insights/strong-partnership/> ;
    prov:wasDerivedFrom <http://data.thomsonreuters.com/sc/supplychain_agreement/4295894743_4295869244> ;
    prov:value "Volkswagen, one of GKNs biggest customers..." ;
    sc:confidenceScore 8.347230e-1 .

<http://data.thomsonreuters.com/sc/snippet/789>
    prov:wasQuotedFrom <http://www.iii.co.uk/articles/268419/uk-suppliers-caught-vw-fallout> ;
    prov:wasDerivedFrom <http://data.thomsonreuters.com/sc/supplychain_agreement/4295894743_4295869244> ;
    prov:value "...unsurprising given that VW is the largest customer at GKN's division..." ;
    sc:confidenceScore 7.637490e-1 .
Training data is subject to license and available on request. Please contact dan.bennett _at_ tr.com if you’d like a copy.
The candidate companies file contains all companies that are included in the ground truth that challengers will be scored against.
As last year, we will be using the Gerbil system for challenge submission and scoring. Challengers may submit their solutions as frequently as they like, but only submissions published to the leaderboard will be accepted into the competition.
The winning team of each task will receive a monetary award courtesy of our sponsors. If several teams are declared winners, the sum will be shared among them.
Winning teams will be invited to present their implementations at the conference. Any submitting team may also provide posters for display at the conference, accompanied by papers to be published in a special issue of the Journal of Web Semantics.
Heiko Paulheim, Professor at University of Mannheim, Germany
Axel-C. Ngonga Ngomo, Professor at Paderborn University, Germany
Dan Bennett, VP, Enterprise Data Services at Thomson Reuters, USA