Mayank Kejriwal, Jing Peng, Haotian Zhang and Pedro Szekely.
Abstract: n domains such as humanitarian assistance and disaster relief (HADR), events, rather than named entities, are the primary focus of analysts and aid officials. An important problem that must be solved to provide situational awareness to aid providers is automatic clustering sub-events that refer to the same underlying event. An effective solution to the problem requires judicious use of both domain-specific and semantic information, as well as statistical methods like deep neural embeddings. In this paper, we present an approach, AugSEER (Augmented feature sets for Structured Event Entity Resolution), that combines advances in deep neural embeddings both on text and graph data with minimally supervised inputs from domain experts. AugSEER can operate in both online and batch scenarios. On five real-world HADR datasets, AugSEER is found, on average, to outperform the next best baseline result by almost 15% on the cluster purity metric and by 3% on the F1-Measure metric. In contrast, text-based approaches are found to perform poorly, demonstrating the importance of semantic information in devising a good solution. We also use sub-event clustering visualizations to illustrate the qualitative potential of AugSEER.
Keywords: Events; Structured Event Resolution; Hybrid Embeddings; Crisis Informatics; Humanitarian and Disaster Relief; Clustering; Situation Frames