July 27, 2024

Introduction

The World-Wide Web contains billions of HTML tables holding relational data, i.e., web tables (Cafarella et al. [1, 8]; Lehmberg et al. [2]), which carry valuable structured information. This high-quality relational data is an important source for knowledge extraction on the Web.

To enable machines to understand these tables, one critical step is to map the mentions in table cells to their corresponding entities in a given knowledge base (KB), a task called table entity linking or table entity disambiguation. For example, in the web table in Fig. 1, the task is to link the mention “Louvre” in the first column to the entity “Louvre Museum” in Wikipedia. Table entity linking is an important and challenging stage of table semantic understanding because the mentions in tables are usually ambiguous.

In this study, we focus on tables in which each row represents a distinct object and the columns represent different characteristics of that object. We assume that the linkable mentions in the tables are already known and exclude unlinkable content such as numbers. Compared to entity linking in free-form text, disambiguating mentions in tables is more challenging because table cells provide limited contextual information. Previous research has primarily relied on collective classification techniques, graph-based algorithms, and multi-layer perceptrons, but these approaches often fail to capture the semantic features of mentions and entities effectively. To address this, we propose a hybrid semantic matching model that captures the local semantic information between table mentions and candidate entities from various semantic aspects.

Given that cells in the same column of a table have similar content and belong to the same category, it is natural to jointly disambiguate mentions in the same column. Moreover, we have observed that mentions vary in difficulty of disambiguation depending on the quality of contextual information. By sorting mentions in the same column and prioritizing the easier ones, we can leverage information from previously disambiguated entities to aid subsequent disambiguation processes.
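The easy-first idea above can be illustrated with a minimal sketch. It is not the model proposed in this paper: the learned global component is replaced here by a hand-written type-coherence bonus, and all names (`easy_first_order`, `link_column`, `coherence`, `ENTITY_TYPE`), the scores, and the weight are hypothetical values chosen only for illustration. The sketch assumes a mention is "easier" when its best candidate has a higher local score.

```python
from typing import Callable, Dict, List

# Hypothetical local scores: one dict per mention in a column,
# mapping candidate entity -> local similarity score.
LocalScores = List[Dict[str, float]]


def easy_first_order(local_scores: LocalScores) -> List[int]:
    """Return mention indices sorted from easiest to hardest.

    Assumption: a higher best-candidate score means an easier mention.
    """
    return sorted(
        range(len(local_scores)),
        key=lambda i: max(local_scores[i].values()),
        reverse=True,
    )


def link_column(
    local_scores: LocalScores,
    coherence: Callable[[str, str], float],
    weight: float = 0.5,
) -> List[str]:
    """Disambiguate mentions one by one, easiest first.

    Each candidate's local score is increased by its average coherence
    with the entities already chosen in the same column.
    """
    chosen: Dict[int, str] = {}
    for i in easy_first_order(local_scores):
        previous = list(chosen.values())

        def total(entity: str) -> float:
            if not previous:
                return local_scores[i][entity]
            bonus = sum(coherence(entity, p) for p in previous) / len(previous)
            return local_scores[i][entity] + weight * bonus

        chosen[i] = max(local_scores[i], key=total)
    # Return the chosen entities in the original row order.
    return [chosen[i] for i in range(len(local_scores))]


# Toy column with two mentions; types and scores are made up.
ENTITY_TYPE = {
    "Louvre Museum": "museum",
    "Louvre Pyramid": "structure",
    "Metropolitan Museum of Art": "museum",
    "Metropolitan Opera": "opera",
}


def coherence(a: str, b: str) -> float:
    """Hypothetical coherence: 1.0 if two entities share a type, else 0.0."""
    return 1.0 if ENTITY_TYPE[a] == ENTITY_TYPE[b] else 0.0


scores = [
    {"Metropolitan Museum of Art": 0.45, "Metropolitan Opera": 0.50},  # hard
    {"Louvre Museum": 0.90, "Louvre Pyramid": 0.40},                   # easy
]
result = link_column(scores, coherence)
```

In this toy column, the second mention is resolved first because its best local score is higher. Its chosen entity then tips the first, more ambiguous mention toward the candidate of the same type, even though that candidate has the lower local score.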

In this paper, we propose a joint model with hybrid semantic matching for table entity linking, JHSTabEL for short. The model consists of two modules: a Hybrid Semantic Matching Model and a Global Decision Model. The Hybrid Semantic Matching Model encodes the contextual information of each mention and its candidate entities. It uses representation-based and interaction-based models to capture matching features at the abstract and concrete levels, respectively, and then aggregates them into hybrid semantic features, from which the similarity score of each mention-entity pair is calculated. Before entering the global model, the mentions in the same column are sorted according to their local similarity scores. The Global Decision Model uses an LSTM network to encode the local representations of mention-entity pairs and jointly disambiguates the mentions in a sequential manner. In summary, we make the following contributions: