SemTab 2025
Semantic Web Challenge on Tabular Data to Knowledge Graph Matching
About the Challenge
Tabular data in the form of CSV files is the common input format in a data analytics pipeline. However, a lack of understanding of the semantic structure and meaning of the content may hinder the data analytics process. Thus, gaining this semantic understanding is very valuable for data integration, data cleaning, data mining, machine learning, and knowledge discovery tasks.
Tables on the Web may also be the source of highly valuable data. The addition of semantic information to Web tables may enhance a wide range of applications, such as web search, question answering, and Knowledge Base (KB) construction.
Tabular data to Knowledge Graph (KG) matching is the process of assigning semantic tags from KGs (e.g., Wikidata or DBpedia) to the elements of the table. This task, however, is often difficult in practice due to metadata (e.g., table and column names) being missing, incomplete, or ambiguous.
The SemTab challenge aims at benchmarking systems dealing with the tabular data to KG matching problem, so as to facilitate their comparison on the same basis and the reproducibility of the results.
The 2025 edition of this challenge will be co-located with the International Semantic Web Conference (ISWC 2025).
Challenge Tracks
MammoTab
TASK: CEA (Wikidata v. 20240720)
Participants will address the Semantic Table Interpretation challenges using the new version of the MammoTab dataset.
MammoTab is a large-scale benchmark designed to provide realistic and complex scenarios, including tables affected by typical challenges of web and Wikipedia data.
Only approaches based on Large Language Models are allowed, either:
- in fine-tuning settings, or
- using Retrieval-Augmented Generation strategies.
The evaluation will focus on the Cell Entity Annotation (CEA) task using the Wikidata KG (v. 20240720), but it will also take into account the ability of the proposed approaches to deal effectively with the key challenges that characterise web and Wikipedia tables. Participants are expected to demonstrate not only strong CEA performance, but also robustness and versatility across all these dimensions, which are critical for real-world table interpretation scenarios.
Round 1
The first round involves the execution of the CEA task on a carefully selected subset of 870 tables comprising a total of 84,907 cell annotations.
Target Knowledge Graph: Wikidata KG (v. 20240720).
Datasets' Structure
The test set is not included in the dataset in order to preserve the impartiality of the final evaluation and to discourage ad-hoc solutions.
Targets Format
CEA task
filename, row id (0-indexed), column id (0-indexed), entity id
Annotation:
LYQZQ0T5,1,1,Q3576864
Table LYQZQ0T5:
col0,col1,col2
1976,Eat My Dust!,Charles Byron Griffith
1976,Hollywood Boulevard,Joe Dante
1976,Hollywood Boulevard,Allan Arkush
1977,Grand Theft Auto,Ron Howard
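As an illustration only, here is a minimal Python sketch for producing a submission file in this format. The three-column layout of the targets file and the lookup table are assumptions made for the example; the single prediction is the annotation shown above.

import csv

# Example prediction taken from the annotation above; a real system would
# produce one Wikidata QID per target cell, or abstain with None.
PREDICTIONS = {("LYQZQ0T5", 1, 1): "Q3576864"}

def write_cea_submission(targets_path, out_path):
    # Assumed targets layout: filename, row id, column id (no entity column).
    with open(targets_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for filename, row, col in csv.reader(src):
            qid = PREDICTIONS.get((filename, int(row), int(col)))
            if qid is not None:  # skip cells the system abstains on
                writer.writerow([filename, row, col, qid])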
Evaluation Criteria
Precision, Recall and F1 Score are calculated as follows:
\[ Precision = \frac{\#\,\text{correct annotations}}{\#\,\text{submitted annotations}} \]
\[ Recall = \frac{\#\,\text{correct annotations}}{\#\,\text{ground truth annotations}} \]
\[ F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \]
Notes:
- \(\#\) denotes the number.
- \(F_1\) is used as the primary score, and \(Precision\) is used as the secondary score.
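For concreteness, here is a minimal scoring sketch consistent with these formulas, assuming both the ground truth and the submission use the four-column annotation format above:

import csv

def load_annotations(path):
    # Map (filename, row id, column id) to the annotated entity id.
    with open(path, newline="") as f:
        return {(r[0], r[1], r[2]): r[3] for r in csv.reader(f)}

def cea_scores(gt_path, submission_path):
    gt = load_annotations(gt_path)
    sub = load_annotations(submission_path)
    correct = sum(1 for key, qid in sub.items() if gt.get(key) == qid)
    precision = correct / len(sub) if sub else 0.0
    recall = correct / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1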
Submission
Are you ready? Then, submit the annotations via Google Form.
secu-table
TASK: CEA (Wikidata)
TASK: CEA, CTA, CPA (SEPSES)
Participants will address the Semantic Table Interpretation challenges using the Secu-table dataset.
The Secu-table dataset comprises security data extracted from the Common Vulnerabilities and Exposures (CVE) and Common Weakness Enumeration (CWE) data sources.
The evaluation will focus on the Cell Entity Annotation (CEA) task, the Column Type Annotation (CTA) task, and the Column Property Annotation (CPA) task using the SEPSES Computer Security Knowledge Graph and the Wikidata KG.
The evaluation of the participants' results will consider Recall, Precision, and F1 score. In addition to these scores, participants are invited to evaluate their LLMs' ability to make a prediction or to abstain (i.e., to say "I don't know").
Round 1
The first round involves the execution of the CEA, CTA, and CPA Tasks on a dataset composed of 1,554 tables.
The evaluation will focus on the CEA, CTA, and CPA tasks using the SEPSES Computer Security Knowledge Graph, and on the CEA task using the Wikidata KG.
Datasets' Structure
The Secu-table dataset is composed of 1,554 tables: 76 tables provided with ground truth and 1,478 tables for testing. 20% of the tables contain no errors, while the remaining 80% contain errors such as ambiguity, NIL mentions, missing context, and misspelt data.
Targets Format
CEA task
filename, row id (0-indexed), column id (0-indexed), entity
CTA task
filename, column id (0-indexed), entity
CPA task
filename, col0, column id (1-indexed), property
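As a sketch of the three layouts, the hypothetical snippet below writes one placeholder line per task; the identifiers are placeholders, not real SEPSES or Wikidata IDs:

import csv

# Placeholder predictions illustrating the three target layouts above.
cea = [("TBL0001", 2, 0, "ENTITY_ID")]    # cell (row 2, col 0) -> entity
cta = [("TBL0001", 0, "TYPE_ID")]         # column 0 -> semantic type
cpa = [("TBL0001", 0, 1, "PROPERTY_ID")]  # col0 -> column 1 -> property

for name, rows in (("cea.csv", cea), ("cta.csv", cta), ("cpa.csv", cpa)):
    with open(name, "w", newline="") as f:
        csv.writer(f).writerows(rows)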
Evaluation Criteria
Precision, Recall and F1 Score are calculated as follows:
\[ Precision = \frac{\#\,\text{correct annotations}}{\#\,\text{submitted annotations}} \]
\[ Recall = \frac{\#\,\text{correct annotations}}{\#\,\text{ground truth annotations}} \]
\[ F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \]
Notes:
- For selective prediction, the evaluation also considers the ability of the LLMs to abstain (i.e., to say "I don't know") instead of forcing a prediction.
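One possible treatment is sketched below under stated assumptions (the abstention token and the exclusion rule are illustrative, not the official scorer): abstained cells are removed from the submitted annotations before scoring, so abstaining lowers Recall but does not penalise Precision.

def selective_cea_scores(gt, sub, abstain_token="I don't know"):
    # gt and sub map target keys (filename, row, col) to entity ids;
    # abstentions are dropped from the submission before scoring.
    answered = {k: v for k, v in sub.items() if v != abstain_token}
    correct = sum(1 for k, v in answered.items() if gt.get(k) == v)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    answer_rate = len(answered) / len(sub) if sub else 0.0  # fraction answered
    return precision, recall, f1, answer_rate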
Submission
Are you ready? Then, submit the annotations via Google Form.
Paper Guidelines
We invite participants to submit a paper, using EasyChair.
Submissions must not exceed six pages and should be formatted using either the CEUR LaTeX or Word template. Each paper will be reviewed by one or two challenge organisers.
Accepted papers will be published in a CEUR-WS volume. By submitting a paper, authors agree to comply with the CEUR-WS publication guidelines.
Co-Chairs
Marco Cremaschi
University of Milan - Bicocca
Fabio D'Adda
University of Milan - Bicocca
Fidel Jiomekong Azanzi
University of Yaoundé, Cameroon
Jean Petit Yvelos
University of Yaoundé, Cameroon
Ernesto Jimenez-Ruiz
City St George's, University of London
Oktie Hassanzadeh
IBM Research
Acknowledgements
The challenge is currently supported by IBM Research and ISWC 2025.

Tentative Schedule
Release of datasets and instructions
June 9th, 2025
Round 1 & Paper Submission Deadline
August 8th, 2025 (AoE) (new deadline)
Note: To be invited to the conference for a presentation, you must submit to Round 1.
Initial Results & ISWC 2025 Presentation Invitations
August 15th, 2025 (new deadline)
Camera-ready Paper Deadline
September 15th, 2025
New or Revised Submissions Accepted Until (Round 2)
October 20th, 2025
Final results to be announced at the conference
November 2-6, 2025