https://github.com/eccenca/ck25-dataset

The CK25 Corporate Knowledge Reference Dataset for Benchmarking Text 2 SPARQL Question Answering Approaches

Host: GitHub
URL: https://github.com/eccenca/ck25-dataset
Owner: eccenca
License: cc-by-4.0
Created: 2025-05-06T07:00:43.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-05-19T09:39:59.000Z (about 1 year ago)
Last Synced: 2025-06-20T21:41:06.924Z (12 months ago)
Language: Python
Homepage:
Size: 481 KB
Stars: 1
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt

# The CK25 Corporate Knowledge Reference Dataset for Benchmarking Text 2 SPARQL Question Answering Approaches

Text to SPARQL is the task of automatically translating natural language questions into SPARQL queries that retrieve answers from RDF-based Knowledge Graphs.
As pointed out in [Strappazon et al.](https://hal.science/hal-04918564) ...

> ... the task of parsing natural language questions into structured queries is even more prevalent now, given the high potential that Large Language Models (LLMs) offer in Retrieval-Augmented conversational agents and the problems they face with using the correct information as context.

To support researchers of RAG-based systems with reference questions and queries for a Knowledge Graph with an industrial background, we provide this dataset repository. It was created for the [First International TEXT2SPARQL Challenge](https://text2sparql.aksw.org/), which we organized as part of the [4th International Workshop on LLM-Integrated Knowledge Graph Generation from Text (Text2KG)](https://aiisc.ai/text2kg2025) at the [22nd European Semantic Web Conference (ESWC2025)](https://2025.eswc-conferences.org/).

## Important Files

- `questions.yml`
  - This YAML file is the curated list of 50 questions, each paired with a working reference SPARQL query
- `graphs/`
  - This directory contains Turtle files (one for the ontology, one for the instance data)
  - The `*.graph` files contain the named graph IRI we use
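
As a rough sketch of how the `graphs/` layout could be consumed, the following Python reads the named graph IRI from each `*.graph` file. The pairing rule used here (a file `<name>.graph` belongs to the data file `<name>` in the same directory, e.g. `data.ttl.graph` for `data.ttl`) is an assumption and is not stated in this README. The function name `read_graph_iris` is hypothetical as well.

```python
from pathlib import Path


def read_graph_iris(graph_dir: str) -> dict[str, str]:
    """Map each data file name to the IRI stored in its *.graph file.

    Assumption: `<name>.graph` holds the named graph IRI for the file
    `<name>` in the same directory (e.g. `data.ttl.graph` -> `data.ttl`).
    """
    iris = {}
    for graph_file in sorted(Path(graph_dir).glob("*.graph")):
        # The IRI is taken to be the whole file content, minus whitespace.
        iri = graph_file.read_text(encoding="utf-8").strip()
        # Dropping the ".graph" suffix yields the assumed data file name.
        iris[graph_file.with_suffix("").name] = iri
    return iris
```

Calling `read_graph_iris("graphs")` from the repository root would then return one entry per `*.graph` file, provided the assumed naming holds.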

## How to use this repository?

In order to test your own Text to SPARQL approach, provide it as a TEXT2SPARQL endpoint as described [here](https://text2sparql.aksw.org/challenge/#process).
Then you can use the [text2sparql](https://pypi.org/project/text2sparql-client/) command line client like this:

``` shell-session
$ text2sparql ask questions.yml [YOUR ENDPOINT] --output result.json
2025-05-05 09:47:45.049 | INFO | text2sparql_client.commands.ask:ask_command:72 - Asking questions about dataset https://text2sparql.aksw.org/2025/corporate/ on endpoint [YOUR ENDPOINT].
2025-05-05 09:47:45.049 | INFO | text2sparql_client.commands.ask:ask_command:77 - In which department is Ms. Brant? (en) ...
2025-05-05 09:47:45.064 | INFO | text2sparql_client.commands.ask:ask_command:77 - What is the telephone of Baldwin Dirksen? (en) ...
2025-05-05 09:47:45.069 | INFO | text2sparql_client.commands.ask:ask_command:77 - Who is the manager of Heinrich Hoch? (en) ...
... more questions
2025-05-05 09:49:43.421 | INFO | text2sparql_client.commands.ask:ask_command:100 - Writing 50 responses to result.json.
```

The new file `result.json` contains a structure like this:

``` JSON
[
  {
    "dataset": "https://text2sparql.aksw.org/2025/corporate/",
    "question": "In which department is Ms. Brant?",
    "query": "...",
    "endpoint": "[YOUR ENDPOINT]",
    "qname": "ck25:1-en",
    "uri": "https://text2sparql.aksw.org/2025/corporate/1-en"
  },
  ...
]
```
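
If you want to inspect the responses yourself before running the evaluation, a minimal Python sketch could look like the following. It only relies on the field names visible in the excerpt above. The helper `index_by_qname` is a hypothetical name, and the embedded sample stands in for a real `result.json` (its `query` value is the same placeholder as in the excerpt).

```python
import json

# Sample shaped like the result.json excerpt; the query value is a
# placeholder because the excerpt elides it.
SAMPLE = """
[
  {
    "dataset": "https://text2sparql.aksw.org/2025/corporate/",
    "question": "In which department is Ms. Brant?",
    "query": "...",
    "endpoint": "[YOUR ENDPOINT]",
    "qname": "ck25:1-en",
    "uri": "https://text2sparql.aksw.org/2025/corporate/1-en"
  }
]
"""


def index_by_qname(records: list[dict]) -> dict[str, dict]:
    """Index response records by their `qname` for quick lookup."""
    return {record["qname"]: record for record in records}


responses = index_by_qname(json.loads(SAMPLE))
print(responses["ck25:1-en"]["question"])
```

For a real run, replace `json.loads(SAMPLE)` with `json.load` on an opened `result.json` file.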

This file can be used with the same CLI for analysis:

``` shell-session
$ text2sparql evaluate API_NAME questions.yml result.json
```
