https://github.com/datadotworld/cwd-benchmark-data
Data for the Chat With Your Data benchmark.
- Host: GitHub
- URL: https://github.com/datadotworld/cwd-benchmark-data
- Owner: datadotworld
- License: apache-2.0
- Created: 2023-10-12T20:28:44.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-01T10:24:37.000Z (over 2 years ago)
- Last Synced: 2025-05-08T21:14:23.258Z (about 1 year ago)
- Topics: dwstruct-t50-public-projects
- Language: Shell
- Homepage:
- Size: 37.1 KB
- Stars: 136
- Watchers: 9
- Forks: 25
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Chat with your Data (cwd) Benchmark Data
## Introduction
This repository contains the data and metadata for the "Chat with your Data" benchmark. The project aims to provide a comprehensive set of test scenarios for language-to-query systems, specifically those that generate SQL and SPARQL.
It tests whether these systems can accurately convert natural language questions into valid, effective queries against various data sources.
## Repository Structure
This repository is divided into several directories, each containing a specific type of data or metadata:
- `ontology/`: This directory contains the OWL file(s) that define the ontology.
- `DDL/`: This directory contains the DDL definitions for the database schema.
- `investigation/`: Each Turtle (.ttl) file in this directory represents a complete benchmark investigation, which includes pointers to the dataset, metadata, and a set of inquiries.
- `data/`: This directory contains the dataset(s) used for the benchmark. The data is represented in multiple formats to support a wide range of query languages. In addition to the CSV files, there is an R2RML file that describes the mapping between the ontology and the data tables.
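To check that a local checkout matches the layout above, a minimal sketch along these lines may help. Only the four directory names come from this README; the function name `summarize_checkout` and the choice to report a missing directory as `None` are assumptions made for illustration.

```python
from pathlib import Path

# Directory names as listed in this README; everything else is illustrative.
EXPECTED_DIRS = ("ontology", "DDL", "investigation", "data")


def summarize_checkout(root):
    """Map each expected directory to the sorted relative paths of its files.

    A directory that does not exist under `root` maps to None.
    """
    root = Path(root)
    summary = {}
    for name in EXPECTED_DIRS:
        directory = root / name
        if directory.is_dir():
            summary[name] = sorted(
                path.relative_to(directory).as_posix()
                for path in directory.rglob("*")
                if path.is_file()
            )
        else:
            summary[name] = None
    return summary
```

Calling `summarize_checkout(".")` from the root of a clone returns a dictionary keyed by directory name, which makes it easy to spot a missing or empty directory before running any queries.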
## File Formats
- OWL: Web Ontology Language, used to represent the ontology.
- DDL: Data Definition Language, used to define the database schema.
- TTL: Turtle, a textual serialization of RDF, used to represent each complete benchmark investigation.
- R2RML: A TTL file that describes mappings according to the [RDB to RDF Mapping Language](https://www.w3.org/TR/r2rml/).
- CSV/TSV/etc.: Various data formats used for the benchmark dataset.
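To show how the DDL and CSV formats fit together, here is a rough sketch that creates a table from a DDL string and loads CSV rows into an in-memory SQLite database. The table `example_item`, its columns, and the sample rows are invented for illustration and do not come from the benchmark; the real schema lives in `DDL/` and the real data in `data/`. The DDL shipped with the benchmark may also use a SQL dialect that SQLite does not accept unchanged.

```python
import csv
import io
import sqlite3

# Hypothetical DDL and CSV content, invented for this example only.
DDL_TEXT = """
CREATE TABLE example_item (
    item_id INTEGER NOT NULL,
    label TEXT NOT NULL
);
"""
CSV_TEXT = "item_id,label\n1,alpha\n2,beta\n"


def load_table(conn, ddl_text, table, csv_text):
    """Create the schema from the DDL, then insert the CSV rows into `table`."""
    conn.executescript(ddl_text)
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first CSV row names the target columns
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', reader
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_table(conn, DDL_TEXT, "example_item", CSV_TEXT)
rows = conn.execute(
    "SELECT item_id, label FROM example_item ORDER BY item_id"
).fetchall()
print(rows)
```

The table and column names are interpolated into the SQL text, so this approach is only suitable for trusted files such as a local checkout; the row values themselves go through bound parameters.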