https://github.com/tdiprima/nl2sparql
Natural language to SPARQL query
- Host: GitHub
- URL: https://github.com/tdiprima/nl2sparql
- Owner: tdiprima
- License: mit
- Created: 2025-02-03T21:32:11.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-12T19:57:12.000Z (8 months ago)
- Last Synced: 2025-06-07T07:36:43.201Z (4 months ago)
- Topics: llm-interface, llms, nlp, sparql, sparql-query, sparql-query-builder
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Natural language to SPARQL query

This system is designed to take natural language input, convert it into SPARQL queries, execute them on DBPedia, and return the results. Below, I'll break down how it works and how you can start using it.

---
## 1. Overview of the Workflow
1. Convert natural language to SPARQL (e.g., "List all Nobel Prize winners in Physics after 2000").
2. Execute the SPARQL query on DBPedia.
3. Validate the results to ensure they are meaningful.
4. Display the output in a readable format.

There are also pathology-specific queries for retrieving medical information.

---
## 2. Understanding the Key Files
Each file serves a role in the pipeline.

### Core Scripts
| File | Purpose |
|------|---------|
| `query_generator.py` | Uses OpenAI GPT-4 to convert natural language into SPARQL queries. |
| `executor.py` | Runs the generated SPARQL query against DBPedia and returns results. |
| `validator.py` | Checks if the SPARQL query results are valid and meaningful. |

### Automation & Testing
| File | Purpose |
|------|---------|
| `automate_queries.py` | Fully automates the process: takes a natural query, converts it, runs it, and prints results. Uses OpenAI GPT-4. |
| `automate_with_ollama.py` | Same as `automate_queries.py`, but uses Ollama instead of OpenAI's API. |
| `test_queries.py` | Unit tests for query generation, execution, and validation. |

### Pathology-Specific Scripts
| File | Purpose |
|------|---------|
| `run_pathology_queries.py` | Runs five pre-defined pathology-related SPARQL queries on DBPedia. |
| `pathology.py` | Runs a single pathology-related SPARQL query. |

---
## 3. Getting Started
There are two main ways to start using this system:

- Method 1: Run `automate_queries.py` (for automated natural language to SPARQL)
- Method 2: Manually use `query_generator.py` + `executor.py` (for step-by-step control)

---
### Method 1: Fully Automated (Best for Testing)
1. Open a terminal in the project directory.
2. Run:

   ```bash
   python automate_queries.py
   ```

3. The script will:
   - Take a natural language query (`"Who are some famous pathologists?"`).
   - Generate a SPARQL query using GPT-4.
   - Execute the SPARQL query on DBPedia.
   - Print the results.

4. If you want to modify the query, open `automate_queries.py` and change:

   ```python
   natural_query = "Who are some famous pathologists?"
   ```

   to whatever you want.

---
### Method 2: Step-by-Step Execution
If you want to control each step manually:

#### Step 1: Generate a SPARQL Query

Run:

```bash
python query_generator.py
```

This will take a natural language question (e.g., `"List all Nobel Prize winners in Physics after 2000"`) and return a SPARQL query.
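
The README does not show how `query_generator.py` talks to GPT-4, so here is only a rough sketch of two helpers such a script would plausibly need: one that builds the prompt and one that pulls the query out of the model's reply. The names `PROMPT_TEMPLATE`, `build_prompt`, and `extract_sparql` are hypothetical, and no OpenAI call is made here.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap queries in

# Hypothetical prompt wording; the real query_generator.py may phrase this differently.
PROMPT_TEMPLATE = (
    "Translate the question into a single SPARQL query for DBpedia.\n"
    "Return only the query, with no explanation.\n\n"
    "Question: {question}"
)

# Matches a fenced block with an optional language tag and captures its body.
_FENCED = re.compile(FENCE + r"[\w-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def build_prompt(question: str) -> str:
    """Fill the template with the user's natural language question."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_sparql(reply: str) -> str:
    """Return the query text, dropping a Markdown code fence if one is present."""
    match = _FENCED.search(reply)
    query = match.group(1) if match else reply
    return query.strip()
```

The string returned by `build_prompt` would be sent to the model, and `extract_sparql` would clean the reply before it is handed to the executor.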
#### Step 2: Execute the SPARQL Query
Copy the generated query and run:

```bash
python executor.py
```

This script will send the query to DBPedia and return the results.
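
To make the execution step more concrete, here is a minimal sketch of sending a query to DBPedia's public endpoint using only the standard library. It is not taken from `executor.py`, which may well use a different library. The names `build_request`, `run_query`, and `rows` are hypothetical, and the sketch assumes the endpoint honors the standard `application/sparql-results+json` Accept header.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # same address as the public query editor


def build_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as response:
        return json.load(response)


def rows(results: dict) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

Calling `rows(run_query(query))` would give a plain list of dictionaries that is easy to print or validate.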
#### Step 3: Validate Results
If you want to validate whether the results are useful, call:

```python
from validator import validate_results

valid = validate_results(results)  # Pass the results from executor.py
print("Valid:", valid)
```

---
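
For Step 3 above, the README does not say what `validate_results` actually checks. As a rough illustration of what "valid and meaningful" could mean, the sketch below treats a result set as meaningful when at least one returned cell is non-empty. The function name `looks_meaningful` is hypothetical, and it assumes the results arrive in the standard SPARQL JSON format.

```python
def looks_meaningful(results: dict) -> bool:
    """Return True if SPARQL JSON results contain at least one non-empty value."""
    # Missing keys are treated the same as an empty result set.
    bindings = results.get("results", {}).get("bindings", [])
    for binding in bindings:
        for cell in binding.values():
            if str(cell.get("value", "")).strip():
                return True
    return False
```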
## 4. Running Pathology Queries
If you're interested in medical queries, run:

```bash
python run_pathology_queries.py
```

This will run five pathology-related queries, including:
- Common diseases
- Cancers and ICD-10 codes
- Liver diseases
- Pathology scientists

Alternatively, to run a single pathology-related query, use:

```bash
python pathology.py
```

---
## 5. Running Tests
To test the system, run:

```bash
python -m unittest discover
```

This will check:
- Whether SPARQL queries are generated correctly.
- Whether they execute successfully.
- Whether the results are valid.

---
## 6. Alternative: Running with Ollama Instead of OpenAI
If you want to avoid using OpenAI's API, you can use Ollama. Run:

```bash
python automate_with_ollama.py
```

It will:
- Use Ollama's Mistral model instead of GPT-4.
- Convert natural language to SPARQL.
- Execute the query.

⚠️ Note: I commented that OpenAI's GPT-4 performs better than Ollama.

---
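
For the Ollama route in section 6, the README does not show how `automate_with_ollama.py` reaches the model. One possible approach, sketched here with only the standard library, is to post the prompt to the local Ollama server's `/api/generate` endpoint. The helper names `build_ollama_request` and `generate` are hypothetical, and the sketch assumes Ollama is running on its default port with the `mistral` model already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_ollama_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral", timeout: float = 120.0) -> str:
    """Return the model's full reply (needs a running Ollama server)."""
    request = build_ollama_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]
```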
## 7. Example Inputs & Expected Outputs
### Example 1: Finding Nobel Prize Winners
#### Input (Natural Language)
`"List all Nobel Prize winners in Physics after 2000."`

#### Generated SPARQL Query
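```sparql
# Illustrative only: property and resource names are kept from the original
# example and have not been checked against DBPedia's vocabulary.
# Assumes ?awardYear is bound to a date value that year() accepts.
SELECT ?name WHERE {
  ?person a dbo:Scientist .
  ?person dbo:award dbr:Nobel_Prize .
  ?person dbo:field dbr:Physics .
  ?person foaf:name ?name .
  ?person dbo:awardYear ?awardYear .
  FILTER (year(?awardYear) > 2000)
} LIMIT 10
```

#### Output (Results)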
The names below only illustrate the output format; none of them won the prize after 2000.

```
Albert Einstein
Richard Feynman
Marie Curie
...
```

---
## 8. Troubleshooting & Debugging
### 1. No results found?
- Check if the generated SPARQL query is valid.
- Run the query manually in [DBPedia's Query Editor](https://dbpedia.org/sparql).
- Adjust filtering conditions in the query.

### 2. OpenAI API Issues?
- Ensure `OPENAI_API_KEY` is set in your environment variables.
- Try switching to `automate_with_ollama.py`.

### 3. DBPedia Not Responding?
- DBPedia's SPARQL endpoint sometimes throttles requests.
- Try running queries during off-peak hours.

---
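
For the throttling case in section 8, retrying with a growing pause is a common workaround. The sketch below is not part of the repository: `with_retries` is a hypothetical helper that wraps any zero-argument call and waits 1s, 2s, 4s, and so on between attempts. It catches every exception for brevity; in practice you would probably narrow that to network errors such as `urllib.error.URLError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on failure with delays of base_delay * 2**n seconds."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so the delay can be swapped out in tests; normal callers can leave it at the default.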
## 9. Summary
| Task | Recommended Script |
|------|--------------------|
| Full automation | `automate_queries.py` |
| Step-by-step execution | `query_generator.py` → `executor.py` |
| Validate query results | `validator.py` |
| Run pathology-related queries | `run_pathology_queries.py` |
| Test the system | `test_queries.py` |
| Use Ollama instead of OpenAI | `automate_with_ollama.py` |

This system is pretty robust for querying DBPedia using natural language. You can either:
1. Use `automate_queries.py` for a quick, fully automated approach.
2. Manually generate & execute queries for fine control.