# rdf4j-python

[![PyPI version](https://badge.fury.io/py/rdf4j-python.svg)](https://badge.fury.io/py/rdf4j-python)
[![Python Versions](https://img.shields.io/pypi/pyversions/rdf4j-python.svg)](https://pypi.org/project/rdf4j-python/)
[![CI](https://github.com/odysa/rdf4j-python/actions/workflows/ci.yaml/badge.svg)](https://github.com/odysa/rdf4j-python/actions/workflows/ci.yaml)
[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![Documentation](https://img.shields.io/badge/docs-sphinx-blue.svg)](https://github.com/odysa/rdf4j-python/tree/main/docs)

**A modern Python client for the Eclipse RDF4J framework, enabling seamless RDF data management and SPARQL operations from Python applications.**

rdf4j-python bridges the gap between Python and the robust [Eclipse RDF4J](https://rdf4j.org/) ecosystem, providing a clean, async-first API for managing RDF repositories, executing SPARQL queries, and handling semantic data with ease.

## Features

- **Async-First Design**: Native support for async/await with synchronous fallback
- **Repository Management**: Create, access, and manage RDF4J repositories programmatically
- **SPARQL Support**: Execute SELECT, ASK, CONSTRUCT, and UPDATE queries effortlessly
- **SPARQL Query Builder**: Fluent, programmatic query construction with method chaining
- **Transaction Support**: Atomic operations with commit/rollback and isolation levels
- **Flexible Data Handling**: Add, retrieve, and manipulate RDF triples and quads
- **File Upload**: Upload RDF files (Turtle, N-Triples, N-Quads, RDF/XML, JSON-LD, TriG, N3) directly to repositories
- **Multiple Formats**: Support for various RDF serialization formats
- **Repository Types**: Memory stores, native stores, HTTP repositories, and more
- **Named Graph Support**: Work with multiple graphs within repositories
- **Inferencing**: Built-in support for RDFS and custom inferencing rules

## Installation

### Prerequisites

- Python 3.11 or higher
- RDF4J Server (for remote repositories) or embedded usage

### Install from PyPI

```bash
pip install rdf4j-python
```

### Install with Optional Dependencies

```bash
# Include SPARQLWrapper integration
pip install "rdf4j-python[sparqlwrapper]"
```

### Development Installation

```bash
git clone https://github.com/odysa/rdf4j-python.git
cd rdf4j-python
uv sync --group dev
```

## Usage

### Quick Start

```python
import asyncio
from rdf4j_python import AsyncRdf4j
from rdf4j_python.model.repository_config import RepositoryConfig, MemoryStoreConfig, SailRepositoryConfig
from rdf4j_python.model.term import IRI, Literal

async def main():
    # Connect to RDF4J server
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        # Create an in-memory repository
        config = RepositoryConfig(
            repo_id="my-repo",
            title="My Repository",
            impl=SailRepositoryConfig(sail_impl=MemoryStoreConfig(persist=False))
        )
        repo = await db.create_repository(config=config)

        # Add some data
        await repo.add_statement(
            IRI("http://example.com/person/alice"),
            IRI("http://xmlns.com/foaf/0.1/name"),
            Literal("Alice")
        )

        # Query the data
        results = await repo.query("SELECT * WHERE { ?s ?p ?o }")
        for result in results:
            print(f"Subject: {result['s']}, Predicate: {result['p']}, Object: {result['o']}")

if __name__ == "__main__":
    asyncio.run(main())
```
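One practical consequence of an async-first API is that independent queries can be awaited concurrently rather than one after another. The following is a minimal sketch of that pattern using only `asyncio`. The `fake_query` coroutine is a hypothetical stand-in for `repo.query` so the snippet runs without a server; whether concurrent calls on a single repository object behave well in practice depends on the client and server, so treat this as an illustration of the pattern rather than a guarantee.

```python
import asyncio


async def fake_query(sparql: str) -> list[dict[str, str]]:
    """Hypothetical stand-in for `repo.query`; sleeps instead of calling a server."""
    await asyncio.sleep(0.01)
    return [{"query": sparql}]


async def run_all(queries: list[str]) -> list[list[dict[str, str]]]:
    # gather() schedules all coroutines at once and returns results in input order
    return await asyncio.gather(*(fake_query(q) for q in queries))


queries = [
    "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
    "ASK { ?s ?p ?o }",
]
all_results = asyncio.run(run_all(queries))
for query, rows in zip(queries, all_results):
    print(query, "->", len(rows), "row(s)")
```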

### SPARQL Query Builder

Build queries programmatically with method chaining instead of writing raw SPARQL strings:

```python
from rdf4j_python import select, ask, construct, describe, GraphPattern, Namespace

ex = Namespace("ex", "http://example.org/")
foaf = Namespace("foaf", "http://xmlns.com/foaf/0.1/")

# SELECT with typed terms: IRIs serialize automatically
query = (
    select("?person", "?name")
    .where("?person", "a", ex.Person)  # "a" is SPARQL shorthand for rdf:type
    .where("?person", foaf.name, "?name")
    .optional("?person", foaf.email, "?email")
    .filter("?name != 'Bob'")
    .order_by("?name")
    .limit(10)
    .build()
)

# Or use string-based prefixed names
query = (
    select("?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "a", "foaf:Person")
    .where("?person", "foaf:name", "?name")
    .build()
)

# GROUP BY with aggregation
query = (
    select("?city", "(COUNT(?person) AS ?count)")
    .where("?person", ex.city, "?city")
    .group_by("?city")
    .having("COUNT(?person) > 1")
    .order_by("DESC(?count)")
    .build()
)

# ASK, CONSTRUCT, and DESCRIBE
ask_query = ask().where("?s", ex.name, "?name").build()

construct_query = (
    construct(("?s", ex.fullName, "?name"))
    .where("?s", ex.firstName, "?fname")
    .where("?s", ex.lastName, "?lname")  # bind ?lname before it is used in CONCAT
    .bind("CONCAT(?fname, ' ', ?lname)", "?name")
    .build()
)

describe_query = describe(ex.alice).build()
```

The query builder supports FILTER, OPTIONAL, UNION, BIND, VALUES, sub-queries, DISTINCT, ORDER BY, GROUP BY, HAVING, LIMIT, and OFFSET. Both raw strings and typed objects (`IRI`, `Variable`, `Literal`, `Namespace`) work as terms.
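To make the method-chaining idea concrete, here is a toy builder written from scratch. It is not rdf4j-python's implementation, and the class name `ToySelect` and its output layout are assumptions made for illustration only. It shows the core mechanism: each method records some state and returns `self`, and `build()` assembles the final SPARQL string.

```python
from typing import Optional


class ToySelect:
    """Toy fluent SELECT builder (illustrative; not the library's code)."""

    def __init__(self, *variables: str) -> None:
        self._variables = variables
        self._patterns: list = []
        self._limit: Optional[int] = None

    def where(self, s: str, p: str, o: str) -> "ToySelect":
        self._patterns.append(f"{s} {p} {o} .")
        return self  # returning self is what makes chaining possible

    def limit(self, n: int) -> "ToySelect":
        self._limit = n
        return self

    def build(self) -> str:
        body = "\n".join(f"  {pattern}" for pattern in self._patterns)
        query = f"SELECT {' '.join(self._variables)}\nWHERE {{\n{body}\n}}"
        if self._limit is not None:
            query += f"\nLIMIT {self._limit}"
        return query


toy_query = (
    ToySelect("?person", "?name")
    .where("?person", "a", "<http://xmlns.com/foaf/0.1/Person>")
    .where("?person", "<http://xmlns.com/foaf/0.1/name>", "?name")
    .limit(10)
    .build()
)
print(toy_query)
```

The real builder additionally handles prefixes, typed terms, and the other clauses listed above; the string it emits may be formatted differently.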

### Working with Multiple Graphs

```python
from rdf4j_python import AsyncRdf4j
from rdf4j_python.model.term import IRI, Literal, Quad

async def multi_graph_example():
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        repo = await db.get_repository("my-repo")

        # Add data to specific graphs
        statements = [
            Quad(
                IRI("http://example.com/person/bob"),
                IRI("http://xmlns.com/foaf/0.1/name"),
                Literal("Bob"),
                IRI("http://example.com/graph/people")
            ),
            Quad(
                IRI("http://example.com/person/bob"),
                IRI("http://xmlns.com/foaf/0.1/age"),
                Literal("30", datatype=IRI("http://www.w3.org/2001/XMLSchema#integer")),
                IRI("http://example.com/graph/demographics")
            )
        ]
        await repo.add_statements(statements)

        # Query a specific graph
        graph_query = """
        SELECT * WHERE {
            GRAPH <http://example.com/graph/people> {
                ?person ?property ?value
            }
        }
        """
        results = await repo.query(graph_query)
```

### Advanced Repository Configuration

Here's a more comprehensive example showing repository creation with different configurations:

```python
from rdf4j_python import Namespace, select

async def advanced_example():
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        # Memory store with persistence
        persistent_config = RepositoryConfig(
            repo_id="persistent-repo",
            title="Persistent Memory Store",
            impl=SailRepositoryConfig(sail_impl=MemoryStoreConfig(persist=True))
        )

        # Create and populate repository
        repo = await db.create_repository(config=persistent_config)

        # Bulk data operations
        data = [
            (IRI("http://example.com/alice"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Alice")),
            (IRI("http://example.com/alice"), IRI("http://xmlns.com/foaf/0.1/email"), Literal("alice@example.com")),
            (IRI("http://example.com/bob"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Bob")),
        ]

        statements = [
            Quad(subj, pred, obj, IRI("http://example.com/default"))
            for subj, pred, obj in data
        ]
        await repo.add_statements(statements)

        # Query with the fluent query builder
        foaf = Namespace("foaf", "http://xmlns.com/foaf/0.1/")
        query = (
            select("?name", "?email")
            .where("?person", foaf.name, "?name")
            .optional("?person", foaf.email, "?email")
            .order_by("?name")
            .build()
        )
        results = await repo.query(query)
```

### Uploading RDF Files

```python
import pyoxigraph as og
from rdf4j_python import AsyncRdf4j
from rdf4j_python.model.term import IRI

async def upload_example():
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        repo = await db.get_repository("my-repo")

        # Upload a Turtle file (format auto-detected from extension)
        await repo.upload_file("data.ttl")

        # Upload to a specific named graph
        await repo.upload_file("data.ttl", context=IRI("http://example.com/graph"))

        # Upload with explicit format
        await repo.upload_file("data.txt", rdf_format=og.RdfFormat.N_TRIPLES)

        # Upload with base URI for relative URIs
        await repo.upload_file("data.ttl", base_uri="http://example.com/")
```
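Extension-based format detection can be pictured as a lookup from file suffix to media type. The sketch below is an assumption about how such detection might work, not the library's actual code; the function name `guess_media_type` is hypothetical, and the suffix table covers only the common extension for each supported serialization. It also shows why a file like `data.txt` needs an explicit `rdf_format`: its suffix says nothing about the RDF syntax inside.

```python
from pathlib import Path

# Registered media types for the serializations listed under Features
RDF_MEDIA_TYPES = {
    ".ttl": "text/turtle",
    ".nt": "application/n-triples",
    ".nq": "application/n-quads",
    ".rdf": "application/rdf+xml",
    ".jsonld": "application/ld+json",
    ".trig": "application/trig",
    ".n3": "text/n3",
}


def guess_media_type(filename: str) -> str:
    """Map a file suffix to an RDF media type, or raise if it is ambiguous."""
    suffix = Path(filename).suffix.lower()
    try:
        return RDF_MEDIA_TYPES[suffix]
    except KeyError:
        raise ValueError(
            f"Cannot infer RDF format from {filename!r}; specify it explicitly"
        ) from None


print(guess_media_type("data.ttl"))
print(guess_media_type("export.NQ"))
```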

### Using Transactions

```python
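from rdf4j_python import AsyncRdf4j, IsolationLevel
from rdf4j_python.model.term import IRI, Literal, Quad

async def transaction_example():
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        repo = await db.get_repository("my-repo")

        # Illustrative statement assumed to already exist in the repository
        old_quad = Quad(
            IRI("http://example.com/carol"),
            IRI("http://xmlns.com/foaf/0.1/name"),
            Literal("Carol"),
        )

        # Atomic operations with auto-commit/rollback
        async with repo.transaction() as txn:
            await txn.add_statements([
                Quad(IRI("http://example.com/alice"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Alice")),
                Quad(IRI("http://example.com/bob"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Bob")),
            ])
            await txn.delete_statements([old_quad])
            # Commits automatically on success, rolls back on exception

        # With specific isolation level
        # <http://example.com/status> is an illustrative predicate; substitute your own
        async with repo.transaction(IsolationLevel.SERIALIZABLE) as txn:
            await txn.update("""
                DELETE { ?s <http://example.com/status> "draft" }
                INSERT { ?s <http://example.com/status> "published" }
                WHERE { ?s <http://example.com/status> "draft" }
            """)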
```
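The "commit on success, roll back on exception" behavior follows a common async context manager pattern. The sketch below reproduces that pattern with an in-memory list so it runs without a server. `ToyStore` is a hypothetical class written for this illustration and says nothing about how rdf4j-python implements transactions internally.

```python
import asyncio
from contextlib import asynccontextmanager


class ToyStore:
    """In-memory stand-in for a repository (illustrative only)."""

    def __init__(self) -> None:
        self.committed: list = []

    @asynccontextmanager
    async def transaction(self):
        pending: list = []  # staged writes, not visible until commit
        try:
            yield pending
        except Exception:
            pending.clear()  # rollback: discard staged writes
            raise
        else:
            self.committed.extend(pending)  # commit: publish staged writes


async def demo() -> list:
    store = ToyStore()

    # Succeeds, so the staged write is committed on exit
    async with store.transaction() as txn:
        txn.append("alice")

    # Fails midway, so the staged write is discarded
    try:
        async with store.transaction() as txn:
            txn.append("bob")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    return store.committed


committed = asyncio.run(demo())
print(committed)
```

Only the write from the successful block survives; the exception from the failed block still propagates to the caller, which is why the sketch catches it explicitly.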

For more detailed examples, see the [examples](examples/) directory.

## Development

### Setting up Development Environment

1. **Clone the repository**:
   ```bash
   git clone https://github.com/odysa/rdf4j-python.git
   cd rdf4j-python
   ```

2. **Install development dependencies**:
   ```bash
   uv sync --group dev
   ```

3. **Start RDF4J Server** (for integration tests):
   ```bash
   # Using Docker
   docker run -p 19780:8080 eclipse/rdf4j:latest
   ```

4. **Run tests**:
   ```bash
   pytest tests/
   ```

5. **Run linting and formatting**:
   ```bash
   ruff check .
   ruff format .
   ```

### Project Structure

```
rdf4j_python/
├── _driver/ # Core async driver implementation
├── model/ # Data models and configurations
├── query/ # SPARQL query builder
├── exception/ # Custom exceptions
└── utils/ # Utility functions

examples/ # Usage examples
tests/ # Test suite
docs/ # Documentation
```

## Contributing

We welcome contributions! Here's how to get involved:

1. Fork the repository on GitHub
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes and add tests
4. Run the test suite to ensure everything works
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to your branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

### Running Examples

```bash
# Make sure RDF4J server is running on localhost:19780
python examples/complete_workflow.py
python examples/query.py
```

## License

This project is licensed under the BSD 3-Clause License. See the [LICENSE](LICENSE) file for details.

Copyright (c) 2025, Chengxu Bian

## Support

- **Issues & Bug Reports**: [GitHub Issues](https://github.com/odysa/rdf4j-python/issues)
- **Documentation**: [docs/](https://github.com/odysa/rdf4j-python/tree/main/docs)
- **Questions**: Feel free to open a discussion or issue

If you find this project useful, please consider starring the repository!