https://github.com/ai-mindset/texttograph

Transforms text files into interactive relationship graphs by extracting character relationships using LLMs
https://github.com/ai-mindset/texttograph

graphs langchain llms ollama sqlite

Last synced: 11 months ago
JSON representation

Transforms text files into interactive relationship graphs by extracting character relationships using LLMs

Host: GitHub
URL: https://github.com/ai-mindset/texttograph
Owner: ai-mindset
License: other
Created: 2025-02-21T10:28:52.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-24T19:38:08.000Z (over 1 year ago)
Last Synced: 2025-04-14T15:52:54.838Z (about 1 year ago)
Topics: graphs, langchain, llms, ollama, sqlite
Language: Python
Homepage:
Size: 2.47 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# TextToGraph

A Python-based system that transforms text files into interactive relationship graphs by extracting character relationships using LLMs, storing data in SQLite, and generating network visualisations.

## Overview

TextToGraph analyses text documents containing character information, extracts relationships between characters using Large Language Models (LLMs), and creates a knowledge graph database of characters and their relationships. It then visualises these relationships as an animated network graph.

## Features

- Text processing with chunk-based analysis
- Character and relationship extraction using Ollama LLMs
- Persistent storage using SQLite
- Advanced network visualisation with:
- Colourblind-friendly design
- Interactive animation
- Optimised node layout
- Relationship strength indicators

## Requirements

- Python 3.10+
- Ollama server running locally (with mistral-small and nomic-embed-text models)
- Required Python packages:
- langchain
- langchain_ollama
- langchain_text_splitters
- matplotlib
- networkx
- numpy
- pydantic

## Project Structure

- `constants.py`: Configuration parameters for the application
- `logger.py`: Logging utilities
- `main.py`: Core functionality for processing texts and building the knowledge graph
- `plot_graph.py`: Network visualisation tools
- `__init__.py`: Package initialisation

## Usage

### 1. Set up the directory structure

```
├── data/ # Database storage
├── plots/ # Generated visualisations
├── text/ # Input text files (.txt or .md)
└── texttograph/ # The code module
```

### 2. Prepare your text files

Place character description text files in the `text/` directory. Each file should contain information about a character and their relationships with other characters. The filename (without extension) will be used as the character ID.

### 3. Start the Ollama server

Ensure the Ollama server is running locally with the required models:

```bash
ollama serve
```

### 4. Run the main processing script

```bash
python -m texttograph.main
```

This will:
- Read all text files from the `text/` directory
- Process and chunk the content
- Generate embeddings
- Extract character relationships using LLMs
- Store all data in the SQLite database

### 5. Generate the visualisation

```bash
python -m texttograph.plot_graph
```

This will create an animated network graph of character relationships and save it to the path specified in `constants.py` (default: `plots/character_graph.mp4`).

## Configuration

Edit `constants.py` to configure:

- `CHUNK_SIZE`: Size of text chunks for processing (default: 1000)
- `CHUNK_OVERLAP`: Overlap between chunks (default: 20% of chunk size)
- `PLOT_DIRECTORY`: Where visualisations are saved (default: "plots")
- `TEXT_DIRECTORY`: Where input text files are located (default: "text")
- `DB_DIRECTORY`: Database storage location (default: "data")
- `MODEL`: Ollama model to use (default: "mistral-small:24b-instruct-2501-q4_K_M")
- `LOG_LEVEL`: Logging verbosity (default: "INFO")

## How It Works

1. **Text Processing**:
- Reads text files from the `text/` directory
- Splits content into overlapping chunks
- Generates embeddings for each chunk

2. **Relationship Extraction**:
- Uses LLMs to identify characters, traits, and relationships in the text
- Parses the output into structured `Character` and `Relationship` objects

3. **Database Storage**:
- Initialises an SQLite database with tables for documents, nodes, and edges
- Stores the original content, chunks, embeddings, characters, and relationships

4. **Network Visualisation**:
- Creates a directed graph using NetworkX
- Assigns visually distinct attributes based on character grouping
- Generates an animation showing the network formation

## Advanced Usage

### Custom Database Path

```bash
python -m texttograph.main --db path/to/database.db
```

### Custom Text Directory

```bash
python -m texttograph.main --texts path/to/texts
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ai-mindset/texttograph

Awesome Lists containing this project

README