
# hist-pred-extractor
[![PyPI version](https://badge.fury.io/py/hist-pred-extractor.svg)](https://badge.fury.io/py/hist-pred-extractor)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/hist-pred-extractor)](https://pepy.tech/project/hist-pred-extractor)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-blue)](https://www.linkedin.com/in/eugene-evstafev-716669181/)

**hist-pred-extractor** is a tiny Python package that extracts and structures historical predictions (foresights, prophecies, or prognostications) from free‑form text.
Given a passage about a historical figure or event, the tool uses an LLM to locate statements that contain a prediction, validates them against a strict regex pattern, and returns a clean, structured list of the extracted predictions together with their context.

The package is useful for historians, researchers, educators, and anyone who wants to systematically analyse historical foresights.

---

## Features

- **One‑function API** – just pass a string (and optionally a custom LLM or API key).
- **LLM‑agnostic** – defaults to `ChatLLM7` (via `langchain_llm7`) but works with any LangChain chat model.
- **Regex‑validated output** – ensures extracted data follows the pattern defined in `prompts.pattern`.
- **Zero‑configuration** – works out of the box with the free tier of LLM7.
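The regex-validation step can be pictured as follows. The real pattern lives in `prompts.pattern`; the pattern and helper below are illustrative stand-ins only, not the package's actual implementation:

```python
import re
from typing import List

# Illustrative stand-in -- the package defines its own pattern in prompts.pattern.
PREDICTION_PATTERN = re.compile(r'"([^"]+)"')

def validate_candidates(candidates: List[str]) -> List[str]:
    """Keep only candidate statements that match the expected pattern."""
    validated = []
    for candidate in candidates:
        match = PREDICTION_PATTERN.search(candidate)
        if match:
            # Extract just the quoted prediction text.
            validated.append(match.group(1))
    return validated

print(validate_candidates([
    'Herschel wrote: "The continents will drift apart."',
    "No quoted prediction here.",
]))
```

Statements the LLM returns that do not match the pattern are silently dropped, which is why the function's output is always a clean list of strings.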

---

## Installation

```bash
pip install hist-pred-extractor
```

---

## Quick Start

```python
from hist_pred_extractor import hist_pred_extractor

# Simple call – uses the default ChatLLM7 and the LLM7_API_KEY environment variable
text = """
In 1846, the astronomer John Herschel wrote: "In the next fifty years, the continents will drift apart,
forming the Atlantic as we know it today." This prediction was later confirmed by plate tectonics.
"""
predictions = hist_pred_extractor(user_input=text)

print(predictions)
# Example output:
# [
# "In the next fifty years, the continents will drift apart, forming the Atlantic as we know it today."
# ]
```

---

## API Reference

### `hist_pred_extractor(user_input, llm=None, api_key=None) → List[str]`

| Parameter | Type | Description |
|-----------|------|-------------|
| **user_input** | `str` | The raw text containing historical statements to be analysed. |
| **llm** | `Optional[BaseChatModel]` | A LangChain chat model instance. If omitted, the function creates a `ChatLLM7` instance automatically. |
| **api_key** | `Optional[str]` | API key for LLM7. If not supplied, the function reads the `LLM7_API_KEY` environment variable, falling back to the default free‑tier key. |

**Returns:** `List[str]` – a list of extracted prediction strings that match the internal regex pattern.
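The `api_key` resolution order described above (explicit argument, then environment variable, then the free-tier default) can be sketched like this; the names below are illustrative, not the package's internal ones:

```python
import os
from typing import Optional

# Stand-in for the package's built-in free-tier fallback key.
DEFAULT_FREE_TIER_KEY = "unused"

def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Resolve the LLM7 key: explicit argument wins, then LLM7_API_KEY, then free tier."""
    if api_key:
        return api_key
    return os.environ.get("LLM7_API_KEY", DEFAULT_FREE_TIER_KEY)
```

Passing `llm=` skips this resolution entirely, since the supplied chat model carries its own credentials.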

---

## Using a Custom LLM

You can supply any LangChain‑compatible chat model, e.g. OpenAI, Anthropic, or Google Gemini.

### OpenAI

```python
from langchain_openai import ChatOpenAI
from hist_pred_extractor import hist_pred_extractor

llm = ChatOpenAI(model="gpt-4o-mini")
predictions = hist_pred_extractor(user_input="...", llm=llm)
```

### Anthropic

```python
from langchain_anthropic import ChatAnthropic
from hist_pred_extractor import hist_pred_extractor

llm = ChatAnthropic(model="claude-3-5-sonnet-latest")  # model is required
predictions = hist_pred_extractor(user_input="...", llm=llm)
```

### Google Gemini

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from hist_pred_extractor import hist_pred_extractor

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
predictions = hist_pred_extractor(user_input="...", llm=llm)
```

---

## API Key & Rate Limits

- The default free tier of **LLM7** provides enough calls for typical research tasks.
- For higher throughput, set your own key:

```bash
export LLM7_API_KEY="your_api_key_here"
```

or pass it directly:

```python
predictions = hist_pred_extractor(user_input="...", api_key="your_api_key_here")
```

You can obtain a free API key by registering at **https://token.llm7.io/**.

---

## Contributing & Support

- **Issue Tracker:** feel free to open a [GitHub issue](https://github.com/chigwell/hist-pred-extractor/issues) for bugs, feature requests, or questions.

---

## License

This project is licensed under the MIT License.

---

## Author

**Eugene Evstafev** –
GitHub: [chigwell](https://github.com/chigwell)

---

## Acknowledgements

- **ChatLLM7** from the `langchain_llm7` package.
- **LangChain** framework for unified LLM access.