https://github.com/SciPhi-AI/synthesizer

A multi-purpose LLM framework for RAG and data creation.

Host: GitHub
URL: https://github.com/SciPhi-AI/synthesizer
Owner: SciPhi-AI
License: apache-2.0
Archived: true
Created: 2023-09-15T21:01:46.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-01-13T05:03:59.000Z (over 1 year ago)
Last Synced: 2024-05-19T18:20:02.134Z (12 months ago)
Topics: agents, ai, artificial-intelligence, machine-learning, synthetic-data
Language: Python
Homepage:
Size: 31.5 MB
Stars: 590
Watchers: 11
Forks: 48
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff

README

# Synthesizer[ΨΦ]: A multi-purpose LLM framework 💡

With Synthesizer, users can:

- **Custom Data Creation**: Generate datasets via LLMs that are tailored to your needs, for LLM training, RAG, and more.
  - Supported LLM providers include Anthropic, OpenAI, vLLM, and HuggingFace.
- **Retrieval-Augmented Generation (RAG) on Demand**: Built-in RAG Provider Interface to anchor generated data to real-world sources.
  - Turnkey integration with the Agent Search API.

---

## Fast Setup

```bash
pip install sciphi-synthesizer
```

### Using Synthesizer

1. **Generate synthetic question-answer pairs**

```bash
export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
python -m synthesizer.scripts.data_augmenter run --dataset="wiki_qa"
```

```bash
tail augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl
{ "formatted_prompt": "... ### Question:\nwhat country did wine originate in\n\n### Input:\n1. URL: https://en.wikipedia.org/wiki/History%20of%20wine (Score: 0.85)\nTitle:History of wine....",
  "completion": "Wine originated in the South Caucasus, which is now part of modern-day Armenia ..." }
```
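Judging by the sample above, each line of the output file is a JSON object with `formatted_prompt` and `completion` fields. A minimal sketch for loading those pairs in Python, assuming that layout; `load_augmented_pairs` is a hypothetical helper, not part of Synthesizer:

```python
import json
from pathlib import Path


def load_augmented_pairs(path):
    """Read a JSONL file and return a list of (prompt, completion) tuples.

    Assumes each non-empty line is a JSON object with "formatted_prompt"
    and "completion" keys, as in the sample output above.
    """
    pairs = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            try:
                pairs.append((record["formatted_prompt"], record["completion"]))
            except KeyError as exc:
                # Report which line is missing an expected field
                raise ValueError(f"line {line_no}: missing key {exc}") from exc
    return pairs
```

For example, `load_augmented_pairs("augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl")` would return the prompt/completion pairs ready for fine-tuning or inspection.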

2. **Evaluate RAG pipeline performance**

```bash
export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
python -m synthesizer.scripts.rag_harness --rag_provider="agent-search" --llm_provider_name="sciphi" --n_samples=25
```
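To run the harness over several configurations from Python (for example, different sample sizes), one option is to assemble the same command line shown above and hand it to `subprocess`. A minimal sketch that assumes only the three flags used in the example; `build_harness_cmd` and `run_harness` are hypothetical helpers, and nothing is assumed about the harness's output format:

```python
import os
import subprocess
import sys


def build_harness_cmd(rag_provider="agent-search", llm_provider_name="sciphi", n_samples=25):
    """Build the argv list for the rag_harness module using the flags shown above."""
    return [
        sys.executable, "-m", "synthesizer.scripts.rag_harness",
        f"--rag_provider={rag_provider}",
        f"--llm_provider_name={llm_provider_name}",
        f"--n_samples={n_samples}",
    ]


def run_harness(**kwargs):
    """Run the harness as a child process; raises CalledProcessError on a non-zero exit."""
    if "SCIPHI_API_KEY" not in os.environ:
        raise RuntimeError("SCIPHI_API_KEY is not set")
    return subprocess.run(build_harness_cmd(**kwargs), check=True)
```

Usage might look like `for n in (10, 25, 50): run_harness(n_samples=n)`. Building an argv list rather than a shell string avoids quoting problems.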

### Documentation

For more detailed information, tutorials, and API references, please visit the official [Synthesizer Documentation](https://sciphi.readthedocs.io/en/latest/).

### Community & Support

- Engage with our vibrant community on [Discord](https://discord.gg/j9GxfbxqAe).
- For tailored inquiries or feedback, please [email us](mailto:[email protected]).

### Developing with Synthesizer

Quickly set up retrieval-augmented generation (RAG) with your choice of LLM provider, from OpenAI, Anthropic, vLLM, and SciPhi:

```python
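# Requires SCIPHI_API_KEY in env for the agent-search RAG provider; the OpenAI
# LLM provider presumably also needs OpenAI credentials (e.g. OPENAI_API_KEY).

from synthesizer.core import LLMProviderName, RAGProviderName
from synthesizer.interface import LLMInterfaceManager, RAGInterfaceManager
from synthesizer.llm import GenerationConfig

# Illustrative inputs and settings (placeholder values, not library defaults)
query = "what country did wine originate in"
raw_prompt = "### Question:\nwhat country did wine originate in\n\n### Input:\n{rag_context}\n"
rag_limit_hierarchical_url_results = 50
rag_limit_final_pagerank_results = 20
llm_model_name = "gpt-3.5-turbo"  # example OpenAI model name
llm_max_tokens_to_sample = 256
llm_temperature = 0.1
llm_top_p = 0.95

# RAG Provider Settings
rag_interface = RAGInterfaceManager.get_interface_from_args(
    RAGProviderName("agent-search"),
    limit_hierarchical_url_results=rag_limit_hierarchical_url_results,
    limit_final_pagerank_results=rag_limit_final_pagerank_results,
)
rag_context = rag_interface.get_rag_context(query)

# LLM Provider Settings
llm_interface = LLMInterfaceManager.get_interface_from_args(
    LLMProviderName("openai"),
)

generation_config = GenerationConfig(
    model_name=llm_model_name,
    max_tokens_to_sample=llm_max_tokens_to_sample,
    temperature=llm_temperature,
    top_p=llm_top_p,
    # other generation params here ...
)

# Insert the retrieved context into the prompt template, then generate
formatted_prompt = raw_prompt.format(rag_context=rag_context)
completion = llm_interface.get_completion(formatted_prompt, generation_config)
print(completion)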
```
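The example above follows a retrieve-then-generate flow: fetch context for the query, splice it into a prompt template, then request a completion. The sketch below reproduces that flow with in-memory stand-ins so it runs without API keys or network access. `StubRAG`, `StubLLM`, and `rag_complete` are hypothetical, and the `get_rag_context`/`get_completion` signatures only mirror the calls used above; they are not Synthesizer's actual classes.

```python
from dataclasses import dataclass
from typing import Protocol


class RAGLike(Protocol):
    """Anything that can turn a query into a context string."""

    def get_rag_context(self, query: str) -> str: ...


class LLMLike(Protocol):
    """Anything that can complete a prompt under some generation config."""

    def get_completion(self, prompt: str, config: object) -> str: ...


@dataclass
class StubRAG:
    """Returns canned context instead of calling a search backend."""

    context: str

    def get_rag_context(self, query: str) -> str:
        return f"{self.context} (retrieved for: {query})"


class StubLLM:
    """Echoes the last line of the prompt instead of calling a model."""

    def get_completion(self, prompt: str, config: object) -> str:
        return "ANSWER: " + prompt.splitlines()[-1]


def rag_complete(query: str, raw_prompt: str, rag: RAGLike, llm: LLMLike, config: object = None) -> str:
    # 1. Retrieve context for the query.
    rag_context = rag.get_rag_context(query)
    # 2. Fill the template; unused keyword arguments are ignored by str.format,
    #    and literal braces in raw_prompt must be doubled ("{{" and "}}").
    formatted_prompt = raw_prompt.format(rag_context=rag_context, query=query)
    # 3. Ask the LLM for a completion.
    return llm.get_completion(formatted_prompt, config)
```

For instance, `rag_complete("what country did wine originate in", "Context:\n{rag_context}\n\nQuestion: {query}", StubRAG("Wine history ..."), StubLLM())` returns `"ANSWER: Question: what country did wine originate in"`. Swapping the stubs for real provider interfaces leaves the orchestration unchanged, which is the main point of keeping retrieval and generation behind small interfaces.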
