{"id":13838290,"url":"https://github.com/SciPhi-AI/synthesizer","last_synced_at":"2025-07-10T21:32:17.010Z","repository":{"id":195458518,"uuid":"692221879","full_name":"SciPhi-AI/synthesizer","owner":"SciPhi-AI","description":"A multi-purpose LLM framework for RAG and data creation.","archived":true,"fork":false,"pushed_at":"2024-01-13T05:03:59.000Z","size":33040,"stargazers_count":590,"open_issues_count":0,"forks_count":48,"subscribers_count":11,"default_branch":"main","last_synced_at":"2024-05-19T18:20:02.134Z","etag":null,"topics":["agents","ai","artificial-intelligence","machine-learning","synthetic-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SciPhi-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-09-15T21:01:46.000Z","updated_at":"2024-05-15T20:20:15.000Z","dependencies_parsed_at":"2023-12-21T19:10:46.716Z","dependency_job_id":"f483268e-fb22-4c6c-a5ef-54aafd1b78c2","html_url":"https://github.com/SciPhi-AI/synthesizer","commit_stats":null,"previous_names":["emrgnt-cmplxty/sciphi","sciphi-ai/sciphi","sciphi-ai/synthesizer"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SciPhi-AI%2Fsynthesizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SciPhi-AI%2Fsynthesizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SciPhi-AI%2Fsynthesizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SciPhi-AI%2Fsynthe
sizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SciPhi-AI","download_url":"https://codeload.github.com/SciPhi-AI/synthesizer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225657348,"owners_count":17503548,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","artificial-intelligence","machine-learning","synthetic-data"],"created_at":"2024-08-04T15:01:48.671Z","updated_at":"2024-11-21T01:30:50.868Z","avatar_url":"https://github.com/SciPhi-AI.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Synthesizer[ΨΦ]: A multi-purpose LLM framework 💡\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"716\" alt=\"SciPhi Logo\" src=\"https://github.com/emrgnt-cmplxty/sciphi/assets/68796651/195367d8-54fd-4281-ace0-87ea8523f982\"\u003e\n\u003c/p\u003e\n\nWith Synthesizer, users can:\n\n- **Custom Data Creation**: Generate datasets via LLMs that are tailored to your needs, for LLM training, RAG, and more.\n   - Works with Anthropic, OpenAI, vLLM, and HuggingFace.\n- **Retrieval-Augmented Generation (RAG) on Demand**: Built-in RAG Provider Interface to anchor generated data to real-world sources.\n   - Turnkey integration with Agent Search API.\n\n---\n\n## Fast Setup\n\n```bash\npip install sciphi-synthesizer\n```\n\n### Using Synthesizer\n\n1. 
**Generate synthetic question-answer pairs**\n\n   ```bash\n   export SCIPHI_API_KEY=MY_SCIPHI_API_KEY\n   python -m synthesizer.scripts.data_augmenter run --dataset=\"wiki_qa\"\n   ```\n\n   ```bash\n   tail augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl\n   { \"formatted_prompt\": \"... ### Question:\\nwhat country did wine originate in\\n\\n### Input:\\n1. URL: https://en.wikipedia.org/wiki/History%20of%20wine (Score: 0.85)\\nTitle:History of wine....\",\n   { \"completion\": \"Wine originated in the South Caucasus, which is now part of modern-day Armenia ...\"\n   ```\n\n2. **Evaluate RAG pipeline performance**\n\n   ```bash\n   export SCIPHI_API_KEY=MY_SCIPHI_API_KEY\n   python -m synthesizer.scripts.rag_harness --rag_provider=\"agent-search\" --llm_provider_name=\"sciphi\" --n_samples=25\n   ```\n\n### Documentation\n\nFor more detailed information, tutorials, and API references, please visit the official [Synthesizer Documentation](https://sciphi.readthedocs.io/en/latest/).\n\n### Community \u0026 Support\n\n- Engage with our vibrant community on [Discord](https://discord.gg/j9GxfbxqAe).\n- For tailored inquiries or feedback, please [email us](mailto:owen@sciphi.ai).\n\n### Developing with Synthesizer\n\nQuickly set up retrieval-augmented generation (RAG) with your choice of LLM provider, such as OpenAI, Anthropic, vLLM, or SciPhi:\n\n```python\n# Requires SCIPHI_API_KEY in env\n# query, raw_prompt, and the rag_* / llm_* values below are placeholders; define them before running\n\nfrom synthesizer.core import LLMProviderName, RAGProviderName\nfrom synthesizer.interface import LLMInterfaceManager, RAGInterfaceManager\nfrom synthesizer.llm import GenerationConfig\n\n# RAG Provider Settings\nrag_interface = RAGInterfaceManager.get_interface_from_args(\n    RAGProviderName(\"agent-search\"),\n    limit_hierarchical_url_results=rag_limit_hierarchical_url_results,\n    limit_final_pagerank_results=rag_limit_final_pagerank_results,\n)\nrag_context = rag_interface.get_rag_context(query)\n\n# LLM Provider Settings\nllm_interface = 
LLMInterfaceManager.get_interface_from_args(\n    LLMProviderName(\"openai\"),\n)\n\ngeneration_config = GenerationConfig(\n    model_name=llm_model_name,\n    max_tokens_to_sample=llm_max_tokens_to_sample,\n    temperature=llm_temperature,\n    top_p=llm_top_p,\n    # other generation params here ...\n)\n\nformatted_prompt = raw_prompt.format(rag_context=rag_context)\ncompletion = llm_interface.get_completion(formatted_prompt, generation_config)\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSciPhi-AI%2Fsynthesizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSciPhi-AI%2Fsynthesizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSciPhi-AI%2Fsynthesizer/lists"}