{"id":24406021,"url":"https://github.com/bespokelabsai/curator","last_synced_at":"2026-01-16T12:39:24.309Z","repository":{"id":261718346,"uuid":"879473096","full_name":"bespokelabsai/curator","owner":"bespokelabsai","description":"Synthetic data curation for post-training and structured data extraction","archived":false,"fork":false,"pushed_at":"2026-01-03T06:07:57.000Z","size":65648,"stargazers_count":1593,"open_issues_count":68,"forks_count":128,"subscribers_count":11,"default_branch":"main","last_synced_at":"2026-01-03T07:31:57.178Z","etag":null,"topics":["agents","deep-learning","fine-tuning","instruction-tuning","llm","machine-learning","natural-language-processing","prompt","python","synthetic-data","synthetic-dataset-generation"],"latest_commit_sha":null,"homepage":"https://docs.bespokelabs.ai/bespoke-curator","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bespokelabsai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-10-28T01:11:41.000Z","updated_at":"2026-01-02T04:27:59.000Z","dependencies_parsed_at":"2024-11-08T03:26:49.541Z","dependency_job_id":"f8075f51-2355-4447-b44e-f8c04ed4df39","html_url":"https://github.com/bespokelabsai/curator","commit_stats":null,"previous_names":["bespokelabsai/curator"],"tags_count":26,"template":false,"template_full_name":null,"purl":"pkg:github/bespokelabsai/curator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bespokelabsai%2Fcurator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/bespokelabsai%2Fcurator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bespokelabsai%2Fcurator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bespokelabsai%2Fcurator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bespokelabsai","download_url":"https://codeload.github.com/bespokelabsai/curator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bespokelabsai%2Fcurator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","deep-learning","fine-tuning","instruction-tuning","llm","machine-learning","natural-language-processing","prompt","python","synthetic-data","synthetic-dataset-generation"],"created_at":"2025-01-20T05:01:05.949Z","updated_at":"2026-01-16T12:39:24.294Z","avatar_url":"https://github.com/bespokelabsai.png","language":"Python","funding_links":[],"categories":["Python","数据 Data","7. 
Training \u0026 Fine-tuning Ecosystem","Tools","Data Processing \u0026 ETL Agents"],"sub_categories":["Synthetic Data","NL AI Frameworks"],"readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://bespokelabs.ai/\" target=\"_blank\"\u003e\n    \u003cpicture\u003e\n      \u003csource media=\"(prefers-color-scheme: light)\" width=\"100px\" srcset=\"https://github.com/bespokelabsai/curator/blob/main/docs/Bespoke-Labs-Logomark-Red-crop.png\"\u003e\n      \u003cimg alt=\"Bespoke Labs Logo\" width=\"100px\" src=\"https://github.com/bespokelabsai/curator/blob/main/docs/Bespoke-Labs-Logomark-Red-crop.png\"\u003e\n    \u003c/picture\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eBespoke Curator\u003c/h1\u003e\n\u003ch3 align=\"center\" style=\"font-size: 20px; margin-bottom: 4px\"\u003eBulk Inference and Scalable Data Curation for Post-Training\u003c/h3\u003e\n\u003cbr/\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![Github](https://img.shields.io/badge/Curator-000000?style=for-the-badge\u0026logo=github\u0026logoColor=000\u0026logoColor=white)](https://github.com/bespokelabsai/curator/) [![Twitter](https://img.shields.io/badge/@BespokeLabsai-white?style=for-the-badge\u0026logo=X\u0026logoColor=white\u0026color=000)](https://x.com/bespokelabsai) [![Hugging Face](https://img.shields.io/badge/BespokeLabs-fcd022?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000\u0026labelColor)](https://huggingface.co/bespokelabs) [![Discord](https://img.shields.io/badge/Bespoke_Labs-5865F2?style=for-the-badge\u0026logo=discord\u0026logoColor=white)](https://discord.gg/KqpXvpzVBS) \n\u003cbr\u003e\n[![Docs](https://img.shields.io/badge/Docs-docs.bespokelabs.ai-blue?style=for-the-badge\u0026link=https%3A%2F%2Fdocs.bespokelabs.ai\u0026labelColor=000)](https://docs.bespokelabs.ai/bespoke-curator/getting-started) 
[![Website](https://img.shields.io/badge/Site-bespokelabs.ai-blue?style=for-the-badge\u0026link=https%3A%2F%2Fbespokelabs.ai\u0026labelColor=000)](https://bespokelabs.ai/) [![PyPI](https://img.shields.io/pypi/v/bespokelabs-curator?style=for-the-badge\u0026labelColor=000)](https://pypi.org/project/bespokelabs-curator/)\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n[ English | \u003ca href=\"docs/README_zh.md\"\u003e中文\u003c/a\u003e ]\n\u003c/div\u003e\n\n## 🎉 What's New \n* **[2025.04.09]** [Launching Reasoning Datasets Competition](https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition) with HuggingFace and Together.ai. Win $5000 USD worth of prizes!\n* **[2025-04-03]** We used Bespoke Curator to create [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset, which was used to train OpenThinker2-32B that outperforms DeepSeek-R1-32B. The dataset started trending on HuggingFace.\n* **[2025.03.12]** [Gemini Batch support added](https://www.bespokelabs.ai/blog/effortless-gemini-batch-processing-with-curator): Gemini batch API is extremely challenging, and we made it much simpler! :)\n* **[2025.03.05]** [Claude 3.7 Sonnet Thinking and batch mode support added](https://www.bespokelabs.ai/blog/claude-3-7-sonnet-thinking-mode-in-curator).\n* **[2025.02.26]** [Code Execution Support added](https://www.bespokelabs.ai/blog/launching-code-executor): You can now run code (generated by Curator) using CodeExecutor. We support four backends: local (called multiprocessing), Ray, Docker and e2b.\n* **[2025.02.06]** We used Bespoke Curator to create [s1K-1.1]( https://huggingface.co/datasets/simplescaling/s1K-1.1), a high-quality sample-efficient reasoning dataset.\n* **[2025.01.30]** [Batch Processing Support for OpenAI, Anthropic, and other compatible APIs](https://www.bespokelabs.ai/blog/batch-processing-with-curator): Cut Token Costs in Half 🔥🔥🔥. 
Through our partnership with kluster.ai, new users using Curator can access open-source models like DeepSeek-R1 and receive a **$25 credit** (limits apply). EDIT: Promotion has come to an end.\n* **[2025.01.27]** We used Bespoke Curator to create [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k), a high-quality reasoning dataset (trending on HuggingFace).\n* **[2025.01.22]** We used Bespoke Curator to create [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k), a high-quality reasoning dataset (trending on HuggingFace).\n* **[2025.01.15]** Curator launched 🎉\n\n## Overview\n\nBespoke Curator makes it easy to create synthetic data pipelines. Whether you are training a model or extracting structured data, Curator will prepare high-quality data quickly and robustly.\n\n* Rich Python based library for generating and curating synthetic data.\n* Viewer to monitor data while it is being generated.\n* First class support for structured outputs.\n* Built-in performance optimizations for asynchronous operations, caching, and fault recovery at every scale.\n* Support for a wide range of inference options via LiteLLM, vLLM, and popular batch APIs.\n\n![CLI in action](https://github.com/bespokelabsai/curator/blob/main/docs/curator-cli.gif)\n\nCheck out our full documentation for [getting started](https://docs.bespokelabs.ai/bespoke-curator/getting-started), [tutorials](https://docs.bespokelabs.ai/bespoke-curator/tutorials), [guides](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides) and detailed [reference](https://docs.bespokelabs.ai/bespoke-curator/api-reference/llm-api-documentation).\n\n## 🛠️ Installation\n\n```bash\npip install bespokelabs-curator\n```\n## 📕 Examples\n\n### Finetuning/Distillation\n| **Task** | **Link(s)** | **Goal** |\n|----------|--------------|-------------|\n| **Product feature extraction** | \u003ca target=\"_blank\" 
href=\"https://colab.research.google.com/drive/1YoA23-cBcWpaSErULzBI2bo2LPGo37GQ\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e | Finetuning a model to identify features of a product |\n| **Sentiment analysis** | \u003ca href=\"https://colab.research.google.com/drive/1Zfl3g7POsqqYQqkzXdyhYRSAymLhZugn?usp=sharing\" target=\"_blank\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e | Aspect-based sentiment analysis of restaurant reviews and finetuning using Together.ai |\n| **RAFT for domain-specific RAG** | \u003ca href=\"https://github.com/bespokelabsai/curator/tree/main/examples/blocks/raft\" target=\"_blank\"\u003eCode\u003c/a\u003e | Implement Retrieval Augmented Fine-Tuning (RAFT) that processes domain-specific documents, generates questions, and prepares data for fine-tuning LLMs. |\n\n### Data Generation\n| **Task** | **Link(s)** | **Goal** |\n|----------|--------------|-------------|\n| **Reasoning dataset generation (Bespoke Stratos)** | \u003ca href=\"https://github.com/bespokelabsai/curator/tree/main/examples/bespoke-stratos-data-generation\" target=\"_blank\"\u003eCode\u003c/a\u003e | Generate the Bespoke-Stratos-17k dataset, focusing on reasoning traces from math, coding, and problem-solving datasets. 
|\n| **Reasoning dataset generation (Open Thoughts)** | \u003ca href=\"https://github.com/open-thoughts/open-thoughts\" target=\"_blank\"\u003eCode\u003c/a\u003e | Generate the Open-Thoughts-114k dataset, focusing on reasoning traces from math, coding, and problem-solving datasets. |\n| **Multimodal** | \u003ca href=\"https://github.com/bespokelabsai/curator/tree/main/examples/multimodal\" target=\"_blank\"\u003eCode\u003c/a\u003e | Demonstrates multimodal capabilities by generating recipes from food images |\n| **Ungrounded Question Answer generation** | \u003ca href=\"https://github.com/bespokelabsai/curator/tree/main/examples/ungrounded-qa\" target=\"_blank\"\u003eCode\u003c/a\u003e | Generate diverse question-answer pairs using techniques similar to the CAMEL paper |\n| **Code Execution** | \u003ca href=\"https://colab.research.google.com/drive/1YKj1-BC66-3LgNkf1m5AEPswIYtpOU-k\" target=\"_blank\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e | Execute code generated with Curator |\n| **3Blue1Brown video generation** | \u003ca href=\"https://github.com/bespokelabsai/curator/tree/main/examples/code-execution/math-animation\" target=\"_blank\"\u003eCode\u003c/a\u003e | Generate videos similar to 3Blue1Brown and render them using code execution! |\n| **Synthetic charts** | \u003ca href=\"https://github.com/bespokelabsai/curator/blob/main/examples/code-execution/chart-generation/charts.py\" target=\"_blank\"\u003eCode\u003c/a\u003e | Generate charts synthetically. |\n| **Function calling** | \u003ca href=\"https://github.com/bespokelabsai/curator/tree/main/examples/function-calling\" target=\"_blank\"\u003eCode\u003c/a\u003e | Generate data for finetuning for function calling. 
|\n\n\n\n## 🚀 Quickstart\n\n### Using `curator.LLM` for Bulk Inference\n\n```python\nfrom typing import Dict, Literal\nfrom bespokelabs import curator\nfrom datasets import Dataset\nfrom pydantic import BaseModel, Field\n\nclass Sentiment(BaseModel):\n  sentiment: Literal[\"positive\", \"negative\", \"neutral\"] = Field(\n    description=\"Sentiment of the review\")\n\nclass SentimentAnalyzer(curator.LLM):\n\n  def prompt(self, product: Dict):\n    return f\"Determine the sentiment of the product from the review: {product['review']}\"\n\n  def parse(self, product: Dict, response: Sentiment):\n    return [{\"name\": product[\"name\"], \"sentiment\": response.sentiment}]\n\n# You can easily have a million rows here.\n# Curator handles parallelism, retries, and response caching.\ndataset = [{\"name\": \"Curator\", \"review\": \"Already saved hours in one day of use.\"},\n           {\"name\": \"Bespoke MiniCheck\", \"review\": \"Hallucination rates dropped by 90%.\"}]\n\n# Set batch=True to use batch mode, which can cut costs by about 50%.\nanalyzer = SentimentAnalyzer(\n    model_name=\"gpt-4o-mini\", response_format=Sentiment, batch=False)\nreviews = analyzer(dataset)\nprint(reviews.to_pandas())\n```\nOutput:\n```\n                name sentiment\n0            Curator  positive\n1  Bespoke MiniCheck  positive\n```\n\nIn the `SentimentAnalyzer` class:\n* `prompt` takes the input (`product`) and returns the prompt for the LLM.\n* `parse` takes the input (`product`) and the structured output (`response`) and converts it to a list of dictionaries. 
This is so that we can easily convert the output to a HuggingFace Dataset object.\n\nInstead of a list, you can pass a HuggingFace Dataset object as well (see below for more details).\n\n### Using `curator.LLM` for data generation\n\nHere's an example of using structured outputs and chaining together two `curator.LLM` blocks to generate diverse poems.\n\n\n```python\nfrom typing import Dict, List\nfrom bespokelabs import curator\nfrom pydantic import BaseModel, Field\n\nclass Topics(BaseModel):\n    topics_list: List[str] = Field(description=\"A list of topics.\")\n\nclass TopicGenerator(curator.LLM):\n  response_format = Topics\n\n  def prompt(self, subject):\n    return f\"Return 3 topics related to {subject}\"\n\n  def parse(self, input: str, response: Topics):\n    return [{\"topic\": t} for t in response.topics_list]\n\n\nclass Poem(BaseModel):\n    title: str = Field(description=\"The title of the poem.\")\n    poem: str = Field(description=\"The content of the poem.\")\n\nclass Poet(curator.LLM):\n    response_format = Poem\n\n    def prompt(self, input: Dict) -\u003e str:\n        return f\"Write two poems about {input['topic']}.\"\n\n    def parse(self, input: Dict, response: Poem) -\u003e List[Dict]:\n        return [{\"title\": response.title, \"poem\": response.poem}]\n\ntopic_generator = TopicGenerator(model_name=\"gpt-4o-mini\")\npoet = Poet(model_name=\"gpt-4o-mini\")\n# Start generation\ntopics = topic_generator(\"Mathematics\")\npoems = poet(topics)\n```\nOutput:\n```\n \ttitle                     poem\n0\tThe Language of Algebra\t  In symbols and signs, truths intertwine,..\n1\tThe Geometry of Space\t  In the world around us, shapes do collide,..\n2\tThe Language of Logic\t  In circuits and wires where silence speaks,..\n```\n\nYou can see more examples in the [examples](examples) directory.\n\nSee the [docs](https://docs.bespokelabs.ai/) for more details as well as\nfor troubleshooting information.\n\n\u003e [!TIP]\n\u003e If you are generating large 
datasets, you may want to use [batch mode](https://docs.bespokelabs.ai/bespoke-curator/save-usdusdusd-on-llm-inference) to save costs. Currently batch APIs from [OpenAI](https://platform.openai.com/docs/guides/batch) and [Anthropic](https://docs.anthropic.com/en/docs/build-with-claude/message-batches) are supported. With curator this is as simple as setting `batch=True` in the `LLM` class.\n\u003e [!NOTE]\n\u003e Retries and caching are enabled by default to help you rapidly iterate your data pipelines.\n\u003e So now if you run the same prompt again, you will get the same response, pretty much instantly.\n\u003e You can delete the cache at `~/.cache/curator` or disable it with `export CURATOR_DISABLE_CACHE=true`.\n\n\n\u003e [!IMPORTANT]\n\u003e Make sure to set your API keys as environment variables for the model you are calling. For example running `export OPENAI_API_KEY=sk-...` and `export ANTHROPIC_API_KEY=ant-...` will allow you to run the previous two examples. A full list of supported models and their associated environment variable names can be found [in the litellm docs](https://docs.litellm.ai/docs/providers).\n### Anonymized Telemetry\n\nWe collect minimal, anonymized usage telemetry to help prioritize new features and improvements that benefit the Curator community. You can opt out by setting the `TELEMETRY_ENABLED` environment variable to `False`. \n\n## 📖 Providers\nCurator supports a wide range of providers, including OpenAI, Anthropic, and many more. 
\n\n### OpenAI backend\n```python\nllm = curator.LLM(\n    model_name=\"gpt-4o-mini\",\n)\n```\nFor other models that support OpenAI-compatible APIs, you can use the `openai` backend:\n```python\nllm = curator.LLM(\n    model_name=\"gpt-4o-mini\",\n    backend=\"openai\",\n    backend_params={\n        \"base_url\": \"https://your-openai-compatible-api-url\",\n        \"api_key\": \u003cYOUR_OPENAI_COMPATIBLE_SERVICE_API_KEY\u003e,\n    },\n)\n```\n\n\n### LiteLLM (Anthropic, Gemini, together.ai, etc.)\nHere is an example of using Gemini with litellm backend:\n```python\nllm = curator.LLM(\n    model_name=\"gemini/gemini-1.5-flash\",\n    backend=\"litellm\",\n    backend_params={\n        \"max_requests_per_minute\": 2_000,\n        \"max_tokens_per_minute\": 4_000_000\n    },\n)\n```\n[Documentation](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-litellm-for-diverse-providers)\n\n### Ollama\n```python\nllm = curator.LLM(\n    model_name=\"ollama/llama3.1:8b\",  # Ollama model identifier\n    backend_params={\"base_url\": \"http://localhost:11434\"},\n)\n```\n[Documentation](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-ollama-with-curator#id-2.-configure-the-ollama-backend)\n\n### vLLM\n\n```python\nllm = curator.LLM( \n    model_name=\"Qwen/Qwen2.5-3B-Instruct\", \n    backend=\"vllm\", \n    backend_params={ \n        \"tensor_parallel_size\": 1, # Adjust based on GPU count \n        \"gpu_memory_utilization\": 0.7 \n    }\n)\n```\n[Documentation](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-vllm-with-curator#id-3-initialize-and-use-the-generator)\n### DeepSeek\nDeepSeek offers an OpenAI-compatible API that you can use with the `openai` backend.\n\u003e [!IMPORTANT]\n\u003e The DeepSeek API is experiencing intermittent issues and will return empty responses during times of high traffic. 
We recommend calling the DeepSeek API through the `openai` backend with a high `max_retries`, so that requests that return an empty response are retried, and with moderate `max_requests_per_minute` and `max_tokens_per_minute` limits, so that retries do not overwhelm the API.\n\n```python\nllm = curator.LLM(\n    model_name=\"deepseek-reasoner\",\n    generation_params={\"temp\": 0.0},\n    backend_params={\n        \"max_requests_per_minute\": 100,\n        \"max_tokens_per_minute\": 10_000_000,\n        \"base_url\": \"https://api.deepseek.com/\",\n        \"api_key\": \u003cYOUR_DEEPSEEK_API_KEY\u003e,\n        \"max_retries\": 50,\n    },\n    backend=\"openai\",\n)\n```\n\n### kluster.ai\n```python\nllm = curator.LLM(\n    model_name=\"deepseek-ai/DeepSeek-R1\", \n    backend=\"klusterai\",\n)\n```\n[Documentation](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-kluster.ai-for-batch-inference)\n\n## 📦 Batch Mode\nSeveral providers offer a discount of about 50% on token usage in batch mode. 
Curator makes it easy to use batch mode with a wide range of providers.\n\nExample with OpenAI ([docs reference](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-openai-for-batch-inference)):\n```python\nllm = curator.LLM(model_name=\"gpt-4o-mini\", batch=True)\n```\n\nSee documentation:\n* [OpenAI batch mode](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-openai-for-batch-inference)\n* [Anthropic batch mode](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-anthropic-for-batch-inference)\n* [Gemini batch mode](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-gemini-for-batch-inference)\n* [kluster.ai batch mode](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-kluster.ai-for-batch-inference)\n* [Mistral batch mode](https://docs.bespokelabs.ai/bespoke-curator/how-to-guides/using-mistral-for-batch-inference)\n\n## Bespoke Curator Viewer\nThe hosted Curator viewer is a rich interface for visualizing data that makes visual inspection much easier.\n\nYou can enable it as follows:\n\nBash:\n\n```shell\nexport CURATOR_VIEWER=1\n```\nPython/Colab:\n```python\nimport os\nos.environ[\"CURATOR_VIEWER\"] = \"1\"\n```\n\nWith this enabled, Curator uploads data as it is generated, and you can watch the responses stream into the viewer. The viewer URL is displayed next to the Rich progress display.\n\n### Authenticate with a Bespoke Labs API key\n\nBy default, datasets are accessible to anyone with the link. To keep your datasets private, you can associate them with a Bespoke Labs account. Doing so also allows you to:\n\n1. Track all datasets associated with your account\n2. Share datasets with collaborators\n3. Analyze data generation costs over time\n\nYou can enable authentication as follows:\n\n1. [Sign up](https://curator.bespokelabs.ai/auth/signup) for a Bespoke Labs account.\n2. Create an API key from the [API Key](https://curator.bespokelabs.ai/home/keys) page.\n3. 
Set the `BESPOKE_API_KEY` and `CURATOR_VIEWER` environment variables:\n\n```shell\nexport BESPOKE_API_KEY=\u003cYOUR_API_KEY\u003e\nexport CURATOR_VIEWER=1\n```\n\nWith the environment variables set, all your datasets will be streamed to the hosted viewer and linked to your Bespoke Labs account. You can visit the [Datasets](https://curator.bespokelabs.ai/home/datasets) page to see datasets generated with your API keys or shared with you by others, and the [Cost Report](https://curator.bespokelabs.ai/home/costs)\npage to see the data generation costs for a given period.\n\n## Environment Variables\n\nWe support a range of environment variables to customize the behavior of Curator.\n\nHere is a complete table of environment variables:\n| Variable | Description | Default |\n|----------|-------------|---------|\n| `CURATOR_VIEWER` | Enables the Curator viewer for visualizing data curation when `True`. | `False` |\n| `CURATOR_DISABLE_CACHE` | Disables caching for `curator.LLM` generations when `True`. Useful for fresh runs. | `False` |\n| `CURATOR_CACHE_DIR` | Sets the cache directory used for `curator.LLM` generations. | `~/.cache/curator` |\n| `CURATOR_DISABLE_RICH_DISPLAY` | When `True`, disables [Rich CLI](https://github.com/Textualize/rich) output (and falls back to [tqdm](https://tqdm.github.io/) logging) for local data generation monitoring. This is useful when debugging with inline breakpoints or interactive debuggers like `pdb`, where Rich's dynamic output can interfere with terminal input. 
| `False` |\n| `TELEMETRY_ENABLED` | Enable telemetry for curator usage tracking when `True` | `True` |\n\n## Contributing\nThank you to all the contributors for making this project possible!\nPlease follow [these instructions](docs/CONTRIBUTING.md) on how to contribute.\n\n## Citation\nIf you find Curator useful, please consider citing us!\n\n```\n@software{Curator: A Tool for Synthetic Data Creation,\n  author = {Marten, Ryan* and Vu, Trung* and Ji, Charlie Cheng-Jie and Sharma, Kartik and Pimpalgaonkar, Shreyas and Dimakis, Alex and Sathiamoorthy, Maheswaran},\n  month = jan,\n  title = {{Curator}},\n  year = {2025},\n  howpublished = {\\url{https://github.com/bespokelabsai/curator}}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbespokelabsai%2Fcurator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbespokelabsai%2Fcurator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbespokelabsai%2Fcurator/lists"}