{"id":49172495,"url":"https://github.com/evalops/dspy-0to1-guide","last_synced_at":"2026-05-09T03:01:41.317Z","repository":{"id":313136312,"uuid":"1050157453","full_name":"evalops/dspy-0to1-guide","owner":"evalops","description":"A comprehensive 0-to-1 guide for building self-improving LLM applications with DSPy framework","archived":false,"fork":false,"pushed_at":"2026-04-14T16:36:27.000Z","size":120,"stargazers_count":211,"open_issues_count":0,"forks_count":15,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-04-14T18:24:58.515Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evalops.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-04T03:13:36.000Z","updated_at":"2026-04-14T16:36:33.000Z","dependencies_parsed_at":"2025-09-04T06:17:48.758Z","dependency_job_id":"f636dab1-6382-4465-b713-487dd556731e","html_url":"https://github.com/evalops/dspy-0to1-guide","commit_stats":null,"previous_names":["haasonsaas/dspy-0to1-guide","evalops/dspy-0to1-guide"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/evalops/dspy-0to1-guide","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalops%2Fdspy-0to1-guide","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalops%2Fdspy-0to1-guide/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
evalops%2Fdspy-0to1-guide/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalops%2Fdspy-0to1-guide/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/evalops","download_url":"https://codeload.github.com/evalops/dspy-0to1-guide/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalops%2Fdspy-0to1-guide/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32805514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"online","status_checked_at":"2026-05-09T02:00:06.633Z","response_time":123,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-22T20:00:44.252Z","updated_at":"2026-05-09T03:01:41.307Z","avatar_url":"https://github.com/evalops.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# DSPy 0‑to‑1 Guide: Building Self‑Improving LLM Applications from Scratch\n\n## Table of contents\n\n1. [Why DSPy? Motivation \u0026 Problem Statement](#why-dspy-motivation--problem-statement)\n2. [Core Concepts: Signatures, Modules \u0026 Optimizers](#core-concepts-signatures-modules--optimizers)\n3. [Installation \u0026 Hello World](#installation--hello-world)\n4. 
[Composing Pipelines: Retrieval‑Augmented Generation \u0026 Summarization](#composing-pipelines-retrieval-augmented-generation--summarization)\n5. [Evaluation \u0026 Metrics](#evaluation--metrics)\n6. [Optimization: Self‑Improving Pipelines](#optimization-self-improving-pipelines)\n7. [Agents \u0026 Tool Use](#agents--tool-use)\n8. [Advanced Topics](#advanced-topics)\n9. [Best Practices \u0026 Pitfalls](#best-practices--pitfalls)\n10. [Additional Resources \u0026 Next Steps](#additional-resources--next-steps)\n\n## Why DSPy? Motivation \u0026 Problem Statement\n\n### The pain of prompt engineering\n\nDevelopers building applications with large language models (LLMs) have traditionally relied on **hand‑crafted prompts** and chain‑of‑thought instructions. This approach is brittle and time‑consuming: small changes in wording can cause wildly different outputs, prompt logic is embedded in code and hard to reuse, and improving performance typically relies on manual trial‑and‑error. Prompt templates also tie the implementation to a specific model; when switching models you often need to rewrite prompts or adjust hyper‑parameters.\n\n### DSPy's solution\n\nDSPy—short for _Declarative Self‑improving Python_—was developed at Stanford University to address these pain points. It allows developers to **program** their applications rather than engineer prompts. You declare what inputs and outputs your system should handle, write modular Python code, and let DSPy automatically compile prompt templates and optimize them. Key advantages include:\n\n- **Declarative programming:** you specify what your system should accomplish (input/output signatures) rather than how to prompt the model. 
This decouples high‑level logic from low‑level prompt design.\n- **Automatic optimization:** DSPy uses optimizers to refine prompts and few‑shot examples based on feedback and metrics, freeing you from manual prompt tweaking.\n- **Production resilience:** built‑in patterns for caching, output validation and monitoring make pipelines less brittle.\n\nDSPy has quickly gained traction; the project (open‑sourced in late 2023) has thousands of stars, hundreds of contributors and is rapidly moving from prototype to production‑ready framework.\n\n## Core Concepts: Signatures, Modules \u0026 Optimizers\n\nDSPy's design revolves around three core abstractions: signatures, modules and optimizers. These concepts let you write composable code that DSPy compiles into robust LLM interactions.\n\n### Signatures – declarative task specification\n\nA **signature** defines the input/output behaviour of a task without specifying how the language model should accomplish it. It is analogous to a type declaration. For example, you can define a question‑answering task:\n\n```python\nimport dspy\n\nclass QA(dspy.Signature):\n    \"\"\"Question answering task.\"\"\"\n    context: str = dspy.InputField(desc=\"Background information\")\n    question: str = dspy.InputField()\n    answer: str = dspy.OutputField(desc=\"Accurate answer\")\n```\n\nThis signature acts as a contract: any module implementing it will accept context and question and return an answer. Using signatures instead of free‑text prompts provides type safety, readability and reusability.\n\n### Modules – composable building blocks\n\nA **module** encapsulates a particular prompting strategy or reasoning pattern. DSPy provides modules like Predict (basic prompting), ChainOfThought (step‑by‑step reasoning), ReAct (reasoning and acting via tools) and ProgramOfThought (code generation). You can also write custom modules by subclassing dspy.Module. 
For instance, a simple retrieval‑augmented QA pipeline may look like this:\n\n```python\nimport dspy\n\ndspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))\n\nclass RAG(dspy.Module):\n    def __init__(self):\n        super().__init__()\n        # Retrieval step; assumes a retriever was registered via dspy.configure(rm=...)\n        self.retrieve = dspy.Retrieve(k=3)\n        self.generate = dspy.ChainOfThought(\"question, context -\u003e answer\")\n\n    def forward(self, question):\n        # dspy.Retrieve returns a Prediction; the retrieved texts are in .passages\n        context = self.retrieve(question).passages\n        return self.generate(question=question, context=context)\n```\n\nModules can be composed like neural‑network layers to build complex pipelines. You can swap modules (e.g., replace ChainOfThought with ProgramOfThought) without changing the signature.\n\n### Optimizers – self‑improving pipelines\n\nAn **optimizer** iteratively improves the prompts or parameters of a module using example data and a metric. DSPy includes optimizers such as BootstrapFewShot and BetterTogether. During optimization, DSPy generates candidate prompts and few‑shot demonstrations, scores them on your examples with the metric, and retains the best ones. This typically yields higher accuracy and consistency without manual tuning.\n\n## Installation \u0026 Hello World\n\n### Environment setup\n\n1. **Install DSPy:**\n\n```bash\npip install dspy-ai\n```\n\n2. (Optional) **Local LLMs:** If you prefer running models locally for privacy/cost reasons, install [Ollama](https://ollama.ai), start the server and pull a model:\n\n```bash\nbrew install ollama  # macOS example\nollama serve         # start the server; keep it running in a separate terminal\nollama pull llama3   # download the model (needs the server to be running)\n```\n\nThen configure DSPy with `dspy.LM('ollama_chat/llama3')` instead of an OpenAI model.\n\n### A minimal \"Hello World\" program\n\nBelow is a first DSPy program that answers a math question. 
It defines a custom module using the ChainOfThought pattern, which instructs the model to reason step‑by‑step before producing the final answer.\n\n```python\nimport dspy\n\n# Configure the language model (OpenAI's gpt‑4o‑mini for this example)\ndspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))\n\nclass MathQA(dspy.Module):\n    def __init__(self):\n        super().__init__()\n        # Define the module using Chain‑of‑Thought reasoning\n        self.solve = dspy.ChainOfThought(\"question -\u003e answer: float\")\n\n    def forward(self, question: str):\n        return self.solve(question=question)\n\n# Instantiate and invoke the module\nqa = MathQA()\nresult = qa(\"What is 3 * 7 + 2?\")\nprint(result)\n```\n\nRunning this code will print a JSON‑like object containing the predicted answer and the intermediate reasoning produced by the model. The key takeaways are:\n\n- The signature (\"question -\u003e answer: float\") separates the task definition from the prompt.\n- You can use the same module with any compatible language model by changing the configure call.\n\n## Composing Pipelines: Retrieval‑Augmented Generation \u0026 Summarization\n\nDSPy shines when you compose multiple modules into richer pipelines. 
A typical example is Retrieval‑Augmented Generation (RAG):\n\n```python\nimport dspy\n\n# Configure your model (e.g., local LLM or cloud API)\ndspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))\n\n# Define a function to call an external retrieval service\n# Here we use DSPy's built‑in ColBERTv2 client; you could also use your own search API\ndef search_wikipedia(query: str) -\u003e list[str]:\n    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)\n    return [x['text'] for x in results]\n\nclass RAGPipeline(dspy.Module):\n    def __init__(self):\n        super().__init__()\n        self.retrieve = search_wikipedia  # retrieval via the ColBERTv2 helper above\n        self.generate = dspy.ChainOfThought(\"question, context -\u003e answer\")\n\n    def forward(self, question: str):\n        # Step 1: fetch relevant context (a list of passage strings)\n        context = self.retrieve(question)\n        # Step 2: ask the model to answer using the context\n        return self.generate(question=question, context=context)\n\n# Usage\nrag = RAGPipeline()\nquestion = \"Who invented the telephone?\"\nanswer = rag(question)\nprint(answer)\n```\n\nIn this example the retrieval step fetches the top three passages as context, then the ChainOfThought module reasons over the question and context to generate an answer. 
Such patterns enable robust question‑answering systems.\n\n### Summarization\n\nYou can build a summarizer by composing modules similarly:\n\n```python\nimport dspy\n\n# Configure the model\ndspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))\n\nclass Summarizer(dspy.Module):\n    def __init__(self):\n        super().__init__()\n        self.summarize = dspy.ChainOfThought(\"document -\u003e summary\")\n\n    def forward(self, document: str):\n        return self.summarize(document=document)\n\n# Example\ndoc = \"DSPy is a framework for programming language models...\"\nsummary = Summarizer()(document=doc)\nprint(summary)\n```\n\nThis pattern can be extended to translation, data extraction and other NLP tasks; you just change the signatures and modules accordingly.\n\n## Evaluation \u0026 Metrics\n\nCreating pipelines is only half the story; you need to **measure** their performance. DSPy provides a flexible evaluation API with built‑in metrics and support for custom metrics.\n\n1. **Prepare a dataset.** A dataset is a list of dspy.Example objects, each containing inputs and the expected output(s). Calling with_inputs marks which fields are inputs; the remaining fields are treated as labels. For instance:\n\n```python\nfrom dspy import Example\n\n# Define some QA examples\ntrain_examples = [\n    Example(question=\"What is the capital of France?\", answer=\"Paris\").with_inputs(\"question\"),\n    Example(question=\"Who wrote 1984?\", answer=\"George Orwell\").with_inputs(\"question\"),\n]\n```\n\n2. **Choose a metric.** DSPy supplies metrics like answer_exact_match and SemanticF1. You can also write your own metric as a function that takes an example and a prediction (plus an optional trace argument) and returns a score, typically a boolean or a number in [0, 1].\n\n3. 
**Evaluate.** Use dspy.Evaluate to run your pipeline on a dataset and compute the metric:\n\n```python\nfrom dspy.evaluate import Evaluate, answer_exact_match\n\nevaluate = Evaluate(\n    devset=train_examples,  # the examples to score against\n    metric=answer_exact_match,  # use exact match metric\n)\n# Evaluate your module or compiled program\nresult = evaluate(rag)  # rag is the RAGPipeline defined earlier\nprint(\"Accuracy:\", result)\n```\n\nEvaluation helps you quantify improvement when applying optimizers or making architectural changes. Evaluate scores whichever examples you pass as devset; the snippet above reuses the training examples for brevity, so create a separate held‑out test set for final validation.\n\n## Optimization: Self‑Improving Pipelines\n\nManual prompt tuning is inefficient. DSPy's **optimizers** automate the process by generating prompt variants, trying them on your examples, and keeping the best ones. Here's a typical optimization loop:\n\n```python\nimport dspy\nfrom dspy.teleprompt import BootstrapFewShot\nfrom dspy.evaluate import answer_exact_match\n\n# Suppose you already have a module (e.g., RAGPipeline) and a dataset of examples\nteleprompter = BootstrapFewShot(metric=answer_exact_match)\n\n# Use compile() to optimize prompts and few-shot examples\ncompiled_rag = teleprompter.compile(\n    rag,  # the module to optimize\n    trainset=train_examples,  # your training examples\n)\n\n# The compiled program is another module; evaluate it with the same evaluator\nevaluate = dspy.Evaluate(devset=train_examples, metric=answer_exact_match)\nscore_before = evaluate(rag)\nscore_after = evaluate(compiled_rag)\n# Depending on the DSPy version, Evaluate returns a float or a result object\nprint(\"Accuracy before optimization:\", score_before)\nprint(\"Accuracy after optimization:\", score_after)\n```\n\nThe compile step can produce substantial improvements: BootstrapFewShot runs the program on your training examples, keeps the traces that pass the metric, and attaches them to the prompts as few‑shot demonstrations. 
You can adjust hyper‑parameters such as the maximum number of bootstrapped and labeled demonstrations (`max_bootstrapped_demos`, `max_labeled_demos`).\n\nOther optimizers cover different needs: COPRO searches over instruction wording, BootstrapFinetune fine‑tunes a (typically smaller) model on bootstrapped traces, and BetterTogether combines prompt optimization with fine‑tuning.\n\n## Agents \u0026 Tool Use\n\nSome tasks require the model to interact with external tools (calculators, APIs, web search). DSPy's ReAct module supports **Reasoning and Acting**: the model can decide whether to call a tool and incorporate the result in its reasoning. Here's a simple agent with a calculator tool:\n\n```python\nimport dspy\n\n# Configure the model\ndspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))\n\n# Define a calculator tool\ndef calculator(expression: str) -\u003e float:\n    \"\"\"Evaluate an arithmetic expression such as '2 + 2 * 5'.\"\"\"\n    # WARNING: eval() can run arbitrary code; use a sandboxed evaluator for untrusted input\n    return eval(expression, {\"__builtins__\": {}}, {})\n\n# Create the agent module\nclass CalculatorAgent(dspy.Module):\n    def __init__(self):\n        super().__init__()\n        # Register the calculator tool with the ReAct module\n        self.react = dspy.ReAct(\"question -\u003e answer\", tools=[calculator])\n\n    def forward(self, question: str):\n        return self.react(question=question)\n\nagent = CalculatorAgent()\nprint(agent(\"What is 2 + 2 * 5?\"))  # The model can call calculator() to compute the answer\n```\n\nDSPy handles the plumbing: it formats tool calls, parses results and integrates them into the language model's reasoning. You can register multiple tools (e.g., web search, calendar APIs) to build sophisticated agents.\n\n## Advanced Topics\n\nOnce you grasp the basics, DSPy offers several advanced features:\n\n- **Reflective Prompt Evolution (GEPA).** GEPA is an optimizer in which a language model reflects on execution traces and feedback to propose improved prompts, iterating over successive candidates. 
It's particularly useful for complex reasoning tasks.\n- **Reinforcement Learning optimization.** DSPy's experimental RL optimizer lets you optimize prompts using reinforcement learning signals, enabling deeper exploration of the prompt space.\n- **Integration with Pydantic for data validation.** You can use Pydantic models to enforce strict output schemas, catching malformed outputs early and providing runtime safety.\n- **Parallel \u0026 asynchronous execution.** DSPy supports parallel calls to LLMs and asynchronous pipelines, which is critical for latency‑sensitive applications.\n- **Production deployment.** DSPy's caching, logging and observability features help you deploy pipelines reliably. For example, DSPy caches LM responses by default, which reduces cost on repeated calls, and `dspy.inspect_history()` shows recent prompts and completions for debugging.\n\n## Best Practices \u0026 Pitfalls\n\n- **Start simple.** Begin with a single module (Predict or ChainOfThought) and a handful of examples. Add complexity incrementally; avoid optimizing prematurely.\n- **Collect representative examples.** DSPy's optimizers rely on example data; provide examples that reflect the range of inputs your system will see. Label them carefully.\n- **Separate training and evaluation.** Use distinct train and test sets to avoid overfitting prompts to your examples.\n- **Beware of cost and latency.** Optimization may generate many LM calls. Use smaller models and caching during experimentation.\n- **Validate outputs.** For tasks requiring structured output, integrate Pydantic or explicit parsing to ensure outputs meet your schema.\n- **Stay up to date.** DSPy evolves rapidly; APIs or module names can change. 
Always check the release notes before upgrading.\n\n## Working Examples in This Repository\n\nThis repository includes complete working implementations of the concepts covered above:\n\n### Basic Examples\n- **[examples/basic/hello_world.py](examples/basic/hello_world.py)** - Simple question answering\n- **[examples/basic/math_qa.py](examples/basic/math_qa.py)** - Mathematical reasoning with validation\n- **[examples/basic/summarizer.py](examples/basic/summarizer.py)** - Document summarization\n\n### Advanced Implementations  \n- **[examples/personas/support_sam.py](examples/personas/support_sam.py)** - Customer support agent with TF-IDF knowledge retrieval\n- **[examples/personas/legal_lucy.py](examples/personas/legal_lucy.py)** - Contract analysis with risk assessment\n- **[examples/advanced/pydantic_validation.py](examples/advanced/pydantic_validation.py)** - Structured outputs with Pydantic validation\n- **[examples/advanced/gepa_optimization.py](examples/advanced/gepa_optimization.py)** - GEPA prompt optimization\n- **[examples/advanced/parallel_execution.py](examples/advanced/parallel_execution.py)** - Parallel processing\n\n### Infrastructure \u0026 Monitoring\n- **[examples/infrastructure/prometheus_metrics.py](examples/infrastructure/prometheus_metrics.py)** - Production monitoring with Prometheus\n- **[docker-compose.yml](docker-compose.yml)** - Full deployment stack\n\n### Getting Started\n```bash\n# Install and run examples\nmake install\nollama pull llama3\npython examples/basic/hello_world.py\npython examples/personas/support_sam.py\n```\n\nAll examples work with both Ollama (local) and OpenAI models. 
See **[datasets/](datasets/)** for 31 sample datasets covering QA, classification, and more.\n\n## Additional Resources \u0026 Next Steps\n\n- **Official DSPy documentation:** start at the [Programming Overview](https://dspy.ai/learn/programming/overview/) and explore topics like language models, modules, evaluation and optimization.\n- **Community resources:** join the Discord or Slack channels for quick help. The [stanfordnlp/dspy](https://github.com/stanfordnlp/dspy) GitHub repository hosts examples, tutorials and real‑world demos.\n- **Build your own.** The best way to internalize DSPy is to build. Start with a small idea (e.g., summarizing team meetings, answering FAQs) and iterate. Measure improvements using DSPy's evaluation tools and share learnings with the community.\n\nBy following this path—understanding the motivation, grasping the core abstractions, writing simple modules, composing pipelines, evaluating and optimizing them—you will move from a complete novice to a developer who can build robust, self‑improving LLM applications. DSPy's declarative philosophy allows you to focus on high‑level design while it handles the low‑level prompt engineering. The above code examples and recommendations provide a strong foundation for exploring the more sophisticated capabilities of the framework.\n\n## References\n\n- [DSPy: An open-source framework for LLM-powered applications | InfoWorld](https://www.infoworld.com/article/3956455/dspy-an-open-source-framework-for-llm-powered-applications.html)\n- [DSPy Framework: A Comprehensive Technical Guide | DZone](https://dzone.com/articles/dspy-framework-technical-guide)\n- [What Is DSPy? 
How It Works, Use Cases, and Resources | DataCamp](https://www.datacamp.com/blog/dspy-introduction)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevalops%2Fdspy-0to1-guide","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevalops%2Fdspy-0to1-guide","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevalops%2Fdspy-0to1-guide/lists"}