An open API service indexing awesome lists of open source software.

https://github.com/dmitriz/llm-selection

Strategic selection, evaluation, and routing of large language models
https://github.com/dmitriz/llm-selection

Last synced: 4 months ago
JSON representation

Strategic selection, evaluation, and routing of large language models

Awesome Lists containing this project

README

          

# LLM Selection Project

A system for selecting, routing, and evaluating large language models (LLMs) based on task type, reasoning ability, access methods, costs, and reliability.

## 🚀 Getting Started

This project is designed to help you make strategic model selection decisions for your LLM tasks. The repository will grow to include:

- Model catalogs with capability tracking
- Routing logic for task-to-model mapping
- Evaluation tools and reviewer prompts
- Integration patterns for various LLM access methods

## 📂 Repository Structure

- `models.md` - Catalog of available LLMs and their capabilities (planned)
- `routing-strategy.md` - Logic for matching tasks to models (planned)
- `reviewer-prompts.md` - Evaluation criteria and prompts (planned)
- `PROJECT_BACKLOG.md` - Complete project backlog with detailed goals and plans

## 🏁 Initial Tasks

- [ ] Create `models.md` for cataloging available LLMs
- [ ] Add `routing-strategy.md` to outline logic by task type
- [ ] Prepare reviewer prompt set in `reviewer-prompts.md`
- [ ] Design and test a selection prompt (e.g., "Which model should I use for X?")

## 📊 Roadmap & Prioritization

### Phase 1: Model Cataloging (June 2025)

- **Focus**: Comprehensive model inventory and capability assessment
- **Deliverable**: `models.md` with detailed capability matrix
- **Key milestones**:
- Complete API documentation for all major providers
- Document token limits and rate restrictions
- Create standardized capability scoring system

### Phase 2: Routing Strategy (July 2025)

- **Focus**: Decision framework for model selection
- **Deliverable**: `routing-strategy.md` with decision trees
- **Key milestones**:
- Map task types to optimal model profiles
- Create fallback pathways for rate limits and failures
- Design cost optimization strategies

### Phase 3: Evaluation Framework (August 2025)

- **Focus**: Output quality assessment
- **Deliverable**: `reviewer-prompts.md` with scoring metrics
- **Key milestones**:
- Develop standardized evaluation criteria
- Create reviewer prompts for different output types
- Build comparison methodology for cross-model evaluation

### Success Metrics

- **Latency targets**: <500ms for routing decisions, <5s end-to-end for simple tasks
- **Hallucination thresholds**: <2% factual errors in final outputs
- **Cost efficiency**: 30% reduction in token usage vs. single-model approach
- **Selection accuracy**: >95% appropriate model selection for given task types

## 📖 Detailed Documentation

For complete project details including:

- Core components
- Key insights and strategy notes
- Planned experiments
- Roles and agents

Please refer to the [PROJECT_BACKLOG.md](PROJECT_BACKLOG.md) file.

## 🤝 Contributing

Contributions are welcome! Check the PROJECT_BACKLOG.md for areas where you can help, or suggest new features and improvements.

---

> Note: This project is in active development. Features and documentation will evolve over time.