https://github.com/lenml/lenml-llm-leaderboard
lenml llm leaderboard
- Host: GitHub
- URL: https://github.com/lenml/lenml-llm-leaderboard
- Owner: lenML
- License: agpl-3.0
- Created: 2024-11-23T01:13:59.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-11T05:51:14.000Z (over 1 year ago)
- Last Synced: 2025-04-03T11:44:10.001Z (about 1 year ago)
- Language: TypeScript
- Homepage: https://lenml.github.io/lenml-llm-leaderboard/
- Size: 992 KB
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.en.md
- License: LICENSE
[cn](./README.md) | [en](./README.en.md)
# LenML LLM Leaderboard
[Stars](https://github.com/lenML/lenml-llm-leaderboard/stargazers) | [License](https://github.com/lenML/lenml-llm-leaderboard/blob/main/LICENSE)
A leaderboard focused on evaluating practical open-source language models. We only test models that are:
- Locally deployable
- Quantized
- Runnable with 20 GB of VRAM or less
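
To give a feel for what the VRAM limit means in practice, here is a minimal sketch that estimates the memory needed just to hold quantized weights. The function names, the 2 GiB headroom, and the treatment of "20 GB" as 20 GiB are illustrative assumptions, not part of this project. The estimate ignores the KV cache and runtime overhead, so real usage will be higher.

```typescript
// Rough size of the quantized weights alone, in GiB.
// Assumption: ignores KV cache, activations, and runtime overhead.
function estimateWeightGiB(paramsBillions: number, bitsPerWeight: number): number {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1024 ** 3;
}

// Hypothetical budget check. The 20 GB limit comes from the list above;
// the headroom value is an illustrative assumption.
function fitsBudget(
  paramsBillions: number,
  bitsPerWeight: number,
  budgetGiB = 20,
  headroomGiB = 2
): boolean {
  return estimateWeightGiB(paramsBillions, bitsPerWeight) + headroomGiB <= budgetGiB;
}
```

Under these assumptions, a 32B model at 4 bits per weight needs roughly 15 GiB for weights and fits the budget, while a 70B model at 4 bits does not.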
🔗 Online Leaderboard: [https://lenml.github.io/lenml-llm-leaderboard/](https://lenml.github.io/lenml-llm-leaderboard/)
## Why Another Leaderboard?
Current open-source model evaluations face several limitations:
- Most leaderboards focus solely on English capabilities or standardized test scores
- They emphasize very large models (100B+ parameters), which are impractical to run locally
- Evaluation methods are too academic and fail to reflect real-world usage
- Limited coverage of community models, especially ERP variants
## Evaluation Metrics
We've designed a set of metrics that better align with real-world usage scenarios:
| Metric | Description |
|--------|-------------|
| Hardcore | Evaluates model knowledge in specific (you know) niche domains |
| Reject | Measures how often the model refuses to respond (lower is better) |
| Creative | Assesses creative writing capabilities |
| Long | Measures how accurately the model generates content of a specified length |
| ACG | Evaluates knowledge of Anime, Comics, and Games (ACG culture) |
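
Because Reject is a "lower is better" metric while the others are "higher is better", any combined score has to invert it first. The sketch below shows one way this could be done. The 0-100 scale, the equal default weights, and the names `Scores` and `overallScore` are assumptions for illustration, not the leaderboard's actual formula.

```typescript
// Metric names follow the table above. The 0-100 scale and the weights
// are illustrative assumptions, not the leaderboard's actual formula.
type Scores = {
  hardcore: number;
  reject: number;
  creative: number;
  long: number;
  acg: number;
};

const defaultWeights: Scores = { hardcore: 1, reject: 1, creative: 1, long: 1, acg: 1 };

function overallScore(s: Scores, w: Scores = defaultWeights): number {
  // Reject is "lower is better", so invert it before averaging.
  const adjusted: Scores = { ...s, reject: 100 - s.reject };
  const keys = Object.keys(adjusted) as (keyof Scores)[];
  const totalWeight = keys.reduce((sum, k) => sum + w[k], 0);
  const weighted = keys.reduce((sum, k) => sum + adjusted[k] * w[k], 0);
  return weighted / totalWeight;
}
```

Passing a different weights object changes how much each metric counts, which is one possible shape for a user-defined evaluation formula.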
## Roadmap
- [ ] Custom evaluation formula support
- [ ] Custom test data support
- [ ] Automated evaluation implementation
- [ ] Additional evaluation dimensions (e.g., lateral thinking puzzles)
## Contributing
Issues and Pull Requests are welcome to help improve this project!
## License
[AGPL-3.0 License](LICENSE)