lenml llm leaderboard
https://github.com/lenml/lenml-llm-leaderboard


Host: GitHub
URL: https://github.com/lenml/lenml-llm-leaderboard
Owner: lenML
License: agpl-3.0
Created: 2024-11-23T01:13:59.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-11T05:51:14.000Z (over 1 year ago)
Last Synced: 2025-04-03T11:44:10.001Z (about 1 year ago)
Language: TypeScript
Homepage: https://lenml.github.io/lenml-llm-leaderboard/
Size: 992 KB
Stars: 7
Watchers: 1
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.en.md
- License: LICENSE

README

[cn](./README.md) | [en](./README.en.md)

# LenML LLM Leaderboard

[![GitHub stars](https://img.shields.io/github/stars/lenML/lenml-llm-leaderboard)](https://github.com/lenML/lenml-llm-leaderboard/stargazers)

[![GitHub license](https://img.shields.io/github/license/lenML/lenml-llm-leaderboard)](https://github.com/lenML/lenml-llm-leaderboard/blob/main/LICENSE)

A leaderboard focused on evaluating practical open-source language models. We only test models that are:

- Locally deployable

- Quantized

- Runnable with 20 GB of VRAM or less
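
As a rough illustration of the VRAM criterion, the sketch below estimates whether a quantized model fits a 20 GB budget. It is not part of the leaderboard code: the function names, the weights-only estimate (parameters × bits per weight ÷ 8), and the 2 GB overhead allowance are all assumptions made for this example, and real memory use also depends on context length and the runtime.

```typescript
// Weights-only memory estimate in GB (1 GB = 1e9 bytes).
// paramsBillions and bitsPerWeight are illustrative inputs, not leaderboard fields.
function estimateWeightGB(paramsBillions: number, bitsPerWeight: number): number {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1e9;
}

// overheadGB is an assumed allowance for the KV cache and runtime buffers.
function fitsVramBudget(
  paramsBillions: number,
  bitsPerWeight: number,
  budgetGB: number = 20,
  overheadGB: number = 2
): boolean {
  return estimateWeightGB(paramsBillions, bitsPerWeight) + overheadGB <= budgetGB;
}

console.log(fitsVramBudget(32, 4)); // true: 16 GB of weights + 2 GB overhead
console.log(fitsVramBudget(70, 4)); // false: 35 GB of weights alone
```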

🔗 Online Leaderboard: [https://lenml.github.io/lenml-llm-leaderboard/](https://lenml.github.io/lenml-llm-leaderboard/)

## Why Another Leaderboard?

Current open-source model evaluations face several limitations:

- Most leaderboards focus solely on English capabilities or standardized test scores

- Most emphasize large models (100B+ parameters), which are impractical for most users

- Evaluation methods are too academic and fail to reflect real-world usage

- Limited coverage of community models, especially ERP variants

## Evaluation Metrics

We've designed a set of metrics that better align with real-world usage scenarios:

| Metric | Description |
|--------|-------------|
| Hardcore | Evaluates model knowledge in specific (you know) niche domains |
| Reject | Tests the model's tendency to refuse to respond (lower is better) |
| Creative | Assesses creative writing capabilities |
| Long | Measures accuracy in generating content of a specified length |
| ACG | Evaluates knowledge of Anime, Comics, and Games (ACG culture) |
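
To make the table concrete, here is a minimal TypeScript sketch of how a Long-style length score and a combined score could be computed. It is not the project's actual scoring code: the `MetricScores` shape, the 0..1 normalization, the linear length penalty, and the unweighted mean are all assumptions chosen for illustration. The one detail taken from the table is that Reject is "lower is better", so it is inverted before averaging.

```typescript
// Hypothetical per-model scores, each assumed to be normalized to the range 0..1.
interface MetricScores {
  hardcore: number;
  reject: number; // refusal rate: lower is better
  creative: number;
  long: number;
  acg: number;
}

// Length accuracy for one response: 1 when the actual length matches the
// requested length, falling linearly to 0 as the relative error reaches 100%.
function lengthAccuracy(requested: number, actual: number): number {
  if (requested <= 0) {
    throw new RangeError("requested length must be positive");
  }
  const relativeError = Math.abs(actual - requested) / requested;
  return Math.max(0, 1 - relativeError);
}

// Unweighted mean of the five metrics, with Reject inverted so that a
// higher composite score is always better.
function compositeScore(s: MetricScores): number {
  const parts = [s.hardcore, 1 - s.reject, s.creative, s.long, s.acg];
  return parts.reduce((sum, x) => sum + x, 0) / parts.length;
}
```

For example, `lengthAccuracy(1000, 800)` penalizes a response that is 20% short of the requested length, and a model that refuses every prompt contributes 0 from its Reject term no matter how the other metrics look.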

## Roadmap

- [ ] Custom evaluation formula support

- [ ] Custom test data support

- [ ] Automated evaluation implementation

- [ ] Additional evaluation dimensions (e.g., lateral thinking puzzles)

## Contributing

Issues and Pull Requests are welcome to help improve this project!

## License

[AGPL-3.0 License](LICENSE)
