https://github.com/wb-az/ai-performance-analytics-llms
This repository was created to compare foundational models at different tasks, using visualisation and statistics.
anova bard-ai contingency-analysis contingency-table foundational-models kruskal-wallis large-language-models likert-data pandas-python plotly-python posthoc-tests prompts scikit-posthocs scipy scipy-optimize scipy-stats statsmodels
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/wb-az/ai-performance-analytics-llms
- Owner: Wb-az
- Created: 2025-02-21T16:34:09.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-03-14T11:32:13.000Z (about 2 months ago)
- Last Synced: 2025-03-14T12:23:11.686Z (about 2 months ago)
- Topics: anova, bard-ai, contingency-analysis, contingency-table, foundational-models, kruskal-wallis, large-language-models, likert-data, pandas-python, plotly-python, posthoc-tests, prompts, scikit-posthocs, scipy, scipy-optimize, scipy-stats, statsmodels
- Language: Jupyter Notebook
- Homepage:
- Size: 1.82 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AI Analytics - Comparing Performance of GenAI Models
This repository was created to compare the performance of foundational models at different tasks and levels of complexity, using visualisation and statistics.
**Data**:
Data was provided by [DataAnnotationTech](https://www.dataannotation.tech/)
**Categories**:
1. Adversarial Dishonesty
2. Adversarial Harmfulness
3. Brainstorming
4. Classification
5. Closed QA
6. Creative Writing
7. Extraction
8. Mathematical Reasoning
10. Open QA
11. Rewriting
12. Poetry
13. Rewriting
14. Summarization

**Likert-type rating scale**:
1. Bard much better
2. Bard better
3. Bard slightly better
4. About the same
5. ChatGPT slightly better
6. ChatGPT better
7. ChatGPT much better

**Tools used**: ```pandas```, ```plotly```, ```statsmodels```, ```scipy``` and ```scikit-posthocs```
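
As a rough illustration of how ratings on the 7-point scale above could be compared across task categories, the sketch below builds a contingency table with `pandas` and runs a Kruskal-Wallis test with `scipy`. It is not taken from the repository's notebooks: the data are synthetic, and the column names `category` and `rating` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, illustrative ratings only; this is NOT the repository's data.
# The column names "category" and "rating" are hypothetical.
rng = np.random.default_rng(0)
categories = ["Classification", "Closed QA", "Poetry"]
ratings = pd.DataFrame({
    "category": np.repeat(categories, 40),
    "rating": rng.integers(1, 8, size=120),  # Likert-type scores 1..7
})

# Contingency table: one row per category, one column per rating level.
table = pd.crosstab(ratings["category"], ratings["rating"])

# Kruskal-Wallis H-test: do the rating distributions differ between categories?
groups = [g["rating"].to_numpy() for _, g in ratings.groupby("category")]
h_stat, p_value = stats.kruskal(*groups)

print(table)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

A rank-based test is used here because Likert-type ratings are ordinal. If the test suggests a difference, a pairwise post hoc comparison (for example Dunn's test from `scikit-posthocs`) would be a natural next step.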