Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Reducing Error in NLP Datasets using LLMs
https://github.com/jayantaadhikary/dataset-error-reduction
- Host: GitHub
- URL: https://github.com/jayantaadhikary/dataset-error-reduction
- Owner: jayantaadhikary
- Created: 2024-01-25T05:14:48.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-06-01T06:22:45.000Z (7 months ago)
- Last Synced: 2024-06-02T07:50:27.259Z (7 months ago)
- Language: Jupyter Notebook
- Size: 6.42 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome_ai_agents - Dataset-Error-Reduction - Reducing Error in NLP Datasets using LLMs (Building / Datasets)
README
# Dataset Error Reduction
A project to reduce errors in NLP datasets using LLMs.
The datasets currently used are [SQuAD](https://rajpurkar.github.io/SQuAD-explorer) and [RACE](https://www.cs.cmu.edu/~glai1/data/race), both reading comprehension question answering datasets (SQuAD is extractive QA, while RACE is multiple-choice).
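The repository's notebooks are not reproduced here, but the idea can be illustrated with a minimal sketch: ask an LLM whether a SQuAD-style answer span is actually supported by its context paragraph. The sketch assumes the `openai` Python client, and the prompt and model name are my own placeholders; it is illustrative only, not the pipeline used in the notebooks.

```python
# Minimal sketch (not the repo's actual pipeline): ask an LLM whether a
# SQuAD-style answer span is supported by its context paragraph.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def check_answer(context: str, question: str, answer: str,
                 model: str = "gpt-3.5-turbo") -> str:
    """Return the model's verdict (CORRECT or INCORRECT) for one example."""
    prompt = (
        "You are validating a question answering dataset.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "Reply with exactly one word: CORRECT if the answer is supported "
        "by the context, otherwise INCORRECT."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()


# Example usage on a single hand-written record:
verdict = check_answer(
    context="The Eiffel Tower was completed in 1889 in Paris.",
    question="When was the Eiffel Tower completed?",
    answer="1889",
)
print(verdict)
```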
### Use in your system
Fork and clone this repository.
1. If you are using an OpenAI API key, create a `.env` file in the same directory and add your key: `OPENAI_API_KEY=YOUR_API_KEY` (a loading sketch follows this list).
2. If you are running an LLM locally behind a local server, use the files inside the `LocalLLM` folder instead of the default files. The `/ollama` folder contains the notebooks that use Ollama to run LLMs locally (see the local-server sketch after the PS note below).
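For step 1, a minimal sketch of loading the key from `.env`, assuming the `python-dotenv` package (the notebooks may load the key differently):

```python
# Load the key from .env before creating the client.
# Assumes the python-dotenv package is installed.
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()      # reads OPENAI_API_KEY from the .env file into the environment
client = OpenAI()  # the client picks the key up from the environment
```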
PS - I use [LM Studio](https://lmstudio.ai/) and [Ollama](https://ollama.ai) to run the LLMs locally.
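For step 2, a minimal sketch of pointing the `openai` client at a local OpenAI-compatible server instead of the hosted API. The port shown is LM Studio's usual default (Ollama's OpenAI-compatible endpoint typically listens on 11434), and the model name is a placeholder; adjust both to your setup:

```python
# Point the OpenAI client at a local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # e.g. LM Studio's local server
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; use the model name your server reports
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```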
#### References:
1. [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://aclanthology.org/D16-1264) (Rajpurkar et al., EMNLP 2016)
2. [RACE: Large-scale ReAding Comprehension Dataset From Examinations](https://aclanthology.org/D17-1082) (Lai et al., EMNLP 2017)