https://github.com/roy0428/ADL_Final
NTU ADL 2023
- Host: GitHub
- URL: https://github.com/roy0428/ADL_Final
- Owner: roy0428
- Created: 2023-12-19T04:10:40.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-03-28T07:26:55.000Z (3 months ago)
- Last Synced: 2024-03-28T08:34:06.924Z (3 months ago)
- Language: Python
- Homepage:
- Size: 6.86 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
Lists
- my-awesome-stars - roy0428/ADL_Final - NTU ADL 2023 (Python)
README
# Chinese Typo Correction with Taiwan-LLaMa
## Abstract
We developed a language model that identifies commonly misused words with 98.6% accuracy, outperforming GPT-4, which reached only 82% accuracy.
## Data Generation and Preprocessing
Data Generation
```
python3 generator.py --number_of_data n --output_dir /path/to/output.json
```
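The internals of `generator.py` are not shown here, so the following is only a minimal sketch of how synthetic typo data of this kind could be produced, not the repository's actual method. It assumes a small table of commonly confused characters and corrupts one character per sentence. The names `CONFUSION_PAIRS`, `inject_typo`, and `generate`, the record fields `input`, `output`, and `typo_index`, and the seed sentences are all hypothetical.

```python
import json
import random

# Hypothetical confusion table: correct character -> commonly confused one.
# The real generator.py may use a different list and a different data format.
CONFUSION_PAIRS = {"再": "在", "的": "得", "應": "因"}


def inject_typo(sentence, rng):
    """Swap one confusable character; return (corrupted, index) or (sentence, None)."""
    positions = [i for i, ch in enumerate(sentence) if ch in CONFUSION_PAIRS]
    if not positions:
        return sentence, None
    i = rng.choice(positions)
    corrupted = sentence[:i] + CONFUSION_PAIRS[sentence[i]] + sentence[i + 1:]
    return corrupted, i


def generate(sentences, number_of_data, seed=0):
    """Build number_of_data records pairing a corrupted sentence with its original."""
    rng = random.Random(seed)
    records = []
    for _ in range(number_of_data):
        correct = rng.choice(sentences)
        corrupted, idx = inject_typo(correct, rng)
        records.append({"input": corrupted, "output": correct, "typo_index": idx})
    return records


if __name__ == "__main__":
    # Illustrative seed sentences ("See you again tomorrow", "He should have arrived").
    seed_sentences = ["我們明天再見", "他應該已經到了"]
    data = generate(seed_sentences, number_of_data=4)
    print(json.dumps(data, ensure_ascii=False, indent=2))
```

Fixing the random seed keeps the generated set reproducible, which makes it easier to compare training runs on the same data.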
Data Preprocessing
```
python3 preprocessing.py --data_dir /path/to/output.json --output_dir_0 /path/to/zero_shot.json --output_dir_1 /path/to/one_shot.json --output_dir_2 /path/to/two_shot.json
```
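The three output files correspond to zero-shot, one-shot, and two-shot prompt variants. As a rough illustration of what that distinction means, the sketch below builds a k-shot prompt by prepending k solved demonstrations to each query. It is an assumption about the general technique, not a copy of `preprocessing.py`: the instruction text, the `Sentence:`/`Correction:` template, the field names, and the functions `build_prompt` and `make_k_shot` are hypothetical.

```python
# Hypothetical instruction text; the real preprocessing.py may word this differently
# (and most likely in Chinese).
INSTRUCTION = "Find and correct the misused character in the sentence."


def build_prompt(query, demonstrations):
    """Join the instruction, k solved demonstrations, and the unsolved query."""
    parts = [INSTRUCTION]
    for demo in demonstrations:
        parts.append(f"Sentence: {demo['input']}\nCorrection: {demo['output']}")
    parts.append(f"Sentence: {query}\nCorrection:")
    return "\n\n".join(parts)


def make_k_shot(records, k):
    """Turn raw records into k-shot training examples (k=0 gives zero-shot)."""
    if k >= len(records):
        raise ValueError("need more records than demonstrations")
    examples = []
    for i, rec in enumerate(records):
        # Use the next k records (wrapping around) as demonstrations,
        # so a query never appears as its own demonstration.
        demos = [records[(i + j) % len(records)] for j in range(1, k + 1)]
        examples.append({"prompt": build_prompt(rec["input"], demos),
                         "output": rec["output"]})
    return examples


if __name__ == "__main__":
    # Illustrative records: corrupted sentence in "input", corrected one in "output".
    records = [
        {"input": "我們明天在見", "output": "我們明天再見"},
        {"input": "他因該已經到了", "output": "他應該已經到了"},
        {"input": "請在說一次", "output": "請再說一次"},
    ]
    print(make_k_shot(records, 1)[0]["prompt"])
```

Calling `make_k_shot(records, 0)`, `make_k_shot(records, 1)`, and `make_k_shot(records, 2)` on the same records would yield the three variants.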
For example, to generate and preprocess 1,000 training examples:
```
python3 generator.py \
--number_of_data 1000 \
--output_dir data/output.json
python3 preprocessing.py \
--data_dir data/output.json \
--output_dir_0 data/train_1000_zero_shot.json \
--output_dir_1 data/train_1000_one_shot.json \
--output_dir_2 data/train_1000_two_shot.json
```
## Training
Training
```
accelerate launch -m axolotl.cli.train examples/llama-2/qlora_final.yml --datasets.path="/path/to/dataset" --output_dir="/path/to/output/"
```
Alternatively, edit `training_final.sh` and run it:
```
bash training_final.sh
```
## Inference and Evaluation
Inference and Evaluation
```
bash run.sh /path/to/Taiwan-LLM-7B-v2.0-chat/ /path/to/qlora-out/ /path/to/test.json /path/to/prediction.json /path/to/combined_prediction.json
```
Alternatively, edit `inference.sh` and run it:
```
bash inference.sh
```
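The scoring step inside `run.sh` is not documented here. As a hedged sketch of how an accuracy figure like the one in the abstract could be computed, the snippet below scores predictions by exact match against the reference corrections. The metric choice, the file layout (a JSON list of objects), and the field names passed to `load_field` are assumptions for illustration only.

```python
import json


def exact_match_accuracy(references, predictions):
    """Fraction of predictions equal to their reference, ignoring outer whitespace."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        return 0.0
    correct = sum(r.strip() == p.strip() for r, p in zip(references, predictions))
    return correct / len(references)


def load_field(path, field):
    """Read one field from a JSON file holding a list of objects (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return [row[field] for row in json.load(f)]


if __name__ == "__main__":
    # Illustrative in-memory data; real runs would call load_field on the
    # test and prediction files instead.
    refs = ["我們明天再見", "他應該已經到了"]
    preds = ["我們明天再見", "他因該已經到了"]
    print(f"accuracy = {exact_match_accuracy(refs, preds):.3f}")
```

Exact match is a strict criterion: a prediction that fixes the typo but changes anything else in the sentence counts as wrong.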