Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yangalan123/datamap-project
https://github.com/yangalan123/datamap-project
Last synced: about 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/yangalan123/datamap-project
- Owner: yangalan123
- Created: 2021-04-09T05:07:21.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2023-04-12T06:25:50.000Z (almost 2 years ago)
- Last Synced: 2024-04-20T13:04:22.294Z (9 months ago)
- Language: Python
- Size: 3.32 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# DataMap Pipeline for Active Learning and Data Verification
Author: Chenghao Yang ([email protected])
## Requirements (include those from DataMap)
- loguru == 0.3.0
- transformers == 4.0.1
- torch == 1.7.0
- tqdm == 4.43.0
- seaborn == 0.10.0
- pandas == 1.0.3
- matplotlib == 3.2.1
- numpy == 1.18.2
- jsonnet == 0.15.0
- tensorboardx == 2.0
- torch == 1.4.0
- spacy == 2.1.9
- scikit-learn == 0.22.2.post1## How to Use
1. Read `config.py` and sample data in TestTask. Prepare your data as the same format as the sample data and doing necessary edits to `config.py`. Generally, you can play with the model hyper-parameter, seed, and the task name. Feel free to rename `TestTask/` to other names like `TASK_NAME/`, as long as you make corresponding changes in `config.py`.
1. Run `python main.py` to train the model.
1. Run `python data_processor.py` to do necessary post-editing for data files generated in `TASK_NAME/`. Then run `python log_processor.py` to prepare the training dynamics data.
1. Switch to `cartography/` by the command `cd cartography`.
1. Run `python -m cartography.selection.train_dy_filtering --task_name TASK_NAME --data_dir ../ --plots_dir ../datamap_figures --model_dir ../ --filter --metric Metrics --plot` to plot the datamap figure and filter hard-to-learn / ambiguous data. For more details about this command, please read README.md in `cartography/`.