Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/thomasthaddeus/dscomp
This repository is for sharing work from a Kaggle competition we teamed up on.
data-science jupyter-notebook nlp nlp-keywords-extraction python3 sentiment-analysis
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/thomasthaddeus/dscomp
- Owner: thomasthaddeus
- License: mit
- Created: 2023-07-28T21:57:09.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-12T08:14:25.000Z (about 1 year ago)
- Last Synced: 2024-10-13T00:06:02.910Z (3 months ago)
- Topics: data-science, jupyter-notebook, nlp, nlp-keywords-extraction, python3, sentiment-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 2.94 MB
- Stars: 3
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DS-Competition
[![Pylint](https://github.com/thomasthaddeus/DS-Competition/actions/workflows/pylint.yml/badge.svg?branch=dev)](https://github.com/thomasthaddeus/DS-Competition/actions/workflows/pylint.yml)
This repository holds in-progress work.
## Distribution of Work
1. **Eli - Data Collection and Preprocessing Specialist**: This person would be responsible for collecting the necessary data for the project, cleaning it, and transforming it into a format that can be used for model training. They would also handle any necessary data augmentation.
2. **Thad - Feature Engineer**: This person would be responsible for creating new features from the existing data that might help improve the model's performance. They would work closely with the Data Collection and Preprocessing Specialist to understand the data and come up with effective features.
3. **Model Developer 1**: This person would be responsible for selecting a suitable model, training it, and tuning its parameters. They would work closely with the Feature Engineer to understand the features and how they can be used in the model.
4. **Model Developer 2**: This person would also be responsible for model development. Having two people on this task allows for parallel experimentation with different models or different sets of parameters, which can speed up the process and potentially lead to better results.
5. **Validation and Testing Specialist**: This person would be responsible for evaluating the model's performance using a validation set and making adjustments to the model if necessary. They would work closely with the Model Developers to understand the models and how they can be improved.
6. **Person 6 - Submission and Documentation Manager / Infrastructure Manager**: This person would be responsible for submitting the team's entries to the competition, documenting the team's work, and managing the infrastructure needed for model training. This includes keeping track of the different models that were tried, the features that were used, and the performance of each model. They would also handle any necessary setup and management of cloud resources, and manage the team's code using a version control system like Git.
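The record-keeping described in item 6 (models tried, features used, performance of each) could be kept in a simple append-only log. This is a minimal sketch, assuming a JSON Lines file and a single numeric score per run. The `RunRecord` fields, the `experiments.jsonl` file name, and the helper names are illustrative and are not taken from the team's actual workflow.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class RunRecord:
    """One experiment: which model, which features, and how it scored (hypothetical schema)."""
    model: str
    features: list = field(default_factory=list)
    params: dict = field(default_factory=dict)
    score: float = 0.0
    notes: str = ""


def log_run(record, log_path="experiments.jsonl"):
    """Append one run to the log as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(record)) + "\n")


def load_runs(log_path="experiments.jsonl"):
    """Read every logged run back; a missing log yields an empty list."""
    path = Path(log_path)
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as handle:
        return [RunRecord(**json.loads(line)) for line in handle if line.strip()]


def best_run(runs, lower_is_better=True):
    """Pick the run with the best score, or None if nothing was logged."""
    if not runs:
        return None
    pick = min if lower_is_better else max
    return pick(runs, key=lambda run: run.score)
```

Whether a lower or higher score is better depends on the competition metric, so `lower_is_better` is left as a parameter rather than fixed.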
## Users
1. Thad - Data Collection and Preprocessing Specialist
2. Eli - Feature Engineer
3. Nicholas
## Project Structure
- `.github`: don't touch this folder
- `/data`: all data should be stored here
- `/models`: store learning models here
- `/notebooks`: put all notebooks here under your folder
- `/src`: any source code you need to import for your notebook to work
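Because notebooks live under `/notebooks` and shared code under `/src`, a notebook needs `src` on its import path before it can import from it. This is a minimal sketch, assuming the caller knows the relative path from the notebook to the repository root. The helper name `add_src_to_path` is hypothetical and not part of this repository.

```python
import sys
from pathlib import Path


def add_src_to_path(repo_root="."):
    """Put <repo_root>/src at the front of sys.path (hypothetical helper)."""
    src = (Path(repo_root) / "src").resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"no src directory at {src}")
    # Avoid adding the same entry twice when a notebook cell is re-run.
    if str(src) not in sys.path:
        sys.path.insert(0, str(src))
    return src
```

From a notebook two levels below the root, this would be called as `add_src_to_path("../..")`; the right relative path depends on where the notebook sits.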
Directory Tree
.
├── data
│   ├── eval_student_summaries
│   │   ├── prompts_test.csv
│   │   ├── prompts_train.csv
│   │   ├── sample_submission.csv
│   │   ├── summaries_test.csv
│   │   └── summaries_train.csv
│   └── json
├── LICENSE
├── models
├── notebooks
│   └── sample_notebk.ipynb
├── README.md
├── requirements.txt
├── sitemap.html
├── src
│   ├── evaluation
│   ├── prep
│   │   ├── data_prep.py
│   │   └── text_prep.py
│   ├── scripts
│   └── visualize
└── tests
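The CSV files under `data/eval_student_summaries` can be read with the standard library alone. This is a minimal sketch, assuming the working directory is the repository root and that each CSV has a header row. `DATA_DIR`, `dataset_path`, and `read_rows` are illustrative names, not part of the repository, and no column names are assumed.

```python
import csv
from pathlib import Path

# Location relative to the repository root, per the directory tree.
DATA_DIR = Path("data") / "eval_student_summaries"

# File names as listed in the directory tree.
FILES = (
    "prompts_test.csv",
    "prompts_train.csv",
    "sample_submission.csv",
    "summaries_test.csv",
    "summaries_train.csv",
)


def dataset_path(name, data_dir=DATA_DIR):
    """Return the path to one of the listed CSV files."""
    if name not in FILES:
        raise ValueError(f"unknown dataset file: {name}")
    return Path(data_dir) / name


def read_rows(path):
    """Read a CSV with a header row into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `read_rows(dataset_path("summaries_train.csv"))` would return one dict per row, keyed by whatever header the file carries.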
## Setup and Installation
Instructions for setting up and installing any necessary software or libraries.
If you want to use Weights & Biases, see the [W&B site](https://wandb.ai/site/research).
## Usage
Instructions for how to run the code.
[How to set up the virtual environment.](./docs/venv_setup.md)
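The linked guide is not reproduced here. As a rough sketch of one way to create an environment from Python itself, the following uses the standard-library `venv` module and the `requirements.txt` shown in the directory tree. The `.venv` directory name and the function names are assumptions, not something this README specifies.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir):
    """Path to the interpreter inside a virtual environment."""
    subdir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / subdir / exe


def install_command(env_dir, requirements="requirements.txt"):
    """Build the pip command that installs the listed requirements."""
    return [str(env_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def setup_env(env_dir=".venv", requirements="requirements.txt"):
    """Create the environment, then install requirements into it."""
    venv.create(env_dir, with_pip=True)
    subprocess.run(install_command(env_dir, requirements), check=True)
```

Calling `setup_env()` from the repository root would create `.venv` and install the requirements into it; the steps in `docs/venv_setup.md` take precedence if they differ.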
## License
[MIT](./LICENSE)