{"id":19600244,"url":"https://github.com/habedi/rater-competition-solution","last_synced_at":"2026-06-10T18:32:03.005Z","repository":{"id":242626649,"uuid":"810086399","full_name":"habedi/RATER-Competition-Solution","owner":"habedi","description":"My final solution for the RATER competition","archived":false,"fork":false,"pushed_at":"2024-07-26T17:16:46.000Z","size":40482,"stargazers_count":1,"open_issues_count":7,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-26T15:31:11.741Z","etag":null,"topics":["deep-learning","language-model","machine-learning","machine-learning-competition","nlp","python"],"latest_commit_sha":null,"homepage":"https://the-learning-agency.com/robust-algorithms-for-thorough-essay-rating/overview/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/habedi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-04T02:48:19.000Z","updated_at":"2024-07-26T17:16:35.000Z","dependencies_parsed_at":"2024-06-04T04:37:29.370Z","dependency_job_id":null,"html_url":"https://github.com/habedi/RATER-Competition-Solution","commit_stats":null,"previous_names":["habedi/rater-competition-solution"],"tags_count":0,"template":false,"template_full_name":"habedi/template-python-project","purl":"pkg:github/habedi/RATER-Competition-Solution","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2FRATER-Competition-Solution","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2FRATER-Competition-Solution/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2FRATER-Competition-Solution/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2FRATER-Competition-Solution/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/habedi","download_url":"https://codeload.github.com/habedi/RATER-Competition-Solution/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2FRATER-Competition-Solution/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34165482,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","language-model","machine-learning","machine-learning-competition","nlp","python"],"created_at":"2024-11-11T09:14:08.887Z","updated_at":"2026-06-10T18:32:02.991Z","avatar_url":"https://github.com/habedi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# My Solution for the RATER Competition\n\nThis repository contains my solution for\nthe [RATER competition](https://the-learning-agency.com/robust-algorithms-for-thorough-essay-rating/overview/).\n\n## Installing the Dependencies\n\nTo run the code in this repository, you must have `Python 3.8` or newer, `PyTorch`, 
`Transformers`, and `LightGBM` among\nother things like `Pandas` and `Scikit-learn` installed on your machine.\nThe [requirements.txt](requirements.txt) file lists the required Python packages with the correct versions that you can\ninstall using the pip package manager.\n\n```bash\npip install -r requirements.txt\n```\n\nAlso, you may want to install a few additional CLI tools to make it easier to work with the code in this repository.\nYou can install these tools using the [install_useful_cli_tools.sh](bin/install_useful_cli_tools.sh) script.\n\n```bash\nsudo bash bin/install_useful_cli_tools.sh\n```\n\nThe installation script works on Ubuntu and other Debian-based systems. If you are using a different operating system,\nyou may need to install the tools manually.\n\n## The Models\n\nThe solution consists of two main parts: segmenting the essay into discourse elements and detecting the effectiveness of\nthe discourse elements.\n\nThe models used in the solution are stored in the [models](models) directory. The figure shows how the models are used\nin the inference pipeline:\n\n![pipeline_architecture](data/assets/pipeline_architecture.png)\n\n### Segmentation Models\n\nThe models used for segmenting the essay into discourse elements are primarily from the work of the team that ranked\nsecond in the (final) private leaderboard of\nthe [Feedback Prize - Evaluating Student Writing](https://www.kaggle.com/competitions/feedback-prize-2021/) Kaggle\ncompetition.\nTheir solution is available [here](https://www.kaggle.com/competitions/feedback-prize-2021/discussion/313389), including\nthe solution details,\nmodel weights, and the code for training and inference.\n\n### Effectiveness Detection Model\n\nThe model for detecting the effectiveness of the discourse elements is an ensemble of `LightGBM` models trained on the\ntext of the discourse elements and the essays.\nThe models are stored in the [effectiveness_models.pkl](models/effectiveness_models.pkl) file. 
The code for training the\nmodels and inference is in the [predict_effectiveness.py](bin/predict_effectiveness.py) file.\n\n### Training Data and Code, and Inference Code\n\nThe table below includes the references to the training data and the code for training and inference for the models used\nin\nthe solution:\n\n| Index | Training Data                                                                                                                | Training Code                                                                            | Inference Code                                           | Description                                                                               |\n|-------|------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|----------------------------------------------------------|-------------------------------------------------------------------------------------------|\n| 1     | [https://www.kaggle.com/competitions/feedback-prize-2021/data](https://www.kaggle.com/competitions/feedback-prize-2021/data) | [https://github.com/ubamba98/feedback-prize](https://github.com/ubamba98/feedback-prize) | [generate_predictions.py](bin/generate_predictions.py)   | The training data, and training and inference code for the segmentation models.           |\n| 2     | [competition_data](data/competition_data)                                                                                    | [predict_effectiveness.py](bin/predict_effectiveness.py)                                 | [predict_effectiveness.py](bin/predict_effectiveness.py) | The training data, and training and inference code for the effectiveness detection model. 
|\n\nAs mentioned earlier, the pretrained models from the solution of the team that ranked second in the private leaderboard\nof\nthe [Feedback Prize - Evaluating Student Writing](https://www.kaggle.com/competitions/feedback-prize-2021/) Kaggle\ncompetition were used for the segmentation. Please see this post for more details about their solution:\n[https://www.kaggle.com/competitions/feedback-prize-2021/discussion/313389](https://www.kaggle.com/competitions/feedback-prize-2021/discussion/313389).\n\nThe table below shows the individual inference score of each segmentation model on RATER's public leaderboard. Measurements were\ntaken on a machine with an NVIDIA RTX A6000 GPU.\n\n| Model                                                              | Score  |\n|--------------------------------------------------------------------|--------|\n| [lsg](models/model_pack_1/auglsgrobertalarge)                      | N/A    |\n| [bigbird_base_chris](models/model_pack_1/bird-base)                | N/A    |\n| [yoso](models/model_pack_1/feedbackyoso)                           | N/A    |\n| [funnel](models/model_pack_1/funnel-large-6folds)                  | 0.6044 |\n| [debertawithlstm](models/model_pack_1/models/model_pack_1)         | 0.6288 |\n| [deberta_v2](models/model_pack_1/deberta-v2-xlarge)                | 0.6105 |\n| [debertal_chris](models/model_pack_1/deberta-large-100)            | 0.6229 |\n| [debertal](models/model_pack_1/deberta-large-v2)                   | 0.6150 |\n| [debertaxl](models/model_pack_1/deberta-xlarge-1536)               | 0.6154 |\n| [longformerwithlstm](models/model_pack_1/longformerwithbilstmhead) | 0.6185 |\n\n### Downloading the Segmentation Models\n\nYou can download the segmentation models, if you haven't already, by running the following commands in the root directory\nof this repository:\n\n```bash\nbash bin/download_models.sh \u0026\u0026 bash bin/make_model_pack.sh\n```\n\nNote that the size of the models is around 80 GB, so the 
download may take a while. \n\n## Running the Inference Pipeline\n\nYou can execute the end-to-end inference pipeline by running the [do_inference.sh](do_inference.sh) script.\nThe script will perform inference on the test data stored in the [test](data/competition_data/test) data directory and\ncreate a CSV file named `final_submission.csv` in the current directory.\n\nTo run the pipeline, you can use the following command:\n\n```bash\nbash do_inference.sh --clean --fast\n```\n\nOr, if you want to run the pipeline in the background, you can use the following command:\n\n```bash\nnohup bash do_inference.sh --clean --fast \u003e inference.log 2\u003e\u00261 \u0026\n```\n\nThe `--clean` flag is optional. If you use it, the pipeline will remove the old CSV files and temporary files and\nfolders created during the previous runs of the pipeline. If you use the `--fast` flag, only the\nbest segmentation model will be used for the inference, and the effectiveness detection model will be skipped.\nThis will make the pipeline run much faster.\n\nThe `inference.log` file will contain the output of the pipeline. 
You can monitor the progress of the pipeline by\nrunning the following command:\n\n```bash\nccze -m ansi \u003c inference.log\n```\n\nI'm assuming that you have the `ccze` tool installed on your machine.\n(If not, running the [install_useful_cli_tools.sh](bin/install_useful_cli_tools.sh) script will install it for you.)\n\nBelow, you can see the playback of an example inference log (inference is done with the best segmentation model only, which is\n`debertawithlstm`):\n\n![playback](data/runs/28apr-3/inference_log.gif)\n\nPlease note that if the `--fast` flag is not set, the pipeline may take a long time to finish.\nIf you want to run the inference on machines whose GPUs have less memory, you may want to reduce the batch sizes (to\naround 4 or 8) in the [run_inference.sh](bin/run_inference.sh) script\nto avoid getting out-of-memory errors.\n\n## The Predictions File\n\nWhen the pipeline finishes successfully, a file named [final_submission.csv](final_submission.csv) should be created in\nthe current directory.\nThe file contains the predictions for the test set in the required format for the competition and should be ready for\nsubmission.\n\n## License\n\nMost of the files in this repository are licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\nThe exceptions are the files in the [competition_data](data/competition_data) directory and those in\nthe [model_pack_1](models/model_pack_1) directory.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhabedi%2Frater-competition-solution","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhabedi%2Frater-competition-solution","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhabedi%2Frater-competition-solution/lists"}