{"id":21561526,"url":"https://github.com/subhadarship/nlp4if-2021","last_synced_at":"2026-01-07T00:57:05.231Z","repository":{"id":90983830,"uuid":"346162394","full_name":"subhadarship/nlp4if-2021","owner":"subhadarship","description":"Cross-lingual misinformation detection","archived":false,"fork":false,"pushed_at":"2022-12-26T11:11:59.000Z","size":14424,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-02T13:49:37.186Z","etag":null,"topics":["bert","cross-lingual","misinformation","multilingual-bert"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/subhadarship.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-09T22:32:03.000Z","updated_at":"2023-04-11T15:35:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"884b69d7-5e2a-447a-8ed3-0f21ca53159c","html_url":"https://github.com/subhadarship/nlp4if-2021","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subhadarship%2Fnlp4if-2021","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subhadarship%2Fnlp4if-2021/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subhadarship%2Fnlp4if-2021/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subhadarship%2Fnlp4if-2021/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/subhadarship","download_url":"https://codeload.github.com/subhadarship/nlp4if-2021/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246036302,"owners_count":20713218,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","cross-lingual","misinformation","multilingual-bert"],"created_at":"2024-11-24T09:27:01.279Z","updated_at":"2026-01-07T00:57:05.200Z","avatar_url":"https://github.com/subhadarship.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cross-lingual misinformation detection\n\nThis repo contains the code for cross-lingual misinformation detection. See the paper 📔 [here](https://aclanthology.org/2021.nlp4if-1.19).\n\n## Quick start\n\nInstall PyTorch 1.1.0 from the [official website](https://pytorch.org/). Install the other dependencies listed\nin `requirements.txt`.\n\n### Prepare data\n\nFor details of the data, see\n\n- https://gitlab.com/NLP4IF/nlp4if-2021\n- https://www.aclweb.org/portal/content/nlp4if-2021-shared-tasks\n\n```\ncd src\npython prepare_data.py  # prepare data without the additional data\npython prepare_data_additional.py  # prepare data with the additional data\n```\n\nAnalysis of the data is available in `notebooks/analyze_data.ipynb` and `notebooks/analyze_data_additional.ipynb`.\n\n### Training\n\nChoose the appropriate script in the `bash` folder to train without the additional data, or in the `bash_additional` folder\nto train with the additional data. For example, to fine-tune multilingual BERT with English as the source language\nusing the additional data, run the following commands:\n\n```\ncd bash_additional\nchmod +x train_multilingual_bert_src_en.sh\n./train_multilingual_bert_src_en.sh\n```\n\nThe training logs are saved to the file given by the `--log_file_path` argument. The log file also stores\nthe evaluation results after training completes.\n\n**Note**: To tabulate the results from the log files and pick the best hyperparameters across multiple runs,\nsee `notebooks/tabulate_results_v{1,2,3}.ipynb`.\n\n### Predict labels for the test set\n\n```\ncd bash_predict\nchmod +x predict_best_sys.sh\n./predict_best_sys.sh\n```\n\n#### Training logs\n\n- `logs_v1` contains the training logs obtained using our own train-dev splits for en and ar, and the provided train and dev data\n  for bg.\n- `logs_v2` contains the training logs obtained using the provided train and dev data for all languages.\n- `logs` contains the training logs obtained using the provided additional train and dev data for all languages.\n\n## Citation\n\n```\n@inproceedings{detecting-multilingual-misinformation,\n    title = \"Detecting Multilingual {COVID}-19 Misinformation on Social Media via Contextualized Embeddings\",\n    author = \"Panda, Subhadarshi and Levitan, Sarah Ita\",\n    booktitle = \"Proceedings of the Fourth Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda\",\n    series = {NLP4IF@NAACL'~21},\n    month = {June},\n    year = \"2021\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsubhadarship%2Fnlp4if-2021","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsubhadarship%2Fnlp4if-2021","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsubhadarship%2Fnlp4if-2021/lists"}