{"id":13816774,"url":"https://github.com/norakassner/mlama","last_synced_at":"2025-05-15T18:32:33.603Z","repository":{"id":54849631,"uuid":"296973014","full_name":"norakassner/mlama","owner":"norakassner","description":null,"archived":false,"fork":false,"pushed_at":"2024-01-22T10:41:55.000Z","size":9354,"stargazers_count":25,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-19T14:42:21.553Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/norakassner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-09-20T00:15:52.000Z","updated_at":"2024-11-12T09:01:09.000Z","dependencies_parsed_at":"2024-04-06T22:42:12.827Z","dependency_job_id":null,"html_url":"https://github.com/norakassner/mlama","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/norakassner%2Fmlama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/norakassner%2Fmlama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/norakassner%2Fmlama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/norakassner%2Fmlama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/norakassner","download_url":"https://codeload.github.com/norakassner/mlama/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","reposi
tories_count":254397920,"owners_count":22064584,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T06:00:20.600Z","updated_at":"2025-05-15T18:32:31.275Z","avatar_url":"https://github.com/norakassner.png","language":"Python","funding_links":[],"categories":["Factual Knowledge Probes","Urdu Datasets"],"sub_categories":["General NLP Datasets"],"readme":"# mLAMA: multilingual LAnguage Model Analysis\n\nThis repository contains code for the EACL 2021 paper [\"Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models\"](https://arxiv.org/abs/2102.00894).\nIt extends the original LAMA probe to the multilingual setting, i.e. it probes knowledge in multilingual pre-trained language models.\n\nThe repository is forked from https://github.com/facebookresearch/LAMA and adapted accordingly.\n\n## The mLAMA probe\n\nTo reproduce our results:\n\n### 1. Create conda environment and install requirements\n\n(Optional) Use a separate conda environment. Create it and install the requirements by running:\n```bash\nconda create -n mlama -y python=3.7 \u0026\u0026 conda activate mlama\npip install -r requirements.txt\n```\n\nAdd the project to your `PYTHONPATH`:\n\n`export PYTHONPATH=${PYTHONPATH}:/path-to-project`\n\n### 2. Download the data\n\n```bash\nwget http://cistern.cis.lmu.de/mlama/mlama1.1.zip\nunzip mlama1.1.zip\nrm mlama1.1.zip\nmv mlama1.1 data/mlama1.1/\n```\n\n### 3. 
Run the experiments\n\n```bash\npython scripts/run_experiments_mBERT_ranked.py --lang \"fr\"\npython scripts/eval.py\n```\n\n## The dataset\n\nCode to recreate the dataset can be found in the folder `dataset`.\n\nThe `MLama` class in `dataset/reader.py` reads the dataset. Example for reading the data:\n```python\n# Assumes the project root is on PYTHONPATH (see step 1).\nfrom dataset.reader import MLama\n\nml = MLama(\"data/mlama/\")\nml.load()\n```\n\n## Reference:\n\n```bibtex\n@inproceedings{kassner2021multilingual,\n    title = \"Multilingual {LAMA}: Investigating Knowledge in Multilingual Pretrained Language Models\",\n    author = {Kassner, Nora  and\n      Dufter, Philipp  and\n      Sch{\\\"u}tze, Hinrich},\n    booktitle = \"to appear in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics\",\n    year = \"2021\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n}\n\n@inproceedings{petroni2019language,\n  title={Language Models as Knowledge Bases?},\n  author={F. Petroni, T. Rockt{\\\"{a}}schel, A. H. Miller, P. Lewis, A. Bakhtin, Y. Wu and S. Riedel},\n  booktitle={In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019},\n  year={2019}\n}\n```\n\n## Acknowledgements\n\n* [https://github.com/huggingface/pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT)\n* [https://github.com/allenai/allennlp](https://github.com/allenai/allennlp)\n* [https://github.com/pytorch/fairseq](https://github.com/pytorch/fairseq)\n* https://github.com/facebookresearch/LAMA\n\n## Licence\n\nmLAMA is licensed under the CC-BY-NC 4.0 license. The text of the license can be found [here](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnorakassner%2Fmlama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnorakassner%2Fmlama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnorakassner%2Fmlama/lists"}