{"id":13595417,"url":"https://github.com/danieldeutsch/repro","last_synced_at":"2025-04-09T13:31:59.415Z","repository":{"id":40451325,"uuid":"387634683","full_name":"danieldeutsch/repro","owner":"danieldeutsch","description":"Repro is a library for easily running code from published papers via Docker.","archived":false,"fork":false,"pushed_at":"2023-09-22T11:11:34.000Z","size":858,"stargazers_count":40,"open_issues_count":6,"forks_count":6,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-07-17T12:56:39.336Z","etag":null,"topics":["docker","machine-learning","nlp","reproducibility","reproducible-research"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danieldeutsch.png","metadata":{"files":{"readme":"Readme.md","changelog":"Changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-07-20T01:11:37.000Z","updated_at":"2024-01-04T16:59:32.000Z","dependencies_parsed_at":"2024-01-16T22:18:40.782Z","dependency_job_id":"bf49045a-d28d-4c34-9ae9-c647424c46af","html_url":"https://github.com/danieldeutsch/repro","commit_stats":{"total_commits":154,"total_committers":3,"mean_commits":"51.333333333333336","dds":"0.10389610389610393","last_synced_commit":"fa7be030ab3aeb67600c2185110764707b5cabb2"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldeutsch%2Frepro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldeutsch%2Frepro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldeutsch
%2Frepro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldeutsch%2Frepro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danieldeutsch","download_url":"https://codeload.github.com/danieldeutsch/repro/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":213549713,"owners_count":15604012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","machine-learning","nlp","reproducibility","reproducible-research"],"created_at":"2024-08-01T16:01:49.622Z","updated_at":"2024-11-06T18:30:33.145Z","avatar_url":"https://github.com/danieldeutsch.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Repro\n![Master](https://github.com/danieldeutsch/repro/workflows/Master/badge.svg?branch=master\u0026event=push)\n![Documentation](https://readthedocs.org/projects/repro/badge/?version=latest)\n\nRepro is a library for reproducing results from research papers.\nFor now, it is focused on making predictions with pre-trained models as easy as possible.\n\nCurrently, running pre-trained models can be difficult to do.\nSome models require specific versions of dependencies, require complicated preprocessing steps, have their own input and output formats, are poorly documented, etc.\n\nRepro addresses these problems by packaging each of the pre-trained models in its own Docker container, which includes the pre-trained models themselves as well as all of the code, dependencies, and environment setup required to run 
them.\nThen, Repro provides lightweight Python code to read the input data, pass the data to a Docker container, run prediction in the container, and return the output to the user.\nSince the complicated model-specific code is isolated within Docker, the user does not need to worry about setting up the environment correctly or know how the model is implemented at all.\n**As long as you have a working Docker installation, you can run every model included in Repro with no additional effort.**\nIt should \"just work\" (at least that is the goal).\n\n## Installation Instructions\nFirst, you need to have a working Docker installation.\nSee [here](https://repro.readthedocs.io/en/latest/tutorials/docker.html) for installation instructions as well as scripts to verify that your setup is working.\n\nThen, we recommend creating a conda environment specific to Repro before installing the library:\n```shell script\nconda create -n repro python=3.6\nconda activate repro\n```\n\nFor users:\n```shell script\npip install repro\n```\n\nFor developers:\n```shell script\ngit clone https://github.com/danieldeutsch/repro\ncd repro\npip install --editable .\npip install -r dev-requirements.txt\n```\n\n## Example Usage\nHere is an example of how Repro can be used, highlighting how simple it is to run a complex model pipeline.\nWe will demonstrate how to generate summaries of a document with three different models\n\n- BertSumExtAbs from [Liu \u0026 Lapata (2019)](https://arxiv.org/abs/1908.08345) ([docs](https://repro.readthedocs.io/en/latest/models/liu2019.html))\n- BART from [Lewis et al. (2020)](https://arxiv.org/abs/1910.13461) ([docs](https://repro.readthedocs.io/en/latest/models/lewis2020.html))\n- GSum from [Dou et al. 
(2021)](https://arxiv.org/abs/2010.08014) ([docs](https://repro.readthedocs.io/en/latest/models/dou2021.html))\n\nand then evaluate those summaries with three different text generation evaluation metrics\n\n- ROUGE from [Lin (2004)](https://aclanthology.org/W04-1013/) ([docs](https://repro.readthedocs.io/en/latest/models/lin2004.html))\n- BLEURT from [Sellam et al. (2020)](https://arxiv.org/abs/2004.04696) ([docs](https://repro.readthedocs.io/en/latest/models/sellam2020.html))\n- QAEval from [Deutsch et al. (2021)](https://arxiv.org/abs/2010.00490) ([docs](https://repro.readthedocs.io/en/latest/models/deutsch2021.html))\n\nOnce you have Docker and Repro installed, all you have to do is instantiate the classes and run `predict`:\n\n```python\nfrom repro.models.liu2019 import BertSumExtAbs\nfrom repro.models.lewis2020 import BART\nfrom repro.models.dou2021 import SentenceGSumModel\n\n# Each of these classes uses the pre-trained weights that we want to use\n# by default, but you can specify others if you want to\nliu2019 = BertSumExtAbs()\nlewis2020 = BART()\ndou2021 = SentenceGSumModel()\n\n# Here's the document we want to summarize (it's not very long,\n# but you get the point)\ndocument = (\n    \"Joseph Robinette Biden Jr. was elected the 46th president of the United States \"\n    \"on Saturday, promising to restore political normalcy and a spirit of national \"\n    \"unity to confront raging health and economic crises, and making Donald J. Trump \"\n    \"a one-term president after four years of tumult in the White House.\"\n)\n\n# Now, run `predict` to generate the summaries from the models\nsummary1 = liu2019.predict(document)\nsummary2 = lewis2020.predict(document)\nsummary3 = dou2021.predict(document)\n\n# Import the evaluation metrics. 
We call them \"models\" even though\n# they are metrics\nfrom repro.models.lin2004 import ROUGE\nfrom repro.models.sellam2020 import BLEURT\nfrom repro.models.deutsch2021 import QAEval\n\n# Like the summarization models, each of these classes takes parameters,\n# but we just use the defaults\nrouge = ROUGE()\nbleurt = BLEURT()\nqaeval = QAEval()\n\n# Here is the reference summary we will use\nreference = (\n    \"Joe Biden was elected president of the United States after defeating Donald Trump.\"\n)\n\n# Then evaluate the summaries\nfor summary in [summary1, summary2, summary3]:\n    metrics1 = rouge.predict(summary, [reference])\n    metrics2 = bleurt.predict(summary, [reference])\n    metrics3 = qaeval.predict(summary, [reference])\n```\n\nBehind the scenes, Repro is running each model and metric in its own Docker container.\n`BertSumExtAbs` is tokenizing and sentence splitting the input document with Stanford CoreNLP, then running BERT with `torch==1.1.0` and `transformers==1.2.0`.\n`BLEURT` is running `tensorflow==2.2.2` to score the summary with a learned metric.\n`QAEval` is chaining together pretrained question generation and question answering models with `torch==1.6.0` to evaluate the model outputs.\n**But you don't need to know about any of that to run the models!**\nAll of the complex logic and environment details are taken care of by the Docker container, so all you have to do is call `predict()`.\nIt's that simple!\n\nAbstracting the implementation details away in a Docker image is really useful for chaining together a complex NLP pipeline.\nIn this example, we summarize a document, ask a question, then evaluate how likely it is that the QA prediction and the expected answer mean the same thing.\nThe models used are:\n\n- BART from [Lewis et al. (2020)](https://arxiv.org/abs/1910.13461) ([docs](https://repro.readthedocs.io/en/latest/models/lewis2020.html))\n- A neural module network QA model from [Gupta et al. 
(2020)](https://arxiv.org/abs/1912.04971) ([docs](https://repro.readthedocs.io/en/latest/models/gupta2020.html))\n- LERC from [Chen et al. (2020)](https://arxiv.org/abs/2010.03636) ([docs](https://repro.readthedocs.io/en/latest/models/chen2020.html))\n\n```python\nfrom repro.models.chen2020 import LERC\nfrom repro.models.gupta2020 import NeuralModuleNetwork\nfrom repro.models.lewis2020 import BART\n\ndocument = (\n    \"Roger Federer is a Swiss professional tennis player. He is ranked \"\n    \"No. 9 in the world by the Association of Tennis Professionals (ATP). \"\n    \"He has won 20 Grand Slam men's singles titles, an all-time record \"\n    \"shared with Rafael Nadal and Novak Djokovic. Federer has been world \"\n    \"No. 1 in the ATP rankings a total of 310 weeks – including a record \"\n    \"237 consecutive weeks – and has finished as the year-end No. 1 five times.\"\n)\n\n# First, summarize the document\nbart = BART()\nsummary = bart.predict(document)\n\n# Now, ask a question using the summary\nquestion = \"How many grand slam titles has Roger Federer won?\"\nanswer = \"twenty\"\n\nnmn = NeuralModuleNetwork()\nprediction = nmn.predict(summary, question)\n\n# Check to see if the expected answer (\"twenty\") and prediction (\"20\") mean the\n# same thing in the summary\nlerc = LERC()\nscore = lerc.predict(summary, question, answer, prediction)\n```\n\nMore details on how to use the models implemented in Repro can be found [here](https://repro.readthedocs.io/en/latest/tutorials/using-models.html).\n\n## Models Implemented in Repro\nSee [this page](https://repro.readthedocs.io/en/latest/models/index.html) to see the list of papers with models currently supported by Repro.\nEach model's documentation contains information about how to use it as well as whether or not it currently reproduces the results reported in its respective paper or if it hasn't been tested yet.\nIf it has been tested, the code to reproduce the results is also included.\n\n## Contributing a 
Model\nSee the tutorial [here](https://repro.readthedocs.io/en/latest/tutorials/adding-a-model.html) for instructions on how to add a new model.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldeutsch%2Frepro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanieldeutsch%2Frepro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldeutsch%2Frepro/lists"}