{"id":13564711,"url":"https://github.com/feedly/transfer-nlp","last_synced_at":"2025-04-04T16:17:23.506Z","repository":{"id":57476919,"uuid":"175287526","full_name":"feedly/transfer-nlp","owner":"feedly","description":"NLP library designed for reproducible experimentation management","archived":false,"fork":false,"pushed_at":"2024-07-25T10:16:22.000Z","size":2946,"stargazers_count":294,"open_issues_count":4,"forks_count":16,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-03-28T15:11:12.140Z","etag":null,"topics":["framework","language-model","natural-language-understanding","nlp","playground","pytorch","transfer-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feedly.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-12T20:00:31.000Z","updated_at":"2025-03-06T22:53:40.000Z","dependencies_parsed_at":"2024-09-29T06:18:07.536Z","dependency_job_id":null,"html_url":"https://github.com/feedly/transfer-nlp","commit_stats":{"total_commits":386,"total_committers":8,"mean_commits":48.25,"dds":0.2849740932642487,"last_synced_commit":"85515b73165c299b7a9b96d3608bd4e8ee567154"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedly%2Ftransfer-nlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedly%2Ftransfer-nlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/feedly%2Ftransfer-nlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedly%2Ftransfer-nlp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/feedly","download_url":"https://codeload.github.com/feedly/transfer-nlp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247208190,"owners_count":20901570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["framework","language-model","natural-language-understanding","nlp","playground","pytorch","transfer-learning"],"created_at":"2024-08-01T13:01:34.841Z","updated_at":"2025-04-04T16:17:23.488Z","avatar_url":"https://github.com/feedly.png","language":"Python","readme":"\u003cimg src=\"https://github.com/feedly/transfer-nlp/blob/v0.1/data/images/TransferNLP_Logo.jpg\" width=\"1000\"\u003e\n\nWelcome to the Transfer NLP library, a framework built on top of PyTorch to promote reproducible experimentation and Transfer Learning in NLP\n\nYou can have an overview of the high-level API on this [Colab Notebook](https://colab.research.google.com/drive/1DtC31eUejz1T0DsaEfHq_DOxEfanmrG1#scrollTo=Xzu3HPdGrnza), which shows how to use the framework on several examples.\nAll DL-based examples on these notebooks embed in-cell Tensorboard training monitoring!\n\nFor an example of pre-trained model finetuning, we provide a short executable tutorial on BertClassifier finetuning on this [Colab Notebook](https://colab.research.google.com/drive/10Toyi0V4fp0Sn33RSPCkoPrtf5FVpm3q#scrollTo=PXJFfulWkEl6)\n\n# Set up your 
environment\n\n```\nmkvirtualenv transfernlp\nworkon transfernlp\n\ngit clone https://github.com/feedly/transfer-nlp.git\ncd transfer-nlp\npip install -r requirements.txt\n```\n\nTo use Transfer NLP as a library:\n\n```\n# to install the experiment builder only\npip install transfernlp\n# to install Transfer NLP with PyTorch and Transfer Learning in NLP support\npip install transfernlp[torch]\n```\nor \n```\npip install git+https://github.com/feedly/transfer-nlp.git\n```\nto get the latest state before new releases.\n\nTo use Transfer NLP with the associated examples:\n\n```\ngit clone https://github.com/feedly/transfer-nlp.git\npip install -r requirements.txt\n```\n\n# Documentation\nAPI documentation and an overview of the library can be found [here](https://transfer-nlp.readthedocs.io/en/latest/)\n\n# Reproducible Experiment Manager\nThe core of the library is an experiment builder: you define the different objects that your experiment needs, and the configuration loader builds them in a clean way. For reproducible research and easy ablation studies, the library then enforces the use of configuration files for experiments.\nAs people have different tastes for what constitutes a good experiment file, the library allows for experiments defined in several formats:\n\n- Python Dictionary\n- JSON\n- YAML\n- TOML\n\nIn Transfer-NLP, an experiment config file contains all the information necessary to entirely define the experiment.\nThis is where you insert the names of the different components your experiment will use, along with the hyperparameters you want to use.\nTransfer-NLP makes use of the Inversion of Control pattern, which allows you to define any class / method / function you could need; the `ExperimentConfig` class will create a dictionary and instantiate your objects accordingly.\n\nTo use your own classes inside Transfer-NLP, you need to register them using the `@register_plugin` decorator. 
Instead of using a different registry for each kind of component (Models, Data loaders, Vectorizers, Optimizers, ...), only a single registry is used here, in order to enforce total customization.\n\nIf you use Transfer NLP as a dev dependency only, you might want to use it declaratively only, and call `register_plugin()` on the objects you want to use at experiment run time. \n\nHere is an example of how you can define an experiment in a YAML file:\n\n```\ndata_loader:\n  _name: MyDataLoader\n  data_parameter: foo\n  data_vectorizer:\n    _name: MyVectorizer\n    vectorizer_parameter: bar\n\nmodel:\n  _name: MyModel\n  model_hyper_param: 100\n  data: $data_loader\n\ntrainer:\n  _name: MyTrainer\n  model: $model\n  data: $data_loader\n  loss:\n    _name: PyTorchLoss\n  tensorboard_logs: $HOME/path/to/tensorboard/logs\n  metrics:\n    accuracy:\n      _name: Accuracy\n```\n\nAny object can be defined through a class, method or function, given a `_name` parameter followed by its own parameters.\nExperiments are then loaded and instantiated using `ExperimentConfig(experiment=experiment_path_or_dict)`.\n\nSome considerations:\n\n- Default parameters can be skipped in the experiment file.\n\n- If an object is used in different places, you can refer to it using the `$` symbol; for example, here the `trainer` object uses the `data_loader` instantiated elsewhere. No ordering of objects is required.\n\n- For paths, you might want to use environment variables so that other machines can also run your experiments.\nIn the previous example, you would run e.g. 
`ExperimentConfig(experiment=yaml_path, HOME=Path.home())` to instantiate the experiment and replace `$HOME` with your machine's home path.\n\n- The config instantiation allows for arbitrarily complex settings with nested dicts / lists.\n\nYou can have a look at the [tests](https://github.com/feedly/transfer-nlp/blob/master/tests/plugins/test_config.py) for examples of experiment settings the config loader can build.\nAdditionally, we provide runnable experiments in [`experiments/`](https://github.com/feedly/transfer-nlp/tree/master/experiments).\n\n# Transfer Learning in NLP: flexible PyTorch Trainers\nFor deep learning experiments, we provide a `BaseIgniteTrainer` in [`transfer_nlp.plugins.trainers.py`](https://github.com/feedly/transfer-nlp/blob/master/transfer_nlp/plugins/trainers.py).\nThis basic trainer takes a model and some data as input, and runs a whole training pipeline. We make use of the [PyTorch-Ignite](https://github.com/pytorch/ignite) library to monitor events during training (logging metrics, manipulating learning rates, checkpointing models, etc.). Tensorboard logs are also included as an option; you just have to specify a `tensorboard_logs` path parameter in the config file. 
Then just run `tensorboard --logdir=path/to/logs` in a terminal and you can monitor your experiment while it's training!\nTensorboard comes with very nice utilities to keep track of the norms of your model weights, histograms, distributions, embedding visualizations, etc., so we really recommend using it.\n\n\u003cimg src=\"https://github.com/feedly/transfer-nlp/blob/v0.1/data/images/tensorboard.png\" width=\"1000\"\u003e\n\nWe provide a `SingleTaskTrainer` class which you can use for any supervised setting dealing with a single task.\nWe are working on a `MultiTaskTrainer` class to deal with multi-task settings, and a `SingleTaskFineTuner` for large-model fine-tuning settings.\n\n# Use cases\nHere are a few use cases for Transfer NLP:\n\n- You have all your classes / methods / functions ready. Transfer NLP offers a clean way to centralize loading and executing your experiments.\n- You have all your classes but you would like to benchmark multiple configuration settings: the `ExperimentRunner` class allows for sequentially running your sets of experiments, and generates personalized reporting (you only need to implement your `report` method in a custom `ReporterABC` class).\n- You want to experiment with training deep learning models but you feel overwhelmed by all the boilerplate code in the GitHub projects of SOTA models. Transfer NLP encourages separation of important objects so that you can focus on the PyTorch `Module` implementation and let the trainers deal with the training part (while still controlling most of the training parameters through the experiment file).\n- You want to experiment with more advanced training strategies, but you are more interested in the ideas than the implementation details. We are working on improving the advanced trainers so that it becomes easier to try new ideas for multi-task settings, fine-tuning strategies or model adaptation schemes. 
\n\n\n# Slack integration\nWhile experimenting with your own models / data, the training might take some time. To get notified when your training finishes or crashes, you can use the simple library [knockknock](https://github.com/huggingface/knockknock) by folks at HuggingFace, which add a simple decorator to your running function to notify you via Slack, E-mail, etc.\n\n\n# Some objectives to reach:\n - Include examples using state of the art pre-trained models\n - Include linguistic properties to models\n - Experiment with RL for sequential tasks\n - Include probing tasks to try to understand the properties that are learned by the models\n\n# Acknowledgment\nThe library has been inspired by the reading of \u003ccite\u003e[\"Natural Language Processing with PyTorch\"](https://www.amazon.com/dp/1491978236/)\u003ccite\u003e by Delip Rao and Brian McMahan.\nExperiments in [`experiments`](https://github.com/feedly/transfer-nlp/tree/master/experiments/deep_learning_with_pytorch), the Vocabulary building block and embeddings nearest neighbors are taken or adapted from the code provided in the book.\n","funding_links":[],"categories":["Python","Pytorch \u0026 related libraries｜Pytorch \u0026 相关库","Pytorch \u0026 related libraries","文本数据和NLP"],"sub_categories":["NLP \u0026 Speech Processing｜自然语言处理 \u0026 语音处理:","NLP \u0026 Speech Processing:"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedly%2Ftransfer-nlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeedly%2Ftransfer-nlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedly%2Ftransfer-nlp/lists"}