{"id":13715370,"url":"https://github.com/outcastofmusic/quick-nlp","last_synced_at":"2025-05-07T04:30:42.398Z","repository":{"id":215967325,"uuid":"126586968","full_name":"outcastofmusic/quick-nlp","owner":"outcastofmusic","description":"Pytorch NLP library based on FastAI ","archived":false,"fork":false,"pushed_at":"2018-07-04T08:13:28.000Z","size":59240,"stargazers_count":283,"open_issues_count":1,"forks_count":48,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-11-14T03:34:27.226Z","etag":null,"topics":["fastai","nlp-library","pytorch","seq2seq"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/outcastofmusic.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-24T10:03:50.000Z","updated_at":"2024-01-09T09:55:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"26709b08-778a-4713-aba9-89bb95e7ce8a","html_url":"https://github.com/outcastofmusic/quick-nlp","commit_stats":null,"previous_names":["outcastofmusic/quick-nlp"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outcastofmusic%2Fquick-nlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outcastofmusic%2Fquick-nlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outcastofmusic%2Fquick-nlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outcastofmusic%2Fquick-nlp/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/outcastofmusic","download_url":"https://codeload.github.com/outcastofmusic/quick-nlp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252813642,"owners_count":21808362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastai","nlp-library","pytorch","seq2seq"],"created_at":"2024-08-03T00:00:58.168Z","updated_at":"2025-05-07T04:30:42.373Z","avatar_url":"https://github.com/outcastofmusic.png","language":"Python","funding_links":[],"categories":["Pytorch \u0026 related libraries｜Pytorch \u0026 相关库","Pytorch \u0026 related libraries","NLP\u0026PyTorch实战"],"sub_categories":["NLP \u0026 Speech Processing｜自然语言处理 \u0026 语音处理:","NLP \u0026 Speech Processing:"],"readme":"***********\nQuick NLP\n***********\n\n\n**Quick NLP**  is a deep learning nlp library inspired by the `fast.ai library  \u003chttps://github.com/fastai/fastai\u003e`_\n\nIt follows the same api as fastai and extends it allowing for quick and easy running of nlp models\n\nFeatures\n--------\n\n- Python 3.6 code\n- Tight-knit integration with Fast.ai library:\n    - Fast.ai style DataLoader objects for sentence to sentence algorithms\n    - Fast.ai style DataLoader objects for dialogue algorithms\n    - Fast.ai style DataModel objects for training nlp models\n- Can run a seq2seq model with a few lines of code similar to existing fast.ai examples\n- Easy to expand/train and try different models or use different data\n- Ready made algorithms to try out\n    - Seq2Seq https://arxiv.org/abs/1506.05869\n    - Seq2Seq with 
Attention https://arxiv.org/abs/1703.03906\n    - HRED http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14567/14219\n    - Attention is all you need http://papers.nips.cc/paper/7181-attention-is-all-you-need\n    - Depthwise Separable Convolutions for Neural Machine Translation (TODO) https://arxiv.org/abs/1706.03059\n\n\nInstallation\n------------\n\nInstallation of the fast.ai library is required. Please install it using the instructions `here \u003chttps://github.com/fastai/fastai\u003e`_ .\nIt is important that the latest version of fast.ai is used and not the pip version, which is not up to date.\n\n\nAfter setting up an environment using the fast.ai instructions, please clone the quick-nlp repo and use pip to install the package as follows:\n\n.. code-block:: bash\n\n    git clone https://github.com/outcastofmusic/quick-nlp\n    cd quick-nlp\n    pip install .\n\n\nDocker Image\n~~~~~~~~~~~~\n\nA docker image with the latest master is available. To use it, please run:\n\n.. code-block:: bash\n\n    docker run --runtime nvidia -it -p 8888:8888 --mount type=bind,source=\"$(pwd)\",target=/workspace agispof/quicknlp:latest\n\nThis will mount your current directory to /workspace and start a jupyter lab session in that directory.\n\nUsage Example\n-------------\n\nThe main goal of quick-nlp is to provide the easy interface of the fast.ai library for seq2seq models.\n\nFor example, let's assume that we have a dataset_path with folders for training and validation files.\nEach file is a tsv file where each row is two sentences separated by a tab. 
For example, a file inside the train folder can be an eng_to_fr.tsv file with the following first few lines::\n\n    Go.\tVa !\n    Run!\tCours !\n    Run!\tCourez !\n    Wow!\tÇa alors !\n    Fire!\tAu feu !\n    Help!\tÀ l'aide !\n    Jump.\tSaute.\n    Stop!\tÇa suffit !\n    Stop!\tStop !\n    Stop!\tArrête-toi !\n    Wait!\tAttends !\n    Wait!\tAttendez !\n    I see.\tJe comprends.\n\n\nLoading the data from the directory is as simple as:\n\n.. code-block:: python\n\n    from fastai.plots import *\n    from torchtext.data import Field\n    from fastai.core import SGD_Momentum\n    from fastai.lm_rnn import seq2seq_reg\n    from quicknlp import SpacyTokenizer, print_batch, S2SModelData\n    INIT_TOKEN = \"\u003csos\u003e\"\n    EOS_TOKEN = \"\u003ceos\u003e\"\n    DATAPATH = \"dataset_path\"\n    fields = [\n        (\"english\", Field(init_token=INIT_TOKEN, eos_token=EOS_TOKEN, tokenize=SpacyTokenizer('en'), lower=True)),\n        (\"french\", Field(init_token=INIT_TOKEN, eos_token=EOS_TOKEN, tokenize=SpacyTokenizer('fr'), lower=True))\n    ]\n    batch_size = 64\n    data = S2SModelData.from_text_files(path=DATAPATH, fields=fields,\n                                        train=\"train\",\n                                        validation=\"validation\",\n                                        source_names=[\"english\", \"french\"],\n                                        target_names=[\"french\"],\n                                        bs=batch_size\n                                       )\n\n\nFinally, to train a seq2seq model with the data, we only need to do:\n\n.. 
code-block:: python\n\n    emb_size = 300\n    nh = 1024\n    nl = 3\n    learner = data.get_model(opt_fn=SGD_Momentum(0.7), emb_sz=emb_size,\n                             nhid=nh,\n                             nlayers=nl,\n                             bidir=True,\n                            )\n    clip = 0.3\n    # use the regularization function imported from fastai.lm_rnn above\n    learner.reg_fn = seq2seq_reg\n    learner.clip = clip\n    learner.fit(2.0, wds=1e-6)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foutcastofmusic%2Fquick-nlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foutcastofmusic%2Fquick-nlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foutcastofmusic%2Fquick-nlp/lists"}