{"id":18889051,"url":"https://github.com/naver/claf","last_synced_at":"2025-04-06T09:10:47.032Z","repository":{"id":51368668,"uuid":"172458804","full_name":"naver/claf","owner":"naver","description":"CLaF: Open-Source Clova Language Framework","archived":false,"fork":false,"pushed_at":"2021-03-26T00:34:11.000Z","size":9024,"stargazers_count":216,"open_issues_count":5,"forks_count":36,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-03-30T08:12:43.122Z","etag":null,"topics":["clova","framework","language","natural-language-processing","nlp","pytorch"],"latest_commit_sha":null,"homepage":"https://naver.github.io/claf/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/naver.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-25T07:47:44.000Z","updated_at":"2025-03-17T03:29:09.000Z","dependencies_parsed_at":"2022-08-25T11:31:02.601Z","dependency_job_id":null,"html_url":"https://github.com/naver/claf","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver%2Fclaf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver%2Fclaf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver%2Fclaf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver%2Fclaf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/naver","download_url":"https://codeload.github.com/naver/claf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com"
,"kind":"github","repositories_count":247457803,"owners_count":20941906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clova","framework","language","natural-language-processing","nlp","pytorch"],"created_at":"2024-11-08T07:47:17.515Z","updated_at":"2025-04-06T09:10:47.014Z","avatar_url":"https://github.com/naver.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n    \u003cimg src=\"images/logo.png\" style=\"inline\" width=300\u003e\n\u003c/p\u003e\n\n\u003ch4 align=\"center\"\u003eClova Language Framework\u003c/h4\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://naver.github.io/claf\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/docs-passing-brightgreen.svg\" alt=\"Documentation Status\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://travis-ci.org/naver/claf\"\u003e\n        \u003cimg src='https://travis-ci.org/naver/claf.svg?branch=master'/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/ambv/black\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000.svg\" alt=\"Code style: black\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://codecov.io/gh/naver/claf\"\u003e\n        \u003cimg src=\"https://codecov.io/gh/naver/claf/branch/master/graph/badge.svg\" /\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n# CLaF: Clova Language Framework\n\n\n- [Full Documentation](https://naver.github.io/claf/)\n- [Dataset And Model](https://naver.github.io/claf/docs/_build/html/contents/dataset_and_model.html)\n- [Pretrained 
Vector](https://naver.github.io/claf/docs/_build/html/contents/pretrained_vector.html)\n- [Tokens](https://naver.github.io/claf/docs/_build/html/contents/tokens.html): `Tokenizers` and `TokenMakers`\n- List of [BaseConfig](#baseconfig)\n\n| Task | Language | Dataset | Model |\n| ---- | -------- | ------- | ----- |\n| Multi-Task Learning | English | [GLUE Benchmark](https://gluebenchmark.com/), [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/) | `MT-DNN (BERT)` |\n| Natural Language Understanding | English | [GLUE Benchmark](https://gluebenchmark.com/) | `BERT`, `RoBERTa` |\n| Named Entity Recognition | English | CoNLL 2003 | `BERT` |\n| Question Answering | Korean | [KorQuAD v1.0](https://korquad.github.io/category/1.0_KOR.html) | `BiDAF`, `DocQA`, `BERT` |\n| Question Answering | English | [SQuAD v1.1 and v2.0](https://rajpurkar.github.io/SQuAD-explorer/) | - v1.1: `BiDAF`, `DrQA`, `DocQA`, `DocQA+ELMo`, `QANet`, `BERT`, `RoBERTa` \u003cbr/\u003e - v2.0: `BiDAF + No Answer`, `DocQA + No Answer` |\n| Semantic Parsing | English | [WikiSQL](https://github.com/salesforce/WikiSQL) | `SQLNet` |\n\n\n- Reports\n    - [GLUE](https://naver.github.io/claf/docs/_build/html/reports/glue.html)\n    - [KorQuAD](https://naver.github.io/claf/docs/_build/html/reports/korquad.html)\n    - [SQuAD](https://naver.github.io/claf/docs/_build/html/reports/squad.html)\n    - [WikiSQL](https://naver.github.io/claf/docs/_build/html/reports/wikisql.html)\n- Summary (1-example Inference Latency)\n    - [Reading Comprehension](https://naver.github.io/claf/docs/_build/html/summary/reading_comprehension.html)\n\n\n- List of [MachineConfig](#machine)\n\n| Name | Language | Pipeline | Note |\n| ---- | -------- | ------- | ----- |\n| KoWiki | Korean | `Wiki Dumps` -\u003e `Document Retrieval` -\u003e `Reading Comprehension` | - |\n| NLU | All | `Query` -\u003e `Intent Classification` \u0026 `Token Classification (Slot)` -\u003e `Template Matching` | - |\n\n---\n\n\n## Table of Contents\n- 
[Overview](#overview)\n    - [Features](#features)\n- [Installation](#installation) \n    - [Requirements](#requirements)\n    - [Install via pip](#install-via-pip)\n- [Experiment](#experiment)\n\t- [Usage](#usage)\n\t    - [Training](#training) \n\t    - [Evaluate](#evaluate) \n\t    - [Predict](#predict) \n\t    - [Docker Images](#docker-images)\n- [Machine](#machine)\n- [Contributing](#contributing)\n- [Maintainers](#maintainers)\n- [Citing](#citing)\n- [Acknowledgements](#acknowledgements)\n- [License](#license)\n\n\n---\n\n\n## Overview\n\n**CLaF** is a Language Framework built on PyTorch that provides the following two high-level features:\n\n- `Experiment` enables control of the training flow for general NLP tasks by offering various `TokenMaker` methods. \n    - CLaF is inspired by the design principles of [AllenNLP](https://github.com/allenai/allennlp), such as its higher-level concepts and reusable code, but is mostly based on PyTorch’s common modules, so that users can easily modify the code to suit their needs.  \n- `Machine` helps combine various modules to build an NLP Machine in one place.\n    - Its modules include knowledge bases, components, and trained experiments that perform 1-example inference.\n\n### Features\n\n- **Multilingual** modeling support (currently, English and Korean are supported).\n- Lightweight **Systemization** and Modularization.\n- Easy extension and implementation of models.\n- A wide variety of **Experiments** with reproducible and comprehensive logging.\n- Service-oriented metrics, such as \"1\\-example inference latency\", are provided.\n- Easy building of an NLP **Machine** by combining modules.\n\n\n## Installation\n\n### Requirements\n\n- Python 3.6\n- PyTorch \u003e= 1.3.1\n- [MeCab](https://bitbucket.org/eunjeon/mecab-ko) for the Korean Tokenizer\n    - ```sh script/install_mecab.sh```\n\nIt is recommended to use a virtual environment.  
\n[Conda](https://conda.io/docs/download.html) is the easiest way to set up a virtual environment.\n\n```\nconda create -n claf python=3.6\nconda activate claf\n\n(claf) ✗ pip install -r requirements.txt\n```\n\n### Install via pip\n\nTo install via pip:\n\n```\npip install claf\n```\n\n\n## Experiment\n\n- Training Flow\n\n![images](images/claf-experiment.001.png)\n\n\n### Usage\n\n#### Training\n\n![images](images/training_config_mapping.png)\n\n\n1. Arguments only\n\n\t```\n\tpython train.py --train_file_path {file_path} --valid_file_path {file_path} --model_name {name} ...\n\t```\n\n2. BaseConfig only (skip the `/base_config` path)\n\n\t```\n\tpython train.py --base_config {base_config}\n\t```\n\t\n3. BaseConfig + Arguments\n\n\t```\n\tpython train.py --base_config {base_config} --learning_rate 0.002\n\t```\n\t\n\t- Loads the BaseConfig, then overwrites `learning_rate` with 0.002\n\n\n#### BaseConfig\n\nDeclarative experiment config (.json, .yaml)\n\n- Config keys map directly to objects' parameters\n- Sample configs are provided in the `/base_config` directory\n\n##### Defined BaseConfig\n\n```\nBase Config:\n  --base_config BASE_CONFIG\n    Use pre-defined base_config:\n    []\n\n\n    * CoNLL 2003:\n    ['conll2003/bert_large_cased']\n\n    * GLUE:\n    ['glue/qqp_roberta_base', 'glue/qnli_bert_base', 'glue/rte_bert_base', 'glue/wnli_roberta_base', 'glue/mnlim_roberta_base', 'glue/wnli_bert_base', 'glue/mnlimm_roberta_base', 'glue/cola_bert_base', 'glue/mrpc_bert_base', 'glue/mnlimm_bert_base', 'glue/stsb_bert_base', 'glue/mnlim_bert_base', 'glue/qqp_bert_base', 'glue/rte_roberta_base', 'glue/qnli_roberta_base', 'glue/sst_bert_base', 'glue/mrpc_roberta_base', 'glue/cola_roberta_base', 'glue/stsb_roberta_base', 'glue/sst_roberta_base']\n\n    * KorQuAD:\n    ['korquad/bert_base_multilingual_cased', 'korquad/bidaf', 'korquad/bert_base_multilingual_uncased', 'korquad/docqa']\n\n    * SQuAD:\n    ['squad/bert_large_uncased', 'squad/bidaf', 'squad/drqa_paper', 'squad/drqa', 
'squad/bert_base_uncased', 'squad/qanet', 'squad/docqa+elmo', 'squad/bidaf_no_answer', 'squad/docqa_no_answer', 'squad/qanet_paper', 'squad/bidaf+elmo', 'squad/docqa']\n\n    * WikiSQL:\n    ['wikisql/sqlnet']\n```\n\n\n#### Evaluate\n\n```\npython eval.py \u003cdata_path\u003e \u003cmodel_checkpoint_path\u003e\n```\n\n- Example\n\n```\n✗ python eval.py data/squad/dev-v1.1.json logs/squad/bidaf/checkpoint/model_19.pkl\n...\n[INFO] - {\n    \"valid/loss\": 2.59111491665019,\n    \"valid/epoch_time\": 60.7434446811676,\n    \"valid/start_acc\": 63.17880794701987,\n    \"valid/end_acc\": 67.19016083254493,\n    \"valid/span_acc\": 54.45600756859035,\n    \"valid/em\": 68.10785241248817,\n    \"valid/f1\": 77.77963381714842\n}\n# writes prediction files (\u003clog_dir\u003e/predictions/predictions-valid-19.json)\n```\n\n- 1-example Inference Latency ([Summary](docs/_build/html/reports/summary.html))\n\n```\n✗ python eval.py data/squad/dev-v1.1.json logs/squad/bidaf/checkpoint/model_19.pkl\n...\n# Evaluate Inference Latency Mode.\n...\n[INFO] - saved inference_latency results. bidaf-cpu.json  # file_format: {model_name}-{env}.json\n```\n\n#### Predict\n\n```\npython predict.py \u003cmodel_checkpoint_path\u003e --\u003carguments\u003e\n```\n\n- Example\n\n```\n✗ python predict.py logs/squad/bidaf/checkpoint/model_19.pkl \\\n    --question \"When was the last Super Bowl in California?\" \\\n    --context \"On May 21, 2013, NFL owners at their spring meetings in Boston voted and awarded the game to Levi's Stadium. The $1.2 billion stadium opened in 2014. 
It is the first Super Bowl held in the San Francisco Bay Area since Super Bowl XIX in 1985, and the first in California since Super Bowl XXXVII took place in San Diego in 2003.\"\n\n\u003e\u003e\u003e Predict: {'text': '2003', 'score': 4.1640071868896484}\n```\n\n#### Docker Images\n\n- [Docker Hub](https://hub.docker.com/u/claf)\n- Run with the Docker image\n    - Pull the Docker image\n        ```✗ docker pull claf/claf:latest```\n    - Run\n        ``` docker run --rm -i -t claf/claf:latest /bin/bash ```\n\n\n---\n\n\n### Machine\n\n- Machine Architecture\n\n\n![images](images/claf-machine.001.png)\n\n#### Usage\n\n- Define a config file (.json) like [BaseConfig](#baseconfig) in the `machine_config/` directory\n- Run CLaF Machine (skip the `/machine_config` path)\n\n\n```\n✗ python machine.py --machine_config {machine_config}\n```\n\n\n- The list of pre-defined `Machine` configs:\n\n```\nMachine Config:\n  --machine_config MACHINE_CONFIG\n    Use a pre-defined machine_config (.json)\n\n    ['ko_wiki', 'nlu']\n```\n\n#### Open QA (DrQA Style)\n\nDrQA is a system for reading comprehension applied to open-domain question answering. 
The system has to combine the challenge of document retrieval (finding the relevant documents) with that of machine comprehension of text (identifying the answers from those documents).\n\n- ko_wiki: Korean Wiki Version\n\n``` \n✗ python machine.py --machine_config ko_wiki\n...\nCompleted!\nQuestion \u003e 동학의 2대 교주 이름은?\n--------------------------------------------------\nDoc Scores:\n - 교주 : 0.5347289443016052\n - 이교주 : 0.4967213571071625\n - 교주도 : 0.49036136269569397\n - 동학 : 0.4800325632095337\n - 동학중학교 : 0.4352934956550598\n--------------------------------------------------\nAnswer: [\n    {\n        \"text\": \"최시형\",\n        \"score\": 11.073444366455078\n    },\n    {\n        \"text\": \"충주목\",\n        \"score\": 9.443866729736328\n    },\n    {\n        \"text\": \"반월동\",\n        \"score\": 9.37778091430664\n    },\n    {\n        \"text\": \"환조 이자춘\",\n        \"score\": 4.64817476272583\n    },\n    {\n        \"text\": \"합포군\",\n        \"score\": 3.3186707496643066\n    }\n]\n```\n\n#### NLU (Dialog)\n\nThe NLU machine does not return a full response because response generation may require various task-specific post-processing techniques or additional logic (e.g. API calls, template-decision rules, template-filling rules, or an NN-based response generation model). Therefore, for flexible usage, the NLU machine returns only the NLU result.\n\n``` \n✗ python machine.py --machine_config nlu\n...\nUtterance \u003e \"looking for a flight from Boston to Seoul or Incheon\"\n\nNLU Result: {\n    \"intent\": \"flight\",\n    \"slots\": {\n        \"city.depart\": [\"Boston\"],\n        \"city.dest\": [\"Seoul\", \"Incheon\"]\n    }\n}\n```\n\n\n## Contributing\n\nThanks for your interest in contributing! There are many ways to contribute to this project.  
\nGet started [here](./CONTRIBUTING.md).\n\n## Maintainers\n\nCLaF is currently maintained by\n\n- [Dongjun Lee](https://github.com/DongjunLee) (Author)\n- [Sohee Yang](https://github.com/soheeyang)\n- [Minjeong Kim](https://github.com/Mjkim88)\n\n## Citing\n\nIf you use CLaF for your work, please cite:\n\n```bibtex\n@misc{claf,\n  author = {Lee, Dongjun and Yang, Sohee and Kim, Minjeong},\n  title = {CLaF: Open-Source Clova Language Framework},\n  year = {2019},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/naver/claf}}\n}\n```\n\nWe will update this BibTeX entry when our paper is published.\n\n\n## Acknowledgements\n\nThe `docs/` directory includes documentation created with [Sphinx](http://www.sphinx-doc.org/).\n\n## License\n\nMIT license\n\n```\nCopyright (c) 2019-present NAVER Corp.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy \nof this software and associated documentation files (the \"Software\"), to deal \nin the Software without restriction, including without limitation the rights \nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell \ncopies of the Software, and to permit persons to whom the Software is \nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all \ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR \nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, \nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE \nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER \nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, \nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE \nSOFTWARE.\n```\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaver%2Fclaf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnaver%2Fclaf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaver%2Fclaf/lists"}