{"id":15290470,"url":"https://github.com/4ai/langml","last_synced_at":"2025-07-03T23:33:30.665Z","repository":{"id":39844349,"uuid":"424244792","full_name":"4AI/langml","owner":"4AI","description":"A Keras-based and TensorFlow-backend NLP Models Toolkit.","archived":false,"fork":false,"pushed_at":"2022-07-07T06:10:45.000Z","size":17558,"stargazers_count":11,"open_issues_count":2,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-14T14:41:59.771Z","etag":null,"topics":["attentions","bert","contrastive-learning","crf","keras","named-entity-recognition","ner","nlp","pretrained-language-models","prompt","prompt-learning","prompt-toolkit","sentence-bert","simcse","tensorflow","text-classification"],"latest_commit_sha":null,"homepage":"https://langml.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/4AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-03T13:56:23.000Z","updated_at":"2024-02-29T04:56:59.000Z","dependencies_parsed_at":"2022-09-24T03:02:34.850Z","dependency_job_id":null,"html_url":"https://github.com/4AI/langml","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/4AI/langml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2Flangml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2Flangml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2Flangml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2Flangml/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/4AI","download_url":"https://codeload.github.com/4AI/langml/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2Flangml/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263421460,"owners_count":23464012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attentions","bert","contrastive-learning","crf","keras","named-entity-recognition","ner","nlp","pretrained-language-models","prompt","prompt-learning","prompt-toolkit","sentence-bert","simcse","tensorflow","text-classification"],"created_at":"2024-09-30T16:08:17.626Z","updated_at":"2025-07-03T23:33:30.641Z","avatar_url":"https://github.com/4AI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align='center'\u003e\u003cimg src='docs/langml-logo.png' width=480 /\u003e\u003c/p\u003e\n\nLangML (**Lang**uage **M**ode**L**) is a Keras-based and TensorFlow-backend language model toolkit, which provides mainstream pre-trained language models, e.g., BERT/RoBERTa/ALBERT, and their downstream application models.\n\n\n[![pypi](https://img.shields.io/pypi/v/langml?style=for-the-badge)](https://pypi.org/project/langml/) [![](https://img.shields.io/badge/tensorflow-1.14+,2.x-orange.svg?style=for-the-badge#from=url\u0026id=tVzOp\u0026margin=%5Bobject%20Object%5D\u0026originHeight=28\u0026originWidth=197\u0026originalType=binary\u0026ratio=1\u0026status=done\u0026style=none)](https://code.alipay.com/riskstorm/langml/blob/master/) 
[![](https://img.shields.io/badge/keras-2.3.1+-blue.svg?style=for-the-badge#from=url\u0026id=AIJ4T\u0026margin=%5Bobject%20Object%5D\u0026originHeight=28\u0026originWidth=132\u0026originalType=binary\u0026ratio=1\u0026status=done\u0026style=none)](https://code.alipay.com/riskstorm/langml/blob/master/)\n\n# Outline\n- [Outline](#outline)\n- [Features](#features)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n  - [Specify the Keras variant](#specify-the-keras-variant)\n  - [Load pretrained language models](#load-pretrained-language-models)\n  - [Finetune a model](#finetune-a-model)\n  - [Use langml-cli to train baseline models](#use-langml-cli-to-train-baseline-models)\n- [Documentation](#documentation)\n- [Reference](#reference)\n\n\n# Features\n\u003ca href='#features'\u003e\u003c/a\u003e\n\n- Commonly used Keras layers: CRF, Transformer, and attention layers (Additive, ScaledDot, MultiHead, GatedAttentionUnit, and so on).\n- Pretrained Language Models: BERT, RoBERTa, ALBERT, with friendly interfaces that make it easy to implement downstream singleton, shared/unshared two-tower, or multi-tower models.\n- Tokenizers: WPTokenizer (wordpiece), SPTokenizer (sentencepiece)\n- Baseline models: Text Classification, Named Entity Recognition, Contrastive Learning, Text Matching. 
No code is required: preprocess the data into the expected format and use \"langml-cli\" to train the baseline models.\n- Prompt-Based Tuning: PTuning\n\n\n# Installation\n\u003ca href='#installation'\u003e\u003c/a\u003e\n\nYou can install or upgrade langml/langml-cli via the following command:\n\n```bash\npip install -U langml\n```\n\n# Quick Start\n\u003ca href='#quick-start'\u003e\u003c/a\u003e\n\n## Specify the Keras variant\n\n1) Use pure Keras (default setting)\n   \n```bash\nexport TF_KERAS=0\n```\n\n2) Use TensorFlow Keras\n\n```bash\nexport TF_KERAS=1\n```\n\n\n## Load pretrained language models\n\n```python\nfrom langml import WPTokenizer, SPTokenizer\nfrom langml import load_bert, load_albert\n\n# load bert / roberta plm\nbert_model, bert = load_bert(config_path, checkpoint_path)\n# load albert plm\nalbert_model, albert = load_albert(config_path, checkpoint_path)\n# load wordpiece tokenizer\nwp_tokenizer = WPTokenizer(vocab_path, lowercase)\n# load sentencepiece tokenizer\nsp_tokenizer = SPTokenizer(vocab_path, lowercase)\n```\n\n## Finetune a model\n\n```python\nfrom langml import keras, L\nfrom langml import load_bert\n\nconfig_path = '/path/to/bert_config.json'\nckpt_path = '/path/to/bert_model.ckpt'\nvocab_path = '/path/to/vocab.txt'\n\nbert_model, bert_instance = load_bert(config_path, ckpt_path)\n# get CLS representation\ncls_output = L.Lambda(lambda x: x[:, 0])(bert_model.output)\n# binary classification head on top of the CLS vector\noutput = L.Dense(2, activation='softmax',\n                 kernel_initializer=bert_instance.initializer)(cls_output)\n# the model must end at the classification output, not the raw CLS vector\ntrain_model = keras.Model(bert_model.input, output)\ntrain_model.summary()\ntrain_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(1e-5))\n```\n\n## Use langml-cli to train baseline models\n\n1) Text Classification\n\n```bash\n$ langml-cli baseline clf --help\nUsage: langml baseline clf [OPTIONS] COMMAND [ARGS]...\n\n  classification command line tools\n\nOptions:\n  --help  Show this 
message and exit.\n\nCommands:\n  bert\n  bilstm\n  textcnn\n```\n\n2) Named Entity Recognition\n\n```bash\n$ langml-cli baseline ner --help\nUsage: langml baseline ner [OPTIONS] COMMAND [ARGS]...\n\n  ner command line tools\n\nOptions:\n  --help  Show this message and exit.\n\nCommands:\n  bert-crf\n  lstm-crf\n```\n\n3) Contrastive Learning\n\n```bash\n$ langml-cli baseline contrastive --help\nUsage: langml baseline contrastive [OPTIONS] COMMAND [ARGS]...\n\n  contrastive learning command line tools\n\nOptions:\n  --help  Show this message and exit.\n\nCommands:\n  simcse\n```\n\n4) Text Matching\n\n```bash\n$ langml-cli baseline matching --help\nUsage: langml baseline matching [OPTIONS] COMMAND [ARGS]...\n\n  text matching command line tools\n\nOptions:\n  --help  Show this message and exit.\n\nCommands:\n  sbert\n```\n\n\n# Documentation\n\u003ca href='#documentation'\u003e\u003c/a\u003e\n\nPlease visit [langml.readthedocs.io](https://langml.readthedocs.io/en/latest/index.html) for the latest documentation.\n\n\n# Reference\n\u003ca href='#reference'\u003e\u003c/a\u003e\n\nThe implementation of the pretrained language models is inspired by [CyberZHG/keras-bert](https://github.com/CyberZHG/keras-bert#Download-Pretrained-Checkpoints) and [bojone/bert4keras](https://github.com/bojone/bert4keras).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4ai%2Flangml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F4ai%2Flangml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4ai%2Flangml/lists"}