{"id":16837873,"url":"https://github.com/hankcs/id-cnn-cws","last_synced_at":"2026-03-10T12:36:59.021Z","repository":{"id":77342596,"uuid":"107699741","full_name":"hankcs/ID-CNN-CWS","owner":"hankcs","description":"Source codes and corpora of paper \"Iterated Dilated Convolutions for Chinese Word Segmentation\"","archived":false,"fork":false,"pushed_at":"2021-04-15T20:44:41.000Z","size":28537,"stargazers_count":134,"open_issues_count":8,"forks_count":38,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-11-21T04:05:22.561Z","etag":null,"topics":["bilstm","cnn","crf","cws","nlp","tensorflow"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hankcs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-10-20T16:14:58.000Z","updated_at":"2025-11-03T12:31:03.000Z","dependencies_parsed_at":"2023-04-27T13:16:24.897Z","dependency_job_id":null,"html_url":"https://github.com/hankcs/ID-CNN-CWS","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/hankcs/ID-CNN-CWS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FID-CNN-CWS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FID-CNN-CWS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FID-CNN-CWS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FID-CNN-CWS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hankcs","down
load_url":"https://codeload.github.com/hankcs/ID-CNN-CWS/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FID-CNN-CWS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30333612,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T05:25:20.737Z","status":"ssl_error","status_checked_at":"2026-03-10T05:25:17.430Z","response_time":106,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bilstm","cnn","crf","cws","nlp","tensorflow"],"created_at":"2024-10-13T12:19:14.596Z","updated_at":"2026-03-10T12:36:58.990Z","avatar_url":"https://github.com/hankcs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ID-CNN-CWS\nSource codes and corpora of paper \"[Iterated Dilated Convolutions for Chinese Word Segmentation](http://www.nnw.cz/doi/2020/NNW.2020.30.022.pdf)\" published in NNW journal.\n\n![2017-10-20_13-23-31](http://wx3.sinaimg.cn/large/006Fmjmcly1fkpa3q8maej30dh0c2jup.jpg)\n\n\nIt implements the following `4` models for CWS:\n\n- Bi-LSTM\n- Bi-LSTM-CRF\n- ID-CNN\n- ID-CNN-CRF\n\n## Dependencies\n\n- Python \u003e= 3.6\n- TensorFlow \u003e= 1.2\n\nBoth CPU and GPU are supported. 
GPU training is `10` times faster.\n\n## Preparation\n\nRun the following script to convert the corpora to TensorFlow datasets.\n\n```\n$ ./scripts/make.sh\n```\n\n## Train and Test\n\n### Quick Start\n\n```\n$ ./scripts/run.sh $dataset $model\n```\n\n- `$dataset` can be `pku`, `msr`, `asSC` or `cityuSC`.\n- `$model` can be `cnn` or `bilstm`.\n\nFor example:\n\n```\n$ ./scripts/run.sh pku cnn\n```\n\nThis trains a `cnn` model on the `pku` dataset, then evaluates its performance on the test set.\n\n### CRF Layer\n\nTo enable the CRF layer, append `--viterbi` to your command, e.g.\n\n```\n$ ./scripts/run.sh pku cnn --viterbi\n```\n\n## Accuracy\n\n![2017-10-20_13-25-11](http://wx1.sinaimg.cn/large/006Fmjmcly1fkpa3in2haj30dq0h9q8u.jpg)\n\n\n## Speed\n\n![2017-10-20_11-44-42](http://wx3.sinaimg.cn/large/006Fmjmcly1fkp6wafcngj30d407l0th.jpg)\n\n## Acknowledgments\n\n- Corpora are from SIGHAN05, converted to Simplified Chinese via [HanLP](https://github.com/hankcs/HanLP). Note that the SIGHAN datasets should only be used for research purposes.\n- Model implementations are adopted from https://github.com/iesl/dilated-cnn-ner by [Emma Strubell](https://cs.umass.edu/~strubell).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhankcs%2Fid-cnn-cws","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhankcs%2Fid-cnn-cws","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhankcs%2Fid-cnn-cws/lists"}