{"id":16269621,"url":"https://github.com/fengyh3/text-classification","last_synced_at":"2026-05-04T07:39:40.013Z","repository":{"id":157176321,"uuid":"252992792","full_name":"fengyh3/Text-Classification","owner":"fengyh3","description":"Deep Learning for Text Classification in NLP","archived":false,"fork":false,"pushed_at":"2020-04-14T13:54:57.000Z","size":641,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-14T11:33:54.142Z","etag":null,"topics":["tensorflow","text-classification"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fengyh3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-04T12:37:07.000Z","updated_at":"2020-08-07T04:16:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"b4258bd4-7a87-4cfc-9222-55c633254679","html_url":"https://github.com/fengyh3/Text-Classification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fengyh3%2FText-Classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fengyh3%2FText-Classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fengyh3%2FText-Classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fengyh3%2FText-Classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fengyh3","download_url":"https://codeload.github.com/fengyh3/Text-Classification/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247867367,"owners_count":21009240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["tensorflow","text-classification"],"created_at":"2024-10-10T18:08:42.777Z","updated_at":"2026-05-04T07:39:39.975Z","avatar_url":"https://github.com/fengyh3.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text-Classification\nDeep Learning for Text Classification in NLP.\n\n# Environment\nPython 3 + TensorFlow 1.12+\n\n# Dataset\nThe Movie Review dataset is from [this website](http://www.cs.cornell.edu/people/pabo/movie-review-data/).\n\nYelp: the data comes from the [Yelp academic review dataset](https://www.kaggle.com/yelp-dataset/yelp-dataset/version/2); only the first 500,000 texts are used for training.\n\n# Models\nIt currently contains the following models: CNN, BiLSTM, BiLSTM + attention, FastText, RCNN and HAN (to be continued...).\n\n# Results\nAccuracy results are listed below (- means no result is available):\n\n| Dataset | CNN | BiLSTM | BiLSTM + attention | FastText | RCNN_max-pooling | RCNN_average-pooling | HAN | Bert-Tiny | Bert-Mini |\n| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |\n| Movie review | 76.2% | 79.5% | 76.9% | 80.3% | 80.4% | 80.3% | - | 77.2% (dataset encoding issue) | 77.2% |\n| Yelp | 65.1% | 68.2% | 70.2% | 69.5% | - | - | 70.5% | 72.5% | 74.8% |\n\n# Tips\nNote that the models do not include code to save or load models in TensorFlow, but they do include visualization with TensorBoard. Moreover, the hyper-parameters are only roughly tuned, and the FastText model uses unigrams only. So this is just a toy-level demo for learning text classification.\n\nOn the movie review dataset, which consists of a small number of short texts, the more complicated DL methods may not perform as well as simpler DL methods or traditional ML methods. As for training cost: RCNN \u003e BiLSTM + attention ≈ BiLSTM \u003e CNN \u003e\u003e FastText. Also, because the movie review dataset is encoded in 'windows-1252', the text becomes garbled when training BERT on it, so I couldn't get a good enough result.\n\nThe Yelp dataset is larger and its texts are longer. Due to limited computing resources, the models' hyper-parameters are not well tuned.\n\nTransformer, BERT and other models will be added next.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffengyh3%2Ftext-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffengyh3%2Ftext-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffengyh3%2Ftext-classification/lists"}