{"id":23184678,"url":"https://github.com/shanetian/textcnn","last_synced_at":"2026-03-01T01:32:39.950Z","repository":{"id":139235852,"uuid":"181653116","full_name":"ShaneTian/TextCNN","owner":"ShaneTian","description":"TextCNN by TensorFlow 2.0.0 ( tf.keras mainly ).","archived":false,"fork":false,"pushed_at":"2019-04-29T08:46:52.000Z","size":687,"stargazers_count":62,"open_issues_count":0,"forks_count":12,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-12T22:47:36.323Z","etag":null,"topics":["python3","tensorflow2","text-classification","text-cnn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShaneTian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2019-04-16T08:59:24.000Z","updated_at":"2025-08-07T12:23:51.000Z","dependencies_parsed_at":"2024-04-08T02:57:28.726Z","dependency_job_id":"a3858eaa-522e-416b-990a-0b2d5bb732b0","html_url":"https://github.com/ShaneTian/TextCNN","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ShaneTian/TextCNN","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShaneTian%2FTextCNN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShaneTian%2FTextCNN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShaneTian%2FTextCNN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShaneTian%2FTextCNN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Shan
eTian","download_url":"https://codeload.github.com/ShaneTian/TextCNN/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShaneTian%2FTextCNN/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29957361,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-28T22:53:01.873Z","status":"ssl_error","status_checked_at":"2026-02-28T22:52:50.699Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python3","tensorflow2","text-classification","text-cnn"],"created_at":"2024-12-18T09:24:58.113Z","updated_at":"2026-03-01T01:32:39.919Z","avatar_url":"https://github.com/ShaneTian.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TextCNN\nTextCNN by TensorFlow 2.0.0 ( tf.keras mainly ).\n## Software environments\n1. tensorflow-gpu 2.0.0-alpha0\n2. python 3.6.7\n3. pandas 0.24.2\n4. 
numpy 1.16.2\n\n## Data\n- Vocabulary size: 3407\n- Number of classes: 18\n- Train/Test split: 20351/2261\n\n## Model architecture\n```\nModel: \"model\"\n__________________________________________________________________________________________________\nLayer (type)                    Output Shape         Param #     Connected to                     \n==================================================================================================\ninput_data (InputLayer)         [(None, 128)]        0                                            \n__________________________________________________________________________________________________\nembedding (Embedding)           (None, 128, 512)     1744384     input_data[0][0]                 \n__________________________________________________________________________________________________\nadd_channel (Reshape)           (None, 128, 512, 1)  0           embedding[0][0]                  \n__________________________________________________________________________________________________\nconvolution_3 (Conv2D)          (None, 126, 1, 128)  196736      add_channel[0][0]                \n__________________________________________________________________________________________________\nconvolution_4 (Conv2D)          (None, 125, 1, 128)  262272      add_channel[0][0]                \n__________________________________________________________________________________________________\nconvolution_5 (Conv2D)          (None, 124, 1, 128)  327808      add_channel[0][0]                \n__________________________________________________________________________________________________\nmax_pooling_3 (MaxPooling2D)    (None, 1, 1, 128)    0           convolution_3[0][0]              \n__________________________________________________________________________________________________\nmax_pooling_4 (MaxPooling2D)    (None, 1, 1, 128)    0           convolution_4[0][0]              
\n__________________________________________________________________________________________________\nmax_pooling_5 (MaxPooling2D)    (None, 1, 1, 128)    0           convolution_5[0][0]              \n__________________________________________________________________________________________________\nconcatenate (Concatenate)       (None, 1, 1, 384)    0           max_pooling_3[0][0]              \n                                                                 max_pooling_4[0][0]              \n                                                                 max_pooling_5[0][0]              \n__________________________________________________________________________________________________\nflatten (Flatten)               (None, 384)          0           concatenate[0][0]                \n__________________________________________________________________________________________________\ndropout (Dropout)               (None, 384)          0           flatten[0][0]                    \n__________________________________________________________________________________________________\ndense (Dense)                   (None, 18)           6930        dropout[0][0]                    \n==================================================================================================\nTotal params: 2,538,130\nTrainable params: 2,538,130\nNon-trainable params: 0\n__________________________________________________________________________________________________\n```\n\n## Model parameters\n- Padding size: 128\n- Embedding size: 512\n- Num channel: 1\n- Filter size: [3, 4, 5]\n- Num filters: 128\n- Dropout rate: 0.5\n- Regularizers lambda: 0.01\n- Batch size: 64\n- Epochs: 10\n- Fraction validation: 0.05 (1018 samples)\n- Total parameters: 2,538,130\n\n## Run\n### Train result\nUse 20351 samples after 10 epochs:\n\n| Loss | Accuracy | Val loss | Val accuracy |\n| --- | --- | --- | --- |\n| 0.1609 | 0.9683 | 0.3648 | 0.9185 |\n### Test result\nUse 2261 samples:\n\n| 
Accuracy | Macro-Precision | Macro-Recall | Macro-F1 |\n| --- | --- | --- | --- |\n| 0.9363 | 0.9428 | 0.9310 | **0.9360** |\n### Images\n#### Accuracy\n![Accuracy](https://github.com/ShaneTian/TextCNN/blob/master/results/2019-04-29-15-43-54/acc.jpg)\n#### Loss\n![Loss](https://github.com/ShaneTian/TextCNN/blob/master/results/2019-04-29-15-43-54/loss.jpg)\n#### Confusion matrix\n![Confusion matrix](https://github.com/ShaneTian/TextCNN/blob/master/results/2019-04-29-15-43-54/confusion_matrix.jpg)\n\n### Usage\n```\nusage: train.py [-h] [-t TEST_SAMPLE_PERCENTAGE] [-p PADDING_SIZE]\n                [-e EMBED_SIZE] [-f FILTER_SIZES] [-n NUM_FILTERS]\n                [-d DROPOUT_RATE] [-c NUM_CLASSES] [-l REGULARIZERS_LAMBDA]\n                [-b BATCH_SIZE] [--epochs EPOCHS]\n                [--fraction_validation FRACTION_VALIDATION]\n                [--results_dir RESULTS_DIR]\n\nThis is the TextCNN train project.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -t TEST_SAMPLE_PERCENTAGE, --test_sample_percentage TEST_SAMPLE_PERCENTAGE\n                        The fraction of test data.(default=0.1)\n  -p PADDING_SIZE, --padding_size PADDING_SIZE\n                        Padding size of sentences.(default=128)\n  -e EMBED_SIZE, --embed_size EMBED_SIZE\n                        Word embedding size.(default=512)\n  -f FILTER_SIZES, --filter_sizes FILTER_SIZES\n                        Convolution kernel sizes.(default=3,4,5)\n  -n NUM_FILTERS, --num_filters NUM_FILTERS\n                        Number of each convolution kernel.(default=128)\n  -d DROPOUT_RATE, --dropout_rate DROPOUT_RATE\n                        Dropout rate in softmax layer.(default=0.5)\n  -c NUM_CLASSES, --num_classes NUM_CLASSES\n                        Number of target classes.(default=18)\n  -l REGULARIZERS_LAMBDA, --regularizers_lambda REGULARIZERS_LAMBDA\n                        L2 regulation parameter.(default=0.01)\n  -b BATCH_SIZE, --batch_size BATCH_SIZE\n   
                     Mini-Batch size.(default=64)\n  --epochs EPOCHS       Number of epochs.(default=10)\n  --fraction_validation FRACTION_VALIDATION\n                        The fraction of validation.(default=0.05)\n  --results_dir RESULTS_DIR\n                        The results dir including log, model, vocabulary and\n                        some images.(default=./results/)\n```\n\n```\nusage: test.py [-h] [-p PADDING_SIZE] [-c NUM_CLASSES] results_dir\n\nThis is the TextCNN test project.\n\npositional arguments:\n  results_dir           The results dir including log, model, vocabulary and\n                        some images.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -p PADDING_SIZE, --padding_size PADDING_SIZE\n                        Padding size of sentences.(default=128)\n  -c NUM_CLASSES, --num_classes NUM_CLASSES\n                        Number of target classes.(default=18)\n```\n#### You need to know...\n1. You need to alter the `load_data_and_write_to_file` function in `data_helper.py` to match your data file;\n2. This code uses a single-channel input. You can instead use two channels built from the embedding vectors, one static and the other dynamic, which may perform better;\n3. The model is saved as an `hdf5` file;\n4. TensorBoard is available.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshanetian%2Ftextcnn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshanetian%2Ftextcnn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshanetian%2Ftextcnn/lists"}