{"id":22613332,"url":"https://github.com/sbischoff-ai/basic-document-classifier","last_synced_at":"2025-06-28T08:37:20.386Z","repository":{"id":57423417,"uuid":"200648916","full_name":"sbischoff-ai/basic-document-classifier","owner":"sbischoff-ai","description":"A simple CNN for n-class classification of document images","archived":false,"fork":false,"pushed_at":"2019-08-15T08:49:44.000Z","size":28,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-07T04:17:26.990Z","etag":null,"topics":["cnn","deep-learning","document-classification","image-classification","neural-network"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sbischoff-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-05T12:12:50.000Z","updated_at":"2024-02-25T13:52:07.000Z","dependencies_parsed_at":"2022-09-07T04:52:23.605Z","dependency_job_id":null,"html_url":"https://github.com/sbischoff-ai/basic-document-classifier","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbischoff-ai%2Fbasic-document-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbischoff-ai%2Fbasic-document-classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbischoff-ai%2Fbasic-document-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbischoff-ai%2Fbasic-document-classifier/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sbischoff-ai","download_url":"https://codeload.github.com/sbischoff-ai/basic-document-classifier/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246117761,"owners_count":20726068,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn","deep-learning","document-classification","image-classification","neural-network"],"created_at":"2024-12-08T17:16:40.016Z","updated_at":"2025-03-29T00:13:50.797Z","avatar_url":"https://github.com/sbischoff-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# basic-document-classifier\nA simple CNN for n-class classification of document images.\n\nIt doesn't take colour into account (images are converted to grayscale).\nFor small numbers of classes (2 to 4), this model can achieve \u003e 90% accuracy with as few as 10 to 30 training images per class.\nTraining data can be provided in [any image format supported by *PIL*](https://pillow.readthedocs.io/en/5.1.x/handbook/image-file-formats.html).\n\n## Installation\n\n```pip install document-classifier```\nor\n```poetry add document-classifier```\n\n## Usage\n\n```python\nfrom document_classifier import CNN\n\n# Create a classification model for 3 document classes.\nclassifier = CNN(class_number=3)\n\n# Train the model based on images stored on the file system.\ntraining_metrics = classifier.train(\n    batch_size=8,\n    epochs=40,\n    train_data_path=\"./train_data\",\n    test_data_path=\"./test_data\"\n)\n# \"./train_data\" and 
\"./test_data\" must each contain a subfolder for\n# each document class, e.g. \"./train_data/letter\" or \"./train_data/report\".\n\n# View training metrics like the validation accuracy on the test data.\nprint(training_metrics.history[\"val_acc\"])\n\n# Save the trained model to the file system.\nclassifier.save(model_path=\"./my_model\")\n\n# Load the model from the file system.\nclassifier = CNN.load(model_path=\"./my_model\")\n\n# Predict the class of a document image stored on the file system.\nprediction = classifier.predict(image=\"./my_image.jpg\")\n# The image parameter also accepts binary image data as a bytes object.\n```\n\nThe prediction result is a 2-tuple containing the document class label as a string and the confidence score as a float.\n\n## Changes\n\n### 0.1.2\n- Give every CNN instance its own isolated TensorFlow graph and session\n\n### 0.1.1\n- Fix a bug that occurred when using multiple model instances at the same time\n\n## TODO\n\nThe model architecture is fixed for now and geared towards smaller numbers of classes and training images.\nI'm working on automatic scaling for the CNN.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsbischoff-ai%2Fbasic-document-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsbischoff-ai%2Fbasic-document-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsbischoff-ai%2Fbasic-document-classifier/lists"}