# Hierarchical Attention Networks for Document Classification

An implementation of Hierarchical Attention Networks (HAN) for document classification, as proposed in the following paper (Yang et al., NAACL 2016):

![Paper on Hierarchical Attention Networks for Document Classification](img/paper_screenshot_han_for_doc_class.png)

## Content
This repo contains the following:
- Implementation of the network architecture using the following technologies:
    - Keras
    - Preprocessing with spaCy
    - Embedding layer with pretrained GloVe vectors
    - Tokenizing with the Keras tokenizer
- Utilities to train the model on the *Ten Thousand German News Articles Dataset* (https://github.com/tblock/10kGNAD)
- The Jupyter Notebook used during initial development of the network
- A REST API server, developed with FastAPI, for making predictions with a trained model and displaying the sentence and word attentions as HTML

## Get started
- Download the German word vectors pretrained by deepset (https://deepset.ai/german-word-embeddings) from `https://int-emb-glove-de-wiki.s3.eu-central-1.amazonaws.com/vectors.txt` and place them in the directory `embeddings/glove_german/`.
- Download the dataset from `https://github.com/tblock/10kGNAD/blob/master/articles.csv` and place the `articles.csv` file in the directory `data/`.

## Train the model
Run the training script `train_script.py`.

## Start the server
Running
```
uvicorn app.app:app
```
starts the server at `http://127.0.0.1:8000`.

### Access the API docs
Once the server is running, the API docs are available at `http://127.0.0.1:8000/docs`.

### Make predictions
Send an HTTP request to the running server, e.g.
```
curl --location --request GET 'http://127.0.0.1:8000/predict/?text=<your-text-to-be-classified>'
```

### Visualize prediction and attentions
Send an HTTP request to the running server, e.g. by opening the following URL in a browser:
```
http://127.0.0.1:8000/visualize/?text=<your-text-to-be-classified>
```
The server responds with a static HTML page, which should look something like the following:

![Prediction and Attention Visualization Response](img/prediction_and_attention_visualization_response.png)
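The word and sentence attentions in the visualization come from the attention pooling that the HAN paper applies at both levels of the hierarchy: each hidden state `h_t` is projected as `u_t = tanh(W h_t + b)`, scored against a learned context vector, normalized with a softmax, and used as the weight of `h_t` in a weighted sum. The pure-Python sketch below illustrates that computation only; it is not the Keras implementation in this repo. It assumes the word-level hidden states are already given and, unlike the paper, skips the bidirectional GRU encoders; `random_params` is a hypothetical helper whose random values stand in for learned weights.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Vector, Vector]  # (W, b, context vector)


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: List[Vector], w: Matrix, b: Vector,
                   context: Vector) -> Tuple[Vector, Vector]:
    """Pool a sequence of hidden states into one vector, HAN style.

    u_t     = tanh(W h_t + b)
    alpha_t = softmax_t(u_t . context)
    pooled  = sum_t alpha_t * h_t
    Returns (pooled, alpha).
    """
    # One-layer MLP with tanh applied to every hidden state.
    u = [[math.tanh(sum(wij * hj for wij, hj in zip(row, h_t)) + bi)
          for row, bi in zip(w, b)]
         for h_t in states]
    # Similarity of each projection with the context vector, normalized.
    alpha = softmax([sum(ui * ci for ui, ci in zip(u_t, context)) for u_t in u])
    # Attention-weighted sum of the original hidden states.
    dim = len(states[0])
    pooled = [sum(a * h_t[d] for a, h_t in zip(alpha, states)) for d in range(dim)]
    return pooled, alpha


def document_vector(doc: List[List[Vector]], word_params: Params,
                    sent_params: Params) -> Tuple[Vector, List[Vector], Vector]:
    """doc is a list of sentences, each a list of word hidden states.

    Returns (document vector, word attentions per sentence, sentence attentions).
    Simplification: sentence vectors go straight into sentence attention,
    without the sentence-level encoder used in the paper.
    """
    sentence_vectors, word_attn = [], []
    for sentence in doc:
        vec, alpha = attention_pool(sentence, *word_params)
        sentence_vectors.append(vec)
        word_attn.append(alpha)
    doc_vec, sent_attn = attention_pool(sentence_vectors, *sent_params)
    return doc_vec, word_attn, sent_attn


def random_params(hidden_dim: int, attn_dim: int, seed: int) -> Params:
    """Hypothetical helper: random stand-ins for trained attention weights."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(attn_dim)]
    b = [0.0] * attn_dim
    context = [rng.uniform(-0.5, 0.5) for _ in range(attn_dim)]
    return w, b, context


if __name__ == "__main__":
    rng = random.Random(0)
    # Two sentences with 3 and 5 words, 4-dimensional hidden states each.
    doc = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)] for n in (3, 5)]
    doc_vec, word_attn, sent_attn = document_vector(
        doc, random_params(4, 3, seed=1), random_params(4, 3, seed=2))
    print(len(doc_vec), [len(a) for a in word_attn], len(sent_attn))
    # prints: 4 [3, 5] 2
```

The per-sentence `word_attn` lists and the `sent_attn` list are the kind of weights a visualization like the one above can color words and sentences by.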
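The hierarchical model consumes each document as a grid of word ids: a fixed number of sentences, each padded or truncated to a fixed number of words. The repo builds this with spaCy preprocessing and the Keras tokenizer; the standard-library sketch below only approximates that pipeline. It assumes a naive regex sentence splitter in place of spaCy (abbreviations such as "z.B." will be mis-split) and a frequency-ranked word index that, like the Keras tokenizer, reserves id 0 for padding. The function names and the `max_sentences`/`max_words` limits are illustrative, not taken from the repo.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for spaCy: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Lowercase and keep runs of word characters (Unicode-aware, so umlauts survive)."""
    return re.findall(r"\w+", sentence.lower())


def build_word_index(texts: Iterable[str]) -> Dict[str, int]:
    """Map words to ids by descending frequency, starting at 1 (0 is padding)."""
    counts = Counter(tok for text in texts
                     for sent in split_sentences(text)
                     for tok in tokenize(sent))
    # Ties are broken alphabetically so the index is deterministic.
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: i for i, word in enumerate(ranked, start=1)}


def encode_document(text: str, word_index: Dict[str, int],
                    max_sentences: int, max_words: int) -> List[List[int]]:
    """Encode a document as a max_sentences x max_words grid of word ids."""
    grid = []
    for sent in split_sentences(text)[:max_sentences]:
        # Unknown words are dropped; long sentences are truncated.
        ids = [word_index[t] for t in tokenize(sent) if t in word_index][:max_words]
        grid.append(ids + [0] * (max_words - len(ids)))  # post-padding with 0
    # Pad missing sentences with all-zero rows.
    grid.extend([[0] * max_words for _ in range(max_sentences - len(grid))])
    return grid


if __name__ == "__main__":
    docs = ["Der Hund bellt. Der Hund schläft!"]
    index = build_word_index(docs)
    print(encode_document(docs[0], index, max_sentences=3, max_words=4))
    # prints: [[1, 2, 3, 0], [1, 2, 4, 0], [0, 0, 0, 0]]
```

Stacking these grids for all articles gives a `(documents, max_sentences, max_words)` input of the shape a hierarchical model like HAN works on.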
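An embedding layer initialized with the pretrained German GloVe vectors needs a matrix whose row `i` holds the vector of the word with id `i`. The following is a minimal sketch, assuming `vectors.txt` uses the usual GloVe text format (one word per line followed by its space-separated components) and a `word_index` dict like the one produced by a Keras tokenizer. Words without a pretrained vector, and the padding row 0, stay all-zero. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def load_glove(lines: Iterable[str],
               vocab: Optional[Set[str]] = None) -> Dict[str, List[float]]:
    """Parse GloVe text lines ("word v1 v2 ... vn"); keep only `vocab` words if given."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = parts[0]
        if vocab is None or word in vocab:
            vectors[word] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index: Dict[str, int],
                           vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Row i holds the vector for word id i; padding and unknown words stay zero."""
    rows = max(word_index.values(), default=0) + 1  # +1 for padding id 0
    matrix = [[0.0] * dim for _ in range(rows)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:  # ignore vectors of the wrong size
            matrix[idx] = list(vec)
    return matrix


if __name__ == "__main__":
    sample = ["hund 0.1 0.2\n", "katze 0.3 0.4\n"]
    index = {"hund": 1, "maus": 2}
    print(build_embedding_matrix(index, load_glove(sample, vocab=set(index)), dim=2))
    # prints: [[0.0, 0.0], [0.1, 0.2], [0.0, 0.0]]
```

For the real file, the lines could come from `open("embeddings/glove_german/vectors.txt", encoding="utf-8")`; passing `vocab=set(word_index)` keeps only the vectors actually needed, which matters for a large vocabulary. Converted to a NumPy array, the matrix can then serve as the initial weights of a Keras `Embedding` layer with input dimension `len(matrix)`.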
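The `/predict/` and `/visualize/` examples put the text directly into the query string, but German text containing spaces, umlauts or `&` has to be URL-encoded first. Below is a minimal standard-library client sketch. The helper names are hypothetical, the base URL is the default from "Start the server", and because the README does not document the response format of `/predict/`, the body is returned as a string (assumed to be UTF-8) rather than parsed.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Default address of `uvicorn app.app:app` (assumption: no --host/--port overrides).
BASE_URL = "http://127.0.0.1:8000"


def build_url(endpoint: str, text: str, base_url: str = BASE_URL) -> str:
    """Build e.g. http://127.0.0.1:8000/predict/?text=... with `text` percent-encoded."""
    # quote_via=quote encodes spaces as %20; umlauts become UTF-8 escapes.
    query = urlencode({"text": text}, quote_via=quote)
    return f"{base_url}/{endpoint.strip('/')}/?{query}"


def fetch(endpoint: str, text: str, base_url: str = BASE_URL,
          timeout: float = 30.0) -> str:
    """GET the endpoint and return the response body, assuming UTF-8 encoding."""
    with urlopen(build_url(endpoint, text, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_url("predict", "Der Hund bellt"))
    # prints: http://127.0.0.1:8000/predict/?text=Der%20Hund%20bellt
```

With the server running, `fetch("predict", text)` returns the raw prediction response, and `fetch("visualize", text)` returns the HTML page, which can be written to a file and opened in a browser.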