Projects in Awesome Lists tagged with document-classification
A curated list of projects in awesome lists tagged with document-classification.
https://github.com/kk7nc/text_classification
Text Classification Algorithms: A Survey
boosting-algorithms conditional-random-fields convolutional-neural-networks decision-trees deep-belief-network deep-learning deep-neural-network dimensionality-reduction document-classification hierarchical-attention-networks k-nearest-neighbours logistic-regression naive-bayes-classifier nlp-machine-learning random-forest recurrent-neural-networks rocchio-algorithm support-vector-machines text-classification text-processing
Last synced: 14 May 2025
https://github.com/kk7nc/Text_Classification
Text Classification Algorithms: A Survey
boosting-algorithms conditional-random-fields convolutional-neural-networks decision-trees deep-belief-network deep-learning deep-neural-network dimensionality-reduction document-classification hierarchical-attention-networks k-nearest-neighbours logistic-regression naive-bayes-classifier nlp-machine-learning random-forest recurrent-neural-networks rocchio-algorithm support-vector-machines text-classification text-processing
Last synced: 07 Apr 2025
https://github.com/brightmart/bert_language_understanding
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN
attention-is-all-you-need bert-model document-classification fasttext language-model language-understanding nlp pre-training question-answering self-attention text-classification textcnn transfer-learning transformer-encoder
Last synced: 13 Apr 2025
https://github.com/castorini/hedwig
PyTorch deep learning models for document classification
deep-learning document-classification pytorch
Last synced: 04 Apr 2025
https://github.com/ematvey/hierarchical-attention-networks
Document classification with Hierarchical Attention Networks in TensorFlow. WARNING: project is currently unmaintained, issues will probably not be addressed.
deep-learning document-classification hierarchical-attention-networks machine-learning nlp tensorflow
Last synced: 04 Feb 2026
https://github.com/vietnh1009/hierarchical-attention-networks-pytorch
Hierarchical Attention Networks for document classification
cnn deep-learning deep-neural-networks deeplearning document-classification han hierarchical-attention-networks nlp nlp-machine-learning python python3 pytorch text-classification
Last synced: 04 May 2025
https://github.com/sergioburdisso/pyss3
A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI)
artificial-intelligence data-mining document-categorization document-classification early-classification explainable-artificial-intelligence interpretability interpretable-machine-learning interpretable-ml machine-learning machine-learning-algorithms multilabel-classification natural-language-processing nlp sentence-classification ss3-classifier text-classification text-labeling text-mining xai
Last synced: 15 May 2025
https://github.com/kk7nc/hdltex
HDLTex: Hierarchical Deep Learning for Text Classification
convolutional-neural-networks dataset deep-learning deep-neural-networks document-classification gpu hierarchical-deep-learning information-retrieval recurrent-neural-networks science-dataset tensorflow text-classification text-mining
Last synced: 09 Sep 2025
https://github.com/DataTurks/DataTurks
ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
annotation-tool document-annotate document-classification image-captioning image-classification image-processing image-segmentation java ner
Last synced: 23 Aug 2025
https://github.com/sgrvinod/a-pytorch-tutorial-to-text-classification
Hierarchical Attention Networks | a PyTorch Tutorial to Text Classification
attention-mechanism document-classification hierarchical-attention-networks pytorch pytorch-tutorial text-classification text-classifier
Last synced: 14 Jun 2025
https://github.com/luopeixiang/textclf
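TextClf: a text classification framework based on PyTorch/Sklearn, including logistic regression, SVM, TextCNN, TextRNN, TextRCNN, DRNN, DPCNN, BERT, and other models. Data processing, model training, testing, and related steps can be completed through simple configuration.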
bert cnn-text-classification configurable document-classification dpcnn drnn glove logistic-regression lstm-text-classification neuralclassifier pytorch sentiment-analysis sklearn-classify svm textcnn textrnn word2vec
Last synced: 09 Apr 2025
https://github.com/renovamen/text-classification
PyTorch implementation of some text classification models (HAN, fastText, BiLSTM-Attention, TextCNN, Transformer)
bilstm-attention cnn document-classification fasttext han hierarchical-attention-networks lstm nlp text-classification textcnn transformer
Last synced: 24 Apr 2025
https://github.com/pandeykartikey/hierarchical-attention-network
Implementation of Hierarchical Attention Networks in PyTorch
deep-learning document-classification glove gru hierarchical-attention-networks nlp pytorch word2vec
Last synced: 21 Apr 2025
https://github.com/raviqqe/tensorflow-font2char2word2sent2doc
TensorFlow implementation of Hierarchical Attention Networks for Document Classification and some extension
deep-learning document-classification font ideogram logogram natural-language-processing python tensorflow
Last synced: 09 Oct 2025
https://github.com/tqtg/hierarchical-attention-networks
TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"
attention-mechanism document-classification hierarchical-attention-networks sentiment-analysis tensorflow text-classification
Last synced: 27 Mar 2025
https://github.com/microsoft/simplechat
Secure AI conversations with documents, video, audio, and more. Personal workspaces for focused context, group spaces for shared insight. Classify docs, reuse prompts, and extend with modular features.
ai-chatbot azure azure-openai collaboration document-chat document-classification modular rag rbac secure semantic-search
Last synced: 26 Jan 2026
https://github.com/eahlys/EdPaper
Helps you organize your paperwork
document-classification document-management documents documents-manager php
Last synced: 03 Apr 2025
https://github.com/vietnh1009/character-level-cnn-pytorch
Character-level CNN for text classification
character-level-cnn deep-learning deep-neural-networks document-classification natural-language-processing nlp nlp-machine-learning pytorch text-classification
Last synced: 04 May 2025
https://github.com/jpleorx/detectron2-publaynet
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
artificial-intelligence computer-vision deep-learning detectron2 document-analysis document-classification document-layout document-layout-analysis faster-rcnn instance-segmentation layout-analysis machine-learning neural-network neural-networks object-detection publaynet python python3 pytorch
Last synced: 10 May 2025
https://github.com/GerHobbelt/qiqqa-open-source
The open-sourced version of the award-winning Qiqqa research management tool for Windows (a bleeding edge dev fork). File any issues you find in the main repo issue tracker at https://github.com/jimmejardine/qiqqa-open-source/issues
citations document-classification document-management meta-analysis metadata mupdf pdf qiqqa tesseract
Last synced: 08 Apr 2025
https://github.com/vietnh1009/very-deep-cnn-pytorch
Very deep CNN for text classification
deep-learning deep-neural-networks deeplearning document-classification natural-language-processing nlp pytorch text-classification vdcnn very-deep-cnn
Last synced: 04 May 2025
https://github.com/wri-dssg-omdena/policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers
Last synced: 27 Mar 2025
https://github.com/vietnh1009/character-level-cnn-tensorflow
Character-level CNN for text classification
character-level-cnn deep-learning deep-neural-networks document-classification natural-language-processing nlp nlp-machine-learning tensorflow text-classification
Last synced: 26 Jul 2025
https://github.com/miteshputhran/document_classification
Python code for classification of documents into different classes using machine learning
doc2vec docfileanalysis docs document-classification jupyter-notebook machine-learning naive-bayes-classifier pdf python random-forest supervised-learning text-classification textfile word2vec xgboost
Last synced: 14 Apr 2025
https://github.com/sraashis/diseaseprediction
Undergrad final year project to predict diseases given any text symptoms.
disease-prediction document-classification final-year-project hsqldb nlp spring-mvc
Last synced: 09 Apr 2025
https://github.com/vietnh1009/very-deep-cnn-tensorflow
Very deep CNN for text classification
deep-learning deep-neural-networks deeplearning document-classification natural-language-processing nlp nlp-machine-learning tensorflow text-classification vdcnn very-deep-cnn
Last synced: 20 Oct 2025
https://github.com/qkrdmsghk/textssl
[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification
aaai2022 document-classification graph graph-neural-networks inductive-learning natural-language-processing sparse-reconstruction text-classification
Last synced: 30 Apr 2025
https://github.com/mdh266/textclassificationapp
Building and Deploying A Serverless Text Classification Web App
data-science docker document-classification fastapi imbalanced-data imbalanced-learning machine-learning naive-bayes natural-language-processing nlp nltk scikit-learn support-vector-machine text-classification
Last synced: 30 Jul 2025
https://github.com/sudharsan13296/document-classification-using-lsa
Document classification using Latent semantic analysis in python
deep-learning document document-classification keras latent-semantic-analysis lsa natural-language-processing python tensorflow tf-idf
Last synced: 12 Apr 2025
https://github.com/pfalcon/papersman
Minimalist electronic documents/papers/publications manager/indexer/categorizer
document-classification document-database document-management knowledge-management minimalist papers-collection tagging
Last synced: 19 Mar 2025
https://github.com/guillaumedd/gowpy
A very simple library for exploiting graph-of-words in NLP
classification document-classification frequent-subgraphs graph-algorithms keywords-extraction machine-learning natural-language-processing nlp subgraph-mining tw-idf
Last synced: 19 Oct 2025
https://github.com/caltechlibrary/documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
annotation annotator document-classification document-image-classification document-image-processing handwriting-recognition handwritten-character-recognition handwritten-mathematical-symbols handwritten-text-recognition htr image-classification image-recognition image-tagging machine-learning math-recognition tagging
Last synced: 14 Apr 2025
https://github.com/rituyadav92/lightweighted-cnn-for-document-classification
Optimized Text Document Classification
cnn-text-classification document-classification optimization
Last synced: 07 May 2025
https://github.com/tariqulislam/nlp_research
A natural language processing project that uses a semi-supervised machine learning technique for document classification
document-classification gensim lda natural-language-processing nlp-machine-learning nltk pymongo pypdf2 python
Last synced: 03 Mar 2026
https://github.com/sdpdas/document-layout-generator-and-segmentation-tool
Lists all parts of a PDF document; highly scalable, with robust code.
analysis document-classification numpy opencv-python pdf2image python
Last synced: 15 Apr 2025
https://github.com/paulrinckens/han_for_doc_classification
Hierarchical Attention Networks for Document Classification
document-classification hierarchical-attention-networks machine-learning natural-language-processing
Last synced: 03 Sep 2025
https://github.com/docsaidlab/docclassifier
A zero-shot document classifier.
clip document-classification feature-learning lightning partial-fc python pytorch
Last synced: 12 Apr 2025
https://github.com/wolny/complement-naive-bayes
Implementation of Complement Naive Bayes text classifier used for automatic categorisation of DaWanda products
complement-navie-bayes document-classification machine-learning naive-bayes-classifier
Last synced: 05 Jan 2026
https://github.com/sdpdas/yolov5-docanalyser
This tool extracts images from a PDF, annotates them using the YOLOv5 model, and converts the annotated images back into a single PDF. https://github.com/ultralytics/yolov5 https://github.com/HumanSignal/labelImg https://www.kaggle.com/code/sagardeepdas/yolov5-model1
computer-vision dataset-generation deep-learning document-classification image-annotation kaggle labelimg machine-learning object-detection python yolov5
Last synced: 01 Apr 2025
https://github.com/yahya123-hub/classification-of-documents-using-graph-based-features-and-knn
An innovative project that integrates graph theory and machine learning techniques to classify documents into predefined topics. By leveraging graph representations of documents and employing the K-Nearest Neighbors (KNN) algorithm, this project aims to provide a robust system for document classification
cv data-visualization document-classification graph-theory image-processing machine-learning nlp
Last synced: 20 Mar 2025
https://github.com/sbischoff-ai/basic-document-classifier
A simple CNN for n-class classification of document images
cnn deep-learning document-classification image-classification neural-network
Last synced: 28 Jun 2025
https://github.com/jhj0517/document_classification
Fine-tune a text classification model
ai deep-learning document-classification open-source text-classification
Last synced: 06 Apr 2025
https://github.com/extrievetechnologies/quickcapture_ios
QuickCapture Mobile Scanning SDK, specially designed for native iOS
document-classification document-scanner-app document-scanning-sdk document-understanding ios objective-c swift
Last synced: 02 Sep 2025
https://github.com/acsenrafilho/cucaracha
A bureaucratic cockroach (cucaracha) assistant to help with document processing and analysis
document-analysis document-classification document-processing optical-character-recognition python3
Last synced: 28 Oct 2025
https://github.com/PFS-AI/PFS
The AI-powered desktop tool for finding, classifying, and understanding your files. Search by keyword, ask questions, and get insights from your scattered files instantly.
ai cross-platform data-science document-classification fastapi file-management file-organizer file-search huggingface-transformers knowledge-management langchain machine-learning productivity-tools rag scikit-learn search-engine semantic-search vector-search
Last synced: 30 Dec 2025
https://github.com/md-emon-hasan/informatruth
Fine-tuned roberta-base classifier on the LIAR dataset. Accepts multiple input types (text, URLs, and PDFs) and outputs a prediction with a confidence score. It also leverages google/flan-t5-base to generate explanations and uses agentic AI with LangGraph to orchestrate agents for planning, retrieval, execution, fallback, and reasoning.
ai-webapp confidence-score document-classification end-to-end-ml-workflows fake-news-detection fine-tuning flan-t5 huggingface-transformers machine-learning misinformation-detection natural-language-processing news-analysis news-classification roberta sequence-classification text-analysis text-classification transformers truth-verification url-parser
Last synced: 11 Oct 2025
https://github.com/yuvaraj3855/preocr
Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.
computer-vision document-analysis document-classification document-intelligence document-processing document-understanding file-analysis image-processing layout-analysis ocr ocr-detection opencv pdf pdf-analysis pdf-parsing preprocessing python python-library text-detection text-extraction
Last synced: 16 Feb 2026
https://github.com/ali7haider/classification_of_documents_using_graph-based-features_and_knn_gt
Classification of Documents Using Graph-Based Features and KNN. This project offers hands-on experience with graph theory and machine learning, fostering skills in data representation, algorithm implementation, and analytical thinking in the context of document classification.
document-classification graph-construction graph-theory knn-classification machine-learning scrapping-python
Last synced: 23 Feb 2025
https://github.com/dimits-ts/large-text-nlp-survey
A survey paper exploring the use of state-of-the-art deep neural network architectures in NLP problems featuring very large documents.
bert document-classification document-summarization literature nlp sentiment-analysis survey-paper
Last synced: 26 Jan 2026
https://github.com/valaydave/fake-news-detection-han
Hierarchical Attention Neural Network For Fake News Detection
document-classification fake-news-classification han hierarchical-attention-networks keras keras-implementations lstm neural-networks python
Last synced: 27 Oct 2025
https://github.com/igoraugust0/info-org-retrieval
📙 Files and materials used in the course GSI024 - Information Organization and Retrieval (Organização e Recuperação da Informação) at UFU.
document-classification information-retrieval inverted-index jupyter-notebook pagerank
Last synced: 03 Aug 2025
https://github.com/sr-murthy/doc_classifier
A simple experimental document classification tool based on a domain-dependent, keywords-based document class map and a keyword frequency score
Last synced: 04 Nov 2025
https://github.com/arthurdjn/pynews
NLP - Neural Network Classifier from Bag of Words features.
bag-of-words classifier document-classification nlp pytorch tutorial
Last synced: 14 Mar 2025
https://github.com/graphql-api/graphql-api-rossum
Unofficial GraphQL wrapper for the elis.rossum.ai API
apollo apollo-server document-classification graphql graphql-api graphql-schema rossum typescript
Last synced: 13 Oct 2025
https://github.com/olekli/mrdocument
Automatic PDF transcription and classification via OpenAI
automation chatgpt classification document-classification documents openai transcription workflow
Last synced: 16 Feb 2026
https://github.com/vickshan001/friends-character-classifier-vector-semantics-nlp
NLP coursework using vector space semantics to classify Friends character dialogue. Includes TF-IDF, POS, sentiment, and context-aware features.
distributional-semantics document-classification friends-tv-show nlp pos-tagging python sentiment-analysis tfidf vector-space-model
Last synced: 31 Aug 2025