An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with document-classification

A curated list of projects in awesome lists tagged with document-classification.

https://github.com/castorini/hedwig

PyTorch deep learning models for document classification

deep-learning document-classification pytorch

Last synced: 04 Apr 2025

https://github.com/ematvey/hierarchical-attention-networks

Document classification with Hierarchical Attention Networks in TensorFlow. WARNING: project is currently unmaintained, issues will probably not be addressed.

deep-learning document-classification hierarchical-attention-networks machine-learning nlp tensorflow

Last synced: 04 Feb 2026

https://github.com/DataTurks/DataTurks

ML data annotations made super easy for teams. Just upload data, add your team, and build training/evaluation datasets in hours.

annotation-tool document-annotate document-classification image-captioning image-classification image-processing image-segmentation java ner

Last synced: 23 Aug 2025

https://github.com/luopeixiang/textclf

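TextClf: a text classification framework based on PyTorch/Sklearn, covering logistic regression, SVM, TextCNN, TextRNN, TextRCNN, DRNN, DPCNN, BERT, and other models. Data processing, model training, and testing can all be done through simple configuration.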

bert cnn-text-classification configurable document-classification dpcnn drnn glove logistic-regression lstm-text-classification neuralclassifier pytorch sentiment-analysis sklearn-classify svm textcnn textrnn word2vec

Last synced: 09 Apr 2025

https://github.com/renovamen/text-classification

PyTorch implementation of some text classification models (HAN, fastText, BiLSTM-Attention, TextCNN, Transformer) | Text classification

bilstm-attention cnn document-classification fasttext han hierarchical-attention-networks lstm nlp text-classification textcnn transformer

Last synced: 24 Apr 2025

https://github.com/raviqqe/tensorflow-font2char2word2sent2doc

TensorFlow implementation of Hierarchical Attention Networks for Document Classification and some extensions

deep-learning document-classification font ideogram logogram natural-language-processing python tensorflow

Last synced: 09 Oct 2025

https://github.com/tqtg/hierarchical-attention-networks

TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"

attention-mechanism document-classification hierarchical-attention-networks sentiment-analysis tensorflow text-classification

Last synced: 27 Mar 2025

https://github.com/microsoft/simplechat

Secure AI conversations with documents, video, audio, and more. Personal workspaces for focused context, group spaces for shared insight. Classify docs, reuse prompts, and extend with modular features.

ai-chatbot azure azure-openai collaboration document-chat document-classification modular rag rbac secure semantic-search

Last synced: 26 Jan 2026

https://github.com/GerHobbelt/qiqqa-open-source

The open-sourced version of the award-winning Qiqqa research management tool for Windows (a bleeding edge dev fork). File any issues you find in the main repo issue tracker at https://github.com/jimmejardine/qiqqa-open-source/issues

citations document-classification document-management meta-analysis metadata mupdf pdf qiqqa tesseract

Last synced: 08 Apr 2025

https://github.com/wri-dssg-omdena/policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers

Last synced: 27 Mar 2025

https://github.com/sraashis/diseaseprediction

Undergrad final year project to predict diseases given any text symptoms.

disease-prediction document-classification final-year-project hsqldb nlp spring-mvc

Last synced: 09 Apr 2025

https://github.com/qkrdmsghk/textssl

[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

aaai2022 document-classification graph graph-neural-networks inductive-learning natural-language-processing sparse-reconstruction text-classification

Last synced: 30 Apr 2025

https://github.com/pfalcon/papersman

Minimalist electronic documents/papers/publications manager/indexer/categorizer

document-classification document-database document-management knowledge-management minimalist papers-collection tagging

Last synced: 19 Mar 2025

https://github.com/tariqulislam/nlp_research

A natural language processing project that uses a semi-supervised machine learning technique for document classification

document-classification gensim lda natural-language-processing nlp-machine-learning nltk pymongo pypdf2 python

Last synced: 03 Mar 2026

https://github.com/sdpdas/document-layout-generator-and-segmentation-tool

Lists all parts of a PDF document; highly scalable, with robust code.

analysis document-classification numpy opencv-python pdf2image python

Last synced: 15 Apr 2025

https://github.com/wolny/complement-naive-bayes

Implementation of Complement Naive Bayes text classifier used for automatic categorisation of DaWanda products

complement-navie-bayes document-classification machine-learning naive-bayes-classifier

Last synced: 05 Jan 2026

https://github.com/sdpdas/yolov5-docanalyser

This tool extracts images from a PDF, annotates them using the YOLOv5 model, and converts the annotated images back into a single PDF. https://github.com/ultralytics/yolov5 https://github.com/HumanSignal/labelImg https://www.kaggle.com/code/sagardeepdas/yolov5-model1

computer-vision dataset-generation deep-learning document-classification image-annotation kaggle labelimg machine-learning object-detection python yolov5

Last synced: 01 Apr 2025

https://github.com/yahya123-hub/classification-of-documents-using-graph-based-features-and-knn

An innovative project that integrates graph theory and machine learning techniques to classify documents into predefined topics. By leveraging graph representations of documents and employing the K-Nearest Neighbors (KNN) algorithm, this project aims to provide a robust system for document classification

cv data-visualization document-classification graph-theory image-processing machine-learning nlp

Last synced: 20 Mar 2025

https://github.com/sbischoff-ai/basic-document-classifier

A simple CNN for n-class classification of document images

cnn deep-learning document-classification image-classification neural-network

Last synced: 28 Jun 2025

https://github.com/acsenrafilho/cucaracha

A bureaucratic cockroach (cucaracha) assistant to help in document processing and analysis

document-analysis document-classification document-processing optical-character-recognition python3

Last synced: 28 Oct 2025

https://github.com/PFS-AI/PFS

The AI-powered desktop tool for finding, classifying, and understanding your files. Search by keyword, ask questions, and get insights from your scattered files instantly.

ai cross-platform data-science document-classification fastapi file-management file-organizer file-search huggingface-transformers knowledge-management langchain machine-learning productivity-tools rag scikit-learn search-engine semantic-search vector-search

Last synced: 30 Dec 2025

https://github.com/md-emon-hasan/informatruth

Fine-tuned roberta-base classifier on the LIAR dataset. Accepts multiple input types (text, URLs, and PDFs) and outputs a prediction with a confidence score. It also leverages google/flan-t5-base to generate explanations and uses an agentic AI setup with LangGraph to orchestrate agents for planning, retrieval, execution, fallback, and reasoning.

ai-webapp confidence-score document-classification end-to-end-ml-workflows fake-news-detection fine-tuning flan-t5 huggingface-transformers machine-learning misinformation-detection natural-language-processing news-analysis news-classification roberta sequence-classification text-analysis text-classification transformers truth-verification url-parser

Last synced: 11 Oct 2025

https://github.com/ali7haider/classification_of_documents_using_graph-based-features_and_knn_gt

Classification of Documents Using Graph-Based Features and KNN. This project offers hands-on experience with graph theory and machine learning, fostering skills in data representation, algorithm implementation, and analytical thinking in the context of document classification.

document-classification graph-construction graph-theory knn-classification machine-learning scrapping-python

Last synced: 23 Feb 2025

https://github.com/dimits-ts/large-text-nlp-survey

A survey paper exploring the use of state-of-the-art deep neural network architectures in NLP problems featuring very large documents.

bert document-classification document-summarization literature nlp sentiment-analysis survey-paper

Last synced: 26 Jan 2026

https://github.com/igoraugust0/info-org-retrieval

📙 Files and materials used in the course GSI024 - Information Organization and Retrieval at UFU.

document-classification information-retrieval inverted-index jupyter-notebook pagerank

Last synced: 03 Aug 2025

https://github.com/sr-murthy/doc_classifier

A simple experimental document classification tool based on a domain-dependent, keywords-based document class map and a keyword frequency score

document-classification

Last synced: 04 Nov 2025

https://github.com/arthurdjn/pynews

NLP - Neural Network Classifier from Bag of Words features.

bag-of-words classifier document-classification nlp pytorch tutorial

Last synced: 14 Mar 2025

https://github.com/olekli/mrdocument

Automatic PDF transcription and classification via OpenAI

automation chatgpt classification document-classification documents openai transcription workflow

Last synced: 16 Feb 2026

https://github.com/vickshan001/friends-character-classifier-vector-semantics-nlp

NLP coursework using vector space semantics to classify Friends character dialogue. Includes TF-IDF, POS, sentiment, and context-aware features.

distributional-semantics document-classification friends-tv-show nlp pos-tagging python sentiment-analysis tfidf vector-space-model

Last synced: 31 Aug 2025