Projects in Awesome Lists tagged with text-preprocessing
A curated list of projects in awesome lists tagged with text-preprocessing.
https://github.com/adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
article-extractor corpus corpus-builder corpus-tools crawler html-to-markdown html2text news news-aggregator news-crawler nlp readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping
Last synced: 24 Dec 2025
https://github.com/jbesomi/texthero
Text preprocessing, representation and visualization from zero to hero.
machine-learning nlp nlp-pipeline text-clustering text-mining text-preprocessing text-representation text-visualization texthero word-embeddings
Last synced: 14 May 2025
https://github.com/jfilter/clean-text
🧹 Python package for text cleaning
natural-language-processing nlp python python-package scraping text-cleaning text-normalization text-preprocessing user-generated-content
Last synced: 15 May 2025
https://github.com/lyeoni/prenlp
Preprocessing Library for Natural Language Processing
natural-language-processing nlp preprocessing-library text-preprocessing text-processing
Last synced: 10 Apr 2025
https://github.com/ezgisubasi/turkish-tweets-sentiment-analysis
This sentiment analysis project determines whether the tweets posted in the Turkish language on Twitter are positive or negative.
data-visualization deep-learning glove glove-embeddings keras n-grams nlp sentiment-analysis text-preprocessing turkish-language turkish-nlp tweets twitter-sentiment-analysis zemberek zemberek-nlp
Last synced: 25 Oct 2025
https://github.com/Lipairui/textgo
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
bert nlp text-classification text-preprocessing text-representation text-search text-similarity
Last synced: 18 Jul 2025
https://github.com/CDSoft/panda
Panda is a Pandoc Lua filter that works on Pandoc's internal AST. Panda is heavily inspired by [abp](http://cdelord.fr/abp), reimplemented as a Pandoc Lua filter.
lua pandoc pandoc-filter text-preprocessing
Last synced: 10 May 2025
https://github.com/danielhaim1/titlecaser
A powerful utility for transforming text to title case with support for multiple style guides and extensive customization options.
acronym-identification apa-style case-conversion case-converter case-formatting headline-optimization javascript sentence-case string-manipulation string-utils style-guide text-parser text-preprocessing text-processing text-transformation text-utils title-case-converter titlecase titlecasing word-casing
Last synced: 19 Jun 2025
https://github.com/tesserato/inscribe
Markdown preprocessor that runs code fences
markdown rust text-preprocessing
Last synced: 19 Oct 2025
https://github.com/VivekChoudhary77/Textify-text-Preprocessing
A text preprocessing web application
text-generation text-preprocessing text-summarization text-summarizer
Last synced: 15 Apr 2025
https://github.com/venkat-0706/sentimental-analysis
Build a model to classify text as positive, negative, or neutral. Apply NLP techniques for preprocessing and machine learning for classification. Aim for accurate sentiment prediction on various text formats.
data-visualization feature-engineering machine-learning natural-language-processing numpy pandas python scikit-learn sentiment-detection supervised-learning text-classification text-preprocessing tokenizaiton wordcloud
Last synced: 10 Mar 2025
https://github.com/lanl/t-elf
Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped with automatic model determination (also known as the estimation of latent factors - rank) for accurate data modeling. Our software suite encompasses cutting-edge data pre-processing and post-processing modules.
blind-source-separation dimensionality-reduction feature-extraction gpu high-performance-computing hpc latent-variables machine-learning matrix matrix-completion matrix-factorization non-negative-matrix-factorization pattern-extraction semi-supervised-learning tensor-decomposition tensor-factorization tensors text-preprocessing unsupervised-learning
Last synced: 12 Apr 2025
https://github.com/byam/mnlp
MNLP: Mongolian Natural Language Processing.
hacktoberfest mongolian mongolian-text-classification nlp text-preprocessing
Last synced: 13 Sep 2025
https://github.com/giocoal/reddit-tldr-summarizer-and-topic-modeling
Extreme Extractive Text Summarization and Topic Modeling (using LSA and LDA techniques) over Reddit Posts from TLDRHQ dataset.
extreme-summarization latent-dirichlet-allocation latent-semantic-analysis lda lda-model lsa lsa-model nlp part-of-speech-tagging reddit reddit-bot reddit-dataset social-media summarization text-analysis text-preprocessing text-summarization tldr tldr9 topic-modeling
Last synced: 11 Mar 2025
https://github.com/sayamalt/resume-classification-using-fine-tuned-bert
Successfully developed a resume classification model that classifies a person's resume into its corresponding job category with an accuracy of more than 99%.
bert-model exploratory-data-analysis fine-tuning-bert model-evaluation nlp text-preprocessing text-tokenization word-embeddings
Last synced: 31 Aug 2025
https://github.com/sayamalt/language-detection-using-fine-tuned-xlm-roberta-base-transformer-model
Successfully developed a language detection transformer model that can accurately recognize the language in which any given text is written.
bert-fine-tuning feature-engineering fine-tuning model-evaluation model-evaluation-metrics nlp text-classification text-preprocessing xlm-roberta
Last synced: 31 Aug 2025
https://github.com/andythefactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
article-extractor corpus corpus-builder corpus-tools dataset datasets html-to-markdown html2text news news-aggregator news-crawler readability scraping scraping-websites text-cleaning text-extraction text-mining text-preprocessing web-scraping
Last synced: 06 Nov 2025
https://github.com/ailln/proces
Text preprocess.
python-package python3 text-preprocessing text-processing
Last synced: 08 May 2025
https://github.com/sayamalt/emotion-detection-using-fine-tuned-bert-transformer
Successfully developed a fine-tuned BERT transformer model which can effectively perform emotion classification on any given piece of text to identify a suitable human emotion based on the semantic meaning of the text.
bert-transformer emotion-classification feature-engineering fine-tuning-bert natural-language-processing natural-language-understanding text-preprocessing
Last synced: 01 Sep 2025
https://github.com/bhattbhavesh91/texthero-demo
Tutorial demonstrating the power of Texthero, a library for text preprocessing, representation and visualization from zero to hero.
nlp nlp-pipeline text-clustering text-mining text-preprocessing text-representation text-visualization texthero texthero-tutorial word-embeddings
Last synced: 26 Oct 2025
https://github.com/sayamalt/sms-spam-classification-using-fine-tuned-roberta-base-transformer
Successfully developed a fine-tuned RoBERTa transformer model which can almost perfectly classify whether any given SMS is spam or not.
bert-embeddings bert-fine-tuning feature-engineering natural-language-processing natural-language-understanding roberta text-classification text-preprocessing transformers
Last synced: 06 Aug 2025
https://github.com/bhattbhavesh91/clean-text-demo
Tutorial on Clean-Text, a Python package for text cleaning
machine-learning natural-language-processing nlp python text-cleaning text-preprocessing tutorial user-generated-content
Last synced: 11 Jul 2025
https://github.com/prashver/movie-recommendation-system
This recommendation system employs content-based filtering and NLP preprocessing to suggest similar movies based on user preferences and movie data. It fetches movie posters via APIs and is deployed on Streamlit for easy access.
api-request natural-language-processing nltk-python numpy pandas recommender-system streamlit-deployment text-preprocessing
Last synced: 12 Oct 2025
https://github.com/sayamalt/abstractive-text-summarization-of-news-articles
Successfully developed an encoder-decoder based sequence-to-sequence (Seq2Seq) model which can summarize the entire text of an Indian news article into a short paragraph with a limited number of words.
attention-is-all-you-need attention-mechanism lstm-neural-networks natural-language-processing sequence-to-sequence-models text-generation text-preprocessing
Last synced: 09 Nov 2025
https://github.com/adilrasheed139/ai-powered-resume-screening-using-bert
Successfully developed a resume classification model that classifies a person's resume into its corresponding job category with an accuracy of more than 99%.
bert-model deep-learning exploratory-data-analysis-eda fine-tuning-bert model-evaluation nlp nlp-machine-learning text-preprocessing text-tokenization word-embeddings word-embeddings-for-nlp
Last synced: 03 Apr 2025
https://github.com/jesly-joji/spam-ham-classifier
Spam/ham classifier built with the Naive Bayes algorithm and NLP text preprocessing techniques
naive-bayes-classifier nlp scikit-learn streamlit text-preprocessing
Last synced: 18 Aug 2025
https://github.com/sd7campeon/yelp-sentiment-analysis-with-python-bs4-and-llm
A scalable pipeline for automated extraction, preprocessing, and sentiment analysis of Yelp reviews. Uses advanced HTTP requests, HTML parsing, and text normalization (tokenization, stopword removal, lemmatization) to enable precise polarity and subjectivity analysis for consumer insights and business analytics.
beautifulsoup beautifulsoup4 business-analytics cuda data-analysis nlp-machine-learning nltk opinion-mining pandas python python3 requests-library-python sentiment-analysis text-preprocessing textblob torch web-scraping yelp-reviews
Last synced: 18 Oct 2025
https://github.com/sayamalt/news-category-classification
Successfully developed a news category classification model using fine-tuned BERT which can accurately classify any news text into its respective category i.e. Politics, Business, Technology and Entertainment.
bert-embeddings exploratory-data-analysis feature-engineering fine-tuning-bert model-evaluation nlp text-classification text-cleaning text-preprocessing text-tokenization
Last synced: 15 Jun 2025
https://github.com/ssciwr/mailcom
Recognize and pseudonymize named entities in emails
anonymization data-privacy llm-inference pseudonymization text-preprocessing
Last synced: 21 Apr 2025
https://github.com/ajaykumar095/natural_language_processing
Explore cutting-edge Natural Language Processing (NLP) techniques in this GitHub repository. Includes pre-trained models, custom NLP pipelines, text preprocessing tools, sentiment analysis, text classification, and more. Ideal for research, learning, and deploying NLP solutions in Python.
ann nltk-python python rnn spacy tensorflow text-preprocessing textblob
Last synced: 20 Sep 2025
https://github.com/pngo1997/word-embeddings-co-occurrence-svd-glove
Explores word embeddings.
co-occurence-matrix embeddings glove glove-embeddings nltk python singular-value-decomposition text-preprocessing
Last synced: 23 Nov 2025
https://github.com/theveryhim/massive-text-processing-1
Cleaning, processing and analysis of a papers dataset with the PySpark (RDD) framework
big-data data-analysis frequent-itemsets massive-datasets pyspark text-preprocessing
Last synced: 03 Jul 2025
https://github.com/bhargav-joshi/nlp-practicals
Natural Language Processing practicals on different concepts, intended to analyze and understand their practical implementation and actual use.
chunking morphological-analysis n-grams name-entity-recognition natural-language-processing nlp nlp-machine-learning pos-tagging text-classification text-preprocessing topic-modeling
Last synced: 09 Apr 2025
https://github.com/pngo1997/text-processing-tokenization
Simple text analysis and tokenization.
nltk python sentence-segmentation text-preprocessing tokenization word-frequency zipfs-law
Last synced: 28 Feb 2025
https://github.com/pngo1997/n-gram-language-models
Builds N-gram language models and applies them to text generation.
bigrams cfd conditional-frequency-distribution greedy-algorithms laplace-smoothing natural-language-processing ngrams nltk nucleus-sampling perplexity python random-sampling text-generation text-preprocessing trigrams unigram
Last synced: 28 Feb 2025
https://github.com/imdeepmind/textpreprocessingscript
Text Preprocessing Script: This is a simple Python script that I use for preprocessing text using NLTK.
deep-learning machine-learning natural-language-processing nltk python3 text-preprocessing
Last synced: 22 Feb 2025
https://github.com/evanch98/natural-language-processing-python
Natural Language Processing
jupyter-notebook n-grams natural-language-processing python stemming-and-lemmatization text-preprocessing tokenization
Last synced: 01 Mar 2025
https://github.com/jasoncobra3/natural_language_processing
Natural Language Processing (NLP) is a captivating field at the intersection of computer science and linguistics. It enables machines to understand, interpret, and respond to human language in a way that is both meaningful and useful. From chatbots to sentiment analysis, NLP applications are transforming industries and enhancing user experiences.
artificial-intelligence artificial-neural-networks data-science deep-learning google-news-scraper machine-learning natural-language-processing nlp nlp-pipeline parts-of-speech pos python text text-analysis text-classification text-preprocessing text-representation word2vec-embeddinngs word2vec-model
Last synced: 16 Mar 2025
https://github.com/sayamalt/mental-health-classification-using-fine-tuned-distilbert
Successfully established a multiclass text classification model by fine-tuning pretrained DistilBERT transformer model to classify several distinct types of mental health statuses such as anxiety, stress, personality disorder, etc. with an accuracy of 77%.
data-visualization deep-learning distilbert-fine-tuning distilbert-model model-evaluation model-inference model-training-and-evaluation multiclass-text-classification natural-language-processing text-classification text-preprocessing text-tokenization
Last synced: 08 Oct 2025
https://github.com/sayamalt/luxury-apparel-product-category-classification-using-fine-tuned-distilbert
Successfully developed a multiclass text classification model by fine-tuning a pretrained DistilBERT transformer model to classify various distinct types of luxury apparel into their respective categories, i.e. pants, accessories, underwear, shoes, etc.
deep-learning distilbert-fine-tuning distilbert-model exploratory-data-analysis fine-tuning-bert model-inference model-training-and-evaluation multiclass-classification natural-language-processing text-classification text-preprocessing text-tokenization
Last synced: 08 Oct 2025
https://github.com/antoniskl/un-general-debate-corpus-classification
The aim of this project is to classify UNGDC speeches with regard to climate change. As a secondary objective, the correlation between these speeches and the forestation and happiness index of the countries is examined.
classification data-science jupyter-notebook machine-learning nlp python regression scikit-learn text-classification text-preprocessing
Last synced: 09 Oct 2025
https://github.com/bragdond/naive-bayes-ngram-text-classifier-nlp
Basic Naive Bayes classifier for text classification using n-grams
naive-bayes-algorithm naive-bayes-classifier ngram nlp text-classification text-preprocessing
Last synced: 19 Oct 2025
https://github.com/sayamalt/financial-news-sentiment-analysis
Successfully developed a fine-tuned DistilBERT transformer model which can predict the overall sentiment of a piece of financial news with an accuracy of nearly 81.5%.
data-exploration-and-preprocessing distilbert-model fine-tune-bert-tensorflow hugging-face-transformers model-architecture-and-implementation model-inference model-training-and-evaluation multiclass-classification natural-language-processing sentiment-analysis text-preprocessing text-tokenization
Last synced: 17 Oct 2025
https://github.com/somjit101/nlp-casestudy-amazon-fine-foods-review
Efficient sentence encoding and vectorization techniques applied to customer reviews on a product page of the popular e-commerce website Amazon, using proven NLP techniques for the purpose of sentiment analysis.
amazon-fine-food-reviews amazon-fine-food-reviews-dataset featurization natural-language-processing nlp text-classification text-preprocessing tfidf-vectorizer vectorization word2vec
Last synced: 06 Mar 2025
https://github.com/christoph/tagrefinery-releases
Guided Label and Tag Preprocessing
data-wrangling spell-correction tags text-preprocessing text-processing
Last synced: 02 Mar 2025
https://github.com/moustafamohamed01/web-summarizer-ai
A Python tool to scrape and summarize website content using AI. Built with Selenium, BeautifulSoup, and Google's Gemini AI, this project extracts the main text from any website and generates a concise summary in markdown format. Perfect for quickly understanding long articles, blogs, or news pages.
ai beautifulsoup gemini-ai python selenium text-preprocessing web-scraping
Last synced: 17 Mar 2025
https://github.com/vishnun0027/sentiment-analysis
Several ways to perform sentiment analysis on text data, with varying degrees of complexity and accuracy
bag-of-words deep-learning deep-neural-networks deeplearning gru logistic-regression lstm machine-learning nltk-python rnn rnn-model sentiment-analysis svm-model tensorflow tensorflow2 text-classification text-preprocessing tf-idf tokenizer
Last synced: 20 Mar 2025
https://github.com/mevlutayilmaz/text-summarization
Text summarization in Python
docx matplotlib networkx nltk pyqt5 python sklearn text-preprocessing text-summarization tf-idf
Last synced: 15 Jun 2025
https://github.com/bilalhameed248/faq-chat-bot-using-vertexai
A generative AI-based FAQ chatbot with a Flask back end, designed to operate within an organization's internal domain (Jul 2023 to Oct 2023).
csv embeddings flask gecko html java jquery natural-language-processing nlp python python3 pytorch text-bison text-embedding text-preprocessing vertex-ai
Last synced: 30 Dec 2025
https://github.com/vlada-pv/prediction-sociolinguistic-data-based-on-the-diaries-texts-of-the-prozhito-project
The repository contains notebooks created for collecting and preprocessing the corpus of diary entries and for experiments on creating models for predicting gender, age groups of authors and the time period of text creation.
author-profiling bag-of-words bilstm convol convolutional-neural-networks deep-learning diary-entries logistic-regression naive-bayes-classifier neural-networks recurrent-neural-networks sociolinguistics text-preprocessing text-vectorization tf-idf-vectorizer word-embeddings
Last synced: 13 Jul 2025
https://github.com/theveryhim/massive-text-processing
Cleaning, processing and analysis of a papers dataset with the PySpark (RDD) framework
big-data data-analysis frequent-itemsets massive-datasets pyspark text-preprocessing
Last synced: 18 Jul 2025
https://github.com/nurfawaiq/nlp-text-preprocessing
Natural Language Processing - Text Preprocessing
crawling nlp python3 text-preprocessing twitter
Last synced: 11 Sep 2025
https://github.com/hariprasath-v/machinehack-sentiment_analysis_weekend_hackathon_edition_2
Sentiment classification of reviews/tweets
nlp-machine-learning regular-expression seaborn text-preprocessing
Last synced: 02 Mar 2025
https://github.com/pantpujan017/nenglish-stopwords-chat-analysis
Nepali stop words
chat-analysis code-mixed-language messenger-nlp nenglish nepali-nlp social-media-nlp stopwords text-preprocessing viber-chat whatsapp-chat
Last synced: 20 Jul 2025
https://github.com/abinashsahoo007/project-resume-classification
The document classification solution should significantly reduce the manual human effort in the HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.
corpus count-vectorizer label-encoding lemmitization machine-learning nltk part-of-speech-tagging resume-classification spacy stemming text-mining text-preprocessing textract tfidf-vectorizer tokenization wordcloud
Last synced: 17 Jun 2025
https://github.com/oya163/datascience101
Pushing things as I do data science stuff
data-science dataset embeddings quora text-preprocessing traditional-machine-learning
Last synced: 24 Feb 2025
https://github.com/arnab-0053/song-identifier
Identifies songs and artists from lyric snippets using two distinct methods: a simple NLP-based approach and a BM25 (Best Match 25) approach.
bm25 nlp nltk python rank-bm25 scikit-learn song-lyrics spotify-dataset text-preprocessing
Last synced: 05 Mar 2025
https://github.com/farhad-here/textprepx
A Multilingual Text Preprocessing Tool for English and Persian.
cleantext contractions data-analysis deep-learning emoji nlp nltk opp parsivar regex streamlit text-preprocessing textblob
Last synced: 07 May 2025
https://github.com/sayamalt/fake-news-classification-using-fine-tuned-bert
Successfully developed a text classification model to predict whether a given news text is fake or not by fine-tuning a pretrained BERT transformer model imported from Hugging Face.
bert-embeddings bert-model data-analysis data-visualization deep-learning fine-tuning-bert model-evaluation model-training-and-evaluation text-classification text-preprocessing text-tokenization tokenizer-nlp wordcloud-visualization
Last synced: 05 Apr 2025
https://github.com/farshad-hasanpour/textfeature
Transforms unstructured text into feature vectors using word2vec, lexicon and ...
bag-of-words python text-preprocessing text2vec word2vec
Last synced: 26 Feb 2025
https://github.com/kunalpisolkar24/ir_lab
Collection of practical code for Savitribai Phule Pune University's Information Retrieval Lab (410247).
cosine-similarity information-retrieval map-reduce pagerank sppu-computer-engineering text-preprocessing web-crawling
Last synced: 05 Mar 2025
https://github.com/sayamalt/symptoms-disease-text-classification
Successfully developed a fine-tuned BERT transformer model which can classify symptoms to their corresponding diseases with an accuracy of up to 89%.
bert-fine-tuning data-exploration-and-preprocessing exploratory-data-analysis fine-tune-bert-tensorflow hugging-face-transformers model-architecture-and-implementation model-inference model-training-and-evaluation multiclass-classification natural-language-processing text-classification text-preprocessing text-tokenization
Last synced: 09 Nov 2025
https://github.com/sayamalt/english-to-spanish-language-translation-using-seq2seq-and-attention
Successfully established a Seq2Seq with attention model which can perform English to Spanish language translation up to an accuracy of almost 97%.
attention-is-all-you-need attention-model bert-transformer exploratory-data-analysis fine-tuning-bert hugging-face-transformers language-translation luong-attention model-architecture-and-implementation model-inference model-training-and-evaluation natural-language-processing neural-machine-translation seq2seq-modeling text-generation text-preprocessing text-tokenization
Last synced: 09 Nov 2025
https://github.com/sayamalt/cyberbullying-classification-using-fine-tuned-distilbert
Successfully fine-tuned a pretrained DistilBERT transformer model that can classify social media text data into one of 4 cyberbullying labels i.e. ethnicity/race, gender/sexual, religion and not cyberbullying with a remarkable accuracy of 99%.
cyberbullying-detection data-exploration distilbert-model exploratory-data-analysis fine-tune-bert-tensorflow llm model-inference model-training-and-evaluation multiclass-classification natural-language-processing text-classification text-preprocessing text-tokenization
Last synced: 09 Nov 2025
https://github.com/sayamalt/text-similarity-quantifier
Successfully developed a machine learning model for computing the similarity score between two text paragraphs taken as input from a webpage.
bag-of-words cosine-similarity cosine-similarity-scores countvectorizer flask machine-learning nlp pandas python text-preprocessing tfidf
Last synced: 09 Nov 2025
https://github.com/sayamalt/quora-duplicate-question-pairs-identification
Successfully developed a machine learning model which can accurately detect whether any given pair of Quora questions is a duplicate or not.
data-visualization machine-learning natural-language-processing nltk paraphrase-detection text-preprocessing
Last synced: 09 Nov 2025
https://github.com/sayamalt/e-commerce-text-classification
Successfully established a machine learning model that can accurately classify an e-commerce product into one of four categories, namely "Books", "Clothing & Accessories", "Household" and "Electronics", based on the product's description.
categorical-encoding cross-validation exploratory-data-analysis hyperparameter-optimization machine-learning model-deployment model-training-and-evaluation text-classification text-preprocessing text-vectorization
Last synced: 09 Nov 2025
https://github.com/sayamalt/detection-of-disaster-from-tweets
Successfully established a machine learning model for detecting whether a given tweet is about a real disaster or not.
data-cleaning eda feature-engineering feature-extraction machine-learning natural-language-processing sklearn text-preprocessing
Last synced: 09 Nov 2025
https://github.com/gaaniruddha/fit5196-a1
This repository contains assignment #1, completed as part of "FIT5196 Data Wrangling", taught at Monash Uni in S2 2020.
bigrams count-vectorizer langid parsing-text python regular-expressions text-preprocessing unigrams
Last synced: 26 Feb 2025
https://github.com/mrqadeer/internet_words_remover
Python module designed to replace common internet slang and abbreviations with their full forms, enhancing the readability of informal text. It efficiently cleans text data from chats, social media, and online communication. The module also supports tokenization and integrates seamlessly with pandas for batch processing of text in DataFrames.
pandas python3 text-preprocessing
Last synced: 23 Mar 2025
https://github.com/michel-nemo/yelp-sentiment-analysis-with-python-bs4-and-llm
Explore the Yelp-Sentiment-Analysis-with-Python-BS4-and-LLM repository to extract and analyze customer reviews efficiently. This project utilizes powerful Python libraries for scraping, processing, and visualizing sentiment data.
beautifulsoup beautifulsoup4 business-analytics cuda data-analysis nltk opinion-mining pandas python requests-library-python sentiment-analysis text-preprocessing textblob torch web-scraping yelp-reviews
Last synced: 15 Jun 2025
https://github.com/atheeralzhrani/arabic_nlp
This repository contains projects focused on Arabic Natural Language Processing (NLP)
arabic-dataset arabic-language arabic-language-dataset arabic-nlp arabic-text-classification arabic-text-detection arabic-text-recognition huggingface spacy-nlp stemming text-preprocessing tokenization
Last synced: 16 Oct 2025
https://github.com/mrqadeer/text_prettifier
Python library designed to clean and preprocess text data by removing unwanted elements such as HTML tags, URLs, numbers, special characters, emojis, contractions, and stopwords. It offers flexible functionality, including options to return text in lowercase and as a list of tokens.
nltk-library python3 regular-expressions text-cleaning text-preprocessing
Last synced: 23 Mar 2025
https://github.com/sayamalt/english-to-german-translation-using-seq2seq
Successfully established a neural machine translation model using sequence-to-sequence modeling which can translate English sentences into their corresponding German translations.
natural-language-processing neural-language-translation sequence-to-sequence-models text-generation-using-lstm text-preprocessing
Last synced: 02 Jul 2025