https://github.com/hermann-web/text-preprocessing-methods-for-nlp-search-engine
This repository compares some text preprocessing methods that I used while working on an NLP (Natural Language Processing) project.
- Host: GitHub
- URL: https://github.com/hermann-web/text-preprocessing-methods-for-nlp-search-engine
- Owner: Hermann-web
- Created: 2021-09-02T08:51:39.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2023-05-19T04:05:25.000Z (over 2 years ago)
- Last Synced: 2025-01-04T21:19:54.748Z (9 months ago)
- Topics: correction, data-cleaning, lemmatization, lemmatizer, nlp, nlp-machine-learning, preprocessing, python, search-engine, search-engines, tokenization
- Language: Python
- Homepage:
- Size: 4.06 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# text-preprocessing-methods
It is not obvious to handle every stage of an NLP project in French. After some research, I found or created some methods that you can use for your own NLP projects; a minimal sketch of these steps appears after the list below.

# Text pre-processing methods
- clean sentences using regex
- sentence correction: [correcteur.py](https://github.com/Hermann-web/text-preprocessing-methods-for-NLP-search-engine/blob/main/correcteur.py)
- tokenization: [summary_token.py](https://github.com/Hermann-web/text-preprocessing-methods-for-NLP-search-engine/blob/main/summary_token.py)
- lemmatization: [summary_lemma.py](https://github.com/Hermann-web/text-preprocessing-methods-for-NLP-search-engine/blob/main/summary_lemma.py)
- find synonyms in French: [syn_french.py](https://github.com/Hermann-web/text-preprocessing-methods-for-NLP-search-engine/blob/main/syn_french.py)
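Here is a minimal sketch of what these steps can look like in Python. It is illustrative only: the repository's own scripts may rely on other libraries, and the choice of NLTK (with the `punkt`, `wordnet`, and `omw-1.4` data) plus spaCy's `fr_core_news_sm` model below is an assumption, not necessarily what the scripts above use.

```python
# Illustrative sketch only; the repository's own scripts may use different libraries.
# Assumed setup: nltk.download("punkt"), nltk.download("wordnet"), nltk.download("omw-1.4"),
# and: python -m spacy download fr_core_news_sm
import re

import nltk
import spacy
from nltk.corpus import wordnet as wn

nlp = spacy.load("fr_core_news_sm")  # French spaCy pipeline used here for lemmatization


def clear_sentence(text: str) -> str:
    """Clean a sentence with regex: keep letters (including accents), drop extra spaces."""
    text = re.sub(r"[^a-zA-ZÀ-ÿ' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text: str) -> list[str]:
    """Tokenize French text with NLTK's Punkt models."""
    return nltk.word_tokenize(text, language="french")


def lemmatize(text: str) -> list[str]:
    """Lemmatize with spaCy's French pipeline."""
    return [token.lemma_ for token in nlp(text)]


def french_synonyms(word: str) -> set[str]:
    """Look up French synonyms through the Open Multilingual Wordnet."""
    return {
        lemma
        for synset in wn.synsets(word, lang="fra")
        for lemma in synset.lemma_names("fra")
        if lemma != word
    }


if __name__ == "__main__":
    sentence = "Les moteurs de recherche utilisent des méthodes de prétraitement !"
    cleaned = clear_sentence(sentence)
    print(tokenize(cleaned))
    print(lemmatize(cleaned))
    print(french_synonyms("recherche"))
```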
# A preprocessing algorithm

After this benchmark, you can find a [function named SENTENCE_TO_CORRECT_WORDS](https://github.com/Hermann-web/text-preprocessing-methods-for-NLP-search-engine/blob/main/all.py#LC146) in the file [all.py](https://github.com/Hermann-web/text-preprocessing-methods-for-NLP-search-engine/blob/main/all.py) that uses these methods to get French tokens from a French sentence.

You can also find my search engine, which uses this preprocessing together with semantic similarities, [here](https://github.com/Hermann-web/Search-engine-with-python-nlp).
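The real SENTENCE_TO_CORRECT_WORDS lives in all.py; the sketch below is only a guess at what such a sentence-to-tokens pipeline can look like, reusing the helpers from the sketch above. The spelling correction via pyspellchecker and the stopword filtering are assumptions, not necessarily what correcteur.py or all.py actually do.

```python
# Hypothetical sentence -> corrected French tokens pipeline; the real function in
# all.py may differ. Reuses clear_sentence, tokenize and lemmatize from the sketch above.
from nltk.corpus import stopwords
from spellchecker import SpellChecker  # pip install pyspellchecker

spell = SpellChecker(language="fr")                # French frequency dictionary
french_stopwords = set(stopwords.words("french"))  # needs NLTK 'stopwords' data


def sentence_to_correct_words(sentence: str) -> list[str]:
    """Turn a raw French sentence into cleaned, corrected, lemmatized tokens."""
    cleaned = clear_sentence(sentence)             # regex cleaning
    tokens = [t for t in tokenize(cleaned) if t not in french_stopwords]
    corrected = [spell.correction(t) or t for t in tokens]  # keep the word if no fix found
    return lemmatize(" ".join(corrected))          # lemmatize the corrected words


print(sentence_to_correct_words("Les moteur de recherch utilisent le prétraitemant"))
```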
# requirements
You can find them [here](https://github.com/Hermann-web/text-preprocessing-methods-for-NLP-search-engine/blob/main/requirements.txt). Depending on which methods you want to test, you can install only a subset of them.