# Language_Predictor_ML_NLP

## Streamlit Cloud
Check out the hosted website [here👉](https://vishal815-language-predictor-ml-nlp-app-dqjsvm.streamlit.app/).

## Hugging Face Spaces
Check out the hosted website [here👉](https://huggingface.co/spaces/Visal9252/Languagepredictormlnlp).

![Screenshot 2023-04-21 022042](https://github.com/vishal815/Language_Predictor_ML_NLP/assets/83393190/1a9180f0-92fb-4b18-80fc-9664bd3a406d)

## Run locally
To run the app: `streamlit run app.py`

## TfidfVectorizer

`TfidfVectorizer` converts each text document into a numerical representation based on how often each term appears in that document (term frequency), weighted by how rare the term is across the entire corpus (inverse document frequency).
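As a minimal illustration (not taken from the project itself; the sample sentences are made up), this is how `TfidfVectorizer` turns a tiny corpus into a TF-IDF matrix:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny illustrative corpus (made-up sentences).
corpus = [
    "hello world",
    "hello machine learning",
    "bonjour le monde",
]

vectorizer = TfidfVectorizer()          # default: word-level unigrams
X = vectorizer.fit_transform(corpus)    # sparse matrix, one row per document

print(vectorizer.get_feature_names_out())
# ['bonjour' 'hello' 'le' 'learning' 'machine' 'monde' 'world']
print(X.shape)  # (3, 7) -> 3 documents x 7 terms
```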

The `ngram_range` parameter in `TfidfVectorizer` specifies the range of n-grams to be considered. An n-gram is a contiguous sequence of n items from a given sample of text or speech. By default, `TfidfVectorizer` uses a unigram approach, but specifying `ngram_range=(1,2)` means that both unigrams and bigrams will be considered.

The `analyzer` parameter in `TfidfVectorizer` determines whether features are built from words or characters. Setting `analyzer='char'` makes the vectorizer generate character-level n-grams instead of word-level n-grams.
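A small, hedged sketch of what character-level unigrams and bigrams look like (the sample string is made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Character-level unigrams and bigrams.
char_vectorizer = TfidfVectorizer(analyzer='char', ngram_range=(1, 2))
char_vectorizer.fit(["hola"])

print(char_vectorizer.get_feature_names_out())
# ['a' 'h' 'ho' 'l' 'la' 'o' 'ol']
```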

Using `TfidfVectorizer` from the `feature_extraction.text` module in the `scikit-learn` library, we can generate numerical representations of text data based on term frequency and inverse document frequency. By specifying `ngram_range=(1,2)` and `analyzer='char'`, we can consider both unigrams and bigrams at the character level.
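Putting it together, here is a minimal end-to-end sketch of how such a language predictor could be trained and used. The column names `Text` and `Language`, the sample rows, and the `LogisticRegression` classifier are assumptions for illustration, not necessarily what the project uses:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical training data; the real project loads a labelled dataset instead.
df = pd.DataFrame({
    "Text": ["hello, how are you?", "bonjour, comment ça va ?", "hola, ¿cómo estás?"],
    "Language": ["English", "French", "Spanish"],
})

# Character-level unigrams + bigrams feeding a simple classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(df["Text"], df["Language"])

print(model.predict(["merci beaucoup"]))  # expected to lean towards 'French'
```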