{"id":26563831,"url":"https://github.com/webdevcaptain/nlp-review","last_synced_at":"2025-03-22T16:18:18.594Z","repository":{"id":280942346,"uuid":"943686880","full_name":"WebDevCaptain/nlp-review","owner":"WebDevCaptain","description":"Reviewing basics of Natural Language Processing","archived":false,"fork":false,"pushed_at":"2025-03-06T05:33:15.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-06T06:27:50.403Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WebDevCaptain.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-06T05:25:53.000Z","updated_at":"2025-03-06T05:33:19.000Z","dependencies_parsed_at":"2025-03-06T06:38:23.870Z","dependency_job_id":null,"html_url":"https://github.com/WebDevCaptain/nlp-review","commit_stats":null,"previous_names":["webdevcaptain/nlp-review"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WebDevCaptain%2Fnlp-review","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WebDevCaptain%2Fnlp-review/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WebDevCaptain%2Fnlp-review/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WebDevCaptain%2Fnlp-review/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WebDevCaptain","download_url":"https://codeload.github.com/WebDevCaptain/nlp-review/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244982062,"owners_count":20542301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-22T16:18:17.928Z","updated_at":"2025-03-22T16:18:18.577Z","avatar_url":"https://github.com/WebDevCaptain.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Natural Language Processing\n\nRepository for reviewing the basics of NLP. Its notebooks cover the core NLP topics listed below, and it also hosts several projects and datasets.\n\n---\n\n## Contents\n\n1. [Basics](./1-nlp_basics.ipynb)\n\n   - Tokenization\n   - Part-of-speech (POS) tagging and parse trees\n   - Lemmatization\n   - Stemming\n\n2. [Text Preprocessing](./2-text_preprocessing.ipynb)\n\n   - Stopword removal\n   - Regexp tokenizer\n   - Data cleaning\n\n3. [Named Entity Recognition](./3-named_entity_recognition.ipynb)\n\n   - Information extraction and NER (Named Entity Recognition)\n   - Using the spaCy pipeline\n   - Web Scraping\n   - Visualizing named entities using `displacy`\n\n4. [Sentiment Analysis](./4-sentiment_analysis.ipynb)\n\n   - Exploratory Data Analysis\n   - Training a sentiment analysis model with spaCy on custom data\n   - Model evaluation and persistence\n\n5. 
[Text Summarization](./5-extraction_based_summarization.ipynb)\n\n   - Fetching articles from the Wikipedia API\n   - TF-IDF-based summarization using scikit-learn (non-grammatical summaries 😭)\n   - TextRank-based summarization using the Sumy library\n\n6. [Topic Modelling (NMF)](./6-topic_modelling.ipynb)\n\n   - Using NMF (Non-negative Matrix Factorization) to suggest topics for BBC news articles\n   - Unsupervised text classification\n   - WordCloud for visualizing topics\n   - Using the NMF model with a TF-IDF vectorizer\n   - [TODO]: Try LDA (Latent Dirichlet Allocation) for topic modelling\n\n7. [Recommendation Systems](./7-recommendation_systems.ipynb)\n\n   - Word2Vec model from Gensim\n   - Transfer learning using Word2Vec with pretrained Google News weights\n   - Netflix recommendation system based on what you watched (using cosine similarity from scikit-learn)\n\n8. [Fake News Detection](./8-fake_news_detector.ipynb)\n\n   - LSTM-based neural network for fake news detection (using TensorFlow and Keras)\n   - Training a deep neural network for NLP on a custom dataset\n   - Data preprocessing and cleaning\n   - Binary classification (fake or real news)\n   - One-hot encoding of features using the Keras preprocessing utilities\n   - Word embeddings\n   - [Dataset](https://drive.google.com/file/d/1gsJ90FOeAAB2tm9OWn_M5vV0TvpBWcCz/view?usp=sharing)\n\n---\n\n## Extra Content\n\n1. Speech Recognition\n   - Perform _speech-to-text_ using **Google's speech recognition engine** and OpenAI's `Whisper` models.\n   - `Librosa` is used for audio processing.\n   - The `SpeechRecognition` package is used for speech recognition.\n\n---\n\n## Libraries Used\n\n1. NLTK\n2. spaCy\n3. Gensim\n4. Sumy\n5. scikit-learn\n6. TensorFlow\n7. OpenAI Whisper\n8. SpeechRecognition\n9. WordCloud\n10. Librosa\n11. NumPy\n12. pandas\n13. Matplotlib\n14. Seaborn\n\n---\n\n## License\n\nThis repository is released under the [MIT License](./LICENSE). 
It can be used for educational purposes and for NLP-related training, with proper attribution.\n\nIf you find it useful, please consider contributing back to the repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebdevcaptain%2Fnlp-review","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwebdevcaptain%2Fnlp-review","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebdevcaptain%2Fnlp-review/lists"}