{"id":26802304,"url":"https://github.com/2000pawan/sentiment-analysis-","last_synced_at":"2026-04-19T19:01:16.883Z","repository":{"id":284914976,"uuid":"955493376","full_name":"2000pawan/Sentiment-Analysis-","owner":"2000pawan","description":"This project performs sentiment analysis on text extracted from URLs using Natural Language Processing (NLP) techniques. It analyzes the sentiment and readability of the extracted content and saves the results in an Excel file.","archived":false,"fork":false,"pushed_at":"2025-03-28T10:35:36.000Z","size":376,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-20T07:32:59.845Z","etag":null,"topics":["artificial-intelligence","beautifulsoup","machine-learning","nlp","nltk","python","requests-library-python","selenium-python","sentiment-analysis","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/2000pawan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-26T18:14:56.000Z","updated_at":"2025-05-25T19:46:27.000Z","dependencies_parsed_at":"2025-03-28T11:33:38.593Z","dependency_job_id":null,"html_url":"https://github.com/2000pawan/Sentiment-Analysis-","commit_stats":null,"previous_names":["2000pawan/sentiment-analysis-"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/2000pawan/Sentiment-Analysis-","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2000pawan%2FSentiment-Analysis-","t
ags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2000pawan%2FSentiment-Analysis-/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2000pawan%2FSentiment-Analysis-/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2000pawan%2FSentiment-Analysis-/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/2000pawan","download_url":"https://codeload.github.com/2000pawan/Sentiment-Analysis-/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2000pawan%2FSentiment-Analysis-/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32018764,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T20:23:30.271Z","status":"online","status_checked_at":"2026-04-19T02:00:07.110Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","beautifulsoup","machine-learning","nlp","nltk","python","requests-library-python","selenium-python","sentiment-analysis","webscraping"],"created_at":"2025-03-29T21:17:21.849Z","updated_at":"2026-04-19T19:01:16.865Z","avatar_url":"https://github.com/2000pawan.png","language":"Jupyter Notebook","readme":"---\n\n# **Sentiment Analysis Project**\n\n## **📌 Project Overview**\nThis project performs **sentiment analysis** on text extracted from URLs using **Natural Language Processing (NLP)** 
techniques. It analyzes the sentiment and readability of the extracted content and saves the results in an Excel file.\n\n---\n\n## **1. Approach to the Solution**\n\n### **🔹 Step 1: Load Necessary Data**\n- The input file `input.xlsx` contains **URLs** with their corresponding **URL_IDs**.\n- Stopwords are loaded from the [`StopWords`](https://github.com/2000pawan/Sentiment-Analysis-/tree/main/StopWords) folder to filter out unnecessary words.\n- A **positive** and **negative** words dictionary is loaded from [`MasterDictionary`](https://github.com/2000pawan/Sentiment-Analysis-/tree/main/MasterDictionary).\n\n### **🔹 Step 2: Extract Text from URLs**\n- The script fetches the webpage content using `requests` and **BeautifulSoup**.\n- It extracts the **title** and **article text** from `\u003ch1\u003e` and `\u003cp\u003e` tags.\n- The extracted text is stored in the `url_text/` folder as a `.txt` file for each URL.\n\n### **🔹 Step 3: Preprocessing the Extracted Text**\n- Tokenization is performed using `nltk.word_tokenize()`.\n- Stopwords and non-alphanumeric words are removed.\n\n### **🔹 Step 4: Sentiment Analysis \u0026 Text Complexity Metrics**\nFor each extracted text, the script computes:\n✔ **Positive Score** (count of positive words)  \n✔ **Negative Score** (count of negative words)  \n✔ **Polarity Score** (positive vs. negative balance)  \n✔ **Subjectivity Score** (extent of opinion-based content)  \n✔ **Complexity Measures**:\n   - **Fog Index** (readability metric)\n   - **Syllables per word**\n   - **Complex word count** (words with \u003e2 syllables)\n   - **Personal Pronoun Count** (e.g., \"I\", \"we\", \"my\")\n✔ **General Statistics**:\n   - **Average sentence length**\n   - **Average word length**\n\n### **🔹 Step 5: Save Output to Excel**\n- The computed scores and metrics are saved in `output.xlsx`.\n- Each row contains a **URL_ID, URL, and sentiment analysis results**.\n\n---\n\n## **2. 
How to Run the Script**\n\n### **💻 Prerequisites**\nEnsure you have **Python 3.x** installed on your system.\n\n### **📌 Steps to Run the Script**\n\n1️⃣ **Clone the Repository**  \nRun the following command to download the project:\n```sh\ngit clone https://github.com/2000pawan/Sentiment-Analysis-\n```\n\n2️⃣ **Install Dependencies**  \nRun the following command in your terminal or command prompt:\n```sh\npip install pandas requests beautifulsoup4 nltk openpyxl\n```\n\n3️⃣ **Prepare the Input File**  \n- Place `input.xlsx` in the same directory as the script.\n- Ensure it has **two columns: `URL_ID` and `URL`**.\n\n4️⃣ **Download Master Dictionary \u0026 Stopwords**  \n- **Master Dictionary:** [Download Here](https://github.com/2000pawan/Sentiment-Analysis-/tree/main/MasterDictionary)  \n- **Stopwords:** [Download Here](https://github.com/2000pawan/Sentiment-Analysis-/tree/main/StopWords)  \nMake sure these are placed in their respective folders.\n\n5️⃣ **Run the Script**  \nNavigate to the project folder and execute:\n```sh\npython Sentiment_analysis.py\n```\n\n6️⃣ **View the Output**  \n- Extracted webpage text is saved in `url_text/`.\n- The final **sentiment analysis report** is saved as `output.xlsx`.\n\n---\n\n## **3. Required Dependencies**\nEnsure you have the following Python libraries installed:\n\n| **Library**        | **Purpose**  | **Installation Command** |\n|--------------------|-------------|-------------------------|\n| `pandas`          | Handling Excel data  | `pip install pandas` |\n| `requests`        | Fetching webpage content  | `pip install requests` |\n| `beautifulsoup4`  | Parsing HTML  | `pip install beautifulsoup4` |\n| `nltk`            | Natural Language Processing  | `pip install nltk` |\n| `openpyxl`        | Handling Excel files  | `pip install openpyxl` |\n\nAdditionally, download the **NLTK tokenizer data** by running:\n```python\nimport nltk\nnltk.download('punkt')\n```\n\n---\n\n## **4. 
Import Commands**\nEnsure your script includes the following imports at the beginning:\n```python\nimport os\nimport pandas as pd\nimport requests\nfrom bs4 import BeautifulSoup\nimport nltk\nimport re\nfrom nltk.tokenize import word_tokenize, sent_tokenize\nnltk.download('punkt')\n```\n\n---\n\n## **📌 Notes**\n✅ If an error occurs due to a missing directory (`url_text/`), manually create it or ensure the script includes:\n```python\nos.makedirs(\"url_text\", exist_ok=True)\n```\n✅ Ensure your input Excel file (`input.xlsx`) contains the expected `URL_ID` and `URL` columns.\n✅ If you face encoding issues, pass `encoding=\"ISO-8859-1\"` to `open()` when reading the saved text files.\n\n---\n\n## **License**\nThis project is released under the **MIT License** (see the `LICENSE` file) and is available for modification and enhancement.\n\n**🎯 Project Completed 🚀**\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F2000pawan%2Fsentiment-analysis-","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F2000pawan%2Fsentiment-analysis-","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F2000pawan%2Fsentiment-analysis-/lists"}