{"id":26896053,"url":"https://github.com/sofiasawczenko/analyzing_text_sentiment","last_synced_at":"2025-04-01T02:59:31.012Z","repository":{"id":268756422,"uuid":"905374734","full_name":"sofiasawczenko/analyzing_text_sentiment","owner":"sofiasawczenko","description":"Project that preprocess textual data using Python libraries such as NLTK, TextBlob, and Newspaper3k. It covers how to scrape articles, clean and process text data, and analyze its sentiment.","archived":false,"fork":false,"pushed_at":"2024-12-18T17:47:24.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-18T18:37:48.570Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sofiasawczenko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-18T17:38:15.000Z","updated_at":"2024-12-18T17:55:28.000Z","dependencies_parsed_at":"2024-12-18T18:37:50.409Z","dependency_job_id":"62543be3-be7b-40eb-981a-e8a62489e3e2","html_url":"https://github.com/sofiasawczenko/analyzing_text_sentiment","commit_stats":null,"previous_names":["sofiasawczenko/analyzing_text_sentiment"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sofiasawczenko%2Fanalyzing_text_sentiment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sofiasawczenko%2Fanalyzing_text_sentiment/tags","releases_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/sofiasawczenko%2Fanalyzing_text_sentiment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sofiasawczenko%2Fanalyzing_text_sentiment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sofiasawczenko","download_url":"https://codeload.github.com/sofiasawczenko/analyzing_text_sentiment/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246574843,"owners_count":20799221,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-01T02:59:30.459Z","updated_at":"2025-04-01T02:59:31.003Z","avatar_url":"https://github.com/sofiasawczenko.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Preprocessing and Analyzing Text Sentiment with NLTK and TextBlob\n\n## Preprocessing Data with NLTK\n\nThis repository demonstrates how to preprocess textual data using Python libraries such as NLTK, TextBlob, and Newspaper3k. It covers how to scrape articles, clean and process text data, and analyze its sentiment.\n\n## Required Libraries\n- NLTK: The Natural Language Toolkit (NLTK) is used for text preprocessing tasks like tokenization and downloading necessary language resources.\n- TextBlob: A simple Python library for processing textual data. 
It is used here to perform sentiment analysis on the extracted article.\n- Newspaper3k: A Python library used for extracting and parsing articles from the web.\n- lxml: An XML and HTML processing library used in conjunction with newspaper3k for efficient content extraction.\n\nNLTK's data packages (e.g., tokenizers) are not installed with the library; step 2 below downloads the one this example needs.\n\n## Usage\nThe example script downloads an article from the web, extracts its text, and analyzes its sentiment using TextBlob.\n\n### 1. Install the dependencies\n  ```bash\npip install nltk textblob newspaper3k\npip install \"lxml[html_clean]\"\n```\n### 2. Download necessary NLTK data\nThe script starts by downloading NLTK's tokenizer data for text processing:\n\n  ```python\nimport nltk\n\n# Fetch the Punkt tokenizer data\nnltk.download('punkt_tab')\n```\n### 3. Extract article content\nNext, the script uses newspaper3k to download and parse an article:\n\n  ```python\nfrom newspaper import Article\n\nurl = 'https://blog.reedsy.com/short-story/8z78f4/'\n\narticle = Article(url)\narticle.download()  # fetch the page HTML\narticle.parse()     # extract the title and body text\narticle.nlp()       # keyword and summary extraction\n\ntext = article.text\nprint(text)\n```\n\nThe article text is printed and can be processed further.\n\n### 4. Analyze sentiment with TextBlob\nSentiment analysis is performed on the extracted text using TextBlob:\n\n  ```python\nfrom textblob import TextBlob\n\nblob = TextBlob(text)\n# polarity is a float in [-1.0, 1.0]\nsentiment = blob.sentiment.polarity\nprint(f\"Sentiment: {'positive' if sentiment \u003e 0 else 'negative' if sentiment \u003c 0 else 'neutral'}\")\n```\nThis prints whether the sentiment of the article is positive, negative, or neutral based on the sign of its polarity score.\n\n### Example Output\nAfter running the code, you'll see the extracted text printed, followed by the sentiment analysis result:\n\n  ```text\nSentiment: positive\n```\n\n## License\nThis code is licensed under the MIT License. 
See the LICENSE file for more details.\n\n## Contributions\nFeel free to open issues or submit pull requests if you want to contribute to this project!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsofiasawczenko%2Fanalyzing_text_sentiment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsofiasawczenko%2Fanalyzing_text_sentiment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsofiasawczenko%2Fanalyzing_text_sentiment/lists"}