{"id":25175101,"url":"https://github.com/sarahm44/crypto-sentiment-analysis","last_synced_at":"2026-01-30T09:16:34.725Z","repository":{"id":54969640,"uuid":"492438353","full_name":"sarahm44/crypto-sentiment-analysis","owner":"sarahm44","description":"Analysis of the sentiment of the latest news articles on Bitcoin and Ethereum using sentiment analysis, natural language processing and named entity recognition.","archived":false,"fork":false,"pushed_at":"2022-08-10T03:02:45.000Z","size":2816,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-06T21:11:17.694Z","etag":null,"topics":["bitcoin","bitcoin-sentiment","crypto-sentiment","ethereum","ethereum-sentiment","fintech","named-entity-recognition","natural-language-processing","newsapi","sentiment-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sarahm44.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-05-15T09:12:00.000Z","updated_at":"2024-01-27T02:36:13.000Z","dependencies_parsed_at":"2022-08-14T07:40:55.369Z","dependency_job_id":null,"html_url":"https://github.com/sarahm44/crypto-sentiment-analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sarahm44/crypto-sentiment-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sarahm44%2Fcrypto-sentiment-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sarahm44%2Fcrypto-sentiment-analysis/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/sarahm44%2Fcrypto-sentiment-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sarahm44%2Fcrypto-sentiment-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sarahm44","download_url":"https://codeload.github.com/sarahm44/crypto-sentiment-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sarahm44%2Fcrypto-sentiment-analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260301992,"owners_count":22988721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitcoin","bitcoin-sentiment","crypto-sentiment","ethereum","ethereum-sentiment","fintech","named-entity-recognition","natural-language-processing","newsapi","sentiment-analysis"],"created_at":"2025-02-09T12:28:52.615Z","updated_at":"2026-01-30T09:16:34.685Z","avatar_url":"https://github.com/sarahm44.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Crypto Sentiment Analysis\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/sentimental.jpeg)\n\n## Table of Contents\n- [Overview](#overview)\n- [Sentiment Analysis](#sentiment-analysis)\n  * [Bitcoin Sentiment](#bitcoin-sentiment)\n  * [Ethereum Sentiment](#ethereum-sentiment)\n- [Natural Language Processing](#natural-language-processing)\n  * [Tokenize](#tokenize)\n  * [N-grams](#n-grams)\n  * [Word Clouds](#word-clouds)\n  * [Named Entity Recognition](#named-entity-recognition)\n\n\n## 
Overview\n\nIn this repository, I applied natural language processing to understand the sentiment in the latest news articles featuring Bitcoin and Ethereum. I also applied fundamental NLP techniques to better understand other factors related to the coin prices, such as common words and phrases, and the organizations and other entities mentioned in the articles.\n\nI completed the following tasks:\n\n1. Sentiment Analysis\n2. Natural Language Processing\n3. Named Entity Recognition\n\nThe full analysis is in this [Jupyter Lab notebook](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/crypto_sentiment_1.ipynb).\n\n## Sentiment Analysis\n\nI used the [News API](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and created a DataFrame of sentiment scores for each coin.\n\n### Bitcoin Sentiment\n\nI created the Bitcoin sentiment scores DataFrame:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/btc_df.png)\n\nSee Bitcoin sentiment below:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/btc_sentiment.png)\n\n### Ethereum Sentiment\n\nI created the Ethereum sentiment scores DataFrame:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/eth_df.png)\n\nSee Ethereum sentiment below:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/eth_sentiment.png)\n\nSome observations:\n\n* Ethereum had the higher mean positive score.\n* Ethereum had the higher mean compound score.\n* Bitcoin had the higher maximum compound score.\n\n## Natural Language Processing\n\nIn this section, I used NLTK and Python to tokenize text, find n-gram counts, and create word clouds for both coins.\n\n### Tokenize\n\nI used NLTK and Python to tokenize the text for each coin. I completed the following:\n\n1. Changed each word to lowercase.\n2. Removed punctuation.\n3. 
Removed stop words.\n\nSee relevant code below:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/tokenize.png)\n\nI then added the \"Tokens\" column of the tokenized text to the DataFrame:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/tokens_df.png)\n\n### N-grams\n\nThen I looked at the n-grams and word frequencies for each coin.\n\nI completed the following:\n\n1. Used NLTK to produce the n-grams for N = 2.\n2. Listed the top 10 words for each coin.\n\nSee below the n-gram counts for N = 2:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/ngrams.png)\n\nSee below the code and results for the top 10 words for each coin:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/top10.png)\n\n### Word Clouds\n\nFinally, I generated a word cloud for each coin to summarize its news coverage.\n\nSee Bitcoin word cloud:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/btc_cloud.png)\n\nSee Ethereum word cloud:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/eth_cloud.png)\n\n\n### Named Entity Recognition\n\nIn this section, I built a named entity recognition (NER) model for both coins and visualized the tags using spaCy.\n\nSee Bitcoin NER:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/btc_ner.png)\n\nSee Ethereum NER:\n\n![](https://github.com/sarahm44/crypto-sentiment-analysis/blob/main/images/eth_ner.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsarahm44%2Fcrypto-sentiment-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsarahm44%2Fcrypto-sentiment-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsarahm44%2Fcrypto-sentiment-analysis/lists"}