https://github.com/marios-mamalis/airflowsentimentanalysis
Airflow DAG for sentiment analysis on GCP
airflow google-cloud-bigquery google-cloud-composer sentiment-analysis
- Host: GitHub
- URL: https://github.com/marios-mamalis/airflowsentimentanalysis
- Owner: Marios-Mamalis
- License: mit
- Created: 2024-07-05T15:30:53.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-25T20:24:08.000Z (almost 2 years ago)
- Last Synced: 2025-01-13T04:42:02.521Z (over 1 year ago)
- Topics: airflow, google-cloud-bigquery, google-cloud-composer, sentiment-analysis
- Language: Python
- Homepage:
- Size: 6.84 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Airflow DAG for sentiment analysis on GCP
An Airflow DAG that performs sentiment analysis on news articles fetched from an API.
The DAG is designed to be deployed on Google Cloud Composer and to run daily.
During each DAG run, the following steps are executed serially:
1) A table is created in Google BigQuery if it does not already exist, so the step is idempotent.
Each row represents a news article, and the table's columns are `title`, `creator`, `description`,
`country`, `category`, `description_sentiment`, `topic` and `retrievaldate`.
2) News articles are fetched using the `newsdata.io` API client.
The topic and the number of articles fetched are configurable via the YAML file.
3) The articles' descriptions are preprocessed with standard NLP methods, and their sentiment is scored
with the `Vader` sub-module of `NLTK`. Sentiment scores range from 0 (negative sentiment)
to 1 (positive sentiment).
4) The results of step 3 are stored in the BigQuery table created in step 1.
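
The data-shaping parts of steps 1, 3 and 4 can be sketched in plain Python, without Airflow, BigQuery or NLTK installed. This is a minimal illustration under stated assumptions, not the repository's actual code: the column names come from the list above, but the BigQuery column types, the function names (`create_table_sql`, `rescale_compound`, `build_row`) and the shape of the `article` dictionary are hypothetical. NLTK's VADER analyzer reports a `compound` score between -1 and 1, so the sketch assumes a linear rescaling to reach the 0 to 1 range described in step 3; the project may derive its score differently.

```python
from datetime import date

# Column names come from the README; the BigQuery types are assumptions.
COLUMNS = {
    "title": "STRING",
    "creator": "STRING",
    "description": "STRING",
    "country": "STRING",
    "category": "STRING",
    "description_sentiment": "FLOAT64",
    "topic": "STRING",
    "retrievaldate": "DATE",
}


def create_table_sql(table_id: str) -> str:
    """Build an idempotent DDL statement: rerunning it leaves an existing table untouched."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in COLUMNS.items())
    return f"CREATE TABLE IF NOT EXISTS `{table_id}` (\n  {cols}\n)"


def rescale_compound(compound: float) -> float:
    """Linearly map a score in [-1, 1] onto [0, 1] (assumed normalization)."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError(f"compound score out of range: {compound}")
    return (compound + 1.0) / 2.0


def build_row(article: dict, compound: float, topic: str, retrieved: date) -> dict:
    """Assemble one table row from a fetched article and its sentiment score."""
    # Fields missing from the (hypothetical) article dict become None.
    row = {
        name: article.get(name)
        for name in ("title", "creator", "description", "country", "category")
    }
    row["description_sentiment"] = rescale_compound(compound)
    row["topic"] = topic
    row["retrievaldate"] = retrieved.isoformat()
    return row
```

In a DAG, the string returned by `create_table_sql` would be submitted to BigQuery in the first task, and `build_row` would be applied to each fetched article before the final insert. Using `CREATE TABLE IF NOT EXISTS` is one common way to make the creation step safe to repeat on every daily run.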