https://github.com/mrintern/machine-learning-pipeline
A mini machine learning pipeline, from scratch.
- Host: GitHub
- URL: https://github.com/mrintern/machine-learning-pipeline
- Owner: mrintern
- Created: 2023-07-01T17:03:44.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-09-10T18:12:03.000Z (almost 3 years ago)
- Last Synced: 2025-02-10T13:43:50.875Z (over 1 year ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 323 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## TODO: Run the pipeline as an Airflow DAG
# machine-learning-pipeline
I made this machine learning pipeline to show recruiters an example of my skills in data engineering and my style of writing and documenting code.
The pipeline does the following:
1. Sends a request to the NewsAPI's /sources endpoint (https://newsapi.org/)
2. Creates `sources.csv` from the API response
3. Sends a request to the NewsAPI's /everything endpoint
4. For every publisher in `sources.csv`, grabs every article written in the past 3 days
5. Applies a sentiment analysis model to the article titles
6. Performs some data analysis
7. Generates a SQL statement that can be used to upload this data to BigQuery for further analysis
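
A rough sketch of how steps 2, 4, and 7 above might look in Python is shown here. It is not the code from `pipeline.ipynb`: every function name, the CSV columns (`id`, `name`, `url`), the table columns (`source_id`, `title`, `sentiment`), and the example table name are assumptions made for illustration. The `sources`, `from`, and `apiKey` query parameters follow the public NewsAPI documentation, and no network calls are made, so the HTTP requests and the sentiment model are left out.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# All names below are hypothetical; the notebook's actual code may differ.
EVERYTHING_URL = "https://newsapi.org/v2/everything"


def write_sources_csv(sources_response, out):
    """Step 2: write the `sources` list of a parsed /sources response as CSV."""
    fields = ["id", "name", "url"]  # assumed subset of the response fields
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for source in sources_response.get("sources", []):
        writer.writerow(source)


def everything_params(source_id, api_key, now=None, days=3):
    """Step 4: build /everything query parameters for one publisher
    covering the past `days` days."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    return {
        "sources": source_id,
        "from": start.strftime("%Y-%m-%dT%H:%M:%S"),
        "apiKey": api_key,
    }


def sql_literal(value):
    """Quote a value as a single-quoted string literal, using backslash
    escapes for backslashes and single quotes."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def insert_statement(table, rows):
    """Step 7: build one INSERT statement from (source_id, title, sentiment)
    tuples. The column names are assumptions."""
    values = ",\n".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return (
        f"INSERT INTO `{table}` (source_id, title, sentiment)\n"
        f"VALUES\n{values};"
    )
```

In a notebook, `write_sources_csv` would typically be called with a file opened as `open("sources.csv", "w", newline="")`, and the dictionary from `everything_params` passed as the query string of a GET request to `EVERYTHING_URL`. Building SQL by string concatenation is shown only because the README describes generating a statement; for real uploads, parameterized queries or a load job are usually safer.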
Note: `pipeline.ipynb` is meant to be run in Google Colab.
Enjoy!