Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/victoriacheng15/articles-extractor
Python app utilizing Beautiful Soup, Docker, Bash, Raspberry Pi, and cron job to automate article extraction from preferred websites and organize them into Google Sheets.
bash docker google-sheets google-sheets-api python3 raspberry-pi
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/victoriacheng15/articles-extractor
- Owner: victoriacheng15
- License: mit
- Created: 2024-02-04T20:28:44.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-09-25T14:33:05.000Z (4 months ago)
- Last Synced: 2024-11-01T23:30:31.289Z (3 months ago)
- Topics: bash, docker, google-sheets, google-sheets-api, python3, raspberry-pi
- Language: Python
- Homepage:
- Size: 46.9 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Article Extractor
This application retrieves articles from freeCodeCamp and Substack and writes the relevant details of each article to a Google Sheet.
## Tech Stack
![Python](https://img.shields.io/badge/Python-3776AB.svg?style=for-the-badge&logo=Python&logoColor=white) ![Google Sheet API](https://img.shields.io/badge/Google%20Sheets-34A853.svg?style=for-the-badge&logo=Google-Sheets&logoColor=white) ![docker](https://img.shields.io/badge/Docker-2496ED.svg?style=for-the-badge&logo=Docker&logoColor=white) ![Raspberry PI](https://img.shields.io/badge/Raspberry%20Pi-A22846.svg?style=for-the-badge&logo=Raspberry-Pi&logoColor=white) ![Bash](https://img.shields.io/badge/GNU%20Bash-4EAA25.svg?style=for-the-badge&logo=GNU-Bash&logoColor=white)
## Getting Started
Please refer to the [Wiki](https://github.com/victoriacheng15/articles-extractor/wiki) for setup instructions.
## What I have learned
I used Python generators to make the app more efficient. A generator yields each article's information as soon as it is extracted, so articles are sent to the Sheet one at a time and the entire sequence never has to be held in memory at once. Previously, articles from all providers were collected in a list named `all_articles`, and I then had to loop through that list to send them to the Sheet.
The generator is a neat way to simplify and streamline the process: it removes the need to store the whole sequence in memory before sending it to the Sheet.
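
The generator approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: `SAMPLE_PAGES`, `extract_articles`, `send_all`, and `fake_sheet` are hypothetical names, the sample data is made up, and a plain Python list stands in for the Google Sheet so the example runs without credentials.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple[str, str, str]

# Hypothetical sample data standing in for scraped pages (not from the repository).
SAMPLE_PAGES = {
    "freeCodeCamp": [("Intro to Generators", "https://example.com/fcc/1")],
    "Substack": [
        ("Weekly Notes", "https://example.com/substack/1"),
        ("Monthly Recap", "https://example.com/substack/2"),
    ],
}


def extract_articles(pages: dict) -> Iterator[Row]:
    """Yield one (provider, title, link) row at a time instead of building a list."""
    for provider, entries in pages.items():
        for title, link in entries:
            yield (provider, title, link)


def send_all(rows: Iterable[Row], append_row: Callable[[Row], None]) -> int:
    """Pass each row to append_row as soon as it is produced and return the count."""
    count = 0
    for row in rows:
        append_row(row)
        count += 1
    return count


# Stand-in for the Google Sheet: list.append plays the role of the sheet write.
fake_sheet = []
sent = send_all(extract_articles(SAMPLE_PAGES), fake_sheet.append)
```

Because `extract_articles` is a generator function, no intermediate `all_articles` list is built. Each row is handed to the writer as soon as it is yielded, and only one row is in flight at a time.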