Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gabrielahn/citieslover-public
Allows users to scrape select web pages and upload them to an S3 bucket.
- Host: GitHub
- URL: https://github.com/gabrielahn/citieslover-public
- Owner: gabrielAHN
- Created: 2019-06-08T23:08:44.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-11-17T09:42:13.000Z (2 months ago)
- Last Synced: 2024-11-17T10:29:27.712Z (2 months ago)
- Topics: newsletter, python, scraping
- Language: Python
- Homepage: https://www.citieslover.com
- Size: 16.7 MB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Cities Lover
Cities Lover 💚 is a place where you can find jobs 💼 and news 📰 for anyone interested in urbanism 🌆.
This script scrapes various websites and uploads the results to S3 buckets, which are then rendered at **[Cities Lover](https://www.gabrielhn.com/citieslover/)**.
![website](/citieslover.gif)
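The core flow is: fetch each configured source, normalize the results, and write them to S3 as JSON. Below is a minimal, hypothetical sketch of that flow using `requests` and `boto3`; the `SOURCES` table, `fetch_page`, `upload_to_s3`, and the bucket name are illustrative assumptions, not the repo's actual API.

```
# Hypothetical sketch of the scrape-and-upload flow described above.
# SOURCES, fetch_page, upload_to_s3, and the bucket name are
# illustrative assumptions, not this repo's actual API.
import json

import boto3
import requests

SOURCES = {
    "example-jobs": "https://example.com/jobs",  # placeholder URL
}

def fetch_page(name: str, url: str) -> dict:
    """Scrape one source and return a small JSON-serializable record."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return {"source": name, "html_length": len(response.text)}

def upload_to_s3(records: list, bucket: str, key: str) -> None:
    """Upload the scraped records to an S3 bucket as a JSON object."""
    s3 = boto3.client("s3")  # reads AWS credentials from the environment
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(records).encode("utf-8"))

if __name__ == "__main__":
    data = [fetch_page(name, url) for name, url in SOURCES.items()]
    upload_to_s3(data, bucket="my-citieslover-bucket", key="datasets/data.json")
```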
## Setup
```
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
```
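A note on credentials (an assumption, not documented here): since the script writes to S3, standard AWS credentials must be available before running it, e.g. the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables or a `~/.aws/credentials` profile, which `boto3` picks up automatically.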
## Use
```
# Test scrape for a specific source
python -m cities_scrape_data test_source --source <source> [--response] [--threads <n>]

# Test scrape by source type (e.g., jobs, podcasts, articles)
python -m cities_scrape_data test_by_type --type <type> [--response] [--threads <n>]

# Test scraping all data sources
python -m cities_scrape_data test_all [--response] [--threads <n>]

# Create datasets to be uploaded to the S3 bucket
python -m cities_scrape_data create_datasets [--threads <n>]

# Get websites info
python -m cities_scrape_data get_websites
```
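For example, a dataset build with eight scraper threads would look like `python -m cities_scrape_data create_datasets --threads 8` (the thread count is illustrative). The values for the `<source>` and `<type>` placeholders depend on the sources the repo configures; `get_websites` presumably prints them, so it is a reasonable first command to run.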