https://github.com/comsavvy/punch-scraping-engine
Scraping the top Punch news
news newsfeed punch python3 scrapy web-scraping
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/comsavvy/punch-scraping-engine
- Owner: comsavvy
- Created: 2020-12-02T09:01:08.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2021-01-15T23:27:26.000Z (over 4 years ago)
- Last Synced: 2025-01-25T15:29:14.683Z (8 months ago)
- Topics: news, newsfeed, punch, python3, scrapy, web-scraping
- Language: Jupyter Notebook
- Homepage:
- Size: 86.9 KB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# News
This code scrapes the latest Punch news by crawling through the different news URLs.
End product:
- The URL of the News
- Title of the news
- News content

*All in one file!*
This project has three branches:
1. main: For storing the NEWS into a text file.
2. CSV: For storing the NEWS into a csv file.
3. deployment: For deploying the scraper on the SCRAPYHUB platform.
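The difference between the main and CSV branches comes down to how the same three fields are written out. The repository's own writer code is not shown here, so the following is only a minimal sketch using the Python standard library; the field names `url`, `title`, and `content` and the functions `save_as_text` and `save_as_csv` are hypothetical names chosen for illustration.

```python
import csv

# Hypothetical record layout: the README only names these three fields.
FIELDS = ["url", "title", "content"]


def save_as_text(items, path):
    """Write every news item into one plain-text file, separated by blank lines."""
    with open(path, "w", encoding="utf-8") as fh:
        for item in items:
            fh.write(f"URL: {item['url']}\n")
            fh.write(f"Title: {item['title']}\n")
            fh.write(f"{item['content']}\n\n")


def save_as_csv(items, path):
    """Write every news item as one row of a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(items)
```

A text file is easier to read by eye, while the CSV form keeps each field in its own column, which tends to be more convenient for loading the news into a spreadsheet or a data-analysis tool.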
# Requirement
The *scrapy_engine.py* module handles the installation of the necessary libraries.
Worried that there are too many libraries?
Don't be!
We are only installing one library, **SCRAPY**.
To install it manually,
copy and paste **pip install scrapy** into your console.
You can visit the **SCRAPY** documentation if you are curious about how it works.
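How *scrapy_engine.py* performs the installation is not shown in this README. One common way for a module to install a missing library on first run looks roughly like the sketch below; the function names `pip_install` and `ensure_installed` are hypothetical and are not taken from the repository.

```python
import importlib.util
import subprocess
import sys


def pip_install(package):
    """Install a package with the pip that belongs to the running interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure_installed(module_name, package=None, installer=pip_install):
    """Install `package` only if `module_name` cannot be found.

    Returns True if an installation was attempted, False if the module
    was already available. `installer` is injectable so the check can be
    exercised without touching the network.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    installer(package or module_name)
    return True
```

Calling `ensure_installed("scrapy")` at the top of a script would then install Scrapy only when it is missing. Using `sys.executable -m pip` rather than a bare `pip` command helps make sure the library lands in the same environment that runs the script.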