https://github.com/cherukuri-thanu/webscraping-hn
This repository contains configuration files for web scraping the Hacker News Website.
- Host: GitHub
- URL: https://github.com/cherukuri-thanu/webscraping-hn
- Owner: Cherukuri-Thanu
- Created: 2024-04-22T15:09:35.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-04-22T15:13:45.000Z (9 months ago)
- Last Synced: 2024-11-09T22:18:21.118Z (2 months ago)
- Topics: project, python3
- Language: Python
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# WebScraping-HN
## Description
This project is a custom scraper for the Hacker News website. It extracts news articles from multiple Hacker News pages, then filters and sorts them by upvote count. The final output contains only articles with more than 99 upvotes, giving a curated list of popular news items.

## Features
- Scrapes multiple Hacker News pages.
- Filters articles with more than 99 upvotes.
- Sorts articles based on upvote count.
- Utilizes BeautifulSoup for efficient HTML parsing.

## How to Use
1. Clone this repository.
2. Install the required dependencies: `requests` and `beautifulsoup4`.
3. Add URLs of the Hacker News pages you want to scrape in `URLs_list.txt`.
4. Run the script: `python main.py`.

## Requirements
- Python 3.x
- `requests`
- `beautifulsoup4`

## Contact
Thanuja Cherukuri - [[email protected]]
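
## Illustrative Sketch
The pipeline described under Features (read the URL list, parse each page, keep articles with more than 99 upvotes, sort by upvote count) could look roughly like the sketch below. It is not the contents of `main.py`. The names `Story`, `read_urls`, `parse_stories`, `popular`, and `scrape` are hypothetical, as are the assumptions that `URLs_list.txt` holds one URL per line, that results are sorted in descending order, and that Hacker News marks story rows with `tr.athing`, titles with `.titleline`, and scores with `id="score_<row id>"`.

```python
from dataclasses import dataclass

MIN_VOTES = 99  # keep stories with strictly more than this many upvotes


@dataclass(frozen=True)
class Story:
    title: str
    link: str
    votes: int


def read_urls(lines):
    """Return the non-empty lines, stripped (assumes one URL per line)."""
    return [line.strip() for line in lines if line.strip()]


def parse_stories(html):
    """Extract stories from one page of Hacker News HTML.

    The selectors are assumptions about the site's markup and may need
    adjusting if that markup changes.
    """
    # Imported here so the pure helpers work without third-party packages.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    stories = []
    for row in soup.select("tr.athing"):
        anchor = row.select_one(".titleline > a")
        score = soup.find(id=f"score_{row.get('id')}")
        if anchor is None or score is None:
            continue  # rows without a score (e.g. job posts) are skipped
        votes = int(score.get_text().split()[0])  # "120 points" -> 120
        stories.append(Story(anchor.get_text(), anchor.get("href", ""), votes))
    return stories


def popular(stories, min_votes=MIN_VOTES):
    """Keep stories above the threshold, highest vote count first."""
    kept = [s for s in stories if s.votes > min_votes]
    return sorted(kept, key=lambda s: s.votes, reverse=True)


def scrape(urls):
    """Fetch every URL and return the popular stories across all pages."""
    import requests

    stories = []
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        stories.extend(parse_stories(response.text))
    return popular(stories)
```

Under these assumptions, opening `URLs_list.txt`, passing the file object to `read_urls`, and handing the result to `scrape` would return the filtered, sorted list. Keeping `popular` separate from the network code makes the threshold and sort order easy to check without fetching any pages.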