Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/cherukuri-thanu/webscraping-hn

This repository contains configuration files for web scraping the Hacker News website.

Host: GitHub
URL: https://github.com/cherukuri-thanu/webscraping-hn
Owner: Cherukuri-Thanu
Created: 2024-04-22T15:09:35.000Z (10 months ago)
Default Branch: main
Last Pushed: 2024-04-22T15:13:45.000Z (10 months ago)
Last Synced: 2024-11-09T22:18:21.118Z (3 months ago)
Topics: project, python3
Language: Python
Homepage:
Size: 3.91 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# WebScraping-HN

## Description
This project is a custom scraper for the Hacker News website. It extracts news articles from multiple Hacker News pages, keeps only those with more than 99 upvotes, and sorts them by upvote count. The result is a curated list of popular news items.
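The filter-and-sort step can be sketched in plain Python. This is a minimal illustration, not the repository's code: the `Article` record and its `title`, `link` and `votes` keys are hypothetical, and "more than 99 upvotes" is expressed as a `min_votes` threshold of 100.

```python
from __future__ import annotations

from typing import TypedDict


class Article(TypedDict):
    """Hypothetical record shape for one scraped story."""
    title: str
    link: str
    votes: int


def filter_and_sort(articles: list[Article], min_votes: int = 100) -> list[Article]:
    """Keep articles with at least `min_votes` upvotes (more than 99 by default),
    ordered from most to fewest upvotes."""
    popular = [a for a in articles if a["votes"] >= min_votes]
    return sorted(popular, key=lambda a: a["votes"], reverse=True)


if __name__ == "__main__":
    sample: list[Article] = [
        {"title": "A", "link": "https://example.com/a", "votes": 150},
        {"title": "B", "link": "https://example.com/b", "votes": 99},
        {"title": "C", "link": "https://example.com/c", "votes": 420},
    ]
    for article in filter_and_sort(sample):
        print(article["votes"], article["title"])
    # prints "420 C" then "150 A"; "B" is dropped because 99 is not more than 99
```

Filtering before sorting keeps the sort small, since only the popular stories are ordered.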

## Features
- Scrapes multiple Hacker News pages.
- Filters articles with more than 99 upvotes.
- Sorts articles by upvote count.
- Uses BeautifulSoup to parse the HTML.
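A hedged sketch of how BeautifulSoup might pull titles, links and scores out of a single Hacker News listing page. The CSS selectors (`.titleline > a`, `.subtext`, `.score`) reflect Hacker News markup at the time of writing and are an assumption that breaks if the site changes; the function name `parse_stories` is hypothetical and not taken from the repository.

```python
from __future__ import annotations

from bs4 import BeautifulSoup


def parse_stories(html: str) -> list[dict]:
    """Extract title, link and upvote count from one Hacker News listing page.

    Assumes each story title sits in `span.titleline > a` and that the
    following metadata row has a `td.subtext` cell holding `span.score`.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select(".titleline > a")
    subtexts = soup.select(".subtext")
    stories = []
    # Pair each title with its metadata cell by position on the page.
    for link, subtext in zip(links, subtexts):
        score = subtext.select_one(".score")
        if score is None:
            # Job postings carry no score, so there is nothing to rank them by.
            continue
        votes = int(score.get_text().split()[0])  # "150 points" -> 150
        stories.append({"title": link.get_text(), "link": link.get("href"), "votes": votes})
    return stories
```

Pairing titles with `.subtext` cells by position relies on every story row, including job postings, being followed by exactly one metadata row; rows without a `.score` are skipped rather than treated as zero.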

## How to Use
1. Clone this repository.
2. Install the required dependencies: `python -m pip install requests beautifulsoup4`.
3. Add the URLs of the Hacker News pages you want to scrape to `URLs_list.txt`.
4. Run the script: `python main.py`.
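Steps 3 and 4 imply reading the URL list and downloading each page before parsing. The sketch below assumes `URLs_list.txt` holds one URL per line, which the README does not specify; `load_urls` and `fetch_pages` are hypothetical helpers, not functions from the repository.

```python
from __future__ import annotations

from pathlib import Path

import requests


def load_urls(path: str = "URLs_list.txt") -> list[str]:
    """Read one URL per line, ignoring blank lines and surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def fetch_pages(urls: list[str], timeout: float = 10.0) -> list[str]:
    """Download each page and return its HTML; raises requests.HTTPError on 4xx/5xx."""
    pages = []
    # A shared session reuses the underlying connection across requests.
    with requests.Session() as session:
        for url in urls:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()
            pages.append(response.text)
    return pages
```

For example, `fetch_pages(load_urls())` returns the raw HTML of each listed page, ready to hand to an HTML parser. The `timeout` keeps one unresponsive page from stalling the whole run.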

## Requirements
- Python 3.x
- `requests`
- `beautifulsoup4`

## Contact
Thanuja Cherukuri - [[email protected]]