https://github.com/rg089/newsemble

API for fetching data from news websites.

api bs4 flask heroku mongodb news newsapi newsemble python scraper webscraping

Last synced: 5 months ago

Host: GitHub
URL: https://github.com/rg089/newsemble
Owner: rg089
Created: 2021-06-12T21:46:22.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2022-07-04T07:18:24.000Z (over 3 years ago)
Last Synced: 2025-04-08T18:47:02.915Z (9 months ago)
Topics: api, bs4, flask, heroku, mongodb, news, newsapi, newsemble, python, scraper, webscraping
Language: Python
Homepage: http://www.newsemble.ml/news
Size: 326 KB
Stars: 44
Watchers: 3
Forks: 8
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# :newspaper: Newsemble :newspaper:

An API for fetching the current news.

[![GitHub release](https://img.shields.io/github/release/rg089/newsemble.svg)](https://github.com/rg089/newsemble/releases/)
[![Visits Badge](https://badges.pufler.dev/visits/rg089/newsemble)](https://badges.pufler.dev)
![Stars Badge](https://img.shields.io/github/stars/rg089/newsemble.svg)
![Fork Badge](https://img.shields.io/github/forks/rg089/newsemble.svg)
[![Github all releases](https://img.shields.io/github/downloads/rg089/newsemble/total.svg)](https://github.com/rg089/newsemble/releases/)
![watchers Badge](https://img.shields.io/github/watchers/rg089/newsemble.svg)

## :bookmark: About :bookmark:

Blog Post

> Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.

Data is scraped from [these news websites](#gear-currently-supported-sites) every hour and stored in a cloud database. On each request, the API serves the most recent articles.

Developers can use this API to fetch current articles, each with the following fields:
***Headline, Content, Source, Link and Time***.

## :spiral_notepad: Table of contents
* [Technologies](#computer-technologies)
* [File Structure and Description](#open_file_folder-file-structure-and-description)
* [Pipeline](#hammer_and_wrench-pipeline)
* [Getting started](#rocket-getting-started)
* [Currently Supported Sites](#gear-currently-supported-sites)

## :computer: Technologies
Newsemble is created with:

* Python 3
* Flask
* PyMongo
* BeautifulSoup

## :open_file_folder: File Structure and Description

* *app.py* - Flask code for the API
* *scraper.py* - Collection of scrapers for the various news sites
* *db.py* - Connecting to and using MongoDB
* *utils.py* - Utility functions
* *scheduler.py* - Scheduler
* *Procfile* - For deployment
* *requirements.txt* - Python requirements

## :hammer_and_wrench: Pipeline
![Newsemble pipeline](https://user-images.githubusercontent.com/52444089/125912546-d572c104-9c64-4237-a1f8-81228f8a0774.png)

## :rocket: Getting started
The API can be accessed through the following links.

**Links**

| Link | Description |
| --- | --- |
| http://www.newsemble.ml/news | Link to fetch all the data from all sources |
| http://www.newsemble.ml/news/toi | Link to fetch data from Times of India |
| http://www.newsemble.ml/news/th | Link to fetch data from The Hindu |
| http://www.newsemble.ml/news/tie | Link to fetch data from The Indian Express |
| http://www.newsemble.ml/news/ndtv | Link to fetch data from NDTV news |
| http://www.newsemble.ml/news/it | Link to fetch data from India Today |
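
The links above follow a simple pattern: a base URL plus an optional short source code. As a minimal sketch, the helper below builds the URL for a given source. The names `BASE_URL`, `SOURCES` and `build_url` are illustrative and not part of Newsemble itself; the source codes are copied from the links listed above.

```python
from typing import Optional

# Base endpoint, as listed in the links above.
BASE_URL = "http://www.newsemble.ml/news"

# Short source codes taken from the links above (hypothetical lookup table).
SOURCES = {
    "toi": "Times of India",
    "th": "The Hindu",
    "tie": "The Indian Express",
    "ndtv": "NDTV",
    "it": "India Today",
}


def build_url(source: Optional[str] = None) -> str:
    """Return the URL for one source, or for all sources when source is None."""
    if source is None:
        return BASE_URL
    if source not in SOURCES:
        raise ValueError(f"Unknown source code: {source!r}")
    return f"{BASE_URL}/{source}"
```

For example, `build_url("ndtv")` gives the NDTV link, and `build_url()` gives the link for all sources.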

**Request format**
```
import requests

# Fetch the latest articles from all sources and decode the JSON body
url = "http://www.newsemble.ml/news/"
data = requests.get(url).json()
```
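
If `requests` is not available, the same call can be made with the standard library. The sketch below is one possible approach, not the project's own client: `fetch_news` is a hypothetical helper, the 10-second timeout is an arbitrary choice, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import json
from typing import Any, Callable
from urllib.request import urlopen


def fetch_news(url: str, timeout: float = 10.0,
               opener: Callable[..., Any] = urlopen) -> Any:
    """Fetch a URL and decode its body as JSON.

    `opener` defaults to urllib's urlopen; it is injectable (an assumption
    made for this sketch) so the function can be tested offline.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `fetch_news("http://www.newsemble.ml/news/")` raises `urllib.error.URLError` if the host cannot be reached, so callers may want to wrap it in a `try`/`except` block.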

**Response format**
```
{
"link": $source_link$,
"content": $content_text$,
"source": $news_source$,
"title": $headline$,
"time": $date_time_of_article$
}
```
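
For analysis it can help to turn each article into a typed record. The sketch below assumes the endpoint returns a JSON list of objects with the five keys shown in the response format; that list shape is an assumption, and `Article` and `parse_articles` are illustrative names rather than part of the API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Keys taken from the response format above.
FIELDS = ("title", "content", "source", "link", "time")


@dataclass
class Article:
    title: str
    content: str
    source: str
    link: str
    time: str


def parse_articles(payload: List[Dict[str, Any]]) -> List[Article]:
    """Convert decoded JSON records into Article objects.

    Records missing any expected key are skipped rather than raising.
    """
    articles = []
    for item in payload:
        if all(key in item for key in FIELDS):
            articles.append(Article(**{key: str(item[key]) for key in FIELDS}))
    return articles
```

Skipping incomplete records is a design choice for this sketch; stricter callers may prefer to raise an error instead.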
**Sample output**

![image](https://user-images.githubusercontent.com/52444089/125032819-1f5b3580-e0ac-11eb-9662-efa79dc0e099.png)

## :gear: Currently Supported Sites
* [Times of India](https://timesofindia.indiatimes.com/news)
* [India Today](https://www.indiatoday.in/)
* [The Hindu](https://www.thehindu.com/)
* [NDTV](https://www.ndtv.com/)
* [The Indian Express](https://indianexpress.com/)

## :pray: Thanks!

All contributions are welcome and appreciated. :+1:

If you liked this project, or found it useful in any way, please drop a :star2:!

## :writing_hand: Authors :writing_hand:

:black_nib: Rishabh Gupta

:black_nib: Vishal Singhania

:black_nib: Roshan Kumar
