https://github.com/rg089/newsemble

API for fetching data from news websites.
api bs4 flask heroku mongodb news newsapi newsemble python scraper webscraping


:newspaper: Newsemble :newspaper:





An API for fetching the current news.




python  
Flask  
MongoDB 
Heroku



[![GitHub release](https://img.shields.io/github/release/rg089/newsemble.svg)](https://github.com/rg089/newsemble/releases/)
[![Visits Badge](https://badges.pufler.dev/visits/rg089/newsemble)](https://badges.pufler.dev)
![Stars Badge](https://img.shields.io/github/stars/rg089/newsemble.svg)
![Fork Badge](https://img.shields.io/github/forks/rg089/newsemble.svg)
[![Github all releases](https://img.shields.io/github/downloads/rg089/newsemble/total.svg)](https://github.com/rg089/newsemble/releases/)
![watchers Badge](https://img.shields.io/github/watchers/rg089/newsemble.svg)

:bookmark: About :bookmark:



Blog Post

> Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.

The data is scraped from [these news websites](#gear-currently-supported-sites) every hour and stored in a cloud database. Each request returns the most recent articles.
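
The scrape, store, and serve flow described above can be sketched roughly as follows. This is an in-memory illustration only: `Article`, `ArticleStore`, `save`, and `latest` are hypothetical names, a dictionary stands in for MongoDB, and keying on the article link to avoid duplicates is an assumption rather than something the project documents.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Article:
    title: str
    content: str
    source: str
    link: str
    time: datetime


class ArticleStore:
    """Hypothetical in-memory stand-in for the cloud database."""

    def __init__(self):
        self._by_link = {}

    def save(self, article):
        # Assumption: key on the link so an hourly re-scrape of the same
        # article overwrites the earlier copy instead of duplicating it.
        self._by_link[article.link] = article

    def latest(self, limit=10, source=None):
        # Newest first, optionally restricted to a single source.
        pool = [
            a for a in self._by_link.values()
            if source is None or a.source == source
        ]
        return sorted(pool, key=lambda a: a.time, reverse=True)[:limit]
```

In this sketch a scheduler would call `save` once per scraped article each hour, and the API layer would call `latest` when a request arrives.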

Developers can use this API to fetch current news. Each article has the following fields:
***Headline, Content, Source, Link and Time***.



## :spiral_notepad: Table of contents
* [Technologies](#computer-technologies)
* [File Structure and Description](#open_file_folder-file-structure-and-description)
* [Pipeline](#hammer_and_wrench-pipeline)
* [Getting started](#rocket-getting-started)
* [Currently Supported Sites](#gear-currently-supported-sites)

## :computer: Technologies
Newsemble is created with:

* Python 3
* Flask
* PyMongo
* BeautifulSoup

## :open_file_folder: File Structure and Description

* *app.py* - Flask code for the API
* *scraper.py* - Collection of scrapers for the various news sites
* *db.py* - Connecting to and using MongoDB
* *utils.py* - Utility functions
* *scheduler.py* - Scheduler
* *Procfile* - For deployment
* *requirements.txt* - Python requirements

## :hammer_and_wrench: Pipeline
![Newsemble pipeline](https://user-images.githubusercontent.com/52444089/125912546-d572c104-9c64-4237-a1f8-81228f8a0774.png)

## :rocket: Getting started
The API can be accessed through the following endpoints.

**Links**

| Links | Description |
| --- | --- |
| http://www.newsemble.ml/news | Link to fetch all the data from all sources |
| http://www.newsemble.ml/news/toi | Link to fetch data from Times of India |
| http://www.newsemble.ml/news/th | Link to fetch data from The Hindu |
| http://www.newsemble.ml/news/tie | Link to fetch data from The Indian Express |
| http://www.newsemble.ml/news/ndtv | Link to fetch data from NDTV news |
| http://www.newsemble.ml/news/it | Link to fetch data from India Today |
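
The endpoints listed above follow one pattern: the base URL, optionally followed by a short source code. A small helper can build them and reject unknown codes early. `BASE_URL`, `SOURCES`, and `news_url` are names made up for this sketch; only the URLs and source codes themselves come from the list above.

```python
BASE_URL = "http://www.newsemble.ml/news"

# Source codes taken from the endpoint list in this README.
SOURCES = {
    "toi": "Times of India",
    "th": "The Hindu",
    "tie": "The Indian Express",
    "ndtv": "NDTV",
    "it": "India Today",
}


def news_url(source=None):
    """Return the endpoint for one source, or for all sources if None."""
    if source is None:
        return BASE_URL
    if source not in SOURCES:
        raise ValueError(
            f"unknown source {source!r}; expected one of {sorted(SOURCES)}"
        )
    return f"{BASE_URL}/{source}"
```

For example, `news_url("th")` gives the endpoint for The Hindu, and `news_url()` gives the endpoint for all sources.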

**Request format**
```python
import requests

# Fetch articles from all sources and decode the JSON body.
url = "http://www.newsemble.ml/news/"
articles = requests.get(url).json()
```

**Response format**
```
{
    "link": $source_link$,
    "content": $content_text$,
    "source": $news_source$,
    "title": $headline$,
    "time": $date_time_of_article$
}
```
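
Once decoded, the response can be processed like ordinary Python data. The sketch below assumes the endpoint returns a JSON array of objects shaped like the response format above; that shape is an assumption, so adjust it if the actual payload differs. `REQUIRED_FIELDS` and `titles_by_source` are hypothetical names for this example.

```python
REQUIRED_FIELDS = ("link", "content", "source", "title", "time")


def titles_by_source(articles):
    """Group headlines by news source, skipping incomplete records."""
    grouped = {}
    for article in articles:
        # Assumption: a usable record carries all five documented fields.
        if any(field not in article for field in REQUIRED_FIELDS):
            continue
        grouped.setdefault(article["source"], []).append(article["title"])
    return grouped
```

Passing the decoded JSON from the request example to `titles_by_source` would give a dictionary that maps each source name to its list of headlines.
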
**Sample output**

![image](https://user-images.githubusercontent.com/52444089/125032819-1f5b3580-e0ac-11eb-9662-efa79dc0e099.png)

## :gear: Currently Supported Sites
* [Times of India](https://timesofindia.indiatimes.com/news)
* [India Today](https://www.indiatoday.in/)
* [The Hindu](https://www.thehindu.com/)
* [NDTV](https://www.ndtv.com/)
* [The Indian Express](https://indianexpress.com/)



:pray: Thanks!


All contributions are welcome and appreciated. :+1:

If you liked this project, or found it useful in any way, please drop a :star2:!


:writing_hand: Authors :writing_hand:


:black_nib: Rishabh Gupta

:black_nib: Vishal Singhania

:black_nib: Roshan Kumar