Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rg089/newsemble
API for fetching data from news websites.
api bs4 flask heroku mongodb news newsapi newsemble python scraper webscraping
- Host: GitHub
- URL: https://github.com/rg089/newsemble
- Owner: rg089
- Created: 2021-06-12T21:46:22.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-07-04T07:18:24.000Z (over 2 years ago)
- Last Synced: 2024-07-14T19:59:09.261Z (5 months ago)
- Topics: api, bs4, flask, heroku, mongodb, news, newsapi, newsemble, python, scraper, webscraping
- Language: Python
- Homepage: http://www.newsemble.ml/news
- Size: 326 KB
- Stars: 44
- Watchers: 3
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
:newspaper: Newsemble :newspaper:
An API for fetching the current news.
[![GitHub release](https://img.shields.io/github/release/rg089/newsemble.svg)](https://github.com/rg089/newsemble/releases/)
[![Visits Badge](https://badges.pufler.dev/visits/rg089/newsemble)](https://badges.pufler.dev)
![Stars Badge](https://img.shields.io/github/stars/rg089/newsemble.svg)
![Fork Badge](https://img.shields.io/github/forks/rg089/newsemble.svg)
[![Github all releases](https://img.shields.io/github/downloads/rg089/newsemble/total.svg)](https://github.com/rg089/newsemble/releases/)
![watchers Badge](https://img.shields.io/github/watchers/rg089/newsemble.svg)

:bookmark: About :bookmark:
> Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.
The data is scraped from [these news websites](#gear-currently-supported-sites) every hour and stored in a cloud database. Whenever a request comes in, the most recent articles are served.
Developers can use this API to fetch current data, with each article having the following fields:
***Headline, Content, Source, Link and Time***.
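As a rough illustration of the article fields listed above, a client could model one article as a small typed record. This is only a sketch: the `Article` class and `parse_article` helper are hypothetical names, not part of Newsemble, and the key names are taken from the response format shown later in this README.

```python
from dataclasses import dataclass

# Key names follow the response format documented in this README.
REQUIRED_KEYS = ("title", "content", "source", "link", "time")


@dataclass(frozen=True)
class Article:
    """Hypothetical client-side record for one article."""
    title: str    # headline
    content: str  # article text
    source: str   # news source
    link: str     # link to the original article
    time: str     # kept as a string; the timestamp format is not documented here


def parse_article(raw: dict) -> Article:
    """Build an Article from one raw record, failing loudly on missing keys."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise KeyError(f"article is missing fields: {missing}")
    # Extra keys in the raw record are ignored.
    return Article(**{key: raw[key] for key in REQUIRED_KEYS})
```

Keeping `time` as a plain string avoids guessing a timestamp format that the README does not specify.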
## :spiral_notepad: Table of contents
* [Technologies](#computer-technologies)
* [File Structure and Description](#open_file_folder-file-structure-and-description)
* [Pipeline](#hammer_and_wrench-pipeline)
* [Getting started](#rocket-getting-started)
* [Currently Supported Sites](#gear-currently-supported-sites)

## :computer: Technologies
Newsemble is created with:

* Python 3
* Flask
* PyMongo
* BeautifulSoup

## :open_file_folder: File Structure and Description
* *app.py* - Flask code for the API
* *scraper.py* - Collection of scrapers for the various news sites.
* *db.py* - Connecting and Using MongoDB
* *utils.py* - Utility Functions
* *scheduler.py* - Scheduler
* *Procfile* - For Deployment
* *requirements.txt* - Python requirements

## :hammer_and_wrench: Pipeline
![Newsemble pipeline](https://user-images.githubusercontent.com/52444089/125912546-d572c104-9c64-4237-a1f8-81228f8a0774.png)

## :rocket: Getting started
The API can be accessed through the following links.

**Links**

| Links | Description |
| --- | --- |
| http://www.newsemble.ml/news | Link to fetch all the data from all sources |
| http://www.newsemble.ml/news/toi | Link to fetch data from Times of India |
| http://www.newsemble.ml/news/th | Link to fetch data from The Hindu |
| http://www.newsemble.ml/news/tie | Link to fetch data from The Indian Express |
| http://www.newsemble.ml/news/ndtv | Link to fetch data from NDTV news |
| http://www.newsemble.ml/news/it | Link to fetch data from India Today |
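The links listed above all follow one pattern: the base URL, optionally followed by a short source code. A minimal sketch of a helper that builds these URLs is shown below; `BASE_URL`, `SOURCES` and `endpoint` are hypothetical names introduced for illustration and are not part of Newsemble.

```python
BASE_URL = "http://www.newsemble.ml/news"

# Source codes and names copied from the links listed in this README.
SOURCES = {
    "toi": "Times of India",
    "th": "The Hindu",
    "tie": "The Indian Express",
    "ndtv": "NDTV",
    "it": "India Today",
}


def endpoint(source=None):
    """Return the URL for one source code, or for all sources when None."""
    if source is None:
        return BASE_URL
    if source not in SOURCES:
        raise ValueError(f"unknown source code: {source!r}")
    return f"{BASE_URL}/{source}"
```

For example, `endpoint("toi")` gives the Times of India link, and `endpoint()` gives the link for all sources.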
**Request format**
```python
import requests

url = "http://www.newsemble.ml/news/"
articles = requests.get(url).json()  # parse the JSON response body
```

**Response format**
```
{
    "link": $source_link$,
    "content": $content_text$,
    "source": $news_source$,
    "title": $headline$,
    "time": $date_time_of_article$
}
```
**Sample output**

![image](https://user-images.githubusercontent.com/52444089/125032819-1f5b3580-e0ac-11eb-9662-efa79dc0e099.png)
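Once the response is parsed, the articles can be processed like any other Python data. The sketch below groups headlines by source. It assumes the parsed response is a list of article objects with the keys shown in the response format, which this README does not state explicitly; `headlines_by_source` is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import defaultdict


def headlines_by_source(articles):
    """Map each source name to the list of its headlines, in input order."""
    grouped = defaultdict(list)
    for article in articles:
        grouped[article["source"]].append(article["title"])
    return dict(grouped)


# Made-up records for illustration only; this is not real API output.
sample = [
    {"title": "Headline A", "source": "The Hindu"},
    {"title": "Headline B", "source": "NDTV"},
    {"title": "Headline C", "source": "The Hindu"},
]

print(headlines_by_source(sample))
```

With real data, the list passed in would be the value returned by `requests.get(url).json()`, provided the assumption about its shape holds.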
## :gear: Currently Supported Sites
* [Times of India](https://timesofindia.indiatimes.com/news)
* [India Today](https://www.indiatoday.in/)
* [The Hindu](https://www.thehindu.com/)
* [NDTV](https://www.ndtv.com/)
* [The Indian Express](https://indianexpress.com/)
:pray: Thanks!
All contributions are welcome and appreciated. :+1:
If you liked this project, or found it useful in any way, please drop a :star2:!
:writing_hand: Authors :writing_hand:
:black_nib: Rishabh Gupta
:black_nib: Vishal Singhania
:black_nib: Roshan Kumar