{"id":21457016,"url":"https://github.com/rg089/newsemble","last_synced_at":"2025-12-27T09:05:14.793Z","repository":{"id":42082550,"uuid":"376388934","full_name":"rg089/newsemble","owner":"rg089","description":"API for fetching data from news websites.","archived":false,"fork":false,"pushed_at":"2022-07-04T07:18:24.000Z","size":334,"stargazers_count":44,"open_issues_count":1,"forks_count":8,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-08T18:47:02.915Z","etag":null,"topics":["api","bs4","flask","heroku","mongodb","news","newsapi","newsemble","python","scraper","webscraping"],"latest_commit_sha":null,"homepage":"http://www.newsemble.ml/news","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rg089.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-12T21:46:22.000Z","updated_at":"2023-02-19T18:12:00.000Z","dependencies_parsed_at":"2022-08-12T04:21:32.475Z","dependency_job_id":null,"html_url":"https://github.com/rg089/newsemble","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/rg089/newsemble","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rg089%2Fnewsemble","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rg089%2Fnewsemble/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rg089%2Fnewsemble/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rg089%2Fnewsemble/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rg089","download_url":"https://codeload.github.com/rg089/newsemble/t
ar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rg089%2Fnewsemble/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265385703,"owners_count":23756728,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","bs4","flask","heroku","mongodb","news","newsapi","newsemble","python","scraper","webscraping"],"created_at":"2024-11-23T06:00:32.324Z","updated_at":"2025-12-27T09:05:14.779Z","avatar_url":"https://github.com/rg089.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e :newspaper: Newsemble :newspaper: \u003c/h1\u003e\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003cbr\u003e\r\n\t\u003ca href=\"http://www.newsemble.ml/news/\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/66423362/125942926-17368ab4-7513-44b8-978f-1b013f7e08a3.png\" alt=\"Logo\"\u003e\u003c/a\u003e\u003cbr\u003e\r\n\t\u003cb\u003e\u003ci\u003eAn \u003ca href=\"http://www.newsemble.ml/news/\"\u003eAPI\u003c/a\u003e for fetching the current news.\u003c/b\u003e\u003c/i\u003e\r\n  \u003cbr\u003e\u003cbr\u003e\r\n\u003c/p\u003e\r\n\r\n\u003ch1 align=\"center\"\u003e\r\n\u003ca href=\"https://www.python.org\"\u003e \u003cimg alt = \"python\" src= \"https://img.shields.io/badge/Python-FFD43B?style=for-the-badge\u0026logo=python\u0026logoColor=darkgreen\"/\u003e \u003c/a\u003e\u0026nbsp; \r\n\u003ca href=\"https://palletsprojects.com/p/flask/\"\u003e \u003cimg alt=\"Flask\" 
src=\"https://img.shields.io/badge/flask-%23000.svg?style=for-the-badge\u0026logo=flask\u0026logoColor=white\"/\u003e \u003c/a\u003e \u0026nbsp; \r\n\u003ca href=\"https://www.mongodb.com/\"\u003e \u003cimg alt=\"MongoDB\" src =\"https://img.shields.io/badge/MongoDB-%234ea94b.svg?style=for-the-badge\u0026logo=mongodb\u0026logoColor=white\"/\u003e\u0026nbsp; \u003c/a\u003e\r\n\u003ca href=\"https://www.heroku.com/\"\u003e \u003cimg alt=\"Heroku\" src=\"https://img.shields.io/badge/heroku-%23430098.svg?style=for-the-badge\u0026logo=heroku\u0026logoColor=white\"/\u003e \u003c/a\u003e\r\n\u003c/h1\u003e\r\n\r\n\r\n\u003ch1 align = \"center\"\u003e\r\n\t\r\n[![GitHub release](https://img.shields.io/github/release/rg089/newsemble.svg)](https://github.com/rg089/newsemble/releases/)\r\n[![Visits Badge](https://badges.pufler.dev/visits/rg089/newsemble)](https://badges.pufler.dev)\r\n![Stars Badge](https://img.shields.io/github/stars/rg089/newsemble.svg)\r\n![Fork Badge](https://img.shields.io/github/forks/rg089/newsemble.svg)\r\n[![Github all releases](https://img.shields.io/github/downloads/rg089/newsemble/total.svg)](https://github.com/rg089/newsemble/releases/)\r\n![watchers Badge](https://img.shields.io/github/watchers/rg089/newsemble.svg)\r\n\r\n\u003c/h1\u003e\r\n\r\n\r\n\r\n\u003ch1 align=\"center\"\u003e :bookmark: About :bookmark: \u003c/h1\u003e\u003cbr\u003e\r\n\r\n\u003cp align=\"center\"\u003e\r\n\t\u003ca href=\"https://medium.com/@rg089/newsemble-3311d2dc9817\"\u003e\u003cb\u003eBlog Post\u003c/b\u003e\u003c/a\u003e\r\n\u003c/p\u003e\r\n\r\n\u003e Newsemble is an API that provides easy access to the current news for programmatic analysis. 
It has been built using Python, BeautifulSoup and MongoDB.\u003cbr\u003e \r\n  The data is scraped from [these news websites](#gear-currently-supported-sites) every hour and stored in a cloud database; the most recent articles are served on request.\u003cbr\u003e\r\n  Developers can make use of this API to fetch current data with each article having the following fields: \u003cbr\u003e***Headlines, Content, Source, Link and Time***.  \r\n\r\n\u003chr style=\"border:2px solid gray\"\u003e \u003c/hr\u003e\u003cbr\u003e\r\n\r\n## :spiral_notepad: Table of contents\r\n* [Technologies](#computer-technologies)\r\n* [File Structure and Description](#open_file_folder-file-structure-and-description)\r\n* [Pipeline](#hammer_and_wrench-pipeline)\r\n* [Getting started](#rocket-getting-started)\r\n* [Currently Supported Sites](#gear-currently-supported-sites)\r\n\r\n\r\n## :computer: Technologies\r\nNewsemble is created with:\r\n\r\n* Python 3\r\n* Flask\r\n* PyMongo\r\n* BeautifulSoup\r\n\r\n## :open_file_folder: File Structure and Description\r\n\r\n* *app.py* - Flask code for the API\r\n* *scraper.py*  - Collection of scrapers for the various news sites.\r\n* *db.py* - Connecting and Using MongoDB\r\n* *utils.py* - Utility Functions\r\n* *scheduler.py* - Scheduler \r\n* *Procfile* - For Deployment\r\n* *requirements.txt* - Python Requirements \r\n\r\n## :hammer_and_wrench: Pipeline\r\n![Newsemble pipeline](https://user-images.githubusercontent.com/52444089/125912546-d572c104-9c64-4237-a1f8-81228f8a0774.png)\r\n\r\n## :rocket: Getting-started\r\nThis project can be accessed using the following setup:\r\n\r\n**Links**\r\n\r\n\u003cTABLE BORDER=\"3\"\u003e\r\n\t\u003cTH\u003eLinks \u003c/TH\u003e\r\n       \u003cTH\u003eDescription\u003c/TH\u003e\r\n\t\r\n   \u003cTR\u003e\r\n      \u003cTD\u003ehttp://www.newsemble.ml/news\u003c/TD\u003e\r\n      \u003cTD\u003eLink to fetch all the data from all sources\u003c/TD\u003e\r\n   \u003c/TR\u003e\r\n  
\u003cTR\u003e\r\n      \u003cTD\u003ehttp://www.newsemble.ml/news/toi\u003c/TD\u003e\r\n      \u003cTD\u003eLink to fetch data from Times of India \u003c/TD\u003e\r\n  \u003c/TR\u003e\r\n\u003cTR\u003e\r\n      \u003cTD\u003ehttp://www.newsemble.ml/news/th\u003c/TD\u003e\r\n      \u003cTD\u003eLink to fetch data from The Hindu \u003c/TD\u003e\r\n  \u003c/TR\u003e\r\n \u003cTR\u003e\r\n      \u003cTD\u003ehttp://www.newsemble.ml/news/tie\u003c/TD\u003e\r\n      \u003cTD\u003eLink to fetch data from The Indian Express \u003c/TD\u003e\r\n  \u003c/TR\u003e\r\n \u003cTR\u003e\r\n      \u003cTD\u003ehttp://www.newsemble.ml/news/ndtv\u003c/TD\u003e\r\n      \u003cTD\u003eLink to fetch data from NDTV news \u003c/TD\u003e\r\n  \u003c/TR\u003e\r\n\u003cTR\u003e\r\n      \u003cTD\u003ehttp://www.newsemble.ml/news/it\u003c/TD\u003e\r\n      \u003cTD\u003eLink to fetch data from India Today \u003c/TD\u003e\r\n  \u003c/TR\u003e\r\n  \r\n\u003c/TABLE\u003e\r\n\r\n\r\n**Request format**\r\n```\r\nimport requests\r\n\r\n# Fetch the most recent articles from all sources as parsed JSON\r\nurl = \"http://www.newsemble.ml/news/\"\r\nrequests.get(url).json()\r\n```\r\n\r\n**Response format**\r\n```\r\n{\r\n    \"link\"      :  $source_link$,\r\n    \"content\"   :  $content_text$,\r\n    \"source\"    :  $news_source$,\r\n    \"title\"     :  $headline$,\r\n    \"time\"      :  $date_time_of_article$\r\n}\r\n```\r\n**Sample output**\r\n\r\n![image](https://user-images.githubusercontent.com/52444089/125032819-1f5b3580-e0ac-11eb-9662-efa79dc0e099.png)\r\n\r\n## :gear: Currently Supported Sites\r\n* [Times of India](https://timesofindia.indiatimes.com/news)\r\n* [India Today](https://www.indiatoday.in/)\r\n* [The Hindu](https://www.thehindu.com/)\r\n* [NDTV](https://www.ndtv.com/)\r\n* [The Indian Express](https://indianexpress.com/)\r\n\r\n\u003chr style=\"border:2px solid gray\"\u003e \u003c/hr\u003e\u003cbr\u003e\r\n\r\n\u003ch1 align=\"center\"\u003e:pray: Thanks!\u003c/h1\u003e\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003cb\u003eAll 
contributions are welcome and appreciated. :+1: \u003c/b\u003e\u003cbr\u003e\r\n\t\u003cb\u003e\u003ci\u003eIf you liked this project, or found it useful in any way, please drop a :star2:!\u003c/b\u003e\u003c/i\u003e\u003cbr\u003e\u003cbr\u003e\r\n\u003c/p\u003e\r\n\r\n\u003ch1 align=\"center\"\u003e :writing_hand: Authors :writing_hand: \u003c/h1\u003e\r\n\r\n\u003cp align=\"center\"\u003e\r\n\t  :black_nib: \u003ca href=\"https://github.com/rg089\"\u003eRishabh Gupta\u003c/a\u003e\u003cbr\u003e\r\n\t  :black_nib: \u003ca href=\"https://github.com/vishalvvs\"\u003eVishal Singhania\u003c/a\u003e\u003cbr\u003e\r\n\t  :black_nib: \u003ca href=\"https://github.com/roshankumarg529\"\u003eRoshan Kumar\u003c/a\u003e\u003cbr\u003e\r\n\u003c/p\u003e\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frg089%2Fnewsemble","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frg089%2Fnewsemble","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frg089%2Fnewsemble/lists"}