{"id":23469170,"url":"https://github.com/shivam5992/pyscrapper","last_synced_at":"2026-02-27T22:42:30.839Z","repository":{"id":71617545,"uuid":"13909313","full_name":"shivam5992/pyscrapper","owner":"shivam5992","description":":camera: web scrapping in python: multiple libraries -requests, beautifulsoup, mechanize, selenium","archived":false,"fork":false,"pushed_at":"2016-09-08T12:23:37.000Z","size":4105,"stargazers_count":62,"open_issues_count":0,"forks_count":39,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-14T15:49:12.456Z","etag":null,"topics":["python","requests","scrapping","selenium"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shivam5992.png","metadata":{"files":{"readme":"README.md","changelog":"news_scrapping/linux_news.py","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2013-10-27T20:03:38.000Z","updated_at":"2025-03-01T06:21:31.000Z","dependencies_parsed_at":"2023-02-26T11:30:23.724Z","dependency_job_id":null,"html_url":"https://github.com/shivam5992/pyscrapper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shivam5992/pyscrapper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shivam5992%2Fpyscrapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shivam5992%2Fpyscrapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shivam5992%2Fpyscrapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/shivam5992%2Fpyscrapper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shivam5992","download_url":"https://codeload.github.com/shivam5992/pyscrapper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shivam5992%2Fpyscrapper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29917939,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-27T19:37:42.220Z","status":"ssl_error","status_checked_at":"2026-02-27T19:37:41.463Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","requests","scrapping","selenium"],"created_at":"2024-12-24T14:59:46.675Z","updated_at":"2026-02-27T22:42:30.832Z","avatar_url":"https://github.com/shivam5992.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"PyScrapper\n==========\n\n[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](http://www.repostatus.org/badges/latest/wip.svg)](https://github.com/ivannieto/PyScrapper)\n[![Packagist](https://img.shields.io/packagist/l/doctrine/orm.svg?maxAge=2592000)](https://github.com/ivannieto/PyScrapper/blob/master/LICENSE)\n[![Twitter 
URL](https://img.shields.io/twitter/url/http/shields.io.svg?style=social\u0026maxAge=2592000)](https://twitter.com/intent/tweet?text=Wanna%20learn%20to%20scrap%20websites?%20Check%20PyScrapper%20\u0026url=https%3A%2F%2Fgithub.com%2Fivannieto%2FPyScrapper%2F%20%23python%20%23scrapping)\n[![Twitter Follow](https://img.shields.io/twitter/follow/shields_io.svg?style=social\u0026label=Follow\u0026maxAge=2592000)](https://twitter.com/IvanNietoS)\n\n\n\n##### WIP DISCLAIMER\n\nSome of the projects inside this repo are broken due to updates on the websites used, \nso they are being reworked to be fully functional. Contributions are welcome. Just fork the repo and pull request your updates.\n\n### Web Scrapping series in python.\n\nForked and maintained by Ivan Nieto \u003civan.n.s@tuta.io\u003e \n\nOriginal work by Shivam Bansal \u003cshivam5992@gmail.com\u003e\n\n\n## Module dependencies:\n\nmechanize, BeautifulSoup (for Python 2.x) | bs4 (for Python 3.x), json, re, requests, urlparse, urllib\n\n        pip install \u003cmodule_name\u003e\n\n# Projects\n\n#### Google Movies\n\n        Script to scrap google movies, retrieving a list of theaters, their address, movies list, \n        movie genres and showtimes for a given location. \n             \n        This script outputs a JSON file with the response. \n\n#### Zomato Top Restaurants\n\t\n        Script to scrap the top 25 trending restaurants with their rank, rating, details... \n        for the mentioned cities on the zomato.com website.\n        \n        It outputs a separate JSON response for each city.\n\n\n#### Finance and Stock\n\t\n        Scrapping the last closing price for all the quotes from various sites \n        like google, yahoo, bloomberg etc\n\n#### Live Weather\n\n        Scrap the weather details for morning, afternoon and night time for a particular website.\n\n#### Daily Horoscope\n\t\n        Scrapping the daily horoscope details for each sign and creating the output as text files. 
\n        Multiple websites are scrapped to get the details.\n\n#### Train Details\n\n        Scrap the details of a train from IRCTC by entering its train number.\n\n#### Website Top Keywords\n\t\n        Create a list of the most frequently occurring words in a website.\n        Also counts their frequency.\n\n#### News Scrapping\n\n        Scrap the news from various news sources.\n\n#### Alexa Top Websites\n\t\n        Get the list of top 25 websites of a country.\n\n#### Movie Details\n\n        Get the movie details from IMDB and RottenTomatoes.\n\n#### US President State of Union Speech\n\t\n        Scrap the speech transcripts of all US Presidents from 1700 to Present.\n\n#### Spider Algorithm\n\n        Spider algorithm is a typical web scrapping technique to fetch all urls (etc) of a webpage.\n        By all means, even those urls which are not part of the requested page. \n        It fetches all urls of current urls as well.\n        Implemented in two ways: a normal one and a second one using mechanize.\n\n\n## Rework ToDo\n\n- [x] Google Movies\n- [ ] Zomato Top Restaurants\n- [ ] Finance and Stock\n- [ ] Live Weather\n- [ ] Daily Horoscope\n- [ ] Train Details\n- [ ] Website Top Keywords\n- [x] News Scrapping\n- [ ] Alexa Top Websites\n- [ ] Movie Details\n- [ ] US President State of Union Speech\n- [ ] Spider Algorithm","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshivam5992%2Fpyscrapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshivam5992%2Fpyscrapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshivam5992%2Fpyscrapper/lists"}