{"id":25880913,"url":"https://github.com/sreejoy/crawlerfriend","last_synced_at":"2025-06-12T08:34:06.131Z","repository":{"id":62565289,"uuid":"142647699","full_name":"Sreejoy/CrawlerFriend","owner":"Sreejoy","description":"A light weight crawler which gives search results in HTML form or in Dictionary form, given URLs and keywords.","archived":false,"fork":false,"pushed_at":"2018-08-14T15:12:29.000Z","size":17,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-02T08:48:37.822Z","etag":null,"topics":["crawler","python-crawler","python-scraper","python27","scrapper"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sreejoy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-28T04:48:18.000Z","updated_at":"2018-08-14T15:12:30.000Z","dependencies_parsed_at":"2022-11-03T17:02:11.091Z","dependency_job_id":null,"html_url":"https://github.com/Sreejoy/CrawlerFriend","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sreejoy%2FCrawlerFriend","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sreejoy%2FCrawlerFriend/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sreejoy%2FCrawlerFriend/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sreejoy%2FCrawlerFriend/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sreejoy","download_url":"https://codeload.gith
ub.com/Sreejoy/CrawlerFriend/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241519062,"owners_count":19975589,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","python-crawler","python-scraper","python27","scrapper"],"created_at":"2025-03-02T14:26:25.633Z","updated_at":"2025-03-02T14:26:26.387Z","avatar_url":"https://github.com/Sreejoy.png","language":"Python","readme":"## CrawlerFriend\n\nA lightweight **Web Crawler** for **Python 2.7** that returns search results in HTML or\ndictionary form, given URLs and keywords. 
If you regularly visit a few websites and look for a few keywords,\nthis Python package will automate the task for you and\nreturn the results in an HTML file in your web browser.\n\n### Installation\n```\npip install CrawlerFriend\n```\n\n### How to use?\n#### All Results in HTML\n```\nimport CrawlerFriend\n\nurls = [\"http://www.goal.com/\",\"http://www.skysports.com/football\",\"https://www.bbc.com/sport/football\"]\nkeywords = [\"Ronaldo\",\"Liverpool\",\"Salah\",\"Real Madrid\",\"Arsenal\",\"Chelsea\",\"Man United\",\"Man City\"]\n\ncrawler = CrawlerFriend.Crawler(urls, keywords)\ncrawler.crawl()\ncrawler.get_result_in_html()\n```\n\nThe above code opens the following HTML document in the browser:\n\n![](https://i.imgur.com/aPoNAYu.png)\n\n#### All Results in a Dictionary\n```\nresult_dict = crawler.get_result()\n```\n\n#### Changing Default Arguments\nBy default, CrawlerFriend searches four HTML tags ('title', 'h1', 'h2', 'h3') with max_link_limit = 50.\nThese defaults can be changed by passing arguments to the constructor:\n```\ncrawler = CrawlerFriend.Crawler(urls, keywords, max_link_limit=200, tags=['p','h4'])\ncrawler.crawl()\n```\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsreejoy%2Fcrawlerfriend","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsreejoy%2Fcrawlerfriend","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsreejoy%2Fcrawlerfriend/lists"}