{"id":18695852,"url":"https://github.com/rahulmoundekar/webscraping-in-python","last_synced_at":"2025-11-08T14:30:26.349Z","repository":{"id":50170695,"uuid":"261398987","full_name":"rahulmoundekar/webscraping-in-python","owner":"rahulmoundekar","description":"webscraping in python","archived":false,"fork":false,"pushed_at":"2022-12-08T09:46:30.000Z","size":7765,"stargazers_count":3,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-28T03:26:48.761Z","etag":null,"topics":["beautifulsoup4","bs4","html5lib","python-3","requests-module","webscraper-website"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rahulmoundekar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-05T08:20:41.000Z","updated_at":"2024-03-02T21:07:32.000Z","dependencies_parsed_at":"2023-01-25T06:15:14.423Z","dependency_job_id":null,"html_url":"https://github.com/rahulmoundekar/webscraping-in-python","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rahulmoundekar%2Fwebscraping-in-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rahulmoundekar%2Fwebscraping-in-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rahulmoundekar%2Fwebscraping-in-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rahulmoundekar%2Fwebscraping-in-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ra
hulmoundekar","download_url":"https://codeload.github.com/rahulmoundekar/webscraping-in-python/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239558910,"owners_count":19658927,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup4","bs4","html5lib","python-3","requests-module","webscraper-website"],"created_at":"2024-11-07T11:16:27.849Z","updated_at":"2025-11-08T14:30:26.316Z","avatar_url":"https://github.com/rahulmoundekar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping With Python\n\n![python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)\n\n#### Project Setup\n\n  - Create the project directory:\n    ```\n    mkdir webscraping\n    cd webscraping\n    ```\n  - Web Scraping installation:\n     ```\n     Open a command prompt and install virtualenv:\n        pip install virtualenv\n     Create a virtualenv:\n    \t\u003e\u003evirtualenv web-scraping\n     Activate the virtualenv before using it:\n    \t\u003e\u003eweb-scraping\\scripts\\activate\n     \n     Install the libraries needed for web scraping:\n     \n    pip install requests\n    pip install beautifulsoup4   (or: pip install bs4)\n    ```\n  - Create WebsiteScrap.py for development\n     ```python\n    import requests\n    from bs4 import BeautifulSoup\n    \n    url = \"https://www.learnpython.org/\"\n    \n    response = requests.get(url)\n    response.raise_for_status()  # stop early on an HTTP error status\n    htmlContent = response.content\n    # parse the raw HTML into a navigable BeautifulSoup tree\n    formatted_html_content = BeautifulSoup(htmlContent, 'html.parser')\n    \n    # print(formatted_html_content)\n    \n    # 1} Get the title of 
the HTML page\n    title = formatted_html_content.title\n    print(title)\n    # .string returns only the text inside the tag\n    print(title.string)\n    \n    # 2} Find all anchor tags on the page and print the count\n    list_anchors = formatted_html_content.find_all('a')\n    # print all anchor tags\n    print(list_anchors)\n    # print the count\n    print(\"Number of anchor tags on this page:\", len(list_anchors))\n    \n    # 3} Get the first matching element in the HTML page (here, the head tag)\n    print(formatted_html_content.find('head'))\n    \n    # 4} Get the classes of an element; .get() returns None if the tag has no class attribute\n    print(formatted_html_content.find('a').get('class'))\n    \n    # 5} Find all elements by class name\n    print(formatted_html_content.find_all(\"a\", class_=\"navbar-brand\"))\n    \n    # 6} Get the text from a tag\n    print(formatted_html_content.find(\"p\").get_text())\n    \n    # 7} Iterate over all anchor tags and collect their links\n    list_anchors = formatted_html_content.find_all('a')\n    all_links = set()  # a set drops duplicate links\n    for link in list_anchors:\n        print(link)  # the full anchor tag\n        print(link.get('href'))  # only the link target\n        all_links.add(link.get('href'))\n    \n    print(all_links)\n    print(len(all_links))\n    \n    # 8} Count duplicate links\n    all_web_links_count = len(list_anchors)\n    after_remove_duplicate_links_count = len(all_links)\n    print('Number of duplicate links on this page:', all_web_links_count - after_remove_duplicate_links_count)\n     ```\n  - Run the script:\n     ```\n\t   python WebsiteScrap.py\n     ```\n  - To run a clone of this repository on your system:\n  \t```\n\t1} Create and activate a virtualenv\n\t2} pip install -r 
.\\requirements.txt\n\t```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frahulmoundekar%2Fwebscraping-in-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frahulmoundekar%2Fwebscraping-in-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frahulmoundekar%2Fwebscraping-in-python/lists"}