{"id":16535641,"url":"https://github.com/hjsblogger/web-scraping-with-python","last_synced_at":"2025-10-28T11:32:18.813Z","repository":{"id":184476745,"uuid":"666435358","full_name":"hjsblogger/web-scraping-with-python","owner":"hjsblogger","description":"Demonstration of Web Scraping using Selenium Python (Pytest \u0026 Pyunit) and Beautiful Soup","archived":false,"fork":false,"pushed_at":"2023-11-02T18:14:23.000Z","size":62,"stargazers_count":3,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-11-02T19:26:32.214Z","etag":null,"topics":["beautiful-soup","beautifulsoup","beautifulsoup4","lambdatest","selenium-python","selenium-webdriver","web-scraping","youtube-scrapping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hjsblogger.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-07-14T14:05:58.000Z","updated_at":"2023-09-19T04:09:55.000Z","dependencies_parsed_at":"2023-10-31T10:38:28.331Z","dependency_job_id":null,"html_url":"https://github.com/hjsblogger/web-scraping-with-python","commit_stats":null,"previous_names":["hjsblogger/web-scraping-with-python"],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hjsblogger%2Fweb-scraping-with-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hjsblogger%2Fweb-scraping-with-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hjsblogger%2Fweb-scraping-with-python/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/hjsblogger%2Fweb-scraping-with-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hjsblogger","download_url":"https://codeload.github.com/hjsblogger/web-scraping-with-python/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219859562,"owners_count":16556035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautiful-soup","beautifulsoup","beautifulsoup4","lambdatest","selenium-python","selenium-webdriver","web-scraping","youtube-scrapping"],"created_at":"2024-10-11T18:28:25.612Z","updated_at":"2025-10-28T11:32:18.510Z","avatar_url":"https://github.com/hjsblogger.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping with Selenium Python and Beautiful Soup\n\n\u003cimg width=\"1000\" height=\"500\" alt=\"Bulb\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/5f7ad1c6-f5af-4607-a421-113eef7580b3\"\u003e\n\n\u003cdiv align=\"center\"\u003e\u003ca href=\"https://scrape-it.cloud/assets/blog_img/web-scraping-with-python.png\"\u003eImage Credit\u003c/a\u003e\u003c/div\u003e\n\u003cbr/\u003e\n\nIn this 'Web Scraping with Python' repo, we have covered the following usecases:\n\n* \u003cb\u003eWeb Scraping using Selenium PyUnit\u003c/b\u003e\n* \u003cb\u003eWeb Scraping using Selenium Pytest\u003c/b\u003e\n* \u003cb\u003eWeb Scraping of dynamic website using Beautiful Soup and Selenium\u003c/b\u003e\n\nThe following websites are used for the purpose of demoing web 
scraping:\n\n* [LambdaTest YouTube Channel](https://www.youtube.com/@lambdatest/videos)\n* [LambdaTest E-commerce Playground](https://ecommerce-playground.lambdatest.io/)\n* [Scraping Club Infinite Scroll Website](https://scrapingclub.com/exercise/list_infinite_scroll/)\n\n\u003cimg width=\"20\" height=\"20\" alt=\"Bulb\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/6134e8c2-edd6-4910-9f0e-e8cab9b8669d\"\u003eAs mentioned online, scraping public web data from YouTube is legal as long as you don't go after information that is not available to the general public. However, YouTube scraping might throw errors (or exceptions) when it is done on the Cloud Selenium Grid.\n\n## Pre-requisites for test execution\n\n**Step 1**\n\nCreate a virtual environment by triggering the *virtualenv venv* command on the terminal\n\n```bash\nvirtualenv venv\n```\n\u003cimg width=\"1418\" alt=\"VirtualEnvironment\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/89beb6af-549f-42ac-a063-e5f715018ef8\"\u003e\n\n**Step 2**\n\nActivate the newly created virtual environment by triggering the *source venv/bin/activate* command on the terminal\n\n```bash\nsource venv/bin/activate\n```\n\nFollow Steps (3) and (4) to perform web scraping on the LambdaTest Cloud Grid:\n\n**Step 3**\n\nProcure the LambdaTest User Name and Access Key by navigating to [LambdaTest Account Page](https://accounts.lambdatest.com/security). You might need to create an account on LambdaTest since it is used for running tests (or scraping) on the cloud Grid.\n\n\u003cimg width=\"1288\" alt=\"LambdaTestAccount\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/9b40c9cb-93a1-4239-9fe5-99f33766a23a\"\u003e\n\n**Step 4**\n\nAdd the LambdaTest User Name and Access Key in the *Makefile* that is located in the parent directory. 
Once done, save the Makefile.\n\n![MakeFileChange](https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/e3c0a6c3-cc1d-4692-ab59-182ca30964c0)\n\n## Dependency/Package Installation\n\nRun the *make install* command on the terminal to install the desired packages (or dependencies) - Pytest, Selenium, Beautiful Soup, etc.\n\n```bash\nmake install\n```\n\u003cimg width=\"1404\" alt=\"Make-Install\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/4cb16443-4411-4f11-8692-aa7290cded0b\"\u003e\n\n\u003cimg width=\"1404\" alt=\"Make-Install-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/8c7e8938-5584-480b-ad04-002b53827396\"\u003e\n\nWith this, all the dependencies and environment variables are set, and we are ready for web scraping with the desired frameworks (i.e. PyUnit, Pytest, and Beautiful Soup).\n\n## Web Scraping using Selenium PyUnit (Local Execution)\n\nThe following websites are used for demonstration:\n\n* [LambdaTest YouTube Channel](https://www.youtube.com/@lambdatest/videos)\n* [LambdaTest E-commerce Playground](https://ecommerce-playground.lambdatest.io/)\n\nFollow the steps below to perform scraping on the local machine:\n\n**Step 1**\n\nSet the *EXEC_PLATFORM* environment variable to *local*. Trigger the command *export EXEC_PLATFORM=local* on the terminal.\n\n\u003cimg width=\"1043\" alt=\"Make-Local\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/1ab63873-28e8-4ec0-bebc-ff95d30b224e\"\u003e\n\n**Step 2**\n\nTrigger the command *make clean* to remove the _pycache_ folder(s) and .pyc files\n\n\u003cimg width=\"710\" alt=\"Make-Clean\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/1baf2aeb-fab1-4207-8547-4c07a70074c2\"\u003e\n\n**Step 3**\n\nThe Chrome browser is invoked in the Headless Mode. 
It is recommended to install Chrome on your machine before you proceed to Step (4)\n\n**Step 4**\n\nTrigger the *make scrap-using-pyunit* command on the terminal to scrape content from the above mentioned websites\n\n\u003cimg width=\"1404\" alt=\"Pyunit-Scraping-1\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/3e3ab76f-6c92-4f49-8574-dbe7dc949220\"\u003e\n\n\u003cimg width=\"1404\" alt=\"Pyunit-Scraping-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/398f147d-bfe9-45af-8fb7-7682592a4470\"\u003e\n\nAs seen above, the content from the LambdaTest YouTube channel and the LambdaTest e-commerce playground is scraped successfully!\n\n## Web Scraping using Selenium Pytest (Local Execution)\n\nThe following websites are used for demonstration:\n\n* [LambdaTest YouTube Channel](https://www.youtube.com/@lambdatest/videos)\n* [LambdaTest E-commerce Playground](https://ecommerce-playground.lambdatest.io/)\n\nFollow the steps below to perform scraping on the local machine:\n\n**Step 1**\n\nSet the *EXEC_PLATFORM* environment variable to *local*. Trigger the command *export EXEC_PLATFORM=local* on the terminal.\n\n\u003cimg width=\"1043\" alt=\"Make-Local\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/0c9fceba-492c-4f3a-9240-8478b76b4eab\"\u003e\n\n**Step 2**\n\nThe Chrome browser is invoked in the Headless Mode. 
It is recommended to install Chrome on your machine before you proceed to Step (4)\n\n**Step 3**\n\nTrigger the command *make clean* to remove the _pycache_ folder(s) and .pyc files\n\n\u003cimg width=\"710\" alt=\"Make-Clean\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/a5d706a8-ccc7-4ef8-aa85-1288b5bef60d\"\u003e\n\n**Step 4**\n\nTrigger the *make scrap-using-pytest* command on the terminal to scrape content from the above mentioned websites\n\n\u003cimg width=\"1405\" alt=\"Pytest-scraping-1\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/b6614736-c03a-4e67-9460-32c0443b6166\"\u003e\n\n\u003cimg width=\"1405\" alt=\"Pytest-scraping-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/dedbbe0c-f18a-4f7d-8ffb-f89c22bea1f3\"\u003e\n\n## Web Scraping using Beautiful Soup\n\nBeautiful Soup is a Python library that is mainly used for screen-scraping (or web scraping). More information about the library is available on [Beautiful Soup HomePage](https://www.crummy.com/software/BeautifulSoup/)\n\nThe Beautiful Soup (bs4) library is already installed as a part of the *pre-requisite steps*. Hence, it is safe to proceed with the scraping with Beautiful Soup. [Scraping Club Infinite Scroll Website](https://scrapingclub.com/exercise/list_infinite_scroll/) has infinite scrolling pages, so Selenium is used to scroll to the end of the page so that all the items on the page can be scraped using the said libraries.\n\nThe following websites are used for demonstration:\n\n* [LambdaTest E-commerce Playground](https://ecommerce-playground.lambdatest.io/)\n* [Scraping Club Infinite Scroll Website](https://scrapingclub.com/exercise/list_infinite_scroll/)\n\nFollow the steps below to perform web scraping using Beautiful Soup (bs4):\n\n**Step 1**\n\nSet *EXEC_PLATFORM* environment variable to *local*. 
Trigger the command *export EXEC_PLATFORM=local* on the terminal.\n\n\u003cimg width=\"1043\" alt=\"Make-Local\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/f8f3fd04-661e-4674-a7e7-48dc8d9cb49f\"\u003e\n\n**Step 2**\n\nTrigger the *make scrap-using-beautiful-soup* command on the terminal to scrape content from the above mentioned websites\n\n\u003cimg width=\"1402\" alt=\"scraping-bs4-1\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/82b56e1a-0355-47bc-8527-a14ecf660b33\"\u003e\n\n\u003cimg width=\"1402\" alt=\"scraping-bs4-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/63253dea-e00d-4636-9955-097952d15d85\"\u003e\n\n\u003cimg width=\"1402\" alt=\"scraping-bs4-3\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/746724d6-2f1d-47a3-a640-dc40e9338625\"\u003e\n\n\u003cimg width=\"1413\" alt=\"scraping-bs4-4\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/1047b1bb-6495-4d4c-913e-53ea55e9fd78\"\u003e\n\n\u003cimg width=\"1413\" alt=\"scraping-bs4-5\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/d2a9d796-e1ff-47c5-baa7-323b0ac5649a\"\u003e\n\nAs seen from the above screenshots, the content on Pages (1) through (5) of [LambdaTest E-Commerce Playground](https://ecommerce-playground.lambdatest.io/index.php?route=product/category\u0026path=57) is successfully displayed on the console.\n\n\u003cimg width=\"1413\" alt=\"infinite-1\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/22cbf56e-9420-402f-a16f-df7ea25135e5\"\u003e\n\n\u003cimg width=\"1097\" alt=\"infinite-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/a691fe82-0f0e-48df-adf1-57d047a904ca\"\u003e\n\nAlso, all the 60 items on [Scraping Club Infinite Scroll Website](https://scrapingclub.com/exercise/list_infinite_scroll/) are scraped without any issues.\n\n## Web Scraping using 
Selenium Cloud Grid and Python\n\n\u003cb\u003eNote\u003c/b\u003e: As mentioned earlier, there could be cases where YouTube scraping might fail on the cloud grid (particularly when there are a number of attempts to scrape the content). Since cookies and other settings are cleared (or sanitized) after every test session, YouTube might take genuine web scraping as a bot attack! In such cases, you might come across the following page where cookie consent has to be given by clicking the \"Accept all\" button.\n\n\u003cimg width=\"1407\" alt=\"Accept-All\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/b3a49faa-1ff0-496c-8c8d-661c694455e1\"\u003e\n\nYou can find more information in this insightful [Stack Overflow thread](https://stackoverflow.com/questions/66902404/selenium-python-click-agree-to-youtube-cookie)\n\nSince we are using LambdaTest Selenium Grid for test execution, it is recommended to create an account on [LambdaTest](https://www.lambdatest.com/?fp_ref=himanshu15) before proceeding with the test execution. Procure the LambdaTest User Name and Access Key by navigating to [LambdaTest Account Page](https://accounts.lambdatest.com/security).\n\n\u003cimg width=\"1288\" alt=\"LambdaTestAccount\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/74028ca3-fe1f-4c25-8cfc-9d563b71900e\"\u003e\n\n### Web Scraping using Selenium Pyunit (Cloud Execution)\n\nThe following websites are used for demonstration:\n\n* [LambdaTest YouTube Channel](https://www.youtube.com/@lambdatest/videos)\n* [LambdaTest E-commerce Playground](https://ecommerce-playground.lambdatest.io/)\n\nFollow the steps below to perform scraping on the LambdaTest cloud grid:\n\n**Step 1**\n\nSet *EXEC_PLATFORM* environment variable to *cloud*. 
Trigger the command *export EXEC_PLATFORM=cloud* on the terminal.\n\n\u003cimg width=\"1396\" alt=\"Terminal\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/f9d81fe0-2eab-466d-8794-aaafc49a5e02\"\u003e\n\n**Step 2**\n\nTrigger the command *make clean* to remove the _pycache_ folder(s) and .pyc files\n\n\u003cimg width=\"710\" alt=\"Make-Clean\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/09dd65fc-653a-460f-9ef7-216bd0750d39\"\u003e\n\n**Step 3**\n\nTrigger the *make scrap-using-pyunit* command on the terminal to scrape content from the above mentioned websites\n\n\u003cimg width=\"1410\" alt=\"Pyunit-cloud-1\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/dd1129bc-2f74-406c-a54d-6742d0552c66\"\u003e\n\n\u003cimg width=\"1410\" alt=\"Pyunit-cloud-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/c598f5a3-402b-4117-839f-e78792d711f6\"\u003e\n\n\u003cimg width=\"1410\" alt=\"Pyunit-cloud-3\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/85ffdf69-6719-47fb-b031-6c2d872a0d59\"\u003e\n\n\nAs seen above, the content from the LambdaTest YouTube channel and the LambdaTest e-commerce playground is scraped successfully! You can find the status of test execution in the [LambdaTest Automation Dashboard](https://automation.lambdatest.com/build).\n\n\u003cimg width=\"1422\" alt=\"Pyunit-LambdaTest-Status-1\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/5d394264-af49-43d9-a4a0-43f000ec458d\"\u003e\n\n\u003cimg width=\"1422\" alt=\"Pyunit-LambdaTest-Status-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/98e3cfe2-815f-4f14-a803-0ad3f7399870\"\u003e\n\nAs seen above, the status of test execution is \"Completed\". 
Since the browser is instantiated in the *Headless* mode, the video recording is not available on the dashboard.\n\n### Web Scraping using Selenium Pytest (Cloud Execution)\n\nThe following websites are used for demonstration:\n\n* [LambdaTest YouTube Channel](https://www.youtube.com/@lambdatest/videos)\n* [LambdaTest E-commerce Playground](https://ecommerce-playground.lambdatest.io/)\n\nFollow the steps below to perform scraping on the LambdaTest cloud grid:\n\n**Step 1**\n\nSet the *EXEC_PLATFORM* environment variable to *cloud*. Trigger the command *export EXEC_PLATFORM=cloud* on the terminal.\n\n\u003cimg width=\"1396\" alt=\"Terminal\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/a89872a1-5a43-4d88-8e9e-1b3e4f170051\"\u003e\n\n**Step 2**\n\nTrigger the command *make clean* to remove the _pycache_ folder(s) and .pyc files\n\n\u003cimg width=\"710\" alt=\"Make-Clean\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/1c228aa5-804c-40a9-9a3b-920e3cd9e489\"\u003e\n\n**Step 3**\n\nTrigger the *make scrap-using-pytest* command on the terminal to scrape content from the above mentioned websites\n\n\u003cimg width=\"1410\" alt=\"Pytest-cloud-1\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/4e22c844-0d61-4b4d-85e0-152e11c73689\"\u003e\n\n\u003cimg width=\"1410\" alt=\"Pytest-cloud-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/0b043360-8f0d-45e7-8f2f-6d96bb65219e\"\u003e\n\n\u003cimg width=\"1410\" alt=\"Pytest-cloud-3\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/53490f40-f21d-4ecf-90eb-cb38add032da\"\u003e\n\nAs seen above, the content from the LambdaTest YouTube channel and the LambdaTest e-commerce playground is scraped successfully! 
You can find the status of test execution in the [LambdaTest Automation Dashboard](https://automation.lambdatest.com/build).\n\n\u003cimg width=\"1422\" alt=\"Pytest-LambdaTest-Status-1\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/1c090187-2785-4505-916f-34cf07d7565c\"\u003e\n\n\u003cimg width=\"1429\" alt=\"Pytest-LambdaTest-Status-2\" src=\"https://github.com/hjsblogger/web-scraping-with-python/assets/1688653/bf0d5757-cc71-4a56-b5ad-e6384018d78e\"\u003e\n\nAs seen above, the status of test execution is \"Completed\". Since the browser is instantiated in the *Headless* mode, the video recording is not available on the dashboard.\n\n## Have feedback or need assistance?\nFeel free to fork the repo and contribute to make it better! Email to [himanshu[dot]sheth[at]gmail[dot]com](mailto:himanshu.sheth@gmail.com) for any queries or ping me on the following social media sites:\n\n\u003cb\u003eLinkedIn\u003c/b\u003e: [@hjsblogger](https://linkedin.com/in/hjsblogger)\u003cbr/\u003e\n\u003cb\u003eTwitter\u003c/b\u003e: [@hjsblogger](https://www.twitter.com/hjsblogger)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhjsblogger%2Fweb-scraping-with-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhjsblogger%2Fweb-scraping-with-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhjsblogger%2Fweb-scraping-with-python/lists"}