Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/medss19/web-scraping-with-beautiful-soup
Last synced: 22 days ago
- Host: GitHub
- URL: https://github.com/medss19/web-scraping-with-beautiful-soup
- Owner: medss19
- Created: 2024-06-11T03:19:37.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-07-20T06:30:51.000Z (6 months ago)
- Last Synced: 2024-07-20T07:42:37.076Z (6 months ago)
- Language: Python
- Size: 6.84 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# web-scraping-with-beautiful-soup
https://www.linkedin.com/posts/medha-agarwal-01b33725a_internship-pythonprogramming-webscraping-activity-7214991432367976448-8iL1?utm_source=share&utm_medium=member_desktop
𝗗𝗲𝘀𝗰𝗿𝗶𝗽𝘁𝗶𝗼𝗻:
𝗪𝗲𝗯𝘀𝗶𝘁𝗲 𝗦𝗲𝗹𝗲𝗰𝘁𝗶𝗼𝗻:
Chose BigBasket, a website with publicly accessible product listings.
𝗗𝗮𝘁𝗮 𝗘𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻:
Used the Beautiful Soup library to parse the page HTML and extract product titles, prices, quantities, and discounts.
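As a rough illustration of this extraction step, here is a minimal sketch. The sample HTML, the tag and class names (`product`, `title`, `quantity`, `price`, `discount`), and the helper names are all assumptions made for the example; they do not reflect BigBasket's actual markup or this repository's code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tag and class names below are invented for
# illustration and are not BigBasket's real HTML.
SAMPLE_HTML = """
<ul>
  <li class="product">
    <h3 class="title">Basmati Rice</h3>
    <span class="quantity">1 kg</span>
    <span class="price">120</span>
    <span class="discount">10% OFF</span>
  </li>
  <li class="product">
    <h3 class="title">Green Tea</h3>
    <span class="price">250</span>
  </li>
</ul>
"""

FIELDS = ("title", "quantity", "price", "discount")


def text_or_none(card, css_class):
    """Return the stripped text of the first matching element, or None."""
    element = card.find(class_=css_class)
    return element.get_text(strip=True) if element else None


def parse_products(html):
    """Extract one dict per product card; absent fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {field: text_or_none(card, field) for field in FIELDS}
        for card in soup.find_all("li", class_="product")
    ]


products = parse_products(SAMPLE_HTML)
```

The second sample product has no quantity or discount, so those fields come back as `None` instead of raising an error. This is one simple way to tolerate missing data.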
𝗗𝗮𝘁𝗮 𝗦𝘁𝗼𝗿𝗮𝗴𝗲:
Stored the extracted data in a CSV file for further analysis.
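One possible way to write such records to CSV uses only the standard library. The column names and sample rows below are made up for the example, and `write_products` is a hypothetical helper, not the repository's own function.

```python
import csv
import io

# Assumed column order; the real project may use different names.
FIELDNAMES = ["title", "quantity", "price", "discount"]


def write_products(rows, stream):
    """Write product dicts as CSV; missing or None values become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {key: "" if row.get(key) is None else row[key] for key in FIELDNAMES}
        )


# Made-up sample rows, including one with missing fields.
rows = [
    {"title": "Basmati Rice", "quantity": "1 kg", "price": "120", "discount": "10% OFF"},
    {"title": "Green Tea", "quantity": None, "price": "250", "discount": None},
]

buffer = io.StringIO()
write_products(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, open the file with `open("products.csv", "w", newline="", encoding="utf-8")` and pass it as `stream`. The `newline=""` argument stops the `csv` module from producing blank lines on Windows.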
𝗧𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀:
Handled issues like dynamic content loading, ensuring accurate and complete data extraction.
𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀:
- Utilized Selenium for navigating and interacting with the dynamic website.
- Leveraged Beautiful Soup for parsing HTML content and extracting product details.
- Implemented a scrolling mechanism to handle infinite scrolling and ensure all products were captured.
- Ensured data integrity by handling missing or unavailable data gracefully.
𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 𝗙𝗮𝗰𝗲𝗱:
- Managing dynamic content loading and ensuring the scraper captures all products as the page scrolls.
- Handling website structure changes and ensuring the scraper adapts accordingly.
- Optimizing the scraper to efficiently process and store large amounts of data.
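
The scrolling mechanism for infinite-scroll pages could be sketched as follows. This is an illustrative approach, not the repository's actual code. The function name, `pause`, and `max_rounds` are assumptions, and `FakeDriver` is a stand-in so the example runs without a browser. The only Selenium call used is `execute_script`, which a real WebDriver provides.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded products time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new loaded, so stop scrolling
        last_height = new_height
    return max_rounds  # safety cap reached


class FakeDriver:
    """Stand-in for a Selenium WebDriver (assumption, for demonstration only)."""

    def __init__(self, heights):
        self._heights = list(heights)
        self._index = 0

    def execute_script(self, script):
        if script.startswith("return"):
            return self._heights[self._index]
        # A scroll request: advance to the next height, if any remain.
        self._index = min(self._index + 1, len(self._heights) - 1)
        return None


# The simulated page grows twice, then stops; the third scroll detects that.
rounds = scroll_until_stable(FakeDriver([1000, 2000, 3000]), pause=0)
```

With a real browser, the same function would take a Selenium driver (for example one created with `webdriver.Chrome()` after calling `driver.get(url)`), and `driver.page_source` could then be handed to Beautiful Soup. The `max_rounds` cap guards against pages that never stop loading.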