{"id":29585061,"url":"https://github.com/ankulmaurya88/alibaba-data-extraction","last_synced_at":"2026-05-11T07:36:16.715Z","repository":{"id":304587598,"uuid":"1019072694","full_name":"ankulmaurya88/alibaba-data-extraction","owner":"ankulmaurya88","description":"Automated Alibaba RFQ data extraction tool using Python, Selenium, and BeautifulSoup.","archived":false,"fork":false,"pushed_at":"2025-07-14T02:47:20.000Z","size":2053,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-14T05:00:06.014Z","etag":null,"topics":["alibaba","automation","beautifulsoup4","data-extraction","pandas","python3","selenium","web-scraping"],"latest_commit_sha":null,"homepage":"https://github.com/ankulmaurya88/alibaba-data-extraction","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankulmaurya88.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-13T17:16:07.000Z","updated_at":"2025-07-14T02:47:23.000Z","dependencies_parsed_at":"2025-07-14T05:00:25.185Z","dependency_job_id":"c63bc414-66bf-4000-8763-32b0eb34f83f","html_url":"https://github.com/ankulmaurya88/alibaba-data-extraction","commit_stats":null,"previous_names":["ankulmaurya88/alibaba-data-extraction"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ankulmaurya88/alibaba-data-extraction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankulmaurya88%2Falibaba-data-extraction","tags_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/ankulmaurya88%2Falibaba-data-extraction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankulmaurya88%2Falibaba-data-extraction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankulmaurya88%2Falibaba-data-extraction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankulmaurya88","download_url":"https://codeload.github.com/ankulmaurya88/alibaba-data-extraction/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankulmaurya88%2Falibaba-data-extraction/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266053906,"owners_count":23869499,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alibaba","automation","beautifulsoup4","data-extraction","pandas","python3","selenium","web-scraping"],"created_at":"2025-07-20T01:36:19.781Z","updated_at":"2026-05-11T07:36:11.674Z","avatar_url":"https://github.com/ankulmaurya88.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# 📦 Alibaba RFQ Data Extraction\n\nThis repository contains a powerful and customizable web scraping tool built with **Python, Selenium, and BeautifulSoup** to extract RFQ (Request for Quotation) data from [sourcing.alibaba.com](https://sourcing.alibaba.com). 
The tool captures detailed buyer inquiries, product requirements, and associated metadata, including screenshots of individual RFQ pages.\n\n---\n\n## 🚀 Features\n\n- ✅ Extracts buyer and product data from Alibaba RFQ listings\n- 📆 Converts the relative \"Date Posted\" value to an absolute **Inquiry Date**\n- 🌍 Captures metadata such as Country, Buyer Name, Quantity, Email Status, etc.\n- 📸 Takes **full-page screenshots** of each RFQ detail page\n- 💾 Saves data in CSV and Excel formats\n- 🧠 Handles \"Just now\", \"X minutes ago\", and \"X hours ago\" cases\n- 📂 Automatically stores screenshots in a dedicated folder\n\n---\n\n## 🛠️ Tech Stack\n\n- **Python 3.12**\n- **Selenium WebDriver**\n- **BeautifulSoup4**\n- **Pandas**\n- **OpenPyXL** (for writing Excel files)\n\n---\n\n## 📦 Installation\n\n```bash\ngit clone https://github.com/ankulmaurya88/alibaba-data-extraction.git\ncd alibaba-data-extraction\npython3 -m venv venv\nsource venv/bin/activate   # Or use venv\\Scripts\\activate on Windows\npip install -r requirements.txt\n```\n\n---\n\n## 📄 Usage\n\nRun the scraper script:\n\n```bash\npython scraper.py\n```\n\nThe script will:\n\n- Navigate to Alibaba RFQ listing pages\n- Extract RFQ details\n- Visit each RFQ detail page and take a screenshot\n- Convert Inquiry Times into full Inquiry Dates\n- Save everything into `rfq_output.csv` and `rfq_output.xlsx`\n\n---\n\n## 🗃️ Output Files\n\n- `rfq_output.csv` – Tabular data of all RFQs scraped\n- `rfq_output.xlsx` – Excel version of the same data\n- `/rfq_screenshots/` – Folder containing all RFQ page screenshots\n\n## 📊 Data Columns\n\n| Column | Description |\n| --- | --- |\n| RFQ ID | Unique RFQ identifier |\n| Title | Product or service being requested |\n| Buyer Name | Name of the person raising the inquiry |\n| Buyer Image | Profile image of buyer (if available) |\n| Inquiry Time | Time displayed on the site (e.g., \"5 hours ago\") |\n| Inquiry Date | Inquiry Time converted to a full date (e.g., \"13-07-2025\") |\n| Quotes Left | Number of quotes remaining for the RFQ |\n| Country | Buyer location |\n| Quantity Required | Quantity requested |\n| Email Confirmed | Whether the buyer's email is confirmed |\n| Experienced Buyer | Tag status |\n| Complete Order via RFQ | Tag status |\n| Typical Replies | Tag status |\n| Interactive User | Tag status |\n| Inquiry URL | URL to the inquiry page |\n| Scraping Date | Date on which scraping was performed |\n\n**Note:** The installed Chrome and ChromeDriver versions must match.\n\n---\n\n## 📬 Contact\n\nCreated by @ankulmaurya88. Feel free to raise issues or pull requests.\n\n---\n\n## 📄 License\n\nThis project is licensed under the Apache License 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankulmaurya88%2Falibaba-data-extraction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankulmaurya88%2Falibaba-data-extraction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankulmaurya88%2Falibaba-data-extraction/lists"}