An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with dataextraction

A curated list of projects in awesome lists tagged with dataextraction.

https://github.com/feddelegrand7/ralger

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

dataextraction r rstats webcrawling webscraper-website webscraping

Last synced: 06 Apr 2025

https://github.com/oxylabs/web-scraping-with-selenium

In this guide on how to web scrape with Selenium, we will be using Python 3. The code should work with any version of Python above 3.6.

dataextraction dataextractor github-python python scraper scrapers webscraping

Last synced: 23 Apr 2025

https://github.com/weizhonzhen/FastEtl

A simple ETL tool that supports extracting data across databases.

dataextraction etl fast

Last synced: 04 May 2025

https://github.com/eneiromatos/the-home-depot-web-scraper

This web scraper extracts data from The Home Depot website. It can be run locally or on the Apify platform; the latter is preferred. It was built with Apify SDK v3 (Crawlee) and TypeScript.

dataextraction scraper typescript webscrapping

Last synced: 06 Jan 2025

https://github.com/samrb-dev/autoseekout

A simple web scraping bot, written in Python with Selenium, for scraping information from seekout.com.

bot dataextraction python3 scraping-python seekout selenium selenium-python webscraping

Last synced: 11 Jan 2025

https://github.com/sravanigodavarthi/gmail_to_excel

This Python script extracts specific email messages from your Gmail inbox, retrieves their subject and content, and saves the data to an Excel file.

dataextraction gmail-inbox imap-client pandas python

Last synced: 13 May 2025

https://github.com/devnamdev2003/result_automation_system

The "RGPV Result Scraper" is a Python script that automates the extraction of student results from the Rajiv Gandhi Proudyogiki Vishwavidyalaya (RGPV) website. It handles captchas and saves data in CSV files, making it a valuable tool for academic record retrieval.

automation captcha-handler dataextraction python selenuim tesseract-ocr web-automation webscraping

Last synced: 11 Apr 2025

https://github.com/dimitryzub/py-google-scholar-organic-cite-to-csv-sqlite

Scrape historic Google Scholar Organic and Cite results to CSV and SQLite using Python and SerpApi.

csv data dataextraction datamining datascience datascraping dataset google googlescholar python scraper serpapi sqlite webscraper webscraping

Last synced: 15 Apr 2025

https://github.com/chathumiamarasinghe/web-scraping

A versatile Python script for scraping data from websites. This script automates data extraction, processes the information, and saves it in a structured format like CSV. Ideal for data collection, research, and analysis tasks.

beautifulsoup csv-export dataextraction phyton pythonwebscraper webscraping

Last synced: 12 Apr 2025

https://github.com/cjhydragenz/komik

A web comic website built on data from komikcash.

bun bunjs comic dataextraction komik komik-api komikcast nextjs webscraping website

Last synced: 15 Apr 2025

https://github.com/swapnanildutta/instagram-search

A Python script that extracts the details of a given username.

dataextraction webscraping

Last synced: 25 Apr 2025

https://github.com/happydream9032/pdf_parser

A simple PDF parser project built with Python and PyPDF2.

api back-end-development dataextraction flask pypdf2 python

Last synced: 04 Mar 2025

https://github.com/roslove44/web-scraping-toolkit

A set of web scraping tools for extracting and analyzing data from specific websites.

automation beautifulsoup4 dataextraction ecommerce-scraping python requests scraping-tools webscraping

Last synced: 06 Mar 2025

https://github.com/faizanmohd5/web-scraping-iphone-11-reviews

This is a web scraping project that extracts customer reviews for the iPhone 11 from Flipkart.com using Python and BeautifulSoup. The extracted data is saved in a CSV file for further analysis. Use it as a starting point for your own web scraping projects or for analyzing customer reviews of the iPhone 11.

beautifulsoup csv data-visualization dataanalysis dataextraction datainsights datamining datapreprocessing ecommerce-website ipython-notebook jupyter-notebook python reviews reviewscrapper webscraping

Last synced: 01 Mar 2025

https://github.com/shubhcs01/ipl-webscraper

Scrapes data for the whole IPL using JavaScript and Node.js.

automation dataextraction espncricinfo ipl webscraper webscraping

Last synced: 03 Mar 2025

https://github.com/gabrielianfr/web-scraping-project

A Python-based web scraping tool that extracts and stores data in JSON format using BeautifulSoup and Requests.

beautifulsoup dataextraction json python requests webscraping

Last synced: 29 Mar 2025

https://github.com/chouaib-629/webscraping

A collection of web scraping projects using Beautiful Soup, Selenium, and mixed approaches. Each project includes Python scripts and CSV files of the scraped data. Perfect for learning and experimenting with static and dynamic web scraping techniques.

automation beautifulsoup beautifulsoup4 browser-automation csv datacollection dataextraction dynamicwebsite html-parser jupyter-notebook python python-script python3 selenium staticwebsite webscraping

Last synced: 19 Feb 2025

https://github.com/docutain/docutain-sdk-example-windows-forms-.net-framework

Sample project showing how to integrate the Docutain SDK into a Windows Forms application.

data-capture dataextraction image-filter image-processing ocr ocr-recognition pdf sdk textrecognition

Last synced: 13 Apr 2025

https://github.com/ditikrushna/event-app

Render data from Google Spreadsheets with React and Tabletop.js

data-extraction-from-googlesheets dataextraction githubpages googlesheets reactjs tabletop

Last synced: 01 Mar 2025

https://github.com/swethajoseph/automate-api-extraction

Automates extracting data from APIs, appending new data to existing datasets, and generating insightful visualizations.

api dataextraction datavisualization jupyter-notebook python pythonlibraries

Last synced: 17 Mar 2025

https://github.com/bessouat40/prefect-github-indexer

A Prefect pipeline that periodically scrapes one or more GitHub repositories, generates embeddings, and indexes them in ChromaDB.

automation database dataengineering dataextraction docker docker-compose prefect python

Last synced: 29 Mar 2025

https://github.com/apajo/php-data-miner

Train your Miner with previously entered data, then start collecting data automatically. Supports semi-structured data (such as PDF invoices) and unstructured data (free text such as emails).

annotations dataextraction nltk rubix-ml

Last synced: 04 Mar 2025

https://github.com/ilyazub/walmart-store-locator

Download a list of Walmart stores.

dataextraction rust walmart webscraping

Last synced: 28 Mar 2025

https://github.com/maemoonfarooq/amazon-dataset-mining

The Frequent Dataset Mining project offers a comprehensive solution for mining frequent itemsets from the extensive Amazon dataset using Apache Kafka. Leveraging the power of distributed computing, this project employs two powerful algorithms, Apriori and PCY, to efficiently process and analyze large volumes of data.

bash-script dataextraction frequent-itemset-mining kafka kafka-consumer kafka-producer mongodb-atlas preprocessing python3

Last synced: 23 Mar 2025

https://github.com/shuddha2021/nodejs-crawler

A lightweight and efficient web crawler built with Node.js

asynchronous axios cheerio dataextraction javascript nodejs opensource webcrawler webscarping

Last synced: 15 May 2025

https://github.com/damnitjoshua/um-timeedit-timetable-toolkit

UM TimeEdit data toolkit for timetable software development.

dataextraction timeedit timetable

Last synced: 27 Feb 2025

https://github.com/nel-zi/francine_store

A scalable data pipeline for Francine Stores that extracts, cleans, and loads data from AliExpress for real-time market trend analysis and smarter business decisions.

datacleaning dataengineering dataextraction datamodeling etl etl-pipeline pandas

Last synced: 16 May 2025

https://github.com/shubhamatkal/instagram-user-data-extraction

Extracts data from Instagram profiles for analytics purposes.

dataextraction instagram instagram-api instagram-bot python

Last synced: 16 May 2025

https://github.com/dimitryzub/allrecipes-us-recipes-by-state-analysis

Personal exploratory data project in Python. Data extracted from AllRecipes.

data data-visualization dataexploration dataextraction matplotlib pandas python seaborn webscraping

Last synced: 01 Apr 2025

https://github.com/yumeangelica/jirai_sweeties

A friendly Discord bot with store monitoring capabilities - Tracks online stores for new items and price changes while providing chat commands and real-time notifications. Built with Python.

aiohttp dataextraction discord-bot discord-py lxml python3

Last synced: 01 Mar 2025