Projects in Awesome Lists tagged with dataextraction
A curated list of projects in awesome lists tagged with dataextraction.
https://github.com/feddelegrand7/ralger
ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.
dataextraction r rstats webcrawling webscraper-website webscraping
Last synced: 06 Apr 2025
https://github.com/oxylabs/web-scraping-with-selenium
This guide on web scraping with Selenium uses Python 3. The code should work with any version of Python above 3.6.
dataextraction dataextractor github-python python scraper scrapers webscraping
Last synced: 23 Apr 2025
https://github.com/docutain/docutain-sdk-example-android-kotlin
Sample project showing how to integrate the Docutain Document Scanner SDK into an Android application.
android data-capture dataextraction document document-capture document-recognition document-rectification document-scanner documentscanner image-filter image-processing kotlin ocr ocr-recognition pdf scan scanner scanning sdk textrecognition
Last synced: 07 Apr 2025
https://github.com/eneiromatos/the-home-depot-web-scraper
This web scraper extracts data from The Home Depot website. It can run locally or on the Apify platform; the latter is preferred. It was built with Apify SDK v3 (Crawlee) and TypeScript.
dataextraction scraper typescript webscrapping
Last synced: 06 Jan 2025
https://github.com/samrb-dev/autoseekout
A simple web scraping bot, written in Python with Selenium, that collects information from seekout.com.
bot dataextraction python3 scraping-python seekout selenium selenium-python webscraping
Last synced: 11 Jan 2025
https://github.com/docutain/docutain-sdk-example-.net-maui
Sample project showing how to integrate the Docutain Document Scanner SDK into a .NET MAUI application.
data-capture dataextraction document-capture document-recognition document-rectification document-scanner documentscanner image-filter image-processing maui ocr pdf pdf-generation scan scan-tool scanner scanning scans sdk textrecognition
Last synced: 07 Apr 2025
https://github.com/sravanigodavarthi/gmail_to_excel
This Python script extracts specific email messages from your Gmail inbox, retrieves their subject and content, and saves the data to an Excel file.
dataextraction gmail-inbox imap-client pandas python
Last synced: 13 May 2025
https://github.com/devnamdev2003/result_automation_system
The "RGPV Result Scraper" is a Python script that automates the extraction of student results from the Rajiv Gandhi Proudyogiki Vishwavidyalaya (RGPV) website. It handles captchas and saves data in CSV files, making it a valuable tool for academic record retrieval.
automation captcha-handler dataextraction python selenuim tesseract-ocr web-automation webscraping
Last synced: 11 Apr 2025
https://github.com/dimitryzub/py-google-scholar-organic-cite-to-csv-sqlite
Scrape historic Google Scholar Organic and Cite results to CSV and SQLite using Python and SerpApi.
csv data dataextraction datamining datascience datascraping dataset google googlescholar python scraper serpapi sqlite webscraper webscraping
Last synced: 15 Apr 2025
https://github.com/docutain/docutain-sdk-example-react-native
Sample project showing how to integrate the Docutain Document Scanner SDK into a React Native application.
android data-capture dataextraction document-recognition document-rectification document-scanner documentscanner image-filter image-processing ios ocr ocr-recognition react-native recognition scan scan-tool scanner scanning sdk textrecognition
Last synced: 07 Apr 2025
https://github.com/chathumiamarasinghe/web-scraping
A versatile Python script for scraping data from websites. This script automates data extraction, processes the information, and saves it in a structured format like CSV. Ideal for data collection, research, and analysis tasks.
beautifulsoup csv-export dataextraction phyton pythonwebscraper webscraping
Last synced: 12 Apr 2025
https://github.com/docutain/docutain-sdk-example-xamarin-ios
Sample project showing how to integrate the Docutain Document Scanner SDK into a Xamarin.iOS application.
data-capture dataextraction document-capture document-recognition document-rectification document-scanner documentscanner image-processing imagefilter ocr ocr-recognition pdf pdf-generation scan scanner scanning sdk textrecognition xamarin xamarin-ios
Last synced: 11 Feb 2025
https://github.com/docutain/docutain-sdk-example-flutter
Sample project showing how to integrate the Docutain Document Scanner SDK into a Flutter application.
data-capture dataextraction document-recognition document-rectification document-scanner documentscanner flutter flutter-sdk image-filter image-processing ocr ocr-recognition pdf scan scan-tool scanner scanners scanning sdk textrecognition
Last synced: 07 Apr 2025
https://github.com/docutain/docutain-sdk-example-ios-swift
Sample project showing how to integrate the Docutain Document Scanner SDK into an iOS application.
data-capture dataextraction document-capture document-recognition document-rectification document-scanner documentscanner image-filter image-processing ios ocr pdf scan scan-tool scanner scanners scanning sdk swift textrecognition
Last synced: 14 Apr 2025
https://github.com/docutain/docutain-sdk-example-windows-wpf-.net-framework
Sample project showing how to integrate the Docutain SDK into a WPF application.
data-capture dataextraction image-filter image-processing ocr ocr-recognition pdf pdf-generation sdk textrecognition wpf
Last synced: 13 Apr 2025
https://github.com/cjhydragenz/komik
A web comic site built on data from Komikcast.
bun bunjs comic dataextraction komik komik-api komikcast nextjs webscraping website
Last synced: 15 Apr 2025
https://github.com/swapnanildutta/instagram-search
Python code that extracts the details of a given Instagram username.
Last synced: 25 Apr 2025
https://github.com/happydream9032/pdf_parser
A simple PDF parser project built with Python and PyPDF2.
api back-end-development dataextraction flask pypdf2 python
Last synced: 04 Mar 2025
https://github.com/docutain/docutain-sdk-example-xamarin-android
Sample project showing how to integrate the Docutain Document Scanner SDK into a Xamarin.Android application.
data-capture dataextraction document-capture document-recognition document-scanner documentscanner image-filter image-processing ocr ocr-recognition pdf pdf-generation scan scanner scanning sdk textrecognition xamarin xamarin-android xamarin-sdk
Last synced: 26 Mar 2025
https://github.com/docutain/docutain-sdk-example-android-java
Sample project showing how to integrate the Docutain Document Scanner SDK into an Android application (Java).
android data-capture dataextraction document document-capture document-recognition document-rectification document-scanner documentscanner image-filter image-processing java ocr ocr-recognition pdf scan scanner scanning sdk textrecognition
Last synced: 14 Apr 2025
https://github.com/roslove44/web-scraping-toolkit
A set of web scraping tools for extracting and analyzing data from specific websites.
automation beautifulsoup4 dataextraction ecommerce-scraping python requests scraping-tools webscraping
Last synced: 06 Mar 2025
https://github.com/faizanmohd5/web-scraping-iphone-11-reviews
This is a web scraping project that extracts customer reviews for the iPhone 11 from Flipkart.com using Python and BeautifulSoup. The extracted data is saved in a CSV file for further analysis. Use it as a starting point for your own web scraping projects or for analyzing customer reviews of the iPhone 11.
beautifulsoup csv data-visualization dataanalysis dataextraction datainsights datamining datapreprocessing ecommerce-website ipython-notebook jupyter-notebook python reviews reviewscrapper webscraping
Last synced: 01 Mar 2025
https://github.com/shubhcs01/ipl-webscraper
Scrapes data for the entire IPL using JavaScript and Node.js.
automation dataextraction espncricinfo ipl webscraper webscraping
Last synced: 03 Mar 2025
https://github.com/gabrielianfr/web-scraping-project
A Python-based web scraping tool that extracts and stores data in JSON format using BeautifulSoup and Requests.
beautifulsoup dataextraction json python requests webscraping
Last synced: 29 Mar 2025
https://github.com/aiwithqasim/text_analysis
Data extraction and text analysis.
analysis beautifulsoup4 dataextraction nltk pandas python textanalysis
Last synced: 17 Mar 2025
https://github.com/spajai/yahoo-finance
Perl module for fetching symbol history from Yahoo Finance.
bse bse-stock-data cpan cpan-module data-mining dataextraction nse nse-stock-data perl perl-module share webscraper webscraping yahoo yahoo-finance yahoo-finance-api
Last synced: 19 Feb 2025
https://github.com/docutain/docutain-sdk-example-xamarin-forms
Sample project showing how to integrate the Docutain Document Scanner SDK into a Xamarin.Forms application.
data-capture dataextraction document-capture document-recognition document-rectification document-scanner documentscanner image-filter image-procesing ocr pdf pdf-generation scan scanner scanning sdk textrecognition xamarin xamarin-forms xamarin-sdk
Last synced: 27 Mar 2025
https://github.com/chouaib-629/webscraping
A collection of web scraping projects using Beautiful Soup, Selenium, and mixed approaches. Each project includes Python scripts and CSV files of the scraped data. Perfect for learning and experimenting with static and dynamic web scraping techniques.
automation beautifulsoup beautifulsoup4 browser-automation csv datacollection dataextraction dynamicwebsite html-parser jupyter-notebook python python-script python3 selenium staticwebsite webscraping
Last synced: 19 Feb 2025
https://github.com/docutain/docutain-sdk-example-windows-forms-.net-framework
Sample project showing how to integrate the Docutain SDK into a Windows Forms application.
data-capture dataextraction image-filter image-processing ocr ocr-recognition pdf sdk textrecognition
Last synced: 13 Apr 2025
https://github.com/ditikrushna/event-app
Render data from Google Spreadsheets with React and Tabletop.js
data-extraction-from-googlesheets dataextraction githubpages googlesheets reactjs tabletop
Last synced: 01 Mar 2025
https://github.com/swethajoseph/automate-api-extraction
Automating the process of extracting data from APIs, appending new data to existing datasets and generating insightful visualizations
api dataextraction datavisualization jupyter-notebook python pythonlibraries
Last synced: 17 Mar 2025
https://github.com/bessouat40/prefect-github-indexer
A Prefect pipeline that periodically scrapes one or more GitHub repositories, generates embeddings, and indexes them in ChromaDB.
automation database dataengineering dataextraction docker docker-compose prefect python
Last synced: 29 Mar 2025
https://github.com/apajo/php-data-miner
Train your Miner with previously entered data, then start collecting data automatically. Supports semi-structured data (such as PDF invoices) and unstructured data (free text such as emails).
annotations dataextraction nltk rubix-ml
Last synced: 04 Mar 2025
https://github.com/ilyazub/walmart-store-locator
Download a list of Walmart stores.
dataextraction rust walmart webscraping
Last synced: 28 Mar 2025
https://github.com/maemoonfarooq/amazon-dataset-mining
The Frequent Dataset Mining project mines frequent itemsets from the extensive Amazon dataset using Apache Kafka. It relies on distributed computing and two algorithms, Apriori and PCY, to process and analyze large volumes of data efficiently.
bash-script dataextraction frequent-itemset-mining kafka kafka-consumer kafka-producer mongodb-atlas preprocessing python3
Last synced: 23 Mar 2025
https://github.com/shuddha2021/nodejs-crawler
A lightweight and efficient web crawler built with Node.js
asynchronous axios cheerio dataextraction javascript nodejs opensource webcrawler webscarping
Last synced: 15 May 2025
https://github.com/damnitjoshua/um-timeedit-timetable-toolkit
UM TimeEdit data toolkit for timetable software development.
dataextraction timeedit timetable
Last synced: 27 Feb 2025
https://github.com/nel-zi/francine_store
Built a scalable data pipeline for Francine Stores, enabling them to extract, clean, and load data from Aliexpress for real-time market trend analysis and smarter business decisions.
datacleaning dataengineering dataextraction datamodeling etl etl-pipeline pandas
Last synced: 16 May 2025
https://github.com/shubhamatkal/instagram-user-data-extraction
Extracts data from an Instagram profile for analytics purposes.
dataextraction instagram instagram-api instagram-bot python
Last synced: 16 May 2025
https://github.com/dimitryzub/allrecipes-us-recipes-by-state-analysis
Personal exploratory data project in Python. Data extracted from AllRecipes.
data data-visualization dataexploration dataextraction matplotlib pandas python seaborn webscraping
Last synced: 01 Apr 2025
https://github.com/yumeangelica/jirai_sweeties
A friendly Discord bot with store monitoring capabilities. It tracks online stores for new items and price changes while providing chat commands and real-time notifications. Built with Python.
aiohttp dataextraction discord-bot discord-py lxml python3
Last synced: 01 Mar 2025
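Several of the Python entries above describe the same basic flow: parse an HTML page, pull out a few fields, and save them as CSV. The sketch below illustrates that flow using only the standard library. It is not taken from any listed project; the names LinkExtractor, links_to_csv, and SAMPLE_HTML are hypothetical, and the listed projects may instead use BeautifulSoup, Selenium, or other tools.

```python
import csv
import io
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs from every <a href=...> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # extracted (href, text) tuples
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_to_csv(html):
    """Return the links found in `html` as CSV text with a header row."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["href", "text"])
    writer.writerows(parser.rows)
    return out.getvalue()


# Made-up sample input; a real scraper would download the page first.
SAMPLE_HTML = (
    '<ul><li><a href="/a">First item</a></li>'
    '<li><a href="/b">Second <b>item</b></a></li></ul>'
)

if __name__ == "__main__":
    print(links_to_csv(SAMPLE_HTML))
```

Writing to an in-memory buffer keeps the sketch free of side effects; replacing io.StringIO with a file opened using newline="" would save the same rows to disk.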