An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with extract-data

A curated list of projects in awesome lists tagged with extract-data.

https://github.com/opendatalab/mineru

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 06 Jan 2026

https://github.com/opendatalab/MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 24 Mar 2025

https://github.com/pymupdf/pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 09 Sep 2025

https://github.com/bda-research/node-crawler

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

cheerio crawler extract-data javascript jquery nodejs spider

Last synced: 13 May 2025

https://github.com/pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 08 Apr 2025

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 12 May 2025

https://github.com/markummitchell/engauge-digitizer

Extracts data points from images of graphs

digitizer extract-data image-analysis utility

Last synced: 16 May 2025

https://github.com/elixir-crawly/crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider

Last synced: 11 Dec 2025

https://github.com/slotix/dataflowkit

Extract structured data from websites. Website scraping.

cdp chrome-fetcher crawling extract-data go golang golang-library headless scraper scraping scraping-websites

Last synced: 14 Mar 2025

https://github.com/danschultzer/receipt-scanner

Receipt scanner extracts information from your PDF or image receipts - built in NodeJS

extract-data extract-information ocr optical-character-recognition receipt-scanner receipts

Last synced: 07 Apr 2025

https://github.com/omkarpathak/resumeparser

A simple resume parser used for extracting information from resumes

extract-data gui parser python python3 resume-parser

Last synced: 05 Apr 2025

https://github.com/OmkarPathak/ResumeParser

A simple resume parser used for extracting information from resumes

extract-data gui parser python python3 resume-parser

Last synced: 18 Jul 2025

https://github.com/ropensci/smapr

An R package for acquisition and processing of NASA SMAP data

acquisition extract-data nasa peer-reviewed r r-package raster rstats smap-data soil-mapping soil-moisture soil-moisture-sensor

Last synced: 20 Jul 2025

https://github.com/msoap/html2data

Library and cli for extracting data from HTML via CSS selectors

cli css-selector extract-data golang homebrew html library parser scrapping

Last synced: 27 Jul 2025

https://github.com/isaacmg/fb_scraper

FBLYZE is a Facebook scraping system and analysis system.

extract-data facebook-scraper flink kafka spark tf-idf

Last synced: 10 Jul 2025

https://github.com/techcatchers/pylyrics-extractor

Get lyrics for any song by just passing in the song name (spelled or misspelled) in less than 2 seconds using this awesome Python library.

extract-data lyrics-fetcher python-library search-algorithm

Last synced: 11 Apr 2025

https://github.com/Techcatchers/PyLyrics-Extractor

Get lyrics for any song by just passing in the song name (spelled or misspelled) in less than 2 seconds using this awesome Python library.

extract-data lyrics-fetcher python-library search-algorithm

Last synced: 22 Jul 2025

https://github.com/asad70/insider-trading

This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.

algotrading data-science extract-data insider-trading insiders tickers trading trading-strategies

Last synced: 27 Apr 2025

https://github.com/osh/gr-eventstream

gr-eventstream is a set of GNU Radio blocks for creating precisely timed events and either inserting them into, or extracting them from, normal data streams. It allows for the definition of high-speed, time-synchronous C++ burst event handlers, as well as easy bridging to standard GNU Radio Async PDU messages with precise timing.

burst c-plus-plus event-handling extract-data extractor gnu-radio injection message-passing python radio signal-processing signaling-pathways synchronization synchronization-service synchronous timing-simulator

Last synced: 12 Apr 2025

https://github.com/ionictemplate-app/social-network-data-scraper-pro

Easily scrape 10,000+ email messages in one hour to quickly grow your customer base. Extracts data from LinkedIn, Facebook, Instagram, YouTube, Pinterest, and Twitter. Search by specific keywords. Ready-to-use Social Network Data Scraper software to get started instantly. Includes source code and install file.

business-email business-extractor email-scraper extract-data extract-emails extractor-email google-extract scraper-address scraper-email scraper-facebook scraper-instagram scraper-linkedin scraper-name scraper-phone scraper-twitter social-media social-network social-scraper

Last synced: 03 Dec 2025

https://github.com/serhaturtis/tool-fastbatchimagecrop

A simple UI tool to batch crop images to prepare datasets from images and videos.

cropping-images dataset-generation extract-data gui image-classification machine-learning python stable-diffusion ui

Last synced: 12 May 2025

https://github.com/alienzhou/giframe

Extract the first frame of a GIF without reading the whole file; supports both browser and Node.js 📸

decoder extract-data frame gif gif87a gif89a progressive stream-like

Last synced: 06 May 2025

https://github.com/agenty/scrapingai

Build web scraping agents using AI to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web with Agenty

crawler crawling datascraping extract-data scraping webscraper webscraping

Last synced: 12 Apr 2025

https://github.com/meltanolabs/tap-dbt

Singer Tap for dbt API v2 built with the Meltano SDK

dbt dbt-cloud elt extract-data meltano-sdk singer-io singer-tap

Last synced: 19 Oct 2025

https://github.com/darkskygit/chatimporter

Import chat records from your IM and store them in a single SQLite database

backup backup-tool chat chat-history extract-data

Last synced: 14 Apr 2025

https://github.com/darkskygit/ChatImporter

Import chat records from your IM and store them in a single SQLite database

backup backup-tool chat chat-history extract-data

Last synced: 18 Jul 2025

https://github.com/jehad-halahla/linux_project

A Linux lab Bash project that focuses on automation and text extraction

bash-script commands extract-data linux manual

Last synced: 10 Apr 2025

https://github.com/apurvasijaria/googleplaystorescrape

Python module to extract Google Play Store reviews and other information for any Android app.

app-data extract extract-data google-play-store googleplaystore module pypi pypi-package pypi-packages python python-module scraper selenium

Last synced: 11 Apr 2025

https://github.com/walidbosso/r_data_mining

Extract knowledge from data using different techniques, including association rules, hierarchical agglomerative clustering (HAC), k-means clustering, and decision trees.

association-rule-mining association-rules clustering data-analysis data-mining data-science data-visualization decision-tree-classifier decision-trees exportation extract-data hac hierarchical-clustering k-means k-means-clustering k-means-r r-programming r-studio

Last synced: 23 Mar 2025

https://github.com/sc-networks/hydrator

A pragmatic hydrator and extractor library

extract extract-data extraction hydrate hydration hydrator php php7 php8

Last synced: 19 Mar 2025

https://github.com/qwazr/extractor

A WEB API for text and meta-data extraction

extract-data extractor metadata-extraction parse parser-auto-detection

Last synced: 03 Apr 2025

https://github.com/chetanxpro/document-ai

An app to extract structured data from a PDF document

extract-data

Last synced: 11 Oct 2025

https://github.com/kamalpaneru/xtractor

Splits cells from Excel sheet images and extracts data.

azure-computer-vision extract-data ruby split-cells

Last synced: 14 May 2025

https://github.com/Kamalpaneru/Xtractor

Splits cells from Excel sheet images and extracts data.

azure-computer-vision extract-data ruby split-cells

Last synced: 03 May 2025

https://github.com/netodeolino/tcc

Undergraduate final project (Trabalho de Conclusão de Curso) - Information Systems, UFC

clustering data-mining extract-data jupyter-notebook python

Last synced: 18 Oct 2025

https://github.com/rainergo/uasfra-ms-knowledgegraph

Python project to read and use ESG data from XBRL-files to construct a neo4j Knowledge-Graph to be enriched with external data (Wikidata, DBPedia). An OpenAI-attached chat bot is used to query the Graph.

chatbot data-science esg extract-data knowledge-graph neo4j openai xbrl

Last synced: 25 Dec 2025

https://github.com/tamk-kol/chatbot-q-a-in-invoice-extractor-llm

The Invoice Extractor markdown is a specific format used to extract relevant information from invoices. It's a standardized way to annotate invoices with key information, making it easier to automate the extraction process.

chatbot extract-data extractor-api extractpdftext gemini-api gemini-pro gemini-pro-api gemini-pro-vision googleapi llms single-page-app

Last synced: 24 Feb 2025

https://github.com/shubhranpara/auto-filler-web

This repository contains my internship project work at Flexbox Technologies. I have developed a system that automatically fills in the patient details form with patient data extracted from a PDF file.

automation docx-files extract-data faiss-vector-database flan-t5 form-filler html-css-javascript huggingface-transformers json langchain llms medical-application patient-data pdf-converter pdf-document pptx-files python-3 qa streamlit-webapp

Last synced: 02 Apr 2025

https://github.com/fuutoru/face-recognition-using-machine-learning

This is a repo for face recognition on 5 famous people

extract-data face-recognition famous-people

Last synced: 27 Mar 2025

https://github.com/qyfashae/extract_off_data

Extract data from offline files, e.g. emails, phone numbers, links, etc.

extract extract-data extract-emails extract-links scraping

Last synced: 02 Mar 2025

https://github.com/jmitander/jmscraper

Scrape web pages and effortlessly extract the data you need. Easy, robust, efficient, and intuitively user-friendly.

extract-data extract-media extract-metadata extractor scraping scraping-web scraping-websites webscraper webscraping website-scraper webtool

Last synced: 06 Sep 2025

https://github.com/ispyhumanfly/prowler

Query the web, extract data from the results, and transform that data into a format you can use.

ai analytics business cryptocurrency data extract-data machine-learning mining scraping web

Last synced: 06 Sep 2025

https://github.com/bessouat40/pdf-region-picker

A project to select only part of a PDF file. It's useful when you want to extract information with a Python library such as fitz.

data-extraction data-selection extract-data fitz javascript parsing pdf region-picker

Last synced: 06 Mar 2025

https://github.com/zedseven/urlextractor

A small tool for extracting all URLs from a blob of binary data (e.g. PDFs).

blob extract extract-data lightweight-tool url url-extractor urlextractor utility

Last synced: 06 Mar 2025

https://github.com/simplyyan/cutinfo

Go library to extract information based on references

extract-data go go-lib go-library golang string-manipulation strings

Last synced: 01 Apr 2025

https://github.com/shubhranpara/auto-filler

This repository contains my team's internship project work at Flexbox Technologies. We have developed a system that automatically fills in the patient details form with patient data extracted from a PDF file.

docx extract-data faiss-vector-database flan-t5 form-filling gemma huggingface-transformers langchain llms pdf pdf-converter pptx python3 qa-automation streamlit-application

Last synced: 22 Feb 2025

https://github.com/Arman2409/data-falcon

Web crawler

crawler extract-data

Last synced: 02 Apr 2025

https://github.com/rubenslyra/vse-py

Video Subtitle Extractor (vse-py) is a Python project that extracts subtitles from videos at URLs provided by the user.

extract-data python subtitles youtube-dl

Last synced: 18 Mar 2025

https://github.com/ecrmnn/extract-index

Extract values from an array of arrays by index

array-manipulations array-processing arrays extract extract-data

Last synced: 28 Oct 2025

https://github.com/fityannugroho/idn-area-data-extractor

Extract Indonesian area data from the raw sources to CSV for fityannugroho/idn-area-data

extract-data extractor idn-area

Last synced: 28 Mar 2025

https://github.com/randomgamingdev/mc_block_color_mapper

Python scripts & libraries for generating and mapping the average colors for each of the Minecraft blocks

average average-calculator cli data data-generator documented-api extract extract-data extractor fast minecraft python3 simple small texture texture-pack textures

Last synced: 26 Dec 2025

https://github.com/drisskhattabi6/meteo-data-mining

This repo uses data mining techniques to analyze meteorological (meteo) data. The objective is to extract meaningful insights and patterns from the data that can aid in understanding weather phenomena and predicting future weather conditions.

cart data-analysis data-mining data-visualization decision-making decision-tree extract-data extract-insights insights-analytics insights-data k-means knn machine-learning svm

Last synced: 21 Mar 2025

https://github.com/timothy-bartlett/pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction text-processing text-shaping xps

Last synced: 17 Mar 2025

https://github.com/nostalgiccoder/readexcelfile.lib

Extracts data from a spreadsheet and outputs its contents to a '.SQL' file. Data extraction tool useful for people using SQL Server Express with no access to SSMS addon and import wizard.

c-sharp console etl excel extract-data library net-framework spreadsheet sql

Last synced: 25 Dec 2025

https://github.com/loglux-lab/ip-extractor

ip-extractor.sh uses nano to extract IP addresses. Results are stored in 'hosts', with duplicates removed. Ideal for sifting through logs and data-rich files.

bash extract-data linux-shell nano regex regular-expression

Last synced: 25 Feb 2025

https://github.com/ammaryasirnaich/pyreqify

This project is a lightweight Python module designed to generate the requirements.txt file. It streamlines dependency management by automatically extracting imported modules from Python or Jupyter files and generating their requirements.txt.

dependency environment extract-data jupyter-notebooks pip project-setup python requirements-generator requirements-txt version

Last synced: 31 Jul 2025

https://github.com/spaceshaman/deckard

Extract structured data from unstructured text — no AI, just regular expressions. 🔍

data-extraction extract extract-data regex regular-expression

Last synced: 22 Aug 2025

https://github.com/dann-oliv/db_query_exporter

Script to access the desired database and export a spreadsheet of results according to the query entered.

extract-data python sql

Last synced: 27 Aug 2025

https://github.com/mistralys/x4-data-extractor

Batch file generator to extract X4 game files with the XRCatTool including DLC metadata.

extract-data unpacker x4foundations

Last synced: 30 Aug 2025

https://github.com/randomgamingdev/minecraft-asset-extractor

This repository teaches you how to extract data from Minecraft, such as texture packs and achievements, and provides tools for doing so

all-platform-supported all-platforms asset-management assets automated extract extract-data extractor fast minecraft minecraft-java minecraft-java-edition simple

Last synced: 25 Dec 2025

https://github.com/thee-unruly/optimal-character-recognition

Extracting info from documents / images

easyocr extract-data images

Last synced: 01 Sep 2025

https://github.com/athanclark/extractable-singleton

It's just a functor which has its stored value as isomorphic to Identity.

extract-data haskell singleton

Last synced: 28 Jun 2025

https://github.com/baikaresandip/node-extract-env-variables

This repo will extract the environment variables in the .env.example file of the repo.

environment environment-variables extract extract-data extraction node node-js nodejs npm scanner

Last synced: 03 Jul 2025

https://github.com/arthursilvadantas/extractjson

Web application for extracting information from a JSON file.

extract-data extract-json javascript js json

Last synced: 03 Jul 2025

https://github.com/manucabral/pysoccerdata

A python package for extracting real-time soccer data from diverse online sources, providing essential statistics and insights.

extract-data football football-analytics football-data scraper soccer soccer-analytics soccer-data

Last synced: 27 Feb 2025

https://github.com/isogeo/doc-old-extractor

Gitbook content about the Isogeo data extractor. In sync with gitbook.com.

documentation extract-data gitbook open-data

Last synced: 11 Mar 2025

https://github.com/mmikhail2001/photo_analysis

Extraction of Exif metadata from JPEG photos. A desktop application built with the C++ framework Qt.

binary-files exif extract-data jpeg oop patterns

Last synced: 16 Mar 2025

https://github.com/doarakko/japanese-company-extraction

This API extracts Japanese company names from text.

api extract-data japanese nlp python

Last synced: 07 Sep 2025

https://github.com/zuriel-hr/petojson

Extraction of features from files in Portable Executable format into a JSON file

extract-data json malware-analysis portable-executable

Last synced: 08 Oct 2025

https://github.com/dann-oliv/query_results_exporter

Script to access the desired database and export a spreadsheet of results according to the query entered.

extract-data python sql

Last synced: 10 Oct 2025