Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with extract-data
A curated list of projects in awesome lists tagged with extract-data.
https://github.com/opendatalab/mineru
A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python
Last synced: 16 Dec 2024
https://github.com/opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python
Last synced: 29 Oct 2024
https://github.com/bda-research/node-crawler
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
cheerio crawler extract-data javascript jquery nodejs spider
Last synced: 16 Dec 2024
https://github.com/pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps
Last synced: 06 Nov 2024
https://github.com/pymupdf/pymupdf
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps
Last synced: 16 Dec 2024
https://github.com/meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets
Last synced: 17 Dec 2024
https://github.com/markummitchell/engauge-digitizer
Extracts data points from images of graphs
digitizer extract-data image-analysis utility
Last synced: 22 Dec 2024
https://github.com/elixir-crawly/crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider
Last synced: 20 Dec 2024
https://github.com/oltarasenko/crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider
Last synced: 27 Nov 2024
https://github.com/slotix/dataflowkit
Extract structured data from websites. Website scraping.
cdp chrome-fetcher crawling extract-data go golang golang-library headless scraper scraping scraping-websites
Last synced: 26 Oct 2024
https://github.com/danschultzer/receipt-scanner
Receipt scanner extracts information from your PDF or image receipts - built in NodeJS
extract-data extract-information ocr optical-character-recognition receipt-scanner receipts
Last synced: 16 Dec 2024
https://github.com/omkarpathak/resumeparser
A simple resume parser used for extracting information from resumes
extract-data gui parser python python3 resume-parser
Last synced: 21 Dec 2024
https://github.com/OmkarPathak/ResumeParser
A simple resume parser used for extracting information from resumes
extract-data gui parser python python3 resume-parser
Last synced: 25 Nov 2024
https://github.com/yuanxu-li/html-table-extractor
Extract data from HTML tables
beautifulsoup crawler extract-data html html-table scraping table
Last synced: 06 Nov 2024
https://github.com/ropensci/smapr
An R package for acquisition and processing of NASA SMAP data
acquisition extract-data nasa peer-reviewed r r-package raster rstats smap-data soil-mapping soil-moisture soil-moisture-sensor
Last synced: 11 Nov 2024
https://github.com/msoap/html2data
Library and cli for extracting data from HTML via CSS selectors
cli css-selector extract-data golang homebrew html library parser scrapping
Last synced: 18 Dec 2024
https://github.com/Techcatchers/PyLyrics-Extractor
Get lyrics for any song in less than 2 seconds by passing in the song name (spelled correctly or misspelled) using this awesome Python library.
extract-data lyrics-fetcher python-library search-algorithm
Last synced: 29 Nov 2024
https://github.com/asad70/insider-trading
This program extracts insider trading data from the SEC website for the specified time frame and stores it in an Excel file.
algotrading data-science extract-data insider-trading insiders tickers trading trading-strategies
Last synced: 11 Nov 2024
https://github.com/ionictemplate-app/social-network-data-scraper-pro
Easily scrape 10,000+ email messages in one hour, helping you quickly increase your customers. Extracts data from LinkedIn, Facebook, Instagram, YouTube, Pinterest, and Twitter. Search by specific keywords. Ready-to-use Social Network Data Scraper software to get started instantly. Includes source code and install file.
business-email business-extractor email-scraper extract-data extract-emails extractor-email google-extract scraper-address scraper-email scraper-facebook scraper-instagram scraper-linkedin scraper-name scraper-phone scraper-twitter social-media social-network social-scraper
Last synced: 15 Nov 2024
https://github.com/alienzhou/giframe
Extract the first frame of a GIF without reading all of its bytes; supports both browser and Node.js 📸
decoder extract-data frame gif gif87a gif89a progressive stream-like
Last synced: 13 Nov 2024
https://github.com/darkskygit/ChatImporter
Import chat records from your IM and store them in a single SQLite database
backup backup-tool chat chat-history extract-data
Last synced: 25 Nov 2024
https://github.com/darkskygit/chatimporter
Import chat records from your IM and store them in a single SQLite database
backup backup-tool chat chat-history extract-data
Last synced: 15 Oct 2024
https://github.com/agenty/scrapingai
Build web scraping agents using AI to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web with Agenty
crawler crawling datascraping extract-data scraping webscraper webscraping
Last synced: 25 Nov 2024
https://github.com/kormanowsky/jextract
Allows extracting data from the DOM
css css-selector dom extract-data html javascript jextract jquery js selector
Last synced: 14 Oct 2024
https://github.com/sypht-team/sypht-elixir-client
An Elixir client for the Sypht API https://sypht.com
api-client data-extraction document-capture elixir elixir-lang extract extract-data extract-fields information-retrieval information-retrieval-engine invoice invoice-parser pdf-parser receipt-capture receipt-reader receipt-scanner receipt-scanning sypht sypht-api sypht-api-elixir
Last synced: 15 Nov 2024
https://github.com/linuxndroidteam/whatsapp-gd-extract
Allows WhatsApp users on Android to extract their backed up WhatsApp data from Google Drive. Anytime, Anywhere
android-hacking-tools extract-data forensic-analysis forensics-tools hacking-tool whatsapp whatsapp-hack-tool whatsapp-hacking whatsapp-hacking-github
Last synced: 13 Nov 2024
https://github.com/jalal246/corename
Automatically extracts the packages' root name for monorepos
corename extract-data extract-information extract-text extracts get-info monorepo package-development package-json package-management production read-json utility
Last synced: 04 Dec 2024
https://github.com/sandeepbalachandran/pytheract
Tool for extracting data from files.
extract-data extract-data-from-image pytesseract pytheract tesseract
Last synced: 03 Dec 2024
https://github.com/chetanxpro/document-ai
An app to extract structured data from a PDF document
Last synced: 15 Nov 2024
https://github.com/sc-networks/hydrator
A pragmatic hydrator and extractor library
extract extract-data extraction hydrate hydration hydrator php php7 php8
Last synced: 27 Oct 2024
https://github.com/qwazr/extractor
A web API for text and metadata extraction
extract-data extractor metadata-extraction parse parser-auto-detection
Last synced: 04 Nov 2024
https://github.com/kamalpaneru/xtractor
Splits cells from Excel sheet images and extracts data.
azure-computer-vision extract-data ruby split-cells
Last synced: 06 Nov 2024
https://github.com/fityannugroho/idn-area-data-extractor
Extract Indonesian area data from the raw sources to CSV for fityannugroho/idn-area-data
extract-data extractor idn-area
Last synced: 31 Oct 2024
https://github.com/shubhranpara/auto-filler-web
This repository contains my internship project work at Flexbox Technologies. I have developed a system that automatically fills in the patient details form with patient data extracted from a PDF file.
automation docx-files extract-data faiss-vector-database flan-t5 form-filler html-css-javascript huggingface-transformers json langchain llms medical-application patient-data pdf-converter pdf-document pptx-files python-3 qa streamlit-webapp
Last synced: 18 Oct 2024
https://github.com/simplyyan/cutinfo
Go library to extract information based on references
extract-data go go-lib go-library golang string-manipulation strings
Last synced: 14 Dec 2024
https://github.com/netodeolino/tcc
Final course project (Trabalho de Conclusão de Curso) - Information Systems, UFC
clustering data-mining extract-data jupyter-notebook python
Last synced: 19 Nov 2024
https://github.com/zeeshanahmad4/nlp--data-extraction-microsoft-word-documents-into-a-csv
extract-data nlp pdf pdf-converter pdf-document pdf-document-processor pdf-generation pdfconverter pdfcrawler pdfdata pdfextractor pdffileconversion pdfkit pdfpython pdfscraper pdftoword
Last synced: 14 Dec 2024
https://github.com/fuutoru/face-recognition-using-machine-learning
This is a repo for face recognition on 5 famous people
extract-data face-recognition famous-people
Last synced: 06 Dec 2024
https://github.com/ispyhumanfly/prowler
Query the web, extract data from the results, and transform that data into a format you can use.
ai analytics business cryptocurrency data extract-data machine-learning mining scraping web
Last synced: 07 Dec 2024
https://github.com/tamk-kol/chatbot-q-a-in-invoice-extractor-llm
The Invoice Extractor markdown is a specific format used to extract relevant information from invoices. It's a standardized way to annotate invoices with key information, making it easier to automate the extraction process.
chatbot extract-data extractor-api extractpdftext gemini-api gemini-pro gemini-pro-api gemini-pro-vision googleapi llms single-page-app
Last synced: 10 Nov 2024
https://github.com/shubhranpara/auto-filler
This repository contains my team's internship project work at Flexbox Technologies. We have developed a system that automatically fills in the patient details form with patient data extracted from a PDF file.
docx extract-data faiss-vector-database flan-t5 form-filling gemma huggingface-transformers langchain llms pdf pdf-converter pptx python3 qa-automation streamlit-application
Last synced: 09 Nov 2024
https://github.com/chris1111/macos-extractor
7z-archives archive bzip2 extract-data extraction zip
Last synced: 14 Nov 2024
https://github.com/zedseven/urlextractor
A small tool for extracting all URLs from a blob of binary data (e.g. PDFs).
blob extract extract-data lightweight-tool url url-extractor urlextractor utility
Last synced: 16 Nov 2024
https://github.com/Kamalpaneru/Xtractor
Splits cells from Excel sheet images and extracts data.
azure-computer-vision extract-data ruby split-cells
Last synced: 13 Nov 2024
https://github.com/duart38/pdf-snippets
Chrome extension to extract a selected portion or section of a webpage into a PDF file
chrome-extension convert-to-pdf designer-tool extract-data extract-images imagetopdf pdf pdf-generation quality-of-life texttopdf tool webscraping website-to-pdf
Last synced: 13 Dec 2024
https://github.com/qyfashae/extract_off_data
Extract data from offline files, e.g. emails, phone numbers, links, etc.
extract extract-data extract-emails extract-links scraping
Last synced: 13 Nov 2024
https://github.com/raspi/fs2util
FreeSpace 2 util
checksum command-line-tool extract-data freespace2 game go golang
Last synced: 10 Nov 2024
https://github.com/nostalgiccoder/readexcelfile.lib
Extracts data from a spreadsheet and outputs its contents to a '.SQL' file. Data extraction tool useful for people using SQL Server Express with no access to SSMS addon and import wizard.
console excel extract-data library net-framework spreadsheet sql
Last synced: 18 Nov 2024
https://github.com/ammaryasirnaich/pyreqify
This project is a lightweight Python module designed to generate the requirements.txt file. It streamlines dependency management by automatically extracting imported modules from Python or Jupyter files and generating their requirements.txt
dependency environment extract-data jupyter-notebooks pip project-setup python requirements-generator requirements-txt version
Last synced: 15 Dec 2024
https://github.com/bessouat40/pdf-region-picker
A project to select only part of a PDF file. It's useful when you want to extract information with a Python library like fitz.
extract-data fitz javascript pdf region-picker
Last synced: 16 Nov 2024
https://github.com/thee-unruly/optimal-character-recognition
Extracting info from documents / images
Last synced: 10 Nov 2024
https://github.com/lmlk-seal/printext
Printext is a lightweight application that extracts text from images.
app application extract-data image-processing imagerecognition images imagetotext img2txt lightweight tesseract-ocr text tkinter-gui windows
Last synced: 30 Nov 2024
https://github.com/zuriel-hr/petojson
Extracts features from files in Portable Executable format into a JSON file
extract-data json malware-analysis portable-executable
Last synced: 20 Dec 2024
https://github.com/walidbosso/r_data_mining
Extract knowledge from data using different techniques.
association-rule-mining association-rules clustering data-analysis data-mining data-science data-visualization decision-tree-classifier decision-trees exportation extract-data hac hierarchical-clustering k-means k-means-clustering k-means-r r-programming r-studio
Last synced: 29 Nov 2024
https://github.com/timothy-bartlett/pymupdf
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
data-science extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction text-processing text-shaping xps
Last synced: 23 Nov 2024
https://github.com/ecrmnn/extract-index
Extract values from an array of arrays by index
array-manipulations array-processing arrays extract extract-data
Last synced: 16 Dec 2024
https://github.com/mmikhail2001/photo_analysis
Extracts Exif metadata from JPEG photos. A desktop application built with the C++ framework Qt.
binary-files exif extract-data jpeg oop patterns
Last synced: 22 Nov 2024