Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with extract-data

A curated list of projects in awesome lists tagged with extract-data.

https://github.com/opendatalab/mineru

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 16 Dec 2024

https://github.com/opendatalab/MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 29 Oct 2024

https://github.com/bda-research/node-crawler

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

cheerio crawler extract-data javascript jquery nodejs spider

Last synced: 16 Dec 2024

https://github.com/pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 06 Nov 2024

https://github.com/pymupdf/pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 16 Dec 2024

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 17 Dec 2024

https://github.com/markummitchell/engauge-digitizer

Extracts data points from images of graphs

digitizer extract-data image-analysis utility

Last synced: 22 Dec 2024

https://github.com/elixir-crawly/crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider

Last synced: 20 Dec 2024

https://github.com/oltarasenko/crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider

Last synced: 27 Nov 2024

https://github.com/slotix/dataflowkit

Extract structured data from websites. Website scraping.

cdp chrome-fetcher crawling extract-data go golang golang-library headless scraper scraping scraping-websites

Last synced: 26 Oct 2024

https://github.com/danschultzer/receipt-scanner

Receipt scanner extracts information from your PDF or image receipts - built in NodeJS

extract-data extract-information ocr optical-character-recognition receipt-scanner receipts

Last synced: 16 Dec 2024

https://github.com/omkarpathak/resumeparser

A simple resume parser used for extracting information from resumes

extract-data gui parser python python3 resume-parser

Last synced: 21 Dec 2024

https://github.com/OmkarPathak/ResumeParser

A simple resume parser used for extracting information from resumes

extract-data gui parser python python3 resume-parser

Last synced: 25 Nov 2024

https://github.com/ropensci/smapr

An R package for acquisition and processing of NASA SMAP data

acquisition extract-data nasa peer-reviewed r r-package raster rstats smap-data soil-mapping soil-moisture soil-moisture-sensor

Last synced: 11 Nov 2024

https://github.com/msoap/html2data

Library and cli for extracting data from HTML via CSS selectors

cli css-selector extract-data golang homebrew html library parser scrapping

Last synced: 18 Dec 2024

https://github.com/Techcatchers/PyLyrics-Extractor

Get lyrics for any song in less than 2 seconds by just passing in the song name (correctly spelled or misspelled) to this awesome Python library.

extract-data lyrics-fetcher python-library search-algorithm

Last synced: 29 Nov 2024

https://github.com/asad70/insider-trading

This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.

algotrading data-science extract-data insider-trading insiders tickers trading trading-strategies

Last synced: 11 Nov 2024

https://github.com/ionictemplate-app/social-network-data-scraper-pro

Easily scrape 10,000+ email messages in one hour, helping you quickly increase your customers. Extracts data from LinkedIn, Facebook, Instagram, YouTube, Pinterest, and Twitter. Search by specific keywords. Ready-to-use Social Network Data Scraper software to get started instantly. Source code and install file included.

business-email business-extractor email-scraper extract-data extract-emails extractor-email google-extract scraper-address scraper-email scraper-facebook scraper-instagram scraper-linkedin scraper-name scraper-phone scraper-twitter social-media social-network social-scraper

Last synced: 15 Nov 2024

https://github.com/alienzhou/giframe

Extract the first frame of a GIF without reading all of its bytes; supports both browser and Node.js 📸

decoder extract-data frame gif gif87a gif89a progressive stream-like

Last synced: 13 Nov 2024

https://github.com/darkskygit/ChatImporter

Import chat records from your IM and store them in a single SQLite database

backup backup-tool chat chat-history extract-data

Last synced: 25 Nov 2024

https://github.com/darkskygit/chatimporter

Import chat records from your IM and store them in a single SQLite database

backup backup-tool chat chat-history extract-data

Last synced: 15 Oct 2024

https://github.com/agenty/scrapingai

Build web scraping agents that use AI to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web with Agenty

crawler crawling datascraping extract-data scraping webscraper webscraping

Last synced: 25 Nov 2024

https://github.com/linuxndroidteam/whatsapp-gd-extract

Allows WhatsApp users on Android to extract their backed up WhatsApp data from Google Drive. Anytime, Anywhere

android-hacking-tools extract-data forensic-analysis forensics-tools hacking-tool whatsapp whatsapp-hack-tool whatsapp-hacking whatsapp-hacking-github

Last synced: 13 Nov 2024

https://github.com/chetanxpro/document-ai

An app to extract structured data from a PDF document

extract-data

Last synced: 15 Nov 2024

https://github.com/sc-networks/hydrator

A pragmatic hydrator and extractor library

extract extract-data extraction hydrate hydration hydrator php php7 php8

Last synced: 27 Oct 2024

https://github.com/qwazr/extractor

A web API for text and metadata extraction

extract-data extractor metadata-extraction parse parser-auto-detection

Last synced: 04 Nov 2024

https://github.com/kamalpaneru/xtractor

Splits cells from Excel sheet images and extracts data.

azure-computer-vision extract-data ruby split-cells

Last synced: 06 Nov 2024

https://github.com/fityannugroho/idn-area-data-extractor

Extract Indonesian area data from the raw sources to CSV for fityannugroho/idn-area-data

extract-data extractor idn-area

Last synced: 31 Oct 2024

https://github.com/shubhranpara/auto-filler-web

This repository contains my internship project work at Flexbox Technologies. I have developed a system that automatically fills in the patient details form with patient data extracted from a PDF file.

automation docx-files extract-data faiss-vector-database flan-t5 form-filler html-css-javascript huggingface-transformers json langchain llms medical-application patient-data pdf-converter pdf-document pptx-files python-3 qa streamlit-webapp

Last synced: 18 Oct 2024

https://github.com/simplyyan/cutinfo

Go library to extract information based on references

extract-data go go-lib go-library golang string-manipulation strings

Last synced: 14 Dec 2024

https://github.com/netodeolino/tcc

Undergraduate final project (Trabalho de Conclusão de Curso) - Information Systems, UFC

clustering data-mining extract-data jupyter-notebook python

Last synced: 19 Nov 2024

https://github.com/fuutoru/face-recognition-using-machine-learning

A repo for face recognition on 5 famous people

extract-data face-recognition famous-people

Last synced: 06 Dec 2024

https://github.com/ispyhumanfly/prowler

Query the web, extract data from the results, and transform that data into a format you can use.

ai analytics business cryptocurrency data extract-data machine-learning mining scraping web

Last synced: 07 Dec 2024

https://github.com/tamk-kol/chatbot-q-a-in-invoice-extractor-llm

The Invoice Extractor markdown is a specific format used to extract relevant information from invoices. It's a standardized way to annotate invoices with key information, making it easier to automate the extraction process.

chatbot extract-data extractor-api extractpdftext gemini-api gemini-pro gemini-pro-api gemini-pro-vision googleapi llms single-page-app

Last synced: 10 Nov 2024

https://github.com/shubhranpara/auto-filler

This repository contains my team's internship project work at Flexbox Technologies. We have developed a system that automatically fills in the patient details form with patient data extracted from a PDF file.

docx extract-data faiss-vector-database flan-t5 form-filling gemma huggingface-transformers langchain llms pdf pdf-converter pptx python3 qa-automation streamlit-application

Last synced: 09 Nov 2024

https://github.com/zedseven/urlextractor

A small tool for extracting all URLs from a blob of binary data (e.g. PDFs).

blob extract extract-data lightweight-tool url url-extractor urlextractor utility

Last synced: 16 Nov 2024

https://github.com/Kamalpaneru/Xtractor

Splits cells from Excel sheet images and extracts data.

azure-computer-vision extract-data ruby split-cells

Last synced: 13 Nov 2024

https://github.com/qyfashae/extract_off_data

Extract data from offline files, e.g. emails, phone numbers, and links.

extract extract-data extract-emails extract-links scraping

Last synced: 13 Nov 2024

https://github.com/nostalgiccoder/readexcelfile.lib

Extracts data from a spreadsheet and outputs its contents to a '.SQL' file. Data extraction tool useful for people using SQL Server Express with no access to SSMS addon and import wizard.

console excel extract-data library net-framework spreadsheet sql

Last synced: 18 Nov 2024

https://github.com/ammaryasirnaich/pyreqify

This project is a lightweight Python module designed to generate the requirements.txt file. It streamlines dependency management by automatically extracting imported modules from Python or Jupyter files and generating their requirements.txt

dependency environment extract-data jupyter-notebooks pip project-setup python requirements-generator requirements-txt version

Last synced: 15 Dec 2024

https://github.com/bessouat40/pdf-region-picker

A project to select only part of a PDF file. It's useful when you want to extract information with a Python library like fitz.

extract-data fitz javascript pdf region-picker

Last synced: 16 Nov 2024

https://github.com/thee-unruly/optimal-character-recognition

Extracting info from documents / images

easyocr extract-data images

Last synced: 10 Nov 2024

https://github.com/zuriel-hr/petojson

Extracts features from files in Portable Executable format into a JSON file

extract-data json malware-analysis portable-executable

Last synced: 20 Dec 2024

https://github.com/timothy-bartlett/pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction text-processing text-shaping xps

Last synced: 23 Nov 2024

https://github.com/ecrmnn/extract-index

Extract values from an array of arrays by index

array-manipulations array-processing arrays extract extract-data

Last synced: 16 Dec 2024

https://github.com/mmikhail2001/photo_analysis

Extracts Exif metadata from JPEG photos. A desktop application built with the C++ framework Qt.

binary-files exif extract-data jpeg oop patterns

Last synced: 22 Nov 2024

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 15 Dec 2024