Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with extraction
A curated list of projects in awesome lists tagged with extraction.
https://github.com/axa-group/Parsr
Transforms PDF, Documents and Images into Enriched Structured Data
data document extraction hacktoberfest images nlp ocr parsr pdf python typescript
Last synced: 30 Jul 2024
https://github.com/trusted-ai/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai
Last synced: 29 Sep 2024
https://github.com/google/mtail
Extract internal monitoring data from application logs for collection in a time-series database.
bytecode calculator collector compiler extraction go instrumentation logs metrics monitoring mtail mtail-programs observability prometheus proxy timeseries vm
Last synced: 31 Jul 2024
https://github.com/aubio/aubio
A library for audio and music analysis.
analysis annotation audio beat c extraction mfcc music onset pitch python sound tempo-tracking
Last synced: 30 Sep 2024
https://github.com/symfony/property-access
Provides functions to read and write from/to an object or array using a simple string notation
access array component extraction index injection object php property property-path reflection symfony symfony-component
Last synced: 29 Sep 2024
https://github.com/morkt/garbro
Visual Novels resource browser
audio extraction gui images reverse-engineering visual-novel
Last synced: 30 Sep 2024
https://github.com/apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
content extraction java metadata tika
Last synced: 28 Sep 2024
https://github.com/onekey-sec/unblob
Extract files from any kind of container formats
archive compression extraction filesystem python
Last synced: 30 Sep 2024
https://github.com/dbashford/textract
Node.js module for extracting text from HTML, PDF, DOC, DOCX, XLS, XLSX, CSV, PPTX, PNG, JPG, GIF, RTF, and more!
extract-text extraction nodejs
Last synced: 01 Aug 2024
https://github.com/chrismattmann/tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
buffer covid-19 detection extraction memex mime nlp nlp-library nlp-machine-learning parse parser-interface python recognition text-extraction text-recognition tika-python tika-server tika-server-jar translation-interface usc
Last synced: 30 Sep 2024
https://github.com/langchain-ai/langchain-extract
🦜⛏️ Did you say you like data?
extraction extraction-data fastapi langchain langchain-python llm llms
Last synced: 02 Oct 2024
https://github.com/philipperemy/stanford-openie-python
Stanford Open Information Extraction made simple!
extraction nlp python-wrapper stanford stanford-openie
Last synced: 03 Oct 2024
https://github.com/Lattyware/unrpa
A program to extract files from the RPA archive format.
extraction python renpy rpa visual-novels
Last synced: 09 Aug 2024
https://github.com/carlospuenteg/File-Injector
File Injector is a script that allows you to store any file in an image using steganography
extraction file file-injection file-injector files image image-manipulation image-processing injection noise numpy photography python python3 steganography storage
Last synced: 31 Jul 2024
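One common way to store a file in an image is least-significant-bit (LSB) steganography: each payload bit overwrites the lowest bit of one pixel channel value, changing it by at most 1. The following is a minimal Python sketch of that generic technique on a flat list of 8-bit channel values, assuming a 32-bit big-endian length header; `embed` and `extract` are hypothetical names, and File Injector's own encoding may differ, so treat this as an illustration of the idea rather than its format.

```python
def embed(channels: list[int], payload: bytes) -> list[int]:
    """Hide payload (prefixed by an assumed 32-bit length header) in channel LSBs."""
    data = len(payload).to_bytes(4, "big") + payload
    # Flatten the data into bits, most significant bit first.
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    if len(bits) > len(channels):
        raise ValueError("image too small for payload")
    out = list(channels)
    for index, bit in enumerate(bits):
        out[index] = (out[index] & ~1) | bit  # overwrite only the lowest bit
    return out


def extract(channels: list[int]) -> bytes:
    """Read the length header, then that many payload bytes, from channel LSBs."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        result = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (channels[start_bit + b * 8 + i] & 1)
            result.append(value)
        return bytes(result)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)
```

For example, `extract(embed([200] * 200, b"hi"))` returns `b"hi"`, and no channel value moves by more than 1.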
https://github.com/rize/UriTemplate
PHP URI Template (RFC 6570) supports both URI expansion & extraction
expansion extraction php rfc-6570 uri-template
Last synced: 29 Jul 2024
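RFC 6570 expansion substitutes variable values into `{...}` expressions in a template. The following is a minimal Python sketch of only the simplest case, Level 1 `{var}` expansion, in which undefined variables expand to the empty string and every character outside the unreserved set is percent-encoded. It assumes plain string values and a simplified variable-name pattern. rize/UriTemplate is a PHP library that also covers the higher levels and the reverse direction (extraction); this sketch does neither, and `expand_level1` is a hypothetical name.

```python
import re
from urllib.parse import quote

# Simplified variable-name pattern (RFC 6570 also allows "." and pct-encoded names).
EXPRESSION = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_level1(template: str, variables: dict[str, str]) -> str:
    """Expand Level 1 expressions such as {var}; operators like {+var} are not handled."""
    def replace(match: re.Match) -> str:
        value = variables.get(match.group(1))
        if value is None:
            return ""  # undefined variables expand to nothing
        # safe="" keeps only unreserved characters (letters, digits, "-._~").
        return quote(value, safe="")

    return EXPRESSION.sub(replace, template)
```

With the RFC's own Level 1 example values, `expand_level1("{hello}", {"hello": "Hello World!"})` returns `"Hello%20World%21"`.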
https://github.com/nazuke/SEOMacroscope
SEO Macroscope is a website scanning tool for checking your website for broken links, with some technical SEO functionality, site scraping, Excel reporting, and more.
broken-links custom-filter duplicate-content extract-pdf-metadata extraction hreflang-checker hreflang-matrix link-checker scan-website seo seo-excel-report seo-macroscope seo-tools web-scraping webmaster
Last synced: 01 Aug 2024
https://github.com/DiegoCaraballo/Email-extractor
The main functionality is to extract all the email addresses from one or several URLs.
email email-extractor email-marketing emails extraction python scraper scrapers scraping scraping-websites scrapper scrapping scrapy scrapy-spider spyder stractor
Last synced: 04 Aug 2024
https://github.com/assafmo/xioc
Extract indicators of compromise from text, including "escaped" ones.
command-line command-line-tool data-mining defang escaping extract extraction indicators-of-compromise ioc iocs regex regexp text-mining text-processing
Last synced: 01 Aug 2024
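"Escaped" (defanged) indicators are written so they are not clickable, for example `hxxp://evil[.]example[.]com`. A common approach is to refang the text first and then match indicators with regular expressions. The following is a minimal Python sketch of that approach, covering only a small hypothetical set of defang conventions and only URLs and IPv4 addresses; it is not xioc's implementation, which handles more indicator types, and `refang` and `extract_iocs` are hypothetical names.

```python
import re

# Hypothetical subset of defanging conventions, mapped back to the real characters.
REFANG_RULES = [
    (re.compile(r"hxxp", re.IGNORECASE), "http"),
    (re.compile(r"\[\.\]|\(\.\)|\{\.\}"), "."),
    (re.compile(r"\[:\]"), ":"),
    (re.compile(r"\[at\]|\(at\)", re.IGNORECASE), "@"),
]

OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
URL = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)


def refang(text: str) -> str:
    """Undo the defanging conventions listed in REFANG_RULES."""
    for pattern, replacement in REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Refang the text, then collect URLs and IPv4 addresses."""
    clean = refang(text)
    return {"urls": URL.findall(clean), "ipv4": IPV4.findall(clean)}
```

Refanging before matching keeps the regular expressions simple, since they only need to recognize the normal form of each indicator.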
https://github.com/chrise96/3D_Ground_Segmentation
A ground segmentation algorithm for 3D point clouds based on the work described in “Fast segmentation of 3D point clouds: a paradigm on LIDAR data for Autonomous Vehicle Applications”, D. Zermas, I. Izzat and N. Papanikolopoulos, 2017. It distinguishes between road and non-road points (road surface extraction) using a plane-fit ground filter.
cpp extraction ground ground-segmentation lastools lidar non-ground point-cloud preprocessing road-surface
Last synced: 31 Jul 2024
https://github.com/MacPaw/XADMaster
Objective-C library for archive and file unarchiving and extraction
Last synced: 04 Aug 2024
https://github.com/ckorzen/pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
arxiv benchmark evaluation extraction pdf tex text-extraction
Last synced: 03 Aug 2024
https://github.com/xyntopia/pydoxtools
Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.
chatgpt document-analysis document-extraction extraction information-retrieval llm nlp pdf python
Last synced: 03 Aug 2024
https://github.com/freelawproject/doctor
A microservice for document conversion at scale
document extraction ffmpeg ocr pdf
Last synced: 01 Aug 2024
https://github.com/chrisvwn/Rnightlights
R package to extract data from satellite nightlights.
data dmsp-ols extraction nightlights noaa package r satellite snpp-viirs
Last synced: 05 Aug 2024
https://github.com/josuemtzmo/trackeddy
Tracking eddy algorithm:
eddies eddy extraction ocean oceanic-eddies
Last synced: 08 Aug 2024
https://github.com/croqaz/a-extractor
Article content extraction database
database extraction readability
Last synced: 06 Aug 2024
https://github.com/webfactory/zauberlehrling
Collection of tools and ideas for splitting up big monolithic PHP applications into smaller parts.
assets composer database extraction files microservice monolith mysql packages php tables
Last synced: 03 Aug 2024
https://github.com/Anonyfox/rake-js
A pure JS implementation of the Rapid Automated Keyword Extraction (RAKE) algorithm.
auto-tagging classification extraction keyword keywords rake tag tags
Last synced: 06 Aug 2024
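RAKE, as usually described, splits text into candidate phrases at stopwords and punctuation, scores each word as its degree (co-occurrences within phrases, including itself) divided by its frequency, and scores a phrase as the sum of its word scores. The following is a minimal Python sketch of that scoring, assuming a tiny hypothetical stopword list; real implementations, rake-js included, ship much larger lists and their own tokenization.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "is", "to", "from", "with", "on"}


def rake(text: str) -> list[tuple[str, float]]:
    """Return candidate phrases ranked by RAKE score (sum of word degree/frequency)."""
    # Split into fragments at punctuation, then into phrases at stopwords.
    phrases: list[list[str]] = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current: list[str] = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # Frequency counts occurrences; degree adds the phrase length per occurrence.
    freq: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `rake("Rapid automatic keyword extraction of keywords from documents")[0]` is `("rapid automatic keyword extraction", 16.0)`: each of its four words has degree 4 and frequency 1. Longer phrases are favored because degree grows with phrase length.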
https://github.com/hbish/smex
A blazing fast CLI application that processes sitemaps in golang.
cli cross-platform csv extraction go-cli golang golang-library json seotools sitemap sitemap-extractor sitemap-parser
Last synced: 01 Aug 2024
https://github.com/Systemcluster/wrappe
Packer for creating self-contained single-binary applications from executables and directories. Distribute your application without the need for an installer, with smaller file size and faster startup than many alternatives 📦
command-line-tool compression cross-platform extraction packer rust
Last synced: 06 Aug 2024
https://github.com/dotfurther/OpenDiscoverSDK
.NET 6 API for document file format identification, text/metadata/attachment/embedded object/sensitive item (PII/PHI)/entity extraction.
archive csharp dotnet email embedded-objects entity-extraction extraction file-deduplication file-format-detection file-identification indexing metadata microsoft-office phi pii pii-detection pst sdk text text-extraction
Last synced: 01 Aug 2024
https://github.com/rtymchyk/babel-plugin-extract-text
Babel plugin to extract strings from React components and gettext-like functions into a gettext PO file.
babel babel-plugin extraction gettext i18n internationalization js parser react translation
Last synced: 28 Aug 2024
https://github.com/hboisgibault/unicontent
Python module to extract structured metadata from URL, ISBN or DOI
doi extraction google-books isbn metadata open-graph python url
Last synced: 06 Aug 2024
https://github.com/au-cobra/coq-rust-extraction
Coq plugin for extracting Rust code
Last synced: 27 Sep 2024
https://github.com/datasciencecampus/readpyne
Toolkit for extracting relevant lines from receipts or similar image data.
dsc-projects extraction ocr receipts research
Last synced: 31 Jul 2024
https://github.com/rotgruengelb/mrpack2instance
Convert a .mrpack into a Minecraft instance (playable) without using something like MultiMC
downloader extraction minecraft modrinth
Last synced: 01 Aug 2024
https://github.com/samaybhavsar/copyrightextractor
Copyright Detector/Extractor - Detects and Extracts Copyright Snippet from HTML
copyright extraction extractor python python3
Last synced: 28 Sep 2024
https://github.com/au-cobra/coq-elm-extraction
Coq plugin for extracting Elm code
Last synced: 27 Sep 2024
https://github.com/codenoid/alodokter.com-database
An Alodokter.com database, collected by Hofesh Bot (scraper)
alodokter data extraction hofesh
Last synced: 02 Oct 2024
https://github.com/ctih1/momera
A program that extracts the motion from your camera.
camera cv2 extraction extractor fast linux macos motion motion-extraction opencv opencv-python python python3 webcam windows
Last synced: 26 Sep 2024
https://github.com/t-charura/language-transfer-flashcards
CLI tool converting Language Transfer lessons into Anki flashcards, automating content extraction for efficient language learning.
anki cli extraction flashcards language-learning language-transfer python spaced-repetition vocabulary-flashcards
Last synced: 25 Sep 2024
https://github.com/jkphl/rdfa-lite-microdata
RDFa Lite 1.1 and HTML Microdata parser for web documents (HTML, SVG, XML)
extract extraction extractor linked-data microdata parser rdfa rdfa-lite schema-org semantic-web structured-data vocabulary
Last synced: 02 Oct 2024
https://github.com/hamedstack/hamedstack.globaltool.extract
A .NET global tool designed for extracting embedded resource files from a .NET assembly.
assembly cmd command-line command-line-tool commandline csharp csharp-library dll dotnet dotnet-core dotnetcore embedded embedded-resource embedded-resources extract extraction extractor global resource resources
Last synced: 29 Sep 2024
https://github.com/fracpete/resourceextractor4j
Java library for making it easy to extract/read resources on the classpath.
Last synced: 02 Oct 2024