Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with extraction

A curated list of projects in awesome lists tagged with extraction.

https://github.com/axa-group/Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

data document extraction hacktoberfest images nlp ocr parsr pdf python typescript

Last synced: 30 Jul 2024

https://github.com/axa-group/parsr

Transforms PDF, Documents and Images into Enriched Structured Data

data document extraction hacktoberfest images nlp ocr parsr pdf python typescript

Last synced: 01 Oct 2024

https://github.com/trusted-ai/adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai

Last synced: 29 Sep 2024

https://github.com/Trusted-AI/adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai

Last synced: 31 Jul 2024

https://github.com/google/mtail

extract internal monitoring data from application logs for collection in a timeseries database

bytecode calculator collector compiler extraction go instrumentation logs metrics monitoring mtail mtail-programs observability prometheus proxy timeseries vm

Last synced: 31 Jul 2024

https://github.com/aubio/aubio

a library for audio and music analysis

analysis annotation audio beat c extraction mfcc music onset pitch python sound tempo-tracking

Last synced: 30 Sep 2024

https://github.com/symfony/property-access

Provides functions to read and write from/to an object or array using a simple string notation

access array component extraction index injection object php property property-path reflection symfony symfony-component

Last synced: 29 Sep 2024

https://github.com/morkt/garbro

Visual Novels resource browser

audio extraction gui images reverse-engineering visual-novel

Last synced: 30 Sep 2024

https://github.com/apache/tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

content extraction java metadata tika

Last synced: 28 Sep 2024

https://github.com/onekey-sec/unblob

Extract files from any kind of container format

archive compression extraction filesystem python

Last synced: 30 Sep 2024

https://github.com/dbashford/textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!

extract-text extraction nodejs

Last synced: 01 Aug 2024

https://github.com/chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

buffer covid-19 detection extraction memex mime nlp nlp-library nlp-machine-learning parse parser-interface python recognition text-extraction text-recognition tika-python tika-server tika-server-jar translation-interface usc

Last synced: 30 Sep 2024

https://github.com/philipperemy/stanford-openie-python

Stanford Open Information Extraction made simple!

extraction nlp python-wrapper stanford stanford-openie

Last synced: 03 Oct 2024

https://github.com/Lattyware/unrpa

A program to extract files from the RPA archive format.

extraction python renpy rpa visual-novels

Last synced: 09 Aug 2024

https://github.com/carlospuenteg/File-Injector

File Injector is a script that allows you to store any file in an image using steganography

extraction file file-injection file-injector files image image-manipulation image-processing injection noise numpy photography python python3 steganography storage

Last synced: 31 Jul 2024

https://github.com/rize/UriTemplate

PHP URI Template (RFC 6570) supports both URI expansion & extraction

expansion extraction php rfc-6570 uri-template

Last synced: 29 Jul 2024

https://github.com/nazuke/SEOMacroscope

SEO Macroscope is a website scanning tool, to check your website for broken links; including some technical SEO functionality, site scraping, Excel reporting, and more.

broken-links custom-filter duplicate-content extract-pdf-metadata extraction hreflang-checker hreflang-matrix link-checker scan-website seo seo-excel-report seo-macroscope seo-tools web-scraping webmaster

Last synced: 01 Aug 2024

https://github.com/DiegoCaraballo/Email-extractor

The main functionality is to extract all the emails from one or several URLs

email email-extractor email-marketing emails extraction python scraper scrapers scraping scraping-websites scrapper scrapping scrapy scrapy-spider spyder stractor

Last synced: 04 Aug 2024

https://github.com/chrise96/3D_Ground_Segmentation

A ground segmentation algorithm for 3D point clouds based on the work described in “Fast segmentation of 3D point clouds: a paradigm on LIDAR data for Autonomous Vehicle Applications”, D. Zermas, I. Izzat and N. Papanikolopoulos, 2017. Distinguish between road and non-road points. Road surface extraction. Plane fit ground filter

cpp extraction ground ground-segmentation lastools lidar non-ground point-cloud preprocessing road-surface

Last synced: 31 Jul 2024

https://github.com/MacPaw/XADMaster

Objective-C library for archive and file unarchiving and extraction

extraction unar unarchiver

Last synced: 04 Aug 2024

https://github.com/ckorzen/pdf-text-extraction-benchmark

A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.

arxiv benchmark evaluation extraction pdf tex text-extraction

Last synced: 03 Aug 2024

https://github.com/xyntopia/pydoxtools

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

chatgpt document-analysis document-extraction extraction information-retrieval llm nlp pdf python

Last synced: 03 Aug 2024

https://github.com/freelawproject/doctor

A microservice for document conversion at scale

document extraction ffmpeg ocr pdf

Last synced: 01 Aug 2024

https://github.com/chrisvwn/Rnightlights

R package to extract data from satellite nightlights.

data dmsp-ols extraction nightlights noaa package r satellite snpp-viirs

Last synced: 05 Aug 2024

https://github.com/josuemtzmo/trackeddy

Tracking eddy algorithm

eddies eddy extraction ocean oceanic-eddies

Last synced: 08 Aug 2024

https://github.com/croqaz/a-extractor

Article content extraction database

database extraction readability

Last synced: 06 Aug 2024

https://github.com/webfactory/zauberlehrling

Collection of tools and ideas for splitting up big monolithic PHP applications into smaller parts.

assets composer database extraction files microservice monolith mysql packages php tables

Last synced: 03 Aug 2024

https://github.com/Anonyfox/rake-js

A pure JS implementation of the Rapid Automated Keyword Extraction (RAKE) algorithm.

auto-tagging classification extraction keyword keywords rake tag tags

Last synced: 06 Aug 2024

https://github.com/hbish/smex

A blazing fast CLI application that processes sitemaps in golang.

cli cross-platform csv extraction go-cli golang golang-library json seotools sitemap sitemap-extractor sitemap-parser

Last synced: 01 Aug 2024

https://github.com/Systemcluster/wrappe

Packer for creating self-contained single-binary applications from executables and directories. Distribute your application without the need for an installer, with smaller file size and faster startup than many alternatives 📦

command-line-tool compression cross-platform extraction packer rust

Last synced: 06 Aug 2024

https://github.com/dotfurther/OpenDiscoverSDK

.NET 6 API for document file format identification, text/metadata/attachment/embedded object/sensitive item (PII/PHI)/entity extraction.

archive csharp dotnet email embedded-objects entity-extraction extraction file-deduplication file-format-detection file-identification indexing metadata microsoft-office phi pii pii-detection pst sdk text text-extraction

Last synced: 01 Aug 2024

https://github.com/planio-gmbh/plaintext

This gem wraps command line tools to extract plain text from typical files, such as PDF and common office formats.

cv doc docx extract extraction files fulltext odt office pdf ppt pptx rtf ruby ruby-on-rails xsl xslt

Last synced: 01 Oct 2024

https://github.com/rtymchyk/babel-plugin-extract-text

Babel plugin to extract strings from React components and gettext-like functions into a gettext PO file.

babel babel-plugin extraction gettext i18n internationalization js parser react translation

Last synced: 28 Aug 2024

https://github.com/hboisgibault/unicontent

Python module to extract structured metadata from URL, ISBN or DOI

doi extraction google-books isbn metadata open-graph python url

Last synced: 06 Aug 2024

https://github.com/au-cobra/coq-rust-extraction

Coq plugin for extracting Rust code

coq extraction rust

Last synced: 27 Sep 2024

https://github.com/datasciencecampus/readpyne

Toolkit for extracting relevant lines from receipts or similar image data.

dsc-projects extraction ocr receipts research

Last synced: 31 Jul 2024

https://github.com/rotgruengelb/mrpack2instance

Convert a .mrpack into a Minecraft instance (playable) without using something like MultiMC

downloader extraction minecraft modrinth

Last synced: 01 Aug 2024

https://github.com/samaybhavsar/copyrightextractor

Copyright Detector/Extractor - Detects and Extracts Copyright Snippet from HTML

copyright extraction extractor python python3

Last synced: 28 Sep 2024

https://github.com/au-cobra/coq-elm-extraction

Coq plugin for extracting Elm code

coq elm extraction

Last synced: 27 Sep 2024

https://github.com/codenoid/alodokter.com-database

An Alodokter.com database, collected by Hofesh Bot (scraper)

alodokter data extraction hofesh

Last synced: 02 Oct 2024

https://github.com/t-charura/language-transfer-flashcards

CLI tool converting Language Transfer lessons into Anki flashcards, automating content extraction for efficient language learning.

anki cli extraction flashcards language-learning language-transfer python spaced-repetition vocabulary-flashcards

Last synced: 25 Sep 2024

https://github.com/jkphl/rdfa-lite-microdata

RDFa Lite 1.1 and HTML Microdata parser for web documents (HTML, SVG, XML)

extract extraction extractor linked-data microdata parser rdfa rdfa-lite schema-org semantic-web structured-data vocabulary

Last synced: 02 Oct 2024

https://github.com/fracpete/resourceextractor4j

Java library for making it easy to extract/read resources on the classpath.

extraction resources

Last synced: 02 Oct 2024