Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with extraction

A curated list of projects in awesome lists tagged with extraction.

https://github.com/axa-group/parsr

Transforms PDF, Documents and Images into Enriched Structured Data

data document extraction hacktoberfest images nlp ocr parsr pdf python typescript

Last synced: 17 Dec 2024

https://github.com/trusted-ai/adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai

Last synced: 16 Dec 2024

https://github.com/google/mtail

extract internal monitoring data from application logs for collection in a timeseries database

bytecode calculator collector compiler extraction go instrumentation logs metrics monitoring mtail mtail-programs observability prometheus proxy timeseries vm

Last synced: 29 Oct 2024

https://github.com/aubio/aubio

a library for audio and music analysis

analysis annotation audio beat c extraction mfcc music onset pitch python sound tempo-tracking

Last synced: 17 Dec 2024

https://github.com/symfony/property-access

Provides functions to read and write from/to an object or array using a simple string notation

access array component extraction index injection object php property property-path reflection symfony symfony-component

Last synced: 16 Dec 2024

https://github.com/apache/tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

content extraction java metadata tika

Last synced: 16 Dec 2024

https://github.com/morkt/garbro

Visual Novels resource browser

audio extraction gui images reverse-engineering visual-novel

Last synced: 19 Dec 2024

https://github.com/onekey-sec/unblob

Extract files from any kind of container formats

archive compression extraction filesystem python

Last synced: 19 Dec 2024

https://github.com/dbashford/textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!

extract-text extraction nodejs

Last synced: 21 Dec 2024

https://github.com/chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.

buffer covid-19 detection extraction memex mime nlp nlp-library nlp-machine-learning parse parser-interface python recognition text-extraction text-recognition tika-python tika-server tika-server-jar translation-interface usc

Last synced: 17 Dec 2024

https://github.com/philipperemy/stanford-openie-python

Stanford Open Information Extraction made simple!

extraction nlp python-wrapper stanford stanford-openie

Last synced: 15 Dec 2024

https://github.com/lattyware/unrpa

A program to extract files from the RPA archive format.

extraction python renpy rpa visual-novels

Last synced: 15 Dec 2024

https://github.com/carlospuenteg/File-Injector

File Injector is a script that allows you to store any file in an image using steganography

extraction file file-injection file-injector files image image-manipulation image-processing injection noise numpy photography python python3 steganography storage

Last synced: 31 Oct 2024
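
The entry above does not say which steganographic scheme File Injector uses. As an illustration only, here is a minimal least-significant-bit (LSB) sketch, assuming the cover image has already been decoded into a flat `bytearray` of 8-bit channel values. The function names `embed_lsb`/`extract_lsb` and the 4-byte length header are hypothetical and are not File Injector's actual format.

```python
def embed_lsb(pixels: bytearray, payload: bytes) -> bytearray:
    """Hide payload in the lowest bit of each channel value (hypothetical format)."""
    # Prefix a 4-byte big-endian length so the extractor knows where to stop.
    data = len(payload).to_bytes(4, "big") + payload
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    stego = bytearray(pixels)
    for index, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[index] = (stego[index] & 0xFE) | bit
    return stego


def extract_lsb(pixels: bytearray) -> bytes:
    """Recover a payload written by embed_lsb."""

    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for n in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (pixels[start_bit + n * 8 + i] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)
```

Changing only the lowest bit alters each channel value by at most 1, which keeps the change visually negligible. A real tool must also save the result in a lossless format such as PNG, because lossy compression would destroy the hidden bits.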

https://github.com/bdbc-kg-nlp/ie-survey

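A survey of the information extraction field by the natural language processing research team of the Big Data Advanced Innovation Center at Beihang University (Beijing University of Aeronautics and Astronautics). It covers subtasks such as entity recognition, relation extraction, and attribute extraction, and surveys both academia and industry for each subtask.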

extraction nlp survey

Last synced: 12 Nov 2024

https://github.com/rize/UriTemplate

PHP URI Template (RFC 6570) supports both URI expansion & extraction

expansion extraction php rfc-6570 uri-template

Last synced: 23 Oct 2024
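
rize/UriTemplate is a PHP library. The sketch below is a hypothetical Python illustration of what expansion and its inverse, extraction, mean in the simplest RFC 6570 case: Level 1 `{var}` expressions only, with each variable used once. Operators such as `{+var}` and `{?query}` and explode modifiers are not handled, and the function names are invented for this example rather than taken from the library.

```python
import re
from typing import Dict, Optional
from urllib.parse import quote, unquote

# Matches a Level 1 expression such as {id}.
_EXPR = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand(template: str, variables: Dict[str, str]) -> str:
    """Simple string expansion: percent-encode everything outside the unreserved set."""
    # Undefined variables expand to the empty string.
    return _EXPR.sub(lambda m: quote(variables.get(m.group(1), ""), safe=""), template)


def extract(template: str, uri: str) -> Optional[Dict[str, str]]:
    """Invert expand(): turn the template into a regex and read the values back."""
    pattern = ""
    pos = 0
    for m in _EXPR.finditer(template):
        pattern += re.escape(template[pos:m.start()])
        # An expanded Level 1 value never contains an unencoded '/', '?' or '#'.
        pattern += f"(?P<{m.group(1)}>[^/?#]*)"
        pos = m.end()
    pattern += re.escape(template[pos:])
    match = re.fullmatch(pattern, uri)
    if match is None:
        return None
    return {name: unquote(value) for name, value in match.groupdict().items()}
```

Extraction is only unambiguous when adjacent expressions are separated by literal text; for a template like `{a}{b}` many splits match, which is one reason libraries restrict or special-case it.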

https://github.com/overtools/OWLib

Toolchain that lets you interact with the Overwatch files and extract models and stuff.

blizzard blizzard-games blte casc csharp extraction modeling ngdp overtools overwatch overwatch-2 tact

Last synced: 01 Nov 2024

https://github.com/puddly/android-otp-extractor

Extracts OTP tokens from rooted Android devices

adb android extraction otp python totp

Last synced: 20 Dec 2024
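
The secrets a tool like this recovers are typically seeds for time-based one-time passwords. As a hedged illustration of what a recovered seed is used for (this is not part of android-otp-extractor itself), the sketch below computes a TOTP code per RFC 6238, using HMAC-SHA1 and the RFC 4226 dynamic truncation. It assumes the seed has already been decoded to raw bytes; authenticator apps commonly store seeds Base32-encoded, which `base64.b32decode` can convert.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    # Take 31 bits starting at the offset chosen by the low nibble of the last byte.
    code = int.from_bytes(digest[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, unix_time: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with the counter taken from the current time window."""
    if unix_time is None:
        unix_time = time.time()
    return hotp(key, int(unix_time // step), digits)
```

With the RFC 6238 SHA-1 test key `b"12345678901234567890"`, `totp(key, 59, digits=8)` yields the published test value `"94287082"`.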

https://github.com/nazuke/SEOMacroscope

SEO Macroscope is a website scanning tool, to check your website for broken links; including some technical SEO functionality, site scraping, Excel reporting, and more.

broken-links custom-filter duplicate-content extract-pdf-metadata extraction hreflang-checker hreflang-matrix link-checker scan-website seo seo-excel-report seo-macroscope seo-tools web-scraping webmaster

Last synced: 08 Nov 2024

https://github.com/robinst/autolink-java

Java library to extract links (URLs, email addresses) from plain text; fast, small and smart

autolink extraction java-library linkify links url

Last synced: 21 Dec 2024

https://github.com/thrau/jarchivelib

A simple archiving and compression library for Java

archiving compression extraction

Last synced: 18 Dec 2024

https://github.com/neelshah18/emot

Open source Emoticons and Emoji detection library: emot

detection emoji emoticons extraction python

Last synced: 18 Dec 2024

https://github.com/DiegoCaraballo/Email-extractor

The main functionality is to extract all the emails from one or several URLs.

email email-extractor email-marketing emails extraction python scraper scrapers scraping scraping-websites scrapper scrapping scrapy scrapy-spider spyder stractor

Last synced: 21 Nov 2024
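
Leaving aside fetching the pages, the core of an email extractor is a pattern match over the page text. A minimal sketch, assuming the HTML has already been downloaded. The regular expression is a deliberately simplified approximation of an address, not a full RFC 5322 grammar, and it is not taken from this project.

```python
import re
from typing import List

# Simplified: local part, '@', then a dotted domain ending in a TLD of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> List[str]:
    """Return unique addresses in order of first appearance."""
    # dict.fromkeys de-duplicates while preserving insertion order.
    return list(dict.fromkeys(EMAIL_RE.findall(html)))
```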

https://github.com/nazywam/autoit-ripper

Extract AutoIt scripts embedded in PE binaries

autoit extraction malware

Last synced: 21 Dec 2024

https://github.com/macpaw/xadmaster

Objective-C library for archive and file unarchiving and extraction

extraction unar unarchiver

Last synced: 19 Dec 2024

https://github.com/chrise96/3D_Ground_Segmentation

A ground segmentation algorithm for 3D point clouds based on the work described in “Fast segmentation of 3D point clouds: a paradigm on LIDAR data for Autonomous Vehicle Applications”, D. Zermas, I. Izzat and N. Papanikolopoulos, 2017. Distinguish between road and non-road points. Road surface extraction. Plane fit ground filter

cpp extraction ground ground-segmentation lastools lidar non-ground point-cloud preprocessing road-surface

Last synced: 27 Oct 2024

https://github.com/bdbc-kg-nlp/covid-19-tracker

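The research team of the Big Data Advanced Innovation Center at Beihang University collected and organized the data sources, then used natural language processing and related techniques to extract structured information from the publicly released travel histories of 4,626 confirmed patients across China: basic information (sex, age, place of residence, occupation, history of contact with Wuhan/Hubei, etc.), trajectories (time, place, means of transport, events), and relationships between patients.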

covid-19 extraction nlp tracking visualization

Last synced: 12 Nov 2024

https://github.com/philipperemy/stanford-ner-python

Stanford Named Entity Recognizer (NER) - Python Wrapper

extraction named-entity-recognition nlp python-wrapper stanford stanford-ner

Last synced: 22 Oct 2024

https://github.com/ckorzen/pdf-text-extraction-benchmark

A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.

arxiv benchmark evaluation extraction pdf tex text-extraction

Last synced: 17 Nov 2024

https://github.com/freelawproject/doctor

A microservice for document conversion at scale

document extraction ffmpeg ocr pdf

Last synced: 06 Nov 2024

https://github.com/xyntopia/pydoxtools

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

chatgpt document-analysis document-extraction extraction information-retrieval llm nlp pdf python

Last synced: 17 Nov 2024

https://github.com/imperialcollegelondon/pnextract

Pore network extraction from micro-CT images of porous media

extraction pore-network

Last synced: 06 Nov 2024

https://github.com/chrisvwn/Rnightlights

R package to extract data from satellite nightlights.

data dmsp-ols extraction nightlights noaa package r satellite snpp-viirs

Last synced: 22 Nov 2024

https://github.com/josuemtzmo/trackeddy

Tracking eddy algorithm:

eddies eddy extraction ocean oceanic-eddies

Last synced: 27 Nov 2024

https://github.com/borderless/unfurl

Extract rich metadata from URLs

content extraction html json-ld metadata microdata rdf rdfa scraper

Last synced: 08 Nov 2024

https://github.com/aphp/edspdf

EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides the machinery to use rule- or machine-learning-based approaches to classify text blocks as body or metadata.

extraction machine-learning pdf

Last synced: 16 Dec 2024

https://github.com/croqaz/a-extractor

Article content extraction database

database extraction readability

Last synced: 10 Nov 2024

https://github.com/adamyaxley/unformat

Fastest type-safe parsing library in the world for C++14 or C++17 (up to 300x faster than std::regex)

cpp14 cpp17 extraction formatting header-only parse parser parsing parsing-library string unformat

Last synced: 14 Nov 2024

https://github.com/psolbach/metadoc

Aviation-grade news article metadata extraction

extraction metadata news nlp perceptron

Last synced: 08 Nov 2024

https://github.com/webfactory/zauberlehrling

Collection of tools and ideas for splitting up big monolithic PHP applications into smaller parts.

assets composer database extraction files microservice monolith mysql packages php tables

Last synced: 07 Nov 2024

https://github.com/anonyfox/rake-js

A pure JS implementation of the Rapid Automated Keyword Extraction (RAKE) algorithm.

auto-tagging classification extraction keyword keywords rake tag tags

Last synced: 30 Oct 2024
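
RAKE splits text into candidate phrases at stopwords and punctuation, scores each word as degree/frequency (degree counting how many words it co-occurs with inside candidate phrases, itself included), and scores a phrase as the sum of its word scores. rake-js is JavaScript; the following is an independent Python sketch of the algorithm with a deliberately tiny stopword list, whereas practical implementations use a much larger one.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative stopword list only; real RAKE stoplists are far larger.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text: str) -> List[List[str]]:
    """Split at punctuation, then at stopwords, into runs of content words."""
    phrases: List[List[str]] = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current: List[str] = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str, top_n: int = 5) -> List[Tuple[str, float]]:
    phrases = candidate_phrases(text)
    freq: Dict[str, int] = defaultdict(int)
    degree: Dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Co-occurrences within the phrase, counting the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

For "Keyword extraction is useful. Automatic keyword extraction for documents." the top-ranked phrase is "automatic keyword extraction" with score 8.0, because longer phrases built from frequently co-occurring words accumulate higher scores.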

https://github.com/lysxia/coq-simple-io

IO for Gallina

coq extraction ocaml

Last synced: 27 Oct 2024

https://github.com/Systemcluster/wrappe

Packer for creating self-contained single-binary applications from executables and directories. Distribute your application without the need for an installer, with smaller file size and faster startup than many alternatives 📦

command-line-tool compression cross-platform extraction packer rust

Last synced: 24 Nov 2024

https://github.com/agenty/browser-automation-api

Browser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content or do many other things like capture a screenshot, generate pdf, extract content or execute custom Puppeteer, Playwright functions.

api browser-automation extraction nodejs pdf playwright puppeteer scraping screenshot webscraping

Last synced: 25 Nov 2024

https://github.com/hbish/smex

A blazing-fast CLI application, written in Go, that processes sitemaps.

cli cross-platform csv extraction go-cli golang golang-library json seotools sitemap sitemap-extractor sitemap-parser

Last synced: 20 Nov 2024
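
smex is written in Go. As a hedged illustration of the underlying task only, here is a Python sketch that pulls the `<loc>` URLs out of a sitemap in the sitemaps.org 0.9 namespace. It accepts both a `<urlset>` and a `<sitemapindex>` document but does not fetch, decompress `.xml.gz` files, or recurse into child sitemaps.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol, version 0.9.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> List[str]:
    """Return every <loc> under the root's <url> or <sitemap> children."""
    root = ET.fromstring(xml_text)
    # './*/sm:loc' selects a <loc> that is a grandchild of the root element.
    return [loc.text.strip() for loc in root.findall("./*/sm:loc", NS) if loc.text]
```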

https://github.com/loyd/readability.rs

Really fast readability

dom extraction html text

Last synced: 13 Oct 2024

https://github.com/puntorigen/ti_recover

Appcelerator Titanium APK source code recovery tool

apk appcelerator decompiler extraction titanium titanium-alloy

Last synced: 20 Nov 2024

https://github.com/uditkarode/ucc

🖥 Compile and run programs through the TurboC Compiler without having to use the TurboC IDE or intricately fabricated DOS commands. Made out of frustration sometime in my high school days.

cli command-line extraction linux students turboc turbocpp ucc ucc-workspace

Last synced: 14 Nov 2024

https://github.com/smx-smx/wcpex

A tool to extract Windows Manifest files that can be found in the WinSxS folder

binary delta extraction manifest-files tool wcp windows winsxs

Last synced: 27 Nov 2024

https://github.com/dotfurther/OpenDiscoverSDK

.NET 6 API for document file format identification, text/metadata/attachment/embedded object/sensitive item (PII/PHI)/entity extraction.

archive csharp dotnet email embedded-objects entity-extraction extraction file-deduplication file-format-detection file-identification indexing metadata microsoft-office phi pii pii-detection pst sdk text text-extraction

Last synced: 07 Nov 2024

https://github.com/yeonghyeon/lung_extraction_from_cxr

Lung Extraction from Chest X-ray for Efficient Computing

computing deep-learning efficient extraction lung nih residual-networks

Last synced: 11 Nov 2024

https://github.com/kielx/anygrabber

Simplify AnyDesk log analysis by effortlessly searching, extracting, and generating reports on IP addresses and login dates.

anydesk extraction extractor grab grabber logs python

Last synced: 27 Oct 2024

https://github.com/planio-gmbh/plaintext

This gem wraps command line tools to extract plain text from typical files, such as PDF and common office formats.

cv doc docx extract extraction files fulltext odt office pdf ppt pptx rtf ruby ruby-on-rails xsl xslt

Last synced: 11 Oct 2024

https://github.com/rtymchyk/babel-plugin-extract-text

Babel plugin to extract strings from React components and gettext-like functions into a gettext PO file.

babel babel-plugin extraction gettext i18n internationalization js parser react translation

Last synced: 21 Dec 2024

https://github.com/yagoluiz/meuremedio-extracao

[PT-BR] Extraction of medicine price data made available by ANVISA

data extraction python3

Last synced: 23 Nov 2024

https://github.com/hboisgibault/unicontent

Python module to extract structured metadata from URL, ISBN or DOI

doi extraction google-books isbn metadata open-graph python url

Last synced: 25 Nov 2024

https://github.com/au-cobra/coq-rust-extraction

Coq plugin for extracting Rust code

coq extraction rust

Last synced: 10 Oct 2024

https://github.com/jacksongoode/nime-proceedings-analyzer

A tool written in Python to perform a bibliographic analysis of the NIME proceedings archive and other similar corpora.

analysis bibliometric extraction grobid nime proceedings

Last synced: 05 Nov 2024

https://github.com/bucanero/libun7zip

A library that provides 7-Zip (.7z) archive handling and extraction on PS3, PS4, and PS Vita

7z 7zip compression-library extraction ps3 ps4lib un7zip

Last synced: 07 Nov 2024

https://github.com/uudigitalhumanitieslab/perfectextractor

Extracting present perfects (and related forms) from parallel corpora

extraction parallel-corpus xpath

Last synced: 30 Nov 2024

https://github.com/mrodrig/deeks

Retrieve all keys and nested keys from objects and arrays of objects.

deep document extraction hacktoberfest javascript json key object parser

Last synced: 03 Dec 2024
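
deeks is a JavaScript library. The sketch below is an independent Python analogue of the idea, not deeks' API, assuming nested keys are reported as dot-separated paths and that the keys of every object in an array are merged in first-seen order.

```python
from typing import Any, List


def deep_keys(obj: Any, prefix: str = "") -> List[str]:
    """Return dotted paths to every leaf key in nested dicts and lists of dicts."""
    keys: List[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            nested = deep_keys(value, path)
            # A key whose value has no nested keys is itself a leaf path.
            keys.extend(nested if nested else [path])
    elif isinstance(obj, list):
        # Merge the keys of every element, keeping first-seen order.
        for item in obj:
            for key in deep_keys(item, prefix):
                if key not in keys:
                    keys.append(key)
    return keys
```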

https://github.com/lamba92/pinsir

PINSIR, or Person Identification Network Stack for Identity Recognition, is a scalable open source end to end solution for face detection and identity recognition.

comparison detection docker extraction face-detection grpc identity-recognition keras kotlin kotlin-multiplatform microservice neural-networks tensorflow

Last synced: 10 Nov 2024

https://github.com/rotgruengelb/mrpack2instance

Convert a .mrpack into a Minecraft instance (playable) without using something like MultiMC

downloader extraction minecraft modrinth

Last synced: 03 Nov 2024

https://github.com/agrafix/grabcite

Haskell: Library/Executable to extract citations from scientific papers

citation extraction haskell nlp paper text

Last synced: 12 Oct 2024

https://github.com/roughsketch/mdgcm

Command line extractor and builder for GameCube GCM discs.

building c-plus-plus cpp disc extract extraction gamecube gcm-discs

Last synced: 02 Dec 2024

https://github.com/dtboy1995/android-sex-size

🎲 [deprecated] A Node.js CLI tool for Android screen adaptation (recommended instead: the Toutiao screen-adaptation approach)

adaptation android extraction measure screen sex size

Last synced: 14 Oct 2024

https://github.com/datasciencecampus/readpyne

Toolkit for extracting relevant lines from receipts or similar image data.

dsc-projects extraction ocr receipts research

Last synced: 27 Oct 2024

https://github.com/infobyte/draytek-arsenal

Reverse Engineering and Observability toolkit for Draytek firewalls

extraction firmware modification reverse-engineering

Last synced: 09 Nov 2024

https://github.com/dashroshan/coc-sc-extract

🛠️ Python script to batch-extract PNG sprite sheets from *tex.sc files inside the Clash of Clans game APK

clashofclans coc extraction

Last synced: 20 Dec 2024

https://github.com/freddez/pg-dump-filter

Filter tables from PostgreSQL dump

dump extraction filter postgresql table tool

Last synced: 10 Dec 2024
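
One way such a filter can work on a plain-format (`pg_dump -Fp`) dump is to skip the data section of each excluded table: by default `pg_dump` writes table data as a `COPY ... FROM stdin;` line, followed by the rows and a terminating `\.` line. The sketch below assumes that layout and unquoted table names; it is not pg-dump-filter's implementation, and it leaves each table's `CREATE TABLE` statement in place.

```python
import re
from typing import Iterable, Iterator, Set

# Matches e.g. "COPY public.users (id, name) FROM stdin;" (unquoted identifiers only).
COPY_RE = re.compile(r"^COPY\s+(?:(\w+)\.)?(\w+)\s.*FROM stdin;$")


def drop_table_data(lines: Iterable[str], excluded: Set[str]) -> Iterator[str]:
    """Yield dump lines, omitting the COPY data blocks of excluded tables."""
    skipping = False
    for line in lines:
        stripped = line.rstrip("\n")
        if skipping:
            # A line containing only "\." ends the COPY data block.
            if stripped == "\\.":
                skipping = False
            continue
        match = COPY_RE.match(stripped)
        if match and match.group(2) in excluded:
            skipping = True
            continue
        yield line
```

Because it is a generator over lines, a large dump can be streamed through it from an open file without loading it into memory.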

https://github.com/joomla-framework/archive

Joomla Framework Archive Package

extraction joomla joomla-framework php

Last synced: 17 Dec 2024

https://github.com/basharovv/whatsound

Neural network for classifying audio samples into categories. This was my BSc final year project.

audio-processing classification essentia extraction music neural-network pybrain

Last synced: 25 Nov 2024

https://github.com/au-cobra/coq-elm-extraction

Coq plugin for extracting Elm code

coq elm extraction

Last synced: 10 Oct 2024

https://github.com/c0nw0nk/extractnow

Automatic ExtractNow script that monitors a directory and extracts files. Useful with Transmission, qBittorrent, uTorrent, Sonarr, Radarr, and Lidarr; automatically unpacks .rar, .7z, .zip, .gzip, .iso, .tar, and similar archives (unrar, unzip, gzip, 7z). Implemented as a .cmd/.bat batch-file command-line script.

archived archives automatic automation batch-file batch-script batchfile cmd cmdline command-line compressed-data compressed-files extraction extractnow extractor gzip monitor rar windows zip

Last synced: 29 Oct 2024

https://github.com/sc-networks/hydrator

A pragmatic hydrator and extractor library

extract extract-data extraction hydrate hydration hydrator php php7 php8

Last synced: 27 Oct 2024

https://github.com/samaybhavsar/copyrightextractor

Copyright Detector/Extractor - Detects and Extracts Copyright Snippet from HTML

copyright extraction extractor python python3

Last synced: 22 Oct 2024

https://github.com/jwhittle933/docxology

Lightweight Golang Word Doc (.docx) file extractor and manipulator package

extraction go golang golang-package microsoft microsoft-word text txt word xml zip zipfile zipfiles

Last synced: 08 Dec 2024

https://github.com/stellarbear/stringssharp

Extract strings from files

csharp extraction net strings

Last synced: 12 Nov 2024

https://github.com/chenqingspring/jbuilder-except

A Jbuilder plugin for extracting a resource except for some attributes

except extract extraction gem jbuilder json rails rubygem

Last synced: 10 Nov 2024

https://github.com/hiirotsuki/packdat3

Tools for unpacking "packdat3" CAB archives

extraction galgame reverse-engineering visual-novel-engine

Last synced: 09 Nov 2024