Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with extraction
A curated list of projects in awesome lists tagged with extraction.
https://github.com/axa-group/parsr
Transforms PDF, Documents and Images into Enriched Structured Data
data document extraction hacktoberfest images nlp ocr parsr pdf python typescript
Last synced: 17 Dec 2024
https://github.com/axa-group/Parsr
Transforms PDF, Documents and Images into Enriched Structured Data
data document extraction hacktoberfest images nlp ocr parsr pdf python typescript
Last synced: 25 Oct 2024
https://github.com/trusted-ai/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai
Last synced: 16 Dec 2024
https://github.com/Trusted-AI/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai
Last synced: 28 Oct 2024
https://github.com/google/mtail
extract internal monitoring data from application logs for collection in a timeseries database
bytecode calculator collector compiler extraction go instrumentation logs metrics monitoring mtail mtail-programs observability prometheus proxy timeseries vm
Last synced: 29 Oct 2024
https://github.com/aubio/aubio
a library for audio and music analysis
analysis annotation audio beat c extraction mfcc music onset pitch python sound tempo-tracking
Last synced: 17 Dec 2024
https://github.com/symfony/property-access
Provides functions to read and write from/to an object or array using a simple string notation
access array component extraction index injection object php property property-path reflection symfony symfony-component
Last synced: 16 Dec 2024
https://github.com/apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
content extraction java metadata tika
Last synced: 16 Dec 2024
https://github.com/morkt/garbro
Visual Novels resource browser
audio extraction gui images reverse-engineering visual-novel
Last synced: 19 Dec 2024
https://github.com/onekey-sec/unblob
Extract files from any kind of container format
archive compression extraction filesystem python
Last synced: 19 Dec 2024
https://github.com/dbashford/textract
node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
extract-text extraction nodejs
Last synced: 21 Dec 2024
https://github.com/chrismattmann/tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
buffer covid-19 detection extraction memex mime nlp nlp-library nlp-machine-learning parse parser-interface python recognition text-extraction text-recognition tika-python tika-server tika-server-jar translation-interface usc
Last synced: 17 Dec 2024
https://github.com/langchain-ai/langchain-extract
🦜⛏️ Did you say you like data?
extraction extraction-data fastapi langchain langchain-python llm llms
Last synced: 20 Dec 2024
https://github.com/philipperemy/stanford-openie-python
Stanford Open Information Extraction made simple!
extraction nlp python-wrapper stanford stanford-openie
Last synced: 15 Dec 2024
https://github.com/lattyware/unrpa
A program to extract files from the RPA archive format.
extraction python renpy rpa visual-novels
Last synced: 15 Dec 2024
https://github.com/Lattyware/unrpa
A program to extract files from the RPA archive format.
extraction python renpy rpa visual-novels
Last synced: 29 Nov 2024
https://github.com/carlospuenteg/File-Injector
File Injector is a script that allows you to store any file in an image using steganography
extraction file file-injection file-injector files image image-manipulation image-processing injection noise numpy photography python python3 steganography storage
Last synced: 31 Oct 2024
https://github.com/bdbc-kg-nlp/ie-survey
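A survey of the information extraction field by the natural language processing research team at Beihang University's Advanced Innovation Center for Big Data. It covers subtasks such as entity recognition, relation extraction, and attribute extraction, and surveys both academia and industry for each subtask.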
Last synced: 12 Nov 2024
https://github.com/rize/UriTemplate
PHP URI Template (RFC 6570) supports both URI expansion & extraction
expansion extraction php rfc-6570 uri-template
Last synced: 23 Oct 2024
https://github.com/overtools/OWLib
Toolchain that lets you interact with the Overwatch files and extract models and stuff.
blizzard blizzard-games blte casc csharp extraction modeling ngdp overtools overwatch overwatch-2 tact
Last synced: 01 Nov 2024
https://github.com/puddly/android-otp-extractor
Extracts OTP tokens from rooted Android devices
adb android extraction otp python totp
Last synced: 20 Dec 2024
https://github.com/nazuke/SEOMacroscope
SEO Macroscope is a website scanning tool, to check your website for broken links; including some technical SEO functionality, site scraping, Excel reporting, and more.
broken-links custom-filter duplicate-content extract-pdf-metadata extraction hreflang-checker hreflang-matrix link-checker scan-website seo seo-excel-report seo-macroscope seo-tools web-scraping webmaster
Last synced: 08 Nov 2024
https://github.com/robinst/autolink-java
Java library to extract links (URLs, email addresses) from plain text; fast, small and smart
autolink extraction java-library linkify links url
Last synced: 21 Dec 2024
https://github.com/thrau/jarchivelib
A simple archiving and compression library for Java
archiving compression extraction
Last synced: 18 Dec 2024
https://github.com/neelshah18/emot
Open source Emoticons and Emoji detection library: emot
detection emoji emoticons extraction python
Last synced: 18 Dec 2024
https://github.com/DiegoCaraballo/Email-extractor
The main functionality is to extract all the emails from one or several URLs
email email-extractor email-marketing emails extraction python scraper scrapers scraping scraping-websites scrapper scrapping scrapy scrapy-spider spyder stractor
Last synced: 21 Nov 2024
https://github.com/nazywam/autoit-ripper
Extract AutoIt scripts embedded in PE binaries
Last synced: 21 Dec 2024
https://github.com/bobld/tabula-sharp
Extract tables from PDF files (port of tabula-java)
csharp dotnet extract extract-table extracting-tables extraction extraction-engine netstandard pdf-table-extract pdf-table-extraction pdfparser pdfpig pdfs table table-extraction tabula tabula-java tabula-sharp
Last synced: 19 Dec 2024
https://github.com/assafmo/xioc
Extract indicators of compromise from text, including "escaped" ones.
command-line command-line-tool data-mining defang escaping extract extraction indicators-of-compromise ioc iocs regex regexp text-mining text-processing
Last synced: 30 Oct 2024
https://github.com/evyatarmeged/stegextract
Detect hidden files and text in images
bash capture-the-flag ctf extract-images extraction hidden-files images penetration-testing steg steganography stego
Last synced: 08 Nov 2024
https://github.com/macpaw/xadmaster
Objective-C library for archive and file unarchiving and extraction
Last synced: 19 Dec 2024
https://github.com/MacPaw/XADMaster
Objective-C library for archive and file unarchiving and extraction
Last synced: 19 Nov 2024
https://github.com/chrise96/3D_Ground_Segmentation
A ground segmentation algorithm for 3D point clouds based on the work described in “Fast segmentation of 3D point clouds: a paradigm on LIDAR data for Autonomous Vehicle Applications”, D. Zermas, I. Izzat and N. Papanikolopoulos, 2017. Distinguish between road and non-road points. Road surface extraction. Plane fit ground filter
cpp extraction ground ground-segmentation lastools lidar non-ground point-cloud preprocessing road-surface
Last synced: 27 Oct 2024
https://github.com/bdbc-kg-nlp/covid-19-tracker
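The research team at Beihang University's Advanced Innovation Center for Big Data collected and organized the data sources, then used natural language processing and related techniques to extract structured information from the publicly released trajectories of 4,626 confirmed patients nationwide: basic information (sex, age, place of residence, occupation, Wuhan/Hubei contact history, etc.), trajectories (time, place, means of transport, events), and relationships between patients.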
covid-19 extraction nlp tracking visualization
Last synced: 12 Nov 2024
https://github.com/philipperemy/stanford-ner-python
Stanford Named Entity Recognizer (NER) - Python Wrapper
extraction named-entity-recognition nlp python-wrapper stanford stanford-ner
Last synced: 22 Oct 2024
https://github.com/cisco-talos/locky
analysis extraction locky malware ransom unpacker
Last synced: 06 Nov 2024
https://github.com/skblaz/rakun2
RaKUn 2.0 - A fast keyword detection algorithm
extraction information-retrieval keyphrase keyphrase-extraction keyphrases keyword-extraction keywords keywords-extraction library multilingual natural-language natural-language-processing nlp nlp-keywords-extraction nlp-library nlp-machine-learning python scalable-machine-learning unsupervised-learning
Last synced: 17 Dec 2024
https://github.com/ckorzen/pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
arxiv benchmark evaluation extraction pdf tex text-extraction
Last synced: 17 Nov 2024
https://github.com/freelawproject/doctor
A microservice for document conversion at scale
document extraction ffmpeg ocr pdf
Last synced: 06 Nov 2024
https://github.com/xyntopia/pydoxtools
Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.
chatgpt document-analysis document-extraction extraction information-retrieval llm nlp pdf python
Last synced: 17 Nov 2024
https://github.com/imperialcollegelondon/pnextract
Pore network extraction from micro-CT images of porous media
Last synced: 06 Nov 2024
https://github.com/chrisvwn/Rnightlights
R package to extract data from satellite nightlights.
data dmsp-ols extraction nightlights noaa package r satellite snpp-viirs
Last synced: 22 Nov 2024
https://github.com/josuemtzmo/trackeddy
Tracking eddy algorithm:
eddies eddy extraction ocean oceanic-eddies
Last synced: 27 Nov 2024
https://github.com/borderless/unfurl
Extract rich metadata from URLs
content extraction html json-ld metadata microdata rdf rdfa scraper
Last synced: 08 Nov 2024
https://github.com/aphp/edspdf
EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides the machinery to use rule- or machine-learning-based approaches to classify text blocs between body and meta-data.
extraction machine-learning pdf
Last synced: 16 Dec 2024
https://github.com/croqaz/a-extractor
Article content extraction database
database extraction readability
Last synced: 10 Nov 2024
https://github.com/adamyaxley/unformat
Fastest type-safe parsing library in the world for C++14 or C++17 (up to 300x faster than std::regex)
cpp14 cpp17 extraction formatting header-only parse parser parsing parsing-library string unformat
Last synced: 14 Nov 2024
https://github.com/psolbach/metadoc
Aviation grade news article metadata extraction
extraction metadata news nlp perceptron
Last synced: 08 Nov 2024
https://github.com/webfactory/zauberlehrling
Collection of tools and ideas for splitting up big monolithic PHP applications in smaller parts.
assets composer database extraction files microservice monolith mysql packages php tables
Last synced: 07 Nov 2024
https://github.com/anonyfox/rake-js
A pure JS implementation of the Rapid Automated Keyword Extraction (RAKE) algorithm.
auto-tagging classification extraction keyword keywords rake tag tags
Last synced: 30 Oct 2024
https://github.com/Anonyfox/rake-js
A pure JS implementation of the Rapid Automated Keyword Extraction (RAKE) algorithm.
auto-tagging classification extraction keyword keywords rake tag tags
Last synced: 24 Nov 2024
https://github.com/bobld/camelot-sharp
A C# library to extract tabular data from PDFs (port of camelot Python version using PdfPig).
camelot camelot-sharp csharp dotnet extract-table extracting-tables extraction extraction-engine netstandard opencv pdf-table-extract pdf-table-extraction pdfparser pdfpig pdfs table table-extraction
Last synced: 08 Nov 2024
https://github.com/Systemcluster/wrappe
Packer for creating self-contained single-binary applications from executables and directories. Distribute your application without the need for an installer, with smaller file size and faster startup than many alternatives 📦
command-line-tool compression cross-platform extraction packer rust
Last synced: 24 Nov 2024
https://github.com/agenty/browser-automation-api
Browser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content or do many other things like capture a screenshot, generate pdf, extract content or execute custom Puppeteer, Playwright functions.
api browser-automation extraction nodejs pdf playwright puppeteer scraping screenshot webscraping
Last synced: 25 Nov 2024
https://github.com/hbish/smex
A blazing fast CLI application that processes sitemaps in golang.
cli cross-platform csv extraction go-cli golang golang-library json seotools sitemap sitemap-extractor sitemap-parser
Last synced: 20 Nov 2024
https://github.com/puntorigen/ti_recover
Appcelerator Titanium APK source code recovery tool
apk appcelerator decompiler extraction titanium titanium-alloy
Last synced: 20 Nov 2024
https://github.com/uditkarode/ucc
🖥 Compile and run programs through the TurboC Compiler without having to use the TurboC IDE or intricately fabricated DOS commands. Made out of frustration sometime in my high school days.
cli command-line extraction linux students turboc turbocpp ucc ucc-workspace
Last synced: 14 Nov 2024
https://github.com/smx-smx/wcpex
A tool to extract Windows Manifest files that can be found in the WinSxS folder
binary delta extraction manifest-files tool wcp windows winsxs
Last synced: 27 Nov 2024
https://github.com/zelon88/xpress
xPress File archiver and extractor
archive compression compression-algorithm decompression experimental extraction extractor python
Last synced: 23 Oct 2024
https://github.com/dotfurther/OpenDiscoverSDK
.NET 6 API for document file format identification, text/metadata/attachment/embedded object/sensitive item (PII/PHI)/entity extraction.
archive csharp dotnet email embedded-objects entity-extraction extraction file-deduplication file-format-detection file-identification indexing metadata microsoft-office phi pii pii-detection pst sdk text text-extraction
Last synced: 07 Nov 2024
https://github.com/yeonghyeon/lung_extraction_from_cxr
Lung Extraction from Chest X-ray for Efficient Computing
computing deep-learning efficient extraction lung nih residual-networks
Last synced: 11 Nov 2024
https://github.com/kielx/anygrabber
Simplify AnyDesk log analysis by effortlessly searching, extracting, and generating reports on IP addresses and login dates.
anydesk extraction extractor grab grabber logs python
Last synced: 27 Oct 2024
https://github.com/decisionfacts/df-extract
DF Extract Lib
asyncio document-parser docx extraction jpeg jpg pdf png pptx python3
Last synced: 08 Nov 2024
https://github.com/rtymchyk/babel-plugin-extract-text
Babel plugin to extract strings from React components and gettext-like functions into a gettext PO file.
babel babel-plugin extraction gettext i18n internationalization js parser react translation
Last synced: 21 Dec 2024
https://github.com/yagoluiz/meuremedio-extracao
[PT-BR] Extraction of drug price data made available by ANVISA
Last synced: 23 Nov 2024
https://github.com/hboisgibault/unicontent
Python module to extract structured metadata from URL, ISBN or DOI
doi extraction google-books isbn metadata open-graph python url
Last synced: 25 Nov 2024
https://github.com/au-cobra/coq-rust-extraction
Coq plugin for extracting Rust code
Last synced: 10 Oct 2024
https://github.com/jacksongoode/nime-proceedings-analyzer
A tool written in Python to perform a bibliographic analysis of the NIME proceedings archive and other similar corpora.
analysis bibliometric extraction grobid nime proceedings
Last synced: 05 Nov 2024
https://github.com/bucanero/libun7zip
A library that provides 7-Zip (.7z) archive handling and extraction on PS3, PS4, and PS Vita
7z 7zip compression-library extraction ps3 ps4lib un7zip
Last synced: 07 Nov 2024
https://github.com/uudigitalhumanitieslab/perfectextractor
Extracting present perfects (and related forms) from parallel corpora
extraction parallel-corpus xpath
Last synced: 30 Nov 2024
https://github.com/valaphee/protod
Protobuf Decompiler
extract extraction extractor kotlin protobuf protobuf-definitions protobuf-java protocol-buffers
Last synced: 10 Nov 2024
https://github.com/mrodrig/deeks
Retrieve all keys and nested keys from objects and arrays of objects.
deep document extraction hacktoberfest javascript json key object parser
Last synced: 03 Dec 2024
https://github.com/lamba92/pinsir
PINSIR, or Person Identification Network Stack for Identity Recognition, is a scalable open source end to end solution for face detection and identity recognition.
comparison detection docker extraction face-detection grpc identity-recognition keras kotlin kotlin-multiplatform microservice neural-networks tensorflow
Last synced: 10 Nov 2024
https://github.com/cybercentrecanada/assemblyline-service-extract
Assemblyline 4 File extraction service
archive assemblyline extraction file malware-analysis
Last synced: 11 Nov 2024
https://github.com/rotgruengelb/mrpack2instance
Convert a .mrpack into a Minecraft instance (playable) without using something like MultiMC
downloader extraction minecraft modrinth
Last synced: 03 Nov 2024
https://github.com/agrafix/grabcite
Haskell: Library/Executable to extract citations from scientific papers
citation extraction haskell nlp paper text
Last synced: 12 Oct 2024
https://github.com/andreas-aeschlimann/gabor
Demo web application for Gabor filters
extraction fft filters fourier fourier-transform gabor ifft image image-processing processing recognition transform
Last synced: 25 Oct 2024
https://github.com/roughsketch/mdgcm
Command line extractor and builder for GameCube GCM discs.
building c-plus-plus cpp disc extract extraction gamecube gcm-discs
Last synced: 02 Dec 2024
https://github.com/dtboy1995/android-sex-size
:game_die: [deprecated] a nodejs cli tool for android screen adaptation [recommended: use the Jinri Toutiao adaptation scheme]
adaptation android extraction measure screen sex size
Last synced: 14 Oct 2024
https://github.com/datasciencecampus/readpyne
Toolkit for extracting relevant lines from receipts or similar image data.
dsc-projects extraction ocr receipts research
Last synced: 27 Oct 2024
https://github.com/infobyte/draytek-arsenal
Reverse Engineering and Observability toolkit for Draytek firewalls
extraction firmware modification reverse-engineering
Last synced: 09 Nov 2024
https://github.com/stephangeorg/postal-code-helpers
Helpers for intl. postal codes.
extraction postal postal-codes postcode postcodes validation
Last synced: 12 Nov 2024
https://github.com/dashroshan/coc-sc-extract
🛠️ Python script to batch extract png image sprite sheets from *tex.sc files present inside the clash of clans game apk
Last synced: 20 Dec 2024
https://github.com/freddez/pg-dump-filter
Filter tables from PostgreSQL dump
dump extraction filter postgresql table tool
Last synced: 10 Dec 2024
https://github.com/joomla-framework/archive
Joomla Framework Archive Package
extraction joomla joomla-framework php
Last synced: 17 Dec 2024
https://github.com/basharovv/whatsound
Neural network for classifying audio samples into categories. This was my BSc final year project.
audio-processing classification essentia extraction music neural-network pybrain
Last synced: 25 Nov 2024
https://github.com/arklab/artesian.sdk-python
Python Library for Artesian
ark artesian energy-data extraction market-data python timeseries
Last synced: 05 Nov 2024
https://github.com/au-cobra/coq-elm-extraction
Coq plugin for extracting Elm code
Last synced: 10 Oct 2024
https://github.com/jhermsmeier/node-xarchive
Extensible Archive Format
archive compression extraction file-format xar
Last synced: 03 Nov 2024
https://github.com/c0nw0nk/extractnow
Automatic ExtractNow script that monitors a directory and extracts files; useful with Transmission, qBittorrent, uTorrent, Sonarr, Radarr, and Lidarr. Covers auto unrar, unzip, gzip, and 7z for .rar, .7z, .zip, .gzip, .iso, .tar, etc. Windows batch file (.cmd, .bat) command-line script.
archived archives automatic automation batch-file batch-script batchfile cmd cmdline command-line compressed-data compressed-files extraction extractnow extractor gzip monitor rar windows zip
Last synced: 29 Oct 2024
https://github.com/sc-networks/hydrator
A pragmatic hydrator and extractor library
extract extract-data extraction hydrate hydration hydrator php php7 php8
Last synced: 27 Oct 2024
https://github.com/mahirsust/rake-bengalikeywordextraction
Bangla Keyword Extraction Using RAKE Algorithm with some Modification.
bangla bangla-nlp banglakeywordextraction bengalikeywordextraction cse extraction keyword keyword-extraction nlp rake rake-bangla sust sustcse sustcse-13
Last synced: 12 Nov 2024
https://github.com/samaybhavsar/copyrightextractor
Copyright Detector/Extractor - Detects and Extracts Copyright Snippet from HTML
copyright extraction extractor python python3
Last synced: 22 Oct 2024
https://github.com/jwhittle933/docxology
Lightweight Golang Word Doc (.docx) file extractor and manipulator package
extraction go golang golang-package microsoft microsoft-word text txt word xml zip zipfile zipfiles
Last synced: 08 Dec 2024
https://github.com/chenqingspring/jbuilder-except
A Jbuilder plugin for extracting resource except some attributes
except extract extraction gem jbuilder json rails rubygem
Last synced: 10 Nov 2024
https://github.com/hiirotsuki/packdat3
Tools for unpacking "packdat3" CAB archives
extraction galgame reverse-engineering visual-novel-engine
Last synced: 09 Nov 2024