An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with extract

A curated list of projects in awesome lists tagged with extract.

https://github.com/yaofanguk/video-subtitle-extractor

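Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.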

deep-learning extract hardsub ocr ripper srt subrip subtitles

Last synced: 13 May 2025

https://github.com/dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

data data-engineering data-lake data-loading data-warehouse elt extract load python transform

Last synced: 01 Apr 2026

https://github.com/YaoFANGUK/video-subtitle-extractor

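Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.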

deep-learning extract hardsub ocr ripper srt subrip subtitles

Last synced: 24 Mar 2025

https://github.com/scinfu/swiftsoup

SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)

dom extract html html-document parse selector swift swiftsoup

Last synced: 01 Apr 2026

https://github.com/scinfu/SwiftSoup

SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)

dom extract html html-document parse selector swift swiftsoup

Last synced: 24 Mar 2025

https://github.com/mholt/archiver

Easily create & extract archives, and compress & decompress files of various formats

7zip archives brotli bzip2 compression decompression extract go golang gzip lz4 rar snappy streaming streams tar xz zip zstandard

Last synced: 10 Apr 2025

https://github.com/torakiki/pdfsam

PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages

combine extract java javafx merge merge-pdf merger pdf pdf-combiner pdf-extractor pdf-manipulation pdf-merge pdf-mix pdf-rotate pdf-split rotate split split-pdf splitter

Last synced: 13 May 2025

https://github.com/atlanhq/camelot

Camelot: PDF Table Extraction for Humans

extract for-humans pdf table

Last synced: 14 Jan 2026

https://github.com/mafaca/UtinyRipper

GUI and API library to work with Engine assets, serialized and bundle files

asset assetbundle bundle debug extract project resource ripper source unity unity3d unpack viewer

Last synced: 25 Apr 2025

https://github.com/mafaca/utinyripper

GUI and API library to work with Engine assets, serialized and bundle files

asset assetbundle bundle debug extract project resource ripper source unity unity3d unpack viewer

Last synced: 15 May 2025

https://github.com/catchthetornado/text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown

anonymization api extract json llm ocr ocr-python pdf pii

Last synced: 14 May 2025

https://github.com/dompdf/php-font-lib

A library to read, parse, export and make subsets of different types of font files.

extract font font-files php truetype ttf woff

Last synced: 08 Apr 2026

https://github.com/extractus/article-extractor

Extracts the main article from a given URL with Node.js

article article-extractor article-parser crawler extract nodejs readability scraper

Last synced: 27 Apr 2025

https://github.com/activescott/lessmsi

A tool to view and extract the contents of a Windows Installer (.msi) file.

c-sharp chocolatey extract extract-files install install-script msi windows

Last synced: 20 Feb 2026

https://github.com/jonathanlink/pdflayouttextstripper

Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).

data-extraction extract java layout pdf pdfbox text

Last synced: 15 May 2025

https://github.com/JonathanLink/PDFLayoutTextStripper

Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).

data-extraction extract java layout pdf pdfbox text

Last synced: 15 Mar 2025

https://github.com/wix-incubator/vscode-glean

The extension provides refactoring tools for your React codebase

clean-code extract jsx react refactoring vscode vscode-extension

Last synced: 10 Oct 2025

https://github.com/camelot-dev/excalibur

A web interface to extract tabular data from PDFs

extract for-humans pdf table

Last synced: 15 Mar 2025

https://github.com/kevva/download

Download and extract files

async decompress download extract http nodejs promise stream

Last synced: 14 May 2025

https://github.com/laktak/extrakto

extrakto for tmux - quickly select, copy/insert/complete text without a mouse

autocomplete clipboard complete completion copy-paste extract tmux

Last synced: 14 May 2025

https://github.com/masterscrat/chatistics

💬 Python scripts to parse Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames.

cloud extract facebook-messenger google-hangouts hangouts-logs histogram parse parsers plot takeout telegram telegram-api whatsapp whatsapp-parser wordcloud

Last synced: 12 Apr 2025

https://github.com/MasterScrat/Chatistics

💬 Python scripts to parse Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames.

cloud extract facebook-messenger google-hangouts hangouts-logs histogram parse parsers plot takeout telegram telegram-api whatsapp whatsapp-parser wordcloud

Last synced: 07 Apr 2025

https://github.com/exyte/ReadabilityKit

Preview extractor for news, articles and full-texts in Swift

extract preview swift swift-package-manager

Last synced: 06 Aug 2025

https://github.com/xboxdev/extract-xiso

Xbox ISO Creation/Extraction utility. Imported from SourceForge.

backup create disc dvd extract game iso xbox xgd xiso

Last synced: 15 May 2025

https://github.com/pgilad/leasot

Parse and output TODOs and FIXMEs from comments in your files

automation comments extract fixme hacktoberfest javascript parse productivity todo

Last synced: 11 Apr 2025

https://github.com/slingdata-io/sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.

elt etl extract load

Last synced: 06 Mar 2026

https://github.com/op-engineering/link-preview-js

⛓ Extract web links information: title, description, images, videos, etc. [via OpenGraph], runs on mobiles and node.

chrome cors extract extract-information firefox http javascript js-library link nodejs parsing react-native safari typescript

Last synced: 15 Jan 2026

https://github.com/OP-Engineering/link-preview-js

⛓ Extract web links information: title, description, images, videos, etc. [via OpenGraph], runs on mobiles and node.

chrome cors extract extract-information firefox http javascript js-library link nodejs parsing react-native safari typescript

Last synced: 26 Mar 2025

https://github.com/linzaer/face-track-detect-extract

💎 Detect, track and extract the optimal face among multiple target faces (excludes side faces and selects the optimal face).

detection extract face kalman-tracking mtcnn tensorflow tracking

Last synced: 11 Jun 2025

https://github.com/ne-lexa/php-zip

PhpZip is a php-library for extended work with ZIP-archives.

archive extract php php-library unzip winzip zip zipalign ziparchive

Last synced: 14 May 2025

https://github.com/xvoland/Extract

Bash/Zsh function to extract: .zip, .rar, .bz2, .gz, .zlib, .tar, .tbz2, .tgz, .Z, .7z, .xz, .exe, .tar.bz2, .tar.gz, .tar.xz, etc.

archive archives arj bash bash-script command-line command-line-tool console-tool extract shell-script shell-scripts tar tgz utilities utility-scripts utils zlib zsh zsh-plugins zshrc

Last synced: 09 May 2025

https://github.com/xvoland/extract

Bash/Zsh function to extract: .zip, .rar, .bz2, .gz, .zlib, .tar, .tbz2, .tgz, .Z, .7z, .xz, .exe, .tar.bz2, .tar.gz, .tar.xz, etc.

archive archives arj bash bash-script command-line command-line-tool console-tool extract shell-script shell-scripts tar tgz utilities utility-scripts utils zlib zsh zsh-plugins zshrc

Last synced: 15 May 2025

https://github.com/kevva/decompress

Extracting archives made easy

decompress extract nodejs tar targz zip

Last synced: 15 May 2025

https://github.com/alexanderpro/windowtextextractor

WindowTextExtractor lets you get text from any window of an operating system, including asterisk-masked passwords

asterisk asterix bar-codes extract hide ocr password password-cracker password-recovery spy text transparent viewer window window-screenshot window-stream

Last synced: 16 May 2025

https://github.com/tufanbarisyildirim/php-apk-parser

Read basic info about an application from .apk file.

android apk-parser extract parser php

Last synced: 25 Aug 2025

https://github.com/peerigon/extract-loader

webpack loader to extract HTML and CSS from the bundle

extract extract-text-webpack-plugin mini-css-extract-plugin webpack webpack-loader

Last synced: 08 Apr 2025

https://github.com/alienator88/viz

Capture text/QR Codes/Barcodes/Colors from screen snippets

barcode color colorpicker extract macos picker scanner swiftui text

Last synced: 12 Apr 2025

https://github.com/opensemanticsearch/open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database

annotation documents elasticsearch enrichment etl extract extract-information extract-text extractor ingest ingestion-pipeline ingests-documents named-entity-recognition nlp ocr pdf python rdf solr solr-dataimporter

Last synced: 06 Apr 2025

https://github.com/alienator88/Viz

Capture text/QR Codes/Barcodes from screen snippets

barcode extract macos scanner swiftui text

Last synced: 02 Mar 2025

https://github.com/Donatello-za/rake-php-plus

A keyword and phrase extraction library based on the Rapid Automatic Keyword Extraction algorithm (RAKE).

extract keyword language php phrases stopwords

Last synced: 04 Apr 2025

https://github.com/lipoja/URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text based on locating TLDs.

extract extractor hacktoberfest urls

Last synced: 22 Mar 2025

https://github.com/macmade/dyld_cache_extract

A macOS utility to extract dynamic libraries from the dyld_shared_cache of macOS and iOS.

cache dyld dylib extract extract-dynamic-libraries ios macho macos macos-utility reverse-engineering

Last synced: 13 Apr 2025

https://github.com/jsverse/transloco-keys-manager

🦄 The Key to a Better Translation Experience

angular cli extract i18n translate transloco

Last synced: 15 May 2025

https://github.com/mholt/archives

Cross-platform library to create & extract archives, compress & decompress files, and walk virtual file systems across various formats

7zip archives brotli bzip2 compression extract fs go golang gzip lz4 lzip rar snappy streams tar xz zip zlib zstandard

Last synced: 04 Apr 2025

https://github.com/benwiley4000/gif-frames

🖼 Extract frames from an animated GIF with pure JS

extract frames gif gif-animation images pure-javascript

Last synced: 05 Apr 2025

https://github.com/chelh/VBASync

Cross-platform tool to synchronize macros from an Office VBA-enabled file with a version-controlled folder

cross-platform diff excel extract linux ms-office outlook powerpoint vba version-control windows word

Last synced: 05 May 2025

https://github.com/redcode-labs/sammler

A tool to extract useful data from documents

extract golang regex sammler

Last synced: 05 Apr 2025

https://github.com/ltrzesniewski/pcre-net

PCRE.NET - Perl Compatible Regular Expressions for .NET

c-sharp extract pcre regex regular-expression

Last synced: 05 Apr 2025

https://github.com/xlmnxp/extractify.zip

Extract and Explore compressed files online and securely

explorer extract nuxt preview pwa unzip webassembly zip

Last synced: 05 Apr 2025

https://github.com/lukeed/gittar

:guitar: Download and/or Extract git repositories (GitHub, GitLab, BitBucket). Cross-platform and Offline-first!

archive bitbucket download extract git github gitlab offline offline-first repo repository scaffold tarball

Last synced: 21 Jul 2025

https://github.com/jlu5/icoextract

Extract icons from Windows PE files (.exe/.dll)

extract ico icon icons windows

Last synced: 07 Apr 2025

https://github.com/calccrypto/tar

A simple tar implementation in C

c commandline-interface extract linux tar

Last synced: 12 Apr 2025

https://github.com/bpolaszek/bentools-etl

PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.

callable etl export extract extractor import input invoke load loader loop output pattern php transform transformer

Last synced: 06 Apr 2025

https://github.com/michelecotrufo/pdf2doi

A Python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a PDF file.

arxiv arxiv-identifiers bibtex bibtex-entry doi extract extract-doi identifiers metadata pdf pdf-text pypdf2 python

Last synced: 16 May 2025

https://github.com/html2rss/html2rss

📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.

atom-feed extract feed feed-configs html html2rss json rss rss-aggregator rss-bridge rss-builder rss-feed rss-feed-scraper rss-generator ruby scrape scraper scraping scraping-websites yahoo-pipes

Last synced: 14 Mar 2025

https://github.com/splitbrain/php-archive

Pure-PHP implementation to read and write TAR and ZIP archives

compression extract php php-library tar zip

Last synced: 15 May 2025

https://github.com/whoiskatrin/financial-statement-pdf-extractor

Python script to extract as much structured information as possible from annual/quarterly reports.

balance-sheet cash-flow cash-flow-statement data-processing extract financial-analysis financial-statements pdf quarterly-reports

Last synced: 04 Apr 2025

https://github.com/vuedoc/parser

Generate JSON documentation for a Vue SFC component. Contribute: https://gitlab.com/vuedoc/parser#contribute

doc extract parse parser vue vuedoc

Last synced: 02 Aug 2025

https://github.com/jessielw/hdr-multi-tool

A graphical user interface for parsing HDR10+ and Dolby Vision

dolby dolbyvision electron extract gui hdr10 hdr10plus json modern parser queue rpu tool vision windows

Last synced: 19 Apr 2025

https://github.com/tautcony/chaptertool

A simple tool for extracting and processing video chapters

blu-ray bluray chapter dvd extract video

Last synced: 24 Apr 2025

https://github.com/tautcony/ChapterTool

A simple tool for extracting and processing video chapters

blu-ray bluray chapter dvd extract video

Last synced: 30 Mar 2025

https://github.com/olehkulykov/plzmasdk

PLzmaSDK is (Portable, Patched, Package, cross-P-latform) Lzma SDK.

7zip c cocoapods compress compression cpp extract js lzma lzma-sdk lzma2 multi-volume multi-volume-archives plzmasdk swift tar tarball xz

Last synced: 04 Apr 2026

https://github.com/zerkman/zzlib

zlib-compressed file depacking library in Lua

deflate extract gzip lua wtfpl zip zlib

Last synced: 19 Jan 2026

https://github.com/droe/acefile

read/test/extract ACE 1.0 and 2.0 archives in pure python

ace archiver-ace extract pure-python python python-library python3

Last synced: 02 Sep 2025

https://github.com/jessielw/HDR-Multi-Tool

A graphical user interface for parsing HDR10+ and Dolby Vision

dolby dolbyvision electron extract gui hdr10 hdr10plus json modern parser queue rpu tool vision windows

Last synced: 08 Jul 2025

https://github.com/fabiospampinato/enex-dump

Dump the content of .enex files, preserving attachments and some metadata, and optionally converting notes to Markdown.

dump enex evernote extract markdown notes

Last synced: 23 Mar 2025

https://github.com/OlehKulykov/PLzmaSDK

PLzmaSDK is (Portable, Patched, Package, cross-P-latform) Lzma SDK.

7zip c cocoapods compress compression cpp extract js lzma lzma-sdk lzma2 multi-volume multi-volume-archives plzmasdk swift tar tarball xz

Last synced: 28 Mar 2025

https://github.com/nmapx/revolut-stocks-list

Extract Revolut stocks list from the list screenshot(s).

extract image list ocr revolut screenshot stocks tesseract

Last synced: 21 Jul 2025

https://github.com/d4vinci/chrome-extractor

Python script that extracts all saved passwords from your Google Chrome database (Windows only)

chrome chrome-extractor chrome-passwords extract google-chrome-database python-script

Last synced: 11 Apr 2025

https://github.com/mayswind/simpleofficereader

A simple office file reader that can extract content and summary information from .doc, .docx, .ppt, and .pptx files without Microsoft Office or interop.

content extract laola office ole parser reader summary

Last synced: 10 Aug 2025

https://github.com/scrapehero/yellowpages-scraper

Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.

business-directory extract html lxml parsing python scraper web-scraper yellow-pages yellow-pages-scraper

Last synced: 04 Apr 2025

https://github.com/michelecotrufo/pdf2bib

A Python library/command-line tool to quickly and automatically generate BibTeX data starting from the PDF file of a scientific publication.

arxiv bibtex bibtex-entry bibtex-parser doi extract pdf pdf-files python

Last synced: 29 Oct 2025

https://github.com/gajus/extract-email-address

Extracts email addresses from arbitrary text input.

email extract regex

Last synced: 04 Apr 2025