Projects in Awesome Lists tagged with extract
A curated list of projects in awesome lists tagged with extract.
https://github.com/yaofanguk/video-subtitle-extractor
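A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.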
deep-learning extract hardsub ocr ripper srt subrip subtitles
Last synced: 13 May 2025
https://github.com/dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
data data-engineering data-lake data-loading data-warehouse elt extract load python transform
Last synced: 01 Apr 2026
https://github.com/YaoFANGUK/video-subtitle-extractor
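A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.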
deep-learning extract hardsub ocr ripper srt subrip subtitles
Last synced: 24 Mar 2025
https://github.com/scinfu/swiftsoup
SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)
dom extract html html-document parse selector swift swiftsoup
Last synced: 01 Apr 2026
https://github.com/scinfu/SwiftSoup
SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)
dom extract html html-document parse selector swift swiftsoup
Last synced: 24 Mar 2025
https://github.com/torakiki/pdfsam
PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages
combine extract java javafx merge merge-pdf merger pdf pdf-combiner pdf-extractor pdf-manipulation pdf-merge pdf-mix pdf-rotate pdf-split rotate split split-pdf splitter
Last synced: 13 May 2025
https://github.com/atlanhq/camelot
Camelot: PDF Table Extraction for Humans
Last synced: 14 Jan 2026
https://github.com/catchthetornado/text-extract-api
Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
anonymization api extract json llm ocr ocr-python pdf pii
Last synced: 14 May 2025
https://github.com/retroplasma/earth-reverse-engineering
Reversing Google's 3D satellite mode
3d-models client exporter extract gis google-earth google-maps reverse-engineering
Last synced: 28 Sep 2025
https://github.com/donjayamanne/pythonvscode
This extension is now maintained in the Microsoft fork.
autopep8 editor extract intellisense jupyter linter pydocstyle pylint python python-3-6 python-terminal refactorings rename-refactorings scientific sort-imports terminal testing typescript visual-studio-code
Last synced: 14 May 2025
https://github.com/DonJayamanne/pythonVSCode
This extension is now maintained in the Microsoft fork.
autopep8 editor extract intellisense jupyter linter pydocstyle pylint python python-3-6 python-terminal refactorings rename-refactorings scientific sort-imports terminal testing typescript visual-studio-code
Last synced: 02 Apr 2025
https://github.com/dompdf/php-font-lib
A library to read, parse, export and make subsets of different types of font files.
extract font font-files php truetype ttf woff
Last synced: 08 Apr 2026
https://github.com/extractus/article-extractor
To extract main article from given URL with Node.js
article article-extractor article-parser crawler extract nodejs readability scraper
Last synced: 27 Apr 2025
https://github.com/activescott/lessmsi
A tool to view and extract the contents of a Windows Installer (.msi) file.
c-sharp chocolatey extract extract-files install install-script msi windows
Last synced: 20 Feb 2026
https://github.com/jonathanlink/pdflayouttextstripper
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).
data-extraction extract java layout pdf pdfbox text
Last synced: 15 May 2025
https://github.com/JonathanLink/PDFLayoutTextStripper
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).
data-extraction extract java layout pdf pdfbox text
Last synced: 15 Mar 2025
https://github.com/j4k0xb/webcrack
Deobfuscate obfuscator.io, unminify and unpack bundled javascript
ast browserify bundle debundle deobfuscation deobfuscator extract javascript javascript-obfuscator reverse-engineering unminify unpack webpack
Last synced: 14 May 2025
https://github.com/wix-incubator/vscode-glean
The extension provides refactoring tools for your React codebase
clean-code extract jsx react refactoring vscode vscode-extension
Last synced: 10 Oct 2025
https://github.com/camelot-dev/excalibur
A web interface to extract tabular data from PDFs
Last synced: 15 Mar 2025
https://github.com/kevva/download
Download and extract files
async decompress download extract http nodejs promise stream
Last synced: 14 May 2025
https://github.com/laktak/extrakto
extrakto for tmux - quickly select, copy/insert/complete text without a mouse
autocomplete clipboard complete completion copy-paste extract tmux
Last synced: 14 May 2025
https://github.com/masterscrat/chatistics
💬 Python scripts to parse Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames.
cloud extract facebook-messenger google-hangouts hangouts-logs histogram parse parsers plot takeout telegram telegram-api whatsapp whatsapp-parser wordcloud
Last synced: 12 Apr 2025
https://github.com/MasterScrat/Chatistics
💬 Python scripts to parse Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames.
cloud extract facebook-messenger google-hangouts hangouts-logs histogram parse parsers plot takeout telegram telegram-api whatsapp whatsapp-parser wordcloud
Last synced: 07 Apr 2025
https://github.com/omkarpathak/pyresparser
A simple resume parser used for extracting information from resumes
extract extracting-data machine-learning natural-language-processing nlp parser parsers pyresparser python python3 resume resume-parser resumes skills
Last synced: 15 May 2025
https://github.com/exyte/ReadabilityKit
Preview extractor for news, articles and full-texts in Swift
extract preview swift swift-package-manager
Last synced: 06 Aug 2025
https://github.com/OmkarPathak/pyresparser
A simple resume parser used for extracting information from resumes
extract extracting-data machine-learning natural-language-processing nlp parser parsers pyresparser python python3 resume resume-parser resumes skills
Last synced: 29 Apr 2025
https://github.com/pgilad/leasot
Parse and output TODOs and FIXMEs from comments in your files
automation comments extract fixme hacktoberfest javascript parse productivity todo
Last synced: 11 Apr 2025
https://github.com/slingdata-io/sling-cli
Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
Last synced: 06 Mar 2026
https://github.com/op-engineering/link-preview-js
⛓ Extract web links information: title, description, images, videos, etc. [via OpenGraph], runs on mobiles and node.
chrome cors extract extract-information firefox http javascript js-library link nodejs parsing react-native safari typescript
Last synced: 15 Jan 2026
https://github.com/003random/getjs
A tool to quickly get all JavaScript sources/files
bugbounty extract files go golang goquery hacking hacktoberfest javascript parser pentesting recon reconnaissance urls
Last synced: 13 May 2025
https://github.com/003random/getJS
A tool to quickly get all JavaScript sources/files
bugbounty extract files go golang goquery hacking hacktoberfest javascript parser pentesting recon reconnaissance urls
Last synced: 17 Mar 2025
https://github.com/paillave/etl.net
Mass processing data with a complete ETL for .net developers
business-intelligence csv csv-parser csv-reader csv-writer dotnet dotnet-core dotnet-standard entity-framework etl etl-job extract load sftp transform
Last synced: 14 May 2025
https://github.com/paillave/Etl.Net
Mass processing data with a complete ETL for .net developers
business-intelligence csv csv-parser csv-reader csv-writer dotnet dotnet-core dotnet-standard entity-framework etl etl-job extract load sftp transform
Last synced: 04 May 2025
https://github.com/OP-Engineering/link-preview-js
⛓ Extract web links information: title, description, images, videos, etc. [via OpenGraph], runs on mobiles and node.
chrome cors extract extract-information firefox http javascript js-library link nodejs parsing react-native safari typescript
Last synced: 26 Mar 2025
https://github.com/ICIJ/datashare
A self-hosted search engine for documents.
datashare docker elasticsearch extract investigative-journalism named-entity-recognition text-extraction web-gui
Last synced: 15 Apr 2025
https://github.com/retroplasma/flyover-reverse-engineering
Reversing Apple's 3D satellite mode
3d-models apple-flyover apple-maps extract gis reverse-engineering
Last synced: 06 Apr 2025
https://github.com/linzaer/face-track-detect-extract
💎 Detect, track and extract the optimal face among multiple target faces (excludes side faces and selects the optimal face).
detection extract face kalman-tracking mtcnn tensorflow tracking
Last synced: 11 Jun 2025
https://github.com/icij/datashare
A self-hosted search engine for documents.
datashare docker elasticsearch extract investigative-journalism named-entity-recognition text-extraction web-gui
Last synced: 25 Feb 2026
https://github.com/ne-lexa/php-zip
PhpZip is a php-library for extended work with ZIP-archives.
archive extract php php-library unzip winzip zip zipalign ziparchive
Last synced: 14 May 2025
https://github.com/xvoland/Extract
Bash/Zsh function for extracting: .zip, .rar, .bz2, .gz, .zlib, .tar, .tbz2, .tgz, .Z, .7z, .xz, .exe, .tar.bz2, .tar.gz, .tar.xz, etc.
archive archives arj bash bash-script command-line command-line-tool console-tool extract shell-script shell-scripts tar tgz utilities utility-scripts utils zlib zsh zsh-plugins zshrc
Last synced: 09 May 2025
https://github.com/xvoland/extract
Bash/Zsh function for extracting: .zip, .rar, .bz2, .gz, .zlib, .tar, .tbz2, .tgz, .Z, .7z, .xz, .exe, .tar.bz2, .tar.gz, .tar.xz, etc.
archive archives arj bash bash-script command-line command-line-tool console-tool extract shell-script shell-scripts tar tgz utilities utility-scripts utils zlib zsh zsh-plugins zshrc
Last synced: 15 May 2025
https://github.com/kevva/decompress
Extracting archives made easy
decompress extract nodejs tar targz zip
Last synced: 15 May 2025
https://github.com/alexanderpro/windowtextextractor
WindowTextExtractor allows you to get text from any window of an operating system, including passwords masked with asterisks
asterisk asterix bar-codes extract hide ocr password password-cracker password-recovery spy text transparent viewer window window-screenshot window-stream
Last synced: 16 May 2025
https://github.com/tufanbarisyildirim/php-apk-parser
Read basic info about an application from .apk file.
android apk-parser extract parser php
Last synced: 25 Aug 2025
https://github.com/peerigon/extract-loader
webpack loader to extract HTML and CSS from the bundle
extract extract-text-webpack-plugin mini-css-extract-plugin webpack webpack-loader
Last synced: 08 Apr 2025
https://github.com/kacos2000/MFT_Browser
$MFT directory tree reconstruction & FILE record info
carve carver directory-tree extract file gui gui-application metadata-information mft mft-browser mft-files ntfs powershell record signed winform
Last synced: 29 Apr 2025
https://github.com/kacos2000/mft_browser
$MFT directory tree reconstruction & FILE record info
carve carver directory-tree extract file gui gui-application metadata-information mft mft-browser mft-files ntfs powershell record signed winform
Last synced: 12 Apr 2025
https://github.com/alienator88/viz
Capture text/QR Codes/Barcodes/Colors from screen snippets
barcode color colorpicker extract macos picker scanner swiftui text
Last synced: 12 Apr 2025
https://github.com/opensemanticsearch/open-semantic-etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
annotation documents elasticsearch enrichment etl extract extract-information extract-text extractor ingest ingestion-pipeline ingests-documents named-entity-recognition nlp ocr pdf python rdf solr solr-dataimporter
Last synced: 06 Apr 2025
https://github.com/lipoja/URLExtract
URLExtract is a Python class for collecting (extracting) URLs from given text based on locating TLDs.
extract extractor hacktoberfest urls
Last synced: 22 Mar 2025
https://github.com/macmade/dyld_cache_extract
A macOS utility to extract dynamic libraries from the dyld_shared_cache of macOS and iOS.
cache dyld dylib extract extract-dynamic-libraries ios macho macos macos-utility reverse-engineering
Last synced: 13 Apr 2025
https://github.com/mholt/archives
Cross-platform library to create & extract archives, compress & decompress files, and walk virtual file systems across various formats
7zip archives brotli bzip2 compression extract fs go golang gzip lz4 lzip rar snappy streams tar xz zip zlib zstandard
Last synced: 04 Apr 2025
https://github.com/jgranstrom/sass-extract
Extract structured variables from sass files
css extract extracted-variables javascript node-sass sass sass-extract scss variables webpack
Last synced: 04 Apr 2025
https://github.com/akameco/extract-react-intl-messages
extract react intl messages
extract i18n messages react react-intl react-intl-auto
Last synced: 16 May 2025
https://github.com/bobld/tabula-sharp
Extract tables from PDF files (port of tabula-java)
csharp dotnet extract extract-table extracting-tables extraction extraction-engine netstandard pdf-table-extract pdf-table-extraction pdfparser pdfpig pdfs table table-extraction tabula tabula-java tabula-sharp
Last synced: 15 May 2025
https://github.com/benwiley4000/gif-frames
🖼 Extract frames from an animated GIF with pure JS
extract frames gif gif-animation images pure-javascript
Last synced: 05 Apr 2025
https://github.com/iansan5653/open-mcr
✏️ Exam bubble sheet scorer. Created with OpenCV and Python.
computer-vision education exam-sheets extract machine oer omr open-educational-resources opencv optical-mark-recognition python scanner
Last synced: 10 Apr 2025
https://github.com/sypht-team/sypht-python-client
A Python client for the Sypht API
api-client data-extraction document-capture extract extract-data-from-pdf extract-fields information-extraction invoice invoice-parser pdf-parser python python3 python3-library receipt-capture receipt-reader receipt-scanner receipt-scanning sypht sypht-api sypht-python-client
Last synced: 11 Jul 2025
https://github.com/chelh/VBASync
Cross-platform tool to synchronize macros from an Office VBA-enabled file with a version-controlled folder
cross-platform diff excel extract linux ms-office outlook powerpoint vba version-control windows word
Last synced: 05 May 2025
https://github.com/assafmo/xioc
Extract indicators of compromise from text, including "escaped" ones.
command-line command-line-tool data-mining defang escaping extract extraction indicators-of-compromise ioc iocs regex regexp text-mining text-processing
Last synced: 22 Jun 2025
https://github.com/redcode-labs/sammler
A tool to extract useful data from documents
Last synced: 05 Apr 2025
https://github.com/suntong/cascadia
Go cascadia package command line CSS selector
cascadia command-line command-line-tool css-selector csv-table curl extract html-source html-text tsv web-scraper web-scraping
Last synced: 05 Jul 2025
https://github.com/ltrzesniewski/pcre-net
PCRE.NET - Perl Compatible Regular Expressions for .NET
c-sharp extract pcre regex regular-expression
Last synced: 05 Apr 2025
https://github.com/xlmnxp/extractify.zip
Extract and Explore compressed files online and securely
explorer extract nuxt preview pwa unzip webassembly zip
Last synced: 05 Apr 2025
https://github.com/lukeed/gittar
🎸 Download and/or Extract git repositories (GitHub, GitLab, BitBucket). Cross-platform and Offline-first!
archive bitbucket download extract git github gitlab offline offline-first repo repository scaffold tarball
Last synced: 21 Jul 2025
https://github.com/calccrypto/tar
A simple tar implementation in C
c commandline-interface extract linux tar
Last synced: 12 Apr 2025
https://github.com/michelecotrufo/pdf2doi
A Python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a PDF file.
arxiv arxiv-identifiers bibtex bibtex-entry doi extract extract-doi identifiers metadata pdf pdf-text pypdf2 python
Last synced: 16 May 2025
https://github.com/html2rss/html2rss
📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.
atom-feed extract feed feed-configs html html2rss json rss rss-aggregator rss-bridge rss-builder rss-feed rss-feed-scraper rss-generator ruby scrape scraper scraping scraping-websites yahoo-pipes
Last synced: 14 Mar 2025
https://github.com/splitbrain/php-archive
Pure-PHP implementation to read and write TAR and ZIP archives
compression extract php php-library tar zip
Last synced: 15 May 2025
https://github.com/whoiskatrin/financial-statement-pdf-extractor
Python script to extract as much structured information as possible from annual/quarterly reports.
balance-sheet cash-flow cash-flow-statement data-processing extract financial-analysis financial-statements pdf quarterly-reports
Last synced: 04 Apr 2025
https://github.com/olehkulykov/plzmasdk
PLzmaSDK is (Portable, Patched, Package, cross-P-latform) Lzma SDK.
7zip c cocoapods compress compression cpp extract js lzma lzma-sdk lzma2 multi-volume multi-volume-archives plzmasdk swift tar tarball xz
Last synced: 04 Apr 2026
https://github.com/sypht-team/sypht-java-client
A Java client for the Sypht API
api-client data-extraction document-capture extract extract-data-from-pdf extract-fields information-retrieval information-retrieval-engine invoice invoice-parser java java8 pdf-parser receipt-capture receipt-reader receipt-scanner receipt-scanning sypht sypht-api sypht-java-client
Last synced: 10 Apr 2025
https://github.com/droe/acefile
read/test/extract ACE 1.0 and 2.0 archives in pure python
ace archiver-ace extract pure-python python python-library python3
Last synced: 02 Sep 2025
https://github.com/OlehKulykov/PLzmaSDK
PLzmaSDK is (Portable, Patched, Package, cross-P-latform) Lzma SDK.
7zip c cocoapods compress compression cpp extract js lzma lzma-sdk lzma2 multi-volume multi-volume-archives plzmasdk swift tar tarball xz
Last synced: 28 Mar 2025
https://github.com/nmapx/revolut-stocks-list
Extract Revolut stocks list from the list screenshot(s).
extract image list ocr revolut screenshot stocks tesseract
Last synced: 21 Jul 2025
https://github.com/d4vinci/chrome-extractor
Python script that extracts all saved passwords from your Google Chrome database (Windows only)
chrome chrome-extractor chrome-passwords extract google-chrome-database python-script
Last synced: 11 Apr 2025
https://github.com/scrapehero/yellowpages-scraper
Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.
business-directory extract html lxml parsing python scraper web-scraper yellow-pages yellow-pages-scraper
Last synced: 04 Apr 2025
https://github.com/michelecotrufo/pdf2bib
A Python library/command-line tool to quickly and automatically generate BibTeX data starting from the PDF file of a scientific publication.
arxiv bibtex bibtex-entry bibtex-parser doi extract pdf pdf-files python
Last synced: 29 Oct 2025
https://github.com/gajus/extract-email-address
Extracts email address from an arbitrary text input.
Last synced: 04 Apr 2025