Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jeffgrunewald/stargate
An Apache Pulsar client written in Elixir
data-processing elixir pulsar-client
Last synced: 29 Jun 2024
![](https://github.com/jeffgrunewald.png)
https://github.com/AlirezaTheH/perke
A keyphrase extractor for Persian
data-mining data-processing information-retrieval keyphrase keyphrase-extraction keyphrase-extractor keyword keyword-extraction keyword-extractor machine-learning ml natural-language-processing nlp persian persian-language python text-mining text-processing unsupervised-learning
Last synced: 27 Jun 2024
![](https://github.com/AlirezaTheH.png)
https://github.com/jweinst1/Wind
The Flow-based Programming Language
compiler data-processing flow-based-programming programming-language reactive-programming
Last synced: 26 Jun 2024
![](https://github.com/jweinst1.png)
https://github.com/streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
apache-pulsar apache-spark batch-processing data-processing data-science flink spark spark-sql stream-processing structured-streaming
Last synced: 26 Jun 2024
![](https://github.com/streamnative.png)
https://github.com/streamnative/pulsar-flink
Elastic data processing with Apache Pulsar and Apache Flink
apache-flink apache-pulsar batch-processing catalog data-processing flink flink-connector flink-stream-processing pulsar schema schema-registry sql stream-processing
Last synced: 26 Jun 2024
![](https://github.com/streamnative.png)
https://github.com/jqnpm/jqnpm
A package manager built for the command-line JSON processor jq.
command-line-tool data data-processing jq json package-manager
Last synced: 25 Jun 2024
![](https://github.com/jqnpm.png)
https://github.com/markus-wa/cq
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
cli clojure command-line csv data-processing data-transformation edn hacktoberfest json msgpack transformation xml yaml
Last synced: 24 Jun 2024
![](https://github.com/markus-wa.png)
https://github.com/SebKrantz/collapse
Advanced and Fast Data Transformation in R
cran data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data r rstats scientific-computing statistics time-series weighted weights
Last synced: 21 Jun 2024
![](https://github.com/SebKrantz.png)
https://github.com/GoogleCloudPlatform/DataflowJavaSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
big-data data-analysis data-mining data-processing data-science google-cloud-dataflow
Last synced: 20 Jun 2024
![](https://github.com/GoogleCloudPlatform.png)
https://github.com/wq/itertable
⇔ IterTable is a Pythonic API for iterating through tabular data formats, including CSV, XLSX, XML, and JSON.
csv data-processing excel export import iterable json openpyxl pandas pythonic spreadsheet tabular-data xml
Last synced: 20 Jun 2024
![](https://github.com/wq.png)
https://github.com/M4t1ss/parallel-corpora-tools
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
cleaning corpora corpus-tools data-processing data-science filtering language language-processing machine machine-translation natural-language natural-language-processing neural neural-machine-translation nlp nmt translation
Last synced: 20 Jun 2024
![](https://github.com/M4t1ss.png)
https://github.com/zakarialaoui10/ZikoMatrix
Arduino library for creating and manipulating matrices of arbitrary size and data type. The library provides a Matrix class that can be used to create matrices, perform basic matrix operations
arduino cpp data-processing esp32 esp8266 hardware library morocco std
Last synced: 17 Jun 2024
![](https://github.com/zakarialaoui10.png)
https://github.com/RemiRigal/DatasetExplorer
A web tool for local dataset browsing and processing developed using the Flask + Angular stack.
ai angular data-processing data-science data-visualization dataset dataset-analysis docker docker-compose flask web-application
Last synced: 16 Jun 2024
![](https://github.com/RemiRigal.png)
https://github.com/Siteimprove/alfa
♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
a11y accessibility act aria customer-facing data-processing earl horizon2020 json-ld monorepo sarif testing typescript wcag
Last synced: 12 Jun 2024
![](https://github.com/Siteimprove.png)
https://github.com/NVIDIA/DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
audio-processing data-augmentation data-processing deep-learning fast-data-pipeline gpu gpu-tensorflow image-augmentation image-processing machine-learning mxnet neural-network paddle python pytorch
Last synced: 12 Jun 2024
![](https://github.com/NVIDIA.png)
https://github.com/flow-php/etl
PHP - ETL (Extract Transform Load) data processing library
data-engineering data-processing etl flow-php
Last synced: 11 Jun 2024
![](https://github.com/flow-php.png)
https://github.com/vh-d/Rflow
Rflow is a general-purpose workflow management framework for R
data-processing database dataflow etl etl-framework r reproducibility rlang rstats rstats-package workflow-management
Last synced: 10 Jun 2024
![](https://github.com/vh-d.png)
https://github.com/LukasLoeffler/data-graph
Flow and event based data processing
data-processing etl etl-pipeline flow-based-programming graph graphical-user-interface low-code no-code
Last synced: 09 Jun 2024
![](https://github.com/LukasLoeffler.png)
https://github.com/ChenghaoMou/text-dedup
All-in-one text de-duplication
data-processing de-duplication nlp text-processing
Last synced: 07 Jun 2024
![](https://github.com/ChenghaoMou.png)
https://github.com/marksweiss/sofine
Lightweight framework for creating data-collecting plugins and chaining calls to them from CLI, REST or Python to return unified data sets.
cross-language data-cleaning data-processing data-retrieval json python
Last synced: 03 Jun 2024
![](https://github.com/marksweiss.png)
https://github.com/AvinashSingh786/RegSmart
Windows Registry Analysis Tool
big-data data-processing forensic-analysis parsing windows-registry
Last synced: 03 Jun 2024
![](https://github.com/AvinashSingh786.png)
https://github.com/benibela/xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery
Last synced: 03 Jun 2024
![](https://github.com/benibela.png)
https://github.com/scramjetorg/scramjet
Public tracker for Scramjet Cloud Platform, a platform that brings data from many environments together.
data-processing data-space data-stream edge-computing event-stream javascript python raspberry-pi reactive-programming transformations virtual-data-environment
Last synced: 01 Jun 2024
![](https://github.com/scramjetorg.png)
https://github.com/onceupon/Bash-Oneliner
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
bash data-processing grep hardware linux linux-administration one-liners oneliner-commands shell shell-oneliner system terminal variables xargs xwindow
Last synced: 28 May 2024
![](https://github.com/onceupon.png)
https://github.com/constellation-rs/amadeus
Harmonious distributed data analysis in Rust.
data-analysis data-processing distributed-computing parallel-computing rust stream-processing
Last synced: 28 May 2024
![](https://github.com/constellation-rs.png)
https://github.com/numaproj/numaflow
Kubernetes-native platform to run massively parallel data/streaming jobs
data-processing hacktoberfest k8s kubernetes map-reduce pipeline stream-processing
Last synced: 24 May 2024
![](https://github.com/numaproj.png)
https://kousun12.github.io/eternal
👾~ music, eternal ~ 👾
3d-graphics art creative-coding data-processing glsl midi music node-based webaudio webgl
Last synced: 23 May 2024
![](https://github.com/kousun12.png)
https://github.com/deverte/awesome-science
A curated list of awesome scientific software, libraries and services.
academic awesome-list data-processing data-storage experiment literature-management moocs research science scientific
Last synced: 19 May 2024
![](https://github.com/deverte.png)
https://github.com/brexhq/substation
Substation is a cloud-native, event-driven data pipeline toolkit built for security teams.
aws data-engineering data-processing etl go security serverless
Last synced: 16 May 2024
![](https://github.com/brexhq.png)
https://github.com/remotesensinginfo/rsgislib
Remote Sensing and GIS Software Library; python module tools for processing spatial data.
classification data data-analysis data-processing data-science earth earth-observation gis machine-learning numpy observation python remote remote-sensing rsgislib sensing spatial spatial-analysis spatial-data spatial-data-analysis
Last synced: 15 May 2024
![](https://github.com/remotesensinginfo.png)
https://github.com/johnhany/awesome-list
A list of useful stuff in Machine Learning, Computer Graphics, Software Development, ...
algorithm awesome-list causal-inference computer-graphics computer-vision data-processing data-visualization deep-learning desktop-development devops graph linear-algebra machine-learning mobile-development natural-language-processing recommender-system reinforcement-learning statistics web-development
Last synced: 14 May 2024
![](https://github.com/johnhany.png)
https://github.com/adilkhash/luigi-telegram
Luigi Tasks status notifications to Telegram
data-pipeline data-processing etl luigi notification-plugin
Last synced: 13 May 2024
![](https://github.com/adilkhash.png)
https://github.com/AtomGraph/Processor
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
appengine crud data-driven data-processing declarative docker-image framework generic hypermedia knowledge-graph ldt linked-data linked-data-templates ontology-driven-development rdf rest semantic-web server sparql
Last synced: 12 May 2024
![](https://github.com/AtomGraph.png)
https://github.com/Yord/pxi
🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
csv data-processing deserializer dsv json marshaller parser pixie pxi serializer ssv tsv
Last synced: 10 May 2024
![](https://github.com/Yord.png)
https://github.com/senbox-org/snap-engine
ESA Earth Observation Toolbox and Java Development Platform
data-processing data-visualization earth-observation eo linux macos raster-data remote-sensing windows
Last synced: 10 May 2024
![](https://github.com/senbox-org.png)
https://github.com/infoslack/awesome-kafka
A list about Apache Kafka
apache-kafka apache-spark data-pipeline data-processing infrastructure kafka kafka-streams stream-processing streaming-data
Last synced: 07 May 2024
![](https://github.com/infoslack.png)
https://github.com/senbox-org/snap-desktop
Desktop GUI for SNAP based on NetBeans Platform
data-analysis data-processing data-visualization desktop-application earth-observation eo linux macos remote-sensing windows
Last synced: 07 May 2024
![](https://github.com/senbox-org.png)
https://github.com/johnkerl/miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
command-line command-line-tools csv csv-format data-cleaning data-processing data-reduction data-regression devops devops-tools json json-data miller statistical-analysis statistics streaming-algorithms streaming-data tabular-data tsv unix-toolkit
Last synced: 07 May 2024
![](https://github.com/johnkerl.png)
https://github.com/kousun12/eternal
👾~ music, eternal ~ 👾
3d-graphics art creative-coding data-processing glsl midi music node-based webaudio webgl
Last synced: 06 May 2024
![](https://github.com/kousun12.png)
https://github.com/unidentifieddeveloper/blaze
A blazing fast exporter for your Elasticsearch data.
data-dump data-export data-processing devops devops-tools elasticsearch libcurl rapidjson
Last synced: 04 May 2024
![](https://github.com/unidentifieddeveloper.png)
https://github.com/dashbitco/broadway
Concurrent and multi-stage data ingestion and data processing with Elixir
broadway concurrent data-ingestion data-processing elixir genstage
Last synced: 01 May 2024
![](https://github.com/dashbitco.png)
https://github.com/tomwright/dasel
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
cli config configuration data-processing data-structures data-wrangling devops-tools go golang json json-processing parser query selector toml update xml yaml yaml-processor
Last synced: 29 Apr 2024
![](https://github.com/TomWright.png)
https://github.com/hstreamdb/hstream
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
data-processing database distributed-database distributed-systems financial-analysis haskell hstreamdb iot iot-database kafka materialized-view real-time realtime-database scale sql stream-processing streaming streaming-data streaming-database
Last synced: 29 Apr 2024
![](https://github.com/hstreamdb.png)
https://github.com/unionai-oss/pandera
A light-weight, flexible, and expressive statistical data testing library
assertions data-assertions data-check data-cleaning data-processing data-validation data-verification dataframe-schema dataframes hypothesis-testing pandas pandas-dataframe pandas-validation pandas-validator schema testing testing-tools validation
Last synced: 28 Apr 2024
![](https://github.com/unionai-oss.png)
https://github.com/svenkreiss/pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
apache-spark data-processing data-science python
Last synced: 28 Apr 2024
![](https://github.com/svenkreiss.png)
https://github.com/alttch/rapidtables
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
data-processing dictionary-data library python python3 text-formatting
Last synced: 27 Apr 2024
![](https://github.com/alttch.png)
https://github.com/microsoft/DialoGPT
Large-scale pretraining for dialogue
data-processing dialogpt dialogue gpt-2 machine-learning pytorch text-data text-generation transformer
Last synced: 27 Apr 2024
![](https://github.com/microsoft.png)
https://github.com/ColasGael/Machine-Learning-for-Solar-Energy-Prediction
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning
data-processing machine-learning matlab neural-network python tensorflow
Last synced: 22 Apr 2024
![](https://github.com/ColasGael.png)
https://onceupon.github.io/Bash-Oneliner/
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
bash data-processing grep hardware linux linux-administration one-liners oneliner-commands shell shell-oneliner system terminal variables xargs xwindow
Last synced: 19 Apr 2024
![](https://github.com/onceupon.png)
https://github.com/msamogh/nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
data-cleaning data-pipeline data-preprocessing data-processing machine-learning preprocessing pytorch torch
Last synced: 19 Apr 2024
![](https://github.com/msamogh.png)
https://github.com/asyml/texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet
Last synced: 19 Apr 2024
![](https://github.com/asyml.png)
https://github.com/GoogleCloudPlatform/data-science-on-gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning
Last synced: 17 Apr 2024
![](https://github.com/GoogleCloudPlatform.png)
https://github.com/LukasHedegaard/datasetops
Fluent dataset operations, compatible with your favorite libraries
data-cleaning data-munging data-processing data-science data-wrangling dataset dataset-combinations deep-learning multiple-datasets pytorch tensorflow
Last synced: 16 Apr 2024
![](https://github.com/LukasHedegaard.png)
https://github.com/lgrcia/prairie
A visual programming environment for Python
data-processing python scientific-visualization visual-programming
Last synced: 15 Apr 2024
![](https://github.com/lgrcia.png)
https://github.com/getstrm/pace
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery, with definitions imported from Collibra, Datahub, ODD and the like.
bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake
Last synced: 11 Apr 2024
![](https://github.com/getstrm.png)
https://github.com/asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python tensorflow texar text-data text-generation xlnet
Last synced: 11 Apr 2024
![](https://github.com/asyml.png)
https://github.com/m-clark/data-processing-and-visualization
This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.
data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops
Last synced: 10 Apr 2024
![](https://github.com/m-clark.png)
https://github.com/jpkli/p4
P4: Portable Parallel Processing Pipeline
data-processing gpu visualizations
Last synced: 08 Apr 2024
![](https://github.com/jpkli.png)
https://github.com/luckylittle/blinkist-m4a-downloader
Grabs all of the audio files from all of the Blinkist books
audiobooks blinkist books crawler data-archiving data-mining data-processing go golang scraper spider
Last synced: 05 Apr 2024
![](https://github.com/luckylittle.png)
https://github.com/iTechArt/convtools-ita
convtools is a python library to declaratively define conversions for processing collections, doing complex aggregations and joins.
code-generation conversions data-preparation data-preprocessing data-processing functional-programming python transformations
Last synced: 01 Apr 2024
![](https://github.com/iTechArt.png)
https://github.com/zazuko/barnard59
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web
Last synced: 01 Apr 2024
![](https://github.com/zazuko.png)
https://github.com/mech-lang/mech
🦾 Main repository for the Mech programming language. Start here!
compiler data-processing ide language live-programming programming-environment programming-language reactive-programming robotics
Last synced: 28 Mar 2024
![](https://github.com/mech-lang.png)
https://github.com/microsoft/GODEL
Large-scale pretrained models for goal-directed dialog
conversational-ai data-processing dialogpt dialogue dialogue-systems grounded-generation language-grounding language-model machine-learning pretrained-model pytorch text-data text-generation transformer transformers
Last synced: 28 Mar 2024
![](https://github.com/microsoft.png)
https://github.com/allenai/dolma
Data and tools for generating and inspecting OLMo pre-training data.
data-processing large-language-models llm machile-learning nlp
Last synced: 24 Mar 2024
![](https://github.com/allenai.png)
https://github.com/bytewax/bytewax
Python Stream Processing
data-engineering data-processing data-science dataflow machine-learning python rust stream-processing streaming-data
Last synced: 23 Mar 2024
![](https://github.com/bytewax.png)
https://github.com/python-bonobo/bonobo
Extract Transform Load for Python 3.5+
automation bonobo data-processing extract-transform-load parallelization python3
Last synced: 23 Mar 2024
![](https://github.com/python-bonobo.png)
https://github.com/eyecuvision/bumblebee
Video Processing API
computer-vision data-processing numpy opencv torch video-processing-pipeline
Last synced: 18 Mar 2024
![](https://github.com/eyecuvision.png)
https://github.com/asavinov/prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow
Last synced: 18 Mar 2024
![](https://github.com/asavinov.png)
https://github.com/machiela-lab/UKBBcleanR
Prepare electronic medical record data from the UK Biobank for time-to-event analyses
data-processing electronic-medical-records r r-package rstats rstats-package time-to-event uk-biobank
Last synced: 17 Mar 2024
![](https://github.com/machiela-lab.png)