Projects in Awesome Lists tagged with data-processing
A curated list of projects in awesome lists tagged with data-processing.
https://github.com/pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
batch-processing data-analytics data-pipelines data-processing dataflow etl etl-framework iot-analytics kafka machine-learning-algorithms pathway python real-time rust stream-processing streaming time-series-analysis
Last synced: 09 Sep 2025
https://onceupon.github.io/Bash-Oneliner/
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
bash data-processing grep hardware linux linux-administration one-liners oneliner-commands shell shell-oneliner system terminal variables xargs xwindow
Last synced: 16 Nov 2025
https://github.com/onceupon/bash-oneliner
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
bash data-processing grep hardware linux linux-administration one-liners oneliner-commands shell shell-oneliner system terminal variables xargs xwindow
Last synced: 14 May 2025
https://github.com/onceupon/Bash-Oneliner
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
bash data-processing grep hardware linux linux-administration one-liners oneliner-commands shell shell-oneliner system terminal variables xargs xwindow
Last synced: 26 Mar 2025
https://github.com/johnkerl/miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
command-line command-line-tools csv csv-format data-cleaning data-processing data-reduction data-regression devops devops-tools json json-data miller statistical-analysis statistics streaming-algorithms streaming-data tabular-data tsv unix-toolkit
Last synced: 14 May 2025
https://github.com/tomwright/dasel
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
cli config configuration data-processing data-structures data-wrangling devops-tools go golang json json-processing parser query selector toml update xml yaml yaml-processor
Last synced: 26 Dec 2025
https://github.com/TomWright/dasel
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
cli config configuration data-processing data-structures data-wrangling devops-tools go golang json json-processing parser query selector toml update xml yaml yaml-processor
Last synced: 12 Mar 2025
https://github.com/datajuicer/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
data data-analysis data-pipeline data-processing data-science data-visualization foundation-models instruction-tuning large-language-models llm llms multi-modal pre-training synthetic-data
Last synced: 08 Nov 2025
https://github.com/nvidia/dali
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
audio-processing data-augmentation data-processing deep-learning fast-data-pipeline gpu gpu-tensorflow image-augmentation image-processing machine-learning mxnet neural-network paddle python pytorch
Last synced: 13 May 2025
https://github.com/NVIDIA/DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
audio-processing data-augmentation data-processing deep-learning fast-data-pipeline gpu gpu-tensorflow image-augmentation image-processing machine-learning mxnet neural-network paddle python pytorch
Last synced: 15 Mar 2025
https://github.com/deepseek-ai/smallpond
A lightweight data processing framework built on DuckDB and 3FS.
Last synced: 16 Jul 2025
https://github.com/unionai-oss/pandera
A light-weight, flexible, and expressive statistical data testing library
assertions data-assertions data-check data-cleaning data-processing data-validation data-verification dataframe-schema dataframes hypothesis-testing pandas pandas-dataframe pandas-validation pandas-validator schema testing testing-tools validation
Last synced: 12 Dec 2025
https://github.com/dashbitco/broadway
Concurrent and multi-stage data ingestion and data processing with Elixir
broadway concurrent data-ingestion data-processing elixir genstage
Last synced: 14 May 2025
https://github.com/microsoft/DialoGPT
Large-scale pretraining for dialogue
data-processing dialogpt dialogue gpt-2 machine-learning pytorch text-data text-generation transformer
Last synced: 19 Jul 2025
https://github.com/asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python tensorflow texar text-data text-generation xlnet
Last synced: 14 May 2025
https://github.com/microsoft/dialogpt
Large-scale pretraining for dialogue
data-processing dialogpt dialogue gpt-2 machine-learning pytorch text-data text-generation transformer
Last synced: 15 May 2025
https://github.com/numaproj/numaflow
Kubernetes-native platform to run massively parallel data/streaming jobs
data-processing hacktoberfest k8s kubernetes map-reduce pipeline stream-processing
Last synced: 23 Oct 2025
https://github.com/bytewax/bytewax
Python Stream Processing
data-engineering data-processing data-science dataflow machine-learning python rust stream-processing streaming-data
Last synced: 13 May 2025
https://github.com/python-bonobo/bonobo
Extract Transform Load for Python 3.5+
automation bonobo data-processing extract-transform-load parallelization python3
Last synced: 14 May 2025
https://github.com/googlecloudplatform/data-science-on-gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning
Last synced: 14 Apr 2025
https://github.com/allenai/dolma
Data and tools for generating and inspecting OLMo pre-training data.
data-processing large-language-models llm machile-learning nlp
Last synced: 13 Oct 2025
https://github.com/GoogleCloudPlatform/data-science-on-gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning
Last synced: 19 Jul 2025
https://github.com/cocoindex-io/cocoindex
ETL framework to make your data AI-ready, with real-time incremental updates and support for custom logic, like Lego.
ai change-data-capture data data-engineering data-indexing data-infrastructure data-processing dataflow etl help-wanted indexing knowledge-graph llm pipeline python rag real-time rust semantic-search streaming
Last synced: 14 May 2025
https://github.com/NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 29 Jul 2025
https://github.com/nvidia/nemo-curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 14 May 2025
https://github.com/microsoft/godel
Large-scale pretrained models for goal-directed dialog
conversational-ai data-processing dialogpt dialogue dialogue-systems grounded-generation language-grounding language-model machine-learning pretrained-model pytorch text-data text-generation transformer transformers
Last synced: 12 Apr 2025
https://github.com/microsoft/GODEL
Large-scale pretrained models for goal-directed dialog
conversational-ai data-processing dialogpt dialogue dialogue-systems grounded-generation language-grounding language-model machine-learning pretrained-model pytorch text-data text-generation transformer transformers
Last synced: 27 Mar 2025
https://github.com/GoogleCloudPlatform/DataflowJavaSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
big-data data-analysis data-mining data-processing data-science google-cloud-dataflow
Last synced: 01 May 2025
https://github.com/googlecloudplatform/dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
big-data data-analysis data-mining data-processing data-science google-cloud-dataflow
Last synced: 03 Oct 2025
https://github.com/jofpin/synthBTC
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
bitcoin data-processing monte-carlo-simulation nodejs prediction synthetic-data turbit
Last synced: 27 Sep 2025
https://github.com/asyml/texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet
Last synced: 08 Oct 2025
https://github.com/chenghaomou/text-dedup
All-in-one text de-duplication
data-processing de-duplication nlp text-processing
Last synced: 14 Dec 2025
https://github.com/hstreamdb/hstream
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
data-processing database distributed-database distributed-systems financial-analysis haskell hstreamdb iot iot-database kafka materialized-view real-time realtime-database scale sql stream-processing streaming streaming-data streaming-database
Last synced: 15 May 2025
https://github.com/benibela/xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery
Last synced: 15 May 2025
https://github.com/sebkrantz/collapse
Advanced and Fast Data Transformation in R
cran data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data r rstats scientific-computing statistics time-series weighted weights
Last synced: 14 May 2025
https://github.com/jofpin/synthbtc
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
bitcoin data-processing monte-carlo-simulation nodejs prediction synthetic-data turbit
Last synced: 16 May 2025
https://github.com/ChenghaoMou/text-dedup
All-in-one text de-duplication
data-processing de-duplication nlp text-processing
Last synced: 03 Apr 2025
https://github.com/SebKrantz/collapse
Advanced and Fast Data Transformation in R
cran data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data r rstats scientific-computing statistics time-series weighted weights
Last synced: 26 Apr 2025
https://github.com/NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 20 Jul 2025
https://github.com/kousun12/eternal
👾~ music, eternal ~ 👾
3d-graphics art creative-coding data-processing glsl midi music node-based webaudio webgl
Last synced: 14 Mar 2025
https://github.com/puchaczov/musoq
SQL Syntax without any database
ai-assisted-queries cross-platform csharp csv data-analysis-sql data-exploration data-processing dotnet dotnet-core dotnetcore file-system plugin-architecture query-language sql text-processing
Last synced: 16 May 2025
https://github.com/constellation-rs/amadeus
Harmonious distributed data analysis in Rust.
data-analysis data-processing distributed-computing parallel-computing rust stream-processing
Last synced: 08 Apr 2025
https://github.com/polyaxon/haupt
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
bokeh data-processing data-profiling data-science data-visualization deep-learning jupyter lineage machine-learning matplotlib mlops models plotly python pytorch serving tensorflow tracking ui visualization
Last synced: 14 May 2025
https://github.com/msamogh/nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
data-cleaning data-pipeline data-preprocessing data-processing machine-learning preprocessing pytorch torch
Last synced: 07 May 2025
https://github.com/flow-php/etl
PHP - ETL (Extract Transform Load) data processing library
data-engineering data-processing etl flow-php
Last synced: 12 Apr 2025
https://github.com/lithops-cloud/lithops
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
big-data big-data-analytics cloud-computing data-processing distributed kubernetes multicloud multiprocessing object-storage parallel python serverless serverless-computing serverless-functions
Last synced: 03 Jan 2026
https://github.com/alttch/rapidtables
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
data-processing dictionary-data library python python3 text-formatting
Last synced: 05 Apr 2025
https://github.com/streamnative/pulsar-flink
Elastic data processing with Apache Pulsar and Apache Flink
apache-flink apache-pulsar batch-processing catalog data-processing flink flink-connector flink-stream-processing pulsar schema schema-registry sql stream-processing
Last synced: 14 May 2025
https://github.com/yord/pxi
🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
csv data-processing deserializer dsv json marshaller parser pixie pxi serializer ssv tsv
Last synced: 19 Jun 2025
https://github.com/svenkreiss/pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
apache-spark data-processing data-science python
Last synced: 07 Apr 2025
https://github.com/Yord/pxi
🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
csv data-processing deserializer dsv json marshaller parser pixie pxi serializer ssv tsv
Last synced: 06 Apr 2025
https://github.com/ColasGael/Machine-Learning-for-Solar-Energy-Prediction
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning
data-processing machine-learning matlab neural-network python tensorflow
Last synced: 07 May 2025
https://github.com/scramjetorg/scramjet
Public tracker for Scramjet Cloud Platform, a platform that brings data from many environments together.
data-processing data-space data-stream edge-computing event-stream javascript python raspberry-pi reactive-programming transformations virtual-data-environment
Last synced: 07 Apr 2025
https://github.com/asyml/forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
data-processing deep-learning information-retrieval machine-learning natural-language natural-language-processing pipeline python text-data
Last synced: 04 Apr 2025
https://github.com/airscholar/e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
apache-airflow apache-kafka apache-spark apache-zookeeper big-data cassandra containerization data-engineering data-pipeline data-processing data-storage docker etl-pipeline postgresql real-time-analytics
Last synced: 16 May 2025
https://github.com/colasgael/machine-learning-for-solar-energy-prediction
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning
data-processing machine-learning matlab neural-network python tensorflow
Last synced: 09 Apr 2025
https://github.com/helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
array-api data-analytics data-processing data-science distributed gpu hpc machine-learning massive-datasets mpi mpi4py multi-gpu multi-node-cluster numpy parallelism python pytorch tensors
Last synced: 15 May 2025
https://github.com/apache/incubator-wayang
Apache Wayang (incubating) is the first cross-platform data processing system.
apache big-data cross-platform data-management-platform data-processing distributed-system hadoop java jdbc middleware open-source performance scala spark
Last synced: 15 May 2025
https://github.com/senbox-org/snap-engine
ESA Earth Observation Toolbox and Java Development Platform
data-processing data-visualization earth-observation eo linux macos raster-data remote-sensing windows
Last synced: 12 Jul 2025
https://github.com/hxz393/brutalityextractor
Multiprocess decompression software for high-performance systems
brute-force brute-force-attack brute-force-decompression brute-force-techniques computational-efficiency data-processing decompression efficient-compression-tool extractor high-performance high-speed-decompression optimization parallel-computing parallel-decompression parallel-optimization parallel-processing performance-enhancement performance-optimization performance-testing scalable
Last synced: 20 Aug 2025
https://github.com/markus-wa/cq
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
cli clojure command-line csv data-processing data-transformation edn hacktoberfest json msgpack transformation xml yaml
Last synced: 10 May 2025
https://github.com/senbox-org/snap-desktop
Desktop GUI for SNAP based on NetBeans Platform
data-analysis data-processing data-visualization desktop-application earth-observation eo linux macos remote-sensing windows
Last synced: 12 Jul 2025
https://github.com/luckylittle/blinkist-m4a-downloader
Grabs all of the audio files from all of the Blinkist books
audiobooks blinkist books crawler data-archiving data-mining data-processing go golang scraper spider
Last synced: 29 Apr 2025
https://github.com/iam-mhaseeb/skytrax-data-warehouse
A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.
airflow data-analysis data-analytics data-cleaning data-engineering data-orchestration data-processing data-visualization data-warehouse data-warehousing database docker metabase python python3 redshift s3 s3-bucket sql
Last synced: 12 Aug 2025
https://github.com/utdemir/distributed-dataset
A distributed data processing framework in Haskell.
aws-lambda data-processing distributed haskell spark
Last synced: 11 Dec 2025
https://github.com/libertem/libertem
Open pixelated STEM framework
data-processing electron-microscopy image-processing python
Last synced: 09 Apr 2025
https://github.com/streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
apache-pulsar apache-spark batch-processing data-processing data-science flink spark spark-sql stream-processing structured-streaming
Last synced: 16 May 2025
https://github.com/siteimprove/alfa
♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
a11y accessibility act aria customer-facing data-processing earl horizon2020 json-ld monorepo sarif testing typescript wcag
Last synced: 08 Apr 2025
https://github.com/Siteimprove/alfa
♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
a11y accessibility act aria customer-facing data-processing earl horizon2020 json-ld monorepo sarif testing typescript wcag
Last synced: 15 Apr 2025
https://github.com/nvidia/nvimagecodec
nvImageCodec: a library of GPU- and CPU-accelerated codecs featuring a unified interface
computer-vision cpp cuda dali data-processing deep-learning fast-data-pipeline gpu image-processing machine-learning nvidia python pytorch
Last synced: 16 May 2025
https://github.com/whoiskatrin/financial-statement-pdf-extractor
Python script to extract as much structured information as possible from annual/quarterly reports.
balance-sheet cash-flow cash-flow-statement data-processing extract financial-analysis financial-statements pdf quarterly-reports
Last synced: 04 Apr 2025
https://github.com/asavinov/prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow
Last synced: 11 Apr 2025
https://github.com/pauliacomi/pygaps
A framework for processing adsorption data and isotherm fitting
adsorption data-processing materials-science
Last synced: 21 Oct 2025
https://github.com/aces/cbrain
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.
cbrain cbrain-api cbrain-architecture cbrain-platform cbrain-service data-processing hpc rails-application ruby science
Last synced: 05 Apr 2025
https://github.com/kubeflow/mcp-apache-spark-history-server
MCP Server for Apache Spark History Server. The bridge between Agentic AI and Apache Spark.
apache-spark big-data data-processing kubernetes mcp mcp-server
Last synced: 19 Sep 2025
https://github.com/AlirezaTheH/perke
A keyphrase extractor for Persian
data-mining data-processing information-retrieval keyphrase keyphrase-extraction keyphrase-extractor keyword keyword-extraction keyword-extractor machine-learning ml natural-language-processing nlp persian persian-language python text-mining text-processing unsupervised-learning
Last synced: 09 Jul 2025
https://github.com/alirezatheh/perke
A keyphrase extractor for Persian
data-mining data-processing information-retrieval keyphrase keyphrase-extraction keyphrase-extractor keyword keyword-extraction keyword-extractor machine-learning ml natural-language-processing nlp persian persian-language python text-mining text-processing unsupervised-learning
Last synced: 20 Aug 2025
https://github.com/urbanos-public/smartcitiesdata
The core micro services of UrbanOS as an umbrella project with component documentation
data-analytics data-processing data-visualization elixir elixir-phoenix
Last synced: 06 Apr 2025
https://github.com/unidentifieddeveloper/blaze
A blazing fast exporter for your Elasticsearch data.
data-dump data-export data-processing devops devops-tools elasticsearch libcurl rapidjson
Last synced: 09 Apr 2025
https://github.com/atomgraph/processor
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
appengine crud data-driven data-processing declarative docker-image framework generic hypermedia knowledge-graph ldt linked-data linked-data-templates ontology-driven-development rdf rest semantic-web server sparql
Last synced: 26 Jun 2025
https://github.com/AtomGraph/Processor
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
appengine crud data-driven data-processing declarative docker-image framework generic hypermedia knowledge-graph ldt linked-data linked-data-templates ontology-driven-development rdf rest semantic-web server sparql
Last synced: 20 Jun 2025
https://github.com/josephmachado/online_store
End to end data engineering project
dagster data-analysis data-engineering data-pipeline data-platform data-processing datawarehouse postgresql python3
Last synced: 05 Jul 2025
https://github.com/wq/itertable
⇔ IterTable is a Pythonic API for iterating through tabular data formats, including CSV, XLSX, XML, and JSON.
csv data-processing excel export import iterable json openpyxl pandas pythonic spreadsheet tabular-data xml
Last synced: 03 Apr 2025
https://github.com/tirendazacademy/data-visualization-with-python
Data Visualization Tutorial | Matplotlib | Seaborn | Pandas
data-analysis data-processing data-visualization matplotlib matplotlib-tutorial pandas-python python seaborn visualization
Last synced: 19 Apr 2025
https://github.com/samson-mano/fast_fourier_transform
C# implementation of Cooley–Tukey's FFT algorithm.
cooley-tukey cooley-tukey-fft data-processing fast-fourier-transform fft fft-analysis fourier-transform frequency-domain time-domain
Last synced: 13 Apr 2025
https://github.com/gabyx/executiongraph
Fast Generic Execution Graph/Network
data-analysis data-processing execution-graph execution-pipeline graph simulink
Last synced: 15 May 2025
https://github.com/p-ranav/pipeline
Pipelines for Modern C++
concurrency constexpr cpp17 cpp17-library data-processing expressive header-only pipeline pipes single-header-lib single-header-library taskflow tuples
Last synced: 05 May 2025
https://github.com/jqnpm/jqnpm
A package manager built for the command-line JSON processor jq.
command-line-tool data data-processing jq json package-manager
Last synced: 21 Jul 2025
https://github.com/soumyadip007/data-science-using-python-university-course-module
“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.
data-preparation data-preprocessing data-processing data-science data-visualization jupyter-notebook knn numpy panda plotting python
Last synced: 23 Jun 2025
https://github.com/jeffgrunewald/stargate
An Apache Pulsar client written in Elixir
data-processing elixir pulsar-client
Last synced: 11 Jul 2025
https://github.com/jpkli/p4
P4: Portable Parallel Processing Pipeline
data-processing gpu visualizations
Last synced: 02 May 2025
https://github.com/industrial-edge/developer-guide-hands-on-app
Hands-on application for the Industrial Edge Developer Guide
data-processing grafana ie-databus ie-flow-creator industrial-edge influxdb opc-ua opc-ua-connector s7-connector v1-2
Last synced: 20 Aug 2025
https://github.com/zakarialaoui10/zikomatrix
Arduino library for creating and manipulating matrices of arbitrary size and data type. The library provides a Matrix class that can be used to create matrices, perform basic matrix operations
arduino cpp data-processing esp32 esp8266 hardware library morocco std
Last synced: 09 Apr 2025
https://github.com/getstrm/pace
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.
bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake
Last synced: 13 Oct 2025
https://github.com/greenelab/tdm
R package for normalizing RNA-seq data to make them comparable to microarray data.
data-processing microarray package r rna-seq
Last synced: 11 Jun 2025
https://github.com/m-clark/data-processing-and-visualization
This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.
data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops
Last synced: 02 Sep 2025
https://github.com/zazuko/barnard59
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web
Last synced: 06 Apr 2025
https://github.com/zakarialaoui10/ZikoMatrix
Arduino library for creating and manipulating matrices of arbitrary size and data type. The library provides a Matrix class that can be used to create matrices, perform basic matrix operations
arduino cpp data-processing esp32 esp8266 hardware library morocco std
Last synced: 29 Apr 2025
https://github.com/wandersoncferreira/meta-schema
Little DSL to make data processing sane with clojure.spec and spec-tools
clojure clojure-spec data-processing dsl edn spec
Last synced: 05 May 2025