An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-processing

A curated list of projects in awesome lists tagged with data-processing.

https://github.com/jpkli/p4

P4: Portable Parallel Processing Pipeline

data-processing gpu visualizations

Last synced: 18 Feb 2026

https://github.com/greenelab/tdm

R package for normalizing RNA-seq data to make them comparable to microarray data.

data-processing microarray package r rna-seq

Last synced: 11 Jun 2025

https://github.com/getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake

Last synced: 13 Oct 2025

https://github.com/zakarialaoui10/zikomatrix

Arduino library for creating and manipulating matrices of arbitrary size and data type. The library provides a Matrix class that can be used to create matrices, perform basic matrix operations

arduino cpp data-processing esp32 esp8266 hardware library morocco std

Last synced: 09 Apr 2025

https://github.com/m-clark/data-processing-and-visualization

This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.

data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops

Last synced: 02 Sep 2025

https://github.com/asavinov/machine-learning-and-data-processing

A collection of resources on machine learning, data processing and related areas

analytics big-data data-mining data-processing data-science databases machine-learning software stream-processing

Last synced: 02 Mar 2026

https://github.com/zazuko/barnard59

An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.

data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web

Last synced: 06 Apr 2025

https://github.com/ion-fusion/fusion-java

Ion Fusion is a customizable programming language for working with JSON and Amazon Ion data.

amazon-ion data-processing java json programming-language racket scheme

Last synced: 15 Apr 2026

https://github.com/wandersoncferreira/meta-schema

Little DSL to make data processing sane with clojure.spec and spec-tools

clojure clojure-spec data-processing dsl edn spec

Last synced: 05 May 2025

https://github.com/zakarialaoui10/ZikoMatrix

Arduino library for creating and manipulating matrices of arbitrary size and data type. The library provides a Matrix class that can be used to create matrices, perform basic matrix operations

arduino cpp data-processing esp32 esp8266 hardware library morocco std

Last synced: 29 Apr 2025

https://github.com/python-bonobo/bonobo-sqlalchemy

PREVIEW - SQL databases in Bonobo, using sqlalchemy

bonobo data-processing databases extract-transform-load python3 sqlalchemy

Last synced: 02 Mar 2026

https://github.com/edrewitz/wxdata

A Python package of end-to-end weather data clients & raw data clients with VPN/PROXY support, data processors that decode variable keys from GRIB format into a plain-language format & various tools for assisting Python automated workflows, querying meteorological datasets and filling gaps in meteorological data.

automation data data-clients data-engineering data-engineering-pipeline data-processing data-processing-pipelines data-science meteorology meteorology-library python weather-data

Last synced: 23 May 2026

https://github.com/rpj/rpi

RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with python2.7 and redis.

data-pipeline data-platform data-processing data-stream garden-bots python raspberry-pi redis rpi sensor sensors

Last synced: 12 Apr 2025

https://github.com/lgrcia/prairie

A visual programming environment for Python

data-processing python scientific-visualization visual-programming

Last synced: 03 Apr 2025

https://github.com/okfn-brasil/querido-diario-data-processing

Text processing repository to free Brazilian municipal gazettes from closed file formats for the Querido Diário project.

data-processing hacktoberfest opensearch pipelines python sql

Last synced: 11 Jul 2025

https://github.com/guancecloud/platypus

Platypus is a programming language for Observability Data Pipeline

data-processing dsl go observability

Last synced: 12 Jan 2026

https://github.com/graphbookai/graphbook

The framework for AI-driven data pipelines. Build interactive, highly efficient data pipelines with PyTorch. ⭐ Leave a star to support us!

ai data-processing data-processing-pipelines data-science framework machine-learning ml pytorch research workflow

Last synced: 07 Sep 2025

https://github.com/alexandrehiroyuki/movingaverageplus

Moving Average Plus is a C++ library that implements a moving average on the Arduino platform. Performance and usability are the two focuses I thought of when creating this library, so every improvement tip is welcome. It is useful for filtering noisy data from sensors, for example.

algorithms arduino arduino-library arduino-platform cpp data-processing data-structures filters moving-average pio platformio

Last synced: 12 Apr 2025

https://github.com/lherman-cs/go-rosbag

Rosbag parser written in pure Go

analytics cli cloud data-processing decoder parser robotics ros rosbag

Last synced: 02 Nov 2025

https://github.com/cemc-oper/reki

A data preparation tool in CEMC/CMA.

cedarkit data-processing grib2

Last synced: 14 Jan 2026

https://github.com/scicloj/tablecloth.time

Tools for the processing and manipulation of time-series data in Clojure.

clojure data-processing data-science dataset scicloj tablecloth time-series

Last synced: 14 Apr 2025

https://github.com/etsap-TIMES/xl2times

Open source tool to convert TIMES models specified in Excel

data-processing energy-systems-modelling open-science open-source times-model

Last synced: 16 Oct 2025

https://github.com/forieux/qmm

Python Quadratic Majorization-Minimization (MM) optimization algorithms of half-quadratic criteria. Inverse problems, image restoration, denoising, ...

data-processing data-science denoising image-processing inverse-problems non-linear-optimization nonlinear-optimization optimization optimization-algorithms optimization-methods python

Last synced: 23 Jan 2026

https://github.com/vmandic/meds-processor

Learn C# and .NET Core by building a scraper, downloader and Excel parser for Croatia's Health Insurance Fund primary and supplementary drugs list.

aspnetcore data-processing dotnet dotnet-core learning practice scraper tutorial web-development

Last synced: 27 Oct 2025

https://github.com/bbva/mercury-dataschema

Utility package that, given a Pandas DataFrame, uses the DataSchema class to auto-infer feature types and automatically calculate different statistics depending on the types.

analytics data data-cleaning data-processing data-science feature-engineering

Last synced: 21 Jun 2025

https://github.com/sisinflab/datarec

A Python Library for Standardized and Reproducible Data Management in Recommender Systems

data-preparation data-processing datasets recommender-systems

Last synced: 11 Feb 2026

https://github.com/eshikashah/basic-machine-learning-models

Hands-on code in Python for basic machine learning models.

classification clustering data-processing machine-learning regression

Last synced: 07 May 2025

https://github.com/thu-ml/embodied-data-toolkit

A toolkit for processing raw embodied data into standardized formats and converting between embodied dataset schemas.

data-processing embodied-ai format-converter-tool

Last synced: 20 Apr 2026

https://github.com/monimesl/pulserl

Apache Pulsar client library for Erlang/Elixir

data-processing elixir erlang pulsar pulsar-client

Last synced: 26 Oct 2025

https://github.com/qcri/tasrif

Tasrif is a Python library for processing wearable data from fitness trackers and wearable health devices

data-processing fitbit myheartcounts sleephealth wearable wearable-devices withings

Last synced: 30 Jul 2025

https://github.com/qwen-php/qwen-laravel

Laravel wrapper for the Qwen PHP client, offering an intuitive and efficient way to integrate and interact with Alibaba Qwen API in your Laravel applications.

ai ai-sdk alibaba-ai alibaba-api alibaba-cloud alibabaai client data-processing deepseek natural natural-language-processing openai php-ai processing qwen-api qwen-client qwen-coder qwen-laravel qwen-php qwen-php-client

Last synced: 15 Apr 2025

https://github.com/n0rdy/pippin

Go library to create and manage data pipelines on your machine

async asynchronous data data-engineering data-pipeline data-processing go golang golang-library golang-package goroutines pipeline

Last synced: 30 Jan 2026

https://github.com/usedatabrew/blink

Open-source data platform to build event-driven systems. It's like Debezium for golang :)

data-engineering data-processing debezium etl etl-pipeline kafka stream-processing stream-processor streaming

Last synced: 24 Dec 2025

https://github.com/flow-php/doctrine-dbal-bulk

Doctrine DBAL Bulk Operations for selected database engines

bulk data-engineering data-processing dbal doctrine flow-php

Last synced: 21 Feb 2026

https://github.com/slok/terraform-provider-dataprocessor

Terraform provider for easy and clean data processing (JQ, YQ, Go plugins...).

data-processing go-plugin golang infrastructure jq terraform terraform-provider yaegi yq

Last synced: 25 Mar 2025

https://github.com/python-bonobo/bonobo-docker

PREVIEW - Run Bonobo data processing graphs in docker containers.

bonobo containers data-processing docker extract-transform-load python3 runtime

Last synced: 06 May 2025

https://github.com/phineas-pta/speech-synthesis-ngngngan

Python script to download & process data to train a speech-synthesis model of Vietnamese M.C. Nguyễn Ngọc Ngạn

data-processing deep-learning matcha-tts model-training pytorch rvc training-data vietnamese vits2

Last synced: 23 Jun 2025

https://github.com/opengrav/gmeterpy

Processing gravity measurements with Python

data-processing geodesy geophysics gravimetry gravity

Last synced: 14 Jan 2026

https://github.com/neuro-ml/connectome

A library for datasets containing heterogeneous data

data-processing pipelines python

Last synced: 23 Apr 2025

https://github.com/asyml/fortehealth

The project is in the incubation stage and still under development. ForteHealth is a flexible and powerful ML workflow builder for biomedical and clinical scenarios. This is part of the CASL project: http://casl-project.ai/

biomedical-named-entity-recognition clinical-nlp clinical-text-processing data-processing deep-learning information-retrieval machine-learning natural-language natural-language-processing python

Last synced: 02 May 2025

https://github.com/machiela-lab/UKBBcleanR

Prepare electronic medical record data from the UK Biobank for time-to-event analyses

data-processing electronic-medical-records r r-package rstats rstats-package time-to-event uk-biobank

Last synced: 09 Apr 2025

https://github.com/emineugurlu/python-automation-hub

A centralized ecosystem of high-efficiency Python automation tools. Featuring modular scripts for web scraping, data transformation, and file management. Built for scalability and productivity.

automation computer-engineering data-processing json-cleaner open-source pdf-generator productivity-tools python web-scraping

Last synced: 14 Jun 2026

https://github.com/alexandrehiroyuki/datatome

Data analysis and filtering using time series for embedded devices (IoT). All in a single C++ library, Data Tome. Focus on the developer's experience and performance. It is the successor to the MovingAveragePlus library.

algorithms analysis arduino arduino-library cpp cumulative-mean data-processing data-structures exponential-moving-average filters median moving-average moving-median pio platformio platformio-library standard-deviation variance

Last synced: 24 Oct 2025

https://github.com/ihabbendidi/file-handling

Finding similarities between documents, and document search engine query language implementation

cosine-similarity data-processing inverted-index nlp python python-3 stemming-algorithm stemming-porters tf-idf

Last synced: 16 Aug 2025

https://github.com/lmammino/stream-accumulator

Accumulate all the data flowing through a stream and emit it as a single chunk or as a promise

accumulator buffer data-pipeline data-processing library module node nodejs stream streams

Last synced: 07 May 2025

https://github.com/angular-rust/ux-dataflow

UX-Dataflow is a streaming capable data multiplexer that allows you to aggregate data and then process it using a Chain of Responsibility design pattern.

data-processing data-structures dataframe datastream datatable rust

Last synced: 16 Oct 2025

https://github.com/statcan/gensol-gseries

Package gseries - R version of generalized system G-Series. English: https://StatCan.github.io/gensol-gseries/en/ French: https://StatCan.github.io/gensol-gseries/fr/

data-processing r time-series

Last synced: 18 Feb 2026

https://github.com/glassflow/glassflow-python-sdk

GlassFlow Python SDK to publish and consume data to your pipelines at Glassflow.dev

data data-processing datastreaming python real-time sdk stream-processing

Last synced: 18 Feb 2026

https://github.com/code-rhapsodie/ezdataflow-bundle

Import/export bundle for eZ Platform / Ibexa Content based on Code-Rhapsodie Dataflow

data-processing dataflow export ez-platform ez-publish ibexa ibexa-content ibexa-platform ibexadxp import portphp

Last synced: 09 Apr 2025

https://github.com/technologiestiftung/erfrischungskarte-daten

Code for preprocessing and modeling, plus the raw and resulting data, for the 'Erfrischungskarte'.

data-processing odis open-data

Last synced: 16 Jul 2025

https://github.com/yaph/james-bond-actors

Script to grab Freebase data about James Bond actors and generate a GEXF data file.

data-cleaning data-processing data-retrieval freebase james-bond-actors network-graph

Last synced: 08 Sep 2025

https://github.com/speedcell4/torchglyph

Data Processor Combinators for Natural Language Processing

data-processing deep-learning machine-learning natural-language-processing pytorch

Last synced: 11 Sep 2025

https://github.com/marksweiss/sofine

Lightweight framework for creating data-collecting plugins and chaining calls to them from CLI, REST or Python to return unified data sets.

cross-language data-cleaning data-processing data-retrieval json python

Last synced: 18 Feb 2026

https://github.com/yamalight/microcore

Core library for simple creation of pipelining microservices in Node.js with RabbitMQ

data-processing job-queue microservices nodejs pipeline rabbitmq

Last synced: 27 Jun 2025

https://github.com/shamspias/agriaid

AgriAid is an AI-powered tool for farmers & agricultural agents in Bangladesh, offering plant disease forecasting & identification. Using machine learning, deep learning & Python, it helps increase crop yield & food security.

agriculture agriculture-research artificial-intelligence cnn data-processing disease disease-prediction gradient-boosting logistic-regression machine-learning machine-learning-algorithms prediction-model svm-training

Last synced: 09 Oct 2025

https://github.com/aymane-maghouti/real-time-streaming-kafka-debezium-spark-streaming

This project demonstrates real-time data streaming and processing architecture using Kafka, Spark Streaming, and Debezium for capturing CDC (Change Data Capture) events. The pipeline collects transaction data, processes it in real time, and updates a dashboard to display real-time analytics for smartphone data.

change-data-capture dashboard data-analytics data-processing debezium docker java kafka mysql-database notifications postgresql-database python reactjs real-time-data-pipeline real-time-systems spark-streaming spring-boot web-development

Last synced: 25 Mar 2025

https://github.com/haroldeustaquio/python-coding-challenges

Repository dedicated to solving Python problems from LeetCode, DataLemur and other programming challenges. Contains solutions implemented in Python to improve skills in algorithms and data structures.

challenge data-processing data-structures datalemur leetcode pandas python

Last synced: 14 May 2025

https://github.com/strmprivacy/cli

This is the STRM Privacy Command Line Interface, to define and manage your privacy streams, data schemas, event contracts and much more.

cli data data-pipeline data-privacy data-privacy-compliance data-processing privacy

Last synced: 23 Jun 2025

https://github.com/dannycho7/youtube_captions

A collection of scripts used for aggregating YouTube video caption data

data-processing nodejs transcription youtube-api

Last synced: 26 Oct 2025

https://github.com/d-k-deng/dataprocessing_practice

The program processes personal records from a file using custom-implemented data structures such as Binary Search Trees (BST), heaps, and hashmaps, with an emphasis on generics and without relying on built-in Java classes like LinkedList and PriorityQueue.

data-processing data-structures java

Last synced: 14 Oct 2025

https://github.com/cs-joy/analysis-of-algorithms

A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.

algorithm computation cpp data-processing

Last synced: 21 Apr 2025

https://github.com/libertem/libertem-blobfinder

LiberTEM correlation and refinement library

data-processing electron-microscopy image-processing python

Last synced: 10 Apr 2025

https://github.com/vensim/embedded_ml

Application of TinyML on an ESP32 system. It samples ECG data, gathers features, and outputs a new ML model based on the sampled data, to be recompiled for the ESP32.

data-processing ecg esp32 esp32-arduino machine-learning sampling tinyml

Last synced: 16 Apr 2025

https://github.com/amokan/broadway_redis

A Broadway producer for Redis lists

broadway data-ingestion data-processing elixir genstage

Last synced: 05 May 2025

https://github.com/vedadiyan/genql

GenQL is a generic querying language fully written in Go

data-analysis data-mapping data-processing data-science data-translation json json-data sql

Last synced: 22 Jun 2025

https://github.com/linagora-labs/ssak

SSAK contains helpers and tools to process data and train/infer ASR models.

asr data-processing kaldi machine-learning nemo speech-recognition speech-to-text toolkit whisper

Last synced: 05 Oct 2025