Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/jeffgrunewald/stargate

An Apache Pulsar client written in Elixir

data-processing elixir pulsar-client

Last synced: 29 Jun 2024

https://github.com/jqnpm/jqnpm

A package manager built for the command-line JSON processor jq.

command-line-tool data data-processing jq json package-manager

Last synced: 25 Jun 2024

https://github.com/markus-wa/cq

Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more

cli clojure command-line csv data-processing data-transformation edn hacktoberfest json msgpack transformation xml yaml

Last synced: 24 Jun 2024

https://github.com/GoogleCloudPlatform/DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

big-data data-analysis data-mining data-processing data-science google-cloud-dataflow

Last synced: 20 Jun 2024

https://github.com/wq/itertable

⇔ IterTable is a Pythonic API for iterating through tabular data formats, including CSV, XLSX, XML, and JSON.

csv data-processing excel export import iterable json openpyxl pandas pythonic spreadsheet tabular-data xml

Last synced: 20 Jun 2024

https://github.com/M4t1ss/parallel-corpora-tools

Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.

cleaning corpora corpus-tools data-processing data-science filtering language language-processing machine machine-translation natural-language natural-language-processing neural neural-machine-translation nlp nmt translation

Last synced: 20 Jun 2024

https://github.com/zakarialaoui10/ZikoMatrix

Arduino library for creating and manipulating matrices of arbitrary size and data type. The library provides a Matrix class that can be used to create matrices, perform basic matrix operations

arduino cpp data-processing esp32 esp8266 hardware library morocco std

Last synced: 17 Jun 2024

https://github.com/RemiRigal/DatasetExplorer

A web tool for local dataset browsing and processing developed using the Flask + Angular stack.

ai angular data-processing data-science data-visualization dataset dataset-analysis docker docker-compose flask web-application

Last synced: 16 Jun 2024

https://github.com/Siteimprove/alfa

♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale

a11y accessibility act aria customer-facing data-processing earl horizon2020 json-ld monorepo sarif testing typescript wcag

Last synced: 12 Jun 2024

https://github.com/NVIDIA/DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

audio-processing data-augmentation data-processing deep-learning fast-data-pipeline gpu gpu-tensorflow image-augmentation image-processing machine-learning mxnet neural-network paddle python pytorch

Last synced: 12 Jun 2024

https://github.com/flow-php/etl

PHP - ETL (Extract Transform Load) data processing library

data-engineering data-processing etl flow-php

Last synced: 11 Jun 2024

https://github.com/vh-d/Rflow

Rflow is a general-purpose workflow management framework for R

data-processing database dataflow etl etl-framework r reproducibility rlang rstats rstats-package workflow-management

Last synced: 10 Jun 2024

https://github.com/ChenghaoMou/text-dedup

All-in-one text de-duplication

data-processing de-duplication nlp text-processing

Last synced: 07 Jun 2024

https://github.com/marksweiss/sofine

Lightweight framework for creating data-collecting plugins and chaining calls to them from CLI, REST or Python to return unified data sets.

cross-language data-cleaning data-processing data-retrieval json python

Last synced: 03 Jun 2024

https://github.com/benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery

Last synced: 03 Jun 2024

https://github.com/scramjetorg/scramjet

Public tracker for Scramjet Cloud Platform, a platform that brings data from many environments together.

data-processing data-space data-stream edge-computing event-stream javascript python raspberry-pi reactive-programming transformations virtual-data-environment

Last synced: 01 Jun 2024

https://github.com/onceupon/Bash-Oneliner

A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.

bash data-processing grep hardware linux linux-administration one-liners oneliner-commands shell shell-oneliner system terminal variables xargs xwindow

Last synced: 28 May 2024

https://github.com/numaproj/numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

data-processing hacktoberfest k8s kubernetes map-reduce pipeline stream-processing

Last synced: 24 May 2024

https://github.com/deverte/awesome-science

A curated list of awesome scientific software, libraries and services.

academic awesome-list data-processing data-storage experiment literature-management moocs research science scientific

Last synced: 19 May 2024

https://github.com/brexhq/substation

Substation is a cloud-native, event-driven data pipeline toolkit built for security teams.

aws data-engineering data-processing etl go security serverless

Last synced: 16 May 2024

https://github.com/adilkhash/luigi-telegram

Luigi Tasks status notifications to Telegram

data-pipeline data-processing etl luigi notification-plugin

Last synced: 13 May 2024

https://github.com/Yord/pxi

🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.

csv data-processing deserializer dsv json marshaller parser pixie pxi serializer ssv tsv

Last synced: 10 May 2024

https://github.com/senbox-org/snap-engine

ESA Earth Observation Toolbox and Java Development Platform

data-processing data-visualization earth-observation eo linux macos raster-data remote-sensing windows

Last synced: 10 May 2024

https://github.com/unidentifieddeveloper/blaze

A blazing fast exporter for your Elasticsearch data.

data-dump data-export data-processing devops devops-tools elasticsearch libcurl rapidjson

Last synced: 04 May 2024

https://github.com/dashbitco/broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

broadway concurrent data-ingestion data-processing elixir genstage

Last synced: 01 May 2024

https://github.com/tomwright/dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

cli config configuration data-processing data-structures data-wrangling devops-tools go golang json json-processing parser query selector toml update xml yaml yaml-processor

Last synced: 29 Apr 2024

https://github.com/hstreamdb/hstream

HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.

data-processing database distributed-database distributed-systems financial-analysis haskell hstreamdb iot iot-database kafka materialized-view real-time realtime-database scale sql stream-processing streaming streaming-data streaming-database

Last synced: 29 Apr 2024

https://github.com/svenkreiss/pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

apache-spark data-processing data-science python

Last synced: 28 Apr 2024

https://github.com/alttch/rapidtables

Super-fast library for converting lists of dicts to pre-formatted tables, for Python 2/3

data-processing dictionary-data library python python3 text-formatting

Last synced: 27 Apr 2024

https://github.com/ColasGael/Machine-Learning-for-Solar-Energy-Prediction

Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning

data-processing machine-learning matlab neural-network python tensorflow

Last synced: 22 Apr 2024

https://onceupon.github.io/Bash-Oneliner/

A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.

bash data-processing grep hardware linux linux-administration one-liners oneliner-commands shell shell-oneliner system terminal variables xargs xwindow

Last synced: 19 Apr 2024

https://github.com/msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

data-cleaning data-pipeline data-preprocessing data-processing machine-learning preprocessing pytorch torch

Last synced: 19 Apr 2024

https://github.com/asyml/texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet

Last synced: 19 Apr 2024

https://github.com/GoogleCloudPlatform/data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 17 Apr 2024

https://github.com/lgrcia/prairie

A visual programming environment for Python

data-processing python scientific-visualization visual-programming

Last synced: 15 Apr 2024

https://github.com/getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery, with definitions imported from Collibra, Datahub, ODD and the like.

bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake

Last synced: 11 Apr 2024

https://github.com/asyml/texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python tensorflow texar text-data text-generation xlnet

Last synced: 11 Apr 2024

https://github.com/m-clark/data-processing-and-visualization

This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.

data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops

Last synced: 10 Apr 2024

https://github.com/jpkli/p4

P4: Portable Parallel Processing Pipeline

data-processing gpu visualizations

Last synced: 08 Apr 2024

https://github.com/luckylittle/blinkist-m4a-downloader

Grabs all of the audio files from all of the Blinkist books

audiobooks blinkist books crawler data-archiving data-mining data-processing go golang scraper spider

Last synced: 05 Apr 2024

https://github.com/iTechArt/convtools-ita

convtools is a python library to declaratively define conversions for processing collections, doing complex aggregations and joins.

code-generation conversions data-preparation data-preprocessing data-processing functional-programming python transformations

Last synced: 01 Apr 2024

https://github.com/zazuko/barnard59

An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.

data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web

Last synced: 01 Apr 2024

https://github.com/mech-lang/mech

🦾 Main repository for the Mech programming language. Start here!

compiler data-processing ide language live-programming programming-environment programming-language reactive-programming robotics

Last synced: 28 Mar 2024

https://github.com/allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.

data-processing large-language-models llm machile-learning nlp

Last synced: 24 Mar 2024

https://github.com/asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow

Last synced: 18 Mar 2024

https://github.com/machiela-lab/UKBBcleanR

Prepare electronic medical record data from the UK Biobank for time-to-event analyses

data-processing electronic-medical-records r r-package rstats rstats-package time-to-event uk-biobank

Last synced: 17 Mar 2024