An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with dataprocessing

A curated list of projects in awesome lists tagged with dataprocessing.

https://github.com/prakhar21/50-Days-of-ML

A day-to-day plan for the 50 Days of Machine Learning challenge. Covers both theoretical and practical aspects.

100daysofcode 100daysofmlcode dataprocessing deep-learning deep-neural-networks machine-learning pandas python siraj-raval tutorial

Last synced: 14 Apr 2025

https://github.com/pedrokehl/caminho

Tool for creating efficient data pipelines in a JavaScript environment

backpressure concurency data dataprocessing functional javascript parallel pipeline reactive typescript

Last synced: 06 Apr 2025

https://github.com/nkaz001/data-tardis

Process tardis.dev cryptocurrency data, reconstructing the market depth and computing imbalance.

cryptocurrency dataprocessing orderbook orderbook-tick-data tardis trading-strategies

Last synced: 14 Apr 2025

https://github.com/divithraju/divith-raju-immigration-data-engineering

A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)

apachespark bigdata bigdataprocessing bigdataproject capstone-project datacleaning dataengineering datalake datamodeling datapipeline dataprocessing dataschema dataset datawherehouse pandas sql

Last synced: 29 Apr 2026

https://github.com/zeeshanahmad4/webscrapesummarizer

WebScrapeSummarizer 🌐✍️: A web tool that fetches and summarizes content from any domain, offering insights in a compact CSV format.

contentsummarization csv dataprocessing naturallanguageprocessing nlp openai php tools webdevelopment webscraping webtool

Last synced: 25 Oct 2025

https://github.com/gojibjib/voice-grabber

Collection of scripts to gather training (meta) data for the ML model

dataprocessing dataset europeana golang

Last synced: 10 May 2026

https://github.com/rafat-decodis/robust-asr-for-low-resource-languages

Exploring Benchmark Gaps and Real-World Speech Generalization for Low-Resource Languages

artificial-intelligence automatic-speech-recognition data-analysis dataprocessing whisper

Last synced: 23 Jun 2025

https://github.com/yoannpa/computational_epigenomics

This repository contains data and functions related to computational epigenomics data analysis.

450k beta-values bioinformatics computational-epigenomics coverage dataprocessing epigenomics human-methylation-450k methylation wgbs

Last synced: 23 Feb 2026

https://github.com/hq969/credit-card-fraud-detection

The Credit Card Fraud Detection project is a machine learning-based system designed to identify fraudulent transactions in real-time. Using historical transaction data, the model classifies transactions as either fraudulent or legitimate, helping financial institutions reduce financial losses and improve security.

anomaly-detection dataprocessing machine-learning-algorithms security-audit

Last synced: 17 Feb 2026

https://github.com/garugaru/gearpump-swarm

Deploy gearpump on a docker-swarm cluster

bigdata dataprocessing docker real-time swarm

Last synced: 29 Apr 2026

https://github.com/blocknotes-4515/multiple-disease-detection-system-all_in_one-

🩺 Heart Disease Detection System 💓 This AI tool predicts heart disease risk by analyzing key health metrics like age, cholesterol, and blood pressure. 🧠🔍 It provides quick, accurate results to help prevent serious conditions and support early treatment. Perfect for both healthcare pros and patients! 🌟

data-visualization dataprocessing linear-regression standard-library streamlit svm-model test-automation training

Last synced: 25 Feb 2026

https://github.com/asolimando/xqueryprojector

XQuery query processing optimization based on XML projection

dataprocessing query-optimization type-system xml xquery

Last synced: 31 Jan 2026

https://github.com/sayanmondal2098/easytoken

Tokenizer is an independent, open-source natural language processing library for Python that implements a tokenizer to create tokens from both sentences and paragraphs.

data-science datapreprocessing dataprocessing natural-language natural-language-processing nlp nlp-library nlp-machine-learning python-library python3 text-processing text-summarization token tokenizer

Last synced: 14 Dec 2025

https://github.com/vinayakdon/machine-learning-project-sentimental-classifier-

A sentiment classification tool using machine learning in Python to analyze and predict the sentiment of text data. Features preprocessing, model training, hyperparameter tuning, and evaluation for accurate sentiment analysis.

dataanalytics dataprocessing datascience python training-data

Last synced: 17 May 2026

https://github.com/fgonzalesc/transcripcion_ai

Audio transcription with Azure Speech and insight extraction with OpenAI

ai azure dataprocessing diarization openai-api python speechtotext

Last synced: 18 May 2026

https://github.com/nouranhaitham/ml_waterquality

A notebook aimed at predicting and improving water safety by analyzing contaminants and pollution levels in water sources, enhancing public health and ensuring access to clean drinking water.

classification-models cleansing-data dataprocessing dataset decision-trees gridsearchcv hyperparameter-tuning logistic-regression machine-learning prediction python randomforestclassifier regression-models water-quality

Last synced: 05 Jan 2026

https://github.com/rushikeshbihade/django_bsased_dataanalyzer_webapp

Data Analyzer is a Django web application that enables users to upload CSV files, perform data analysis using pandas and numpy, and view results and visualizations on an interactive web interface. It simplifies data analysis by offering a user-friendly platform for data upload, processing, and visualization.

backend-development bootstrap css3 dataanalysis dataprocessing django-application django-framework djangotemplates html5 javascript numpy pandas plotly-express python3 seaborn-plots webapp

Last synced: 02 Mar 2026

https://github.com/redayzarra/nlp_yelpreviews

This project uses natural language processing (NLP) to classify user-generated text and determine its intent. The goal is to build a model that can classify 10,000 Yelp reviews as either one-star or five-star reviews. The project showcases a step-by-step implementation of the model along with in-depth notes.

datapreprocessing dataprocessing machine-learning multinomial-naive-bayes naive-bayes naive-bayes-algorithm naive-bayes-classifier natural-language-processing nlp sentiment-analysis text-classification

Last synced: 22 Aug 2025

https://github.com/soumyagautam/sign-sense

Sign Sense is a deep learning and neural network based sign-language-to-speech converter. It is a desktop app that detects hand signs in a frame and converts them to speech according to their meaning. It also works in the other direction: it can recognise your voice and convert it to sign language.

ai cv2 dataprocessing deep-learning keras machine-learning mediapipe moviepy-library neural-network openai-whisper scikit-learn tensorflow tkinter-python

Last synced: 10 Apr 2026

https://github.com/happydream9032/tkinter_demo

This is a simple automation project built with Python and the Tkinter framework.

automation csv dataprocessing pyinstaller python tkinter

Last synced: 28 May 2026

https://github.com/vidhi1290/zomato-data-analysis

Zomato Data Analysis - Explore the world of Zomato restaurant data through Python and data analysis. Uncover trends and insights using Pandas for data manipulation and Matplotlib for visualization. Join us in this journey to reveal the hidden stories within the data!

data-analysis data-analysis-python data-science data-visualization dataprocessing machine-learning machine-learning-algorithms matplotlib numpy pandas python scikit-learn zomato-data-analysis

Last synced: 11 Apr 2026

https://github.com/waikato-datamining/multiway-algorithms

Java library of multi-way algorithms.

dataprocessing java multiway-algorithms parafac

Last synced: 17 Oct 2025

https://github.com/addingama/sid_waterpoints

Advanced IT Test for Summit Institute of Development

dataprocessing java sid tdd

Last synced: 15 Mar 2025

https://github.com/silent0wings/ta-management-system

The TA Management System is a C++ project designed to manage records of Teaching Assistants (TAs) within a department. The system ensures that only eligible TAs—those who are currently registered students—are maintained in the records. The project involves filtering out records of TAs who have graduated and updating the TA file accordingly.

clientmanagement clientprofiles cplusplus dataprocessing datavalidation education faculty-management javainterfaces objectorientedprogramming projectreport scheduling softwaredevelopment stringparsing student-management teaching-assistant-management university-tool

Last synced: 01 Mar 2025

https://github.com/sravanigodavarthi/automated-elt-pipeline-aws

An Apache Airflow data pipeline designed to perform ELT operations using Amazon S3 and Amazon Redshift Serverless.

airflow aws datamodeling datapipeline dataprocessing dataqualitycheck docker elt-pipeline parquet python redshift-serverless s3-buckets sql

Last synced: 08 May 2026

https://github.com/vigneshkanna18/foodhunter-revenue-drop-analysis

A BI solution developed for FoodHunter to investigate a significant drop in revenue over a four-month period. This analysis helps uncover actionable insights through data exploration, visualization, and hypothesis-driven analysis to support informed decision-making.

analysis dashboarding database dataengineering datamining dataprocessing datavisualization etl-pipeline ipynb mysql powerbi sql streamlit visualization-pipeline

Last synced: 24 Jun 2025

https://github.com/kevinndungu-source/amazon_emr_project_resources

Explore and replicate Amazon EMR (Elastic MapReduce) setup and utilization for big data processing and analytics tasks, featuring comprehensive demonstrations from VPC creation to Spark job execution.

aws-ec2 bigdata bigdatainfrastructure datamanagement dataprocessing emr-cluster juypter-notebook pyspark python

Last synced: 19 May 2026

https://github.com/bobergot/large-scale-data-processing-design-patterns

Explore essential MapReduce design patterns for big data processing! This repository includes practical implementations of patterns from the "MapReduce Design Patterns" book, complete with examples across summarization, filtering, organization, joins, and more.

bigdata bigdataanalytics cloudcomputing dataengineering dataprocessing datascience designpatterns distributedcomputing hadoop java mapreduce

Last synced: 16 Mar 2025

https://github.com/analyticalnahid/data-preprocessing

Analyze your data by applying pre-processing techniques

dataanalysis datapreprocessing dataprocessing

Last synced: 05 Sep 2025

https://github.com/neelimabonangi/real-time-weather-data-processing

Processes and analyzes near real-time weather data using the Kappa architecture, Apache Kafka, Spark, Cassandra, Docker, AWS EC2, and a Spring Boot API

aws cassandra data-visualization dataanalysis dataprocessing docker ec2 json kafka kappa-architecture machine-learning restapi spark springboot-api xml

Last synced: 13 Apr 2026

https://github.com/flaviuvadan/pipe-flow

A data processing pipeline library with a common vocabulary API

dataprocessing golang pipeline

Last synced: 01 Jun 2026

https://github.com/dev-rke/liveprocess

Simple tool to process data on the fly with JavaScript

dataprocessing instant onthefly

Last synced: 08 Jan 2026

https://github.com/1401dev/customer-lifetime-value-prediction

A data science project leveraging Python and Scikit-Learn to build predictive models that estimate customer lifetime value (CLV). Includes data cleaning, feature engineering, and model selection to identify key drivers of CLV, supporting strategic decision-making in customer retention and marketing.

clv clv-analysis customer-retention data-analysis dataprocessing feature-engineering machine-learning marketing-analytics predictive-modeling python regression-analysis scikit-learn

Last synced: 06 May 2026

https://github.com/elijah-1994/pre-process-e-commerce-dataset

Importing, Cleaning, and Pre-Processing E-Commerce Data for Analysis Using MySQL.

analytics data dataanalytics datacleaning dataprocessing mysql mysql-database sql

Last synced: 11 Mar 2025

https://github.com/ngangawairimu/regression-model-for-predicting-house-prices

This project focuses on applying statistical modeling techniques to predict house prices in Melbourne using the Melbourne House Price dataset. It involves data cleaning, exploratory data analysis (EDA), feature selection, and fitting a regression model to predict the target variable, which is the house price.

datacleaning dataprocessing explanatory-data-analysis modelevaluation modelinterpretability regression-analysis

Last synced: 28 Mar 2025

https://github.com/nicolay-r/sentinerel-attitude-extraction

This repository presents studies on sentiment attitude extraction, covering sentiment relations (RuSentNE), for the SentiNEREL dataset.

bert cnn dataprocessing lstm machine-learning nlp relationextraction russian-language sentiment-analysis

Last synced: 05 Apr 2025

https://github.com/lazycatcoder/waterheatmap

This application generates heatmaps based on temperature data. The application is developed using Node.js.

canvas chai dataanalysis dataprocessing expressjs heatmapping html2canvas javascript mocha mocha-chai nodejs nodejsapp temperature-map temperaturevisualization testing webdevelopment

Last synced: 08 Apr 2026

https://github.com/ssahas/implementing-gpt-from-scratch

Building a decoder-only (GPT-style) LLM from scratch using PyTorch and training it for text generation.

datacleaning dataprocessing large-language-models llm llm-inference llm-training python

Last synced: 14 Oct 2025

https://github.com/devpablooliveira/matrixplore

Web app for processing, uploading, and downloading matrices using FastAPI. Users can upload CSV files, manually input data, and download pre-set matrices. Includes analysis of matrix properties like functionality, injectivity, and surjectivity, with support for matrix combinations and transpose calculations. Built with FastAPI and Jinja2.

academictools algorithms backenddevelopment csv dataprocessing fastapi jinja2 jinja2-templates manipulation mathematics matrixoperations python templates webapplication

Last synced: 09 May 2026

https://github.com/kevinndungu-source/amazon_emr_serverless_demonstration

Explore the capabilities of Amazon EMR Serverless by processing semi-structured review data with Apache Spark, showcasing efficient big data analysis without managing clusters.

apache-spark bigdatacloud bigdatainfrastructure dataprocessing emrserverless python sql-query

Last synced: 19 Jan 2026

https://github.com/aadityasikder/Object-Detection-with-raspberry-pi-implementing-TinyML-models

Repository for Raspberry Pi-based object detection with TinyML models like TensorFlow Lite, PyTorch Nano, including data gathering, mAP evaluation, and image data preparation in Jupyter notebooks.

data-gathering datacleaning dataprocessing image-preparation object-detection pytorch-nano raspberry-pi-4 tensorflow-lite tinyml

Last synced: 16 Dec 2025

https://github.com/srimantapal205/dataengineerwireframedesigns

Data Engineer Wireframe Designs are essential for planning and visualizing data pipelines, architecture, and workflows before implementation.

data-analysis data-engineering dataflow dataflow-programming datapipeline dataprocessing development visualization

Last synced: 29 Jan 2026

https://github.com/annakhsengiv/foodhunter_revenue_drop_analysis

A BI solution developed for FoodHunter to investigate a significant drop in revenue over a four-month period. This analysis helps uncover actionable insights through data exploration, visualization, and hypothesis-driven analysis to support informed decision-making.

analysis dashboarding database dataengineering datamining dataprocessing datavisualization etl-pipeline ipynb mysql powerbi sql streamlit visualization-pipeline

Last synced: 07 Jul 2025

https://github.com/kaustubholpadkar/r-fundamentals

This repository comprises the solutions to various problems on R Fundamentals.

advanced-database data-science datamining dataprocessing jupyter-notebook r r-packages r-programming statistical-programming

Last synced: 28 Apr 2026

https://github.com/tanzim-prog/sentiment_analysis_bing_lexicon

The aim of this project is to gauge customer satisfaction at some residential hotels in Dhaka.

dataanalysis dataprocessing datavisualization lexical-analysis sentiment-analysis webscraping

Last synced: 06 Jun 2026

https://github.com/jadesrochers/streams

Stream wrapper that allows creating a stream by passing just a function that defines its operation.

dataprocessing stream

Last synced: 17 Mar 2025

https://github.com/ngupta23/data_prep_helper

A helper package for preparing and combining data from a variety of sources

data data-science dataprep datapreparation dataprocessing helpers python

Last synced: 03 Apr 2025

https://github.com/trident09/net-sec-ai-mp

This project predicts network traffic patterns using a machine learning model trained on the CICIDS dataset. It includes a Streamlit app for real-time predictions, displaying predicted labels and probabilities for uploaded CSV data. The project is structured into three parts: dataset, model training, and frontend (Streamlit app).

cybersecurity dataprocessing ml network-traffic-analysis random-forest

Last synced: 29 Apr 2026

https://github.com/cagandemirmr/airbnb_available_houses

In this repo, I create a dashboard using Tableau. In the process, I use SQL and Python.

dashboard data-visualization dataprocessing python sql tableau

Last synced: 30 Apr 2026

https://github.com/aadityasikder/object-detection-with-raspberry-pi-implementing-tinyml-models

Repository for Raspberry Pi-based object detection with TinyML models like TensorFlow Lite, PyTorch Nano, including data gathering, mAP evaluation, and image data preparation in Jupyter notebooks.

data-gathering datacleaning dataprocessing image-preparation object-detection pytorch-nano raspberry-pi-4 tensorflow-lite tinyml

Last synced: 18 Feb 2026

https://github.com/kaushik-puttaswamy/dynamic-movie-booking-insights-platform-using-snowflake

The Dynamic Movie Booking Insights Platform processes real-time booking data using Snowflake’s Dynamic Tables, Streams, and Tasks to deliver actionable insights. It features an interactive Streamlit dashboard for visualizing revenue, sales trends, and booking metrics.

businessintelligence changedatacapture dataprocessing datavisualization dynamictables moviebooking python realtimeanalytics revenueinsights snowflake sql streamlit

Last synced: 20 May 2026

https://github.com/nivasharmaa/friskwatch

A Java program for analyzing stop-and-frisk data from the NYPD. Features data import, organization, and statistical analysis to compare occurrences during and after policy implementation.

data-analysis data-visualization dataprocessing datascience file-io java java-oop nypd-data

Last synced: 19 May 2026

https://github.com/qrailibs/dataflow

✨ Data processing in Node.js made multithreaded and type-safe.

data dataprocessing multithread node

Last synced: 04 May 2026

https://github.com/ponycool/tcga

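An open-source script that simplifies the workflow of acquiring and parsing TCGA (The Cancer Genome Atlas) data and building gene expression matrices. R scripts fully automate the process from data download to matrix assembly, helping researchers quickly obtain high-quality expression data.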

bioinformatics cancer-research dataprocessing geneexpression r tcga

Last synced: 30 Aug 2025

https://github.com/msamij/zig-flow

Data Engineering pipeline.

apache-spark dataprocessing distributed-computing

Last synced: 07 May 2026

https://github.com/jigyasag18/financial-risk-analysis-project

The Credit Card Financial Risk Analysis Dashboard is a real-time Power BI tool designed to provide insights into credit card transactions and customer demographics. It features interactive visualizations, efficient data processing, and actionable insights to support decision-making. Utilizing data from a SQL database, the dashboard tracks key metrics

data dataanalysis database datacleaning datapreprocessing dataprocessing datavisualization financial-analysis financialriskanalysis mysql powerbi sql statistical-analysis

Last synced: 06 Mar 2026