Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/jacksonburns/astartes

Better Data Splits for Machine Learning

ai data-science machine-learning ml python sampling

Last synced: 19 Dec 2024

https://github.com/kennethleungty/end-to-end-automl-insurance

An End-to-End Implementation of AutoML with H2O, MLflow, FastAPI, and Streamlit for Insurance Cross-Sell

automl data-science fastapi h2o h2o-automl machine-learning mlflow mlops python streamlit

Last synced: 22 Nov 2024

https://github.com/rcdilorenzo/ecce

ML Prediction of Bible Topics and Passages (Python / React)

data-science fastapi fully-connected-network interactive-visualizations keras-tensorflow reactjs

Last synced: 11 Nov 2024

https://github.com/kaggledatasets/kaggledatasets

Collection of ready-to-use Kaggle datasets for everyone (looking for contributors)

data-science datasets deep-learning kaggle keras machine-learning python pytorch scikit-learn tensorflow

Last synced: 13 Oct 2024

https://github.com/nolanbconaway/pitchfork-data

Analyses of over 18,000 Pitchfork reviews.

data-science ipynb jupyter music pitchfork

Last synced: 02 Jan 2025

https://github.com/zincware/ZnTrack

Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.

data-science data-version-control developer-tools dvc git machine-learning python reproducibility

Last synced: 14 Nov 2024

https://github.com/PKU-DAIR/mindware

An efficient open-source AutoML system for automating the machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.

automl-algorithms automl-pipeline bayesian-optimization blackbox-optimization data-science deep-learning distributed-systems ensemble-learning hyper-parameter-optimization knobs-tuning machine-learning meta-learning neural-architecture-search python

Last synced: 16 Nov 2024

https://github.com/daun-io/Study-Data-Science

Practical data science notebooks that I used to study in 2016

data-science jupyter-notebook machine-learning tensorflow

Last synced: 27 Nov 2024

https://github.com/daun-io/study-data-science

Practical data science notebooks that I used to study in 2016

data-science jupyter-notebook machine-learning tensorflow

Last synced: 30 Nov 2024

https://github.com/jonathandinu/spark-ray-data-science

Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with Spark and Ray in the context of a data scientist's standard workflow.

artificial-intelligence data-science distributed-computing machine-learning python ray spark

Last synced: 15 Nov 2024

https://github.com/tatevkaren/artificial-neural-network-business_case_study

Business Case Study to predict customer churn rate based on Artificial Neural Network (ANN), with TensorFlow and Keras in Python. This is a customer churn analysis that contains training, testing, and evaluation of an ANN model. (Includes: Case Study Paper, Code)

ann ann-model artificial-neural-network artificial-neural-networks bank-customers case-study churn-analysis data-science deep-learning machine-learning prediction-model predictive-analytics python3 tensorflow-tutorials

Last synced: 12 Nov 2024

https://github.com/lter/lterdatasampler

LTER data samples to teach environmental data science

data-science ecology lter-science r r-package

Last synced: 27 Oct 2024

https://github.com/theengineeringworld/statistics-using-python

These files are part of the YouTube course "Statistics Using Python" offered by The Engineering World: http://youtube.com/theengineeringworld

cleaning data-analysis data-mining data-science data-visualization database jupyter-notebooks python python3 statistics

Last synced: 08 Nov 2024

https://github.com/opengeos/geoai

A Python package for using Artificial Intelligence (AI) with geospatial data

ai data-science geoai geopython geospatial jupyter python

Last synced: 11 Nov 2024

https://github.com/welding-torch/excel-anonymizer

A Python script that anonymizes an Excel file and synthesizes new data in its place.

data-science microsoft nlp pandas presidio privacy

Last synced: 07 Nov 2024

https://github.com/okfn-brasil/whistleblower

🚨A Twitter bot for publicly reporting suspicions found by Rosie, Serenata de Amor's AI

data-science facebook-messenger-bot machine-learning twitter-bot

Last synced: 31 Oct 2024

https://github.com/electronick1/stairs

A framework that helps you run parallel/distributed calculations using data pipelines

data-engineering data-pipeline data-science distributed-computing python

Last synced: 10 Nov 2024

https://github.com/credo-ai/credoai_lens

Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data assessment, and acts as a central gateway to assessments created in the open source community.

ai artificial-intelligence assessment data-science ethical-artificial-intelligence fairness-ai fairness-ml jupyter machine-learning ml python reporting responsible-ai visualization

Last synced: 28 Sep 2024

https://github.com/tatevkaren/tatevkaren-data-science-portfolio

Data Science Portfolio of Tatev Karen Aslanyan, including case studies and research projects that I have completed that solve business problems or introduce new products. Case study papers, code, and additional resources are all included.

blog case-study computer-science data-analysis data-science deep-learning econometrics machine-learning papers portfolio portfolio-website statistics

Last synced: 07 Dec 2024

https://github.com/kb22/GitHub-User-Insights-using-API

The project uses the GitHub API with user authentication to fetch information such as commits and repositories for a specific user and stores it as CSV files for data collection and analysis.

api data-analysis data-science data-scraping github-api python

Last synced: 08 Nov 2024

https://github.com/lkuffo/data-viz

More than 50 examples of data visualizations and analyses in Matplotlib, Pandas, Seaborn, Plotly, Bokeh, and NetworkX

data-analysis data-science dataviz geoviz jupyter jupyter-notebook matplotlib networkx pandas plotly python seaborn

Last synced: 18 Dec 2024

https://github.com/henestrosadev/sololearn

Compilation of all SoloLearn courses with their respective projects and practices and all 72 code challenges for all 7 supported languages.

code-challenge code-practice data-science programming-exercises programming-languages python sololearn sololearn-cert sololearn-solutions

Last synced: 27 Oct 2024

https://github.com/fremantle-industries/prop

An open and opinionated trading platform using productive & familiar open source libraries and tools for strategy research, execution and operation.

algo-trading data-science defi elixir grafana trading-platform

Last synced: 07 Nov 2024

https://github.com/giswqs/postgis

Spatial Data Management with PostgreSQL and PostGIS https://gishub.org/sdm

data-science database geospatial postgis postgres postgresql

Last synced: 02 Nov 2024

https://github.com/plantinformatics/pretzel

JavaScript full-stack framework for Big Data visualisation and analysis

big-data bioinformatics data-science data-visualization ember emberjs express expressjs javascript open-source

Last synced: 08 Jan 2025

https://github.com/soumyadip007/data-science-using-python-university-course-module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.

data-preparation data-preprocessing data-processing data-science data-visualization jupyter-notebook knn numpy panda plotting python

Last synced: 28 Oct 2024

https://github.com/jason2brownlee/machinelearningmischief

Machine Learning Mischief: Examples from the dark side of data science

data-science ethics hacking machine-learning statistics

Last synced: 24 Dec 2024

https://github.com/imgcook/datacook

Machine Learning and Data Analysis in JavaScript.

data-science feature-engineering javascript machine-learning

Last synced: 13 Nov 2024

https://github.com/mlabonne/how-to-data-science

Scripts, notebooks, and articles about data science in general.

data-science numpy pandas pandas-dataframe python pytorch

Last synced: 02 Jan 2025

https://github.com/ropensci/rdataretriever

R interface to the Data Retriever

data data-science database datasets r r-package rstats science

Last synced: 04 Dec 2024

https://github.com/google/bayesnf

Bayesian Neural Field models for prediction in large-scale spatiotemporal datasets

bayesian-inference data-science machine-learning spatiotemporal-data-analysis statistics

Last synced: 09 Jan 2025

https://github.com/dfinke/PSDuckDB

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 16 Dec 2024

https://github.com/weiji14/deepbedmap

Going beyond BEDMAP2 using a super resolution deep neural network. Also a convenient flat file data repository for high resolution bed elevation datasets around Antarctica.

antarctica bedmap binder chainer data-science deep-neural-network digital-elevation-model flat-file-db generative-adversarial-network glaciology jupyter-notebook optuna pangeo remote-sensing super-resolution

Last synced: 07 Jan 2025

https://github.com/datalab-platform/datalab

Open-source Platform for Scientific and Technical Data Processing and Visualization

data-science data-visualization image-processing opencv python scientific-computing scikit-image scipy signal-processing visualization

Last synced: 14 Jan 2025

https://github.com/vida-nyu/data-polygamy

Data Polygamy is a topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets.

data data-science nyucds

Last synced: 24 Nov 2024

https://github.com/rfordatascience/rfordatasciencewiki

Resources for the R4DS Online Learning Community, including answer keys to the text

beginner beginner-friendly beginner-tutorial-series data-science help-wanted r4ds rstats rstudio tidyverse

Last synced: 14 Nov 2024

https://github.com/joaopaulolndev/my-data-scientist-roadmap

A description of my roadmap to becoming a data scientist and machine learning engineer

artificial-intelligence data-science deep-learning machine-learning python python3

Last synced: 09 Jan 2025

https://github.com/ActuariesInstitute/cookbook

Data and analytics cookbook for actuaries

actuarial analytics data-science hacktoberfest

Last synced: 27 Nov 2024

https://github.com/ploomber/soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.

airflow argo argo-workflows aws data-science kubeflow kubeflow-pipelines kubernetes machine-learning slurm workflow

Last synced: 19 Dec 2024

https://github.com/codait/max-central-repo

Central Repository of Model Asset Exchange project. This repository contains information about the available models, current project status, contribution guidelines and supporting assets.

cloud codait data-science deep-learning ibm-developer kubernetes model-asset-exchange node-red-flow openshift trainable-models watson-machine-learning watson-st

Last synced: 09 Nov 2024

https://github.com/briatte/dsr

Introduction to Data Science with R (Sciences Po, Paris, 2023)

course data-analysis data-science data-visualization r statistics

Last synced: 27 Oct 2024

https://github.com/elysian01/data-purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

data-analysis data-cleaning data-cleaning-pipeline data-preprocessing data-science data-visualization datapurifier eda exploratory-data-analysis jupyter python-lib python-library python3

Last synced: 07 Nov 2024

https://github.com/nicolaskruchten/scipy2021

Data Visualization as the First and Last Mile of Data Science: Plotly Express and Dash

data-analysis data-science data-visualization python visualization

Last synced: 08 Nov 2024

https://github.com/mlr-org/mlr3torch

Deep learning framework for the mlr3 ecosystem based on torch

data-science deep-learning machine-learning mlr3 r r-package torch

Last synced: 15 Jan 2025

https://github.com/goldencheetah/scikit-sports

Sports analysis library for Python

data-science sports

Last synced: 08 Nov 2024

https://github.com/lukasmosser/snist

A Benchmark for Seismic Velocity Inversion from Synthetics

data-science deep-learning geology geophysics machine-learning physics seismic waveform

Last synced: 19 Dec 2024

https://github.com/stefan-m-lenz/BoltzmannMachines.jl

A Julia package for training and evaluating multimodal deep Boltzmann machines

data-science deep-boltzmann-machine deep-learning julia machine-learning neural-networks restricted-boltzmann-machine

Last synced: 13 Nov 2024

https://github.com/jrfiedler/causal_inference_julia_code

Julia code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science julia julialang

Last synced: 12 Oct 2024

https://github.com/joaquinamatrodrigo/cienciadedatos.net

Outreach website with educational material on statistics, machine learning algorithms, data science, and programming in R and Python.

analytics ciencia-de-dados data-science estadistica forecasting machine-learning python r-programming rstats statistics

Last synced: 10 Jan 2025

https://github.com/franzdiebold/data-science-cheat-sheets

A collection of Data Science cheat sheets.

cheat-sheet cheat-sheets data-science pandas

Last synced: 23 Dec 2024

https://github.com/AidanCooper/shap-analysis-guide

How to Interpret SHAP Analyses: A Non-Technical Guide

data-science machine-learning shap tutorial

Last synced: 12 Nov 2024

https://github.com/facultyai/scala-plotly-client

Visualise your data from Scala using Plotly

data-science graph plot plotly scala visualisation

Last synced: 08 Nov 2024

https://github.com/SOCR/SOCRAT

A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization

data-analysis data-science data-visualization socr statistics visual-analytics visualization

Last synced: 03 Nov 2024

https://github.com/jules32/rmarkdown-website-tutorial

Tutorial for creating websites w/ R Markdown

data-science rmarkdown rstats teaching tutorial

Last synced: 25 Dec 2024

https://github.com/m-dadej/marswitching.jl

MarSwitching.jl: Julia package for Markov switching dynamic models

data-science econometrics julia machine-learning markov-chain statistics time-series

Last synced: 12 Oct 2024

https://github.com/repetere/modelscript

REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript

data-mining data-preprocessing data-science javascript machine-learning

Last synced: 27 Sep 2024