Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/shlizee/NeuroAI

NeuroAI-UW seminar, a weekly seminar for the UW community, organized by the NeuroAI Shlizerman Lab.

ai cvpr data-science deep-learning eccv icml neural-networks neurips neuroscience-methods recurrent-neural-networks sfn

Last synced: 12 Nov 2024

https://github.com/stitchfix/mab

Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

data-science experimentation go golang multi-armed-bandit multi-armed-bandits multiarmed-bandits reinforcement-learning thompson thompson-sampling

Last synced: 10 Nov 2024

https://github.com/fcakyon/instafake-dataset

Dataset for Instagram Fake and Automated Account Detection

bot classification data-science dataset fake instafake instagram machine-learning research

Last synced: 22 Oct 2024

https://github.com/kaggledatasets/kaggledatasets

Collection of Kaggle Datasets ready to use for Everyone (Looking for contributors)

data-science datasets deep-learning kaggle keras machine-learning python pytorch scikit-learn tensorflow

Last synced: 13 Oct 2024

https://github.com/jacksonburns/astartes

Better Data Splits for Machine Learning

ai data-science machine-learning ml python sampling

Last synced: 31 Oct 2024

https://github.com/tommyod/paretoset

Compute the Pareto (non-dominated) set, i.e., skyline operator/query.

data-mining data-science datascience multi-objective-optimization optimization pandas skyline-query

Last synced: 11 Nov 2024

https://github.com/rcdilorenzo/ecce

ML Prediction of Bible Topics and Passages (Python / React)

data-science fastapi fully-connected-network interactive-visualizations keras-tensorflow reactjs

Last synced: 11 Nov 2024

https://github.com/zincware/ZnTrack

Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.

data-science data-version-control developer-tools dvc git machine-learning python reproducibility

Last synced: 14 Nov 2024

https://github.com/PKU-DAIR/mindware

An efficient open-source AutoML system for automating machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.

automl-algorithms automl-pipeline bayesian-optimization blackbox-optimization data-science deep-learning distributed-systems ensemble-learning hyper-parameter-optimization knobs-tuning machine-learning meta-learning neural-architecture-search python

Last synced: 16 Nov 2024

https://github.com/daun-io/Study-Data-Science

Practical data science notebooks that I used to study in 2016

data-science jupyter-notebook machine-learning tensorflow

Last synced: 07 Aug 2024

https://github.com/nolanbconaway/pitchfork-data

Analyses on over 18,000 pitchfork reviews.

data-science ipynb jupyter music pitchfork

Last synced: 11 Oct 2024

https://github.com/tatevkaren/artificial-neural-network-business_case_study

Business Case Study to predict customer churn rate based on Artificial Neural Network (ANN), with TensorFlow and Keras in Python. This is a customer churn analysis that contains training, testing, and evaluation of an ANN model. (Includes: Case Study Paper, Code)

ann ann-model artificial-neural-network artificial-neural-networks bank-customers case-study churn-analysis data-science deep-learning machine-learning prediction-model predictive-analytics python3 tensorflow-tutorials

Last synced: 12 Nov 2024

https://github.com/jonathandinu/spark-ray-data-science

Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with Spark and Ray in the context of a data scientist's standard workflow.

artificial-intelligence data-science distributed-computing machine-learning python ray spark

Last synced: 15 Nov 2024

https://github.com/theengineeringworld/statistics-using-python

These files are part of the YouTube course "Statistics Using Python" offered by The Engineering World: http://youtube.com/theengineeringworld

cleaning data-analysis data-mining data-science data-visualization database jupyter-notebooks python python3 statistics

Last synced: 08 Nov 2024

https://github.com/lter/lterdatasampler

LTER data samples to teach environmental data science

data-science ecology lter-science r r-package

Last synced: 27 Oct 2024

https://ddotta.github.io/cookbook-rpolars/

Cookbook to provide solutions to common tasks and problems in using Polars with R

benchmark cookbook data-engineering data-science datatable dplyr polars r tidyr

Last synced: 04 Aug 2024

https://github.com/scicloj/wolframite

An interface between Clojure and Wolfram Language (the language of Mathematica)

clojure data-science mathematica wolfram-language

Last synced: 15 Nov 2024

https://github.com/okfn-brasil/whistleblower

🚨 A Twitter bot for publicly reporting suspicions found by Rosie, Serenata de Amor's AI

data-science facebook-messenger-bot machine-learning twitter-bot

Last synced: 31 Oct 2024

https://github.com/electronick1/stairs

A framework that helps you run parallel/distributed calculations using data pipelines

data-engineering data-pipeline data-science distributed-computing python

Last synced: 10 Nov 2024

https://github.com/welding-torch/excel-anonymizer

A Python script that anonymizes an Excel file and synthesizes new data in its place.

data-science microsoft nlp pandas presidio privacy

Last synced: 07 Nov 2024

https://github.com/opengeos/geoai

A Python package for using Artificial Intelligence (AI) with geospatial data

ai data-science geoai geopython geospatial jupyter python

Last synced: 11 Nov 2024

https://github.com/credo-ai/credoai_lens

Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data assessment, and acts as a central gateway to assessments created in the open source community.

ai artificial-intelligence assessment data-science ethical-artificial-intelligence fairness-ai fairness-ml jupyter machine-learning ml python reporting responsible-ai visualization

Last synced: 28 Sep 2024

https://github.com/kb22/GitHub-User-Insights-using-API

This project uses the GitHub API with user authentication to fetch information such as a specific user's commits and repositories, and stores it as CSV files for data collection and analysis.

api data-analysis data-science data-scraping github-api python

Last synced: 08 Nov 2024

https://github.com/henestrosadev/sololearn

Compilation of all SoloLearn courses with their respective projects and practices and all 72 code challenges for all 7 supported languages.

code-challenge code-practice data-science programming-exercises programming-languages python sololearn sololearn-cert sololearn-solutions

Last synced: 27 Oct 2024

https://github.com/ropensci/rdataretriever

R interface to the Data Retriever

data data-science database datasets r r-package rstats science

Last synced: 13 Aug 2024

https://github.com/giswqs/postgis

Spatial Data Management with PostgreSQL and PostGIS https://gishub.org/sdm

data-science database geospatial postgis postgres postgresql

Last synced: 02 Nov 2024

https://github.com/imgcook/datacook

Machine Learning and Data Analysis in JavaScript.

data-science feature-engineering javascript machine-learning

Last synced: 13 Nov 2024

https://github.com/fremantle-industries/prop

An open and opinionated trading platform using productive & familiar open source libraries and tools for strategy research, execution and operation.

algo-trading data-science defi elixir grafana trading-platform

Last synced: 07 Nov 2024

https://github.com/soumyadip007/data-science-using-python-university-course-module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.

data-preparation data-preprocessing data-processing data-science data-visualization jupyter-notebook knn numpy panda plotting python

Last synced: 28 Oct 2024

https://github.com/google/bayesnf

Bayesian Neural Field models for prediction in large-scale spatiotemporal datasets

bayesian-inference data-science machine-learning spatiotemporal-data-analysis statistics

Last synced: 18 Sep 2024

https://github.com/plantinformatics/pretzel

JavaScript full-stack framework for Big Data visualisation and analysis

big-data bioinformatics data-science data-visualization ember emberjs express expressjs javascript open-source

Last synced: 31 Oct 2024

https://github.com/ndleah/8-week-sql-challenge

#8WeekSQLChallenge by Danny Ma.

data-analysis data-science sql

Last synced: 13 Nov 2024

https://github.com/rfordatascience/rfordatasciencewiki

Resources for the R4DS Online Learning Community, including answer keys to the text

beginner beginner-friendly beginner-tutorial-series data-science help-wanted r4ds rstats rstudio tidyverse

Last synced: 14 Nov 2024

https://github.com/codait/max-central-repo

Central Repository of Model Asset Exchange project. This repository contains information about the available models, current project status, contribution guidelines and supporting assets.

cloud codait data-science deep-learning ibm-developer kubernetes model-asset-exchange node-red-flow openshift trainable-models watson-machine-learning watson-st

Last synced: 09 Nov 2024

https://github.com/elysian01/data-purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

data-analysis data-cleaning data-cleaning-pipeline data-preprocessing data-science data-visualization datapurifier eda exploratory-data-analysis jupyter python-lib python-library python3

Last synced: 07 Nov 2024

https://github.com/ploomber/soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.

airflow argo argo-workflows aws data-science kubeflow kubeflow-pipelines kubernetes machine-learning slurm workflow

Last synced: 13 Nov 2024

https://github.com/briatte/dsr

Introduction to Data Science with R (Sciences Po, Paris, 2023)

course data-analysis data-science data-visualization r statistics

Last synced: 27 Oct 2024

https://github.com/nicolaskruchten/scipy2021

Data Visualization as the First and Last Mile of Data Science: Plotly Express and Dash

data-analysis data-science data-visualization python visualization

Last synced: 08 Nov 2024

https://github.com/lukasmosser/snist

A Benchmark for Seismic Velocity Inversion from Synthetics

data-science deep-learning geology geophysics machine-learning physics seismic waveform

Last synced: 15 Nov 2024

https://github.com/jrfiedler/causal_inference_julia_code

Julia code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science julia julialang

Last synced: 12 Oct 2024

https://github.com/stefan-m-lenz/BoltzmannMachines.jl

A Julia package for training and evaluating multimodal deep Boltzmann machines

data-science deep-boltzmann-machine deep-learning julia machine-learning neural-networks restricted-boltzmann-machine

Last synced: 13 Nov 2024

https://github.com/joaquinamatrodrigo/cienciadedatos.net

Outreach website with educational material on statistics, machine learning algorithms, data science, and programming in R and Python.

analytics ciencia-de-dados data-science estadistica forecasting machine-learning python r-programming rstats statistics

Last synced: 15 Nov 2024

https://github.com/goldencheetah/scikit-sports

Sports analysis library for Python

data-science sports

Last synced: 08 Nov 2024

https://github.com/weiji14/deepbedmap

Going beyond BEDMAP2 using a super resolution deep neural network. Also a convenient flat file data repository for high resolution bed elevation datasets around Antarctica.

antarctica bedmap binder chainer data-science deep-neural-network digital-elevation-model flat-file-db generative-adversarial-network glaciology jupyter-notebook optuna pangeo remote-sensing super-resolution

Last synced: 22 Oct 2024

https://github.com/facultyai/scala-plotly-client

Visualise your data from Scala using Plotly

data-science graph plot plotly scala visualisation

Last synced: 08 Nov 2024

https://github.com/AidanCooper/shap-analysis-guide

How to Interpret SHAP Analyses: A Non-Technical Guide

data-science machine-learning shap tutorial

Last synced: 12 Nov 2024

https://github.com/SOCR/SOCRAT

A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization

data-analysis data-science data-visualization socr statistics visual-analytics visualization

Last synced: 03 Nov 2024

https://github.com/datakitchen/dataops-testgen

DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution through data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, and continuous anomaly monitoring.

data data-engineering data-observability data-quality data-science data-testing datachecker dataops dataprofiling dataquality datavalidation mssql postgresql python redshift self-hosted snowflake

Last synced: 06 Nov 2024

https://github.com/repetere/modelscript

REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript

data-mining data-preprocessing data-science javascript machine-learning

Last synced: 27 Sep 2024

https://github.com/m-dadej/marswitching.jl

MarSwitching.jl: Julia package for Markov switching dynamic models

data-science econometrics julia machine-learning markov-chain statistics time-series

Last synced: 12 Oct 2024

https://github.com/edinsonrequena/articicial-inteligence-and-data-science

This repository is based mainly on Platzi's machine learning and data science track, but it will also include resources from other platforms and educational institutions.

algebra algorithms articicial-inteligence data-science instituciones-educativas jupyter-notebook platzi python university

Last synced: 27 Oct 2024

https://github.com/franzdiebold/data-science-cheat-sheets

A collection of Data Science cheat sheets.

cheat-sheet cheat-sheets data-science pandas

Last synced: 05 Nov 2024

https://github.com/apreshill/data-vis-labs-2018

Principles & Practice of Data Visualization, CS631 Spring 2018

data-science data-visualization education rstats teaching

Last synced: 15 Nov 2024

https://github.com/mlr-org/mlr3torch

Deep learning framework for the mlr3 ecosystem based on torch

data-science deep-learning machine-learning mlr3 r r-package torch

Last synced: 06 Nov 2024

https://github.com/florents-tselai/greek-wines-analysis

Scraper, Data and Analysis for "Analyzing 1000+ Greek Wines with Python"

beautifulsoup data-science pandas python seaborn web-scraping

Last synced: 31 Oct 2024

https://github.com/dfinke/psduckdb

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 27 Oct 2024

https://github.com/leemengtw/gist-evernote

A Python application that syncs GitHub Gists and saves them to an Evernote notebook as screenshots.

data-science evernote gists github github-graphql jupyter-notebook pet-project python selenium sync

Last synced: 07 Aug 2024