Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/welding-torch/excel-anonymizer

A Python script that anonymizes an Excel file and synthesizes new data in its place.

data-science microsoft nlp pandas presidio privacy

Last synced: 07 Nov 2024

https://github.com/opengeos/geoai

A Python package for using Artificial Intelligence (AI) with geospatial data

ai data-science geoai geopython geospatial jupyter python

Last synced: 11 Nov 2024

https://github.com/credo-ai/credoai_lens

Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data assessment, and acts as a central gateway to assessments created in the open source community.

ai artificial-intelligence assessment data-science ethical-artificial-intelligence fairness-ai fairness-ml jupyter machine-learning ml python reporting responsible-ai visualization

Last synced: 28 Sep 2024

https://github.com/okfn-brasil/whistleblower

🚨 A Twitter bot for publicly reporting suspicions found by Rosie, Serenata de Amor's AI

data-science facebook-messenger-bot machine-learning twitter-bot

Last synced: 31 Oct 2024

https://github.com/kb22/GitHub-User-Insights-using-API

Uses the GitHub API with user authentication to fetch a user's information, such as commits and repositories, and store it as CSV files for data collection and analysis.

api data-analysis data-science data-scraping github-api python

Last synced: 08 Nov 2024

https://github.com/henestrosadev/sololearn

Compilation of all SoloLearn courses with their respective projects and practices and all 72 code challenges for all 7 supported languages.

code-challenge code-practice data-science programming-exercises programming-languages python sololearn sololearn-cert sololearn-solutions

Last synced: 27 Oct 2024

https://github.com/ropensci/rdataretriever

R interface to the Data Retriever

data data-science database datasets r r-package rstats science

Last synced: 13 Aug 2024

https://github.com/imgcook/datacook

Machine Learning and Data Analysis in JavaScript.

data-science feature-engineering javascript machine-learning

Last synced: 13 Nov 2024

https://github.com/fremantle-industries/prop

An open and opinionated trading platform using productive & familiar open source libraries and tools for strategy research, execution and operation.

algo-trading data-science defi elixir grafana trading-platform

Last synced: 07 Nov 2024

https://github.com/giswqs/postgis

Spatial Data Management with PostgreSQL and PostGIS https://gishub.org/sdm

data-science database geospatial postgis postgres postgresql

Last synced: 02 Nov 2024

https://github.com/soumyadip007/data-science-using-python-university-course-module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.

data-preparation data-preprocessing data-processing data-science data-visualization jupyter-notebook knn numpy panda plotting python

Last synced: 28 Oct 2024

https://github.com/google/bayesnf

Bayesian Neural Field models for prediction in large-scale spatiotemporal datasets

bayesian-inference data-science machine-learning spatiotemporal-data-analysis statistics

Last synced: 18 Sep 2024

https://github.com/plantinformatics/pretzel

JavaScript full-stack framework for Big Data visualisation and analysis

big-data bioinformatics data-science data-visualization ember emberjs express expressjs javascript open-source

Last synced: 31 Oct 2024

https://github.com/jonathandinu/spark-ray-data-science

Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with Spark and Ray in the context of a data scientist's standard workflow.

artificial-intelligence data-science distributed-computing machine-learning python ray spark

Last synced: 03 Aug 2024

https://github.com/ndleah/8-week-sql-challenge

#8WeekSQLChallenge by Danny Ma.

data-analysis data-science sql

Last synced: 13 Nov 2024

https://github.com/rfordatascience/rfordatasciencewiki

Resources for the R4DS Online Learning Community, including answer keys to the text

beginner beginner-friendly beginner-tutorial-series data-science help-wanted r4ds rstats rstudio tidyverse

Last synced: 14 Nov 2024

https://github.com/briatte/dsr

Introduction to Data Science with R (Sciences Po, Paris, 2023)

course data-analysis data-science data-visualization r statistics

Last synced: 27 Oct 2024

https://github.com/ploomber/soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.

airflow argo argo-workflows aws data-science kubeflow kubeflow-pipelines kubernetes machine-learning slurm workflow

Last synced: 13 Nov 2024

https://github.com/elysian01/data-purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

data-analysis data-cleaning data-cleaning-pipeline data-preprocessing data-science data-visualization datapurifier eda exploratory-data-analysis jupyter python-lib python-library python3

Last synced: 07 Nov 2024

https://github.com/codait/max-central-repo

Central Repository of Model Asset Exchange project. This repository contains information about the available models, current project status, contribution guidelines and supporting assets.

cloud codait data-science deep-learning ibm-developer kubernetes model-asset-exchange node-red-flow openshift trainable-models watson-machine-learning watson-st

Last synced: 09 Nov 2024

https://github.com/jrfiedler/causal_inference_julia_code

Julia code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science julia julialang

Last synced: 12 Oct 2024

https://github.com/goldencheetah/scikit-sports

Sports analysis library for Python

data-science sports

Last synced: 08 Nov 2024

https://github.com/lukasmosser/snist

A Benchmark for Seismic Velocity Inversion from Synthetics

data-science deep-learning geology geophysics machine-learning physics seismic waveform

Last synced: 08 Nov 2024

https://github.com/stefan-m-lenz/BoltzmannMachines.jl

A Julia package for training and evaluating multimodal deep Boltzmann machines

data-science deep-boltzmann-machine deep-learning julia machine-learning neural-networks restricted-boltzmann-machine

Last synced: 13 Nov 2024

https://github.com/nicolaskruchten/scipy2021

Data Visualization as the First and Last Mile of Data Science: Plotly Express and Dash

data-analysis data-science data-visualization python visualization

Last synced: 08 Nov 2024

https://github.com/joaquinamatrodrigo/cienciadedatos.net

Educational outreach website with training material on statistics, machine learning algorithms, data science, and programming in R and Python.

analytics ciencia-de-dados data-science estadistica forecasting machine-learning python r-programming rstats statistics

Last synced: 01 Nov 2024

https://github.com/facultyai/scala-plotly-client

Visualise your data from Scala using Plotly

data-science graph plot plotly scala visualisation

Last synced: 08 Nov 2024

https://github.com/SOCR/SOCRAT

A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization

data-analysis data-science data-visualization socr statistics visual-analytics visualization

Last synced: 03 Nov 2024

https://github.com/AidanCooper/shap-analysis-guide

How to Interpret SHAP Analyses: A Non-Technical Guide

data-science machine-learning shap tutorial

Last synced: 12 Nov 2024

https://github.com/weiji14/deepbedmap

Going beyond BEDMAP2 using a super resolution deep neural network. Also a convenient flat file data repository for high resolution bed elevation datasets around Antarctica.

antarctica bedmap binder chainer data-science deep-neural-network digital-elevation-model flat-file-db generative-adversarial-network glaciology jupyter-notebook optuna pangeo remote-sensing super-resolution

Last synced: 22 Oct 2024

https://github.com/edinsonrequena/articicial-inteligence-and-data-science

This repository is based mainly on Platzi's machine learning and data science career path, but it will also include resources from other platforms and educational institutions.

algebra algorithms articicial-inteligence data-science instituciones-educativas jupyter-notebook platzi python university

Last synced: 27 Oct 2024

https://github.com/datakitchen/dataops-testgen

DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution through data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, and continuous anomaly monitoring.

data data-engineering data-observability data-quality data-science data-testing datachecker dataops dataprofiling dataquality datavalidation mssql postgresql python redshift self-hosted snowflake

Last synced: 06 Nov 2024

https://github.com/m-dadej/marswitching.jl

MarSwitching.jl: Julia package for Markov switching dynamic models 📈

data-science econometrics julia machine-learning markov-chain statistics time-series

Last synced: 12 Oct 2024

https://github.com/repetere/modelscript

REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript

data-mining data-preprocessing data-science javascript machine-learning

Last synced: 27 Sep 2024

https://github.com/franzdiebold/data-science-cheat-sheets

A collection of Data Science cheat sheets.

cheat-sheet cheat-sheets data-science pandas

Last synced: 05 Nov 2024

https://github.com/mlr-org/mlr3torch

Deep learning framework for the mlr3 ecosystem based on torch

data-science deep-learning machine-learning mlr3 r r-package torch

Last synced: 06 Nov 2024

https://github.com/pjaselin/cubist

A Python package for fitting Quinlan's Cubist regression model

data-science machine-learning python regression scikit-learn

Last synced: 14 Nov 2024

https://github.com/florents-tselai/greek-wines-analysis

Scraper, Data and Analysis for "Analyzing 1000+ Greek Wines with Python"

beautifulsoup data-science pandas python seaborn web-scraping

Last synced: 31 Oct 2024

https://github.com/njanakiev/openstreetmap-data-science

Data Science with OpenStreetMap

data-science openstreetmap python

Last synced: 06 Nov 2024

https://github.com/dfinke/psduckdb

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 27 Oct 2024

https://github.com/leemengtw/gist-evernote

A Python application that syncs GitHub Gists and saves them to an Evernote notebook as screenshots.

data-science evernote gists github github-graphql jupyter-notebook pet-project python selenium sync

Last synced: 07 Aug 2024

https://github.com/jules32/rmarkdown-website-tutorial

Tutorial for creating websites w/ R Markdown

data-science rmarkdown rstats teaching tutorial

Last synced: 06 Nov 2024

https://github.com/apreshill/data-vis-labs-2018

Principles & Practice of Data Visualization, CS631 Spring 2018

data-science data-visualization education rstats teaching

Last synced: 11 Oct 2024

https://github.com/jhwohlgemuth/pwsh-prelude

PowerShell “standard” library for supercharging your productivity. Provides a powerful cross-platform scripting environment enabling efficient analysis and sustainable science in myriad contexts.

applied-mathematics cli cli-app data-science hacktoberfest library mathematics powershell powershell-module statistics text-processing text-to-speech user-interface

Last synced: 27 Oct 2024

https://github.com/jmari/ipharo

Pharo Smalltalk kernel for Jupyter

data-science jupyter-notebook pharo pharo-smalltalk smalltalk

Last synced: 09 Oct 2024

https://github.com/ipeirotis/introduction-to-python

Notes for the "Introduction to Programming for Data Science" class

data-science for-beginners python python3

Last synced: 27 Oct 2024

https://github.com/leriomaggio/python-data-science

Lecture notes and materials for Python Data Science course

data-science jupyter-notebooks machine-learning materials python-tutorials

Last synced: 29 Oct 2024

https://github.com/ammsa/dtcleaner

DTCleaner: data cleaning using multi-target decision trees.

data-cleaning data-mining data-preprocessing data-quality data-science data-wrangling

Last synced: 28 Oct 2024

https://github.com/darribas/gds17

Geographic Data Science'17

data-science gis pysal python

Last synced: 28 Oct 2024

https://github.com/jmari/iPharo

Pharo Smalltalk kernel for Jupyter

data-science jupyter-notebook pharo pharo-smalltalk smalltalk

Last synced: 03 Aug 2024

https://github.com/tirendazacademy/chatgpt-with-examples

This repo contains ChatGPT tutorials about data science, machine learning, deep learning, and Python. We show how to use ChatGPT with examples.

chat-gpt chatgpt chatgpt-api chatgpt-python chatgpt3 data-science deep-learning machine-learning

Last synced: 08 Nov 2024

https://github.com/tejzpr/ordered-concurrently

Ordered-concurrently is a Go library for concurrent processing with ordered output. It processes work concurrently and returns output on a channel in the order of input. It is useful for concurrently processing items in a queue and getting output in the order provided by the queue.

concurrent concurrent-data-structure data-pipeline data-science golang golang-library ordered parallel parallel-computing

Last synced: 26 Oct 2024

https://github.com/tdeboissiere/cookiecutter-deeplearning

Project folder structure for doing and sharing deep learning work.

data-science project-template

Last synced: 05 Nov 2024

https://github.com/lkuffo/data-viz

More than 50 examples of data visualization and analysis in Matplotlib, Pandas, Seaborn, Plotly, Bokeh, and NetworkX

data-analysis data-science dataviz geoviz jupyter jupyter-notebook matplotlib networkx pandas plotly python seaborn

Last synced: 05 Nov 2024

https://github.com/dMLTquant/openbb_sdk_exporation

Explore OpenBB SDK without having to install anything on your local machine. You just need a GitHub and a GitPod account.

algorithmic-trading data-science financial-data jupyter notebook openbb python

Last synced: 01 Nov 2024

https://github.com/megagonlabs/ruler

Data Programming by Demonstration (DPBD) for Document Classification

data-labeling data-programming data-science machine-learning training-data weak-supervision

Last synced: 10 Nov 2024

https://github.com/ActuariesInstitute/cookbook

Data and analytics cookbook for actuaries

actuarial analytics data-science hacktoberfest

Last synced: 08 Aug 2024