Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/ak-coram/cl-duckdb

Common Lisp CFFI wrapper around the DuckDB C API

c-bindings common-lisp data-science duckdb lisp olap parquet sql

Last synced: 13 Nov 2024

https://github.com/benjaminmbrown/real-time-data-viz-d3-crossfilter-websocket-tutorial

Tutorial on real-time data visualization. Python websocket server & d3.js + crossfilter.js frontend

crossfilter d3 d3js data-science data-visualization dcjs tutorial websockets

Last synced: 25 Nov 2024

https://github.com/ivan-bilan/nlp-and-data-science-spotlights

Regular spotlights of underrated NLP and Data Science GitHub repositories

data-science deep-learning natural-language-processing nlp spotlight

Last synced: 31 Dec 2024

https://github.com/sayakpaul/benchmarking-and-mli-experiments-on-the-adult-dataset

Contains benchmarking and interpretability experiments on the Adult dataset using several libraries

data-science fastai h2oai interpretable-machine-learning machine-learning microsoft-interpret tensorflow

Last synced: 13 Jan 2025

https://github.com/datakitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from data source to customer value, from any team development environment into production, across every tool, team, environment, and customer so that problems are detected, localized, and understood immediately.

data data-engineering data-observability data-science dataops pipleine-monitoring

Last synced: 06 Nov 2024

https://github.com/scrapinghub/page_clustering

A simple algorithm for clustering web pages, suitable for crawlers

data-science

Last synced: 07 Jan 2025

https://github.com/blmoore/blogr

Scripts + data to recreate analyses published on http://benjaminlmoore.wordpress.com and http://blm.io

data-science dataviz r rstats statistics

Last synced: 24 Jan 2025

https://github.com/IMSoley/cs-study-plan

📚👨‍🎓 Resources I'm using every day to develop my skills and become a good self-taught programmer ...

artificial-intelligence computer-science data-science data-structures-and-algorithms higher-education machine-learning web-development

Last synced: 20 Nov 2024

https://github.com/ww-tech/primrose

Primrose modeling framework for simple production models

dag data-science datascience deployment machine-learning primrose python workflows

Last synced: 18 Nov 2024

https://github.com/scicloj/scicloj-data-science-handbook

Clojure data science handbook - journal style examples of data science

clojure clojurescript data-science notebook scicloj

Last synced: 15 Nov 2024

https://github.com/lamres/capm_shiny

Demo project creating an interactive analytical tool for the stock market using CAPM.

capm data-science r shiny shinyapps stock-market stocks time-series

Last synced: 04 Dec 2024

https://github.com/virgesmith/ukcensusapi

UK Census Data queries and downloads from Python or R

data-science python r

Last synced: 27 Oct 2024

https://github.com/tirthajyoti/mlr

Multiple linear regression with statistical inference, residual analysis, direct CSV loading, and other features

analytics data-analytics data-science linear-regression machine-learning modeling predictive-modeling python regression scikit-learn statiscal-learning statistical-analysis statistics

Last synced: 17 Feb 2025

https://github.com/center-for-threat-informed-defense/sightings_ecosystem

Sightings Ecosystem gives cyber defenders visibility into what adversaries actually do in the wild. With your help, we are tracking MITRE ATT&CK® techniques observed to give defenders real data on technique prevalence.

ctid cyber-threat-intelligence cybersecurity data-science data-visualization mitre-attack

Last synced: 07 Nov 2024

https://github.com/mlsanigeria/ai-hacktober-mlsa

Contributing to cutting-edge open-source projects in Machine Learning hosted by MLSA Nigeria

artificial-intelligence data-science hacktoberfest machine-learning microsoft-azure mlsa open-source python

Last synced: 06 Nov 2024

https://github.com/wri-dssg-omdena/policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers

Last synced: 30 Oct 2024

https://github.com/mindinventory/bank-marketing-data-visualisation

This repository contains Python code for visualizing the Bank Marketing dataset using various data visualization techniques. The dataset is loaded from a CSV file, and both numerical and categorical features are explored using popular libraries such as Pandas, Matplotlib, Seaborn, and Plotly.

artificial-intelligence data-science data-visualization jupyter-notebook machine-learning matplotlib pandas plotly python seaborn

Last synced: 21 Jan 2025

https://github.com/braph-software/BRAPH-2

BRAPH 2.0 is a comprehensive software package for the analysis and visualization of brain connectivity data, offering flexible customization, rich visualization capabilities, and a platform for collaboration in neuroscience research.

biomedical-engineering brain-connectivity-analysis brain-research computational-neuroscience connectomics data-analysis data-science data-visualization deep-learning graph-theory machine-learning matlab network-analysis neuroimaging neuroscience open-source reproducible-research research-tools scientific-software toolbox

Last synced: 12 Nov 2024

https://github.com/mainakrepositor/parkinsons-detector

Detect the onset of possible risk of Parkinson's disease with the help of clinical data using Machine Learning Models.

data-science data-visualization decision-tree-classifier medical-application mini-project parkinsons-disease python-3 random-forest-classifier slider-component streamlit-webapp

Last synced: 12 Nov 2024

https://github.com/noaa-mdl/grib2io

Python interface to the NCEP G2C Library for reading and writing GRIB2 messages.

atmospheric-science data-science grib2 grib2-decoder grib2-encoder grib2-tables meteorology ncep ncep-grib2 ndfd-grib2 numpy python python3 weather weather-data

Last synced: 19 Feb 2025

https://github.com/rubixml/iris

The original lightweight introduction to machine learning in Rubix ML using the famous Iris dataset and the K Nearest Neighbors classifier.

classification cross-validation data-science example-project introduction-to-machine-learning iris-dataset k-nearest-neighbors knn machine-learning machine-learning-tutorial nearest-neighbors php php-machine-learning php-ml rubix-ml tutorial

Last synced: 04 Dec 2024

https://github.com/fredhutch/wiki

SciWiki: Collective KnowledgeBase for Scientific Data and Use

bioinformatics community computing data-science documentation fhdasl open-science sciwiki wiki

Last synced: 30 Jan 2025

https://github.com/martinfleis/sdsc21-workshop

Materials for SDSC 2021 Workshop

data-science python workshop

Last synced: 28 Oct 2024

https://github.com/racinmat/mal-analysis

GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.

analysis anime crawling data-science kaggle-dataset mal scraped-data

Last synced: 06 Nov 2024

https://github.com/maneprajakta/honours-in-data-science

Resources and Implementation Of Assignment For Honours In Data Science

assignment-solutions data-science honours resources sppu

Last synced: 27 Oct 2024

https://github.com/Smat26/Roman-Urdu-Dataset

Compilation of Manually Tagged Roman Urdu Dataset (Urdu written in Latin/Roman Script), along with other helpful Roman Urdu NLP resources

data-science dataset hindi hindi-language natural-language-processing nlp urdu urdu-language urdu-nlp

Last synced: 18 Nov 2024

https://github.com/prakhar-ff13/customer-analytics

Machine Learning Case study on customer segmentation and prediction of groups.

analytics case-study data-analysis data-science data-visualization dimensionality-reduction machine-learning python python3

Last synced: 30 Nov 2024

https://github.com/aphp/eds-scikit

eds-scikit is a Python library providing tools to process and analyse OMOP data

clinical-data-warehouse data-science medical omop python

Last synced: 25 Nov 2024

https://github.com/tjmahr/madr_pipelines

Slides and materials for my talk to the Madison R Users Group

data-science dplyr magrittr presentation r

Last synced: 12 Nov 2024

https://github.com/ahammadmejbah/ai-cheat-sheet

The replication of human intellectual processes by machines, most notably computer systems, is referred to as artificial intelligence (AI for short). Expert systems, natural language processing, voice recognition, and machine vision are all examples of specific uses of artificial intelligence.

cheatsheet data-science deep-learning machine-learning neural-networks

Last synced: 09 Jan 2025

https://github.com/chainbound/apollo

cross-chain ETL tool for EVM chaindata

blockchain data-science dsl ethereum etl evm golang hcl web3

Last synced: 24 Nov 2024

https://github.com/anilkumarteegala/wqu-ds-unit-2

This repo contains all the files and material related to WorldQuant University's Data Science Summer 2020 Session Unit 2: Machine Learning and Statistical Analysis

data-science machine-learning statistical-analysis wqu

Last synced: 12 Jan 2025

https://github.com/m-clark/data-processing-and-visualization

This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.

data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops

Last synced: 29 Nov 2024

https://github.com/epistasislab/rebate

Relief Based Algorithms of ReBATE implemented in Python with Cython optimization. This repository is no longer being updated. Please see scikit-rebate.

cython data-science feature-selection

Last synced: 16 Nov 2024

https://github.com/ericmjl/pyds-cli

Helping you manage your data science projects sanely.

data-science workflow-automation

Last synced: 31 Oct 2024

https://github.com/openghg/openghg

A cloud platform for greenhouse gas (GHG) data analysis and collaboration.

analysis cloud collaboration data-science greenhouse-gas

Last synced: 14 Nov 2024

https://github.com/JieZheng-ShanghaiTech/KG4SL

Synthetic lethality (SL) is a promising gold mine for the discovery of anti-cancer drug targets. KG4SL is the first graph neural network (GNN)-based model that uses knowledge graph for SL prediction.

ai4science bioinformatics cancer data-science drug-discovery machine-learning

Last synced: 13 Nov 2024

https://github.com/kwokhing/yandexcatboost-python-demo

Demo of the capabilities of the Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing, and model tuning are performed on the dataset.

catboost data-analysis data-preprocessing data-science feature-selection gradient-boosting gradient-boosting-classifier one-hot-encode pandas pearson-correlation python python27 seaborn variance-analysis visualization yandex-catboost

Last synced: 24 Jan 2025

https://github.com/google-marketing-solutions/ml_toast

Cluster multilingual search terms captured from different time windows into semantically relevant topics.

data-science machine-learning marketing-science nlp tensorflow topic-clustering

Last synced: 05 Dec 2024

https://github.com/iesahin/xvc

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

command-line-tool data data-engineering data-pipelines data-science devops machine-learning machine-learning-engineering mlops rust

Last synced: 11 Nov 2024

https://github.com/microsoft/responsible-ai-toolbox-genbit

A tool for gender bias identification in text. Part of Microsoft's Responsible AI toolbox.

data-science fairness-ai fairness-ml fairnness gender gender-bias jupyter machine-learning natural-language natural-language-processing

Last synced: 02 Dec 2024

https://github.com/timkpaine/perspective-parquet

Parquet file reader and editor in JupyterLab, built with `perspective` for pivoting, filtering, aggregating, etc.

data-science data-visualization datavisualization dataviz jupyter jupyterlab jupyterlab-extension jupyterlab-extensions parquet parquet-viewer perspective pivot-tables

Last synced: 01 Jan 2025

https://github.com/petersontylerd/mlmachine

mlmachine accelerates machine learning experimentation

data-analysis data-science data-visualization machine-learning python

Last synced: 13 Nov 2024

https://github.com/hackersandslackers/pandas-sqlalchemy-tutorial

🐼 💻 Load or insert data into a SQL database using Pandas DataFrames.

data-analysis data-science dataframes pandas pandas-sqlalchemy-tutorial python sql-database sqlalchemy tutorial

Last synced: 16 Nov 2024

https://github.com/hendersontrent/gam.jl

Fit, evaluate, and visualise generalised additive models (GAMs) in native Julia

data-science generalized-additive-models machine-learning regression statistical-models statistics

Last synced: 14 Feb 2025

https://github.com/bluegreen-labs/daymetr

An R Interface to the Daymet Web Services

climate-data data-science daymet gridded-data netcdf ornl-daac r-package rstats

Last synced: 24 Nov 2024

https://github.com/shixiangwang/self-study

My Self-Study Room: keep tidy and lightweight

bioinformatics data-science python r study study-room

Last synced: 04 Feb 2025

https://github.com/jezcope/pyrefine

Execute OpenRefine JSON scripts without OpenRefine (or Java)

data-science data-wrangling openrefine python

Last synced: 06 Nov 2024

https://github.com/sandialabs/MEWS

Multi-scenario Extreme Weather Simulator (MEWS)

application data-science scr-2664 snl-applications

Last synced: 27 Nov 2024

https://github.com/sandialabs/mews

Multi-scenario Extreme Weather Simulator (MEWS)

application data-science scr-2664 snl-applications

Last synced: 18 Feb 2025

https://github.com/srwi/pycharm-pixellens

Free PyCharm image viewer plugin for visualizing and debugging NumPy, OpenCV, PyTorch, TensorFlow, JAX, and PIL data.

data-science debugger-visualizer debugging machine-learning pycharm pycharm-plugin python

Last synced: 08 Feb 2025

https://github.com/afondiel/cs-books

Computer science books covering algorithms, data structures, programming, data science, AI, and much more.

ai books computer-science computer-science-books computer-vision computer-vision-books data-science data-structures dl image-processing ml programming

Last synced: 16 Feb 2025

https://github.com/hassaku/audio-plot-lib

This library provides graph sonification functions and has been developed for a project named "Data science and machine learning resources for screen reader users". Please refer to the project page for more details.

audio data-science google-colab graphs machine-learning python sonification visually-impaired

Last synced: 17 Feb 2025

https://github.com/leerenjie/100-days-of-code-in-python

Angela Yu's Udemy course has 100 projects, one for students to make each day, with 2 hours of classes per day. This repository stores all the related projects.

100-days-of-code api backend-webdevelopment data-science database flask frontend-web game-development version-control

Last synced: 20 Dec 2024

https://github.com/alagoa/youtube-or-pornhub

Service identification on ciphered traffic.

capture data-science machinelearning ml pcap python3 spotify traffic tshark youtube

Last synced: 31 Jan 2025

https://github.com/lenguyenthedat/minimal-datascience

This repository contains all the code and dataset used in my blog series: Minimal Data Science

blog-series data-science kaggle machine-learning python scikit-learn xgboost

Last synced: 08 Nov 2024

https://github.com/mam-dev/debianized-jupyterhub

📦 ♃ Debian packaging of JupyterHub, a multi-user server for Jupyter notebooks

data-science debian-packages deployment devops dh-virtualenv jupyter-notebook jupyterhub omnibus-packages python-3

Last synced: 13 Feb 2025

https://github.com/datasnakes/orthoevolution

An easy-to-use and comprehensive Python package that aids in the analysis and visualization of orthologous genes. 🐵

bash bioinformatics biology biosql blast data-science ftp genetics ncbi orthologs orthologues orthology orthology-inference pbs phylogenetics python qsub sequence-alignment sge shell

Last synced: 16 Nov 2024

https://github.com/noahgift/devml

Product of Pragmatic AI Labs: Machine Learning, Statistics and Utilities around Developer Productivity, Company Productivity and Project Productivity

ai churn-statistics data-science defects git github jupyter-notebook machine-intelligence machine-learning pandas productivity python seaborn visualization

Last synced: 07 Nov 2024

https://github.com/dylan-profiler/compressio

Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same data.

compression data-science dtype hacktoberfest pandas python types

Last synced: 16 Nov 2024