Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/megagonlabs/ruler

Data Programming by Demonstration (DPBD) for Document Classification

data-labeling data-programming data-science machine-learning training-data weak-supervision

Last synced: 10 Nov 2024

https://github.com/ActuariesInstitute/cookbook

Data and analytics cookbook for actuaries

actuarial analytics data-science hacktoberfest

Last synced: 08 Aug 2024

https://github.com/ak-coram/cl-duckdb

Common Lisp CFFI wrapper around the DuckDB C API

c-bindings common-lisp data-science duckdb lisp olap parquet sql

Last synced: 13 Nov 2024

https://github.com/aachartmodel/aachartkit-swift-pro

📈📊👑👑👑 AAChartKit-Swift-Pro is the professional version of AAChartKit-Swift, an elegant and friendly chart framework for iOS, iPadOS, and macOS. It is a more powerful data visualization framework that supports more chart types, such as bellcurve, bullet, columnpyramid, cylinder, dependencywheel, heatmap, histogram, networkgraph, organization, packedbubble, pareto, sankey, series, solidgauge, streamgraph, sunburst, tilemap, timeline, treemap, variablepie, variwide, vector, venn, windbarb, wordcloud, and xrange charts.

aacharts chart charting-library data-science data-visualization framework highcharts hybrid ios ipados macos plot swift webview

Last synced: 07 Nov 2024

https://github.com/tstreamdoth/instacart-market-basket-analysis

Use Instacart public dataset to report which products are often shopped together. 🍋🍉🥑🥦

data-analysis data-science instacart market-basket-analysis

Last synced: 28 Oct 2024

https://github.com/root-11/tablite

Multiprocessing-enabled, out-of-memory data analysis library for tabular data.

data-analysis data-science datatype disk etl excel filereader pandas pivot-tables python table tabular-data

Last synced: 11 Oct 2024

https://github.com/rafzamb/sknifedatar

sknifedatar is a package that serves primarily as an extension to the modeltime 📦 ecosystem, with some additional functionality for spatial data and visualization.

data data-analysis data-science data-visualization forecasting r statistics time-series

Last synced: 05 Aug 2024

https://github.com/hunar4321/reweight-gpt

Reweight GPT - a simple neural network using the transformer architecture for next-character prediction

algorithms data-science gpt language-model machine-learning nerual-networks numpy pytorch

Last synced: 14 Nov 2024

https://github.com/scrapinghub/page_clustering

A simple algorithm for clustering web pages, suitable for crawlers

data-science

Last synced: 10 Nov 2024

https://github.com/datakitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from data source to customer value, from any team development environment into production, across every tool, team, environment, and customer so that problems are detected, localized, and understood immediately.

data data-engineering data-observability data-science dataops pipleine-monitoring

Last synced: 06 Nov 2024

https://github.com/benjaminmbrown/real-time-data-viz-d3-crossfilter-websocket-tutorial

Tutorial on real-time data visualization. Python websocket server & d3.js + crossfilter.js frontend

crossfilter d3 d3js data-science data-visualization dcjs tutorial websockets

Last synced: 06 Aug 2024

https://github.com/chandrikadeb7/coursera_ibm_data_science_professional_certificate

This repo contains the lecture PDFs and quiz solutions for all courses in the IBM Data Science Professional Certificate specialization on Coursera.

coursera coursera-assignment coursera-data-science coursera-solutions coursera-specialization data-science ibm ibm-data-science jupyter-notebook lecture-pdfs professional-certificates quiz-solutions solutions specialization

Last synced: 09 Nov 2024

https://github.com/dfinke/PSDuckDB

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 23 Aug 2024

https://github.com/lamres/capm_shiny

Demo project that builds an interactive analytical tool for the stock market using CAPM.

capm data-science r shiny shinyapps stock-market stocks time-series

Last synced: 13 Aug 2024

https://github.com/ivan-bilan/nlp-and-data-science-spotlights

Regular spotlights of underrated NLP and Data Science GitHub repositories

data-science deep-learning natural-language-processing nlp spotlight

Last synced: 08 Nov 2024

https://github.com/sayakpaul/benchmarking-and-mli-experiments-on-the-adult-dataset

Contains benchmarking and interpretability experiments on the Adult dataset using several libraries

data-science fastai h2oai interpretable-machine-learning machine-learning microsoft-interpret tensorflow

Last synced: 22 Oct 2024

https://github.com/mainakrepositor/parkinsons-detector

Detect the possible risk of Parkinson's disease onset from clinical data using machine learning models.

data-science data-visualization decision-tree-classifier medical-application mini-project parkinsons-disease python-3 random-forest-classifier slider-component streamlit-webapp

Last synced: 12 Nov 2024

https://github.com/wri-dssg-omdena/policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers

Last synced: 30 Oct 2024

https://github.com/mlsanigeria/ai-hacktober-mlsa

Contributing to cutting-edge open-source projects in Machine Learning hosted by MLSA Nigeria

artificial-intelligence data-science hacktoberfest machine-learning microsoft-azure mlsa open-source python

Last synced: 06 Nov 2024

https://github.com/braph-software/BRAPH-2

BRAPH 2.0 is a comprehensive software package for the analysis and visualization of brain connectivity data, offering flexible customization, rich visualization capabilities, and a platform for collaboration in neuroscience research.

biomedical-engineering brain-connectivity-analysis brain-research computational-neuroscience connectomics data-analysis data-science data-visualization deep-learning graph-theory machine-learning matlab network-analysis neuroimaging neuroscience open-source reproducible-research research-tools scientific-software toolbox

Last synced: 12 Nov 2024

https://github.com/virgesmith/ukcensusapi

UK Census Data queries and downloads from Python or R

data-science python r

Last synced: 27 Oct 2024

https://github.com/IMSoley/cs-study-plan

📚👨‍🎓 Resources I'm using every day to develop my skills and become a good self-taught programmer ...

artificial-intelligence computer-science data-science data-structures-and-algorithms higher-education machine-learning web-development

Last synced: 04 Aug 2024

https://github.com/center-for-threat-informed-defense/sightings_ecosystem

Sightings Ecosystem gives cyber defenders visibility into what adversaries actually do in the wild. With your help, we are tracking MITRE ATT&CK® techniques observed to give defenders real data on technique prevalence.

ctid cyber-threat-intelligence cybersecurity data-science data-visualization mitre-attack

Last synced: 07 Nov 2024

https://github.com/martinfleis/sdsc21-workshop

Materials for SDSC 2021 Workshop

data-science python workshop

Last synced: 28 Oct 2024

https://github.com/noaa-mdl/grib2io

Python interface to the NCEP G2C Library for reading and writing GRIB2 messages.

atmospheric-science data-science grib2 grib2-decoder grib2-encoder grib2-tables meteorology ncep ncep-grib2 ndfd-grib2 numpy python python3 weather weather-data

Last synced: 11 Nov 2024

https://github.com/ahammadmejbah/ai-cheat-sheet

The replication of human intellectual processes by machines, most notably computer systems, is referred to as artificial intelligence (AI for short). Expert systems, natural language processing, voice recognition, and machine vision are all examples of specific uses of artificial intelligence.

cheatsheet data-science deep-learning machine-learning neural-networks

Last synced: 11 Nov 2024

https://github.com/tirthajyoti/mlr

Multiple linear regression with statistical inference, residual analysis, direct CSV loading, and other features

analytics data-analytics data-science linear-regression machine-learning modeling predictive-modeling python regression scikit-learn statiscal-learning statistical-analysis statistics

Last synced: 12 Oct 2024

https://github.com/tjmahr/madr_pipelines

Slides and materials for my talk to the Madison R Users Group

data-science dplyr magrittr presentation r

Last synced: 12 Nov 2024

https://github.com/racinmat/mal-analysis

GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.

analysis anime crawling data-science kaggle-dataset mal scraped-data

Last synced: 06 Nov 2024

https://github.com/Smat26/Roman-Urdu-Dataset

Compilation of Manually Tagged Roman Urdu Dataset (Urdu written in Latin/Roman Script), along with other helpful Roman Urdu NLP resources

data-science dataset hindi hindi-language natural-language-processing nlp urdu urdu-language urdu-nlp

Last synced: 04 Aug 2024

https://github.com/maneprajakta/honours-in-data-science

Resources and implementations of assignments for Honours in Data Science

assignment-solutions data-science honours resources sppu

Last synced: 27 Oct 2024

https://github.com/JieZheng-ShanghaiTech/KG4SL

Synthetic lethality (SL) is a promising gold mine for the discovery of anti-cancer drug targets. KG4SL is the first graph neural network (GNN)-based model that uses a knowledge graph for SL prediction.

ai4science bioinformatics cancer data-science drug-discovery machine-learning

Last synced: 13 Nov 2024

https://github.com/iesahin/xvc

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

command-line-tool data data-engineering data-pipelines data-science devops machine-learning machine-learning-engineering mlops rust

Last synced: 11 Nov 2024

https://github.com/m-clark/data-processing-and-visualization

This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.

data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops

Last synced: 08 Aug 2024

https://github.com/thudm/kdd-industrial-papers

A list of recent industrial papers in KDD'16–'18

data-mining data-science kdd paper-list

Last synced: 14 Nov 2024

https://github.com/timkpaine/perspective-parquet

Parquet file reader and editor in JupyterLab, built with `perspective` for pivoting, filtering, aggregating, etc.

data-science data-visualization datavisualization dataviz jupyter jupyterlab jupyterlab-extension jupyterlab-extensions parquet parquet-viewer perspective pivot-tables

Last synced: 27 Oct 2024

https://github.com/openghg/openghg

A cloud platform for greenhouse gas (GHG) data analysis and collaboration.

analysis cloud collaboration data-science greenhouse-gas

Last synced: 14 Nov 2024

https://github.com/kwokhing/yandexcatboost-python-demo

Demo of the capabilities of the Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing, and model tuning are performed on the dataset.

catboost data-analysis data-preprocessing data-science feature-selection gradient-boosting gradient-boosting-classifier one-hot-encode pandas pearson-correlation python python27 seaborn variance-analysis visualization yandex-catboost

Last synced: 12 Oct 2024

https://github.com/ericmjl/pyds-cli

Helping you manage your data science projects sanely.

data-science workflow-automation

Last synced: 31 Oct 2024

https://github.com/petersontylerd/mlmachine

mlmachine accelerates machine learning experimentation

data-analysis data-science data-visualization machine-learning python

Last synced: 13 Nov 2024

https://github.com/nitya/pydata-analysis-workshop

Step-by-step workshop for the "Simplifying Data Analysis" talk

data-science data-visualization python workshop

Last synced: 29 Oct 2024

https://github.com/hendersontrent/gam.jl

Fit, evaluate, and visualise generalised additive models (GAMs) in native Julia

data-science generalized-additive-models machine-learning regression statistical-models statistics

Last synced: 05 Nov 2024

https://github.com/hackersandslackers/pandas-sqlalchemy-tutorial

🐼 💻 Load or insert data into a SQL database using Pandas DataFrames.

data-analysis data-science dataframes pandas pandas-sqlalchemy-tutorial python sql-database sqlalchemy tutorial

Last synced: 09 Nov 2024

https://github.com/bluegreen-labs/daymetr

An R Interface to the Daymet Web Services

climate-data data-science daymet gridded-data netcdf ornl-daac r-package rstats

Last synced: 08 Aug 2024

https://github.com/jezcope/pyrefine

Execute OpenRefine JSON scripts without OpenRefine (or Java)

data-science data-wrangling openrefine python

Last synced: 06 Nov 2024

https://github.com/lenguyenthedat/minimal-datascience

This repository contains all the code and dataset used in my blog series: Minimal Data Science

blog-series data-science kaggle machine-learning python scikit-learn xgboost

Last synced: 08 Nov 2024

https://github.com/alagoa/youtube-or-pornhub

Service identification on ciphered traffic.

capture data-science machinelearning ml pcap python3 spotify traffic tshark youtube

Last synced: 06 Nov 2024

https://github.com/mam-dev/debianized-jupyterhub

📦 ♃ Debian packaging of JupyterHub, a multi-user server for Jupyter notebooks

data-science debian-packages deployment devops dh-virtualenv jupyter-notebook jupyterhub omnibus-packages python-3

Last synced: 11 Oct 2024

https://github.com/sandialabs/mews

Multi-scenario Extreme Weather Simulator (MEWS)

application data-science scr-2664 snl-applications

Last synced: 12 Nov 2024

https://github.com/hemansnation/machine-learning-mlops-generativeai-nlp-cv-mlsystem-design

MLOps - Deploy models at scale, Generative AI - Build applications with LLMs, NLP - Understand Transformers & Text Generation Models, Computer Vision - Build GANs projects like Deepfakes, ML System Design, hands-on project building and code algorithms from scratch.

computer-vision data-science deep-learning generative-ai machine-learning natural-language-processing python

Last synced: 08 Nov 2024

https://github.com/noahgift/devml

Product of Pragmatic AI Labs: Machine Learning, Statistics and Utilities around Developer Productivity, Company Productivity and Project Productivity

ai churn-statistics data-science defects git github jupyter-notebook machine-intelligence machine-learning pandas productivity python seaborn visualization

Last synced: 07 Nov 2024

https://github.com/kairen/learning-spark

Tidy up Spark and Hadoop tutorials.

bigdata data-science hadoop spark

Last synced: 30 Oct 2024

https://github.com/akabe/docker-iocaml-datascience

Dockerfile of Jupyter (IPython notebook) and IOCaml (OCaml kernel) with libraries for data science and machine learning

data-science deep-learning docker functional-programming iocaml jupyter-notebook machine-learning ocaml

Last synced: 30 Oct 2024

https://github.com/koheiw/proxyc

R package for large-scale similarity/distance computation

data-science distance-measures r similarity-measures

Last synced: 05 Nov 2024

https://github.com/oldratlee/data-science-practice

Data science practice

anaconda data-science python statistics

Last synced: 12 Oct 2024

https://github.com/raybellwaves/cfanalytics

Downloading, analyzing and visualizing CrossFit data

crossfit crossfit-games data-frames data-science python

Last synced: 08 Nov 2024

https://github.com/datapane/examples

Datapane Examples

data-science datapane jupyter python

Last synced: 09 Aug 2024

https://github.com/klaus78/data-science-flashcards

A large collection of challenges on Data Science and Machine Learning.

data-science hacktoberfest jekyll-website machine-learning python

Last synced: 11 Oct 2024

https://github.com/outerbounds/metaflowbot

Slack bot for monitoring your Metaflow flows!

data-science metaflow ml mlops slack slack-bot

Last synced: 10 Nov 2024

https://github.com/mainakrepositor/data-analysis

Different types of data analytics projects: EDA, PDA, DDA, TSA, and much more.

data-analysis data-science deeplearning machine-learning-algorithms neural-networks time-series-analysis tsa

Last synced: 12 Nov 2024