Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/dfinke/psduckdb

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 27 Oct 2024

https://github.com/pjaselin/cubist

A Python package for fitting Quinlan's Cubist regression model

data-science machine-learning python regression scikit-learn

Last synced: 14 Nov 2024

https://github.com/mlr-org/mlr3torch

Deep learning framework for the mlr3 ecosystem based on torch

data-science deep-learning machine-learning mlr3 r r-package torch

Last synced: 06 Nov 2024

https://github.com/florents-tselai/greek-wines-analysis

Scraper, Data and Analysis for "Analyzing 1000+ Greek Wines with Python"

beautifulsoup data-science pandas python seaborn web-scraping

Last synced: 31 Oct 2024

https://github.com/apreshill/data-vis-labs-2018

Principles & Practice of Data Visualization, CS631 Spring 2018

data-science data-visualization education rstats teaching

Last synced: 15 Nov 2024

https://github.com/njanakiev/openstreetmap-data-science

Data Science with OpenStreetMap

data-science openstreetmap python

Last synced: 06 Nov 2024

https://github.com/leemengtw/gist-evernote

A Python application that syncs GitHub Gists and saves them to an Evernote notebook as screenshots.

data-science evernote gists github github-graphql jupyter-notebook pet-project python selenium sync

Last synced: 07 Aug 2024

https://github.com/jules32/rmarkdown-website-tutorial

Tutorial for creating websites w/ R Markdown

data-science rmarkdown rstats teaching tutorial

Last synced: 06 Nov 2024

https://github.com/lkuffo/data-viz

More than 50 examples of data visualization and analysis in Matplotlib, Pandas, Seaborn, Plotly, Bokeh, and NetworkX

data-analysis data-science dataviz geoviz jupyter jupyter-notebook matplotlib networkx pandas plotly python seaborn

Last synced: 05 Nov 2024

https://github.com/darribas/gds17

Geographic Data Science'17

data-science gis pysal python

Last synced: 28 Oct 2024

https://github.com/tirendazacademy/chatgpt-with-examples

This repo contains ChatGPT tutorials about data science, machine learning, deep learning, and Python. We show how to use ChatGPT with examples.

chat-gpt chatgpt chatgpt-api chatgpt-python chatgpt3 data-science deep-learning machine-learning

Last synced: 08 Nov 2024

https://github.com/jmari/ipharo

Pharo Smalltalk kernel for Jupyter

data-science jupyter-notebook pharo pharo-smalltalk smalltalk

Last synced: 09 Oct 2024

https://github.com/tdeboissiere/cookiecutter-deeplearning

Project folder structure for doing and sharing deep learning work.

data-science project-template

Last synced: 05 Nov 2024

https://github.com/jmari/iPharo

Pharo Smalltalk kernel for Jupyter

data-science jupyter-notebook pharo pharo-smalltalk smalltalk

Last synced: 17 Nov 2024

https://github.com/dMLTquant/openbb_sdk_exporation

Explore the OpenBB SDK without having to install anything on your local machine. You just need a GitHub account and a Gitpod account.

algorithmic-trading data-science financial-data jupyter notebook openbb python

Last synced: 01 Nov 2024

https://github.com/ipeirotis/introduction-to-python

Notes for the "Introduction to Programming for Data Science" class

data-science for-beginners python python3

Last synced: 17 Nov 2024

https://github.com/leriomaggio/python-data-science

Lecture notes and materials for Python Data Science course

data-science jupyter-notebooks machine-learning materials python-tutorials

Last synced: 29 Oct 2024

https://github.com/ammsa/dtcleaner

DTCleaner: data cleaning using multi-target decision trees.

data-cleaning data-mining data-preprocessing data-quality data-science data-wrangling

Last synced: 28 Oct 2024

https://github.com/tejzpr/ordered-concurrently

Ordered-concurrently is a library for concurrent processing with ordered output in Go. It processes work concurrently and returns output on a channel in the order of input. It is useful for concurrently processing items in a queue and getting output in the order provided by the queue.

concurrent concurrent-data-structure data-pipeline data-science golang golang-library ordered parallel parallel-computing

Last synced: 26 Oct 2024

https://github.com/jhwohlgemuth/pwsh-prelude

PowerShell “standard” library for supercharging your productivity. Provides a powerful cross-platform scripting environment enabling efficient analysis and sustainable science in myriad contexts.

applied-mathematics cli cli-app data-science hacktoberfest library mathematics powershell powershell-module statistics text-processing text-to-speech user-interface

Last synced: 27 Oct 2024

https://github.com/hunar4321/reweight-gpt

Reweight GPT - a simple neural network using transformer architecture for next character prediction

algorithms data-science gpt language-model machine-learning nerual-networks numpy pytorch

Last synced: 14 Nov 2024

https://github.com/ak-coram/cl-duckdb

Common Lisp CFFI wrapper around the DuckDB C API

c-bindings common-lisp data-science duckdb lisp olap parquet sql

Last synced: 13 Nov 2024

https://github.com/ActuariesInstitute/cookbook

Data and analytics cookbook for actuaries

actuarial analytics data-science hacktoberfest

Last synced: 08 Aug 2024

https://github.com/tstreamdoth/instacart-market-basket-analysis

Use Instacart public dataset to report which products are often shopped together. 🍋🍉🥑🥦

data-analysis data-science instacart market-basket-analysis

Last synced: 28 Oct 2024

https://github.com/root-11/tablite

Multiprocessing-enabled, out-of-memory data analysis library for tabular data.

data-analysis data-science datatype disk etl excel filereader pandas pivot-tables python table tabular-data

Last synced: 11 Oct 2024

https://github.com/rafzamb/sknifedatar

sknifedatar is a package that serves primarily as an extension to the modeltime 📦 ecosystem, with some additional functionality for spatial data and visualization.

data data-analysis data-science data-visualization forecasting r statistics time-series

Last synced: 05 Aug 2024

https://github.com/megagonlabs/ruler

Data Programming by Demonstration (DPBD) for Document Classification

data-labeling data-programming data-science machine-learning training-data weak-supervision

Last synced: 10 Nov 2024

https://github.com/aachartmodel/aachartkit-swift-pro

📈📊👑👑👑 AAChartKit-Swift-Pro is the professional version of AAChartKit-Swift, an elegant and friendly chart framework for iOS, iPadOS, and macOS. It is a more powerful data visualization framework that supports more chart types, such as bellcurve, bullet, columnpyramid, cylinder, dependencywheel, heatmap, histogram, networkgraph, organization, packedbubble, pareto, sankey, series, solidgauge, streamgraph, sunburst, tilemap, timeline, treemap, variablepie, variwide, vector, venn, windbarb, wordcloud, and xrange charts.

aacharts chart charting-library data-science data-visualization framework highcharts hybrid ios ipados macos plot swift webview

Last synced: 07 Nov 2024

https://github.com/scrapinghub/page_clustering

A simple algorithm for clustering web pages, suitable for crawlers

data-science

Last synced: 10 Nov 2024

https://github.com/datakitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from data source to customer value, from any team development environment into production, across every tool, team, environment, and customer so that problems are detected, localized, and understood immediately.

data data-engineering data-observability data-science dataops pipleine-monitoring

Last synced: 06 Nov 2024

https://github.com/benjaminmbrown/real-time-data-viz-d3-crossfilter-websocket-tutorial

Tutorial on real-time data visualization. Python websocket server & d3.js + crossfilter.js frontend

crossfilter d3 d3js data-science data-visualization dcjs tutorial websockets

Last synced: 06 Aug 2024

https://github.com/chandrikadeb7/coursera_ibm_data_science_professional_certificate

This repo contains the lecture PDFs and quiz solutions for all the courses in the IBM Data Science Professional Certificate specialization on Coursera.

coursera coursera-assignment coursera-data-science coursera-solutions coursera-specialization data-science ibm ibm-data-science jupyter-notebook lecture-pdfs professional-certificates quiz-solutions solutions specialization

Last synced: 09 Nov 2024

https://github.com/lamres/capm_shiny

Demo project for creating an interactive analytical tool for the stock market using CAPM.

capm data-science r shiny shinyapps stock-market stocks time-series

Last synced: 13 Aug 2024

https://github.com/sayakpaul/benchmarking-and-mli-experiments-on-the-adult-dataset

Contains benchmarking and interpretability experiments on the Adult dataset using several libraries

data-science fastai h2oai interpretable-machine-learning machine-learning microsoft-interpret tensorflow

Last synced: 22 Oct 2024

https://github.com/dfinke/PSDuckDB

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 23 Aug 2024

https://github.com/ww-tech/primrose

Primrose modeling framework for simple production models

dag data-science datascience deployment machine-learning primrose python workflows

Last synced: 18 Nov 2024

https://github.com/ivan-bilan/nlp-and-data-science-spotlights

Regular spotlights of underrated NLP and Data Science GitHub repositories

data-science deep-learning natural-language-processing nlp spotlight

Last synced: 08 Nov 2024

https://github.com/scicloj/scicloj-data-science-handbook

Clojure data science handbook - journal style examples of data science

clojure clojurescript data-science notebook scicloj

Last synced: 15 Nov 2024

https://github.com/wri-dssg-omdena/policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers

Last synced: 30 Oct 2024

https://github.com/braph-software/BRAPH-2

BRAPH 2.0 is a comprehensive software package for the analysis and visualization of brain connectivity data, offering flexible customization, rich visualization capabilities, and a platform for collaboration in neuroscience research.

biomedical-engineering brain-connectivity-analysis brain-research computational-neuroscience connectomics data-analysis data-science data-visualization deep-learning graph-theory machine-learning matlab network-analysis neuroimaging neuroscience open-source reproducible-research research-tools scientific-software toolbox

Last synced: 12 Nov 2024

https://github.com/mainakrepositor/parkinsons-detector

Detect the possible risk of onset of Parkinson's disease from clinical data using machine learning models.

data-science data-visualization decision-tree-classifier medical-application mini-project parkinsons-disease python-3 random-forest-classifier slider-component streamlit-webapp

Last synced: 12 Nov 2024

https://github.com/virgesmith/ukcensusapi

UK Census data queries and downloads from Python or R

data-science python r

Last synced: 27 Oct 2024

https://github.com/mlsanigeria/ai-hacktober-mlsa

Contributing to cutting-edge open-source projects in Machine Learning hosted by MLSA Nigeria

artificial-intelligence data-science hacktoberfest machine-learning microsoft-azure mlsa open-source python

Last synced: 06 Nov 2024

https://github.com/IMSoley/cs-study-plan

📚👨‍🎓 Resources I'm using every day to develop my skills and become a good self-taught programmer ...

artificial-intelligence computer-science data-science data-structures-and-algorithms higher-education machine-learning web-development

Last synced: 04 Aug 2024

https://github.com/center-for-threat-informed-defense/sightings_ecosystem

Sightings Ecosystem gives cyber defenders visibility into what adversaries actually do in the wild. With your help, we are tracking MITRE ATT&CK® techniques observed to give defenders real data on technique prevalence.

ctid cyber-threat-intelligence cybersecurity data-science data-visualization mitre-attack

Last synced: 07 Nov 2024

https://github.com/noaa-mdl/grib2io

Python interface to the NCEP G2C Library for reading and writing GRIB2 messages.

atmospheric-science data-science grib2 grib2-decoder grib2-encoder grib2-tables meteorology ncep ncep-grib2 ndfd-grib2 numpy python python3 weather weather-data

Last synced: 11 Nov 2024

https://github.com/martinfleis/sdsc21-workshop

Materials for SDSC 2021 Workshop

data-science python workshop

Last synced: 28 Oct 2024

https://github.com/tjmahr/madr_pipelines

Slides and materials for my talk to the Madison R Users Group

data-science dplyr magrittr presentation r

Last synced: 12 Nov 2024

https://github.com/ahammadmejbah/ai-cheat-sheet

The replication of human intellectual processes by machines, most notably computer systems, is referred to as artificial intelligence (AI for short). Expert systems, natural language processing, voice recognition, and machine vision are all examples of specific uses of artificial intelligence.

cheatsheet data-science deep-learning machine-learning neural-networks

Last synced: 11 Nov 2024

https://github.com/maneprajakta/honours-in-data-science

Resources and implementations of assignments for Honours in Data Science

assignment-solutions data-science honours resources sppu

Last synced: 27 Oct 2024

https://github.com/tirthajyoti/mlr

Multiple linear regression with statistical inference, residual analysis, direct CSV loading, and other features

analytics data-analytics data-science linear-regression machine-learning modeling predictive-modeling python regression scikit-learn statiscal-learning statistical-analysis statistics

Last synced: 12 Oct 2024

https://github.com/Smat26/Roman-Urdu-Dataset

Compilation of Manually Tagged Roman Urdu Dataset (Urdu written in Latin/Roman Script), along with other helpful Roman Urdu NLP resources

data-science dataset hindi hindi-language natural-language-processing nlp urdu urdu-language urdu-nlp

Last synced: 18 Nov 2024

https://github.com/epistasislab/rebate

Relief Based Algorithms of ReBATE implemented in Python with Cython optimization. This repository is no longer being updated. Please see scikit-rebate.

cython data-science feature-selection

Last synced: 16 Nov 2024

https://github.com/racinmat/mal-analysis

GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.

analysis anime crawling data-science kaggle-dataset mal scraped-data

Last synced: 06 Nov 2024

https://github.com/thudm/kdd-industrial-papers

A list of recent industrial papers in KDD'16–'18

data-mining data-science kdd paper-list

Last synced: 14 Nov 2024

https://github.com/openghg/openghg

A cloud platform for greenhouse gas (GHG) data analysis and collaboration.

analysis cloud collaboration data-science greenhouse-gas

Last synced: 14 Nov 2024

https://github.com/m-clark/data-processing-and-visualization

This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.

data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops

Last synced: 08 Aug 2024

https://github.com/petersontylerd/mlmachine

mlmachine accelerates machine learning experimentation

data-analysis data-science data-visualization machine-learning python

Last synced: 13 Nov 2024

https://github.com/timkpaine/perspective-parquet

Parquet file reader and editor in JupyterLab, built with `perspective` for pivoting, filtering, aggregating, etc.

data-science data-visualization datavisualization dataviz jupyter jupyterlab jupyterlab-extension jupyterlab-extensions parquet parquet-viewer perspective pivot-tables

Last synced: 27 Oct 2024

https://github.com/kwokhing/yandexcatboost-python-demo

Demo of the capabilities of the Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing, and model tuning are performed on the dataset.

catboost data-analysis data-preprocessing data-science feature-selection gradient-boosting gradient-boosting-classifier one-hot-encode pandas pearson-correlation python python27 seaborn variance-analysis visualization yandex-catboost

Last synced: 12 Oct 2024

https://github.com/iesahin/xvc

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

command-line-tool data data-engineering data-pipelines data-science devops machine-learning machine-learning-engineering mlops rust

Last synced: 11 Nov 2024

https://github.com/ericmjl/pyds-cli

Helping you manage your data science projects sanely.

data-science workflow-automation

Last synced: 31 Oct 2024