Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/m-clark/data-processing-and-visualization

This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.

data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops

Last synced: 08 Aug 2024

https://github.com/ericmjl/pyds-cli

Helping you manage your data science projects sanely.

data-science workflow-automation

Last synced: 31 Oct 2024

https://github.com/mam-dev/debianized-jupyterhub

📦 ♃ Debian packaging of JupyterHub, a multi-user server for Jupyter notebooks

data-science debian-packages deployment devops dh-virtualenv jupyter-notebook jupyterhub omnibus-packages python-3

Last synced: 11 Oct 2024

https://github.com/bluegreen-labs/daymetr

An R Interface to the Daymet Web Services

climate-data data-science daymet gridded-data netcdf ornl-daac r-package rstats

Last synced: 08 Aug 2024

https://github.com/nitya/pydata-analysis-workshop

Step-by-step workshop for the "Simplifying Data Analysis" talk

data-science data-visualization python workshop

Last synced: 29 Oct 2024

https://github.com/hendersontrent/gam.jl

Fit, evaluate, and visualise generalised additive models (GAMs) in native Julia

data-science generalized-additive-models machine-learning regression statistical-models statistics

Last synced: 05 Nov 2024

https://github.com/lenguyenthedat/minimal-datascience

This repository contains all the code and dataset used in my blog series: Minimal Data Science

blog-series data-science kaggle machine-learning python scikit-learn xgboost

Last synced: 08 Nov 2024

https://github.com/jezcope/pyrefine

Execute OpenRefine JSON scripts without OpenRefine (or Java)

data-science data-wrangling openrefine python

Last synced: 06 Nov 2024

https://github.com/alagoa/youtube-or-pornhub

Service identification on ciphered traffic.

capture data-science machinelearning ml pcap python3 spotify traffic tshark youtube

Last synced: 06 Nov 2024

https://github.com/hackersandslackers/pandas-sqlalchemy-tutorial

🐼 💻 Load or insert data into a SQL database using Pandas DataFrames.

data-analysis data-science dataframes pandas pandas-sqlalchemy-tutorial python sql-database sqlalchemy tutorial

Last synced: 16 Nov 2024

https://github.com/datasnakes/orthoevolution

An easy-to-use, comprehensive Python package that aids in the analysis and visualization of orthologous genes. 🐵

bash bioinformatics biology biosql blast data-science ftp genetics ncbi orthologs orthologues orthology orthology-inference pbs phylogenetics python qsub sequence-alignment sge shell

Last synced: 16 Nov 2024

https://github.com/sandialabs/mews

Multi-scenario Extreme Weather Simulator (MEWS)

application data-science scr-2664 snl-applications

Last synced: 12 Nov 2024

https://github.com/dylan-profiler/compressio

Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same data.

compression data-science dtype hacktoberfest pandas python types

Last synced: 16 Nov 2024

https://github.com/philips-software/latrend

An R package for clustering longitudinal datasets in a standardized way, providing interfaces to various R packages for longitudinal clustering, and facilitating the rapid implementation and evaluation of new methods

cluster-analysis clustering-evaluation clustering-methods data-science longitudinal-clustering longitudinal-data mixture-models r r-package time-series-analysis

Last synced: 17 Nov 2024

https://github.com/akabe/docker-iocaml-datascience

Dockerfile of Jupyter (IPython notebook) and IOCaml (OCaml kernel) with libraries for data science and machine learning

data-science deep-learning docker functional-programming iocaml jupyter-notebook machine-learning ocaml

Last synced: 30 Oct 2024

https://github.com/kairen/learning-spark

Tidy up Spark and Hadoop tutorials.

bigdata data-science hadoop spark

Last synced: 30 Oct 2024

https://github.com/koheiw/proxyc

R package for large-scale similarity/distance computation

data-science distance-measures r similarity-measures

Last synced: 05 Nov 2024

https://github.com/noahgift/devml

Product of Pragmatic AI Labs: Machine Learning, Statistics and Utilities around Developer Productivity, Company Productivity and Project Productivity

ai churn-statistics data-science defects git github jupyter-notebook machine-intelligence machine-learning pandas productivity python seaborn visualization

Last synced: 07 Nov 2024

https://github.com/hemansnation/machine-learning-mlops-generativeai-nlp-cv-mlsystem-design

MLOps - Deploy models at scale, Generative AI - Build applications with LLMs, NLP - Understand Transformers & Text Generation Models, Computer Vision - Build GANs projects like Deepfakes, ML System Design, hands-on project building, and coding algorithms from scratch.

computer-vision data-science deep-learning generative-ai machine-learning natural-language-processing python

Last synced: 08 Nov 2024

https://github.com/outerbounds/metaflowbot

Slack bot for monitoring your Metaflow flows!

data-science metaflow ml mlops slack slack-bot

Last synced: 10 Nov 2024

https://github.com/raybellwaves/cfanalytics

Downloading, analyzing and visualizing CrossFit data

crossfit crossfit-games data-frames data-science python

Last synced: 08 Nov 2024

https://github.com/vincentauriau/tennis-prediction

Predicts the winner of a tennis match with machine learning

atp data data-science machine-learning tennis

Last synced: 16 Nov 2024

https://github.com/mainakrepositor/data-analysis

Different types of data analytics projects: EDA, PDA, DDA, TSA, and much more.

data-analysis data-science deeplearning machine-learning-algorithms neural-networks time-series-analysis tsa

Last synced: 12 Nov 2024

https://github.com/datapane/examples

Datapane Examples

data-science datapane jupyter python

Last synced: 09 Aug 2024

https://github.com/oldratlee/data-science-practice

Data science practice

anaconda data-science python statistics

Last synced: 12 Oct 2024

https://github.com/0x0be/scrapeadvisor

A user-friendly Python-based GUI that provides sentiment analysis of user reviews of a specific TripAdvisor facility

data-mining data-science python3 r scraping sentiment-analysis sentiment-classification text-mining tripadvisor tripadvisor-scraper web-scraping

Last synced: 04 Nov 2024

https://github.com/klaus78/data-science-flashcards

A large collection of challenges on Data Science and Machine Learning.

data-science hacktoberfest jekyll-website machine-learning python

Last synced: 11 Oct 2024

https://github.com/azure/aml-run

GitHub Action that allows you to submit a run to your Azure Machine Learning Workspace.

aml azure azure-machine-learning data-science machine-learning mlops

Last synced: 07 Oct 2024

https://github.com/theengineeringworld/python-data-science

Python Data Science has all the data sets and Jupyter notebook files for the YouTube course at http://youtube.com/theengineeringworld, titled "Python Data Science Course".

data data-analysis data-mining data-science data-visualization jupyter-notebook jupyter-notebooks machine-learning python python27

Last synced: 12 Oct 2024

https://github.com/rasbt/hbind

Calculates hydrogen-bond interaction tables for protein-small molecule complexes, based on protein PDB and protonated ligand MOL2 structure input. Raschka et al. (2018) J. Computer-Aided Molec. Design

bioinformatics computational-biology data-science hydrogen-bonds protein-ligand-interfaces

Last synced: 22 Oct 2024

https://github.com/SOM-Research/DescribeML

DescribeML is a Visual Studio Code language plug-in to describe machine-learning datasets in a structured format. Build better data describing the composition, provenance and social concerns of your dataset.

data-science dataset-generation datasets describeml langium machine-learning model-driven modeling open-data open-datasets visual-studio-code vscode

Last synced: 10 Oct 2024

https://github.com/Azure/aml-run

GitHub Action that allows you to submit a run to your Azure Machine Learning Workspace.

aml azure azure-machine-learning data-science machine-learning mlops

Last synced: 13 Aug 2024

https://github.com/mkcor/advanced-pandas

Pandas is a powerful tool for data exploration and analysis (including timeseries).

data-analysis data-science labeled-data notebooks python3 teaching-materials

Last synced: 16 Oct 2024

https://github.com/arthurpaulino/miraiml

MiraiML: asynchronous, autonomous and continuous Machine Learning in Python

data-science hyperparameter-optimization machine-learning python

Last synced: 08 Nov 2024

https://github.com/dayyass/graph-based-clustering

Graph-Based Clustering using connected components and spanning trees.

clustering data-science graph graph-algorithms hacktoberfest machine-learning python sklearn

Last synced: 07 Nov 2024

https://github.com/computationalcore/introduction-to-python

A very useful collection of Jupyter Notebooks, which aims to introduce the Python programming language.

data-analysis data-science fundamental google-colab jupyter-notebook jupyter-notebooks numpy pandas python python-language python-programming python3

Last synced: 10 Nov 2024

https://github.com/amadeusitgroup/cpmml

cPMML is a C++ library for scoring machine learning models serialized with the Predictive Model Markup Language (PMML)

ai data-science machine-learning ml model-deployment model-scoring pmml

Last synced: 10 Nov 2024

https://github.com/denadai2/google_street_view_deep_neural

Deep Neural Network model to predict security perception from Google Street View images. Model based on AlexNet CNNs

computational-social-science computer-vision data-science deep-learning urban-planning urban-science

Last synced: 27 Oct 2024

https://github.com/thomasnield/bayes_user_input_prediction

Demonstration of using Naive Bayes to predict user inputs with Kotlin 1.2 std-lib

bayes bayes-classifier data-science kotlin

Last synced: 30 Oct 2024

https://github.com/city-of-helsinki/mlops-template

Generic repository template for small scale MLOps

data-science datascience machine-learning machinelearning mlops python

Last synced: 18 Nov 2024

https://github.com/rjbergerud/open-source-for-common-good

A list I'm keeping of active open source projects that serve a social or environmental goal.

citizen-science civic-tech community data-science humanity non-profit social social-impact sustainability

Last synced: 13 Nov 2024

https://github.com/soodoku/data-science

Lecture Slides for Introduction to Data Science

data-science statistical-learning

Last synced: 25 Oct 2024

https://github.com/nneji123/credit-card-fraud-detection

Credit Card Fraud Detection App built with Streamlit, FastAPI and Docker.

credit-card data-science deployment docker docker-compose fastapi fraud-detection machine-learning streamlit

Last synced: 13 Nov 2024

https://github.com/staircase-dev/piso

Pandas Interval Set Operations: providing methods for set operations, analytics, lookups and joins on pandas' Interval, IntervalArray and IntervalIndex

data-analysis data-science data-structures interval interval-arithmetic interval-set pandas set set-operations set-theory

Last synced: 16 Nov 2024

https://github.com/hugoblox/theme-markdown-slides

🎙 Create beautiful presentations in Markdown. Write, share, and present your slides using the open, future-proof Markdown standard

blogdown data-science hugo hugo-learn-theme hugo-theme jupyter latex-math lms markdown markdown-slides mermaid obsidian obsidian-publish r reveal-js rstudio slides slideshow-maker static-site-generator theme

Last synced: 09 Nov 2024

https://github.com/mpds-io/mpds-api

Tutorials, notebooks, issue tracker, and website on the MPDS API: the data retrieval interface for the Materials Platform for Data Science

calphad crystal-structure crystallography data-science materials materials-informatics materials-platform materials-science mpds-api mpds-platform phase-diagram phase-diagrams

Last synced: 06 Nov 2024

https://github.com/anitagraser/eda-protocol-movement-data

Step-by-step exploratory movement data analysis protocol in a Jupyter notebook

data-quality-assessment data-science exploratory-data-analysis movement-data

Last synced: 10 Nov 2024

https://github.com/hemansnation/python-for-data-professionals

This course is designed to give you a good grip on Python programming, logic building, solving algorithm-based questions, data structures, data analytics, working with pandas, professional practices, and API building.

data-analytics data-professionals data-science exploratory-data-analysis logic-programming machine-learning pandas python

Last synced: 08 Nov 2024

https://github.com/lourd/react-google-sheet

Pulling data from Google Sheets with React components

api-client data-science google-sheets javascript react spreadsheets

Last synced: 14 Oct 2024

https://github.com/microsoft/autobrewml

The AutoBrewML framework greatly shortens the time it takes to get production-ready ML models, with ease and efficiency.

anomaly-detection azure-automl cleansing-data data-science datavisualization machine-learning microsoft nlp-machine-learning responsible-ml sampling-strategies text-analysis text-classification text-summarization

Last synced: 08 Nov 2024

https://github.com/goplus/pandas

Flexible and powerful data analysis / manipulation library for Go+, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

data-analysis data-science data-tech go golang gop goplus pandas scientific-computing

Last synced: 12 Nov 2024

https://github.com/thechymera/behaviopy

Behavioral data analysis and plotting in Python.

animal-behavior biomedical data-science foss multimodality plotting

Last synced: 16 Nov 2024

https://github.com/dhaitz/data-science-links

A curated list of links to great data science articles, videos, ...

agile ai artificial-intelligence career-advice data-science data-scientists machine-learning

Last synced: 11 Nov 2024

https://github.com/florents-tselai/pandas-sets

Set-oriented Operations in Pandas

data-science pandas set-operations sets

Last synced: 31 Oct 2024

https://github.com/luiscib3r/solar-rad-forecasting

These notebooks capture the entire research and implementation process for building several neural-network-based machine learning models that predict solar radiation levels from historical data recorded by meteorological stations.

convolutional-neural-networks data-science deep-learning forecasting machine-learning rnn rnn-tensorflow

Last synced: 05 Nov 2024