Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Data Science
Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
- GitHub: https://github.com/topics/data-science
- Wikipedia: https://en.wikipedia.org/wiki/Data_science
- Related Topics: data-analysis, data-mining, machine-learning, big-data, data-visualization
- Aliases: datasciences, data-science-project, data-science-algorithm
- Last updated: 2024-11-19 00:06:52 UTC
https://github.com/kongruksiamza/python-datascience
Teaching materials covering Python for Data Science and Machine Learning work
data-analysis data-science numpy pandas python
Last synced: 09 Nov 2024
https://github.com/kennethleungty/Text-to-Audio-with-Bark
Exploring Bark, the Open-Source Text-to-Audio Generative Model
ai artificial-intelligence bark data-science deep-learning gen-ai generative-ai machine-learning prompt-engineering speech text-prompt text-to-audio text-to-music text-to-sound text-to-speech
Last synced: 27 Oct 2024
https://github.com/afondiel/cs-books
Computer science books, from algorithms, data structures, and programming to data science, AI, and much more.
ai books computer-science computer-science-books computer-vision computer-vision-books data-science data-structures dl image-processing ml programming
Last synced: 06 Nov 2024
https://github.com/mad-lab-fau/tpcp
Pipeline and Dataset helpers for complex algorithm evaluation.
algorithms biosignals data-management data-science machine-learning python
Last synced: 12 Nov 2024
https://github.com/chifisource/oddframes.jl
The unique data management platform for Julia
data data-science julia machine-learning
Last synced: 12 Nov 2024
https://github.com/wyfunique/DBSim
The codebase for DBSim
data-science database in-database in-database-analytics query-optimizer sql-parser sql-query
Last synced: 11 Nov 2024
https://github.com/mukeshmithrakumar/hackerranksolutions
My HackerRank Solutions for Python, Java, C, C++, Shell, SQL, JavaScript and Interview Preparation Kit
bash c cpp data-science hackerrank hackerrank-solutions interview-preparation interview-questions java javascript linux-shell machine-learning python python3 shell software sql
Last synced: 16 Nov 2024
https://github.com/getyourguide/db-rocket
Keep your local Python scripts installed and in sync with a Databricks notebook. Shortens the feedback loop when developing projects in a hybrid environment.
data-science databricks productivity python
Last synced: 14 Nov 2024
https://github.com/silvanmelchior/cme_parser
A tiny parser for more flexible conda environment files
cme-parser conda conda-environment data-science meta-environment parser python
Last synced: 27 Oct 2024
https://github.com/Absolventa/iruby-chartkick
Minimalistic wrapper around chartkick for using it within iruby
chartkick data-science iruby rubydatascience visualization
Last synced: 14 Nov 2024
https://github.com/rugk/crops-parser
🌱🍎🍆 A shell script to parse the data by the Food and Agriculture Organization of the United Nations on crops/fruits.
agriculture agriculture-research crop crops data-analysis data-science food fruit fruits statistics streetcomplete tree vegetables
Last synced: 23 Oct 2024
https://github.com/hoangsonww/global-covid19-analysis
🌍 This repository hosts an in-depth analysis of COVID-19's impact across five key countries from Jan 2020 to Dec 2021. Through advanced data analysis and visualization, we aim to provide insights into how the pandemic evolved differently across these nations, shedding light on the effectiveness of various health measures and vaccination campaigns.
covid covid-19 covid19-tracker data data-analysis data-analytics data-science data-visualization ggplot2 julia julia-language python r r-language r-markdown r-programming sas sas-programming stata vaccination
Last synced: 12 Oct 2024
https://github.com/m-muecke/awesome-data-science
Data science and programming resources for daily work
awesome-list bash data-science linux machine-learning python r r-programming sql
Last synced: 28 Oct 2024
https://github.com/jonnor/datascience-master
Journal/notes/log of my Masters in Data Science degree
data-science homework machine-learning
Last synced: 23 Oct 2024
https://github.com/facultyai/ipydataclean
Interactive cleaning for Pandas DataFrames
data-cleaning data-science dataframe jupyter-notebook pandas
Last synced: 28 Oct 2024
https://github.com/phantominsights/covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
covid-19 data-science etl matplotlib numpy pandas python3 requests seaborn
Last synced: 11 Nov 2024
https://github.com/nelsonmestevao/uminho
📚 University projects, exercises & notes
c cpp data-science distributed-systems haskell java software-engineering
Last synced: 11 Oct 2024
https://github.com/reddyprasade/python-basic-for-all-3.x
An introduction to Python, a powerful multi-purpose programming language created by Guido van Rossum. Its simple, easy-to-use syntax makes it a good fit for someone learning computer programming for the first time. This is a comprehensive guide on how to get started with Python, why you should learn it, and how you can learn it. It also suits readers who know other programming languages and want to get started with Python quickly.
comprehensive-guide data-science knowledge perfect-language programming-languages python python-3 python-3-6 python3
Last synced: 12 Oct 2024
https://github.com/nrennie/data-science-resources
Resources relating to data science.
Last synced: 28 Oct 2024
https://github.com/srowen/cdsw-simple-serving
Modeling Lifecycle with ACME Occupancy Detection and Cloudera
cloudera cloudera-data-science data-science openscoring pmml workbench
Last synced: 01 Oct 2024
https://github.com/ma7555/kerasgen
A Keras/Tensorflow compatible image data generator for TripletLoss
data-generation data-generator data-generators data-science keras keras-tensorflow tensorflow triplet triplet-loss triplet-neural-network
Last synced: 23 Oct 2024
https://github.com/evoluteur/kaggle-look-alike
Kaggle Data Explorer UI look-alike built with React.
data data-analysis data-engineering data-exploration data-mining data-platform data-science datascience exploratory-data-analysis explorer front-end frontend kaggle react spa
Last synced: 13 Nov 2024
https://github.com/gramian/hapod
HAPOD - Hierarchical Approximate Proper Orthogonal Decomposition
data-driven data-reduction data-science datascience dimension-reduction distributed-memory high-performance-computing hpc limited-memory mapreduce mapreduce-algorithm model-order-reduction model-reduction pca pod proper-orthogonal-decomposition svd unsupervised-learning
Last synced: 13 Nov 2024
https://github.com/brpy/ml-books
A list of freely available Machine Learning related books.
books data-science free freely machine-learning statistics
Last synced: 05 Nov 2024
https://github.com/tamasgal/thepipe
A simplistic, general purpose pipeline framework.
data-processing data-processing-pipelines data-science hacktoberfest pipelines provenance python
Last synced: 28 Oct 2024
https://github.com/mratsim/humpback-whale-identification
Kaggle Humpback whale identification: 2xGPU Data augmentation + FP16 mixed precision training
computer-vision data-science identification kaggle pytorch
Last synced: 23 Oct 2024
https://github.com/scikit-learn/blog
Hosting the scikit-learn blog.
community data-science machine-learning open-source python scikit-learn
Last synced: 07 Oct 2024
https://github.com/primaprashant/ai-customer-support
📚 Curated collection of blogs and papers on how different companies are using machine learning in production for better customer support.
ai applied-data-science applied-machine-learning applied-ml artificial-intelligence customer-service customer-support data-science deep-learning machine-learning natural-language-processing nlp paper production tech-blog
Last synced: 07 Nov 2024
https://github.com/nceas/nceas-training
Training materials and modules from R-based data science short courses at NCEAS
Last synced: 12 Nov 2024
https://github.com/codelibs/fione
Fione is Enterprise AI Platform
ai automl data-science machine-learning
Last synced: 30 Oct 2024
https://github.com/omarsar/data_mining_2017_fall_lab
Contains information and instructions for the first Data Mining lab session for 2017 Fall.
data data-analysis data-mining data-science data-visualization
Last synced: 13 Oct 2024
https://github.com/privacy-tech-lab/cross-device-tracking
Data and software for cross-device tracking data collection
cross-device-tracking data-science internet-tracking privacy privacy-tech
Last synced: 09 Nov 2024
https://github.com/amey-thakur/python-crash-course
IIT ROPAR - Diginique Techlabs --> Data Science Machine Learning and AI using Python
ai amey ameythakur data-science data-science-projects house-price-prediction machine-learning python python-crash-course
Last synced: 09 Nov 2024
https://github.com/openbridge/ob_pysh-db
pysh-db - The Data Science Toolkit (DSK)
bash data-science mysql postgres python redshift sql
Last synced: 14 Nov 2024
https://github.com/lintangwisesa/python_fundamental_datascience
Python 🐍 for Jr Data Scientist 📈📊📉
data-science machine-learning python
Last synced: 11 Nov 2024
https://github.com/afondiel/computer-science-notebook
Essential computer science notes and resources for software engineers/developers of all levels.
ai computer-science control-laws data-science deep-learning design-patterns devops iot machine-learning notebook open-catalog reinforcement-learning research-notes robotics ros software-engineering
Last synced: 06 Nov 2024
https://github.com/giswqs/streamlit-mapbox
A Streamlit Component for rendering Mapbox GL JS
data-science geospatial mapping streamlit streamlit-component streamlit-webapp
Last synced: 14 Oct 2024
https://github.com/nicbet/infozilla
The infoZilla unstructured software engineering data mining tool. It can find and extract source code regions, patches, stack traces, enumerations and itemizations from discussion threads.
bugreport bugzilla data-mining data-science tools unstructured-data
Last synced: 06 Nov 2024
https://github.com/alro10/roadmap-data-scientist
The basic roadmap to become a data scientist
analytics cognitive-courses data-science data-scientist docker ibm ibm-cloud kubernetes machine-learning python python3 roadmap roadmap-ds sql
Last synced: 19 Nov 2024
https://github.com/shivangraikar/datasciencevalue
Web application created using Streamlit to host an intelligent salary predictor. The project returns the position of the user in this particular field of Data Science.
data-science heroku-deployment logistic-regression machine-learning streamlit-webapp
Last synced: 08 Nov 2024
https://github.com/rsalmei/tsp-essay
A fun study of some heuristics for the Travelling Salesman Problem.
algorithms clustering data-science data-visualization greedy-algorithm greedy-nn-algorithm heuristic kmeans-algorithm kmeans-clustering logistics matplotlib numpy pandas traveling-salesman traveling-salesman-problem travelling-salesman travelling-salesman-problem tsp tsp-problem two-opt
Last synced: 13 Oct 2024
https://github.com/codait/flight-delay-notebooks
Analyzing flight delay and weather data using Elyra, IBM Data Asset Exchange, Kubeflow Pipelines and KFServing
codait data-science elyra jupyter jupyter-notebook jupyterlab kfserving kubeflow-pipelines machine-learning
Last synced: 09 Nov 2024
https://github.com/csinva/cookiecutter-ml-research
A logical, reasonably standardized, but flexible project structure for conducting ml research 🍪
ai artificial-intelligence classification data-science machine-learning ml ml-tooling modeling natural-language-processing nlp python regression research statistics tabular-data template
Last synced: 09 Nov 2024
https://github.com/mmbazel/predicting-kickstarter-campaign-outcomes-using-nlp-feature-engineering
Turning raw Kickstarter text data into campaign predictions using spaCy, scikit-learn, SQLAlchemy, SQLite3 & XGBoost Classifier (feature engineering: Bag-of-Words, TfidfVectorizer)
classification data-science feature-engineering kickstarter-campaigns nlp nlp-feature-engineering nlp-machine-learning springboard springboard-career-track springboard-data-science springboard-projects sqlite-database sqlite3
Last synced: 18 Nov 2024
https://github.com/shwetajoshi601/world-bank-data-analysis
An Exploratory Data Analysis on the World Bank Dataset.
analysis data-science eda python3 world-bank-api worldbank
Last synced: 16 Nov 2024
https://github.com/greenelab/gbm_immune_validation
Validating glioblastoma immune cell immunohistochemistry using computational deconvolution of TCGA tumors
analysis cancer data-science gene-expression glioblastoma machine-learning survival-analysis tool
Last synced: 13 Nov 2024
https://github.com/dawievlill/datascience-871
Data science module for economists written mostly in Julia and R
data-analysis data-science machine-learning
Last synced: 11 Nov 2024
https://github.com/earth-artificial-intelligence/earth_ai_book_materials
The repo contains the source code, notebooks, and technical resources that help students work through the book Artificial Intelligence in Earth Science.
data-science earth-science machine-learning python
Last synced: 15 Nov 2024
https://github.com/jbris/time_series_anomaly_detection_examples
Several examples of anomaly detection algorithms for time series data.
anomaly-detection anomaly-detection-algorithm data-science docker grafana grafana-influxdb influxdb influxdb-grafana machine-learning machine-learning-algorithms python r rstudio statistics telegraf tensorflow tensorflow-examples tensorflow-tutorials time-series time-series-analysis
Last synced: 13 Nov 2024
https://github.com/sravb/nba-predictive-analytics
Being able to perform gameplay analysis of NBA players, NBA Predictive Analytics is a basketball coach's new best friend.
basketball data-mining data-science data-visualization decision-tree k-nearest-neighbors kaggle-dataset machine-learning matplotlib nba-analytics pandas predictive-analytics python scikit-learn scipy
Last synced: 17 Nov 2024
https://github.com/opengeos/streamlit-map-template
A streamlit template for mapping applications
data-science geospatial mapping python streamlit
Last synced: 11 Nov 2024
https://github.com/brunocampos01/predicting-retail-churn-with-azure-ml-studio
Job application challenge: Data Scientist
api-rest azure azure-machine-learning-services azure-machine-learning-studio azure-pipelines challenge cheat-sheet-machine-learning cheat-sheets data-engineering data-engineering-pipeline data-science data-scientist deploy-machine-learning machine-learning machine-learning-studio pandas powerbi python python3 softplan
Last synced: 16 Nov 2024
https://github.com/alan-turing-institute/hds-discussiongroup
Repo of the Turing's Humanities & Data Science Discussion Group
data-science digital-humanities discussion-group
Last synced: 13 Nov 2024
https://github.com/latentcat/network-vis
WIP. Visualization of social networks.
complex-networks data data-science graph graph-visualization social-media social-network-analysis visualization wechat
Last synced: 17 Nov 2024
https://github.com/uscbiostats/pm566
USC's Introduction to Health Data Science
course course-materials data-science datascience dataviz machine-learning rstats webscraping
Last synced: 11 Nov 2024
https://github.com/tomaztk/List_of_R_packages_for_Data_scientist
List of useful R packages for data scientists
data-science r r-language r-markdown r-package r-programming statistics
Last synced: 13 Aug 2024
https://github.com/canagnos/mcp
Tools for Measuring Classification Performance for R, Python and Spark
artificial-intelligence classification data-mining data-science machine-learning machine-learning-algorithms
Last synced: 23 Oct 2024
https://github.com/takuti/anompy
A Python library for anomaly detection
anomaly-detection data-science forecasting machine-learning python
Last synced: 16 Oct 2024
https://github.com/wlandau/targets-keras
An example Keras pipeline with the targets R package
data-science keras pipeline r reproducibility reproducible-research rstats statistics targets workflow
Last synced: 27 Oct 2024
https://github.com/vinibrsl/internet-affordability
🌍 Dataset that shows the Internet affordability by country (a shocking reality!)
data-science dataset human-rights insights jupyter-notebook scraping
Last synced: 24 Oct 2024
https://github.com/khadkarajesh/internship-preparation-kit
Repository of technical and behavioural questions asked by French tech companies in internship interviews
algorithm algorithms coding-interviews codinggame data-science data-structures data-structures-and-algorithms french hacktoberfest hacktoberfest-accepted hacktoberfest2022 internship interview interview-practice interview-preparation interview-questions interview-test leetcode python software-engineering
Last synced: 11 Oct 2024
https://github.com/ahammadmejbah/tensorflow-developers-roadmap
TensorFlow is an open-source machine learning framework developed by Google. It provides a versatile platform for creating and deploying machine learning models, particularly neural networks, enabling tasks like image recognition, natural language processing, and more.
computer-vision computervision data-analysis data-engineering data-science data-visualization machine-image machine-learning machine-learning-algorithms machine-vision tensorflow tensorflow-tutorials tensorflow2
Last synced: 10 Oct 2024
https://github.com/lydialucchesi/smallsets
Visual documentation for data preprocessing in R and Python
data-science data-visualization documentation-tool machine-learning preprocessing python r r-package visualization-tools
Last synced: 12 Oct 2024
https://github.com/smups/rustronomy
rustronomy - an astronomy data analysis toolkit written in rust
astronomy data-science physics rust rust-lang rust-library science
Last synced: 14 Oct 2024
https://github.com/ekote/build-your-first-end-to-end-lakehouse-solution
Build Your First End-to-End Lakehouse Solution (aka.ms/fabconlake)
apache-spark data-engineering data-factory data-pipeline data-science dataflows delta-lake lakehouse machine-learning microsoft-azure microsoft-fabric parquet powerbi tutorial warehouse workshop
Last synced: 12 Oct 2024
https://github.com/dmedri/roaster
R - Fetch, build and deploy.
build-tool data-science rstats statistical-analysis statistics virtual-environments
Last synced: 13 Aug 2024
https://github.com/open-risk/correlationmatrix
correlationMatrix is a Python powered library for the statistical analysis and visualization of correlations
correlation-analysis correlation-matrices data-analysis data-science statistics
Last synced: 13 Oct 2024
https://github.com/sjcobb/ai-duet-3d
3D music animation + machine learning (in development)
3d-animation 3d-audio 3d-game artificial-intelligence browser-game data-science data-visualization game-development generative-music javascript machine-learning music music-bot music-composition music-theory music-visualizer neural-network web-development youtube-channel
Last synced: 11 Oct 2024
https://github.com/h2oai/article-information-2019
Article for Special Edition of Information: Machine Learning with Python
data-science explainable-ai explainable-ml fairness-ai fairness-ml fairness-testing fatml iml interpretable-ai interpretable-machine-learning interpretable-ml machine-learning machine-learning-interpretability python xai
Last synced: 06 Nov 2024
https://github.com/zakroum-hicham/football-analysis-cv
This repository contains a computer vision/machine learning football project that uses YOLO for object detection, Kmeans for pixel segmentation, and perspective transformation to analyze player movements in football videos
ai computer-vision data-science football-analytics kmeans-clustering machine-learning opencv yolov8
Last synced: 29 Oct 2024
https://github.com/majorlift/volatility-modeling-python-datasci
Undergraduate Thesis published by the Seoul National University Department of Economics (2020)
arima-forecasting data-science data-vizualization financial-engineering garch-model granger-causality jupyter-notebook numpy pandas pyplot python3 regression-models research-paper risk-modelling scipy-stats seaborn statsmodels time-series-analysis value-at-risk volatility-modeling
Last synced: 10 Nov 2024
https://github.com/chendaniely/ds4biomed
Data Science for the Biomedical Sciences
biomedical-sciences data-science
Last synced: 13 Oct 2024
https://github.com/chongyasong/youml
YouML: A Machine Learning Toolkit
ai artificial-intelligence big-data data-mining data-science machine-learning matplotlib numpy pandas python scikit-learn scipy
Last synced: 13 Oct 2024
https://github.com/darenasc/data-science-for-good
Data Science for Good links.
Last synced: 27 Oct 2024
https://github.com/seandavi/machinelearningintro
Machine learning use cases for teaching
data-science machine-learning r rstats teaching-materials tutorial
Last synced: 05 Nov 2024
https://github.com/inphyt/imdb_sentiment_analysis_bert
BERT Sentiment Classification on the IMDb Large Movie Review Dataset.
bert bert-model data-mining data-mining-algorithms data-mining-python data-science machine-learning machine-learning-algorithms natural-language-processing nlp nlp-machine-learning scikit-learn sentiment-analysis sentiment-classification spacy spacy-models spacy-nlp
Last synced: 12 Nov 2024
https://github.com/techshot25/healthcare
Insurance cost predictor
bayesian-regression data-analysis data-science linear-regression machine-learning polynomial-regression random-forest-regression
Last synced: 10 Nov 2024
https://github.com/giswqs/notebook-share
A repo for sharing notebooks
data-science dataviz geospatial jupyter-notebook mapping notebook
Last synced: 02 Nov 2024
https://github.com/duo-labs/datasci-ctf
A capture-the-flag exercise based on data analysis challenges
Last synced: 12 Nov 2024
https://github.com/qpwedev/blockchain-network-visualizer
Blockchain Network Visualizer for TON.
blockchain data-science network ton toncoin
Last synced: 26 Oct 2024
https://github.com/adilzouitine/pyfeel
Python package for emotion analysis in French
data-analysis data-mining data-science emotion emotion-analysis nlp nlp-library opinion-mining python
Last synced: 13 Oct 2024
https://github.com/zen-reportz/zen_dash
Simple, fast, scalable, production-grade dashboard application. The right solution for teams.
dashboard data-analytics data-science fastapi flask python3 shiny streamlit
Last synced: 07 Nov 2024
https://github.com/amey-thakur/bangalore-house-price-prediction
Machine Learning Project to Predict House Prices in Bangalore.
amey ameythakur data-science deep-learning house-price-prediction kaggle machine-learning megasatish python
Last synced: 09 Nov 2024
https://github.com/bdpedigo/networks-course
A short course on network data science at Johns Hopkins University
data-science jupyter-book network-analysis networks python python3 teaching teaching-materials
Last synced: 10 Nov 2024
https://github.com/brutusyhy/polars-explorer
A data explorer app for Polars based on Tauri
data-analysis data-science data-visualization dataframe full-stack polars react rust tauri typescript
Last synced: 10 Nov 2024
https://github.com/alvarobartt/ea-associate-ds
Electronic Arts (EA) NLP Assignment for: Associate Data Scientist
data-science electronic-arts nlp recruitment-task
Last synced: 14 Oct 2024
https://github.com/koalaverse/analyticssummit19
Material for 2019 Analytics Summit Machine Learning with R Training
data-science educational-materials machine-learning r workshop-materials
Last synced: 19 Nov 2024
https://github.com/glentner/dataphile
Data analytics library for Python and suite of open source, command line based data ops tools.
data-analysis data-ops data-science python scientific-computing
Last synced: 09 Nov 2024
https://github.com/hourout/linora
Simple and efficient tools for data science.
data-analysis data-mining data-science hyperparameter-optimization lightgbm machine-learning python xgboost
Last synced: 05 Nov 2024
https://github.com/aliktk/python_chilla
This repository contains practice materials on Python, used to deliver an online training course. The course was sponsored by codenics and Scholarship Network, Pakistan.
course data-science eda machine-learning-algorithms pandas-python python scikit-learn training
Last synced: 14 Nov 2024
https://github.com/eyadsibai/machine-learning-docker-image
Data Science/Machine Learning Docker Image for CPU
data-science docker docker-image google-cloud machine-learning
Last synced: 22 Oct 2024
https://github.com/mtpatter/mlflow-tutorial
Fully reproducible, Dockerized, step-by-step tutorial on training and serving a simple sklearn classifier model using mlflow. Detailed blog post published on Towards Data Science.
data-science machine-learning mlflow mlflow-docker mlops tutorial
Last synced: 13 Nov 2024
https://github.com/nas5w/imdb-data
A JSON file of 50,000 IMDB movie reviews to be used in machine learning applications.
data data-science imdb javascript machine-learning
Last synced: 16 Nov 2024
https://github.com/bradflaugher/ai-101
Notes, links, code samples, and resources for teaching yourself PyTorch and TensorFlow.
bootcamp course data-engineering data-science learn-to-code learning-by-doing learning-python machine-learning
Last synced: 16 Nov 2024
https://github.com/doubleml/doubleml-serverless
DoubleML-Serverless - Distributed Double Machine Learning with a Serverless Architecture
aws-lambda causal-inference data-science double-machine-learning econometrics machine-learning python scikit-learn serverless statistics
Last synced: 17 Nov 2024
https://github.com/networks-learning/discussion-complexity
Code for "On the Complexity of Opinions and Online Discussions", WSDM 2019
complexity data-science discussion online-discussions opinion-mining paper wsdm
Last synced: 17 Nov 2024
https://github.com/jimbrig/lossrx
An R package, plumber API, database, and Shiny App for Actuarial Loss Development and Reserving Workflows.
actuarial-science claims-data claims-reserving data-science insurance modelling property-casualty reserving rpackage rshiny rstats workflow
Last synced: 13 Nov 2024