Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
- GitHub: https://github.com/topics/data-science
- Wikipedia: https://en.wikipedia.org/wiki/Data_science
- Related Topics: data-analysis, data-mining, machine-learning, big-data, data-visualization
- Aliases: datasciences, data-science-project, data-science-algorithm
- Last updated: 2024-07-29 13:36:33 UTC
https://github.com/probcomp/bayeslite
BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
automatic-data-modeling data-science databases machine-learning probabilistic-programming
Last synced: 01 Aug 2024
https://github.com/NorskRegnesentral/skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
data-science distant-supervision natural-language-processing nlp-library nlp-machine-learning python spacy training-data weak-supervision
Last synced: 30 Jul 2024
https://github.com/dataquestio/project-walkthroughs
Data science, machine learning, and web development project code for https://www.youtube.com/c/Dataquestio.
data-science machine-learning pandas python
Last synced: 08 Aug 2024
https://github.com/tidyverse/datascience-box
Data Science Course in a Box
data-science education r rstats teaching
Last synced: 30 Jul 2024
https://github.com/dataprofessor/code
Compilation of R and Python programming codes on the Data Professor YouTube channel.
data-professor data-science data-science-python dataprofessor datascience exploratory-data-analysis machine-learning machinelearning pandas python python-data-science r scikit-learn scikit-learn-python shiny streamlit
Last synced: 08 Aug 2024
https://github.com/youssefHosni/Practical-Machine-Learning
Practical machine learning notebooks and articles covering the end-to-end machine learning life cycle.
Last synced: 31 Jul 2024
https://github.com/MentatInnovations/datastream.io
An open-source framework for real-time anomaly detection using Python, ElasticSearch and Kibana
alerts anomaly anomaly-detection anomalydetection anomalydiscovery bokeh-dashboard dashboard data-science data-stream datascience dataset dsio elasticsearch iot jupyter kibana machinelearning python sklearn timeseries
Last synced: 30 Jul 2024
https://github.com/firmai/data-science-career
Career resources repository for Data Science, Machine Learning, Big Data, and Business Analytics
analytics big-data business-analytics business-intelligence career data-science machine-learning resources
Last synced: 07 Aug 2024
https://github.com/sberbank-ai-lab/LightAutoML
LAMA - automatic model creation framework
automated-machine-learning automl blackbox classification data-science ensembling feature-engineering gradient-boosting kaggle lama linear-model model-selection multiclass nlp parameter-tuning pipeline pytorch regression stacking whitebox
Last synced: 07 Aug 2024
https://github.com/webartifex/intro-to-python
An intro to Python & programming for wanna-be data scientists
data-science introduction-to-programming jupyter python tutorial
Last synced: 03 Aug 2024
https://github.com/mlr-org/mlr3
mlr3: Machine Learning in R - next generation
classification data-science machine-learning mlr3 r r-package regression
Last synced: 30 Jul 2024
https://github.com/fraunhoferportugal/tsfel
An intuitive library to extract features from time series.
classification colab-notebook data-science feature-engineering feature-extraction time-series
Last synced: 30 Jul 2024
https://github.com/epsilla-cloud/vectordb
Epsilla is a high performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/
ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search
Last synced: 01 Aug 2024
https://github.com/stitchfix/hamilton
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix
Last synced: 02 Aug 2024
https://github.com/JoaquinAmatRodrigo/skforecast
Time series forecasting with scikit-learn models
arima autoregressive-forecasting backtesting-forecasters data-science direct-forecasting exogenous-predictors forecasting lightgbm machine-learning multi-series-forecasting multi-step-forecasting multiple-time-series-forecasting probabilistic-forecasting python quantile-forecasting sarimax scikit-learn time-series weighted-time-series-forecasting xgboost
Last synced: 01 Aug 2024
https://github.com/nivu/ai_all_resources
A curated list of Best Artificial Intelligence Resources
artificial-intelligence convolutional-neural-networks data-science decision-trees deep-learning gan kmeans knn machine-learning mathematics neural-networks python random-forest regression reinforcement-learning rnn statistics statquest support-vector-machine tensorflow
Last synced: 01 Aug 2024
https://github.com/WenjieDu/PyPOTS
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation, classification, clustering, forecasting, & anomaly detection on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values
classification clustering data-mining data-science deep-learning forecasting healthcare imputation incomplete industrial interpolation machine-learning missing-values missingness neural-network partially-observed-time-series pytorch science-research time-series time-series-analysis
Last synced: 01 Aug 2024
https://github.com/turicas/rows
A common, beautiful interface to tabular data, no matter the format
convert-data csv data data-science excel hacktoberfest python table tabular-data xls xlsx
Last synced: 31 Jul 2024
https://github.com/ropensci/targets
Function-oriented Make-like declarative workflows for R
data-science high-performance-computing make peer-reviewed pipeline r r-package r-targetopia reproducibility reproducible-research rstats targets workflow
Last synced: 05 Aug 2024
https://github.com/GoogleCloudPlatform/DataflowJavaSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
big-data data-analysis data-mining data-processing data-science google-cloud-dataflow
Last synced: 02 Aug 2024
https://github.com/egbertbouman/youtube-comment-downloader
Simple script for downloading YouTube comments without using the YouTube API
data-science data-scraper python youtube youtube-comments
Last synced: 01 Aug 2024
https://github.com/shenweichen/Coursera
Quiz & Assignment of Coursera
computer-vision coursera data-science data-structures deep-learning machine-learning natural-language-processing reinforcement-learning
Last synced: 07 Aug 2024
https://github.com/google/lightweight_mmm
LightweightMMM is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.
bayesian data-science econometrics marketing-science mmm
Last synced: 02 Aug 2024
https://github.com/dswah/pyGAM
[HELP REQUESTED] Generalized Additive Models in Python
data-science gams interpretable-machine-learning machine-learning python scientific-computing
Last synced: 31 Jul 2024
https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3
Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath
bayesian-data-analysis data-science pymc python statistics
Last synced: 07 Aug 2024
https://github.com/empathy87/The-Elements-of-Statistical-Learning-Python-Notebooks
A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book
data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials
Last synced: 02 Aug 2024
https://github.com/quixio/quix-streams
100% Python stream processing with Streaming DataFrames
data-engineering data-intensive-applications data-science event-driven-architecture kafka machine-learning python real-time-data-processing stream-processing stream-processor streaming-data streaming-data-pipelines streaming-data-processing time-series-data
Last synced: 04 Aug 2024
https://github.com/aeon-toolkit/aeon
A toolkit for machine learning from time series
data-mining data-science forecasting machine-learning scikit-learn time-series time-series-analysis time-series-classification time-series-clustering time-series-regression
Last synced: 31 Jul 2024
https://github.com/HazyResearch/meerkat
Creative interactive views of any dataset.
data-science foundation-models machine-learning ml pandas
Last synced: 31 Jul 2024
https://github.com/opengeos/streamlit-geospatial
A multi-page streamlit app for geospatial
data-science datascience dataviz geopython geospatial housing-data housing-market huggingface mapping open-source python real-estate streamlit streamlit-webapp
Last synced: 03 Aug 2024
https://github.com/aerdem4/lofo-importance
Leave One Feature Out Importance
data-science explainable-ai feature-importance feature-selection machine-learning
Last synced: 02 Aug 2024
https://github.com/latitude-dev/latitude
Developer-first embedded analytics
analytics business-intelligence dashboard data data-analysis data-analytics data-app data-engineering data-science data-visualization duckdb embedded-analytics exploratory-data-analysis javascript-framework open-source react self-hosted sql svelte tailwindcss
Last synced: 07 Sep 2024
https://github.com/d0r1h/ML-University
Machine Learning Open Source University
artificial-intelligence awsome awsome-list computer-science course data-science deep-learning free learning machine-learning mathematics natural-language-processing neural-network open-source reinforcement-learning university
Last synced: 03 Aug 2024
https://github.com/the-black-knight-01/Data-Science-Competitions
The goal of this repo is to provide solutions to all data science competitions (Kaggle, Data Hack, Machine Hack, Driven Data, etc.).
analytics-vidhya competition-code competitive-data-science-github data-science data-science-competition data-science-competitions datahack-competition kaggle kaggle-competition kaggle-competition-for-beginners kaggle-competition-solutions kaggle-solutions-github kaggle-winning-solutions-github machine-learning machinehack-competition xgboost
Last synced: 09 Aug 2024
https://github.com/h1st-ai/h1st
Power Tools for AI Engineers With Deadlines
automl autonomous-vehicles avionics cold-start collaboration cybersecurity data-science datascience-environment energy-optimization ensemble-machine-learning explainability hacktoberfest home-automation human-in-the-loop industrial-iot predictive-maintenance time-series trustworthy-datascience
Last synced: 30 Jul 2024
https://github.com/tirthajyoti/Stats-Maths-with-Python
General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python
analytics anova bayesian-statistics clustering data-science hypothesis-testing inferential-statistics machine-learning mathematical-programming mathematics matplotlib normal-distribution numerical-analysis numpy pandas probability python scipy statistics statsmodels
Last synced: 02 Aug 2024
https://github.com/zama-ai/concrete-ml
Concrete ML: Privacy Preserving ML framework built on top of Concrete, with bindings to traditional ML frameworks.
data-science fhe homomorphic-encryption machine-learning ppml privacy python scikit-learn tfhe torch
Last synced: 31 Jul 2024
https://github.com/AmoDinho/datacamp-python-data-science-track
All the slides, accompanying code, and exercises are stored in this repo.
bokeh data-science datacamp datacamp-course datacamp-exercises datacamp-machine-learning datacamp-projects datacamp-python datacamp-solutions-python datascience machinelearning natural-language-processing neural-network neural-networks nlp pandas python scikit-learn tokenization
Last synced: 31 Jul 2024
https://github.com/elki-project/elki
ELKI Data Mining Toolkit
anomalydetection cluster-analysis clustering data-analysis data-mining data-mining-algorithms data-science distance-functions index indexing java machine-learning outlier-detection outliers time-series visualization
Last synced: 04 Aug 2024
https://github.com/kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we bring third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis
Last synced: 01 Aug 2024
https://github.com/OSGeo/grass
GRASS GIS - free and open-source geospatial processing engine
arrays data-science earth-observation geospatial geospatial-analysis gis grass-gis hacktoberfest image-processing jupyter machine-learning open-science parallel-computing python raster remote-sensing science spatial timeseries-analysis vector
Last synced: 01 Aug 2024
https://github.com/chrisvoncsefalvay/learn-julia-the-hard-way
Learn Julia the hard way!
data-science hpc julia julia-language julialang language learning learning-by-doing learning-julia scientific-computing statistics technical-computing
Last synced: 31 Jul 2024
https://github.com/jwkvam/bowtie
Create a dashboard with Python!
ant-design antd dashboard data-science flask interactive jupyter plotly python react socket-io visualization webapp
Last synced: 31 Jul 2024
https://github.com/yoshoku/rumale
Rumale is a machine learning library in Ruby
artificial-intelligence data-analysis data-science machine-learning ml ruby rubyml
Last synced: 30 Jul 2024
https://github.com/abhayspawar/featexp
Feature exploration for supervised learning
data-exploration data-science feature-engineering machine-learning visualization
Last synced: 07 Aug 2024
https://github.com/miguelgfierro/ai_projects
AI projects
analytics artificial-intelligence big-data code-examples data-science deep-learning examples machine-learning neural-networks programming-exercise
Last synced: 31 Jul 2024
https://github.com/TheAlgorithms/Jupyter
This repository contains scripts and notebooks related to statistics, machine learning, neural networks, deep learning, NLP, numerical methods, and automation.
algorithms data-science data-structures deep-learning hacktoberfest machine-learning neural-network
Last synced: 02 Aug 2024
https://github.com/jasmcaus/caer
High-performance Vision library in Python. Scale your research, not boilerplate.
ai artificial-intelligence augmentation caer computer-vision cuda data-science deep-learning gpu image-classification image-processing image-segmentation machine-learning neural-network opencv python segmentation type-checking video-processing vision
Last synced: 01 Aug 2024
https://github.com/JosephLai241/URS
Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.
archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud
Last synced: 31 Jul 2024
https://github.com/nipy/nipype
Workflows and interfaces for neuroimaging packages
big-data brain-imaging brainweb data-science dataflow dataflow-programming neuroimaging python workflow-engine
Last synced: 06 Aug 2024
https://github.com/williamFalcon/test-tube
Python library to easily log experiments and parallelize hyperparameter search for neural networks
caffe caffe2 chainer data-science deep-learning grid-search hyperparameter-optimization keras machine-learning neural-networks pytorch random-search tensorflow
Last synced: 31 Jul 2024
https://github.com/AgnostiqHQ/covalent
Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
covalent data-pipeline data-science deep-learning hacktoberfest hpc hpc-applications machine-learning machinelearning machinelearning-python orchestration parallelization pipelines python quantum quantum-computing quantum-machine-learning workflow workflow-automation workflow-management
Last synced: 01 Aug 2024
https://github.com/target/matrixprofile-ts
A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile
data-science matrix-profile motif motif-discovery pip pip3 pypi pypi-packages python python3 time-series timeseries-analysis timeseries-segmentation
Last synced: 31 Jul 2024
https://github.com/compdemocracy/polis
Open Source AI for large scale open ended feedback
civic-tech data-science deliberative-democracy participatory-democracy
Last synced: 01 Aug 2024
https://github.com/glue-viz/glue
Linked Data Visualizations Across Multiple Files
data-science linked-data python visualization
Last synced: 05 Aug 2024
https://github.com/iterative/mlem
A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day.
cli data-science deployment developer-tools git machine-learning mlem model-registry python
Last synced: 31 Jul 2024
https://github.com/erikaduan/r_tips
A repository of R usage tips for data cleaning, data mining, data visualisation, statistical inference and machine learning
data-science data-visualization machine-learning r rstats statistics
Last synced: 13 Aug 2024
https://github.com/pdpipe/pdpipe
Easy pipelines for pandas DataFrames.
data data-science dataframe dataframes pandas pandas-dataframe pipeline
Last synced: 01 Aug 2024
https://github.com/ipython-books/cookbook-2nd-code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization
Last synced: 02 Aug 2024
https://github.com/explosion/spacy-stanza
Use the latest Stanza (StanfordNLP) research models directly in spaCy
corenlp data-science machine-learning natural-language-processing nlp spacy spacy-pipeline stanford-corenlp stanford-machine-learning stanford-nlp stanza
Last synced: 05 Aug 2024
https://github.com/run-house/runhouse
The fastest way to iterate and deploy AI workloads on your own infra. Unobtrusive, debuggable, PyTorch-like APIs.
api artificial-intelligence aws azure collaboration data-science deployment distributed fastapi gcp infrastructure machine-learning middleware observability python pytorch ray sagemaker serverless
Last synced: 31 Jul 2024
https://github.com/alteryx/evalml
EvalML is an AutoML library written in Python.
automl data-science feature-engineering feature-selection hyperparameter-tuning machine-learning model-selection optimization
Last synced: 01 Aug 2024
https://github.com/kennethleungty/Failed-ML
Compilation of high-profile real-world examples of failed machine learning projects
ai artificial-intelligence classification computer-vision data-engineering data-quality data-science deep-learning failed-data-science failed-machine-learning failed-ml fml forecasting machine-learning ml natural-language-processing production recsys regression
Last synced: 01 Aug 2024
https://github.com/scikit-mobility/scikit-mobility
scikit-mobility: mobility analysis in Python
complex-systems data-analysis data-science human-mobility mobility-analysis mobility-flows network-science risk-assessment scikit-mobility statistics synthetic-flows
Last synced: 05 Aug 2024
https://github.com/HunterMcGushion/hyperparameter_hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
ai artificial-intelligence catboost data-science deep-learning experimentation feature-engineering hyperparameter-optimization hyperparameter-tuning keras lightgbm machine-learning ml neural-network optimization python rgf scikit-learn sklearn xgboost
Last synced: 02 Aug 2024
https://github.com/Kotlin/dataframe
Structured data processing in Kotlin
data-analysis data-science dataframe kotlin
Last synced: 01 Aug 2024
https://github.com/litaotao/IPython-Dashboard
A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data scientists and data analysts. It aims to provide interactive visualization, collaborative dashboards, and real-time streaming graphs.
dashboard data-science ipython ipython-dashboard notebook visualization
Last synced: 15 Aug 2024
https://github.com/dataproofer/Dataproofer
A proofreader for your data
cli command-line csv data-analysis data-mining data-science excel nodejs spreadsheet
Last synced: 01 Aug 2024
https://github.com/BiomedSciAI/causallib
A Python package for modular causal inference analysis and model evaluations
causal causal-inference causal-models causality data-science machine-learning ml
Last synced: 31 Jul 2024
https://github.com/nicolaskruchten/jupyter_pivottablejs
Drag'n'drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js
data-analysis data-science interactive jupyter-notebook pivot-chart pivot-tables
Last synced: 01 Aug 2024
https://github.com/google/edward2
A simple probabilistic programming language.
bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow
Last synced: 30 Jul 2024
https://github.com/rweekly/rweekly.org
R Weekly
blog community data-science data-visualization r rweekly statistics visualization weekly
Last synced: 07 Aug 2024
https://github.com/pm4py/pm4py-core
Public repository for the PM4Py (Process Mining for Python) project.
data-mining data-science machine-learning process-mining python
Last synced: 02 Aug 2024
https://github.com/pm4py/pm4py-source
Public repository for the PM4Py (Process Mining for Python) project.
data-mining data-science machine-learning process-mining python
Last synced: 07 Aug 2024
https://github.com/arvkevi/kneed
Knee point detection in Python
data-analysis data-science elbow-method knee-point python scientific-computing systems
Last synced: 31 Jul 2024
https://github.com/jphall663/interpretable_machine_learning_with_python
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
accountability data-mining data-science decision-tree fairness fatml gradient-boosting-machine h2o iml interpretability interpretable interpretable-ai interpretable-machine-learning interpretable-ml lime machine-learning machine-learning-interpretability python transparency xai
Last synced: 02 Aug 2024
https://github.com/fastai/fastai2
Temporary home for fastai v2 while it's being developed
data-science deep-learning fastai jupyter machine-learning nbdev python pytorch
Last synced: 07 Aug 2024
https://github.com/bacalhau-project/bacalhau
Compute over Data framework for public, transparent, and optionally verifiable computation
ai-art ai-data-collection ai-pipeline batch-processing bioinformatics-pipeline data-analysis data-engineering data-science decentralized decentralized-computing distributed gene-sequencing insulators iot logging-framework orchestration-framework p2p video-processing
Last synced: 02 Aug 2024
https://github.com/aeturrell/coding-for-economists
This repository hosts the code behind the online book, Coding for Economists.
book data-science econometrics economics economics-models jupyter-notebook learning python research vscode
Last synced: 01 Aug 2024
https://github.com/yzhao062/combo
(AAAI '20) A Python Toolbox for Machine Learning Model Combination
aggregation data-mining data-science ensemble-learning machine-learning machine-learning-pipelines model-combination pipeline-framework python
Last synced: 02 Aug 2024
https://github.com/rstojnic/lazydata
Lazydata: Scalable data dependencies for Python projects
data-science datamanagement machine-learning python
Last synced: 31 Jul 2024
https://github.com/TrainingByPackt/Data-Science-Projects-with-Python
A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn
data-science machine-learning numpy pandas pandas-dataframe python scikit-learn
Last synced: 01 Aug 2024
https://github.com/cerndb/dist-keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow
Last synced: 03 Aug 2024
https://github.com/mrsaeeddev/free-ai-resources
FREE AI Resources - Courses, Jobs, Blogs, AI Research, and many more - for everyone!
ai artificial-intelligence artificial-neural-networks data data-science data-science-learning data-science-projects deep-learning deep-neural-networks hacktoberfest hacktoberfest2020 machine-learning machine-learning-algorithms machinelearning reinforcement-learning research supervised-learning unsupervised-learning
Last synced: 01 Aug 2024
https://github.com/Squarespace/datasheets
Read data from, write data to, and modify the formatting of Google Sheets
data data-analytics data-science dataframe google pandas python
Last synced: 30 Jul 2024
https://github.com/chris-greening/instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping
Last synced: 01 Aug 2024
https://github.com/lgienapp/aquarel
Styling matplotlib made easy
data-science data-visualization matplotlib plotting theme theme-development theming visualization
Last synced: 31 Jul 2024
https://github.com/blue-yonder/turbodbc
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
data-science database exasol numpy odbc pep249 pyodbc python python-database-api speedup
Last synced: 01 Aug 2024
https://github.com/SebKrantz/collapse
Advanced and Fast Data Transformation in R
cran data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data r rstats scientific-computing statistics time-series weighted weights
Last synced: 02 Aug 2024
https://github.com/erezsh/Preql
An interpreted relational query language that compiles to SQL.
data-science database python query sql
Last synced: 31 Jul 2024
https://github.com/alegonz/baikal
A graph-based functional API for building complex scikit-learn pipelines.
data-science graph-based machine-learning python scikit-learn
Last synced: 03 Aug 2024
https://github.com/DiskFrame/disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
data data-science large-dataset manipulation-data medium-data r
Last synced: 30 Jul 2024
https://github.com/jadianes/data-science-your-way
Ways of doing Data Science Engineering and Machine Learning in R and Python
data-frame data-science data-science-engineering exploratory-data-analysis jupyter machine-learning notebook python r tutorial
Last synced: 31 Jul 2024
https://github.com/graspologic-org/graspologic
Python package for graph statistics
data-science graph graph-statistics machine-learning networks python
Last synced: 02 Aug 2024
https://github.com/ploomber/jupysql
Better SQL in Jupyter.
bigquery clickhouse data-engineering data-science duckdb hive jupyter mysql polars postgres presto python redshift snowflake spark-sql sql sqlite trino tsql
Last synced: 01 Aug 2024
https://github.com/JacksonWuxs/DaPy
Easy-to-use data analysis / manipulation framework for humans
analysis data-analysis data-science efficiency pypi python statistical-reports
Last synced: 31 Jul 2024
https://github.com/benedekrozemberczki/datasets
A repository of pretty cool datasets that I collected for network science and machine learning research.
benchmark community-detection data-science dataset deepwalk dimensionality-reduction gcn gnn graph-convolution graph-embedding graph-neural-network graph2vec link-prediction machine-learning network-analysis network-embedding network-science node-classification node-embedding node2vec
Last synced: 01 Aug 2024
https://github.com/JuliaStats/GLM.jl
Generalized linear models in Julia
data-science glm julia regression statistical-models statistics
Last synced: 02 Aug 2024