Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/synthesized-io/fairlens

Identify bias and measure fairness of your data

bias data data-analysis data-science fairness pandas python statistics

Last synced: 15 Nov 2024

https://github.com/tiledb-inc/tiledb-vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

bioinformatics data-science genomics gwas python spark tiledb variant-calling vcf

Last synced: 14 Nov 2024

https://github.com/delsner/flask-angular-data-science

Repository for a data science starter app using Flask, Angular and Docker. https://medium.com/@dvelsner/deploying-a-simple-machine-learning-model-in-a-modern-web-application-flask-angular-docker-a657db075280

angular data-science docker flask machine-learning python sklearn typescript

Last synced: 10 Oct 2024

https://github.com/scrapinghub/aile

Automatic Item List Extraction

data-science

Last synced: 10 Nov 2024

https://github.com/akgold/do4ds

A book on DevOps for Data Scientists with CRC Press.

data-science devops it python r

Last synced: 10 Nov 2024

https://github.com/DataKitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake

Last synced: 13 Nov 2024

https://github.com/aershov24/machine-learning-ds-interview-questions

🔴 1704 Machine Learning, Data Science & Python Interview Questions (ANSWERED) To Kill Your Next ML & DS Interview. Get All Answers + PDFs on MLStack.Cafe. Post your ML Jobs.

algorithms-and-data-structures data-analysis data-science interview-practice interview-preparation interview-questions machine-learning machine-learning-algorithms machinelearning

Last synced: 18 Nov 2024

https://github.com/sangaline/reverse-engineering-the-hacker-news-ranking-algorithm

An analysis of historical Hacker News data to determine the ranking algorithm

analysis data-science hacker-news

Last synced: 07 Nov 2024

https://github.com/uc-r/uc-r.github.io

Main repository for R programming courses @ University of Cincinnati, courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.

classroom data-science data-wrangling machine-learning r tutorial tutorial-code visualization

Last synced: 30 Oct 2024

https://github.com/nuclio/nuclio-jupyter

Nuclio Function Automation for Python and Jupyter

data-science jupyter kubernetes nuclio python

Last synced: 14 Nov 2024

https://github.com/svilupp/promptingtools.jl

Streamline your life using PromptingTools.jl, the Julia package that simplifies interacting with large language models.

data-science generative-ai julia

Last synced: 30 Oct 2024

https://github.com/rogerfitz/tutorials

Git repo for articles on the Ergo Sum blog and the YouTube channel https://www.youtube.com/channel/UCiie9CN--dazA7iT2sry5FA

algorithmia data-science draft-kings fan-duel fivethirtyeight google-maps-api ocr python sports tech text-to-speech visualizations

Last synced: 08 Nov 2024

https://github.com/palashio/nylon

An intelligent, flexible grammar of machine learning.

auto-ml data-science grammar machine-learning

Last synced: 07 Nov 2024

https://github.com/Dumbris/trunklucator

Python module that lets data scientists quickly create annotation projects.

active-learning annotation annotation-tool data-science machine-learning nlp

Last synced: 04 Nov 2024

https://github.com/seandavi/geoquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor

bioconductor bioinformatics data-science genomics ncbi-geo r rstats

Last synced: 05 Nov 2024

https://github.com/bcgov/bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue

bcdc citz data-science env r r-package rstats

Last synced: 13 Aug 2024

https://github.com/dspinellis/alexandria3k

Local relational access to openly-available publication data sets

bibliometric-analysis crossref data-science orcid scientometrics

Last synced: 15 Nov 2024

https://github.com/XpressAI/xircuits

Simple visual programming environment for JupyterLab

data-science jupyterlab python

Last synced: 10 Oct 2024

https://github.com/ropensci/gittargets

Data version control for reproducible analysis pipelines in R with {targets}.

data-science data-version-control data-versioning r r-package reproducibility reproducible-research rstats targets workflow

Last synced: 31 Oct 2024

https://github.com/hemansnation/data-analyst-roadmap

Data-Analyst-Roadmap for Professionals. This roadmap contains eight chapters that can be completed in eight weeks, whether you are new to the field or an experienced professional who wants to transition into data analysis.

analytics data-analysis data-analysis-python data-analytics data-science numpy predictive-analytics project-based-learning python statistics tableau

Last synced: 08 Nov 2024

https://github.com/andrea-ballatore/open-geo-data-education

Open Geospatial Datasets for GIS Education: This is a repository of open geospatial datasets to be used in an educational context. I created these files over years of teaching Geographic Data Science and GIS. All original datasets are freely available online with open data licenses (see the dataset attribution for details). All the datasets in this repository have been selected, cleaned, harmonised, and repackaged for GIS exercises in a higher-education context. This is a pretty time-intensive process that other educators can hopefully avoid by using these versions.

data-science geojson geospatial-data geospatial-datasets gis gis-data gis-education tsv

Last synced: 27 Oct 2024

https://github.com/great-expectations/great_expectations_action

A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.

actions continuous-integration data-integrity data-quality data-science mlops

Last synced: 06 Nov 2024

https://github.com/imdeepmind/neuralpy

NeuralPy: A Keras-like deep learning library that works on top of PyTorch

data-science deep-learning keras library machine-learning neural-network neuralpy neuralpy-torch python pytorch

Last synced: 13 Nov 2024

https://github.com/FlyRanch/figurefirst

A layout-first approach to figure making

data-science inkscape inkscape-extensions matplotlib plotting python svg

Last synced: 15 Nov 2024

https://github.com/woz-u/DS-Student-Resources

Data Science Student Companion Notebooks and Data Lake

data-analysis data-science data-visualization machine-learning nosql python r sql statistics

Last synced: 08 Aug 2024

https://github.com/5agado/conversation-analyzer

Analyzer and statistics generator for text-based conversations. Includes a Facebook scraper and parser.

data-science facebook quantified-self scraper

Last synced: 08 Nov 2024

https://github.com/zjuearthdata/geochemistrypi

An open-source, highly automated machine learning Python framework for data-driven geochemistry discovery

dash data-science fastapi flaml geochemistry mlflow nodejs ray reactjs scikit-learn typer

Last synced: 10 Oct 2024

https://github.com/stanfordnlp/edu-convokit

Edu-ConvoKit: An Open-Source Framework for Education Conversation Data

data data-analysis data-science education language natural-language-processing

Last synced: 15 Nov 2024

https://github.com/Invictify/Jupter-Notebook-REST-API

Run your Jupyter notebooks as a REST API endpoint. This isn't a Jupyter server, just a way to run your notebooks as a REST API endpoint.

data-science data-science-pipelines docker dockerfile fastapi jupyter python rest-api

Last synced: 26 Oct 2024

https://github.com/jonrau1/SyntheticSun

SyntheticSun is a defense-in-depth security automation and monitoring framework that uses threat intelligence, machine learning, managed AWS security services, and serverless technologies to continuously prevent, detect, and respond to threats.

anomaly-detection automation aws aws-security aws-serverless data-science data-visualization elasticsearch geolocation guardduty incident-response kibana machine-learning misp sagemaker security-automation security-tools serverless threat-detection threat-intelligence

Last synced: 04 Aug 2024

https://github.com/dominodatalab/domino-research

Projects developed by Domino's R&D team

data-science mlflow mlops python sagemaker

Last synced: 13 Aug 2024

https://github.com/tirthajyoti/synthetic-data-gen

Various methods for generating synthetic data for data science and ML

classification data data-science machine-learning python regression symbolic-computation time-series

Last synced: 22 Oct 2024

https://github.com/microsoft/coml

Interactive coding assistant for data scientists and machine learning developers, empowered by large language models.

automated-machine automl copilot data-science hyperparameter-optimization jupyter jupyter-lab large-language-models llm machine-learning

Last synced: 07 Oct 2024

https://github.com/kianweelee/Edator

A Python package that performs exploratory data analysis for users. It also generates three types of output files: a cleaned CSV, plots, and a text report.

data-analysis data-science exploratory-data-analysis

Last synced: 15 Nov 2024

https://github.com/nbarrowman/vtree

An R package for calculating and drawing variable trees

data-science data-visualization exploratory-data-analysis r statistics

Last synced: 31 Oct 2024

https://github.com/capitalone/dataCompareR

dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.

compare-data data data-analysis data-science r

Last synced: 13 Aug 2024

https://github.com/TomasBeuzen/python-programming-for-data-science

Content from the University of British Columbia's Master of Data Science course DSCI 511.

data-manipulation data-science numpy pandas programming python teaching

Last synced: 07 Aug 2024

https://github.com/glemaitre/pyparis-2018-sklearn

PyParis tutorial on machine learning using scikit-learn

data-science machine-learn pandas scikit-learn

Last synced: 01 Nov 2024

https://github.com/felipenoris/math-server-docker

The ideal multi-user Data Science server with Jupyterhub and RStudio, ready for Python, R and Julia languages.

data-science docker julia julia-language jupyter jupyter-kernels jupyterhub jupyterlab latex python rstudio-servers shiny-server

Last synced: 28 Oct 2024

https://github.com/MLMI2-CSSI/foundry

Simplifying the discovery and usage of machine-learning ready datasets in materials science and chemistry

chemistry data-science datasets machine-learning materials-science

Last synced: 05 Aug 2024

https://github.com/manumerous/vpselector

Visual Pandas Selector: Visualize and interactively select time-series data

data-science data-visualization pandas python selector

Last synced: 29 Oct 2024

https://github.com/trainingbypackt/applied-deep-learning-with-python

Applied Deep Learning with Python, published by Packt

data-science deep-learning machine-learning python

Last synced: 14 Nov 2024

https://github.com/grailbio/bio

Bioinformatic infrastructure libraries

bioinformatics data-science golang

Last synced: 09 Nov 2024

https://github.com/bcgov/bcmaps

An R package of map layers for British Columbia

data-science env r r-package rstats

Last synced: 05 Aug 2024

https://github.com/bramvanroy/spacy_conll

Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Doc and its sentences and tokens. Can also be used as a command-line tool.

conll conll-u data-science machine-learning natural-language-processing nlp pandas parser python spacy spacy-extension spacy-pipeline stanford-machine-learning stanford-nlp stanza udpipe

Last synced: 13 Nov 2024

https://github.com/siddhujetty/Product-analytics-insights-collection

My Solutions to "A Collection of Data Science Take-Home Challenges" by Giulio Palombo.

data-science machine-learning r-programming solutions take-home-test

Last synced: 13 Aug 2024

https://github.com/ploomber/soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow

Last synced: 13 Nov 2024

https://github.com/exasol/pyexasol

Exasol Python driver with low overhead, fast HTTP transport, and compression

data-science database driver exasol exasol-integration python websocket-client

Last synced: 14 Nov 2024

https://github.com/xiaodaigh/jlboost.jl

A 100% Julia implementation of gradient-boosting regression tree algorithms

catboost data-science gbdt gbrt lightgbm machine-learning tree tree-boosting-algorithms xgboost

Last synced: 08 Nov 2024

https://github.com/uc-r/Advanced-R

Advanced Analytics with R training material delivered in a 2-day format

data-science educational-materials r training-materials workshop-materials

Last synced: 13 Nov 2024

https://github.com/piquette/qtrn

A CLI tool to streamline financial markets data analysis

cli data data-science finance go golang options quotes scraper stock stock-analysis stock-market

Last synced: 04 Nov 2024

https://github.com/verynifty/RolodETH

A Rolodex for popular Ethereum chain addresses.

data-science ethereum ethereum-blockchain

Last synced: 18 Nov 2024

https://github.com/data-centric-ai/dcbench

A benchmark of data-centric tasks from across the machine learning lifecycle.

data-science machine-learning

Last synced: 30 Oct 2024

https://github.com/visgl/deck.gl-data

Data for the examples of the deck.gl data visualization library (https://uber.github.io/deck.gl/#/)

data data-science data-visualization uber

Last synced: 07 Aug 2024

https://github.com/robertmartin8/udemyml

Templates, code and notes for Kirill Eremenko's Machine Learning course

data-science machine-learning python r tutorial udemy udemy-machine-learning

Last synced: 22 Oct 2024

https://github.com/shenxiangzhuang/PythonDataAnalysis

The data and code used in my book.

data-science python3 webcrawler

Last synced: 30 Oct 2024
