Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/stocknear/backend

Backend of stocknear - Stock Analysis for Data Freaks ❤️

data data-science fastapi fastify finance javascript machine-learning nodejs pocketbase python redis

Last synced: 02 Sep 2024

https://github.com/akgold/do4ds

A book on DevOps for Data Scientists with CRC Press.

data-science devops it python r

Last synced: 13 Aug 2024

https://github.com/ideos/gloe

A general-purpose library designed to guide developers in expressing their code as a flow.

clean-code data-science flow functional-programming machine-learning python typing

Last synced: 31 Jul 2024

https://github.com/eclipse-zenoh-flow/zenoh-flow

zenoh-flow aims to provide a zenoh-based data-flow programming framework for computations that span from the cloud to the device.

autonomous-vehicles data-science dataflow-programming machine-learning robotics ros2 rust-lang

Last synced: 06 Sep 2024

https://github.com/uc-r/uc-r.github.io

Main repository for R programming courses @ University of Cincinnati, courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.

classroom data-science data-wrangling machine-learning r tutorial tutorial-code visualization

Last synced: 31 Jul 2024

https://github.com/Dumbris/trunklucator

Python module that lets data scientists quickly create annotation projects.

active-learning annotation annotation-tool data-science machine-learning nlp

Last synced: 01 Aug 2024

https://github.com/nuclio/nuclio-jupyter

Nuclio Function Automation for Python and Jupyter

data-science jupyter kubernetes nuclio python

Last synced: 01 Aug 2024

https://github.com/bcgov/bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue

bcdc citz data-science env r r-package rstats

Last synced: 13 Aug 2024

https://github.com/andrea-ballatore/open-geo-data-education

Open Geospatial Datasets for GIS Education: This is a repository of open geospatial datasets to be used in an educational context. I created these files over years of teaching Geographic Data Science and GIS. All original datasets are freely available online with open data licenses (see the dataset attribution for details). All the datasets in this repository have been selected, cleaned, harmonised, and repackaged for GIS exercises in a higher-education context. This is a pretty time-intensive process that other educators can hopefully avoid by using these versions.

data-science geojson geospatial-data geospatial-datasets gis gis-data gis-education tsv

Last synced: 31 Jul 2024

https://github.com/ropensci/gittargets

Data version control for reproducible analysis pipelines in R with {targets}.

data-science data-version-control data-versioning r r-package reproducibility reproducible-research rstats targets workflow

Last synced: 05 Aug 2024

https://github.com/OpenSTEF/openstef

Automated Machine Learning pipelines. Builds the Open Short Term Energy Forecasting package.

data-science energy energy-forecasting forecasting machine-learning python time-series

Last synced: 03 Aug 2024

https://github.com/FlyRanch/figurefirst

A layout-first approach to figure making

data-science inkscape inkscape-extensions matplotlib plotting python svg

Last synced: 03 Aug 2024

https://github.com/woz-u/DS-Student-Resources

Data Science Student Companion Notebooks and Data Lake

data-analysis data-science data-visualization machine-learning nosql python r sql statistics

Last synced: 08 Aug 2024

https://github.com/5agado/conversation-analyzer

Analyzer and statistics generator for text-based conversations. Includes Facebook scraper and parser

data-science facebook quantified-self scraper

Last synced: 01 Aug 2024

https://github.com/great-expectations/great_expectations_action

A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.

actions continuous-integration data-integrity data-quality data-science mlops

Last synced: 01 Aug 2024

https://github.com/dominodatalab/domino-research

Projects developed by Domino's R&D team

data-science mlflow mlops python sagemaker

Last synced: 13 Aug 2024

https://github.com/jonrau1/SyntheticSun

SyntheticSun is a defense-in-depth security automation and monitoring framework which utilizes threat intelligence, machine learning, managed AWS security services, and serverless technologies to continuously prevent, detect, and respond to threats.

anomaly-detection automation aws aws-security aws-serverless data-science data-visualization elasticsearch geolocation guardduty incident-response kibana machine-learning misp sagemaker security-automation security-tools serverless threat-detection threat-intelligence

Last synced: 04 Aug 2024

https://github.com/Invictify/Jupter-Notebook-REST-API

Run your Jupyter notebooks as a REST API endpoint. This isn't a Jupyter server, just a way to run your notebooks as a REST API endpoint.

data-science data-science-pipelines docker dockerfile fastapi jupyter python rest-api

Last synced: 31 Jul 2024

https://github.com/manumerous/vpselector

Visual Pandas Selector: Visualize and interactively select time-series data

data-science data-visualization pandas python selector

Last synced: 31 Jul 2024

https://github.com/capitalone/dataCompareR

dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.

compare-data data data-analysis data-science r

Last synced: 13 Aug 2024

https://github.com/TomasBeuzen/python-programming-for-data-science

Content from the University of British Columbia's Master of Data Science course DSCI 511.

data-manipulation data-science numpy pandas programming python teaching

Last synced: 07 Aug 2024

https://github.com/kianweelee/Edator

A Python package that performs exploratory data analysis for users. Additionally, it generates 3 types of output files (cleaned CSV, plots, and a text report).

data-analysis data-science exploratory-data-analysis

Last synced: 03 Aug 2024

https://github.com/MLMI2-CSSI/foundry

Simplifying the discovery and usage of machine-learning ready datasets in materials science and chemistry

chemistry data-science datasets machine-learning materials-science

Last synced: 05 Aug 2024

https://github.com/nbarrowman/vtree

An R package for calculating and drawing variable trees

data-science data-visualization exploratory-data-analysis r statistics

Last synced: 30 Jul 2024

https://github.com/bcgov/bcmaps

An R package of map layers for British Columbia

data-science env r r-package rstats

Last synced: 05 Aug 2024

https://github.com/siddhujetty/Product-analytics-insights-collection

My Solutions to "A Collection of Data Science Take-Home Challenges" by Giulio Palombo.

data-science machine-learning r-programming solutions take-home-test

Last synced: 13 Aug 2024

https://github.com/grailbio/bio

Bioinformatic infrastructure libraries

bioinformatics data-science golang

Last synced: 02 Aug 2024

https://github.com/uc-r/Advanced-R

Advanced Analytics with R training material delivered in a 2-day format

data-science educational-materials r training-materials workshop-materials

Last synced: 02 Aug 2024

https://github.com/piquette/qtrn

A cli tool to streamline financial markets data analysis 🔧

cli data data-science finance go golang options quotes scraper stock stock-analysis stock-market

Last synced: 01 Aug 2024

https://github.com/verynifty/RolodETH

A Rolodex for popular Ethereum chain addresses.

data-science ethereum ethereum-blockchain

Last synced: 03 Aug 2024

https://github.com/data-centric-ai/dcbench

A benchmark of data-centric tasks from across the machine learning lifecycle.

data-science machine-learning

Last synced: 31 Jul 2024

https://github.com/visgl/deck.gl-data

Data for the data visualization library deck.gl examples (https://uber.github.io/deck.gl/#/)

data data-science data-visualization uber

Last synced: 07 Aug 2024

https://github.com/devsgnr/breadroll

breadroll 🥟 is a simple lightweight library for data processing operations written in TypeScript and powered by Bun.

bun csv csv-parser data-engineering data-science data-transformation eda exploratory-data-analysis tsv tsv-parser

Last synced: 17 Aug 2024

https://github.com/shenxiangzhuang/PythonDataAnalysis

The data and code used in my book.

data-science python3 webcrawler

Last synced: 31 Jul 2024

https://github.com/gitonthescene/csv-reconcile

A reconciliation service for OpenRefine serving data from a given CSV file.

data-science openrefine

Last synced: 01 Aug 2024

https://github.com/aiwithqasim/Free-Artificial-Intelligence-Resources

Welcome to this open source repository of free artificial intelligence resources. Benefit from the free resources mentioned, and kindly star and fork the repository so that it gains visibility and everyone can take advantage.

ai article artificial-intelligence artificial-neural-networks blog data-science datascientist deep-learning freeresources hacktoberfest hecktoberfest2021 jobs machine-learning machine-learning-algorithms natural-language-processing nlp project python3 youtube

Last synced: 01 Aug 2024

https://github.com/LaihoE/did-it-spill

Check if you have training samples in your test set

computer-vision data-science deep-learning pytorch semantic-similarity time-series

Last synced: 02 Aug 2024

https://github.com/Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 01 Aug 2024

https://github.com/bnosac/crfsuite

Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp r r-package

Last synced: 31 Jul 2024

https://github.com/DataKitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake

Last synced: 02 Aug 2024

https://github.com/jmwoloso/pychattr

Python Channel Attribution (pychattr) - A Python implementation of the excellent R ChannelAttribution library

channel-attribution data-analysis data-science machine-learning python python-channel-attribution rpy2 wrapper

Last synced: 02 Aug 2024

https://github.com/meteostat/weather-stations

A list of public weather stations everyone can edit and share.

climate data-science json meteostat weather weather-stations

Last synced: 08 Aug 2024

https://github.com/ahmed-mohamed-sn/olliePy

OlliePy is a Python package that helps data scientists explore their data and evaluate and analyse their machine learning experiments by utilising the power and structure of modern web applications. The data scientist only needs to provide the data and any required information, and OlliePy will generate the rest.

ai analytics charts dashboard data data-analytics data-science data-scientist eda error-analysis exploratory-data-analysis machine-learning python visualization

Last synced: 03 Aug 2024

https://github.com/dlab-berkeley/Python-Data-Wrangling-Legacy

D-Lab's 3-hour introduction to data wrangling in Python. Learn how to import and manipulate dataframes using pandas in Python.

data-science pandas python

Last synced: 02 Aug 2024

https://github.com/elshor/dstools

Javascript tools and utilities for the data scientist

data-science javascript

Last synced: 31 Jul 2024

https://github.com/shlizee/NeuroAI

NeuroAI-UW seminar, a regular weekly seminar for the UW community, organized by NeuroAI Shlizerman Lab.

ai cvpr data-science deep-learning eccv icml neural-networks neurips neuroscience-methods recurrent-neural-networks sfn

Last synced: 02 Aug 2024

https://github.com/rcdilorenzo/ecce

ML Prediction of Bible Topics and Passages (Python / React)

data-science fastapi fully-connected-network interactive-visualizations keras-tensorflow reactjs

Last synced: 02 Aug 2024

https://github.com/PKU-DAIR/mindware

An efficient open-source AutoML system for automating machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.

automl-algorithms automl-pipeline bayesian-optimization blackbox-optimization data-science deep-learning distributed-systems ensemble-learning hyper-parameter-optimization knobs-tuning machine-learning meta-learning neural-architecture-search python

Last synced: 03 Aug 2024

https://github.com/daun-io/Study-Data-Science

Practical data science notebooks that I used to study in 2016

data-science jupyter-notebook machine-learning tensorflow

Last synced: 07 Aug 2024

https://ddotta.github.io/cookbook-rpolars/

Cookbook to provide solutions to common tasks and problems in using Polars with R

benchmark cookbook data-engineering data-science datatable dplyr polars r tidyr

Last synced: 04 Aug 2024

https://github.com/lter/lterdatasampler

LTER data samples to teach environmental data science

data-science ecology

Last synced: 03 Aug 2024

https://github.com/gesiscss/css_methods_python

A full course of self-explanatory and freely available materials on CSS methods

data-science jupyter-notebook python

Last synced: 31 Jul 2024

https://github.com/electronick1/stairs

A framework that helps you run parallel/distributed calculations using data pipelines

data-engineering data-pipeline data-science distributed-computing python

Last synced: 02 Aug 2024

https://github.com/okfn-brasil/whistleblower

🚨A Twitter bot for publicly reporting suspicions found by Rosie, Serenata de Amor's AI

data-science facebook-messenger-bot machine-learning twitter-bot

Last synced: 31 Jul 2024

https://github.com/ropensci/rdataretriever

R interface to the Data Retriever

data data-science database datasets r r-package rstats science

Last synced: 13 Aug 2024

https://github.com/zincware/ZnTrack

Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.

data-science data-version-control developer-tools dvc git machine-learning python reproducibility

Last synced: 03 Aug 2024

https://github.com/jonathandinu/spark-ray-data-science

Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with Spark and Ray in the context of a data scientist's standard workflow.

artificial-intelligence data-science distributed-computing machine-learning python ray spark

Last synced: 03 Aug 2024

https://github.com/google/bayesnf

Bayesian Neural Field models for prediction in large-scale spatiotemporal datasets

bayesian-inference data-science machine-learning spatiotemporal-data-analysis statistics

Last synced: 18 Sep 2024