Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/numeract/rflow

Flexible R Pipelines with Caching

cache data-science pipeline r rflow

Last synced: 10 Jun 2024

https://github.com/WeR-stats/workshop-setup_cloud_machine_data_science

Step-by-step instructions on how to set up a virtual machine for Data Science using Cloud Infrastructures

cloud data-science dataops digitalocean jupyterlab python r r-shiny r-stats rstudio rstudio-server shiny-server

Last synced: 10 Jun 2024

https://github.com/jimbrig/lossrunAnalyzer

R Package and Shiny App to Analyze Insurance Lossruns

actuarial data-analysis data-mining data-science insurance r record-linkage risk-management shiny

Last synced: 10 Jun 2024

https://github.com/bcgov/safepaths

An R 📦 to safely set & use a path to a private network

citz data-science r r-package rstats

Last synced: 10 Jun 2024

https://github.com/garynth41/John-Hopkins-University-Mastering-Software-Development-in-R

This repository covers R software development for building data science tools. As the field of data science evolves, it has become clear that software development skills are essential for producing useful data science results and products. You will obtain rigorous training in the R language, including the skills for handling complex data, building R packages and developing custom data visualizations. You will learn modern software development practices to build tools that are highly reusable, modular, and suitable for use in a team-based environment or a community of developers.

data-science debugging development-practices development-skills modular namespace package-manager software-development software-testing yml-reference

Last synced: 10 Jun 2024

https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

data-analysis data-science data-science-tips data-visualization jupyter jupyter-notebook jupyter-tips matplotlib matplotlib-tips numpy pandas pandas-tips python python-tips sklearn

Last synced: 10 Jun 2024

https://github.com/romanmichaelpaolucci/AI_Stock_Trading

Design pattern for critical stages in the development process of an AI Stock Trading Bot

artificial-intelligence data-science machine-learning neural-network python trading trading-algorithms trading-bot trading-strategies

Last synced: 09 Jun 2024

https://github.com/tusharnankani/analysis-2.0

An Exhaustive WhatsApp Chat Data Analysis 2.0

analysis data data-science plots trends visualization

Last synced: 09 Jun 2024

https://github.com/aaronwangy/Data-Science-Cheatsheet

A helpful 5-page machine learning cheatsheet to assist with exam reviews, interview prep, and anything in-between.

cheatsheet data-science machine-learning

Last synced: 09 Jun 2024

https://github.com/scverse/scanpy

Single-cell analysis in Python. Scales to >1M cells.

anndata bioinformatics data-science machine-learning python scanpy scverse transcriptomics visualize-data

Last synced: 09 Jun 2024

https://github.com/jupyter-naas/naas

Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)

ai binder data data-science data-transformation engine etl integration jupyter jupyterlab notebooks open-source pipeline

Last synced: 08 Jun 2024

https://github.com/JosephLai241/URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud

Last synced: 08 Jun 2024

https://github.com/epsilla-cloud/vectordb

Epsilla is a high performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Last synced: 08 Jun 2024

https://github.com/target/data-validator

A tool to validate data, built around Apache Spark.

data-science data-validation hacktoberfest

Last synced: 07 Jun 2024

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 07 Jun 2024

https://github.com/microsoft/responsible-ai-toolbox

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

data-analysis data-science data-visualization error-analysis explainability explainable-ai explainable-ml fairness fairness-ai fairness-ml interpretability jupyter machine-learning machinelearning ml responsible-ai ui visualization widget widgets

Last synced: 07 Jun 2024

https://github.com/platonai/PulsarRPA

Automate webpages at scale and scrape web data completely and accurately with high-performance, distributed RPA.

crawler data-mining data-science rpa scraper scraping web-automation web-crawler web-mining web-scraping web-sql

Last synced: 07 Jun 2024

https://github.com/erezsh/Preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 07 Jun 2024

https://github.com/blue-yonder/turbodbc

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

data-science database exasol numpy odbc pep249 pyodbc python python-database-api speedup

Last synced: 07 Jun 2024

https://github.com/BiomedSciAI/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 07 Jun 2024

https://github.com/jacobgil/confidenceinterval

The long-missing library for Python confidence intervals

data-science machine-learning metrics statistics

Last synced: 07 Jun 2024

https://github.com/chdb-io/chdb

chDB is an embedded OLAP SQL Engine 🚀 powered by ClickHouse

chdb clickhouse clickhouse-database clickhouse-server data-science database embedded-database olap python sql

Last synced: 07 Jun 2024

https://github.com/UniversalDataTool/courseware

Create instructions for labeling datasets using the Universal Data Tool

annotators courseware data-science dataset hacktoberfest label

Last synced: 07 Jun 2024

https://github.com/IlyaGusev/tgcontest

Telegram Data Clustering contest solution by Mindful Squirrel

classification clustering cpp data-science document-similarity fasttext machine-learning nlp

Last synced: 07 Jun 2024

https://github.com/EliotAndres/kaggle-past-solutions

A searchable compilation of Kaggle past solutions

awesome compilation data-science kaggle

Last synced: 06 Jun 2024

https://github.com/iphysresearch/DataSciComp

A collection of popular Data Science Challenges/Competitions || Countdown timers to keep track of the entry deadlines.

challenge competition data-challenge data-science data-science-competitions project

Last synced: 06 Jun 2024

https://github.com/shashankvemuri/Finance

150+ quantitative finance Python programs to help you gather, manipulate, and analyze stock market data

algorithmic-trading data-science finance machine-learning pandas python quantitative-finance stock stock-market stocks technical-indicators trading-strategies

Last synced: 06 Jun 2024

https://github.com/center-for-threat-informed-defense/sightings_ecosystem

Sightings Ecosystem gives cyber defenders visibility into what adversaries actually do in the wild. With your help, we are tracking MITRE ATT&CK® techniques observed to give defenders real data on technique prevalence.

ctid cyber-threat-intelligence cybersecurity data-science data-visualization mitre-attack

Last synced: 05 Jun 2024

https://github.com/jonrau1/SyntheticSun

SyntheticSun is a defense-in-depth security automation and monitoring framework which utilizes threat intelligence, machine learning, managed AWS security services, and serverless technologies to continuously prevent, detect and respond to threats.

anomaly-detection automation aws aws-security aws-serverless data-science data-visualization elasticsearch geolocation guardduty incident-response kibana machine-learning misp sagemaker security-automation security-tools serverless threat-detection threat-intelligence

Last synced: 05 Jun 2024

https://github.com/napjon/krisk

Statistical Interactive Visualization with pandas+Jupyter integration on top of Echarts.

dashboard data-science data-visualization echarts interactive-charts jupyter-notebook python

Last synced: 05 Jun 2024

https://github.com/mathewroy/ynabr

Analyze and visualize your You Need A Budget (YNAB) data. YNAB meets R programming language.

api data-analysis data-science data-visualization r ynab ynab-api

Last synced: 04 Jun 2024

https://github.com/beeva-jorgezaldivar/plumberModel

Create APIs for the deployment of R models with minimal code

api caret data-science deployment machine-learning plumber r

Last synced: 04 Jun 2024

https://github.com/jimbrig/property_allocation_demo

Dynamic Risk Allocation Model - bringing simplicity to the complex realm of Property Insurance.

actuarial-science allocation dashboard data-science insurance property-management r r-shiny workflow

Last synced: 04 Jun 2024

https://github.com/rpodcast/shinycal

The Data Science StreamRs Calendar!

data-science r shiny streaming

Last synced: 04 Jun 2024

https://github.com/jimbrig/lossdevt

An R package and Shiny App for Actuarial Loss Development and Reserving

actuarial-science claims-reserving data-science insurance property-casualty rpackage rshiny rstats workflow

Last synced: 04 Jun 2024

https://github.com/jimbrig/lossrx

An R package, plumber API, database, and Shiny App for Actuarial Loss Development and Reserving Workflows.

actuarial-science claims-data claims-reserving data-science insurance modelling property-casualty reserving rpackage rshiny rstats workflow

Last synced: 04 Jun 2024

https://github.com/casualcomputer/sql.mechanic

Functions that generate SQL queries that summarize high-dimensional tables stored in various databases (e.g. Microsoft SQL Servers, Netezza, DB2, Postgres, Oracle, MySQL, etc.).

data-analysis data-quality-checks data-science database mysql netezza oracle postgres quality-control r sql sql-server

Last synced: 04 Jun 2024

https://github.com/microsoft/finnts

Microsoft Finance Time Series Forecasting Framework (FinnTS) is a forecasting package that utilizes cutting-edge time series forecasting and parallelization on the cloud to produce accurate forecasts for financial data.

business data-science feature-selection finance finnts forecasting machine-learning microsoft r r-package rstats time-series

Last synced: 04 Jun 2024

https://github.com/lockedata/opentrainingcontent

An MIT & CCBY4.0 licensed repository of training materials from Locke Data

data-science open-course r-stats

Last synced: 04 Jun 2024

https://github.com/tuanle618/AEDA

AEDA - Automated Data Exploratory Analysis in R

data-science eda eda-report exploratory-data-analysis r

Last synced: 04 Jun 2024

https://github.com/nischalshrestha/Unravel

A fluent code explorer for R. 🔍

data-science datawrangling dplyr r rstats shiny tidyr tidyverse

Last synced: 03 Jun 2024

https://github.com/bcgov/bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue

bcdc citz data-science env r r-package rstats

Last synced: 03 Jun 2024

https://github.com/erikaduan/r_tips

A repository of R usage tips for data cleaning, data mining, data visualisation, statistical inference and machine learning

data-science data-visualization machine-learning r rstats statistics

Last synced: 03 Jun 2024

https://github.com/MCodrescu/octopus

R Package for Interacting with Databases

data-science database r rshiny

Last synced: 03 Jun 2024

https://github.com/Correia-jpv/fucking-awesome-R

A curated list of awesome R packages, frameworks and software. With repository stars⭐ and forks🍴

awesome awesome-list data-analysis data-science list r r-framework r-language r-package r-programming rlanguage rprogramming rstats

Last synced: 03 Jun 2024

https://github.com/rnorm/book_sample

Another book on data science

book data-science python r

Last synced: 03 Jun 2024

https://github.com/Rilfanayasmin/Raven-coding

Practice programming with R and Python on all data science algorithms

data-science machine-learning-algorithms predictive-modeling r-programming statistical-models

Last synced: 03 Jun 2024

https://github.com/Azure/aml-run

GitHub Action that allows you to submit a run to your Azure Machine Learning Workspace.

aml azure azure-machine-learning data-science machine-learning mlops

Last synced: 03 Jun 2024

https://github.com/aws/amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

aws data-science deep-learning examples inference jupyter-notebook machine-learning mlops reinforcement-learning sagemaker training

Last synced: 03 Jun 2024

https://github.com/dair-ai/ML-Course-Notes

🎓 Sharing machine learning course / lecture notes.

ai data-science deep-learning machine-learning natural-language-processing

Last synced: 03 Jun 2024

https://github.com/flyteorg/flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started with and learn, and highly extensible.

automation data data-science extensible flyte flyte-tasks hacktoberfest mlops pypi python sdk spark workflows

Last synced: 02 Jun 2024

https://github.com/ameygawade/streamlit-robots_txt_generator

This Streamlit app allows users to generate and customize a robots.txt file by selecting user-agents, specifying disallowed paths, enabling crawler delay, and providing a sitemap URL.

config data-science front generative generator google robots-txt search-algorithm search-engine seo seo-optimization stream streamlit txt-files web webapp webapplication

Last synced: 02 Jun 2024

https://github.com/curiousily/Machine-Learning-from-Scratch

Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.

artificial-intelligence book classification data-science machine-learning machine-learning-algorithms neural-networks notebook recommender-systems regression reinforcement-learning sentiment-analysis

Last synced: 02 Jun 2024

https://github.com/AnonCatalyst/Coeus-Framework

Coeus 🌐 is an OSINT framework empowering users with tools for effective intelligence gathering from open sources. From social media monitoring 📱 to data analysis 📊, it offers a centralized platform for seamless OSINT investigations.

data-science data-visualization database forensic-analysis forensics forensics-tools framework information-retrieval infosec osint osint-framework osint-python osint-resources osint-tool osint-toolkit people-search reconnaissance

Last synced: 02 Jun 2024

https://github.com/Nachimak28/awesome-list-of-awesomes

A curated list of all the Awesome --Topic Name-- lists I've found to date that are relevant to the Data lifecycle, ML and DL.

ai computer-vision cv data-science deep-learning distributed-systems dl machine-learning microservices ml natural-language-processing nlp papers research tools transfer-learning

Last synced: 02 Jun 2024

https://github.com/lancedb/lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.

apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust

Last synced: 02 Jun 2024

https://github.com/dMLTquant/openbb_sdk_exporation

Explore OpenBB SDK without having to install anything on your local machine. You just need a GitHub and a GitPod account.

algorithmic-trading data-science financial-data jupyter notebook openbb python

Last synced: 02 Jun 2024