Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/renumics/sliceguard

A library for detecting problematic data segments in structured and unstructured data with just a few lines of code.

data-analysis data-cleaning data-curation data-exploration data-science data-visualization deep-learning eda exploratory-data-analysis machine-learning python visualization

Last synced: 27 Oct 2024

https://github.com/bnosac/crfsuite

Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp r r-package

Last synced: 27 Dec 2024

https://github.com/Desbordante/desbordante-core

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets users run data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 04 Nov 2024

https://github.com/dayyass/qaner

Unofficial implementation of QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition.

data-science machine-learning named-entity-recognition natural-language-processing ner nlp python python3 question-answering

Last synced: 07 Nov 2024

https://github.com/ahammadmejbah/artificial-intelligence-important-documents-collections

AI technology is significant because it allows software to perform human functions (understanding, reasoning, planning, communication, and perception) increasingly effectively, efficiently, and affordably.

ai algorithms big-data computer-science computer-vision data-analyst data-engineering data-mining data-science deep-learning machine-learning mathematics python

Last synced: 11 Nov 2024

https://github.com/terryyz/pyarmadillo

PyArmadillo: an alternative approach to linear algebra in Python

armadillo-library calculations data-science linear-algebra machine-learning

Last synced: 01 Dec 2024

https://github.com/gesiscss/css_methods_python

A full course of self-explanatory and freely available materials on CSS (computational social science) methods.

data-science jupyter-notebook python

Last synced: 09 Jan 2025

https://github.com/dask-contrib/dask-awkward

Native Dask collection for awkward arrays, and the library to use it.

columnar-format dask data-analysis data-science data-structure jagged-array python ragged-array

Last synced: 13 Jan 2025

https://github.com/randyzwitch/streamlit-embedcode

Streamlit component for embedding code snippets such as GitHub gists, CodePen snippets, Gitlab snippets, etc.

data-analysis data-science data-visualization python streamlit streamlit-component

Last synced: 19 Dec 2024

https://github.com/gurupatil0003/python_tutorial

Python is a high-level, interpreted programming language known for its simplicity and readability. Python emphasizes code readability and allows programmers to express concepts in fewer lines of code than languages like C++ or Java.

data-science database library modules oops-in-python operator python

Last synced: 10 Jan 2025

https://github.com/hannansatopay/roughviz

A Python visualization library for creating sketchy/hand-drawn styled charts.

charts data-science hacktoberfest jupyter-notebook python-visualization roughviz vizualisation

Last synced: 08 Nov 2024

https://github.com/mitre/menelaus

Online and batch-based concept and data drift detection algorithms to monitor and maintain ML performance.

concept-drift data-drift data-science drift-detection machine-learning statistics

Last synced: 09 Nov 2024
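
Drift detectors of this kind typically watch a stream of values (for example, per-batch model error) and raise an alarm when its running statistics shift. Below is a minimal sketch of one classic detector, the Page-Hinkley test, in plain Python. The `page_hinkley` name and its `delta`/`threshold` parameters are illustrative assumptions; this is not menelaus's API.

```python
def page_hinkley(stream, delta=0.005, threshold=50.0):
    """Return the 0-based index where an upward mean shift is flagged, or None.

    Page-Hinkley test (sketch): accumulate each value's deviation from the
    running mean minus a tolerance `delta`, and signal drift when that
    cumulative sum rises more than `threshold` above its running minimum.
    """
    mean = 0.0
    cumulative = 0.0  # m_t: cumulative deviation
    minimum = 0.0     # M_t: smallest m_t seen so far
    for t, x in enumerate(stream, start=1):
        mean += (x - mean) / t            # incremental running mean
        cumulative += x - mean - delta
        minimum = min(minimum, cumulative)
        if cumulative - minimum > threshold:
            return t - 1
    return None
```

A stationary stream keeps the cumulative sum at its minimum, so no alarm fires; a jump in level makes it climb until it crosses `threshold`. Larger thresholds mean fewer false alarms but slower detection.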

https://github.com/tf-encrypted/moose

Secure distributed dataflow framework for encrypted machine learning and data processing

cryptography data-science distributed-computing machine-learning privacy secure-computation

Last synced: 09 Jan 2025

https://github.com/france-travail/gabarit

Gabarit: kickstart your data science project from scratch

data-science deep-learning machine-learning python

Last synced: 09 Jan 2025

https://github.com/seandavi/sars2pack

An R package with over 50 highly cited, ready-to-use, up-to-date COVID-19 pandemic data resources

biomedical-data coronavirus coronavirus-tracking covid-19 data-science data-visualization datascience datasets epidemics epidemiology geospatial public-health rstats rstats-package

Last synced: 05 Nov 2024

https://github.com/cihat/datastructure

📌🔎📝 Data Structures (BMU221) and documentation for all courses: notes and examples for the data structures course and other courses. Data Structures with Java.

bilgisayar-muhendisligi computer-science data-science data-structure data-structure-blogs data-structures data-structures-and-algorithms documentation turkce-dokumantasyon veri-bilimi veri-yapilari

Last synced: 26 Dec 2024

https://github.com/meteostat/weather-stations

A list of public weather stations everyone can edit and share.

climate data-science json meteostat weather weather-stations

Last synced: 27 Nov 2024

https://github.com/alexioannides/ml-workflow-automation

Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.

classification data-science flask helm jupyter-notebook kaggle kubernetes machine-learning mlops numpy pandas python rest-api sklearn

Last synced: 28 Oct 2024

https://github.com/ashishpatel26/datascienv

datascienv is a package that helps you set up your environment, with all dependencies, in a single line of code. It also includes pyforest, which provides a single-line import of all required ML libraries.

catboost data-science data-science-env datascienv imbalanced-data lightgbm matplotlib numpy pandas pycaret scikit-learn seaborn tensorflow2 xgboost

Last synced: 10 Oct 2024

https://github.com/rickiepark/hg-da

Code repository for the book <혼자 공부하는 데이터 분석 with 파이썬> (roughly, "Self-Study Data Analysis with Python").

data-analysis data-science data-visualization machine-learning matplotlib numpy pandas scikit-learn scipy

Last synced: 14 Jan 2025

https://github.com/opengeos/streamlit-map-template

A Streamlit template for mapping applications

data-science geospatial mapping python streamlit

Last synced: 15 Jan 2025

https://github.com/mlverse/mall

Run multiple LLM predictions against a data frame with R and Python

data-science dplyr llm polars python r

Last synced: 09 Jan 2025

https://github.com/mdeff/ntds_2018

Material for the EPFL master course "A Network Tour of Data Science", edition 2018.

data-science education epfl graphs network-science

Last synced: 21 Nov 2024

https://github.com/astrazeneca/judgyprophet

Forecasting for knowable future events using Bayesian informative priors (forecasting with judgmental-adjustment).

ai bayesian data-science forecasting machine-learning python statistics

Last synced: 18 Nov 2024

https://github.com/jmwoloso/pychattr

Python Channel Attribution (pychattr) - A Python implementation of the excellent R ChannelAttribution library

channel-attribution data-analysis data-science machine-learning python python-channel-attribution rpy2 wrapper

Last synced: 13 Nov 2024

https://github.com/scicloj/wolframite

An interface between Clojure and Wolfram Language (the language of Mathematica)

clojure data-science mathematica wolfram-language

Last synced: 19 Dec 2024

https://github.com/mratsim/mckinsey-smartcities-traffic-prediction

Adventure into using multi-attention recurrent neural networks for time series (city traffic) in the 2017-11-18 McKinsey IronMan (24h non-stop) prediction challenge

data-science deep-learning keras machine-learning neural-networks tensorflow time-series

Last synced: 22 Oct 2024

https://github.com/mine-cetinkaya-rundel/teach-r-online

Materials for the Teaching statistics and data science online workshops in July 2020

data-science education rstats statistics

Last synced: 22 Dec 2024

https://github.com/yusufcinarci/data-science-projects

This repo contains beginner- to upper-level projects in data science. I will add the projects I complete in this field here every day, in the hope that they will be useful to those who, like me, are interested in data science and are just starting...

data-analysis data-science data-science-projects jupyter jupyter-notebook python

Last synced: 07 Nov 2024

https://github.com/tgsmith61591/skoot

A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.

data-science imbalanced-data machine-learning pandas python scikit-learn skutil

Last synced: 07 Nov 2024

https://github.com/antononcube/mathematicavsr

Example projects, code, and documents for comparing Mathematica with R.

comparison data-analysis data-science machine-learning mathematica r time-series

Last synced: 10 Jan 2025

https://github.com/stitchfix/mab

Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

data-science experimentation go golang multi-armed-bandit multi-armed-bandits multiarmed-bandits reinforcement-learning thompson thompson-sampling

Last synced: 03 Jan 2025
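
Thompson sampling, one of the strategies named above, can be illustrated for Bernoulli (success/failure) rewards. Keep a Beta posterior per arm, draw one sample from each, and play the arm with the largest draw. This is a minimal Python sketch assuming Beta(1, 1) priors; `thompson_select` is a hypothetical helper. The mab library itself is written in Go, and its API is not shown here.

```python
import random


def thompson_select(successes, failures, rng=random):
    """Pick an arm index by Thompson sampling with Beta(1, 1) priors.

    successes[i] and failures[i] are the observed reward counts for arm i.
    """
    draws = [
        rng.betavariate(s + 1, f + 1)  # one sample from arm i's Beta posterior
        for s, f in zip(successes, failures)
    ]
    # Play the arm whose sampled success rate is highest.
    return max(range(len(draws)), key=draws.__getitem__)
```

Passing a seeded `random.Random` instance as `rng` makes selections reproducible. Arms with few observations have wide posteriors and still get explored, while a clearly better arm is chosen almost every time.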

https://github.com/scrapinghub/aduana

Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).

data-science

Last synced: 10 Nov 2024
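
PageRank, one of the ranking algorithms mentioned, scores pages by the stationary distribution of a random surfer. The surfer follows a link with probability `damping` and otherwise jumps to a random page. Below is a minimal power-iteration sketch on a small in-memory graph. It assumes a damping factor of 0.85 and spreads the rank of dangling pages (pages with no outlinks) uniformly. It is not aduana's interface and would not scale to billion-page crawls as written.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Rank held by
    dangling pages is redistributed uniformly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleport mass plus the uniformly spread dangling mass.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, targets in links.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # converged
            break
    return rank
```

The ranks always sum to 1, so they can be read as visit probabilities. A link-guided crawler would fetch the highest-ranked unvisited pages first.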

https://github.com/mikeizbicki/cmc-csci046

CMC's Data Structures and Algorithms Course Materials

cmc computer-science course data-science python3

Last synced: 08 Jan 2025

https://github.com/alinski29/stonks.jl

Julia library for standardizing financial data retrieval and storage from multiple APIs.

data data-mining data-science dataframe finance julia trading trading-algorithms

Last synced: 02 Nov 2024

https://github.com/junpenglao/planet_sakaar_data_science

A colourful collection of codes and notebooks, like Planet Sakaar

bayesian-inference data-science pymc3

Last synced: 02 Nov 2024

https://ddotta.github.io/cookbook-rpolars/

Cookbook to provide solutions to common tasks and problems in using Polars with R

benchmark cookbook data-engineering data-science datatable dplyr polars r tidyr

Last synced: 18 Nov 2024

https://github.com/ulikoehler/uliengineering

A Python library for calculations performed in electronics engineering

data-analysis data-science electronics engineering python

Last synced: 13 Jan 2025

https://github.com/ahmed-mohamed-sn/olliePy

OlliePy is a python package which can help data scientists in exploring their data and evaluating and analysing their machine learning experiments by utilising the power and structure of modern web applications. The data scientist only needs to provide the data and any required information and OlliePy will generate the rest.

ai analytics charts dashboard data data-analytics data-science data-scientist eda error-analysis exploratory-data-analysis machine-learning python visualization

Last synced: 15 Nov 2024

https://github.com/svilupp/awesome-generative-ai-meets-julia-language

Comprehensive guide to generative AI projects and resources in Julia.

awesome awesome-list data-science generative-ai julia

Last synced: 28 Oct 2024

https://github.com/fcakyon/instafake-dataset

Dataset for Instagram Fake and Automated Account Detection

bot classification data-science dataset fake instafake instagram machine-learning research

Last synced: 06 Jan 2025

https://github.com/tommyod/paretoset

Compute the Pareto (non-dominated) set, i.e., skyline operator/query.

data-mining data-science datascience multi-objective-optimization optimization pandas skyline-query

Last synced: 12 Jan 2025
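
The Pareto (non-dominated) set contains every point that no other point beats on all objectives at once. Below is a minimal O(n²) sketch in plain Python, assuming every objective is to be minimized. The `dominates` and `pareto_set` names are hypothetical, and this is not the paretoset package's API.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(points):
    """Return the points not dominated by any other point, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, with points `(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)`, the last two are dominated by `(2, 2)`, leaving the three trade-off points. To maximize an objective instead, negate that coordinate before the call.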

https://github.com/dlab-berkeley/Python-Data-Wrangling-Legacy

D-Lab's 3 hour introduction to data wrangling in Python. Learn how to import and manipulate dataframes using pandas in Python.

data-science pandas python

Last synced: 11 Nov 2024

https://github.com/elshor/dstools

JavaScript tools and utilities for the data scientist

data-science javascript

Last synced: 27 Oct 2024

https://github.com/loukesio/ggvolc

ggvolc effortlessly translates differential expression datasets and RNA-seq data into informative volcano plots. Highlight genes of interest with ease. With a single line of code, visualize complex datasets, gain deeper insights, and simplify data representation.

bioinformatics data-science data-visualization gro-seq rna-seq

Last synced: 21 Dec 2024

https://github.com/asad70/insider-trading

This program extracts insider trading data from the SEC website for a specified time frame and stores it in an Excel file.

algotrading data-science extract-data insider-trading insiders tickers trading trading-strategies

Last synced: 11 Nov 2024

https://github.com/shlizee/NeuroAI

NeuroAI-UW seminar, a regular weekly seminar for the UW community, organized by NeuroAI Shlizerman Lab.

ai cvpr data-science deep-learning eccv icml neural-networks neurips neuroscience-methods recurrent-neural-networks sfn

Last synced: 12 Nov 2024

https://github.com/zeeshanahmad4/stock-prices-prediction-ml-flask-dashboard

This program predicts the price of GOOG stock for a specific day using the machine learning algorithms Support Vector Regression (SVR) and Linear Regression. Importing the flask module is mandatory; an object of the Flask class is the WSGI application.

classification data-mining data-science data-visualization dataset flask flask-dashboard linear-regression ml prediction prediction-algorithm prediction-model predictive-analytics python stock-analysis stock-market stock-prices stock-prices-prediction stock-trading visualization

Last synced: 10 Jan 2025

https://github.com/datakitchen/dataops-testgen

DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution through data profiling, new-dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, and continuous anomaly monitoring.

data data-engineering data-observability data-quality data-science data-testing datachecker dataops dataprofiling dataquality datavalidation mssql postgresql python redshift self-hosted snowflake

Last synced: 15 Jan 2025

https://github.com/sparkfish/shabby-pages

ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground-truth and distorted versions, suitable for training models that reverse distortions and recover the original, denoised documents.

binarization born-digital computer-vision corpus data-science dataset denoising layout-detection

Last synced: 17 Dec 2024

https://github.com/jacksonburns/astartes

Better Data Splits for Machine Learning

ai data-science machine-learning ml python sampling

Last synced: 19 Dec 2024
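
One well-known alternative to random splitting is the Kennard-Stone algorithm, which picks a training set that covers feature space evenly. It starts from the two most distant samples, then repeatedly adds the sample whose distance to its nearest already-selected sample is largest. Below is a minimal plain-Python sketch (Python 3.8+ for `math.dist`). The `kennard_stone` name and signature are hypothetical and not astartes's API.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone algorithm."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(samples)")
    # Seed with the pair of samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: math.dist(samples[ij[0]], samples[ij[1]]),
    )
    selected = [i0, j0]
    # Distance from every sample to its nearest selected sample.
    nearest = [
        min(math.dist(samples[k], samples[i0]), math.dist(samples[k], samples[j0]))
        for k in range(n)
    ]
    while len(selected) < n_select:
        candidates = [k for k in range(n) if k not in selected]
        best = max(candidates, key=lambda k: nearest[k])  # farthest from the set
        selected.append(best)
        for k in range(n):
            nearest[k] = min(nearest[k], math.dist(samples[k], samples[best]))
    return selected
```

The returned indices form the training set, and the remaining indices form the test set. Because the procedure is deterministic, the same data always yields the same split.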