Data Science
Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
- GitHub: https://github.com/topics/data-science
- Wikipedia: https://en.wikipedia.org/wiki/Data_science
- Related Topics: data-analysis, data-mining, machine-learning, big-data, data-visualization
- Aliases: datasciences, data-science-project, data-science-algorithm
- Last updated: 2026-07-03 00:07:42 UTC
https://github.com/ucla-biostat-203b/2024winter
biostatistics data-science machine-learning
Last synced: 26 Feb 2025
https://github.com/alexcj10/diwali-sales-analysis
This repository contains an analysis of Diwali sales data to uncover trends and patterns in customer behavior. The project aims to provide insights into customer demographics, purchasing habits, and product preferences during the Diwali season.
analysis data-science diwali jupyter-notebook matplotlib numpy pandas python sales seaborn
Last synced: 15 Apr 2025
https://github.com/portfoliome/pgawedge
PostgreSQL SQLAlchemy adapter
data-science etl finance postgres postgresql python-3 sqlalchemy
Last synced: 15 Mar 2026
https://github.com/mohidex/data-pipeline-on-gcp
The Real-time Ecommerce Data Collection and Processing project empowers businesses with real-time insights by efficiently extracting, processing, and storing ecommerce data from multiple sources. Combining Golang and Python, this cutting-edge solution streamlines data handling from diverse ecommerce websites.
beautifulsoup data-engineer data-pipeline data-science database datastore dependency-injection firebase firestore gcp go golang google google-cloud pubsub python solid-principles storage web-scraping
Last synced: 14 Apr 2025
https://github.com/itzmeanjan/corporatez
Data analysis done on Ministry of Corporate Affairs, Govt. of India's open data to get deeper insight, with :heart:
company-data corporate data-science data-visualization govt-company india matplotlib opendata python3 visualization
Last synced: 14 Oct 2025
https://github.com/kennethleungty/tensorflow-transfer-learning-image-classification
Practical Guide to Transfer Learning in TensorFlow for Multiclass Image Classification
artificial-intelligence data-science deep-learning image-classification machine-learning tensorflow transfer-learning
Last synced: 05 Oct 2025
https://github.com/sharatsawhney/character_segmentation
A detailed Research project on Character-Segmentation using Neural Networks!
data-science deep-learning deep-neural-networks keras keras-layer keras-models keras-neural-networks matplotlib neural-network numpy opencv-python
Last synced: 02 Apr 2025
https://github.com/polis-community/red-dwarf
A DIMensional REDuction library for stellarpunk democracy into the long haul. (Inspired by Pol.is)
civic-tech collective-intelligence data-science deliberative-democracy democracy dimensionality-reduction participatory-democracy polis
Last synced: 06 Oct 2025
https://github.com/judftteam/aiida-jutools
Tools for simplifying daily work with the AiiDA workflow engine
aiida computational-materials-science computational-science data-science density-functional-theory dft forschungszentrum-juelich high-throughput judft materials-informatics materials-science pandas provenance toolkit utility workflow
Last synced: 26 Jan 2026
https://github.com/ruivieira/nim-mentat
A Nim library for data science and machine learning
data-science library machine-learning nim scientific-computing
Last synced: 10 Aug 2025
https://github.com/bhattbhavesh91/auto-sklearn-tutorial
Small tutorial on auto-sklearn which is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
auto-ml auto-sklearn automl data-science machine-learning python tutorial
Last synced: 27 Oct 2025
https://github.com/datasets/genome-sequencing-costs
Costs associated with DNA sequencing since 2001
Last synced: 19 Oct 2025
https://github.com/kqc-real/streamlit
MC tests in German
agiles-projektmanagement data-science deep-learning mathematische-grundlagen
Last synced: 23 Jan 2026
https://github.com/abtinz/machine-learning-with-python
Machine Learning with Python in Jupyter
data-mining data-science fuzzy-logic machine-learning matplotlib numpy pandas preprocessing regression
Last synced: 29 Jul 2025
https://github.com/iamyajat/whatsapp-chat-analyzer-api
An API to analyse WhatsApp chats and generate insights
data-analysis data-science fastapi python whatsapp
Last synced: 17 Oct 2025
https://github.com/alipsa/matrix
Groovy library for working with tabular data.
analytics data-science groovy tables
Last synced: 02 Apr 2026
https://github.com/stink-po/boxoffice_api
Unofficial Python API for Box Office Mojo
data-science dataset movies-and-cinemas scraper
Last synced: 07 Sep 2025
https://github.com/surajv311/udemy_course_resources
List of course resources from my Udemy Course : "Numpy for Data Science" 2020
arrays data-science numpy numpy-tutorial python3 udemy udemy-course
Last synced: 16 May 2025
https://github.com/he7d3r/maratona-behind-the-code-2021
data-science ibm-cloud machine-learning maratona
Last synced: 26 Oct 2025
https://github.com/devinterview-io/model-evaluation-interview-questions
Model Evaluation interview questions and answers to help you prepare for your next machine learning and data science interview in 2024.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions model-evaluation model-evaluation-interview-questions model-evaluation-questions model-evaluation-tech-interview software-engineer-interview technical-interview-questions
Last synced: 08 Jan 2026
https://github.com/codelibs/docker-fione
Docker for Fione
ai automl data-science machine-learning
Last synced: 23 Mar 2025
https://github.com/navdeep-g/sdss-2019
Interpretable Machine Learning with rsparkling
data-science h2o-3 machine-learning r rsparkling spark sparklyr xai
Last synced: 07 Apr 2025
https://github.com/robinthibaut/project_template
Template for Python scientific projects
data-science python science-research template template-project vcs
Last synced: 14 Aug 2025
https://github.com/stefanpeidli/gonet
A student project on Go
data-science go machine-learning neural-network students
Last synced: 26 Mar 2025
https://github.com/peleiden/pelutils
Utility module for Python
data-science logging machine-learning parsing profiling
Last synced: 14 Jan 2026
https://github.com/devinterview-io/cost-function-interview-questions
Cost Function interview questions and answers to help you prepare for your next machine learning and data science interview in 2024.
ai-interview-questions coding-interview-questions coding-interviews cost-function cost-function-interview-questions cost-function-questions cost-function-tech-interview data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions software-engineer-interview technical-interview-questions
Last synced: 08 Jan 2026
https://github.com/negativenagesh/arogyamitra
An accessible, reliable, and efficient platform for medical information and support using LLMs
data-science embeddings flask genai knowledgebase langchain llama2 llm meta-llama-2-chat pineconedb python semantic-indexing vector-database
Last synced: 19 Jun 2025
https://github.com/thecoderpinar/gen-expression
Gene expression analysis is a fundamental component of genomics research, providing valuable insights into how genes are regulated and their impact on various biological processes. This project delves into the realm of gene expression data, aiming to uncover hidden patterns and relationships within complex datasets.
bioinformatics biotechnology data-analysis data-science data-visualization genomics kaggle machine-learning pca python
Last synced: 30 Apr 2025
https://github.com/dayyass/extended-naive-bayes
[WIP] Extension of sklearn Naive Bayes models that allows sampling and more feature distributions.
data-science distributions generative-model machine-learning naive-bayes python sampling scikit-learn
Last synced: 13 Apr 2025
https://github.com/barrettotte/ibmi-jupyter
Utility notebook for using Jupyter notebooks with IBMi for basic reports and visualizations.
data-science db2 db2i ibmi jupyter-notebook
Last synced: 11 Apr 2025
https://github.com/coelhosilva/flight-ad
flight-ad is a Python package for anomaly detection in the aviation domain built on top of scikit-learn.
anomaly-detection data-science fdm flight-data flight-data-analysis flight-data-monitoring machine-learning python scikit-learn
Last synced: 10 Apr 2025
https://github.com/omarsar/data_mining_hw_1
Contains information for the first assignment of Data Mining 2017 Fall, NTHU.
data data-mining data-science datavisualization pandas
Last synced: 10 Apr 2025
https://github.com/a-r-j/graphtype
Type hinting for networkx Graphs
data-science graph graph-algorithms graph-theory network-analysis networkx pydata python scientific-computing typehinting typehints
Last synced: 18 Mar 2025
https://github.com/flexmonster/pivot-jupyter-notebook
Jupyter Notebook pivot table example with Flexmonster
data-analysis data-science interactive jupyter-notebook pivot-tables python
Last synced: 16 Jun 2025
https://github.com/devinterview-io/autoencoders-interview-questions
Autoencoders interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
ai-interview-questions autoencoders autoencoders-interview-questions autoencoders-questions autoencoders-tech-interview coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions software-engineer-interview technical-interview-questions
Last synced: 16 Feb 2026
https://github.com/mathworks-teaching-resources/probability-theory
A courseware module that covers the fundamental concepts in probability theory and their implications in data science. Topics include probability, random variables, and Bayes' Theorem.
bayesian-statistics courseware cwm data-science mathematics matlab matlab-live-script probability-theory random-variables
Last synced: 15 Jul 2025
https://github.com/overhash/supermarket-tracker
A supermarket aggregator for price information at New Zealand supermarkets
data-science new-zealand nz prices rust-lang supermarket
Last synced: 11 Apr 2025
https://github.com/vbyan/deeva
Deeva - your smart analytics companion for Object Detection datasets
data data-science data-visualization datasets deeva machine-learning object-detection plotly python statistics streamlit visualization
Last synced: 26 Jun 2025
https://github.com/kennethleungty/statsassume
Automating Assumption Checks for Regression Models (Work in Progress, Currently Paused)
assumption-check assumption-checks assumptions binary-logistic-regression data-analytics data-science linear-regression logistic-regression machine-learning ml multilinear-regression multinomial-logistic-regression multinomial-regression python regression regression-models statistics stats statsassume statsmodels
Last synced: 12 Jul 2025
https://github.com/tushar2704/machinealgobox
Explore common ML algorithms, from scratch implementations to real-world use cases. Each algorithm is accompanied by clear explanations, code implementations, and real-world use cases, enabling you to grasp their underlying principles and apply them to different problem domains.
algorithms alogorithms-implemented artificial-intelligence data data-analytics data-engineering data-science deployment machine-learning-algorithms mlops python r streamlit streamlit-tushar2704 tushar2704
Last synced: 07 May 2025
https://github.com/iamantimpal/iamantimpal
Hi, I'm Antim Pal, the Founder of Optimism Educator. An online platform dedicated to empowering students with skills in Computer Science, Web Design, Graphic
data-analysis data-science data-visualization database database-design database-management datascience graphical-user-interface graphics grapic-design reading-list readme readme-badges readme-generator readme-md readme-profile readme-stats readme-template
Last synced: 10 Apr 2025
https://github.com/markziemann/5pillars
Five pillars of computational reproducibility
bioinformatics computational-biology data-science journal-article reproducible-research
Last synced: 18 Feb 2026
https://github.com/john-hawkins/projit
Application for managing the structure, properties, data, experiments and build of data science projects.
data-science experiments machine-learning project-management
Last synced: 23 Jun 2025
https://github.com/carpentries-incubator/data-science-for-docs
Data Science For Practicing Clinicians
carpentries carpentries-incubator data-science education english lesson medicine pre-alpha
Last synced: 02 Feb 2026
https://github.com/ndleah/transactions
Linear regression model, predict monthly transaction amount
data-science financial-modeling linear-regression mlr transactions
Last synced: 05 May 2025
https://github.com/juliusmarkwei/crypto-jacking-classificatioin
Classifying network activity from various websites as either cryptojacking or not, based on features from both network-based and host-based data.
cryptojacking data-science machine-learning python
Last synced: 13 Apr 2025
https://github.com/giswqs/timelapse
An interactive streamlit web app for creating satellite timelapse
data-science dataviz earthengine geopython python satellite streamlit
Last synced: 12 May 2025
https://github.com/ryanrudes/wikimedia
A dataset comprised of over 40 million images sourced from Wikimedia Commons
computer-vision data-science data-scraping dataset datasets deep-learning gans image images machine-learning wikimedia wikimedia-commons
Last synced: 13 Sep 2025
https://github.com/coalio/Assistant
A data science library providing flexible dataframes for Lua 5.1+
data-analysis data-science data-structures dataframe lua
Last synced: 11 Apr 2025
https://github.com/aflah02/nlp-albumentations-data-augmentation
This repository contains helper functions which can help you generate additional data points depending on your NLP task.
Last synced: 09 Jul 2025
https://github.com/leriomaggio/develer-data-science
Deep dive into Data Science with Python @ Develer
data-science deep-learning keras keras-tensorflow lecture-notes machine-learning numpy python python3 scikit-learn tutorial
Last synced: 21 Jul 2025
https://github.com/timetoai/timediffusion_forecasting
Research Project on time-series forecasting
data-science deep-learning machine-learning pytorch time-series time-series-forecasting
Last synced: 07 Mar 2026
https://github.com/carlomazzaferro/numerai_easy_ml
General purpose workflow for machine learning projects applied to the https://numer.ai data challenges.
data-science mahchine-leaning numerai
Last synced: 26 Mar 2025
https://github.com/joshwlambert/daisieprep
Extracts phylogenetic island community data from phylogenetic trees
data-science island-biogeography phylogenetics r
Last synced: 18 Mar 2025
https://github.com/njlyon0/supportr
Support Functions for Wrangling and Visualization
Last synced: 20 Mar 2025
https://github.com/arose13/pliablelasso
Python implementation of the pliable lasso
Last synced: 09 May 2025
https://github.com/paoloripamonti/word2vec-keras
Word2Vec Keras Text Classifier
data-science keras machine-learning text-classification word2vec
Last synced: 09 May 2025
https://github.com/erp12/rica
DataFrame abstraction for Clojure data scientists.
clojure clojurescript data-science dataframe
Last synced: 11 Apr 2025
https://github.com/gjtorikian/destroy-all-monuments
This is data taken from the SPLC report titled "Whose Heritage? Public Symbols of the Confederacy" from April 21, 2016
data-science government-data social-justice
Last synced: 10 Apr 2025
https://github.com/devinterview-io/chatgpt-interview-questions
ChatGPT interview questions and answers to help you prepare for your next machine learning and data science interview in 2024.
ai-interview-questions chatgpt chatgpt-interview-questions chatgpt-questions chatgpt-tech-interview coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions software-engineer-interview technical-interview-questions
Last synced: 05 May 2025
https://github.com/wilhelmagren/finq
Quantitative analysis and management toolbox for financial applications.
analysis data-analysis data-science finance financial-data investment monte-carlo nasdaq optimization portfolio-management portfolio-optimization quantitative-finance quantum-computing time-series yahoo-finance
Last synced: 22 Mar 2025
https://github.com/hunterdii/iriswise
IrisWise is a machine learning application for predicting Iris flower species. Built with Streamlit, this app provides a user-friendly interface to input flower measurements and receive predictions using various models, including K-Nearest Neighbors, (Random Forest, SVM, and Logistic Regression) **(Working On It...)**.
classifier-model data-science flowers-recognition iris-dataset iris-recognition knn-classification machine-learning pickle python python3 streamlit streamlit-webapp
Last synced: 21 Feb 2026
https://github.com/blmoore/summerdatachallenge
My entry for: http://summerdatachallenge.com (I came 3rd)
analytics data-science london r real-estate rstats
Last synced: 30 Apr 2025
https://github.com/hevalhazalkurt/exploring_the_data_of_lego_history
A data exploration project on LEGO history in Python with pandas, matplotlib etc. (WIP)
data data-analysis data-science data-visualization datascience datasets lego lego-history matplotlib pandas python python3
Last synced: 13 Apr 2025
https://github.com/sayakpaul/applied-data-science-w-python-specialization
Contains my assignments, guiding notebooks (provided as the course materials) and the datasets.
data-science matplotlib numpy pandas python3 scipy scipy-stack
Last synced: 12 May 2025
https://github.com/semasuka/income-classification
Predicting whether an individual makes more than 50K using different features
aws-s3 binary-classification data-analysis data-science data-visualization eda finance-analytics machine-learning precision python random-forest-classifier scikit-learn streamlit
Last synced: 14 Jul 2025
https://github.com/leomaurodesenv/data-science-api-framework
A simple framework to test and deploy your Data Science API
api api-rest data-science dataops docker flask-api python
Last synced: 09 Sep 2025
https://github.com/sdhutchins/cookiecutter-computational-biology
A boilerplate for reproducible computational biology projects.
biology computational-biology cookiecutter cookiecutter-template data-science science
Last synced: 20 Mar 2025
https://github.com/neo4j-graph-examples/contact-tracing
Contact Tracing graph for pandemic spread e.g. COVID-19 based on http://blog.bruggen.com/search/label/contact%20tracing
contact-tracing covid-data covid19 data-science dataset example-data graphdb healthcare neo4j neo4j-approved
Last synced: 18 Jul 2025
https://github.com/yusufcinarci/web-scraping-projects
In these project files, I will host the web scraping examples that I will make day by day.
data-analysis data-science jupyter-notebook python web-scraping
Last synced: 01 May 2025
https://github.com/zaman-hamza/citadel-datathon
My submission to the 2022 East Coast Datathon. The event started on the 21st of March and ended on the 28th, lasting about a whole week. I was in a team of two where we analyzed the non-conventional indicators and instigators of traffic.
citadel data-science data-visualization datathon
Last synced: 10 Apr 2025
https://github.com/davidssmith/rawarray.jl
Raw array (RA) file format for simple, robust, and user-friendly N-dimensional array storage
bytes complex-numbers data-science file-format julia large-dataset large-files ra-format rawarray scientific-computing storage
Last synced: 10 Sep 2025
https://github.com/iguptashubham/online-retail-sales
This Power BI dashboard, designed for marketing strategists, analyzes sales trends and customer behavior. It provides key insights empowering them to identify sales opportunities and optimize marketing campaigns, ultimately boosting business sales.
dashboard data data-analysis data-analysis-project data-analysis-project-powerbi data-analysis-python data-project data-science powerbi project
Last synced: 19 Mar 2026
https://github.com/wlandau/targets-intro
Introduction to the {targets} R package
data-science high-performance-computing make pipeline r r-package r-targetopia reproducibility reproducible-research rstats targets workflow
Last synced: 20 Mar 2025
https://github.com/getindata/quickstart-ml-starter
Kedro starters to quickly set up new projects according to QuickStart ML Blueprints practice.
Last synced: 30 Oct 2025
https://github.com/tjpalanca/tjcloud
TJ Palanca's Personal Cloud
chromebooks cloud data-science docker kubernetes kubernetes-cluster rstudio terraform
Last synced: 13 Apr 2025
https://github.com/hugo-strang/silhouette-upper-bound
An upper bound of the Average Silhouette Width.
cluster-analysis clustering clustering-evaluation data-mining data-science machine-learning python python3 silhouette-coefficient silhouette-score upper-bound
Last synced: 14 Dec 2025
https://github.com/whizsid/kddbscan-rs
A Rust library inspired by the kDDBSCAN clustering algorithm
clustering data-science density-based-clustering deviation machine-learning-algorithms pinned
Last synced: 10 Apr 2025
https://github.com/raulpy271/languagesdataset
I created a dataset with over 600 programming languages information
bot data-analysis data-mining data-science database ipython-notebook jupyter-notebook numpy pandas python selenium selenium-python selenium-webdriver web-scraping
Last synced: 24 Jun 2025
https://github.com/krishnaura45/stresssense
Estimation of Stress Levels Using PPG Signals
data-science datadriven feature-engineering machine-learning mental-health research-project signal-processing stress
Last synced: 13 Apr 2025
https://github.com/adamvvu/snapshot_ensemble
Train TensorFlow Keras models with cosine annealing and save an ensemble of models with no additional computational expense.
data-science deep-learning keras machine-learning python tensorflow
Last synced: 28 Oct 2025
https://github.com/rahul-jha98/restauranttrends.stats
Visualise the trends in food and restaurant choices of customers in a city by scraping data from Zomato.
data-analysis data-science visualization vuejs zomato zomato-api zomato-scraper
Last synced: 08 Jul 2025
https://github.com/leonard-seydoux/scientific-computing-for-geophysical-problems
Lecture notes and Jupyter notebooks for geophysical problems
ambient-noise data-science geomagnetism inverse-problems ipgp jupyter labs lecture-notes master notebooks python seismic seismology
Last synced: 10 Apr 2025
https://github.com/devinterview-io/light-gbm-interview-questions
LightGBM interview questions and answers to help you prepare for your next machine learning and data science interview in 2024.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation light-gbm light-gbm-interview-questions light-gbm-questions light-gbm-tech-interview machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions software-engineer-interview technical-interview-questions
Last synced: 11 Jan 2026
https://github.com/chiraag-kakar/pubg
What's the best strategy to win in PUBG? Should you sit in one spot and hide your way into victory, or do you need to be the top shot? Let's let the data do the talking!
data-science feature-engineering machine-learning-algorithms project pubg-api random-forest
Last synced: 07 May 2025
https://github.com/nikbarb810/pattern-recognition
Basic pattern recognition algorithms implemented in Python
data-science ipynb-jupyter-notebook matplotlib numpy pattern-recognition python
Last synced: 06 Mar 2026
https://github.com/nationalparkservice/qckit
QCkit provides useful functions for data quality control and manipulation including updating data to DarwinCore standards, unit conversions, and data flagging.
darwin-core data-quality data-science npsdataverse quality-control r r-package rstats
Last synced: 22 Jun 2025
https://github.com/vedadiyan/genql
GenQL is a generic querying language fully written in Go
data-analysis data-mapping data-processing data-science data-translation json json-data sql
Last synced: 22 Jun 2025
https://github.com/visokio/omniscope-custom-blocks
Public repository for custom blocks for Omniscope
business-intelligence data-science dataanalytics datapreparation python rstats
Last synced: 06 Apr 2026
https://github.com/jbris/stan-cmdstanr-gpu-docker
A Docker image to run Stan, cmdstanr, and brms for Bayesian statistical modelling. GPU support using OpenCL is available.
bayes bayesian-inference brms cmdstan cmdstanr data-science docker posterior probabilistic-programming projpred rstan rstanarm shinystan stan stan-gpu stan-lang stan-math-library tidybayes tidyverse
Last synced: 04 May 2025
https://github.com/lucasrodes/pyphoon
Tools for Digital Typhoon DL/ML Project
data-science dataset environment machine-learning tropical-cyclone
Last synced: 18 Mar 2025
https://github.com/stefanrmmr/kaggle_twitter_airline_sentiment
Kaggle Twitter US Airline Sentiment, Implementation of a Tweet Text Sentiment Analysis Model, using custom trained Word Embeddings and LSTM-Deep learning [TUM-Data Analysis&ML summer 2021] @adrianbruenger @stefanrmmr
data-science deep-learning kaggle-airline-dataset kaggle-sentiment-analysis kaggle-us-airlines lstm-neural-networks python sentiment-analysis skipgram text-sentiment-classification tweepy tweet-classification tweet-sentiment-analysis twitter twitter-sentiment-analysis us-airline-dataset word2vec
Last synced: 19 Mar 2025
https://github.com/orkunaktas/wine-quality-prediction
Wine Quality and Forecast
alcohol data-science logistic-regression wine-quality
Last synced: 08 Sep 2025
https://github.com/octoenergy/s3migrate
Bulk delete/copy/move files or modify Hive/Drill/Athena partitions using pythonic pattern matching
Last synced: 24 Jun 2025
https://github.com/pharo-ai/data-partitioners
Pharo library for partitioning a collection. Given a set of proportions (e.g. 50%, 30%, and 20%), it shuffles the collection and divides it into non-empty subsets in such a way that every element is included in exactly one subset. Can be used in machine learning and statistical analysis for splitting data into training, validation, and test sets.
data-science machine-learning pharo statistical-analysis
Last synced: 11 Apr 2025
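The partitioning idea described in the entry above (shuffle a collection, then split it by proportions so that every element lands in exactly one non-empty subset) can be sketched in Python. This is only an illustration under stated assumptions, not the Pharo library's implementation: the function name `partition_by_proportions`, the `seed` parameter, and the rounding rule for subset boundaries are all hypothetical choices made for this sketch.

```python
import random


def partition_by_proportions(items, proportions, seed=None):
    """Shuffle items and split them into consecutive subsets sized by proportions.

    Hypothetical helper for illustration; boundary rounding is an assumption.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n = len(shuffled)

    # Turn cumulative proportions into slice boundaries; the last boundary
    # is always n so that no element is dropped.
    boundaries = []
    cumulative = 0.0
    for p in proportions[:-1]:
        cumulative += p
        boundaries.append(round(cumulative * n))
    boundaries.append(n)

    subsets = []
    start = 0
    for end in boundaries:
        subsets.append(shuffled[start:end])
        start = end

    # Enforce the "non-empty subsets" property instead of silently
    # returning an empty split.
    if any(len(subset) == 0 for subset in subsets):
        raise ValueError("collection too small for the requested proportions")
    return subsets


train, validation, test = partition_by_proportions(
    range(10), [0.5, 0.3, 0.2], seed=0
)
```

Because the subsets are consecutive slices of one shuffled list, each element appears in exactly one subset. A collection too small to give every subset at least one element raises an error rather than being rebalanced, which is a simplification chosen here.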
https://github.com/eikevons/pandas-paddles
Access the parent pandas data frame in loc[], iloc[], assign(), and other pandas helpers
data-analysis data-exploration data-science pandas pandas-dataframe pandas-library pandas-loc
Last synced: 16 Jun 2025
https://github.com/praveen1664/chatbot
A chatbot written in Python that gets its inputs directly from an SQL database
chatbot data-science database json nlp python3 sqlite sqlite3
Last synced: 11 Jul 2025
https://github.com/wazzabeee/twitter-sentiment-analysis-pyspark
Comparative study of classification algorithms implemented in PySpark on the Sentiment 140 dataset.
apache-spark data data-science gcp google-cloud logistic-regression naive-bayes-classifier natural-language-processing nlp nlp-machine-learning pyspark python python3 sentiment-analysis sentiment-classification sentiment140-dataset sentimental-analysis spark tweet twitter
Last synced: 06 May 2025