Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with data-cleaning
A curated list of projects in awesome lists tagged with data-cleaning.
https://github.com/cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
active-learning annotation data-centric-ai data-cleaning data-curation data-labeling data-profiling data-quality data-science data-validation dataops dataquality datasets exploratory-data-analysis labeling llms noisy-labels out-of-distribution-detection outlier-detection weak-supervision
Last synced: 17 Dec 2024
https://github.com/johnkerl/miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
command-line command-line-tools csv csv-format data-cleaning data-processing data-reduction data-regression devops devops-tools json json-data miller statistical-analysis statistics streaming-algorithms streaming-data tabular-data tsv unix-toolkit
Last synced: 16 Dec 2024
https://github.com/voxel51/fiftyone
Refine high-quality datasets and visual AI models
active-learning artificial-intelligence computer-vision data-centric-ai data-cleaning data-curation data-quality data-science deep-learning developer-tools image-classification machine-learning object-detection python unstructured-data vector-search visualization
Last synced: 30 Oct 2024
https://github.com/unionai-oss/pandera
A light-weight, flexible, and expressive statistical data testing library
assertions data-assertions data-check data-cleaning data-processing data-validation data-verification dataframe-schema dataframes hypothesis-testing pandas pandas-dataframe pandas-validation pandas-validator schema testing testing-tools validation
Last synced: 29 Oct 2024
https://github.com/justmarkham/pandas-videos
Jupyter notebook and datasets from the pandas video series
data-analysis data-cleaning data-science jupyter-notebook pandas python tutorial
Last synced: 20 Dec 2024
https://github.com/justmarkham/dat8
General Assembly's 2015 Data Science course in Washington, DC
clustering course data-analysis data-cleaning data-science data-visualization decision-trees ensemble-learning jupyter-notebook linear-regression logistic-regression machine-learning model-evaluation naive-bayes natural-language-processing pandas python regular-expressions scikit-learn web-scraping
Last synced: 20 Dec 2024
https://github.com/hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
big-data-cleaning bigdata cudf dask dask-cudf data-analysis data-cleaner data-cleaning data-cleansing data-exploration data-extraction data-preparation data-profiling data-science data-transformation data-wrangling machine-learning pyspark spark
Last synced: 17 Dec 2024
https://github.com/sfirke/janitor
simple tools for data cleaning in R
data-analysis data-cleaning data-science dirty-data excel pivot-tables r spss tabulations tidyverse
Last synced: 17 Dec 2024
https://github.com/data-forge/data-forge-ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
csv data data-analysis data-cleaning data-cleansing data-forge data-management data-manipulation data-munging data-visualization data-wrangling javascript json linq nodejs pandas visualization
Last synced: 17 Dec 2024
https://github.com/skrub-data/skrub
Prepping tables for machine learning
data data-analysis data-cleaning data-preparation data-preprocessing data-science data-wrangling dirty-data machine-learning
Last synced: 19 Dec 2024
https://github.com/ECNU-ICALK/EduChat
An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue LLM (general-purpose base model, GPU deployment, data cleaning). Acknowledgements: LLaMA, MOSS, BELLE, Ziya, vLLM
belle chinese-nlp data-cleaning education llama llm moss open-models
Last synced: 02 Nov 2024
https://github.com/schema-inspector/schema-inspector
Schema-Inspector is a simple JavaScript object sanitization and validation module.
data-cleaning javascript sanitization validation
Last synced: 20 Dec 2024
https://github.com/akanz1/klib
Easy to use Python library of customized functions for cleaning and analyzing data.
data-analysis data-cleaning data-preprocessing data-science data-visualization feature-selection klib python
Last synced: 15 Nov 2024
https://github.com/data-cleaning/validate
Professional data validation for the R environment
Last synced: 25 Oct 2024
https://github.com/jim-schwoebel/voicebook
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
data data-cleaning encryption-decryption featurization generation machine-learning python3 security server transcription visualization voice voice-activity-detection voice-assistant voice-computing voice-control voice-recognition voice-recording wake-word-detection
Last synced: 15 Dec 2024
https://github.com/msamogh/nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
data-cleaning data-pipeline data-preprocessing data-processing machine-learning preprocessing pytorch torch
Last synced: 14 Nov 2024
https://github.com/probcomp/PClean
A domain-specific probabilistic programming language for scalable Bayesian data cleaning
bayesian-inference data-cleaning data-cleansing probabilistic-graphical-models probabilistic-programming
Last synced: 13 Nov 2024
https://github.com/genomoncology/FuzzTypes
Pydantic extension for annotating autocorrecting fields.
data-cleaning fuzzy-string-matching named-entity-linking pydantic
Last synced: 17 Nov 2024
https://github.com/ekstroem/datamaid
An R package for data screening
data-cleaning data-screening reproducible-research
Last synced: 03 Dec 2024
https://github.com/ekstroem/dataMaid
An R package for data screening
data-cleaning data-screening reproducible-research
Last synced: 13 Nov 2024
https://github.com/jim-schwoebel/allie
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
autokeras automl autopytorch data-augmentation data-cleaning data-cleaning-pipeline data-transformation data-visualization datasets deep-learning ludwig machine-learning machine-learning-api machine-learning-library machine-learning-models model-compression model-deployment tpot voice-computing
Last synced: 19 Dec 2024
https://github.com/hi-primus/bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
bumblebee cudf dask dask-cudf data-cleaning data-preparation data-profiling datasets gpu gui optimus prepare-data python
Last synced: 12 Nov 2024
https://github.com/charlesdedampierre/BunkaTopics
🗺️ Data Cleaning and Textual Data Visualization 🗺️
cartography data-cleaning explainability fine-tuning llms machine-learning natural-language-processing nlp summarization topic-modeling
Last synced: 03 Sep 2024
https://github.com/iam-mhaseeb/skytrax-data-warehouse
A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.
airflow data-analysis data-analytics data-cleaning data-engineering data-orchestration data-processing data-visualization data-warehouse data-warehousing database docker metabase python python3 redshift s3 s3-bucket sql
Last synced: 14 Dec 2024
https://github.com/chrismuir/refinr
Cluster and merge similar string values: an R implementation of Open Refine clustering algorithms
approximate-string-matching clustering cran data-cleaning data-clustering fuzzy-matching ngram openrefine r rstats
Last synced: 18 Dec 2024
https://github.com/ChrisMuir/refinr
Cluster and merge similar string values: an R implementation of Open Refine clustering algorithms
approximate-string-matching clustering cran data-cleaning data-clustering fuzzy-matching ngram openrefine r rstats
Last synced: 26 Oct 2024
https://github.com/aai-institute/pyDVL
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
banzhaf-index data-centric-ai data-cleaning data-pruning data-quality data-valuation game-theory influence-functions least-core machine-learning robust-machine-learning shapley-value transferlab
Last synced: 17 Nov 2024
https://github.com/lolei/redditcleaner
Cleans Reddit Text Data 📜 🧹
data-cleaning hacktoberfest nlp praw psaw pushshift python reddit text-data
Last synced: 15 Dec 2024
https://github.com/sail-sg/sailcraft
Data Toolkit for Sailor Language Models
data-cleaning data-deduplication
Last synced: 07 Nov 2024
https://github.com/renumics/sliceguard
A library for detecting problematic data segments in structured and unstructured data with few lines of code.
data-analysis data-cleaning data-curation data-exploration data-science data-visualization deep-learning eda exploratory-data-analysis machine-learning python visualization
Last synced: 27 Oct 2024
https://github.com/Desbordante/desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.
anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data
Last synced: 04 Nov 2024
https://github.com/rvanasa/pandas-gpt
Power up your data science workflow with ChatGPT.
chatgpt claude-ai data-cleaning data-engineering data-science data-visualization gemini generative-ai gpt4 jupyter-notebook litellm low-code matplotlib numpy o1 openai pandas productivity scipy seaborn
Last synced: 19 Dec 2024
https://github.com/ropensci/taxa
taxonomic classes for R
data-cleaning r r-package rstats taxon taxonomy
Last synced: 04 Dec 2024
https://github.com/elysian01/data-purifier
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.
data-analysis data-cleaning data-cleaning-pipeline data-preprocessing data-science data-visualization datapurifier eda exploratory-data-analysis jupyter python-lib python-library python3
Last synced: 07 Nov 2024
https://github.com/mramshaw/data-cleaning
Data Cleaning with Python
data-cleaning data-munging data-wrangling numpy pandas python python3
Last synced: 19 Dec 2024
https://github.com/ammsa/dtcleaner
DTCleaner: data cleaning using multi-target decision trees.
data-cleaning data-mining data-preprocessing data-quality data-science data-wrangling
Last synced: 28 Oct 2024
https://github.com/theronione/cleaner.jl
A toolbox of simple solutions for common data cleaning problems.
Last synced: 12 Oct 2024
https://github.com/jmcastagnetto/covid-19-data-cleanup
Scripts to clean up data from https://github.com/CSSEGISandData/COVID-19
covid-19 covid-19-data data-cleaning data-visualization datasets r
Last synced: 08 Nov 2024
https://github.com/irsol/udacity-bertelsmann-data-science-challenge-scholarship-2018
This is a repo for my Bertelsmann Data Science Scholarship Challenge: notes, exercises, quizzes.
aggregation bertelsmann challenge control-flow data-cleaning data-science data-visualization python scholarship sql statistics udacity udacity-course udacity-scholarship-course udacity2018 variability
Last synced: 28 Oct 2024
https://github.com/facultyai/boltzmannclean
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
data-cleaning data-science dataframe pandas restricted-boltzmann-machine
Last synced: 08 Nov 2024
https://github.com/data-cleaning/errorlocate
Find and replace erroneous fields in data using validation rules
data-cleaning errors invalidation r
Last synced: 04 Dec 2024
https://github.com/the-Hull/datacleanr
Interactive and Reproducible Data Cleaning
annotation-tool data-cleaning outlier-detection outlier-removal reproducibility
Last synced: 04 Dec 2024
https://github.com/amine-smahi/r-learning-journey
Some of the projects I made when starting to learn R for Data Science at university
afc cpa data-cleaning data-integration data-science datascience r r-language
Last synced: 27 Oct 2024
https://github.com/catalyst/moodle-local_datacleaner
Reduce, filter, and anonymize moodle data for non-prod environments
anonymize data-cleaning datacleaner moodle php plugin
Last synced: 11 Nov 2024
https://github.com/aifred-health/vulcanai
A high level deep learning framework for quickly prototyping networks with added tools in data visualisation, model interpretability and performance metrics
data-analysis data-cleaning data-science data-visualization deep-learning deep-neural-networks feature-engineering mental-health python3 pytorch scikit-learn
Last synced: 05 Dec 2024
https://github.com/santoshlite/quantclean
🧹 Quantclean is a program that reformats financial datasets to US Equity TradeBar (QuantConnect format)
algo-trading algorithmic-trading data-cleaning finance financial-data futures lean-engine ohlcv options quandl quant quantconnect quantitative-finance quantitative-trading stock-data stock-market stocks trading-algorithms trading-bot trading-strategies
Last synced: 13 Nov 2024
https://github.com/facultyai/ipydataclean
Interactive cleaning for Pandas DataFrames
data-cleaning data-science dataframe jupyter-notebook pandas
Last synced: 28 Oct 2024
https://github.com/data-cleaning/validatetools
data-cleaning r rules validation
Last synced: 04 Dec 2024
https://github.com/jkminder/data2neo
Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.
data-cleaning data-conversion data-engineering data2neo database-migrations graphs neo4j relational-databases remodeling
Last synced: 14 Oct 2024
https://github.com/chinmayrane16/titanic-survival-in-depth-analysis
Uses the Pandas, Matplotlib, and Seaborn libraries to analyze, visualize, and explore data on Titanic passengers, and Scikit-learn modelling algorithms to predict their probability of survival.
classification-model data-cleaning data-visualization feature-engineering matplotlib numpy pandas seaborn
Last synced: 27 Oct 2024
https://github.com/firaskahlaoui/heart-disease-prediction
The Heart Disease Prediction project aims to predict the likelihood of heart disease using machine learning techniques.
data-cleaning data-visualization flask jupyter-notebook kaggle-dataset model-building python3
Last synced: 15 Nov 2024
https://github.com/kemingy/plane
A text processing tool offering tag (HTML, URL, email) extraction and removal, punctuation normalization, simple segmentation, and so on.
chinese-nlp data-cleaning nlp preprocess regex tokenization tokenizer
Last synced: 27 Oct 2024
https://github.com/lukashedegaard/datasetops
Fluent dataset operations, compatible with your favorite libraries
data-cleaning data-munging data-processing data-science data-wrangling dataset dataset-combinations deep-learning multiple-datasets pytorch tensorflow
Last synced: 10 Nov 2024
https://github.com/LukasHedegaard/datasetops
Fluent dataset operations, compatible with your favorite libraries
data-cleaning data-munging data-processing data-science data-wrangling dataset dataset-combinations deep-learning multiple-datasets pytorch tensorflow
Last synced: 15 Nov 2024
https://github.com/jchehe/xcel
[Project has moved to the team's GitHub] This repository therefore only syncs the latest README.md; to watch, star, or fork, please go to the team's GitHub. Thank you.
Last synced: 11 Nov 2024
https://github.com/jay0lee/cmdc
Chrome Managed Data Cleanup - https://chrome.google.com/webstore/detail/chrome-managed-data-clean/anfhmiaflneaeffhlmbcedfjakdlpleg
cache cookies data-cleaning g-suite google-chrome google-chrome-extension javascript
Last synced: 24 Oct 2024
https://github.com/epiverse-trace/cleanepi
R package to clean and standardize epidemiological data
data-cleaning epidemiology epiverse r r-package
Last synced: 02 Dec 2024
https://github.com/benedekrozemberczki/av_ultimate_student_hunt
Solution for the Ultimate Student Hunt Challenge (1st place).
analytics-vidhya-competition competition data-cleaning data-engineering data-engineering-pipeline distributed-machine-learning driven-data extreme-gradient-boosting forecasting gradient-boosting kaggle machine-learning r student-hunt supervised-learning weather-forecast winning-entry xgboost
Last synced: 14 Nov 2024
https://github.com/sayakpaul/analytics-vidhya-game-of-deep-learning-hackathon
Contains my experiments for the Game of Deep Learning Hackathon conducted by Analytics Vidhya
active-learning analytics-vidhya computer-vision data-cleaning deep-learning fastai label-noise
Last synced: 23 Oct 2024
https://github.com/waynejz/comp9321-19t1
COMP9321 Data Services Engineering 2019T1
backend data-cleaning data-services data-visualization
Last synced: 18 Dec 2024
https://github.com/yaph/james-bond-actors
Script to grab Freebase data about James Bond actors and generate gexf data file.
data-cleaning data-processing data-retrieval freebase james-bond-actors network-graph
Last synced: 03 Dec 2024
https://github.com/brunocampos01/allstate-claims-severity
Udacity Machine Learning Engineer Nanodegree capstone proposal.
allstate capstone-proposal challenge data-analyst-nanodegree data-cleaning data-engineering data-science data-visualization dataset deep-learning kaggle machine-learning pca-analysis pt-br python udacity-machine-learning-nanodegree
Last synced: 16 Nov 2024
https://github.com/hypertextassassin0273/excel_data_organizer_and_cleaner-ds_project
Data Structures project in C++11 language, uses custom Vector & String structures with Move Semantics (Rule of Five)
cpp11 data-cleaning data-cleansing data-structure-projects data-structures data-structures-project data-wrangling ds-projects easy-project excel-operations move-semantics object-oriented-programming oop open-source open-source-code open-source-project rule-of-five string university-project vector
Last synced: 14 Nov 2024
https://github.com/marksweiss/sofine
Lightweight framework for creating data-collecting plugins and chaining calls to them from CLI, REST or Python to return unified data sets.
cross-language data-cleaning data-processing data-retrieval json python
Last synced: 04 Dec 2024
https://github.com/amey-thakur/kaggle
Kaggle Courses - All Exercises of the respective courses.
amey ameythakur courses data-cleaning data-manipulation data-science data-visualization deep-learning feature-engineering intro-to-ml kaggle machine-learning machine-learning-explainability python
Last synced: 09 Nov 2024
https://github.com/incubated-geek-cc/text-manipulation
A browser-based text-manipulation toolkit. No server required. Re-designed version of https://textmechanic.com/
css data-cleaning html javascript productivity text-editor
Last synced: 15 Nov 2024
https://github.com/vida-nyu/openclean-core
Data Cleaning and Data Profiling Library for Python
data-cleaning data-curation hacktoberfest
Last synced: 24 Nov 2024
https://github.com/siddeshsambasivam/ntuoss-datascraping-and-datacleaning-workshop
This repository contains the reference scripts and the content presented in the NTU OSS Data scraping and Data cleaning workshop.
data-cleaning data-crawling data-scraping
Last synced: 24 Oct 2024
https://github.com/hrolive/from-data-to-insights-with-google-cloud-platform
Four-course accelerated online specialization teaches course participants how to derive insights through data analysis and visualization using the Google Cloud Platform
data-analysis data-cleaning data-preparation data-visualization sql
Last synced: 09 Nov 2024
https://github.com/data-cleaning/dcmodifydb
Deterministic, documented correction rules on a database
correction data-cleaning database r
Last synced: 04 Dec 2024
https://github.com/mrankitgupta/sales-insights-data-analysis-using-tableau-and-sql
India based Hardware company Sales Insights - A Data Analysis Project performed on Tableau & SQL
66daysofdata analysis analytics ankitgupta data-analysis data-cleaning data-science data-visualization excel mrankitgupta mysql powerbi rdbms sql sql-server statistics tableau tableau-dashboards tableau-desktop tableau-public
Last synced: 17 Nov 2024
https://github.com/memgonzales/pisa-2018-analysis
Jupyter notebook presenting the process of data preparation, research question formulation, data analysis, and data modelling with the goal of extracting insights from the 2018 PISA Dataset
data-cleaning data-modeling data-science data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy oecd-data pandas pisa scipy statistical-inference
Last synced: 19 Nov 2024
https://github.com/data-cleaning/validatesuggest
Generate validation rules from data
Last synced: 04 Dec 2024
https://github.com/bharathgs/dframeutils
simple utility tools for dataframes in Python || WIP ||
csv data-cleaning data-preprocessing data-science dataframe pandas pandas-dataframe preprocessing python tidy-data tidytext utility utility-function utility-library
Last synced: 28 Oct 2024
https://github.com/cbozan/graduation-project
Graduation project that categorizes popular search phrases using Python and Spark and presents them on a website to inspire creators.
crisp-dm data-cleaning data-science machine-learning nlp nlp-machine-learning spark spark-mllib
Last synced: 23 Nov 2024
https://github.com/depressioncenter/mden
Mobile technologies code from the University of Michigan's Mobile Data Experts Network (MDEN), featuring data cleaning automations, REDCap project templates, and links to useful external modules. [DOI: 10.6084/m9.figshare.25438714]
automation data-analysis data-cleaning fitness-tracker heart-rate-data mobile-data mobile-development mquery powerautomate powerbi powerquery python r sleep-data smartwatch-data tableau
Last synced: 25 Nov 2024
https://github.com/fbraza/python-tocase
A library to help recasing your strings
case-converter data-cleaning pandas python python3 strings-manipulation
Last synced: 07 Dec 2024
https://github.com/Nelson-Gon/mde
mde: Missing Data Explorer
data-analysis data-cleaning data-exploration data-science datacleaner datacleaning exploratory-data-analysis missing missing-data missing-value-treatment missing-values missingness omit r r-package r-stats recode replace rstats statistics
Last synced: 04 Dec 2024
https://github.com/spider-rs/llm-readability
The readability library for LLMs
clean-data data-cleaning llm-training readability safari-reader
Last synced: 05 Nov 2024
https://github.com/imsanjoykb/data-analytics-tool-development
Data Analytics Tool Development
data-analysis-toolbox data-cleaning data-science deployment machine-learning rest-api streamlit
Last synced: 17 Nov 2024
https://github.com/nelson-gon/mde
mde: Missing Data Explorer
data-analysis data-cleaning data-exploration data-science datacleaner datacleaning exploratory-data-analysis missing missing-data missing-value-treatment missing-values missingness omit r r-package r-stats recode replace rstats statistics
Last synced: 29 Oct 2024
https://github.com/srinivasrm/mutual-funds-analysis-and-prediction
In this project I analyzed and predicted 1-, 3-, and 5-year returns for 1,064 mutual funds in India. I scraped the data from the most visited website for mutual fund investments. I tested several regression models: linear regression, SGD Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge, MLP Regressor, and Lasso. I then selected the best-performing model, tuned its hyperparameters, and deployed an interactive application that generates the visualization and emails it to the user's email address.
beautifulsoup data-analysis data-base data-cleaning data-science deployment etl finanace frontend funds machine-learning mutual mutual-funds pgsql python scikit-learn sql streamlit web webapplication
Last synced: 11 Oct 2024
https://github.com/yaph/world-aid-transparency
World aid transparency data scripts for creating a visualization with D3
competition-project data-cleaning data-processing data-retrieval data-visualization worldbank
Last synced: 03 Dec 2024
https://github.com/yaph/gh-commit-locations
Scripts used for analyzing GitHub commit locations to create a map visualization
big-query data-challenge data-cleaning data-mining data-processing data-visualization github information-retrieval user-location world-map
Last synced: 03 Dec 2024
https://github.com/kwokhing/exploratory-data-analysis-on-smrt-tweets
Demo of exploratory data analysis (EDA) on train service disruptions, based on scraped tweets (user-generated content) from the train operator's (SMRT) Twitter account
data-analysis data-cleaning data-collection data-preparation exploratory-data-analysis exploratory-data-visualizations folium geospatial-data leaflet-map python python3 regex scraping selenium selenium-python social-media text-processing user-generated-content web-scraping webscraping
Last synced: 02 Dec 2024
https://github.com/kwokhing/network-analysis-on-mrt-station
Demo applying network analysis to a network of connected railway stations to identify the important stations (nodes) in the network. Web scraping with the rvest package is also briefly discussed.
betweenness-centrality closeness-centrality data-cleaning degree-centrality eigenvector-centrality gephi graph-analysis igraph r rvest social-network-analysis social-networks web-scraping xpath
Last synced: 02 Dec 2024
https://github.com/jananiarunachalam/Data-Science-Portfolio
Data Science Projects Repository
analytics api data-cleaning data-science data-visualization databases deep-learning excel machine-learning numpy pandas plotly predictive-modeling python3 r r-programming sql
Last synced: 27 Nov 2024
https://github.com/nragland37/event-optimization-tool
R-based Shiny application that maps availability and identifies optimal engagement times to enhance participation within an organization
data-analysis data-cleaning data-preparation heatmap r shiny shiny-app tidyverse
Last synced: 16 Nov 2024
https://github.com/baimamboukar/python_data_cleaning
Data cleaning automation for emails in csv and excel files
automation csv data-cleaning excel oop-principles python3
Last synced: 12 Nov 2024
https://github.com/sondosaabed/sp.top-data-science-and-analytics
This repository was created as part of the Data Science coursework at Birzeit University by Dr. Hussein Soboh
assignment birzeit-university course coursework data-analytics data-cleaning data-processing data-science data-visualization data-wrangling datasets eda exploratory-data-analysis linear-regression machine-learning numpy pandas python statistics storytelling
Last synced: 06 Nov 2024
https://github.com/sharyash81/sga
Soccer Game Analysis project
beautifulsoup data-cleaning data-summarization data-visualization django machine-learning mongodb pandas random-forest supervised-learning webscraping
Last synced: 10 Nov 2024
https://github.com/bgreenwell/bpa
Basic pattern analysis in R
basic-pattern-analysis data-cleaning r standardization
Last synced: 16 Oct 2024
https://github.com/nirmalnishant645/python-programming
Basic Python Programs
algorithms algorithms-and-data-structures algorithms-datastructures big-data data-analysis data-cleaning data-mining data-mining-algorithms data-science data-structure data-structures datastructures-algorithms geeksforgeeks geeksforgeeks-python geeksforgeeks-solutions hackerearth hackerearth-python hackerearth-solutions python python3
Last synced: 06 Dec 2024
https://github.com/easonlai/pii-data-scrubber
A demo repo showing how to use Azure Text Analytics to scrub personally identifiable information (PII) in Python (Jupyter Notebook). PII scrubbing is an important part of data wrangling and data cleaning.
azure azure-cognitive-services azure-text-analysis azure-text-analytics data-cleaning data-wrangling jupyter-notebook jypyter microsoft-azure microsoft-cognitive-services pandas pandas-dataframe pii-data pii-data-scrub pii-data-scrubber pii-data-scrubbing piidata python python3
Last synced: 10 Nov 2024
https://github.com/koolreport/cleandata
Clean your data before generating a report
data-clean data-cleaning mysql-reporting-tools php-reporting-tools reporting-engine
Last synced: 14 Nov 2024
https://github.com/byteplant/jquery-address-validator-net
jQuery plugin for the address-validator.net API
address address-autocomplete address-validation data-cleaning data-quality data-validation form-validation form-validation-jquery javascript javascript-library jquery validation
Last synced: 16 Nov 2024
https://github.com/kaustubhgupta/google-fit-data-analysis
This is the notebook code and the dataset for the Google Fit Analysis I did for Analytics Vidhya Blog.
data-cleaning data-visualization demo google plotly voila
Last synced: 29 Nov 2024
https://github.com/cgnorthcutt/reliablity_framework_for_rag
Demo showing how the Trustworthy Language Model adds reliability to LLM outputs and improves RAG, agents, and data enrichment workflows. It can be used to improve fine-tuning of LLMs, accuracy of LLM outputs, and smart routing for RAG and agents.
chatgpt data-cleaning data-curation data-observability data-quality llms observability rag
Last synced: 03 Dec 2024
https://github.com/akashjain04/apriorialgorithm
Usage of Apriori Algorithm to find frequent item sets.
apriori apriori-algorithm association-rule-mining association-rules data-cleaning data-mining frequent-itemsets frequent-pattern-mining python
Last synced: 24 Nov 2024