{"id":21055358,"url":"https://github.com/tusharpandey003/data-science","last_synced_at":"2025-03-13T23:44:53.731Z","repository":{"id":237883207,"uuid":"795419622","full_name":"tusharpandey003/Data-Science","owner":"tusharpandey003","description":"Data science include Data Analysis, Machine learning  , EDA,PCA and Data Structure and Algorithms","archived":false,"fork":false,"pushed_at":"2024-05-22T10:54:20.000Z","size":19051,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-20T19:14:47.821Z","etag":null,"topics":["algorithms","algorithms-and-data-structures","data-analysis","data-analytics","data-cleaning","data-science","data-structures","data-visualization","dsa","kmeans-clustering","machine-learning","outlier-detection","pca","pca-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tusharpandey003.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-03T08:38:59.000Z","updated_at":"2024-05-22T10:54:23.000Z","dependencies_parsed_at":"2024-05-03T13:27:54.002Z","dependency_job_id":"f1f58f31-7187-4af2-999d-08af6b128014","html_url":"https://github.com/tusharpandey003/Data-Science","commit_stats":null,"previous_names":["tusharpandey003/data-science"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tusharpandey003%2FData-Science","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t
usharpandey003%2FData-Science/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tusharpandey003%2FData-Science/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tusharpandey003%2FData-Science/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tusharpandey003","download_url":"https://codeload.github.com/tusharpandey003/Data-Science/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243500781,"owners_count":20300771,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","algorithms-and-data-structures","data-analysis","data-analytics","data-cleaning","data-science","data-structures","data-visualization","dsa","kmeans-clustering","machine-learning","outlier-detection","pca","pca-analysis"],"created_at":"2024-11-19T16:44:07.956Z","updated_at":"2025-03-13T23:44:53.696Z","avatar_url":"https://github.com/tusharpandey003.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Basics of Data-Science\n\n\nDS-1 --- Data Analysis and Visualisation with NumPy, Pandas, and Matplotlib\n\nThis repository is a collection of introductory data science notebooks. The first focuses on data analysis with the Python libraries Pandas and NumPy. 
\nThe repository includes a folder named “Data Analysis with Pandas and NumPy”, which contains a Jupyter notebook dedicated to basic data visualization.\n\nThis notebook demonstrates how to use NumPy and Pandas for data manipulation, and Matplotlib’s Pyplot for data visualization.\nNumPy is used for numerical computing: it provides multi-dimensional arrays and matrices, along with a large collection of mathematical functions that operate on them.\nPandas is used for data manipulation and analysis, offering data structures and operations for working with numerical tables and time series.\n\nPyplot is the collection of functions in Matplotlib that provides a MATLAB-like interface for making plots and charts.\nThe notebook shows how to create several types of plots, including line plots, scatter plots, bar plots, and histograms. It also shows how to customize these plots by adding labels, titles, and legends, and by adjusting the color and size of plot elements.\n\n\n\n\\\\\\\\\\\\\\\\\\\\\\\\\\\n\n\nDS-2 --- Exploratory Data Analysis for Time Series Machine Learning Data\n\nThis notebook walks through preparing time series data for machine learning. 
It starts by loading the time series data and the corresponding labels and concatenating them into a single dataset.\n\nThe data preparation phase involves four steps:\n\nRemoving Redundant Columns: We drop columns that do not contribute to the model’s performance.\n\nDuplicate Removal: We remove duplicate rows so that each data point is unique.\n\nHandling Missing Values: We fill gaps in the dataset by imputation or drop the incomplete records, to maintain data integrity.\n\nData Type Conversion: Each feature is cast to its appropriate data type to simplify subsequent analysis.\n\nWith the data prepared, we move on to exploratory data analysis (EDA):\n\nAnomaly and Outlier Detection: Using plots, we identify and handle anomalies and outliers that could skew what the model learns.\n\nAutocorrelation and Stationarity Checks: Because both properties matter for time series forecasting, we assess the data’s autocorrelation and stationarity and apply transformations where necessary.\n\nFinally, we focus on preprocessing and dimensionality reduction:\n\nPreprocessing: We standardize and normalize the features so that they share a common scale.\n\nDimensionality Reduction: We apply techniques such as PCA to project the data onto the directions of greatest variance, so the model learns from the most significant features.\n\nThis notebook serves as a guide for transforming raw time series data into a clean form, ready for building robust machine learning models.\n\n\n\\\\\\\\\\\\\\\\\\\\\\\\\\\n\n\nDS-3 --- Wheat Variety Clustering and Dimensionality Reduction with KMeans and Principal Component Analysis\n\nThis notebook analyzes a dataset containing three wheat varieties: Kama, Rosa, and Canadian. 
Our objective is to group the samples by variety and to reduce the dimensionality of the dataset, which can improve the performance of machine learning models.\n\nNotebook Contents:\n\nGraphical Plots: Initial exploratory data analysis with visualizations of the distribution and characteristics of the Kama, Rosa, and Canadian wheat varieties.\n\nKMeans Clustering: The KMeans clustering algorithm groups the wheat data into clusters to reveal inherent patterns and similarities among the varieties.\n\nPCA for Dimensionality Reduction: Principal Component Analysis reduces the number of variables in the dataset while preserving most of its variance, which simplifies the dataset.\n\nProcess Overview:\n\nData Visualization: We begin by plotting the wheat varieties to understand their distribution and spot any apparent groupings.\n\nClustering Analysis: Using KMeans, we partition the dataset into clusters, each representing a potential wheat variety.\n\nDimensionality Reduction: PCA transforms the data into a lower-dimensional space that is easier to analyze and visualize.\n\nOutcome: The result is a structured approach to grouping wheat varieties and a reduced feature set that keeps the core characteristics needed for accurate machine learning predictions.\n\n\\\\\\\\\\\\\\\\\\\\\\\\\\\n\nDS-4 --- Business Analytics\n\nBusiness Analytics is a Jupyter notebook that works with business transaction data from 2009 to 2010.\n\nData info:\n\nColumns:\n\nInvoiceNo - Invoice number. Nominal. A 6-digit integer uniquely assigned to each transaction.\n\nStockCode - Product (item) code. Nominal. A 5-digit integer uniquely assigned to each distinct product.\n\nDescription - Product (item) name. Nominal.\n\nQuantity - The quantity of each product (item) per transaction. Numeric.\n\nInvoiceDate - Invoice date and time. Numeric. The day and time when a transaction was generated. 
\n\nUnitPrice - Unit price. Numeric. Product price per unit in sterling (£).\n\nCustomerID - Customer number. Nominal. A 5-digit integer uniquely assigned to each customer.\n\nCountry - Country name. Nominal. The name of the country where a customer resides.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftusharpandey003%2Fdata-science","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftusharpandey003%2Fdata-science","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftusharpandey003%2Fdata-science/lists"}