{"id":27934107,"url":"https://github.com/soham7998/data-analysis-projects","last_synced_at":"2026-05-04T07:34:55.361Z","repository":{"id":184876469,"uuid":"672613896","full_name":"soham7998/Data-Analysis-Projects","owner":"soham7998","description":"My Data Analysis Projects which are completed by me and gain a hands on Experience from each project. the project showcase different Concepts , Visualization and many things. ","archived":false,"fork":false,"pushed_at":"2024-05-03T18:47:40.000Z","size":8975,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-04T14:54:56.199Z","etag":null,"topics":["data","data-analysis","data-science","machine-learning","nlp","python","soham","visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/soham7998.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-30T17:15:49.000Z","updated_at":"2024-05-03T18:47:44.000Z","dependencies_parsed_at":"2024-02-19T16:01:29.815Z","dependency_job_id":"12c403a0-b1ee-418a-97de-a3f9dc733e47","html_url":"https://github.com/soham7998/Data-Analysis-Projects","commit_stats":null,"previous_names":["soham7998/data-analysis-projects"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soham7998%2FData-Analysis-Projects","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soham7998%2FData-Analysis-Projects/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soham7998%2FData-Analysis-Projects/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soham7998%2FData-Analysis-Projects/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/soham7998","download_url":"https://codeload.github.com/soham7998/Data-Analysis-Projects/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252820462,"owners_count":21809227,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-analysis","data-science","machine-learning","nlp","python","soham","visualization"],"created_at":"2025-05-07T05:27:03.412Z","updated_at":"2026-05-04T07:34:50.337Z","avatar_url":"https://github.com/soham7998.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data-Analysis-Projects\n# 1) Netflix EDA \n\n![NFLX](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/3b070b75-8717-4133-bed6-fdf62fd6a790)\n\nThis dataset \u0026 Repository consists of all Netflix original films released as of June 1st, 2021. Additionally, it also includes all Netflix documentaries and specials. The data was webscraped off of this Wikipedia page, which was then integrated with a dataset consisting of all of their corresponding IMDB scores. 
IMDB scores are voted on by community members, and the majority of the films have 1,000+ reviews.\nThe dataset consists of:\nTitle\nGenre\nPremiere date, IMDB scores\nRuntime, Languages\n\n# 2) Football EDA\nThis project explores football data through a range of activities, including exploratory data analysis, data visualization, and other topics. It consists mainly of Jupyter Notebooks written in Python.\n\n# 3) Twitter Sentiment Analysis\nThis is a natural language processing problem in which tweets are classified as positive or negative using classification, text mining, text analysis, data analysis, and data visualization.\n\n![1693288099245](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/594eca24-9c82-4a95-8642-c283d2155d71)\n\n# 4) Power BI Dashboard \n**Power BI Sales Dashboard for Global Super Store**\n• The project involves creating an interactive Power BI Sales Dashboard using Global_super_store sales data.\n\n• The ETL process was performed to clean and transform the data using Power Query.\n\n• DAX was used for creating calculated measures and calculated columns.\n\n• Visualizations and reports were created using cards, charts and slicers to provide insights and easy understanding for end users.\n\n• The tools used were Microsoft Power BI and MS Excel.\n\n![Super Sales Dashboard](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/ebb4f8e1-e1d2-4ee8-a70f-7c40c9aa1e49)\n\n# 5) Data Science EDA \n\n**The Data Science Job Salaries dataset contains 11 columns, including:**\n\n• work_year: The year the salary was paid.\n\n• experience_level: The experience level in the job during the year.\n\n• employment_type: The type of employment for the role.\n\n• job_title: The role worked in during the year.\n\n• salary: The total gross salary amount paid.\n\n• employee_residence: Employee's primary country of residence during the work year 
as an ISO 3166 country code.\n\n• remote_ratio: The overall amount of work done remotely.\n\n• company_location: The country of the employer's main office or contracting branch.\n\n• company_size: The median number of people that worked for the company during the year.\n\n![ds](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/fd6cf30d-5895-441b-b555-a5ab5a4fab3f)\n\n\n# 6) IPL Data Analysis Using Apache Spark\nHere is what I covered:\n\n• Basics of Apache Spark (architecture, transformations, actions, lazy evaluation)\n\n• Creating a Databricks account and learning its basics\n\n• The Structured API and how to write transformation functions\n\n• Using SQL to analyze IPL data\n\n• Building visualizations to gain more insights \n\nThe goal of this project is to give you an overall understanding of Apache Spark and its different functions for writing transformation blocks. On top of that, you will learn to use SQL to analyze data and build visualizations.\n\n![Screenshot 2024-05-03 162548](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/74102fef-8da2-48f0-b962-c65a33b4a4af)\n\n# 7) Loan Eligibility Prediction \n**Data Loading and Exploration:**\nImported necessary libraries and loaded the dataset from a CSV file.\nExplored the dataset with head(), info(), shape, and describe() to understand its structure and summary statistics.\n\nIdentified missing values using isnull().sum().\nFilled missing values in categorical columns (e.g., Gender, Married) with the mode, and in numerical columns (e.g., LoanAmount, Loan_Amount_Term) with the mean or mode as appropriate.\n\n**Feature Engineering:**\nCreated new features such as TotalIncome by summing ApplicantIncome and CoapplicantIncome.\nTransformed skewed data using a logarithmic transformation (LoanAmount_log and TotalIncome_log).\n\n**Data Visualization:**\nUsed histograms and boxplots to visualize the distribution of ApplicantIncome, CoapplicantIncome, LoanAmount, and their logarithmic 
transformations.\nExamined the relationship between Credit_History and Loan_Status using cross-tabulation.\n\n**Data Preparation:**\nSelected relevant features for model training and separated the target variable (Loan_Status).\nSplit the data into training and testing sets using train_test_split.\nEncoded categorical variables into numerical values using LabelEncoder.\n\n**Model Training and Evaluation:**\nApplied the Naive Bayes Classifier to train the model on the training set.\nEvaluated the model's performance on the test set, likely calculating metrics such as accuracy, precision, recall, and F1-score (though the evaluation part isn't explicitly mentioned in the provided code).\n\n![image](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/caab9cf9-55ac-4fb1-9283-429d16a06000)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoham7998%2Fdata-analysis-projects","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsoham7998%2Fdata-analysis-projects","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoham7998%2Fdata-analysis-projects/lists"}