https://github.com/usk2003/income-testing-hypothesis
This repository analyzes data scientists' income using t-tests, providing insights into salary distributions and company ratings. Key features include data cleaning, statistical analysis, visualizations, and company suggestions for freshers. Practical advice helps guide career decisions effectively!
hypothesis-testing matplotlib pandas python statistical-analysis t-test
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/usk2003/income-testing-hypothesis
- Owner: usk2003
- Created: 2024-12-23T20:10:06.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-01-10T19:59:57.000Z (4 months ago)
- Last Synced: 2025-01-10T20:35:50.655Z (4 months ago)
- Topics: hypothesis-testing, matplotlib, pandas, python, statistical-analysis, t-test
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Scientist Income T-Test Hypothesis Analysis
This project analyzes data scientists' income and tests hypotheses about it using t-tests. The dataset contains salary information from various companies, and the aim is to provide actionable insights for freshers looking for job opportunities.
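
As a rough illustration of the kind of test involved, the sketch below runs a one-sample t-test of a sample mean against a population mean with `scipy.stats.ttest_1samp`, in two-tailed and one-tailed forms. The salary figures, the sample size, the seed, and the 0.05 significance level are assumptions made for the example; they are not taken from the project's dataset or code.

```python
import numpy as np
from scipy import stats

# Hypothetical yearly salaries in arbitrary units (not the project's data).
population = np.array([6.5, 7.2, 8.0, 9.1, 10.4, 11.0, 12.3, 13.5,
                       14.8, 16.0, 17.2, 18.5, 20.1, 22.4, 25.0])
population_mean = population.mean()

# Draw a reproducible random sample without replacement (size is an assumption).
rng = np.random.default_rng(seed=42)
sample = rng.choice(population, size=8, replace=False)

alpha = 0.05  # assumed significance level

# H0: the sample mean equals the population mean.
# "two-sided" is the two-tailed test; "greater" and "less" are the one-tailed tests.
results = {}
for alternative in ("two-sided", "greater", "less"):
    res = stats.ttest_1samp(sample, popmean=population_mean, alternative=alternative)
    results[alternative] = (res.statistic, res.pvalue)
    decision = "reject H0" if res.pvalue < alpha else "fail to reject H0"
    print(f"{alternative:>9}: t = {res.statistic:.3f}, p = {res.pvalue:.4f} -> {decision}")
```

The t statistic is the same in all three cases; only the p-value changes. The two one-tailed p-values sum to 1, and the two-tailed p-value is twice the smaller of them, which is a quick sanity check when comparing the results.
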
## Project Overview
1. **Data Cleaning**
   - Cleaning salary columns (`Average`, `Lowest`, `Highest`).
   - Filtering data based on frequency `/yr`.
   - Removing outliers using the IQR method.
2. **Statistical Analysis**
   - Calculation of population and sample statistics (mean, standard deviation).
   - Hypothesis testing:
     - Two-tailed t-test.
     - One-tailed t-tests (greater/less).
3. **Visualizations**
   - Normal distribution plots for population and sample salaries.
   - Scatter plots: Rating vs. Average Salary for population and sample.
4. **Company Suggestions**
   - Suggesting companies based on user-specified expected salary.
5. **Conclusions and Practical Advice**
   - Insights derived from hypothesis testing.
   - Tips for freshers choosing companies based on salary and ratings.
## Prerequisites
- Python 3.12
- Libraries: pandas, numpy, matplotlib, seaborn, scipy
## Key Features
- Data cleaning and preprocessing.
- Statistical analysis using t-tests.
- Visualizations for clear data interpretation.
- Company recommendations based on salary expectations.
## Visualizations
### 1. Normal Distribution Plot
A comparison of the population and sample average salary distributions.
### 2. Scatter Plot: Rating vs. Average Salary
Insights into how company ratings relate to salaries.
## Results and Conclusions
- Statistical tests reveal whether sample salaries significantly differ from the population mean.
- Practical advice is provided for freshers based on salary expectations and company ratings.
## Suggested Companies for Freshers
Enter your expected salary while the script is running to get a list of recommended companies that meet your salary criteria.
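
A minimal sketch of how such a recommendation step could work, assuming the cleaned data is a pandas DataFrame. The `Average` column name comes from the data-cleaning step described earlier; the `Company` and `Rating` column names, the `suggest_companies` helper, and all the figures are hypothetical, and the project's own script may filter or rank differently.

```python
import pandas as pd


def suggest_companies(df, expected_salary, salary_col="Average", rating_col="Rating"):
    """Return rows whose average salary meets the expectation, best rated first.

    Hypothetical helper: column names and ranking order are assumptions.
    """
    matches = df[df[salary_col] >= expected_salary]
    # Rank by rating, then by salary, both descending.
    ranked = matches.sort_values([rating_col, salary_col], ascending=False)
    return ranked.reset_index(drop=True)


# Made-up example data standing in for the cleaned dataset.
companies = pd.DataFrame({
    "Company": ["Alpha Analytics", "Beta Labs", "Gamma Data", "Delta AI"],
    "Rating": [4.2, 3.8, 4.5, 3.9],
    "Average": [900000, 650000, 1200000, 800000],
})

suggestions = suggest_companies(companies, expected_salary=800000)
print(suggestions.to_string(index=False))
```

In an interactive run, the threshold could be read with `float(input(...))` before calling the helper. An empty result simply means no company in the table meets the expected salary.
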
## Contact
For any questions or issues, feel free to reach out via email: [[email protected]]