https://github.com/nobleknightt/dss-tec
Files for Data Science and Statistics TEC
data-science statistics
- Host: GitHub
- URL: https://github.com/nobleknightt/dss-tec
- Owner: nobleknightt
- Created: 2021-10-23T13:39:11.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-11-09T18:03:34.000Z (over 4 years ago)
- Last Synced: 2025-03-10T16:47:34.600Z (over 1 year ago)
- Topics: data-science, statistics
- Language: HTML
- Homepage:
- Size: 6.46 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Science and Statistics TEC
Files for Data Science and Statistics TEC
## Instructions
- [x] Import data into Jupyter Notebook
- [x] Apply `head()`, `tail()`, `shape`, `columns`, `dtypes`, `describe()` to the data
- [x] Apply `describe()` to the following columns
  - `Population growth (annual %)`
  - `GDP growth (annual %)`
  - `Inflation, GDP deflator (annual %)`
  - `Inflation, consumer prices (annual %)`
- [x] Plot a histogram, boxplot and density curve for each of the above 4 variables and interpret the plots
- [x] Test the null hypothesis that the means of `Inflation, consumer prices (annual %)` and `GDP growth (annual %)` are equal: find the mean of each variable, identify the appropriate test, conduct it and interpret the result
- [x] Test the null hypothesis that the means of `GDP growth (annual %)`, `Inflation, GDP deflator (annual %)`, `Inflation, consumer prices (annual %)` and `Population growth (annual %)` are equal: find the mean of each variable, identify the appropriate test, conduct it and interpret the result
- [x] Draw a scatter plot of `GDP growth (annual %)` and `Inflation, GDP deflator (annual %)` and interpret it
- [x] Calculate the correlation and covariance of the 4 variables listed above and interpret them
- [x] Consider the variables `Exports of goods and services (% of GDP)` and `Imports of goods and services (% of GDP)`. Analyze the 2 variables over time and perform trend analysis
- [x] Analyze `CO2 emissions (metric tons per capita)` using descriptive statistics, a line chart and trend analysis
- [x] Create a DataFrame of the following variables
  - `Population growth (annual %)`
  - `CO2 emissions (metric tons per capita)`
  - `GDP growth (annual %)`
  - `Industry (including construction), value added (% of GDP)`
  - `Exports of goods and services (% of GDP)`
  - `Imports of goods and services (% of GDP)`
  - `Gross capital formation (% of GDP)`
  - `Inflation, consumer prices (annual %)`
- [x] Build the following models with `GDP growth (annual %)` as the dependent variable. For each model, interpret R-squared, predict and calculate the residuals, then compare the models on RMSE
  - [x] Multiple Linear Regression
  - [x] Decision Tree Regression
  - [x] Random Forest Regression
- [x] Identify all Healthcare Indicators, create a DataFrame and analyze it using
  - [x] Descriptive Statistics
  - [x] Data Visualization
  - [x] Both a t-test and ANOVA: choose variables accordingly and interpret the test outcomes
- [x] Identify all Education Indicators, Create DataFrame and Analyze using
  - [x] Descriptive Statistics
  - [x] Data Visualization
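
The two hypothesis-test items in the checklist above could be approached roughly as follows. This is a minimal sketch, not the notebook's actual code: the data is synthetic (generated with NumPy, not the repository's dataset), the choice of Welch's t-test and one-way ANOVA is an assumption about which tests fit, and the 0.05 significance level is an assumed convention.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (NOT the repository's dataset): one row per year.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "GDP growth (annual %)": rng.normal(6.0, 2.0, 40),
    "Inflation, GDP deflator (annual %)": rng.normal(7.0, 3.0, 40),
    "Inflation, consumer prices (annual %)": rng.normal(7.5, 3.0, 40),
    "Population growth (annual %)": rng.normal(1.8, 0.3, 40),
})

# Mean of each variable, as the checklist asks.
means = df.mean()

# Two variables: Welch's t-test (does not assume equal variances).
t_stat, t_p = stats.ttest_ind(
    df["Inflation, consumer prices (annual %)"],
    df["GDP growth (annual %)"],
    equal_var=False,
)

# Four variables: one-way ANOVA across all four columns.
f_stat, f_p = stats.f_oneway(*(df[col] for col in df.columns))

alpha = 0.05  # assumed significance level
print(means.round(2))
print(f"t-test: t={t_stat:.3f}, p={t_p:.4f}, reject H0: {t_p < alpha}")
print(f"ANOVA:  F={f_stat:.3f}, p={f_p:.4f}, reject H0: {f_p < alpha}")
```

Because the observations are matched by year, a paired test (`scipy.stats.ttest_rel`) may be a better fit for the two-variable case. Which test is appropriate depends on the data, so treat the choice here as one option.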
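
The model-comparison item (Multiple Linear Regression, Decision Tree Regression, Random Forest Regression, compared on RMSE) could be sketched as below. Everything here is illustrative: the data is synthetic, only three of the listed predictors are used, and the train/test split and hyperparameters (`test_size`, `max_depth`, `n_estimators`) are assumptions rather than values taken from the repository.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the repository's dataset).
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "Population growth (annual %)": rng.normal(1.8, 0.3, n),
    "Exports of goods and services (% of GDP)": rng.normal(15.0, 4.0, n),
    "Gross capital formation (% of GDP)": rng.normal(30.0, 5.0, n),
})
# Hypothetical linear relationship plus noise, used only to have a target.
y = (
    0.2 * X["Gross capital formation (% of GDP)"]
    + 0.1 * X["Exports of goods and services (% of GDP)"]
    + rng.normal(0.0, 0.5, n)
).rename("GDP growth (annual %)")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(max_depth=4, random_state=0),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    residuals = y_test - predictions           # observed minus predicted
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    r2 = float(model.score(X_test, y_test))    # R-squared on held-out data
    results[name] = {"r2": r2, "rmse": rmse}

# Lower RMSE means smaller typical prediction error on the test set.
for name, metrics in sorted(results.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name}: R2={metrics['r2']:.3f}, RMSE={metrics['rmse']:.3f}")
```

R-squared and RMSE are computed on the held-out test split here so that the tree-based models are not rewarded for memorizing the training rows. Computing them on the full data is also common in coursework but tends to flatter the more flexible models.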