Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zafir100100/cancer-stage-prediction
This code predicts cancer data using various regression models, calculates their average R-squared scores, and prints the best model.
- Host: GitHub
- URL: https://github.com/zafir100100/cancer-stage-prediction
- Owner: zafir100100
- Created: 2019-09-27T15:23:25.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-04-09T07:02:18.000Z (almost 2 years ago)
- Last Synced: 2024-11-11T02:22:02.602Z (2 months ago)
- Topics: cross-validation, data-analysis, data-preprocessing, decision-trees, gradient-boosting, linear-regression, machine-learning-algorithms, numpy, pandas, random-forest, regression, scikit-learn
- Language: Python
- Homepage:
- Size: 1.15 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Cancer-Stage-Prediction
The provided code performs a regression analysis on cancer data using different machine learning models.
First, the necessary libraries are imported and the data is read from a CSV file. The features and the target variable are then separated and converted to NumPy arrays for further analysis.
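The loading step above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the inline CSV text, the column names (`feature_a`, `feature_b`, `stage`), and the assumption that the target is the last column are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the repository's CSV file; the real file name
# and column names are not given in this README.
csv_text = """feature_a,feature_b,stage
1.0,10.0,1
2.0,20.0,2
3.0,30.0,3
4.0,40.0,4
"""

data = pd.read_csv(io.StringIO(csv_text))

# Assumption: the target is the last column and every other column is a feature.
X = data.iloc[:, :-1].to_numpy()
y = data.iloc[:, -1].to_numpy()

print(X.shape, y.shape)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the path to the CSV.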
Next, five regression models are initialized: ExtraTreesRegressor, RandomForestRegressor, DecisionTreeRegressor, LinearRegression, and GradientBoostingRegressor.
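One way to set up the five models is to keep them in a dictionary keyed by name, which makes it easy to loop over them later. The dictionary layout and the `random_state=0` arguments are assumptions added here for reproducibility; the README does not say how the models are configured.

```python
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Assumption: default hyperparameters, with a fixed seed where the estimator
# is randomized. LinearRegression is deterministic and takes no seed.
models = {
    "ExtraTreesRegressor": ExtraTreesRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "LinearRegression": LinearRegression(),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

for name in models:
    print(name)
```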
The KFold class from sklearn.model_selection is used to split the data into five folds for cross-validation. For each fold, the data is split into training and test sets, and each regression model is trained on the training set. The trained model then predicts the target variable for the test set, and the predictions are scored with the r2_score function from the sklearn.metrics module. The R-squared score (coefficient of determination) measures the proportion of variance in the target that the model explains: 1.0 is a perfect fit, 0.0 is no better than always predicting the mean of the target, and negative values are worse than that.
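The cross-validation loop for a single model might look like the sketch below. It uses synthetic data from `make_regression` in place of the cancer dataset, and the `shuffle=True` and `random_state=0` settings are assumptions; the README only states that five folds are used. The last lines recompute R-squared by hand for the final fold to show what `r2_score` measures.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Synthetic stand-in for the cancer data (assumption, for illustration only).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
model = LinearRegression()

fold_scores = []
for train_idx, test_idx in kfold.split(X):
    # Train on four folds, evaluate on the held-out fold.
    model.fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    fold_scores.append(r2_score(y[test_idx], predictions))

# R^2 = 1 - (residual sum of squares) / (total sum of squares),
# computed here for the last fold as a cross-check.
y_test = y[test_idx]
ss_res = np.sum((y_test - predictions) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot

print("R^2 per fold:", np.round(fold_scores, 3))
print("Manual R^2 for last fold:", round(float(manual_r2), 3))
```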
The r2_score for each model is averaged over all folds and printed to the console. All five models are trained and evaluated on the same dataset using the same evaluation metric, so their averages can be compared directly.
Finally, the results are displayed to the user, and the model with the highest average R-squared score is reported as the best.
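Putting the pieces together, the averaging and model selection could be sketched as follows. Again this is an illustration under stated assumptions rather than the repository's code: the data is synthetic, the seeds and `shuffle=True` are added for reproducibility, and the names `average_scores` and `best_model` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cancer data (assumption, for illustration only).
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

models = {
    "ExtraTreesRegressor": ExtraTreesRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "LinearRegression": LinearRegression(),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

# The same splitter (and seed) is reused so every model sees identical folds.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

average_scores = {}
for name, model in models.items():
    scores = []
    for train_idx, test_idx in kfold.split(X):
        model.fit(X[train_idx], y[train_idx])
        scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    average_scores[name] = float(np.mean(scores))
    print(f"{name}: average R^2 = {average_scores[name]:.3f}")

# The best model is the one with the highest average R^2 across folds.
best_model = max(average_scores, key=average_scores.get)
print(f"Best model: {best_model}")
```

Which model comes out on top depends entirely on the data, so the ranking on this synthetic example says nothing about the ranking on the cancer dataset.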