{"id":43318291,"url":"https://github.com/j143/heart-attack-analysis","last_synced_at":"2026-02-01T22:10:20.397Z","repository":{"id":293832534,"uuid":"976386715","full_name":"j143/heart-attack-analysis","owner":"j143","description":"heart attack analysis with Apache SystemDS","archived":false,"fork":false,"pushed_at":"2025-07-19T18:57:08.000Z","size":1401,"stargazers_count":0,"open_issues_count":5,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-19T21:48:48.064Z","etag":null,"topics":["apache-systemds","systemds"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/j143.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-02T02:48:24.000Z","updated_at":"2025-05-17T11:44:00.000Z","dependencies_parsed_at":"2025-05-17T12:43:15.638Z","dependency_job_id":null,"html_url":"https://github.com/j143/heart-attack-analysis","commit_stats":null,"previous_names":["j143/heart-attack-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/j143/heart-attack-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j143%2Fheart-attack-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j143%2Fheart-attack-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j143%2Fheart-attack-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j143%2Fheart-attack-analysis/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/j143","download_url":"https://codeload.github.com/j143/heart-attack-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j143%2Fheart-attack-analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28992737,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T22:01:47.507Z","status":"ssl_error","status_checked_at":"2026-02-01T21:58:37.335Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-systemds","systemds"],"created_at":"2026-02-01T22:10:20.342Z","updated_at":"2026-02-01T22:10:20.392Z","avatar_url":"https://github.com/j143.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Heart Attack Analysis and Prediction\n\n![Heart Attack Analysis](correlation_heatmap.png)\n\n## Heart Attack Analysis\n1.\tIntroduction\n A heart attack occurs when an artery supplying your heart with blood and oxygen becomes blocked. A blood clot can form and block your arteries, causing a heart attack. This analysis estimates a person's chance of a heart attack based on various health conditions.\n2.\tDataset\nThe dataset is Heart_Attack_Analysis_Data.csv. It has been added to this repository. 
\nThis dataset contains records for several hundred patients, with the following attributes: Age, Sex, Exercise-Induced Angina (1 = yes, 0 = no), Chest Pain Type (Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic), ECG Results, Blood Pressure, Cholesterol, Blood Sugar, Family History (number of affected family members), Maximum Heart Rate, and Target (0 = less chance, 1 = more chance).\n\n## Aim of the assignment\n\n•\tBuild a predictive model (which features decide heart attack?)\n•\tEvaluate the model\n•\tRefine the model, as appropriate\n\n## What do I need to do?\n\na)\tSelect a method for performing the analytic task\nb)\tPreprocess the data to enhance quality\nc)\tCarry out descriptive summarization of data and make observations\nd)\tIdentify relevant and irrelevant attributes for building the model\ne)\tPerform appropriate data transformations with justifications\nf)\tGenerate new features if needed\ng)\tCarry out the chosen analytic task. Show results including intermediate results, as needed\nh)\tEvaluate the solutions\ni)\tLook for refinement opportunities\n\n## Setup and Running Instructions\n\n### Prerequisites\n- Python 3.8+\n- Required packages listed in `requirements.txt`\n\n### Installation\n1. Clone the repository:\n   ```\n   git clone https://github.com/j143/heart-attack-analysis\n   cd heart-attack-analysis\n   ```\n\n2. Install dependencies:\n   ```\n   pip install -r requirements.txt\n   ```\n\n### Running the Analysis\n\n#### Complete Analysis Workflow\nTo run the entire analysis pipeline including model training, refinement, and evaluation:\n\n```\npython summary.py\n```\n\nThis will:\n- Check for required dependencies\n- Run model refinement if needed\n- Compare original models with refined models\n- Display a complete project summary\n\n#### Step-by-Step Analysis\n\n1. 
Run the original analysis with SystemDS:\n   ```\n   python heart_attack_systemds.py\n   ```\n   This performs the initial data analysis, trains logistic regression and L2SVM models, and saves the model weights.\n\n2. Verify saved models:\n   ```\n   python verify_models.py\n   ```\n   This script verifies that the saved models can be loaded and used for predictions.\n\n3. Run model refinement:\n   ```\n   python model_refinement.py\n   ```\n   This script performs hyperparameter tuning with cross-validation and creates an ensemble model.\n\n4. Compare model performance:\n   ```\n   python model_comparison.py\n   ```\n   This script compares the performance of the original models with the refined models.\n\n### Key Results\n\n- The original L2SVM achieved ~82% accuracy\n- After refinement, the Random Forest model achieved ~95% accuracy\n- Key features for predicting heart attacks based on different models:\n  - SystemDS Logistic Regression: Age, ECG Results, Sex, MaxHeartRate\n  - Refined Random Forest: CP_Type, MaxHeartRate, Age, Cholestrol, BloodPressure\n\nFor detailed information about the analysis process and results, please refer to the `solution.md` file.\n\n### Project Structure\n\n- `heart_attack_systemds.py`: Main analysis script using SystemDS\n- `verify_models.py`: Script to verify saved models\n- `model_refinement.py`: Implements hyperparameter tuning, cross-validation, and ensemble methods\n- `model_comparison.py`: Compares original and refined models\n- `summary.py`: Complete workflow script with project summary\n- `solution.md`: Detailed documentation of the approach and results\n- `Heart_Attack_Analysis_Data.csv`: Dataset\n- `requirements.txt`: List of required Python packages\n\n### Visualizations\n\nThe analysis generates several visualizations:\n\n- Correlation heatmap\n- Feature distributions\n- Feature importance plots\n- Model performance comparisons\n- ROC curves\n\n### Saved Models\n\nThe following models are saved during the analysis:\n\n- 
`logistic_regression_weights.pkl`: Original logistic regression model\n- `l2svm_weights.pkl`: Original L2SVM model\n- `scaler.pkl`: Data standardization parameters\n- `refined_random_forest_model.pkl`: Tuned Random Forest model\n- `refined_ensemble_model.pkl`: Ensemble of tuned models\n\n## Conclusion\n\nI applied machine learning techniques to heart attack analysis, using Apache SystemDS and scikit-learn.\n\nKey findings:\n1. Different models identified different important predictors:\n   - Initial models (Logistic Regression): Age, ECG Results, Sex, Maximum Heart Rate\n   - Refined models (Random Forest): CP_Type, Maximum Heart Rate, Age, Cholestrol, BloodPressure\n2. Hyperparameter tuning and cross-validation improved performance\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fj143%2Fheart-attack-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fj143%2Fheart-attack-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fj143%2Fheart-attack-analysis/lists"}