{"id":18402877,"url":"https://github.com/jancervenka/turbofan_failure","last_synced_at":"2025-07-14T02:36:08.767Z","repository":{"id":30926389,"uuid":"126408308","full_name":"jancervenka/turbofan_failure","owner":"jancervenka","description":"Aircraft engine failure prediction model","archived":false,"fork":false,"pushed_at":"2023-03-25T01:07:59.000Z","size":23986,"stargazers_count":29,"open_issues_count":3,"forks_count":13,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-04-07T07:36:42.328Z","etag":null,"topics":["lstm","prediction-model","python","scikit-learn","svm","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jancervenka.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-22T23:52:02.000Z","updated_at":"2025-03-31T05:21:29.000Z","dependencies_parsed_at":"2025-04-07T07:42:39.904Z","dependency_job_id":null,"html_url":"https://github.com/jancervenka/turbofan_failure","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jancervenka/turbofan_failure","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jancervenka%2Fturbofan_failure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jancervenka%2Fturbofan_failure/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jancervenka%2Fturbofan_failure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jancervenka%2Ft
urbofan_failure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jancervenka","download_url":"https://codeload.github.com/jancervenka/turbofan_failure/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jancervenka%2Fturbofan_failure/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265233753,"owners_count":23731825,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lstm","prediction-model","python","scikit-learn","svm","tensorflow"],"created_at":"2024-11-06T02:43:49.127Z","updated_at":"2025-07-14T02:36:08.746Z","avatar_url":"https://github.com/jancervenka.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Aircraft Engine Failure Prediction Model\r\n\r\n[0]: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan\r\n[1]: https://ieeexplore.ieee.org/document/4711423/\r\n[2]: https://ieeexplore.ieee.org/document/6678166/\r\n[3]: https://ieeexplore.ieee.org/document/4711414/\r\n\r\nThe task is to accurately predict the remaining useful life (RUL) of aircraft\r\nturbofan engines based on various sensor measurements (multivariate time series).\r\nThe RUL is defined as the number of engine cycles before failure.\r\n\r\nI tried to predict the RUL values for engine units in the FD004 dataset from the\r\n[Turbofan Engine Degradation Simulation Data Set][0] using two different models\r\n(an LSTM network and a support vector machine). 
My code can be found in the\r\n`turbofan.ipynb` file.\r\n\r\n## Data Preparation and Feature Engineering\r\n\r\nFirst, I computed the RUL value for each row in the dataset to get a dataframe in the\r\nfollowing form:\r\n\r\n| unit | cycle | sensor_1 | sensor_2 | sensor_n | RUL |\r\n|------|-------|----------|----------|----------|-----|\r\n| 1    | 1     | 0.2      | 30       | 0.9      | 192 |\r\n| 1    | 2     | 0.3      | 29       | 0.2      | 191 |\r\n\r\nEach row can be used as a model training sample where the `sensor_k` columns are\r\nthe features and the `RUL` is the model target. The rows are treated as independent\r\nobservations and the measurement trends from the previous cycles are ignored.\r\n\r\nAs recommended in [(1)][1], the features are normalized to `μ = 0, σ = 1`\r\nand PCA is applied.\r\n\r\nThis simplified approach is used to train the support vector machine model.\r\n\r\n### Samples as Time Series\r\n\r\nFor the LSTM model, I opted for more advanced feature engineering and chose to\r\nincorporate the trends from the previous cycles. In this case, each training sample\r\nconsists of measurements at cycle `i` as well as at cycles `i-5, i-10, i-20, i-30, i-40`.\r\n\r\nThe model input is a 3D tensor with shape `(n, 6, 24)` where `n` is the number of\r\ntraining samples, `6` is the number of cycles (timesteps), and `24` is the number\r\nof principal components (features).  
\r\n\r\n## LSTM Regressor\r\n\r\nAfter running a random search to optimize the hyperparameters and some experimentation,\r\nI settled on the following architecture:\r\n\r\n```\r\nModel: \"rlu_estimator\"\r\n_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #   \r\n=================================================================\r\ncomponents (InputLayer)      [(None, 6, 24)]           0         \r\n_________________________________________________________________\r\nlstm (LSTM)                  (None, 64)                22784     \r\n_________________________________________________________________\r\ndropout_lstm (Dropout)       (None, 64)                0         \r\n_________________________________________________________________\r\nhidden_0 (Dense)             (None, 64)                4160      \r\n_________________________________________________________________\r\ndropout_0 (Dropout)          (None, 64)                0         \r\n_________________________________________________________________\r\nhidden_1 (Dense)             (None, 64)                4160      \r\n_________________________________________________________________\r\ndropout_1 (Dropout)          (None, 64)                0         \r\n_________________________________________________________________\r\nhidden_2 (Dense)             (None, 64)                4160      \r\n_________________________________________________________________\r\ndropout_2 (Dropout)          (None, 64)                0         \r\n_________________________________________________________________\r\nrul_prediction (Dense)       (None, 1)                 65        \r\n=================================================================\r\nTotal params: 35,329\r\nTrainable params: 35,329\r\nNon-trainable params: 0\r\n```\r\n\r\nThe model uses L1L2 regularization and dropout layers to mitigate overfitting.\r\n\r\nI trained the model for 25 
epochs and used an annealing scheduler to decrease the\r\nlearning rate over time.\r\n\r\n![LSTM History](img/lstm_history.svg \"LSTM Training\")\r\n\r\n## Support Vector Machine\r\n\r\nThe use of a support vector machine (SVM) model is suggested in [(2)][2]. The authors\r\nrecommend using a non-linear radial basis function (RBF) kernel.\r\n\r\n## Results\r\n\r\nI chose three different metrics to assess the performance of the models: mean squared\r\nerror (MSE), median absolute error (MAE), and the Score as defined in [(3)][3].\r\nI modified the Score formula (11) in [(3)][3] by dividing the overall value by the\r\nnumber of testing samples (the definition in the paper contains an error, `a_1`\r\nand `a_2` should be switched in order to penalize RUL overshooting more heavily).\r\n\r\n| metric | LSTM    | SVM     |\r\n|--------|---------|---------|\r\n| MSE    | 4627    | 5894    |\r\n| MAE    | 35      | 45      |\r\n| Score  | 8.58e10 | 1.65e14 |\r\n\r\n![Comparison](img/comparison.svg \"Model Comparison\")\r\n\r\n## Conclusions\r\n\r\nOn all three metrics, the LSTM trained on the time-series features outperforms the SVM model. \r\nThe high Score values are caused by a few outliers with significant errors and the exponential\r\nnature of the formula in [(3)][3].\r\n\r\n## References\r\n* [(1) Data driven prognostics using a Kalman filter ensemble of neural network models][1]\r\n* [(2) PHM-Oriented Integrated Fusion Prognostics for Aircraft Engines Based on Sensor Data][2]\r\n* [(3) Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation][3]\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjancervenka%2Fturbofan_failure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjancervenka%2Fturbofan_failure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjancervenka%2Fturbofan_failure/lists"}