{"id":29193354,"url":"https://github.com/jktujq/lumenn","last_synced_at":"2025-07-02T02:09:10.227Z","repository":{"id":281067610,"uuid":"938346651","full_name":"JktuJQ/LumeNN","owner":"JktuJQ","description":"LumeNN is an application that solves problem of binary and multiclass classification of stars with variable luminosity with the usage of different machine learning models.","archived":false,"fork":false,"pushed_at":"2025-06-24T22:48:59.000Z","size":18564,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-24T23:33:09.921Z","etag":null,"topics":["astronomy","classifiers","machine-learning","maths","neural-networks","python","scikit-learn","torch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JktuJQ.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-24T20:09:11.000Z","updated_at":"2025-06-24T22:49:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"13d995b6-cc11-4807-b844-a4fb5dee71f7","html_url":"https://github.com/JktuJQ/LumeNN","commit_stats":null,"previous_names":["jktujq/lumenn"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/JktuJQ/LumeNN","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JktuJQ%2FLumeNN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JktuJQ%2FLumeNN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JktuJQ%2FLumeNN/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JktuJQ%2FLumeNN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JktuJQ","download_url":"https://codeload.github.com/JktuJQ/LumeNN/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JktuJQ%2FLumeNN/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263061402,"owners_count":23407606,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["astronomy","classifiers","machine-learning","maths","neural-networks","python","scikit-learn","torch"],"created_at":"2025-07-02T02:09:09.515Z","updated_at":"2025-07-02T02:09:10.209Z","avatar_url":"https://github.com/JktuJQ.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🌟**LumeNN**🌟\n\n**LumeNN** is an application that addresses the problem of **binary** and **multiclass** classification\nof variable stars using various machine learning models.\n\nBecause the app is under active development, this README may be slightly outdated:\nit may not list every available classifier, and\nthe reported metrics may not reflect the best configuration found for each model.\nNevertheless, the research itself remains valid\nand is a valuable part of the repository.\n\n## Relevance of the Study\n\nGiven the significant accumulation of astronomical observation data,\nwhich holds great value for astronomy and astrophysics,\nthere is a need to develop a method for efficiently and accurately identifying variable 
stars\namong these celestial objects.\nThis tool can help scientists avoid manually verifying all observations and\ninstead focus only on the selected candidates.\n\n## Dataset\n\nThe dataset was obtained by merging catalogs from [APASS](https://www.aavso.org/apass)\nand [GALEX](https://galex.stsci.edu/GR6/)\nusing [X-Match](http://cdsxmatch.u-strasbg.fr/)\nand filtering the results with [VSX](https://www.aavso.org/vsx/).\n\n## Binary Classification\n\n### Data Processing\n\nWhen processing the data, the following features of celestial objects must be considered:\n\n- `RAJ2000` — Right Ascension in the J2000 epoch (in degrees)\n- `DEJ2000` — Declination in the J2000 epoch (in degrees)\n- `nobs` — Number of observations\n- `Vmag` — Apparent magnitude in the V-band (optical range)\n- `e_Vmag` — Measurement error of `Vmag`\n- `Bmag` — Apparent magnitude in the B-band (blue range)\n- `e_Bmag` — Measurement error of `Bmag`\n- `gpmag` — Apparent magnitude in the Sloan g' band (APASS photometry)\n- `e_gpmag` — Measurement error of `gpmag`\n- `rpmag` — Apparent magnitude in the Sloan r' band (red range, APASS photometry)\n- `e_rpmag` — Measurement error of `rpmag`\n- `ipmag` — Apparent magnitude in the Sloan i' band (near-infrared range)\n- `e_ipmag` — Measurement error of `ipmag`\n- `fuv_mag` — Apparent magnitude in the far-ultraviolet range\n- `nuv_mag` — Apparent magnitude in the near-ultraviolet range\n- `err` — Generalized measurement error\n- `present` — Variability flag (0 = non-variable, 1 = variable)\n- `type` — Type of variability\n- `min_mag` — Minimum apparent magnitude during the variability cycle\n- `max_mag` — Maximum apparent magnitude during the variability cycle\n\nThe columns `present` and `type` are used for binary and multiclass classification, respectively.\nFor binary classification, the `type` column is dropped.\n\n### Correlation Analysis\n\nThe correlation between features and stellar variability is as follows:\n\n|           | variable  |  
\n|:----------|:---------:|  \n| `RAJ2000` | -0.007308 |  \n| `DEJ2000` | -0.012249 |  \n| `nobs`    | -0.002505 |  \n| `Vmag`    | 0.029900  |  \n| `e_Vmag`  | 0.133351  |  \n| `Bmag`    | 0.028588  |  \n| `e_Bmag`  | 0.111372  |  \n| `gpmag`   | 0.029341  |  \n| `e_gpmag` | 0.077565  |  \n| `rpmag`   | 0.029131  |  \n| `e_rpmag` | 0.102294  |  \n| `ipmag`   | 0.026696  |  \n| `e_ipmag` | 0.031454  |  \n| `fuv_mag` | -0.069927 |  \n| `nuv_mag` | 0.041075  |  \n| `min_mag` | 0.012656  |  \n| `max_mag` | -0.014640 |  \n| `err`     | 0.091020  |  \n\n![Binary classification correlation matrix](binary_classification/docs/images/correlation_matrix.png)\n\nThe table shows that error-related columns have the strongest influence.\nOur interpretation is that variability is directly linked to changes in apparent magnitude,\nwhich are recorded as measurement errors.\nHere, the error represents the standard deviation from the expected value,\nand larger errors likely indicate stronger variability (with less influence from measurement inaccuracies).\n\nThe influence of `RAJ2000` and `DEJ2000` (stellar coordinates) is debatable —\nwhile they theoretically should not affect variability,\nthe correlation suggests otherwise (our research indicates models trained on full data perform slightly better).\n\n### Class Imbalance\n\nThe dataset suffers from class imbalance:\n\n![Class distribution in the dataset](binary_classification/docs/images/variable_ratio.png)\n\nThis issue was addressed during the study using **class weighting**,\n**oversampling** with SMOTE and **undersampling**.\n\n### Research\n\nThe binary classification problem was tackled using both built-in [`scikit-learn`](https://scikit-learn.org/stable/)\nmodels\nand neural networks based on [`keras`](https://keras.io/) and [`tensorflow`](https://www.tensorflow.org/).\n\nThe evaluation metrics included **accuracy**, **precision**, **recall**, and **F1-score**.\nThe goal was to identify variable stars, which are rare in 
the dataset.\nWhile maximizing **recall** (avoiding missed detections) was important,\nmaintaining a reasonable **F1-score** (balancing precision and recall) was also prioritized.\n\n#### Built-in `scikit-learn` Models\n\n##### Logistic Regression (Class Weighting)\n\n![Confusion matrix for logistic regression](binary_classification/docs/images/cm_logistic_regression.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.607   |   0.136   | 0.547  |  0.218   |\n\n##### SVC (Oversampling)\n\n![Confusion matrix for SVC](binary_classification/docs/images/cm_svc.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.828   |   0.364   | 0.920  |  0.522   |\n\n##### K-Nearest Neighbors (No balancing)\n\n![Confusion matrix for KNN](binary_classification/docs/images/cm_knn_nobalancing.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.930   |   0.722   | 0.483  |  0.581   |\n\n##### K-Nearest Neighbors (Oversampling)\n\n![Confusion matrix for KNN](binary_classification/docs/images/cm_knn.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.830   |   0.342   | 0.830  |  0.484   |\n\n##### Random Forest (`max_depth=11`; Class Weighting)\n\n![Confusion matrix for random forest](binary_classification/docs/images/cm_random_forest.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.886   |   0.465   | 0.868  |  0.606   |  \n\n##### SGD (`modified_huber` Loss; Class Weighting)\n\n![Confusion matrix for sgd](binary_classification/docs/images/cm_sgd.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.887   |   0.241   | 0.048  |  0.081   |\n\n##### Gradient Boosting (`max_depth=13`; Undersampling)\n\n![Confusion matrix for gradient 
boosting](binary_classification/docs/images/cm_gradient_boosting_undersampling.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.879   |   0.451   | 0.991  |  0.620   |  \n\nWith undersampling, Gradient Boosting achieves the highest **recall** of all tested models.\n\n##### Gradient Boosting (`max_depth=13`; Oversampling)\n\n![Confusion matrix for gradient boosting](binary_classification/docs/images/cm_gradient_boosting_oversampling.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.943   |   0.710   | 0.725  |  0.717   |\n\nWith SMOTE oversampling, Gradient Boosting trades recall for much higher **precision** and **F1-score**.\n\n##### Stacking (`Gradient Boosting` + `Random Forest` + `Logistic Regression`, Class Weighting)\n\n![Confusion matrix for stacking](binary_classification/docs/images/cm_stacking.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.949   |   0.831   | 0.620  |  0.710   |  \n\nStacking works well, but fitting takes an extremely long time.\n\n##### Multi-layer Perceptron (`hidden_layer_sizes=(100, 50, 20, 10), activation=\"tanh\"`; No balancing)\n\n![Confusion matrix for MLP](binary_classification/docs/images/cm_mlp.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.955   |   0.819   | 0.737  |  0.776   |\n\n#### Neural Networks\n\n##### Neural Network Emulating Logistic Regression (Class Weighting)\n\n**Architecture:**\n\n|  Layer 1  |  \n|:---------:|  \n| 1 neuron  |  \n| `sigmoid` |  \n\n**Hyperparameters:**\n\n| Epochs | Optimizer |  Learning Rate Schedule (`ExponentialDecay`)   | Loss (`BinaryFocalCrossentropy`) |  \n|:------:|:---------:|:----------------------------------------------:|:--------------------------------:|  \n|   50   |   Adam    | `1e-2`, `decay_steps=15000`, `decay_rate=0.01` |     `alpha=0.9`, `gamma=1.0`     |  \n\n![Confusion matrix 
for neural network that emulates logistic regression](binary_classification/docs/images/cm_nn_emulating_logistic_regression.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.661   |   0.149   | 0.493  |  0.229   |\n\n##### Best Neural Network Classifier (Class Weighting)\n\n**Architecture:**\n\n|   Layer 1    |    Layer 2    |  Layer 3  |  \n|:------------:|:-------------:|:---------:|  \n| 1024 neurons |  128 neurons  | 1 neuron  |  \n|    `mish`    | `hard_shrink` | `sigmoid` |  \n\n**Hyperparameters:**\n\n| Epochs | Optimizer |  Learning Rate Schedule (`ExponentialDecay`)   | Loss (`BinaryFocalCrossentropy`) |  \n|:------:|:---------:|:----------------------------------------------:|:--------------------------------:|  \n|   24   |   Adam    | `1e-2`, `decay_steps=15000`, `decay_rate=0.01` |     `alpha=0.9`, `gamma=1.0`     |  \n\n![Confusion matrix for best neural network classifier](binary_classification/docs/images/cm_nnclassifier.png)\n\n| Accuracy | Precision | Recall | F1-score |  \n|:--------:|:---------:|:------:|:--------:|  \n|  0.895   |   0.496   | 0.916  |  0.643   |  \n\n###### Note\n\nDuring training, we observed convergence issues due to initial weight sensitivity.\nThe model’s performance varied significantly based on initialization,\nranging from trivial predictions (all 0 or all 1) to the best performance.\nThe weights from a successful training run (first 8 epochs) were [saved](datasets/best_weights.keras)\nand are loaded when running the application.\n\n### Results\n\nThe full comparison of models is summarized below:\n\n| Model                                             | Accuracy | Precision |  Recall   | F1-score  |  \n|:--------------------------------------------------|:--------:|:---------:|:---------:|:---------:|  \n| Logistic Regression                               |  0.607   |   0.136   |   0.547   |   0.218   |\n| SVC                                               |  
0.828   |   0.364   |   0.920   |   0.522   |\n| KNN (No balancing)                                |  0.930   |   0.722   |   0.483   |   0.581   |\n| KNN (Oversampling)                                |  0.830   |   0.342   |   0.830   |   0.484   |\n| Random Forest (`max_depth=11`)                    |  0.886   |   0.465   |   0.868   |   0.606   |  \n| SGD (`modified_huber`)                            |  0.887   |   0.241   |   0.048   |   0.081   |  \n| Gradient Boosting (`max_depth=13`, Undersampling) |  0.879   |   0.451   | **0.991** |   0.620   |\n| Gradient Boosting (`max_depth=13`, Oversampling)  |  0.943   |   0.710   |   0.725   |   0.717   |\n| Stacking                                          |  0.949   | **0.831** |   0.620   |   0.710   |  \n| MLP                                               |  0.955   |   0.819   |   0.737   | **0.776** |\n|                                                   |          |           |           |           |\n| Neural Network (Logistic Regression)              |  0.661   |   0.149   |   0.493   |   0.229   |  \n| Neural Network                                    |  0.895   |   0.496   |   0.916   |   0.643   |  \n\nAmong the `scikit-learn` models, the **Stacking classifier**, **Gradient Boosting classifier** (with undersampling) and **MLP classifier**\nachieved the best **precision**, **recall** and **F1-score**, respectively.\n\nThe neural network results are competitive with the other classifiers. This is notable because\nour configuration is fairly simple, so a different,\nmore carefully tuned neural network could plausibly outperform all `scikit-learn` classifiers.\nThis hypothesis is further supported by the fact that the **MLP classifier**, itself a neural network, is arguably the best classifier\noverall.\n\n## Multiclass Classification\n\n### Data\n\nThe dataset is the same as before, but the `type` column is now used as the target.\n\nAfter grouping, the types of variable stars are:\n\n- `UNKNOWN` - stars with variability that cannot be 
confidently classified into known categories or with no variability\n  at all\n- `ECLIPSING` - binary star systems where one star periodically passes in front of the other, causing detectable dips in\n  brightness\n- `CEPHEIDS` - pulsating variable stars with a precise period-luminosity relationship, used as \"standard candles\" in\n  astronomy\n- `RR_LYRAE` - short-period pulsating stars found in globular clusters, with periods \u003c 1 day and lower luminosity\n  than Cepheids\n- `DELTA_SCUTI_ETC` - delta Scuti stars and similar pulsating variables with short periods (hours) and small amplitude\n  changes\n- `LONG_PERIOD` - stars with variability cycles spanning months to years\n- `ROTATIONAL` - variability caused by starspots or non-uniform surface brightness due to rapid rotation\n- `ERUPTIVE` - irregular brightness changes due to flares or mass ejections\n- `CATACLYSMIC` - cataclysmic variables with sudden outbursts, often in binary systems\n- `EMISSION_WR` - Wolf-Rayet stars with strong emission lines from stellar winds\n\nThe `UNKNOWN` category was removed because the goal here is to classify only the types of actual variable stars;\n`ERUPTIVE` and `EMISSION_WR` were removed because the dataset contains no data for them.\n\nHere is the correlation matrix for the multiclass dataset:\n\n![Correlation matrix](multiclass_classification/docs/images/correlation_matrix.png)\n\n#### Class Imbalance\n\nOnce again, the classes are imbalanced:\n\n![Class imbalance](multiclass_classification/docs/images/variable_ratio.png)\n\nAlthough `CEPHEIDS` and `CATACLYSMIC` stars are almost non-existent in the dataset,\nclassifying them correctly is an interesting challenge.\n\n**Class weighting**, **oversampling** with SMOTE and **undersampling**\nwere all considered during the study,\nbut the results reported below use only **class weighting**.\n\n### Research\n\nThe multiclass classification problem was tackled with built-in 
[`scikit-learn`](https://scikit-learn.org/stable/) models,\nwhich handle multiclass targets automatically (natively or through a built-in one-vs-one or one-vs-rest strategy, depending on the estimator).\n\nThe models were evaluated using **precision**, **recall**, and **F1-score** for each class,\nalong with their weighted averages (the last, unlabeled row of each table).\n\nOn the confusion matrices, classes are labeled by a numeric id.\nThis id does not necessarily match the id in the correlation matrix image or the position in the list of types above;\nit corresponds to the position of the class in the metrics table.\n\n#### Logistic Regression\n\n![Confusion matrix for logistic regression](multiclass_classification/docs/images/cm_logistic_regression.png)\n\n| Class             | Precision | Recall | F1-Score | Support |  \n|:------------------|:---------:|:------:|:--------:|:-------:|  \n| `ECLIPSING`       |   0.53    |  0.25  |   0.34   |   280   |  \n| `CEPHEIDS`        |   0.01    |  0.33  |   0.01   |    3    |  \n| `RR_LYRAE`        |   0.46    |  0.54  |   0.50   |   125   |  \n| `DELTA_SCUTI_ETC` |   0.76    |  0.83  |   0.80   |   761   |  \n| `LONG_PERIOD`     |   0.12    |  0.47  |   0.20   |   17    |  \n| `ROTATIONAL`      |   0.85    |  0.40  |   0.54   |   528   |  \n| `CATACLYSMIC`     |   0.02    |  1.00  |   0.03   |    2    |  \n|                   |   0.72    |  0.58  |   0.61   |  1716   |  \n\n#### SVC\n\n![Confusion matrix for svc](multiclass_classification/docs/images/cm_svc.png)\n\n| Class             | Precision | Recall | F1-Score | Support |  \n|:------------------|:---------:|:------:|:--------:|:-------:|  \n| `ECLIPSING`       |   0.52    |  0.19  |   0.27   |   262   |  \n| `CEPHEIDS`        |   0.02    |  0.80  |   0.04   |    5    |  \n| `RR_LYRAE`        |   0.35    |  0.27  |   0.31   |   124   |  \n| `DELTA_SCUTI_ETC` |   0.65    |  0.50  |   0.57   |   786   |  \n| `LONG_PERIOD`     |   0.06    |  0.58  |   0.10   |   12    |  \n| `ROTATIONAL`      |   0.41    |  0.22  |   0.29   |   524   |  \n| `CATACLYSMIC`     |   0.00    |  0.33  |   0.01   |    3    |  \n|                   |   0.53  
  |  0.35  |   0.41   |  1716   |\n\n#### K-Nearest Neighbors\n\n![Confusion matrix for knn](multiclass_classification/docs/images/cm_knn.png)\n\n| Class             | Precision | Recall | F1-Score | Support |  \n|:------------------|:---------:|:------:|:--------:|:-------:|  \n| `ECLIPSING`       |   0.40    |  0.43  |   0.41   |   268   |  \n| `CEPHEIDS`        |   0.00    |  0.00  |   0.00   |    4    |  \n| `RR_LYRAE`        |   0.52    |  0.31  |   0.39   |   162   |  \n| `DELTA_SCUTI_ETC` |   0.66    |  0.89  |   0.76   |   742   |  \n| `LONG_PERIOD`     |   0.00    |  0.00  |   0.00   |   13    |  \n| `ROTATIONAL`      |   0.69    |  0.44  |   0.54   |   526   |  \n| `CATACLYSMIC`     |   0.00    |  0.00  |   0.00   |    1    |  \n|                   |   0.61    |  0.62  |   0.59   |  1716   |\n\n#### Random Forest\n\n![Confusion matrix for random forest](multiclass_classification/docs/images/cm_random_forest.png)\n\n| Class             | Precision | Recall | F1-Score | Support |  \n|:------------------|:---------:|:------:|:--------:|:-------:|  \n| `ECLIPSING`       |   0.74    |  0.61  |   0.67   |   268   |  \n| `CEPHEIDS`        |   0.50    |  0.25  |   0.33   |    4    |  \n| `RR_LYRAE`        |   0.77    |  0.81  |   0.79   |   140   |  \n| `DELTA_SCUTI_ETC` |   0.88    |  0.94  |   0.91   |   760   |  \n| `LONG_PERIOD`     |   0.83    |  0.50  |   0.62   |   10    |  \n| `ROTATIONAL`      |   0.89    |  0.88  |   0.89   |   530   |  \n| `CATACLYSMIC`     |   1.00    |  0.25  |   0.40   |    4    |  \n|                   |   0.85    |  0.85  |   0.85   |  1716   |\n\n#### SGD\n\n![Confusion matrix for sgd](multiclass_classification/docs/images/cm_sgd.png)\n\n| Class             | Precision | Recall | F1-Score | Support |  \n|:------------------|:---------:|:------:|:--------:|:-------:|  \n| `ECLIPSING`       |   0.37    |  0.49  |   0.42   |   271   |  \n| `CEPHEIDS`        |   0.00    |  0.00  |   0.00   |    9    |  \n| `RR_LYRAE`        |   0.64  
  |  0.42  |   0.51   |   175   |  \n| `DELTA_SCUTI_ETC` |   0.72    |  0.94  |   0.82   |   741   |  \n| `LONG_PERIOD`     |   0.31    |  0.82  |   0.45   |   11    |  \n| `ROTATIONAL`      |   0.90    |  0.38  |   0.54   |   506   |  \n| `CATACLYSMIC`     |   0.00    |  0.00  |   0.00   |    3    |  \n|                   |   0.70    |  0.65  |   0.63   |  1716   |\n\n#### Gradient Boosting\n\n![Confusion matrix for gradient boosting](multiclass_classification/docs/images/cm_gradient_boosting.png)\n\n| Class             | Precision | Recall | F1-Score | Support |  \n|:------------------|:---------:|:------:|:--------:|:-------:|  \n| `ECLIPSING`       |   0.76    |  0.66  |   0.70   |   261   |  \n| `CEPHEIDS`        |   0.11    |  0.12  |   0.12   |    8    |  \n| `RR_LYRAE`        |   0.77    |  0.74  |   0.76   |   133   |  \n| `DELTA_SCUTI_ETC` |   0.89    |  0.94  |   0.91   |   766   |  \n| `LONG_PERIOD`     |   0.75    |  0.64  |   0.69   |   14    |  \n| `ROTATIONAL`      |   0.89    |  0.88  |   0.89   |   532   |  \n| `CATACLYSMIC`     |   0.33    |  0.50  |   0.40   |    2    |  \n|                   |   0.85    |  0.86  |   0.86   |  1716   |\n\n### Results\n\nThe full comparison of models is summarized below:\n\n| Model               | Weighted Precision | Weighted F1-Score |\n|:--------------------|:------------------:|:-----------------:|\n| Logistic Regression |        0.72        |       0.61        |\n| SVC                 |        0.53        |       0.41        |\n| K-Nearest Neighbors |        0.61        |       0.59        |\n| Random Forest       |      **0.85**      |     **0.85**      |\n| SGD                 |        0.70        |       0.63        |\n| Gradient Boosting   |      **0.85**      |     **0.86**      |\n\nThe research showed that **Gradient Boosting** and **Random Forest** perform almost identically well.\nBearing in mind that `scikit-learn` Gradient Boosting does not apply class balancing,\nthis arguably makes it the best classifier for 
multiclass classification.\n\nIt is also worth noting that Gradient Boosting and Random Forest can classify `CEPHEIDS` and `CATACLYSMIC` stars:\nboth recognise some of these stars even though very few examples are available.\n\n## Conclusion\n\nThis research successfully addressed the challenge of classifying variable stars\nthrough both **binary** (variable/non-variable) and\n**multiclass** (variable type) approaches.\n\nThe key findings are:\n\n1. **Binary Classification**:\n    - **Gradient Boosting** with undersampling achieved the highest **recall** (0.991)\n    - **Stacking** with class weighting achieved the best **precision** (0.831)\n    - **MLP** classifier with no balancing achieved the best **F1-score** (0.776)\n    - **MLP** classifier performed the best overall\n\n2. **Multiclass Classification**:\n    - **Gradient Boosting** emerged as the best model, achieving weighted **F1-score** = 0.86 and **precision** = 0.85,\n      excelling in identifying common classes (e.g., Delta Scuti) while handling rarer types\n    - Class imbalance significantly impacted rare categories (e.g., Cepheids, Cataclysmic), highlighting the need for\n      targeted data collection or augmentation\n\n3. 
**Classification Insights**:\n    - Measurement-error features were the features most correlated with variability, consistent with physical changes in stellar brightness\n    - The neural network’s sensitivity to initial weights suggests astrophysical variability patterns may require\n      careful model initialization\n    - Neural networks could be the best tools for classification: the simple neural network and the MLP classifier\n      showed promising results even though they were not tuned with hyperparameter optimization frameworks\n\nThis work provides a robust framework for automating variable star identification,\nenabling astronomers to focus on high-value targets and accelerate discoveries in stellar astrophysics.\nThat, along with the interactive component, makes it a useful tool for stellar research.\n\n## Contributions\n\nFeel free to star this repository if you liked our research or are interested in it;\nin the latter case, you are also welcome to contact us with suggestions or questions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjktujq%2Flumenn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjktujq%2Flumenn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjktujq%2Flumenn/lists"}