https://github.com/bubbarooski/kepleridentification
Simple ML project that classifies stars based on their light curves using common ML algorithms; this project was the basis for a paper
- Host: GitHub
- URL: https://github.com/bubbarooski/kepleridentification
- Owner: bubbarooski
- Created: 2023-02-28T04:37:45.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-18T19:48:59.000Z (over 1 year ago)
- Last Synced: 2025-03-02T08:33:54.536Z (over 1 year ago)
- Topics: kepler, machine-learning, neural-network, space
- Language: Python
- Homepage:
- Size: 4.33 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# About the Project
This is a machine learning project on the binary classification of light curves from the Kepler telescope. The Kepler telescope ran from 2009 to 2018, and its main goal was to find Earth-sized exoplanets. It recorded the brightness of ~150,000 stars over its time in service, and from changes in a star's brightness, NASA was able to determine whether the star had an exoplanet in its system. I used the NASA Exoplanet Archive to download light curves generated by the Kepler telescope and trained 6 different algorithms to determine whether a light curve is indicative of an exoplanet: KNN, Naive Bayes, LSTM, MLP, GRU, and CNN.
# Purpose
One of the topics for my semester project in my Intro to AI class was signal classification. Rather than do the project with a well-curated dataset on a topic I didn't care much for, I wanted to work with a messy dataset in a field that I love and have been interested in since I was a kid! After some digging, I found that this dataset has been used in other academic papers, so I had a jumping-off point, and there was already a Python library built for it, which made things so much easier.
# Overview
The project itself consists of three parts: retrieval of the data set, training of algorithms, and prediction.
- Data Set: From the Exoplanet Archive, I downloaded a master CSV (called keplerDataset) that contains a large list of stars, information about each star, and whether its exoplanet detection is CONFIRMED, a FALSE POSITIVE, or a CANDIDATE. To keep the classification binary, I removed all stars labeled CANDIDATE. I then wrote a program that loops through this list, downloads simple light curve files for ~1000 stars, and calculates pertinent info about each light curve: max flux, min flux, average flux, max to min flux, and flux variance. The download often bogged down or disconnected from the Archive, which is why only ~1000 stars were used.
- Training: Once the data was retrieved, I built and trained the 6 algorithms listed above. These specific algorithms were required by the project, but I hope to add more in the future. The traditional classifiers were trained on the calculated info, and the neural networks were given the raw light curve photo. The data was split 80/20 into training and validation sets.
- Prediction: The Driver file lets you run the prediction part of the program. The first menu option shows the accuracy of each model on a random sample of 200 curves. The second option lets the user type in an index into the keplerDataset CSV, generates the curve for that star, and shows the predictions of all 6 algorithms.
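
The data preparation steps above can be sketched in a few lines of plain Python. This is only an illustration, not the code in this repo: the `disposition` key, the function names, and the reading of "max to min flux" as a difference (rather than a ratio) are all assumptions.

```python
import random
import statistics


def drop_candidates(records, label_key="disposition"):
    """Keep only CONFIRMED / FALSE POSITIVE rows so the task is binary.

    `label_key` is a hypothetical column name, not necessarily the one
    used in keplerDataset.
    """
    return [r for r in records if r[label_key] != "CANDIDATE"]


def flux_features(flux):
    """Compute the five summary features for one light curve (a list of floats)."""
    max_flux = max(flux)
    min_flux = min(flux)
    return {
        "max_flux": max_flux,
        "min_flux": min_flux,
        "mean_flux": statistics.fmean(flux),
        # Assumption: "max to min flux" is taken as the difference max - min.
        "max_to_min_flux": max_flux - min_flux,
        # Population variance of the flux values.
        "flux_variance": statistics.pvariance(flux),
    }


def split_80_20(rows, seed=0):
    """Shuffle a copy of `rows` and return (training, validation) as 80% / 20%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]
```

The traditional classifiers would then be fit on the feature dictionaries from `flux_features`, while the neural networks would receive the light curve itself.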
# Analysis
The accuracy of each algorithm is listed below:
- Naive Bayes: 70.56%
- KNN: 62.22%
- SVM: 70.56%
- CNN: 93.15%
- GRU: 75.57%
- MLP: 92.61%
For the traditional classifiers, I believe the accuracy is low because there are too few features and the ones I used are not informative. In the future, I want to extract more useful features to make these models more accurate. Among the neural networks, both CNN and MLP performed well, with over 90% accuracy. GRU did not perform well, but I had a hard time training it on my laptop: epochs were taking up to 5 minutes each. With more computing power, I think all 3 neural networks could be improved upon. The trained models are not included in this repo because they are too large, but the structure of each algorithm can be seen in its respective folder in the Model directory.
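
For reference, an accuracy figure of this kind is the fraction of curves whose predicted label matches the true label. Below is a minimal sketch of scoring a model on a random sample of curves, in the spirit of the Driver's first menu option. The function name, its parameters, and the `predict` callable are hypothetical stand-ins for a trained model, not this repo's API.

```python
import random


def sample_accuracy(curves, labels, predict, sample_size=200, seed=0):
    """Return the fraction of correct predictions on a random sample of curves.

    `predict` is any callable mapping one light curve to a label; it stands
    in for a trained model here.
    """
    if not curves or len(curves) != len(labels):
        raise ValueError("curves and labels must be non-empty and the same length")
    k = min(sample_size, len(curves))
    # Sample indices without replacement so no curve is counted twice.
    indices = random.Random(seed).sample(range(len(curves)), k)
    correct = sum(1 for i in indices if predict(curves[i]) == labels[i])
    return correct / k
```

Multiplying the returned value by 100 gives a percentage comparable to the figures listed above.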
# Research
This project was the basis of a paper I published, "Kepler Light Curve Classification Using Deep Learning and Markov Transition Field (Student Abstract)". We wanted to see whether converting the light curve into a different form, such as a Markov Transition Field, and designing a more efficient convolutional neural network would lead to higher classification accuracy. It turns out that it did.
- The paper can be found here: https://ojs.aaai.org/index.php/AAAI/article/view/30435
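
For readers unfamiliar with the term, a Markov Transition Field turns a 1-D series into a 2-D image: the values are binned into quantile states, the probability of moving from one state to the next is estimated from consecutive samples, and entry (i, j) of the image is the transition probability from the state at time i to the state at time j. The sketch below follows that general definition in plain Python. It is not the paper's implementation, and the function name and `n_bins` default are illustrative assumptions.

```python
from bisect import bisect_right


def markov_transition_field(series, n_bins=4):
    """Return the Markov Transition Field of `series` as a list of rows."""
    n = len(series)
    if n < 2 or n_bins < 2:
        raise ValueError("need at least 2 samples and 2 bins")

    # Quantile bin edges: n_bins - 1 cut points taken from the sorted values.
    ordered = sorted(series)
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]

    # Map every sample to the index of its quantile bin (its "state").
    states = [bisect_right(edges, x) for x in series]

    # Count transitions between the states of consecutive samples.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1

    # Row-normalise the counts into transition probabilities.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])

    # Entry (i, j) is the probability of going from state_i to state_j.
    return [[probs[a][b] for b in states] for a in states]


# A series of length 8 produces an 8 x 8 field.
field = markov_transition_field([1, 2, 3, 4, 1, 2, 3, 4], n_bins=2)
```

The resulting square matrix can be treated as a single-channel image and passed to a convolutional network.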