https://github.com/jofaval/sonar
Binary Classification of Sonar Signals of Rocks and Metal cylinders in 1987
- Host: GitHub
- URL: https://github.com/jofaval/sonar
- Owner: jofaval
- License: gpl-3.0
- Created: 2022-07-11T17:34:12.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-07-16T12:04:14.000Z (about 3 years ago)
- Last Synced: 2025-02-04T12:56:33.077Z (8 months ago)
- Topics: data-analysis, data-science, data-visualization, machine-learning, python, scikit-learn, sonar, uci
- Language: Jupyter Notebook
- Homepage: https://colab.research.google.com/github/jofaval/sonar/blob/master/notebook.ipynb
- Size: 1.65 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Sonar - Mines and Rocks Classification
[Open in Colab](https://colab.research.google.com/github/jofaval/sonar/blob/master/notebook.ipynb)
## Table of contents
1. [Data](#data)
1. [Description](#description)
1. [Objective](#objective)
1. [Tech stack](#tech-stack)
1. [Algorithms](#algorithms)
1. [Visualization](#visualization)
1. [Conclusions](#conclusions)
1. [Credits](#credits)

## Data

[Back to the table](#table-of-contents)

The data is available at the official archive link:\
[https://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks)](https://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks))

## Description
[Back to the table](#table-of-contents)

**Abstract**: The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.

Each sample is a set of signal values ranging from 0 to 1 that represent information about an object. It is a famous dataset for (binary) classification.
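A minimal sketch of loading the data with pandas. The real UCI file (`sonar.all-data`) has 60 comma-separated values per row plus an `R`/`M` label; a tiny in-memory stand-in is used here so the snippet is self-contained.

```python
# Sketch: loading the sonar CSV, assuming the UCI layout of 60 numeric
# columns plus an "R"/"M" label. A stand-in string replaces the real file.
import io
import pandas as pd

csv_text = "0.02,0.05,0.11,R\n0.45,0.31,0.22,M\n"   # stand-in: 3 bands, not 60
df = pd.read_csv(io.StringIO(csv_text), header=None)

X = df.iloc[:, :-1]   # signal energies in [0, 1]
y = df.iloc[:, -1]    # "R" = rock, "M" = mine (metal cylinder)
print(X.shape, sorted(y.unique()))
```

For the real file, replace the stand-in with `pd.read_csv("sonar.all-data", header=None)`.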
## Objective

[Back to the table](#table-of-contents)

The goal is to successfully predict whether an object is a rock or a mine. It is said that a classification model scoring above 80% on this dataset is considered a good model. A simple (though not necessarily easy) objective makes it easier to focus on the project itself.
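The evaluation against that 80% target can be sketched as below; the labels and features here are random stand-ins for the real dataset, and logistic regression is just a placeholder model for illustration.

```python
# Sketch: checking a model against the >80% accuracy target.
# X and y are synthetic stand-ins for the 208-row, 60-band sonar data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((208, 60))             # 208 samples, 60 signal bands in [0, 1]
y = rng.choice(["R", "M"], size=208)  # rock vs. mine labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {acc:.2f}  (target: > 0.80)")
```

On random stand-in data the score hovers near chance; only on the real dataset does the 80% threshold become meaningful.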
## Tech stack

[Back to the table](#table-of-contents)

Python, that's it! R is a powerful and broadly used language, but I have no experience with it for the moment, and I'd dare say it's no more widespread than Python.

One of Python's strongest points, if not the strongest, is its libraries, so the libraries I've used are:

- Pandas, for easy data manipulation and exploratory data analysis.
- NumPy, a really strong linear algebra library, used in this project for its statistics utilities. SciPy may be an alternative, but I have no experience with it at all.
- Matplotlib and Seaborn, both fantastic data visualization libraries that complement each other.
- Scikit-Learn, the library used for machine learning and statistical models: Linear Regression, SVR, Lasso, Ridge, etc.

## Algorithms
[Back to the table](#table-of-contents)

I've only used XGBoost; that was the idea from the start, to only use XGBoost.

- XGBoost, a powerful algorithm based on regularized gradient boosting.
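A minimal boosting sketch. The project itself uses XGBoost; here scikit-learn's `GradientBoostingClassifier` stands in (same core gradient-boosting idea, without XGBoost's extra L1/L2 leaf regularization), trained on synthetic stand-in data.

```python
# Sketch: a gradient-boosted classifier as a stand-in for XGBoost,
# cross-validated on synthetic data shaped like the sonar dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((208, 60))          # 60 sonar energy bands per sample
y = rng.integers(0, 2, size=208)   # 0 = rock, 1 = mine (encoded labels)

model = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

Swapping in `xgboost.XGBClassifier` keeps the same fit/predict interface while adding its regularization hyperparameters (`reg_alpha`, `reg_lambda`).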
## Visualization

[Back to the table](#table-of-contents)

The only important visualization in this project is the sonar signal plot, which helps identify what we're working with. We're given 60 attributes, but rather than independent attributes, they're the y-axis values that each x-axis position takes. In other words, we're evaluating a signal, so plotting it can help us understand whether our model is actually working correctly and perhaps only failing on similar signals, or whether it's completely mislabeling everything.
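That idea, each row's 60 values drawn as one curve, can be sketched as follows; the two signals here are random stand-ins for a real rock row and mine row.

```python
# Sketch: plotting each 60-value row as a signal curve.
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
rock_signal = rng.random(60)   # stand-in for a row labeled "R"
mine_signal = rng.random(60)   # stand-in for a row labeled "M"

fig, ax = plt.subplots()
ax.plot(rock_signal, label="rock")
ax.plot(mine_signal, label="mine")
ax.set_xlabel("attribute index (x-axis position)")
ax.set_ylabel("signal value (0 to 1)")
ax.legend()
fig.savefig("sonar_signals.png")
```

Overlaying misclassified rows on correctly classified ones of the same class makes it easy to see whether the model fails on genuinely similar signals.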
## Conclusions

[Back to the table](#table-of-contents)

Whenever possible, even on a perfect, ready-to-work dataset, read the abstract, the paper, whatever information you may have at hand; it truly helps in understanding and tuning the results.

And, once again, class distribution can do wonders. Having a balanced dataset is truly important, and if you don't have one, you can create it: undersampling is the best option around, unless you can actually create/collect reliable synthetic data, which is not the case here.
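The undersampling step can be sketched with plain pandas: sample every class down to the size of the smallest one. The DataFrame here is a synthetic stand-in with a deliberate 80/40 imbalance.

```python
# Sketch: random undersampling to balance "R"/"M" classes with pandas.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.random((120, 3)))
df["label"] = ["M"] * 80 + ["R"] * 40   # imbalanced: 80 mines, 40 rocks

# Downsample every class to the minority class size.
minority_size = df["label"].value_counts().min()
balanced = (
    df.groupby("label", group_keys=False)
      .apply(lambda g: g.sample(minority_size, random_state=7))
)
print(balanced["label"].value_counts().to_dict())
```

This discards majority-class rows at random; on a dataset this small, stratified cross-validation afterwards helps confirm nothing informative was thrown away.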
## Credits

[Back to the table](#table-of-contents)

The dataset was contributed to the benchmark collection by Terry Sejnowski, now at the Salk Institute and the University of California at San Diego. It was developed in collaboration with R. Paul Gorman of Allied-Signal Aerospace Technology Center.

More information can be found on the [official dataset page](http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks)).