https://github.com/tarcisio-marinho/python-data-science
Talk on data science with Python, given during the UNICAP 2019 Computing Week (semana da computação)
hacktoberfest hacktoberfest2020
- Host: GitHub
- URL: https://github.com/tarcisio-marinho/python-data-science
- Owner: tarcisio-marinho
- Created: 2019-04-30T03:57:24.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-10-08T13:30:42.000Z (about 6 years ago)
- Last Synced: 2025-01-19T06:47:06.288Z (9 months ago)
- Topics: hacktoberfest, hacktoberfest2020
- Language: Jupyter Notebook
- Homepage:
- Size: 224 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Loading the data
```python
import numpy as np
import pandas as pd
from sklearn import preprocessing, model_selection, neighbors, svm
import warnings
warnings.filterwarnings('ignore')
```

### 7. Attribute Information (class attribute has been moved to the last column)

| # | Attribute | Domain |
|---|-----------|--------|
| 1 | Sample code number | id number |
| 2 | Clump Thickness | 1 - 10 |
| 3 | Uniformity of Cell Size | 1 - 10 |
| 4 | Uniformity of Cell Shape | 1 - 10 |
| 5 | Marginal Adhesion | 1 - 10 |
| 6 | Single Epithelial Cell Size | 1 - 10 |
| 7 | Bare Nuclei | 1 - 10 |
| 8 | Bland Chromatin | 1 - 10 |
| 9 | Normal Nucleoli | 1 - 10 |
| 10 | Mitoses | 1 - 10 |
| 11 | Class | 2 for benign, 4 for malignant |

### 8. Missing attribute values: 16

There are 16 instances in Groups 1 to 6 that contain a single missing (i.e., unavailable) attribute value, now denoted by "?".

### 9. Class distribution

- Benign: 458 (65.5%)
- Malignant: 241 (34.5%)

```python
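# Assumes the data file starts with a header row holding the column names shown below
# (the original UCI breast-cancer-wisconsin.data file has no header row)
df = pd.read_csv('data/breast-cancer-wisconsin.data')
df.head()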
```

| | id | clump_thickness | unif_cell_size | unif_cell_shape | marg_adhesion | single_ephith_cell_size | bare_nuclei | bland_chrom | norm_nucleoli | mitoses | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 |
| 1 | 1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 2 |
| 2 | 1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 2 |
| 3 | 1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 2 |
| 4 | 1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2 |
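Before cleaning, the counts stated in sections 8 and 9 can be checked directly against the file. The sketch below is a minimal, standard-library-only version: `summarize` is a hypothetical helper, the three-row CSV is made-up sample data, and it assumes (as the `df.head()` output suggests) that the file has a header row with a `class` column.

```python
import csv
import io
from collections import Counter


def summarize(lines):
    """Count '?' markers per column and rows per class in a CSV that has a header row."""
    missing = Counter()  # column name -> number of '?' values
    classes = Counter()  # class label (as text) -> number of rows
    for row in csv.DictReader(lines):
        for column, value in row.items():
            # Skip malformed cells (None or lists) that DictReader produces for ragged rows
            if isinstance(value, str) and value.strip() == "?":
                missing[column] += 1
        classes[row["class"]] += 1
    total = sum(classes.values())
    shares = {label: count / total for label, count in classes.items()}
    return dict(missing), dict(classes), shares


# Made-up sample rows in the same layout as the data file (only a few columns kept)
sample = io.StringIO(
    "id,clump_thickness,bare_nuclei,class\n"
    "1,5,1,2\n"
    "2,3,?,2\n"
    "3,8,10,4\n"
)
missing, classes, shares = summarize(sample)
print(missing)  # {'bare_nuclei': 1}
print(classes)  # {'2': 2, '4': 1}
print(shares)   # {'2': 0.6666666666666666, '4': 0.3333333333333333}
```

On the real file, `with open('data/breast-cancer-wisconsin.data', newline='') as f: summarize(f)` should report 16 `?` markers in total and 458 / 241 rows for classes `2` / `4`, if the file matches the description above. With the DataFrame already loaded, `(df == '?').sum()` and `df['class'].value_counts()` give the same counts in pandas.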
# Cleaning the data
```python
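# Replace the "?" missing-value marker with an extreme sentinel value instead of dropping
# those rows; the affected column was read as text, which is why x below has dtype=object
df.replace('?', -99999, inplace=True)
# The sample id carries no diagnostic information, so drop it
df.drop(columns=['id'], inplace=True)
```

```python
# Features: every column except the label; target: the class column (2 = benign, 4 = malignant)
x = np.array(df.drop(columns=['class']))
y = np.array(df['class'])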
```

# Splitting the data into train and test sets
```python
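# Hold out 20% of the samples for testing; the split is random on every run
# unless random_state is fixed, so the numbers below will vary between runs
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.2)
```

```python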
x_train
```

```
array([[1, 1, 1, ..., 2, 1, 1],
       [8, 10, 10, ..., 7, 8, 1],
       [5, 1, 2, ..., 3, 1, 1],
       ...,
       [4, 1, 1, ..., 2, 1, 1],
       [10, 10, 10, ..., 7, 10, 1],
       [4, 10, 4, ..., 9, 10, 1]], dtype=object)
```

```python
y_train
```

```
array([2, 4, 2, 2, 2, 4, 2, 2, 2, 4, 4, 2, 4, 2, 2, 4, 2, 2, 4, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       4, 2, 4, 2, 2, 4, 2, 4, 4, 2, 2, 4, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2,
       4, 2, 2, 2, 4, 2, 4, 2, 4, 2, 4, 2, 2, 4, 4, 2, 4, 4, 2, 2, 2, 2,
       2, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 4, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2, 4, 2, 4, 2, 2, 2, 2,
       4, 2, 2, 2, 2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 2, 2, 4, 2, 2, 2, 2, 2,
       4, 2, 2, 2, 4, 2, 4, 2, 4, 4, 2, 2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 4,
       2, 2, 2, 2, 2, 2, 4, 2, 4, 4, 2, 2, 2, 2, 2, 4, 2, 4, 2, 2, 2, 2,
       2, 4, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 2, 2, 2, 4, 2, 4, 4, 4, 2, 2,
       4, 4, 4, 2, 2, 2, 2, 4, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 4, 4, 2, 2,
       2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 4, 4, 4, 4, 4, 2, 2, 4,
       2, 4, 2, 4, 2, 4, 2, 2, 4, 2, 4, 2, 4, 4, 2, 4, 2, 2, 2, 2, 2, 2,
       4, 4, 4, 4, 4, 2, 2, 4, 4, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 2, 2, 2,
       4, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 4, 2,
       2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 4, 2, 2, 2, 4, 2, 4, 4, 4, 2, 4, 2,
       2, 2, 4, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 4, 2, 2, 2, 4, 4, 4,
       2, 4, 2, 4, 2, 2, 4, 2, 2, 2, 2, 4, 2, 4, 4, 2, 2, 4, 4, 2, 4, 2,
       2, 2, 4, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 4, 2, 2, 2, 4, 4, 2, 4, 4, 2, 2, 4, 4, 2, 2, 4, 2, 4, 2, 2,
       2, 4, 4, 4, 4, 4, 2, 2, 4, 2, 2, 4, 4, 4, 2, 2, 2, 4, 4, 2, 2, 2,
       4, 4, 2, 4, 2, 2, 4, 2, 2, 4, 2, 2, 4, 2, 2, 2, 2, 4, 4, 2, 2, 4,
       2, 4, 2, 2, 2, 4, 4, 2, 2, 2, 2, 2, 4, 2, 2, 4, 2, 2, 2, 4, 4, 4,
       2, 4, 2, 2, 4, 2, 4, 2, 2, 4, 4, 2, 2, 2, 4, 2, 2, 4, 2, 2, 2, 2,
       2, 2, 4, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 4, 2, 4, 2, 2, 4, 2, 2, 2,
       2, 4, 2, 4, 2, 4, 2, 4, 4])
```

# Creating and training the classifier
```python
# Support vector classifier with default hyperparameters (RBF kernel, C=1.0)
clf = svm.SVC()
clf.fit(x_train, y_train)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='rbf', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
```

# Checking model accuracy
```python
accuracy = clf.score(x_test, y_test)
accuracy
```

```
0.9714285714285714
```
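For classifiers, scikit-learn's `score` returns the mean accuracy: the fraction of test samples whose predicted label matches the true one. Because `train_test_split` rounds the test share up, `test_size=0.2` on the 699 samples gives 140 test samples and 559 training samples (the length of `y_train` above), so 0.9714285714285714 corresponds to 136 of 140 correct predictions. Below is a minimal pure-Python sketch of the same computation plus a per-pair confusion count. `accuracy` and `confusion_counts` are hypothetical helper names, and the labels are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs; (4, 2) means malignant predicted as benign."""
    return Counter(zip(y_true, y_pred))


# Made-up labels using the dataset's encoding: 2 = benign, 4 = malignant
y_true = [2, 2, 4, 4, 2]
y_pred = [2, 4, 4, 4, 2]
print(accuracy(y_true, y_pred))                  # 0.8
print(confusion_counts(y_true, y_pred)[(2, 4)])  # 1 (one benign sample predicted as malignant)

# 136 correct predictions out of 140 test samples reproduces the score above
print(136 / 140)  # 0.9714285714285714
```

With the notebook's objects, `accuracy(y_test, clf.predict(x_test))` should match `clf.score(x_test, y_test)`. The library versions are `sklearn.metrics.accuracy_score` and `sklearn.metrics.confusion_matrix`.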
# Tests
```python
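# Generating test data: two example samples, one value per feature column
# (clump_thickness through mitoses, in the same order as the columns of x)
example_measures = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1], [4, 2, 1, 2, 2, 2, 3, 2, 1]])
# scikit-learn expects a 2-D array of shape (n_samples, n_features); the array is
# already 2-D here, so this reshape leaves it unchanged
example_measures = example_measures.reshape(len(example_measures), -1)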
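```

```python
# Map the numeric class labels to readable names (2 = benign, 4 = malignant)
tipos_de_cancer = {4: "malignant", 2: "benign"}
```

```python
prediction = clf.predict(example_measures)
# Only the first sample's prediction is printed
print('Cancer type: {}'.format(tipos_de_cancer[prediction.item(0)]))
```

```
Cancer type: benign
```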
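The prediction cell above prints only the first result, although `example_measures` holds two samples. Here is a sketch for reporting every sample. `describe_predictions` is a hypothetical helper, and the labels below are hard-coded for illustration rather than produced by the trained model.

```python
def describe_predictions(predictions, names):
    """Return one 'sample i: name' line per predicted class label."""
    return [f"sample {i}: {names[label]}" for i, label in enumerate(predictions)]


# Hard-coded labels in the dataset's encoding (2 = benign, 4 = malignant)
label_names = {2: "benign", 4: "malignant"}
for line in describe_predictions([2, 4], label_names):
    print(line)
# sample 0: benign
# sample 1: malignant
```

In the notebook, the call would be `describe_predictions(clf.predict(example_measures), tipos_de_cancer)`. NumPy integer labels hash and compare equal to Python ints, so they work directly as dictionary keys.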