https://github.com/tarcisio-marinho/python-data-science

Talk on data science with Python given during the UNICAP 2019 Computing Week

# Loading the data

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing, model_selection, neighbors, svm
import warnings
warnings.filterwarnings('ignore')
```

### 7. Attribute Information (the class attribute has been moved to the last column)

| #  | Attribute                   | Domain                        |
|----|-----------------------------|-------------------------------|
| 1  | Sample code number          | id number                     |
| 2  | Clump Thickness             | 1 - 10                        |
| 3  | Uniformity of Cell Size     | 1 - 10                        |
| 4  | Uniformity of Cell Shape    | 1 - 10                        |
| 5  | Marginal Adhesion           | 1 - 10                        |
| 6  | Single Epithelial Cell Size | 1 - 10                        |
| 7  | Bare Nuclei                 | 1 - 10                        |
| 8  | Bland Chromatin             | 1 - 10                        |
| 9  | Normal Nucleoli             | 1 - 10                        |
| 10 | Mitoses                     | 1 - 10                        |
| 11 | Class                       | 2 for benign, 4 for malignant |

### 8. Missing attribute values: 16

There are 16 instances in Groups 1 to 6 that contain a single missing
(i.e., unavailable) attribute value, now denoted by "?".
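To locate those "?" placeholders with pandas, one option is an element-wise comparison. This is a toy sketch: the column names match the dataset, but the values are made up:

```python
import pandas as pd

# Toy frame standing in for the real dataset: two '?' placeholders.
df = pd.DataFrame({
    'bare_nuclei': ['1', '?', '10', '?', '3'],
    'mitoses': ['1', '1', '2', '1', '1'],
})

# Count '?' cells per column, then total them.
missing_per_column = (df == '?').sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print('total missing:', total_missing)  # 2 in this toy frame
```

Running the same comparison against the real file should report the 16 missing values mentioned above.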

### 9. Class distribution:

- Benign: 458 (65.5%)
- Malignant: 241 (34.5%)
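These counts can be reproduced in pandas with `value_counts()`; the sketch below uses a small made-up `class` series rather than the full dataset:

```python
import pandas as pd

# Toy 'class' column with the dataset's 2 (benign) / 4 (malignant) coding.
labels = pd.Series([2, 2, 2, 4, 2, 4, 2, 2, 4, 2])

counts = labels.value_counts()          # absolute counts per class
proportions = counts / len(labels)      # fractions per class
print(counts[2], counts[4])             # 7 3
```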

```python
df = pd.read_csv('data/breast-cancer-wisconsin.data')
df.head()
```





|   | id      | clump_thickness | unif_cell_size | unif_cell_shape | marg_adhesion | single_ephith_cell_size | bare_nuclei | bland_chrom | norm_nucleoli | mitoses | class |
|---|---------|-----------------|----------------|-----------------|---------------|-------------------------|-------------|-------------|---------------|---------|-------|
| 0 | 1000025 | 5               | 1              | 1               | 1             | 2                       | 1           | 3           | 1             | 1       | 2     |
| 1 | 1002945 | 5               | 4              | 4               | 5             | 7                       | 10          | 3           | 2             | 1       | 2     |
| 2 | 1015425 | 3               | 1              | 1               | 1             | 2                       | 2           | 3           | 1             | 1       | 2     |
| 3 | 1016277 | 6               | 8              | 8               | 1             | 3                       | 4           | 3           | 7             | 1       | 2     |
| 4 | 1017023 | 4               | 1              | 1               | 3             | 2                       | 1           | 3           | 1             | 1       | 2     |

# Cleaning the data

```python
df.replace('?', -99999, inplace=True)
df.drop(columns=['id'], inplace=True)
```
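Replacing "?" with -99999 keeps every row but injects an extreme outlier into the feature space. A hypothetical alternative, not what this notebook does, is to convert the markers to NaN and drop the 16 affected rows:

```python
import numpy as np
import pandas as pd

# Toy frame: one row carries the '?' missing-value marker.
df = pd.DataFrame({
    'bare_nuclei': ['1', '?', '3'],
    'class': [2, 4, 2],
})

# Turn '?' into NaN, then drop incomplete rows -- an alternative to
# the -99999 sentinel used above.
cleaned = df.replace('?', np.nan).dropna()
print(len(cleaned))  # 2 rows remain
```

On the full dataset this costs 16 of the 699 rows, which is usually an acceptable trade for not distorting the distance computations.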

```python
x = np.array(df.drop(columns=['class']))
y = np.array(df['class'])
```

# Splitting the Train data and Test data

```python
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.2)
```
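`train_test_split` draws the 20% test set at random, so with the 65.5/34.5 class imbalance a fold can end up skewed. Passing `stratify=y` (an optional refinement, not used above) preserves the class proportions in both folds; sketched here on made-up labels:

```python
import numpy as np
from sklearn import model_selection

# Toy labels mimicking the 2 (benign) / 4 (malignant) imbalance.
y = np.array([2] * 66 + [4] * 34)
x = np.arange(len(y)).reshape(-1, 1)

x_tr, x_te, y_tr, y_te = model_selection.train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0)

# The test fold keeps roughly the same 66/34 class ratio as y.
print((y_te == 2).sum(), (y_te == 4).sum())
```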

```python
x_train
```

array([[1, 1, 1, ..., 2, 1, 1],
[8, 10, 10, ..., 7, 8, 1],
[5, 1, 2, ..., 3, 1, 1],
...,
[4, 1, 1, ..., 2, 1, 1],
[10, 10, 10, ..., 7, 10, 1],
[4, 10, 4, ..., 9, 10, 1]], dtype=object)

```python
y_train
```

array([2, 4, 2, 2, 2, 4, 2, 2, 2, 4, 4, 2, 4, 2, 2, 4, 2, 2, 4, 2, 2, 2,
2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2,
4, 2, 4, 2, 2, 4, 2, 4, 4, 2, 2, 4, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2,
4, 2, 2, 2, 4, 2, 4, 2, 4, 2, 4, 2, 2, 4, 4, 2, 4, 4, 2, 2, 2, 2,
2, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 4, 2, 2, 2, 2,
2, 2, 2, 2, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2, 4, 2, 4, 2, 2, 2, 2,
4, 2, 2, 2, 2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 2, 2, 4, 2, 2, 2, 2, 2,
4, 2, 2, 2, 4, 2, 4, 2, 4, 4, 2, 2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 4,
2, 2, 2, 2, 2, 2, 4, 2, 4, 4, 2, 2, 2, 2, 2, 4, 2, 4, 2, 2, 2, 2,
2, 4, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 2, 2, 2, 4, 2, 4, 4, 4, 2, 2,
4, 4, 4, 2, 2, 2, 2, 4, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 4, 4, 2, 2,
2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 4, 4, 4, 4, 4, 2, 2, 4,
2, 4, 2, 4, 2, 4, 2, 2, 4, 2, 4, 2, 4, 4, 2, 4, 2, 2, 2, 2, 2, 2,
4, 4, 4, 4, 4, 2, 2, 4, 4, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 2, 2, 2,
4, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 4, 2,
2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 4, 2, 2, 2, 4, 2, 4, 4, 4, 2, 4, 2,
2, 2, 4, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 4, 2, 2, 2, 4, 4, 4,
2, 4, 2, 4, 2, 2, 4, 2, 2, 2, 2, 4, 2, 4, 4, 2, 2, 4, 4, 2, 4, 2,
2, 2, 4, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 4, 2, 2, 2, 4, 4, 2, 4, 4, 2, 2, 4, 4, 2, 2, 4, 2, 4, 2, 2,
2, 4, 4, 4, 4, 4, 2, 2, 4, 2, 2, 4, 4, 4, 2, 2, 2, 4, 4, 2, 2, 2,
4, 4, 2, 4, 2, 2, 4, 2, 2, 4, 2, 2, 4, 2, 2, 2, 2, 4, 4, 2, 2, 4,
2, 4, 2, 2, 2, 4, 4, 2, 2, 2, 2, 2, 4, 2, 2, 4, 2, 2, 2, 4, 4, 4,
2, 4, 2, 2, 4, 2, 4, 2, 2, 4, 4, 2, 2, 2, 4, 2, 2, 4, 2, 2, 2, 2,
2, 2, 4, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 4, 2, 4, 2, 2, 4, 2, 2, 2,
2, 4, 2, 4, 2, 4, 2, 4, 4])

# Creating and training the classifier

```python
clf = svm.SVC()
clf.fit(x_train, y_train)
```

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
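The RBF kernel compares samples by Euclidean distance, so unscaled features, and especially the -99999 sentinel from the cleaning step, can dominate it. A common refinement (hypothetical here, sketched on synthetic data) is to standardize the features in a pipeline before the SVC:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the dataset: 9 features with values 1-10,
# labels derived from a simple threshold on the feature sum.
rng = np.random.default_rng(0)
x = rng.integers(1, 11, size=(100, 9)).astype(float)
y = np.where(x.sum(axis=1) > 45, 4, 2)

# Standardize each feature before the RBF kernel sees it.
model = make_pipeline(StandardScaler(), SVC())
model.fit(x, y)
print(model.score(x, y))
```

`make_pipeline` keeps the scaler's statistics tied to the training data, so the same transform is applied consistently at prediction time.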

# Checking model accuracy

```python
accuracy = clf.score(x_test, y_test)
accuracy
```

0.9714285714285714
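A single train/test split gives an accuracy that varies from run to run; averaging over several folds with `cross_val_score` is more stable. A sketch on synthetic data shaped like this dataset (9 features, values 1 to 10):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the cleaned feature matrix and labels.
rng = np.random.default_rng(1)
x = rng.integers(1, 11, size=(120, 9)).astype(float)
y = np.where(x.sum(axis=1) > 45, 4, 2)

# Five-fold cross-validation: each sample is tested exactly once.
scores = cross_val_score(SVC(gamma='scale'), x, y, cv=5)
print(scores.mean())
```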

# Tests

```python
# Generating test data
example_measures = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1], [4, 2, 1, 2, 2, 2, 3, 2, 1]])
example_measures = example_measures.reshape(len(example_measures), -1)
```

```python
tipos_de_cancer = {4: "maligno", 2: "benigno"}
```

```python
prediction = clf.predict(example_measures)
print('Tipo de cancer é: {}'.format(tipos_de_cancer[prediction.item(0)]))
```

Tipo de cancer é: benigno
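The block above labels only the first of the two predictions. To translate every entry of the returned array, a list comprehension over the dictionary works (sketched with a stand-in list instead of `clf.predict`):

```python
# Map an array of 2/4 predictions to human-readable labels.
tipos_de_cancer = {4: "maligno", 2: "benigno"}
predictions = [2, 4]  # stand-in for clf.predict(example_measures)
labels = [tipos_de_cancer[p] for p in predictions]
print(labels)  # ['benigno', 'maligno']
```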