https://github.com/linsanity03/titanic_prediction

A visualization of the probability of passengers surviving the Titanic

Host: GitHub
URL: https://github.com/linsanity03/titanic_prediction
Owner: LINSANITY03
Created: 2023-09-16T11:09:11.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2023-09-18T10:23:07.000Z (almost 3 years ago)
Last Synced: 2025-04-05T06:41:42.062Z (over 1 year ago)
Topics: classifier-model, machine-learning, prediction-model, tensorflow
Language: Python
Homepage:
Size: 28.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Prediction model of survival probability

In this project, we use pandas to load the Titanic dataset from a remote source and a built-in TensorFlow estimator to train a model on it.

To run this project:

- Create a virtual environment and activate it (the activation command shown is for Windows).

  `virtualenv venv`

  `venv\Scripts\activate`

- Install the required dependencies.

  `pip install -r requirements.txt`

- Run the **prediction_main.py** file.

  `python prediction_main.py`

**1. Data Collection:**

We load the Titanic training and evaluation data with pandas from CSV files hosted on Google Cloud Storage.

```
import pandas as pd

...

dftrain = pd.read_csv(
    'https://storage.googleapis.com/tf-datasets/titanic/train.csv')  # training data
dfeval = pd.read_csv(
    'https://storage.googleapis.com/tf-datasets/titanic/eval.csv')  # evaluation data
```
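The later steps refer to a feature table and a separate label column (`data_df`, `label_df`), which implies the target is split off after loading. Below is a minimal sketch of that split in plain Python. It assumes the label column is named `survived` (the name is not shown in this README) and uses made-up rows, not real passenger records; `split_features_and_labels` is a hypothetical helper.

```python
import csv
import io

# Made-up rows for illustration only; the column names are assumptions.
SAMPLE_CSV = """survived,sex,age,fare
0,male,30.0,8.0
1,female,25.0,70.0
1,female,40.0,15.5
"""

def split_features_and_labels(csv_text, label_column="survived"):
    """Return (features, labels): a list of feature dicts and a list of int labels."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels.append(int(row.pop(label_column)))  # remove the label from the features
        features.append(row)
    return features, labels

features, labels = split_features_and_labels(SAMPLE_CSV)
print(labels)
print(features[0])
```

With pandas, the same split is usually a one-liner along the lines of `y_train = dftrain.pop('survived')`, again assuming that column name.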

**2. Feature Extraction:**

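Using TensorFlow's built-in feature column functions, we describe each input column to the model. For categorical columns, we collect the unique values from the pandas DataFrame and pass them as the vocabulary; numeric columns are declared as float32 values.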

```
import tensorflow as tf

CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []

# For each categorical feature, collect its unique values with pandas and
# pass them to TensorFlow as the vocabulary
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique()  # all unique values in this column
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(
        feature_name, vocabulary))

# Numeric features are represented as float32 values
for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(
        feature_name, dtype=tf.float32))
```
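To make the role of the vocabulary concrete: a categorical column with a vocabulary list maps each string to its position in that list, so a linear model can learn one weight per category. The sketch below mimics the idea in plain Python with one-hot vectors. It is an illustration only, not TensorFlow's implementation; the helper names and sample values are hypothetical.

```python
def build_vocabulary(values):
    """Unique values in order of first appearance (as pandas `unique()` returns them)."""
    vocabulary = []
    for value in values:
        if value not in vocabulary:
            vocabulary.append(value)
    return vocabulary

def one_hot(value, vocabulary):
    """Encode a value as a 0/1 vector with a single 1 at its vocabulary index."""
    return [1 if value == entry else 0 for entry in vocabulary]

# Made-up column values for illustration.
embark_town = ["Southampton", "Cherbourg", "Southampton", "Queenstown"]
vocabulary = build_vocabulary(embark_town)
encoded = [one_hot(town, vocabulary) for town in embark_town]

for town, vector in zip(embark_town, encoded):
    print(town, vector)
```

In this sketch, a value that is missing from the vocabulary encodes to all zeros, which is why the vocabulary should be built from data that covers every category the model will see.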

**3. Data Preparation:**

The data must be in a format the TensorFlow model can consume, so we convert each DataFrame and its labels into a `tf.data.Dataset` object using `tf.data.Dataset.from_tensor_slices`.

```
# create a tf.data.Dataset object from the data and its labels
ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
```
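The training and evaluation steps call `train_input_fn` and `eval_input_fn`, which this README does not define. Input functions of this kind typically shuffle the examples, group them into batches, and repeat for a number of epochs. The plain-Python sketch below illustrates that behavior with hypothetical parameter names and made-up data; the project itself would presumably do this with `tf.data.Dataset` methods such as `shuffle`, `batch`, and `repeat`.

```python
import random

def make_batches(features, labels, batch_size=2, num_epochs=1, shuffle=True, seed=0):
    """Yield (feature_batch, label_batch) pairs, reshuffling once per epoch."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    for _ in range(num_epochs):
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Features and labels are indexed together so pairs stay aligned.
            yield [features[i] for i in batch], [labels[i] for i in batch]

# Made-up examples for illustration.
features = [{"age": 22.0}, {"age": 38.0}, {"age": 26.0}, {"age": 35.0}, {"age": 54.0}]
labels = [0, 1, 1, 1, 0]

batches = list(make_batches(features, labels, batch_size=2, num_epochs=2))
for feature_batch, label_batch in batches:
    print(feature_batch, label_batch)
```

Shuffling is usually enabled for training and disabled for evaluation, where a single ordered pass over the data is enough.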

**4. Choosing a Model:**

Our goal is to predict the probability of survival, which is a binary classification task, so a simple linear classifier is sufficient.

```
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
```
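For a binary target, a linear classifier is in essence logistic regression: it computes a weighted sum of the features plus a bias and passes the result through a sigmoid to obtain a probability. A minimal sketch of that computation follows. The feature names, weights, and bias are hypothetical values chosen for illustration, not parameters learned by this project.

```python
import math

def predict_survival_probability(features, weights, bias):
    """Logistic model: sigmoid of the weighted sum of the features plus a bias."""
    logit = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical weights; a trained model would learn these from the data.
weights = {"is_female": 2.0, "age": -0.02, "fare": 0.01}
bias = -1.0

# Made-up passenger for illustration.
passenger = {"is_female": 1.0, "age": 30.0, "fare": 50.0}
probability = predict_survival_probability(passenger, weights, bias)
print(round(probability, 3))
```

Because the sigmoid maps any real number into the interval (0, 1), the output can be read directly as an estimated survival probability.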

**5. Training the model:**

We train the model by passing it an input function that supplies the training data as a `tf.data.Dataset` object.

```
linear_est.train(train_input_fn)  # train
```

**6. Evaluate the model:**

We evaluate the trained model on the evaluation dataset, which it has not seen during training, to measure its performance.

```
result = linear_est.evaluate(eval_input_fn)
```
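`evaluate` returns a dictionary of metrics, accuracy among them. As an illustration of what accuracy measures, the sketch below converts predicted probabilities to class predictions and compares them with the true labels. It assumes a decision threshold of 0.5, and the probabilities and labels are made up.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)

# Made-up predictions and labels for illustration.
probabilities = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 1]

score = accuracy(probabilities, labels)
print(score)
```

Accuracy alone can be misleading when one class dominates, so it is worth comparing it with the other metrics in the returned dictionary.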

**7. Make prediction:**

Using the trained model, we predict the survival probability for each passenger in the evaluation set and plot the distribution of these probabilities as a histogram with matplotlib.

```
import matplotlib.pyplot as plt

pred_dicts = list(linear_est.predict(eval_input_fn))
# index 1 of 'probabilities' is the predicted probability of class 1
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])
probs.plot(kind='hist', bins=20, title='predicted probabilities')
plt.show()
```
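The histogram groups the predicted probabilities into 20 bins and counts how many fall into each. A rough plain-Python sketch of that counting is shown below. It assumes equal-width bins spanning [0, 1], whereas the plotting call by default spans the range of the data, and the probabilities are made up.

```python
def histogram_counts(probabilities, bins=20):
    """Count values in `bins` equal-width intervals covering [0, 1]."""
    counts = [0] * bins
    for p in probabilities:
        index = min(int(p * bins), bins - 1)  # p == 1.0 goes in the last bin
        counts[index] += 1
    return counts

# Made-up probabilities for illustration.
probs = [0.03, 0.07, 0.5, 0.93, 1.0]
counts = histogram_counts(probs)
print(counts)
```

A histogram with most of its mass near 0 and 1 indicates confident predictions, while mass near 0.5 indicates passengers the model is unsure about.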
