https://github.com/erfaniaa/fake-job-posting-detection

Detect fake job posting with deep learning

Host: GitHub
URL: https://github.com/erfaniaa/fake-job-posting-detection
Owner: Erfaniaa
License: gpl-3.0
Created: 2020-05-18T21:19:36.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2024-09-01T20:57:12.000Z (about 1 year ago)
Last Synced: 2025-03-18T11:04:02.217Z (7 months ago)
Topics: classification, deep-learning, job-posting, machine-learning, tf-idf
Language: Python
Size: 15.5 MB
Stars: 14
Watchers: 2
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Fake Job Posting Detection

Detecting fake job postings with deep learning

## Introduction

I used deep learning to solve a binary classification problem: is a given job posting real or fake?

The dataset is available on [Kaggle](https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction).

## Method

After reading the data from the CSV file, I vectorized the text fields with the *tf-idf* algorithm. Then I implemented a fully-connected neural network in *PyTorch* to classify those vectors:

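```python
import torch.nn as nn
import torch.nn.functional as F

# NETWORK_INPUT_SIZE (the length of a tf-idf vector) and NETWORK_OUTPUT_SIZE
# are module-level constants that this snippet assumes are defined elsewhere
# in the project, before this class.


class Network(nn.Module):
    """Fully-connected classifier whose width halves at each hidden layer."""

    def __init__(self, input_size=NETWORK_INPUT_SIZE, output_size=NETWORK_OUTPUT_SIZE):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 32)
        self.fc5 = nn.Linear(32, 16)
        self.fc6 = nn.Linear(16, 8)
        self.fc7 = nn.Linear(8, 4)
        self.fc8 = nn.Linear(4, output_size)

    def forward(self, x):
        # ReLU after each hidden layer.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))
        # No activation on the output layer: it returns raw logits, which is
        # what torch.nn.BCEWithLogitsLoss expects.
        return self.fc8(x)
```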
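To make the *tf-idf* step concrete, here is a minimal plain-Python sketch of what turns each posting into a vector for the network. The README does not say which implementation the project uses, so this only assumes defaults similar to scikit-learn's `TfidfVectorizer`: lowercasing, tokens of two or more word characters, raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, and L2 normalization. The names `tokenize` and `tfidf_vectors` are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\b\w\w+\b")  # words of two or more characters


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, vectors): one L2-normalized tf-idf vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocabulary)}

    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(documents)
    # Smoothed idf: terms that occur in many documents get a smaller weight.
    idf = [math.log((1 + n) / (1 + df[term])) + 1 for term in vocabulary]

    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocabulary)
        for term, count in Counter(toks).items():
            vec[index[term]] = count * idf[index[term]]  # raw tf times idf
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            vec = [v / norm for v in vec]
        vectors.append(vec)
    return vocabulary, vectors


if __name__ == "__main__":
    docs = [
        "Data entry work from home, earn money fast",
        "Senior Python developer, full time, on site",
    ]
    vocab, vecs = tfidf_vectors(docs)
    # Every vector has one entry per vocabulary term.
    print(len(vocab), len(vecs[0]))
```

The vector length equals the vocabulary size, which is why the first layer's `input_size` has to match the vectorizer's output.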
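Each layer of the network above halves the width, from 256 units down to 4. Counting trainable parameters shows where the model's capacity sits: an `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. The plain-Python sketch below does that arithmetic. The README does not state the tf-idf vocabulary size or `NETWORK_OUTPUT_SIZE`, so the 10,000 inputs and single output logit in the example are hypothetical, and `count_parameters` is a hypothetical helper.

```python
HIDDEN_WIDTHS = [256, 128, 64, 32, 16, 8, 4]  # output sizes of fc1..fc7 in Network


def count_parameters(input_size, output_size):
    """Total weights plus biases of the fc1..fc8 stack in Network."""
    sizes = [input_size] + HIDDEN_WIDTHS + [output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    # Hypothetical sizes: a 10,000-term tf-idf vocabulary and one output logit.
    total = count_parameters(10_000, 1)
    first_layer = 10_000 * 256 + 256
    print(total, f"{first_layer / total:.1%} of them in fc1")
```

With those hypothetical sizes the total is 2,604,193 parameters and `fc1` alone holds about 98% of them, so the vocabulary size largely determines how big the model is.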

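The dataset is imbalanced: only 866 of the 17,880 postings counted in the confusion matrix below are fake. Because of that, I used `torch.nn.BCEWithLogitsLoss` as the loss function; it combines a sigmoid with binary cross-entropy and can up-weight the rare positive class through its `pos_weight` argument. For cross-validation, I used the *skorch* library.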
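The formula behind `BCEWithLogitsLoss` with a scalar `pos_weight` can be sketched in plain Python. The README does not say whether `pos_weight` was set or to what value, so this is only an illustration, assuming label 1 means *fake* and using the common negative/positive ratio heuristic; `bce_with_logits` and `pos_weight_from_counts` are hypothetical helpers, not PyTorch functions. The class sizes (17,014 real, 866 fake) are the row totals of the confusion matrix in the Result section.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bce_with_logits(logits, targets, pos_weight=1.0):
    """Mean binary cross-entropy on raw logits, following the formula of
    torch.nn.BCEWithLogitsLoss with a scalar pos_weight.

    -log(sigmoid(x)) == softplus(-x) and -log(1 - sigmoid(x)) == softplus(x),
    so no sigmoid is computed explicitly and large |x| cannot overflow.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += pos_weight * y * softplus(-x) + (1 - y) * softplus(x)
    return total / len(logits)


def pos_weight_from_counts(n_negative, n_positive):
    """Heuristic weight for the positive class: negatives per positive."""
    return n_negative / n_positive


if __name__ == "__main__":
    # Row totals of the confusion matrix: 17014 real (label 0), 866 fake (label 1).
    w = pos_weight_from_counts(17014, 866)
    print(f"pos_weight = {w:.2f}")
    # A missed fake posting (logit -2, label 1) costs exactly w times more.
    print(bce_with_logits([-2.0], [1], pos_weight=1.0))
    print(bce_with_logits([-2.0], [1], pos_weight=w))
```

Only the loss on positive examples is scaled, so the network is pushed harder to catch fake postings without changing the loss on real ones.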

## Result

Running the code prints a confusion matrix and related statistics:

```
Predict     real        fake
Actual
real        16864       150
fake        384         482

Overall Statistics:

95% CI                                                            (0.96764,0.97263)
Kappa                                                             0.62834
NIR                                                               0.95157
Overall ACC                                                       0.97013

Class Statistics:

Classes                                                           real          fake
ACC(Accuracy)                                                     0.97013       0.97013
ERR(Error rate)                                                   0.02987       0.02987
F0.5(F0.5 score)                                                  0.9804        0.71008
F1(F1 score - harmonic mean of precision and sensitivity)         0.98441       0.64352
F2(F2 score)                                                      0.98846       0.58838
FN(False negative/miss/type 2 error)                              150           384
FNR(Miss rate or false negative rate)                             0.00882       0.44342
FP(False positive/type 1 error/false alarm)                       384           150
FPR(Fall-out or false positive rate)                              0.44342       0.00882
PPV(Precision or positive predictive value)                       0.97774       0.76266
TN(True negative/correct rejection)                               482           16864
TNR(Specificity or true negative rate)                            0.55658       0.99118
TP(True positive/hit)                                             16864         482
TPR(Sensitivity, recall, hit rate, or true positive rate)         0.99118       0.55658
```
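The headline numbers follow directly from the four counts in the matrix. Treating *fake* as the positive class, the plain-Python sketch below recomputes accuracy, the no-information rate (NIR), Cohen's kappa, and precision, recall and F1 for *fake*, which makes the table easier to read. The helper name `summarize` is hypothetical and not part of the project.

```python
def summarize(tp, fn, fp, tn):
    """Metrics for a 2x2 confusion matrix, with 'fake' as the positive class.

    tp: fake predicted fake   fn: fake predicted real
    fp: real predicted fake   tn: real predicted real
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # No-information rate: accuracy of always predicting the majority class.
    nir = max(tp + fn, fp + tn) / n
    # Cohen's kappa: agreement beyond what chance gives for these row/column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "nir": nir, "kappa": kappa,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Counts taken from the confusion matrix above.
    for name, value in summarize(tp=482, fn=384, fp=150, tn=16864).items():
        print(f"{name:>9}: {value:.5f}")
```

Overall accuracy (0.970) is only about 1.9 points above the no-information rate (0.952), so kappa and the recall for *fake* (0.557) say more about how well the model catches fake postings.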

## Run

First, install the dependencies:

```bash
pip3 install -r requirements.txt
```

Then run the project with Python 3:

```bash
python3 main.py
```
