https://github.com/erfaniaa/fake-job-posting-detection
Detect fake job posting with deep learning
https://github.com/erfaniaa/fake-job-posting-detection
classification deep-learning job-posting machine-learning tf-idf
Last synced: 7 months ago
JSON representation
Detect fake job posting with deep learning
- Host: GitHub
- URL: https://github.com/erfaniaa/fake-job-posting-detection
- Owner: Erfaniaa
- License: gpl-3.0
- Created: 2020-05-18T21:19:36.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-09-01T20:57:12.000Z (about 1 year ago)
- Last Synced: 2025-03-18T11:04:02.217Z (7 months ago)
- Topics: classification, deep-learning, job-posting, machine-learning, tf-idf
- Language: Python
- Size: 15.5 MB
- Stars: 14
- Watchers: 2
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Fake Job Posting Detection
Detecting fake job postings with deep learning
## Introduction
I have used deep learning to solve a binary classification problem: "Is this job description real? Isn't it a fake one?"
The used dataset can be found [here](https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction).
## Method
After reading data from the CSV file they should be vectorized, so I used *tf-idf* algorithm for the strings. Then, I implemented a fully-connected neural network in *PyTorch* framework for processing those vectors:
```python
class Network(nn.Module):
def __init__(self, input_size=NETWORK_INPUT_SIZE, output_size=NETWORK_OUTPUT_SIZE):
super(Network, self).__init__()
self.fc1 = nn.Linear(input_size, 256)
self.fc2 = nn.Linear(256, 128)
self.fc3 = nn.Linear(128, 64)
self.fc4 = nn.Linear(64, 32)
self.fc5 = nn.Linear(32, 16)
self.fc6 = nn.Linear(16, 8)
self.fc7 = nn.Linear(8, 4)
self.fc8 = nn.Linear(4, output_size)def forward(self, x):
x = self.fc1(x)
x = F.relu(x)
x = self.fc2(x)
x = F.relu(x)
x = self.fc3(x)
x = F.relu(x)
x = self.fc4(x)
x = F.relu(x)
x = self.fc5(x)
x = F.relu(x)
x = self.fc6(x)
x = F.relu(x)
x = self.fc7(x)
x = F.relu(x)
x = self.fc8(x)
return x
```We have an imbalanced dataset for this binary classification problem. Because of that, I have used ```torch.nn.BCEWithLogitsLoss``` as my loss function. And for the cross-validation part, *skorch* library has been used in my code.
## Result
After running the code, a confusion matrix and some related statistics will be shown to you:
```
Predict real fake
Actual
real 16864 150
fake 384 482Overall Statistics:
95% CI (0.96764,0.97263)
Kappa 0.62834
NIR 0.95157
Overall ACC 0.97013Class Statistics:
Classes real fake
ACC(Accuracy) 0.97013 0.97013
ERR(Error rate) 0.02987 0.02987
F0.5(F0.5 score) 0.9804 0.71008
F1(F1 score - harmonic mean of precision and sensitivity) 0.98441 0.64352
F2(F2 score) 0.98846 0.58838
FN(False negative/miss/type 2 error) 150 384
FNR(Miss rate or false negative rate) 0.00882 0.44342
FP(False positive/type 1 error/false alarm) 384 150
FPR(Fall-out or false positive rate) 0.44342 0.00882
PPV(Precision or positive predictive value) 0.97774 0.76266
TN(True negative/correct rejection) 482 16864
TNR(Specificity or true negative rate) 0.55658 0.99118
TP(True positive/hit) 16864 482
TPR(Sensitivity, recall, hit rate, or true positive rate) 0.99118 0.55658
```## Run
First of all, install the dependencies:
```bash
pip3 install -r requirements.txt
```Then, run the project using Python version 3:
```bash
python3 main.py
```