Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/xeneta/LeadQualifier

:dart: Qualify sales leads with machine learning
https://github.com/xeneta/LeadQualifier

Last synced: 18 days ago
JSON representation

:dart: Qualify sales leads with machine learning

Host: GitHub
URL: https://github.com/xeneta/LeadQualifier
Owner: xeneta
License: mit
Created: 2016-06-03T12:15:54.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2017-10-16T18:08:22.000Z (about 7 years ago)
Last Synced: 2024-07-31T21:56:06.946Z (3 months ago)
Language: Python
Homepage:
Size: 2.63 MB
Stars: 640
Watchers: 71
Forks: 105
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# LeadQualifier

This repo is a collection of scripts we use at Xeneta to **qualify sales leads** with machine learning. Read more about this project in the Medium article [Boosting Sales With Machine Learning](https://medium.com/xeneta/boosting-sales-with-machine-learning-fbcf2e618be3).

You can use this repo for **two things:**

1. Try to beat our predictions using **our data** and your own algorithm
2. Create a lead qualifier for your company, using **your own data**

## Setup

Start off by running the following command:

pip install -r requirements.txt

You'll also need to download the stopword from the [nltk](http://www.nltk.org/index.html) package. Run the Python interpreter and type the following:

import nltk
nltk.download('stopwords')

# 1. Experiment with your own algorithms

We'd love to see more algorithms on the leaderboard, so send us a pull request once you've implemented one.

## [Xeneta Qualifier](https://github.com/xeneta/LeadQualifier/tree/master/xeneta_qualifier)

We've provided you with our vectorized and transformed data [here](https://github.com/xeneta/LeadQualifier/tree/master/xeneta_qualifier/data). We can unfortunately not share the raw text data, as it contains sensitive company information (who our customers are).

To test our your own algorithm, simply add it the [run.py](https://github.com/xeneta/LeadQualifier/blob/master/xeneta_qualifier/run.py) file and run the script:

python run.py

Thanks to [lampts](https://github.com/lampts) for implementing the best performing algorithm so far, the [SGDClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html).

### Leaderboard:

| Algorithm | Precision | Recall | F1 Score|
| ------------- |:--------------| :------|:--------|
| SGD Classifier| 0.872 | 0.940 | 0.905 |
| Random Forest | 0.845 | 0.915 | 0.878 |

**PS:** We're also experimenting with a neural net (in TensorFlow) in the nn.py file.

# 2. Create your own lead qualifier

To create your own lead qualifier, you'll need to get hold of company descriptions (to create your dataset). We currently use [FullContact](https://www.fullcontact.com/developer) for this.

**Note:** We've added dummy data, so that you can run both scripts without getting errors, and to give you examples on how the sheets should look like.

## [Train Algorithm](https://github.com/xeneta/LeadQualifier/tree/master/train_algorithm)

This script trains an algorithm on **your own** input data. It expects two excel sheets named **qualified** and **disqualified** in the [input](https://github.com/xeneta/LeadQualifier/tree/master/train_algorithm/input) folder. These sheets need to contain two columns:

- URL
- Description

![](https://raw.githubusercontent.com/xeneta/LeadQualifier/master/img/sheet.png)

Run the script:

python run.py

It'll dump three files into the [qualify_leads](https://github.com/xeneta/LeadQualifier/tree/master/qualify_leads) project:

- algorithm
- vectorizer
- tfidf_vectorizer

You're now ready to start classifying your sales leads!

## [Qualify Leads](https://github.com/xeneta/LeadQualifier/tree/master/qualify_leads)

This is the script that actually predicts the quality of your leads. Add an excel sheet named **data** in the [input](https://github.com/xeneta/LeadQualifier/tree/master/qualify_leads/input) folder. Use the same format as the example file that's already there.

Run the script:

python run.py

It'll output an excel sheet with a column named **Prediction**, where 1 equals *qualified* and 0 equals *disqualified*:

![](https://raw.githubusercontent.com/xeneta/LeadQualifier/master/img/predictions_sheet.png)

Got questions? Email me at [email protected].