https://github.com/alextanhongpin/spam-api
Microservices for spam filtering system
- Host: GitHub
- URL: https://github.com/alextanhongpin/spam-api
- Owner: alextanhongpin
- Created: 2018-02-05T08:40:31.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-23T09:59:42.000Z (about 8 years ago)
- Last Synced: 2025-10-07T09:54:29.768Z (8 months ago)
- Topics: python, scikit-learn
- Language: Python
- Size: 28 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Spam API
Microservices API for spam filtering system.
## Abstract
One goal of this repository is to work out an approach to designing machine learning systems.
## To run
```bash
$ python -m main
# Note: running the file directly, as below, will not work because the imports break
$ python main.py
```
## Flows
- Prepare text data
  - Removal of stop words
  - Lemmatization
- Feature extraction process
- Training the classifiers
- Checking performance
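
The flow above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the repository's actual code: the toy corpus is invented, `normalize` is a hypothetical stand-in for a real lemmatizer, and `MultinomialNB` is used as an example classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy corpus invented for illustration (1 = spam, 0 = not spam).
texts = [
    "win a free prize now", "free cash offer, claim now",
    "claim your free prizes today", "cheap offer, win cash",
    "meeting at noon tomorrow", "see you at lunch tomorrow",
    "notes from the meeting", "lunch with the team at noon",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def normalize(text):
    """Hypothetical stand-in for a lemmatizer: lowercase, strip a trailing plural 's'."""
    words = text.lower().split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


# 1. Prepare text data (stop words are removed by the vectorizer below).
prepared = [normalize(t) for t in texts]
train_x, test_x, train_y, test_y = train_test_split(
    prepared, labels, test_size=0.25, stratify=labels, random_state=0
)

# 2. Feature extraction: fit on the training split only, reuse for the test split.
vectorizer = CountVectorizer(stop_words="english")
tfidf = TfidfTransformer()
train_features = tfidf.fit_transform(vectorizer.fit_transform(train_x))
test_features = tfidf.transform(vectorizer.transform(test_x))

# 3. Train the classifier.
classifier = MultinomialNB()
classifier.fit(train_features, train_y)

# 4. Check performance on the held-out split.
accuracy = accuracy_score(test_y, classifier.predict(test_features))
print(f"accuracy: {accuracy:.2f}")
```

With a corpus this small the accuracy figure is not meaningful; the point is only the order of the steps and that the feature extractor is fitted on training data and reused for evaluation.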
## Pickled
To view the size of the pickled files:
```bash
$ du -h *.pkl
```
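
The same check can be done from Python right after saving a model. The sketch below is an assumption about how one might do it; the `pickled_size_bytes` helper, the `model.pkl` file name, and the dictionary standing in for a trained model are all hypothetical.

```python
import os
import pickle
import tempfile


def pickled_size_bytes(obj, path):
    """Pickle obj to path and return the size of the resulting file in bytes."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return os.path.getsize(path)


# Hypothetical stand-in for a trained model; any picklable object works.
model = {"vocabulary": ["free", "prize", "meeting"], "weights": [0.9, 0.8, 0.1]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    size = pickled_size_bytes(model, path)
    print(f"{os.path.basename(path)}: {size} bytes")
```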
## Tips
At first it may be tempting to construct your pipeline to include the feature extractor:
```python
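from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('gaussian_nb', GaussianNB())])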
```
But note that this is mainly convenient while training a single model. For prediction, you need to reuse the same fitted feature extractor. Also, when training multiple classifiers this way, each pipeline runs the feature extraction process again, which is wasteful.
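
One possible way to avoid the repeated work is to fit the feature extractor once, train every classifier on the resulting features, and persist the fitted extractor alongside the models. The sketch below is an assumption, not the repository's implementation: the toy data is invented, and `MultinomialNB` and `LogisticRegression` are swapped in because `GaussianNB` does not accept the sparse matrices that `TfidfTransformer` produces.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration (1 = spam, 0 = not spam).
texts = [
    "win a free prize now", "free cash offer, claim now",
    "meeting at noon tomorrow", "lunch with the team at noon",
]
labels = [1, 1, 0, 0]

# The feature extractor on its own, fitted exactly once.
extractor = Pipeline([
    ("vect", CountVectorizer(stop_words="english")),
    ("tfidf", TfidfTransformer()),
])
features = extractor.fit_transform(texts)

# Every classifier trains on the same precomputed features.
classifiers = {
    "multinomial_nb": MultinomialNB(),
    "logistic_regression": LogisticRegression(),
}
for clf in classifiers.values():
    clf.fit(features, labels)

# Round-trip the fitted extractor through pickle, as a prediction service
# would after loading it from disk, and reuse it on new text.
restored = pickle.loads(pickle.dumps(extractor))
new_features = restored.transform(["free prize offer"])
predictions = {name: int(clf.predict(new_features)[0]) for name, clf in classifiers.items()}
print(predictions)
```

The trade-off is that the extractor and each classifier become separate artifacts that must be saved and versioned together.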