https://github.com/aye-nyeinsan/nlpworkshop_3

Data Preprocessing for Spam Collection DataSet

This is workshop 3 of the NLP class, working with the SMS Spam Collection dataset.

1. Use the previous dataset, “spam dictation”
2. Preprocess the text, including:

* Remove whitespace
* Remove anything that is not English
* Calculate the word length and add it as a column named “length”

3. Create a new column named “text2”
4. Use the LabelEncoder method to convert the class target
5. Use CountVectorizer to perform bag-of-words (BOW)
6. List the top 5 and bottom 5 rows of the transformed sample to show the results, and submit your work to MS Teams