https://github.com/aye-nyeinsan/nlpworkshop_3

Data Preprocessing for Spam Collection DataSet

Host: GitHub
URL: https://github.com/aye-nyeinsan/nlpworkshop_3
Owner: aye-nyeinSan
Created: 2024-12-21T06:25:42.000Z (5 months ago)
Default Branch: main
Last Pushed: 2024-12-31T07:05:05.000Z (5 months ago)
Last Synced: 2025-04-12T17:19:48.781Z (about 2 months ago)
Language: Jupyter Notebook
Size: 234 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

This is workshop 3 of the NLP class, working with the SMS Spam Collection dataset.

1. Use the dataset from the previous workshop (“spam dictation”).

2. Preprocess the text:
   * Remove extra whitespace
   * Remove anything that is not English
   * Calculate the word length and add it as a column named “length”

3. Create a new column named “text2”.

4. Use LabelEncoder to encode the class target.

5. Use CountVectorizer to build a bag-of-words (BoW) representation.

6. List the top 5 and bottom 5 rows of the transformed samples to show the results, and submit your work to MS Teams.
