https://github.com/aye-nyeinsan/nlpworkshop_3
Data Preprocessing for Spam Collection DataSet
- Host: GitHub
- URL: https://github.com/aye-nyeinsan/nlpworkshop_3
- Owner: aye-nyeinSan
- Created: 2024-12-21T06:25:42.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-12-31T07:05:05.000Z (5 months ago)
- Last Synced: 2025-04-12T17:19:48.781Z (about 2 months ago)
- Language: Jupyter Notebook
- Size: 234 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This is workshop 3 of the NLP class, working with the SMS-spam-collection-dataset.
1. Use the previous dataset, “spam detection”
2. Preprocess the text, including:
* Remove white space
* Remove anything that is not English
* Calculate the word length and add it as a column named “length”
3. Create a new column named “text2”
4. Use LabelEncoder to convert the class target (labels) to integers
5. Use CountVectorizer to build a bag-of-words (BoW) representation
6. List the top 5 and bottom 5 rows of the transformed sample to show the results, and submit your work to MS Teams
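The steps above can be sketched in plain Python as follows. This is a minimal sketch, not the notebook's actual code, and it rests on several assumptions:

* The toy messages are hypothetical.
* "Not English" is taken to mean any character outside A-Z, a-z and whitespace.
* "length" is taken to be the word count of the cleaned text.
* "text2" is assumed to hold that cleaned text.

The label encoding and bag-of-words parts imitate the defaults of scikit-learn's LabelEncoder and CountVectorizer. LabelEncoder sorts the classes and numbers them from 0. CountVectorizer lowercases the text, keeps tokens of two or more word characters, and orders its vocabulary alphabetically. The sketch reproduces this so it runs without scikit-learn.

```python
import re
from collections import Counter

# Hypothetical toy messages (class, raw text); not taken from the real dataset.
messages = [
    ("ham", "  See you   at lunch?  "),
    ("spam", "WIN a FREE prize!!! Call 0800 now"),
    ("ham", "Ok, café at 5 then"),
    ("spam", "Urgent: claim your £100 reward"),
    ("ham", "Running late, sorry"),
    ("ham", "Can you send the notes"),
    ("spam", "Txt STOP to 85069 to end"),
]


def clean_text(text):
    # Assumption: "not English" = any character other than A-Z, a-z or whitespace.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Collapse repeated whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


# Steps 2-3: cleaned text goes in "text2"; its word count (assumed meaning) in "length".
rows = []
for label, text in messages:
    text2 = clean_text(text)
    rows.append({"class": label, "text": text, "text2": text2, "length": len(text2.split())})

# Step 4: label encoding in the style of LabelEncoder (sorted classes -> 0, 1, ...).
classes = sorted({row["class"] for row in rows})
class_to_id = {c: i for i, c in enumerate(classes)}
encoded = [class_to_id[row["class"]] for row in rows]

# Step 5: bag of words in the style of CountVectorizer's defaults.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # tokens of 2+ word characters


def tokenize(text):
    # Lowercase first, then extract tokens, as CountVectorizer does by default.
    return TOKEN_PATTERN.findall(text.lower())


vocab = sorted({tok for row in rows for tok in tokenize(row["text2"])})
index = {tok: i for i, tok in enumerate(vocab)}


def bow_vector(text):
    # One count per vocabulary term, in alphabetical vocabulary order.
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        vec[index[tok]] += count
    return vec


matrix = [bow_vector(row["text2"]) for row in rows]

# Step 6: show the first 5 and last 5 transformed rows (like head() / tail()).
table = [dict(row, label=y, bow=vec) for row, y, vec in zip(rows, encoded, matrix)]
top5, bottom5 = table[:5], table[-5:]
for title, part in (("Top 5", top5), ("Bottom 5", bottom5)):
    print(title)
    for r in part:
        nonzero = {vocab[i]: c for i, c in enumerate(r["bow"]) if c}
        print(f"  label={r['label']} length={r['length']} text2={r['text2']!r} bow={nonzero}")
```

Note that the crude "English only" filter turns "café" into "caf", which is one reason a real notebook might choose a different rule. If scikit-learn is installed, calling fit_transform on LabelEncoder (from sklearn.preprocessing) and CountVectorizer (from sklearn.feature_extraction.text) should give the same label codes and the same alphabetical column order. They return a NumPy array and a SciPy sparse matrix respectively, rather than Python lists.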