https://github.com/khinshankhan/nlp-tf-idf-hadoop

NLP analysis of Term Frequency - Inverse Document Frequency using Hadoop

hadoop mapreduce nlp tf-idf


Host: GitHub
URL: https://github.com/khinshankhan/nlp-tf-idf-hadoop
Owner: khinshankhan
License: mit
Created: 2019-11-26T21:41:18.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2019-12-19T03:46:45.000Z (almost 6 years ago)
Last Synced: 2025-01-19T21:46:52.245Z (9 months ago)
Topics: hadoop, mapreduce, nlp, tf-idf
Language: Python
Size: 7.22 MB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# nlp-tf-idf-hadoop

NLP analysis of Term Frequency - Inverse Document Frequency using Hadoop

Khan_Rafi: Khinshan Khan and Shakil Rafi
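
For readers new to the metric, TF-IDF weights a term by how often it occurs in a document, scaled down by how many documents contain it. The block below is a minimal single-machine sketch of one common variant (length-normalised term frequency, natural-log IDF). It is not the repository's `app.py`; the function names `tokenize` and `tf_idf` and the toy corpus are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return {doc_id: {term: weight}} for a {doc_id: text} mapping.

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    """
    tokens = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokens)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    weights = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        total = len(words)
        weights[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical toy corpus, not data from the repository.
docs = {
    "d1": "gene expression in tumor cells",
    "d2": "tumor growth and gene mutation",
    "d3": "protein folding in cells",
}
scores = tf_idf(docs)
```

In this toy corpus a term unique to one document, such as `expression` in `d1`, scores higher than `gene`, which also appears in `d2`. A term present in every document gets a weight of zero.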

## Requirements

- Apache Spark
  - have `pyspark` on path
- Python 3
  - Note: Python 3.8 and above do not work well with Spark
- Python packages available in the environment (all three ship with the standard library):
  - math
  - re
  - sys

## Run

The project can be run in two ways:

- Traditional Way

```bash
spark-submit app.py
cat output
```

- Abstracted Way

```bash
make FILE= QUERY=
```

## Notes

- The program writes its results to `output` rather than to stdout
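
Because the results do not appear on stdout, a script that drives the job has to read `output` after the run finishes. A minimal sketch, assuming `spark-submit` is on the path and that `output` is a plain text file (the `cat output` step suggests this, but the format is not documented here). The helper name `run_and_read` is hypothetical.

```python
import subprocess
from pathlib import Path


def run_and_read(command=("spark-submit", "app.py"), output_path="output"):
    """Run the job, then return the text it wrote to output_path."""
    # check=True raises CalledProcessError if the job exits with a non-zero status.
    subprocess.run(list(command), check=True)
    return Path(output_path).read_text(encoding="utf-8")
```

Calling `print(run_and_read())` from the project directory would mirror the two shell commands of the traditional way.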
