https://github.com/khinshankhan/nlp-tf-idf-hadoop


# nlp-tf-idf-hadoop

NLP analysis of Term Frequency - Inverse Document Frequency using Hadoop

Khan_Rafi: Khinshan Khan and Shakil Rafi
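
For readers unfamiliar with the metric, here is a minimal plain-Python sketch of TF-IDF. It is not the project's `app.py`: the function names `tokenize` and `tf_idf` are hypothetical, and the weighting is an assumption (term frequency as count divided by document length, inverse document frequency as `log(N / df)`); the project may use a different variant.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return {doc_id: {term: score}} for a {doc_id: text} mapping.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    tokens = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokens)

    # Document frequency: the number of documents that contain each term.
    doc_freq = {}
    for words in tokens.values():
        for term in set(words):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    scores = {}
    for doc_id, words in tokens.items():
        # Raw count of each term within this document.
        counts = {}
        for term in words:
            counts[term] = counts.get(term, 0) + 1
        scores[doc_id] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


if __name__ == "__main__":
    docs = {
        "d1": "the cat sat on the mat",
        "d2": "the dog sat on the log",
    }
    for doc_id, weights in tf_idf(docs).items():
        print(doc_id, weights)
```

With this weighting, a term that appears in every document scores zero, while a term unique to one document scores highest there.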

## Requirements

- Apache Spark
  - `pyspark` must be on the path
- Python 3
  - Note: Python 3.8 and above do not work well with Spark
- Python packages available in the environment (all part of the standard library):
  - `math`
  - `re`
  - `sys`

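
The requirements above can be checked before running anything. The following is a rough sketch, not part of the project: `check_environment` is a hypothetical helper, and the version cutoff simply mirrors the note about Python 3.8.

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if version_info[0] != 3:
        problems.append("Python 3 is required")
    elif tuple(version_info[:2]) >= (3, 8):
        # Mirrors the note above; newer Spark releases may behave differently.
        problems.append("Python 3.8+ is noted as not working well with Spark")
    # Both commands are looked up on PATH, as a shell would do.
    for tool in ("pyspark", "spark-submit"):
        if which(tool) is None:
            problems.append(f"`{tool}` was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real interpreter or path.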
## Run

The project can be run in two ways:

- Traditional way, invoking `spark-submit` directly:

```bash
spark-submit app.py
cat output
```

- Abstracted way, through `make`:

```bash
make FILE= QUERY=
```

## Notes

- Running the program writes its results to a file named `output` rather than to stdout.