https://github.com/zphang/big_data_proj
- Host: GitHub
- URL: https://github.com/zphang/big_data_proj
- Owner: zphang
- Created: 2018-04-15T20:22:18.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-05-13T18:38:57.000Z (about 7 years ago)
- Last Synced: 2025-01-29T13:43:36.467Z (4 months ago)
- Language: Python
- Size: 17.6 MB
- Stars: 0
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Textual Outlier Detection
### To Run:
```bash
export PYTHONHASHSEED=0
export SPARK_YARN_USER_ENV=0

spark-submit --py-files=lib.zip main.py \
--input_hfs_path='reddit_sarcasm_small.txt' \
--outliers_output_hfs_path='reddit_sarcasm_small_outliers.txt' \
--clean_output_hfs_path='reddit_sarcasm_small_clean.txt' \
--config_json_path='configs/default.json'
```

### To build:
```bash
source build.sh
```
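
### Why `PYTHONHASHSEED=0`:
The README does not say why the run command exports `PYTHONHASHSEED=0`. A likely reason is that Python 3 randomizes string hashes per interpreter process, while Spark hashes keys in separate Python worker processes that must agree with each other. PySpark's default partitioner refuses to hash string keys when the seed is unset. The sketch below is not part of this project; it only illustrates the underlying Python behavior, and the helper name `string_hash` and the sample text are made up for the example.

```python
import os
import subprocess
import sys


def string_hash(seed, text="outlier"):
    """Return hash(text) as computed by a fresh Python interpreter.

    `seed` is passed through the PYTHONHASHSEED environment variable,
    so each call simulates a separate worker process.
    """
    env = dict(os.environ)
    env["PYTHONHASHSEED"] = seed
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# With a fixed seed, every new interpreter computes the same hash.
fixed = {string_hash("0") for _ in range(3)}

# With randomization enabled, separate interpreters usually disagree.
randomized = {string_hash("random") for _ in range(3)}

print("distinct hashes with PYTHONHASHSEED=0:     ", len(fixed))
print("distinct hashes with PYTHONHASHSEED=random:", len(randomized))
```

With the seed fixed, the first count is always 1. The second count is usually greater than 1, which is the inconsistency that a fixed seed avoids across workers.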