Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/glebpro/nlptermproject2017
Automatic Classification of Persuasive Arguments, RIT, 2017
arguments nlp persuasive reddit
- Host: GitHub
- URL: https://github.com/glebpro/nlptermproject2017
- Owner: glebpro
- License: mit
- Created: 2017-10-22T23:06:50.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-01-30T15:37:24.000Z (almost 7 years ago)
- Last Synced: 2024-11-09T12:40:25.746Z (3 months ago)
- Topics: arguments, nlp, persuasive, reddit
- Language: Python
- Homepage:
- Size: 46.9 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Automatic Classification of Persuasive Arguments
Term project for ENGL 681 _Introduction to Natural Language Processing_ at RIT, 2017.

[@jdberlinski](https://github.com/jdberlinski)
[@glebpro](https://github.com/glebpro)

## Abstract
Public web forums allow for massive online debate, especially on the community platform Reddit. The language of persuasive argument can be found in the sub-community [/r/ChangeMyView](https://reddit.com/ChangeMyView), which encourages sharing views and discourse in a moderated public forum. To determine whether persuasive comments that succeeded in changing a user's view share common traits, we organized and labeled sets of argument examples and, through language modeling, identified features valuable for classifying novel arguments. Our model achieved 6% higher accuracy than the baseline, leading us to conclude that there are identifiable stylistic and topical features in effective arguments.

[[Slides](slides.pdf)] [[Paper](paper.pdf)]
## Technicals
#### Downloads
To download the code: `$ git clone https://github.com/glebpro/nlptermproject2017`

To download the corpus: [download link](https://drive.google.com/drive/folders/1Ki65wjOoVgLENWK1xgRMPaxdBrx5v8n9?usp=sharing). The data format is explained [here](/corpus_utils/data_format.txt).
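For reference, the corpus ships as `.jsonlist` files. A minimal Python sketch for iterating over one, assuming each line holds a single JSON record (the filename and usage below are illustrative; see `data_format.txt` for the real schema):

```python
import json

def read_jsonlist(path):
    """Yield one JSON record per non-empty line of a .jsonlist file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Illustrative usage: count records in one downloaded corpus file.
records = list(read_jsonlist("CMV_01.jsonlist"))
print("loaded", len(records), "records")
```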
#### Scripts
To generate your own corpus, gathering posts backwards in time from now:
1. Populate [`corpus_utils/reddit.auth.json`](corpus_utils/reddit.auth.json) with your reddit credentials
2. Run `$ python corpus_utils/download.py num_posts_to_collect`
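For orientation, here is a sketch of the kind of collection `download.py` performs with PRAW: read credentials from `reddit.auth.json`, then walk recent /r/ChangeMyView submissions. The credential key names and the fields stored per post are assumptions for illustration, not the script's actual behavior:

```python
import json
import praw  # praw >= 5.2, per the requirements below

# Read API credentials; these key names are an assumed layout for
# reddit.auth.json, not necessarily the one download.py expects.
with open("corpus_utils/reddit.auth.json") as f:
    auth = json.load(f)

reddit = praw.Reddit(client_id=auth["client_id"],
                     client_secret=auth["client_secret"],
                     user_agent=auth.get("user_agent", "cmv-corpus-sketch"))

# Walk the newest submissions on /r/ChangeMyView and write them as jsonlist.
with open("CMV_sketch.jsonlist", "w") as out:
    for submission in reddit.subreddit("changemyview").new(limit=100):
        record = {"id": submission.id,
                  "title": submission.title,
                  "selftext": submission.selftext,
                  "created_utc": submission.created_utc}
        out.write(json.dumps(record) + "\n")
```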
To generate comment pairs, use:

1. `$ python corpus_utils/comment_pairs.py CMV_##.jsonlist`
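As a rough illustration of what comment pairing might involve (on /r/ChangeMyView, a comment that changes the poster's view is marked with a delta), the sketch below pairs delta-awarded comments with non-awarded ones from the same post. The field names and pairing criterion are assumptions; `data_format.txt` documents the real format:

```python
from itertools import product

def pair_comments(post):
    """Pair view-changing comments with non-view-changing ones from one post.
    The 'delta' flag and field names are assumed for illustration only."""
    winners = [c for c in post.get("comments", []) if c.get("delta")]
    others = [c for c in post.get("comments", []) if not c.get("delta")]
    for won, lost in product(winners, others):
        yield {"post_id": post.get("id"),
               "positive": won.get("body"),
               "negative": lost.get("body")}
```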
For the classifier:

1. To extract features: `$ python model_files/get_features.py comment_pairs.jsonlist`
2. To train new model: `$ python model_files/model.py`
- steps 1 and 2 might take hours
3. To explore results: `$ python model_files/explore.py`
- to print results from the included model

Additional utility scripts are included for parsing [posts](corpus_utils/parse_jsonlist.py) and [comments](corpus_utils/parse_comments.py).
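For a rough sense of the final stage of the pipeline (text features in, persuasiveness label out), here is a minimal scikit-learn sketch on toy data. It is not the project's feature set or model, only the general shape of a text classifier like the one `model.py` trains:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: 1 = the argument changed the reader's view, 0 = it did not.
texts = [
    "I see your point, but here is evidence that complicates it ...",
    "You're simply wrong.",
    "Consider this concrete counterexample to your claim ...",
    "No, that's ridiculous.",
]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a logistic-regression baseline.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Here is a study suggesting the opposite ..."]))
```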
#### Requirements
[python](https://www.python.org/) >= 3.4, [praw](https://praw.readthedocs.io/en/latest/index.html) >= 5.2, [spacy](https://spacy.io/) >= 2.0.3, [sklearn](http://scikit-learn.org/stable/) >= 0.18.1, [numpy](http://www.numpy.org/) >= 1.13.3, [nltk](http://www.nltk.org/) >= 3.2.5
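A pip command along the lines of `$ pip install praw spacy scikit-learn numpy nltk` should cover the Python dependencies; note that the `sklearn` import is provided by the `scikit-learn` package.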
#### License

MIT licensed. See the bundled [LICENSE](/LICENSE) file for more details.