Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/glebpro/nlptermproject2017
Automatic Classification of Persuasive Arguments, RIT, 2017
arguments nlp persuasive reddit
- Host: GitHub
- URL: https://github.com/glebpro/nlptermproject2017
- Owner: glebpro
- License: mit
- Created: 2017-10-22T23:06:50.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-01-30T15:37:24.000Z (almost 7 years ago)
- Last Synced: 2024-11-09T12:40:25.746Z (3 months ago)
- Topics: arguments, nlp, persuasive, reddit
- Language: Python
- Homepage:
- Size: 46.9 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Automatic Classification of Persuasive Arguments
Term project for ENGL 681 _Introduction to Natural Language Processing_ at RIT, 2017.

[@jdberlinski](https://github.com/jdberlinski)
[@glebpro](https://github.com/glebpro)

## Abstract
Public web forums allow for massive online debate, especially on the community platform Reddit. The language of persuasive argument can be found in the sub-community [/r/ChangeMyView](https://reddit.com/ChangeMyView), which encourages sharing views and discourse in a moderated public forum. To determine whether persuasive comments that succeeded in changing a user's view share common traits, we organized and labeled sets of argument examples and, through language modeling, identified features valuable for classifying novel arguments. Our model achieved 6% higher accuracy than the baseline, leading us to conclude that there are identifiable stylistic and topical features in effective arguments.

[[Slides](slides.pdf)] [[Paper](paper.pdf)]
## Technicals
#### Downloads
To download the code: `$ git clone https://github.com/glebpro/nlptermproject2017`

To download the corpus: [download link](https://drive.google.com/drive/folders/1Ki65wjOoVgLENWK1xgRMPaxdBrx5v8n9?usp=sharing). The data format is explained [here](/corpus_utils/data_format.txt).
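For reference, the corpus ships as `.jsonlist` files. A minimal Python sketch for iterating over one, assuming each line holds a single JSON record (the filename and usage below are illustrative; see `data_format.txt` for the real schema):

```python
import json

def read_jsonlist(path):
    """Yield one JSON record per non-empty line of a .jsonlist file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Illustrative usage: count records in one downloaded corpus file.
records = list(read_jsonlist("CMV_01.jsonlist"))
print("loaded", len(records), "records")
```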
#### Scripts
To generate your own corpus, gathering posts backwards in time from now:
1. Populate [`corpus_utils/reddit.auth.json`](corpus_utils/reddit.auth.json) with your reddit credentials
2. Run `$ python corpus_utils/download.py num_posts_to_collect`
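For orientation, here is a sketch of the kind of collection `download.py` performs with PRAW: read credentials from `reddit.auth.json`, then walk recent /r/ChangeMyView submissions. The credential key names and the fields stored per post are assumptions for illustration, not the script's actual behavior:

```python
import json
import praw  # praw >= 5.2, per the requirements below

# Read API credentials; these key names are an assumed layout for
# reddit.auth.json, not necessarily the one download.py expects.
with open("corpus_utils/reddit.auth.json") as f:
    auth = json.load(f)

reddit = praw.Reddit(client_id=auth["client_id"],
                     client_secret=auth["client_secret"],
                     user_agent=auth.get("user_agent", "cmv-corpus-sketch"))

# Walk the newest submissions on /r/ChangeMyView and write them as jsonlist.
with open("CMV_sketch.jsonlist", "w") as out:
    for submission in reddit.subreddit("changemyview").new(limit=100):
        record = {"id": submission.id,
                  "title": submission.title,
                  "selftext": submission.selftext,
                  "created_utc": submission.created_utc}
        out.write(json.dumps(record) + "\n")
```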
To generate comment pairs, use:

1. `$ python corpus_utils/comment_pairs.py CMV_##.jsonlist`
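As a rough illustration of what comment pairing might involve (on /r/ChangeMyView, a comment that changes the poster's view is marked with a delta), the sketch below pairs delta-awarded comments with non-awarded ones from the same post. The field names and pairing criterion are assumptions; `data_format.txt` documents the real format:

```python
from itertools import product

def pair_comments(post):
    """Pair view-changing comments with non-view-changing ones from one post.
    The 'delta' flag and field names are assumed for illustration only."""
    winners = [c for c in post.get("comments", []) if c.get("delta")]
    others = [c for c in post.get("comments", []) if not c.get("delta")]
    for won, lost in product(winners, others):
        yield {"post_id": post.get("id"),
               "positive": won.get("body"),
               "negative": lost.get("body")}
```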
For the classifier:

1. To extract features: `$ python model_files/get_features.py comment_pairs.jsonlist`
2. To train new model: `$ python model_files/model.py`
- steps 1 and 2 might take hours
3. To explore results: `$ python model_files/explore.py`
- to print results from the included model

Additional utility scripts are included for parsing [posts](corpus_utils/parse_jsonlist.py) and [comments](corpus_utils/parse_comments.py).
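For a rough sense of the final stage of the pipeline (text features in, persuasiveness label out), here is a minimal scikit-learn sketch on toy data. It is not the project's feature set or model, only the general shape of a text classifier like the one `model.py` trains:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: 1 = the argument changed the reader's view, 0 = it did not.
texts = [
    "I see your point, but here is evidence that complicates it ...",
    "You're simply wrong.",
    "Consider this concrete counterexample to your claim ...",
    "No, that's ridiculous.",
]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a logistic-regression baseline.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Here is a study suggesting the opposite ..."]))
```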
#### Requirements
[python](https://www.python.org/) >= 3.4, [praw](https://praw.readthedocs.io/en/latest/index.html) >= 5.2, [spacy](https://spacy.io/) >= 2.0.3, [sklearn](http://scikit-learn.org/stable/) >= 0.18.1, [numpy](http://www.numpy.org/) >= 1.13.3, [nltk](http://www.nltk.org/) >= 3.2.5
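A pip command along the lines of `$ pip install praw spacy scikit-learn numpy nltk` should cover the Python dependencies; note that the `sklearn` import is provided by the `scikit-learn` package.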
#### License

MIT licensed. See the bundled [LICENSE](/LICENSE) file for more details.