https://github.com/ashenoy95/writeprints-static

Writeprints-Static Feature Set extraction for Adversarial Stylometry

feature-extraction nlp python python-3 python3 stylometry


# Writeprints-Static Feature Set

Code to extract important features from the Writeprints-Static Feature Set for (Adversarial) __Stylometry__
[[Brennan, Afroz, and Greenstadt (2012)]](https://www.cs.drexel.edu/~sa499/papers/adversarial_stylometry.pdf).

(Adapted from the Writeprints approach [[Abbasi and Chen, 2008]](https://www.scss.tcd.ie/Khurshid.Ahmad/Research/Sentiments/K_Teams_Buchraest/a7-abbasi.pdf))

| Group | Category | No. of Features | Description |
| :-------: |:-------------------:| :--------------:| :---------------------------------------------------------------- |
| Lexical | Word level | 3 | Total words, average word length, number of short words |
| | Character level | 3 | Total characters, percentage of digits, percentage of uppercase letters |
| | Special characters | 22 | Frequency of each of 22 special characters |
| | Letters | 26 | Letter frequency |
| | Digits | 10 | Digit frequency |
| | Vocabulary richness | 1 | Ratio of hapax legomena to dis legomena |
| Syntactic | Function Words | 153 | Frequency of function words |
| | POS tags | 12 | Frequency of part-of-speech tags (universal tagset) |
| | Punctuation | 9 | Frequency and percentage of colon, semicolon, question mark, period, exclamation mark, comma, single inverted comma, double inverted comma |
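
To make the lexical rows of the table concrete, here is a minimal sketch of how a few of them could be computed with the Python standard library. It is not the repository's implementation. The function name `lexical_features`, the regex tokenizer, the short-word threshold (at most 3 characters), the use of raw counts for frequencies, and the direction of the hapax/dis legomena ratio are all assumptions made for illustration.

```python
import re
import string
from collections import Counter


def lexical_features(text, short_word_max_len=3):
    """Compute a few lexical features of the kind listed in the table.

    Assumptions (not taken from the repository): words are runs of ASCII
    letters, digits, or apostrophes; a "short word" has at most
    `short_word_max_len` characters; frequencies are raw counts.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    n_words = len(words)
    n_chars = len(text)

    features = {
        # Word level
        "total_words": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "short_words": sum(1 for w in words if len(w) <= short_word_max_len),
        # Character level
        "total_chars": n_chars,
        "pct_digits": 100.0 * sum(c.isdigit() for c in text) / n_chars if n_chars else 0.0,
        "pct_uppercase": 100.0 * sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
    }

    # Letter (26) and digit (10) frequencies, case-insensitive for letters
    char_counts = Counter(text.lower())
    for ch in string.ascii_lowercase + string.digits:
        features[f"freq_{ch}"] = char_counts[ch]

    # Vocabulary richness: words seen exactly once (hapax legomena) divided by
    # words seen exactly twice (dis legomena); 0.0 when there are no dis legomena
    word_counts = Counter(w.lower() for w in words)
    hapax = sum(1 for n in word_counts.values() if n == 1)
    dis = sum(1 for n in word_counts.values() if n == 2)
    features["hapax_dis_ratio"] = hapax / dis if dis else 0.0

    return features
```

Calling `lexical_features("some text")` returns a plain dictionary keyed by feature name, which can be turned into a fixed-order vector for a classifier. The syntactic rows (function words, POS tags) are left out because they depend on a word list and a tagger that this sketch does not assume.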