Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ashenoy95/writeprints-static
Writeprints-Static Feature Set extraction for Adversarial Stylometry
feature-extraction nlp python python-3 python3 stylometry
- Host: GitHub
- URL: https://github.com/ashenoy95/writeprints-static
- Owner: ashenoy95
- Created: 2017-05-13T19:28:04.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-03-12T21:55:22.000Z (almost 7 years ago)
- Last Synced: 2023-07-26T08:48:27.759Z (over 1 year ago)
- Topics: feature-extraction, nlp, python, python-3, python3, stylometry
- Language: Python
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Writeprints-Static Feature Set
Code to extract important features from the Writeprints-Static feature set for (adversarial) __stylometry__
[[Brennan, Afroz, and Greenstadt (2012)]](https://www.cs.drexel.edu/~sa499/papers/adversarial_stylometry.pdf). (Adapted from the Writeprints approach [[Abbasi and Chen, 2008]](https://www.scss.tcd.ie/Khurshid.Ahmad/Research/Sentiments/K_Teams_Buchraest/a7-abbasi.pdf).)
| Group | Category | No. of Features | Description |
| :-------: |:-------------------:| :--------------:| :---------------------------------------------------------------- |
| Lexical | Word level | 3 | Total words, average word length, number of short words |
| | Character level | 3 | Total characters, percentage of digits, percentage of uppercase letters |
| | Special characters | 22 | Frequency of each of 22 special characters |
| | Letters | 26 | Letter frequency |
| | Digits | 10 | Digit frequency |
| | Vocabulary richness | 1 | Ratio of hapax legomena to dis legomena |
| Syntactic | Function Words | 153 | Frequency of function words |
| | POS tags | 12 | Frequency of part-of-speech tags (universal tagset) |
| | Punctuation | 9 | Frequency and percentage of colon, semicolon, question mark, period, exclamation mark, comma, single quotation mark, double quotation mark |
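
As an illustration of the lexical rows in the table above, here is a minimal sketch of how a few of those features could be computed. It is not the repository's code: the function name `lexical_features`, the regex tokenizer, the feature key names, and the short-word threshold of three characters are all assumptions made for this example.

```python
import re
import string
from collections import Counter


def lexical_features(text, short_word_max_len=3):
    """Compute a handful of Writeprints-Static-style lexical features.

    Hypothetical helper for illustration only. The tokenizer and the
    short-word threshold are assumptions, not taken from the repository.
    """
    # Assumed tokenizer: runs of ASCII letters, digits, and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    n_words = len(words)
    n_chars = len(text)

    features = {
        # Word level
        "total_words": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "short_words": sum(1 for w in words if len(w) <= short_word_max_len),
        # Character level (percentages of all characters, whitespace included)
        "total_chars": n_chars,
        "pct_digits": 100 * sum(ch.isdigit() for ch in text) / n_chars if n_chars else 0.0,
        "pct_uppercase": 100 * sum(ch.isupper() for ch in text) / n_chars if n_chars else 0.0,
    }

    # Letters: case-insensitive count for each of the 26 letters.
    lowered = text.lower()
    for letter in string.ascii_lowercase:
        features["letter_" + letter] = lowered.count(letter)

    # Digits: count for each of the 10 digits.
    for digit in string.digits:
        features["digit_" + digit] = text.count(digit)

    # Vocabulary richness: words occurring exactly once (hapax legomena)
    # divided by words occurring exactly twice (dis legomena).
    counts = Counter(w.lower() for w in words)
    hapax = sum(1 for c in counts.values() if c == 1)
    dis = sum(1 for c in counts.values() if c == 2)
    features["hapax_dis_ratio"] = hapax / dis if dis else 0.0

    return features
```

Calling `lexical_features(open("sample.txt").read())` (with `sample.txt` standing in for any text file) returns a flat dictionary that can be turned into a feature vector. The syntactic rows (function words, POS tags, punctuation) would need a function-word list and a tagger, which this sketch leaves out.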