Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gesiscss/sexism_custom_classifier
Custom classifiers to detect sexist language.
https://github.com/gesiscss/sexism_custom_classifier
bert natural-language-processing nlp sckit-learn sexism-detection
Last synced: about 2 months ago
JSON representation
Custom classifiers to detect sexist language.
- Host: GitHub
- URL: https://github.com/gesiscss/sexism_custom_classifier
- Owner: gesiscss
- License: mit
- Created: 2020-08-13T10:29:34.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-02-11T11:36:22.000Z (almost 4 years ago)
- Last Synced: 2023-03-04T03:38:43.412Z (almost 2 years ago)
- Topics: bert, natural-language-processing, nlp, sckit-learn, sexism-detection
- Language: Jupyter Notebook
- Homepage:
- Size: 32.2 MB
- Stars: 5
- Watchers: 13
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
[![Binder](https://notebooks.gesis.org/binder/badge.svg)](https://notebooks.gesis.org/binder/v2/gh/gesiscss/sexism_custom_classifier/master?filepath=notebooks)
# Sexism Custom Classifier
## Introduction
Sexism custom classifier is implemented to detect sexism automatically in the field of natural language processing. In this work, the dataset from [Samory et al. (2020)](https://arxiv.org/abs/2004.12764) was used to train three models on top of four different feature sets (both individual and combinations). The aim of the experiments is to answer two research questions:
* What is the informativeness of different feature sets on the sexism detection task?
* What are the improvements on the sexism classifiers’ performance when different feature sets are introduced?## Support Features
* Sentiment : The sentiment intensity provided by [VADER](http://eegilbert.org/papers/icwsm14.vader.hutto.pdf) sentiment analysis method.
* Word N-grams : Term Frequency/Inverse Frequency (TF-IDF) weights of word n-grams proviided by [scikit-learn](https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html).
* Type Dependency : The type dependency relationships provided by [Stanford Parser](https://nlp.stanford.edu/~wcmac/papers/td-lrec06.pdf). The parser can be dowloaded from [here](https://nlp.stanford.edu/software/lex-parser.shtml#Download).
* Document Embeddings : The document embeddings provided by [BERT](https://arxiv.org/abs/1810.04805) language representation model.## Support Models
* Logistic Regression
* Convolutional Neural Network
* Support Vector Machine## Experiments
This example code trains Logistic Regression on top of sentiment and uni-gram features. To replicate the results, run the scripts in the 'experiments/scripts.txt'
```shell
export PARAMS_FILE='experiments/params.json'
export HYPERPARAMS_FILE='experiments/hyperparams.json'python run.py \
--params_file=$PARAMS_FILE \
--hyperparams_file=$HYPERPARAMS_FILE \
```This example code obtains the pre-trained BERT embeddings.
```shell
export DATA_FILE=/path/data.csvpython run_bert_feature_extraction.py \
--data_file=$DATA_FILE \
```