Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/stefan-it/deep-wittgenstein
Classification of Wittgenstein's remarks
- Host: GitHub
- URL: https://github.com/stefan-it/deep-wittgenstein
- Owner: stefan-it
- License: agpl-3.0
- Created: 2018-03-01T21:36:34.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-03-05T14:35:50.000Z (almost 7 years ago)
- Last Synced: 2024-10-28T07:56:05.510Z (2 months ago)
- Topics: deep-learning, multi-label-classification, wittgenstein
- Language: Python
- Size: 6.37 MB
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: COPYING
README
# Deep-Wittgenstein
In this repository we present a pretrained model for classifying
Wittgenstein's remarks. The pretrained model can detect and classify 70
different categories for a remark: Jetzt, Regel, Sprache, Gedanke, Behauptung,
Mengenlehre, Gleich, Unendliche Möglichkeit, Begriff, Idealismus, Gegenstand,
Kardinalzahlen, Phänomenologie, Hypothese, Ursache, Ungefähr, Unendlichkeit,
Entdeckung, Problem, Mathematik Metamathematik, Schmerzen, Sprache Sprachspiel,
Satz, Klasse, Erwartung und Erfüllung, Gesichtsraum, XXX, Bedeutung, Grund,
Sinn, Philosophie, Versuchen Suchen, Vorstellung, Abbild, Fähigkeit, Zeit,
Logik, Farben und Farbenmischung, Minima Visibilia, Grund des Denkens,
W-F-Notation, Undeutlichkeit, Glaube, Wissen, Logische Form, Tabelle,
Anwendung, Unmittelbares, Allgemeinheit, Grammatik, Zeichen, Schach, Folgen,
Beweis, Mathematik, Induktion Induktionsbeweis, Wahrscheinlichkeit, Gebrauch,
Meinen, Physikalischer Raum, Absicht, Im selben Sinn, Zahlen, Regel
Erfahrungssatz, Nicht, Verifikation, Verstehen, Tonfolge, Physikalische Sprache
and Denken.

This work was done during summer semester 2017 with support from [Dr. Maximilian
Hadersbeck](http://cis.lmu.de/personen/mitarbeiter/hadersbeck/index.html) ([LMU
Munich](https://www.en.uni-muenchen.de/index.html)). Hand-labeled data was
provided by [Dr. Josef G. F.
Rothhaupt](http://www.philosophie.uni-muenchen.de/lehreinheiten/philosophie_1/personen/josef_rothhaupt/index.html)
([LMU Munich](https://www.en.uni-muenchen.de/index.html)).

This project was funded by
[Lehre@LMU](https://www.uni-muenchen.de/studium/lehre_at_lmu/index.html) with an
NVIDIA Jetson TX1.

## Example
Input remark:
```text
Der Unterschied der Wortarten ist immer wie der Unterschied der Spielfiguren,
oder, wie der noch größere, einer Spielfigur und des Schachbrettes.
```

Hand-labeled gold label: "Grammatik"

# Requirements
The multi-label classification approach is implemented with *Keras*, *TensorFlow*
and the *magpie* library. The following libraries must be installed:

| Library | Version (tested)
| ------------ | ----------------
| *magpie* | 2.0
| *Keras* | 2.1.3
| *TensorFlow* | 1.5.0
| *h5py* | 2.7.1

Notice: *magpie* should be installed via:
```bash
pip3 install --user git+https://github.com/inspirehep/magpie.git@v2.0
```

# Dataset
Hand-labeled data is available for the complete Ts-212, so hand-labeled
categories for 7099 remarks are used. This corpus is split into training,
development and test sets.

| Dataset | # Remarks
| ----------- | ---------
| Training | 5620
| Development | 719
| Test | 760

# Pretrained model
The pretrained model consists of four files:
| Description | Download
| ----------- | --------
| Word Embeddings | [embedding.pkl](https://github.com/stefan-it/deep-wittgenstein/raw/master/current_model/embedding.pkl)
| Model | [model.h5](https://github.com/stefan-it/deep-wittgenstein/raw/master/current_model/model.h5)
| Scaler | [scaler.pkl](https://github.com/stefan-it/deep-wittgenstein/raw/master/current_model/scaler.pkl)
| Category labels | [categories.labels](https://github.com/stefan-it/deep-wittgenstein/raw/master/categories.labels)

Word embeddings, model and scaler are located in the `current_model` folder of this
repository. `categories.labels` is located in the root folder of this repository.

# Classification - Example
To classify new remarks of Ludwig Wittgenstein, the following script can be used:
```python
from magpie import Magpie

# Read the category labels, one per line
with open('categories.labels') as f:
    labels = [line.rstrip() for line in f.readlines()]

# Load the pretrained Keras model together with its word embeddings and scaler
magpie = Magpie(
    keras_model='current_model/model.h5',
    word2vec_model='current_model/embedding.pkl',
    scaler='current_model/scaler.pkl',
    labels=labels
)
```

This loads the pretrained model with all its dependencies, such as the word
embeddings and labels.

Then the following command can be used to classify a remark:
```python
predicted = magpie.predict_from_text('“Ich denke, Du wirst die Scheibe irgendwo innerhalb dieses Kreises treffen”.')
print(predicted)
```

This outputs the five best predicted categories for the input remark:
```python
[('Allgemeinheit', 0.66499853), ('Folgen', 0.53158545),
('Regel', 0.004923807), ('Satz', 0.0018804041), ('Meinen', 0.0017680882)]
```

The gold categories are "Allgemeinheit" and "Folgen".

This classification script is located under `classification.py`.
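
The prediction is a ranked list of `(label, confidence)` pairs rather than a final set of categories. One possible way to turn such a ranking into a label set is sketched here. The helper `select_labels` and the cut-off of 0.5 are assumptions made for illustration. They are not part of `classification.py` or *magpie*, and a suitable threshold would have to be tuned on the development set.

```python
# Hypothetical helper for illustration; not part of classification.py or magpie.
def select_labels(predictions, threshold=0.5):
    """Keep the labels whose confidence is at or above the threshold."""
    return [label for label, score in predictions if score >= threshold]


# Prediction output of the example above, copied verbatim.
predicted = [
    ('Allgemeinheit', 0.66499853), ('Folgen', 0.53158545),
    ('Regel', 0.004923807), ('Satz', 0.0018804041), ('Meinen', 0.0017680882),
]
gold = {'Allgemeinheit', 'Folgen'}

# threshold=0.5 is an assumed cut-off, not a value taken from the repository
selected = select_labels(predicted)
print(selected)               # ['Allgemeinheit', 'Folgen']
print(set(selected) == gold)  # True
```

With this assumed cut-off, the selected labels for the example remark coincide with the two gold categories.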
# Acknowledgements
We would like to thank Dr. Maximilian Hadersbeck for his great support during
the development phase. We also want to thank Dr. Josef G. F. Rothhaupt for
providing us with high-quality hand-labeled data for over 7000 remarks of Ludwig
Wittgenstein.

We are deeply grateful that Lehre@LMU funded our research project with an
NVIDIA Jetson TX1 developer board and we would like to thank LMU Munich for this
awesome program. This really helps students and boosts research.

# Contact (Bugs, Feedback, Contribution and more)
For questions about *deep-wittgenstein*, contact the current maintainer:
Stefan Schweter.

# License
To respect the Free Software Movement and the enormous work of Dr. Richard Stallman,
this repository is released under the *GNU Affero General Public License*,
version 3. More information can be found [here](https://www.gnu.org/licenses/licenses.html)
and in `COPYING`.