https://github.com/arxiv/arxiv-classifier

Facebook contributed classifier for abstracts

flask in-production machine-learning python

Host: GitHub
URL: https://github.com/arxiv/arxiv-classifier
Owner: arXiv
Created: 2017-07-13T17:25:41.000Z (over 7 years ago)
Default Branch: develop
Last Pushed: 2022-12-08T10:22:24.000Z (about 2 years ago)
Last Synced: 2024-04-16T01:57:47.143Z (10 months ago)
Topics: flask, in-production, machine-learning, python
Language: Jupyter Notebook
Homepage:
Size: 3.24 MB
Stars: 13
Watchers: 14
Forks: 3
Open Issues: 18
Metadata Files:
- Readme: README.md

# arxiv-classifier

# How to Train

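```
# metadata is a list of dicts in the format described below
nb = ArticleClassifier(dbpath='/path/to/db')
fn = nb.create_input_file(metadata)  # build the input file from the metadata
nb.train(fn)
nb.save()
```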

# How to Classify

```
# metadata is a list of dicts in the format described below
nb = ArticleClassifier(dbpath='/path/to/db')
nb.load()  # load a previously saved model
fn = nb.create_input_file(metadata)
classes = nb.classify(fn)
```

In these examples, `metadata` should be a list of dicts, each in the format shown below. The `filename` paths must point to files on the local file system of the machine running the classifier:

```
{
  "id": "1704.00222",
  "categories": ["cs.NM", "hep-th"],
  "filename": "/path/to/file.txt"
}
```
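As a minimal sketch of preparing such a list, the snippet below builds one record and checks a list for obvious problems before it is passed to `create_input_file`. The helper names `make_record` and `check_metadata` are hypothetical and not part of this repository, and the checks (required keys present, `categories` is a list, file exists) are assumptions about what a well-formed record looks like.

```python
from pathlib import Path

# Keys taken from the record format shown above.
REQUIRED_KEYS = {"id", "categories", "filename"}


def make_record(article_id, categories, filename):
    """Build one metadata dict (hypothetical helper, not part of the repo)."""
    return {
        "id": str(article_id),
        "categories": list(categories),
        "filename": str(filename),
    }


def check_metadata(metadata):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for i, rec in enumerate(metadata):
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            problems.append(f"record {i}: missing keys {sorted(missing)}")
            continue
        if not isinstance(rec["categories"], list):
            problems.append(f"record {i}: 'categories' must be a list")
        # Assumption: the text file must already exist on the local file system.
        if not Path(rec["filename"]).is_file():
            problems.append(f"record {i}: file not found: {rec['filename']}")
    return problems


metadata = [make_record("1704.00222", ["cs.NM", "hep-th"], "/path/to/file.txt")]
for problem in check_metadata(metadata):
    print(problem)
```

Running the check first reports missing files up front rather than partway through training or classification.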

# Trained Models

Trained models for use by arXiv staff can be found at `s3://arxiv-classifier-models`.

# ULMFiT classifier

## Training

See [experiments directory](experiments/) for training and evaluation notebooks.

## Models

The ULMFiT and SentencePiece model files can be downloaded [here](https://github.com/arXiv/arxiv-classifier/releases/download/ulmfit-models-v1.0/models.tar.xz). Make sure the `CLASSIFIER_PATH` configuration parameter points to `models/abstracts-classifier.pkl` and that `CLASSIFIER_TYPE` is set to `ulmfit`.
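The two settings can be sanity-checked before starting the service. The sketch below assumes the configuration is available as a plain mapping (for example a dict, or a Flask `app.config`); how this project actually loads its configuration is not covered here, and `check_classifier_config` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def check_classifier_config(config):
    """Return a list of problems with the ULMFiT-related settings in `config`.

    `config` is assumed to be any mapping with a `.get` method.
    """
    problems = []
    if config.get("CLASSIFIER_TYPE") != "ulmfit":
        problems.append("CLASSIFIER_TYPE should be 'ulmfit'")
    path = config.get("CLASSIFIER_PATH")
    if not path:
        problems.append("CLASSIFIER_PATH is not set")
    elif not Path(path).is_file():
        problems.append(f"CLASSIFIER_PATH does not point to a file: {path}")
    return problems


example = {
    "CLASSIFIER_TYPE": "ulmfit",
    "CLASSIFIER_PATH": "models/abstracts-classifier.pkl",
}
for problem in check_classifier_config(example):
    print(problem)
```

If the archive has not been extracted yet, the check reports that the `.pkl` file is missing.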

## Testing

To test the service locally you can run it with

```shell
FLASK_APP=classifier.test_app flask run --port 9999
```

and make a request:

```shell
curl -s -H "Content-Type: application/json" -X POST http://localhost:9999/classify \
    --data '{"title":"P = NP", "abstract": "We prove that P = NP for N = 1 or P = 0.", "primary": "cs.SE"}'
# Example response (truncated):
[{"category":"cs.CC","probability":0.8264293074607849},{"category":"cs.DS","probability":0.1285623162984848},...]
```

The `primary` field is optional.

Neither the input nor the output format is compatible with the Naive Bayes classifier yet.
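The same request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the test service is running on port 9999 as in the example above and that the response is a JSON list of objects with `category` and `probability` keys, as in the sample output. The function names are illustrative and not part of this repository.

```python
import json
import urllib.request

# Endpoint from the curl example above.
DEFAULT_URL = "http://localhost:9999/classify"


def build_request(title, abstract, primary=None, url=DEFAULT_URL):
    """Build the POST request; `primary` is left out of the payload when None."""
    payload = {"title": title, "abstract": abstract}
    if primary is not None:
        payload["primary"] = primary
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(title, abstract, primary=None, url=DEFAULT_URL):
    """Send the request and return the decoded JSON response."""
    req = build_request(title, abstract, primary, url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def top_category(predictions):
    """Return the category with the highest probability."""
    return max(predictions, key=lambda p: p["probability"])["category"]
```

With the service running, `top_category(classify("P = NP", "We prove that P = NP for N = 1 or P = 0."))` returns the most probable category for that abstract.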