https://github.com/pfdamasceno/shakespeare

Identify relevant scientific papers with simple machine learning techniques

Host: GitHub
URL: https://github.com/pfdamasceno/shakespeare
Owner: pfdamasceno
License: mit
Fork: true (benjaminaschultz/shakespeare)
Created: 2014-04-25T15:26:17.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2015-06-05T20:10:00.000Z (over 9 years ago)
Last Synced: 2024-07-04T04:33:16.576Z (4 months ago)
Language: TeX
Size: 458 KB
Stars: 26
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

shakespeare
===========

Identify relevant scientific papers with simple machine learning techniques

Installation
============

Copy `shakespeare.py`, `data` and `content_sources` to a directory on your `PYTHONPATH`.

To install an example knowledge set, copy the contents of `examples` to `$HOME/.shakespeare`.

Depends on the `bibtexparser`, `feedparser` and `scikit-learn` packages, which can be installed via pip:

    pip install bibtexparser scikit-learn feedparser
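
To confirm that the three dependencies are importable, a small check like the following can help. It is a minimal sketch and not part of shakespeare; the helper name `missing_packages` is hypothetical. Note that the `scikit-learn` package is imported under the module name `sklearn`.

```python
import importlib.util

# Map pip package names to the module names they install.
# scikit-learn is imported as "sklearn".
REQUIRED = {
    "bibtexparser": "bibtexparser",
    "feedparser": "feedparser",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All dependencies are importable.")
```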

Features
========

* Fetch functions for the following journals
    * Phys Rev A-X
    * PRL
    * PNAS
    * Nature and Nature sub-journals (Nature Materials, Nature Physics, Nature Nanotechnology)
    * Science
    * Small
    * ACS Nano, Nano Letters
    * Soft Matter
    * Langmuir
    * Angewandte Chemie
    * JCP, JCP B
* Fetch functions for arXiv
* Support for BibTeX files
* Naive Bayes training and classification
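
To illustrate the naive Bayes step, here is a minimal standard-library sketch of a multinomial naive Bayes classifier with add-one smoothing, trained on paper titles. It is not the project's implementation, which presumably relies on scikit-learn; the class name, the tokenizer and the example titles are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # label -> number of documents
        self.word_counts = {}              # label -> Counter of token counts
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_score(self, text, label):
        """Unnormalized log posterior of `label` given `text`."""
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokenize(text):
            if token in self.vocab:  # skip words never seen in training
                score += math.log((counts[token] + 1) / total)
        return score

    def predict(self, text):
        """Return the label with the highest log score."""
        return max(self.word_counts, key=lambda label: self.log_score(text, label))


# Hypothetical training titles, standing in for good.bib and bad.bib entries.
good = [
    "Self-assembly of colloidal nanoparticles",
    "Packing of hard polyhedra in soft matter",
]
bad = [
    "Dark matter constraints from galaxy surveys",
    "Higgs boson decay measurements at the collider",
]
model = NaiveBayes().fit(good + bad, ["good"] * len(good) + ["bad"] * len(bad))
print(model.predict("Colloidal polyhedra self-assembly"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.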

Usage
=====

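The very first thing to do is to create a knowledge set. The command below trains on `good.bib` as the relevant papers and overwrites any knowledge already in `examples/`:

    ./shakespeare.py -g good.bib -k examples/ --overwrite-knowledge --train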

Train the naive Bayes classifier on BibTeX files of relevant and irrelevant papers:

    ./shakespeare.py -g thegoodstuff.bib -b thebadstuff.bib -k examples --train

Find papers from Nature Nanotechnology and PNAS and write them to a markdown file:

    ./shakespeare.py -j natnano pnas -o cool_papers.md
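
The `-o` option writes the selected papers as markdown. The README does not document the exact layout, so the following is only a sketch of what such a writer could look like; the function name `papers_to_markdown`, the dictionary keys `title`, `url` and `authors`, and the example entry are all assumptions.

```python
def papers_to_markdown(papers, heading="Relevant papers"):
    """Render a list of paper dicts as a markdown bullet list.

    Each dict is assumed (hypothetically) to carry 'title' and 'url' keys
    and an optional 'authors' list.
    """
    lines = [heading, "=" * len(heading), ""]
    for paper in papers:
        lines.append(f"* [{paper['title']}]({paper['url']})")
        authors = ", ".join(paper.get("authors", []))
        if authors:
            # Indented continuation line keeps the authors inside the bullet.
            lines.append(f"  {authors}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        {
            "title": "An example paper",
            "url": "https://example.org/paper",
            "authors": ["A. Author", "B. Author"],
        }
    ]
    print(papers_to_markdown(example))
```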

Find papers in the arXiv categories cond-mat.soft and math, then review the algorithm's selection:

    ./shakespeare.py -a cond-mat.soft math --feedback

Help printout

    usage: shakespeare.py [-h] [-o OUTPUT] [-b [BIBFILES [BIBFILES ...]]]
                          [-j [JOURNALS [JOURNALS ...]]] [-a [ARXIV [ARXIV ...]]]
                          [--all_sources] [--all_good_sources] [--train]
                          [-g GOOD_SOURCE] [-m METHOD] [-k KNOWLEDGE]
                          [--overwrite-knowledge] [--feedback] [--review_all]

    optional arguments:
      -h, --help            show this help message and exit
      -o OUTPUT, --output OUTPUT
                            output file name. only supports markdown right now.
      -b [BIBFILES [BIBFILES ...]], --bibtex [BIBFILES [BIBFILES ...]]
                            bibtex files to fetch
      -j [JOURNALS [JOURNALS ...]], --journals [JOURNALS [JOURNALS ...]]
                            journals to fetch. Currently supports physreve
                            physrevd jchemphysb physreva physrevc pnas nature
                            jchemphys science natmat physrevb acsnano jphyschem
                            nanoletters natphys prl small angewantechemie langmuir
                            physrevx natnano.
      -a [ARXIV [ARXIV ...]], --arXiv [ARXIV [ARXIV ...]]
                            arXiv categories to fetch
      --all_sources         flag to search from all sources.
      --all_good_sources    flag to search from good sources. Specfied in your
                            config file.
      --train               flag to train. All sources beside "--train-input-good"
                            are treated as bad/irrelevant papers
      -g GOOD_SOURCE, --train_input_good GOOD_SOURCE
                            bibtex file containing relevant articles.
      -m METHOD, --method METHOD
                            Methods to try to find relevent papers. Right now,
                            only all, title, author, and abstract are valid fields
      -k KNOWLEDGE, --knowledge KNOWLEDGE
                            path to database containing information about good and
                            bad keywords. If you are training, you must specifiy
                            this, as it will be where your output is written
      --overwrite-knowledge
                            flag to overwrite knowledge,if training
      --feedback            flag to give feedback after sorting content
      --review_all          review all the new selections. Otherwise, you will
                            only review the good selections
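
The help printout has the shape produced by Python's `argparse`. As a rough illustration of how a subset of these options could be declared, here is a sketch reconstructed from the printout alone. It is not the project's source: the function names `build_parser` and `parse`, the defaults and the error message are assumptions.

```python
import argparse


def build_parser():
    """Declare a subset of the options shown in the help printout."""
    parser = argparse.ArgumentParser(prog="shakespeare.py")
    parser.add_argument("-o", "--output", help="output file name (markdown)")
    # nargs="*" accepts zero or more values, matching [-b [BIBFILES ...]].
    parser.add_argument("-b", "--bibtex", dest="bibfiles", nargs="*", default=[],
                        help="bibtex files to fetch")
    parser.add_argument("-j", "--journals", nargs="*", default=[],
                        help="journals to fetch")
    parser.add_argument("-a", "--arXiv", dest="arxiv", nargs="*", default=[],
                        help="arXiv categories to fetch")
    parser.add_argument("--train", action="store_true", help="flag to train")
    parser.add_argument("-g", "--train_input_good", dest="good_source",
                        help="bibtex file containing relevant articles")
    parser.add_argument("-k", "--knowledge", help="path to the knowledge database")
    parser.add_argument("--overwrite-knowledge", action="store_true",
                        help="flag to overwrite knowledge, if training")
    parser.add_argument("--feedback", action="store_true",
                        help="flag to give feedback after sorting content")
    return parser


def parse(argv):
    """Parse argv and enforce that training requires a knowledge path."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.train and not args.knowledge:
        # The help text says -k must be given when training.
        parser.error("--train requires -k/--knowledge")
    return args


if __name__ == "__main__":
    print(parse(["-j", "natnano", "pnas", "-o", "cool_papers.md"]))
```

Under these declarations, everything passed through `-b` during training would be what the help text calls bad or irrelevant input, while `-g` names the single file of relevant articles.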

TODO
====

* Train a bunch and see if this is worth any more time
* Make a nice installer
* Add support for a config file for setting defaults (which journals to search, etc.)