Linking arXiv e-prints to author identities

Host: GitHub
URL: https://github.com/arxiv/zzzarchived_arxiv-authors
Owner: arXiv
License: mit
Archived: true
Created: 2019-06-21T17:12:33.000Z (over 5 years ago)
Default Branch: develop
Last Pushed: 2022-12-08T05:52:10.000Z (almost 2 years ago)
Last Synced: 2024-11-06T17:12:54.824Z (about 4 hours ago)
Topics: arxiv-ng, nothing-to-see-here, python
Language: Python
Homepage:
Size: 95.7 KB
Stars: 2
Watchers: 4
Forks: 2
Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# Status
This repo contains only a bare skeleton of a Flask app. None of the requirements
listed here are implemented: apart from this README, the repository defines only
the paths in `routes/api.py`, with no logic behind them.

# arXiv Authors

The author identity service is responsible for mapping author name substrings
in announced e-prints to stable author identifiers, as well as maintaining
an alignment of identifiers in multiple relevant authority systems (including
the classic arXiv author authority record).
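As a minimal sketch of the core mapping, assuming Python slice semantics for offsets (start inclusive, end exclusive), an author name can be represented as a span over the canonical author string and linked to an authority record. The class names `AuthorNameSpan` and `AuthorRelation`, the e-print ID, and the authority URI are illustrative assumptions, not part of any existing arXiv API. The `overlaps` check reflects requirement 1 below, which notes that spans may overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorNameSpan:
    """Hypothetical record for one author name inside a canonical author string.

    Offsets follow Python slice semantics: ``start`` inclusive, ``end`` exclusive.
    """

    start: int
    end: int

    def __post_init__(self) -> None:
        # Reject empty or negative spans up front.
        if not 0 <= self.start < self.end:
            raise ValueError(f"invalid span: {self.start}..{self.end}")

    def text(self, canonical: str) -> str:
        """Return the substring of ``canonical`` that this span covers."""
        if self.end > len(canonical):
            raise ValueError("span extends past the end of the author string")
        return canonical[self.start:self.end]

    def overlaps(self, other: "AuthorNameSpan") -> bool:
        """True if the two half-open intervals share at least one character."""
        return self.start < other.end and other.start < self.end


@dataclass(frozen=True)
class AuthorRelation:
    """Hypothetical link between a name span on an e-print and an authority record."""

    eprint_id: str       # illustrative e-print identifier
    span: AuthorNameSpan
    authority: str       # illustrative authority record URI


authors = "J. Smith, A. Jones and the ATLAS Collaboration"
smith = AuthorNameSpan(0, 8)
relation = AuthorRelation("1901.00001", smith, "https://orcid.org/0000-0002-1825-0097")
print(relation.span.text(authors))  # J. Smith
```

Storing offsets rather than copied name strings keeps every relation anchored to one exact position in the author string, so two identical names on the same e-print remain distinguishable.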

There are several ways that information may arrive at the author identity
service.

- Author-owners may add author identifiers for individual author names on
e-prints that are already announced, using a form-driven UI provided by
this service.
- Submitters may add author identifiers for individual author names via
form-driven interfaces in the submission core service, which in
turn submits that information to the author identity service.
- Authorized trusted partners may programmatically add disambiguating
information to this service via its API.

The author identity service exposes this information in two ways:

- New information about author identities for announced e-prints (mappings
  between author name substrings and author identifiers), generated either by
  new assertions or by updated authority alignments, is propagated via the
  Enhancement Event Stream.
- The author identity service provides a RESTful API that exposes both the
  current state of author name mappings and the accession history of mapping
  and alignment information.
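One way to serve both the current state and the accession history from the same data is an append-only log of assertions, from which the current mappings are derived by replaying entries in accession order. The sketch below assumes in-memory storage, a simple "latest assertion wins" rule, and hypothetical names (`Assertion`, `AssertionLog`, placeholder authority IDs); it deliberately ignores the precedence between author-owners and partners described in the requirements.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (eprint_id, start, end, authority) identifies one name-to-authority relation.
Key = Tuple[str, int, int, str]


@dataclass(frozen=True)
class Assertion:
    """One accessioned statement about a relation (hypothetical schema)."""

    eprint_id: str
    start: int        # start offset of the name in the canonical author string
    end: int          # end offset (exclusive)
    authority: str    # authority record identifier
    asserted: bool    # True adds the relation, False retracts it
    source: str       # free-form label for who made the assertion


class AssertionLog:
    """In-memory, append-only store; list order is accession order."""

    def __init__(self) -> None:
        self._entries: List[Assertion] = []

    def add(self, assertion: Assertion) -> None:
        self._entries.append(assertion)

    def history(self, eprint_id: str) -> List[Assertion]:
        """Full accession history for one e-print, oldest first."""
        return [a for a in self._entries if a.eprint_id == eprint_id]

    def _current(self) -> List[Key]:
        """Replay the log; the latest assertion about each relation wins."""
        latest: Dict[Key, bool] = {}
        for a in self._entries:
            latest[(a.eprint_id, a.start, a.end, a.authority)] = a.asserted
        return sorted(key for key, asserted in latest.items() if asserted)

    def for_eprint(self, eprint_id: str) -> List[Tuple[int, int, str]]:
        """Current relations on one e-print as (start, end, authority)."""
        return [(s, e, auth) for (eid, s, e, auth) in self._current() if eid == eprint_id]

    def for_authority(self, authority: str) -> List[Tuple[str, int, int]]:
        """Current relations for one authority record as (eprint_id, start, end)."""
        return [(eid, s, e) for (eid, s, e, auth) in self._current() if auth == authority]


log = AssertionLog()
log.add(Assertion("1901.00001", 0, 8, "authority:A", True, "author-owner"))
log.add(Assertion("1901.00001", 10, 18, "authority:B", True, "partner"))
log.add(Assertion("1901.00001", 10, 18, "authority:B", False, "author-owner"))
print(log.for_eprint("1901.00001"))  # [(0, 8, 'authority:A')]
```

Because nothing is ever overwritten, the retracted relation still appears in `history`, while `for_eprint` and `for_authority` report only what is currently asserted. A real service would likely cache the replayed state instead of recomputing it per request.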

## Problem & Goals

At submission time, submitters are asked to provide a single string with the
names of all of the authors of a paper. This string follows a canonical
format, described here. That format is not strictly enforced, but is generally
sufficient to individuate author names and affiliations as entered. That author
name string is stored in the core metadata record for the paper version. In
addition, relations in the submission database establish the identity of the
original submitter and of any additional “author-owners” who have claimed the paper.
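The canonical format itself is not reproduced here, so the following is only a rough sketch of how name offsets could be derived from an author string. It assumes, as a simplification, that names are separated by commas or the lowercase word "and" and that affiliations appear in non-nested parentheses; the function name `name_spans` is hypothetical, and real author strings (collaborations, nested parentheses, "et al.") would need a proper parser.

```python
import re
from typing import List, Tuple

# Simplifying assumptions (not the canonical arXiv format): names are separated
# by a comma or the lowercase word "and"; affiliations sit in non-nested parentheses.
_SEPARATOR = re.compile(r"\s*(?:,|\band\b)\s*")
_AFFILIATION = re.compile(r"\([^()]*\)")


def name_spans(authors: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each author name in ``authors``."""
    # Blank out affiliations with spaces so every offset still refers to ``authors``.
    masked = _AFFILIATION.sub(lambda m: " " * len(m.group()), authors)
    # Separator positions, plus a zero-width sentinel so the tail is handled too.
    separators = [(m.start(), m.end()) for m in _SEPARATOR.finditer(masked)]
    separators.append((len(masked), len(masked)))
    spans: List[Tuple[int, int]] = []
    pos = 0
    for sep_start, sep_end in separators:
        segment = masked[pos:sep_start]
        name = segment.strip()
        if name:
            # Skip leading whitespace to get a tight start offset.
            start = pos + len(segment) - len(segment.lstrip())
            spans.append((start, start + len(name)))
        pos = sep_end
    return spans


authors = "J. Smith (MIT), A. Jones and B. Lee"
for start, end in name_spans(authors):
    print(start, end, authors[start:end])
```

Masking affiliations with same-length padding, rather than deleting them, is what keeps the returned offsets valid against the original, unmodified string.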

The classic arXiv system does provide an “author authority record,” which is
essentially just a URI for an arXiv user. This is tied to arXiv user
accounts, and not independently related to papers. The same is true for ORCID
identifiers, which can be added to arXiv user profiles.

In many cases, the number of author-owners will be smaller than the number of
author names in the metadata record. In addition, there is no reliable way to
relate known author-owners to names parsed from the metadata record.

As a result, we cannot support several features that users want, such as:

- Click on an individual name on the paper abstract page to see other papers
by that author (and not by other authors with the same or similar name).
- Limit searches by author identities, rather than relying on imprecise text
matching.
- Let authors see and share a list of their papers, even if they have not
explicitly claimed those papers in their arXiv accounts.

This project provides the core data structure and API that can power these
features in the submission UI, search, and other parts of the system.

## Requirements

1. Store relationships between individual author names in the canonical author
name string and stable name authority records. The "individual author name"
can just be a substring (e.g. start and end character offsets) of the
canonical author string. Note that it is possible for substrings to overlap.

2. Wherever possible, we should avoid creating our own authority records. We
should leverage ORCID, INSPIRE, and other platforms that have done the hard
work of curating author identities.

3. Must provide a RESTful JSON API that supports the following operations:

   - Add a relation between a fragment of the e-print's canonical author string
     and an author authority record.
   - Retrieve all such relations for a specific e-print.
   - Retrieve all such relations for a specific author authority record.

4. Must provide a UI that allows author-owners to view and edit relations
between author names and authority records, and approve/reject proposed
relations from non-author-owners (below).

5. Non-author-owners should be able to submit proposed relations, which an
author-owner can then approve or reject.

   - For example, a co-author from a large collaboration (and so unlikely to be
     an author-owner in the system) may want to be sure that an e-print is
     associated with their identity.

6. Partner platforms (e.g. ADS, INSPIRE) should be able to submit relations
based on their curatorial work and other data. For trusted platforms, these
relations should be treated as true unless an author-owner specifies otherwise.
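Requirements 4 to 6 imply a precedence order among sources of assertions. A minimal sketch, assuming three hypothetical source categories and that an author-owner's latest decision overrides everything else, could resolve the status of a single relation from its assertions in accession order. The enum and function names are illustrative, and whether submitters count as author-owners is left open here.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


class Source(Enum):
    """Hypothetical categories of who made an assertion."""

    AUTHOR_OWNER = "author-owner"
    TRUSTED_PARTNER = "trusted-partner"
    OTHER = "other"  # non-author-owner users and untrusted platforms


class Status(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    PENDING = "pending"


def effective_status(assertions: Iterable[Tuple[Source, bool]]) -> Optional[Status]:
    """Resolve one relation from (source, asserted) pairs, oldest first.

    Returns None when nobody currently asserts the relation.
    """
    owner: Optional[bool] = None    # latest author-owner decision, if any
    partner: Optional[bool] = None  # latest trusted-partner assertion, if any
    proposed = False                # any proposal from other sources
    for source, asserted in assertions:
        if source is Source.AUTHOR_OWNER:
            owner = asserted
        elif source is Source.TRUSTED_PARTNER:
            partner = asserted
        elif asserted:
            proposed = True
    if owner is not None:           # author-owners have the final word
        return Status.ACCEPTED if owner else Status.REJECTED
    if partner:                     # trusted partners are accepted by default
        return Status.ACCEPTED
    if proposed:                    # everything else waits for an author-owner
        return Status.PENDING
    return None
```

Keeping the author-owner decision sticky means a trusted partner re-asserting a rejected relation does not silently reinstate it, which is one reading of "unless an author-owner specifies otherwise".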

Once the core functionality is in place, we will want to do things like:

7. Emit system events (via Kinesis) about new authorship information.

8. Propose possible relations to author-owners based on an automated process
(e.g. graph-based learning).
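For requirement 7, an event could be serialized as JSON and sent to Kinesis with boto3's `put_record`, which takes `StreamName`, `Data`, and `PartitionKey`. The sketch below only builds those arguments; the event fields, the `event_type` value, and the stream name are hypothetical. Using the e-print ID as the partition key is one possible choice: records with the same partition key go to the same shard.

```python
import json
from typing import Any, Dict


def authorship_event(eprint_id: str, start: int, end: int,
                     authority: str, status: str) -> Dict[str, Any]:
    """Hypothetical payload announcing new authorship information."""
    return {
        "event_type": "author_relation_updated",
        "eprint_id": eprint_id,
        "name_span": {"start": start, "end": end},
        "authority": authority,
        "status": status,
    }


def kinesis_record(event: Dict[str, Any], stream_name: str) -> Dict[str, Any]:
    """Keyword arguments for a boto3 Kinesis ``put_record`` call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Same e-print, same partition key, same shard.
        "PartitionKey": event["eprint_id"],
    }


event = authorship_event("1901.00001", 0, 8, "authority:A", "accepted")
record = kinesis_record(event, "author-events")

# Sending requires boto3 and AWS credentials, so it is not executed here:
#   import boto3
#   boto3.client("kinesis").put_record(**record)
```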

## Constraints

1. Flask app that follows the design approach outlined in
<https://arxiv.github.io/arxiv-arxitecture/crosscutting/services.html>. It
can be deployed as a Docker image, e.g. with the uWSGI application server.
2. Separate blueprints for the API and the user interface.
3. Use [arXiv base](https://github.com/arXiv/arxiv-base) for base templates,
error handling, etc.
4. Use [arXiv auth](https://github.com/arXiv/arxiv-auth) for authn/z.
5. API documented with OpenAPI 3 and JSON schema.
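A minimal sketch of constraint 2, assuming Flask is installed: an application factory that registers separate blueprints for the API and the UI. The blueprint names, URL prefixes, and routes are hypothetical placeholders, and the integration with arXiv base and arXiv auth is omitted rather than guessed. The `path` converter is used because old-style arXiv identifiers such as `hep-th/9901001` contain a slash.

```python
from flask import Blueprint, Flask, jsonify

# Hypothetical blueprint names and URL prefixes.
api = Blueprint("api", __name__, url_prefix="/api")
ui = Blueprint("ui", __name__, url_prefix="/ui")


@api.route("/eprints/<path:eprint_id>", methods=["GET"])
def eprint_relations(eprint_id: str):
    # Placeholder: a real handler would query the relation store.
    return jsonify({"eprint_id": eprint_id, "relations": []})


@ui.route("/eprints/<path:eprint_id>")
def eprint_page(eprint_id: str) -> str:
    # Placeholder: a real view would render a template from arXiv base.
    return f"Author relations for {eprint_id}"


def create_app() -> Flask:
    """Application factory: one app, two independently registered blueprints."""
    app = Flask(__name__)
    app.register_blueprint(api)
    app.register_blueprint(ui)
    return app
```

Keeping the API and UI in separate blueprints lets each carry its own error handlers and URL prefix, and makes it straightforward to deploy them behind different entrypoints.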

## Quick-start

We use [Pipenv](https://github.com/pypa/pipenv) for dependency management.

```bash
pipenv install --dev
```

You can run either the API or the UI using the Flask development server.

```bash
FLASK_APP=ui.py FLASK_DEBUG=1 pipenv run flask run
```

Dockerfiles are also provided in the root of this repository. These use uWSGI
and the corresponding ``wsgi_[xxx].py`` entrypoints.

## Contributing

Please see the [arXiv contributor
guidelines](https://github.com/arXiv/.github/blob/master/CONTRIBUTING.md) for
tips on getting started.

## Code of Conduct

All contributors are expected to adhere to the [arXiv Code of
Conduct](https://arxiv.org/help/policies/code_of_conduct).