https://github.com/inspirehep/curation-scripts
Scripts for automated large-scale curation
- Host: GitHub
- URL: https://github.com/inspirehep/curation-scripts
- Owner: inspirehep
- Created: 2021-10-29T11:10:08.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2024-09-16T11:30:59.000Z (about 2 months ago)
- Last Synced: 2024-09-17T11:31:57.539Z (about 2 months ago)
- Language: Python
- Size: 127 KB
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Curation scripts
This repo contains scripts that are meant to implement one-off large-scale changes to INSPIRE or gather complex information.
To add a new script, create a new directory under `scripts` containing a single `script.py` file implementing the script logic, then create a PR. Auxiliary files will be generated automatically when merging to master.
## Common patterns
### Search, check & do
Often, a task can be fit into the following pattern:
1. _Search_ for a set of records that need to be handled
2. _Check_ for each record whether it needs to be modified, or print some metadata from the record
3. _Do_ some modifications on the record passing the check

In those cases, one can subclass the `SearchCheckDo` class which provides logic to perform these operations. The script will have the following structure:
```python
from inspirehep.curation.search_check_do import SearchCheckDo


class MyCustomAction(SearchCheckDo):
    """Explain what this does."""

    # Literature is default, ``search_class`` needs to be set for other
    # collections
    # search_class = LiteratureSearch

    query = ...  # a custom query, like "t electron"

    @staticmethod
    def check(record, logger, state):
        # Use ``record`` to check if it needs to be modified, return True if
        # so. Optionally use ``logger`` to log additional info and ``state`` to
        # transmit some data to the ``do`` step.
        ...

    @staticmethod
    def do(record, logger, state):
        # Mutate ``record`` to make modifications.
        # Optionally use ``logger`` to log additional info and ``state`` to
        # retrieve some data from the ``check`` step.
        ...


MyCustomAction()
```

Concrete examples can be found [here](https://github.com/inspirehep/inspirehep/blob/master/backend/inspirehep/curation/search_check_do/examples.py).
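
To make the control flow of the pattern concrete, here is a minimal, self-contained sketch of a search, check & do loop. It does not use the real `SearchCheckDo` class: `run_search_check_do`, the in-memory `RECORDS` list and the record fields are hypothetical stand-ins, and it assumes that `state` is a fresh dictionary shared by `check` and `do` for each record. The real class may search, log and save records differently.

```python
import logging

logging.basicConfig(level=logging.INFO)

# Hypothetical in-memory records standing in for INSPIRE search results.
RECORDS = [
    {"control_number": 1, "titles": [{"title": "electron scattering "}]},
    {"control_number": 2, "titles": [{"title": "Muon decay"}]},
]


def check(record, logger, state):
    # Flag records whose first title has surrounding whitespace and
    # stash the cleaned value for the ``do`` step.
    title = record["titles"][0]["title"]
    state["clean_title"] = title.strip()
    return state["clean_title"] != title


def do(record, logger, state):
    # Apply the value computed during ``check``.
    record["titles"][0]["title"] = state["clean_title"]


def run_search_check_do(records, check, do):
    """Run check/do over records and return the ids of modified records."""
    logger = logging.getLogger("sketch")
    modified = []
    for record in records:  # "search" step: here we just iterate over a list
        state = {}  # assumed: fresh state per record, shared by check and do
        if check(record, logger, state):
            do(record, logger, state)
            modified.append(record["control_number"])
    return modified


modified_ids = run_search_check_do(RECORDS, check, do)
print(modified_ids)
```

In this sketch only the first record is modified, because only its title changes when stripped. Computing the new value once in `check` and passing it through `state` avoids repeating the same work in `do`.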
#### Logging
By default, when running the script, the output will contain which record was checked and whether the check was positive and hence the record modified. If you want to output additional information from the record in the `check` or `do` phase, you can use the `logger` that's passed to the method. It is an instance of a [structlog](https://www.structlog.org/en/stable/getting-started.html) logger. You can use it similarly to the standard library logger (with methods like `warning` or `info` to be used depending on the importance of the message): you can pass it a string for the message and additional arbitrary arguments with extra data you want to output. For example, you could do
```python
logger.info(
    "More details about the record.",
    title=record["title"],
    first_author=record.get_value("authors[0].full_name"),
)
```

The recid is included automatically, so you don't need to add it yourself.
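
As a rough illustration of choosing between `info` and `warning` inside `check`, the sketch below uses a small stand-in logger so that it runs without INSPIRE or structlog installed. `StandInLogger` is hypothetical and only mimics the `method(message, **extra_data)` call style described above; the record layout is likewise an assumption made for the example.

```python
class StandInLogger:
    """Hypothetical stand-in mimicking the ``method(message, **data)`` call style."""

    def __init__(self):
        self.entries = []

    def _log(self, level, event, **data):
        # Keep the entry for inspection and print it as key=value pairs.
        self.entries.append((level, event, data))
        extras = " ".join(f"{key}={value!r}" for key, value in data.items())
        print(f"[{level}] {event} {extras}".rstrip())

    def info(self, event, **data):
        self._log("info", event, **data)

    def warning(self, event, **data):
        self._log("warning", event, **data)


def check(record, logger, state):
    # Records without authors are unexpected here: warn and skip them.
    authors = record.get("authors", [])
    if not authors:
        logger.warning("Record has no authors, skipping.")
        return False
    logger.info("Found first author.", first_author=authors[0]["full_name"])
    return True


logger = StandInLogger()
results = [
    check({"authors": [{"full_name": "Doe, Jane"}]}, logger, {}),
    check({}, logger, {}),
]
```

Passing the extra data as keyword arguments, rather than formatting it into the message string, keeps the message constant and makes the output easier to filter afterwards.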