Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ruarxive/apibackuper
Python library and cmd tool to backup API calls
https://github.com/ruarxive/apibackuper
api archival backup preservation
Last synced: 3 months ago
JSON representation
Python library and cmd tool to backup API calls
- Host: GitHub
- URL: https://github.com/ruarxive/apibackuper
- Owner: ruarxive
- License: mit
- Created: 2020-08-14T07:53:07.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-07-16T07:22:56.000Z (7 months ago)
- Last Synced: 2024-10-11T19:48:38.820Z (4 months ago)
- Topics: api, archival, backup, preservation
- Language: Python
- Homepage:
- Size: 89.8 KB
- Stars: 14
- Watchers: 5
- Forks: 2
- Open Issues: 14
-
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.rst
- License: LICENSE
- Authors: AUTHORS.rst
Awesome Lists containing this project
- awesome-digital-preservation - apibackuper - Python library and cmd tool to backup API calls (Other digital objects / Public Data API)
README
---
title: apibackuper \-- a command-line tool to archive/backup API calls
---apibackuper is a command line tool to archive/backup API calls. It\'s
goal to download all data behind REST API and to archive it to local
storage. This tool designed to backup API data, so simple as possible.::: contents
:::::: section-numbering
:::# History
This tool was developed optimize backup/archival procedures for Russian
government information from E-Budget portal budget.gov.ru and some other
government IT systems too. Examples of tool usage could be found in
\"examples\" directory# Main features
- Any GET/POST iterative API supported
- Allows to estimate time required to backup API
- Stores data inside ZIP container
- Supports export of backup data as JSON lines file
- Documentation
- Test coverage# Installation
## Linux
Most Linux distributions provide a package that can be installed using
the system package manager, for example:``` bash
# Debian, Ubuntu, etc.
$ apt install apibackuper
`````` bash
# Fedora
$ dnf install apibackuper
`````` bash
# CentOS, RHEL, ...
$ yum install apibackuper
`````` bash
# Arch Linux
$ pacman -S apibackuper
```## Windows, etc.
A universal installation method (that works on Windows, Mac OS X, Linux,
…, and always provides the latest version) is to use pip:``` bash
# Make sure we have an up-to-date version of pip and setuptools:
$ pip install --upgrade pip setuptools$ pip install --upgrade apibackuper
```(If `pip` installation fails for some reason, you can try
`easy_install apibackuper` as a fallback.)## Python version
Python version 3.6 or greater is required.
# Quickstart
This example is about backup of Russian certificate authorities. List of
them published at e-trust.gosuslugi.ru and available via undocumented
API.``` bash
$ apibackuper create etrust
$ cd etrust
```Edit apibackuper.cfg as:
``` bash
[settings]
initialized = True
name = etrust[project]
description = E-Trust UC list
url = https://e-trust.gosuslugi.ru/app/scc/portal/api/v1/portal/ca/list
http_mode = POST
work_modes = full,incremental,update
iterate_by = page[params]
page_size_param = recordsOnPage
page_size_limit = 100
page_number_param = page[data]
total_number_key = total
data_key = data
item_key = РеестровыйНомер
change_key = СтатусАккредитации.ДействуетС[storage]
storage_type = zip
```Add file params.json with parameters used with POST requests
``` json
{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100,"searchString":null,"cities":null,"software":null,"cryptToolClasses":null,"statuses":null}
```Execute command \"estimate\" to see how long data will be collected and
how much space needed``` bash
$ apibackuper estimate full
```Output:
``` bash
Total records: 502
Records per request: 100
Total requests: 6
Average record size 32277.96 bytes
Estimated size (json lines) 16.20 MB
Avg request time, seconds 66.9260
Estimated all requests time, seconds 402.8947
```Execute command \"run\" to collect the data. Result stored in
\"storage.zip\"``` bash
$ apibackuper run full
```Exports data from storage and saves as jsonl file called
\"etrust.jsonl\"``` bash
$ apibackuper export jsonl etrust.jsonl
```# Config options
Example config file
``` bash
[settings]
initialized = True
name =
splitter = .[project]
description =
url =
http_mode =
work_modes =
iterate_by =[params]
page_size_param =
page_size_limit =
page_number_param =
count_skip_param =[data]
total_number_key =
data_key =
item_key =
change_key =[follow]
follow_mode =
follow_pattern =
follow_data_key =
follow_param =
follow_item_key =[files]
fetch_mode =
root_url =
keys =
storage_mode =[storage]
storage_type = zip
compression = True
```## settings
- name - short name of the project
- splitter - value of field splitter. Needed for rare cases when \'.\'
is part of field name. For example for OData requests and
\'@odata.count\' field## project
- description - text that explains what for is this project
- url - API endpoint url
- http_mode - one of HTTP modes: GET or POST
- work_modes - type of operations: full - archive everything,
incremental - add new records only, update - collect changed data
only
- iterate_by - type of iteration of records. By \'page\' - default,
page by page or by \'skip\' if skip value provided## params
- page_size_param - parameter with page size
- page_size_limit - limit of records provided by API
- page_number_param = parameter with page number
- count_skip_param - parameter for \'skip\' type of iteration## data
- total_number_key - key in data with total number of records
- data_key - key in data with list of records
- item_key - key in data with unique identifier of the record. Could
be group of keys separated with comma
- change_key - key in data that indicates that record changed. Could
be group of keys separated with comma## follow
- follow_mode - mode to follow objects. Could be \'url\' or \'item\'.
If mode is \'url\' than follow_pattern not used
- follow_pattern - url pattern / url prefix for followed objects. Only
for mode \'item\'\'
- follow_data_key - if object/objects are inside array, key of this
array
- follow_param - parameter used in \'item\' mode
- follow_item_key - item key## files
- fetch_mode - file fetch mode. Could be \'prefix\' or \'id\'. Prefix
- root_url - root url / prefix for files
- keys - list of keys with urls/file id\'s to search for files to save
- storage_mode - a way how files stored in storage/files.zip. By
default \'filepath\' and files storaged same way as they presented
in url## storage
- storage_type - type of local storage. \'zip\' is local zip file is
default one
- compression - if True than compressed ZIP file used, less space
used, more CPU time processing data# Usage
Synopsis:
``` bash
$ apibackuper [flags] [command] inputfile
```See also `apibackuper --help`.
## Examples
Create project \"budgettofk\":
``` bash
$ apibackuper create budgettofk
```Estimate execution time for \'budgettofk\' project. Should be called in
project dir or project dir provided via -p parameter:``` bash
$ apibackuper estimate full -p budgettofk
```Output
``` bash
Total records: 12282
Records per request: 500
Total requests: 25
Average record size 1293.60 bytes
Estimated size (json lines) 15.89 MB
Avg request time, seconds 1.8015
Estimated all requests time, seconds 46.0536
```Run project. Should be called in project dir or project dir provided via
-p parameter``` bash
$ apibackuper run full
```Export data from project. Should be called in project dir or project dir
provided via -p parameter``` bash
$ apibackuper export jsonl hhemployers.jsonl -p hhemployers
```Follows each object of downloaded data and does requests for each
objects .. code-block:: bash> \$ apibackuper follow continue
Downloads all files associated with API objects .. code-block:: bash
> \$ apibackuper getfiles
# Advanced
TBD