https://github.com/ruarxive/apibackuper

Python library and cmd tool to backup API calls
Host: GitHub
URL: https://github.com/ruarxive/apibackuper
Owner: ruarxive
License: mit
Created: 2020-08-14T07:53:07.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2024-07-16T07:22:56.000Z (11 months ago)
Last Synced: 2024-10-11T19:48:38.820Z (8 months ago)
Topics: api, archival, backup, preservation
Language: Python
Homepage:
Size: 89.8 KB
Stars: 14
Watchers: 5
Forks: 2
Open Issues: 14
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.rst
- License: LICENSE
- Authors: AUTHORS.rst

Awesome Lists containing this project

awesome-digital-preservation - apibackuper - Python library and cmd tool to backup API calls (Other digital objects / Public Data API)

README

---
title: apibackuper -- a command-line tool to archive/backup API calls
---

apibackuper is a command-line tool to archive/backup API calls. Its
goal is to download all data behind a REST API and archive it to local
storage. The tool is designed to make backing up API data as simple as
possible.


# History

This tool was developed to optimize backup/archival procedures for Russian
government information from the E-Budget portal budget.gov.ru and some
other government IT systems. Examples of tool usage can be found in the
"examples" directory.

# Main features

- Supports any iterative GET/POST API
- Estimates the time required to back up an API
- Stores data inside a ZIP container
- Supports export of backup data as a JSON lines file
- Documentation
- Test coverage

# Installation

## Linux

Most Linux distributions provide a package that can be installed using
the system package manager, for example:

``` bash
# Debian, Ubuntu, etc.
$ apt install apibackuper
```

``` bash
# Fedora
$ dnf install apibackuper
```

``` bash
# CentOS, RHEL, ...
$ yum install apibackuper
```

``` bash
# Arch Linux
$ pacman -S apibackuper
```

## Windows, etc.

A universal installation method (that works on Windows, Mac OS X, Linux,
…, and always provides the latest version) is to use pip:

``` bash
# Make sure we have an up-to-date version of pip and setuptools:
$ pip install --upgrade pip setuptools

$ pip install --upgrade apibackuper
```

(If `pip` installation fails for some reason, you can try
`easy_install apibackuper` as a fallback.)

## Python version

Python version 3.6 or greater is required.

# Quickstart

This example backs up the list of Russian certificate authorities. The
list is published at e-trust.gosuslugi.ru and is available via an
undocumented API.

``` bash
$ apibackuper create etrust
$ cd etrust
```

Edit apibackuper.cfg as follows:

``` ini
[settings]
initialized = True
name = etrust

[project]
description = E-Trust UC list
url = https://e-trust.gosuslugi.ru/app/scc/portal/api/v1/portal/ca/list
http_mode = POST
work_modes = full,incremental,update
iterate_by = page

[params]
page_size_param = recordsOnPage
page_size_limit = 100
page_number_param = page

[data]
total_number_key = total
data_key = data
item_key = РеестровыйНомер
change_key = СтатусАккредитации.ДействуетС

[storage]
storage_type = zip
```

Add a file params.json with the parameters sent with POST requests:

``` json
{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100,"searchString":null,"cities":null,"software":null,"cryptToolClasses":null,"statuses":null}
```
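To make the relationship between apibackuper.cfg and params.json concrete, here is a minimal sketch of how a paged POST body could be assembled from the two. It is not apibackuper's own code: the helper `build_post_body` is hypothetical, the config is embedded as a string for the demo, and the params are shortened to a few of the fields shown above.

```python
import configparser
import json

# Subset of the config shown above, embedded as a string for this demo.
CFG_TEXT = """
[project]
http_mode = POST
iterate_by = page

[params]
page_size_param = recordsOnPage
page_size_limit = 100
page_number_param = page
"""

# Shortened version of params.json (the base POST body).
PARAMS_JSON = '{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100}'


def build_post_body(cfg, base_params, page_number):
    """Hypothetical helper: copy the base body and overwrite the paging fields."""
    body = dict(base_params)
    body[cfg["params"]["page_number_param"]] = page_number
    body[cfg["params"]["page_size_param"]] = cfg["params"].getint("page_size_limit")
    return body


cfg = configparser.ConfigParser()
cfg.read_string(CFG_TEXT)
base = json.loads(PARAMS_JSON)

# Bodies for the first three pages; the base body is left untouched.
bodies = [build_post_body(cfg, base, n) for n in (1, 2, 3)]
print(json.dumps(bodies[1]))
```

The names in `[params]` tell the tool which fields of the request carry the page number and page size; everything else in params.json is passed through unchanged in this sketch.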

Execute the "estimate" command to see how long data collection will take
and how much space is needed:

``` bash
$ apibackuper estimate full
```

Output:

``` bash
Total records: 502
Records per request: 100
Total requests: 6
Average record size 32277.96 bytes
Estimated size (json lines) 16.20 MB
Avg request time, seconds 66.9260
Estimated all requests time, seconds 402.8947
```
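The figures in this output can be roughly cross-checked by hand. The sketch below redoes the arithmetic from the printed values; it assumes decimal megabytes (1 MB = 1,000,000 bytes) and is not the tool's own formula, which is not shown here.

```python
import math

# Values copied from the "estimate" output above.
total_records = 502
records_per_request = 100
avg_record_size_bytes = 32277.96
avg_request_time_s = 66.9260

# Number of requests needed to page through all records.
total_requests = math.ceil(total_records / records_per_request)

# Size of all records, in decimal megabytes (an assumption).
estimated_size_mb = total_records * avg_record_size_bytes / 1_000_000

# Rough total time: one average request time per request.
approx_total_time_s = total_requests * avg_request_time_s

print(total_requests)
print(f"{estimated_size_mb:.2f}")
print(f"{approx_total_time_s:.1f}")
```

The request count and size match the output. The rough time estimate (about 401.6 seconds) is close to, but not identical with, the reported 402.8947 seconds, so the tool evidently computes the total time slightly differently.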

Execute the "run" command to collect the data. The result is stored in
"storage.zip":

``` bash
$ apibackuper run full
```

Export the data from storage and save it as a JSON lines file called
"etrust.jsonl":

``` bash
$ apibackuper export jsonl etrust.jsonl
```
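A JSON lines file holds one JSON document per line, so the export can be read back with the standard library alone. The sketch below is a generic reader, not part of apibackuper; `read_jsonl` is a hypothetical helper, and the demo writes its own small file instead of assuming the contents of etrust.jsonl.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo data standing in for exported records (made up for illustration).
sample = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        for rec in sample:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    loaded = list(read_jsonl(demo_path))

print(len(loaded))
```

For the quickstart project, the same function could be pointed at etrust.jsonl after the export step.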

# Config options

Example config file

``` ini
[settings]
initialized = True
name =
splitter = .

[project]
description =
url =
http_mode =
work_modes =
iterate_by =

[params]
page_size_param =
page_size_limit =
page_number_param =
count_skip_param =

[data]
total_number_key =
data_key =
item_key =
change_key =

[follow]
follow_mode =
follow_pattern =
follow_data_key =
follow_param =
follow_item_key =

[files]
fetch_mode =
root_url =
keys =
storage_mode =

[storage]
storage_type = zip
compression = True
```

## settings

- name - short name of the project
- splitter - separator used in field paths. Needed for rare cases when '.'
is part of a field name, for example the '@odata.count' field in OData
responses
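The need for a configurable splitter is easier to see with a small lookup function. The sketch below is an illustration under assumptions, not apibackuper's implementation: `get_by_path` is a hypothetical helper, and the sample records are made up.

```python
def get_by_path(record, path, splitter="."):
    """Walk nested dicts, splitting the path on the given splitter."""
    value = record
    for part in path.split(splitter):
        value = value[part]
    return value


# A nested record: the default '.' splitter works as expected.
nested = {"status": {"valid_from": "2020-01-01"}}
print(get_by_path(nested, "status.valid_from"))

# An OData-style response: the key itself contains a dot.
odata_page = {"@odata.count": 42, "value": []}

# With '.', the key would be split into '@odata' and 'count' and fail,
# so a different splitter is chosen to keep the key whole.
print(get_by_path(odata_page, "@odata.count", splitter="/"))
```

With the default splitter, the second lookup would raise a `KeyError`, which is the situation the `splitter` option is meant to avoid.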

## project

- description - text that explains what this project is for
- url - API endpoint URL
- http_mode - HTTP method: GET or POST
- work_modes - types of operation: full - archive everything,
incremental - add new records only, update - collect changed data
only
- iterate_by - how records are iterated: by 'page' (default), page by
page, or by 'skip' if a skip value is provided

## params

- page_size_param - parameter with page size
- page_size_limit - limit on the number of records provided by the API
- page_number_param - parameter with page number
- count_skip_param - parameter for \'skip\' type of iteration
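The difference between the 'page' and 'skip' iteration types can be sketched as follows. This is a guess at the general pattern, not the tool's code: `request_params` is a hypothetical helper, page numbers are assumed to start at 1, and the OData-style names `$top` and `$skip` are examples that do not come from this README.

```python
def request_params(iterate_by, request_index, page_size, names):
    """Build paging parameters for the request with the given 0-based index."""
    params = {names["page_size_param"]: page_size}
    if iterate_by == "page":
        # Assumption: the API counts pages from 1.
        params[names["page_number_param"]] = request_index + 1
    elif iterate_by == "skip":
        # Skip as many records as earlier requests already returned.
        params[names["count_skip_param"]] = request_index * page_size
    else:
        raise ValueError(f"unknown iterate_by: {iterate_by}")
    return params


# Parameter names as in the quickstart config.
by_page = {"page_size_param": "recordsOnPage", "page_number_param": "page"}
# Hypothetical OData-style names for 'skip' iteration.
by_skip = {"page_size_param": "$top", "count_skip_param": "$skip"}

print(request_params("page", 2, 100, by_page))
print(request_params("skip", 2, 100, by_skip))
```

The third request is page 3 in 'page' mode, while in 'skip' mode it skips the 200 records returned by the first two requests.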

## data

- total_number_key - key in data with total number of records
- data_key - key in data with list of records
- item_key - key in data with the unique identifier of the record. Can
be a group of keys separated by commas
- change_key - key in data that indicates that the record changed. Can
be a group of keys separated by commas
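One way to picture how item_key and change_key could work together in the incremental and update modes is shown below. This is only a sketch of the idea, assuming that records are identified by item_key and considered changed when their change_key value differs from the stored one; the helpers `composite_key` and `classify` and the sample data are hypothetical.

```python
def composite_key(record, keys_spec, splitter="."):
    """Join the values of one or more comma-separated key paths into a string."""
    parts = []
    for key in keys_spec.split(","):
        value = record
        for name in key.strip().split(splitter):
            value = value[name]
        parts.append(str(value))
    return "|".join(parts)


def classify(known, records, item_key, change_key):
    """Split record identifiers into new and changed ones.

    `known` maps identifiers already in storage to their last change value.
    """
    new, changed = [], []
    for rec in records:
        ident = composite_key(rec, item_key)
        stamp = composite_key(rec, change_key)
        if ident not in known:
            new.append(ident)
        elif known[ident] != stamp:
            changed.append(ident)
    return new, changed


known = {"A1": "2020-01-01", "B2": "2020-05-01"}
records = [
    {"id": "A1", "status": {"since": "2020-01-01"}},  # unchanged
    {"id": "B2", "status": {"since": "2021-03-15"}},  # changed
    {"id": "C3", "status": {"since": "2022-07-07"}},  # new
]
new_ids, changed_ids = classify(known, records, "id", "status.since")
print(new_ids, changed_ids)
```

Under this reading, an incremental run would only need the new identifiers, and an update run only the changed ones.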

## follow

- follow_mode - mode to follow objects. Can be 'url' or 'item'.
If the mode is 'url', then follow_pattern is not used
- follow_pattern - URL pattern / URL prefix for followed objects. Only
for mode 'item'
- follow_data_key - if the object/objects are inside an array, the key
of this array
- follow_param - parameter used in 'item' mode
- follow_item_key - item key

## files

- fetch_mode - file fetch mode. Can be 'prefix' or 'id'. Prefix
- root_url - root URL / prefix for files
- keys - list of keys with URLs/file IDs to search for files to save
- storage_mode - how files are stored in storage/files.zip. The
default is 'filepath': files are stored the same way as they are
presented in the URL

## storage

- storage_type - type of local storage. 'zip' (a local ZIP file) is
the default
- compression - if True, a compressed ZIP file is used: less space
used, more CPU time spent processing data
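The space/CPU trade-off behind the compression option can be illustrated with Python's standard zipfile module. The sketch below does not reproduce the layout of storage.zip, which is not described here; the `pack` helper, the member names and the sample records are assumptions made for the demo.

```python
import io
import json
import zipfile


def pack(records, compressed):
    """Write each record as a JSON member of an in-memory ZIP archive."""
    method = zipfile.ZIP_DEFLATED if compressed else zipfile.ZIP_STORED
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for i, rec in enumerate(records):
            # Member names are hypothetical, chosen for this demo only.
            zf.writestr(f"page_{i}.json", json.dumps(rec))
    return buf.getvalue()


# Repetitive sample data, which compresses well.
records = [{"id": i, "text": "lorem ipsum " * 50} for i in range(20)]

stored = pack(records, compressed=False)
deflated = pack(records, compressed=True)

# Read the compressed archive back to confirm all members are present.
with zipfile.ZipFile(io.BytesIO(deflated)) as zf:
    names = zf.namelist()

print(len(deflated) < len(stored), len(names))
```

For repetitive JSON like this, the deflated archive is much smaller than the stored one, at the cost of extra CPU work when writing and reading.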

# Usage

Synopsis:

``` bash
$ apibackuper [flags] [command] inputfile
```
