https://github.com/lynn/sourced

Helper library for Python scripts to fetch and cache data
https://github.com/lynn/sourced

Last synced: about 1 month ago
JSON representation

Helper library for Python scripts to fetch and cache data

Host: GitHub
URL: https://github.com/lynn/sourced
Owner: lynn
License: mit
Created: 2020-09-12T21:31:26.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2020-09-14T13:26:45.000Z (over 4 years ago)
Last Synced: 2025-02-14T08:22:51.625Z (3 months ago)
Language: Python
Homepage:
Size: 9.77 KB
Stars: 3
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # sourced

I don't like writing this in my Python scripts every time:

```py

import json, os, requests

photos_url = 'https://jsonplaceholder.typicode.com/photos'

photos_path = 'photos.json'

# Fetch the file if it doesn't exist yet.

if not os.path.isfile(photos_path):

    res = requests.get(photos_url)

    with open(photos_path, 'wb') as f:

        f.write(res.content)

# Then, use the local copy.

with open('photos.json') as f:

    photos = json.load(f)

print(photos[0]['title'])

```

So this little library lets you write:

```py

import sourced

photos_url = 'https://jsonplaceholder.typicode.com/photos'

photos = sourced.json('photos.json', url=photos_url)

print(photos[0]['title'])

```

And offers some other handy behavior.

# Documentation

## Basic usage

Pass `url`, and optionally `headers`, to fetch and decode files from the web.

Use `encoding` to set the encoding for text and JSON. It defaults to `utf-8`.

```py

t = sourced.text('merci.txt', url=text_url, headers={'Accept-Language': 'fr'})

s = sourced.text('sozai.txt', url=sjis_url, encoding='shift_jis')

j = sourced.json('okay.json', url=json_url, headers={'Authorization': token})

b = sourced.binary('data.bin', url=binary_url)

assert isinstance(s, str)

assert isinstance(j, dict)

assert isinstance(b, bytes)

```

Alternatively, use `create=file_creating_function`. This function should

return a deserialized result (so, a `dict` rather than a string, for JSON).

```py

def default_json(): return {'meow': 123}

j = sourced.json('my.json', create=default_json)

```

## Cache Invalidation

Pass `max_age='2 weeks'` to invalidate cached files after 2 weeks.

[pytimeparse](https://github.com/wroberts/pytimeparse) is used to parse

the provided time delta.

```py

stuff = sourced.json('stuff.json', url=my_url, max_age='5 days')

```

## JSON paths

When you only need part of the data at `url`, you can provide a JSON path

to select the parts you want. The JSON path library used is

[jsonpath-rw](https://github.com/kennknowles/python-jsonpath-rw).

Use `find=path` to get an array of matches:

```py

sourced.json('titles.json', url=url, find='[:10].title')

```

Use `pick=path` to keep just the first match:

```py

sourced.json('titles.json', url=url, pick='[?(@.id == 99)]')

```

## Pagination

### Predefined pages

If `url` is a list, the results are concatenated with `+`. This is

useful for text or JSON endpoints that return arrays.

```py

sourced.json('combi.json', url=[url1, url2])

sourced.text('combi.txt', url=[url3, url4])

```

### URL-numbered pages

If `url` is `https://blah.com/foo/%p1`, then requests are made for

`https://blah.com/foo/1`, `https://blah.com/foo/2`, … until the result is

empty text / an empty JSON array.

Use `%p0` to start numbering pages from 0 instead.

```py

sourced.json('tags.json', url='https://testbooru.donmai.us/tags.json?limit=500&page=%p1')

```

### JSON-pathed pages

If using JSON and `next_path` is a JSON path, it is used to retrieve a URL

for the next page until it is `null`.

```py

kanji = sourced.json('kanji.json',

    url='https://api.wanikani.com/v2/subjects?types=kanji',

    headers={'Wanikani-Revision': '20170710',

             'Authorization': 'Bearer %s' % wk_token},

    find='data[*].data.characters', next_page='pages.next_url')

```

### Custom next-path function

If `next_path` is a function, it is called with the decoded (e.g. JSON object)

result to get the next page URL.

```py

f = lambda json: base_url + '&start_from=' + json['next_page_start']

sourced.json('a.json', url=base_url, next_page=f)

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/lynn/sourced

Awesome Lists containing this project

README