https://github.com/lynn/sourced
Helper library for Python scripts to fetch and cache data
https://github.com/lynn/sourced
Last synced: about 1 month ago
JSON representation
Helper library for Python scripts to fetch and cache data
- Host: GitHub
- URL: https://github.com/lynn/sourced
- Owner: lynn
- License: mit
- Created: 2020-09-12T21:31:26.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-09-14T13:26:45.000Z (over 4 years ago)
- Last Synced: 2025-02-14T08:22:51.625Z (3 months ago)
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# sourced
I don't like writing this in my Python scripts every time:
```py
import json, os, requests
photos_url = 'https://jsonplaceholder.typicode.com/photos'
photos_path = 'photos.json'# Fetch the file if it doesn't exist yet.
if not os.path.isfile(photos_path):
res = requests.get(photos_url)
with open(photos_path, 'wb') as f:
f.write(res.content)# Then, use the local copy.
with open('photos.json') as f:
photos = json.load(f)print(photos[0]['title'])
```So this little library lets you write:
```py
import sourced
photos_url = 'https://jsonplaceholder.typicode.com/photos'photos = sourced.json('photos.json', url=photos_url)
print(photos[0]['title'])
```And offers some other handy behavior.
# Documentation
## Basic usage
Pass `url`, and optionally `headers`, to fetch and decode files from the web.
Use `encoding` to set the encoding for text and JSON. It defaults to `utf-8`.```py
t = sourced.text('merci.txt', url=text_url, headers={'Accept-Language': 'fr'})
s = sourced.text('sozai.txt', url=sjis_url, encoding='shift_jis')
j = sourced.json('okay.json', url=json_url, headers={'Authorization': token})
b = sourced.binary('data.bin', url=binary_url)
assert isinstance(s, str)
assert isinstance(j, dict)
assert isinstance(b, bytes)
```Alternatively, use `create=file_creating_function`. This function should
return a deserialized result (so, a `dict` rather than a string, for JSON).```py
def default_json(): return {'meow': 123}
j = sourced.json('my.json', create=default_json)
```## Cache Invalidation
Pass `max_age='2 weeks'` to invalidate cached files after 2 weeks.
[pytimeparse](https://github.com/wroberts/pytimeparse) is used to parse
the provided time delta.```py
stuff = sourced.json('stuff.json', url=my_url, max_age='5 days')
```## JSON paths
When you only need part of the data at `url`, you can provide a JSON path
to select the parts you want. The JSON path library used is
[jsonpath-rw](https://github.com/kennknowles/python-jsonpath-rw).Use `find=path` to get an array of matches:
```py
sourced.json('titles.json', url=url, find='[:10].title')
```Use `pick=path` to keep just the first match:
```py
sourced.json('titles.json', url=url, pick='[?(@.id == 99)]')
```## Pagination
### Predefined pages
If `url` is a list, the results are concatenated with `+`. This is
useful for text or JSON endpoints that return arrays.```py
sourced.json('combi.json', url=[url1, url2])
sourced.text('combi.txt', url=[url3, url4])
```### URL-numbered pages
If `url` is `https://blah.com/foo/%p1`, then requests are made for
`https://blah.com/foo/1`, `https://blah.com/foo/2`, … until the result is
empty text / an empty JSON array.Use `%p0` to start numbering pages from 0 instead.
```py
sourced.json('tags.json', url='https://testbooru.donmai.us/tags.json?limit=500&page=%p1')
```### JSON-pathed pages
If using JSON and `next_path` is a JSON path, it is used to retrieve a URL
for the next page until it is `null`.```py
kanji = sourced.json('kanji.json',
url='https://api.wanikani.com/v2/subjects?types=kanji',
headers={'Wanikani-Revision': '20170710',
'Authorization': 'Bearer %s' % wk_token},
find='data[*].data.characters', next_page='pages.next_url')
```### Custom next-path function
If `next_path` is a function, it is called with the decoded (e.g. JSON object)
result to get the next page URL.```py
f = lambda json: base_url + '&start_from=' + json['next_page_start']
sourced.json('a.json', url=base_url, next_page=f)
```