https://github.com/mpdn/github-json-schemas
A dataset of all JSON schemas on GitHub
- Host: GitHub
- URL: https://github.com/mpdn/github-json-schemas
- Owner: mpdn
- License: mit
- Created: 2023-06-18T18:58:48.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-18T19:57:28.000Z (about 3 years ago)
- Last Synced: 2025-02-07T03:41:41.749Z (over 1 year ago)
- Topics: dataset, json-schema
- Homepage:
- Size: 1.95 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GitHub JSON Schema Dataset
This is a dataset of all JSON schemas found in the
[GitHub BigQuery Dataset](https://console.cloud.google.com/marketplace/product/github/github-repos).
The schema files are stored as gzipped newline-delimited JSON. Each line is a unique schema file,
with an array of repositories referencing the file. See the [schema](schema.json) for more details.
See [Releases](https://github.com/mpdn/github-json-schemas/releases/latest) to download the
dataset.
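As a rough illustration of the storage format described above, the following Python sketch streams a gzipped newline-delimited JSON file one record at a time. The function name `iter_schema_records` is hypothetical. The field names in the usage note (`id`, `content`, `usages`) are an assumption taken from the columns selected in the query, so check them against `schema.json` before relying on them.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_schema_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a gzipped NDJSON file."""
    # Text mode ("rt") decompresses and decodes on the fly, so the whole
    # file never has to be held in memory at once.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

Assuming the release asset has been saved locally under some name such as `schemas.ndjson.gz` (a placeholder, not the actual asset name), `for record in iter_schema_records("schemas.ndjson.gz"): print(record["id"], len(record["usages"]))` would list each schema's identifier and how many usages were recorded for it.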
## Query
```sql
-- One output row per unique file blob: its id, its text, and every place it is used.
select
  id,
  content,
  usages
from `bigquery-public-data.github_repos.contents`
join (
  -- Collect every (repository, ref, path, license) that references each blob id.
  select
    id,
    array_agg(struct(repo_name, ref, path, license)) as usages
  from `bigquery-public-data.github_repos.files`
  join `bigquery-public-data.github_repos.licenses` using (repo_name)
  where ends_with(path, '.json')
  group by id
) using (id)
where
  not binary
  -- Substring pre-filters, plus a check on the top-level "$schema" value itself.
  and contains_substr(content, '$schema')
  and contains_substr(content, 'json-schema.org')
  and contains_substr(json_query(content, '$."$schema"'), 'json-schema.org')
```
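For checking individual files locally, the `where` clause of the query can be approximated in Python. This is only a sketch and not part of the repository: the function name `looks_like_json_schema` is hypothetical, and it is slightly stricter than the SQL because it requires the top-level `$schema` value to be a string. The case-insensitive comparison is meant to imitate `contains_substr`, without reproducing its Unicode normalization.

```python
import json


def looks_like_json_schema(content: str) -> bool:
    """Approximate the query's content filter for a single file's text."""
    lowered = content.lower()
    # Counterpart of the two contains_substr(content, ...) conditions.
    if "$schema" not in lowered or "json-schema.org" not in lowered:
        return False
    try:
        document = json.loads(content)
    except json.JSONDecodeError:
        return False  # not valid JSON, so there is no "$schema" key to read
    if not isinstance(document, dict):
        return False
    # Counterpart of json_query(content, '$."$schema"'): the top-level key only.
    declared = document.get("$schema")
    return isinstance(declared, str) and "json-schema.org" in declared.lower()
```

A file that mentions `json-schema.org` only in a nested field or a description is rejected, because the last condition looks at the top-level `$schema` key alone.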