Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cfpb/api
Documentation to support upcoming data platform API and data sets
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/cfpb/api
- Owner: cfpb
- Created: 2013-08-20T11:33:30.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2023-05-09T17:04:19.000Z (over 1 year ago)
- Last Synced: 2024-11-07T13:40:11.594Z (2 months ago)
- Language: CSS
- Homepage: http://cfpb.github.io/api/hmda/
- Size: 6.61 MB
- Stars: 31
- Watchers: 32
- Forks: 73
- Open Issues: 17
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
CFPB Public Data API
====================

This is the source project for the CFPB public data API at http://api.consumerfinance.gov. It contains up-to-date load scripts for populating data in that API. Note, however, that the `resources/static/` directory is a work in progress; refer to https://github.com/cfpb/qu/tree/master/resources/static for the current static resources.
## Create a config file
```sh
cp sample_config.edn config.edn
```

Edit that file with appropriate values, including MongoDB connection information.
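Before loading data, it can help to confirm that the MongoDB instance you pointed config.edn at is reachable. A minimal sketch, assuming a local MongoDB on the default port and the legacy `mongo` shell; substitute the host and port you put in your config:

```sh
# Ping the MongoDB server referenced in config.edn (host/port shown here are
# assumptions; use the values from your own config file).
mongo --host localhost --port 27017 --eval 'db.runCommand({ ping: 1 })'
```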
## Loading data
To load the HMDA dataset, run `lein repl` and enter the following:
```clj
(-main "config.edn")  ;; use the name of your config file; config.edn is an example
(require 'qu.loader)  ;; load the data-loading namespace
(in-ns 'qu.loader)    ;; switch the REPL into it
(load-dataset "hmda") ;; load the HMDA dataset into MongoDB
```
Then close your REPL with `Ctrl-D`.

Note that this process also starts the web server, so you may want to use a different configuration that starts it on an unpublished port.
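If you script this step, the same forms can be fed to the REPL on stdin. A sketch only, assuming `lein repl` evaluates piped input and exits at end of input:

```sh
# Non-interactive version of the load above (assumes lein repl reads forms
# from stdin; "config.edn" is whatever config file you created).
lein repl <<'EOF'
(-main "config.edn")
(require 'qu.loader)
(in-ns 'qu.loader)
(load-dataset "hmda")
EOF
```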
## Speeding up data load

You can take advantage of more CPU and RAM by loading chunks of data concurrently. Split the large data file into smaller files of 1 million records each:

```sh
./hmda_split_csv.sh hmda_lar_all_2012.csv split_hmda_2012_
```
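The helper script's effect is essentially line-based chunking; a rough equivalent using the standard `split` utility is sketched below (the real script may also handle the CSV header, which plain `split` does not):

```sh
# Chunk the 2012 LAR file into ~1,000,000-line pieces named
# split_hmda_2012_aa, split_hmda_2012_ab, ... (header handling omitted).
split -l 1000000 hmda_lar_all_2012.csv split_hmda_2012_
```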
Then update the definition.json file to load these split files instead of the single CSV. Edit the hmda_lar sources section and replace it with a list of all the split files for the year you're targeting. For example, for 2012, it looks like this:
```
"sources": [
"split_hmda_2012_aa",
"split_hmda_2012_ab",
"split_hmda_2012_ac",
"split_hmda_2012_ad",
"split_hmda_2012_ae"
]
```
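If there are many chunks, the list can be generated rather than typed. A small sketch (not part of the repo) that prints one entry per split file:

```sh
# Print a quoted "sources" entry for each split file; drop the trailing
# comma on the last line before pasting into definition.json.
for f in split_hmda_2012_*; do
  printf '  "%s",\n' "$f"
done
```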