https://github.com/cfpb/api

Documentation to support upcoming data platform API and data sets

Host: GitHub
URL: https://github.com/cfpb/api
Owner: cfpb
Created: 2013-08-20T11:33:30.000Z (almost 12 years ago)
Default Branch: master
Last Pushed: 2023-05-09T17:04:19.000Z (about 2 years ago)
Last Synced: 2025-04-09T20:15:22.916Z (3 months ago)
Language: CSS
Homepage: http://cfpb.github.io/api/hmda/
Size: 6.61 MB
Stars: 31
Watchers: 32
Forks: 78
Open Issues: 17
Metadata Files:
- Readme: README.md

CFPB Public Data API
====================

This is the source project for the CFPB public data API at http://api.consumerfinance.gov. It contains up-to-date load scripts for populating data in that API. Note, however, that the `resources/static/` directory is a work in progress; refer to https://github.com/cfpb/qu/tree/master/resources/static for the current static resources.

## Create a config file

```sh
cp sample_config.edn config.edn
```

Edit that file with appropriate values, including MongoDB connection information.

## Loading data

To load the HMDA dataset, run `lein repl` and enter the following:

```clj
(-main "config.edn") ;; use the name of your config file; config.edn is an example
(require 'qu.loader)
(in-ns 'qu.loader)
(load-dataset "hmda")
```

Then close your REPL, using `Ctrl-D`.

Note that the steps above also start the web server, so you may want to use a separate configuration that starts it on an unpublished port.

## Speeding up data load

You can take advantage of more CPU and RAM by loading chunks of data concurrently. To do this, split the large data file into smaller files of 1 million records each:

```sh
./hmda_split_csv.sh hmda_lar_all_2012.csv split_hmda_2012_
```
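The contents of `hmda_split_csv.sh` are not shown here. As a rough sketch of the idea, assuming the script wraps the standard `split` utility (the `aa`, `ab`, ... suffixes in the example file names are `split`'s defaults), splitting by line count might look like the following. The function name `split_csv` and its optional third argument are hypothetical, and the sketch assumes one record per line and does nothing special for a CSV header row.

```sh
#!/bin/sh
# Hypothetical stand-in for hmda_split_csv.sh; the real script may differ.
# Usage: split_csv INPUT_FILE OUTPUT_PREFIX [LINES_PER_FILE]
split_csv() {
    input=$1
    prefix=$2
    lines=${3:-1000000}   # default: 1 million lines per chunk
    # split names the chunks PREFIXaa, PREFIXab, ... by default
    split -l "$lines" "$input" "$prefix"
}
```

Under these assumptions, `split_csv hmda_lar_all_2012.csv split_hmda_2012_` would produce `split_hmda_2012_aa`, `split_hmda_2012_ab`, and so on.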

Then update the `definition.json` file to load these split files instead of the single CSV. Edit the `hmda_lar` `sources` section and replace it with a list of all the split files for the year you're targeting.

For example, for 2012, it looks like this:

```
  "sources": [
    "split_hmda_2012_aa",
    "split_hmda_2012_ab",
    "split_hmda_2012_ac",
    "split_hmda_2012_ad",
    "split_hmda_2012_ae"
  ]
```
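Typing this list by hand is error-prone when there are many chunks. The sketch below prints a `sources` fragment for every file in the current directory that matches a prefix. The function name `print_sources` is hypothetical, the sketch assumes file names contain no characters that need JSON escaping, and the output still has to be pasted into `definition.json` manually.

```sh
#!/bin/sh
# Hypothetical helper: print a "sources" JSON fragment listing all files
# in the current directory whose names start with the given prefix.
print_sources() {
    prefix=$1
    printf '"sources": [\n'
    first=1
    for f in "$prefix"*; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        if [ "$first" -eq 1 ]; then
            first=0
        else
            printf ',\n'          # comma between entries, none after the last
        fi
        printf '  "%s"' "$f"
    done
    printf '\n]\n'
}
```

For example, `print_sources split_hmda_2012_` could be run in the directory that holds the split files.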
