# Disclosure Backend Static

[![Build Status](https://travis-ci.org/caciviclab/disclosure-backend-static.svg?branch=master)](https://travis-ci.org/caciviclab/disclosure-backend-static)
The `disclosure-backend-static` repo is the backend powering [Open Disclosure California](https://opendisclosure.io).
It was created in haste in the run-up to the 2016 election, and is therefore engineered around a "get it done" philosophy. At that time, we had already designed an API and built (most of) a frontend; this repo was created to implement them as quickly as possible.
This project implements a basic ETL pipeline to download the Oakland Netfile data, download the human-curated CSV data for Oakland, and combine the two. The output is a directory of JSON files that mimic the existing API structure, so no client code changes are required.

## Prerequisites
- Ruby 3.3.4 (see version in `.ruby-version`)
## Installation
**Note:** You do not need to run these commands to develop on the frontend. All you need to do is clone this repository adjacent to the frontend repo.

If you're planning to modify the backend code, follow these steps to set up all necessary development dependencies, including a new [PostgreSQL](https://www.postgresql.org/) database and Python 3:
1. Ensure [Homebrew](https://brew.sh/) is up to date by running:
```
brew update && brew upgrade
```
2. Install PostgreSQL version 16 with Homebrew:
```
brew install postgresql@16
```
- To automatically start the database at login, use:
```
brew services start postgresql@16
```
3. Verify Python 3 is installed on your system. If not, download it from [python.org](https://www.python.org/downloads/).
   - Older versions of macOS ship with Python 2 as the default `python`, so use `python3 -m pip` instead of `pip` to ensure Python 3 is used:
```
python3 -m pip install ...
```
- Or if your system's `pip` points to Python 3, you can use `pip` directly:
```
pip install ...
```
4. Install the Python and Ruby dependencies:
   ```
   sudo -H python3 -m pip install -r requirements.txt
   gem install pg bundler
   bundle install
   ```

### Codespaces
This repository is set up to work in a container under Codespaces. In other words, you can start an environment that is already fully set up, without performing any of the installation steps required for a local environment. This can be used as a way to troubleshoot code before it is committed to the production pipeline. The following information may be helpful for getting started with Codespaces:
1. Go to the page displaying the repository code: https://github.com/caciviclab/disclosure-backend-static
2. Change the branch to the branch that you're interested in running and editing.
3. Click on the `Code` button and click the `Codespaces` tab in the dropdown
4. Click on the button to start a new codespace
5. Wait for the container to be set up and for a terminal prompt at `/workspace` to appear in the web page; the interface will look familiar if you've worked with VS Code before
6. In the terminal, you can start running the commands from the next section, such as `make download`
7. A Postgres database will already be running on localhost; simply type `psql` in the terminal to connect to it
8. The `make import` command will populate the Postgres database
9. Click on the cylinder icon on the left to access the SQL Tools extension, which will allow you to connect to Postgres and view its schema as well as query the data that you imported
10. The container has Python 3.9 and a current version of Ruby installed, along with all the required packages
11. The goal is for the container to match the Travis CI environment, including running the same Ruby version
12. This same setup allows anyone to run the container on their local machine in VS Code using the Dev Containers extension, but we are mainly focused on enabling Codespaces right now and hardening that setup first. (We'll be adding instructions for Dev Containers once this is solid)
13. If you make changes in your Codespaces environment, don't forget to push them after committing with Git. A codespace is like another machine, so your work won't end up in the GitHub repository if you don't `git push`

### Docker
This repository is also configured to run within a Docker container. This is similar to Codespaces, except you can use whatever IDE and local setup you prefer. Here's how to get started using Docker with [VSCode](https://code.visualstudio.com/):
1. Download [Docker](https://www.docker.com/products/docker-desktop/) and launch Docker Desktop.
- Version 4.32.0 (157355) has been verified to work with this repository.
2. Install the VSCode [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension.
- Version v0.375.1 has been confirmed to be compatible with this repository.
- Note: VSCode may suggest installing the Docker extension as well, but you only need the Dev Containers extension for this setup.
3. In VSCode, navigate to [.devcontainer/Dockerfile](.devcontainer/Dockerfile) or restart VSCode. The Dev Containers extension should prompt you to reopen the repository in a container. Proceed with this option.
4. Once the container environment is loaded, complete the setup by following the commands below.

## Running
Download the raw data files. You only need to run this once in a while to get the latest data.

```
$ make download
```
Import the data into the database for easier processing. You only need to run this after you've downloaded new data.

```
$ make import
```
Run the calculators. Everything is output into the `build` folder.

```
$ make process
```
Optionally, reindex the build outputs into Algolia. (Reindexing requires the `ALGOLIASEARCH_APPLICATION_ID` and `ALGOLIASEARCH_API_KEY` environment variables.)

```
$ make reindex
```
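For example, a one-off reindex might look like the following (the credential values here are placeholders, not real ones; use your own Algolia application ID and admin API key):

```
# hypothetical credentials -- substitute your own Algolia values
$ export ALGOLIASEARCH_APPLICATION_ID=ABC123XYZ
$ export ALGOLIASEARCH_API_KEY=0123456789abcdef
$ make reindex
```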
If you want to serve the static JSON files via a local web server:

```
$ make run
```
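To spot-check the served output, you can request one of the generated JSON files. The port and file path below are assumptions for illustration; check the Makefile and your `build` directory for the actual values:

```
# port and path are hypothetical -- check the Makefile and build/ contents
$ curl http://localhost:8000/candidates.json | head
```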
## Developing
### Maintaining database schema
When `make import` is run, a number of Postgres tables are created for importing the downloaded data. The schemas of these tables are explicitly defined in the `dbschema` directory and may have to be updated to accommodate future data. In particular, columns that hold string data may not be sized large enough. For example, if a name column accepts names of at most 20 characters and we later receive data where a name is 21 characters long, the import will fail. When this occurs, update the corresponding schema file in `dbschema` to allow more characters, then re-run `make import` to verify that it succeeds (see the sketch below).
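A minimal sketch of that loop, with a hypothetical schema file name (the actual files in `dbschema` will differ):

```
# widen the too-small column in the relevant schema file
$ $EDITOR dbschema/A-Contributions.sql

# re-run the import to confirm the new size is large enough
$ make import
```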
### Checking output data changes
This repository is used to generate data files that are consumed by the website. After `make process` is run, a `build` directory is generated containing the data files. This directory is checked in to the repository and later checked out when generating the website. After making code changes, it is important to compare the newly generated `build` directory against the `build` directory generated before the changes, and to verify that any differences are expected.

Because a strict comparison of all contents of the `build` directory will always include changes that occur independent of any code change, every developer would have to know about these expected changes in order to perform this check. To remove that need, a dedicated script, `bin/create-digests.py`, generates digests for the JSON data in the `build` directory after excluding the expected changes. To look for meaningful changes, simply look for a change in the `build/digests.json` file.
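In practice, the check can be as simple as regenerating the build and diffing the digest file. This assumes the digest script runs as part of the build; if it doesn't in your setup, invoke `bin/create-digests.py` yourself first:

```
# regenerate the data files, then look for meaningful differences
$ make process
$ git diff build/digests.json
```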
Currently, these are the expected changes that occur independent of any code change:
* timestamps change for each run
* top contributors lists contain undefined ordering of contributors with the same contribution
* top spenders lists contain undefined ordering of spenders with the same spending
* rounding differences for floats

The expected changes are excluded before generating digests for data in the `build` directory. The logic for this can be found in the function `clean_data` in the file `bin/create-digests.py`. Once the code is modified such that an expected change no longer occurs, the exclusion of that change can be removed from `clean_data`. For example, the rounding of floats is not consistently the same each time `make process` is run, due to differences in the environment. When the code is fixed so that the rounding is the same as long as the data hasn't changed, the `round_float` call in `clean_data` can be removed.
An additional script, `bin/report-candidates.py`, generates a report for comparing candidate totals; it produces `build/candidates.csv` and `build/candidates.xlsx`. The reports include a list of all the candidates, with totals calculated multiple ways that should add up to the same number.
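Assuming the script can be run directly with Python 3 (it may also be wired into the Makefile; check there first), generating and skimming the report might look like:

```
# writes build/candidates.csv and build/candidates.xlsx
$ python3 bin/report-candidates.py
$ head build/candidates.csv
```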
To ensure that database schema changes are visible in pull requests, the complete Postgres schema is also saved to a `schema.sql` file in the `build` directory. Because the `build` directory is automatically re-built for each branch in a PR and committed to the repository, any change to the schema caused by a code change will show up as a difference in `schema.sql` when reviewing the PR.
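You can preview that diff locally before opening a PR. This assumes your feature branch is checked out, the `build` directory has been regenerated, and `master` is the comparison base:

```
# compare the regenerated schema dump against the base branch
$ git diff master -- build/schema.sql
```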
### Adding a calculator
Each metric about a candidate is calculated independently. A metric might be something like "total contributions received" or something more complex like "percentage of contributions that are less than $100".

When adding a new calculation, a good first place to start is the official [Form 460][form_460]. Is the data you are looking for reported on that form? If so, you will probably find it in your database after the import process. There are also a couple of other forms that we import, like Form 496. (These are the names of the files in the `input` directory. Check those out.)

Each schedule of each form is imported into a separate Postgres table. For example, Schedule A of Form 460 is imported into the `A-Contributions` table.

Now that you have a way of querying the data, you should come up with a SQL query that calculates the value you are trying to get. Once you can express your calculation as SQL, put it in a calculator file like so:

1. Create a new file named `calculators/[your_thing]_calculator.rb`
2. Here is some boilerplate for that file:
```ruby
# the name of this class _must_ match the filename of this file, i.e. end
# with "Calculator" if the file ends with "_calculator.rb"
class YourThingCalculator
  def initialize(candidates: [], ballot_measures: [], committees: [])
    @candidates = candidates
    @candidates_by_filer_id = @candidates.where('"FPPC" IS NOT NULL')
      .index_by { |candidate| candidate['FPPC'] }
  end

  def fetch
    @results = ActiveRecord::Base.connection.execute(<<-SQL)
      -- your sql query here
    SQL

    @results.each do |row|
      # make sure Filer_ID is returned as a column by your query!
      candidate = @candidates_by_filer_id[row['Filer_ID'].to_i]

      # change this!
      candidate.save_calculation(:your_thing, row[column_with_your_desired_data])
    end
  end
end
```
3. You will want to fill in the SQL query and make sure that the query selects
the `Filer_ID` column.
4. Make sure to update the call to `candidate.save_calculation`. That method
will serialize its second argument as JSON, so it can store any kind of data.
5. Your calculation can be retrieved with `candidate.calculation(:your_thing)`.
You will want to add this into an API response in the `process.rb` file.

## Data flow
This is how the data flows through the back end. Finance data is pulled from Netfile and supplemented by a Google Sheet that maps Filer IDs to ballot information like candidate names, offices, ballot measures, etc. Once the data is filtered, aggregated, and transformed, the front end consumes it and builds the static HTML front end.

### Common Errors
**During Bundle Install**
```
error: use of undeclared identifier 'LZMA_OK'
```
Try:
```
brew unlink xz
bundle install
brew link xz
```

**During `make download`**
```
wget: command not found
```
Run `brew install wget`.

**During `make import`**
There appears to be a problem on Macs with Apple Silicon chips.
```
ImportError: You don't appear to have the necessary database backend installed for connection string you're trying to use. Available backends include:

PostgreSQL: pip install psycopg2
```
Try the following:
```
pip uninstall psycopg2-binary
pip install psycopg2-binary --no-cache-dir
```

[form_460]: http://www.fppc.ca.gov/content/dam/fppc/NS-Documents/TAD/Campaign%20Forms/460.pdf