
# 🍻 Open Brewery DB Dataset

[![All Contributors](https://img.shields.io/badge/all_contributors-60-orange.svg?style=flat-square)](#contributors-)

![Open Brewery DB Logo](obdb-logo-md.jpg)

This is the open-source dataset for the [Open Brewery DB API](https://www.openbrewerydb.org/), which is served by a [REST API built with Ruby on Rails](https://github.com/chrisjm/openbrewerydb-rails-api).

## 🎯 Purpose

Provide an approval-based pipeline to update the dataset and API.

## 🗄 Data Formats

- [CSV - Full Dataset](breweries.csv)
- [JSON](breweries.json)
- [PostgreSQL SQL](breweries.sql)

## 🚀 Getting Started

1. `git clone git@github.com:openbrewerydb/openbrewerydb.git`
2. `cd openbrewerydb && npm install`

## ⚙️ Scripts

The following npm scripts help maintain and manage the dataset:

### Data Management
- `npm run validate`
  - Validates all CSV files against the JSON Schema
  - Checks for required fields and data format consistency
  - Reports any validation errors that need attention

- `npm run csv:combine`
  - Combines all individual CSV files from country/state-region folders into a single `breweries.csv`
  - Useful when you've made changes to individual state files and need to update the main dataset

- `npm run csv:split`
  - Splits the main `breweries.csv` into separate files by country/state-region
  - Helps maintain organized, manageable data files for each region
  - Creates directories if they don't exist
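
The split and combine steps can be pictured as grouping rows by `country` and `state_province` and then flattening the groups again. The following is a minimal Python sketch of that idea, not the project's actual npm script. The helper names (`region_key`, `split_rows`, `combine_rows`), the lower-cased, hyphenated folder key, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def region_key(row):
    """Hypothetical folder key: lower-cased, with spaces replaced by hyphens."""
    country = row["country"].strip().lower().replace(" ", "-")
    state = row["state_province"].strip().lower().replace(" ", "-")
    return f"{country}/{state}"


def split_rows(rows):
    """Group brewery rows by country/state_province."""
    groups = defaultdict(list)
    for row in rows:
        groups[region_key(row)].append(row)
    return dict(groups)


def combine_rows(groups):
    """Inverse of split_rows: flatten the groups back into a single list."""
    return [row for key in sorted(groups) for row in groups[key]]


# Made-up sample rows, not real dataset entries
sample_csv = """name,brewery_type,city,state_province,country
Example Brewing,micro,San Diego,California,United States
Sample Ales,brewpub,Denver,Colorado,United States
Demo Cider,micro,Oakland,California,United States
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
groups = split_rows(rows)
for key in sorted(groups):
    print(key, len(groups[key]))
```

Because combining is simply the inverse of splitting, editing either the main file or a regional file (but not both) keeps the two views in sync after the next run.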

### Data Generation
- `npm run generate:ids`
  - Creates unique OBDB IDs for each brewery based on name and city
  - Automatically updates `breweries.csv` with new IDs
  - Ensures no duplicate IDs exist in the dataset

- `npm run generate:json`
  - Converts `breweries.csv` into a JSON format (`breweries.json`)
  - Useful for applications that prefer working with JSON data
  - Maintains data consistency across formats

- `npm run generate:sql`
  - Creates PostgreSQL SQL file from `breweries.csv`
  - Includes table creation and data insertion statements
  - Perfect for database implementations

- `npm run generate:stats`
  - Generates comprehensive dataset statistics
  - Shows brewery counts by state/city
  - Displays brewery type distribution
  - Reports data completeness metrics
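
To illustrate what a CSV-to-JSON conversion of this kind involves, here is a minimal Python sketch using only the standard library. It is not the project's `generate:json` script. The `csv_to_json` helper, the choice to turn empty cells into `null`, and the sample row are assumptions for illustration.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text into a JSON array with one object per row.

    Assumption: empty cells become null rather than empty strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [
        {field: (value if value != "" else None) for field, value in row.items()}
        for row in reader
    ]
    return json.dumps(records, indent=2, ensure_ascii=False)


# Made-up sample row with an empty phone column
sample_csv = """name,brewery_type,city,state_province,country,phone
Example Brewing,micro,San Diego,California,United States,
"""
print(csv_to_json(sample_csv))
```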

### Contributor Management
- `npm run contributors:add`
  - Interactive CLI tool to add new contributors
  - Prompts for contributor information and contribution type
  - Updates `.all-contributorsrc` file

- `npm run contributors:check`
  - Verifies if any contributors are missing from the list
  - Helps maintain accurate recognition of all contributors

- `npm run contributors:generate`
  - Updates the Contributors section in `README.md`
  - Generates contributor table with avatars and contribution types

### Workflow
- `npm run workflow:maintain`
  - Comprehensive maintenance workflow that:
    1. Validates all CSV files
    2. Combines all CSV files
    3. Generates new IDs if needed
    4. Creates JSON and SQL files
    5. Splits back into individual state files
  - Run this after making any dataset updates

## 🤝 Contributing

For information on contributing to this project, please see the [contributing guide](CONTRIBUTING.md) and our [code of conduct](CODE_OF_CONDUCT.md).

1. Fork the repository
2. Add or update breweries in the CSV (for example, with Excel or Google Sheets)
3. Submit a Pull Request

### Tips

First and foremost, don't worry about messing up! 🙂 Thank you so much for contributing! 🙌

- CSVs are organized by `data/[country]/[state_province]`
- Required fields/columns: `name`, `brewery_type`, `city`, `state_province`, and `country`
- When adding a brewery, do not include an `id`. This will be created after review.
- Please either add to `breweries.csv` (preferred if adding breweries for a new country) or the individual state/province CSV file. Adding to both at the same time may introduce duplicates/errors.
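
Before opening a pull request, the tips above can be checked locally with a few lines of Python. This is a rough sketch and not a replacement for `npm run validate`, which checks against the project's JSON Schema. The `check_new_row` helper and the sample rows are hypothetical.

```python
# Required columns as listed in the tips above
REQUIRED_FIELDS = ("name", "brewery_type", "city", "state_province", "country")


def check_new_row(row):
    """Return a list of problems found in a newly added brewery row."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"missing required field: {field}")
    # New entries should leave the id blank; it is created after review
    if (row.get("id") or "").strip():
        problems.append("new rows should not include an id")
    return problems


# Made-up rows for illustration
good_row = {
    "name": "Example Brewing",
    "brewery_type": "micro",
    "city": "San Diego",
    "state_province": "California",
    "country": "United States",
}
bad_row = {"id": "abc-123", "name": "Sample Ales", "city": "Denver"}

print(check_new_row(good_row))
print(check_new_row(bad_row))
```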

## 👾 Community

- [Join the Newsletter](http://eepurl.com/dBjS0j)
- [Join the Discord](https://discord.gg/3G3syaD)

## 📫 Feedback

For any feedback, please [email me](mailto:chris@openbrewerydb.org).

Cheers! 🍻

## 📊 Project Status

- **Status**: Active
- **Last Dataset Update**: 2024
- **Maintenance**: Actively maintained through community contributions
- **Dataset Size**: 8,000+ breweries
- **Coverage**: United States, with growing international data

## 🔧 Requirements

- Node.js v22 or higher
- npm package manager
- Git

## 📚 Data Schema

Each brewery entry contains the following fields:

| Field | Type | Description | Required |
|-------|------|-------------|-----------|
| id | String | Unique identifier | Yes |
| name | String | Name of the brewery | Yes |
| brewery_type | String | Type of brewery (micro, regional, brewpub, etc.) | Yes |
| street | String | Street address | No |
| city | String | City | Yes |
| state_province | String | State/Province | Yes |
| postal_code | String | Postal code | Yes |
| country | String | Country | Yes |
| longitude | String | Decimal longitude coordinate | No |
| latitude | String | Decimal latitude coordinate | No |
| phone | String | Phone number | No |
| website_url | String | Website URL | No |
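
Since `longitude` and `latitude` are stored as strings and are optional, consumers usually need to convert them before doing any geographic work. Below is a small Python sketch of one way to do that. The `parse_coordinates` helper and its range checks are assumptions for illustration and are not part of the dataset tooling.

```python
def parse_coordinates(row):
    """Return (latitude, longitude) as floats, or None if missing or invalid."""
    try:
        lat = float(row.get("latitude") or "")
        lon = float(row.get("longitude") or "")
    except ValueError:
        # Empty or non-numeric values cannot be converted
        return None
    # Reject values outside the valid coordinate ranges (this also rejects NaN)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


# Made-up rows for illustration
print(parse_coordinates({"latitude": "32.7", "longitude": "-117.1"}))
print(parse_coordinates({"latitude": "", "longitude": "-117.1"}))
```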

## 📖 Usage Examples

### Python
```python
import pandas as pd

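# Read CSV; dtype=str keeps every column as text, matching the schema
# (this preserves leading zeros in postal codes and phone formatting)
breweries_df = pd.read_csv('breweries.csv', dtype=str)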

# Filter by state
california_breweries = breweries_df[breweries_df['state_province'] == 'California']
```

### JavaScript/Node.js
```javascript
const fs = require('fs');

// Read JSON
const breweries = JSON.parse(fs.readFileSync('breweries.json', 'utf8'));

// Filter by type
const microBreweries = breweries.filter(b => b.brewery_type === 'micro');
```

### SQL
```sql
-- After importing breweries.sql
SELECT name, city, state_province
FROM breweries
WHERE brewery_type = 'brewpub'
ORDER BY state_province, city;
```

## 🔄 Versioning

The dataset is updated regularly through community contributions. Each update goes through the following process:

1. Community members submit new breweries or updates via pull requests
2. Changes are reviewed and validated
3. Upon approval, changes are merged and new dataset files are generated
4. The API is automatically updated with the new data

Latest dataset version: 2024.1

## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):



| Contributor | Contributions |
|-------------|---------------|
| Mike Putnam | 🔣 |
| Andrew A. Barber | 🔣 |
| Jason Allen | 🔣 |
| Juicob | 🔣 |
| Will Karnasiewicz | 🔣 |
| Dylan T. Vavra | 🔣 |
| Madison Martinez | 🔣 |
| Daniel Eremchuk | 🔣 |
| Alex Chong | 🔣 |
| Matt S | 🔣 |
| Samuel Rusher | 🔣 |
| Evan Caraway | 🔣 |
| Tyler K Kuromiya Parker | 🔣 |
| kendellmendoza | 🔣 |
| Johnnyk737 | 🔣 |
| James Schuler | 🔣 |
| Creighton Leif | 🔣 |
| Vitaly Tomilov | 💻 |
| Kyle Scudder | 🔣 |
| Chris Mears | 💬 💻 🔣 🚧 📆 🔧 |
| donkeyslaps | 🔣 |
| Pranav Davar | 🔧 |
| Alexandre Hernandes Barrozo | 🔣 |
| Resten | 🔣 |
| Matt Higgins | 🔣 |
| Alex Justesen | 🔣 |
| Craig Kelly | 🔣 |
| Krzysztof Rewak | 🔣 |
| John Baumert | 🔣 |
| Charlie Cox | 🔣 |
| Miles Kane | 🔣 |
| Anthony Laflamme | 💻 |
| Georg Engelsmann | 🔣 |
| Clinton Williams | 🔣 |
| Brent Busby | 🔣 |
| kenster89 | 🔣 |
| Adilet Sarsembayev | 🔣 |
| Pranav Davar | 🔣 |
| b-mc2 | 🔣 |
| Nicole | 🔣 |
| Nicholas Hance | 🔣 |
| Joachim Nilsson | 🔣 |
| Alejandro Lopez Rocha | 🔣 |
| zshapleigh | 🔣 |
| Praval Visvanath | 🔣 |
| JohnHenry | 🔣 |
| Alfredo Garcia | 🔣 |
| Qerewe | 🔣 |
| Nathan Peters | 🔣 |
| Erich Cervantez | 🔣 |
| Ronald Sahagun | 🔣 |
| Greg W. | 🔣 |
| David Holm | 🔣 |
| sadilett | 🔣 |
| Ryan Mallette | 🔣 |
| Chris Condreay | 🔣 |
| Wi5ARD | 🔣 |
| JP Bulman | 🔣 |
| Sara | 🔣 |
| Sean | 🔣 |

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!

## 📊 Statistics

> Last updated: 2024-11-01

### Overview
- Total Breweries: 8,355
- Data Completeness: 78.0%

### 🏛 Top 10 States by Brewery Count
| State | Count |
|-------|-------|
| California | 918 |
| Washington | 486 |
| Colorado | 448 |
| New York | 419 |
| Michigan | 375 |
| Texas | 352 |
| Pennsylvania | 345 |
| Florida | 312 |
| North Carolina | 307 |
| Ohio | 303 |

### 🍺 Brewery Types Distribution
| Type | Count | Percentage |
|------|--------|------------|
| micro | 4,305 | 51.5% |
| brewpub | 2,500 | 29.9% |
| planning | 684 | 8.2% |
| regional | 225 | 2.7% |
| closed | 216 | 2.6% |
| contract | 192 | 2.3% |
| large | 90 | 1.1% |
| proprietor | 69 | 0.8% |
| bar | 37 | 0.4% |
| taproom | 20 | 0.2% |
| nano | 13 | 0.2% |
| beergarden | 3 | 0.0% |
| location | 1 | 0.0% |
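
A distribution table like the one above can be reproduced from the dataset with a simple count. The following Python sketch shows one way to do it. The `type_distribution` helper and the sample rows are assumptions for illustration and do not reflect how `npm run generate:stats` is implemented.

```python
from collections import Counter


def type_distribution(rows):
    """Return (brewery_type, count, percentage) tuples, most common first."""
    counts = Counter(row["brewery_type"] for row in rows)
    total = sum(counts.values())
    return [
        (brewery_type, count, round(100 * count / total, 1))
        for brewery_type, count in counts.most_common()
    ]


# Made-up sample rows: three micro breweries and one brewpub
sample_rows = [
    {"brewery_type": "micro"},
    {"brewery_type": "micro"},
    {"brewery_type": "brewpub"},
    {"brewery_type": "micro"},
]
for brewery_type, count, percentage in type_distribution(sample_rows):
    print(f"{brewery_type}: {count} ({percentage}%)")
```

The same arithmetic applies to the table above: 4,305 micro breweries out of 8,355 entries is about 51.5%.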

### 🌆 Top 10 Cities by Brewery Count
| City | Count |
|------|--------|
| Denver, Colorado | 92 |
| San Diego, California | 91 |
| Portland, Oregon | 85 |
| Seattle, Washington | 80 |
| Chicago, Illinois | 64 |
| Austin, Texas | 49 |
| Houston, Texas | 40 |
| San Francisco, California | 39 |
| Minneapolis, Minnesota | 38 |
| Cincinnati, Ohio | 34 |

### 📋 Data Completeness by Field
| Field | Completeness |
|-------|-------------|
| name | 100.0% |
| brewery_type | 100.0% |
| city | 100.0% |
| state_province | 100.0% |
| postal_code | 100.0% |
| country | 100.0% |
| address_1 | 91.0% |
| phone | 90.0% |
| website_url | 86.0% |
| longitude | 72.0% |
| latitude | 72.0% |
| address_2 | 1.0% |
| address_3 | 0.0% |
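
Per-field completeness can be estimated as the share of rows with a non-empty value in each column. The Python sketch below shows that calculation under the assumption that "complete" means non-empty after trimming whitespace. The project's own definition is not stated here, and the `field_completeness` helper and sample rows are hypothetical.

```python
def field_completeness(rows, fields):
    """Return {field: percentage of rows with a non-empty value}."""
    total = len(rows)
    result = {}
    for field in fields:
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        # Avoid dividing by zero when there are no rows
        result[field] = round(100 * filled / total, 1) if total else 0.0
    return result


# Made-up sample rows: every row has a name, three of four have a phone
sample_rows = [
    {"name": "Example Brewing", "phone": "5550100"},
    {"name": "Sample Ales", "phone": ""},
    {"name": "Demo Cider", "phone": "5550101"},
    {"name": "Test Taproom", "phone": "5550102"},
]
print(field_completeness(sample_rows, ["name", "phone"]))
```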