https://github.com/openbrewerydb/openbrewerydb
🍻 An open-source dataset of breweries, cideries, brewpubs, and bottleshops.
breweries csv dataset hacktoberfest json sql typescript
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/openbrewerydb/openbrewerydb
- Owner: openbrewerydb
- License: mit
- Created: 2020-01-24T01:01:09.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2025-03-29T03:02:19.000Z (12 months ago)
- Last Synced: 2025-03-29T03:27:56.217Z (12 months ago)
- Topics: breweries, csv, dataset, hacktoberfest, json, sql, typescript
- Language: Jupyter Notebook
- Homepage: https://www.openbrewerydb.org
- Size: 30 MB
- Stars: 185
- Watchers: 9
- Forks: 97
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# 🍻 Open Brewery DB Dataset

This is the open-source dataset behind the [Open Brewery DB API](https://www.openbrewerydb.org/), a [REST API built with Ruby on Rails](https://github.com/chrisjm/openbrewerydb-rails-api).
## 🎯 Purpose
Provide an approval-based pipeline to update the dataset and API.
## 🗄 Data Formats
- [CSV - Full Dataset](breweries.csv)
- [JSON](breweries.json)
- [PostgreSQL SQL](breweries.sql)
## 🚀 Getting Started
1. `git clone git@github.com:openbrewerydb/openbrewerydb.git`
2. `cd openbrewerydb && npm install`
## ⚙️ Scripts
The following npm scripts help maintain and manage the dataset:
### Data Management
- `npm run validate`
  - Validates all CSV files against the JSON Schema
  - Checks for required fields and data format consistency
  - Reports any validation errors that need attention
- `npm run csv:combine`
  - Combines all individual CSV files from country/state-region folders into a single `breweries.csv`
  - Useful when you've made changes to individual state files and need to update the main dataset
- `npm run csv:split`
  - Splits the main `breweries.csv` into separate files by country/state-region
  - Helps maintain organized, manageable data files for each region
  - Creates directories if they don't exist
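
As a rough illustration of what splitting and combining involve, the sketch below groups rows by country and state/province and then flattens them back into a single list. It is not the repository's implementation (the npm scripts remain the source of truth); the function names `split_rows` and `combine_rows` and the sample rows are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-in for breweries.csv (sample rows are invented).
SAMPLE_CSV = """name,brewery_type,city,state_province,country
Example Ales,micro,Denver,Colorado,United States
Sample Brewpub,brewpub,Portland,Oregon,United States
Demo Brewing,micro,Boulder,Colorado,United States
"""


def split_rows(rows):
    """Group rows by (country, state_province), mirroring the folder layout."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["country"], row["state_province"])].append(row)
    return dict(groups)


def combine_rows(groups):
    """Flatten the groups back into one list, ordered by country then state."""
    combined = []
    for key in sorted(groups):
        combined.extend(groups[key])
    return combined


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
groups = split_rows(rows)
combined = combine_rows(groups)
```

Splitting and then combining keeps every row but may reorder them, which is one reason to edit either the main file or a regional file, not both.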
### Data Generation
- `npm run generate:ids`
  - Creates unique OBDB IDs for each brewery based on name and city
  - Automatically updates `breweries.csv` with new IDs
  - Ensures no duplicate IDs exist in the dataset
- `npm run generate:json`
  - Converts `breweries.csv` into a JSON format (`breweries.json`)
  - Useful for applications that prefer working with JSON data
  - Maintains data consistency across formats
- `npm run generate:sql`
  - Creates PostgreSQL SQL file from `breweries.csv`
  - Includes table creation and data insertion statements
  - Perfect for database implementations
- `npm run generate:stats`
  - Generates comprehensive dataset statistics
  - Shows brewery counts by state/city
  - Displays brewery type distribution
  - Reports data completeness metrics
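
To give a feel for how an ID can be derived from name and city while staying unique, here is a minimal sketch. The `<name>-<city>` slug format, the numeric suffix on collisions, and the helper names `slugify` and `assign_ids` are all assumptions made for illustration; the actual format produced by `npm run generate:ids` may differ.

```python
import re
import unicodedata


def slugify(text):
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def assign_ids(rows):
    """Fill in a missing `id` from name and city, keeping every ID unique.

    The slug format used here is hypothetical, not the project's real one.
    """
    seen = {row["id"] for row in rows if row.get("id")}
    for row in rows:
        if row.get("id"):
            continue  # never overwrite an existing ID
        base = slugify(f"{row['name']} {row['city']}")
        candidate, n = base, 2
        while candidate in seen:
            candidate = f"{base}-{n}"
            n += 1
        row["id"] = candidate
        seen.add(candidate)
    return rows


# Two invented rows whose names collapse to the same slug.
rows = assign_ids([
    {"id": "", "name": "Café Brewing Co.", "city": "Denver"},
    {"id": "", "name": "Cafe Brewing Co", "city": "Denver"},
])
```

Existing IDs are left untouched, so rerunning the function over an already processed file changes nothing.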
### Contributor Management
- `npm run contributors:add`
  - Interactive CLI tool to add new contributors
  - Prompts for contributor information and contribution type
  - Updates `.all-contributorsrc` file
- `npm run contributors:check`
  - Verifies if any contributors are missing from the list
  - Helps maintain accurate recognition of all contributors
- `npm run contributors:generate`
  - Updates the Contributors section in `README.md`
  - Generates contributor table with avatars and contribution types
### Workflow
- `npm run workflow:maintain`
  - Comprehensive maintenance workflow that:
    1. Validates all CSV files
    2. Combines all CSV files
    3. Generates new IDs if needed
    4. Creates JSON and SQL files
    5. Splits back into individual state files
  - Run this after making any dataset updates
## 🤝 Contributing
For information on contributing to this project, please see the [contributing guide](CONTRIBUTING.md) and our [code of conduct](CODE_OF_CONDUCT.md).
1. Fork the repository
2. Add or update breweries in the CSV (for example, using Excel or Google Sheets)
3. Submit a Pull Request
### Tips
First and foremost, don't worry about messing up! 🙂 Thank you so much for contributing! 🙌
- CSVs are organized by `data/[country]/[state_province]`
- Required fields/columns: `name`, `brewery_type`, `city`, `state_province`, and `country`
- When adding a brewery, do not include an `id`. This will be created after review.
- Please add to either `breweries.csv` (preferred if adding breweries for a new country) or the individual state/province CSV file, not both. Adding to both at the same time may introduce duplicates/errors.
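
If you'd like to sanity-check new rows before opening a pull request, something like the following could help. It's only a sketch of the tips above, not a replacement for `npm run validate`: the `find_problems` helper and the sample rows are invented, and the `id` check only makes sense for rows you are adding, since existing rows already have IDs.

```python
import csv
import io

REQUIRED_FIELDS = ("name", "brewery_type", "city", "state_province", "country")


def find_problems(rows):
    """Return (row_number, message) pairs for new rows that break the tips above."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 is the CSV header
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append((number, f"missing {field}"))
        if (row.get("id") or "").strip():
            problems.append((number, "new entries should leave id blank"))
    return problems


# Invented sample rows: the second one has no city and a hand-written id.
SAMPLE_CSV = """id,name,brewery_type,city,state_province,country
,Example Ales,micro,Denver,Colorado,United States
my-id,Sample Brewpub,brewpub,,Oregon,United States
"""

problems = find_problems(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for number, message in problems:
    print(f"row {number}: {message}")
```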
## 👾 Community
- [Join the Newsletter](http://eepurl.com/dBjS0j)
- [Join the Discord](https://discord.gg/3G3syaD)
## 📫 Feedback
If you have any feedback, please [email me](mailto:chris@openbrewerydb.org).
Cheers! 🍻
## 📊 Project Status
- **Status**: Active
- **Last Dataset Update**: 2024
- **Maintenance**: Actively maintained through community contributions
- **Dataset Size**: 8,000+ breweries
- **Coverage**: United States, with growing international data
## 🔧 Requirements
- Node.js v22 or higher
- npm package manager
- Git
## 📚 Data Schema
Each brewery entry contains the following fields:
| Field | Type | Description | Required |
|-------|------|-------------|-----------|
| id | String | Unique identifier (created after review; leave it out when adding a brewery) | Yes |
| name | String | Name of the brewery | Yes |
| brewery_type | String | Type of brewery (micro, regional, brewpub, etc.) | Yes |
| street | String | Street address | No |
| city | String | City | Yes |
| state_province | String | State/Province | Yes |
| postal_code | String | Postal code | Yes |
| country | String | Country | Yes |
| longitude | String | Decimal longitude coordinate | No |
| latitude | String | Decimal latitude coordinate | No |
| phone | String | Phone number | No |
| website_url | String | Website URL | No |
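
For typed access in Python, the table above could be mirrored with a dataclass along these lines. This is a sketch, not part of the dataset tooling: the `Brewery` class and `from_row` helper are hypothetical, and columns that are not in the table (such as the `address_1` to `address_3` columns listed in the statistics) are simply ignored.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Brewery:
    # Fields marked "Yes" in the Required column
    id: str
    name: str
    brewery_type: str
    city: str
    state_province: str
    postal_code: str
    country: str
    # Fields marked "No" in the Required column
    street: Optional[str] = None
    longitude: Optional[str] = None
    latitude: Optional[str] = None
    phone: Optional[str] = None
    website_url: Optional[str] = None

    @classmethod
    def from_row(cls, row):
        """Build a Brewery from a dict, treating empty strings as missing."""
        known = {f.name for f in fields(cls)}
        cleaned = {k: (v or None) for k, v in row.items() if k in known}
        return cls(**cleaned)


# Invented example row; the values are placeholders, not real data.
brewery = Brewery.from_row({
    "id": "example-id",
    "name": "Example Ales",
    "brewery_type": "micro",
    "address_1": "1 Example St",
    "city": "Denver",
    "state_province": "Colorado",
    "postal_code": "80202",
    "country": "United States",
    "phone": "",
})
```

Coordinates stay as strings here to match the table; convert them with `float()` only after checking they are present.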
## 📖 Usage Examples
### Python
```python
import pandas as pd
# Read CSV
breweries_df = pd.read_csv('breweries.csv')
# Filter by state
california_breweries = breweries_df[breweries_df['state_province'] == 'California']
```
### JavaScript/Node.js
```javascript
const fs = require('fs');
// Read JSON
const breweries = JSON.parse(fs.readFileSync('breweries.json', 'utf8'));
// Filter by type
const microBreweries = breweries.filter(b => b.brewery_type === 'micro');
```
### SQL
```sql
-- After importing breweries.sql
SELECT name, city, state_province
FROM breweries
WHERE brewery_type = 'brewpub'
ORDER BY state_province, city;
```
## 🔄 Versioning
The dataset is updated regularly through community contributions. Each update goes through the following process:
1. Community members submit new breweries or updates via pull requests
2. Changes are reviewed and validated
3. Upon approval, changes are merged and new dataset files are generated
4. The API is automatically updated with the new data
Latest dataset version: 2024.1
## Contributors ✨
Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

- Mike Putnam 🔣
- Andrew A. Barber 🔣
- Jason Allen 🔣
- Juicob 🔣
- Will Karnasiewicz 🔣
- Dylan T. Vavra 🔣
- Madison Martinez 🔣
- Daniel Eremchuk 🔣
- Alex Chong 🔣
- Matt S 🔣
- Samuel Rusher 🔣
- Evan Caraway 🔣
- Tyler K Kuromiya Parker 🔣
- kendellmendoza 🔣
- Johnnyk737 🔣
- James Schuler 🔣
- Creighton Leif 🔣
- Vitaly Tomilov 💻
- Kyle Scudder 🔣
- Chris Mears 💬 💻 🔣 🚧 📆 🔧 ✅
- donkeyslaps 🔣
- Pranav Davar 🔧
- Alexandre Hernandes Barrozo 🔣
- Resten 🔣
- Matt Higgins 🔣
- Alex Justesen 🔣
- Craig Kelly 🔣
- Krzysztof Rewak 🔣
- John Baumert 🔣
- Charlie Cox 🔣
- Miles Kane 🔣
- Anthony Laflamme 💻
- Georg Engelsmann 🔣
- Clinton Williams 🔣
- Brent Busby 🔣
- kenster89 🔣
- Adilet Sarsembayev 🔣
- Pranav Davar 🔣
- b-mc2 🔣
- Nicole 🔣
- Nicholas Hance 🔣
- Joachim Nilsson 🔣
- Alejandro Lopez Rocha 🔣
- zshapleigh 🔣
- Praval Visvanath 🔣
- JohnHenry 🔣
- Alfredo Garcia 🔣
- Qerewe 🔣
- Nathan Peters 🔣
- Erich Cervantez 🔣
- Ronald Sahagun 🔣
- Greg W. 🔣
- David Holm 🔣
- sadilett 🔣
- Ryan Mallette 🔣
- Chris Condreay 🔣
- Wi5ARD 🔣
- JP Bulman 🔣
- Sara 🔣
- Sean 🔣

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
## 📊 Statistics
> Last updated: 2024-11-01
### Overview
- Total Breweries: 8,355
- Data Completeness: 78.0%
### 🏛 Top 10 States by Brewery Count
| State | Count |
|-------|-------|
| California | 918 |
| Washington | 486 |
| Colorado | 448 |
| New York | 419 |
| Michigan | 375 |
| Texas | 352 |
| Pennsylvania | 345 |
| Florida | 312 |
| North Carolina | 307 |
| Ohio | 303 |
### 🍺 Brewery Types Distribution
| Type | Count | Percentage |
|------|--------|------------|
| micro | 4,305 | 51.5% |
| brewpub | 2,500 | 29.9% |
| planning | 684 | 8.2% |
| regional | 225 | 2.7% |
| closed | 216 | 2.6% |
| contract | 192 | 2.3% |
| large | 90 | 1.1% |
| proprietor | 69 | 0.8% |
| bar | 37 | 0.4% |
| taproom | 20 | 0.2% |
| nano | 13 | 0.2% |
| beergarden | 3 | 0.0% |
| location | 1 | 0.0% |
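
A table like the one above can be reproduced from the dataset with a few lines of Python. This is a minimal sketch under the assumption that each row is a dict with a `brewery_type` key; the `type_distribution` helper and the sample rows are invented, and `npm run generate:stats` remains the source of the published figures.

```python
from collections import Counter


def type_distribution(rows):
    """Return (type, count, percentage) tuples, most common first."""
    counts = Counter(row["brewery_type"] for row in rows)
    total = sum(counts.values())
    return [
        (brewery_type, count, round(100 * count / total, 1))
        for brewery_type, count in counts.most_common()
    ]


# Invented sample; in practice `rows` would come from breweries.csv or breweries.json.
sample = [
    {"brewery_type": t}
    for t in ["micro", "micro", "brewpub", "micro", "regional"]
]
distribution = type_distribution(sample)
```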
### 🌆 Top 10 Cities by Brewery Count
| City | Count |
|------|--------|
| Denver, Colorado | 92 |
| San Diego, California | 91 |
| Portland, Oregon | 85 |
| Seattle, Washington | 80 |
| Chicago, Illinois | 64 |
| Austin, Texas | 49 |
| Houston, Texas | 40 |
| San Francisco, California | 39 |
| Minneapolis, Minnesota | 38 |
| Cincinnati, Ohio | 34 |
### 📋 Data Completeness by Field
| Field | Completeness |
|-------|-------------|
| name | 100.0% |
| brewery_type | 100.0% |
| city | 100.0% |
| state_province | 100.0% |
| postal_code | 100.0% |
| country | 100.0% |
| address_1 | 91.0% |
| phone | 90.0% |
| website_url | 86.0% |
| longitude | 72.0% |
| latitude | 72.0% |
| address_2 | 1.0% |
| address_3 | 0.0% |
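
Per-field completeness can be approximated as the share of rows with a non-empty value. The sketch below assumes that definition, which may not match exactly how the figures above were produced; the `completeness` helper and the sample rows are invented, and it does not attempt to reproduce the overall completeness figure.

```python
def completeness(rows, field_names):
    """Percentage of rows with a non-empty value, per field."""
    total = len(rows)
    result = {}
    for field in field_names:
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        result[field] = round(100 * filled / total, 1) if total else 0.0
    return result


# Invented sample rows with some gaps in phone and longitude.
sample = [
    {"name": "A", "phone": "555-0100", "longitude": "-104.99"},
    {"name": "B", "phone": "555-0101", "longitude": ""},
    {"name": "C", "phone": "", "longitude": "-122.67"},
    {"name": "D", "phone": "555-0103", "longitude": ""},
]
report = completeness(sample, ["name", "phone", "longitude"])
```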