https://github.com/rtlee9/sic-list
List of SIC codes and descriptions from authoritative sources
beautifulsoup industry-classification web-scraping
- Host: GitHub
- URL: https://github.com/rtlee9/sic-list
- Owner: rtlee9
- License: apache-2.0
- Created: 2016-08-15T00:37:57.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2017-03-14T02:30:46.000Z (over 9 years ago)
- Last Synced: 2025-05-08T04:52:06.981Z (about 1 year ago)
- Topics: beautifulsoup, industry-classification, web-scraping
- Language: Python
- Size: 888 KB
- Stars: 12
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SIC codes for download -- open source edition
This repo provides lists of four-digit SIC codes scraped from the websites of two government agencies: the [SEC](https://www.sec.gov/info/edgar/siccodes.htm) and [OSHA](https://www.osha.gov/pls/imis/sic_manual.html). The cleaned lists are available as CSV downloads: the [SEC list](https://raw.githubusercontent.com/rtlee9/SIC-list/master/data/sec_combined.csv) and the [OSHA list](https://raw.githubusercontent.com/rtlee9/SIC-list/master/data/osha_combined.csv). Refresh instructions are in the Usage section below.
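For readers who want to work with a downloaded copy of either CSV, here is a minimal sketch of one way to load it. It is written for Python 3 (not the Python 2.7 the scraper targets) and assumes the file begins with a header row; the function names `read_sic_rows` and `load_sic_csv` are illustrative and not part of this repo. Values are kept as strings so that any leading zeros in the codes survive.

```python
import csv
import io


def read_sic_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the file's own header row.

    Assumption: the first line of the text is a header row.
    Every value stays a string, so a code such as "0111" is not turned into 111.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


def load_sic_csv(path):
    """Read a locally saved copy of one of the CSV files (hypothetical helper)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return read_sic_rows(handle.read())
```

After saving one of the files locally, `load_sic_csv("sec_combined.csv")` would return one dict per row, with keys taken from whatever column names the file actually uses.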
## Background
The Standard Industrial Classification (SIC) is a system used to classify businesses by their primary business activity, or industry. The SIC system was created in the 1930s and has since been [replaced](https://www.census.gov/eos/www/naics/faqs/faqs.html#q8) as the industry classification system for Federal statistical agencies; however, it is still widely used by many businesses and by some government agencies.
## Authoritative sources
SIC codes were once maintained and assigned by the US government. I've found that only two government agencies currently publish a list of SIC codes and descriptions:
| Source | Version | Use case |
| ------ | ------- | -------- |
| [Occupational Safety & Health Administration (OSHA)](https://www.osha.gov/pls/imis/sic_manual.html) | 1987 SIC manual | Unknown |
| [U.S. Securities and Exchange Commission (SEC)](https://www.sec.gov/info/edgar/siccodes.htm) | No version provided, but the SEC website indicates the webpage was last modified January 25, 2015 | Used in [EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch.html) electronic filings |
The SIC codes provided by the SEC generally align with those provided by OSHA; however, OSHA's SIC manual is more comprehensive -- it contains many more SIC codes than does the SEC's list.
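The overlap between the two lists can be checked with plain set operations. The sketch below is illustrative only: `normalize_code` and `compare_code_sets` are hypothetical helpers, not code from this repo, and the left-padding step assumes that a code written without its leading zero (for example `100`) refers to the same code as `0100`.

```python
def normalize_code(code):
    """Strip whitespace and left-pad to four digits (assumed equivalence of '100' and '0100')."""
    return str(code).strip().zfill(4)


def compare_code_sets(sec_codes, osha_codes):
    """Return the codes shared by both lists and the codes unique to each one."""
    sec = {normalize_code(code) for code in sec_codes}
    osha = {normalize_code(code) for code in osha_codes}
    return {
        "shared": sorted(sec & osha),
        "sec_only": sorted(sec - osha),
        "osha_only": sorted(osha - sec),
    }
```

Passing in the code column from each CSV yields three sorted lists; a long `osha_only` list relative to `sec_only` would be consistent with the observation above that OSHA's manual is the more comprehensive source.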
## Other sources
There are a number of online sources that provide SIC codes and descriptions, though I've found none that provide all of the following:
* The source of their data
* Their code, if relevant
* Machine readable data
Taken together, these are important for assessing data quality and reliability. The purpose of this repository is to provide SIC codes in adherence with these standards.
## Usage
The latest data can be found in the `data` directory. To refresh:
1. Install Python 2.7
1. Install the Python requirements: `$ pip install -r requirements.txt`
1. From the command line run `$ python src/main.py`
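After a refresh, it can be worth sanity-checking the scraped codes before relying on them. The following is a rough sketch, not part of this repo: `find_malformed_codes` is a hypothetical helper, and it assumes a well-formed code is one to four digits (allowing for sources that drop leading zeros).

```python
import re

# Assumption: a well-formed SIC code is one to four digits once whitespace is stripped.
CODE_PATTERN = re.compile(r"^\d{1,4}$")


def find_malformed_codes(codes):
    """Return the entries that do not look like SIC codes, in input order."""
    return [code for code in codes if not CODE_PATTERN.match(str(code).strip())]
```

An empty result suggests the code column parsed cleanly; anything returned is a candidate for a scraping or cleaning problem.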
## License
[Apache License 2.0](LICENSE)