{"id":13684261,"url":"https://github.com/cmusam/fortune500","last_synced_at":"2025-04-30T20:33:50.274Z","repository":{"id":53474861,"uuid":"163062571","full_name":"cmusam/fortune500","owner":"cmusam","description":"Fortune 500 company lists since 1955 in CSV format, mostly parsed using Beautiful Soup","archived":false,"fork":false,"pushed_at":"2021-03-29T18:24:56.000Z","size":3472,"stargazers_count":86,"open_issues_count":4,"forks_count":99,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-12T05:37:16.578Z","etag":null,"topics":["business","csv","economics","finance","fortune"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmusam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-12-25T08:22:36.000Z","updated_at":"2024-08-24T10:58:09.000Z","dependencies_parsed_at":"2022-09-03T13:22:31.455Z","dependency_job_id":null,"html_url":"https://github.com/cmusam/fortune500","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmusam%2Ffortune500","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmusam%2Ffortune500/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmusam%2Ffortune500/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmusam%2Ffortune500/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmusam","download_url":"https://codeload.github.com/cmusam/fortune500/tar.gz/refs/heads/master","host":{"name":"GitHub","ur
l":"https://github.com","kind":"github","repositories_count":251777732,"owners_count":21642214,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["business","csv","economics","finance","fortune"],"created_at":"2024-08-02T14:00:31.647Z","updated_at":"2025-04-30T20:33:50.269Z","avatar_url":"https://github.com/cmusam.png","language":"HTML","funding_links":[],"categories":["Datasets"],"sub_categories":[],"readme":"# Fortune 500 company lists (1955-2019)\n\n## Usage\nThe dataset is in the `csv/` directory.\n\n\u003e The Fortune 500 is an annual list compiled and published by [Fortune](https://en.wikipedia.org/wiki/Fortune_(magazine)) magazine that ranks 500 of the largest United States corporations by total revenue for their respective fiscal years.\n\n## How is this dataset collected?\nThe data come from a variety of sources, as I failed to find a single complete dataset that contains all lists from 1955 to 2018.\n\n## 2019-\nI'll update the lists from 2019 onward manually.\n- 2019: https://fortune.com/fortune500/2019/search/\n\n\n## 2015-2018\nhttp://fortune.com/fortune500/2015/list only loads the top 20 companies. More rows can be loaded by scrolling to the bottom of the page.\n\n1. On the webpage, open [Developer Tools](https://developers.google.com/web/tools/chrome-devtools/).\n2. Scroll to the bottom of the page to load the next 30 companies (ranked 21 through 50).\n3. In the **Network** panel, you can find a request whose type is [Fetch](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch).\n4. 
Right-click the request to reveal the link `http://fortune.com/api/v2/list/1141696/expand/item/ranking/asc/20/30`\n5. After inspecting, we find that `/20/30` means **skip 20 and take 30**, equivalent to getting rows 21 through 50.\n6. It seems this API returns at most 100 rows per call, so we can access `http://fortune.com/api/v2/list/1141696/expand/item/ranking/asc/0/100` to get the first 100 companies, `http://fortune.com/api/v2/list/1141696/expand/item/ranking/asc/100/100` to get the next 100, and so on.\n7. Finally, use the Python `json` package to parse the JSON files and build the CSV files.\n\nData source:\n- homepage for 2015: http://fortune.com/fortune500/2015/list\n- 1-100: http://fortune.com/api/v2/list/1141696/expand/item/ranking/asc/0/100\n- 101-200: http://fortune.com/api/v2/list/1141696/expand/item/ranking/asc/100/100\n- 201-300: http://fortune.com/api/v2/list/1141696/expand/item/ranking/asc/200/100\n- 301-400: http://fortune.com/api/v2/list/1141696/expand/item/ranking/asc/300/100\n- 401-500: http://fortune.com/api/v2/list/1141696/expand/item/ranking/asc/400/100\n\n\n## 2013-2014\nThe data are from [FortuneChina.com](http://www.FortuneChina.com), the official website of Fortune magazine for China.\n\nData source:\n```Python3\nurl_2013 = 'http://www.fortunechina.com/fortune500/c/2013-05/06/content_154796.htm'\nurl_2014 = 'http://www.fortunechina.com/fortune500/c/2014-06/02/content_207496.htm'\n```\n\n## 2006-2012\nThe data are scraped manually from the sources below, because the HTML pages containing 2006-2012 data do not follow a uniform structure.\n\nData source:\n```Python3\nbase = 'https://money.cnn.com/magazines/fortune/fortune500/{}/full_list/{}.html'\npages = ('index', '101_200', '201_300', '301_400', '401_500')\nurls = [base.format(year, page) for year in range(2006,2013) for page in pages]\n```\n\n## 1955-2005\nHTML sources are downloaded using `urllib`, parsed using [Beautiful 
Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/), and saved as CSV. \n\nData source:\n```Python3\nbase = 'https://money.cnn.com/magazines/fortune/fortune500_archive/full/{}/{}.html'\nurls = [base.format(year, page) for year in range(1955,2006) for page in (1,101,201,301,401)]\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmusam%2Ffortune500","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmusam%2Ffortune500","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmusam%2Ffortune500/lists"}