{"id":14037009,"url":"https://github.com/JoshData/crs-reports-website","last_synced_at":"2025-07-27T04:33:37.390Z","repository":{"id":66126346,"uuid":"65489022","full_name":"JoshData/crs-reports-website","owner":"JoshData","description":"The build process for EveryCRSReport.com.","archived":false,"fork":false,"pushed_at":"2024-11-16T22:05:40.000Z","size":851,"stargazers_count":65,"open_issues_count":12,"forks_count":8,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-11-25T12:09:01.216Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.EveryCRSReport.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JoshData.png","metadata":{"files":{"readme":"README.md","changelog":"history_histogram.py","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-08-11T17:37:35.000Z","updated_at":"2024-11-16T22:05:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"6803b6cb-6c3f-499a-8450-6db057bc3ab0","html_url":"https://github.com/JoshData/crs-reports-website","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fcrs-reports-website","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fcrs-reports-website/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fcrs-reports-website/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fcrs-reports-website/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/JoshData","download_url":"https://codeload.github.com/JoshData/crs-reports-website/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227762388,"owners_count":17816019,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-12T03:02:24.378Z","updated_at":"2024-12-02T16:31:29.918Z","avatar_url":"https://github.com/JoshData.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# EveryCRSReport.com\n\nThis repository builds the website at [EveryCRSReport.com](https://www.everycrsreport.com).\n\nIt's a totally static website. The scripts here generate the static HTML that gets copied into a public URL.\n\n## Local Development\n\nThe website build process is written in Python 3. Prepare your development environment:\n\n\tpip3 install -r requirements.txt\n\nAlthough the full website build requires access to a private source archive of CRS reports, which you probably don't have access to, you can run the core website build process on the public reports. Download some of the reports using the bulk download example script:\n\n\tpython3 bulk-download.py\n\t(CTRL+C at any time once you have as much as you want)\n\nRun the build process:\n\n\t./build.py\n\nwhich generates the static files of the website into the `static-site` directory. 
To view the generated website, you can run:\n\n\t(cd static-site; python3 -m http.server)\n\nand then visit http://localhost:8000/ in your web browser.\n\n\n## Production Site Configuration\n\n### Algolia search account\n\nWe use Algolia.com as a hosted faceted search index service.\n\n* Create an index on Algolia. You'll put the name of the index into `credentials.txt` later.\n* Get the client ID, admin API key (read-write access to the index), and search-only access key (read-only/public access to the index). You'll put these into `credentials.txt` later.\n\n### Server Preparation\n\nInstall packages and make a virtual environment (based on Ubuntu 22.04):\n\n\tsudo apt install python3-virtualenv unzip pandoc msmtp\n\tvirtualenv venv\n\tsource venv/bin/activate\n\tpip install -r requirements.txt\n\nGet the PDF redaction script, install its dependencies, and install QPDF:\n\n\tmkdir lib\n\tcd lib\n\n\twget https://raw.githubusercontent.com/JoshData/pdf-redactor/master/pdf_redactor.py\n\tpip install $(curl https://raw.githubusercontent.com/JoshData/pdf-redactor/master/requirements.txt)\n\n\twget https://github.com/qpdf/qpdf/releases/download/v11.9.1/qpdf-11.9.1-bin-linux-x86_64.zip\n\tunzip -d qpdf qpdf-11.9.1-bin-linux-x86_64.zip\n\n\tcd ..\n\nCreate a new file named `secrets/credentials.txt` and add the Algolia account information:\n\n\tALGOLIA_CLIENT_ID=...\n\tALGOLIA_ADMIN_ACCESS_KEY=...\n\tALGOLIA_SEARCH_ACCESS_KEY=...\n\tALGOLIA_INDEX_NAME=...\n\nCreate a new file named `secrets/credentials.google_service_account.json` and place a Google API service account's JSON credentials in the file. 
The credentials should have access to the EveryCRSReport.com Google Analytics view.\n\nCreate symlinks here for where the source report files are stored and where the static site will be built into:\n\n\tln -s /mnt/volume_nyc1_01/source-reports/ .\n\tln -s /mnt/volume_nyc1_02/processed-reports/ .\n\tln -s /mnt/volume_nyc1_01/static-site/ .\n\nSet up nginx \u0026 certbot:\n\n\tapt install nginx certbot python3-certbot-nginx\n\trmdir /var/www/html # clear it out first\n\tln -s /mnt/volume_nyc1_01/static-site/ /var/www/html\n\tchmod a+rx /home/user/\n\tcertbot -d www.everycrsreport.com\n\n### Running the site generator\n\nTo generate \u0026 update the website, run:\n\n\t./run.sh\n\nUnder the hood, this:\n\n* Prepares the raw files for publication, creating new JSON and sanitizing the HTML and PDFs, saving the new files into `reports/`. This step is quite slow, but it will only process new files on each run. If our code changes and the sanitization process has been changed, delete the whole `reports/` directory so it re-processes everything from scratch. (`process_incoming.py`) \n\n* Queries Google Analytics for top-accessed reports in the last week.\n\n* Generates the complete website in the `static-site/` directory. (`build.py`)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJoshData%2Fcrs-reports-website","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJoshData%2Fcrs-reports-website","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJoshData%2Fcrs-reports-website/lists"}