{"id":13554072,"url":"https://github.com/hypothesis/via","last_synced_at":"2025-08-15T04:38:26.584Z","repository":{"id":1024641,"uuid":"225935491","full_name":"hypothesis/via","owner":"hypothesis","description":"Proxies third-party PDF files and HTML pages with the Hypothesis client embedded, so you can annotate them","archived":false,"fork":false,"pushed_at":"2025-08-04T08:06:16.000Z","size":14109,"stargazers_count":23,"open_issues_count":49,"forks_count":9,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-08-04T10:58:54.762Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://via.hypothes.is/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hypothesis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-12-04T18:51:25.000Z","updated_at":"2025-08-04T08:02:40.000Z","dependencies_parsed_at":"2023-10-02T07:49:04.182Z","dependency_job_id":"73baa464-df2c-48af-b20f-ad9db8b83dc8","html_url":"https://github.com/hypothesis/via","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hypothesis/via","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hypothesis%2Fvia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hypothesis%2Fvia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hypothesis%2Fvia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hypothesis%2Fvia/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hypothesis","download_url":"https://codeload.github.com/hypothesis/via/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hypothesis%2Fvia/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270524428,"owners_count":24600195,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-15T02:00:12.559Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T12:02:39.059Z","updated_at":"2025-08-15T04:38:26.565Z","avatar_url":"https://github.com/hypothesis.png","language":"Python","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"\u003ca href=\"https://github.com/hypothesis/via/actions/workflows/ci.yml?query=branch%3Amain\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/hypothesis/via/ci.yml?branch=main\"\u003e\u003c/a\u003e\n\u003ca\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.11-success\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/hypothesis/via/blob/main/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-BSD--2--Clause-success\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/hypothesis/cookiecutters/tree/main/pyramid-app\"\u003e\u003cimg 
src=\"https://img.shields.io/badge/cookiecutter-pyramid--app-success\"\u003e\u003c/a\u003e\n\u003ca href=\"https://black.readthedocs.io/en/stable/\"\u003e\u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000\"\u003e\u003c/a\u003e\n\n# Via\n\nAn app that proxies web pages and PDF files and injects the Hypothesis client so you can annotate them.\n\n## Setting up Your Via Development Environment\n\nFirst you'll need to install:\n\n* [Git](https://git-scm.com/).\n  On Ubuntu: `sudo apt install git`, on macOS: `brew install git`.\n* [GNU Make](https://www.gnu.org/software/make/).\n  This is probably already installed, run `make --version` to check.\n* [pyenv](https://github.com/pyenv/pyenv).\n  Follow the instructions in pyenv's README to install it.\n  The **Homebrew** method works best on macOS.\n  The **Basic GitHub Checkout** method works best on Ubuntu.\n  You _don't_ need to set up pyenv's shell integration (\"shims\"), you can\n  [use pyenv without shims](https://github.com/pyenv/pyenv#using-pyenv-without-shims).\n* [Docker Desktop](https://www.docker.com/products/docker-desktop/).\n  On Ubuntu follow [Install on Ubuntu](https://docs.docker.com/desktop/install/ubuntu/).\n  On macOS follow [Install on Mac](https://docs.docker.com/desktop/install/mac-install/).\n* [Node](https://nodejs.org/) and npm.\n  On Ubuntu: `sudo snap install --classic node`.\n  On macOS: `brew install node`.\n* [Yarn](https://yarnpkg.com/): `sudo npm install -g yarn`.\n\nThen to set up your development environment:\n\n```terminal\ngit clone https://github.com/hypothesis/via.git\ncd via\nmake services\nmake devdata\nmake help\n```\n\nTo run Via locally run `make dev` and visit http://localhost:9083.\n\n## Changing the Project's Python Version\n\nTo change what version of Python the project uses:\n\n1. Change the Python version in the\n   [cookiecutter.json](.cookiecutter/cookiecutter.json) file. For example:\n\n   ```json\n   \"python_version\": \"3.10.4\",\n   ```\n\n2. 
Re-run the cookiecutter template:\n\n   ```terminal\n   make template\n   ```\n\n3. Re-compile the `requirements/*.txt` files.\n   This is necessary because the same `requirements/*.in` file can compile to\n   different `requirements/*.txt` files in different versions of Python:\n\n   ```terminal\n   make requirements\n   ```\n\n4. Commit everything to git and send a pull request\n\n## Changing the Project's Python Dependencies\n\n### To Add a New Dependency\n\nAdd the package to the appropriate [`requirements/*.in`](requirements/)\nfile(s) and then run:\n\n```terminal\nmake requirements\n```\n\n### To Remove a Dependency\n\nRemove the package from the appropriate [`requirements/*.in`](requirements)\nfile(s) and then run:\n\n```terminal\nmake requirements\n```\n\n### To Upgrade or Downgrade a Dependency\n\nWe rely on [Dependabot](https://github.com/dependabot) to keep all our\ndependencies up to date by sending automated pull requests to all our repos.\nBut if you need to upgrade or downgrade a package manually you can do that\nlocally.\n\nTo upgrade a package to the latest version in all `requirements/*.txt` files:\n\n```terminal\nmake requirements --always-make args='--upgrade-package \u003cFOO\u003e'\n```\n\nTo upgrade or downgrade a package to a specific version:\n\n```terminal\nmake requirements --always-make args='--upgrade-package \u003cFOO\u003e==\u003cX.Y.Z\u003e'\n```\n\nTo upgrade **all** packages to their latest versions:\n\n```terminal\nmake requirements --always-make args=--upgrade\n```\n\nConfiguration\n-------------\n\nEnvironment variables:\n\n| Name                       | Purpose                                                                                                                                                                                                                                                             | Example                             
|\n|----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|\n| `CHECKMATE_URL`            | The URL of the URL Checkmate instance to use                                                                                                                                                                                                                        | `https://checkmate.example.com`     |\n| `CHECKMATE_API_KEY`        | API key to authenticate with Checkmate                                                                                                                                                                                                                              |                                     |\n| `CHECKMATE_ALLOW_ALL`      | Whether to bypass Checkmate's allow-list (and use only the blocklist)                                                                                                                                                                                               | `true`                              |\n| `CHECKMATE_IGNORE_REASONS` | Comma-separated list of Checkmate block reasons to ignore                                                                                                                                                                                                           | `publisher-blocked,high-io`         |\n| `CLIENT_EMBED_URL`         | The URL of the client's embed script                                                                                                                                                                                                                                | `https://hypothes.is/embed.js`      |\n| 
`DATA_DIRECTORY`           | Directory for externally provided data                                                                                                                                                                                                                              | `/via-data`                         |\n| `ENABLE_FRONT_PAGE`        | Show a front page at the root URL                                                                                                                                                                                                                                   | `true`                              |\n| `NEW_RELIC_*`              | Various New Relic settings. See New Relic's docs for details                                                                                                                                                                                                        |                                     |\n| `NGINX_SECURE_LINK_SECRET` | The NGINX secure links signing secret. This is used by Via's Python endpoints to generate the signed URLs required by its NGINX-implemented `/proxy/static/` endpoint. All instances of Via must have this setting                                                  |                                     |\n| `NGINX_SERVER`             | The URL of Via's NGINX server for proxying PDF files                                                                                                                                                                                                                | `https://via.hypothes.is`           |\n| `SENTRY_*`                 | Various Sentry settings. 
See Sentry's docs for details                                                                                                                                                                                                              |                                     |\n| `SIGNED_URLS_REQUIRED`     | Require URLs to Via's Python endpoints to be signed so that Via can only be used by something that has the URL signing secret. Public instances of Via should _not_ enable this. Private instances of Via (e.g. the LMS app's instance of Via) _should_ enable this | `true`                              |\n| `VIA_HTML_URL`             | The URL of the Via HTML instance to redirect to for proxying HTML pages                                                                                                                                                                                             | `https://viahtml.hypothes.is/proxy` |\n| `VIA_SECRET`               | The secret that must be used to sign URLs to Via's Python endpoints if `SIGNED_URLS_REQUIRED` is on                                                                                                                                                                 |                                     |\n\nExpected data:\n\nThe following data is expected to be provided in the `DATA_DIRECTORY`:\n\n * `google_drive_credentials.json` - A list of credential JSON objects provided by the Google API console\n * `google_drive_resource_keys.json` - A dict of file ids to resource keys\n\nError codes\n-----------\n\nMost error codes have their natural meanings, but there are some special cases\nthat we catch. Those marked with **(error)** are from our point of view \nsomething which is worth investigation. 
Others are a normal part of the \nday-to-day.\n\n### All end-points:\n\n * 401 - The user needs a secure token and either they don't have one, or it's \n   invalid\n\n### General end-points:\n\n * 400 - The user has made a mistake in calling us or given us a bad URL\n * 408 - We timed out accessing a website\n * 409 - A website we have called gave us a conventional error like a \n   connection error. Trying again could help.\n * 417 - A website has given us an unexpected response. This could be anything.\n   Trying again could help.\n\n### `/google_drive/*`\n\n * 403 - The user has given us a URL which doesn't grant us permission to \n   download it\n * 404 - The user has given us an invalid URL or one we _really_ don't have \n   permission to see\n * **408 (error)** - We timed out trying to make a connection to Google \n * **409 (error)** - We had a conventional error like a connection error\n * **417 (error)** - We got an unexpected response from Google\n * **423 (error)** - Google has blocked the file as malicious\n * **429 (error)** - We have been rate limited by Google\n\nUpdating the PDF viewer\n-----------------------\n\nVia serves PDFs using [PDF.js](https://mozilla.github.io/pdf.js/). PDF.js is\nvendored into the source tree and the viewer HTML is patched to load the Hypothesis\nclient. To update the PDF viewer, run `make update-pdfjs`.\n\nHow Via works\n-------------\n\nVia allows users to annotate arbitrary web pages or PDF files by proxying the\npage or file and injecting the Hypothesis client. Users go to\n\u003chttps://via.hypothes.is/\u003e and paste in a PDF or HTML URL (or visit\n`https://via.hypothes.is/\u003cSOME_URL\u003e` directly) and Via responds with an\nannotatable version.\n\n### Via's architecture\n\nVia is composed of four separable components:\n\n1. 
A **top-level component** that responds to requests to the top-level\n   `/\u003cTHIRD_PARTY_URL\u003e` endpoint by deciding whether the URL is a PDF file or\n   not and redirecting the browser to either the PDF viewer component or the\n   HTML proxying component accordingly.\n\n   This component is implemented in Python / Pyramid.\n\n   The Pyramid app sends a GET request to the third-party URL but only\n   downloads the response headers not the body. It looks at the Content-Type\n   header to determine whether the body is a PDF file or not.\n\n   If it's a PDF file then it redirects to the PDF viewer component:\n   `/pdf/\u003cTHIRD_PARTY_URL\u003e`.\n\n   If it's an HTML file then it redirects to the HTML proxy component.\n\n   The Pyramid app also handles various other bits and bobs such as serving up\n   the front page, handling special `via.*` query params, serving static files\n   such as PDF.js's assets, etc etc.\n\n2. A **PDF viewer component** that renders a modified version of PDF.js with the Hypothesis client embedded.\n\n   This is what enables users to annotate PDF files.\n\n   The PDF viewer is also implemented in Python / Pyramid (and JavaScript served by the Pyramid app).\n\n   The PDF viewer responds to requests to `/pdf/\u003cTHIRD_PARTY_URL\u003e` by rendering\n   a version of [PDF.js](https://mozilla.github.io/pdf.js/) with the Hypothesis\n   client embedded, and configuring PDF.js to download the PDF file from the\n   static file proxy component.\n\n3. 
A **static files proxy component** that simply proxies static files to get around CORS.\n\n   This component is implemented in NGINX (in the [nginx.conf](nginx/nginx.conf) file) for efficiency.\n\n   This component responds to requests to the `/proxy/static/\u003cTHIRD_PARTY_URL\u003e`\n   endpoint, such as PDF.js's download requests for PDF files.\n\n   Many PDF hosts use\n   [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) headers to\n   prevent JavaScript cross-origin requests (such as requests from our copy of\n   PDF.js) from downloading the file.\n   See [Can I load a PDF from another server (cross domain request)?](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#can-i-load-a-pdf-from-another-server-cross-domain-request)\n   in the PDF.js FAQ.\n\n   To get around this we proxy the PDF file through our own server so that\n   browsers no longer see PDF.js's download request as a cross-origin request.\n\n   In the future we'll also use this component to proxy some static resources\n   of web pages for the same reason.\n\n4. A **rewriting HTML proxy component** that proxies HTML pages and injects the Hypothesis client.\n\n   This is what enables users to annotate web pages.\n\n   The HTML proxy isn't implemented yet. 
Via currently redirects to legacy Via for HTML proxying.\n\n   The HTML proxy's job is to enable annotating of HTML pages by proxying the\n   page and injecting the Hypothesis client into it.\n\n   It also has to rewrite various elements of the page that would otherwise\n   break because the page is being proxied.\n\n### How Via works in production\n\nIn production both NGINX and Gunicorn (the WSGI server for the Python / Pyramid\napp) run inside a single Docker container defined by the app's `Dockerfile`.\n\nNGINX runs on port 9083 in the Docker container, which is exposed to the\noutside world.\n\nGunicorn runs on a UNIX socket that is accessible to NGINX within the Docker\ncontainer but is not directly accessible to the outside world.\n\nNGINX is \"in front of\" Gunicorn in production:\n\n1. All requests from users' browsers first go to NGINX on the Docker container's port 9083.\n\n2. If the request is to a URL that NGINX handles directly (such as a\n   `/proxy/static/*` URL) then NGINX just responds directly.\n\n3. If the request is to one of the URLs that should be handled by the Pyramid\n   app then NGINX proxies to Gunicorn on a UNIX socket.\n\n### How Via works in development\n\nIn development NGINX runs in Docker Compose and is exposed at\nhttp://localhost:9083/. This is defined in `docker-compose.yml`. The app's\n`Dockerfile` isn't used in development, but the NGINX running in Docker Compose\nin development does use the same `nginx.conf` file as the NGINX running in\nDocker in production.\n\nThe Python WSGI server (Gunicorn) runs on the host (no Docker) and is exposed\nat http://localhost:9082/. The NGINX running on `:9083` proxies to the Gunicorn\non `:9082`.\n\n### WhiteNoise\n\nThe Pyramid app uses [WhiteNoise](http://whitenoise.evans.io/) to serve static\nfiles in a CDN-friendly (caching-friendly) way. 
WhiteNoise serves the Python\napp's static files in an efficient way and with the appropriate caching headers,\ncompression, etc.\n\nWhiteNoise is a piece of [WSGI middleware](https://www.python.org/dev/peps/pep-3333/#middleware-components-that-play-both-sides)\nthat wraps our Pyramid WSGI app. Rather than proxying to Pyramid directly\nGunicorn actually proxies to WhiteNoise which either responds directly (if the\nrequest is for a static file) or proxies to Pyramid.\n\n### See also\n\n* [Caching strategy](docs/caching-strategy.md)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhypothesis%2Fvia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhypothesis%2Fvia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhypothesis%2Fvia/lists"}