{"id":19659389,"url":"https://github.com/gary-lgy/https-proxy","last_synced_at":"2025-04-30T23:34:50.545Z","repository":{"id":100199184,"uuid":"414990173","full_name":"gary-lgy/https-proxy","owner":"gary-lgy","description":"Transparent HTTPS proxy written in C using `epoll`","archived":false,"fork":false,"pushed_at":"2021-12-03T10:26:03.000Z","size":174,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-11T15:49:09.375Z","etag":null,"topics":["c","epoll","linux","networking","tcp","tunnel"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gary-lgy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-08T13:06:32.000Z","updated_at":"2024-11-05T14:15:52.000Z","dependencies_parsed_at":"2023-05-12T21:30:22.328Z","dependency_job_id":null,"html_url":"https://github.com/gary-lgy/https-proxy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gary-lgy%2Fhttps-proxy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gary-lgy%2Fhttps-proxy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gary-lgy%2Fhttps-proxy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gary-lgy%2Fhttps-proxy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gary-lgy","download_url":"https://codeload.github
.com/gary-lgy/https-proxy/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233280459,"owners_count":18652300,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","epoll","linux","networking","tcp","tunnel"],"created_at":"2024-11-11T15:41:55.429Z","updated_at":"2025-01-10T01:21:43.879Z","avatar_url":"https://github.com/gary-lgy.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Transparent HTTPS Proxy\n\nThis is a transparent HTTPS proxy written in C using Linux `epoll`.\n\n## How It Works\n\n![How It Works](docs/proxy-how-it-works.svg)\n\nTo initiate an HTTPS request, the client sends an\n[HTTP `CONNECT`](https://httpwg.org/specs/rfc7231.html#rfc.section.4.3.6) request to the proxy naming the target\nserver to connect to. Upon receiving this request, the proxy looks up the IP address of the target server and\nestablishes a TCP connection with it.\n\nOnce the connection is established, the proxy sends a `200 Connection established` response to the client. 
Subsequently, the proxy acts as\nan opaque, full-duplex TCP tunnel between the client and the target server and relays any data sent in either direction.\nSince it only speaks TCP and does not understand TLS/HTTP, it cannot decrypt the TLS traffic or modify the\nHTTP messages.\n\nWhen either the client or the target server closes the connection, the proxy also closes the connection with the other\nend.\n\n## Extensions\n\n### Statistics\n\nIf this optional feature is enabled, the proxy prints the number of bytes transferred and the duration of each TCP\nconnection.\n\n### Blocklist\n\nIf a blocklist is provided, the proxy rejects connections that match the rules specified in the blocklist. Each line\nin the blocklist specifies a string. Any domain name that contains any of the blocked strings will be blocked.\n\nFor example, the following blocklist will block connections to 'google.com', 'google.com.au', 'facebook.com',\n'graph.facebook.com', etc.\n\n```\ngoogle\nfacebook.com\n```\n\n## Compile The Source Code\n\nRequires GCC and `make`.\n\n```bash\nmake\n```\n\nThe executable will be in the `./out` directory.\n\n## Usage\n\n```bash\n./out/proxy port enable_stats path_to_blocklist [thread_count]\n```\n\nFor example, to start the proxy with the following configuration:\n\n- listen on port 3000\n- enable stats\n- use a blocklist file with name `blocklist.txt` in the `./out` directory\n- use 8 threads\n\nrun `./out/proxy 3000 1 out/blocklist.txt 8`\n\nNote: The default number of threads is 8 if `thread_count` is not specified. 
At least 2 threads are required (the reason\nfor this is explained later).\n\n## Design\n\n### Efficient Network IO with `epoll`\n\n\u003cdetails\u003e\n\u003csummary\u003eWhy blocking IO is not an option\u003c/summary\u003e\n\nThe proxy needs to read from both ends and send any data we receive from one end to the other end.\n\nIf we have read all the data from a sender, subsequent attempts to read more bytes from the socket will block the\ncurrent thread until more data arrives. Similarly, if we send data to a receiver, and the receiver's TCP buffer fills\nup, subsequent attempts to send more bytes will block until the remote buffer has space again.\n\nIf a thread is blocked for IO, it cannot process other connections until the IO completes. This stalls all the pending\nrequests that are yet to be served.\n\nOne way to work around this issue is to create a new thread for each blocking operation. However, this approach would\nnot scale well when we have many connections open.\n\nTry loading https://www.reddit.com and see how many HTTP requests it makes. On my machine it makes 150 (!) requests in\nthe first 10 seconds of loading the page, without any user interaction. If each request is served on a new thread, we\nwould create 150 new threads just to serve the homepage of a single website.\n\u003c/details\u003e\n\nProxying network traffic is inherently an IO-bound task. The performance of the proxy heavily depends on how we handle\nIO in a scalable manner. To do this, we must abandon the blocking and synchronous programming paradigm and adopt\nevent-driven, asynchronous IO. With event-driven IO, instead of calling `read()` and `write()` directly (which risks\nblocking the current thread), we register the file descriptor we would like to read or write and receive a notification\nwhen the file descriptor becomes available.\n\nDifferent operating systems provide different tools for the job. 
The Linux kernel provides `select`, `poll`, and `epoll`,\nall of which are mechanisms for us to monitor a set of file descriptors and receive a notification when any of them\nbecomes available.\n\n- `select` reports readiness by modifying the descriptor sets we pass in, so we need to scan all the monitored file\n  descriptors (with `FD_ISSET`) to find out which ones are actually ready. Furthermore, it can only monitor file\n  descriptors numbered below `FD_SETSIZE` (typically 1024).\n- `poll` doesn't have a fixed limit on the number of descriptors it can monitor at a time, but still requires us to do\n  a linear scan of all monitored file descriptors.\n- `epoll` is meant to replace the older POSIX `select` and `poll` system calls to achieve better performance in more\n  demanding applications, where the number of watched file descriptors is large. It has no fixed limit on the number of\n  watched file descriptors, and reports exactly which of the watched file descriptors are ready. However,\n  it is Linux specific.\n\nIn our implementation, we use `epoll` to perform IO multiplexing. When we need to perform IO on a socket, we don't\ncall `read` or `write` directly. Instead, we add it to our `epoll` instance and watch it for IO readiness. Only\nafter `epoll` notifies us that the socket is ready do we perform the IO. Meanwhile, we can service other sockets that\nare ready. This allows a single thread to handle many connections concurrently.\n\n### Asynchronous DNS resolution\n\nThe typical way to perform DNS resolution in C is to call the `getaddrinfo` library function. Unfortunately, this is a\nblocking call. In some cases, we observed `getaddrinfo` to block the calling thread for up to 6 seconds when looking up\na domain name that is probably not in the DNS cache. 
This stalls all the pending tasks on the current thread, including\nthe data forwarding using `epoll`, producing very user-noticeable delays.\n\nTo solve this problem, we use a small external library `asyncaddrinfo` (link below) which wraps the\nblocking `getaddrinfo` call in an asynchronous API. Internally, it uses a configurable number of worker threads to\ncall `getaddrinfo` and gives us a file descriptor to receive the call result.\n\nWe can conveniently add the file descriptor into our `epoll` instance and wait for its readability. This allows the\nthread to keep on serving other requests while `getaddrinfo` is being called concurrently.\n\nWe allocate 25% of our threads to `asyncaddrinfo`, i.e., if we run with 8 threads, then 2 threads will be\nfor `asyncaddrinfo`. At least one thread must be allocated to `asyncaddrinfo`. This is the reason why the proxy needs at\nleast 2 threads (the other thread is to run an `epoll` instance and handle IO on sockets).\n\n### Multithreading and Synchronization\n\nOnce a connection is accepted from the client on a thread, that thread is responsible for the lifetime of the\nconnection. As a result, there will be no race conditions and no additional synchronisation mechanisms are needed.\n\n## External Libraries Used\n\n### asyncaddrinfo\n\n- Repository: https://github.com/firestuff/asyncaddrinfo\n- Source included under `lib/asyncaddrinfo`\n- BSD License\n\n## References\n\n- https://en.cppreference.com/w/c\n- https://stackoverflow.com/\n- [How to use epoll? 
- a complete example in C](https://web.archive.org/web/20170427121729/https://banu.com/blog/2/how-to-use-epoll-a-complete-example-in-c/)\n- [RFC 7231 Section 4.3.6 CONNECT](https://httpwg.org/specs/rfc7231.html#rfc.section.4.3.6)\n- Linux manual pages (e.g., `man socket`, etc)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgary-lgy%2Fhttps-proxy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgary-lgy%2Fhttps-proxy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgary-lgy%2Fhttps-proxy/lists"}