{"id":24796351,"url":"https://github.com/anthonytedja/redirectv1","last_synced_at":"2026-05-09T15:31:24.830Z","repository":{"id":211655899,"uuid":"729669025","full_name":"anthonytedja/redirectV1","owner":"anthonytedja","description":"URL Shortener Service - Low Level","archived":false,"fork":false,"pushed_at":"2023-12-10T00:49:48.000Z","size":12560,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T00:33:59.692Z","etag":null,"topics":["bash-script","http-server","java","multithreading","proxy","sqlite"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anthonytedja.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-12-10T00:36:08.000Z","updated_at":"2023-12-10T01:24:31.000Z","dependencies_parsed_at":"2023-12-16T00:12:52.508Z","dependency_job_id":"6c4b62d2-0fc6-490d-9b1f-8365d82f2ffe","html_url":"https://github.com/anthonytedja/redirectV1","commit_stats":{"total_commits":98,"total_committers":9,"mean_commits":10.88888888888889,"dds":0.5816326530612245,"last_synced_commit":"e62cfa1d53f9a0419e0a61e714ba99cdc3470737"},"previous_names":["anthonytedja/redirect"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonytedja%2FredirectV1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonytedja%2FredirectV1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonytedja%2FredirectV1/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonytedja%2FredirectV1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anthonytedja","download_url":"https://codeload.github.com/anthonytedja/redirectV1/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245343984,"owners_count":20599867,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash-script","http-server","java","multithreading","proxy","sqlite"],"created_at":"2025-01-30T00:33:08.656Z","updated_at":"2026-05-09T15:31:19.779Z","avatar_url":"https://github.com/anthonytedja.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Redirect\n\n\u003e A lightweight scalable URL Shortener system\n\n```mermaid\ngraph TD\n  user((User)):::blue\n  user --- |HTTP Request| proxy\n  proxy --- replica1\n  proxy --- replica2\n  subgraph proxy[\"Proxy Service\"]\n  note([Note: Data is partitioned between replicas \u003cbr\u003e and replicated within replicas]) -.-\n  orchestrator(Multithreaded Orchestrator) --- cache[(\u003cbr\u003eServer Response \u003cbr\u003e Cache)]\n  end\n  subgraph monitor[\"Monitor Service\"]\n  health((Health \u0026 Recovery))\n  end\n  monitor --- proxy\n  monitor -.- host1\n  monitor -.- host2\n  monitor -.- host3\n  monitor -.- host4\n  monitor -.- server1\n  monitor -.- server2\n  monitor -.- server3\n  monitor -.- server4\n  subgraph replica2[\"Replica 2\"]\n  host3([Host 3]) --- server3(Multithreaded Server)\n  server3 --- cache3[(Url Cache)]\n  server3 --- |Buffer| 
replica2db1[(\u003cbr\u003e replica 2 Data \u003cbr\u003e\u003cbr\u003e)]\n\n  host4([Host 4]) --- server4(Multithreaded Server)\n  server4 --- cache4[(Url Cache)]\n  server4 --- |Buffer| replica2db2[(\u003cbr\u003e replica 2 Data \u003cbr\u003e\u003cbr\u003e)]\n  end\n  subgraph replica1[\"Replica 1\"]\n  host1([Host 1]) --- server1(Multithreaded Server)\n  server1 --- cache1[(Url Cache)]\n  server1 --- |Buffer| replica1db1[(\u003cbr\u003e replica 1 Data \u003cbr\u003e\u003cbr\u003e)]\n\n  host2([Host 2]) --- server2(Multithreaded Server)\n  server2 --- cache2[(Url Cache)]\n  server2 --- |Buffer| replica1db2[(\u003cbr\u003e replica 1 Data \u003cbr\u003e\u003cbr\u003e)]\n  end\n\n%% Colors %%\nclassDef blue fill:#2374f7,stroke:#000,stroke-width:2px,color:#fff\n```\n\nThe mermaid diagram above describes the system architecture. View it in a Markdown viewer that supports mermaid diagrams, such as GitHub.\n\n## Table of Contents\n\n- [Architecture](#architecture)\n  - [Components](#components)\n    - [GET Data Flow](#get-data-flow)\n    - [PUT Data Flow](#put-data-flow)\n  - [Code Overview](#code-overview)\n- [Running The System](#running-the-system)\n  - [Initial Setup](#initial-setup)\n  - [Configuration](#configuration)\n  - [Usage](#usage)\n  - [Scaling Up](#scaling-up)\n  - [Scaling Down](#scaling-down)\n- [Testing The System](#testing-the-system)\n  - [Performance Testing](#performance-testing)\n    - [Read Test](#read-test)\n    - [Write Test](#write-test)\n  - [Correctness Testing](#correctness-testing)\n- [Analysis](#analysis)\n  - [Load Balancing](#load-balancing)\n  - [Caching](#caching)\n  - [Scalability](#scalability)\n  - [Latency](#latency)\n  - [Throughput](#throughput)\n  - [Availability](#availability)\n  - [Durability](#durability)\n  - [Health Check](#health-check)\n\n## Architecture\n\n### Components\n\nThe system is composed of the following components:\n\n- **Proxy**: The proxy service is responsible for receiving HTTP requests from 
the user and forwarding them to the appropriate server(s). It load balances requests between the servers using the orchestrator, and it caches server responses to reduce the load on the servers. The proxy is multithreaded and handles requests concurrently.\n\n- **Orchestrator**: The orchestrator is responsible for grouping hosts into replicas and for load balancing requests between the replicas and servers. It uses consistent hashing (a ring pattern) for load balancing, so most short URLs keep their existing placement when the number of hosts scales up. By default, the orchestrator hashes each short URL to a replica of 2 hosts.\n\n- **Replica**: A replica is an abstraction over a group of hosts and their servers. Data is partitioned between replicas and replicated within each replica, so if a host or server fails, the data is still available on another host within the replica. Replica groupings differ for every hash key, and the replica size is configurable, defaulting to 2 hosts; increasing it raises the replication factor.\n\n- **Host**: Each host runs a server and stores data. Servers are multithreaded and handle requests concurrently. For writes, the server writes to a buffer to reduce the number of writes to the database. For reads, the server first checks its cache for the short-to-long URL mapping before querying the database.\n\n- **Monitor**: The monitor service is responsible for monitoring the health of the system. By default, it checks the health of all hosts and servers every 5 seconds. If a host or server is down, it recovers by spawning a new host within the same replica as the failed host for each hash. 
It also notifies the orchestrator of the new host so that it can replace the failed node.\n\n#### GET Data Flow\n\n1. The user sends a GET request with a short URL to the proxy.\n2. The proxy checks its cache for the short URL and returns the cached server response if found.\n3. The proxy selects a replica for the request by hashing the short URL.\n4. The proxy forwards the request to a host in the replica. If that host is unreachable, the proxy retries on a different host in the replica until a response is received.\n5. The host server checks its own cache for the short URL and returns the long URL if found.\n6. Otherwise, the host server checks the database for the short URL and returns the long URL if found.\n7. The host server caches the short and long URL pair and returns the URL to the proxy.\n8. The proxy caches the short URL and server response pair.\n9. The proxy returns the server response to the user.\n\n```mermaid\ngraph TD\n  user((User)):::blue\n  user --\u003e |1. GET Request| proxy\n  proxy --\u003e |3 \u0026 4. Forward Request to Host 1 Server| replica1\n  proxy --- replica2\n  proxy --\u003e |9. Redirected URL Response| user\n  subgraph proxy[\"Proxy Service\"]\n  cache --\u003e |2. Server Response| orchestrator\n  orchestrator(Multithreaded Orchestrator):::blue --\u003e |2. Check Cache| cache[(\u003cbr\u003eServer Response \u003cbr\u003e Cache)]:::blue\n  orchestrator --\u003e |8. 
Cache Short \u0026 Server Response| cache\n  end\n  subgraph replica2[\"Replica 2\"]\n  host3([Host 3]) --- server3(Multithreaded Server)\n  server3 --- cache3[(Url Cache)]\n  server3 --- |Buffer| replica2db1[(\u003cbr\u003e replica 2 Data \u003cbr\u003e\u003cbr\u003e)]\n  host4([Host 4]) --- server4(Multithreaded Server)\n  server4 --- cache4[(Url Cache)]\n  server4 --- |Buffer| replica2db2[(\u003cbr\u003e replica 2 Data \u003cbr\u003e\u003cbr\u003e)]\n  end\n  subgraph replica1[\"Replica 1\"]\n  host1([Host 1]):::blue --\u003e server1(Multithreaded Server):::blue\n  server1 --\u003e |7. Cache Short \u0026 Long URL| cache1\n  cache1 --\u003e |5. Long URL Response| server1\n  server1 --\u003e |5. Check Cache| cache1[(Url Cache)]:::blue\n  replica1db1 --\u003e |6. DB Response| server1\n  server1 --\u003e |6. Check DB| replica1db1[(\u003cbr\u003e replica 1 Data \u003cbr\u003e\u003cbr\u003e)]:::blue\n  host2([Host 2]) --- server2(Multithreaded Server)\n  server2 --- cache2[(Url Cache)]\n  server2 --- |Buffer| replica1db2[(\u003cbr\u003e replica 1 Data \u003cbr\u003e\u003cbr\u003e)]\n  end\n  replica1 --\u003e |7. URL Response| proxy\n\n%% Colors %%\nclassDef blue fill:#2374f7,stroke:#000,stroke-width:2px,color:#fff\n```\n\nThe mermaid diagram above highlights the GET request data flow in blue. View it in a Markdown viewer that supports mermaid diagrams, such as GitHub.\n\n#### PUT Data Flow\n\n1. The user sends a PUT request with a short and long URL to the proxy.\n2. The proxy selects a replica for the request by hashing the short URL.\n3. The proxy forwards the request to all hosts in the replica.\n4. Each host server writes the short and long URL pair to its own buffer.\n5. Each host server flushes its buffer to the database when the buffer reaches a size or time limit.\n6. Each host server caches the short and long URL pair.\n7. Each host server notifies the proxy that the write was successful.\n8. 
The proxy returns a success response to the user.\n\n```mermaid\ngraph TD\n  user((User)):::blue\n  user --\u003e |1. PUT Request| proxy\n  proxy --\u003e |2 \u0026 3. Forward Request to Host 1 \u0026 2 Server| replica1\n  proxy --- replica2\n  subgraph proxy[\"Proxy Service\"]\n  orchestrator(Multithreaded Orchestrator):::blue --- cache[(\u003cbr\u003eServer Response \u003cbr\u003e Cache)]\n  end\n  subgraph replica2[\"Replica 2\"]\n  host3([Host 3]) --- server3(Multithreaded Server)\n  server3 --- cache3[(Url Cache)]\n  server3 --- |Buffer| replica2db1[(\u003cbr\u003e replica 2 Data \u003cbr\u003e\u003cbr\u003e)]\n  host4([Host 4]) --- server4(Multithreaded Server)\n  server4 --- cache4[(Url Cache)]\n  server4 --- |Buffer| replica2db2[(\u003cbr\u003e replica 2 Data \u003cbr\u003e\u003cbr\u003e)]\n  end\n  subgraph replica1[\"Replica 1\"]\n  host1([Host 1]):::blue --\u003e server1(Multithreaded Server):::blue\n  server1 --\u003e |6. Cache URL Pair| cache1[(Url Cache)]:::blue\n  server1 --\u003e |4 \u0026 5. Buffer writing to DB| replica1db1[(\u003cbr\u003e replica 1 Data \u003cbr\u003e\u003cbr\u003e)]:::blue\n  host2([Host 2]):::blue --\u003e server2(Multithreaded Server):::blue\n  server2 --\u003e |6. Cache URL Pair| cache2[(Url Cache)]:::blue\n  server2 --\u003e |4 \u0026 5. Buffer writing to DB| replica1db2[(\u003cbr\u003e replica 1 Data \u003cbr\u003e\u003cbr\u003e)]:::blue\n  end\n  replica1 --\u003e |7. Successfully Saved| proxy\n  proxy --\u003e |8. Got it Response| user\n\n%% Colors %%\nclassDef blue fill:#2374f7,stroke:#000,stroke-width:2px,color:#fff\n```\n\nThe mermaid diagram above highlights the PUT request data flow in blue. View it in a Markdown viewer that supports mermaid diagrams, such as GitHub.\n\n### Code Overview\n\nThe system code is organized into the following directories:\n\n- **orchestration**: The orchestration directory contains the scripts for the proxy, orchestrator, and monitor services. 
These scripts handle the orchestration of the system as well as its recovery from failures.\n\n- **server**: The server directory contains the code for the server, which handles requests from the proxy, responds to them, and writes to the database.\n\n- **storage**: The storage directory contains the code and drivers to set up the database. It can optionally populate the database with data when setting up the system.\n\n## Running The System\n\n### Initial Setup\n\nOn the system, update the `~/.bashrc` file to include the following:\n\n```bash\nexport JAVA_HOME=\"/opt/jdk-20.0.1\"\nexport PATH=\"/opt/jdk-20.0.1/bin:$PATH\"\n```\n\nRun the `confirmAllHosts.bash` script to accept new SSH connections to the hosts in advance, so that no prompts appear when running the system.\n\n```bash\n./confirmAllHosts.bash\n```\n\nRun the following command from the root folder to build the system:\n\n```bash\n./make.bash\n```\n\n### Configuration\n\nThe system initially uses the hosts listed in the `HOSTS` file, on the port specified in the `PORT` file.\n\nServer configurations can be adjusted in `orchestration/runServerLocal.bash`. The configurations are as follows:\n\n- `IS_VERBOSE`: toggle log statements\n- `HOSTPORT`: port the server runs on\n- `CACHE_SIZE`: size of the cache used to store URLs obtained from GET and PUT requests\n- `NUM_THREADS`: number of threads running in the server\n- `WRITE_BUFFER_SIZE`: size of the write buffer, which holds the results of client PUT requests and is periodically flushed to the database\n- `SLEEP_DURATION`: interval at which the server checks whether to flush the write buffer\n\nProxy configurations can be adjusted in `orchestration/proxy/runProxyLocal.bash`. 
The configurations are as follows:\n\n- `IS_VERBOSE`: toggle log statements\n- `PROXYPORT`: port the proxy runs on\n- `CACHE_SIZE`: size of the cache used to store server responses\n- `NUM_THREADS`: number of threads running in the proxy\n- `REPLICATION_FACTOR`: number of hosts each URL pair is replicated to\n\n### Usage\n\nRun the following command from the root folder to start the system:\n\n```bash\n./dostuff.bash\n```\n\nOnce the system is running, the following commands can be used to interact with it (replace `{PROXYPORT}` with the configured proxy port):\n\n```bash\n# Sample PUT\ncurl -X PUT \"http://localhost:{PROXYPORT}?short=arnold\u0026long=http://google.com\"\n\n# Sample GET\ncurl \"http://localhost:{PROXYPORT}/arnold\"\n```\n\n### Scaling Up\n\nIf we want to add a host to the system while it's running, we can run the following command:\n\n```bash\n./orchestration/addHost.bash\n```\n\nWe can optionally pass arguments to the script: the first argument is the host we want to replace, and the second argument is the host we want to clone data from.\n\n### Scaling Down\n\nIf we want to remove a host from the system, we can run the following command:\n\n```bash\n./orchestration/removeHost.bash\n```\n\nWe can optionally pass an argument to the script specifying the host we want to remove.\n\n## Testing The System\n\n### Performance Testing\n\nFor performance testing, we used `ab` (Apache Benchmark). 
We used it only for a simple load test.\n\nAfter starting the system, we can run the following command to run the performance test:\n\n```bash\n./testing/plotting/plotAll.bash\n```\n\n#### Read Test\n\n![Read Test Results](testing/plotting/readTest.png)\n\nThe read test shows that the system averages about 20 ms per read request.\n\nThe following table contains the timing results of sending 4000 read requests to the proxy.\n\n|Host count|Time to complete all requests|\n|--|--|\n|1|6.045 seconds|\n|2|5.255 seconds|\n|3|4.849 seconds|\n|4|4.532 seconds|\n\n#### Write Test\n\n![Write Test Results](testing/plotting/writeTest.png)\n\nThe write test shows that the system averages about 50 ms per write request.\n\nThe following table contains the timing results of sending 4000 write requests to the proxy.\n\n|Host count|Time to complete all requests|\n|--|--|\n|1|12.812 seconds|\n|2|11.255 seconds|\n|3|15.008 seconds|\n|4|14.605 seconds|\n\n### Correctness Testing\n\nFor correctness tests, we used `curl` to send requests and bash to validate the responses automatically.\n\nAfter starting the system, we can run the following command to run the correctness test:\n\n```bash\n$ ./testing/correctness/correctTest.bash\nAll tests passed!\n```\n\n## Analysis\n\n### Load Balancing\n\nThe proxy uses consistent hashing for load balancing. The hash space is partitioned into 360 slots, with each server claiming 3 slots on the ring. The short URL from each request is hashed onto the ring, and its data is replicated across the two servers that follow the hash position. Writes go to both servers. Reads go to one of the available servers and fall back to the other if the first is unavailable.\n\n### Caching\n\nCaches exist for both the proxy and the server. The proxy cache maps short URLs to the responses sent by the servers, so those responses can be returned without contacting a server again. 
This reduces the time spent communicating over the network, which can be a significant bottleneck.\n\nThe server cache stores short and long URL pairs to avoid opening another database connection, since database I/O is also a significant bottleneck.\n\n### Scalability\n\n#### Horizontal Scalability\n\nThe system is highly scalable, as adding hosts and servers increases the capacity of the system. Durability also improves, since data is partitioned across more hosts and servers, so less data is affected when a host or server fails. The system can also scale up and down dynamically. If we want to add a host to the system while it's running, we can run the `./orchestration/addHost.bash` script. We can optionally pass arguments to the script: the first argument is the host we want to replace, and the second argument is the host we want to clone data from. If we want to remove a host from the system, we can run the `./orchestration/removeHost.bash` script, optionally passing the host we want to remove as its argument.\n\n#### Vertical Scalability\n\nThe application can scale with processing power and memory through configuration. The thread count can be adjusted to take advantage of the number and speed of available processors, and the cache and write buffer sizes can be increased if necessary.\n\n### Latency\n\nThe read latency of the application is 7.9 ms. The write latency of the application is 19.7 ms.\n\n### Throughput\n\nThe read throughput is 504 requests/second. The write throughput is 203 requests/second.\n\n### Availability\n\nThe system is highly available because data is replicated across the hosts within each replica. If a host or server fails, the data is still available on another host within the replica, and if a host is unresponsive, the proxy retries the request on another host within the replica. 
The system also recovers from failures by spawning a new host within the same replica as the failed host for each hash, and it notifies the orchestrator of the new host so that it can replace the failed node. This behavior follows from the ring pattern (consistent hashing) used for load balancing and data partitioning. On the first host failure, no data is lost, because every URL pair on the failed host is also stored on another host in its replica. On a second host failure, minimal data is lost: since data is partitioned across many replicas, at most the URL pairs whose replica consisted of both failed hosts can be lost.\n\n### Durability\n\nThe durability of the system is strong since each URL pair is replicated on every host in its replica (2 hosts by default). Since each replica is unique for each hash, data is partitioned throughout the entire system of nodes rather than replicated on every host. This means that if a single host goes down (the first host failure), no data is lost.\n\n### Health Check\n\nThe application periodically pings each host and checks the status of the server on each host. If a node goes down, the system spawns another node and starts the process on it within 5 seconds. While the node is down, requests to other nodes continue to operate as normal.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanthonytedja%2Fredirectv1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanthonytedja%2Fredirectv1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanthonytedja%2Fredirectv1/lists"}