{"id":27373828,"url":"https://github.com/johnnyhuy/crypto-com-code-challenge","last_synced_at":"2025-04-13T11:28:58.029Z","repository":{"id":238228711,"uuid":"418672097","full_name":"johnnyhuy/crypto-com-code-challenge","owner":"johnnyhuy","description":"Johnny Huynh - Lead DevOps Engineer @ crypto.com","archived":false,"fork":false,"pushed_at":"2024-05-05T06:25:38.000Z","size":1680,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-10T09:19:14.086Z","etag":null,"topics":["crypto","devops"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/johnnyhuy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-18T21:19:51.000Z","updated_at":"2024-05-05T06:25:42.000Z","dependencies_parsed_at":"2024-05-05T05:22:25.835Z","dependency_job_id":"fc1cbf2e-36b1-4914-90a1-523f6a83643e","html_url":"https://github.com/johnnyhuy/crypto-com-code-challenge","commit_stats":null,"previous_names":["johnnyhuy/crypto-com-code-challenge"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnnyhuy%2Fcrypto-com-code-challenge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnnyhuy%2Fcrypto-com-code-challenge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnnyhuy%2Fcrypto-com-code-challenge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnnyhuy%2F
crypto-com-code-challenge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/johnnyhuy","download_url":"https://codeload.github.com/johnnyhuy/crypto-com-code-challenge/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248705222,"owners_count":21148504,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crypto","devops"],"created_at":"2025-04-13T11:28:57.513Z","updated_at":"2025-04-13T11:28:58.018Z","avatar_url":"https://github.com/johnnyhuy.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Crypto.com Code Challenge\n\nJohnny Huynh - Lead DevOps Engineer @ crypto.com\n\n## Access Log Analytics\n\nYou are given an access log file (access.log.gz) containing some requests.\nPlease prepare 3 shell script files as follows: (Note: any shell script executable on the\nmodern Linux environment, pure shell like sh, bash, csh, zsh are preferred)\n\n1. Count the total number of HTTP requests recorded by this access logfile\n2. Find the top 10 hosts that made the most requests from `2019-06-10 00:00:00` up to\nand including `2019-06-19 23:59:59`\n3. Find the country that made the most requests *(hint: use the source IP address as a\nstart)*\n\n### Usage\n\nRun the following commands to get answers to the first assessment.\n\n`bash` will be the required script runtime.\n\nWe're git ignoring the `access.log` file granted that's fairly large at 18 MB. 
However, we'll be using `gunzip` to uncompress the provided `.gz` file.\n\n```bash\n# Get the total count of requests\n./total-number.sh\n\n# Find the top 10 hosts\n./top-10-hosts.sh\n\n# Find the country that made the most requests\n./country-most-requests.sh\n```\n\n## System Design\n\nWe are designing a simple bit.ly-like service (API only), which includes two web API endpoints,\nas follows:\n\n### Web API endpoint for URL submission\n\n`POST` `/newurl`\n\n#### Request\n\n```json\n{ \"url\": \"https://www.google.com\" }\n```\n\n#### Response\n\n```json\n{ \"url\": \"https://www.google.com\", \"shortenUrl\": \"https://shortenurl.org/g20hi3k9\"}\n```\n\n### Web API endpoint for redirecting a shortened URL to the real URL\n\n(Note: The shortened link cannot be modified once created)\n\n`GET` `/[a-zA-Z0-9]{9}` (RegEx, e.g. g20hi3k9t)\n\nHTTP 302 redirection to the real URL\n\n### System Design Concerns\n\n#### High Availability\n\nRoute 53 failover routing covers the edge case where CloudFront distributions fail globally. CloudFront can then fail over between regional API gateways if a region fails. This might increase latency; however, we can mitigate that with a replicated API gateway in neighboring regions.\n\n#### Scalability\n\nLambda functions scale with demand, and we pay per invocation. No dedicated auto-scaling compute resources are required for serverless functions.\n\nWe can also enable DynamoDB auto scaling of read and write capacity in case load drastically increases.\n\n#### Tech Stack\n\nDynamoDB is used as a NoSQL stateful backend to avoid complex schema building. Shortened link data can be stored in a NoSQL backend with the link ID as the primary key. The ID to look up can be taken from the path of the shortened link itself.\n\nLambda functions can serve the internal logic to shorten URLs. This should only be the responsibility of the Python script along with persistence. 
Python runs on multiple platforms, and developers can use serverless frameworks for debugging.\n\n### Architecture Diagram\n\n![](crypto.com.jpg)\n\n### Database schema\n\nData can be fetched from a single table that includes a timestamp. As mentioned, the ID can be generated by the Python script and used as the path to fetch the real link.\n\n| Column       | Type   |\n|--------------|--------|\n| id (primary) | string |\n| link         | string |\n| created_at   | string |\n\n### Pseudo code\n\nThis details the business logic we need to provide shortened URLs.\n\n```bash\n# When user sends a POST request to /newurl\ndeclare createShortLink with json do\n\n    get realUrl from json\n    get hostUrl from environment\n\n    generate shortLinkId\n    save shortLinkId realUrl to DynamoDB\n    \n    create url with realUrl\n    create shortUrl with hostUrl and shortLinkId\n    create jsonResult with url and shortUrl\n\n    return jsonResult\n\ndone\n\n# When user sends a GET request to /[a-zA-Z0-9]{9}\ndeclare getShortLink with path do\n\n    match path with regex /[a-zA-Z0-9]{9}\n    create shortLinkId with path\n\n    get realUrl for shortLinkId from DynamoDB\n\n    if shortLinkId doesnt exist in DynamoDB then\n        throw error not found\n    done\n    \n    # HTTP 302 to the stored real URL, not back to the short link\n    return redirect to realUrl\ndone\n```\n\n### Design considerations\n\nWe can use CI/CD pipelines like Buildkite or GitHub Actions, provided that we keep all of our infrastructure code in source control such as Git with GitHub. We can then automate builds through code change triggers to shorten the feedback loop on changes.\n\nWe'll use Terraform as the infrastructure-as-code tool of choice since it is backed by the open source community for both provider and core tooling development. SMEs within our team are welcome to contribute back to help others in the industry. 
This also helps avoid knowledge silos in case engineers leave the company.\n\n### Assumptions \u0026 limitations\n\n- There's at least a staging or development environment prior to production\n- This is a greenfield project, so no cloud migration is required, including things like database migrations\n- Multi-item ACID transactions are not relied upon, given DynamoDB's NoSQL design\n\n### References\n\nhttps://stackoverflow.com/questions/12457457/count-number-of-lines-in-terminal-output\nhttps://unix.stackexchange.com/questions/156261/unzipping-a-gz-file-without-removing-the-gzipped-file\nhttps://unix.stackexchange.com/questions/360273/a-command-to-do-bulk-ip-address-lookups-using-unix-command-line-works-on-a-unix/360284\nhttps://stackoverflow.com/questions/2034799/how-to-truncate-long-matching-lines-returned-by-grep-or-ack\nhttps://unix.stackexchange.com/questions/83473/get-my-country-by-ip-in-bash\nhttps://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/high_availability_origin_failover.html\nhttps://medium.com/swlh/how-to-expose-aws-http-api-gateway-via-aws-cloudfront-16383f45704b\nhttps://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-configuring.html\nhttps://blog.cloudcraft.co/programming-your-cdn/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnnyhuy%2Fcrypto-com-code-challenge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohnnyhuy%2Fcrypto-com-code-challenge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnnyhuy%2Fcrypto-com-code-challenge/lists"}