
https://github.com/eddiecorrigall/tinyurl

A tinyurl service clone

Host: GitHub
URL: https://github.com/eddiecorrigall/tinyurl
Owner: eddiecorrigall
Created: 2019-10-12T17:46:36.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2021-04-20T18:41:56.000Z (over 3 years ago)
Last Synced: 2024-03-28T03:00:54.591Z (8 months ago)
Topics: lambda, redis, serverless, tinyurl
Language: Python
Homepage:
Size: 37.1 KB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.md


# tinyurl
A tinyurl clone service. A classic system design interview question:
> How would you design [TinyURL](https://en.wikipedia.org/wiki/TinyURL)?

Visit [demo](https://tinyurl.7okyo.com) to see the project in action.

## Quick Start
Clone the project and run it locally.
```bash
# Setup project
./bin/setup.sh

# Run locally
./bin/local.sh
```

## Features
- Environments: local tests, local deploy, staging deploy, production deploy
- Unit and integration tests
- Bootstrap front-end
- Deployable "serverless" app

## TODO
- Flask app configuration management
- Python linter
- More docstrings
- Continuous Integration (Circle CI or Travis CI)
- Internationalization (i18n)
- Lambda cold starts
- Choose at least 2 subnets for Lambda to run your functions in high availability mode
- Automatic API docs
- Disable push to master and require all changes via pull request
- Analytics dashboard
- Viral alerts
- Tracking tags: email, blog, etc.
- More tests
  - Flask app error handlers
  - Tests for Redis return types
  - Staging integration tests
- Bugs
  - Duplicate logs in CloudWatch Logs, Lambda appears to modify root / flask logger

## References
- https://stackoverflow.com/questions/742013/how-do-i-create-a-url-shortener
- https://serverless.com/blog/flask-python-rest-api-serverless-lambda-dynamodb/
- https://pypi.org/project/redis/
- https://stackoverflow.com/questions/1119722/base-62-conversion
- https://stackoverflow.com/questions/22340676/find-or-create-idiom-in-rest-api-design
- https://flask.palletsprojects.com/en/1.1.x/patterns/apierrors/
- http://werkzeug.palletsprojects.com/en/0.16.x/exceptions/
- https://flask.palletsprojects.com/en/1.1.x/appcontext/
- https://flask.palletsprojects.com/en/1.1.x/logging/
- https://flask.palletsprojects.com/en/1.1.x/testing/
- https://serverless.com/blog/serverless-api-gateway-domain/
- https://serverless-stack.com/chapters/stages-in-serverless-framework.html
- https://serverless.com/framework/docs/dashboard/testing/
- https://serverless-stack.com/chapters/load-secrets-from-env.html

## Design decision
I use AWS Lambda (FaaS) to host the application for a few good reasons. This is a relatively small project that is seldom used, so Lambda is very cost effective. Hosting the application on EC2, whether through an ASG (Auto Scaling Group) or ECS (Elastic Container Service), requires keeping at least one instance ready at all times regardless of traffic, and running an EC2 instance 24 hours a day costs money. Lambda, by contrast, does not require any permanently provisioned machines (though it does have cold starts) and is very scalable too.

### Database
URLs can go viral, which means traffic across unique URLs will not be uniform. Assume an 80-20 rule: 80% of the traffic is generated by 20% of the URLs. This application is read heavy (redirect from TinyURL) and will no doubt have significantly fewer writes (create TinyURL).

#### DynamoDB vs ElastiCache: Redis
DynamoDB can be highly available for the right price, and highly scalable with the right design. However, it is not suitable for TinyURL: the partition key, which should uniquely identify the URL, concentrates all reads for a viral URL on one partition, which will inevitably reach its provisioned throughput. When this capacity is reached, DynamoDB can no longer serve those requests without manual intervention. Given that a viral event might happen at any point in the day, it is not acceptable to react to read traffic after the fact.

Redis supports [hset](https://redis.io/commands/hset) and [hget](https://redis.io/commands/hget), which do not suffer from hot partitions under heavy reads and consistently perform at O(1) time complexity. [Redis can be scaled](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/scaling-redis-cluster-mode-enabled.html) as storage needs increase, [persisted](https://redis.io/topics/persistence) and [clustered](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Clusters.Create.CON.RedisCluster.html). With the right monitoring solution, capacity planning allows Redis to be scaled proactively as storage is required.

**tl;dr:**
- DynamoDB is highly available for the right price, but does not scale well under heavy read loads on a hot partition.
- Redis does well under heavy reads, and can be scaled proactively.

### Implementation
In the following examples, please note that a table is used as an abstraction to help illustrate how data is persisted. In reality there are two dictionary-like associations: short-to-long and long-to-short. The following technique is then used to determine the short ID.

A unique sequential number is tracked simply by querying the size of a hash (see [hlen](https://redis.io/commands/hlen)), which has time complexity O(1). In this case, this will be the cardinality of the short ID to long URL mapping. Keeping the reverse long URL to short ID mapping also helps de-duplicate records if a URL is submitted more than once.
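A minimal sketch of this idea, using plain Python dicts as stand-ins for the two Redis mappings so that `len()` plays the role of `HLEN`. The names `short_to_long`, `long_to_short`, and `shorten` are hypothetical and not taken from the project, and the ID is left as an integer here rather than converted to base 62.

```python
# In-memory stand-ins for the two Redis mappings (assumption: the real
# service would use HSET / HGET / HLEN against Redis instead of dicts).
short_to_long = {}
long_to_short = {}


def shorten(long_url):
    """Return the numeric ID for long_url, creating one if needed."""
    # De-duplicate: a URL submitted twice gets the ID it already has.
    existing = long_to_short.get(long_url)
    if existing is not None:
        return existing
    # The next sequential ID is the current number of entries (cf. HLEN).
    new_id = len(short_to_long)
    short_to_long[new_id] = long_url
    long_to_short[long_url] = new_id
    return new_id
```

Note that this sketch ignores concurrency entirely: two writers could read the same size and pick the same ID.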

When a new association is to be written, there is a potential race condition among Lambda functions. To resolve this, a [watch](https://redis.io/commands/watch) is set up for conditional execution of a transaction. The write is re-attempted if the cardinality of the mapping changes before the transaction completes, which means another Lambda function has successfully created an association with the same short ID. In this situation, the URLs submitted by the two Lambda functions can be either identical or distinct, but they would have shared the same short ID.
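The retry logic can be sketched without a Redis server by simulating the conditional write in memory. Everything here is an assumption for illustration: `Store`, `commit_if_size`, and `create` are hypothetical names, and `before_commit` is a hook that exists only to imitate a competing writer. With the `redis` Python client, the equivalent pattern would watch the key on a pipeline and retry when `execute()` raises `redis.WatchError`.

```python
class Store:
    """In-memory stand-in for the short-to-long mapping (hypothetical)."""

    def __init__(self):
        self.short_to_long = {}

    def size(self):
        return len(self.short_to_long)

    def commit_if_size(self, expected_size, url):
        """Write only if nobody else has written since the size was read."""
        if self.size() != expected_size:
            return False  # conflict: the watched value changed
        self.short_to_long[expected_size] = url
        return True


def create(store, url, max_attempts=5, before_commit=None):
    """Assign the next ID to url, retrying when a competing write wins."""
    for _ in range(max_attempts):
        candidate = store.size()  # read the size while "watching" it
        if before_commit is not None:
            before_commit(store)  # hook: simulate another writer
            before_commit = None  # interfere on the first attempt only
        if store.commit_if_size(candidate, url):
            return candidate
    raise RuntimeError("too many conflicting writes")
```

Capping the number of attempts is a design choice of this sketch; the source does not say whether the project bounds its retries.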

### Example: Long to short
Given a URL, insert it into the table, assuming the ID is automatically assigned by the database.

|id|long|short|
|---|---|---|
|125|'https://www.youtube.com/watch?v=dQw4w9WgXcQ'|NULL|

Get an `id` (125), an auto-incremented unique identifier. Convert `id` into a base-62 string ('cb'), which will be the short ID of the long-form URL. Update the table row at `id` with the short ID. This can be done in the same transaction: insert then update.

|id|long|short|
|---|---|---|
|125|'https://www.youtube.com/watch?v=dQw4w9WgXcQ'|'cb'|
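The conversion from `id` to short ID can be sketched as follows. The alphabet order (lowercase, then uppercase, then digits) is an assumption chosen because it reproduces the 125 to 'cb' example; the project's actual alphabet may differ, and `base62_encode` is a hypothetical name.

```python
import string

# Assumed alphabet order: lowercase, uppercase, then digits. This order
# reproduces the example (125 -> 'cb'); the project may use another.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def base62_encode(number):
    """Convert a non-negative integer into a base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))
```

Here 125 = 2 × 62 + 1, so the digits are 2 ('c') and 1 ('b').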

### Example: Short to long
Convert the short base-62 string ('cb') into a base-10 integer, which is used to look up the entry. Select from the table given `id` (125), and return the long-form URL.

|id|long|short|
|---|---|---|
|125|'https://www.youtube.com/watch?v=dQw4w9WgXcQ'|'cb'|
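The reverse conversion can be sketched in the same way. As before, the alphabet order is an assumption that matches the 'cb' to 125 example, and `base62_decode` is a hypothetical name.

```python
import string

# Assumed alphabet order: lowercase, uppercase, then digits. This order
# matches the example ('cb' -> 125); the project may use another.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def base62_decode(short_id):
    """Convert a base-62 string into the integer it represents."""
    if not short_id:
        raise ValueError("short_id must not be empty")
    number = 0
    for char in short_id:
        # str.index raises ValueError for characters outside the alphabet.
        number = number * 62 + ALPHABET.index(char)
    return number
```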

## API

### Set Endpoint
```bash
# Local development: `./bin/local.sh`
export TINYURL_ENDPOINT=http://localhost:5000

# OR, in the cloud: `./bin/provision.sh --stage staging`
export TINYURL_ENDPOINT=https://tinyurl-staging.7okyo.com
```

### Make TinyURL
```bash
curl \
--write-out '%{http_code}\n' \
--request POST "${TINYURL_ENDPOINT}/api" \
--header 'Content-Type: application/json' \
--data '{"url": "http://example.com"}'
```
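
The same request can be built from Python using only the standard library. This sketch only constructs the request; sending it is left commented out, since the response format is not documented here. `build_create_request` is a hypothetical helper, and the fallback to `http://localhost:5000` assumes the local development endpoint shown above.

```python
import json
import os
import urllib.request

# Fall back to the local development endpoint if the variable is unset.
endpoint = os.environ.get("TINYURL_ENDPOINT", "http://localhost:5000")


def build_create_request(long_url, endpoint=endpoint):
    """Build (but do not send) the POST request that creates a TinyURL."""
    body = json.dumps({"url": long_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint + "/api",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request("http://example.com")
# To actually send it against a running service:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```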

### Search TinyURL

#### With Long URL
```bash
curl \
--write-out '%{http_code}\n' \
--request GET "${TINYURL_ENDPOINT}/api?url=http://example.com"
```

#### With Short ID
```bash
curl \
--write-out '%{http_code}\n' \
--request GET "${TINYURL_ENDPOINT}/api?id=a"
```

### Redirect from TinyURL
```bash
curl \
--write-out '%{http_code}\n' \
--request GET "${TINYURL_ENDPOINT}/a"
```

## Commands

|Command|Wrapper for|Description|
|---|---|---|
|`./bin/setup.sh`|N/A|Set up project -- run this before all others|
|`./bin/test.sh`|pytest|Run tests|
|`./bin/local.sh`|serverless wsgi|Run locally|
|`./bin/provision.sh`|serverless deploy|Provision cloud|
|`./bin/deprovision.sh`|serverless remove|De-provision cloud|
|`./bin/logs.sh`|serverless logs|Get logs from cloud|

### Arguments
Arguments for pytest and serverless can be passed through to the underlying CLI tools. For example, to deploy to production, run `./bin/provision.sh --stage production`, since `./bin/provision.sh` is a wrapper for `serverless deploy`.

## Troubleshooting

AWS DNS is unable to resolve the S3 path for the deploy. To continue developing, try switching the provider region.
> Serverless: Recoverable error occurred (Inaccessible host: `*.s3.amazonaws.com'. This service may not be available in the `us-east-1' region.), sleeping for 5 seconds. Try 4 of 4

---
Lambda log collection is not supported in ca-central-1.
> ServerlessError: No existing streams for the function