{"id":20078523,"url":"https://github.com/timbo-rafa/geo-cache","last_synced_at":"2026-05-03T01:44:04.852Z","repository":{"id":57433678,"uuid":"240548236","full_name":"timbo-rafa/geo-cache","owner":"timbo-rafa","description":"Fault-tolerant distributed cache","archived":false,"fork":false,"pushed_at":"2020-11-07T17:22:51.000Z","size":56,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-10-09T09:58:09.624Z","etag":null,"topics":["api","backend","bash","cache","couchbase","data-replication","demo","distributed-systems","docker","document-oriented","fault-tolerance","flask","geolocation","pypi","python","rest-api","script","system-design","test"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timbo-rafa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-14T16:09:52.000Z","updated_at":"2020-05-11T16:59:21.000Z","dependencies_parsed_at":"2022-08-28T03:02:02.771Z","dependency_job_id":null,"html_url":"https://github.com/timbo-rafa/geo-cache","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timbo-rafa%2Fgeo-cache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timbo-rafa%2Fgeo-cache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timbo-rafa%2Fgeo-cache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timbo-rafa%2Fgeo-cache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timbo-rafa","download_url":"https://codeload.github.com/timbo-rafa/geo-cache/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241509658,"owners_count":19974071,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","backend","bash","cache","couchbase","data-replication","demo","distributed-systems","docker","document-oriented","fault-tolerance","flask","geolocation","pypi","python","rest-api","script","system-design","test"],"created_at":"2024-11-13T15:14:51.615Z","updated_at":"2025-10-30T23:07:16.611Z","avatar_url":"https://github.com/timbo-rafa.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Geo Distributed LRU Cache\n\n## Solution\n\nTo quickly build a scalable, enterprise-level library, our approach is to delegate as many features as possible to an existing software package and use it as the underlying architecture.\n\nAfter researching the currently available technologies, [Couchbase Server](https://docs.couchbase.com/server/6.5/introduction/intro.html), a distributed, multi-model NoSQL document-oriented database, seemed an ideal fit. Its key features include high availability, a scale-out architecture, and a memory-first architecture, which together make it well suited for our cache. 
Essential requirements for our application are detailed below in the [Features](#Features) section.\n\nCouchbase stores data in\n[Buckets](https://docs.couchbase.com/server/6.5/learn/buckets-memory-and-storage/buckets-memory-and-storage.html):\n\n\u003eCouchbase Server keeps items in Buckets. Before an item can be saved, a bucket must exist for it. Each bucket is assigned a name at its creation: this name is referenced by the application or user wishing to save or access items within it.\n\nThis is how our application stores its cached data. We access Couchbase through its Python SDK and expose our API using Flask.\n\n## Dependencies (server)\nThis application relies heavily on\n[Docker](https://docs.docker.com/install/)\nfor quick installation and deployment. All dependencies are already satisfied if you use the provided Docker containers.\n\nHowever, if you are unable to use Docker, the server has the following dependencies:\n\n1. Database\n   1. [Couchbase Server](https://www.couchbase.com/downloads) v6.5.0 Enterprise\n2. API\n   1. [Python](https://www.python.org/downloads/) v3.7\n   2. [Couchbase C Client](https://docs.couchbase.com/c-sdk/2.10/start-using-sdk.html) v2.10.5\n   3. [Couchbase Python Client](https://docs.couchbase.com/python-sdk/current/start-using-sdk.html) v2.9.5\n\nYou can also refer to the Dockerfiles under `backend/` for the installation steps.\n\n## Dependencies (client)\n\n```bash\npip install geo-cache-client\n```\nAlternatively, you can make HTTP requests to the API directly, as described in `geo_cache_client/cache_client.py`.\n\n## Installation (demo)\n\nIn an enterprise production environment, the components of this application are likely to be deployed on different nodes, possibly on different machines. The final deployment therefore depends on the company's back-end architecture and DevOps practices. 
For demo purposes, we provide a sample application in which all components run on the same machine, as a starting point for developers.\n\nTo keep things simple, all clusters and nodes share the same credentials, geolocations are stored in the settings, and we use Docker to obtain the proper IP addresses. In a real environment, this information would likely be handled differently.\n\nTo set up the demo back-end cluster, run:\n\n```bash\ngit clone https://github.com/timbo-rafa/geo-cache\ncd geo-cache\n# set credentials\nexport CB_REST_USERNAME=\"Administrator\"\nexport CB_REST_PASSWORD=\"password\"\nbash scripts/deploy-database.sh\nbash scripts/deploy-api.sh\n```\nTo view the database dashboard, open the Couchbase Web Console at http://localhost:8091/ui/index.html.\n\nNext, install the client:\n\n```bash\npip install geo-cache-client\n```\n\nThe programs in the `examples` folder show some sample usage:\n\n```bash\npython examples/concurrency.py\npython examples/replication.py\n```\n\n## Features\n\n### 1 - Simple integration\n\n```bash\npip install geo-cache-client\n```\n\n### 2 - Resilient to network failures and crashes\n\nResilience is achieved through four properties:\n\n##### Data replication (within a cluster)\n\n\u003eReplicas provide protection against data loss by keeping copies of a bucket’s data on multiple servers.\n\nThe number of replicas can be set when a bucket is created or edited.\nFor our demo, we set `--bucket-replica 1`.\n\nFor more information, please see\n[bucket-create](https://docs.couchbase.com/server/6.5/cli/cbcli/couchbase-cli-bucket-create.html)\nor\n[bucket-edit](https://docs.couchbase.com/server/6.5/cli/cbcli/couchbase-cli-bucket-edit.html).\n\n##### Data persistence\nSetting `--bucket-type couchbase` makes Couchbase write the bucket's data to disk.\nFor more information, please see [bucket-create](https://docs.couchbase.com/server/6.5/cli/cbcli/couchbase-cli-bucket-create.html).\n\nData replication and 
persistence prevent our system from losing data when nodes crash or fail.\n\n##### Automatic failover\n\n\u003eFailover is a process whereby a failed node can be taken out of a cluster with speed.\n\nFor more information, please see\n[Failover](https://docs.couchbase.com/server/current/learn/clusters-and-availability/failover.html)\nand the\n[auto-failover command](https://docs.couchbase.com/server/4.5/cli/cbcli/setting-autofailover.html).\n\n##### Cross Data Center Replication (XDCR)\n\n[Cross Data Center Replication (XDCR)](https://docs.couchbase.com/server/6.5/manage/manage-xdcr/xdcr-management-overview.html)\ncontinuously replicates data from a bucket on one cluster to a bucket on another cluster, which may be located in a different geolocation.\nThis keeps our system available, and makes it even more fault-tolerant, should an entire data center become unavailable.\n\n### 3 - Near real-time replication of data across geolocations. Writes need to be in real time.\n\nThis requirement is met through [XDCR](https://docs.couchbase.com/server/6.5/manage/manage-xdcr/xdcr-management-overview.html).\n\n### 4 - Data consistency across regions\n\nAchieved through\n[XDCR](https://docs.couchbase.com/server/6.5/manage/manage-xdcr/xdcr-management-overview.html).\nYou can ensure consistency of concurrent updates by passing the\n[CAS](https://docs.couchbase.com/server/4.1/developer-guide/cas-concurrency.html)\nvalue from a previous operation to a `cache.set` operation: Couchbase rejects the write if the document has changed since that CAS value was obtained.\n\n### 5 - Locality of reference: data should almost always be available from the closest region\n\nSupported with\n[XDCR](https://docs.couchbase.com/server/6.5/manage/manage-xdcr/xdcr-management-overview.html).\nYou can look up the closest server with `GET /closest/\u003clat\u003e/\u003clong\u003e` and connect to it, so that data is written to and read from the closest region first.\n\n### 6 - Flexible Schema\n\nThe cache stores key-value pairs of strings and is agnostic to the actual data values. 
We can therefore \"stringify\" any object (for example, serialize it to JSON) and store it as a value, achieving a flexible schema.\n\nAdditionally, Couchbase is a NoSQL document-oriented database with a flexible schema of its own, should that be needed in further development.\n\n### 7 - Cache can expire\nWhen creating or editing a bucket, we can specify the maximum TTL (time-to-live), in seconds, for all documents in the bucket.\n\nTo set the TTL of cached data, set the environment variable `CB_TTL` on the cache API.\n\nFor more information, please see\n[bucket-create](https://docs.couchbase.com/server/6.5/cli/cbcli/couchbase-cli-bucket-create.html).\n\n### 8 - LRU\n\nCouchbase's default ejection policy for persistent buckets is `valueOnly`, which ejects values from memory while keeping all keys and metadata resident. With that in mind, memory eviction uses a simplified version of LRU,\n[not recently used (NRU)](https://docs.couchbase.com/server/4.1/architecture/db-engine-architecture.html#not-recently-used-nru-items).\n\n## Future Improvements\n\n1. Fine-grained credentials\n2. Geolocation processing\n3. Non-default Couchbase port\n4. `settings.py` for `cache_couchbase`\n5. Check that a node is up before returning it as the closest\n6. Select the database cluster with the fastest ping instead of the closest one (?)\n7. [Choose a threading strategy](https://docs.couchbase.com/python-sdk/2.0/threads.html)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimbo-rafa%2Fgeo-cache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimbo-rafa%2Fgeo-cache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimbo-rafa%2Fgeo-cache/lists"}