{"id":23064355,"url":"https://github.com/hiejulia/jhipster-distributed-system-computing","last_synced_at":"2025-04-28T16:16:47.782Z","repository":{"id":39941976,"uuid":"245819631","full_name":"hiejulia/jhipster-distributed-system-computing","owner":"hiejulia","description":"Jhipster in distributed computing","archived":false,"fork":false,"pushed_at":"2025-02-11T16:54:45.000Z","size":1613,"stargazers_count":3,"open_issues_count":7,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-28T16:16:36.628Z","etag":null,"topics":["design","design-patterns","distributed-computing","distributed-systems","event-sourcing","hpc-applications","kubernetes","kubernetes-cluster","large-scale","load-balancer","metrics","queue","queues","scalable","scaling","worker"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hiejulia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-08T13:28:34.000Z","updated_at":"2025-02-11T16:54:50.000Z","dependencies_parsed_at":"2025-02-08T21:43:06.433Z","dependency_job_id":"f8271bb0-ccd2-414c-a0cd-1fae5d426811","html_url":"https://github.com/hiejulia/jhipster-distributed-system-computing","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hiejulia%2Fjhipster-distributed-system-computing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hiejulia%2Fjhipster-distributed-system-computing/tags"
,"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hiejulia%2Fjhipster-distributed-system-computing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hiejulia%2Fjhipster-distributed-system-computing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hiejulia","download_url":"https://codeload.github.com/hiejulia/jhipster-distributed-system-computing/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251342725,"owners_count":21574245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["design","design-patterns","distributed-computing","distributed-systems","event-sourcing","hpc-applications","kubernetes","kubernetes-cluster","large-scale","load-balancer","metrics","queue","queues","scalable","scaling","worker"],"created_at":"2024-12-16T04:17:52.263Z","updated_at":"2025-04-28T16:16:47.776Z","avatar_url":"https://github.com/hiejulia.png","language":"Java","funding_links":["https://www.buymeacoffee.com/hientech"],"categories":[],"sub_categories":[],"readme":"\u003ca href=\"https://www.buymeacoffee.com/hientech\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/badge/-buy_me_a%C2%A0coffee-gray?logo=buy-me-a-coffee\" alt=\"Buy Me A Coffee\"\u003e\u003c/a\u003e\n  \u003cbr\u003e\n# jhipster-distributed-system-computing\n- Distributed system that can serve high load\n- scale system \n- HA\n- Handle failure \n- External distributed storage system for recovery \n\n\n## Description\n+ Netflix clone \n    + Live Video streaming 
service \n        + Architecture : Open Connect, CDN \n        + Transcode/ Encode service \n            + Validate service -\u003e Media pipeline -\u003e put chunk data into pipeline for parallel processing \n                + Archer : MapReduce platform for media processing that uses containers \n                    + Prores : \n                        + Detect dead pixels caused by a defective digital camera \n                        + ML to tag audio\n                        + QC for subtitles \n\n        + Search service : \n        + Data storage : Hadoop \n        + \n    + Live stream movie features (with only friends who also have a Netflix account)\n        + Streaming video content \n    + Social network \n        + Graph database \n    \n    + Billing service\n      + coupon service, invoice service, order service, payment service      \n    + Metrics service + Logging service     \n        + Kafka : distributed system monitoring \n            + Move data from Kafka to sinks : ElasticSearch, S3 \n        + Elasticsearch : set up 150 clusters - 3,500 instances hosting - 1.3 PB data \n        + Apache Chukwa : data collection system for monitoring large distributed systems - built on top of HDFS and Map/Reduce \n        + Time series database with Cassandra\n    + Authentication service \n        + KSQL streaming, Schema registry, Avro, Kafka, Java producer, C# consumer\n        + Credit card registry \u0026 email registry function\n        + Healthcheck stream producer \u0026 consumer service \n    - Image tagging \u0026 processing pipeline \n        - batch processing \n        - data : large collection of images \n        - work queue \n        - 1 worker detect \n        - 1 worker blur location of image \n        - worker containers into single container group \n        - maximize parallel processing : shard images across multiple worker queues \n        - join pattern to merge output of all sharded work queues into a single queue\n        - design a queue that applies the shard 
pattern to distribute the work \n            - 2 workers \n                - identify the location, type of each vehicle \n                - color a region \n                - apply filters \n        - multi worker pattern \n        - event driven \n    - background processing : transcode a video, compress log files, long running computation \n    + Architecture :\n        + AWS ELB : route traffic to front end service \n        + EVCache: sharded, multiple copies of the cache are stored in shared nodes \n\n        + Data \n            + Move 1 TB data from RAM to SSD \n            + DB : EC2 deployed MySQL : master-master - Sync replication protocol \n            + Cassandra : 500 nodes - 50 clusters \n            + \n        + Container scale : AWS Titus : \n        + Reactive - Akka \n        + Spring cloud :\n            + distributed messaging : Cloud Bus links the nodes of a distributed system with a lightweight message broker \n            + \n    + gRPC for 1 service : written with Go\n+ Distributed crawler \n\n\n## Principles for performance tuning \n- Understand your environment \n- TANSTAAFL!\n- throughput versus latency \n- DO NOT OVERUTILIZE A RESOURCE\n\n\n\n## Implementation \n- deployment service with kubernetes \n+ Distributed Cache server \n    + HazelCast distributed caching \n    - varnish distributed cache \n- replicated load balancer \n- nginx replicated \n- sharded caching with memcache(replica)\n    - twemproxy for Redis \n\n+ Distributed messaging \n    + ActiveMQ\n    + Kafka \n+ Distributed DB\n    + Data partition \n    + Riak \n    + Cassandra  \n    - Google Bigtable : Distributed Storage System for Structured Data\n+ Distributed file system : \n    + Hadoop, HDFS\n+ Distributed DNS \n+ Distributed proxy server\n+ Distributed web server \n+ Utilize cloud services: \n    + AWS : AWS ELB \n    - Google S2 geometry lib \n    - cloud native d d\n+ Network communication \n    + Async \n    + Axon framework : CQRS \n    + Web socket \n    + RMI, CORBA\n    + 
gRPC \n    - TChannel : network multiplexing and framing protocol for RPC \n    \n- distributed locking \n    - CAP \n    - handle concurrent data manipulation \n\n- distributed tracing, tracking, logging\n- distributed scheduling \n- distributed security \n- distributed messaging, queuing, event streaming \n- distributed search \n- distributed storage \n- CI/CD scaling \n- Monitor \u0026 benchmark \n     - Prometheus\n     - Fluentd : normalizing different logging formats \n     - https://github.com/mominosin/fluent-plugin-redis-slowlog\n     \n\n\n\n\n+ Distributed architecture  \n    + peer-to-peer;\n\n\t+ client/server;\n\t\t- multi-tier;\n\n\t+ mobile agents;\n\n    + Reactive \n        + Akka : HTTP, stream, clustering, sharding, actors \n        + Domain sourcing \n        + Distributed domain driven design \n        + CQRS \n        - event driven batch processing \n    \n    + Bulkhead pattern\n    + Distributed domain  \n- event sourcing architecture \n    - event driven batch processing \n        - distributed work queue \n\n- serverless architecture FaaS \n    - kubernetes native serverless framework : https://kubeless.io/\n    - kubeless install.\n    \n- Master-slave \n    - container crash - restart \n    - container hangs - health check - restart\n    - machine fails - container is moved to a different machine \n    - master election service \n        - distributed consensus algorithms : Paxos, Raft \n    - etcd\n- redis \n- thrift\n- nginx \n\n\n## Availability \n- Resilience engineering \n- Failover\n- LB\n- Rate limit \n- Autoscale\n- Global availability \n- HA \n- Circuit breaker\n- timeouts\n\n\n## Performance \n- OS, storage, database, network \n- Performance tuning with GC \n- Performance optimization with Image, video, page load \n- \n\n# Distributed cloud computing \n\n\n# Scalability \n+ Universal scalability laws \n    \n\n# Server clustering \n\n\n# LB \n\n# Testing \n+ Multi JVM testing \n\n# Container distributed application \n+ Debug a service 
running in a container \n    - container design for modularity \u0026 reusability \n+ minimized docker images using multi stage\n+ secure distributed app \n    + kubernetes secret \n    + secrets in env\n    + External secrets like HashiCorp Vault \n+ make service scale  \n+ techniques to increase resiliency \n+ availability check \n+ enable zero downtime updates \n+ Prod deployment \n    + Kubernetes pods, replicasets, deployment, services \n    + create template\n    + orchestrator \n    + deploy on premise/ cloud \n    + peek into 2 big corp hosted kubernetes SaaS : Microsoft azure \u0026 google cloud \n\n+ Prod \n    + self heal \n    + update service, avoid cascading failures \n    \n\n\n\n\n\n## Tech stack \n- C, C++ \n- Java, Spring, Spring cloud \n- Node.js, io.js\n- Python \n- Go lang\n- microservices\n- cache\n- kubernetes\n- KUDA\n- Data pipeline \n- cloud \n- Redis \n+ Active MQ \n+ Hazelcast \n+ Docker, Kubernetes on distributed system\n+ Architecture\n    + Reactive architecture: Java (Axon framework), Scala (Akka)\n    + Event sourcing architecture \n+ Database in distributed system \n\n\n\n## Reference to \n+ Book \n- https://github.com/binhnguyennus/awesome-scalability \n+ Research paper (Graduate, PhD level )\n+ Distributed system, large scale system : Uber, Netflix, Grab, AirBnB, Amazon, AWS, Google, Microsoft, Facebook, Apple \n+ Resource \n    + https://eng.uber.com/ureplicator-apache-kafka-replicator/\n    + https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy\n\n- Facebook scalability / distributed system paper\n    - Scaling backend authentication at facebook \n    - facebook distributed architecture : https://www.researchgate.net/publication/262689075_Overview_of_Facebook_scalable_architecture\n    - Inside the Social Network data center Facebook\n    - Building a billion user load balancer at Facebook https://www.youtube.com/watch?v=bxhYNfFeVF4\n    \n\n\n- Uber distributed system / scalability\n    - 
http://highscalability.com/blog/2015/9/14/how-uber-scales-their-real-time-market-platform.html\n    - Uber Marketplace Meetup: Using Distributed Locking to Build Reliable Systems\n\n\n- Netflix distributed system / scalability\n\n\n\n- Google distributed system / scalability\n    - Designing distributed system : Google case study \n        - web search \n            - deep search \n            - index, inverted index \n            - ranking - PageRank \n\n        - massively multiplayer online games\n        - financial trading \n\n    - Developing real world case studies \n    - Large scale cluster management at Google with Borg \n    - Google’s Data Architecture and What it Takes to Work at Scale\n    - Bigtable: A Distributed Storage System for Structured Data\n    \n\n\n- AirBnB distributed system / scalability\n\n\n- Microsoft distributed system / scalability\n\n\n- Amazon distributed system / scalability\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhiejulia%2Fjhipster-distributed-system-computing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhiejulia%2Fjhipster-distributed-system-computing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhiejulia%2Fjhipster-distributed-system-computing/lists"}