{"id":23744698,"url":"https://github.com/tanyokwok/hdfslock","last_synced_at":"2026-03-06T19:30:17.870Z","repository":{"id":152426353,"uuid":"84143135","full_name":"tanyokwok/hdfsLock","owner":"tanyokwok","description":"A distributed lock based on hdfs","archived":false,"fork":false,"pushed_at":"2017-03-07T02:15:40.000Z","size":2,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-31T12:29:31.820Z","etag":null,"topics":["distributed-lock","hdfs-lock"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tanyokwok.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-07T02:09:12.000Z","updated_at":"2024-01-19T10:24:29.000Z","dependencies_parsed_at":"2023-04-29T23:53:34.479Z","dependency_job_id":null,"html_url":"https://github.com/tanyokwok/hdfsLock","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanyokwok%2FhdfsLock","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanyokwok%2FhdfsLock/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanyokwok%2FhdfsLock/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanyokwok%2FhdfsLock/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tanyokwok","download_url":"https://codeload.github.com/tanyokwok/hdfsLock/tar.gz/refs/head
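s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239895977,"owners_count":19714936,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-lock","hdfs-lock"],"created_at":"2024-12-31T12:25:26.894Z","updated_at":"2026-03-06T19:30:17.809Z","avatar_url":"https://github.com/tanyokwok.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Introduction\n============\n DistLock is a distributed lock mechanism built on HDFS.\n It is meant to make locking HDFS files convenient, but it only suits scenarios that do not demand high lock performance.\n + Main advantages: a distributed lock-creation mechanism and simple configuration\n\n\nRequirements\n------------\n+ HDFS must be installed\n+ pydoop must be installed\n\nDesign\n------\nModeled on the distributed locks of zookeeper, redis, and similar systems, and implemented on top of the pydoop hdfs api:\n+ Create a lock directory on hdfs (/path/to/lock)\n+ Each node (which has a unique ID) performs the following steps when requesting the lock\n    + Create a file at /path/to/lock/namd-ID.lock\n    + List the information for all files under /path/to/lock/\n    + Sort the files under /path/to/lock/ by creation time\n    + If the ID of the earliest-created file is the current node's ID, the node acquires the lock. Otherwise, acquisition fails\n+ To release the lock, delete all files under /path/to/lock\n\n\nRationale\n---------\nThe HDFS NameNode stores the metadata of HDFS files and is a single point of failure. Most importantly, file creation on the NameNode is a transactional, atomic operation: once any client successfully creates a file, all clients can see that file immediately. The strategy above relies on this atomicity to guarantee mutual exclusion of the distributed lock.\n\n\nCan creating a file serve as the lock flag?\n-------------------------------------------\nThis approach treats an HDFS file path as the contended resource: whoever first creates the lock file successfully holds the lock, and releasing the lock means deleting the file.\n+ Each process checks whether the lock file exists when requesting the lock\n+ If the lock file does not exist, the process creates it, and on success considers itself the lock holder\n\nThis approach (with pydoop) has a serious problem:\n+ Several processes may request the resource at the same time and all find that the lock file has not been created\n+ All of them then create the file, but none of the creations fails (pydoop's hdfs.open_file does not raise an error; it returns a handle to the already existing file)\n+ In the end several processes all hold the lock, which is clearly wrong.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftanyokwok%2Fhdfslock","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftanyokwok%2Fhdfslock","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftanyokwok%2Fhdfslock/lists"}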