{"id":28580441,"url":"https://github.com/kfrancischen/galaxy","last_synced_at":"2025-08-19T13:17:59.605Z","repository":{"id":57432937,"uuid":"257481291","full_name":"kfrancischen/galaxy","owner":"kfrancischen","description":"Simple distributed file system based on gRPC","archived":false,"fork":false,"pushed_at":"2023-04-05T05:32:57.000Z","size":832,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-07T08:12:27.034Z","etag":null,"topics":["abseil","bazel","distributed-file-system","glog","grpc-cpp","opencensus","prometheus","protobuf","pybind11","rapidjson","rpc"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kfrancischen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-04-21T04:38:42.000Z","updated_at":"2024-12-24T19:48:26.000Z","dependencies_parsed_at":"2025-07-07T08:12:31.695Z","dependency_job_id":"c789cb87-a241-439c-ad1a-bdc60169b643","html_url":"https://github.com/kfrancischen/galaxy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kfrancischen/galaxy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kfrancischen%2Fgalaxy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kfrancischen%2Fgalaxy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kfrancischen%2Fgalaxy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/kfrancischen%2Fgalaxy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kfrancischen","download_url":"https://codeload.github.com/kfrancischen/galaxy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kfrancischen%2Fgalaxy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271159145,"owners_count":24709217,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abseil","bazel","distributed-file-system","glog","grpc-cpp","opencensus","prometheus","protobuf","pybind11","rapidjson","rpc"],"created_at":"2025-06-11T04:00:45.534Z","updated_at":"2025-08-19T13:17:59.549Z","avatar_url":"https://github.com/kfrancischen.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# galaxy\n\nSimple distributed file system based on gRPC.\n\n[![996.icu](https://img.shields.io/badge/link-996.icu-red.svg)](https://996.icu)\n\nThis is a toy-version of distributed file system developed based on gRPC, and it is also integrated with [pslx](https://github.com/kfrancischen/pslx). The core logic is developed with C++ and later wrapped with Python using [Pybind11](https://github.com/pybind/pybind11). 
The build system for the whole package is [bazel](https://bazel.build/). In addition, several public versions of Google internal tools are used: [abseil](https://abseil.io/docs/cpp/quickstart) and [glog](https://github.com/google/glog). [rapidjson](https://rapidjson.org/) is used to parse the config files. [prometheus](https://prometheus.io/) and [opencensus](https://opencensus.io/) are integrated to monitor the server-side RPCs.\n\n## Core concepts\n- `cell`: a cell is a machine that can be added as part of the filesystem, and is associated with a cell name. In galaxy, we use a two-letter name for a cell, e.g. `aa`. The path for a cell in the galaxy filesystem starts with `/galaxy/${CELL}-d/...`, where `${CELL}` is the name of the cell. To make a machine a cell in the filesystem, one just needs to launch the server code on the machine. Details are discussed in the next section.\n- `/LOCAL`: a path starting with `/LOCAL/...` instead of `/galaxy/${CELL}-d/...` is treated as a path on the cell where the galaxy server is hosted. For instance, if the server is hosted at cell `aa`, then `/LOCAL/...` is equivalent to `/galaxy/aa-d/...`, with one difference: a path with `/LOCAL/...` does not go through gRPC calls, and vanilla local IO is performed instead. This special indicator is useful when the user knows that the IO on the path is local.\n- `/SHARED`: a path starting with `/SHARED/...` is treated as a path on all the cells in the galaxy system. For instance, a folder named `/SHARED/test` will appear in every cell as `/galaxy/${CELL}-d/test`. This allows file updates to all cells with one function call.\n- `global config`: a configuration file containing configurations for each cell in the galaxy filesystem. 
An example is in [server_config_example.json](https://github.com/kfrancischen/galaxy/blob/master/example/cpp/server_config_example.json).\n\n## Server Entry Points\nThe entry point for the galaxy filesystem server is located at [server_main.cc](https://github.com/kfrancischen/galaxy/blob/master/cpp/server_main.cc). To run the binary, one can use the following cmd\n```shellscript\nbazel run //cpp:galaxy_server -- \\\n--fs_global_config=/home/pslx/galaxy/example/cpp/server_config_example.json \\\n--fs_cell=aa\n```\nWith the above cmd, the machine is added as cell `aa` with the configurations specified in the `server_config_example.json` file.\n\n## Client Python API\ngalaxy provides a unified API for clients to access both local and remote files. To build the python modules, please run the following cmd\n```shellscript\npython setup.py install\n```\n\nThe modules `gclient` and `gclient_ext` will be built as part of `galaxy_py`. The following functions are provided under the `gclient` module\n```python\ncreate_dir_if_not_exist(path, mode=0o777)\n```\n* Description: create a directory (along with its parent directories) if it does not exist.\n* Args:\n    1. path: the path to the directory\n    2. mode: the permission mode\n\n```python\ndir_or_die(path)\n```\n* Description: get the path to the directory if it exists, otherwise empty.\n* Args:\n    1. path: the path to the directory\n\n```python\nrm_dir(path)\n```\n* Description: remove a directory and the files in the directory.\n* Args:\n    1. path: the path to the directory\n\n```python\nrm_dir_recursive(path)\n```\n* Description: remove a directory recursively (including all the children files and directories).\n* Args:\n    1. path: the path to the directory\n\n```python\nlist_dirs_in_dir(path)\n```\n* Description: list all directories in a directory.\n* Args:\n    1. path: the path to the directory\n\n```python\nlist_files_in_dir(path)\n```\n* Description: list all files in a directory.\n* Args:\n    1. 
path: the path to the directory\n\n```python\nlist_dirs_in_dir_recursive(path)\n```\n* Description: list all directories in a directory and its subdirectories.\n* Args:\n    1. path: the path to the directory\n\n```python\nlist_files_in_dir_recursive(path)\n```\n* Description: list all files in a directory and its subdirectories.\n* Args:\n    1. path: the path to the directory\n\n```python\ncreate_file_if_not_exist(path, mode=0o777)\n```\n* Description: create a file (along with its parent directories) if it does not exist.\n* Args:\n    1. path: the path to the file\n    2. mode: the permission mode\n\n```python\nfile_or_die(path)\n```\n* Description: get the path to the file if it exists, otherwise empty.\n* Args:\n    1. path: the path to the file\n\n```python\nrm_file(path)\n```\n* Description: remove a file.\n* Args:\n    1. path: the path to the file\n\n```python\nrename_file(old_path, new_path)\n```\n* Description: rename a file on the same cell.\n* Args:\n    1. old_path: the path to the old file\n    2. new_path: the path to the new file\n\n```python\nread(path)\n```\n* Description: read a file (Note: the return is in the form of raw bytes).\n* Args:\n    1. path: the path to the file\n\n```python\nread_multiple(paths)\n```\n* Description: read a list of files (Note: the return is in the form of raw bytes).\n* Args:\n    1. paths: the paths to the files\n\n\n```python\nwrite(path, data, mode=\"w\")\n```\n* Description: write data to a file.\n* Args:\n    1. path: the path to the file\n    2. data: the data in string format\n    3. mode: `w` means overwrite and `a` means append.\n\n```python\nwrite_multiple(path_data_map, mode=\"w\")\n```\n* Description: write multiple data to files.\n* Args:\n    1. path_data_map: a map from path to the data to write.\n    2. mode: `w` means overwrite and `a` means append.\n\n```python\nget_attr(path)\n```\n* Description: get attribute information of a file or a directory.\n* Args:\n    1. 
path: the path to the file or directory\n\n```python\ncopy_file(from_path, to_path)\n```\n* Description: copy a file from from_path to to_path. Note these two paths could be in the same cell or different cells.\n* Args:\n    1. from_path: the path to the file\n    2. to_path: the path to the copied file\n\n```python\nmove_file(from_path, to_path)\n```\n* Description: move a file from from_path to to_path. Note these two paths could be in the same cell or different cells.\n* Args:\n    1. from_path: the path to the file\n    2. to_path: the path to the moved file\n\n```python\nlist_cells()\n```\n* Description: list all the cells in the galaxy system.\n\n```python\ncheck_health(cell)\n```\n* Description: check the health of a cell server.\n* Args:\n    1. cell: the cell name.\n\n```python\nremote_execute(cell, home_dir, main, program_args, env_kargs)\n```\n* Description: execute a cmd on a remote cell.\n* Args:\n    1. cell: the cell where the cmd will be executed.\n    2. home_dir: the home directory in which to execute the cmd.\n    3. main: the main program.\n    4. program_args: the arguments for the main program.\n    5. env_kargs: environment variables.\n\n```python\nis_local_path(path)\n```\n* Description: check whether a path is local or remote.\n* Args:\n    1. path: the path to be checked.\n\n```python\nbroadcast_shared_path(path, cells)\n```\n* Description: broadcast a shared path to cell paths.\n* Args:\n    1. path: the path to be broadcast. Has to start with `/SHARED/`.\n    2. cells: the cells to broadcast to.\n\n\nIn addition, the `gclient_ext` module provides a few extension functions\n\n```python\nwrite_proto_message(path, data, mode=\"w\")\n```\n* Description: write a protobuf message to a file.\n* Args:\n    1. path: the path to the file\n    2. data: the data in protobuf message format\n    3. 
mode: \"w\" means overwrite and \"a\" means append.\n\n```python\nwrite_proto_messages(path_data_map, mode=\"w\")\n```\n* Description: write multiple protobuf messages to files.\n* Args:\n    1. path_data_map: a map from path to the protobuf to write.\n    2. mode: \"w\" means overwrite and \"a\" means append.\n\n\n```python\nread_proto_message(path, message_type)\n```\n* Description: read a protobuf message file.\n* Args:\n    1. path: the path to the protobuf message file\n    2. message_type: the protobuf type of the message\n\n```python\nread_proto_messages(paths, message_type)\n```\n* Description: read a list of protobuf message files.\n* Args:\n    1. paths: the paths to the protobuf message files\n    2. message_type: the protobuf type of the message\n\n```python\nread_txt(path)\n```\n* Description: read a text file.\n* Args:\n    1. path: the path to the text file\n\n```python\nread_txts(paths)\n```\n* Description: read a list of text files.\n* Args:\n    1. paths: the paths to the text files\n\n```python\nlist_all_in_dir(path)\n```\n* Description: list all directories and files in a directory.\n* Args:\n    1. path: the path to the directory\n\n```python\nlist_all_in_dir_recursive(path)\n```\n* Description: list all directories and files in a directory and its subdirectories.\n* Args:\n    1. path: the path to the directory\n\n```python\ncopy_folder(from_path, to_path)\n```\n* Description: copy a folder from from_path to to_path. Note these two paths could be in the same cell or different cells.\n* Args:\n    1. from_path: the path to the folder\n    2. to_path: the path to the copied folder\n\n```python\nmove_folder(from_path, to_path)\n```\n* Description: move a folder from from_path to to_path. Note these two paths could be in the same cell or different cells.\n* Args:\n    1. from_path: the path to the folder\n    2. to_path: the path to the moved folder\n\n\n## Galaxy Logging\nGalaxy logging allows users to stream logs to different cells in a distributed fashion. 
The logger is obtained as follows\n```python\nfrom galaxy_py import glogging\nglogging.get_logger(log_name, log_dir, disk_only)\n```\n* Args:\n    1. log_name: the name of the logger\n    2. log_dir: the directory to save the logs.\n    3. disk_only: whether the log is only output to disk (not to the console). This can be controlled by the environment variable `GALAXY_logging_disk_only`. Default value is False.\n\nThe final log file will be in the format of `${log_dir}/${log_name}.${YY-MM-DD}.${LOG_LEVEL}.log`. The following is an example of using `glogging`:\n```python\nfrom galaxy_py import glogging\n\n\ndef main():\n    logger = glogging.get_logger('test', \"/galaxy/aa-d/ttl=1d\")\n\n    logger.info(\"this is an info test\")\n    logger.warning(\"This is a warning test\")\n    logger.error(\"This is an error test\")\n    logger.debug(\"This is a debug test\")\n    logger.critical(\"This is a critical test\")\n    logger.fatal(\"This is a fatal test\")\n\n\nif __name__ == \"__main__\":\n    main()\n```\n\n## Fileutil tool\nfileutil is an entry point for file operations across different cells. The entry point is located at [fileutil_main.cc](https://github.com/kfrancischen/galaxy/blob/master/cpp/tool/fileutil_main.cc). To build the binary, the bazel cmd is\n```shellscript\nbazel build -c opt //cpp/tool:fileutil\n```\n\nThe following cmds are supported:\n\n```shellscript\nfileutil ls ${DIR_NAME}\n```\n* Description: list all the contents in the remote directory.\n\n```shellscript\nfileutil cp_file ${FILE_1} ${FILE_2} [--f]\n```\n* Description: copy a file from `FILE_1` to `FILE_2`. Overwrite if `--f` is set.\n\n```shellscript\nfileutil move_file ${FILE_1} ${FILE_2} [--f]\n```\n* Description: move a file from `FILE_1` to `FILE_2`. Overwrite if `--f` is set.\n\n```shellscript\nfileutil cp_dir ${DIR_1} ${DIR_2} [--f]\n```\n* Description: copy a directory from `DIR_1` to `DIR_2`. 
Overwrite if `--f` is set.\n\n```shellscript\nfileutil move_dir ${DIR_1} ${DIR_2} [--f]\n```\n* Description: move a directory from `DIR_1` to `DIR_2`. Overwrite if `--f` is set.\n\n```shellscript\nfileutil rm ${REMOTE_DIR/REMOTE_FILE} [--r]\n```\n* Description: delete remote file/directory (recursively if `--r` is set).\n\n```shellscript\nfileutil lscells\n```\n* Description: list all cells in the galaxy system.\n\n## Flags\ngalaxy allows users to set the following flags to customize the server (mainly) and the client. These flags are declared in [galaxy_flag.h](https://github.com/kfrancischen/galaxy/blob/master/cpp/core/galaxy_flag.h) and defined in [galaxy_flag.cc](https://github.com/kfrancischen/galaxy/blob/master/cpp/core/galaxy_flag.cc). For servers, the flags `fs_root`, `fs_address` and `fs_password` must be specified, and their values are usually put in the global configuration file. Besides using the configuration file or the cmd line style that [abseil](https://abseil.io/docs/cpp/quickstart) supports, one can also specify the flags using the `GALAXY_${FLAG_NAME}` environment variable. For instance, setting `GALAXY_fs_root=/home` is equivalent to passing `--fs_root=/home` as a cmd line argument.\n\n## Extensions\n\n#### Galaxy Viewer\n\nA file browser extension is also implemented under [ext/viewer](https://github.com/kfrancischen/galaxy/tree/master/ext/viewer), which uses the Galaxy Python API and flask. The viewer can be launched with the following cmd\n\n```shellscript\npython galaxy_viewer.py --username=test --password=test --port=8000\n```\nand the viewer is hosted at `0.0.0.0:8000` with the preset username and password for login.\n\n#### Galaxy TTL Cleaner\n\nThe galaxy file system also comes with a built-in [ext/ttl_cleaner](https://github.com/kfrancischen/galaxy/tree/master/ext/ttl_cleaner) extension, where one can specify a path with `ttl=${N}d` or `ttl=${N}h` or `ttl=${N}m` for `N` days, hours, minutes, respectively. 
Capital letters `D`, `H` and `M` can also be used. If a path is associated with a valid ttl, galaxy will only keep the files in that path that are within the ttl lifetime.\n\nTo launch the batch, periodic ttl cleaner, please use the following command\n```shellscript\nbazel run -c opt //ext/ttl_cleaner:galaxy_ttl_cleaner -- --run_every=1\n```\nThe `run_every` argument is the sleep time (in minutes) between consecutive ttl cleaner runs. The default value is 10 (minutes).\n\n\n## Examples\nThe examples are in the [example](https://github.com/kfrancischen/galaxy/tree/master/example) folder, and the following is a Python example\n```python\nfrom galaxy_py import gclient, gclient_ext\nfrom hello_world_pb2 import TestMessage\nimport time\n\n\ndef main():\n    gclient.create_dir_if_not_exist(\"/galaxy/aa-d/test_from_python\")\n    print(gclient.list_dirs_in_dir(\"/galaxy/aa-d/\"))\n    print(gclient.list_dirs_in_dir(\"/home/pslx/Downloads\"))\n    print(gclient.get_attr(\"/home/pslx/Downloads\"))\n    print(gclient.get_attr(\"/galaxy/aa-d/test_from_python\"))\n    gclient.create_file_if_not_exist(\"/galaxy/aa-d/test_from_python/test.txt\")\n    gclient_ext.cp_file(\"/galaxy/aa-d/test_from_python/test.txt\",\n                        \"/galaxy/aa-d/test_from_python/test1.txt\")\n\n    gclient_ext.mv_file(\"/galaxy/aa-d/test_from_python/test.txt\",\n                        \"/galaxy/aa-d/test_from_python/test3.txt\")\n    print(gclient.list_dirs_in_dir_recursive(\"/galaxy/aa-d/test_from_python\"))\n    print(gclient.list_files_in_dir_recursive(\"/galaxy/aa-d/test_from_python\"))\n\n    t = time.time()\n    gclient.read(\"/galaxy/aa-d/large_test.txt\")\n    print(time.time() - t)\n\n    data = gclient.read_multiple([\"/galaxy/aa-d/test_from_python/test1.txt\", \"/galaxy/aa-d/test3.txt\"])\n    for key, val in data.items():\n        print(key, val)\n\n    data = gclient_ext.read_txts([\"/galaxy/aa-d/test_from_python/test1.txt\", \"/galaxy/aa-d/test3.txt\"])\n    for key, val in 
data.items():\n        print(key, val)\n\n    print(gclient.list_cells())\n    print(gclient.read('/galaxy/aa-d/test.pb'))\n    print(gclient_ext.read_txt('/galaxy/aa-d/test.pb'))\n    message = TestMessage()\n    message.name = \"test\"\n    gclient_ext.write_proto_message('/galaxy/aa-d/test1.pb', message)\n    print(gclient_ext.read_proto_message('/galaxy/aa-d/test1.pb', TestMessage))\n    print(gclient.check_health(\"aa\"))\n\n    gclient.write_multiple(\n        path_data_map={\n            '/galaxy/aa-d/test_from_python/test2.txt': '123',\n            '/galaxy/aa-d/test_from_python/test4.txt': '234',\n        },\n        mode='a'\n    )\n    gclient_ext.write_proto_messages(\n        path_data_map={\n            '/galaxy/aa-d/test_from_python/test2.pb': message,\n            '/galaxy/aa-d/test_from_python/test4.pb': message,\n        }\n    )\n\n\nif __name__ == \"__main__\":\n    main()\n\n```\nThe following is a C++ example:\n```cpp\n/* Example cmd\n* GALAXY_fs_global_config=/home/pslx/galaxy/example/cpp/server_config_example.json \\\n* bazel run -c opt //example/cpp:client_example -- --proto_test=/galaxy/aa-d/Downloads/test1/test.pb\n*/\n\n#include \u003ciostream\u003e\n#include \u003cstring\u003e\n#include \u003cvector\u003e\n\n#include \"cpp/client.h\"\n#include \"cpp/core/galaxy_flag.h\"\n#include \"glog/logging.h\"\n#include \"absl/flags/flag.h\"\n#include \"absl/flags/parse.h\"\n#include \"schema/fileserver.pb.h\"\n\nABSL_FLAG(std::string, mkdir_test, \"\", \"The directory for mkdir test.\");\nABSL_FLAG(std::string, rmdir_test, \"\", \"The directory for rmdir test.\");\nABSL_FLAG(std::string, createfile_test, \"\", \"The directory for createfile test.\");\nABSL_FLAG(std::string, proto_test, \"\", \"The directory for createfile test.\");\n\nint main(int argc, char* argv[]) {\n    absl::ParseCommandLine(argc, argv);\n    FLAGS_log_dir = absl::GetFlag(FLAGS_fs_log_dir);\n    google::InitGoogleLogging(argv[0]);\n    if 
(!absl::GetFlag(FLAGS_mkdir_test).empty()) {\n        galaxy::client::CreateDirIfNotExist(absl::GetFlag(FLAGS_mkdir_test));\n    }\n    if (!absl::GetFlag(FLAGS_rmdir_test).empty()) {\n        galaxy::client::RmDir(absl::GetFlag(FLAGS_rmdir_test));\n    }\n    if (!absl::GetFlag(FLAGS_createfile_test).empty()) {\n        galaxy::client::CreateFileIfNotExist(absl::GetFlag(FLAGS_createfile_test));\n        galaxy::client::Write(absl::GetFlag(FLAGS_createfile_test), \"hello world\");\n        std::cout \u003c\u003c galaxy::client::Read(absl::GetFlag(FLAGS_createfile_test)) \u003c\u003c std::endl;\n        std::cout \u003c\u003c galaxy::client::Read(\"/galaxy/aa-d/some_random_file\") \u003c\u003c std::endl;\n    }\n    if (!absl::GetFlag(FLAGS_proto_test).empty()) {\n        galaxy_schema::Credential cred;\n        cred.set_password(\"test\");\n        std::string cred_str;\n        cred.SerializeToString(\u0026cred_str);\n        galaxy::client::Write(absl::GetFlag(FLAGS_proto_test), cred_str);\n        std::string result = galaxy::client::Read(absl::GetFlag(FLAGS_proto_test));\n        galaxy_schema::Credential result_cred;\n        result_cred.ParseFromString(result);\n        std::cout \u003c\u003c result_cred.DebugString() \u003c\u003c std::endl;\n    }\n\n    return 0;\n}\n```\nThe `Python` APIs are recommended over the C++ ones.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkfrancischen%2Fgalaxy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkfrancischen%2Fgalaxy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkfrancischen%2Fgalaxy/lists"}