{"id":20340627,"url":"https://github.com/cleve/pulzar","last_synced_at":"2025-10-09T21:09:25.564Z","repository":{"id":43227637,"uuid":"249227850","full_name":"cleve/pulzar","owner":"cleve","description":"Scalable key-value database and manage/schedule job processes distributed","archived":false,"fork":false,"pushed_at":"2023-05-23T00:53:13.000Z","size":1520,"stargazers_count":1,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-04T14:49:21.373Z","etag":null,"topics":["api","backup","balance","database","distibuted","framework","job-scheduler","key-value","python","python3","restore","scalability","server","uwsgi","volumes"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cleve.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-22T16:46:09.000Z","updated_at":"2024-08-14T23:50:09.000Z","dependencies_parsed_at":"2025-01-14T17:42:24.535Z","dependency_job_id":"f6921e18-6768-4b35-9166-c0898f95182f","html_url":"https://github.com/cleve/pulzar","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cleve/pulzar","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleve%2Fpulzar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleve%2Fpulzar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleve%2Fpulzar/releases","manifests_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/cleve%2Fpulzar/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cleve","download_url":"https://codeload.github.com/cleve/pulzar/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleve%2Fpulzar/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279002074,"owners_count":26083285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","backup","balance","database","distibuted","framework","job-scheduler","key-value","python","python3","restore","scalability","server","uwsgi","volumes"],"created_at":"2024-11-14T21:22:55.206Z","updated_at":"2025-10-09T21:09:25.550Z","avatar_url":"https://github.com/cleve.png","language":"Python","readme":"# Pulzar\n\nIntended to be used in an internal network. 
Security will be added in the future.\n\n## Versioning\n\nThe version number matches the year.month.day of the release.\n\nPulzar has two components:\n\n### VariDB\n\nA distributed database system with load balancing, easy to recover and back up.\n\n### Job system\n\nA distributed job system with load balancing.\n\n## Uses\n\n* Configuration server.\n* Store large amounts of data, scalably.\n* Run jobs (Python scripts) in parallel.\n\n## Dependencies\n\n### The following Python modules are needed for the basic system\n\n- lmdb 1.1.1\n- requests 2.25.1\n- psutil 5.8.0\n- schedule 0.6.0\n- pillow 8.1.2\n\n## Configuration\n\nThe system can be configured under **config/server.conf**.\n\nThe configuration is pretty simple:\n\n```ini\n[server]\nhost=localhost\nport=31414\nkey=l415S4Nt05\n\n[volume]\n# Where to store files\ndir=/var/lib/pulzar/data\nport=31415\n\n[general]\nretention_policy=90\n# In MB\nmaxsize=5\n\n[jobs]\ndir=jobs\n\n[backup]\nactive=False\ntype=None\naddress=None\nuser=None\npsw=None\n```\n\n### Start system (DEV)\n\nIf you are on Ubuntu, remove the default **uwsgi** package and use \n**pip** to install the proper one.\n\nMake sure to run in DEBUG mode.\n
Under **app/pulzarutils/constants.py**:\n\nThis option allows you to use the local file directory under **app/storage**.\n\n```py\n# app/pulzarutils/constants.py\nself.DEBUG = True\n```\n\n```sh\ncd app\n# Start the master\nuwsgi --ini config/master.ini\n\n# Start the node\nuwsgi --ini config/volume.ini\n```\n\n# Methods\n\n## String values\n\n### Add key value\n\n```sh\n# master:[port]/add_key/{key}\ncurl -X PUT -L -T /path/to/file http://master:[port]/add_key/{key}\n\n# Or\ncurl --request PUT --location --data-binary '@/path/to/file' 'http://master:[port]/add_key/{key}'\n```\n\n### Add a key value for a limited time\n\nUse the **temporal** parameter.\n\n```sh\nmaster:[port]/add_key/{key}?temporal=[int:days]\ncurl -X PUT -L -T /path/to/file http://master:[port]/add_key/{key}?temporal={int}\n```\n\nThe int value indicates the number of days that the file will be available.\n\nFor large files, there is a more efficient way:\n\n1. Request the node URL for your key\n2. Use the URL from step 1 to upload the file\n\nExample:\n\n```sh\n# Request the url\nmaster:[port]/get_node/{key}\ncurl -X GET -L http://master:[port]/get_node/my_key.txt\n```\n\nResponse:\n\n```json\n{\n    \"data\": \n        {\n            \"node\": \"http://node:port/add_key/my_key.txt?url=master:port\"\n        },\n    \"status\": \"ok\",\n    \"msg\": \"ok\"\n}\n```\n\nUse the **node** URL to store the file:\n\n```sh\n# Upload the file to the node URL from the response\ncurl -X PUT -T /path/to/file/my_key.txt -L 'http://node:port/add_key/my_key.txt?url=master:port'\n```\n\n#### Snippets ####\n\n##### C# #####\n```csharp\n// Upload the file\nusing (WebClient wc = new WebClient())\n    {\n        try\n        {\n            string apiUrl = @\"http://master:[port]/add_key/path/my_key.key\";\n            wc.Headers.Add(\"Content-Type\", \"application/octet-stream\");\n            wc.Headers.Add(\"User-Agent\", \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0\");\n            byte[] result = wc.UploadFile(apiUrl, \"PUT\", filePath);\n            // Get string response; you should deserialize it.\n            string strResult = Encoding.UTF8.GetString(result);\n        }\n        catch (Exception ex)\n        {\n            // Handle ex.\n        }\n    }\n```\n\n##### Python #####\n\n```python\nimport requests\n\n# Upload the file\ntry:\n    with open('path/to/the/file.key', 'rb') as f:\n        req = requests.put(\n            url='http://master:[port]/add_key/path/my_key.key',\n            data=f,\n            headers={'Content-Type': 'application/octet-stream'}\n        )\nexcept Exception as err:\n    pass  # Handle error\n```\n\n### Read key value\n```sh\nmaster:[port]/get_key/{key}\ncurl -X GET -L http://master:[port]/get_key/{key}\n```\n\n### Remove key value\n```sh\nmaster:[port]/delete_key/{key}\ncurl -X DELETE -L http://master:[port]/delete_key/{key}\n```\n\n## Integrations\n\n### Extending the app\n\nSometimes you would like to add your own code, like some analysis over the data\nor even a totally new kind of process. This feature is intended to execute processing\nduring the request operation.\n\nBy default, the following libraries are ready to go:\n\n- pyodbc\n- opencv\n- pytesseract\n- Pillow\n- psycopg2\n\nTo do this, add a module into the\n***app/extensions/*** directory.\n
The extension must have a class with the\nname of the file **capitalized**.\n\nIf the extension is:\n\n```app/extensions/mysuperextension```\n\nthe class has to be named **Mysuperextension**.\n\nThe template of the file:\n\n```py\nfrom pulzarutils.extension import Extension\n\n\nclass Mysuperextension(Extension):\n    def __init__(self, arguments, params, file_path=None):\n        '''Receiving values\n            URL: http://master:[port]/extension/arg_1/arg_2/arg_n?param_1=1\u0026param_2=2\u0026param_n=n\n\n        arguments\n        ---------\n        arguments = ['arg_1', 'arg_2', 'arg_n']\n\n        parameters\n        ----------\n        params = {'param_1': [1], 'param_2': [2], 'param_n': [n]}\n        '''\n        pass\n\n    def execute(self):\n        '''Mandatory method\n\n        Return\n        ------\n        Python serializable: list or dictionary\n        '''\n        return []\n```\n\nwhere *arguments* is a list of strings provided in the URL.\n\nThe **execute** method must return a Python list or dictionary.\n\nTo call the custom extension you can use:\n\n```sh\nmaster:[port]/extension/{app_id}/{args}\ncurl -X GET -L http://master:[port]/extension/{app_id}/{arg1}/{arg2}/{arg_n}\n```\n\nwhere **app_id** is the script added to the *extensions* directory and **arg1, arg2, ..., arg_n**\nbecome a list of strings:\n\n```py\n['arg1', 'arg2', 'arg_n']\n```\n\n#### Example\n\nYou can find an example in the **extensions** directory:\n\n```py\n# File: example.py\nfrom pulzarutils.extension import Extension\n\n\nclass Example(Extension):\n    def __init__(self, arguments, params, file_path=None):\n        '''Receiving values\n            URL: http://master:[port]/extension/arg_1/arg_2/arg_n?param_1=1\u0026param_2=2\u0026param_n=n\n\n        arguments\n        ---------\n        arguments = ['arg_1', 'arg_2', 'arg_n']\n\n        parameters\n        ----------\n        params = {'param_1': [1], 'param_2': [2], 'param_n': [n]}\n        '''\n        self.args = arguments\n        self.params = params\n\n    def hello(self):\n        if len(self.args) \u003e 0:\n            print('Hello example with arg ', self.args)\n\n    def method_return(self):\n        return {'my_arg': self.args, 'my_params': self.params}\n\n    def execute(self):\n        '''Mandatory method\n        '''\n        self.hello()\n        return self.method_return()\n```\n\n#### Search extension\n\nA search utility is included to demonstrate the power of this tool.\n\nYou can search values using dates; the format is *mm-dd-yyyy*.\n\n```sh\n# Search a key\nmaster:[port]/extension/search/[key]\n\n# Search a key in a specific date\nmaster:[port]/extension/search/[key]?eq=[date]\n\n# Search a key lower and greater than a date\nmaster:[port]/extension/search/[key]?lt=[date]\u0026gt=[date]\n```\n\n#### OCR extension\n\nText detection and search feature.\n\n```sh\n# Get text\nmaster:[port]/extension/ocr/[image_name]\ncurl -X PUT -L -T /path/to/file http://master:[port]/extension/ocr/[image_name]\n\n# Search text in the image\nmaster:[port]/extension/ocr/[image_name]?search=text\u0026invert=[0|1]\ncurl -X PUT -L -T /path/to/file http://master:[port]/extension/ocr/[image_name]?search=text\u0026invert=[0|1]\n```\n\nResponse\n\n```json\n{\n    \"data\": {\n        \"text\": \"ubuntu\\n\\f\"\n    },\n    \"status\": \"ok\",\n    \"msg\": \"\"\n}\n```\n\n#### Image Match extension\n\nSearch for a sub-image within a base image.\n\n```sh\n# Search a sub-image\nmaster:[port]/extension/imagematching?image_url=[URI]\ncurl -X PUT -L -T /path/to/file http://master:[port]/extension/imagematching?image_url=[URI]\n\n# Search a sub-image with a match percent\nmaster:[port]/extension/imagematching?image_url=[URI]\u0026percent=90\ncurl -X PUT -L -T /path/to/file http://master:[port]/extension/imagematching?image_url=[URI]\u0026percent=90\n```\n\nResponse\n\n```json\n{\n    \"data\": {\n        \"found\": true,\n        \"percent_of_match\": 0.9,\n        \"coordinates\": {\n  
          \"x\": 242,\n            \"y\": 32,\n            \"w\": 800,\n            \"h\": 409\n        },\n        \"msg\": null\n    },\n    \"status\": \"ok\",\n    \"msg\": \"\"\n}\n```\n\n## Jobs\n\nYou can launch jobs on the nodes. Similar to extensions, there is a directory \nused to store the scripts.\n\nThe job directory can be changed in the configuration file. By default \nthe system is set to the **jobs** directory.\n\n```\napp/launch_job/[custom_directory]/[your_script].py\n```\n\nThe API:\n\n```sh\n# POST\nmaster:[port]/launch_job/[custom_directory]/[your_script]\n```\n\n#### Body\n\n```json\n{\n    \"arg1\": \"value1\",\n    \"arg2\": 123\n}\n```\n\n### Scheduling jobs\n\nTo schedule a job, add the *scheduled* key to the body.\n\n#### Body\n\n```json\n{\n    \"arg1\": 12,\n    \"arg2\": 225798,\n    \"scheduled\": {\"interval\": \"minutes\", \"time_unit\": 5, \"repeat\": 1}\n}\n```\n\nWhere:\n\n**interval**\n\nThe repetition interval; this string can be:\n\n* minutes\n* hours\n* weeks\n\n**time_unit**\n\nIndicates the repetition time based on the interval type. For example:\n\n    interval = minutes\n    time_unit = 5\n\nLaunch a job every 5 minutes\n\n    interval = hours\n    time_unit = 24\n\nLaunch a job every day\n\n#### Using data stored in a node\n\nIf you have data in one of the nodes and the data needs to be processed, you can use\nan extra key in the parameters in order to use that node and avoid downloading/transferring\ndata.\n
The key is named **pulzar_data** and should include the complete key of the data.\n\n```json\n{\n    \"arg1\": 12,\n    \"arg2\": 225798,\n    \"scheduled\": {\"interval\": \"minutes\", \"time_unit\": 5, \"repeat\": 1},\n    \"pulzar_data\": \"/path/to/my_key.key\"\n}\n```\n\n#### Cancel jobs\n\n```sh\n# POST\nmaster:[port]/cancel_job/job_id\n```\n\n# Maintenance\n\n## System information\n\nAll the API responses are formed as:\n\n```json\n{\n    \"data\": {\n        \"my_data_0\": 0,\n        \"my_data_1\": 1,\n        \"my_data_2\": \"2\",\n        \"my_data_n\": [1,2]\n    },\n    \"msg\": \"\",\n    \"status\": \"ok\"\n}\n```\n\nwhere the **data** key can contain any JSON.\n\n### Get master status\n\n```sh\nmaster:[port]/admin/status\ncurl -X GET -L http://master:[port]/admin/status\n```\n\nThe response is a binding of LMDB info.\n\n```json\ndata: {\n    \"psize\": 4096,\n    \"depth\": 2,\n    \"branch_pages\": 1,\n    \"leaf_pages\": 7,\n    \"overflow_pages\": 0,\n    \"entries\": 600\n}\n```\n\n### Get network status\n\n```sh\nmaster:[port]/admin/network\ncurl -X GET -L http://master:[port]/admin/network\n```\n\nA JSON list will be sent, of type:\n\n```json\ndata: [\n    {\n        \"node\": \"node_name\",\n        \"percent\": 13,\n        \"synch\": true\n    }\n]\n```\n\n### Get node status\n\n```sh\nmaster:[port]/admin/network/{node_id}\ncurl -X GET -L http://master:[port]/admin/network/{node_id}\n```\n\nA JSON will be sent, of type:\n\n```json\ndata: {\n    \"node\": \"node_name\",\n    \"percent\": 13,\n    \"synch\": true\n}\n```\n\n### Get job status\n\n```sh\nmaster:[port]/admin/jobs\ncurl -X GET -L http://master:[port]/admin/jobs\n```\n\nA JSON will be sent, of type:\n\n```json\ndata: {\n    \"pendings\": [\n        {\n            \"job_id\": 21,\n            \"job_name\": \"example_01\",\n            \"parameters\": \"{\\\"arg1\\\": \\\"12\\\", \\\"arg2\\\": \\\"20\\\"}\",\n            \"node\": \"mauricio-ksrd\",\n            \"creation_time\": 0\n        }\n    ],\n    \"ready\": [\n        {\n            \"job_id\": 19,\n            \"job_name\": \"example_01\",\n            \"parameters\": \"{\\\"arg1\\\": \\\"12\\\", \\\"arg2\\\": \\\"20\\\"}\",\n            \"node\": \"mauricio-ksrd\",\n            \"creation_time\": 1\n        },\n        {\n            \"job_id\": 26,\n            \"job_name\": \"example_01\",\n            \"parameters\": \"{\\\"arg1\\\": \\\"12\\\", \\\"arg2\\\": \\\"20\\\"}\",\n            \"node\": \"mauricio-ksrd\",\n            \"creation_time\": 1\n        }\n    ],\n    \"failed\": [\n        {\n            \"job_id\": 4,\n            \"job_name\": \"example_01\",\n            \"parameters\": \"{\\\"arg1\\\": \\\"1\\\", \\\"arg2\\\": \\\"2\\\", \\\"arg3\\\": \\\"33\\\"}\",\n            \"node\": \"mauricio-ksrd\",\n            \"creation_time\": 2\n        },\n        {\n            \"job_id\": 5,\n            \"job_name\": \"example_01\",\n            \"parameters\": \"{\\\"arg1\\\": \\\"1\\\", \\\"arg2\\\": \\\"2\\\", \\\"arg3\\\": \\\"33\\\"}\",\n            \"node\": \"mauricio-ksrd\",\n            \"creation_time\": 2\n        }\n    ],\n    \"scheduled\": [\n        {\n            \"job_id\": 20,\n            \"job_name\": \"example_01\",\n            \"parameters\": \"{\\\"arg1\\\": 12, \\\"arg2\\\": 225798}\",\n            \"creation_time\": \"2020-08-16 11:50:55.276460\",\n            \"interval\": \"minutes\",\n            \"time_unit\": \"5\",\n            \"repeat\": 1,\n            \"next_execution\": \"2020-08-16 14:23:53.398787\"\n        }\n    ]\n}\n```\n\n## Auto-backup\n\nAn auto-backup can be configured in the configuration file, under the *backup* section.\n\n## Backup\n\nTo back up the data, you only need to save the directory configured in the config file.\nThe example shows **/tmp/volume**.\n
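For example, assuming the **/tmp/volume** path from the example above (adjust it to your configured **dir**), a plain tar archive is enough; this command is a sketch, not part of Pulzar itself:\n\n```sh\n# Create the directory if missing (no-op on an existing installation)\nmkdir -p /tmp/volume\n# Archive the volume data directory into a dated backup file\ntar -czf pulzar_backup_$(date +%Y%m%d).tar.gz -C /tmp volume\n```\n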
So you can simply tar or zip the files and move them to another \nplace.\n\n## Restore\n\nIt's pretty simple; just follow these steps:\n\n### Volume restoration\n\n1. If it's a fresh installation, make sure to fill in the volume configuration under the **app/config**\ndirectory. If not, go to step **2**.\n\n2. To restore the files, just untar the files\npreviously backed up into the configured directory.\n\n3. Start the volume server.\n\n### Master restoration\n\n1. If it's a fresh installation, make sure to fill in the master configuration under the **app/config**\ndirectory.\n\n2. Start the master server.\n\n3. Use the *manage.py* utility to sync the volumes with the master.\n\n```sh\npython3 manage.py --restore [volume_url]\n```\n\nLimitation: the datetime will be lost with this action.\n\n# Internal methods\n\nUsed internally to sync:\n\n```sh\nmaster:[port]/skynet/\nvolume:[port]/autodiscovery/\n```\n\n# Dev\n\nTo run the DB locally, point your machine name to 127.0.0.1 in the \n**/etc/hosts** file.\n\nTo debug faster, I created an app to view values from LMDB.\n
This app runs on Java 11.\n\nhttps://github.com/cleve/lmdb-viewer\n\n## Keys\n\nKeys will be encoded in base64; only ASCII chars are allowed.\n\n## Docker\n\n### Master\n\n```sh\n# From the root directory\ndocker build --rm -f dockers/Dockerfile.main -t pulzar-master:latest .\n\n# Run it\ndocker run --hostname [host] --name [name] --rm -d -p 31414:31414 pulzar-master:latest\n```\n\n### Volume\n\n```sh\n# From the root directory\ndocker build --rm -f dockers/Dockerfile.node -t pulzar-node:latest .\n\n# Run it\ndocker run --hostname [host] --name [name] --rm -d -p 31415:31415 pulzar-node:latest\n```\n\n# Test\n\nFor test purposes, files of 1 KB were used.\n\n## Write tests\n\n### Synchronous executions\n\nFor a set of 10000 instances:\n\n* Request time: 0.0382 s\n* Total time: 0.0632 s\n\n## Read tests\n\n### Executions\n\n* Request time: 0.0019 s\n* Total time: 0.0057 s\n\n## Delete tests\n\n* Request time: 0.0039 s\n* Total time: 0.0121 s\n\n## Restore test\n\nPreparing 600 files: 23.98 s\n\n## Windows\n\nYou can use the Windows Subsystem for Linux.\n
Tested with Ubuntu 20.04.\n\nFirst, install pip3 using:\n\n```sh\nsudo apt install python3-pip\n```\n\nThen install **uwsgi** with pip3:\n\n```sh\nsudo pip3 install uwsgi\n```\n\n# Test docker locally\n\nYou can try the public repos on Docker Hub:\n\n## For Linux\n```sh\n# Run UI\ndocker run -it --name pulzar-ui -d --rm -p 80:80 mauriciocleveland/pulzar-ui:1.0.1\n\n# Run master\ndocker run --network host --name pulzar-master --rm -d mauriciocleveland/pulzar-master:1.0.1\n\n# Run node\ndocker run --network host --name pulzar-node --rm -d mauriciocleveland/pulzar-node:1.0.1\n```\n\n## For Windows\n\nSame commands, but make sure to set the **host** option to **docker.for.win.localhost** in the **config/server.conf** file:\n\n```ini\n[server]\nhost=docker.for.win.localhost\nport=31414\nkey=l415S4Nt05\n...\n```\n\n# Logs\n\nYou can define the log level under **app/pulzarutils/constants.py**:\n\n```py\n# INFO, DEBUG, ERROR\nself.DEBUG_LEVEL = 'DEBUG'\n```\n\nIn production, the logs can be found at ```/var/lib/pulzar/log/```.\n\nFor errors, you can access the docker container or use a volume to mount the logs\ninto the host machine.\n\n# Production\n\nYou can deploy the system using Docker.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcleve%2Fpulzar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcleve%2Fpulzar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcleve%2Fpulzar/lists"}