{"id":17928512,"url":"https://github.com/evilsocket/sum","last_synced_at":"2025-06-24T22:40:44.326Z","repository":{"id":66844739,"uuid":"130066103","full_name":"evilsocket/sum","owner":"evilsocket","description":"A specialized database server for linear algebra and machine learning.","archived":false,"fork":false,"pushed_at":"2023-02-25T03:05:21.000Z","size":1234,"stargazers_count":86,"open_issues_count":3,"forks_count":10,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-12-30T21:41:53.713Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evilsocket.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.sh","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":null,"patreon":"evilsocket","open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2018-04-18T13:24:22.000Z","updated_at":"2024-10-30T18:43:27.000Z","dependencies_parsed_at":"2023-08-21T15:30:37.341Z","dependency_job_id":null,"html_url":"https://github.com/evilsocket/sum","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evilsocket%2Fsum","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evilsocket%2Fsum/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evilsocket%2Fsum/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/evilsocket%2Fsum/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/evilsocket","download_url":"https://codeload.github.com/evilsocket/sum/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233430341,"owners_count":18675067,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-28T21:03:37.548Z","updated_at":"2025-01-11T02:17:29.585Z","avatar_url":"https://github.com/evilsocket.png","language":"Go","readme":"# SUM\n\n[![Build](https://img.shields.io/travis/evilsocket/sum/master.svg?style=flat-square)](https://travis-ci.org/evilsocket/sum) \n[![Go Report Card](https://goreportcard.com/badge/github.com/evilsocket/sum)](https://goreportcard.com/report/github.com/evilsocket/sum) \n[![Coverage](https://img.shields.io/codecov/c/github/evilsocket/sum/master.svg?style=flat-square)](https://codecov.io/gh/evilsocket/sum) \n[![License](https://img.shields.io/badge/license-GPL3-brightgreen.svg?style=flat-square)](/LICENSE) \n[![GoDoc](https://godoc.org/github.com/evilsocket/sum?status.svg)](https://godoc.org/github.com/evilsocket/sum) \n[![Release](https://img.shields.io/github/release/evilsocket/sum.svg?style=flat-square)](https://github.com/evilsocket/sum/releases/latest) \n\nSum is a database server for linear algebra and machine learning, providing data persistence, fast in-memory operators with multiple backends (only `blas32` supported at the moment, `cuda` soon) and a scripting engine to access all of this with ease.\n\n## Installation\n\nDownload the 
[latest binary release](https://github.com/evilsocket/sum/releases/latest), then create the certificate used for authentication and channel encryption:\n\n\tsudo mkdir -p /etc/sumd/creds\n\tsudo openssl req -x509 -newkey rsa:4096 -keyout /etc/sumd/creds/key.pem -out /etc/sumd/creds/cert.pem -days 365 -nodes -subj '/CN=localhost'\n\nProceed to install the `sumd`, `sumcli` and `sumcluster` binaries:\n\n\tcd /path/to/extracted/sum\n\tsudo mkdir -p /var/lib/sumd/data\n\tsudo mkdir -p /var/lib/sumd/oracles\n\tsudo mv {sumd,sumcli,sumcluster} /usr/local/bin/\n\nTo install a single `sumd` node as a systemd service:\n\n\tsudo mv sumd.service /etc/systemd/system/\n\tsudo systemctl daemon-reload\n\n## Compile from Source\n\nInstall the [gRPC Go bindings](https://grpc.io/docs/quickstart/go/), then:\n\n    go get github.com/evilsocket/sum\n    cd $GOPATH/src/github.com/evilsocket/sum\n\nTo run the tests:\n\n    make tests\n\nTo run the benchmarks:\n\n    make benchmark\n\nTo compile and install:\n\n    make\n    sudo make install\n\n## Run a Node\n\n    sudo sumd -listen \"localhost:50051\" -creds /etc/sumd/creds -datapath /var/lib/sumd\n\n## Run a Master\n\n    sudo sumd -listen \"localhost:50051\" -master master.json\n\nWhere `master.json` contains the list of nodes that this master administers:\n\n```json\n{\n\t\"nodes\": [{\n\t\t\"address\": \"localhost:1000\",\n\t\t\"credentials\": \"/etc/sumd/creds/cert.pem\"\n\t}, {\n\t\t\"address\": \"localhost:1001\",\n\t\t\"credentials\": \"/etc/sumd/creds/cert.pem\"\n\t}, {\n\t\t\"address\": \"localhost:1002\",\n\t\t\"credentials\": \"/etc/sumd/creds/cert.pem\"\n\t}, {\n\t\t\"address\": \"localhost:1003\",\n\t\t\"credentials\": \"/etc/sumd/creds/cert.pem\"\n\t}, {\n\t\t\"address\": \"localhost:1004\",\n\t\t\"credentials\": \"/etc/sumd/creds/cert.pem\"\n\t}, {\n\t\t\"address\": \"localhost:1005\",\n\t\t\"credentials\": \"/etc/sumd/creds/cert.pem\"\n\t}, {\n\t\t\"address\": \"localhost:1006\",\n\t\t\"credentials\": 
\"/etc/sumd/creds/cert.pem\"\n\t}, {\n\t\t\"address\": \"localhost:1007\",\n\t\t\"credentials\": \"/etc/sumd/creds/cert.pem\"\n\t}]\n}\n```\n\n## Start a Cluster\n\nUse the `sumcluster` utility to spawn a specific number of worker nodes (by default one per logical CPU), each one with its own datapath, plus one master process:\n\n    sudo sumcluster\n\nIf you want to run the nodes bound to localhost but the master bound to another IP or domain, you need to create two sets of certificates. First, create one for the slave nodes:\n\n    sudo mkdir -p /etc/sumd/creds/localhost\n    sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/sumd/creds/localhost/key.pem -out /etc/sumd/creds/localhost/cert.pem -days 365 -nodes -subj '/CN=localhost'\n\nThen create another one for the master node, serving from `domain.com`:\n\n    sudo mkdir -p /etc/sumd/creds\n    sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/sumd/creds/key.pem -out /etc/sumd/creds/cert.pem -days 365 -nodes -subj '/CN=domain.com'\n\nYou can now start the cluster with:\n\n    sumcluster -address \"domain.com:50051\" -creds /etc/sumd/creds/localhost/cert.pem\n\nAnd connect to it with a client:\n\n    sumcli -address domain.com:50051 -name domain.com -cert /path/to/cert.pem -eval \"info; nlist; q\"\n\n## Client\n\nYou can access your Sum instance with the `sumcli` client; run `sumcli -eval \"help; q\"` to print a list of available commands. To get an idea of how the client side works, take a look at [the example Python client code](https://github.com/evilsocket/sumpy/blob/master/example.py), which creates a few vectors on the server, defines an oracle, calls it for every vector and prints the similarities returned by the server.\n\n## Why?\n\nIf you work with machine learning, you probably have a bunch of huge CSV files lying around that you \nkeep using to train your models, to run PCA on, or for any other sort of analysis. 
If this is the case, you \nknow the struggle of:\n\n* parsing and loading the file with `numpy`, `tensorflow` or whatever.\n* crossing your fingers that your laptop can actually store those records in memory.\n* running your algorithm.\n* ... waiting ...\n\nThis project is an attempt to make these tedious tasks (and many others) simpler, if not completely automated. Sum is a database and high-performance gRPC service offering three main things:\n\n1. Persistence for your vectors.\n2. A simple CRUD system to create, read, update and delete them.\n3. **Oracles**.\n\nAn **oracle** is a piece of JavaScript logic you want to run on your data. This code is sent to the Sum server by a \nclient, compiled and stored; it is then available for every client to use in order to \"query\" the data.\n\nFor instance, this is the `findSimilar` oracle definition:\n\n```js\n// Given the vector with id=`id`, return the other\n// vectors whose cosine similarity to the reference\n// one is greater than or equal to the threshold.\n// Results are given as a dictionary of:\n//      `vector_id =\u003e similarity`\nfunction findSimilar(id, threshold) {\n    var v = records.Find(id);\n    if( v.IsNull() == true ) {\n        return ctx.Error(\"Vector \" + id + \" not found.\");\n    }\n\n    var results = {};\n    records.AllBut(v).forEach(function(record){\n        var similarity = v.Cosine(record);\n        if( similarity \u003e= threshold ) {\n            results[record.ID] = similarity;\n        }\n    });\n\n    return results;\n}\n```\n\nOnce the oracle is defined on the Sum server, any client can execute calls like `findSimilar(\"some-vector-id-here\", 0.9)`. Such\ncalls are evaluated on data **in memory** to be as fast as possible, while the same data is persisted on disk \nas binary protobuf-encoded files.\n\nHere you can see the output of an example use case, finding behaviourally similar malware samples given a reference executable:\n\n\u003cimg 
src=\"https://raw.githubusercontent.com/evilsocket/sum/master/malware_pe.png\" /\u003e\n\n\u003cimg src=\"https://raw.githubusercontent.com/evilsocket/sum/master/malware_elf.png\" /\u003e\n","funding_links":["https://patreon.com/evilsocket"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevilsocket%2Fsum","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevilsocket%2Fsum","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevilsocket%2Fsum/lists"}