{"id":13735270,"url":"https://github.com/bgokden/veri","last_synced_at":"2025-08-14T03:31:32.668Z","repository":{"id":33133577,"uuid":"134308304","full_name":"bgokden/veri","owner":"bgokden","description":"Scalable Feature Store","archived":false,"fork":false,"pushed_at":"2023-07-05T21:26:19.000Z","size":1056,"stargazers_count":55,"open_issues_count":1,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-11-15T03:33:35.401Z","etag":null,"topics":["feature-store","golang","knn","knn-search","machine-learning"],"latest_commit_sha":null,"homepage":"http://www.veri.im","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bgokden.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-21T18:22:04.000Z","updated_at":"2024-01-04T16:23:13.000Z","dependencies_parsed_at":"2024-06-19T00:16:05.211Z","dependency_job_id":"de7f2f7b-c273-4faf-b5e0-4c8319d5ad75","html_url":"https://github.com/bgokden/veri","commit_stats":null,"previous_names":[],"tags_count":88,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgokden%2Fveri","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgokden%2Fveri/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgokden%2Fveri/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgokden%2Fveri/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bgokden","download_url":"https://codeload.github.c
om/bgokden/veri/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229795836,"owners_count":18125286,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["feature-store","golang","knn","knn-search","machine-learning"],"created_at":"2024-08-03T03:01:04.907Z","updated_at":"2024-12-15T08:14:52.396Z","avatar_url":"https://github.com/bgokden.png","language":"Go","funding_links":[],"categories":["Go","Feature Stores"],"sub_categories":[],"readme":"# Veri\n\n\n![](./resources/verilogo.svg)\n\nVeri is a Distributed Feature Store optimized for Search and Recommendation tasks.\n\nThe feature-label store holds features as keys and labels as values.\nValues can only be queried through a knn search over the features.\n\nVeri also supports creating sub sample spaces of data by default.\n\n\n# Probabilistically Scaling Vector Spaces\n\nVeri works as a cluster that holds a vector space of fixed dimension. It supports k nearest neighbour (knn) queries and can also return a sample of the space for use in a machine learning algorithm.\n\nVeri is currently in beta.\n\n*Veri means data in Turkish.*\n\nVeri is not a regular database; it is designed purely for use in machine learning. 
It does not guarantee the same result for every query.\n\nIn machine learning, data scientists usually convert data into a feature-label vector space; once the space is ready, the remaining work is almost always writing and optimising the algorithm.\n\nI have worked in different roles as a Data Engineer, Data Scientist and Software Developer. In many projects, I wanted a scalable approach to vector space search, which was not available. I wanted to combine data ingestion and data querying in one tool.\n\nVeri is meant to scale. Each Veri instance tries to synchronise its data with its peers and keep a statistically identical subset of the general vector space.\n\n## What does statistically identical mean?\n\nVeri keeps the average (center) of its data and a histogram of the Euclidean distances from the data points to that center.\nInstances keep exchanging data until their averages and histograms are close enough.\n\n## Knn querying\n\nVeri has an internal key-value store, but it also queries its neighbours and merges the results.\nThis is very similar to a map-reduce process done on the fly without planning.\n\nWhen a knn query is issued, Veri:\n\n- creates a unique hash,\n- starts a timer,\n- does a local knn search,\n- calls its peers to do the same with a smaller timeout,\n- merges the results into a map,\n- waits for the timeout and then runs a refine step on the result map,\n- and returns the result.\n\nIf a search with the same id is received again, the query is rejected to avoid infinite recursion. This behaviour will be replaced with cached results and a timeout check.\n\nEvery knn query has a timeout, and the timeout determines the precision of the result. Users can trade precision for time. In production, users usually want a predictable response time. 
Since every Veri instance keeps a statistically identical subset of the data, in most classification cases you will get the same result.\n\n## High Availability\n\nVeri replicates the data to its peers periodically, and data is persisted to disk so that it survives crashes.\n\nTODO:\n- Test multinode synchronisation.\n- Authentication.\n- Documentation.\n\n### Note:\nVeri uses [badger](https://github.com/dgraph-io/badger) internally. Many functions are made possible thanks to badger.\n\nContact me for any questions: berkgokden@gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbgokden%2Fveri","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbgokden%2Fveri","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbgokden%2Fveri/lists"}