{"id":13702534,"url":"https://github.com/Spreads/Spreads","last_synced_at":"2025-05-05T04:31:08.728Z","repository":{"id":37768283,"uuid":"48335526","full_name":"Spreads/Spreads","owner":"Spreads","description":"Series and Panels for Real-time and Exploratory Analysis of Data Streams","archived":false,"fork":false,"pushed_at":"2023-04-16T20:28:15.000Z","size":18879,"stargazers_count":432,"open_issues_count":12,"forks_count":39,"subscribers_count":39,"default_branch":"main","last_synced_at":"2025-04-10T14:17:20.384Z","etag":null,"topics":["cep","data-stream","real-time","series-manipulation","time-series"],"latest_commit_sha":null,"homepage":"http://docs.dataspreads.io/spreads/","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Spreads.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.Dependencies.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2015-12-20T20:10:03.000Z","updated_at":"2025-03-19T10:53:00.000Z","dependencies_parsed_at":"2023-01-24T19:16:14.714Z","dependency_job_id":"6776a7b8-6ad5-4f8a-a974-6f7e2621ac4f","html_url":"https://github.com/Spreads/Spreads","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spreads%2FSpreads","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spreads%2FSpreads/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spreads%2FSpreads/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spreads%2FSpreads/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Spreads","downlo
ad_url":"https://codeload.github.com/Spreads/Spreads/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252439538,"owners_count":21748025,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cep","data-stream","real-time","series-manipulation","time-series"],"created_at":"2024-08-02T21:00:37.349Z","updated_at":"2025-05-05T04:31:07.444Z","avatar_url":"https://github.com/Spreads.png","language":"C#","readme":"# Spreads.Core\n\nSpreads.Core contains several high-performance features: buffer pools, optimized binary/interpolation search, collections, threading utils, etc.\n\n-----------------------------------\n**The \"series and panels\" part of the library is under very slow rewrite**\n\n**Below is a very old readme**\n\n# Spreads\n\u003cimg src=\"https://raw.githubusercontent.com/Spreads/Spreads.Docs/master/img/ZipN.png\" alt=\"Spreads\" width=\"200\" align=\"right\" /\u003e\n\nThe name **Spreads** stands for **S**eries and **P**anels for **R**eal-time and **E**xploratory **A**nalysis of\n**D**ata **S**treams.\n\n+ **Data Streams** are unbounded sequences of data items, either recorded or\narriving in real-time;\n+ **Series** are navigable ordered data streams of key-value pairs;\n+ **Panels** are series of series or data frames;\n+ **Exploratory** data transformation in C#/F# REPLs;\n+ **Real-time** fast incremental calculations.\n\nSpreads is an ultra-fast library for [complex event processing](https://en.wikipedia.org/wiki/Complex_event_processing)\n and time series manipulation.\nIt could process tens of millions 
of items per second per thread - historical and real-time data in the\nsame fashion, which allows building and testing analytical systems on historical data and using\nthe same code for processing real-time data.\n\nSpreads is a [library, not a framework](http://tomasp.net/blog/2015/library-frameworks/), and could\nbe plugged into existing code bases and used immediately.\nEven though the primary domain is financial data, Spreads was designed as a generic complex event processing library,\nwith a performance requirement that it must be suitable for ticks and full order log processing.\nThis is probably the largest data stream that cannot be meaningfully sharded: financial instruments\nare all directly or indirectly correlated and we need to monitor markets as a whole, while\nGoogle/Facebook and similar user event streams could be processed independently.\n\n\n\n## Performance\n\nThe Spreads library is optimized for performance and memory usage.\nIt is several times faster than other open source [projects](https://github.com/BlueMountainCapital/Deedle),\ndoes not allocate memory for intermediate calculations or windows,\nand provides real-time incremental calculations with low-latency lock-free synchronization\nbetween data producers and consumers. You could run tests and [benchmarks](https://github.com/Spreads/Spreads/blob/master/tests/Spreads.Tests/Benchmarks.fs)\nto see the exact numbers.\n\nFor regular keys - keys with an equal difference between them (e.g. seconds) - Spreads stores\nonly the first key and the step size, reducing memory usage for each `\u003cDateTime,T\u003e` data item by\n8 bytes. So a `\u003cDateTime,double\u003e` data item takes only 8 bytes inside a Spreads series instead of 16.\nThe gains of this optimization are not obvious on microbenchmarks with a single\nseries, and one could argue that memory is cheap. 
However, L1/L2/L3 caches\nare still small, and saving 50% of memory lets us place twice as much\nuseful data in the caches and avoid needless cache thrashing.\n\nThe Spreads library is written in C# and F# and targets .NET 4.5.1 and .NET Standard 1.6.\n.NET gives native performance when optimized for memory access patterns, which means\n no functional data structures and minimal allocations.\nEven though .NET is a managed platform with garbage collection, in a steady state Spreads\nshould not allocate many objects or create GC pressure.\n.NET properly supports generic value types, and arrays of them are laid out\ncontiguously in memory. Such a layout enables CPUs to prefetch data efficiently,\nresulting in a great performance boost compared to collections of boxed objects. Also, .NET makes\nit trivial to call native methods, and the *Spreads.Core* project\nuses SIMD-optimized compression and math libraries written in C.\n\nWe haven't compared Spreads performance to the performance of commercial systems yet\n(because their costs are atrocious and learning cryptic languages is not necessary).\nHowever, the main benchmark while developing Spreads was the capabilities of modern CPUs,\nnot any existing product. We tried to achieve mechanical sympathy, to avoid any wasteful\noperations and to get the most from modern processors. Therefore, unless the fastest commercial\nproducts use magic or quantum computers, Spreads must be in the same bracket.\n\n\n## Series manipulation and join\n\n### Continuous and discrete series\n\nSeries could be continuous or discrete. Continuous series have values at any key,\neven between observed keys. For example, linear interpolation or cubic splines are continuous series\ndefined from observed points. 
Another example is \"last price\", which is defined for any key as the observed\n price at or before the key.\n\n\n\u003cimg src=\"https://raw.githubusercontent.com/Spreads/Spreads.Docs/master/img/Continuous_Series.png\" alt=\"Continuous series\" width=\"500\" /\u003e\n\nDiscrete series have values only at observations/events, e.g. trade volume\nis meaningful only at observed trades; there are no implied latent volumes between trades. We could\ncreate a derived continuous series, e.g. `let liquidity = volume.SMA(N).Repeat()`, but this\nseries changes meaning from a real observed volume to an abstract analytical indicator of average\nliquidity over the last N observations.\n\n\n\u003cimg src=\"https://raw.githubusercontent.com/Spreads/Spreads.Docs/master/img/Discrete_Series.png\" alt=\"Discrete Series\" width=\"500\" /\u003e\n\nIn the pictures, a solid line means a continuous series, a dotted line means a discrete series, a solid blue dot\nmeans an observation, and a white dot with a blue outline means a calculated value of a continuous series\nat a key between observations.\n\n### Declarative lazy calculations\n\nOne of the core features of the Spreads library is declarative lazy series manipulation.\nA calculation on a series is not performed until results are pulled from the series. For example,\nthe expression `let incremented = series + 1.0` is not evaluated until the `incremented` series\nis used. Instead, it returns a calculation definition that could be\nevaluated on demand.\n\n#### Missing values replacement\n\nMissing values are really missing in Spreads, not represented as a special NA or option value.\nWhen missing values are present as special values, one needs to spend memory and CPU cycles to\nprocess them (and a lot of brain cycles to comprehend why missing values are somehow present, and not\nmissing).\n\nTwo of the most frequently used series transformations are `Repeat` and `Fill`. 
Calling them\non a discrete series returns a continuous series, where for each non-existing key we could get\na value from the key at or before the requested key for `Repeat` or a given value for `Fill`:\n\n    let repeated = sparseSeries.Repeat()\n    let filled = sparseSeries.Fill(0.0)\n\nThe returned series contains an infinite number of values defined for any key, but the values at\nnon-observed keys are calculated on demand and do not take any space.\n\n\n### ZipN\n\nZipN functionality is probably the most important part of Spreads.Core\nand it is shown on the Spreads logo.\nZipN supports declarative lazy joining of N series and in many\ncases replaces Frames/Panels functionality and adds\nreal-time incremental calculations over N joined series.\n\n\n\u003cimg src=\"https://raw.githubusercontent.com/Spreads/Spreads.Docs/master/img/ZipN.png\" alt=\"ZipN\" width=\"200\"  /\u003e\n\nAll binary arithmetic operations are implemented via a ZipN cursor with N=2.\nZipN always produces an inner join, but it is very easy to implement any complex\nouter join by transforming an input series from a discrete to a continuous one.\n\nFor example, imagine we have two discrete series (in pseudocode) `let upper = [2=\u003e2; 4=\u003e4]`\nand `let lower = [1=\u003e10; 3=\u003e30; 5=\u003e50]` that correspond to the picture. If we add them via the `+` operator,\nwe will get an empty series because there are no matching keys and an inner join returns an empty set.\nBut if we repeat the upper series, we will get two items, because the\nrepeated upper series is defined at any key at or after its first observation (key 1 is still missing, since `upper` has no value at or before it):\n\n    let sum = upper.Repeat() + lower // [3=\u003e2+30=32; 5=\u003e4+50=54]\n\nIf we then fill the lower series with 42, we will get:\n\n    let sum = upper.Repeat() + lower.Fill(42.0) // [2=\u003e2+42=44; 3=\u003e2+30=32; 4=\u003e4+42=46; 5=\u003e4+50=54]\n\nFor N series the logic remains the same. 
If we want to calculate a simple price index like DJIA\nfor each tick of the underlying stocks, we could take 30 tick series, repeat them (because ticks are irregular), apply `ZipN`\nand calculate the average of prices at any point:\n\n    let index30 : Series\u003cDateTime,double\u003e =\n        arrayOfDiscreteSeries\n        .Map(fun ds -\u003e ds.Repeat())\n        .ZipN(fun (k:DateTime) (vArr:double[]) -\u003e vArr.Average())\n\nThe values array `vArr` is not copied and the lambda must not return anything that has a\nreference to the array. If the arrays of zipped values are needed for further use outside\nthe zip method, one must copy the array inside the lambda. However, this is rarely needed,\nbecause we could zip outputs of zips and process the arrays inside the lambda without allocating\nmemory. For example, if we have series of returns and weights from applying Zip as before,\nthese series are not evaluated until values are requested, and when we zip them to calculate\nSumProduct, we will only allocate two arrays of values and one array of arrays (pseudocode):\n\n    let returns = arrayOfPrices\n        .Map(fun p -\u003e p.Repeat())\n        .ZipN(fun k (vArr:double[]) -\u003e vArr)\n        .ZipLag(1,(fun (cur:double[]) (prev:double[]) -\u003e cur.Zip(prev, (fun c p -\u003e c/p - 1.0)))) // the last zip is on arrays, must be eager\n    let weights = arrayOfWeights\n        .Map(fun p -\u003e p.Repeat())\n        .ZipN(fun k vArr -\u003e vArr)\n    let indexReturn =\n        returns.ZipN(weights.Repeat(), (fun k (ret:double[]) (ws:double[]) -\u003e SumProduct(ret, ws)))\n\nHere we violate the rule of not returning vArr, because it will be used inside the lambda of\nZipLag, which applies the lambda to current and lagged values and does not return references to\nthem. But for this to be true, Zip of arrays must be eager and we will have to allocate\nan array to store the result. 
We could change the example to avoid intermediate allocations:\n\n    let returns = arrayOfPrices\n        .Map(fun p -\u003e p.Repeat())\n        .ZipN(fun k (vArr:double[]) -\u003e vArr)\n        .ZipLag(1,(fun (cur:double[]) (prev:double[]) -\u003e ValueTuple(cur,prev)))\n    let weights = arrayOfWeights\n        .Map(fun p -\u003e p.Repeat())\n        .ZipN(fun k vArr -\u003e vArr)\n    let indexReturn =\n        returns.ZipN(\n            weights.Repeat(),\n            (fun k (ret:ValueTuple\u003cdouble[],double[]\u003e) (ws:double[]) -\u003e\n                    let currentPrices : double[] = ret.Item1\n                    let previousPrices: double[] = ret.Item2\n                    let currentWeights: double[] = ws\n                // imperative for loop to walk over three arrays\n                // and calculate returns and sumproduct with weight\n                // we need a single value and could get it in many\n                // ways without copying the arrays\n        )\n\nIn the last ZipN lambda we have three arrays: current prices, previous prices and current weights.\nWe could calculate the weighted return with them and return a single value. For each key, these arrays\nare refilled with new values and the last lambda is reapplied to the updated arrays.\n\nWhen all series are continuous, we get a full outer join and the resulting series will have\na union of all keys from the input series, with values defined by the continuous series constructor.\nBesides repeat/fill, this could be linear or spline interpolation, a forecast from\nmoving regression or any other complex logic that is hidden inside an input continuous\nseries. To the outside world, such a continuous series becomes defined at every point: the inner\njoin assumes that every key exists, and zipping works as expected, just as if we had precalculated\nevery point. 
But this works without allocating memory and also works in real-time for streaming\ndata.\n\n\n## Install\n\n    PM\u003e Install-Package Spreads\n\n\n## Contributing\n\nPRs \u0026 issues are welcome!\n\nThis Source Code Form is subject to the terms of the Mozilla Public\nLicense, v. 2.0. If a copy of the MPL was not distributed with this\nfile, You can obtain one at http://mozilla.org/MPL/2.0/.\n\n(c) Victor Baybekov, 2014-2017\n\n\n## Status and version\nCurrent status is alpha and we are actively working on [1.0-beta release](https://github.com/Spreads/Spreads/milestone/1). We will use [semantic versioning](http://semver.org/) after 1.0 release.\n\n## Links\n\n+ Twitter [@DataSpreads](https://twitter.com/DataSpreads)\n+ [Introducing Spreads library](http://hotforknowledge.com/2015/12/20/introducing-spreads-library/) about why and how Spreads library was born.\n+ [How to write the simplest trading strategy using Spreads](http://hotforknowledge.com/2015/12/29/how-to-write-the-simplest-trading-strategy-using-spreads/).\n+ [Technical introduction with pictures: updated slides from Feb'16 London F# Meetup.](https://github.com/Spreads/Spreads.Docs/blob/master/docs/20160603_Spreads_technical_introduction.pdf)\n","funding_links":[],"categories":["High Performance Libraries"],"sub_categories":["Application Insights"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSpreads%2FSpreads","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSpreads%2FSpreads","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSpreads%2FSpreads/lists"}