{"id":16938708,"url":"https://github.com/dsnet/tri-approx","last_synced_at":"2025-04-11T19:20:28.269Z","repository":{"id":49980616,"uuid":"41650582","full_name":"dsnet/tri-approx","owner":"dsnet","description":"Experiments in fixed-point approximation of trigonometric functions.","archived":false,"fork":false,"pushed_at":"2015-08-31T02:07:01.000Z","size":6820,"stargazers_count":9,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-25T15:12:17.668Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dsnet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-08-31T02:02:54.000Z","updated_at":"2024-05-05T16:50:16.000Z","dependencies_parsed_at":"2022-08-01T01:38:56.507Z","dependency_job_id":null,"html_url":"https://github.com/dsnet/tri-approx","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsnet%2Ftri-approx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsnet%2Ftri-approx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsnet%2Ftri-approx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsnet%2Ftri-approx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dsnet","download_url":"https://codeload.github.com/dsnet/tri-approx/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248465348,"owners_co
unt":21108244,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T21:02:13.710Z","updated_at":"2025-04-11T19:20:28.249Z","avatar_url":"https://github.com/dsnet.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fixed-point Sine and Cosine Approximator #\n\n## Introduction ##\n\nMost embedded hardware (microcontrollers, FPGAs, etc.) is too simple to come\nwith an integrated floating-point unit. However, a need for complex\ntrigonometric computations sometimes still arises. This experimental code\nattempts to numerically compute sine and cosine functions for use on FPGAs.\n\nThere are many ways to compute sine and cosine, and the most common algorithms\nare Taylor series and CORDIC. For this experiment, Taylor series was chosen\nover CORDIC because CORDIC uses a relatively large lookup table of constants\nto operate. This table occupied more memory than I was willing to sacrifice for\nmy application.\n\nSince this experiment was targeted towards Altera FPGAs, the bit-width of 18\nwill appear frequently in the example implementation. This is because Altera\nhardware multipliers have 18b-wide inputs and 36b-wide outputs.\nAs such, the sine and cosine functions take in a 20b unsigned fixed-point\ninteger upscaled by 2²⁰ and output an 18b two's complement fixed-point integer\nupscaled by 2¹⁷. Notice that the domain of the input is [0,1) and the range of\nthe output is [-1,+1). This is because the sine and cosine functions implemented\nhere are normalized such that 0 maps to 0.0 and 2π maps to 1.0. 
In other words,\nthe functions being implemented are actually sine(2πx) and cosine(2πx).\n\n\n## Theory ##\n\nIn a fixed-point approximation, it makes sense to make use of the entire input\ndomain. For that reason, I chose to emulate the normalized functions of\nsine(2πx) and cosine(2πx). This way, the entire range of a 20-bit unsigned value\nperfectly covers the input domain to sine or cosine for a full period rotation.\nThe use of normalized functions not only makes better use of the input,\nbut also simplifies the fixed-point arithmetic.\n\nThe approximation method used is based on the Taylor series, which represents\na function as an infinite sum of terms. The formula for\na Taylor series that represents function *f(x)* centered around point *a* is\nthe following power series:\n\n![eqn-taylor-series](doc/eqn-taylor-series.png)\n\nSpecifically, the type of Taylor series used is technically a Maclaurin series,\nsince the representation is centered at *a=0*. This means that the power series\nconverges fastest when *x* is closest to 0. In other words, in a power series of\na finite number of terms, the approximation will be most accurate where *x* is\nclosest to 0.\n\nUsing the formula for a Maclaurin series, the following power series\nrepresentations of sine and cosine were derived:\n\n![eqn-sine](doc/eqn-sine.png)\n\n![eqn-cosine](doc/eqn-cosine.png)\n\nSince this experiment is approximating trigonometric functions, which have many\nproperties of symmetry and reflection, the power series is only used to estimate\nthe values of sine or cosine for the region of [0.0,0.25); in the standard\ntrigonometric functions, this is the region of [0°,90°). The other parts of the\ntrigonometric functions are computed using the first region mirrored in various\nways to exploit the aforementioned properties. 
This allows the approximated\nfunctions to be significantly more accurate using fewer terms in the series.\n\nWith an 18-bit output, it was found that using only 4 non-trivial terms of the\npower series approximations was sufficient to achieve an average error that\nwas less than half the value represented by the\nleast-significant bit (LSB). Adding more terms would increase the amount of\nhardware resources needed to approximate the trigonometric functions with\ndiminishing returns on approximation accuracy.\n\nIn both the sine and cosine representations, the coefficients of the powers of\n*x* could be pre-computed. The following equation shows computation of\nthe 8 constants needed for sine (odd indexes) and cosine (even indexes):\n\n![eqn-constants](doc/eqn-constants.png)\n\nUsing only 4 non-trivial terms and the pre-computed constants shown above, the\nequations to compute the approximate sine and cosine values are as follows:\n\n![eqn-trig-approx](doc/eqn-trig-approx.png)\n\n\n## Implementation ##\n\nIn order to reduce the amount of hardware resources needed, the *k* constants\ndefined above were pre-computed. The following Python code demonstrates how the\nconstants were computed and upscaled to the largest value that fits within an\n18-bit unsigned integer. Upscaling is done since all the arithmetic performed\nis based on fixed-point math. 
The constants actually used in the example\nimplementation are not exactly the ones generated by the mini-script since some\nconstants were manually tweaked for accuracy.\n\n```python\nimport math\n\n# Scale each constant by the largest power of two that still fits in 18 bits.\nfor i in range(1, 9):\n    val = (2*math.pi)**i / math.factorial(i)\n    scale = int(math.log(2**18 / val, 2))\n    print(\"k%d = %d / 2^%d\" % (i, int(round(val*2**scale)), scale))\n```\n\nThe target platform was an FPGA, and the operations involved in a Taylor\nseries approximation allow the designs to be easily pipelined.\nIt was assumed that the hardware multipliers had the highest latency and would\ntake a single cycle to execute, while all addition and subtraction operations\ntogether could complete in a single cycle.\n\n![pipeline-sine](doc/pipeline-sine_lite.png)\n\nPipelined design of sine. Sine requires more registers to hold state between\nstages and thus uses more hardware resources than cosine.\nFurthermore, later analysis shows that sine is also less accurate.\n\n![pipeline-cosine](doc/pipeline-cosine_lite.png)\n\nPipelined design of cosine. Note that the stage to compute *x⁸* could be\neliminated since it could be computed in parallel with *x⁶* by squaring *x⁴*.\nThis extra stage was kept so that the pipeline lengths would be identical for\nsine and cosine computations.\n\nIn both FPGA designs, the pipeline length is 6 stages. The shaded green regions\nrepresent logic needed to do reflections and corrections, while the shaded blue\nregions represent the logic actually needed to do the Taylor series expansions.\nAlso, the constants shown are not upscaled like their fixed-point\ncounterparts. In general, the bit-width of the data lines is 18b.\nHowever, the wiring to shift the bits is not shown.\n\nThe *HIGH0* and *HIGH1* operators at the start of the pipelines extract the\nmost-significant bit and second most-significant bit, respectively. The *STRIP*\noperator removes the top two bits. 
The *CLAMP* operator at the end of both\npipelines performs the overflow check as shown in the C implementation.\n\n\n## Results ##\n\nUsing the values generated from the C implementation, we can plot the sine\nand cosine functions approximated by this method. In the graph below, it appears\nthat only a single sine and a single cosine are graphed. In actuality, the real,\nfloating-point approximation, and fixed-point approximation are graphed together.\nAt this resolution, the three curves are indistinguishable.\nThe floating-point approximation is the result obtained when the 4-term equations\nlisted above are computed using IEEE 754 floating-point units, while the\nfixed-point approximation is the result obtained using the 18b-wide fixed-point\narithmetic in the C implementation.\n\n![chart-approx](doc/chart-approx.png)\n\nWith 18 bits used for the fixed-point representation, the LSB maps to a quantum\nof about 7.63E-6. If we could approximate sines and cosines perfectly, we would\nexpect the maximum error to be no worse than half the quantum. We will define\nthis value of 3.81E-6 as the error epsilon, *ε*.\n\nHowever, since we are only using a 4-term approximation, we obviously cannot get\nmaximum errors below *ε*. For this reason, some of the *k* constants in the later\nterms were manually adjusted to compensate for the truncated higher-order\nterms. Manual tweaking of the constants did not follow any sort of rigorous\nmethod and was mainly done until the average and maximum errors \"seemed better\".\nIn tweaking the constants, there were two goals: improve the average error and\nimprove the maximum error. With sine and cosine, these two goals were at odds\nwith each other. Improving the average error made the maximum error\nworse, and vice versa. Choosing a good compromise took human intuition.\n\nThe graph below shows the errors in approximation using the tweaked constants\nthat we settled upon. 
The two horizontal green lines represent the values *+ε*\nand *-ε*. The red and blue lines that are smooth and solid represent the error\nwhen using a floating-point approximation. Note that these errors\nlie almost entirely within *±ε*. Lastly, the red and blue shaded regions\nrepresent the error when using the fixed-point approximation. These errors\ngenerally follow the trend of the floating-point error, but are worse due to\ntruncation errors.\n\n![chart-error](doc/chart-error.png)\n\nUsing a short C program to exhaustively compute all values in the input domain,\nwe could compute the average and maximum errors of our fixed-point\napproximations. The sine approximation had an average error of (3.08±2.22)E-6\nand a maximum error of 13.8E-6. The cosine approximation had an average error\nof (2.82±2.05)E-6 and a maximum error of 12.6E-6.\n\nGiven that the cosine design takes fewer hardware resources and is actually more\naccurate, it may make sense to implement cosine over sine. Sine itself can be\nobtained from cosine by shifting the input by 90° (0.25 in the normalized\ninput domain).\n\n\n## References ##\n\n* [FPGA Multiplier](http://www.altera.com/literature/hb/cyc2/cyc2_cii51012.pdf) - Embedded multipliers on Cyclone II devices\n* [Taylor Series](http://en.wikipedia.org/wiki/Taylor_series) - Wikipedia article on Taylor series\n* [CORDIC](http://en.wikipedia.org/wiki/CORDIC) - Wikipedia article on CORDIC\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsnet%2Ftri-approx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdsnet%2Ftri-approx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsnet%2Ftri-approx/lists"}