{"id":20757890,"url":"https://github.com/amdmi3/flops","last_synced_at":"2025-04-29T23:29:31.327Z","repository":{"id":32433800,"uuid":"36011799","full_name":"AMDmi3/flops","owner":"AMDmi3","description":"flops.c benchmark by Al Aburto with some improvements","archived":false,"fork":false,"pushed_at":"2020-09-08T21:47:50.000Z","size":14,"stargazers_count":6,"open_issues_count":0,"forks_count":17,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-30T12:22:17.496Z","etag":null,"topics":["benchmark"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"puzzlet/pytranscode","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AMDmi3.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-21T12:35:46.000Z","updated_at":"2024-08-19T14:47:55.000Z","dependencies_parsed_at":"2022-07-08T12:17:02.764Z","dependency_job_id":null,"html_url":"https://github.com/AMDmi3/flops","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AMDmi3%2Fflops","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AMDmi3%2Fflops/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AMDmi3%2Fflops/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AMDmi3%2Fflops/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AMDmi3","download_url":"https://codeload.github.com/AMDmi3/flops/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251599280,"owners_count":21615498,"icon_
url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark"],"created_at":"2024-11-17T09:46:13.582Z","updated_at":"2025-04-29T23:29:31.301Z","avatar_url":"https://github.com/AMDmi3.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"-------\n   I have finally revised the flops.c program to version 2.0, which\n   addresses the concerns brought out over the last year or so (version\n   1.2c and earlier versions). Below is a discussion of the new flops.c\n   program (flops20.c) and some results for the HP 9000/730 and IBM\n   RS/6000 Model 550 systems.\n\n   Flops.c is a 'c' program which attempts to estimate your system's\n   floating-point 'MFLOPS' rating for the FADD, FSUB, FMUL, and FDIV\n   operations based on specific 'instruction mixes' (discussed below).\n   The program provides an estimate of PEAK MFLOPS performance by making\n   maximal use of register variables with minimal interaction with main\n   memory. The execution loops are all small so that they will fit in\n   any cache. Flops.c can be used along with Linpack and the Livermore\n   kernels (which exercise memory much more extensively) to gain further\n   insight into the limits of system performance. The flops.c execution\n   modules include various percent weightings of FDIV's (from 0% to 25%\n   FDIV's) so that the range of performance can be obtained when using\n   FDIV's. 
FDIV's, being computationally more intensive than FADD's or\n   FMUL's, can impact performance considerably on some systems.\n   \n   Flops.c consists of 8 independent 'modules' which, except for module\n   2, conduct numerical integration of various functions. Some of the\n   functions (sin(x) and cos(x)) are approximated using a power series\n   expansion accurate to 1.0e-14 over the integration interval. Module 2\n   estimates the value of pi based upon the Maclaurin series expansion of\n   atan(1). MFLOPS ratings are provided for each module, but the program's\n   overall results are summarized by the MFLOPS(1), MFLOPS(2), MFLOPS(3),\n   and MFLOPS(4) outputs.\n\n   The MFLOPS(1) result is identical to the result provided by all\n   previous versions of flops.c (flops12c.c and earlier versions). It is\n   based only upon the results from modules 2 and 3. Actually, on faster\n   machines, MFLOPS(1) from flops.c V2.0 is expected to provide more\n   accurate results since the number of iterations conducted (which is\n   reflected in the timing accuracy) is more tightly controlled than in\n   previous versions of flops.c.\n   \n   Two problems surfaced in using MFLOPS(1). First, it was difficult to\n   completely 'vectorize' the result due to the recurrence of the 's'\n   variable in module 2. This problem is addressed in the MFLOPS(2) result,\n   which does not use module 2, but maintains nearly the same weighting of\n   FDIV's (9.2%) as in MFLOPS(1) (9.6%). For scalar machines the MFLOPS(2)\n   results 'should' be similar to the MFLOPS(1) results. However, for\n   vector machines the MFLOPS(1) and MFLOPS(2) results may differ\n   considerably since the MFLOPS(2) result is expected to be more\n   completely vectorizable. The second problem with MFLOPS(1) centers\n   around the percentage of FDIV's (9.6%), which was viewed as too high for\n   an important class of problems. 
This concern is addressed in the\n   MFLOPS(3) result which does only 3.4% FDIV's, and the MFLOPS(4) result\n   where NO FDIV's are conducted at all.\n   \n   The number of floating-point instructions per iteration (loop) is\n   given below for each module executed.\n\n   MODULE   FADD   FSUB   FMUL   FDIV   TOTAL  Comment\n     1        7      0      6      1      14   7.1%  FDIV's\n     2        3      2      1      1       7   difficult to vectorize.\n     3        6      2      9      0      17   0.0%  FDIV's\n     4        7      0      8      0      15   0.0%  FDIV's\n     5       13      0     15      1      29   3.4%  FDIV's\n     6       13      0     16      0      29   0.0%  FDIV's\n     7        3      3      3      3      12   25.0% FDIV's\n     8       13      0     17      0      30   0.0%  FDIV's\n   \n   A*2+3     21     12     14      5      52   A=5, MFLOPS(1), Same as\n\t    40.4%  23.1%  26.9%  9.6%          previous versions of the\n\t\t\t\t\t       flops.c program. Includes\n\t\t\t\t\t       only Modules 2 and 3.\n   \n   1+3+4     58     14     66     14     152   A=4, MFLOPS(2), New output\n   +5+6+    38.2%  9.2%   43.4%  9.2%          does not include Module 2,\n   A*7                                         but does 9.2% FDIV's.\n   \n   1+3+4     62      5     74      5     146   A=0, MFLOPS(3), New output\n   +5+6+    42.5%  3.4%   50.7%  3.4%          does not include Module 2,\n   7+8                                         but does 3.4% FDIV's.\n\n   3+4+6     39      2     50      0      91   A=0, MFLOPS(4), New output\n   +8       42.9%  2.2%   54.9%  0.0%          does not include Module 2,\n\t\t\t\t\t       and does NO FDIV's.\n\n   I hope that flops.c V2.0 (flops20.c) proves more useful than earlier\n   versions.\n\n\n(1) HP 9000/730 flops.c V2.0 Results, cc +OS +O3 -W1-a,archive   \n\n   Below are the HP 9000/730 results (provided by Bo Thide'). The minimum\n   MFLOPS rating is 15.1 MFLOPS for module 7, which does 25% FDIV's. 
The\n   maximum MFLOPS rating is 37.1 MFLOPS for module 6, which does 0.0%\n   FDIV's. FDIV appears to be reasonably efficient on the HP 9000/730,\n   as indicated by the overall MFLOPS(n) outputs. \n\n   The 'Runtime' output is the time in microseconds (usec) for one\n   iteration (loop) through the module. The MFLOPS rating is obtained by\n   dividing the number of floating-point instructions in the loop by the\n   Runtime (in microseconds). For example for module 1 below:\n   MFLOPS = 14.0 / 0.5978 = 23.42.\n\n   The Runtime output has already been adjusted for an estimate of the\n   time in microseconds to conduct one empty 'for' loop (NullTime). If\n   NullTime is not calculated (that is, NullTime = 0.0), due to compiler\n   optimization, it can produce a 3% to 5% lower MFLOPS rating than would\n   otherwise be obtained.\n\n\n   FLOPS C Program (Double Precision), V2.0 18 Dec 1992\n\n   Module     Error        RunTime      MFLOPS\n\t\t\t    (usec)\n     1     -4.6896e-13      0.5978     23.4187\n     2      2.2160e-13      0.2447     28.6079\n     3     -6.9944e-15      0.7412     22.9342\n     4     -9.7256e-14      0.6906     21.7195\n     5     -1.6542e-14      0.9200     31.5217\n     6      4.3632e-14      0.7822     37.0755\n     7     -4.9454e-11      0.7972     15.0529\n     8      7.2164e-14      0.8275     36.2538\n\n   Iterations      =   32000000\n   NullTime (usec) =     0.0306\n   MFLOPS(1)       =    26.4673  [same as flops12c.c, 9.6% FDIV's]\n   MFLOPS(2)       =    21.9633  [9.2% FDIV's]\n   MFLOPS(3)       =    27.2566  [3.4% FDIV's]\n   MFLOPS(4)       =    29.9188  [0.0% FDIV's]\n\n\n(2) IBM RS/6000 Model 550 flops.c V2.0 results, cc -DUNIX -O -Q\n\n   The IBM RS/6000 Model 550 flops20.c results are shown below. Here,\n   the minimum MFLOPS rating is 7.3 MFLOPS also for module 7 which does\n   25.0% FDIV's. The maximum MFLOPS rating is 56.9 MFLOPS (!) also for\n   module 6 which does 0.0% FDIV's. 
While the Model 550 works wonders\n   with FADD's and FMUL's, its performance falls off rapidly with FDIV's.\n\n\n   FLOPS C Program (Double Precision), V2.0 18 Dec 1992\n\n   Module     Error        RunTime      MFLOPS\n\t\t\t    (usec)\n     1     -4.6896e-13      0.7028     19.9200\n     2      2.2160e-13      0.5806     12.0560\n     3     -7.0499e-15      0.4372     38.8849\n     4     -9.7145e-14      0.4359     34.4086\n     5     -1.6542e-14      0.9903     29.2837\n     6      4.3632e-14      0.5100     56.8627\n     7     -4.9454e-11      1.6456      7.2921\n     8      7.2164e-14      0.5572     53.8418\n\n   Iterations      =   32000000\n   NullTime (usec) =     0.0484\n   MFLOPS(1)       =    15.5674  [same as flops12c.c, 9.6% FDIV's]\n   MFLOPS(2)       =    15.7370  [9.2% FDIV's]\n   MFLOPS(3)       =    27.6568  [3.4% FDIV's]\n   MFLOPS(4)       =    46.8997  [0.0% FDIV's]\n\nAl Aburto\naburto@nosc.mil\n\n-------\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famdmi3%2Fflops","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famdmi3%2Fflops","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famdmi3%2Fflops/lists"}