{"id":13473526,"url":"https://github.com/Sandia-OpenSHMEM/SOS","last_synced_at":"2025-03-26T19:34:20.184Z","repository":{"id":38815507,"uuid":"42528680","full_name":"Sandia-OpenSHMEM/SOS","owner":"Sandia-OpenSHMEM","description":"Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4, the Open Fabric Interface (OFI), and UCX.  Please click on the Wiki tab for help with building and using SOS.","archived":false,"fork":false,"pushed_at":"2025-03-18T17:16:09.000Z","size":5693,"stargazers_count":67,"open_issues_count":108,"forks_count":56,"subscribers_count":22,"default_branch":"main","last_synced_at":"2025-03-24T16:42:45.803Z","etag":null,"topics":["hpc","middleware","openshmem","parallel-computing","pgas"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sandia-OpenSHMEM.png","metadata":{"files":{"readme":"README","changelog":"NEWS","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-09-15T15:34:37.000Z","updated_at":"2025-03-24T09:35:19.000Z","dependencies_parsed_at":"2024-04-16T15:38:06.056Z","dependency_job_id":"dcd23d3a-4ca7-4679-a5dc-27d274b4757e","html_url":"https://github.com/Sandia-OpenSHMEM/SOS","commit_stats":{"total_commits":2507,"total_committers":48,"mean_commits":"52.229166666666664","dds":0.5823693657758278,"last_synced_commit":"99a1fd3b2a777f9ad9b3a5cb37edf9989d9cbeed"},"previous_names":[],"tags_count":36,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sandia-OpenSHMEM%2FSOS","tags_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sandia-OpenSHMEM%2FSOS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sandia-OpenSHMEM%2FSOS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sandia-OpenSHMEM%2FSOS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sandia-OpenSHMEM","download_url":"https://codeload.github.com/Sandia-OpenSHMEM/SOS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245722847,"owners_count":20661835,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hpc","middleware","openshmem","parallel-computing","pgas"],"created_at":"2024-07-31T16:01:04.441Z","updated_at":"2025-03-26T19:34:16.527Z","avatar_url":"https://github.com/Sandia-OpenSHMEM.png","language":"C","funding_links":[],"categories":["C"],"sub_categories":[],"readme":"Sandia OpenSHMEM\n----------------\n\n* About\n\nSandia OpenSHMEM is an implementation of the OpenSHMEM specification over\nPortals 4.0, the Open Fabrics Interface (OFI), and XPMEM.\n\nPlease refer to the \"tests-sos\" repository (https://github.com/openshmem-org/tests-sos)\nto download only the unit tests and the performance test suite that are\nincluded with Sandia OpenSHMEM.\n\n* Building\n\nThe Sandia OpenSHMEM implementation utilizes the GNU Autoconf/Automake/Libtool\ntools to generate a configure script.  If the `configure` file is not present\n(e.g. 
after downloading the repository for the first time), generate it\nby running:\n\n  $ ./autogen.sh\n\nOnce the configure file exists, run:\n\n  $ ./configure \u003coptions\u003e\n  $ make\n  $ make check\n  $ make install\n\nThe \"make check\" step is not strictly necessary, but is a good idea.  Make\ncheck utilizes the TEST_RUNNER and NPROCS make variables, which can be used to\noverride defaults, e.g. \"make check NPROCS=4\" or \"make check\nTEST_RUNNER='mpiexec -n 2 -ppn 1 -hosts compute1,compute2'\".\n\nSandia OpenSHMEM must be configured to use either the Portals 4 or OFI network\ntransport, but not both.  It can optionally be configured to use XPMEM or CMA\nto optimize communication between PEs within the same shared memory domain.\n\nOptions to configure include:\n\n  --prefix=\u003cDIR\u003e          Install implementation in \u003cDIR\u003e, default: /usr/local\n  --with-portals4=\u003cDIR\u003e   Find the Portals 4 library in \u003cDIR\u003e\n  --with-ofi=\u003cDIR\u003e        Find the libfabric library in \u003cDIR\u003e\n  --with-xpmem=\u003cDIR\u003e      Find the XPMEM library in \u003cDIR\u003e\n  --with-cma              Use cross-memory attach for on-node communication\n  --with-pmi=DIR          Location of PMI installation.  Configure will \n                          automatically look for the PMI runtime provided by\n                          the Portals 4 reference implementation\n  --enable-pmi-simple     Include support for interfacing with a PMI 1.0\n                          launcher.  The launcher must be provided by a\n                          separate package, such as MPICH, Hydra, or SLURM.\n  --enable-error-checking Enable error checking in SHMEM calls.  
This will\n                          increase the overhead of communication operations.\n  --enable-hard-polling   When using only the network transport, the\n                          implementation will use counting events to\n                          block the implementation when waiting for \n                          local memory changes.  On some implementations,\n                          enabling hard polling may increase target side\n                          message rate.\n  --enable-remote-virtual-addressing\n                          Enable optimizations assuming the symmetric heap is\n                          always symmetric with regards to virtual address.\n                          This may cause applications to abort during\n                          shmem_init() if such a symmetric heap can not be\n                          created, but will reduce the instruction count for\n                          some operations. This optimization also requires\n                          that the Portals 4 implementation support\n                          BIND_INACCESSIBLE on LEs.  This optimization will\n                          reduce the overhead of communication calls.\n  --disable-fortran       Disable the Fortran bindings.  This may be useful\n                          if the machine has a Fortran compiler which does\n                          not support ISO_C_BINDING.\n  --enable-nonblocking-fence\n                          By default, shmem_fence() is equivalent to\n                          shmem_quiet(), which can be a lengthy\n                          operation.  
Enabling this feature results in\n                          the ordering point being moved from the\n                          shmem_fence() to the next put-like call,\n                          which can help improve overlap in some\n                          cases.\n  --enable-total-data-ordering=\u003cyes|no|check\u003e\n                          If a network supports total data ordering\n                          (that is, ordering guarantees to two\n                          different addresses on the same target\n                          node), this option can remove the\n                          shmem_quiet() from shmem_fence() calls when\n                          sending short messages.  The option does,\n                          however, force ordering requirements on the\n                          network, so experimentation may be necessary\n                          to determine the best configuration.  Yes\n                          means always assume total data ordering is\n                          available and abort a job if that's not the\n                          case.  No means never use total data\n                          ordering optimizations.  Check will result\n                          in slightly higher overhead than \"yes\", but\n                          will provide a fallback if the network\n                          doesn't provide total data ordering.\n\n\nThere are many other options to configure to influence performance and\nbehavior.  See 'configure --help' for documentation on available\noptions.\n\n* SHMEM Runtime Support\n\n  Environment variables:\n\n    SHMEM_VERSION: if defined, print SHMEM version during start_pes().\n\n    SHMEM_INFO: if defined, print (stdout) SHMEM environment variables.\n\n    SHMEM_SYMMETRIC_SIZE (default: 64 MiB)\n        The allocated size of the symmetric heap which shmalloc() and shfree()\n        operates on. 
The size value can be scaled with a suffix of\n            'K' for kilobytes (B * 1024),\n            'M' for Megabytes (KiB * 1024)\n            'G' for Gigabytes (MiB * 1024)\n\n    SHMEM_BOUNCE_SIZE (default: 2 KiB)\n        The maximum size of a bounce buffer for put messages.\n        Messages greater than the immediate send value for the\n        underlying network but no greater than this threshold will be\n        copied into a bounce buffer and then sent.\n\n    SHMEM_MAX_BOUNCE_BUFFERS (default: 128)\n        The maximum number of bounce buffers that can be created per context.\n\n    SHMEM_COLL_CROSSOVER (default: 4)\n        For num_pes \u003c SHMEM_COLL_CROSSOVER, collective algorithms are\n        serial instead of tree based.\n\n    SHMEM_COLL_SIZE_CROSSOVER (default: 16 KiB)\n        For size \u003c SHMEM_COLL_SIZE_CROSSOVER, collective algorithms are\n        optimized for latency, rather than bandwidth.\n\n    SHMEM_COLL_RADIX (default: 4)\n        Controls the width of the n-ary tree for collectives, such that each\n        node will fanout-send to a max of approximately SHMEM_COLL_RADIX\n        children.\n\n    SHMEM_SYMMETRIC_HEAP_USE_MALLOC (default: 0)\n        If set to a non-zero integer, will use malloc() instead of\n        mmap() to allocate the symmetric heap.  This option may result in\n        incorrect behavior when remote virtual addressing is enabled.\n\n    SHMEM_BARRIER_ALGORITHM (default: auto)\n        Algorithm to use for barriers.  Default is to auto-select (which\n        may result in different algorithms being used for different \n        PE sets).  Options are: auto, linear, tree, dissem.\n\n    SHMEM_BCAST_ALGORITHM (default: auto)\n        Algorithm to use for broadcasts.  Default is to auto-select (which\n        may result in different algorithms being used for different \n        PE sets).  Options are: auto, linear, tree.\n\n    SHMEM_REDUCE_ALGORITHM (default: auto)\n        Algorithm to use for reductions.  
Default is to auto-select (which\n        may result in different algorithms being used for different \n        PE sets).  Options are: auto, linear, tree, recdbl, ring.\n\n    SHMEM_COLLECT_ALGORITHM (default: auto)\n        Algorithm to use for allgathers.  Default is to auto-select (which\n        may result in different algorithms being used for different \n        PE sets).  Options are: auto, linear.\n\n    SHMEM_FCOLLECT_ALGORITHM (default: auto)\n        Algorithm to use for allgathers with fixed contribution amounts.\n        Default is to auto-select (which may result in different \n        algorithms being used for different PE sets).  \n        Options are: auto, linear, ring, recdbl.  Note that recursive\n        doubling (recdbl) will fall back to ring if the PE set is not a\n        power of two in size.\n\n    SHMEM_BARRIERS_FLUSH (default: off)\n        If defined, standard output (stdout) and error (stderr) streams \n        will be flushed at the beginning of each barrier operation.\n\n    SHMEM_CMA_PUT_MAX (default: 8192)\n        '--with-cma', shmem put lengths \u003c= CMA_PUT_MAX use process_vm_writev();\n        otherwise use Portals4 transport put.\n\n    SHMEM_CMA_GET_MAX (default: 16384)\n        '--with-cma', shmem get lengths \u003c= CMA_GET_MAX use process_vm_readv();\n        otherwise use Portals4 transport get.\n\n    SHMEM_SYMMETRIC_HEAP_USE_HUGE_PAGES (default: off)\n        If defined, large pages will be used to back the symmetric heap.  This\n        feature is only available on Linux.\n\n    SHMEM_SYMMETRIC_HEAP_PAGE_SIZE (default: 2MB)\n        Used to specify a large page size when using large pages to back the\n        symmetric heap.  Ignored if SHMEM_SYMMETRIC_HEAP_USE_HUGE_PAGES is not\n        set.  
Refer to SHMEM_SYMMETRIC_SIZE for input syntax.\n\n    SHMEM_DISABLE_ASLR_CHECK (default: on)\n        Disable runtime checks for address space layout randomization (ASLR).\n\n  OFI Transport Environment variables:\n\n    SHMEM_OFI_PROVIDER (default: auto)\n        The name of the provider that should be used by the OFI transport.\n        Shell-style wildcards, including * and ?, are allowed.  The fi_info\n        utility included with libfabric can be used for assistance with\n        identifying the desired provider.\n\n    SHMEM_OFI_FABRIC (default: auto)\n        The name of the fabric that should be used by the OFI transport.\n        Shell-style wildcards, including * and ?, are allowed.  The fi_info\n        utility included with libfabric can be used for assistance with\n        identifying the desired fabric.\n\n    SHMEM_OFI_DOMAIN (default: auto)\n        The name of the fabric domain that should be used by the OFI transport.\n        Shell-style wildcards, including * and ?, are allowed.  The fi_info\n        utility included with libfabric can be used for assistance with\n        identifying the desired fabric domain.\n\n    SHMEM_OFI_ATOMIC_CHECKS_WARN (default: off)\n        If defined, OFI will not abort if fabric provider doesn't support every\n        data type x op combination, instead it will print a warning.\n\n    SHMEM_OFI_TX_POLL_LIMIT (default: 0)\n        Sets the maximum number of iterations for the transmit polling loop\n        (for put/quiet operations).  Setting this to -1 enables continuous\n        completion polling (i.e. there is no polling limit).  The default\n        behavior is to call fi_cntr_wait without polling.\n\n    SHMEM_OFI_RX_POLL_LIMIT (default: 0)\n        Sets the maximum number of iterations for the receive polling loop (for\n        get/wait operations).  Setting this to -1 enables continuous completion\n        polling (i.e. there is no polling limit).  
The default behavior is to\n        call fi_cntr_wait without polling.\n\n    SHMEM_OFI_STX_MAX (default: 1)\n        Sets the maximum number of sharable transmit contexts (STXs) per PE.\n        STXs are the underlying transmit resources that are allocated to\n        OpenSHMEM contexts and they are allocated using the algorithm specified\n        by the SHMEM_OFI_STX_ALLOCATOR parameter.\n\n    SHMEM_OFI_STX_ALLOCATOR (default: round-robin)\n        Algorithm for allocating STX resources to OpenSHMEM contexts.  In\n        particular, the algorithm determines how resources are shared by\n        contexts once all STXs have been allocated.  Options are: round-robin,\n        random.\n\n    SHMEM_OFI_STX_THRESHOLD (default: 1)\n        Number of contexts that must be allocated to all shared STXs before\n        another shared STX can be allocated.  This threshold can be increased\n        to reduce the number of shared STXs and increase the number of STXs\n        available for private use (i.e., with contexts that enable the\n        SHMEM_CTX_PRIVATE option).\n\n    SHMEM_OFI_STX_DISABLE_PRIVATE (default: off)\n        Disable STX privatization. Enabling this may improve load balance\n        across transmit resources, especially in scenarios where the number of\n        contexts exceeds the number of STXs.\n\n    SHMEM_OFI_STX_AUTO (default: off)\n        Automatically determine an appropriate value for the number of STXs per\n        compute node, and evenly partition them across PEs on the same node. A\n        compute node is determined by its unique hostname, and the number of\n        STXs available on a compute node is provided by the libfabric library.\n\n    SHMEM_OFI_DISABLE_MULTIRAIL (default: off)\n        Disable multirail functionality. 
Enabling this will restrict all\n        communications to occur over a single NIC per system.\n\n  Team Environment variables:\n\n    SHMEM_TEAMS_MAX (default: 10)\n        Sets the maximum number of available teams per PE, including the\n        predefined teams.  The maximum supported value is 64.  The value must\n        be the same across all PEs in SHMEM_TEAM_WORLD.\n\n    SHMEM_TEAM_SHARED_ONLY_SELF (default: off)\n        If defined, the predefined team, SHMEM_TEAM_SHARED, will only include\n        the self PE.\n\n  Debugging Environment variables:\n\n    SHMEM_DEBUG (default: off)\n        If defined, enables debugging messages from the OpenSHMEM runtime.\n\n    SHMEM_TRAP_ON_ABORT (default: off)\n        If defined, generate a trap when aborting an OpenSHMEM program.  This\n        can be used to interface with a debugger or generate core files.\n\n    SHMEM_BACKTRACE (default: \u003cempty\u003e)\n        Selects the backtracing mechanism.  The default is empty, in which\n        case no backtrace information is provided upon failure.  Options are:\n        execinfo, gdb, auto.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSandia-OpenSHMEM%2FSOS","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSandia-OpenSHMEM%2FSOS","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSandia-OpenSHMEM%2FSOS/lists"}