{"id":13585190,"url":"https://github.com/corelight/community-id-spec","last_synced_at":"2026-01-16T19:20:51.526Z","repository":{"id":28979951,"uuid":"119905964","full_name":"corelight/community-id-spec","owner":"corelight","description":"An open standard for hashing network flows into identifiers, a.k.a \"Community IDs\".","archived":false,"fork":false,"pushed_at":"2024-09-23T19:46:37.000Z","size":99,"stargazers_count":177,"open_issues_count":12,"forks_count":25,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-04-07T06:35:36.978Z","etag":null,"topics":["community-id","flow-hashing","network-flow","network-monitoring","network-security","network-security-monitoring"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/corelight.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-01T23:24:31.000Z","updated_at":"2025-04-02T15:10:55.000Z","dependencies_parsed_at":"2024-11-06T03:03:32.205Z","dependency_job_id":"fe254499-9469-4f07-9c3c-982bbbe7bf7e","html_url":"https://github.com/corelight/community-id-spec","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/corelight/community-id-spec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corelight%2Fcommunity-id-spec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corelight%2Fcommunity-id-spec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corelight%2F
community-id-spec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corelight%2Fcommunity-id-spec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/corelight","download_url":"https://codeload.github.com/corelight/community-id-spec/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corelight%2Fcommunity-id-spec/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28481675,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["community-id","flow-hashing","network-flow","network-monitoring","network-security","network-security-monitoring"],"created_at":"2024-08-01T15:04:47.618Z","updated_at":"2026-01-16T19:20:51.501Z","avatar_url":"https://github.com/corelight.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"Community ID Flow Hashing\n=========================\n\nWhen processing flow data from a variety of monitoring applications\n(such as Zeek and Suricata), it's often desirable to pivot quickly\nfrom one dataset to another. 
While the required flow tuple information\nis usually present in the datasets, the details of such \"joins\" can\nbe tedious, particularly in corner cases. This spec describes \"Community\nID\" flow hashing, standardizing the production of a string identifier\nrepresenting a given network flow, to reduce the pivot to a simple\nstring comparison.\n\nPseudo code\n-----------\n\n    function community_id_v1(ipaddr saddr, ipaddr daddr, port sport, port dport, int proto, int seed=0)\n    {\n        # Get seed and all tuple parts into network byte order\n        seed = pack_to_nbo(seed); # 2 bytes\n        saddr = pack_to_nbo(saddr); # 4 or 16 bytes\n        daddr = pack_to_nbo(daddr); # 4 or 16 bytes\n        sport = pack_to_nbo(sport); # 2 bytes\n        dport = pack_to_nbo(dport); # 2 bytes\n\n        # Abstract away directionality: flip the endpoints as needed\n        # so the smaller IP:port tuple comes first.\n        saddr, daddr, sport, dport = order_endpoints(saddr, daddr, sport, dport);\n\n        # Produce 20-byte SHA1 digest. \".\" means concatenation. The\n        # proto value is one byte in length and followed by a 0 byte\n        # for padding.\n        sha1_digest = sha1(seed . saddr . daddr . proto . 0 . sport . 
dport)\n\n        # Prepend version string to base64 rendering of the digest.\n        # v1 is currently the only one available.\n        return \"1:\" + base64(sha1_digest)\n    }\n    \n    function community_id_icmp(ipaddr saddr, ipaddr daddr, int type, int code, int seed=0)\n    {\n        port sport, dport;\n\n        # ICMP / ICMPv6 endpoint mapping directly inspired by Zeek\n        sport, dport = map_icmp_to_ports(type, code);\n\n        # ICMP is IP protocol 1, ICMPv6 would be 58\n        return community_id_v1(saddr, daddr, sport, dport, 1, seed); \n    }\n\n\nTechnical details\n-----------------\n\n- The Community ID is an additional flow identifier and doesn't need to\n  replace existing flow identification mechanisms already supported by\n  the monitors. It's okay, however, for a monitor to be configured to\n  log only the Community ID, if desirable.\n\n- The Community ID can be computed as a monitor produces flows, or can\n  also be added to existing flow records at a later stage assuming\n  that said records convey all the needed flow endpoint information.\n\n- Collisions in the Community ID, while undesirable, are not\n  considered fatal, since the user should still possess flow timing\n  information and possibly the monitor's native ID mechanism (hopefully\n  stronger than the Community ID) for disambiguation.\n\n- The hashing mechanism uses seeding to enable additional control over\n  \"domains\" of Community ID usage. The seed defaults to 0, so the\n  mechanism stays out of the way of operators not interested in it.\n\n- In version 1 of the ID, the hash algorithm is SHA1. Future hash\n  versions may switch it or allow additional configuration.\n\n- The binary 20-byte SHA1 result gets base64-encoded to reduce output\n  volume compared to the usual ASCII-based SHA1 representation. 
This\n  assumes that space, not computation time, is the primary concern,\n  and may become configurable in a later version.\n\n- The resulting flow ID includes a version number to make the\n  underlying Community ID implementation explicit. This allows users\n  to ensure they're comparing apples to apples while supporting future\n  changes to the algorithm. For example, when one monitor's version of\n  the ID incorporates VLAN IDs but another's does not, hash value\n  comparisons should reliably fail. A more complex form of this\n  feature could allow capturing configuration settings in addition to\n  the implementation version.\n\n  The versioning scheme currently simply prefixes the hash value with\n  \"\u003cversion\u003e:\", yielding something like this in the current version 1:\n\n  `1:hO+sN4H+MG5MY/8hIrXPqc4ZQz0=`\n\n- The hash input is aligned on 32-bit-boundaries. Flow tuple\n  components use network byte order (big-endian) to standardize\n  ordering regardless of host hardware.\n\n- The hash input is ordered to remove directionality in the flow\n  tuple: swap the endpoints, if needed, so the numerically smaller\n  IP:port tuple comes first. If the IP addresses are equal, the ports\n  decide.  
For example, the following netflow 5-tuples create\n  identical Community ID hashes because they both get ordered into\n  the sequence 10.0.0.1, 127.0.0.1, 1234, 80.\n\n  - Proto: TCP; SRC IP: 10.0.0.1; DST IP: 127.0.0.1; SRC Port: 1234; DST Port: 80\n  - Proto: TCP; SRC IP: 127.0.0.1; DST IP: 10.0.0.1; SRC Port: 80; DST Port: 1234\n\n- This version includes the following protocols and fields:\n\n  - TCP / UDP / SCTP:\n\n    IP src / IP dst / IP proto / source port / dest port \n\n  - ICMPv4 / ICMPv6:\n\n    IP src / IP dst / IP proto / ICMP type + \"counter-type\" or code\n\n    The exact handling of ICMP type \u0026 code is taken from Zeek; see\n    implementations here:\n\n    - https://github.com/corelight/pycommunityid/blob/master/communityid/icmp.py\n    - https://github.com/corelight/pycommunityid/blob/master/communityid/icmp6.py\n    - https://github.com/zeek/zeek/blob/master/src/analyzer/protocol/icmp/ICMP.cc#L860\n\n  - Other IP-borne protocols:\n\n    IP src / IP dst / IP proto\n\n  The above does not currently cover how to handle nesting (IP in IP,\n  v6 over v4, etc) as well as encapsulations such as VLAN and MPLS.\n\n- If a network monitor doesn't support any of the above protocol\n  constellations, it can safely report an empty string (or another\n  non-colliding value) for the flow ID.\n\n- Consider v1 a prototype. Feedback from the community, particularly\n  implementers and operational users of the ID, is _greatly_\n  appreciated. 
Please create issues directly in the GitHub project at\n  https://github.com/corelight/community-id-spec, or contact Christian\n  Kreibich (christian@corelight.com).\n\n- Many thanks for helpful discussion and feedback to Victor Julien,\n  Johanna Amann, and Robin Sommer, and to all implementors and\n  supporters.\n\nReference implementation\n------------------------\n\nA complete implementation is available in the\n[pycommunityid](https://github.com/corelight/pycommunityid) package.\nIt includes a range of tests to verify correct computation for the\nvarious protocols. We recommend it to guide new implementations.\n\nA smaller implementation is also available via the community-id.py\nscript in this repository, including the byte layout of the hashed\nvalues (see packet_get_comm_id()). See --help and make.sh to get\nstarted:\n\n```\n  $ ./community-id.py --help\n  usage: community-id.py [-h] [--seed NUM] PCAP [PCAP ...]\n\n  Community flow ID reference\n\n  positional arguments:\n    PCAP         PCAP packet capture files\n\n  optional arguments:\n    -h, --help   show this help message and exit\n    --seed NUM   Seed value for hash operations\n    --no-base64  Don't base64-encode the SHA1 binary value\n    --verbose    Show verbose output on stderr\n```\n\nFor troubleshooting, the implementation supports omitting the base64\noperation, and can provide additional detail about the exact sequence\nof bytes going into the SHA1 hash computation.\n\nReference data\n--------------\n\nThe [`baseline`](baseline) directory in this repo contains datasets to\nhelp you verify that your implementation of Community ID functions\ncorrectly.\n\nReusable modules/libraries\n--------------------------\n\n- C\n  - https://github.com/corelight/c-community-id\n  - https://github.com/ntop/nDPI (3.2+, details [here](https://github.com/ntop/nDPI/blob/dev/src/lib/ndpi_community_id.c))\n- C#: https://github.com/decompile/community-id-dotnet-core\n- Golang: 
https://github.com/satta/gommunityid\n- Java: https://github.com/rapid7/community-id-java\n- JavaScript: https://github.com/corelight/communityid-js\n- Python: https://github.com/corelight/pycommunityid\n- Rust: https://crates.io/crates/communityid\n\nProduction implementations\n--------------------------\n\n- Arkime (1.7.0+): https://github.com/arkime/arkime/issues/966\n- Elastic Beats: e.g. https://www.elastic.co/guide/en/beats/packetbeat/master/community-id.html\n- Elastic Common Schema: https://github.com/elastic/ecs/blob/master/schemas/network.yml\n- Elasticsearch (7.12.0+): https://www.elastic.co/guide/en/elasticsearch/reference/master/community-id-processor.html\n- HELK: https://github.com/Cyb3rWard0g/HELK (with [Ruby implementation](https://github.com/Cyb3rWard0g/HELK/commit/e81a98a745a4d02acc9d346865aeb312b3ee599d#diff-81497c6343ac648c68637062cf1ba082))\n- LogScale/Humio: https://library.humio.com/falcon-logscale/functions-communityid.html\n- MISP: https://www.misp-project.org/2019/07/19/MISP.2.4.111.released.html\n- MISP-wireshark: https://github.com/MISP/misp-wireshark\n- Osquery (4.2.0+): https://osquery.readthedocs.io/en/latest/introduction/sql/#sql-additions, [blog post](https://dactiv.llc/blog/correlate-osquery-network-connections/)\n- Qosmos ixEngine: https://www.qosmos.com/wp-content/uploads/Enea-Qosmos-ixEngine-Suricata-Solution-Brief-20211202.pdf\n- Security Onion (2.0+): https://docs.securityonion.net/en/2.3/community-id.html\n- Suricata (4.1+): https://suricata.readthedocs.io/en/suricata-4.1.2/output/eve/eve-json-output.html#community-flow-id\n- VAST: https://github.com/vast-io/vast/pull/525\n- Wireshark (3.3.1+): https://www.wireshark.org/news/20201001.html\n- Zeek package (2.5+): https://github.com/corelight/zeek-community-id\n- ntopng: https://github.com/ntop/ntopng\n\nFeature requests in other projects\n----------------------------------\n\n- https://github.com/MicrosoftDocs/sysinternals/issues/219\n\nTalks\n-----\n\n- [SuriCon 
2018](https://www.icir.org/christian/talks/2018-11-suricon-communityid.pdf)\n- [FOSDEM 2021](https://www.icir.org/christian/talks/2021-02-fosdem-communityid.pdf)\n\nBlog posts and other resources\n------------------------------\n\n- [Sample captures for QUIC, DoH, CommunityID, WPA3 and other protocols in CloudShark 3.10](https://www.qacafe.com/resources/sample-captures-for-quic-doh-communityid-wpa3-cloudshark-3-10), qacafe.com\n- [Correlate network connections with community ID in osquery](https://fleetdm.com/guides/correlate-network-connections-with-community-id-in-osquery), fleetdm.com\n- [Generating CommunityIDs with Sysmon and Winlogbeat](https://holdmybeersecurity.com/2020/06/04/generating-communityids-with-sysmon-and-winlogbeat/), holdmybeersecurity.com\n\nDiscussion\n----------\n\nFeel free to discuss aspects of the Community ID via GitHub here:\nhttps://github.com/corelight/community-id-spec/issues\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcorelight%2Fcommunity-id-spec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcorelight%2Fcommunity-id-spec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcorelight%2Fcommunity-id-spec/lists"}