{"id":19657823,"url":"https://github.com/gdatasoftwareag/mse","last_synced_at":"2025-04-28T19:32:33.483Z","repository":{"id":36975096,"uuid":"298255764","full_name":"GDATASoftwareAG/MSE","owner":"GDATASoftwareAG","description":"Malware sample exchange system and API intended for Anti-Virus companies and researchers.","archived":false,"fork":false,"pushed_at":"2024-08-19T04:50:16.000Z","size":231,"stargazers_count":16,"open_issues_count":12,"forks_count":2,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-05T10:33:26.726Z","etag":null,"topics":["antivirus","binaries","exchange","malware"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GDATASoftwareAG.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-24T11:18:35.000Z","updated_at":"2025-03-29T23:09:34.000Z","dependencies_parsed_at":"2024-05-21T09:31:21.338Z","dependency_job_id":"3719ac4a-589b-439c-b9a1-f365a89a3ee3","html_url":"https://github.com/GDATASoftwareAG/MSE","commit_stats":null,"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GDATASoftwareAG%2FMSE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GDATASoftwareAG%2FMSE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GDATASoftwareAG%2FMSE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GDATASoftwareAG%2FMSE/manifests","owner_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/owners/GDATASoftwareAG","download_url":"https://codeload.github.com/GDATASoftwareAG/MSE/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251375534,"owners_count":21579465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["antivirus","binaries","exchange","malware"],"created_at":"2024-11-11T15:33:49.437Z","updated_at":"2025-04-28T19:32:33.222Z","avatar_url":"https://github.com/GDATASoftwareAG.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Malware Sample Exchange (MSE)\n\nThe **Malware Sample Exchange (MSE)** provides a modern alternative to [Virex](https://github.com/NextSecurity/virex), to exchange malware samples between AV industry partners. It is easy to set up and supports cloud-native workflows.\n\nSee the blog post for a longer introduction: [A modern Sample Exchange System](https://www.gdatasoftware.com/blog/2020/10/36410-a-modern-sample-exchange-system)\n\nOur idea is to provide a standardized exchange system which meets the following criteria:\n\n- The ability to choose only the malware samples you need. 
\n- The ability for partners to filter before the download.\n  - SHA256\n  - Categories, which can include, but are not limited to, the target platform or specific detections.\n- An easy-to-consume API built on current web standards ([OpenAPI](https://www.openapis.org/))\n- Easy to set up in a few minutes, so that every exchange partner is able to host it with little added effort.\n- Specific sample sets per partner\n\n## HTTP API\n\nThe **Malware Sample Exchange** service is [OpenAPI](https://www.openapis.org/) compatible and exposes an API description that can be used to automatically generate a client for a programming language.\n\nHTTP API Web UI: `http://{}/swagger/index.html`\n\nHTTP API JSON description: `http://{}/swagger/v1/swagger.json`\n\nExposed endpoints:\n\n|Route|Parameter|Example|Basic Auth|Description|\n|-----|---------|-------|--------------|-----------|\n|/swagger/index.html|-|-|No|Shows the OpenAPI web interface|\n|/swagger/v1/swagger.json|-|-|No|OpenAPI JSON description|\n|/v1/list|start (required), end (optional)| /v1/list?start=2020-09-23 |Yes (user:password) |Fetch the list of available samples|\n|/v1/download|token (required)| /v1/download?token=eyJ0eX... |No |Download a sample with a token from the list|\n\n## Usage\n\nFor example, by executing:\n\n```bash\n# Get all samples for the user \"testuser\" in the time range 2020-09-23 until now\ncurl -u \"testuser:somenicepassword\" -X GET -k -i 'http://localhost:8080/v1/list?start=2020-09-23'\n```\n\nyou will receive\na list of [JWT](https://jwt.io/) tokens, which generally have the format ```aaaa.bbbbbbbb.cccc``` of three base64url-encoded sections that are separated\nby dots. The first part is the header, which declares the structure as a JWT and names the signing algorithm. The second part is the\nactual payload, which contains the expiration date of the token, the SHA256, the file size and the platform. 
The third part is a signature that guarantees that the JWT is valid.\nOnce you have checked that the sample is not already part of your collection and that you are interested in the reported platform,\nyou can download it with:\n\n```bash\n# Download a specific sample\ncurl -X GET -k \"http://localhost:8080/v1/download?token=$PUT_TOKEN_HERE\"\n```\n\nNo authentication is needed for the download, as the [JWT](https://jwt.io/) is signed and as such authenticates the request.\n\nThe list endpoint returns a list of JSON data structures, each of which contains a JWT for one sample. How a JWT is decoded is shown below.\n\n```bash\n# Decode a JWT from the list endpoint\nTOKEN=\"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJleHAiOjE2MTk1MTc4NzMsInNoYTI1NiI6IjA1YzIyNDU1Zjc3YjVmOTIxY2I1ZWIyM2FkZDBkYjkwNzc3NjljMGNhY2I4NDBjNDYwZjQxZDlhODM1NzkyOWYiLCJmaWxlc2l6ZSI6MTIzNDUsInBsYXRmb3JtIjoiUERGIiwicGFydG5lciI6InRlc3R1c2VyIn0.pACN0JaMnSoA0Dnk1lXk77BU9krCawnRkXAVTDTDKahXT9HKleAfuK8ngZ62SauOj-pGXkO2m3ijH2x3PNRl1A\"\n# Split the token into its three parts at '.'\nARRAY=($(echo \"$TOKEN\" | tr '.' ' '))\n\n# JWT parts are base64url-encoded without padding, so convert them to standard base64 first\nb64url_decode() {\n  local s\n  s=$(printf '%s' \"$1\" | tr '_-' '/+')\n  while (( ${#s} % 4 )); do s+='='; done\n  printf '%s' \"$s\" | base64 -d\n}\n\n# Decode the header (bash arrays are zero-indexed)\necho \"HEADER:\"\nb64url_decode \"${ARRAY[0]}\"\n# Output:\n# {\"typ\":\"JWT\",\"alg\":\"HS512\"}\n\n# Decode the payload\necho \"PAYLOAD:\"\nb64url_decode \"${ARRAY[1]}\"\n# Output:\n# {\"exp\":1619517873,\"sha256\":\"05c22455f77b5f921cb5eb23add0db9077769c0cacb840c460f41d9a8357929f\",\"filesize\":12345,\"platform\":\"PDF\",\"partner\":\"testuser\"}\n\n\n# Decode the signature (raw bytes, not printable)\necho \"Signature:\"\nb64url_decode \"${ARRAY[2]}\"\n# Output: binary data\n```\n\n## Setup\n\nThere are several methods to set up MSE in production or for testing.\n\n### Kubernetes Deployment\n\nAn example deployment for Kubernetes is given in [k8s-deployment.yaml](./k8s-deployment.yaml).\n\nFor ease of use, it uses NodePorts to expose the MongoDB for meta-data and the REST service to the network. 
If you already have a Kubernetes cluster with an ingress or load balancer, use that instead of the NodePorts.\n\n\n- All data will be stored here: `/mnt/sampleexportstorage`\n  - The folder has to be created before the deployment\n- The REST API will be reachable at: `http://{your k8s host}:32000`\n- The MongoDB for meta-data will be reachable at: `mongodb://{your k8s host}:32001`\n\nYou can find the latest image on Docker Hub: [Sample-Exchange Docker Image](https://hub.docker.com/r/gdatacyberdefense/sampleexchange/tags?page=1\u0026ordering=last_updated)\n\n```bash\n# Deploy to k8s\nkubectl apply -f k8s-deployment.yaml\n\n# Fill with example data\npython3 ./src/FillMongoWithTestData/main.py -s \"/mnt/sampleexportstorage\" -m \"mongodb://localhost:32001\"\n\n# Fetch the list of samples (adjust the start date as needed)\ncurl -u \"testuser:somenicepassword\" -X GET -k -i 'http://localhost:32000/v1/list?start=2020-09-23'\n\n# Remove all k8s resources (does not remove /mnt/sampleexportstorage)\nkubectl delete -f k8s-deployment.yaml\n```\n\n\n### Local test setup\n\nThe exchange API requires a MongoDB instance for storing sample meta-data. You can start a database with the following command:\n`docker run -d -it --rm -p 27017:27017 mongo`.\n\nMake sure that the folder `/mnt/sampleexportstorage/` exists and execute the Python script located in this repository by typing\n`python3 ./src/FillMongoWithTestData/main.py -s \"/mnt/sampleexportstorage/\" -m \"mongodb://localhost:27017\"`.\n\nThis script creates three benign test samples on the share and adds their meta-data to the MongoDB.\nNow you can start the Exchange API by changing to the directory `./src/MalwareSampleExchange.Console/` and typing `dotnet run`.\n\n## Configuration\n\nThe MSE itself is configured through [appsettings.json](./src/MalwareSampleExchange.Console/appsettings.json).\n```json\n{\n  \"Token\": {\n    \"Secret\": \"PutSomeNiceSecretHere\", // The global secret used to \"sign\" the JWTs. 
Only you should know it.\n    \"Expiration\": \"1.00:00:00\" // The expiration timespan in the format \"d.hh:mm:ss\". Once it has elapsed, the token is invalid and can no longer be used.\n  },\n  \"Upload\": {\n    \"AllowPartnerToUpload\": \"\" // Allows a single partner to upload\n  },\n  \"Config\": {\n    \"Url\": \"https://url\", // The URL from which the partner config is downloaded as JSON. If not provided, the file is used as a fallback.\n    \"FilePath\": \"shareconfig.yml\" // The file used to configure users and sample-sets.\n  },\n  \"MongoDb\": {\n    \"ConnectionString\": \"mongodb://localhost:27017\", // Connection string to the MongoDB\n    \"DatabaseName\": \"Sample\", // Database name in the MongoDB.\n    \"CollectionName\": \"Sample\" // Collection name in the MongoDB database.\n  },\n  \"Storage\": {\n    \"Backend\": \"File\", // Storage backend for the samples, e.g. file system or S3. Possible values: File, S3, and Url.\n    \"Path\": \"/mnt/sampleexportstorage\" // Path to the actual samples, only required for backend File.\n  }\n}\n```\nAll settings can be overridden by **environment variables**. This is useful if you want to run the Docker image directly or in Kubernetes, where editing the `appsettings.json` is not feasible.\nFor example: `Token__Secret=\"PutSomeNiceSecretHere\"`. In environment variables, the delimiter for sub-sections is the **double underscore** `__`.\n\nThe `Storage` path must have a specific folder structure. All files have to be named after their **SHA256**. The top-level folder is named after the first hex byte of the SHA256 and contains a sub-folder named after the second hex byte of the SHA256. 
The sample itself is stored in that sub-folder.\n\n```\n# Example of the expected sample structure\n/mnt/sampleexportstorage\n  - /00\n    - /00\n      - /00002455f77b5f921cb5eb23add0db9077769c0cacb840c460f41d9a8357929f\n      - ..\n    - /01\n    - ..\n    - /FF\n  - /01\n  - ..\n  - /FF\n```\n\nFor the configuration of users and their corresponding data sets, the [shareconfig.yaml](./src/MalwareSampleExchange.Console/sh\n) is used. The MongoDB does not know about any users; it only contains samples which belong to a set.\n\n```yaml\n# Example shareconfig.yaml with two exchange partners\nPartners:\n- Name: partner1 # Name of the exchange partner\n  Password: 466fef588adae318d7f50541982785daaf61d51b5c47101c1c751fbd717dd9e8 # Password Hash\n  Salt: 79b48cd1d1ed8fa129c58c5c2d0633b3f9d46087feb8b0165a5ed560356db894 # Password Salt\n  Enabled: Yes # Is the exchange with the partner enabled?\n  Sampleset: Classic # Which set is shared with the partner?\n  IncludeFamilyName: Yes # Includes the family name in the token\n\n- Name: partner2\n  Password: c5363549da9f03d8da44db70ec12ca5dce8078d4cb5fda1d7ecadd4372031539\n  Salt: 8ec1690da1bf1baad62a20c0db8e4ad26205ec577b741ccc8b1e2e834670a5e4\n  Enabled: No\n  Sampleset: Extended\n  IncludeFamilyName: no\n```\n\nThe [main.py](./src/FillMongoWithTestData/main.py) is an example script that shows how the MongoDB is filled with samples to share. It does two things. First, it stores the sample itself in the sample folder, as described above. Second, it inserts the required meta-data for the sample into the MongoDB. 
This is all that is needed to share the sample with a partner.\n\n```python\n#!/usr/bin/python3\n\nimport hashlib\nimport pymongo # sudo pip install pymongo\nimport datetime\nimport os\nimport sys, getopt\n\ndef put_string_into_db(sha256, platform, file_size, sample_set, mongo_collection, family_name):\n    current_iso_datetime = datetime.datetime.utcnow()\n    entry = {\n                \"_id\": f\"{sha256}:test\",                  # Unique ID\n                \"Sha256\": sha256,                         # SHA256 of the sample\n                \"Platform\": platform,                     # Free-form value, not a fixed set. E.g. \"EXE_PE32\", \"Mobile\", \"PDF\" ...\n                \"Imported\": current_iso_datetime,         # Date-time, when the sample was added\n                \"FileSize\": file_size,                    # File size in bytes\n                \"DoNotUseBefore\": current_iso_datetime,   # Do not share before this date-time\n                \"SampleSet\": sample_set,                  # Which set the sample belongs to\n                \"FamilyName\": family_name                 # Custom FamilyName\n            }\n    mongo_collection.insert_one(entry)\n\n\ndef hash_string_and_save_to_file_in_folder(hash_target, folder):\n    sha256_of_string = hashlib.sha256(hash_target.encode('utf-8')).hexdigest()\n    file_path = f\"{folder}/\" + f\"{sha256_of_string[0:2]}/\" + f\"{sha256_of_string[2:4]}/\" + sha256_of_string\n    os.makedirs(os.path.dirname(file_path), exist_ok=True)\n    # Write with the same encoding that was hashed, so the file content matches its SHA256 name\n    with open(file_path, 'w', encoding='utf-8') as file:\n        file.write(hash_target)\n    return sha256_of_string\n\n\ndef main(argv):\n    destination_folder = ''\n    mongo_url = ''\n    help = 'main.py -s \u003cstorage folder\u003e -m \u003cmongodb url\u003e'\n\n    try:\n        opts, args = getopt.getopt(argv, \"hs:m:\", [\"storage=\", \"mongodb=\"])\n    except getopt.GetoptError:\n        print (help)\n        sys.exit(2)\n\n    for opt, arg in opts:\n      if opt 
== '-h':\n         print (help)\n         sys.exit()\n      elif opt in (\"-s\", \"--storage\"):\n         destination_folder = arg\n      elif opt in (\"-m\", \"--mongodb\"):\n         mongo_url = arg\n\n    string_1 = '\"Your focus determines your reality.\" – Qui-Gon Jinn'\n    string_2 = '\"Do. Or do not. There is no try.\" – Yoda'\n    string_3 = '\"In my experience there is no such thing as luck.\" – Obi-Wan Kenobi'\n\n    mongo_client = pymongo.MongoClient(mongo_url)\n    mongo_db = mongo_client[\"Sample\"]\n    mongo_collection = mongo_db[\"Sample\"]\n\n    sha256_1 = hash_string_and_save_to_file_in_folder(string_1, destination_folder)\n    sha256_2 = hash_string_and_save_to_file_in_folder(string_2, destination_folder)\n    sha256_3 = hash_string_and_save_to_file_in_folder(string_3, destination_folder)\n    put_string_into_db(sha256_1, \"PDF\", 12345, \"test\", mongo_collection, \"family2\")\n    put_string_into_db(sha256_2, \"PE32\", 67890, \"test\", mongo_collection, \"family1\")\n    put_string_into_db(sha256_3, \"AND\", 112233, \"test\", mongo_collection, \"family1\")\n\nif __name__ == '__main__':\n    main(sys.argv[1:])\n```\n\n## Build and Release\n\nA GitHub Action builds the project on every push and pull request. A new Docker image will be pushed to the Docker Hub.\n\nTo release a new version, push a tagged version like this:\n\n```bash\ngit tag -a 1.0.0 -m \"Release version 1.0.0\"\ngit push origin 1.0.0\n```\n\nReplace `1.0.0` with the corresponding version.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgdatasoftwareag%2Fmse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgdatasoftwareag%2Fmse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgdatasoftwareag%2Fmse/lists"}