{"id":24196580,"url":"https://github.com/dedis/student_22_dissecting_ipfs_swarm","last_synced_at":"2025-08-25T23:09:36.560Z","repository":{"id":120600946,"uuid":"580070579","full_name":"dedis/student_22_dissecting_ipfs_swarm","owner":"dedis","description":"Dissecting IPFS and Swarm to demystify distributed decentralized storage networks","archived":false,"fork":false,"pushed_at":"2023-01-05T11:39:44.000Z","size":5317,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-07-08T00:50:45.523Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dedis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-19T16:38:23.000Z","updated_at":"2023-01-25T12:13:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"7c5db156-2614-4a4c-a193-70ce570c0521","html_url":"https://github.com/dedis/student_22_dissecting_ipfs_swarm","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dedis/student_22_dissecting_ipfs_swarm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dedis%2Fstudent_22_dissecting_ipfs_swarm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dedis%2Fstudent_22_dissecting_ipfs_swarm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dedis%2Fstudent_22_dissecting_ipfs_swarm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/dedis%2Fstudent_22_dissecting_ipfs_swarm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dedis","download_url":"https://codeload.github.com/dedis/student_22_dissecting_ipfs_swarm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dedis%2Fstudent_22_dissecting_ipfs_swarm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272144649,"owners_count":24881141,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-25T02:00:12.092Z","response_time":1107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-13T19:35:58.142Z","updated_at":"2025-08-25T23:09:36.524Z","avatar_url":"https://github.com/dedis.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"This repository contains code for the project **Dissecting IPFS and Swarm to demystify distributed decentralized storage networks**\n\nThere are two parts in the project:\n\n1. Analysis of an IPFS gateway dataset.\n\n   - Published in the paper [Design and evaluation of IPFS: a storage layer for the decentralized web](https://dl.acm.org/doi/abs/10.1145/3544216.3544232).\n\n   - Results regarding time series analysis, CID popularity, agent request and churn rate.\n2. 
Monitor IPFS and Swarm clients\n   - Run IPFS and Swarm client on VMs.\n   - Monitor and analyze standard metrics and connected peers.\n\n\n\nYou can either download the data and run the notebooks yourself, or view the results in PDF in the corresponding notebook_output files.\n\n# Dependencies\n\n## Notebooks\n\nTo get started with the notebooks, you will need to have the following dependencies installed:\n\n- Python 3\n- Jupyter Notebook\n- NumPy 1.21.2\n- Pandas 1.3.4\n- Plotly 5.6.0\n- Scipy 1.7.1\n- tqdm 4.62.3 (optional)\n\nYou can install these dependencies using `pip` or your preferred package manager. For example:\n\n```\n# Jupyter Notebook\npip install jupyter\n\n# NumPy\npip install numpy\n```\n\nOnce you have the dependencies installed, you can clone this repository and navigate to the `notebooks` directory to access the Jupyter notebooks.\n\n## Data\n\nThe IPFS gateway dataset is available on IPFS with the CID: bafybeiftyvcar3vh7zua3xakxkb2h5ppo4giu5f3rkpsqgcfh7n7axxnsa\n\nYou can download the original log file. Then, run the Data_Cleaning notebook to get csv files.\n\n## Monitoring\n\nThe experiments were conducted on two virtual machines with 2 vCPUs, 16GB RAM, and 50GB storage running Debian 10.\n\nTo reproduce the results, you will need to have the IPFS and Swarm clients installed on your system:\n\n- IPFS-client 0.16.0 ([Install IPFS](https://docs.ipfs.tech/install/))\n- Swarm Bee Client 1.8.2 ([Install Bee](https://docs.ethswarm.org/docs/installation/install/))\n\nAnd, the following dependencies:\n\n- top 3.3.15\n- nethogs 0.8.5-2+b1\n\nYou can install these dependencies if not already installed using `sudo` or your preferred package manager. For example:\n\n```\n# nethogs\nsudo apt install nethogs\n```\n\n# Monitoring Steps\n\n## Connected Peers\n\nTo monitor the connected peers of the IPFS client, you can:\n\n1. Start the IPFS client.\n\n   ```\n   ipfs daemon\n   ```\n\n2. 
Create shell script.\n\n   ```\n   touch peer_ipfs.sh\n   chmod u+x peer_ipfs.sh\n   ```\n\n   Add the following:\n\n   ```shell\n   ipfs swarm peers | ts '[%Y-%m-%d %H:%M:%S]' \u003e\u003e ~/peer_ipfs.txt 2\u003e\u00261\n   ```\n\n3. Open cron config\n\n   ```\n   crontab -e\n   ```\n\n   Add the following to schedule every 1min:\n\n   ```\n   * * * * * ~/peer_ipfs.sh\n   ```\n\n4. Start and stop cron\n\n   ```\n   sudo service cron start\n   sudo service cron stop\n   ```\n\nFor Swarm client, use `curl http://localhost:1635/peers ` in step 2, and modify other file names accordingly.\n\n## Standard Metrics\n\nTo monitor the CPU/MEM usage of the IPFS client, you can:\n\n1. Start the IPFS client.\n\n   ```\n   ipfs daemon\n   ```\n\n2. Find PID using top command.\n\n   ```\n   top\n   ```\n\n3. Create shell script.\n\n   ```\n   touch top_ipfs.sh\n   chmod u+x top_ipfs.sh\n   ```\n\n   Add the following:\n\n   ```shell\n   command=`top -b -p PID -n 1 | grep ipfs | awk '{$1=$1};1' | ts '[%Y-%m-%d %H:%M:%S]'`\n   echo $command\n   ```\n\n4. Open cron config\n\n   ```\n   crontab -e\n   ```\n\n   Add the following to schedule every 10s:\n\n   ```\n   * * * * * ~/top_ipfs.sh \u003e\u003e ~/top_ipfs.txt 2\u003e\u00261\n   * * * * * ( sleep 10 ; ~/top_ipfs.sh \u003e\u003e ~/top_ipfs.txt 2\u003e\u00261 )\n   * * * * * ( sleep 20 ; ~/top_ipfs.sh \u003e\u003e ~/top_ipfs.txt 2\u003e\u00261 )\n   * * * * * ( sleep 30 ; ~/top_ipfs.sh \u003e\u003e ~/top_ipfs.txt 2\u003e\u00261 )\n   * * * * * ( sleep 40 ; ~/top_ipfs.sh \u003e\u003e ~/top_ipfs.txt 2\u003e\u00261 )\n   * * * * * ( sleep 50 ; ~/top_ipfs.sh \u003e\u003e ~/top_ipfs.txt 2\u003e\u00261 )\n   ```\n\n5. 
Start and stop cron\n\n   ```\n   sudo service cron start\n   sudo service cron stop\n   ```\n\nFor the Swarm client, use `grep bee` in step 3, and modify other file names accordingly.\n\nTo measure network usage, change the shell script in step 3:\n\n```shell\ncommand=`sudo nethogs -t -c 2 | grep ipfs | ts '[%Y-%m-%d %H:%M:%S]'`\necho $command\n```\n\n## Problems you may encounter\n\n1. [how to remove \"TERM environment variable not set\"](https://stackoverflow.com/questions/19425727/how-to-remove-term-environment-variable-not-set)\n\n   Solution: Set the variable in the script by adding a line to your shell script:\n\n   ```\n   export TERM=${TERM:-dumb}\n   ```\n\n2. [\"command not found\" when running a script via cron](https://askubuntu.com/questions/47800/command-not-found-when-running-a-script-via-cron)\n\n   Solution: Run `echo \"$PATH\"` in your interactive shell to print your current `$PATH` value.\n\n   Then put this line at the top of your shell script, using that value:\n\n   ```\n   export PATH=\"your path\"\n   ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdedis%2Fstudent_22_dissecting_ipfs_swarm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdedis%2Fstudent_22_dissecting_ipfs_swarm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdedis%2Fstudent_22_dissecting_ipfs_swarm/lists"}