{"id":16161466,"url":"https://github.com/gamemann/linux-btrfs-lab","last_synced_at":"2025-10-26T11:09:01.676Z","repository":{"id":171459221,"uuid":"647956935","full_name":"gamemann/Linux-BTRFS-Lab","owner":"gamemann","description":"A small lab using Ubuntu 23.04 with the BTRFS file system to test deduplication feature.","archived":false,"fork":false,"pushed_at":"2023-06-01T06:25:24.000Z","size":122,"stargazers_count":12,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-16T23:51:24.525Z","etag":null,"topics":["23-04","btrfs","dd","deduplication","disk","disk-space","documentation","duperemove","filesystem","hard-drive","kvm","lab","linux","qemu","save-space","ssd","ubuntu","vm"],"latest_commit_sha":null,"homepage":"https://deaconn.net/blog/view/lab-linux-btrfs","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gamemann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-31T22:30:59.000Z","updated_at":"2025-02-16T12:55:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"09ae31b8-da28-4f5a-814c-3db23889bf15","html_url":"https://github.com/gamemann/Linux-BTRFS-Lab","commit_stats":null,"previous_names":["gamemann/linux-btrfs-lab"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamemann%2FLinux-BTRFS-Lab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamemann%2FLinux-BTRFS-Lab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/gamemann%2FLinux-BTRFS-Lab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamemann%2FLinux-BTRFS-Lab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gamemann","download_url":"https://codeload.github.com/gamemann/Linux-BTRFS-Lab/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244318312,"owners_count":20433875,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["23-04","btrfs","dd","deduplication","disk","disk-space","documentation","duperemove","filesystem","hard-drive","kvm","lab","linux","qemu","save-space","ssd","ubuntu","vm"],"created_at":"2024-10-10T02:25:20.692Z","updated_at":"2025-10-15T16:29:42.043Z","avatar_url":"https://github.com/gamemann.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Linux BTRFS Lab\nThis is just a small repository to store my results on testing Linux [BTRFS's](https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/) out-of-band deduplication [feature](https://btrfs.readthedocs.io/en/latest/Deduplication.html). BTRFS is an advanced Linux file system that comes with many neat features!\n\n## Motives\nWhile I'm interested in Linux file systems in general, a gaming community I help out in has a dedicated server running Linux with 512 GBs of disk space utilizing the `ext4` file system that runs multiple servers for the game [Counter-Strike: Global Offensive](https://steamdb.info/app/730/charts/). 
CS:GO's base installation files are around *~33 GBs* each which resulted in the dedicated server running low on disk space without many **custom** game server files. Using hard links is an option, but since they utilize [Pterodactyl](https://pterodactyl.io/)/Docker, implementing a hard-link approach would be more difficult since Pterodactyl's mount feature wouldn't work because we'd have hard links on separate file systems which is incompatible. Therefore, since they are buying a new machine soon, I wanted to look into using different Linux file systems that can utilize compression and/or deduplication to save disk space. I assumed the deduplication feature with file systems such as BTRFS would benefit a lot in this situation since the 33 GBs of base installation files for CS:GO are identical.\n\n## Lab Specs\n* Created on my '[SpyKids](https://github.com/gamemann/Home-Lab#three-spykids)' home server running Ubuntu 22.04.\n* Virtual machine created with KVM/QEMU running Ubuntu 23.04.\n* Two virtual cores.\n* 4 GBs of RAM.\n* 1 x 200 GBs SSD (virtio driver).\n\nOutput from `lshw -short` on VM.\n\n```bash\nchristian@sk-btrfstest01:~$ sudo lshw -short\nH/W path           Device      Class          Description\n=========================================================\n                               system         Standard PC (Q35 + ICH9, 2009)\n/0                             bus            Motherboard\n/0/0                           memory         96KiB BIOS\n/0/400                         processor      Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz\n/0/401                         processor      Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz\n/0/1000                        memory         4GiB System Memory\n/0/1000/0                      memory         4GiB DIMM RAM\n/0/100                         bridge         82G33/G31/P35/P31 Express DRAM Controller\n/0/100/1           /dev/fb0    display        QXL paravirtual graphic card\n/0/100/2                       bridge         QEMU 
PCIe Root port\n/0/100/2/0                     network        Virtio network device\n/0/100/2/0/0       enp1s0      network        Ethernet interface\n/0/100/2.1                     bridge         QEMU PCIe Root port\n/0/100/2.1/0                   bus            QEMU XHCI Host Controller\n/0/100/2.1/0/0     usb1        bus            xHCI Host Controller\n/0/100/2.1/0/0/1   input4      input          QEMU QEMU USB Tablet\n/0/100/2.1/0/1     usb2        bus            xHCI Host Controller\n/0/100/2.2                     bridge         QEMU PCIe Root port\n/0/100/2.2/0                   communication  Virtio console\n/0/100/2.2/0/0                 generic        Virtual I/O device\n/0/100/2.3                     bridge         QEMU PCIe Root port\n/0/100/2.3/0                   storage        Virtio block device\n/0/100/2.3/0/0     /dev/vda    disk           214GB Virtual I/O device\n/0/100/2.3/0/0/1   /dev/vda1   volume         1023KiB BIOS Boot partition\n/0/100/2.3/0/0/2   /dev/vda2   volume         199GiB EFI partition\n/0/100/2.4                     bridge         QEMU PCIe Root port\n/0/100/2.4/0                   generic        Virtio memory balloon\n/0/100/2.4/0/0                 generic        Virtual I/O device\n/0/100/2.5                     bridge         QEMU PCIe Root port\n/0/100/2.5/0                   generic        Virtio RNG\n/0/100/2.5/0/0                 generic        Virtual I/O device\n/0/100/2.6                     bridge         QEMU PCIe Root port\n/0/100/2.7                     bridge         QEMU PCIe Root port\n/0/100/3                       bridge         QEMU PCIe Root port\n/0/100/3.1                     bridge         QEMU PCIe Root port\n/0/100/3.2                     bridge         QEMU PCIe Root port\n/0/100/3.3                     bridge         QEMU PCIe Root port\n/0/100/3.4                     bridge         QEMU PCIe Root port\n/0/100/3.5                     bridge         QEMU PCIe Root port\n/0/100/1b          card0       
multimedia     82801I (ICH9 Family) HD Audio Controller\n/0/100/1f                      bridge         82801IB (ICH9) LPC Interface Controller\n/0/100/1f/0                    communication  PnP device PNP0501\n/0/100/1f/1                    input          PnP device PNP0303\n/0/100/1f/2                    input          PnP device PNP0f13\n/0/100/1f/3                    system         PnP device PNP0b00\n/0/100/1f/4                    system         PnP device PNP0c01\n/0/100/1f.2        scsi0       storage        82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]\n/0/100/1f.2/0.0.0  /dev/cdrom  disk           QEMU DVD-ROM\n/0/100/1f.3                    bus            82801I (ICH9 Family) SMBus Controller\n/1                 input0      input          Power Button\n/2                 input1      input          AT Translated Set 2 keyboard\n/3                 input3      input          ImExPS/2 Generic Explorer Mouse\n/4                 input6      input          spice vdagent tablet\n```\n\nOutput from `uname -r` on VM.\n\n```bash\nchristian@sk-btrfstest01:~$ sudo uname -r\n6.2.0-20-generic\n```\n\nOutput from `cat /etc/*-release` on VM.\n\n```bash\nchristian@sk-btrfstest01:~$ cat /etc/*-release\nDISTRIB_ID=Ubuntu\nDISTRIB_RELEASE=23.04\nDISTRIB_CODENAME=lunar\nDISTRIB_DESCRIPTION=\"Ubuntu 23.04\"\nPRETTY_NAME=\"Ubuntu 23.04\"\nNAME=\"Ubuntu\"\nVERSION_ID=\"23.04\"\nVERSION=\"23.04 (Lunar Lobster)\"\nVERSION_CODENAME=lunar\nID=ubuntu\nID_LIKE=debian\nHOME_URL=\"https://www.ubuntu.com/\"\nSUPPORT_URL=\"https://help.ubuntu.com/\"\nBUG_REPORT_URL=\"https://bugs.launchpad.net/ubuntu/\"\nPRIVACY_POLICY_URL=\"https://www.ubuntu.com/legal/terms-and-policies/privacy-policy\"\nUBUNTU_CODENAME=lunar\nLOGO=ubuntu-logo\n```\n\n## Partition Table When Installing Ubuntu 23.04\nHere is a screenshot of the VM's partition table/configuration during the Ubuntu installation.\n\n![Ubuntu Install](./images/ubuntu_install.png)\n\nFor simplicity, I created a single partition 
for the entire drive, formatted as BTRFS and mounted at `/` (root).\n\n## Installing Deduplication Tools\nThere are multiple deduplication tools you can install or build on your system. In this lab, we'll be using [Duperemove](https://github.com/markfasheh/duperemove). For Ubuntu 23.04, you may run the following `apt` command to install Duperemove via the package manager.\n\n```bash\nsudo apt install -y duperemove\n```\n\n## Starting Disk Space\nHere's the output from `df -h` on the VM with a vanilla installation of Ubuntu 23.04.\n\n```bash\nchristian@sk-btrfstest01:~$ df -h\nFilesystem      Size  Used Avail Use% Mounted on\ntmpfs           390M  1.7M  389M   1% /run\n/dev/vda2       200G  6.5G  192G   4% /\ntmpfs           2.0G     0  2.0G   0% /dev/shm\ntmpfs           5.0M  8.0K  5.0M   1% /run/lock\ntmpfs           390M  180K  390M   1% /run/user/1000\n```\n\n## Creating Dummy Files\nIn this lab, we will create a directory with two files. One file will be 15 GBs and the other will be 10 GBs. We can use the `dd` Linux tool to create these files filled with zeros.\n\n```bash\n# Create directory.\nmkdir test1\n\n# Change directory.\ncd test1\n\n# Create 15 GBs file.\ndd if=/dev/zero of=filedum1 bs=1G count=15\n\n# Create 10 GBs file.\ndd if=/dev/zero of=filedum2 bs=1G count=10\n```\n\n**Note** - There are faster ways to create dummy files than `dd`. 
However, `dd` is the most commonly used, which is why I chose it for this lab.\n\nHere is the output from the commands above.\n\n```bash\nchristian@sk-btrfstest01:~$ # Create directory.\nmkdir test1\n\n# Change directory.\ncd test1\n\n# Create 15 GBs file.\ndd if=/dev/zero of=filedum1 bs=1G count=15\n\n# Create 10 GBs file.\ndd if=/dev/zero of=filedum2 bs=1G count=10\n\n15+0 records in\n15+0 records out\n16106127360 bytes (16 GB, 15 GiB) copied, 25.2724 s, 637 MB/s\n10+0 records in\n10+0 records out\n10737418240 bytes (11 GB, 10 GiB) copied, 23.9933 s, 448 MB/s\n```\n\nNow if we execute `ls -lh .`, you can see the size of each file in the test directory.\n\n```bash\nchristian@sk-btrfstest01:~/test1$ ls -lh .\ntotal 25G\n-rw-rw-r-- 1 christian christian 15G May 31 18:51 filedum1\n-rw-rw-r-- 1 christian christian 10G May 31 18:52 filedum2\n```\n\nNow let's check the output from `df -h` again.\n\n```bash\nchristian@sk-btrfstest01:~/test1$ df -h\nFilesystem      Size  Used Avail Use% Mounted on\ntmpfs           390M  1.7M  389M   1% /run\n/dev/vda2       200G   32G  167G  16% /\ntmpfs           2.0G     0  2.0G   0% /dev/shm\ntmpfs           5.0M  8.0K  5.0M   1% /run/lock\ntmpfs           390M  172K  390M   1% /run/user/1000\n```\n\n## Testing Duplication\nNow, we could copy the directory we just created, but I noticed this automatically handles deduplication due to what the `cp` command does behind the scenes (to my understanding, recent versions of GNU `cp` attempt a copy-on-write reflink clone by default on file systems that support it, such as BTRFS). 
Therefore, you won't see what it looks like without deduplication.\n\nHowever, if you want a quick example, you may execute the following.\n\n```bash\n# Go up a directory.\ncd ..\n\n# Copy test1 to test2.\ncp -r test1/ test2/\n```\n\nIf you execute `du -sh *`, you'll see both directories are 25 GBs in size.\n\n```bash\nchristian@sk-btrfstest01:~$ du -sh *\n...\n25G     test1\n25G     test2\n...\n```\n\nIf we run `df -h` again, you'll see the used space is the same as before (32 GBs).\n\n```bash\nchristian@sk-btrfstest01:~$ df -h\nFilesystem      Size  Used Avail Use% Mounted on\ntmpfs           390M  1.7M  389M   1% /run\n/dev/vda2       200G   32G  167G  16% /\ntmpfs           2.0G     0  2.0G   0% /dev/shm\ntmpfs           5.0M  8.0K  5.0M   1% /run/lock\ntmpfs           390M  172K  390M   1% /run/user/1000\n```\n\nThis is the deduplication feature working!\n\n### Testing Without Copying Directory\nLet's try testing without copying the directory, since copying handles deduplication automatically. We will use the same commands we used the first time to generate large files. 
However, we'll use `test2` instead of `test1` as the directory name, like the following. If you ran the copy example above, remove the copied `test2` directory first.\n\n```bash\n# Create directory.\nmkdir test2\n\n# Change directory.\ncd test2\n\n# Create 15 GBs file.\ndd if=/dev/zero of=filedum1 bs=1G count=15\n\n# Create 10 GBs file.\ndd if=/dev/zero of=filedum2 bs=1G count=10\n```\n\nHere's the output from above.\n\n```bash\nchristian@sk-btrfstest01:~$ # Create directory.\nmkdir test2\n\n# Change directory.\ncd test2\n\n# Create 15 GBs file.\ndd if=/dev/zero of=filedum1 bs=1G count=15\n\n# Create 10 GBs file.\ndd if=/dev/zero of=filedum2 bs=1G count=10\n\n15+0 records in\n15+0 records out\n16106127360 bytes (16 GB, 15 GiB) copied, 26.6588 s, 604 MB/s\n10+0 records in\n10+0 records out\n10737418240 bytes (11 GB, 10 GiB) copied, 19.1798 s, 560 MB/s\n```\n\nNow if we run `df -h`, you can see we're using 57 GBs instead of 32 GBs.\n\n```bash\nchristian@sk-btrfstest01:~/test2$ df -h\nFilesystem      Size  Used Avail Use% Mounted on\ntmpfs           390M  1.7M  389M   1% /run\n/dev/vda2       200G   57G  142G  29% /\ntmpfs           2.0G     0  2.0G   0% /dev/shm\ntmpfs           5.0M  8.0K  5.0M   1% /run/lock\ntmpfs           390M  172K  390M   1% /run/user/1000\n```\n\nTo my understanding, since BTRFS supports out-of-band deduplication **only**, it will not handle deduplication automatically on each write. Therefore, we will have to run the command `duperemove` with a couple of parameters.\n\nLet's run this command with the `-dr` flags!\n\n```bash\n# Go up one directory.\ncd ..\n\n# Run deduplication command.\nsudo duperemove -dr .\n```\n\nThis ends up consuming a bit of CPU since it scans and hashes files to determine whether they can be deduplicated. However, it didn't perform any deduplication on my end. I figured this was due to checksum/block differences. After all, we are creating completely separate files rather than copying. 
Though, the contents of the files should be the same (all zeroed bytes).\n\nTherefore, I started reading the manual page for `duperemove` (`man duperemove`). I ended up trying other hashing algorithms which didn't make any difference. I then came across the `--dedupe-options=[OPTIONS]` flag which is explained below.\n\n```bash\n--dedupe-options=options\n    Comma separated list of options which alter how we dedupe. Prepend 'no' to an option in order to turn it off.\n\n    [no]partial\n        Duperemove can often find more dedupe by comparing portions of extents to each other. This can be a lengthy, CPU  in‐\n        tensive task so it is turned off by default.\n\n        The  code  behind  this  option is under active development and as a result the semantics of the partial argument may\n        change.\n\n    [no]same\n        Defaults to off. Allow dedupe of extents within the same file.\n\n    [no]fiemap\n        Defaults to on. Duperemove uses the fiemap ioctl during the dedupe stage to optimize out already deduped  extents  as\n        well as to provide an estimate of the space saved after dedupe operations are complete.\n\n        Unfortunately,  some  versions of Btrfs exhibit extremely poor performance in fiemap as the number of references on a\n        file extent goes up. If you are experiencing the dedupe phase slowing down or 'locking up' this option may give you a\n        significant amount of performance back.\n\n        Note: This does not turn off all usage of fiemap, to disable fiemap during the file scan stage, you will also want to\n        use the --lookup-extents=no option.\n\n    [no]block\n        Deprecated.\n```\n\nI ended up using `--dedupe-options partial` which took **a lot** longer to run and used more CPU. 
However, it did perform solid deduplication.\n\n```bash\n# Run deduplication command with partial option set.\nsudo duperemove --dedupe-options partial -dr .\n```\n\nWhile this does consume a bit of CPU, you can use the following flags to limit the number of threads it uses.\n\n```bash\n--io-threads=N\n    Use N threads for I/O. This is used by the file hashing and dedupe stages. Default is automatically detected based on number\n    of host cpus.\n\n--cpu-threads=N\n    Use N threads for CPU bound tasks. This is used by the duplicate extent finding stage.  Default  is  automatically  detected\n    based on number of host cpus.\n\n    Note:  Hyperthreading  can adversely affect performance of the extent finding stage. If duperemove detects an Intel CPU with\n    hyperthreading it will use half the number of cores reported by the system for cpu bound tasks.\n```\n\n## Conclusion\nThis was a fun experiment for me because I feel my knowledge of file systems isn't enough, and I'll be using the BTRFS file system more in the future along with the deduplication feature. I think it's best to have a cron job run late at night every two weeks or so to perform deduplication with the `partial` option, given how long it takes.\n\nThere are other file systems I'm also going to experiment with such as [ZFS](https://en.wikipedia.org/wiki/ZFS) and [XFS](https://en.wikipedia.org/wiki/XFS) which I've heard great things about. With that said, having deduplication in-band sounds like a better approach since it performs deduplication if needed on any write operation. However, this could be costly to the CPU as well, so there are definitely pros to out-of-band deduplication (e.g. being able to pick which directories to deduplicate and when).\n\nOne other thing I did want to note is that while file systems such as BTRFS have matured a lot over the years, it is stated that deduplication still poses a very small risk of data corruption. 
This is because the deduplication process involves sharing data blocks and any changes to the data block being shared could cause issues. BTRFS does have safeguards for this, though. So make sure to always **back up your files** if you can!\n\n## Credits\n* [Christian Deacon](https://github.com/gamemann)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgamemann%2Flinux-btrfs-lab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgamemann%2Flinux-btrfs-lab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgamemann%2Flinux-btrfs-lab/lists"}