https://github.com/opencoff/fastdd

Fast "dd" implementation that leverages Linux's splice(2)
https://github.com/opencoff/fastdd

dd disk-dump multithreaded-copy splice

Last synced: 8 months ago
JSON representation

Fast "dd" implementation that leverages Linux's splice(2)

Host: GitHub
URL: https://github.com/opencoff/fastdd
Owner: opencoff
License: gpl-2.0
Created: 2019-06-07T04:46:35.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2022-03-20T21:45:44.000Z (almost 4 years ago)
Last Synced: 2025-03-31T20:39:14.430Z (9 months ago)
Topics: dd, disk-dump, multithreaded-copy, splice
Language: C
Size: 71.3 KB
Stars: 18
Watchers: 2
Forks: 4
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# What is `fastdd`
`fastdd` is a performance enhanced, simpler implementation of `dd`.
On Linux, it uses `splice(2)` to avoid data-copy to/from user space.
On other platforms, it uses multiple threads and larger block sizes
to speed up I/O.

`fastdd` doesn't try to emulate all the behavior of `dd`; thus, the
notion of "blocks" (and "block size") is used only as a multipler
for the number of bytes to move. The I/O size is picked based on the
source and destination (default is 64k).

# What part of `dd` does it implement?
It only supports a few options of dd:

* bs=N
* count=N
* skip=N -- skip N input blocks
* seek=N -- skip N output blocks before first write
* if=FILE
* of=FILE
* iflag=nonblock
* oflag=nonblock,excl,sync

Each of the integer arguments `N` can have an optional suffix of
`k`, `M`, `G`, `T`, `P` for kilo, Mega, Giga, Tera, Peta byte
respectively (**multiples of 1024**).

The source and destination can be a file, pipe, character-device or
block-device.

# Building and Installing `fastdd`
`fastdd` currently is designed for Linux, OpenBSD and MacOS;
therefore the makefile only supports those 3 OSes.

You need `GNU make`, `gcc`. There is no `configure` mess, the
code is organized to be generic enough to build on modern POSIX
systems (depends on pthreads):

On Darwin and OpenBSD:

$ gmake # or gnumake

On Linux:

$ make

By default, the makefile generates a "release" binary (with full
optimization). The build artifacts produces the `fastdd` binary
in a OS specific directory:

* Linux: `Linux-rel`
* Mac OS: `Darwin-rel`
* OpenBSD: `OpenBSD-rel`

## Installing `fastdd`
Fastdd can be installed in any location:

1. System default: `sudo make install DESTDIR=/usr`
2. Local install: `sudo make install DESTDIR=/usr/local`
3. Home dir: `make install DESTDIR=$HOME`

In each case, the `fastdd` binary goes in `$DESTDIR/bin`.

## Using `fastdd`
Fastdd is written to be usable without having to remember
complicated flags. You only need to know "if=" and "of=".
For the most common use cases of copying from a file to a block
device, you don't need to specify `bs=` or `count=` arguments;
`fastdd` can infer the input size and use a platform optimal block
size for copying.

e.g., if your USB device on Linux was on `/dev/sde`, then you can
turn it into a bootable ISO like so:

fastdd if=my.iso of=/dev/sde

That's it. If you do forget what flags to use, try:

fastdd --help

Unless the `--quiet` option is used, `fastdd` prints a progress bar
to show its incremental progress. When the input size is known, the
progress bar is "rich" (it shows progress & completion %). When the
input is unknown (e.g., from a pipe), the progress bar is simple -
only showing number of bytes written. In either case, the sizes are
human friendly (kB, MB, etc.).

# Performance Numbers
Anecdotally, on OpenBSD and Darwin, the multi-threaded version seems
to be faster than the native dd. On Linux, the version with
`splice(2)` seems to be faster than `dd`.

**TODO**: One of these days, I will sit and write a repetable benchmark
script.

# Developer Notes
On Linux, `fastdd` is single-threaded and uses `splice(2)` for
moving data in the kernel avoiding all user-space reads. If
neither source nor destination are pipes (sockets), `fastdd`
creates an intermediate pipe to splice the data.

On other platforms, `fastdd` is multi-threaded and uses a separate
read thread to gather I/O blocks. The reader and writer communicate
via producer-consumer queue.

In both cases, I/O (`splice(2)` or `read(2)`) is done in units of
`iosize` (command line parameter).

## Testing & Test Framework
There are two test harnesses:

1. `basic-tests.sh`: This is simple set of test cases to cover basic
functionality, command-line flags, `oflag` combinations etc.

2. `tests.sh`: This is a data-file test driver that reads from
`tests.in` and runs each test in turn. Each test comprises of
generating some input, using `fastdd` to move the data from source
to destination, repeating the same data movement using `dd` and
comparing checksums of the resulting output.

### Using tests.sh
New tests can be added into 'tests.in' or its own input file. The
format of the input file is documented in tests.in.

Running the data-driven tests is simple:

./tests.sh tests.inp

## Debug builds
You can build a debug version of the program:

make DEBUG=1 -j5

And the build artifacts will be in the directory `$OS-dbg`.

## Guide to Source Code

* fastdd.c - `main()` for `fastdd`.

* args.c - `fastdd` command line parsing (key=value). It uses a
table driven approach to parse the values directly into a struct
instance (uses `offsetof()`).

* blksize_darwin.c - Implementation of `Blksize()` for Mac OS
(tested on 10.11 -- 10.14)

* blksize_linux.c - Implementation of `Blksize()` for Linux (tested
on Linux 4.18)

* blksize_openbsd.c - Implementation of `Blksize()` for OpenBSD
(tested on 6.5).

* copy_linux.c - Implementation of `Copy()` for Linux using
`splice(2)`.

* copy_posix.c - Implementation of `Copy()` using pthreads for
non-Linux platforms (tested only on Darwin and OpenBSD).

* disksize.c - Small test program to call `Blksize()` and print the
resulting disk size.

* utils.c - I/O utility functions.

* opts.c - Auto-generated file for parsing long and short options;
the command line options are in opts.in. The code uses standard
`getopt_long()` - but removes the tedium of having to write the
option processing by hand.

There is a separate directory `portable/` that is a partial copy of
an [external repository](github.com/opencoff/portable-lib). This
subdirectory only contains the files needed to build `fastdd`. Some
functionality from that lib:

* Generic (type-safe) lists, producer-consumer queue
* Implementation of POSIX semaphores for Darwin
* functions to convert "sizes" to strings and vice versa (a size
string is a numeric string with a suffix of `[kKMGTPE]`).

## Adding support for other OSes
The easiest way to add support to other POSIX OSes (FreeBSD,
DragonFlyBSD etc.), is:

* Implement `Blksize()` for that OS - follow similar implementations
as in in *blksize_darwin.c*, *blksize_openbsd.c* etc.

* Make changes to GNUmakefile:

* Add the OS specific object files to its var
* Add any necessary LD libs to its var

e.g., for some POSIX OS "foo":

* write code for *blksize_foo.c*
* *GNUMakefile* changes:
1. `foo_objs = blksize_foo.o copy_posix.o`
2. `foo_LIBS =`

This will give you a working version that uses pthreads for I/O. If
your OS supports `splice(2)` like functionality, you have to write
the fast-path code for `Copy()` and put it in *copy_$OS.c*.

## TODO
* Benchmark suite to measure performance on supported platforms
* For non-linux platforms, is `mmap(2)` for source and/or
destination worth it?

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/opencoff/fastdd

Awesome Lists containing this project

README