Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/dagwieers/renumid

Renumber or remap UIDs/GIDs on file systems efficiently
https://github.com/dagwieers/renumid

Last synced: 2 months ago
JSON representation

Renumber or remap UIDs/GIDs on file systems efficiently

Awesome Lists containing this project

README

        

= Renumber UIDs/GIDs on file systems efficiently

== Use case
Typically when consolidating environments, or moving standalone systems into
a consolidated directory existing users and groups with conflicting UIDs and
GIDs have to be renumbered. Part of this process means you have to safely and
efficiently modify the file systems and bring the AS-IS situation into line
with the wanted TO-BE end-result.

This is exactly what renumid intends to help with.

== Two-phase approach
Intuitively one would scan the filesystems for a specific UID or GID, once
identified you'd change the ownership to the new state, continue with the
next files until all files, and all UIDs/GIDs are dealt with. Often the UNIX
'find' utility is used for this like:

find / -uid 1234 -exec chown 3456 {} \;
find / -uid 3456 -exec chown 5678 {} \;

However this poses a few problems:

- You will need to scan the whole filesystem for each UID/GID to process.
This takes a lot of time and is far from efficient

- You risk ending up with the wrong end-result. If you are not careful
(like the above example) you may be renumbering already mapped files in
a subsequent action. Creating a situation that you cannot restore easily.

- Changes that require both UID and GID changes cannot be easily dealt with.

- If the process (for whatever reason is interrupted) replaying the remaining
actions is tedious and time-consuming.

The solution to this is to use a two-phased approach.

1. First index the file system and identify all paths that require a change.

2. Use this Index file to efficiently perform the needed renumbering.

Since this index not just stores the paths, but also the original state. One
can extract exactly the intended end-result, and the required work to get there.
(e.g. the number of atomic file system actions to perform)

A two phased approach also allows to prepare the required actions before the
migration itself takes place, and can help to assess the required migration
time for individual systems. As such file system renumbering can potentially
be very time-consuming one can much better predict the required downtime
needed for the migration.

For this reason we added 2 optional steps.

3. A step that can report on what changes are to be expected from the Index file

4. A step that can restore the original state or can be used to benchmark the process (without impact)

// 5. A step to validate the Index file against the current system (files missing, ownership changes)

== Implementation details
The following potential issues should be considered:

- Symlinks should not be followed (no-dereference) - DONE
- Pseudo file systems should be excluded - DONE
- Crossing file systems boundaries may not be wanted
- File systems are mounted inside file systems (so device nr is important) - DONE
- Python os.walk provides every directory as a (new) root, so we need to store excluded device nrs - DONE
- Bind mounts need to be handled carefully (keep a list of scanned device nrs ?)

== Usage

=== Commands

renumid index - Create a filesystem index of impacted paths using a map
renumid status - Show a status report of impacted paths and affected UIDs/GIDs
renumid renumber - Renumber the impacted paths according to the provided map
renumid restore - Restore the original situation using the filesystem index

=== General options
-f - the index file to create/use
(default: renumid--

=== Status options
-u - list the files for specific UIDs
-g - list the files for specific GIDs

=== Renumber options
-t - test the changes, without actually changing

=== Restore options
No specific options.

=== Examples

renumid index -f renumid-20151126-1912.idx -m mapping.yaml /home /usr
renumid status -f renumid-20151126-1912.idx
renumid renumber -f renumid-20151126-1912.idx
renumid restore -f renumid-20151126-1912.idx

== File format

=== Map
The map is created as YAML or JSON file.

Below is an example file showing how it can be used:

----
---
# The below UIDs/GIDs are examples to use on a RHEL /dev and /tmp file system

# These maps would match for all hosts
uidmap:
10: 10010 # uucp
42: 10042 # gdm
gidmap:
5: 10005 # tty
16: 10016 # oprofile
39: 10039 # video

# But you can define per-host maps, or use wildcards on FQDN
moria*:
uidmap:
48: 10048 # apache
69: 10069 # vcsa
gidmap:
42: 10042 # gdm
69: 10069 # vcsa
484: 10484 # tmux
505: 10505 # vboxusers

lisse.gent.wieers.com:
uidmap:
8: 10008 # mail
gidmap:
8: 10008 # mail
----

=== File system index
The file system index will be a pickle file with a specific renumid data structure.

Or could become something more efficient (storage and speed-wise).

The file system index stores the following information:

- A version number of the file format (to detect incompatible files)
- A set of excluded devices (so it is clear what the scope is)
- Statistics (time, duration, paths scanned, ...)
- The time it took to create the file system index
- The scope (parents) that require changes
- The actual mapping used for the index
- The list of paths and original uids/gids