{"id":15012757,"url":"https://github.com/microsoft/recursiveextractor","last_synced_at":"2025-05-15T06:06:35.862Z","repository":{"id":39551650,"uuid":"284071455","full_name":"microsoft/RecursiveExtractor","owner":"microsoft","description":"RecursiveExtractor is a .NET Standard 2.0 archive extraction Library, and Command Line Tool which can process 7zip, ar, bzip2, deb, gzip, iso, rar, tar, vhd, vhdx, vmdk, wim, xzip, and zip archives and any nested combination of the supported formats.","archived":false,"fork":false,"pushed_at":"2025-04-14T18:54:32.000Z","size":201173,"stargazers_count":200,"open_issues_count":20,"forks_count":33,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-05-15T00:08:39.614Z","etag":null,"topics":["archive","disc-image","extractor","nuget","recursion"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-31T15:41:35.000Z","updated_at":"2025-05-10T16:48:47.000Z","dependencies_parsed_at":"2023-02-11T23:15:39.354Z","dependency_job_id":"c3a1f907-7c20-479d-9fbd-74f45776b170","html_url":"https://github.com/microsoft/RecursiveExtractor","commit_stats":{"total_commits":144,"total_committers":11,"mean_commits":"13.090909090909092","dds":0.2152777777777778,"last_synced_commit":"abc7d875fff6fb3205ca09a3d51799cf70880ebf"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/microsoft%2FRecursiveExtractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRecursiveExtractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRecursiveExtractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRecursiveExtractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/RecursiveExtractor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254283340,"owners_count":22045140,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive","disc-image","extractor","nuget","recursion"],"created_at":"2024-09-24T19:43:10.863Z","updated_at":"2025-05-15T06:06:30.853Z","avatar_url":"https://github.com/microsoft.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# About\n![CodeQL](https://github.com/microsoft/RecursiveExtractor/workflows/CodeQL/badge.svg) ![Nuget](https://img.shields.io/nuget/v/Microsoft.CST.RecursiveExtractor?link=https://www.nuget.org/packages/Microsoft.CST.RecursiveExtractor/\u0026link=https://www.nuget.org/packages/Microsoft.CST.RecursiveExtractor/) ![Nuget](https://img.shields.io/nuget/dt/Microsoft.CST.RecursiveExtractor?link=https://www.nuget.org/packages/Microsoft.CST.RecursiveExtractor/\u0026link=https://www.nuget.org/packages/Microsoft.CST.RecursiveExtractor/)\n\nRecursive Extractor is a Cross-Platform [.NET Standard 2.0 
Library](#library) and [Command Line Program](#cli) for parsing archive files and disk images, including nested archives and disk images.\n\n# Supported File Types\n| | | |\n|-|-|-|\n| 7zip+ | ar    | bzip2 |\n| deb   | dmg** | gzip  | \n| iso   | rar^  | tar   | \n| vhd   | vhdx  | vmdk  | \n| wim*  | xzip  | zip+  |\n\n\u003cdetails\u003e\n\u003csummary\u003eDetails\u003c/summary\u003e\n\u003cbr/\u003e\n* Windows only\u003cbr/\u003e\n+ Encryption Supported\u003cbr/\u003e\n^ Encryption supported for Rar version 4 only\u003cbr/\u003e\n** Limited support. Unencrypted HFS+ volumes with certain compression schemes.\n\u003c/details\u003e\n\n# Variants\n\n## Command Line\n### Installing\n1. Ensure you have the latest [.NET SDK](https://dotnet.microsoft.com/download).\n2. Run `dotnet tool install -g Microsoft.CST.RecursiveExtractor.Cli`\n\nThis adds `RecursiveExtractor` to your path so you can run it directly from your shell.\n\n### Running\nBasic usage is: `RecursiveExtractor --input archive.ext --output outputDirectory`\n\n\u003cdetails\u003e\n\u003csummary\u003eDetailed Usage\u003c/summary\u003e\n\u003cbr/\u003e\n\u003cul\u003e\n    \u003cli\u003e\u003ci\u003einput\u003c/i\u003e: The path to the archive to extract.\u003c/li\u003e\n    \u003cli\u003e\u003ci\u003eoutput\u003c/i\u003e: The path to a directory to extract into.\u003c/li\u003e\n    \u003cli\u003e\u003ci\u003epasswords\u003c/i\u003e: A comma-separated list of passwords to use for archives.\u003c/li\u003e\n    \u003cli\u003e\u003ci\u003eallow-globs\u003c/i\u003e: A comma-separated list of glob patterns that each extracted file must match.\u003c/li\u003e\n    \u003cli\u003e\u003ci\u003edeny-globs\u003c/i\u003e: A comma-separated list of glob patterns that each extracted file must not match.\u003c/li\u003e\n    \u003cli\u003e\u003ci\u003eraw-extensions\u003c/i\u003e: A comma-separated list of file extensions to not recurse into.\u003c/li\u003e\n    \u003cli\u003e\u003ci\u003eno-recursion\u003c/i\u003e: Don't 
recurse into sub-archives.\u003c/li\u003e\n    \u003cli\u003e\u003ci\u003esingle-thread\u003c/i\u003e: Don't attempt to parallelize extraction.\u003c/li\u003e\n    \u003cli\u003e\u003ci\u003eprintnames\u003c/i\u003e: Output the name of each file extracted.\u003c/li\u003e\n    \n\u003c/ul\u003e\n\nFor example, to extract only \".cs\" files:\n```\nRecursiveExtractor --input archive.ext --output outputDirectory --allow-globs **/*.cs\n```\n\nRun `RecursiveExtractor --help` for more details.\n\u003c/details\u003e\n\n## .NET Standard Library\nRecursive Extractor is available on NuGet as [Microsoft.CST.RecursiveExtractor](https://www.nuget.org/packages/Microsoft.CST.RecursiveExtractor/). Recursive Extractor targets netstandard2.0+ and the latest .NET, currently .NET 6.0, .NET 7.0 and .NET 8.0.\n\n### Usage\n\nThe most basic usage is to enumerate all the files in the provided archive and do something with the contents of each, exposed as a Stream.\n\n```csharp\nusing Microsoft.CST.RecursiveExtractor;\n\nvar path = \"path/to/file\";\nvar extractor = new Extractor();\nforeach(var file in extractor.Extract(path))\n{\n    doSomething(file.Content); // Do something with the file contents (a Stream)\n}\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eExtracting to Disk\u003c/summary\u003e\n\u003cbr/\u003e\nThis code, adapted from the CLI, extracts the contents of the archive located at `options.Input` to a directory located at `options.Output`. Archives that fail to extract are written out as themselves.\n\n```csharp\nusing Microsoft.CST.RecursiveExtractor;\n\nvar extractor = new Extractor();\nvar extractorOptions = new ExtractorOptions()\n{\n    ExtractSelfOnFail = true,\n};\nextractor.ExtractToDirectory(options.Output, options.Input, extractorOptions);\n```\n\u003c/details\u003e\n\u003cdetails\u003e\n\u003csummary\u003eAsync Usage\u003c/summary\u003e\n\u003cbr/\u003e\nThis example uses the async API to print the names of all files found in the archive located at `path`.\n\n```csharp\nvar 
path = \"/Path/To/Your/Archive\";\nvar extractor = new Extractor();\ntry {\n    // await foreach requires an IAsyncEnumerable\n    IAsyncEnumerable\u003cFileEntry\u003e results = extractor.ExtractFileAsync(path);\n    await foreach(var found in results)\n    {\n        Console.WriteLine(found.FullPath);\n    }\n}\ncatch(OverflowException)\n{\n    // This means Recursive Extractor has detected a Quine or Zip Bomb\n}\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eThe FileEntry Object\u003c/summary\u003e\n\u003cbr/\u003e\nThe Extractor returns `FileEntry` objects. Each one exposes the file contents as a `Content` Stream, along with path, parent, and timestamp metadata.\n\n```csharp\npublic Stream Content { get; }\npublic string FullPath { get; }\npublic string Name { get; }\npublic FileEntry? Parent { get; }\npublic string? ParentPath { get; }\npublic DateTime CreateTime { get; }\npublic DateTime ModifyTime { get; }\npublic DateTime AccessTime { get; }\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eExtracting Encrypted Archives\u003c/summary\u003e\n\u003cbr/\u003e\nYou can supply passwords for decrypting archives. Each list of passwords is paired with a regular expression that is matched against the name of the archive to decide which archives to try those passwords on.\n\n```csharp\nusing System.Collections.Generic;\nusing System.Text.RegularExpressions;\nusing Microsoft.CST.RecursiveExtractor;\n\nvar path = \"/Path/To/Your/Archive\";\nvar extractor = new Extractor();\ntry {\n    IEnumerable\u003cFileEntry\u003e results = extractor.ExtractFile(path, new ExtractorOptions()\n    {\n        // Verbatim strings (@\"...\") keep the regex backslashes intact\n        Passwords = new Dictionary\u003cRegex, List\u003cstring\u003e\u003e()\n        {\n            { new Regex(@\"\\.zip\"), new List\u003cstring\u003e(){ \"PasswordForZipFiles\" } },\n            { new Regex(@\"\\.7z\"), new List\u003cstring\u003e(){ \"PasswordFor7zFiles\" } },\n            { new Regex(\".*\"), new List\u003cstring\u003e(){ \"PasswordForAllFiles\" } }\n        }\n    });\n    foreach(var found in results)\n    {\n        Console.WriteLine(found.FullPath);\n    }\n}\ncatch(OverflowException)\n{\n    // This means Recursive 
Extractor has detected a Quine or Zip Bomb\n}\n```\n\u003c/details\u003e\n\n## Exceptions\nRecursiveExtractor protects against [ZipSlip](https://snyk.io/research/zip-slip-vulnerability), [Quines, and Zip Bombs](https://en.wikipedia.org/wiki/Zip_bomb).\nCalls to Extract will throw an `OverflowException` when a Quine or Zip Bomb is detected, and a `TimeoutException` if `EnableTiming` is set and the specified time period elapses before extraction completes.\n\nOtherwise, invalid files found while crawling will emit a logger message and be skipped.  You can also enable `ExtractSelfOnFail` to return the original archive file on an extraction failure.\n\n## Notes on Enumeration\n\n### Multiple Enumeration\nYou should not iterate the Enumeration returned from the `Extract` and `ExtractAsync` interfaces multiple times. If you need to do so, convert the Enumeration to an in-memory collection first.\n\n### Parallel Enumeration\nIf you want to enumerate the output with parallelization you should use a batching mechanism, for example:\n\n```csharp\nvar extractedEnumeration = Extract(fileEntry, opts);\nusing var enumerator = extractedEnumeration.GetEnumerator();\nConcurrentBag\u003cFileEntry\u003e entryBatch = new();\nbool moreAvailable = enumerator.MoveNext();\nwhile (moreAvailable)\n{\n    // Collect up to BatchSize entries from the enumerator\n    entryBatch = new();\n    for (int i = 0; i \u003c BatchSize; i++)\n    {\n        entryBatch.Add(enumerator.Current);\n        moreAvailable = enumerator.MoveNext();\n        if (!moreAvailable)\n        {\n            break;\n        }\n    }\n\n    if (entryBatch.Count == 0)\n    {\n        break;\n    }\n\n    // Run your parallel processing on the batch\n    Parallel.ForEach(entryBatch, new ParallelOptions() { CancellationToken = cts.Token }, entry =\u003e\n    {\n        // Do something with each FileEntry\n    });\n}\n```\n\n### Disposing During Enumeration\nIf you are working with a very large archive or in a particularly constrained environment you can reduce memory and file handle usage for 
the Content streams in each FileEntry by disposing each stream as you iterate.\n\n```csharp\nvar results = extractor.Extract(path);\nforeach(var file in results)\n{\n    using var theStream = file.Content;\n    // Do something with the stream.\n    _ = theStream.ReadByte();\n    // The stream is disposed at the end of each iteration by the using declaration\n}\n```\n\n# Feedback\n\nIf you have any issues or feature requests (for example, supporting other formats) you can open a new [Issue](https://github.com/microsoft/RecursiveExtractor/issues/new).  \n\nIf you are having trouble parsing a specific archive in one of the supported formats, it helps to include a sample archive that demonstrates the issue with your report.\n\n# Dependencies\n\nRecursive Extractor aims to provide a unified interface to extract arbitrary archives and relies on a number of libraries to parse the archives.\n\n* [SharpZipLib](https://github.com/icsharpcode/SharpZipLib)\n* [SharpCompress](https://github.com/adamhathcock/sharpcompress)\n* [LTRData/DiscUtils](https://github.com/LTRData/discutils)\n\n# Contributing\n\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. 
You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Frecursiveextractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Frecursiveextractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Frecursiveextractor/lists"}