{"id":13344139,"url":"https://github.com/LeversonCarlos/HttpZipStream","last_synced_at":"2025-03-12T06:31:15.513Z","repository":{"id":103959251,"uuid":"160963705","full_name":"LeversonCarlos/HttpZipStream","owner":"LeversonCarlos","description":"Library to extract specific entries from a remote http zip archive without downloading the entire file","archived":false,"fork":false,"pushed_at":"2020-03-23T20:20:41.000Z","size":57,"stargazers_count":10,"open_issues_count":1,"forks_count":6,"subscribers_count":0,"default_branch":"master","last_synced_at":"2024-10-24T16:50:57.989Z","etag":null,"topics":["http","remote-zip","zip","zip-stream"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LeversonCarlos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-08T17:43:35.000Z","updated_at":"2024-08-29T22:16:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"6de9d5ef-e428-4ba8-97de-6fb365bd4810","html_url":"https://github.com/LeversonCarlos/HttpZipStream","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeversonCarlos%2FHttpZipStream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeversonCarlos%2FHttpZipStream/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeversonCarlos%2FHttpZipStream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeversonCarlos%2FHttpZipS
tream/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LeversonCarlos","download_url":"https://codeload.github.com/LeversonCarlos/HttpZipStream/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243171652,"owners_count":20247878,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["http","remote-zip","zip","zip-stream"],"created_at":"2024-07-29T19:32:27.855Z","updated_at":"2025-03-12T06:31:15.507Z","avatar_url":"https://github.com/LeversonCarlos.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HttpZipStream \nA simple library to extract specific entries from a remote http zip archive without the need to download the entire file.  \n![Release](https://github.com/LeversonCarlos/HttpZipStream/workflows/Release/badge.svg)\n\n## Understanding the magic\nWhen a zip archive is opened from a remote URL, the zip library needs to download the entire file before it can read the contents. So if you have a 90 MB zip file and want only a 100 KB file from within it, you end up downloading all 90 MB anyway.  \nThe [zip format](https://en.wikipedia.org/wiki/Zip_(file_format)) defines a central directory that points to all of its inner entries, with properties such as name, starting offset, and size. This directory is usually small and sits at the very end of the archive. So, if we could read just this directory, we would know where in the zip archive the file we want is stored.  
\nAnd if we could request just that part of the content from the remote URL, we would get a much smaller download containing only what we need.  \nIt turns out that the HTTP protocol supports a technique called [byte serving](https://en.wikipedia.org/wiki/Byte_serving), which lets a request carry a header specifying the byte ranges it wants.  \nWith that in mind, what we do is pretty simple. We make a first HTTP request asking only for the headers (not the content), which tells us the content size. Then we make a small range request at the end of the file to extract the directory info. Then, for each entry we want, we request just that entry's byte range, decompress it with the deflate algorithm, and it's done.  \nWith this approach we end up making more HTTP requests, so it is only worthwhile when the desired content is a small part of the entire zip archive.  \nMore on this can be found in my [medium](https://medium.com/@lcjohnny/httpzipstream-extracting-single-entry-from-remote-zip-without-downloading-the-entire-file-7a0f3d24a6fc) article.\n\n## Install instructions\nYou can add the library to your project using the [nuget](https://www.nuget.org/packages/HttpZipStream) package: \n```\ndotnet add package HttpZipStream\n```\n\n## Sample of how to use the library\nExtracting just the first entry from a remote zip archive: \n```csharp \n   var httpUrl = \"http://MyRemoteFile.zip\"; \n   using (var zipStream = new System.IO.Compression.HttpZipStream(httpUrl)) \n   { \n      var entryList = await zipStream.GetEntriesAsync(); \n      var entry = entryList.FirstOrDefault(); \n      byte[] entryContent = await zipStream.ExtractAsync(entry);\n      /* do what you want with the entry content */\n   }\n``` \n\n## Built using\n* [DotNET Core](https://dotnet.github.io)\n* [xUnit](https://xunit.github.io)\n* [vsCode](https://github.com/Microsoft/vscode) \n* 
[ZipFormat](https://en.wikipedia.org/wiki/Zip_(file_format))\n\n## Changelog\n### v0.1.*\n- Minor documentation adjustments.  \n- Proper naming convention for async methods.  \n- Preparing projects to be built, packed and deployed by the server.  \n### v0.2.*\n- Implementing an ExtractAsync overload that returns just the entry content as a byte array.  \n- BUG #13: Some entries are not deflated correctly.  \n### v0.3.*\n- Upgrading dotnet version to 3.1\n\n\n## Authors\n* [Leverson Carlos](https://github.com/LeversonCarlos) \n\n## License\nMIT License - see the [LICENSE](LICENSE) file for details\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLeversonCarlos%2FHttpZipStream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLeversonCarlos%2FHttpZipStream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLeversonCarlos%2FHttpZipStream/lists"}