{"id":13878930,"url":"https://github.com/martijn/xsv","last_synced_at":"2025-05-15T03:07:24.459Z","repository":{"id":37992577,"uuid":"241318574","full_name":"martijn/xsv","owner":"martijn","description":"High performance, lightweight .xlsx parser for Ruby that provides nothing a CSV parser wouldn't","archived":false,"fork":false,"pushed_at":"2025-02-20T08:00:23.000Z","size":713,"stargazers_count":204,"open_issues_count":1,"forks_count":19,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-07T23:54:14.565Z","etag":null,"topics":["excel","ruby","xlsx"],"latest_commit_sha":null,"homepage":"https://storck.io/posts/announcing-xsv-1-0-0/","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/martijn.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-18T09:15:34.000Z","updated_at":"2025-03-09T19:05:28.000Z","dependencies_parsed_at":"2023-02-09T16:16:11.371Z","dependency_job_id":"9876dc46-ab43-4b01-9dba-61695bc62907","html_url":"https://github.com/martijn/xsv","commit_stats":{"total_commits":185,"total_committers":10,"mean_commits":18.5,"dds":0.08648648648648649,"last_synced_commit":"7f68eebbdf5b0f04517f5633820e55f889284b0c"},"previous_names":[],"tags_count":38,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martijn%2Fxsv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martijn%2Fxsv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martijn%2Fxsv/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martijn%2Fxsv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/martijn","download_url":"https://codeload.github.com/martijn/xsv/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254264766,"owners_count":22041793,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["excel","ruby","xlsx"],"created_at":"2024-08-06T08:02:04.526Z","updated_at":"2025-05-15T03:07:19.428Z","avatar_url":"https://github.com/martijn.png","language":"Ruby","readme":"# Xsv .xlsx reader for Ruby\n\n[![Test badge](https://img.shields.io/github/actions/workflow/status/martijn/xsv/ruby.yml?branch=main)](https://github.com/martijn/xsv/actions/workflows/ruby.yml)\n[![Yard Docs badge](http://img.shields.io/badge/yard-docs-blue.svg)](https://rubydoc.info/github/martijn/xsv)\n[![Gem Version badge](https://badge.fury.io/rb/xsv.svg)](https://badge.fury.io/rb/xsv)\n\nXsv is a high performance, lightweight, pure Ruby parser for ISO/IEC 29500 Office Open XML spreadsheets\n(commonly known as Excel or .xlsx files). It strives to be minimal in the sense that it provides nothing a\nCSV reader wouldn't. This means it only deals with the minimal required formatting and cannot create or modify\ndocuments.\nXsv can handle very large Excel files with minimal resources thanks to a custom streaming XML parser that\nis optimized for the Excel file format.\n\nXsv is designed for worksheets with a single table of data, optionally\nwith a header row. 
It only casts values to basic Ruby types (integer, float,\ndate and time) and does not deal with most formatting or more advanced\nfunctionality. Xsv has been production-ready since the initial release.\n\nXsv stands for 'Excel Separated Values', because Excel just gets in the way.\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'xsv'\n```\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself as:\n\n    $ gem install xsv\n\nXsv targets Ruby \u003e= 2.7 and has just a single dependency, `rubyzip`. It has been\ntested successfully with MRI, JRuby, and TruffleRuby. It has no native extensions\nand is designed to be thread-safe.\n\n## Usage\n\n### Array and hash mode\n\nXsv has two modes of operation. By default, it returns an array for\neach row in the sheet:\n\n```ruby\nworkbook = Xsv.open(\"sheet.xlsx\") # =\u003e #\u003cXsv::Workbook sheets=1\u003e\n\n# Access worksheet by index, 0 is the first sheet\nsheet = workbook[0]\n# or, access worksheet by name\nsheet = workbook[\"Sheet1\"]\n\n# Iterate over rows\nsheet.each do |row|\n  row # =\u003e [\"header1\", \"header2\"]\nend\n\n# Access row by index (zero-based)\nsheet[1] # =\u003e [\"value1\", \"value2\"]\n```\n\nAlternatively, it can load the headers from the first row and return a hash\nfor every row by calling `parse_headers!` on the sheet or setting the `parse_headers`\noption on open:\n\n```ruby\n# Parse headers for all sheets on open\n\nworkbook = Xsv.open(\"sheet.xlsx\", parse_headers: true)\n\n# Get the first row from the first sheet\nworkbook.first.first # =\u003e {\"header1\" =\u003e \"value1\", \"header2\" =\u003e \"value2\"}\n\n# Manually parse headers for a single sheet\n\nworkbook = Xsv.open(\"sheet.xlsx\")\n\nsheet = workbook.first\n\nsheet.first # =\u003e [\"header1\", \"header2\"]\n\nsheet.parse_headers!\n\nsheet.first # =\u003e {\"header1\" =\u003e \"value1\", \"header2\" =\u003e \"value2\"}\n```\n\nXsv will raise `Xsv::DuplicateHeaders` if it detects 
duplicate values in the header row when calling\n`#parse_headers!` or when opening a workbook with `parse_headers: true` to ensure hash keys are unique.\n\n`Xsv::Sheet` implements `Enumerable`, so along with `#each`\nyou can call methods like `#first`, `#filter`/`#select`, and `#map` on it. Likewise, these methods can\nbe used on `Xsv::Workbook` to iterate over sheets, for example:\n\n```ruby\n# Get the name of all the sheets in a workbook\nsheet_names = @workbook.map(\u0026:name)\n```\n\n### Opening a string or buffer instead of filename\n\n`Xsv.open` accepts a filename, or an IO or String containing a workbook. Optionally, you can pass a block\nwhich will be called with the workbook as its parameter, like `File.open`. An example combining both:\n\n```ruby\n# Use an existing IO-like object as source\n\nfile = File.open(\"sheet.xlsx\")\n\nXsv.open(file) do |workbook|\n  puts workbook.inspect\nend\n\n# or even:\n\nXsv.open(file.read) do |workbook|\n  puts workbook.inspect\nend\n```\n\nPrior to Xsv 1.1.0, `Xsv::Workbook.open` was used instead of `Xsv.open`. The parameters are identical and\nthe former is maintained for backwards compatibility.\n\n### Assumptions\n\nSince Xsv treats worksheets like CSV files, it makes certain assumptions about your\nsheet:\n\n- In array mode, your data starts on the first row\n\n- In hash mode, the first row of the sheet contains headers, followed by rows of data\n\nIf your data or headers do not start on the first row of the sheet, you can\ntell Xsv to skip a number of rows:\n\n```ruby\nsheet = workbook[0]\nsheet.row_skip = 1\n```\n\nAll operations will honour this offset, making the skipped rows unreachable.\n\n## Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can\nalso run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. 
To release a new version, update the\nversion number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version,\npush git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Performance and Benchmarks\n\nXsv is faster and more memory efficient than other gems because of two things: it only _reads values_ from Excel files\nand it uses a SAX-based parser instead of a DOM-based parser. If you want to read some background on this, check\nout my blog post on\n[Efficient XML parsing in Ruby](https://storck.io/posts/efficient-xml-parsing-in-ruby/).\n\nJamie Schembri did a shootout of Xsv against various other Excel reading gems comparing parsing speed, memory usage, and\nallocations.\nCheck out his blog post: [Faster Excel parsing in Ruby](https://blog.schembri.me/post/faster-excel-parsing-in-ruby/).\n\nPre-1.0, Xsv used a native extension for XML parsing, which was faster than the pure Ruby one (on MRI). But even\nthe current pure Ruby parser generally outperforms the competition. For maximum performance, it is recommended to\nenable YJIT.\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/martijn/xsv.\nPlease provide an .xlsx file with a minimal breaking example that is acceptable\nfor inclusion in the source code repository.\n\n## License\n\nCopyright © Martijn Storck and Xsv contributors\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n","funding_links":[],"categories":["Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartijn%2Fxsv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmartijn%2Fxsv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartijn%2Fxsv/lists"}