{"id":13449029,"url":"https://github.com/Blosc/bcolz","last_synced_at":"2025-03-22T18:32:21.734Z","repository":{"id":1018839,"uuid":"846444","full_name":"Blosc/bcolz","owner":"Blosc","description":"A columnar data container that can be compressed.","archived":true,"fork":false,"pushed_at":"2022-10-27T17:33:11.000Z","size":3627,"stargazers_count":959,"open_issues_count":132,"forks_count":150,"subscribers_count":62,"default_branch":"master","last_synced_at":"2024-10-29T17:28:28.659Z","etag":null,"topics":["column-store","compressed-data","python"],"latest_commit_sha":null,"homepage":"http://bcolz.blosc.org","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Blosc.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSES/APPVEYOR-DEMO.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2010-08-18T15:27:02.000Z","updated_at":"2024-09-12T08:44:13.000Z","dependencies_parsed_at":"2022-08-16T11:50:22.533Z","dependency_job_id":null,"html_url":"https://github.com/Blosc/bcolz","commit_stats":null,"previous_names":["francescalted/carray"],"tags_count":29,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blosc%2Fbcolz","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blosc%2Fbcolz/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blosc%2Fbcolz/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blosc%2Fbcolz/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Blosc","download_url":"https://codeload.github.com/Blosc/bcolz/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://
github.com","kind":"github","repositories_count":244788554,"owners_count":20510391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["column-store","compressed-data","python"],"created_at":"2024-07-31T06:00:28.947Z","updated_at":"2025-03-22T18:32:16.713Z","avatar_url":"https://github.com/Blosc.png","language":"C","readme":"===========================\nUnmaintained Package Notice\n===========================\n\nUnfortunately, due to a lack of resources, the Blosc Development Team is unable to maintain this package anymore.\nDuring the last 10 years we managed to find resources (even if quite irregularly) to develop what we think is a\nnice package for handling compressed data containers, especially tabular data.  Regrettably, in recent years we did\nnot find enough sponsorship to continue the maintenance of this package.\n\nFor those who depend on bcolz, a fork is welcome and we will try our best to provide advice to possible new\nmaintainers.  Indeed, if we manage to get some decent grants via Blosc (https://blosc.org/pages/donate/), our umbrella\nproject, we would be glad to reconsider the maintenance of bcolz.  But again, we would be very open to, and supportive\nof, a new maintenance team for this project.\n\nFinally, thanks to all the people who used and contributed in one way or another to bcolz; it has been a nice ride!\nLet's hope it still has a bright future ahead.\n\nThe Blosc Development Team\n\nbcolz: columnar and compressed data containers\n==============================================\n\n.. 
image:: https://badges.gitter.im/Blosc/bcolz.svg\n   :alt: Join the chat at https://gitter.im/Blosc/bcolz\n   :target: https://gitter.im/Blosc/bcolz?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge\n\n:Version: |version|\n:Travis CI: |travis|\n:Appveyor: |appveyor|\n:Coveralls: |coveralls|\n:And...: |powered|\n\n.. |version| image:: https://img.shields.io/pypi/v/bcolz.png\n        :target: https://pypi.python.org/pypi/bcolz\n\n.. |travis| image:: https://img.shields.io/travis/Blosc/bcolz.png\n        :target: https://travis-ci.org/Blosc/bcolz\n\n.. |appveyor| image:: https://img.shields.io/appveyor/ci/FrancescAlted/bcolz.png\n        :target: https://ci.appveyor.com/project/FrancescAlted/bcolz/branch/master\n\n.. |powered| image:: http://b.repl.ca/v1/Powered--By-Blosc-blue.png\n        :target: http://blosc.org\n\n.. |coveralls| image:: https://coveralls.io/repos/Blosc/bcolz/badge.png\n        :target: https://coveralls.io/r/Blosc/bcolz\n\n\n.. image:: docs/bcolz.png\n\nbcolz provides columnar, chunked data containers that can be\ncompressed either in-memory or on-disk.  Column storage allows for\nefficient table queries, as well as cheap column addition and\nremoval.  It is based on `NumPy \u003chttp://www.numpy.org\u003e`_ and uses it\nas the standard data container for communicating with bcolz objects, but\nit also supports importing from and exporting to\n`HDF5/PyTables tables \u003chttp://www.pytables.org\u003e`_ and `pandas\ndataframes \u003chttp://pandas.pydata.org\u003e`_.\n\nbcolz objects are compressed by default, not only to reduce\nmemory/disk storage but also to improve I/O speed.  
The compression\nprocess is carried out internally by `Blosc \u003chttp://blosc.org\u003e`_, a\nhigh-performance, multithreaded meta-compressor that is optimized for\nbinary data (although it works just fine with text data too).\n\nbcolz can also use `numexpr \u003chttps://github.com/pydata/numexpr\u003e`_\ninternally (it does so by default if numexpr is installed)\nor `dask \u003chttps://github.com/dask/dask\u003e`_ to accelerate many\nvector and query operations (although it can also use pure NumPy for\nthese).  numexpr/dask can optimize memory usage and use\nmultithreading for the computations, which makes these operations blazing fast.\nCombined with the disk-based, compressed carray/ctable\ncontainers, this allows out-of-core computations to be performed\nefficiently and, most importantly, *transparently*.\n\nJust to whet your appetite, here is an example with real data where\nbcolz already fulfills the promise of accelerating memory I/O by\nusing compression:\n\nhttp://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb\n\n\nRationale\n---------\n\nBy using compression, you can deal with more data using the same\namount of memory, which is very good in itself.  But in case you are\nwondering about the price to pay in terms of performance, you should\nknow that nowadays memory access is the most common bottleneck in many\ncomputational scenarios, and that CPUs spend most of their time waiting\nfor data.  Hence, keeping data compressed in memory reduces the number of\nbytes that have to travel from memory to the CPU, easing the pressure on\nthe memory subsystem.\n\nFurthermore, columnar means that tabular datasets are stored in\ncolumn-wise order, which turns out to offer better opportunities to\nimprove the compression ratio.  This is because elements in the same column\ntend to be more similar to each other than elements in the same row,\nso compressors generally do a much better job when\ndata is laid out in column-wise order.  
In addition, when you have\nto deal with tables with a large number of columns and your operations\nonly involve some of them, columnar storage tends to be much\nmore effective because it minimizes the amount of data that travels to\nCPU caches.\n\nSo, the ultimate goal for bcolz is not only to reduce the memory needs\nof large arrays/tables, but also to make bcolz operations faster\nthan the same operations on traditional data containers like those in NumPy or Pandas.\nThat is already the case in some real-life scenarios (see the\nnotebook above), and the advantage should become much more noticeable\nwith forthcoming, faster CPUs integrating more cores and\nwider vector units.\n\nRequisites\n----------\n\n- Python 2.7 or \u003e= 3.5\n- NumPy \u003e= 1.8\n- Cython \u003e= 0.22 (just for compiling the beast)\n- C-Blosc \u003e= 1.8.0 (optional, as the internal Blosc will be used by default)\n\nOptional:\n\n- numexpr \u003e= 2.5.2\n- dask \u003e= 0.9.0\n- pandas\n- tables (pytables)\n\nBuilding\n--------\n\nThere are different ways to compile bcolz, depending on whether you want to\nlink with an already installed Blosc library or not.\n\nCompiling with an installed Blosc library (recommended)\n.......................................................\n\nPython and Blosc-powered extensions have a difficult relationship when\ncompiled using GCC, which is why using an external C-Blosc library is\nrecommended for maximum performance (for details, see\nhttps://github.com/Blosc/python-blosc/issues/110).\n\nGo to https://github.com/Blosc/c-blosc/releases to download and\ninstall the C-Blosc library.  Then you can tell bcolz where the\nC-Blosc library is in a couple of ways:\n\nUsing an environment variable:\n\n.. code-block:: console\n\n    $ BLOSC_DIR=/usr/local     # or \"set BLOSC_DIR=\\blosc\" on Windows\n    $ export BLOSC_DIR         # not needed on Windows\n    $ python setup.py build_ext --inplace\n\nUsing a flag:\n\n.. 
code-block:: console\n\n    $ python setup.py build_ext --inplace --blosc=/usr/local\n\nCompiling without an installed Blosc library\n............................................\n\nbcolz also ships with the Blosc sources so, assuming that you\nhave a C++ compiler installed, just do:\n\n.. code-block:: console\n\n    $ python setup.py build_ext --inplace\n\nThat's all.  You can now proceed to the testing section.\n\nNote: The C++ compiler is only required for the Snappy\ndependency.  All the other components of Blosc are pure C\n(including the LZ4 and Zlib libraries).\n\nTesting\n-------\n\nAfter compiling, you can quickly check that the package is sane by\nrunning::\n\n  $ PYTHONPATH=.         # or \"set PYTHONPATH=.\" on Windows\n  $ export PYTHONPATH    # not needed on Windows\n  $ python -c \"import bcolz; bcolz.test()\"  # add `heavy=True` if desired\n\nInstalling\n----------\n\nInstall it as a typical Python package::\n\n  $ pip install -U .\n\nOptionally, install the additional dependencies::\n\n  $ pip install \".[optional]\"\n\nDocumentation\n-------------\n\nYou can find the online manual at:\n\nhttp://bcolz.blosc.org\n\nbut of course, you can always access docstrings from the console\n(e.g. ``help(bcolz.ctable)``).\n\nAlso, you may want to look at the bench/ directory for some usage\nexamples.\n\nResources\n---------\n\nVisit the main bcolz repository at:\nhttp://github.com/Blosc/bcolz\n\nHome of the Blosc compressor:\nhttp://blosc.org\n\nUser mailing list:\nhttp://groups.google.com/group/bcolz (bcolz@googlegroups.com)\n\nAn `introductory talk (20 min)\n\u003chttps://www.youtube.com/watch?v=-lKV4zC1gss\u003e`_ about bcolz at\nEuroPython 2014.  `Slides here\n\u003chttp://blosc.org/docs/bcolz-EuroPython-2014.pdf\u003e`_.\n\nLicense\n-------\n\nPlease see ``BCOLZ.txt`` in the ``LICENSES/`` directory.\n\nShare your experience\n---------------------\n\nLet us know about any bugs, suggestions, gripes, kudos, etc. 
you may\nhave.\n\n**Enjoy Data!**\n","funding_links":[],"categories":["C","数据容器和结构","数据存储"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBlosc%2Fbcolz","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBlosc%2Fbcolz","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBlosc%2Fbcolz/lists"}