{"id":13513255,"url":"https://github.com/gecko984/supervenn","last_synced_at":"2025-03-31T01:32:17.456Z","repository":{"id":49204923,"uuid":"239858555","full_name":"gecko984/supervenn","owner":"gecko984","description":"supervenn: precise and easy-to-read multiple sets visualization in Python","archived":false,"fork":false,"pushed_at":"2024-11-09T20:46:40.000Z","size":120,"stargazers_count":337,"open_issues_count":15,"forks_count":23,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-12T09:22:29.604Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gecko984.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-11T20:32:27.000Z","updated_at":"2025-02-20T13:38:01.000Z","dependencies_parsed_at":"2024-11-01T16:30:31.008Z","dependency_job_id":"e5acac47-a929-436f-90c0-6860abcb2242","html_url":"https://github.com/gecko984/supervenn","commit_stats":{"total_commits":144,"total_committers":1,"mean_commits":144.0,"dds":0.0,"last_synced_commit":"2ee8ae95922c87d7de5085f04c63d2cfb65ba3d7"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gecko984%2Fsupervenn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gecko984%2Fsupervenn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gecko984%2Fsupervenn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ge
cko984%2Fsupervenn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gecko984","download_url":"https://codeload.github.com/gecko984/supervenn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246403961,"owners_count":20771526,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T04:00:48.137Z","updated_at":"2025-03-31T01:32:17.424Z","avatar_url":"https://github.com/gecko984.png","language":"Python","funding_links":[],"categories":["Uncategorized","Python"],"sub_categories":["Uncategorized"],"readme":"\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4016732.svg)](https://doi.org/10.5281/zenodo.4016732)\n\n\n# supervenn: precise and easy-to-read multiple sets visualization in Python\n\n### What it is\n**supervenn** is a matplotlib-based tool for visualization of any number of intersecting sets. It supports Python\n`set`s as inputs natively, but there is a [simple workaround](#use-intersection-sizes-as-inputs-instead-of-sets) to use just intersection sizes.\n\nNote that despite its name, `supervenn` does not produce actual (Euler-)Venn diagrams.\n\nThe easiest way to understand how supervenn diagrams work, is to compare some simple examples to their Euler-Venn\ncounterparts. 
The top row shows Euler-Venn diagrams made with the [matplotlib-venn](https://github.com/konstantint/matplotlib-venn)\npackage; the bottom row shows supervenn diagrams:\n\n\u003cimg src=\"https://i.imgur.com/dJoNhYQ.png\" width=800\u003e\n\n### Installation\n`pip install supervenn`\n\n### Requirements\nPython 2.7 or 3.6+ with `numpy`, `matplotlib` and `pandas`.\n\n### Basic usage \nThe main entry point is the eponymous `supervenn` function. It takes a list of Python `set`s as its first and only\nrequired argument and returns a `SupervennPlot` object.\n```python\nfrom supervenn import supervenn\nsets = [{1, 2, 3, 4}, {3, 4, 5}, {1, 6, 7, 8}]\nsupervenn(sets, side_plots=False)\n```\n\u003cimg src=\"https://i.imgur.com/aAOP6dq.png\" width=330\u003e\n\nEach row represents a set; the order from bottom to top is the same as in the `sets` list. Overlapping parts correspond\nto set intersections.\n\nThe numbers at the bottom show the sizes (cardinalities) of all intersections, which we will call **chunks**.\nThe sizes of sets and their intersections (chunks) are drawn to proportion, but the order of elements is not preserved,\ne.g. the leftmost chunk of size 3 is `{6, 7, 8}`.\n\nA combinatorial optimization algorithm is applied that rearranges the chunks (the columns of the\narray plotted) to minimize the number of parts the sets are broken into. In the example above each set is in one piece\n(no gaps in rows at all), but this is not always possible, even for three sets:\n\n```python\nsupervenn([{1, 2}, {2, 3}, {1, 3}], side_plots=False)\n```\n\n\u003cimg src=\"https://i.imgur.com/8aTSg2A.png\" width=\"330\"\u003e\n\nBy default, additional *side plots* are also displayed:\n\n```python\nsupervenn(sets)\n```\n\u003cimg src=\"https://i.imgur.com/9IhLBcK.png\" width=330\u003e\nHere, the numbers on the right are the set sizes (cardinalities), and the numbers on the top show how many sets each\nintersection is part of. 
The grey bars represent the same numbers visually.\n\nIf you need only one of the two side plots, use `side_plots='top'` or `side_plots='right'`.\n\n### Features (how to)\n\n#### Use intersection sizes as inputs instead of sets\n(New in 0.5.0.) Use the utility function `make_sets_from_chunk_sizes` to produce synthetic sets of integers from your intersection sizes.\nThen pass these sets to `supervenn()`: \n\n```python\nfrom supervenn import supervenn, make_sets_from_chunk_sizes\nsets, labels = make_sets_from_chunk_sizes(sizes_df)  # see below for the structure of sizes_df\nsupervenn(sets, labels)\n```\n\nThe intersection sizes `sizes_df` should be a `pandas.DataFrame` with the following structure:\n\n- For `N` sets, it must have `N` boolean (or 0/1) columns and the last column must be integer, so `N+1` columns in total.\n- Each row represents a unique intersection (chunk) of the sets. The boolean value in column `set_x` indicates whether\nthis chunk lies within `set_x`. The integer value represents the size of the chunk.\n\nFor example, consider the following dataframe:\n\n```\n   set_1  set_2  set_3  size\n0  False   True   True     1\n1   True  False  False     3\n2   True  False   True     2\n3   True   True  False     1\n```\n\nIt represents a configuration of three sets such that\n- [row 0] there is one element that lies in `set_2` and `set_3` but not in `set_1`,\n- [row 1] there are three elements that lie in `set_1` only and not in `set_2` or `set_3`,\n- [row 2] there are two elements that lie in `set_1` and `set_3` but not in `set_2`,\n- [row 3] there is one element that lies in `set_1` and `set_2` but not in `set_3`.\n\n#### Add custom set annotations instead of `set_1`, `set_2` etc\nUse the `set_annotations` argument to pass a list of annotations. It should be in the same order as the sets. 
It is\nthe second positional argument.\n```python\nsets = [{1, 2, 3, 4}, {3, 4, 5}, {1, 6, 7, 8}]\nlabels = ['alice', 'bob', 'third party']\nsupervenn(sets, labels)\n```\n\u003cimg src=\"https://i.imgur.com/YlPKs7u.png\" width=330\u003e\n\n#### Change size and dpi of the plot\nCreate a new figure and plot into it:\n```python\nimport matplotlib.pyplot as plt\nplt.figure(figsize=(16, 8))\nsupervenn(sets)\n```\n\nThe `supervenn` function has `figsize` and `dpi` arguments, but they are **deprecated** and will be removed in a future\nversion. Please don't use them.\n\n#### Plot into an existing axis\nUse the `ax` argument:\n\n```python\nsupervenn(sets, ax=my_axis)\n```\n\n#### Access the figure and axes objects of the plot\nUse `.figure` and `axes`  attributes of the object returned by `supervenn()`. The `axes` attribute is\norganized as a dict with descriptive strings for keys: `main`, `top_side_plot`, `right_side_plot`, `unused`. \nIf `side_plots=False`, the dict has only one key `main`.\n\n#### Save the plot to an image file\n\n```python\nimport matplotlib.pyplot as plt\nsupervenn(sets)\nplt.savefig('myplot.png')\n```\n\n#### Use a different ordering of chunks (columns)\nUse the `chunks_ordering` argument. The following options are available:\n- `'minimize gaps'`: default, use an optimization algorithm to find an order of columns with fewer\ngaps in each row;\n- `'size'`: bigger chunks go first;\n- `'occurrence'`: chunks that are in more sets go first;\n- `'random'`: randomly shuffle the columns.\n\nTo reverse the order (e.g. you want smaller chunks to go first), pass `reverse_chunks_order=False` (by default\nit's `True`) \n\n#### Reorder the sets (rows) instead of keeping the order as passed into function\nUse the `sets_ordering` argument. The following options are available:\n- `None`: default - keep the order of sets as passed into function;\n- `'minimize gaps'`: use the same algorithm as for chunks to group similar sets closer together. 
The difference in the\nalgorithm is that now gaps are minimized in columns instead of rows, and they are weighted by the column widths\n(i.e. chunk sizes), as we want to minimize total gap width;\n- `'size'`: bigger sets go first;\n- `'chunk count'`: sets that contain most chunks go first;\n- `'random'`: randomly shuffle the rows.\n\nTo reverse the order (e.g. you want smaller sets to go first), pass `reverse_sets_order=False` (by default\nit's `True`) \n\n#### Inspect the chunks' contents\n`supervenn(sets, ...)` returns a `SupervennPlot` object, which has a `chunks` attribute.\nIt is a `dict` with `frozenset`s of set indices as keys, and chunks as values. For example, \n`my_supervenn_object.chunks[frozenset([0, 2])]` is the chunk with all the items that are in `sets[0]` and\n`sets[2]`, but not in any of the other sets.\n\nThere is also a `get_chunk(set_indices)` method that is slightly more convenient, because you\ncan pass a `list` or any other iterable of indices instead of a `frozenset`. For example:\n`my_supervenn_object.get_chunk([0, 2])`. \n\nIf you have a good idea of a more convenient method of chunks lookup, let me know and I'll\nimplement it as well.\n\n#### Make the plot prettier if sets and/or chunks are very different in size\nUse the `widths_minmax_ratio` argument, with a value between 0.01 and 1. Consider the following example\n```python\nsets = [set(range(200)), set(range(201)), set(range(203)), set(range(206))]\nsupervenn(sets, side_plots=False)\n```\n\u003cimg src=\"https://i.imgur.com/i05lgNU.png\" width=330\u003e\n\nAnnotations in the bottom left corner are unreadable.\n\nOne solution is to trade exact chunk proportionality for readability. This is done by making small chunks visually\nlarger. To be exact, a linear function is applied to the chunk sizes, with slope and intercept chosen so that the\nsmallest chunk size is exactly `widths_minmax_ratio` times the largest chunk size. 
If the ratio is already greater than\nthis value, the sizes are left unchanged. Setting `widths_minmax_ratio=1` will result in all chunks being displayed as\nthe same size.\n\n```python\nsupervenn(sets, side_plots=False, widths_minmax_ratio=0.05)\n```\nThe image now looks clean, but chunks of size 1 to 3 look almost the same.\n\n\n\u003cimg src=\"https://i.imgur.com/cIp42uD.png\" width=330\u003e\n\n\n#### Avoid clutter in the X axis annotations\n- Use the `min_width_for_annotation` argument to hide annotations for chunks smaller than this value. \n```python\nsupervenn(sets, side_plots=False, min_width_for_annotation=100)\n```\n\u003cimg src=\"https://i.imgur.com/YdCmHtZ.png\" width=330\u003e\n\n- Pass `rotate_col_annotations=True` to print chunk sizes vertically.\n\n- There's also a `col_annotations_ys_count` argument, but it is **deprecated** and will be removed in a future version.\n\n#### Change bars appearance in the main plot\nUse the arguments `bar_height` (default `1`), `bar_alpha` (default `0.6`), `bar_align` (default `edge`), `color_cycle`\n(default is the current style's default palette). You can also use styles, for example:\n```python\nimport matplotlib.pyplot as plt\nwith plt.style.context('bmh'):\n    supervenn([{1,2,3}, {3,4}])\n```\n\u003cimg src=\"https://i.imgur.com/yEUChI4.png\" width=\"330\"\u003e\n\n\n#### Change side plots size and color\nUse the `side_plot_width` (in inches, default 1) and `side_plot_color` (default `'tab:gray'`) arguments.\n\n#### Change axes labels from `SETS`, `ITEMS` to something else\nJust use `plt.xlabel` and `plt.ylabel` as usual.\n\n#### Change other parameters\nOther arguments can be found in the docstring of the function. \n\n### Algorithm used to minimize gaps\nIf there are no more than 8 chunks, the optimal permutation is found with exhaustive search (you can increase this\nlimit up to 12 using the `max_bruteforce_size` argument). For greater chunk counts, a randomized quasi-greedy algorithm\nis applied. 
The description of the algorithm can be found in the docstring to `supervenn._algorithms` module.\n\n### Less trivial examples: \n\n#### Words with many meanings\n\n```python\nletters = {'a', 'r', 'c', 'i', 'z'}\nprogramming_languages = {'python', 'r', 'c', 'c++', 'java', 'julia'}\nanimals = {'python', 'buffalo', 'turkey', 'cat', 'dog', 'robin'}\ngeographic_places = {'java', 'buffalo', 'turkey', 'moscow'}\nnames = {'robin', 'julia', 'alice', 'bob', 'conrad'}\ngreen_things = {'python', 'grass'}\nsets = [letters, programming_languages, animals, geographic_places, names, green_things]\nlabels = ['letters', 'programming languages', 'animals', 'geographic places',\n          'human names', 'green things']\nplt.figure(figsize=(10, 6))\nsupervenn(sets, labels , sets_ordering='minimize gaps')\n```\n\u003cimg src=\"https://i.imgur.com/hinM4I8.png\" width=400\u003e\n\nAnd this is how the figure would look without the smart column reordering algorithm:\n\u003cimg src=\"https://i.imgur.com/sWFah6k.png\" width=400\u003e\n\n#### Banana genome compared to 5 other species\n[Data courtesy of Jake R Conway, Alexander Lex, Nils Gehlenborg - creators of UpSet](https://github.com/hms-dbmi/UpSetR-paper/blob/master/bananaPlot.R)\n\nImage from [D’Hont, A., Denoeud, F., Aury, J. et al. 
The banana (Musa acuminata) genome and the evolution of\nmonocotyledonous plants](https://www.nature.com/articles/nature11241)\n\nFigure from original article (note that it is by no means proportional!):\n\n\u003cimg src=\"https://i.imgur.com/iQlcLVG.jpg\" width=650\u003e\n\nFigure made with [UpSetR](https://caleydo.org/tools/upset/)\n\n\u003cimg src=\"https://i.imgur.com/DH72eJJ.png\" width=700\u003e\n\nFigure made with supervenn (using the `widths_minmax_ratio` argument)\n\n```python\nplt.figure(figsize=(20, 10))\nsupervenn(sets_list, species_names, widths_minmax_ratio=0.1,\n          sets_ordering='minimize gaps', rotate_col_annotations=True, col_annotations_area_height=1.2)\n```\n\u003cimg src=\"https://i.imgur.com/1FGvOLu.png\" width=850\u003e\n\nFor comparison, here's the same data visualized to scale (no `widths_minmax_ratio`, but argument\n`min_width_for_annotation` is used instead to avoid column annotations overlap):\n\n```python\nplt.figure(figsize=(20, 10))\nsupervenn(sets_list, species_names, rotate_col_annotations=True,\n          col_annotations_area_height=1.2, sets_ordering='minimize gaps',\n          min_width_for_annotation=180)\n\n```\n\n\u003cimg src=\"https://i.imgur.com/MgUqkL6.png\" width=850\u003e\n\nIt must be noted that `supervenn` produces best results when there is some inherent structure to the sets in question.\nThis typically means that the number of non-empty intersections is significantly lower than the maximum possible\n(which is `2^n_sets - 1`). This is not the case in the present example, as 62 of the 63 intersections are non-empty, \nhence the results are not that pretty.\n\n#### Order IDs in requests to a multiple vehicle routing problem solver\nThis was actually my motivation in creating this package. The team I'm currently working in provides an API that solves\na variation of the Multiple Vehicles Routing Problem. 
The API solves tasks of the form\n\"Given 1000 delivery orders each with lat, lon, time window and weight, and 50 vehicles each with capacity and work\nshift, distribute the orders between the vehicles and build an optimal route for each vehicle\". \n\nA given client can send tens of such requests per day and sometimes it is useful to look at their requests and\nunderstand how they are related to each other in terms of what orders are included in each of the requests. Are they\nsending the same task over and over again  - a sign that they are not satisfied with routes they get and they might need\nour help in using the API? Are they manually editing the routes (a process that results in more requests to our API, with\nonly the orders from affected routes included)? Or are they solving for several independent order sets and are happy\nwith each individual result?\n\nWe can use `supervenn` with some custom annotations to look at sets of order IDs in each of the client's requests.\nHere's an example of an OK but not perfect client's workday:\n\u003cimg src=\"https://i.imgur.com/9YfRC61.png\" width=800\u003e\n\nRows from bottom to top are requests to our API from earlier to later, represented by their sets of order IDs.\nWe see that they solved a big task at 10:54, were not satisfied with the result, and applied some manual edits until\n11:11. 
Then in the evening they re-solved the whole task twice over, probably with some change in parameters.\n\nHere's a perfect day:\n\n\u003cimg src=\"https://i.imgur.com/E2o2ela.png\" width=800\u003e\n\nThey solved three unrelated tasks and were happy with each (no repeated requests, no manual edits; each order is\ndistributed only once).\n\nAnd here's a rather extreme example of a client whose scheme of operation involves sending requests to our API every\n15-30 minutes to account for live updates on newly created orders and couriers' GPS positions.\n\n\u003cimg src=\"https://i.imgur.com/vKxHOF7.jpg\" width=800\u003e\n\n### Comparison to similar tools\n\n#### [matplotlib-venn](https://github.com/konstantint/matplotlib-venn) \nThis tool plots area-weighted Venn diagrams with circles for two or three sets. But the problem with circles\nis that they are pretty useless even in the case of three sets. For example, if one set is the symmetric difference of the\nother two:\n```python\nfrom matplotlib_venn import venn3\nset_1 = {1, 2, 3, 4}\nset_2 = {3, 4, 5}\nset_3 = set_1 ^ set_2\nvenn3([set_1, set_2, set_3], set_colors=['steelblue', 'orange', 'green'], alpha=0.8)\n```\n\u003cimg src=\"https://i.imgur.com/Mijyzj8.png\" width=260\u003e\n\nSee all those zeros? This image makes little sense. 
The `supervenn`'s approach to this problem is to allow the sets to be\nbroken into separate parts, while trying to minimize the number of such breaks and guaranteeing exact proportionality of\nall parts:\n\n\u003cimg src=\"https://i.imgur.com/e3sMQrO.png\" width=400\u003e\n\n\n#### [UpSetR and pyUpSet](https://caleydo.org/tools/upset/)\n\u003cimg src=\"https://raw.githubusercontent.com/ImSoErgodic/py-upset/master/pictures/basic.png\" width=800\u003e\nThis approach, while very powerful, is less visual, as it displays, so to say only _statistics about_ the sets, not the\nsets in flesh.\n\n#### [pyvenn](https://raw.githubusercontent.com/wiki/tctianchi/pyvenn)\n\u003cimg src=\"https://raw.githubusercontent.com/wiki/tctianchi/pyvenn/venn6.png\" width=800\u003e\nThis package produces diagrams for up to 6 sets, but they are not in any way proportional. It just has pre-set images\nfor every given sets count, your actual sets only affect the labels that are placed on top of the fixed image,\nnot unlike the banana diagram above. \n\n#### [RainBio](http://www.lesfleursdunormal.fr/static/appliweb/rainbio/index.html) ([article](https://hal.archives-ouvertes.fr/hal-02264217/document))\nThis approach is quite similar to supervenn. I'll let the reader decide which one does the job better:\n\n##### RainBio:\n\n\u003cimg src=\"https://i.imgur.com/jwQAltx.png\" width=400\u003e\n\n\n##### supervenn:\n\n\u003cimg src=\"https://i.imgur.com/hinM4I8.png\" width=400\u003e\n\n\n_Thanks to Dr. 
Bilal Alsallakh for referring me to this work_\n\n#### [Linear Diagram Generator](https://www.cs.kent.ac.uk/people/staff/pjr/linear/index.html?abstractDescription=programming_languages+1%0D%0Aletters+programming_languages+2%0D%0Aprogramming_languages+animals+green_things+1%0D%0Ageographic_places+1%0D%0Aletters+3%0D%0Ahuman_names+3%0D%0Agreen_things+1%0D%0Aprogramming_languages+geographic_places+1%0D%0Aanimals+2%0D%0Aanimals+geographic_places+2%0D%0Aanimals+human_names+1%0D%0Aprogramming_languages+human_names+1%0D%0A\u0026width=700\u0026height=250\u0026guides=lines)\nThis tool has a similar concept, but it is only available as a JavaScript web app with minimal functionality, and you have to\ncompute all the intersection sizes yourself. Apparently there is also a column rearrangement algorithm in place, but\nthe resulting value of the target function (number of gaps within sets) is higher than in the diagram made with supervenn.\n\u003cimg src=\"https://i.imgur.com/tZN8QAb.png\" width=600\u003e\n\n_Thanks to [u/aboutscientific](https://www.reddit.com/user/aboutscientific/) for the link._\n\n### Credits\nThis package was created and is maintained by [Fedor Indukaev](https://www.linkedin.com/in/fedor-indukaev-4a52961b/). \nYou can contact me on Gmail and Telegram by the same username as on GitHub.\n\n### How can I help?\n- If you like supervenn, you can click the star at the top of the page and tell other people about this tool.\n- If you have an idea or even an implementation of an algorithm for matrix column rearrangement, I'll be happy to try\nit, as my current algorithm is quite primitive. 
(The problem in question is almost  the traveling salesman problem in\nHamming metric).\n- If you are a Python developer, you can help by reviewing the code in any way that is convenient to you.\n- If you found a bug or have a feature request, you can submit them via the \n[Issues section](https://github.com/gecko984/supervenn/issues).\n \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgecko984%2Fsupervenn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgecko984%2Fsupervenn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgecko984%2Fsupervenn/lists"}