https://github.com/edinburgh-genome-foundry/saboteurs

:bomb: Identify elements impairing success accross group experiments.
https://github.com/edinburgh-genome-foundry/saboteurs

dna-assembly dna-part lab-automation quality-control

Last synced: 4 months ago
JSON representation

:bomb: Identify elements impairing success accross group experiments.

Host: GitHub
URL: https://github.com/edinburgh-genome-foundry/saboteurs
Owner: Edinburgh-Genome-Foundry
License: mit
Created: 2018-03-15T19:36:27.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2022-05-09T11:17:51.000Z (over 3 years ago)
Last Synced: 2024-11-06T20:52:30.232Z (about 1 year ago)
Topics: dna-assembly, dna-part, lab-automation, quality-control
Language: Python
Homepage: https://edinburgh-genome-foundry.github.io/saboteurs/
Size: 7.96 MB
Stars: 4
Watchers: 7
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.rst

Awesome Lists containing this project

README

          .. raw:: html

    


    

    



    

    

.. image:: https://github.com/Edinburgh-Genome-Foundry/saboteurs/actions/workflows/build.yml/badge.svg

    :target: https://github.com/Edinburgh-Genome-Foundry/saboteurs/actions/workflows/build.yml

    :alt: GitHub CI build status

.. image:: https://coveralls.io/repos/github/Edinburgh-Genome-Foundry/saboteurs/badge.svg?branch=master

   :target: https://coveralls.io/github/Edinburgh-Genome-Foundry/saboteurs?branch=master

Saboteurs is a Python library to detect failure-causing elements from success/failure data.

We use it at the `Edinburgh Genome Foundry `_ to

identify defectuous genetic parts early:

- When assembling large fragments of DNA, each with typically 5 to 25 parts, we

  observe that some assemblies have far fewer successes ("good clones") than

  some others. We use Saboteurs to identify possible parts which would be

  causing the damage. This would generally mean that the sample corresponding

  to these parts has been compromised.

- Before launching a large batch of assemblies which reuse the same few parts,

  we use Saboteurs to design a smaller "test batch" of carefully selected

  assemblies to detect and identify possible bad parts.

See `this page `_  for the HTML docs.

You can also use Saboteurs online, using `this web app `_ for saboteurs detection, or `this other app `_ for designing test batches.

Usage

-----

Logical methods

~~~~~~~~~~~~~~~

**Identifying saboteur elements from experimental results**

Assume that a secret organization has a few dozen agents ([A]nna, [B]ob, [C]harlie, [D]olly, etc.). Regularly, the organization puts together a team (e.g. A, C, D) and sends them to a mission, which should succeed unless one of the members is a double-agent who will secretly sabotage the mission. Looking at the table below, can you identify the *saboteur(s)*?

======= ======= =======

Mission Members Outcome

======= ======= =======

1       A C D   Success

2       B C E   Failure

3       A B D   Success

4       D F G   Failure

======= ======= =======

Mission 2 raises suspicion on B, C, and E, but Mission 1 clears C, and mission 3 clears B. Therefore **E is a saboteur**. Meanwhile mission 4 raises **suspicion on F and G**, but while none of them is cleared by another mission, it is impossible to say if only F or only G or both are saboteurs.

The Saboteurs library has a method ``find_logical_saboteurs`` which allows to do this reasoning many groups with many elements. Here is how you would solve the problem above:

.. code:: python

    from saboteurs import find_logical_saboteurs

    groups = {

        1: ['A', 'C', 'D'],

        2: ['B', 'C', 'E'],

        3: ['A', 'B', 'D'],

        4: ['D', 'F', 'G']

    }

    find_logical_saboteurs(groups, failed_groups=[2, 4])

    # result: {'saboteurs': ['E'], 'suspicious': ['G', 'F']}

In the result, ``suspicious`` is the list of all elements which only appear in

failing groups, and ``saboteurs`` is the list of suspicious elements which are

also the only suspicious element in at least one group (and therefore confirmed

unambiguously as saboteurs).

**Designing experiment batches to find saboteur elements**

Assume that we have a list of agents, among which we suspect might hide one or two saboteurs.

We want to select a batch of "test groups" (from all possible teams) so that when we get the result

of all these teams (success or failure) we will be able to identify the one or two saboteurs.

This is solved as follows:

.. code:: python

    from saboteurs import design_test_batch

    all_possible_groups = {

        'group_1': ['A', 'B', 'C],

        'group_2': ['A', 'B', 'D', 'E'],

        # ... and many more

    }

    selected_groups, error = design_test_batch(all_possible_groups,

                                               max_saboteurs=2)

    # result:

    # OrderedDict([('group_3', ('A', 'B', 'L')),

    #              ('group_9', ('A', 'E', 'I', 'L')),

    #              ... and more])

        

You can get a quick report (CSV file and plot) of the selected groups with

.. code:: python

    generate_batch_report(selected_groups, plot_format='png',

                          target='design_test_batch_report')

.. image:: https://github.com/Edinburgh-Genome-Foundry/saboteurs/raw/master/examples/logical_methods/design_test_batch_report/groups.png

In practice, a group can have different "positions" and a given element can

only fill one of these positions. Consider for instance that there are 4

possible positions, with respective possible elements lists as follows: 

.. code:: python

    elements_per_position = {

        "Position_1": ['A', 'B', 'C'],

        "Position_2": ['D', 'E', 'F', 'G'],

        "Position_3": ['H', 'I', 'J', 'K'],

        "Position_4": ['L', 'M', 'N'],

    }

In that case there are 3x4x4x3=144 possible combinations, which can be generated

using Saboteur's utility method ``generate_combinatorial_groups``:

.. code:: python

    from saboteurs import (generate_combinatorial_groups, design_test_batch)

    possible_groups = generate_combinatorial_groups(elements_per_position)

    selected_groups = design_test_batch(possible_groups, max_saboteurs=2)

    # result:

    # OrderedDict([('group_009', ('A', 'D', 'J', 'N')),

    #              ('group_016', ('A', 'E', 'I', 'L')),

    #              ... and 13 other groups])

Statistical methods

~~~~~~~~~~~~~~~~~~~

**Example 1:** assume that a secret organization has a few dozen agents (Anna, Bob, Charlie, etc.). Regularly, the organization puts together a group (Anna and David and Peggy) and sends that group to missions, some of which will be successful, some of which will fail. After a large number of missions, looking at the results of each group, you may ask: are there some agents which tend to lower the chances of success of the groups they are part of ?

With the Saboteurs library, you would first put your data in a spreadsheet ``data.csv`` like `this one `_ then run the following script:

.. code:: python

  from saboteurs import (csv_to_groups_data,

                         find_statistical_saboteurs,

                         statistics_report)

  groups_data = csv_to_groups_data("data.csv")

  analysis_results = find_statistical_saboteurs(groups_data)

  statistics_report(analysis_results, "report.pdf")

You obtain the following `PDF report `_ highlighting which members have a significant negative impact on their groups, and where they appear:

.. raw:: html

    


    

    

    


Installation

------------

You can install Saboteurs with PIP:

.. code::

    pip install saboteurs

License = MIT

-------------

Saboteurs is an open-source software originally written at the Edinburgh Genome Foundry by `Zulko `_ and `released on Github `_ under the MIT licence (Copyright 2017 Edinburgh Genome Foundry, University of Edinburgh). Everyone is welcome to contribute!

More biology software

---------------------

.. image:: https://raw.githubusercontent.com/Edinburgh-Genome-Foundry/Edinburgh-Genome-Foundry.github.io/master/static/imgs/logos/egf-codon-horizontal.png

 :target: https://edinburgh-genome-foundry.github.io/

Saboteurs is part of the `EGF Codons `_ synthetic biology software suite for DNA design, manufacturing and validation.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/edinburgh-genome-foundry/saboteurs

Awesome Lists containing this project

README