{"id":21721345,"url":"https://github.com/informaticsmatters/fragmentor","last_synced_at":"2025-04-12T21:33:52.205Z","repository":{"id":40262719,"uuid":"240257457","full_name":"InformaticsMatters/fragmentor","owner":"InformaticsMatters","description":"Fragment network build optimisation","archived":false,"fork":false,"pushed_at":"2025-03-24T09:11:51.000Z","size":16174,"stargazers_count":2,"open_issues_count":6,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-24T10:24:56.003Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InformaticsMatters.png","metadata":{"files":{"readme":"README-AWS-PCLUSTER.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-13T12:40:34.000Z","updated_at":"2025-03-24T09:11:55.000Z","dependencies_parsed_at":"2024-09-11T12:25:24.264Z","dependency_job_id":"f45b7d72-f524-4481-8d47-e616da5e80bf","html_url":"https://github.com/InformaticsMatters/fragmentor","commit_stats":null,"previous_names":[],"tags_count":72,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InformaticsMatters%2Ffragmentor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InformaticsMatters%2Ffragmentor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InformaticsMatters%2Ffragmentor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InformaticsMatters%2Ffragmentor/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/InformaticsMatters","download_url":"https://codeload.github.com/InformaticsMatters/fragmentor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248636402,"owners_count":21137442,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-26T02:15:59.500Z","updated_at":"2025-04-12T21:33:52.183Z","avatar_url":"https://github.com/InformaticsMatters.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Executing on an AWS ParallelCluster\n\nWe can execute on an AWS ParallelCluster environment. A suitable cluster can be\nformed by following the instructions in our [nextflow-pcluster] repository.\n\nWith a cluster formed you should clone this repository to the **MasterServer**\ninstance where you can run our playbooks to create a postgres database server\nand then execute the fragmentor plays. From here we assume you're\non the cluster's master instance: -\n\n\u003e   You should have your cluster private key in ~/.ssh as described in the\n    [nextflow-pcluster] documentation\n\n    $ git clone https://github.com/InformaticsMatters/fragmentor\n    $ cd fragmentor/ansible\n    $ sudo pip install --upgrade pip\n    $ sudo pip install -r ../requirements.txt\n    $ ansible-galaxy install -r ../requirements.yaml\n\nAdd the cluster private key to the SSH agent: -\n\n    $ eval `ssh-agent`\n    $ ssh-add ~/.ssh/nf-pcluster\n    \nSet your AWS credentials. 
You'll need these if you're creating the cluster's\npostgres database: -\n\n    $ export AWS_DEFAULT_REGION=eu-central-1\n    $ export AWS_ACCESS_KEY_ID=?????\n    $ export AWS_SECRET_ACCESS_KEY=?????\n\nLife's a lot easier using parameter files with ansible, so create a file\nthat provides variables that satisfy your cluster in order to create\nand configure the cluster database. Something like this: -\n\n```yaml\n---\ndb_server_state: present\naws_db_instance_type: t3a.2xlarge\ndb_volume_size_g: 10\ndatabase_cloud_provider: aws\ndb_shared_buffers_g: 4\ndb_max_parallel_workers: 8\naws_vpc_subnet_id: \u003cCLUSTER_PUBLIC_SUBNET_ID\u003e\naws_vpc_id: \u003cCLUSTER_VPC_ID\u003e\n\ndeployment: production\nrunpath: /data/share-2/frag\nadd_backup: no\n```\n\n\u003e   Setting `add_backup` to `no` prevents the automatic backup\n    from taking place, which typically occurs after the inchi play.\n\nNow create the server: -\n\n    $ ansible-playbook site-db-server.yaml -e @parameters \n\nAdjust your parameters so that they include the address of the database server.\nThe server's IP address is printed by the above play: -\n\n    TASK [db-server : Display DB server address (Private IP)] *****************\n    Thursday 22 October 2020  18:54:00 +0000 (0:00:00.048)       0:00:24.557 ** \n    ok: [localhost] =\u003e {\n        \"server_result.instances[0].private_ip\": \"10.0.0.192\"\n    }\n\nIn this case you'd add the following to the parameter file: -\n\n```yaml\ndatabase_login_host: 10.0.0.192\n```\n\nUsing the AWS console, wait for the database server instance to become ready\n(initialise) before trying to configure it.\n\nYou will need to install the `ec2.py` dynamic inventory\nscript, provided by ansible. 
The following installs the script locally\nand then runs an ansible `ping` to ensure the database server can be found: -\n\n    $ wget https://raw.githubusercontent.com/ansible/ansible/stable-2.9/contrib/inventory/ec2.py\n    $ chmod +x ec2.py\n    $ export EC2_INSTANCE_FILTERS='tag:Name=FragmentorProductionDatabase'\n\n    $ ansible -i ec2.py tag_Name_FragmentorProductionDatabase -m ping\n\nNow you can configure the server and start and prepare the\nfragmentation database. The first two plays rely on dynamic inventory\nprovided by the `ec2.py` script: -\n\n    $ ansible-playbook -i ec2.py site-db-server-configure.yaml -e @parameters\n    $ ansible-playbook site-db-server-configure_create-database.yaml -e @parameters\n\n\u003e   The Slurm compute instances may be incorrectly configured with regard to\n    available memory. You need to reset the slurm manager with the correct\n    memory.\n\nThe compute instances may be incorrectly configured with regard to memory.\nRun the `sinfo` command to see the `MEMORY` value. If it's `1` you may need\nto fix them.\n\n    $ sinfo --exact --long -N\n    [...]\n    NODELIST              NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON               \n    compute-dy-m4large-1      1  compute*        idle 2       2:1:1      1        0      1 dynamic, none                 \n    compute-dy-m4large-2      1  compute*        idle 2       2:1:1      1        0      1 dynamic, none    \n\nFix the compute instance memory by running the _fix_ script.\n\nIf your compute instances have 16GiB RAM run: -\n\n    $ ../fix-pcluster-slurm-compute-memory.sh 16\n\nFrom here you should be able to run fragmentation plays, i.e. 
stuff like this\nfor a typical MolPort fragmentation, extract and combination run: -\n\n    $ ansible-playbook site-standardise.yaml -e @parameters \\\n        -e vendor=molport \\\n        -e version=2020-10 \n\n    $ ansible-playbook site-fragment.yaml -e @parameters \\\n        -e vendor=molport \\\n        -e version=2020-10\n\n    $ ansible-playbook site-inchi.yaml -e @parameters\n\n    $ ansible-playbook site-extract.yaml -e @parameters\n\n    $ ansible-playbook site-combine.yaml -e @parameters\n\n---\n\n[nextflow-pcluster]: https://github.com/InformaticsMatters/nextflow-pcluster\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finformaticsmatters%2Ffragmentor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finformaticsmatters%2Ffragmentor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finformaticsmatters%2Ffragmentor/lists"}