https://github.com/datajoint/sciops-workshop
DataJoint SciOps Workshop
- Host: GitHub
- URL: https://github.com/datajoint/sciops-workshop
- Owner: datajoint
- Created: 2022-07-21T13:29:58.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-09-06T13:09:28.000Z (almost 3 years ago)
- Size: 24.4 KB
- Stars: 0
- Watchers: 5
- Forks: 3
- Open Issues: 0
# Neuro Workflows | Houston | Sep 6-7
## Brainstorming workshop

The DataJoint team and the Johns Hopkins University Applied Physics Lab will hold an open brainstorm workshop to define the roadmap for automated research workflows in neuroscience.
We will review the current landscape critically and define paths for implementing effective research workflows in neuroscience.

### Organizers and participants
* JHU/APL: Daniel Xenes, Erik Johnson
* Columbia University: Taiga Abe, John Cunningham
* Berlin Institute of Health, Charité University [EBRAINS](https://ebrains.eu/) (remote): Michael Schirner, Petra Ritter
* University of Utah, [Visus](https://visus.org): Amy Gooch, Valerio Pascucci
* [Catalyst Neuro](https://www.catalystneuro.com/): Ben Dichter
* [Ontologic](http://ontologic.ly): Dallas McCall
* Allen Institute - Neural Dynamics (remote): David Feng
* DataJoint: Thinh Nguyen, Raphael Guzman, Monty Kosma, Kabi Gunalan, Joseph Burling, Chris Brozdowski, Tolga Dincer, Sidharth Hulyalkar, Jaerong Ahn, Kushal Bakshi, Anu Pansare, Dimitri Yatsenko

### Where
The workshop will be held at [DataJoint in Houston, TX](https://www.datajoint.com/about).

Address: [4265 San Felipe St, Ste 1025 (10th Floor)](https://goo.gl/maps/SMHvhV1ARFsGWXWA8), Houston, TX 77027. Plenty of parking is available in the garage attached to the south side of the building.
### Focus
Aim:
* to vastly increase the speed, scale, and validity of neuroscience research
* to provide a clear competitive advantage to researchers
  - time to launch
  - high-performance computing
  - latest analysis tools
  - ability to collaborate
  - time to publish

Approach: Implement [Research Workflow Automation](https://nap.nationalacademies.org/read/26532)
- reduce barriers to participation / collaboration
- expand capabilities -> tackle new problems
- ease access to cyberinfrastructure
- transparency and reproducibility
- integration of diverse tools and resources
- proper incentives and credit assignment

What is needed to implement automated workflows in neuroscience:
- framework + tools = a language for formal workflows = **DataJoint Core**
- reference implementations + integrations = **DataJoint Elements**
- community + support + resources + services = **DataJoint Works**

Breakfast and lunch will be provided.
The workshop will identify general directions for addressing the challenges in team science and data-driven research workflows.
Day 1 and Day 2 will map the current trends and challenges in the field.
These sessions will be open to interested researchers -- please notify Dimitri Yatsenko by August 31, 2022 if you would like to attend.
Day 3 will focus on the roadmap for the emerging DataJoint platform and will be limited to the DataJoint and JHU/APL teams.

## Day 1 (2022-09-06)
7.30 am - Breakfast

8 am - Automated Workflows for Neuroscience - Dimitri

9 am - Current Challenges - Erik, Dimitri, Thinh
- Major friction points in data-driven research
- The structure of typical vs highly successful projects
- Advanced neurotechnologies and multi-modal experiments
- Data acquisition and aggregation
- Team organization, collaboration process
- IT Infrastructure - Use of cloud
- Collaborative analysis
- Credit assignment
- Human in the loop -- curation
- AI in the loop
- Reproducibility, integrity, continuity
- Budgeting and cost controls

10 am - Virtual Research Environments - Michael Schirner
- [The Virtual Brain](https://ebrains.eu/service/the-virtual-brain/) – EBRAINS Cloud workflows
- Virtual Research Environment (VRE) for research on sensitive health data at Charité Berlin

11 am - BRAIN Initiative Neuroinformatics Developments - Erik
- A critical review of the current and emerging neuroinformatics initiatives
- Global efforts - INCF

12 noon - lunch

1 pm - Visus. Data streaming, HPC infrastructure - Valerio

2 pm - Collaborative analysis tools, platforms, and environments. DANDI, PanNeuro - Ben, Erik
- [PanNeuro](https://arokem.github.io/2019-BRAINI-PanNeuro-slides/#/) - Ben

3 pm - ML integration. Human-in-the-loop - Dimitri, Erik
## Day 2 (2022-09-07)
7.30 am - breakfast

8 am - An overview of formal workflow management tools - Raphael, Erik
- Prefect, Apache Airflow, Flyte, Argo -> [Common Workflow Language](https://www.commonwl.org/)

Most widely used bioinformatics workflow systems (from [Reiter-2020](https://academic.oup.com/gigascience/article/10/1/giaa140/6092773?fbclid=IwAR1I92LXvDbpesunIQOENtLRa4vm3zH4pvC8HJQ269luTaQ_WBwWIuMeFh8#312918873)):

| Workflow system | Documentation | Example workflow | Tutorial |
| :--- | :--- | :--- | :--- |
| Snakemake | [Docs](https://snakemake.readthedocs.io) | [ChipSeq](https://github.com/snakemake-workflows/chipseq) | [Tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html) |
| Nextflow | [Docs](https://www.nextflow.io/) | [Sarek](https://github.com/nf-core/sarek) | [Get Started](https://www.nextflow.io/docs/latest/getstarted.html) |
| CWL | [commonwl.org](https://www.commonwl.org/) | [EBI-Metagenomics](https://github.com/EBI-Metagenomics/pipeline-v5) | [User Guide Example](https://www.commonwl.org/user_guide/02-1st-example/index.html) |
| WDL | [openwdl.org](https://openwdl.org/) | [gatk4](https://github.com/gatk-workflows/gatk4-data-processing) | [HOWTO](https://support.terra.bio/hc/en-us/articles/360037127992–1-howto-Write-your-first-WDL-script-running-GATK-HaplotypeCaller) |

9 am - Infrastructure access, orchestration - Erik, Raphael, Joseph, Taiga
- [NeuroCAAS](https://neurocaas.org) - Taiga Abe
- Job orchestration - Raphael

10 am - Data Management, the role of databases - Dimitri, David Feng
- workflows + data management
- data models: structured vs self-describing data
- data consistency, integrity
- query speed, indexing
- Data lakes, workflows/structure: Benchling, Code Ocean

11 am - Accessibility, provenance, versioning - Erik, Ben, Dimitri, David Feng
- association between code, data
- Joint management of code, environment, and data

12 noon - lunch

1 pm - Neuroinformatics resources - Erik, Dimitri, Ben
- Allen and Janelia atlases
- Catalogs, nomenclatures, ontologies
- Data standards, archives, public data repositories and databases.
- Analysis tools
- Collaboration platforms

2 pm - The DataJoint experience - Thinh, Dimitri
- DataJoint Core - differentiators, performance, gap analysis
- DataJoint Elements - User experience
- DataJoint Works - user experience
- Interfaces: SciViz, LabBook, Codebook
- Platform for tool developers - dissemination, tracking, credit assignment

3 pm - Teamflows - Erik, Thinh, Dimitri
- Roles, team structure
- Software Engineering in Research
- incentives, credit assignment
- informal collaborations
- integration of community development
- funding, support

4 pm - Online research platforms - Erik
- Examples: Galaxy, [HubMap](https://portal.hubmapconsortium.org/)

## Day 3 (2022-09-08)
7.30 am - breakfast

8 am - DataJoint Platform Workshop -- internal. DataJoint and JHU/APL only.
* Aims for DataJoint SciOps
  - Key differentiators
  - SciOps user experience
* Framework development
* Integration of DataJoint Elements
* Integrations - acquisition software, atlases, archives
* Aim 1
* Aim 2
* Aim 3
* Milestones, Statement of work, Roles and responsibilities

1 pm - lunch, adjourn