SURF – DTL Interest Group Compute Resources for Life Science Research Second meeting: April 22, 2015
Overview
• Summary of kick-off meeting
• Update on NFU Data4LifeSciences WP7
• This meeting: best practices HPC facilities NL
  - Usage
  - Maintenance
  - Support
  - Business model
• Compute needs are increasing
• How to accommodate peak capacity needs
Summary of kick-off meeting
• SIG goal set: share expertise in compute resources for life science research
• Discussions on scale-out models
  - Depending on the use cases, one or more scale-out models may be appropriate.
  - Relatively simple solutions that connect clusters at different locations would work with a fast network.
  - Complex infrastructures for high-throughput data analyses provide solutions for accounting, job submission and data provenance.
Sharing expertise & best practices: SURF-DTL Special Interest Group
• Kick-off: January 20th
• 43 members
• Year plan 2015 available
• Topics: scale-out models, best practices
NFU Data4LifeSciences WP7: facilities for high-throughput data processing
• Workgroup: UMCU, UMCG, VUmc, AMC, SURFsara, SURFnet
• Goals:
  - How to efficiently use compute resources in NL
  - Share software and package installations on compute facilities
• Share expertise and best practices in using compute resources (SIG)
• Work plan 2015 Q1-Q4:
  - Inventory of compute facilities NL
  - Criteria and requirements
  - Use cases
  - Pilots
  - Evaluation: business models, best practices
  - Advice / project plan for harmonization of compute resources NL
Scale-out models & AAI

model | status | involved
1. Federated cloud | Proof of concept, in pilot phase | UMCG, VUmc, SURFsara
2. Grid infrastructure | Production: Grid (LSG, EGI), evaluation | AMC, SURFsara, LUMC
3. Sharing clusters by federated IDs | Initiation, in pilot phase | UMCU, LUMC, SURFsara, SURFnet
4. Hybrid cluster-cloud: cluster in cloud | Initiation, in pilot phase | UMCU, LUMC, SURFsara

Workgroup: SURFsara (lead), UMCU, LUMC, AMC, VUmc, UMCG, SURFnet
Inventory of compute facilities – UMCs NL

UMC | facility | capacity | note
UMCG/RUG | Multiple decentralized clusters | 1,680 cores (total) | Facilities shared with RUG
LUMC | Shark cluster | 544 cores |
UMCU | Research HPC cluster | 720 cores | Expansion to 2,000 cores expected, also available for UU
VUmc | Multiple decentralized clusters, NCAgrid and cloud | 560 cores (total) |
others | .. | .. | ..
SURFsara | Multiple infrastructures: cluster, cloud, grid | 25,000 cores (excl. cloud) | Nationally available on request
Several service delivery models
• Local services
  - Provider: research ICT (integrated with or separate from central ICT, diagnostics) or central ICT
  - Mainly cluster computing
  - Support levels differ
  - Capacity differs per site, depending on research programs
• National services (when local services are not available or not part of core business)
  - Provider: SURFsara and partners (grid)
  - Different flavors of compute services available
  - High scalability (up to EU resources)
• Combination of local and national services
  - Different service delivery models possible
Service delivery model: example RCCS
• National Research Capacity Computing Service (RCCS)
• Platform as a Service
• Used for cohort genomics studies

What | Who
Maintenance | National (SURFsara)
User management: handle requests & access | Local PI (VU)
Account management | National (SURFsara)
Functional support | Local PI (VU)
Technical support | National (SURFsara)
Financed by | VU
Use of local facilities: trends
• Current utilization ranges from 50-85%, but averages 80-85%. 10-15% is available to scale up for overflow.
• Peak periods are often in the summer and over the Christmas holidays.
• Demand generally doubles each year, at least at UMCU and UMCG/RUG.
• Use cases
  - New use cases keep appearing. Researchers will increasingly seize the opportunity to use more computational resources; e.g., where 1,000 permutations used to be run, 10,000 permutations are now quickly run when resources are available.
  - In addition, more and more flavors appear based on workload, e.g. dedicated machines for molecular dynamics.
• LUMC and UMCU use a fairshare model based on budget.
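The budget-based fairshare idea above can be sketched as follows. This is a deliberately simplified illustration (real schedulers such as SLURM use more elaborate decay-weighted formulas); the function name, group labels and numbers are all hypothetical:

```python
# Hypothetical sketch of budget-based fairshare priority:
# a group's priority drops as its actual usage exceeds the
# share of the cluster its budget entitles it to.
def fairshare_priority(budget_share, used_core_hours, total_core_hours):
    """Higher value = schedule sooner. budget_share is a fraction (0..1)."""
    if total_core_hours == 0:
        return budget_share
    return budget_share - used_core_hours / total_core_hours

# Illustrative groups: (budget share, core-hours used this period)
groups = {"A": (0.5, 600), "B": (0.3, 200), "C": (0.2, 200)}
total = sum(used for _, used in groups.values())
priorities = {g: fairshare_priority(share, used, total)
              for g, (share, used) in groups.items()}
# Group A consumed 60% of cycles against a 50% budget share, so it
# ranks last; group B is under-consuming its share and ranks first.
```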
Example: Project ALS MinE
• Amyotrophic lateral sclerosis: 200,000 patients worldwide
• Project: 22,500 genomes to analyze
• Too big for local resources (UMCU HPC facility)
• Use the Life Science Grid
• 4,000 samples: 1,080,000 core hours (123.2 core years), 320 TB • 5,500 samples: 1,485,000 core hours (2015), 880 TB • 11,400 samples: 3,078,000 core hours or 351 core years (2016), 1.8 PB
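The core-hour figures above convert to core-years as follows; a quick sanity check, assuming an average year of 365.25 days (8,766 hours):

```python
HOURS_PER_YEAR = 24 * 365.25  # 8,766 hours in an average year

def core_years(core_hours):
    """Convert a core-hour total to core-years."""
    return core_hours / HOURS_PER_YEAR

print(round(core_years(1_080_000), 1))  # 4,000 samples  -> 123.2 core-years
print(round(core_years(3_078_000), 1))  # 11,400 samples -> 351.1 core-years
```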
Compute resource needs • Manage needs & capacity - Close contact with research groups / projects in pipeline - Gather needs: for how long, how much - Determine capacity and infrastructure needs • Balance peak capacity by using central or shared resources
SURFsara compute services and applications in life science research

Service | Description | Applications | Characteristics
HPC Cartesius | High-performance / capability computing | Simulations and prediction models, e.g. cell simulations | National supercomputer with accelerators for specific applications
Research Capacity Computing Service | Batch processing, capacity computing | Parallel applications, e.g. genetics, RNAseq and medical imaging analyses | Directly scalable from local cluster environment; serviced environment at application level
HPC Cloud | Virtualized / cloud compute infrastructure | High-throughput workflows, e.g. Galaxy, interactive visualizations, large-memory applications | Dynamic and flexible, for any workflow and OS; self-service IaaS
Grid, Life Science Grid | Large-scale distributed cluster computing | Large-scale cohort genetics, RNAseq analyses, imputations, molecular modeling | The sky is the limit; requires job submission protocols, middleware
Big data services, e.g. Hadoop, NoSQL | Data-centric computing taking advantage of data locality | Mining of large amounts of data, e.g. annotations, 'big data' analytics | Hadoop data processing requires MapReduce-enabled algorithms

(Services range from data/I-O intensive to CPU-cycle intensive.)
Considerations …
• Managing the needs
  - Use cases
  - Different infrastructures required
  - Complexity of multi-center studies
• Finances
  - Cores versus costs
  - Hardware + FTEs (maintenance + support)
  - [Chart: total costs as a function of number of cores]
• Business model / delivery model: SLAs etc.
• Expertise: maintenance & support
NFU Data4LifeSciences WP7: future work
• Work out optimal use of compute resources NL
  - Architecture: scale-out model
  - Usage
  - Maintenance
  - Support level
  - Business model
• Benchmarking of use cases
  - Walltime
  - Memory usage
  - CPU usage
  - IOPS: input and output operations per second
• Data & software parallelisation
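A minimal sketch of how the walltime, CPU-time and peak-memory metrics listed above could be captured for one pipeline step, using only the Python standard library (the helper name and the toy workload are illustrative; IOPS would typically be measured separately, e.g. with iostat):

```python
import resource
import time

def benchmark(func, *args):
    """Run func and report walltime, CPU time and peak memory of this process."""
    start = time.perf_counter()
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = func(*args)
    after = resource.getrusage(resource.RUSAGE_SELF)
    return result, {
        "walltime_s": time.perf_counter() - start,
        "cpu_s": (after.ru_utime + after.ru_stime)
                 - (before.ru_utime + before.ru_stime),
        # ru_maxrss is the process-lifetime peak; KiB on Linux, bytes on macOS.
        "peak_rss": after.ru_maxrss,
    }

# Toy workload standing in for a real analysis step:
_, stats = benchmark(lambda: sum(i * i for i in range(10**6)))
```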
This meeting: discussion
• Reflect on findings so far
• How to efficiently use local and national services
  - Usage
  - Maintenance
  - Support
  - Business model & governance
• Best practices, e.g. documentation by Ansible playbook
• Focus on particular topics for next meeting?
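As an illustration of the Ansible-playbook approach to documenting cluster setup as executable best practice, a minimal hypothetical playbook is sketched below. The module names (`ansible.builtin.package`, `ansible.builtin.mount`) are real Ansible modules; the host group, package names and NFS export are assumptions:

```yaml
# Hypothetical playbook: a cluster node's software setup documented as code.
- name: Configure compute node for genomics pipelines
  hosts: compute_nodes          # assumed inventory group
  become: true
  tasks:
    - name: Install common bioinformatics tools
      ansible.builtin.package:
        name:
          - samtools            # assumed package names
          - bwa
        state: present

    - name: Mount shared scratch space
      ansible.builtin.mount:
        path: /scratch
        src: storage.example.org:/scratch   # assumed NFS export
        fstype: nfs
        state: mounted
```

Because the playbook is idempotent, it doubles as both documentation and a reproducible installation procedure for new nodes.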
Agenda
• 13:15 Welcome & introduction
• 13:30 HPC Genome Coordination Center Groningen/RUG - Pieter Neerincx
• 13:55 Compute resources WeNMR - Alexandre Bonvin
• 14:20 Cluster computing for brain imaging - Keith Cover
• 14:45 Tea & coffee break
• 15:05 Computing for the Life Sciences - Patrick Kemmeren
• 15:30 HPC WUR - Hendrik-Jan Megens
• 15:55 HPC LUMC - Martijn Vermaat
• 16:20 Discussion
• 16:45 Drinks