Core Concepts
The GA4GH Task Execution Service (TES) API provides a standardized way to submit and manage computational tasks across a variety of on-premises and cloud-based compute environments, enabling researchers to easily deploy their workflows in a multi-cloud setting.
Abstract
The GA4GH Task Execution Service (TES) API is a standardized schema and API for describing and running batch computing tasks. It was designed to address the challenges of running computational workflows in hybrid and multi-cloud environments, where the execution environment may lack the guarantees of traditional on-premises high-performance computing (HPC) systems.
The core of the TES API is the "Task" resource, which defines all the parameters needed for a computational job: the application environment, the required computational resources, input and output files, environment variables, and the command lines to be executed. This lets the TES API abstract away the details of the underlying compute infrastructure, so researchers can deploy the same workflows across different cloud and on-premises systems.
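To make the shape of a Task concrete, here is a minimal sketch that assembles one as a plain Python dictionary and prints it as JSON. The field names (`inputs`, `outputs`, `resources`, `executors`, `cpu_cores`, `ram_gb`, `disk_gb`) follow the TES schema as commonly documented; the helper `build_task`, the bucket URLs, and the container image are hypothetical values chosen for illustration.

```python
import json


def build_task(name, image, command, inputs, outputs, stdout=None, env=None,
               cpu_cores=1, ram_gb=1.0, disk_gb=1.0):
    """Assemble a TES-style Task document (hypothetical helper, not part of TES)."""
    executor = {"image": image, "command": command, "env": env or {}}
    if stdout is not None:
        # Path inside the container where the command's stdout is written.
        executor["stdout"] = stdout
    return {
        "name": name,
        "inputs": inputs,        # files staged into the container before the run
        "outputs": outputs,      # files copied out after the run
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb, "disk_gb": disk_gb},
        "executors": [executor],
    }


# All URLs and paths below are placeholders.
task = build_task(
    name="md5sum-example",
    image="ubuntu:22.04",
    command=["md5sum", "/data/input.txt"],
    inputs=[{"url": "s3://example-bucket/input.txt", "path": "/data/input.txt"}],
    outputs=[{"url": "s3://example-bucket/md5.txt", "path": "/data/md5.txt"}],
    stdout="/data/md5.txt",
    env={"LC_ALL": "C"},
    cpu_cores=2,
    ram_gb=4.0,
    disk_gb=10.0,
)

print(json.dumps(task, indent=2))
```

Nothing in this document names a scheduler, cluster, or cloud provider, which is what allows the same Task to be handed to different back ends.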
Several projects and providers, including Microsoft, Funnel, TESK, and Pulsar, offer TES-compatible servers that execute tasks on compute environments such as HPC clusters, Kubernetes, and cloud platforms like Azure and AWS. In addition, workflow engines such as Cromwell, Nextflow, Snakemake, and CWL-TES have integrated support for the TES API, so researchers can take advantage of the flexibility and portability it provides.
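Because every TES-compatible server exposes the same interface, a client in principle needs only a base URL to target a different back end. The sketch below prepares, but does not send, a task-creation request using the Python standard library. The base URL is a placeholder, the `/tasks` path and the set of terminal states reflect the TES specification as best understood here, and `build_submit_request` and `is_finished` are hypothetical helpers.

```python
import json
import urllib.request

# States after which a task will not change again (assumed from the TES spec).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def build_submit_request(base_url, task):
    """Prepare a POST request that would create a task on a TES server."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_finished(state):
    """Return True once a polled task state is terminal."""
    return state in TERMINAL_STATES


minimal_task = {
    "executors": [{"image": "alpine:3.19", "command": ["echo", "hello"]}],
}

# Placeholder endpoint; a real deployment would publish its own base URL.
request = build_submit_request("https://tes.example.org/ga4gh/tes/v1", minimal_task)
print(request.get_method(), request.full_url)

# Sending is left out on purpose; urllib.request.urlopen(request) would submit it.
```

Pointing the same client at another server is then a matter of changing the base URL, plus whatever authentication that deployment requires.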
The TES API is designed to be extensible and flexible, with plans to further improve support for authentication, security, and software portability across different containerization and software management systems. The goal is to enable seamless multi-cloud execution of computational workflows in the life sciences, reducing the burden on researchers to manage the underlying infrastructure.
Stats
The average whole-genome sequencing file is larger than 200 GB, making it impractical to download all data to a single storage site.
The TES API supports the definition of computational resource requirements, including CPUs, GPUs, memory, and storage, to optimize task execution.
The TES API allows the definition of multiple command lines per task, enabling the execution of setup and teardown steps in addition to the main computational work.
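The resource and multi-command points above can be sketched together. In the TES schema a task carries a list of executors that run one after another, so setup, main work, and teardown can each be a separate entry. The helper `staged_task`, the image name, and the commands are hypothetical; `volumes` is used here on the assumption that it declares a directory shared between executors. GPU requests are left out because this sketch does not assume a field name for them.

```python
def staged_task(image, setup, main, teardown, cpu_cores, ram_gb, disk_gb,
                shared_dir="/work"):
    """Build a task with ordered setup, main, and teardown executors."""
    steps = [setup, main, teardown]
    return {
        "resources": {
            "cpu_cores": cpu_cores,  # number of CPU cores requested
            "ram_gb": ram_gb,        # memory in gigabytes
            "disk_gb": disk_gb,      # scratch storage in gigabytes
        },
        # Directory assumed to persist across the executors of one task.
        "volumes": [shared_dir],
        # List order is execution order; empty steps are skipped.
        "executors": [{"image": image, "command": cmd} for cmd in steps if cmd],
    }


task = staged_task(
    image="example/aligner:1.0",  # placeholder image name
    setup=["mkdir", "-p", "/work/tmp"],
    main=["sh", "-c", "run-alignment --tmp /work/tmp"],  # placeholder command
    teardown=["rm", "-rf", "/work/tmp"],
    cpu_cores=8,
    ram_gb=32.0,
    disk_gb=500.0,
)

for index, executor in enumerate(task["executors"]):
    print(index, " ".join(executor["command"]))
```

Declaring resources up front gives the server the information it needs to place the task on a suitably sized machine, while keeping cleanup in its own executor separates it from the main command.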
Quotes
"The flexibility of the TES API, and its ability to be deployed in a number of different infrastructures is a valuable tool for the genomics community and other audiences that benefit from a cross-platform, cross-cloud batch execution solution."
"TES can help to simplify and streamline the execution of computational workflows. It can also help to reduce the cost of running these workflows by making it possible to use a variety of compute resources, including cloud computing platforms."