Core Concepts
HILARy is a fast and accurate method for partitioning B-cell receptor repertoire sequencing data into clonal families, which are key to understanding the evolution and dynamics of the adaptive immune response.
Abstract
The article presents a new method, HILARy, for efficiently and accurately partitioning B-cell receptor (BCR) repertoire sequencing data into clonal families. Clonal families are lineages of related B cells stemming from the same V(D)J recombination event and are crucial for understanding the function, evolution, and dynamics of the adaptive immune response.
The key insights and highlights are:
The authors identify the main factors that influence the difficulty of clonal family inference: low clonality levels (prevalence) and short recombination junctions (CDR3 lengths).
They develop a CDR3-based clustering method that uses a probabilistic model of recombination and selection to set an adaptive threshold, achieving high precision across different prevalence and CDR3 length regimes.
To further improve performance, especially for short CDR3s and low prevalence, the authors incorporate the phylogenetic signal of shared mutations outside the CDR3 region.
Benchmarking on synthetic data shows the mutations-based method outperforms state-of-the-art approaches, achieving consistently high sensitivity and precision.
Applying the method to a healthy donor's repertoire, the authors find universal statistics of clonal family sizes, site frequency spectra, and selection pressures across different CDR3 length regimes, suggesting the dynamics of affinity maturation and memory formation are independent of CDR3 specificity.
The proposed framework provides a robust tool for reliable partitioning of BCR repertoire data, enabling deeper insights into the adaptive immune response.
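The CDR3-based step described above can be illustrated with a minimal sketch. It is not the HILARy implementation: it assumes sequences are already annotated with V gene, J gene, and CDR3 nucleotide sequence, groups them into classes sharing all three (V, J, CDR3 length), and links pairs by single linkage when their normalized Hamming distance falls below a threshold. The function names and the fixed `threshold` argument are hypothetical; in the method summarized here the threshold is set adaptively from a probabilistic model, which this sketch does not reproduce.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_by_cdr3(seqs, threshold):
    """Single-linkage clustering within each (V, J, CDR3 length) class.

    seqs: list of (v_gene, j_gene, cdr3) tuples.
    threshold: maximum normalized Hamming distance x = n / l for linking
        a pair (a fixed placeholder here, an assumption of this sketch).
    Returns one cluster label per input sequence.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only sequences in the same class are ever compared.
    classes = defaultdict(list)
    for idx, (v, j, cdr3) in enumerate(seqs):
        classes[(v, j, len(cdr3))].append(idx)

    for members in classes.values():
        for a, b in combinations(members, 2):
            length = len(seqs[a][2])
            if hamming(seqs[a][2], seqs[b][2]) / length <= threshold:
                parent[find(a)] = find(b)

    return [find(i) for i in range(len(seqs))]
```

Restricting comparisons to one class at a time keeps the quadratic pairwise step confined to sequences that could plausibly share a recombination event, although the largest classes still dominate the cost.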
Stats
The total number of pairwise comparisons in the largest VJl classes (sequences sharing the same V gene, J gene, and CDR3 length l) is ~10^10.
The distribution of CDR3 lengths l shows that in-frame (productive) sequences dominate the repertoire.
The prevalence (fraction of positive pairs) varies widely across VJl classes, spanning 3 orders of magnitude.
The null distribution of distances between unrelated sequences, P0(x|l), becomes more peaked around 1/2 as CDR3 length l increases.
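The arithmetic behind these figures can be sketched as follows. The helper names and the class size of 150,000 are hypothetical and chosen only to show how a single class can reach roughly 10^10 pairs. `prevalence` computes the fraction of positive pairs from a list of family sizes. `null_std` rests on a strong simplifying assumption that is not the article's model: sites mismatch independently with probability p = 1/2, so the normalized distance x has standard deviation sqrt(p(1-p)/l) and narrows as l grows.

```python
from math import comb, sqrt


def n_pairs(n: int) -> int:
    """Number of unordered pairs among n sequences: n(n-1)/2."""
    return comb(n, 2)


def prevalence(family_sizes):
    """Fraction of sequence pairs whose members belong to the same family."""
    total = n_pairs(sum(family_sizes))
    positive = sum(n_pairs(s) for s in family_sizes)
    return positive / total if total else 0.0


def null_std(l: int, p: float = 0.5) -> float:
    """Std. dev. of x = n / l under independent per-site mismatches.

    Assumption of this sketch only: each of the l sites mismatches
    independently with probability p, so n is binomial(l, p).
    """
    return sqrt(p * (1 - p) / l)


# A hypothetical class of 150,000 sequences already needs ~1.1e10 comparisons.
print(n_pairs(150_000))
# Three families of sizes 3, 2 and 1: 4 positive pairs out of 15.
print(prevalence([3, 2, 1]))
# Under the independence assumption, quadrupling l halves the spread.
print(null_std(12), null_std(48))
```

Because the number of pairs grows quadratically with class size while positive pairs come only from within families, a class made mostly of small families has low prevalence, which is the regime the summary identifies as hardest.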
Quotes
"Clonal families are the main building blocks of the repertoire. Since members of the same family usually share their specificities, affinity maturation first competes families against each other for antigen binding in the early stages of the reaction, and then selects out the best binders within families in the later stages."
"The extraordinary diversity of VDJ rearrangements can be efficiently described and quantified using probabilistic models of the recombination process as well as subsequent purifying selection."
"Identifying clonal families with high accuracy is paramount in such approaches as it avoids the potential biases of different family sizes and varying levels of clonality."