# Dynamics of Immune Repertoires: Exploration and Translation

The workshop poster session takes place on Tuesday, 12th July 2022 from 19:00 to 21:30 CEST.
For each on-site poster contribution there will be one poster wall (width: 97 cm, height: 250 cm) available. The preferred size of a poster is A0, portrait. Please do not feel obliged to fill the whole space. Posters can be put up for the full duration of the workshop, 11th to 15th July 2022.
Virtual participants presenting posters will be contacted individually and connected to the poster session.

List of poster presentations

### Statistical analysis of synthetic AIRR-datasets to guide the development and benchmarking of AIRR-based machine learning

Chernigovskaia, Maria

The adaptive immune receptor repertoire (AIRR) is a snapshot of past and ongoing immune events, such as infections, vaccinations, allergies and (autoimmune) diseases. This information is remembered as a mixture of unknown event-specific immune signals that are distributed across immune receptors. It is assumed that each immune signal is a set of immune-event specific motifs and each immune event may be associated with multiple motifs. Linking these immune signals to immune status and classification of immune repertoires based on immune signals is one of the main goals of immunodiagnostics. Machine learning (ML) provides various approaches to detect complex signals in high-dimensional data. Earlier, ML and statistical approaches have been successfully applied to classify CMV, cancer and multiple sclerosis immune repertoires. However, there is still no systematic machine learning approach for the specific problem of immune repertoire classification. Here we present our plans for comprehensive benchmarking of machine learning methods applied to AIRR data. We plan to investigate a range of machine learning and deep learning algorithms to learn, classify and generate immune-event specific motifs on simulated AIRR datasets. We will focus on the understanding of how machine learning approaches learn and reconstruct an immune signal from AIRR-seq data. This research will make the first step toward establishing clear guidelines on how to apply machine learning to AIRR-seq data.
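As a rough illustration of what a simulated AIRR dataset with an implanted immune signal could look like, the sketch below generates random receptor sequences and overwrites part of a fraction of them with a motif. It is not the authors' simulation procedure: the function names, the uniform amino-acid model, the fixed sequence length, the motif `WGQY` and the 5% signal rate are all assumptions chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_receptor(rng, length=15):
    """Draw a receptor sequence uniformly at random (a deliberately naive model)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def implant_motif(sequence, motif, rng):
    """Overwrite a randomly placed window of `sequence` with `motif`."""
    start = rng.randrange(len(sequence) - len(motif) + 1)
    return sequence[:start] + motif + sequence[start + len(motif):]


def simulate_repertoire(rng, n_sequences, motif=None, signal_rate=0.0):
    """Simulate one repertoire; a fraction `signal_rate` of sequences carries the motif."""
    repertoire = []
    for _ in range(n_sequences):
        seq = random_receptor(rng)
        if motif is not None and rng.random() < signal_rate:
            seq = implant_motif(seq, motif, rng)
        repertoire.append(seq)
    return repertoire


# Hypothetical example: one "immune-event positive" and one "negative" repertoire.
rng = random.Random(0)
positive = simulate_repertoire(rng, 1000, motif="WGQY", signal_rate=0.05)
negative = simulate_repertoire(rng, 1000)
print(sum("WGQY" in s for s in positive), sum("WGQY" in s for s in negative))
```

Because the ground-truth motif and its frequency are known, a classifier trained on such repertoires can be checked against what it was supposed to learn.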

### Corecount analysis using IgDiscover – from inference to genotype analysis

Corcoran, Martin

The germline inference tool IgDiscover has been in use for several years in studies involving multiple species where individualized IG germline identification is required. The software has been updated to identify TCR germline alleles and, critically, now includes a novel genotyping functionality. This new genotype analysis tool, called corecount, provides a number of advantages compared to prior inference outputs. It is highly efficient at identifying alleles from low-count genes; it facilitates accurate D and J gene genotyping; and finally, it enables clear distinction between alleles with 3’ end variation that are otherwise refractory to clear identification during germline inference from expressed libraries. The tool enables accurate identification of the full set of expressed IG and TCR germline alleles in Rep-Seq libraries. The output is highly reproducible and will facilitate the comparison of multiple individuals in disease cohorts to identify disease-associated allelic or structural variants.

### Analysis of T-cell repertoire diversity in the three-spined sticklebacks as a natural eco-evolutionary model system for adaptive immunity

Efstratiou, Artemis

The vertebrate adaptive immune system is based on lymphocyte recognition of pathogen-derived antigens presented by Major Histocompatibility Complex (MHC) molecules. As the engagement of a peptide-MHC complex with a suitable T-cell receptor (TCR) is the critical first step in the initiation of adaptive immune responses, the availability of a diverse T-cell repertoire, constituted by a pool of broad TCR specificities, is crucial. However, surprisingly little is known regarding the degree of natural variation in the inter-individual diversity and dynamics of the T-cell repertoire, especially during infection. Our research thus examines the qualitative and quantitative variation and dynamics of TCRβ repertoires within and among individuals of the three-spined stickleback, an eco-evolutionary model species. Here we analyze T-cell repertoires of lab-bred sticklebacks that were experimentally exposed to the cestode parasite Schistocephalus solidus, known to trigger adaptive immunity. Using next-generation sequencing (NGS) and advanced bioinformatics tools, we investigate TCRβ repertoire size and diversity in relation to infection treatment, family background, and individual MHC diversity. Preliminary analyses indicate substantial variation of TCRβ repertoires among individuals and across infection status, as expected for a species with a natural level of genetic diversity. Interestingly, infected individuals appear to exhibit higher inter-repertoire overlap than control ones. The existence of public, expanded clonotypes shared by all infected individuals further hints at convergent antigen-specific T-cell responses. Yet, surprisingly, most of the top public clonotypes are shared across experimental groups. Lastly, we show significant biases and a family effect on the usage of V-J gene segments.
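The abstract does not say how inter-repertoire overlap or public clonotypes were quantified. One common choice, shown here purely as an assumption, is the Jaccard index on sets of clonotypes, with "public" taken to mean present in every individual of a group. The function names and the toy CDR3 strings are hypothetical.

```python
from itertools import combinations


def jaccard_overlap(rep_a, rep_b):
    """Jaccard index: shared clonotypes divided by all distinct clonotypes."""
    a, b = set(rep_a), set(rep_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(repertoires):
    """Average Jaccard index over all pairs of individuals in a group."""
    pairs = list(combinations(repertoires, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_overlap(a, b) for a, b in pairs) / len(pairs)


def public_clonotypes(repertoires):
    """Clonotypes found in every repertoire of the group."""
    sets = [set(r) for r in repertoires]
    return set.intersection(*sets) if sets else set()


# Toy data: three individuals, each a list of made-up CDR3 amino-acid strings.
infected = [
    ["CASSLGQ", "CASSPDR", "CASRTGE"],
    ["CASSLGQ", "CASSPDR", "CASSQET"],
    ["CASSLGQ", "CASSFGT"],
]
print(mean_pairwise_overlap(infected))
print(public_clonotypes(infected))
```

Comparing `mean_pairwise_overlap` between infected and control groups gives a single number per group. Real analyses usually also account for sequencing depth, which this sketch ignores.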

### Why are cell populations maintained via multiple intermediate compartments?

Feliciangeli, Flavia

In an adaptive immune response, naive antigen-specific T-cell populations expand dramatically. The number and phenotype of descendants of individual naive T cells are highly variable because the fates of individual cells are subject to chance. Maturation of T cells in the thymus is another process where a small pool of progenitor cells continuously replenishes large populations of "product" cells via structured developmental journeys through sequences of intermediate cell types, which we call compartments. If there is only one intermediate compartment, a large ratio of product cells to progenitors can only be achieved at the cost of the product cell population being dominated by large families of cells descended from individual progenitors, and of a large average number of divisions separating product cells from progenitors. This may increase the risk of cancerous mutations becoming established. These undesirable features can be avoided if there are multiple intermediate compartments. A sequence of compartments is, in fact, an efficient way to maintain a product cell population from a progenitor population, avoiding excessive clonality and minimising the number of rounds of division en route.

### Ontogeny of the B cell receptor repertoire and microbiome in mice

Gilboa, Amit

The immune system matures throughout childhood to achieve full functionality in protecting our bodies against threats. The immune system has a strong reciprocal symbiosis with the host bacterial population and the two systems co-develop, shaping each other. Despite their fundamental role in healthy physiology, the ontogeny of these systems is poorly characterized. Here, we studied the development of the B cell receptor (BCR) repertoire by analyzing high-throughput sequencing of B cell receptors at several time points in young C57BL/6J mice. In parallel, we explored the development of the gut microbiome. We discovered that gut IgA repertoires change from birth to adolescence, including an increase in complementarity-determining region 3 (CDR3) lengths and somatic hypermutation (SHM) levels. This contrasts with the spleen IgM repertoires, which remain stable and distinct from the IgA repertoires in the gut. We also discovered that large clones that germinate in the gut are initially confined to a specific gut compartment, then expand to nearby compartments and later also expand to the spleen and remain there. Finally, we explored the associations between diversity indices of the B cell repertoires and the microbiome, as well as associations between bacterial and B cell receptor clusters. Our results shed light on the ontogeny of the adaptive immune system and the microbiome, providing a baseline for future research.
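The abstract does not name the diversity indices used. As one plausible example, the Shannon index and its exponential (the effective number of clones, or Hill number of order 1) can be computed from clone assignments as sketched below. The function names and the toy clone labels are assumptions for illustration only.

```python
import math
from collections import Counter


def shannon_diversity(clone_labels):
    """Shannon index H = -sum(p_i * ln p_i) over clone frequencies p_i."""
    counts = Counter(clone_labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def effective_clone_number(clone_labels):
    """exp(H): the number of equally sized clones that would give the same H."""
    return math.exp(shannon_diversity(clone_labels))


# Toy data: one clone label per sequenced B cell receptor (labels are made up).
even_sample = ["c1", "c2", "c3", "c4"] * 25       # four equally sized clones
skewed_sample = ["c1"] * 97 + ["c2", "c3", "c4"]  # one dominant clone
print(effective_clone_number(even_sample))
print(effective_clone_number(skewed_sample))
```

The same functions apply unchanged to bacterial taxon labels, which is one way a repertoire index and a microbiome index could be put on a comparable scale before testing for association.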

### Echidna: integrated simulations of single-cell immune receptor repertoires and transcriptomes

Han, Jiami

Single-cell sequencing now enables the recovery of full-length immune repertoires [B cell receptor (BCR) and T cell receptor (TCR) repertoires], in addition to gene expression information. The feature-rich datasets produced from such experiments require extensive and diverse computational analyses, each of which can significantly influence the downstream immunological interpretations, such as clonal selection and expansion. Simulations produce validated standard datasets, where the underlying generative model can be precisely defined and furthermore perturbed to investigate specific questions of interest. Currently, there is no tool that can be used to simulate a comprehensive ground truth single-cell dataset that incorporates both immune receptor repertoires and gene expression. Therefore, we developed Echidna, an R package that simulates immune receptors and transcriptomes at single-cell resolution. Our simulation tool generates annotated single-cell sequencing data with user-tunable parameters controlling a wide range of features such as clonal expansion, germline gene usage, somatic hypermutation, and transcriptional phenotypes. Echidna can additionally simulate time-resolved B cell evolution, producing mutational networks with complex selection histories incorporating class-switching and B cell subtype information. Finally, we demonstrate the benchmarking potential of Echidna by simulating clonal lineages and comparing the known simulated networks with those inferred from only the BCR sequences as input. Together, Echidna provides a framework that can incorporate experimental data to simulate single-cell immune repertoires to aid software development and bioinformatic benchmarking of clonotyping, phylogenetics, transcriptomics and machine learning strategies.

### Next-generation germline reference sets for B-cell and T-cell repertoire sequencing

Lees, William

Adaptive immune repertoire sequencing (AIRR-seq) has become an established research tool for understanding the immune response to infection, disease and other stimuli. Clinical applications are also emerging. Germline inference techniques, applied to these repertoires, have revealed a much greater degree of genetic diversity than previously expected - a diversity that is not reflected in currently available germline reference sets. Reference set problems limit the accuracy of AIRR-seq analysis and impede the ability to determine genetic modulators of the immune response, particularly in work on non-European human populations, or laboratory animals. Next-generation techniques, such as the inference of germline alleles from AIRR-seq and high-throughput long-read genomic sequencing, are starting to provide considerable volumes of additional data, but often do not supply the comprehensive information required for traditional germline gene curation. Nevertheless, information from these studies, for example the existence of genes and alleles that cannot be fully mapped into a genomic reference sequence, can be valuable for AIRR-seq analysis. In the past, individual research groups have published sets developed from their own work, but these can be problematic in use, as naming conventions vary between groups, and the degree of overlap can be hard to establish. Here we describe an approach, developed within the AIRR Community, which can allow researchers interested in the discovery of germline genes in a particular species to curate and maintain reference sets quickly and co-operatively, in advance of more formal systematization and ratification. Such an approach has the potential to improve the quality of resources available for AIRR-seq analysis. It has already been adopted in the publication of germline sets for common laboratory mouse strains.

### Subfunctionalization and constrained size of the immunoglobulin loci in Ambystoma mexicanum

Martinez-Barnetche, Jesus

The Mexican axolotl, Ambystoma mexicanum, is a well-known urodele amphibian and model species of tissue regeneration. It is a paedomorphic organism and little is known about how developmental particularities influence its adaptive immune response. Additionally, like many salamanders, A. mexicanum has a gigantic genome (32 Gb), which raises the question of whether there are genome size limits for immunoglobulin transcription and V(D)J recombination. The current version of the A. mexicanum genome is a chromosome-scale assembly resulting from hybrid NGS, optical mapping and Hi-C. We have characterized the immunoglobulin loci using conventional reference sequence mapping, as well as RNA-seq and AIRR-seq data. Overall, loci are arranged similarly to Xenopus tropicalis, but there is extensive pseudogenization of V genes, pseudogenization of IGHF (phi), complete absence of the kappa locus and a restricted sigma locus (only 5 V genes). Although locus size is somewhat proportional to the increase in genome size, we found evidence that V-intron length and IGHM intron size are evolutionarily constrained. Similarly, IGHV and IGLV intergenic distances in A. mexicanum, human, D. rerio and X. tropicalis were smaller than whole-genome and p450 family intergenic distances. We propose that genome growth is constrained at the Ig loci so that Ig transcription and splicing, as well as V(D)J recombination, can proceed adequately.

### How does T-FR limit autoreactivity despite being outnumbered in the GC?

Mitra, Tanmay

The presence of T-follicular regulatory cells (TFRs) in the germinal centre (GC) is critical to limit self-reactivity and the differentiation of GC B cells into self-reactive plasma cells. Although recent evidence suggests a direct physical interaction between GC B cells and TFRs, the exact mechanism by which TFRs mediate GC reactions and control autoimmunity is controversial and largely unknown. One critical question in this context is how TFRs can limit autoreactivity despite being outnumbered by T-follicular helper cells (TFHs). By invoking several phenomenological models based on different physiological mechanisms that may govern the interactions of GC B cells with TFRs and TFHs, and the resulting dynamics of the constituent cells of the GC, we investigate how the control of autoimmunity can be achieved in a physiologically realistic situation. We also analyze perturbations of these mechanisms that can lead to the loss of immunological homeostasis and contribute to the emergence of autoimmunity. Our analysis favours a mechanism in which the TFRs are predominantly self-specific and scan the self-reactive centrocytes in the GC in a chemokine-dependent manner. In addition, an interaction with a TFR must alter the intracellular signalling of the self-reactive GC B cell, either by promoting the dark zone (DZ) phenotype, thus preventing differentiation into self-reactive plasma cells, and/or by preventing the cell from acquiring TFH help. The resulting theory of the control of autoreactivity in GC responses is compared to experimental readouts, and we propose conclusive experiments to support or contradict the theory.

### Topic modeling on AIRR-seq data enables identification and generation of disease-associated sequences without individual sequence labels

Slabodkin, Andrei

Adaptive immune receptor repertoire (AIRR) data are complex and carry disease- and infection-relevant information in the form of sequence-based immune signals. For an AIRR containing an immune signal, the fraction of AIR sequences bearing this signal may be very low. One of the major unresolved challenges in AIR diagnostics and therapeutics discovery is to predictively isolate the immune signal from an AIRR and then to use this signal to engineer novel sequences. This is a machine learning (ML) problem that bridges repertoire- and sequence-level classification. Existing ML methods rely on engineered features or on predefined knowledge about repertoire generation. To address this problem, we developed AIRRTM, an end-to-end generative model based on an encoder-decoder architecture and topic modeling (TM). AIRRTM reaches stable performance in identification and generation of disease-specific sequences using repertoire labels but not sequence labels.

### Systematic cell-cell communication analysis translates transcriptional changes into a multi-cellular context

Steinheuer, Lisa Maria

Progress in high-throughput technologies such as RNA sequencing and mass cytometry now allows us to characterize cell types and excavate changes in transcriptional profiles at single-cell resolution. Furthermore, such high-dimensional data also enables us to investigate and quantify cell-cell communication patterns by analyzing the expression of cytokines and cytokine receptors. Here, we used rich datasets capturing cell-cell communication in the immune system, including patient samples from Systemic Lupus Erythematosus (SLE) and Inflammatory Bowel Disease (IBD) as well as tumour infiltrating lymphocytes in several cancer entities. To delineate statistical differences within these communication signals, we used and adapted published methods including CytoSig, NicheNet and CellPhoneDB and embedded them into a statistical framework. We detected profound differences in cell communication between healthy donors and disease patients in a systematic, large-scale, and unbiased manner. Moreover, we were able to increase the resolution down to cell type-specific communication patterns, highlighting the different contributions of immune cells. Next, we will investigate how to integrate other data modalities in order to detect relevant cell-cell interactions across different ‘omics’ layers. These insights can then be used to mechanistically model cellular environments such as the gut or tumour microenvironment (TME) and investigate potential perturbations upon drug therapy.

### Collective motility of T-cell crowds

Sultan, Shabaz

Immune activation and presentation of antigen to the immune repertoire is a complex spatial process. Millions of T cells move around inside lymph nodes at high speed, without getting jammed up. Antigen-presenting cells physically interact with a large number of T cells to efficiently scan the receptor repertoire, a process that emerges from the intricate collective motility of the cell crowd. To understand how the observed motility of T cells arises from crowd interactions and environmental constraints, we use high-resolution data of different organ environments, and in vivo live-cell two-photon microscopy data of T cell motility. We introduce these data to a novel simulation environment that is able to simulate millions of cells with full three-dimensional morphology, at micron-scale resolution. We combine this morphological simulation with a simulation of intra-cellular processes that lead to accurate cell motility. We show how cell-cell interactions combined with local polarisation of cells lead to collective motility matching empirical data. We further show that the characteristic stop-and-go behaviour of T cells naturally emerges from this internal polarisation mechanism, but that this behaviour disappears when cells are placed in a crowd setting. We examine adhesion of cells to stromal structures in the environment, and how this influences stop-and-go behaviour. Finally, we look at how per-cell motility influences the exploratory behaviour of the crowd, and how exploration behaviour differs between organ environments.

### Characterisation of third-party donor-derived EBV-specific T cells for clinical treatment using TCR repertoire sequencing

Sutherland, Catherine

Latent Epstein-Barr Virus (EBV) reactivation in patients undergoing immune suppression for transplant can occasionally result in development of an aggressive B-cell lymphoma. Treatments are limited and in rituximab-refractory cases survival is less than 5% at 12 months. The Scottish National Blood Transfusion Service maintains a bank of EBV-specific T cells from healthy donors to treat this condition. Patients who complete treatment have a survival rate of over 65%, which is strongly associated with HLA match between donor and recipient. However, survival rate is not entirely explained by patient condition or HLA match. Therefore, we investigated whether variability in the TCR repertoire between isolates could be linked to patient response. Fifteen EBV-specific T cell products, selected by either coculture with lymphoblastoid cell lines or stimulation with EBV peptides, were characterised using high-throughput next-generation TCR sequencing. We found that the isolates were oligoclonal with substantial variation in clonality between different donors, and there was little sharing of sequences between samples. TCR sequences previously determined to be EBV-specific could be identified alongside sequences specific to many other viral epitopes. Selection method did not have a significant effect on repertoire composition, but a slight increase in EBV-specific sequences was observed in peptide-stimulated products.

### Analysing cell-cell interaction dynamics in the TME using a data-driven spatial model

van der Voort, Gemma

The tumour microenvironment (TME) is a complex biological niche populated by tumour cells, stromal cells and immune cells. A better quantitative understanding and characterisation of this environment and its heterogeneity between patients is of vital importance in developing effective treatment strategies. The fates of different T cell populations within the TME play a major role in cancer progression. For improved insight into these populations, we need to analyse cell-cell interaction dynamics in the TME, focussing on both CD4+ and CD8+ T cells in different stages of differentiation. To address this, we are developing a quantitative, data-driven spatial model of these cells within the TME. In order to establish a model with predictions of clinical relevance, a data-driven approach is key. Our approach has two main pillars. In the first, we established a mechanistic, qualitative model of T cell differentiation in the TME context. In the second, we developed an analysis pipeline for Co-detection by indexing (CODEX) imaging data. CODEX is a multiplexed imaging modality able to resolve up to 60 markers in situ, revealing highly detailed single-cell spatial relationships. We will leverage this spatial information to quantitatively characterise the spatial model of T cells in the TME.