Recent advances in sequencing technology have made it possible to sample the immune repertoire in exquisite detail. Deep sequencing of antibody/B-cell and T-cell receptor repertoires (AIRR-seq data) has enormous promise for understanding the dynamics of the immune repertoire in vaccinology, infectious diseases, autoimmunity, and cancer biology, but also poses significant challenges. “The data are very complicated, there are many steps to obtaining them, and each of those steps can be done slightly differently,” said Felix Breden, PhD, co-founder of the Adaptive Immune Receptor Repertoire (AIRR) Community and scientific director of iReceptor.
According to Breden, genes of the adaptive immune system are some of the most complicated, duplicated, and highly evolved genes in vertebrates. “Trying to understand the immunoglobulin and T-cell receptor loci and the expression of the T-cell and B-cell repertoires requires data standardization to facilitate sharing. But the process of implementing standardization is slow, and it takes community initiative, face-to-face work, and follow-up.”
This need led Jamie Scott, PhD, Tom Kepler, PhD, and Breden to brainstorm an Open Science grassroots community in 2014. Today, the AIRR Community is an official committee of The Antibody Society and develops and promotes standards and recommendations for obtaining, analyzing, curating, and comparing/sharing AIRR-seq datasets.1 Its members also validate tools to analyze AIRR-seq data, relate AIRR-seq datasets to other “big data” sets, such as microarray, flow cytometric, MiSeq, and single-cell gene-expression data, and address the legal and ethical issues involving the use and sharing of datasets derived from human sources.
The AIRR Community developed the AIRR Data Commons, which follows a distributed data model and is currently composed of seven globally distributed repositories that provide public access to more than 80 MiAIRR-compliant studies, including many COVID-19 studies, and the accompanying AIRR-seq data.2 MiAIRR is a set of standards and protocols for curating and sharing the immense repositories.
The focus of the iReceptor platform is to federate the large AIRR Data Commons and to facilitate the curation, analysis, and sharing of these antibody/B-cell and T-cell receptor repertoires. The platform connects the distributed network, allowing queries across multiple projects, labs, and institutions. Over five billion sequences and 8,987 repertoires are currently available from seven remote repositories, 70 research labs, and 85 studies.
“Think of the AIRR Data Commons as a beautiful art gallery where you can show your data, in a usable form, to the world, and the iReceptor Gateway as the Google of AIRR repertoires, enabling queries such as ‘federate all repertoires of ovarian cancer patients under a particular treatment.’ Studies produce huge amounts of data from only a few samples, and, sometimes, the signal is very weak and difficult to use for predictions. There is a real need to bring the data together from multiple studies to get larger sample sizes,” said Breden.
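A query like the one Breden describes could be written, very roughly, in the JSON filter style of the ADC API (reference 2). The sketch below only builds the request bodies and sends nothing over the network. The repository URLs are hypothetical, and the field names and values are assumptions chosen for illustration; a real query should take them from the AIRR Community schema.

```python
import json


def field_filter(field, value, op="="):
    """One filter clause: an operator applied to a field/value pair."""
    return {"op": op, "content": {"field": field, "value": value}}


def all_of(*clauses):
    """Combine clauses so a repertoire must satisfy every one of them."""
    return {"op": "and", "content": list(clauses)}


# Assumed field names and values, for illustration only.
query = {
    "filters": all_of(
        field_filter("subject.diagnosis.disease_diagnosis.label",
                     "ovarian cancer", op="contains"),
        field_filter("subject.diagnosis.intervention",
                     "example treatment", op="contains"),
    )
}

# Hypothetical repository base URLs, not real services.
REPOSITORIES = [
    "https://repo-a.example.org",
    "https://repo-b.example.org",
]


def plan_requests(repositories, query):
    """Pair each repository's repertoire endpoint with the same JSON body.

    A federated search sends one identical query to every repository
    and merges the answers afterwards.
    """
    body = json.dumps(query, sort_keys=True)
    return [(base + "/airr/v1/repertoire", body) for base in repositories]


requests_to_send = plan_requests(REPOSITORIES, query)
for url, body in requests_to_send:
    print(url)
    print(body)
```

Because every repository receives the same body, adding a repository to the search only means adding one more base URL to the list.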
Present functionalities allow searching for repertoires that satisfy certain metadata criteria, repertoires that contain specific CDR3 sequences, and repertoires that contain sequences derived from particular V, D, and J genes and alleles. Sequences can be downloaded in AIRR TSV format, which is easily imported into other AIRR-seq analysis tools, or analyzed through the Gateway with common tools.
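As a minimal sketch of what working with a downloaded file might look like, the example below reads tab-separated rearrangement data with Python's standard library and selects rows by V gene and CDR3 (junction) amino acid sequence. The column names follow the AIRR rearrangement format as we understand it, and the rows are made-up toy data, not sequences from any study.

```python
import csv
import io

# Toy data for illustration only; real downloads contain many more columns.
SAMPLE_TSV = (
    "sequence_id\tv_call\tj_call\tjunction_aa\tproductive\n"
    "seq1\tIGHV1-69*01\tIGHJ4*02\tCARDYW\tT\n"
    "seq2\tIGHV3-23*01\tIGHJ6*02\tCAKGGW\tT\n"
    "seq3\tIGHV1-69*02,IGHV1-69D*01\tIGHJ4*02\tCARDYW\tF\n"
)


def genes_in_call(call):
    """Return gene names from a call, dropping allele suffixes such as '*01'.

    A call may list several comma-separated candidates when the
    assignment is ambiguous.
    """
    return {part.strip().split("*")[0] for part in call.split(",") if part.strip()}


def select_rearrangements(handle, v_gene=None, junction_aa=None):
    """Keep rows matching the given V gene and/or junction amino acid sequence."""
    selected = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if v_gene is not None and v_gene not in genes_in_call(row["v_call"]):
            continue
        if junction_aa is not None and row["junction_aa"] != junction_aa:
            continue
        selected.append(row)
    return selected


hits = select_rearrangements(
    io.StringIO(SAMPLE_TSV), v_gene="IGHV1-69", junction_aa="CARDYW"
)
for row in hits:
    print(row["sequence_id"], row["v_call"], row["junction_aa"])
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the in-memory `io.StringIO` handle.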
An important new functionality of the AIRR Data Commons and the iReceptor Gateway is the ability to curate, share, and analyze single-cell profiling data. This approach allows linkage of an immune receptor with the physiological state of the cell. “It is a huge advantage in trying to understand and predict the behavior of the adaptive immune system,” said Breden.
“Since single-cell sample sizes are smaller, sharing becomes even more important. The AIRR Data Commons are AIRR-seq data repositories from multiple laboratories that anyone can use,” said Breden. Located at Simon Fraser University, the iReceptor Gateway, an implementation of the vision of the AIRR Community, follows the AIRR Community standards and is part of the EU/CIHR-funded iReceptor Plus Consortium.3
Single-cell immune profiling is only going to get more complicated and the data more difficult to curate. “Community-adopted standards are increasingly important,” said Breden. “The AIRR Community is open and transparent. We publish papers with standards we have developed and then distribute them to be voted on by the entire community to get buy-in. Anyone can actively become involved in a Working Group.”
To query or contribute to the AIRR Data Commons, contact support@iReceptor.org.
References
- Trück J, Eugster A, Barennes P, Tipton CM, Luning Prak ET, et al. Biological controls for standardization and interpretation of adaptive immune receptor repertoire profiling. (2021) eLife; 10:e66274.
- Christley S, Aguiar A, Blanck G, Breden F, Bukhari SAC, et al. The ADC API: A web API for the programmatic query of the AIRR Data Commons. (2020) Front Big Data; 3:22.
- Corrie BD, Marthandan N, Zimonja B, Jaglale J, Zhou Z, et al. iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories. (2018) Immunol Rev; 284(1):24–41.