However, when multiple hypothesis tests are conducted simultaneously, more than 5% of them are very likely to be statistically significant purely by chance. For example, when $m$ tests (here, $m$ is the number of genes) are performed, each at significance level $\alpha$, the experimentwise significance level is given by $1 - (1 - \alpha)^m \approx m\alpha$ when $\alpha$ is small, and represents the global Type I error rate, i.e., the risk of incorrectly rejecting a true null hypothesis $H_0$ (false positive). Here, for instance, the difference in proportions of IMGT clonotypes (AA) with the gene $k$ between two sets would be declared significant when it is not.

[...] and immunoprofiles of the adaptive immune responses. IMGT/HighV-QUEST provides the identification of the variable (V), diversity (D) and joining (J) genes and alleles, the analysis of the V-(D)-J junction and complementarity determining region 3 (CDR3), and the characterization of the IMGT clonotype (AA) (AA for amino acid) diversity and expression. It compares outputs of different batches, up to one million nucleotide sequences for the statistical module. These high throughput IG and TR repertoire immunoprofiles are of primary importance in vaccination, cancer, infectious diseases, autoimmunity and lymphoproliferative disorders; however, their comparative statistical analysis remains a challenge. We present a standardized statistical procedure to analyze IMGT/HighV-QUEST outputs for the evaluation of the significance of the IMGT clonotype (AA) diversity differences in proportions, per gene of a given group, between NGS IG and TR repertoire immunoprofiles. The procedure is generic and suitable for evaluating the significance of the IMGT clonotype (AA) diversity and expression per gene, for any IG and TR immunoprofiles of any species.
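As an illustration of the experimentwise significance level $1 - (1 - \alpha)^m$ and its approximation $m\alpha$, the short Python sketch below evaluates both for a few numbers of tests. It is not part of the published procedure: the function names and the values of `alpha` and `m` are hypothetical, and the exact formula assumes that the $m$ tests are independent.

```python
def experimentwise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests,
    each performed at per-test significance level alpha: 1 - (1 - alpha)^m."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if m < 0:
        raise ValueError("m must be non-negative")
    return 1.0 - (1.0 - alpha) ** m


def linear_approximation(alpha: float, m: int) -> float:
    """First-order approximation m * alpha (an upper bound), capped at 1."""
    return min(1.0, m * alpha)


if __name__ == "__main__":
    # Hypothetical setting: alpha and the gene counts m are illustrative only.
    alpha = 0.05
    for m in (1, 10, 50, 100):
        exact = experimentwise_error_rate(alpha, m)
        approx = linear_approximation(alpha, m)
        print(f"m={m:>3}  exact={exact:.4f}  m*alpha (capped)={approx:.4f}")
```

The exact value never exceeds $m\alpha$, and the two agree closely only while $m\alpha$ remains small, which is why the global Type I error rate grows quickly with the number of genes tested.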
== Introduction ==

IMGT, the international ImMunoGeneTics information system (http://www.imgt.org) [1], was created in 1989 by Marie-Paule Lefranc, Laboratoire d'ImmunoGénétique Moléculaire LIGM (Montpellier University and CNRS) at Montpellier, France, in order to standardize and to manage the complexity and the diversity of immunogenetics data. IMGT, built on IMGT-ONTOLOGY [2], is at the origin of immunoinformatics [3], a science at the interface between immunogenetics and bioinformatics.

The adaptive immune response was acquired by jawed vertebrates (or gnathostomata) more than 450 million years ago and is found in all extant jawed vertebrate species from fishes to humans [3]. The potential antigen receptor repertoire is estimated to comprise about $10^{12}$ different immunoglobulins (IG) or antibodies [4] and $10^{12}$ different T cell receptors (TR) [5] per individual. This huge diversity is created by combinatorial and junctional diversity (together with somatic hypermutations for IG), and the only limiting factor is the number of B and T cells that an organism is genetically programmed to produce [3].

IG are made of two identical heavy (H) chains and two identical light (L) (kappa or lambda) chains, encoded by genes located in three major loci: the IG heavy (IGH) locus, the IG kappa (IGK) locus and the IG lambda (IGL) locus [4,6]. TR are made of two chains, alpha and beta, or gamma and delta, encoded by genes located in four major loci: the TR alpha (TRA), TR beta (TRB), TR gamma (TRG) and TR delta (TRD) loci [5,7] (see IMGT Repertoire, http://www.imgt.org/IMGTindex/locus.html and IMGT/GENE-DB [8]). There are four IG or TR gene types: variable (V), diversity (D) (only for IGH, TRB and TRD), joining (J) and constant (C) genes, which define 24 IG and TR groups (e.g., IGHV, IGHD, IGHJ, ..., TRBV, TRBD, TRBJ, ...) [2,3] (http://www.imgt.org/IMGTindex/group.html). The V, D, J and C genes contribute to the IG and TR chain synthesis [3-5].
The variable domain at the N-terminal end of each IG or TR chain results from a V-(D)-J rearrangement, whereas the remainder of the chain, or constant region, is encoded by a C gene [3-5].

The analysis of the immune antigen receptor (IG and TR) repertoires has greatly benefited from next generation sequencing (NGS) technologies. The vast amount of generated data necessitated the development of novel methods and analysis tools. IMGT/HighV-QUEST [9], a high throughput version of IMGT/V-QUEST [10-14], was implemented by IMGT in October 2010 and is the first reference NGS web portal for IG and TR. IMGT/HighV-QUEST analyzes up to 1,000,000 IG and TR sequences from NGS high throughput and deep sequencing [9,15,16] and compares outputs of different batches, up to one million nucleotide sequences for the statistical module. The analysis is based on the IMGT-ONTOLOGY concepts of identification, description, classification and numerotation [2]. IMGT/HighV-QUEST [9,15] uses the same algorithm as IMGT/V-QUEST [10-14]. It identifies the variable.