Ultrarapid pathogen identification by NGS assays

A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples
Genome Res. 2014 Jul;24(7):1180-92.
Samia N. Naccache, et al.

Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI (“sequence-based ultrarapid pathogen identification”), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.
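To put the fast-mode figures in perspective, here is a rough throughput calculation. It is only a sketch: it assumes the smallest data set (7 million reads) goes with the shortest run time (11 min) and the largest (500 million reads) with the longest (5 h), a pairing the abstract implies but does not state. The function name is made up for this note.

```python
def reads_per_minute(n_reads: float, minutes: float) -> float:
    """Average number of reads processed per minute of wall-clock time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return n_reads / minutes


# Assumed pairing of the range endpoints quoted in the abstract (fast mode).
fast_small = reads_per_minute(7e6, 11)        # 7 million reads in 11 min
fast_large = reads_per_minute(500e6, 5 * 60)  # 500 million reads in 5 h

print(f"small data set: {fast_small:,.0f} reads/min")
print(f"large data set: {fast_large:,.0f} reads/min")
```

Under that assumption the pipeline handles somewhere between roughly 0.6 and 1.7 million reads per minute in fast mode.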

NGS assays for pathogen diagnosis

journalistic version:


The “right not to know” genetic risk information

ACMG Recommendations for Reporting of Incidental Findings in Clinical Exome and Genome Sequencing
American College of Medical Genetics and Genomics
21 March 2013

Keywords: secondary findings, incidental findings, genome, genomic medicine, personalized medicine, whole-exome, whole-genome, sequencing

For example, sequencing could have areas of diminished or absent coverage in the genes examined for incidental findings that would be filled in by Sanger sequencing or other supplementary approaches if the gene were being evaluated for a primary indication.
In addition, while genome sequencing can provide increasingly reliable information on copy number variation and translocations, exome sequencing is currently less reliable, and neither technology can be used to measure tandem repeat size accurately.
For these reasons, we did not include some disorders where structural variants (e.g., translocations and inversions), repeat expansions, or copy number variations are the primary cause, and have not recommended that laboratories utilize orthogonal techniques to search for these variants in the genes named in the minimum list.

… Given these recommendations, the Working Group was concerned that a negative incidental findings report could be misconstrued by clinicians or patients as an assurance of the absence of a pathogenic variant, which is not always the case.

There is no single database currently available that represents an accurately curated compendium of known pathogenic variants, nor is there an automated algorithm to identify all novel variants meeting criteria for pathogenicity.

We recognize that this may be seen to violate existing ethical norms regarding the patient’s autonomy and “right not to know” genetic risk information.
However, in selecting a minimal list that is weighted toward conditions where prevalence may be high and intervention may be possible, we felt that clinicians and laboratory personnel have a fiduciary duty to prevent harm by warning patients and their families about certain incidental findings and that this principle supersedes concerns about autonomy, …

journalistic version:
“[I’m] not sure we have the cognitive capacity.”


Pleiotropic effects on psychopathology (Lancet 2013)

Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis.
Lancet. 2013 Apr 20;381(9875):1371-9.
Cross-Disorder Group of the Psychiatric Genomics Consortium.
Erratum in: Lancet. 2013 Apr 20;381(9875):1360.

Findings from family and twin studies suggest that genetic contributions to psychiatric disorders do not in all cases map to present diagnostic categories.
We aimed to identify specific variants underlying genetic effects shared between the five disorders in the Psychiatric Genomics Consortium: autism spectrum disorder, attention deficit-hyperactivity disorder, bipolar disorder, major depressive disorder, and schizophrenia.

We analyzed genome-wide single-nucleotide polymorphism (SNP) data for the five disorders

SNPs at four loci surpassed the cutoff for genome-wide significance (p < 5×10⁻⁸) in the primary analysis: regions on chromosomes 3p21 and 10q24, and SNPs within two L-type voltage-gated calcium channel subunits, CACNA1C and CACNB2.
Model selection analysis supported effects of these loci for several disorders.
Loci previously associated with bipolar disorder or schizophrenia had variable diagnostic specificity.
Pathway analysis supported a role for calcium channel signalling genes for all five disorders.
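The genome-wide significance cutoff quoted above is conventionally explained as a Bonferroni correction of a 0.05 error rate over about one million independent common-variant tests. The sketch below reproduces that arithmetic. The one-million figure is the usual rule of thumb, not a number taken from this abstract, and the function names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: ~1,000,000 independent tests, family-wise alpha of 0.05.
GENOME_WIDE_CUTOFF = bonferroni_threshold(0.05, 1_000_000)


def is_genome_wide_significant(p_value: float,
                               cutoff: float = GENOME_WIDE_CUTOFF) -> bool:
    """True when a SNP's p-value falls below the genome-wide cutoff."""
    return p_value < cutoff


print(GENOME_WIDE_CUTOFF)
print(is_genome_wide_significant(1e-9), is_genome_wide_significant(1e-6))
```

A SNP with p = 1e-9 would pass this cutoff, while one with p = 1e-6 would not, however suggestive it looks in isolation.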

Our findings show that specific SNPs are associated with a range of psychiatric disorders of childhood onset or adult onset. In particular, variation in calcium-channel activity genes seems to have pleiotropic effects on psychopathology.
These results provide evidence relevant to the goal of moving beyond descriptive syndromes in psychiatry, and towards a nosology informed by disease cause.

cited by:
Genes and the Human Condition (From Behavior to Biotechnology)
Coursera 2014

FTO—the first GWAS-identified obesity gene

The bigger picture of FTO—the first GWAS-identified obesity gene
Nature Reviews Endocrinology  10, 51–61 (2014)
Ruth J. F. Loos & Giles S. H. Yeo

Single nucleotide polymorphisms (SNPs) that cluster in the first intron of the fat mass and obesity associated (FTO) gene are associated with obesity traits in genome-wide association studies.
The minor allele increases BMI by 0.39 kg/m² (or 1,130 g in body weight) and risk of obesity by 1.20-fold.
This association has been confirmed across age groups and populations of diverse ancestry; the largest effect is seen in young adulthood.
The effect of FTO SNPs on obesity traits in populations of African and Asian ancestry is similar or somewhat smaller than in European ancestry populations.
However, the BMI-increasing allele in FTO is substantially less prevalent in populations with non-European ancestry.
FTO SNPs do not influence physical activity levels; yet, in physically active individuals, FTO’s effect on obesity susceptibility is attenuated by approximately 30%.
Evidence from epidemiological and functional studies suggests that FTO confers an increased risk of obesity by subtly changing food intake and preference.
Moreover, emerging data suggest a role for FTO in nutrient sensing, regulation of mRNA translation and general growth.
In this Review, we discuss the genetic epidemiology of FTO and how its complex biology might link to the regulation of body weight.
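The abstract gives the per-allele effect both as a BMI change (0.39 kg/m²) and as a weight change (1,130 g). Because BMI is weight divided by height squared, the two figures are linked through height. The sketch below checks that they agree for a person about 1.70 m tall. That height is inferred here from the two numbers and is not stated in the abstract.

```python
def weight_change_kg(delta_bmi: float, height_m: float) -> float:
    """Weight change implied by a BMI change at a fixed height (BMI = kg / m^2)."""
    return delta_bmi * height_m ** 2


def implied_height_m(delta_bmi: float, delta_weight_kg: float) -> float:
    """Height at which a given BMI change equals a given weight change."""
    return (delta_weight_kg / delta_bmi) ** 0.5


# Per-allele figures from the abstract: 0.39 kg/m^2 and 1,130 g.
print(f"{weight_change_kg(0.39, 1.70):.3f} kg at 1.70 m")
print(f"{implied_height_m(0.39, 1.13):.2f} m implied height")
```

At 1.70 m the BMI effect works out to about 1.13 kg per minor allele, which matches the quoted 1,130 g.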

De novo mutations: 74 SNVs per genome

Current estimates of the average mutation frequencies for the different types of de novo genomic variation observed per generation per genome.

De novo mutations in human genetic disease
Nature Reviews Genetics 13, 565-575 (August 2012)
Joris A. Veltman, et al.

New mutations have long been known to cause genetic disease, but their true contribution to the disease burden can only now be determined using family-based whole-genome or whole-exome sequencing approaches.

In this Review we discuss recent findings suggesting that de novo mutations play a prominent part in rare and common forms of neurodevelopmental diseases, including intellectual disability, autism and schizophrenia.
De novo mutations provide a mechanism by which early-onset reproductively lethal diseases remain frequent in the population.
These mutations, although individually rare, may capture a significant part of the heritability for complex genetic diseases that is not detectable by genome-wide association studies.
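The figure of 74 de novo SNVs per genome per generation can be turned into a per-base rate. The sketch below does that under two round-number assumptions that are not from the abstract: a diploid genome of about 6.2 billion base pairs and an exome of about 30 million base pairs per haploid copy.

```python
SNVS_PER_GENERATION = 74    # de novo SNVs per genome, from the heading above
DIPLOID_GENOME_BP = 6.2e9   # assumption: ~3.1 Gb per haploid copy, two copies
HAPLOID_EXOME_BP = 30e6     # assumption: ~30 Mb of coding sequence per copy


def per_base_rate(events: float, genome_bp: float) -> float:
    """Mutations per base pair per generation."""
    return events / genome_bp


def expected_events(rate: float, region_bp: float) -> float:
    """Expected number of new mutations in a region of the given size."""
    return rate * region_bp


rate = per_base_rate(SNVS_PER_GENERATION, DIPLOID_GENOME_BP)
exome_snvs = expected_events(rate, 2 * HAPLOID_EXOME_BP)

print(f"rate: {rate:.2e} per bp per generation")
print(f"expected de novo SNVs in the exome: {exome_snvs:.2f}")
```

With these assumptions the rate is about 1.2×10⁻⁸ per base pair per generation, or a little under one new coding SNV per child. That is why trio-based exome sequencing typically turns up only a handful of de novo candidates per family.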

GWAS explains only a small proportion of the total heritability

By exploiting allied phenotypic data, it is possible to examine the genetic contribution to such aspects of disease biology (including prognosis) by comparing the genetic profiles of patients with contrasting clinical phenotypes—a so-called ‘within-cases’ analysis.

Prognosis in autoimmune and infectious disease: new insights from genetics
Clinical & Translational Immunology (2014) 3, e15
James C Lee, et al.

Keywords: autoimmunity; FOXO3; genetics; infection; prognosis

Despite the apparent success, GWAS results have only explained a relatively small proportion of the total heritability of each disease.[3]
Work is now underway to try to identify the ‘missing heritability’ through a variety of complementary methods, including:

  • whole-genome sequencing (to identify rare variants that may have larger effect sizes) and
  • studies to examine interactions between a given gene and other genes (epistasis) and between genes and the environment.


Single-cell genomics for complex malaria

Single-cell genomics for dissection of complex malaria infections
Genome Res. 2014. Published in Advance May 8, 2014.
Shalini Nair, et al.

Most malaria infections contain complex mixtures of distinct parasite lineages.
These multiple-genotype infections (MGIs) impact virulence evolution, drug resistance, intra-host dynamics, and recombination, but are poorly understood.

To address this we have developed a single-cell genomics approach to dissect MGIs.
By combining cell sorting and whole-genome amplification (WGA), we are able to generate high-quality material from parasite-infected red blood cells (RBCs) for genotyping and next-generation sequencing.
We optimized our approach through analysis of >260 single-cell assays.
To quantify accuracy, we decomposed mixtures of known parasite genotypes and obtained highly accurate (>99%) single-cell genotypes.
We applied this validated approach directly to infections of two major malaria species, Plasmodium falciparum, for which long-term culture is possible, and Plasmodium vivax, for which no long-term culture is feasible.
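The accuracy figure above comes from comparing single-cell genotypes against parasite lines whose genotypes were already known. One simple way to score such a comparison is per-site concordance, sketched below. This is an illustration only, not the authors' actual scoring method. The function name and the use of "N" for a missing call are assumptions made for this note.

```python
def genotype_concordance(called: str, truth: str, missing: str = "N") -> float:
    """Fraction of comparable sites where the single-cell call matches the truth.

    Sites where either genotype is missing are skipped, not counted as errors.
    """
    if len(called) != len(truth):
        raise ValueError("genotype strings must cover the same sites")
    compared = 0
    matches = 0
    for call, ref in zip(called, truth):
        if call == missing or ref == missing:
            continue
        compared += 1
        if call == ref:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return matches / compared


# Toy example: five SNP sites, one missing call, one mismatch.
print(genotype_concordance("ACGTN", "ACGAA"))
```

In the toy example four sites are comparable and three agree, so the concordance is 0.75. A real assay would be scored over many more sites and cells.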

We demonstrate that our single-cell genomics approach can be used to generate parasite genome sequences directly from patient blood in order to unravel the complexity of P. vivax and P. falciparum infections.
These methods open the door for large-scale analysis of within-host variation of malaria infections, and reveal information on relatedness and drug resistance haplotypes that is inaccessible through conventional sequencing of infections.

journalistic version: