Jens Stoye

Computational Short Read Metagenomics
Published 26 February 2009, updated 27 March 2009


Jens Stoye
Bielefeld University, Germany

Metagenomics is a new field of research that studies natural microbial communities through their metagenomes. New sequencing technologies such as 454 and Solexa-Illumina sequencing open up new possibilities because they produce huge amounts of data in much less time, with less effort and at lower cost than traditional Sanger sequencing. However, the reads they produce are considerably shorter than Sanger reads (35-50 bp with Solexa-Illumina, 100-300 bp with 454 sequencing). CARMA [1] is a new pipeline for characterizing the species composition and the genetic potential of microbial samples from 454-sequenced reads. The species composition is described by classifying each read into the taxonomic group of the organism it most likely stems from. Assigning taxonomic origins to the reads yields a profile that characterizes the taxonomic composition of the corresponding community. The CARMA pipeline has already been applied successfully to 454-sequenced communities [2,3], including the characterization of a plasmid sample isolated from a wastewater treatment plant [4].
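To illustrate the profile-building step, the following Python sketch turns per-read taxonomic assignments into relative abundances at a chosen rank. It assumes the classification itself has already been done (that is the hard part of CARMA and is not shown), and the input format, the `RANKS` list and the function name `taxonomic_profile` are hypothetical choices for illustration only.

```python
from collections import Counter

# Hypothetical rank order used to index lineages (root first).
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def taxonomic_profile(assignments, rank):
    """Relative abundance of each taxon at `rank`.

    `assignments` maps read ids to a lineage list (root first), truncated at
    the deepest rank the classifier could resolve, or to None for unassigned
    reads. Only reads resolved down to `rank` are counted.
    """
    depth = RANKS.index(rank)
    counts = Counter(
        lineage[depth]
        for lineage in assignments.values()
        if lineage is not None and len(lineage) > depth
    )
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


# Toy input, not real data.
reads = {
    "read1": ["Bacteria", "Firmicutes", "Clostridia"],
    "read2": ["Bacteria", "Firmicutes"],
    "read3": ["Archaea", "Euryarchaeota", "Methanomicrobia"],
    "read4": None,
}
print(taxonomic_profile(reads, "phylum"))
```

Here Firmicutes receives 2/3 and Euryarchaeota 1/3 of the phylum-level profile, while the unassigned `read4` is ignored. Normalizing per rank over only the reads resolved to that rank is one possible convention; dividing by all reads instead would make the share of unclassified reads explicit.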

Using samples from a biogas plant, we examined whether this approach is applicable to the ultra-short Solexa-Illumina reads by comparing its results with those obtained for the 454-sequenced sample [5,6]. Our results for 77 million 50 bp reads show that the approach indeed produces consistent results. Most of the differences we found occur at finer taxonomic ranks, such as the species level, and generally involve species of very low abundance.
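One simple way to quantify how far two community profiles (for example, one computed from Solexa-Illumina reads and one from 454 reads) disagree at a given rank is the total variation distance between their relative-abundance vectors. The abstract does not state which comparison measure was used, so the metric, the `min_abundance` cutoff and the function `compare_profiles` below are assumptions for illustration. Flagging taxa that are rare in both profiles mirrors the observation that most discrepancies involve low-abundance species.

```python
def compare_profiles(p, q, min_abundance=0.01):
    """Compare two relative-abundance profiles (dicts mapping taxon -> fraction).

    Returns (tvd, diffs, rare): the total variation distance between the two
    profiles, the absolute abundance difference for every taxon seen in either,
    and the set of taxa below `min_abundance` in both profiles.
    """
    taxa = set(p) | set(q)
    diffs = {t: abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa}
    # For profiles that each sum to 1: 0 means identical, 1 means disjoint.
    tvd = 0.5 * sum(diffs.values())
    rare = {t for t in taxa if max(p.get(t, 0.0), q.get(t, 0.0)) < min_abundance}
    return tvd, diffs, rare


# Toy profiles at one rank, not measured values.
short_reads = {"taxon_A": 0.60, "taxon_B": 0.395, "taxon_C": 0.005}
long_reads = {"taxon_A": 0.50, "taxon_B": 0.50}
tvd, diffs, rare = compare_profiles(short_reads, long_reads)
print(f"TVD = {tvd:.3f}; rare in both: {sorted(rare)}")
```

Running the comparison separately at each rank would show where the two profiles start to diverge.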

In order to apply CARMA to high-throughput sequencing data, we had to improve the accuracy and speed of our method in several ways: a preprocessing assembly phase using an adapted q-gram index [7]; adaptation of the pipeline to exploit the information from mated reads, which effectively "increases" the read length; modification of the amino acid sequence distance function used to construct the phylogenetic tree; and implementation of a protein q-gram index over a multiple alignment for matching reads against Pfam protein families.
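Both index-based steps rest on q-gram filtering: before running expensive alignments, only sequences that share enough substrings of length q with a read are kept as candidates. The Python sketch below shows a plain q-gram index combined with the classic q-gram lemma, which says that if a length-n query occurs with at most k edit errors in a sequence, at least n + 1 - (k + 1)q of the query's q-grams also occur there. It is a simplified stand-in, not the adapted index or the ε-match filters of [7], and all function names are hypothetical.

```python
from collections import defaultdict


def build_qgram_index(seqs, q):
    """Map every q-gram to the list of (sequence id, offset) where it occurs."""
    index = defaultdict(list)
    for sid, s in seqs.items():
        for i in range(len(s) - q + 1):
            index[s[i:i + q]].append((sid, i))
    return index


def qgram_threshold(n, q, k):
    """q-gram lemma bound on shared q-grams for a length-n query with <= k errors."""
    return n + 1 - (k + 1) * q


def candidates(query, index, q, k):
    """Ids of indexed sequences that may contain `query` with at most k errors.

    Repeated q-grams can inflate the hit counts; that only admits extra
    candidates and never discards a true match. Survivors still need an
    alignment to verify them.
    """
    t = qgram_threshold(len(query), q, k)
    if t < 1:
        raise ValueError("q and k are too large for the filter to prune anything")
    hits = defaultdict(int)
    for i in range(len(query) - q + 1):
        for sid, _ in index.get(query[i:i + q], ()):
            hits[sid] += 1
    return {sid for sid, c in hits.items() if c >= t}


# Toy DNA reads; the query differs from r1 by one substitution.
idx = build_qgram_index({"r1": "ACGTACGTTGCA", "r2": "TTTTGGGGCCCC"}, q=3)
print(candidates("ACGTACGATGCA", idx, q=3, k=1))
```

Nothing in the sketch is DNA-specific, so the same structure could index amino acid q-grams of protein family member sequences; handling gapped columns of a multiple alignment, as the protein index described above does, would require additional machinery not shown here.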

References

[1] L. Krause, N.N. Diaz, A. Goesmann, S. Kelley, T.W. Nattkemper, F. Rohwer, R.A. Edwards, J. Stoye (2008) Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Res. 36(7):2230-2239.

[2] E.A. Dinsdale, O. Pantos, S. Smriga, R.A. Edwards, F. Angly, L. Wegley, M. Hatay, D. Hall, E. Brown, M. Haynes, L. Krause, E. Sala, S.A. Sandin, R. Vega Thurber, B.L. Willis, F. Azam, N. Knowlton, F. Rohwer (2008) Microbial Ecology of Four Coral Atolls in the Northern Line Islands. PLoS ONE 3(2):e1584.

[3] S.A. Sandin, J.E. Smith, E.E. DeMartini, E.A. Dinsdale, S.D. Donner, A.M. Friedlander, T. Konotchick, M. Malay, J.E. Maragos, D. Obura, O. Pantos, G. Paulay, M. Richie, F. Rohwer, R.E. Schroeder, S. Walsh, J.B.C. Jackson, N. Knowlton, E. Sala (2008) Baselines and Degradation of Coral Reefs in the Northern Line Islands. PLoS ONE 3(2):e1548.

[4] A. Schlüter, L. Krause, R. Szczepanowski, A. Goesmann, A. Pühler (2008) Genetic diversity and composition of a plasmid metagenome from a wastewater treatment plant. J. Biotechnol. 136(1-2):65-76.

[5] L. Krause, N.N. Diaz, R.A. Edwards, K.-H. Gartemann, H. Krömeke, H. Neuweger, A. Pühler, K.J. Runte, A. Schlüter, J. Stoye, R. Szczepanowski, A. Tauch, A. Goesmann (2008) Taxonomic composition and gene content of a methane-producing microbial community isolated from a biogas reactor. J. Biotechnol. 136(1-2):91-101.

[6] A. Schlüter, T. Bekel, N.N. Diaz, M. Dondrup, R. Eichenlaub, K.-H. Gartemann, I. Krahn, L. Krause, H. Krömeke, O. Kruse, J.H. Mussgnug, H. Neuweger, K. Niehaus, A. Pühler, K.J. Runte, R. Szczepanowski, A. Tauch, A. Tilker, P. Viehöver, A. Goesmann (2008) The metagenome of a biogas-producing microbial community of a production-scale biogas plant fermenter analysed by the 454-pyrosequencing technology. J. Biotechnol. 136(1-2):77-90.

[7] K. Rasmussen, J. Stoye, E.W. Myers (2006) Efficient q-Gram Filters for Finding All ε-Matches over a Given Length. J. Comput. Biol. 13(2):296-308.

