What will we do with the samples we collect?
Very long answer: We will use four approaches to investigate the gut microbiota in all stool samples collected in the study, three of them based on next-generation nucleic acid sequencing and the fourth based on untargeted, high-throughput chemical analysis.
16S rRNA sequence surveys: Ribosomes are the macromolecular complexes that carry out protein synthesis in all cellular life forms. The 16S rRNA gene encodes an essential part of the ribosome in Bacteria and Archaea (two of the three major domains of life), and portions of this gene can be chemically amplified from a wide range of microbial species. Because particular types of microbe are associated with distinct sequences in the amplified gene fragments, these sequences can be used like microbial ID cards. When this method is applied to entire microbial communities, the relative abundance of different ID cards (gene sequence variants) that we see is approximately correlated with the relative abundance of the corresponding microbial types in the community. These 16S rRNA gene surveys have some flaws and biases, but are still a powerful tool for inferring the composition of complex microbial communities such as the human gut microbiota. The same basic approach has been used for decades, but we will be using Illumina® sequencing technology to obtain tens of thousands of 16S rRNA gene fragments per sample, instead of the dozens of sequences per sample that might have been obtained via Sanger sequencing in the 1990s.
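For readers who like to see the arithmetic, here is a minimal sketch of how the relative abundance of each 'ID card' might be tallied once every sequenced fragment has been assigned to a sequence variant. It is only an illustration, not our actual analysis pipeline: the function name relative_abundance and the variant labels are made up for this example, and real surveys add steps such as quality filtering and error correction that are left out here.

```python
from collections import Counter


def relative_abundance(reads):
    """Return the fraction of reads assigned to each sequence variant.

    `reads` is a list with one variant label per sequenced fragment
    (a hypothetical input format chosen for this illustration).
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {variant: n / total for variant, n in counts.items()}


# Hypothetical sample: ten fragments assigned to three made-up variants.
sample_reads = ["variant_A"] * 6 + ["variant_B"] * 3 + ["variant_C"] * 1
profile = relative_abundance(sample_reads)

for variant, fraction in sorted(profile.items()):
    print(f"{variant}: {fraction:.2f}")
```

As noted above, these fractions are only approximately correlated with the true proportions of microbes in the community, so they are best read as estimates.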
Shotgun metagenomic sequencing: With the increasing capacity and decreasing cost of ‘next-generation’ gene sequencing, it has become feasible to obtain sequence data not only from a single type of gene (like 16S rRNA) in a microbial community, but from a random sample of all the genes (the metagenome) of a community. Many of the gene sequences will be recognizably similar to genes that have already been studied in cultivated microbes, allowing us to make reasonable guesses about what functions those genes have. If a 16S rRNA survey is like a collection of microbial ID cards, metagenomic sequencing is like a collection of pages from detailed microbial job descriptions – they let us infer what functions can be carried out in the community. Unfortunately, the data doesn’t come in the form of a complete, separate job description (a complete genome) for each type of microbe. It’s more like millions of pages (individual genes or gene fragments) from hundreds or thousands of different microbial job descriptions (genomes), all mixed together. Interpreting shotgun metagenomic sequence data is much tougher than interpreting 16S rRNA sequence data, but we can already learn a lot from it, and metagenomic analysis methods are rapidly improving.
Metatranscriptomic sequencing: While both 16S rRNA surveys and shotgun metagenomics start from microbial chromosomes, made of DNA molecules, metatranscriptomics begins with chemically similar but functionally distinct RNA molecules. When a protein-coding gene in the DNA needs to be used to make a protein to carry out a particular task in a microbial cell, the first step is copying (or transcribing) the gene sequence from the DNA into an RNA molecule (specifically messenger RNA, or mRNA). The mRNA then directs a ribosome to make the required protein. Hence, examining all the mRNA transcripts from a community indicates which genes were actually being expressed (i.e. being used) at the time of sampling. To continue with the job analogy, the transcriptome of mRNA molecules from a microbe is more specific in time than the overall microbial 'job description' (genome); it's more like today's 'to do' list for a worker. The metatranscriptome of an entire microbial community therefore reflects all the tasks that are actually getting done in the community at the time of sampling. In comparison, the metagenome of the community reflects all the tasks that could be done, even if most of those tasks are done only occasionally. As with metagenomics, however, metatranscriptomic data has the complication that the various active tasks of different microbial types are jumbled together: we don't automatically get complete, separate task lists for each type of microbe in the community.
Metabolomics: Unlike the three methods above, metabolomics does not work with nucleic acids or use gene sequencing technology. Instead, it uses methods that simultaneously measure the concentrations of many different chemical compounds in biological material such as feces or urine. Since these chemicals are generally the starting material or end products (or both) for microbial and human metabolic pathways, differences in the identities or concentrations of chemicals detected between samples indicate differences in the microbial and/or human metabolism at the times the samples were taken. We will use two metabolomics methods, NMR (nuclear magnetic resonance) and HPLC-MS (high performance liquid chromatography-mass spectrometry). The details of these quite different methods are too involved to explain here. In general, NMR is less sensitive and identifies fewer compounds, but is very good at establishing the identity of the compounds it detects. HPLC-MS detects many more compounds with much greater sensitivity, but often the exact chemical identity of the detected compound is uncertain. As with metagenomics and metatranscriptomics, the biggest challenge of metabolomics is the interpretation of the complex dataset that it generates, but having all these types of data together for each sample will make our conclusions more robust.
Other analyses: Here’s how we’re using the optional procedures some participants are doing: 1) Blood samples will be used for standard clinical tests (e.g. checking cholesterol and triglyceride levels) and to assess immune activation (how ‘primed’ the immune system is to start an inflammatory response). 2) Genetic testing will determine which alleles (gene variants) are present for several different genes with variants that influence how the human body interacts with microbes, and gene expression profiling will let us infer which human genes are being expressed. 3) Physiological testing will be used to determine how much energy is expended during exercise, since this may be affected by our perturbations to the microbiota.