What is CAMERA?

CAMERA stands for Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis. The aim of this project is to serve the needs of the microbial ecology research community, and of other scientists using metagenomics data, by creating a rich, distinctive data repository and a bioinformatics tools resource that will address many of the unique challenges of metagenomic analysis. The project was initiated by the Gordon and Betty Moore Foundation in January 2006.

To achieve this aim, CAMERA has developed the cyberinfrastructure needed to support the data, tools, and resources that enable the scientific community to use the rapidly growing wealth of metagenomic information. This cyberinfrastructure will continue to advance as new software and other technologies become available and suitable. Success in this effort will accelerate our understanding of biology and deliver novel biological solutions to important societal challenges in health care, energy, and the environment.

CAMERA provides access to raw environmental sequence data, associated metadata, precomputed annotations and analyses, and high-performance computational resources. It is built on innovative cyberinfrastructure that draws on emerging concepts in data storage, access, analysis, and synthesis not available in existing gene sequence resources.

This resource includes, to the extent that it is available or can be ascertained, the metadata associated with the collection of each sample: the location, date, and time of collection; the chemical and physical conditions where the sample was taken; and a measure of its living environment, i.e., all the other sequences found in the same sample. As they become available, CAMERA will continue to incorporate additional metagenomic sequences and complete reference microbial genome sequences (along with genes and gene families), together with their annotations and associated environmental metadata. In addition, a suite of tools and a computational workflow have been developed to enable scientists to analyze the data in new and more comprehensive ways. Additional tools will be added when they are judged appropriate and effective for our users.

CAMERA releases new projects and data weekly to keep its collection of metagenomic and related genomic datasets comprehensive and current, while also maintaining up-to-date local copies of relevant reference sequence datasets such as GenBank and RefSeq. CAMERA collects metadata relevant to environmental metagenome datasets and links it with annotation in a semantically aware environment, allowing users to write expressive semantic queries against the database. To support their research, users can query metadata categories such as habitat, sample type, time, location, and other environmental physicochemical parameters.
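To illustrate what querying by metadata category means in practice, here is a minimal sketch that filters sample records by habitat, sample type, collection date, a latitude/longitude bounding box, and one physicochemical parameter (temperature). It is a plain attribute filter, not the semantic query machinery CAMERA uses; the `Sample` record, its field names, the `query_samples` function, and the example records are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical sample record; field names are illustrative, not CAMERA's schema.
@dataclass
class Sample:
    sample_id: str
    habitat: str
    sample_type: str
    collected: date
    latitude: float
    longitude: float
    temperature_c: float  # one example of a physicochemical parameter

def query_samples(
    samples: List[Sample],
    habitat: Optional[str] = None,
    sample_type: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
    bbox: Optional[Tuple[float, float, float, float]] = None,  # (min_lat, max_lat, min_lon, max_lon)
    min_temp: Optional[float] = None,
) -> List[Sample]:
    """Return samples matching every criterion supplied; None means 'no constraint'."""
    hits = []
    for s in samples:
        if habitat is not None and s.habitat != habitat:
            continue
        if sample_type is not None and s.sample_type != sample_type:
            continue
        if start is not None and s.collected < start:
            continue
        if end is not None and s.collected > end:
            continue
        if bbox is not None:
            min_lat, max_lat, min_lon, max_lon = bbox
            if not (min_lat <= s.latitude <= max_lat and min_lon <= s.longitude <= max_lon):
                continue
        if min_temp is not None and s.temperature_c < min_temp:
            continue
        hits.append(s)
    return hits

# Made-up example records for illustration only.
samples = [
    Sample("S1", "marine", "water", date(2004, 3, 1), 10.0, -80.0, 26.5),
    Sample("S2", "marine", "water", date(2005, 7, 15), 45.0, -60.0, 12.0),
    Sample("S3", "soil", "core", date(2006, 5, 20), 38.0, -120.0, 18.0),
]

warm_marine = query_samples(samples, habitat="marine", min_temp=20.0)
print([s.sample_id for s in warm_marine])  # ['S1']
```

Each supplied criterion narrows the result set, so combining categories (for example habitat plus a date range plus a bounding box) behaves like a conjunctive query.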

CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC) and plays an ongoing role within the GSC in extending standards for the content and format of metagenomic data and metadata and for their submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools that allow researchers to share and forward data to other metagenomics sites and to community data archives such as GenBank. It offers multiple interfaces for easy submission of large or complex datasets and supports pre-registration of samples for sequencing. CAMERA integrates tools for sequence quality control, assembly, gene prediction and annotation, clustering, functional and comparative genomics, and many other downstream analyses.

All tools are organized in an extensible workflow system that also maintains provenance, allowing users to view and download results, browse the details of how each result was generated, and upload their own workflows. This flexibility allows tools from the broader research community to be integrated and keeps the analysis environment extensible. An important aspect of the workflow environment is the organization of workflows into a systematic network, in which the output of one functional unit can be used as input for subsequent workflow runs. This allows researchers to build a complete end-to-end analysis stream by choosing the combination of workflows that suits a given analysis.
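The idea of chaining functional units, where one unit's output becomes the next unit's input and provenance is recorded along the way, can be sketched as follows. This is a minimal illustration under stated assumptions: the units (`quality_filter`, `gc_fractions`, `mean`), the `run_chain` driver, and the provenance format (just the list of unit names in run order) are hypothetical stand-ins, not CAMERA's actual workflows or workflow engine.

```python
from typing import Any, Callable, List, Tuple

# A hypothetical "functional unit": a named function whose output can feed the next unit.
# These units are illustrative stand-ins, not actual CAMERA workflows.
Unit = Tuple[str, Callable[[Any], Any]]

def quality_filter(reads: List[str]) -> List[str]:
    """Keep reads of at least 5 bases that contain no ambiguous 'N' calls."""
    return [r for r in reads if len(r) >= 5 and "N" not in r]

def gc_fractions(reads: List[str]) -> List[float]:
    """Return the fraction of G or C bases in each read."""
    return [(r.count("G") + r.count("C")) / len(r) for r in reads]

def mean(values: List[float]) -> float:
    """Average of a list of numbers (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0

def run_chain(data: Any, units: List[Unit]) -> Tuple[Any, List[str]]:
    """Run units in order, passing each output on as the next input.

    Returns the final output plus a minimal provenance trail (unit names in run order).
    """
    provenance: List[str] = []
    for name, fn in units:
        data = fn(data)
        provenance.append(name)
    return data, provenance

reads = ["ATGCGC", "ATNNAT", "GGCC", "AATTGC"]
units: List[Unit] = [("qc", quality_filter), ("gc", gc_fractions), ("mean", mean)]

# Full chain: QC -> per-read GC -> summary statistic.
result, trail = run_chain(reads, units)
print(round(result, 3), trail)  # 0.5 ['qc', 'gc', 'mean']

# A different combination of the same units stops at per-read values.
per_read, _ = run_chain(reads, units[:2])
```

Because every unit shares the same "input in, output out" shape, different analysis streams are just different lists of units, which mirrors the point about building end-to-end analyses from combinations of workflows. A real provenance record would also capture parameters, tool versions, and inputs; the name list here is deliberately minimal.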

CAMERA also provides high-performance network access, grid-based computing, and a large amount of rotating storage to support analyses. The project provides version updates and software upgrades, training sessions, and periodic solicitation of feedback to ensure that the infrastructure and services continue to serve the needs of the scientific community. The success of this project depends heavily on continuous input from the genomics, microbiology, molecular biology, ecology, and related communities about their needs and priorities. We encourage your feedback on the utility of the tools and datasets we make available and on how we can improve them in subsequent releases.

The approaches and observations of metagenomics represent a major scientific revolution, one that led a National Academy of Sciences study to call it a new science. Marine metagenomics, for example, is emerging as a focus for innovation at the interface of marine environmental science and information technology. Similar scientific dynamics arise in soil metagenomics and in the study of host-associated metagenomes, or microbiomes. The pace of development and the power of gene sequencing for biological discovery are increasing rapidly as shotgun sequencing is applied to entire microbial communities.

Unlike traditional culture-based sequencing methods, metagenomics uses a breakthrough sequencing approach to examine the interactions of the countless microbial species present at a specific environmental location, and it offers tremendous potential to better understand how natural ecosystems function. It enables scientists to consider each gene in its ecological context: the composition of the rest of the community, the environmental conditions in which the gene is found, and its relationships with the other species (and, where relevant, the host) with which it is found at other times and places.

Responding to the opportunities and challenges of this new science, this resource, when fully annotated, will form a Knowledge Base of federated microbial ecological information relating genomic sequences to their associated metadata, supporting a fundamental paradigm shift in the way the biological and biomedical sciences develop in the 21st century.