All About Chemistry... 2011 and beyond

Faster annotation system for prokaryotic genomes unveiled

Georgia Tech researchers, working with colleagues at the National Center for Biotechnology Information (NCBI), have released a new version of a genome annotation system capable of analyzing more than 2,000 prokaryotic genomes per day, helping researchers accelerate prokaryotic genomics-based studies worldwide.

In biology, the term prokaryote generally describes a microorganism that lacks a distinct membrane-bound nucleus and whose genetic material is contained in a single molecule of DNA. Prokaryotes include bacteria and archaea.

The NCBI operates the Prokaryotic Genome Annotation Pipeline (PGAP), a high-performance software system designed to analyze the genome sequences of these microorganisms. As more high-quality genomes become available, and as the cost of sequencing continues to fall, the need for high-throughput analysis and annotation pipelines cannot be overstated.

The latest advance comes as the NCBI incorporates Georgia Tech's GeneMarkS+ into the PGAP system. Developed by Mark Borodovsky's team at Georgia Tech, GeneMarkS+ is a self-training machine learning tool for novel gene identification that can combine intrinsic evidence revealed by genomic sequence patterns with extrinsic evidence derived from already annotated genomes.
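To make the idea of combining "intrinsic" and "extrinsic" evidence a little more concrete, here is a toy Python sketch. It is not the GeneMarkS+ algorithm: GeneMarkS+ trains its own statistical models on each genome, whereas this example uses only a plain forward-strand open-reading-frame (ORF) scan as a stand-in for intrinsic evidence. The set `supported_stops` is a hypothetical input standing in for extrinsic evidence (for example, positions confirmed by alignments to proteins from already annotated genomes), and the function names are illustrative only.

```python
"""Toy illustration only; this is NOT how GeneMarkS+ works internally.

Intrinsic evidence: a simple forward-strand ORF scan (ATG ... in-frame stop).
Extrinsic evidence: a hypothetical set of ORF end positions assumed to be
supported by similarity to proteins from already annotated genomes.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) pairs for forward-strand ORFs.

    Each ORF runs from the most upstream ATG to the first in-frame stop
    codon, inclusive. ORFs shorter than min_codons (stop included) are dropped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def annotate(seq: str, supported_stops: set[int],
             min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Label each intrinsic ORF by whether extrinsic evidence also supports it."""
    return [
        (start, end,
         "intrinsic+extrinsic" if end in supported_stops else "intrinsic only")
        for start, end in find_orfs(seq, min_codons)
    ]
```

For example, `annotate("ATGAAATAACCATGGGGCCCTGA", {23})` finds two ORFs, labeling the first (positions 0 to 9) as "intrinsic only" and the second (positions 11 to 23) as "intrinsic+extrinsic". A real annotator would also scan the reverse strand and weigh the two kinds of evidence statistically rather than with a simple yes/no label.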

"The new system enables researchers to get critically important analysis that consistently integrates information of all sources of evidence nearly in real time instead of days and weeks," said Borodovsky, a Regents' professor with a joint appointment in the School of Computational Science and Engineering and the Coulter Department of Biomedical Engineering. "Our group is excited to be a part of the whole team working on this project with high international visibility."

Before GeneMarkS+ was integrated into the pipeline, the system could handle only 20 annotations per day.

"Dr. Borodovsky worked closely with Tatiana Tatusova's team at NCBI to incorporate and refine GeneMarkS+ in the context of the NCBI annotation pipeline," said Jim Ostell, chief of NCBI's Information Engineering Branch. "It provides a critical core infrastructure to NCBI and to users of NCBI resources."

PGAP uses GeneMarkS+ in conjunction with proteomic evidence obtained from large groups of orthologous gene clusters representing the core protein complement for well-annotated species. As new organisms are sequenced, PGAP adjusts by mining the existing protein information to build new core protein clusters, iteratively improving its annotation based on the ever-increasing wealth of available evidence from submitted bacterial genomes.

The new system offers a modular structure, permitting easy extension with new algorithms. PGAP also provides extensive tracking of execution and decision making, and thus permits an easy trace-back to understand the evidence behind key algorithmic decisions. The PGAP process is described at

PGAP produces high-quality annotation designed to meet INSDC standards for sequence submission and follows UniProt naming guidelines. PGAP is available at NCBI for bacterial genomes as part of GenBank sequence submission, making it a valuable resource to researchers worldwide.

Story Source:

The above story is based on materials provided by Georgia Institute of Technology. Note: Materials may be edited for content and length.
