My latest open source software project went live today on Sourceforge. PyNAST is biological sequence alignment tool which applies the NAST (Nearest Alignment Space Termination) algorithm to align a set of input (or candidate) sequences against a template alignment. The NAST algorithm guarantees that the number of positions in the output alignment will be identical to the number of positions in the template alignment. This is extremely convenient, for example, when you have a multiple sequence alignment that was built manually, and you want to study newly acquired sequences in the context of data (such as phylogenetic trees) which were derived from the manual alignment.
NAST (originally published here) has primarily been used for aligning newly acquired 16s rDNA sequences against the Greengenes “core sets” via the Greengenes website. NAST has become a popular tool in microbial community analysis, but wider adoption has been limited by the difficulty of running the original implementation locally. Since users may need to align thousands or even hundreds of thousands of sequences, it is important for them to be able to run the software on their own laptops, servers, or clusters. PyNAST, which was developed in collaboration with some of the original NAST authors, provides a command line interface, an API, and a Mac OS X GUI (which will go live shortly) to provide convenient access in all of these environments. Additionally, because users can provide their own template alignments when running locally, PyNAST is not specific to 16s rDNA alignments.
We currently have an Applications Note which provides more details on PyNAST, in addition to some speed benchmarks, under review at Bioinformatics.