Getting Started with ATAC

How to run ATAC, also known as A2Amapper.

COMPILING and INSTALLING

If you are lucky, you can do:

sh configure.sh
gmake
cd atac-driver
sh install.sh <location-to-install>

However, when gmake complains or crashes, you'll need to build a patched gmake. Instructions for doing this are in build/patches/README.

RUNNING

ATAC can compare a single pair of assemblies, a set of assemblies against a reference, or all assemblies against each other. It uses two directories to store intermediate results, the "genomes" and "meryl" directories.

The "genomes" directory, not used for pairwise ATAC, stores your genome sequences, and an index to the sequences.

The "meryl" directory is a cache for computed data that depends only on a single assembly.

IMPORTANT! Sequence files must NOT contain whitespace or line breaks in the sequence. You can copy sequences into the proper format using:

$ATACDIR/bin/leaff -f input.fasta -W > $GENOMEDIR/name.fasta
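
To check whether a file already meets this requirement before copying it, something like the following might help. This is a minimal sketch and not part of ATAC: the function name fasta_is_unwrapped is made up for illustration, and it assumes that "proper format" means every header line (starting with ">") is followed by exactly one sequence line containing no spaces or tabs.

```shell
# fasta_is_unwrapped FILE   (hypothetical helper, not shipped with ATAC)
# Returns 0 if every ">" header is followed by exactly one sequence line
# with no spaces or tabs; returns 1 otherwise.
fasta_is_unwrapped() {
  awk '
    /^>/    { if (seen && n != 1) bad = 1; seen = 1; n = 0; next }
    /[ \t]/ { bad = 1 }
            { n++ }
    END     { if (seen && n != 1) bad = 1; exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage (not run here): only rewrite the file when the check fails.
#
# if ! fasta_is_unwrapped input.fasta; then
#   $ATACDIR/bin/leaff -f input.fasta -W > $GENOMEDIR/name.fasta
# fi
```

Running leaff unconditionally, as shown above, is the simpler route; the check is only useful for confirming what a file looks like.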

IMPORTANT! By convention, the "id1" sequence is the reference sequence, and the "id2" sequence is the assembly. Some statistics and output plots depend on this fact, but ATAC will work either way.

Pairwise ATAC

Pairwise ATAC is suitable for small assemblies, or for a single comparison. Sequences are supplied by the user with the -id1/-seq1 and -id2/-seq2 option pairs. The "id1" and "id2" are labels (nicknames) for the sequence files, and can be any string. These labels will appear in the output files. They do not need to match the name of the fasta file, although they do in the example below.

perl $ATACDIR/bin/atac.pl \
  -meryldir  /home/work/microbe/meryl \
  -dir       /home/work/microbe/AMESAvsAMESB \
  -id1 AMESA -seq1 /home/work/microbe/AMESA.fasta \
  -id2 AMESB -seq2 /home/work/microbe/AMESB.fasta

Note that the meryl directory stores information about each sequence under that sequence's nickname. If you change a sequence file path but reuse a nickname, the new sequence file will NOT be used, because the information for that nickname is already stored in the meryl directory.
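
One way to guard against accidentally reusing a nickname is to keep your own record of which file each nickname was first used for. The following is a minimal sketch, not an ATAC feature: the function check_nickname and the tab-separated ledger file are hypothetical, and the ledger can live wherever you like.

```shell
# check_nickname LEDGER ID SEQFILE   (hypothetical helper, not part of ATAC)
# Records ID -> SEQFILE in LEDGER the first time ID is seen.
# Returns 1, with a warning on stderr, if ID was recorded for another file.
check_nickname() {
  nick_ledger=$1
  nick_id=$2
  nick_file=$3
  touch "$nick_ledger"
  # Look up the file previously recorded for this nickname, if any.
  nick_known=$(awk -F '\t' -v id="$nick_id" '$1 == id { print $2; exit }' "$nick_ledger")
  if [ -z "$nick_known" ]; then
    printf '%s\t%s\n' "$nick_id" "$nick_file" >> "$nick_ledger"
    return 0
  fi
  if [ "$nick_known" != "$nick_file" ]; then
    echo "nickname $nick_id was already used for $nick_known; use a different nickname" >&2
    return 1
  fi
  return 0
}

# Usage (not run here):
#
# check_nickname /home/work/microbe/nicknames.tsv AMESA /home/work/microbe/AMESA.fasta \
#   && perl $ATACDIR/bin/atac.pl ...
```

The check only compares paths; it would not notice a file whose contents changed while its path stayed the same.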

Multiple ATAC

Each run is still a pairwise comparison; the difference is that precomputed information is saved between runs. The "genomes" directory stores assembly sequences, and an index associating a nickname (id) with each sequence file.

For example, suppose we're interested in Bacillus anthracis. /home/work/microbe/genomes/assemblies.atai would then contain:

!format atac 1.0
S AMESA /home/work/microbe/genomes/NC_007530.fasta
S AMESB /home/work/microbe/genomes/NC_003997.fasta
S STERNE /home/work/microbe/genomes/NC_005945.fasta

representing the "Ames Ancestor", "Ames" and "Sterne" strains with nicknames AMESA, AMESB and STERNE.
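
If you have many assemblies, the index can be generated rather than typed by hand. The sketch below assumes the format is exactly what the example shows (a "!format atac 1.0" line followed by one "S nickname path" line per sequence) and that nicknames and paths contain no whitespace. The function name write_atai and its ID=PATH argument style are made up for illustration.

```shell
# write_atai ID=PATH [ID=PATH ...]   (hypothetical helper, not part of ATAC)
# Prints an index in the layout shown above, one "S" line per argument.
write_atai() {
  printf '%s\n' '!format atac 1.0'
  for pair in "$@"; do
    # Text before the first "=" is the nickname, the rest is the path.
    printf 'S %s %s\n' "${pair%%=*}" "${pair#*=}"
  done
}

# Usage (not run here):
#
# write_atai \
#   AMESA=/home/work/microbe/genomes/NC_007530.fasta \
#   AMESB=/home/work/microbe/genomes/NC_003997.fasta \
#   STERNE=/home/work/microbe/genomes/NC_005945.fasta \
#   > /home/work/microbe/genomes/assemblies.atai
```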

To compute a mapping between the two Ames strains, we would then run:

perl $ATACDIR/bin/atac.pl \
  -genomedir /home/work/microbe/genomes \
  -meryldir  /home/work/microbe/meryl \
  -dir       /home/work/microbe/AMESAvsAMESB \
  -id1 AMESA \
  -id2 AMESB

We can finish our all-against-all comparison with the two remaining pairs. First, AMESA against STERNE:

perl $ATACDIR/bin/atac.pl \
  -genomedir /home/work/microbe/genomes \
  -meryldir  /home/work/microbe/meryl \
  -dir       /home/work/microbe/AMESAvsSTERNE \
  -id1 AMESA \
  -id2 STERNE

Since we already computed information about AMESA, ATAC reuses the saved information in the meryl directory. The last pair is AMESB against STERNE:

perl $ATACDIR/bin/atac.pl \
  -genomedir /home/work/microbe/genomes \
  -meryldir  /home/work/microbe/meryl \
  -dir       /home/work/microbe/AMESBvsSTERNE \
  -id1 AMESB \
  -id2 STERNE
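
With more than a few assemblies, the runs above can be generated by a loop. This is a sketch under the assumption that each unordered pair only needs to be run once, with the earlier nickname as id1, as in the three commands above. The function name all_pairs is made up for illustration, and only the atac.pl options already shown are used.

```shell
# all_pairs ID [ID ...]   (hypothetical helper, not part of ATAC)
# Prints every unordered pair of ids once, earlier id first, one pair per line.
all_pairs() {
  while [ "$#" -gt 1 ]; do
    first=$1
    shift
    for second in "$@"; do
      printf '%s %s\n' "$first" "$second"
    done
  done
}

# Usage (not run here): one atac.pl invocation per pair. Standard input is
# redirected so the command cannot consume the remaining pairs.
#
# all_pairs AMESA AMESB STERNE | while read -r id1 id2; do
#   perl $ATACDIR/bin/atac.pl \
#     -genomedir /home/work/microbe/genomes \
#     -meryldir  /home/work/microbe/meryl \
#     -dir       /home/work/microbe/${id1}vs${id2} \
#     -id1 $id1 \
#     -id2 $id2 < /dev/null
# done
```

If one assembly should always be the reference, list it first so that it is id1 in every pair it appears in.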

OUTPUT

When it finishes, atac.pl reports its two output files, similar to:

Finished! Output is:
  /home/work/microbe/AMESAvsAMESB/AMESAvsAMESB.k18.u9.f18.g0.atac
  /home/work/microbe/AMESAvsAMESB/AMESAvsAMESB.k18.u9.f18.g0.atac.clumps5000

The first file is the mapping itself, while the second file is the mapping annotated with "clumps".

There are three objects reported by ATAC.

  1. matches -- a gapless alignment
  2. runs -- a collection of ordered and oriented matches
  3. clumps -- a collection of runs/matches that are uninterrupted

CREDITS

ATAC/A2Amapper is currently maintained by Brian Walenz.

Original design and implementation by:

Liliana Florea (chainer/halign)
Aaron Halpern (clumpMaker)
Clark Mobarry (chainer)
Ross Lippert (chainer, build system)
Brian Walenz (seed generation, pipeline)
Daniel Fasulo (initial version of matchExtender)
Gene Myers (chainer/localalign)

The following publication should be cited if this software is used:

[1] Sorin Istrail, Granger G. Sutton, Liliana Florea, Aaron L. Halpern, Clark M. Mobarry, Ross Lippert, Brian Walenz, Hagit Shatkay, Ian Dew, Jason R. Miller, Michael J. Flanigan, Nathan J. Edwards, Randall Bolanos, Daniel Fasulo, Bjarni V. Halldorsson, Sridhar Hannenhalli, Russell Turner, Shibu Yooseph, Fu Lu, Deborah R. Nusskern, Bixiong Chris Shue, Xiangqun Holly Zheng, Fei Zhong, Arthur L. Delcher, Daniel H. Huson, Saul A. Kravitz, Laurent Mouchard, Knut Reinert, Karin A. Remington, Andrew G. Clark, Michael S. Waterman, Evan E. Eichler, Mark D. Adams, Michael W. Hunkapiller, Eugene W. Myers, and J. Craig Venter. "Whole-genome shotgun assembly and comparison of human genome assemblies." PNAS, Feb 2004; 101: 1916-1921.

http://www.pnas.org/cgi/content/abstract/101/7/1916