fasta - scan a protein or DNA sequence library for similar sequences
tfasta - compare a protein sequence to a DNA sequence library, translating the DNA sequence library `on-the-fly'.
lfasta - compare two protein or DNA sequences for local similarity and show the local sequence alignments
plfasta - compare two sequences for local similarity and plot the local sequence alignments
fasta [ -a -b # -c # -d # -f # -g # -l FASTLIBS -r STATFILE -m # -o -p # -Q -s SMATRIX -w # -x "# #" -y # -z -1 ] query-sequence-file library-file [ ktup ]
fasta [-Qabcdfghiklmnoprswxyz] query-file @library-name-file
fasta [-Qabcdfghiklmnoprswxyz] query-file "%PRMVI"
fasta [-abcdglmnoprswxy] - interactive mode
tfasta [-abcdfgkmoprsw3] protein-query-file DNA-library [ ktup ]
lfasta [-afgmnpswx] sequence-file-1 sequence-file-2 [ ktup ]
plfasta [-afgmnpsxv] sequence-file-1 sequence-file-2 [ ktup ]
fasta is used to compare a protein or DNA sequence to all of the entries in a sequence library. For example, fasta can compare a protein sequence to all of the sequences in the NBRF PIR protein sequence database. fasta will automatically decide whether the query sequence is DNA or protein by reading the query sequence as protein and determining whether the `amino-acid composition' is more than 85% A+C+G+T. fasta uses an improved version of the rapid sequence comparison algorithm of Lipman and Pearson (Science, (1985) 227:1427); the improved version is described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA, (1988) 85:2444. The program can be invoked either with command line arguments or in interactive mode. The optional third argument, ktup, sets the sensitivity and speed of the search. If ktup=2, similar regions in the two sequences being compared are found by looking at pairs of aligned residues; if ktup=1, single aligned amino acids are examined. ktup can be set to 2 or 1 for protein sequences, or from 1 to 6 for DNA sequences. If ktup is not specified, the default is 2 for proteins and 6 for DNA.
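The DNA-versus-protein decision and the default ktup described above can be sketched as follows. This is only an illustration of the stated rule (more than 85% A+C+G+T means DNA), not the program's actual code; the function names and the choice to count only alphabetic characters are assumptions.

```python
def looks_like_dna(sequence, threshold=0.85):
    """Return True if more than `threshold` of the residues are A, C, G or T.

    Assumption: only alphabetic characters count as residues; how fasta
    itself treats ambiguity codes is not stated in this page.
    """
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    if not residues:
        return False
    acgt = sum(ch in "ACGT" for ch in residues)
    return acgt / len(residues) > threshold


def default_ktup(sequence):
    """Default ktup from the text: 6 for DNA queries, 2 for protein queries."""
    return 6 if looks_like_dna(sequence) else 2
```

A query such as ACGTACGTAC would be classed as DNA and searched with ktup=6, while a typical protein sequence falls well below the 85% threshold and gets ktup=2.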
fasta compares a query sequence to a sequence library, which consists of sequence data interspersed with comments (see below). Normally fasta and tfasta search the libraries listed in the file pointed to by the environment variable FASTLIBS. The format of this file is described in the file FASTA.DOC. tfasta compares a protein sequence to a DNA sequence database, translating the DNA sequence library in 6 frames `on-the-fly' (3 frames with the -3 option). The search uses the standard BLOSUM50 scoring matrix and ktup=2 by default. tfasta searches a DNA sequence database in the standard text format described below.
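To make the idea of translating a DNA library entry in 6 frames concrete, here is a minimal sketch using the standard genetic code. It is not tfasta's implementation. The function names are hypothetical, and treating the 3-frame case as "forward strand only" is an assumption, since this page does not say which three frames the -3 option keeps.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset):
    """Translate one reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(offset, len(dna) - 2, 3))


def translated_frames(dna, forward_only=False):
    """Return 6 translations (3 per strand), or 3 if forward_only is set.

    Assumption: forward_only mimics the 3-frame mode by skipping the
    reverse strand.
    """
    strands = [dna] if forward_only else [dna, reverse_complement(dna)]
    return [translate(s, off) for s in strands for off in range(3)]
```

Each translated frame would then be compared to the protein query like an ordinary protein library sequence.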
The lfasta and plfasta programs compare two sequences, looking for local sequence similarities. While fasta and tfasta report only the best alignment between the query sequence and the library sequence, lfasta and plfasta will report all of the alignments between the two sequences with scores greater than a cut-off value. lfasta shows the actual local alignments between the two sequences and their scores, while plfasta produces a plot of the alignments that looks similar to a `dot-matrix' homology plot. On Unix systems, plfasta generates Tektronix output that can either be displayed on a Tektronix terminal or piped through the tek2ps program for output on a laser printer. On MS-DOS systems, plfasta uses the graphics capabilities of the computer screen together with the *.BGI graphics device drivers supplied by Borland with Turbo `C'.
The fasta programs use a standard text format sequence file. Lines beginning with '>' or ';' are considered comments and ignored. Sequences can be upper or lower case; blanks, tabs and unrecognizable characters are ignored. fasta expects sequences to use the single letter amino acid codes, see protcodes(1). Library files for fasta should have the form shown below.
fasta and the other programs can be directed to change the scoring matrix, search parameters, output format, and default search directories by entering options on the command line (preceded by a `-', or by a `/' for MS-DOS). All of the options should precede the file name and ktup arguments. Alternatively, these options can be changed by setting environment variables. The options and environment variables are:
(1) fasta musplfm.aa $AABANK
Compare the amino acid sequence in the file musplfm.aa with the complete PIR protein sequence library using ktup=2.
Each "library" sequence (there need only be one) should start with a comment line which starts with a '>', e.g.
>LCBO bovine preprolactin
WILLLSQ ...
>LCHU human ...
...
(2) fasta -a -w 80 musplfm.aa lcbo.aa 1
Compare the amino acid sequence in the file musplfm.aa with the sequences in the file lcbo.aa using ktup=1. Show both sequences in their entirety, with 80 residues on each output line.
(3) fasta
Run the fasta program in interactive mode. The program will prompt for the file name of the query sequence, list the alternative libraries to be searched (if FASTLIBS is set), and prompt for the ktup.
This version of fasta prompts for the library file to be searched from a list of file names that are saved in the file pointed to by the environment variable FASTLIBS. If FASTLIBS = fastgb.list, then the file fastgb.list might have the entries:
NBRF Protein$0P/u/lib/aabank.lib 0
GB Primate$1P@/u/lib/gpri.nam
GB Rodent$1R@/u/lib/grod.nam
GB Mammal$1M@/u/lib/gmammal.nam
Each line in this file has 4 fields: (1) the library name, separated from the remaining fields by a '$'; (2) a 0 or a 1, indicating a protein or DNA library respectively; (3) a single letter that will be used to choose the library; (4) the location of the library file itself (the library file name can contain an optional library format specifier). fasta recognizes the following library formats:
</usr/slib/genbank (the directory for the library files)
>glocus.idx (index file for GENBANK binary files)
gpri1.seq 9
gpri2.seq 9
gpri3.seq 9
...
grod1.seq 9
...
This version of fasta can also distinguish between normal text library files (as shown above in EXAMPLE (2)), and DNA libraries in the GENBANK compressed floppy disk format. These latter files are binary files that are distributed by Intelligenetics on floppy disks. Earlier versions of fasta (and fastn before it) used different programs to read the text library files (old fasta or ifastn) and the compressed files (old fastgb and gfastn). These routines have been combined in the current fasta.
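The four-field FASTLIBS entries described above can be split apart with a few lines of code. The sketch below is illustrative only: the LibEntry type and its field names are hypothetical, and reading a leading '@' as "this path is a file of library file names" is an assumption based on the synopsis and the example entries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LibEntry:
    name: str            # field 1: library name shown to the user
    is_dna: bool         # field 2: '0' = protein, '1' = DNA
    key: str             # field 3: single letter used to choose the library
    path: str            # field 4: location of the library file
    indirect: bool       # assumption: '@' marks a file of file names
    fmt: Optional[str]   # optional library format specifier, if present


def parse_fastlibs_line(line):
    """Split one FASTLIBS entry into its fields."""
    name, sep, rest = line.rstrip("\n").partition("$")
    if not sep or len(rest) < 3 or rest[0] not in "01":
        raise ValueError("not a FASTLIBS entry: %r" % line)
    location = rest[2:].split()
    if not location:
        raise ValueError("missing library file in: %r" % line)
    path = location[0]
    fmt = location[1] if len(location) > 1 else None
    indirect = path.startswith("@")
    if indirect:
        path = path[1:]
    return LibEntry(name, rest[0] == "1", rest[1], path, indirect, fmt)
```

Applied to the first example entry, this yields the name "NBRF Protein", a protein library selected with the letter P, the path /u/lib/aabank.lib and the format specifier 0.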
You can use your own sequence files for fasta; just be certain to put a '>' and comment as the first line before the sequence. Only one library file type, the standard NBRF library format, is supported by the VAX/VMS programs. lfasta and plfasta do not require the '>' and comment line; fasta does.
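For checking your own sequence files, the standard text format rules given earlier ('>' starts an entry, ';' lines are comments, case does not matter, and blanks, tabs and unrecognized characters are skipped) can be sketched as a small reader. This is not the programs' own reader; the function name and the residue alphabet below are assumptions chosen for illustration.

```python
# Assumption: a plausible single-letter amino acid alphabet; the codes
# actually accepted are documented in the protcodes page.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYBZX*")


def read_text_library(lines, alphabet=PROTEIN_ALPHABET):
    """Return a list of (comment, sequence) pairs from text-format lines."""
    entries = []
    header, residues = None, []
    for line in lines:
        if line.startswith(">"):
            # A '>' line closes the previous entry and starts a new one.
            if header is not None or residues:
                entries.append((header or "", "".join(residues)))
            header, residues = line[1:].strip(), []
        elif line.startswith(";"):
            continue  # comment line, ignored
        else:
            # Upper-case the line and drop anything not in the alphabet.
            residues.extend(ch for ch in line.upper() if ch in alphabet)
    if header is not None or residues:
        entries.append((header or "", "".join(residues)))
    return entries
```

A file with no '>' line at all comes back as a single entry with an empty comment, which matches the looser requirement stated for lfasta and plfasta.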
rrdf2(1), protcodes(5), dnacodes(5)
Bill Pearson
wrp@virginia.EDU
Created by Tod M. Klingler, klingler@cmgm.stanford.edu