README for stand-alone BLAST

$Date: 2005/05/05 13:53:10 $

Table of contents
-----------------
Introduction
1. Available platforms
2. Getting the BLAST software
3a. Configuration for UNIX-like systems
3b. Configuration for Windows
4. Downloading databases

Introduction
------------


1. Available platforms
----------------------

NCBI provides binaries for the following platforms:

Apple MacOS X (ppc32)
FreeBSD 4.5 (ia32)
IBM AIX 5.1 (ppc64)
Linux (kernel 2.4, glibc 2.3.2) (ia32, ia64, amd64)
Microsoft Windows 2000 (ia32)
SGI IRIX 6.5 (mips64)
Sun Solaris 9 (ia32)
Sun Solaris 8 (sparc64)

We will attempt to produce binaries for other platforms upon request.

2. Getting the BLAST software
-----------------------------

Binaries are available from:

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST-BLAST/

Filenames are of the following form:

program-version-architecture-os.extension
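
As an illustration, the pieces of such a filename can be separated with
plain shell parameter expansion. This is only a sketch: the function name
parse_blast_filename and the sample filename in the note below are made
up for this example, and it assumes that no component contains a hyphen
and that the os component contains no dot.

```shell
parse_blast_filename() {
    # $1 = filename of the form program-version-architecture-os.extension
    pbf_rest=$1
    pbf_program=${pbf_rest%%-*}   # text before the first hyphen
    pbf_rest=${pbf_rest#*-}       # drop "program-"
    pbf_version=${pbf_rest%%-*}
    pbf_rest=${pbf_rest#*-}       # drop "version-"
    pbf_arch=${pbf_rest%%-*}
    pbf_rest=${pbf_rest#*-}       # drop "architecture-"
    pbf_os=${pbf_rest%%.*}        # text before the first dot
    pbf_ext=${pbf_rest#*.}        # everything after that dot
    echo "$pbf_program $pbf_version $pbf_arch $pbf_os $pbf_ext"
}
```

For instance, parse_blast_filename blast-2.2.10-ia32-linux.tar.gz prints
"blast 2.2.10 ia32 linux tar.gz".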


Please remember to FTP in binary mode.

3a. Configuration for UNIX-like systems
---------------------------------------

There are three steps needed to set up the Standalone BLAST executable
on a UNIX platform.

1) Download the UNIX binary, uncompress and untar the file. It is
suggested that you do this in a separate directory, perhaps called
"blast".
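
Step 1 might be scripted roughly as follows. This is a sketch that
assumes the downloaded archive is a gzip-compressed tar file; the
function name unpack_blast is hypothetical, and the archive name is
whatever you actually downloaded.

```shell
unpack_blast() {
    # $1 = downloaded archive, $2 = directory to extract into
    mkdir -p "$2" || return 1
    # gzip -dc writes the uncompressed tar stream to stdout; tar reads it
    # from stdin ("-f -") and extracts inside the target directory.
    gzip -dc "$1" | (cd "$2" && tar -xf -)
}
```

Calling unpack_blast with the archive name and "blast" as arguments
leaves the extracted files, including the data subdirectory, under
./blast.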

2) Create a .ncbirc file. In order for Standalone BLAST to operate, you
will need a .ncbirc file that contains the following lines:

[NCBI] 
Data="path/data/"

Where "path/data/" is the path to the location of the Standalone BLAST
"data" subdirectory. For example:

Data=/root/blast/data

The data subdirectory should automatically appear in the directory where
the downloaded file was extracted. Please note that in many cases it may
be necessary to specify the entire path, including the machine name
and/or the network you are located on. Your systems administrator can
help you if you do not know the entire path to the data subdirectory.

Make sure that your .ncbirc file is either in the directory that you
call the Standalone BLAST program from or in your home directory.
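
A minimal sketch of step 2 as a shell function is shown here. The
function name write_ncbirc is hypothetical; it writes the two lines
described above and refuses to do so if the data directory does not
exist.

```shell
write_ncbirc() {
    # $1 = directory that will hold .ncbirc
    # $2 = full path to the Standalone BLAST "data" subdirectory
    if [ ! -d "$2" ]; then
        echo "data directory not found: $2" >&2
        return 1
    fi
    # Overwrites any existing .ncbirc in $1.
    printf '[NCBI]\nData=%s\n' "$2" > "$1/.ncbirc"
}
```

For example, write_ncbirc . /root/blast/data creates the file in the
current directory using the example path given above.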

3) Format your BLAST database files. The main advantage of Standalone
BLAST is the ability to create your own BLAST databases. This can be
done with any file of FASTA-formatted protein or nucleotide sequences.
If you are interested in creating your own database files you should
refer to the sections "Non-redundant defline syntax" and "Appendix 1:
Sequence Identifier Syntax" of the README in the BLAST database
directory (ftp://ftp.ncbi.nih.gov/blast/db/). You can also refer to the
FASTA description available from the BLAST search pages
(http://www.ncbi.nlm.nih.gov/BLAST/fasta.html).

However, for testing purposes you should download one of the NCBI
databases and run a search against it.

In the BLAST database FTP directory (ftp://ftp.ncbi.nih.gov/blast/db/)
you will find the downloadable BLAST database files. For your first
search we recommend downloading something relatively small like
ecoli.nt.Z (1349 Kb). This is a compressed, FASTA-formatted file of
nucleotide sequences. Once it is uncompressed, you will need to format
the database using the 'formatdb' program which comes with your
Standalone BLAST executable. The list of arguments for this program and
all other BLAST programs is located at the end of the README in the
Standalone BLAST FTP directory
(ftp://ftp.ncbi.nih.gov/blast/executables/). Alternatively, you can list
these arguments by running any of the BLAST programs (formatdb, blastall
etc.) with a single hyphen as the argument (Example: formatdb -). This
document shows only the basic commands for formatting the database and
running your first search.

To format the ecoli.nt database run the following from the command
line:

formatdb -i ecoli.nt -p F -o T

This will create seven index files that Standalone BLAST needs to
perform the searches and produce results. The ecoli.nt FASTA file is not
needed once formatdb has finished, and you can delete it.
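
One way to confirm that formatdb produced the expected files is to count
the files whose names start with the database name followed by a dot.
This is a sketch with a hypothetical function name; note that a leftover
ecoli.nt.Z in the same directory would also be counted.

```shell
count_db_files() {
    # $1 = database name as given to formatdb -i, e.g. ecoli.nt
    cdf_n=0
    for cdf_f in "$1".*; do
        # If nothing matches, the pattern stays literal and -f is false.
        if [ -f "$cdf_f" ]; then
            cdf_n=$((cdf_n + 1))
        fi
    done
    echo "$cdf_n"
}
```

After the formatdb command above, count_db_files ecoli.nt would be
expected to print 7.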

Next create a test nucleotide file to run against the new database.  It
may be easier to 'cheat' here and just extract a portion of a
nucleotide sequence you know is in the downloaded ecoli.nt database.
Make a text file called test.txt with the following sequence:

>Test
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
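
Before searching, it can help to check that the query file really is a
nucleotide FASTA file. The function below is a simplified sketch with a
hypothetical name: it requires a defline on the first line and accepts
only A, C, G, T and N on the remaining lines, so a file using other
ambiguity codes would be rejected.

```shell
check_nt_fasta() {
    # $1 = FASTA file to check
    # The first line must be a defline starting with ">".
    head -n 1 "$1" | grep -q '^>' || return 1
    # Fail if any later line has a character other than A, C, G, T or N.
    if tail -n +2 "$1" | grep -q '[^ACGTNacgtn]'; then
        return 1
    fi
    return 0
}
```

check_nt_fasta test.txt returns success (exit status 0) for the file
shown above.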

To run the first search enter the following command from the UNIX
command line in your BLAST directory:

blastall -p blastn -d ecoli.nt -i test.txt -o test.out

This should generate a results file called test.out in the Standalone
BLAST directory. 
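
The formatting and search commands shown in this section can be wrapped
with a few basic checks, as in the following sketch. The function name
run_first_search is hypothetical; the formatdb and blastall command
lines are the ones given in this section, and the function expects
ecoli.nt and test.txt to be in the current directory.

```shell
run_first_search() {
    # Stop early if the BLAST programs are not on the PATH.
    command -v formatdb >/dev/null 2>&1 || {
        echo "formatdb not found" >&2
        return 127
    }
    command -v blastall >/dev/null 2>&1 || {
        echo "blastall not found" >&2
        return 127
    }
    # Format the nucleotide database, then search it with the test query.
    formatdb -i ecoli.nt -p F -o T || return 1
    blastall -p blastn -d ecoli.nt -i test.txt -o test.out || return 1
    # -s is true only if the results file exists and is not empty.
    [ -s test.out ]
}
```

A non-zero exit status indicates that one of the steps failed or that
test.out was not written.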

Now you are ready to create your own databases and run BLAST searches.
For more information you should refer to the Standalone BLAST README
(ftp://ftp.ncbi.nih.gov/blast/executables/) and the BLAST literature.
This will give you some idea of all the programs BLAST supports and the
use of different parameters for increasing or decreasing the stringency
of your results.

If you have any questions please send them to the
blast-help@ncbi.nlm.nih.gov e-mail address.


3b. Configuration for Windows
-----------------------------

There are three steps needed to set up the Standalone BLAST executable.

1) Download and extract the Standalone BLAST Windows binary.
We suggest doing this in its own directory, perhaps called
blast. This is a 'self-extracting' archive and all you need to do is run
it, either through a Command Prompt (DOS Prompt) or by selecting "Run"
from the Windows "Start" button and browsing to the blastcz.exe file.

2) Create an ncbi.ini file. In order for Standalone BLAST to operate,
you will need an ncbi.ini file that contains the following lines:

[NCBI] 
Data="C:\path\data\"

Where "C:\path\data\" is the path to the location of the Standalone
BLAST "data" subdirectory. For example:

Data=C:\blast\data

This data subdirectory should automatically appear in the directory
where the downloaded file was extracted.

Make sure that your ncbi.ini file is in the Windows or WINNT directory
on your machine. Note: If you already have an ncbi.ini file on your
machine from installing other NCBI software (Network Entrez, Sequin etc.)
you can skip this step. However, if you see the following error
message, you should rename the old ncbi.ini file to something like
ncbi.bak and follow the instructions in step 2 above.

Abrupt: code=1
FATAL ERROR: FindPath failed. 

3) Format your BLAST database files. The main advantage of Standalone
BLAST is the ability to create your own BLAST databases. This can be
done with any file of FASTA-formatted protein or nucleotide sequences.
If you are interested in creating your own database you should refer to
the sections "Non-redundant defline syntax" and "Appendix 1: Sequence
Identifier Syntax" of the README in the BLAST database directory
(ftp://ftp.ncbi.nih.gov/blast/db/). You can also refer to the FASTA
description available from the BLAST search pages
(http://www.ncbi.nlm.nih.gov/BLAST/fasta.html).

However, for testing purposes you should download one of the NCBI
databases and run a search against it.

In the BLAST database FTP directory (ftp://ftp.ncbi.nih.gov/blast/db/)
you will find the downloadable BLAST database files. For your first
search we recommend downloading something relatively small like
ecoli.nt.Z (1349 Kb). This is a compressed, FASTA-formatted file of
nucleotide sequences. (If you do not have a copy of UNIX "uncompress"
for your Windows PC, contact NCBI Info at info@ncbi.nlm.nih.gov.)

Once the file is uncompressed, you will need to format the database
using the 'formatdb' program which comes with your Standalone BLAST
executable. The list of arguments for this program and all other BLAST
programs is located at the end of the README in the Standalone BLAST FTP
directory (ftp://ftp.ncbi.nih.gov/blast/executables/). Alternatively,
you can list these arguments by running any of the BLAST programs
(formatdb, blastall etc.) with a single hyphen as the argument
(Example: formatdb -). This document shows only the basic commands for
formatting the database and running your first search.

To format the ecoli.nt database run the following from the command
line:

formatdb -i ecoli.nt -p F -o T

This will create seven index files that Standalone BLAST needs to
perform the searches and produce results. The ecoli.nt file can be
removed once formatdb has been run.

Next create a test nucleotide file to run against the new database.  It
may be easier to 'cheat' here and just extract a portion of a
nucleotide sequence you know is in the downloaded ecoli.nt database.
Make a text file called test.txt with the following sequence:

>Test
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT

To run the first search, enter the following command from the Command
Prompt in your BLAST directory:

blastall -p blastn -d ecoli.nt -i test.txt -o test.out

This should generate a results file called test.out in the Standalone
BLAST directory. Now you are ready to create your own databases and run
BLAST searches. For more information you should refer to the Standalone
BLAST README (ftp://ftp.ncbi.nih.gov/blast/executables/) and the BLAST
literature. This will give you some idea of all the programs BLAST
supports and the use of different parameters for increasing or
decreasing the stringency of your results.

If you have any questions please send them to the
blast-help@ncbi.nlm.nih.gov e-mail address.


SGI Note:
---------

SGI recommends the following threads patches on IRIX6 systems:

   For 6.2 systems, install SG0001404, SG0001645, SG0002000, SG0002420 and SG0002458 (in that order)
   For 6.3 systems, install SG0001645, SG0002420 and SG0002458 (in that order)
   For 6.4 systems, install SG0002194, SG0002420 and SG0002458 (in that order)

These patches can be obtained by calling SGI customer service or from the web: http://support.sgi.com/

System recommendations:
-----------------------

BLAST uses memory-mapped files (on UNIX and NT systems), so it runs best if
it can read the entire BLAST database into memory and keep using it there.
The resources consumed reading a database into memory can easily outweigh
the cost of a BLAST search, so the memory of a machine is normally more
important than its CPU speed. This means that one should have sufficient
memory for the largest BLAST database one will use, run all the searches
against this database in serial, and then run queries against another
database in serial. This ensures that each database is read into memory
only once. As of Aug. 1997 the EST FASTA file is about 500 MB, which
translates to about 170-200 MB of BLAST database. At least another
100-200 MB should be allowed for memory consumed by the actual BLAST
program. All of the FASTA databases together are about 1.5 GB; the BLAST
databases produced from them will probably take about another 1 GB.
Allowing 4 GB of disk space, to make room for software and output, is a
reasonable estimate.
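
The two rules of thumb above can be sketched in shell. Both function
names are hypothetical. The first adds the 100-200 MB program allowance
to a database size given in MB. The second only prints blastall command
lines (it does not run them), ordered so that every query against one
database comes before the next database is touched; the blastn program
and the output file naming are assumptions made for the example.

```shell
estimate_blast_memory_mb() {
    # $1 = size of the formatted BLAST database in MB
    # Prints the low and high estimates (database + 100 MB, + 200 MB).
    echo "$(( $1 + 100 )) $(( $1 + 200 ))"
}

plan_serial_searches() {
    # $1 = space-separated database names
    # $2 = space-separated query files
    # $1 and $2 are left unquoted on purpose so the shell splits them.
    for pss_db in $1; do
        for pss_query in $2; do
            echo "blastall -p blastn -d ${pss_db} -i ${pss_query} -o ${pss_query}.${pss_db}.out"
        done
    done
}
```

For example, estimate_blast_memory_mb 200 prints "300 400", and
plan_serial_searches "est nt" "q1.txt q2.txt" lists both searches
against est before either search against nt.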

OSF1 and limit
--------------

Some OSF1 users have encountered "out of memory" problems when running searches
even though there seems to be plenty of memory on the machine and the search
runs well on other platforms.  The error message would look something like:

[blastall] FATAL ERROR: CoreLib [001.000]  gi|509180|emb|X71670.1|MMP17SAR: Failed to allocate 480 bytes

Often it is sufficient to simply raise the "datasize" limit, which specifies
the maximum allowed heap size.  The "datasize" limit can be changed by executing:

limit datasize unlimited

Note that this change only applies to the current session, so it is advisable to place
this command in some file sourced at startup, such as .login or .cshrc.
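
One way to add the command to a startup file without duplicating it on
repeated runs is sketched below. The function name is hypothetical and
the function itself is written for a Bourne-style shell, although the
line it appends is the csh "limit" command shown above.

```shell
persist_datasize_limit() {
    # $1 = csh startup file, e.g. "$HOME/.cshrc"
    pdl_line='limit datasize unlimited'
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    # The line is appended only when it is not already present.
    grep -qxF "$pdl_line" "$1" 2>/dev/null || echo "$pdl_line" >> "$1"
}
```

Running persist_datasize_limit "$HOME/.cshrc" more than once still
leaves a single copy of the line in the file.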