How to Use OligoArray 2.1


Command line

In most cases, you will need to tune OligoArray 2.1 using the options described below. You can display the following help by typing java -jar OligoArray2.jar:
 
NAME
     OligoArray2.1.1 - Oligonucleotide design for Microarrays

SYNOPSIS
     java -jar OligoArray2.jar
     java -jar OligoArray2.jar [-h]
     java -jar OligoArray2.jar [-i] [-d] [-orRnlLDtTsxpPmNg]
 

DESCRIPTION
     OligoArray2 is a program to design specific oligonucleotide at the genome
     scale in order to perform gene expression profiling using microarrays

OPTIONS
     Command line options are described below.

     -i    The input file that contains sequences to process. Expected format is FastA. A file name is expected. This option is required

     -d    The Blast database that will be used to compute oligo's specificity. A database name is expected. This option is required

     -o    The output file that will contain oligonucleotide data. A file name is expected. Default is 'oligo.txt'

     -r    The file that will contain sequences for which the design failed. A file name is expected. Default is 'rejected.fas'

     -R    The log file that will contain informations generated during design. A file name is expected. Default is 'OligoArray.log'

     -n    The maximum number of oligonucleotides expected per input sequences. A positive integer is expected. Default is '1'

     -l    The minimum oligonucleotide length. An integer comprised between 15 and 75 is expected. (Default is '45')

     -L    The maximum oligonucleotide length. An integer comprised between 15 and 75 is expected. (Default is '47')

     -D    The maximum distance accepted between the 5' end of the oligo and the 3' end of the input sequence. A positive integer is expected. Default is '1500'

     -t    The minimum oligonucleotide Tm. A positive integer below 100 and below the maximum Tm is expected. (Default is '85')

     -T    The maximun oligonucleotide Tm. A positive integer below 100 and above the minimum Tm is expected. (Default is '90')

     -s    A temperature to use during secondary structure prediction. An oligo will be rejected if it can fold into a stable secondary structure at this temperature. A positive real is expected. Default is '65.0'

     -x    A threshold to start to consider putative cross-hybridizations. All targets hybridizing with this oligo with a Tm above this threshold will be reported. A positive integer is expected. Default is '65'

     -p    The minimum oligonucleotide GC content. A positive real below 100 and below the maximum GC content is expected. Default is '40'

     -P    The maximun oligonucleotide GC content. A positive real below 100 and above the minimum GC content is expected. Default is '60'

     -m    A list of prohibited sequences to mask in the input sequence. These sequences will never appear in the oligo sequence. Items are separated by semi-colon in the list: "CCCCC;GGGGG". Default is '""'

     -N    The number of sequences to process at the same time. Depending on the number of processors and the memory available, you can process up to 3 sequences in parallel per processors. Default is '1'

     -g    The minimum distance between the 5' end of two adjacent oligos. If you want to avoid any overlaps between oligos, you should use a value bigger than the maximum oligo length. A positive integer is expected Default is '1.5 * the average oligo size'
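
As an illustration, the options above can be assembled and range-checked before a run is launched. The Python sketch below is not part of OligoArray: the function name build_oligoarray_command and its keyword names are hypothetical, and it assumes that each flag is followed by its value as a separate argument. Only flags and ranges listed in the help text are used.

```python
def build_oligoarray_command(input_fasta, blast_db, *, jar="OligoArray2.jar",
                             output="oligo.txt", min_len=45, max_len=47,
                             min_tm=85, max_tm=90, min_gc=40, max_gc=60,
                             min_spacing=None):
    """Return an argument list for OligoArray 2.1 (hypothetical helper).

    Defaults mirror the defaults printed in the help text. The checks
    follow the ranges stated there; nothing is executed here.
    """
    # -l / -L: integers between 15 and 75, minimum not above maximum
    if not (15 <= min_len <= max_len <= 75):
        raise ValueError("lengths must satisfy 15 <= -l <= -L <= 75")
    # -t / -T: positive, below 100, minimum below maximum
    if not (0 < min_tm < max_tm < 100):
        raise ValueError("Tm values must satisfy 0 < -t < -T < 100")
    # -p / -P: positive, below 100, minimum below maximum
    if not (0 < min_gc < max_gc < 100):
        raise ValueError("GC values must satisfy 0 < -p < -P < 100")

    cmd = ["java", "-jar", jar,
           "-i", input_fasta, "-d", blast_db, "-o", output,
           "-l", str(min_len), "-L", str(max_len),
           "-t", str(min_tm), "-T", str(max_tm),
           "-p", str(min_gc), "-P", str(max_gc)]
    # -g is optional; leave it out to keep the program's own default
    if min_spacing is not None:
        if min_spacing <= 0:
            raise ValueError("-g expects a positive integer")
        cmd += ["-g", str(min_spacing)]
    return cmd


if __name__ == "__main__":
    # File and database names below are placeholders.
    print(" ".join(build_oligoarray_command("genes.fas", "genes_db",
                                            min_spacing=48)))
```

With a maximum length of 47, a spacing of 48 or more follows the -g advice for avoiding overlaps between adjacent oligos. The returned list could be handed to subprocess.run once Java and the jar file are available.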

Graphical User Interface

At this time, there is no graphical user interface. I just need time ...




Input Files
 
Input Sequence File
        This file should contain all the sequences you want to process and should be at least a subset of the file used to generate the local limited Blast database. FASTA format is expected: one comment line starting with ">", followed on the next line by the beginning of the sequence. There is an important limitation concerning the comment line: for a given sequence, the comment line must be identical in the input file and in the Blast database. I strongly suggest using a single word, a unique identifier, for this line. Again, this file should be at least a subset of the file used to generate the local limited Blast database, because OligoArray uses sequence names to identify specific targets.
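
        The comment-line requirement can be checked before a run. The Python sketch below is an illustration only, not an OligoArray feature: the function names fasta_headers and check_headers are hypothetical, and it assumes you still have the FASTA file from which the Blast database was built. It reports input comment lines that are missing from the database file and comment lines that are not a single word.

```python
def fasta_headers(lines):
    """Yield the comment line (without the leading '>') of each record."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()


def check_headers(input_lines, database_lines):
    """Compare comment lines of the input file with those of the database file.

    Returns two lists:
      missing   - input comment lines with no identical line in the database
      multiword - input comment lines that are not exactly one word
    """
    input_ids = list(fasta_headers(input_lines))
    database_ids = set(fasta_headers(database_lines))
    missing = [h for h in input_ids if h not in database_ids]
    multiword = [h for h in input_ids if len(h.split()) != 1]
    return missing, multiword


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file objects.
    database = [">YAL069W", "ACGT", ">YJR162C", "TTGA"]
    query = [">YAL069W", "ACGT", ">YAL069W hypothetical protein", "ACGT"]
    missing, multiword = check_headers(query, database)
    print("missing from database:", missing)
    print("not a single word:", multiword)
```

        Both returned lists should be empty before the input file is used.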
Local Limited Blast Database
        Oligonucleotide specificity is computed by searching for similar sequences in a database using the Blast program (Altschul et al. 1997 NAR 25(17):3389-402) available from NCBI. To reduce search time and increase sensitivity, I strongly suggest creating a new local Blast database limited to the sequences of interest. If you want to design oligonucleotides for gene expression studies in your favorite model organism, you should create a database containing only transcribed sequences from this organism. This is easy to do for an organism with a fully sequenced and annotated genome. Otherwise, you should include as many known sequences as possible. For more details on how to create this database, please check here.




Output Files
Oligonucleotides
        The oligonucleotide file contains all data related to the oligos in a tab-delimited format (one oligo per line), ready to import into any spreadsheet program. The first two oligos of the oligo.txt.ref file included in the OligoArray 2.1 package are presented below. In this example, the first oligo is specific to its target (shown in green) while the second is non-specific (shown in red).

YAL069W 49      47      -308.65 -374.59 -1014.59        85.38   YAL069W ACCACATGCCATACTCACCCTCACTTGTATACTGATTTTACGTACGC
YAL069W 126     47      -302.33 -366.4  -985.60 87.54   YAL069W; YJR162C (-16.90 -281.20 -781.58 73.24 acttaccctactctcacattccact-----ccatcacccatctctca); YLL065W (-16.90 -281.20 -781.58 73.24 acttaccctactctcacattccact-----ccatcacccatctctca); YFL063W (-19.58 -269.0 -737.57 77.20 -cttaccctactttcacattccact-----ccatggcccatctctca); YKL225W (-18.77 -281.90 -778.13 75.58 acttaccctactctcacattccact-----ccatggcccag-tctca)      ACTTACCCTACTCTCAGATTCCACTTCACTCCATGGCCCATCTCTCA

        First, the program reports the name of the input sequence (YAL069W), then the position of the 5' end of the oligo on the input sequence (position 49) and the length of the oligo (47-mer). The next four numbers are the free energy of formation of the dsDNA at 37 °C (-308.65 kcal/mol), the enthalpy (-374.59 kcal/mol), the entropy (-1014.59 cal/mol.K) and the melting temperature of the dsDNA (Tm; 85.38 °C). The next item is the list of targets for the oligo. If the oligo has a single target (specific), the program reports only the name of the input sequence (YAL069W); otherwise it reports the specific target (YAL069W; ) followed by a list of non-specific targets separated by semicolons. For each non-specific target, it reports the target name followed by the free energy at the temperature used as the cross-hybridization threshold (see option -x), the enthalpy, the entropy and the Tm of the dsDNA, and the sequence of this non-specific target (YJR162C (-16.90 -281.20 -781.58 73.24 acttaccctactctcacattccact-----ccatcacccatctctca);). When two or more non-specific targets have identical sequences, all their names are reported before the thermodynamic parameters and the sequence (name 1, name 2, name 3 (free energy enthalpy entropy Tm sequence)). Finally, the program reports the oligonucleotide sequence (5' to 3', same strand as the input sequence).
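
        The field order described above can be turned into a small reader. The Python sketch below is an illustration, not part of OligoArray: the names OligoRecord and parse_oligo_line are hypothetical, and it assumes nine tab-separated columns per line, in the order given in the description. An oligo is treated as specific when its target list contains a single entry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OligoRecord:
    name: str           # input sequence name
    position: int       # position of the oligo's 5' end on the input sequence
    length: int         # oligo length
    delta_g: float      # free energy at 37 degrees C, kcal/mol
    delta_h: float      # enthalpy, kcal/mol
    delta_s: float      # entropy
    tm: float           # melting temperature, degrees C
    targets: List[str]  # specific target first, then non-specific entries
    sequence: str       # oligo sequence, 5' to 3'

    @property
    def is_specific(self) -> bool:
        # A specific oligo lists only its own input sequence as target.
        return len(self.targets) == 1


def parse_oligo_line(line: str) -> OligoRecord:
    """Parse one tab-delimited line of the oligonucleotide output file."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    name, pos, length, dg, dh, ds, tm, targets, seq = fields
    # Targets are separated by semicolons; the values inside the
    # parentheses of a non-specific entry are separated by spaces.
    target_list = [t.strip() for t in targets.split(";") if t.strip()]
    return OligoRecord(name, int(pos), int(length), float(dg), float(dh),
                       float(ds), float(tm), target_list, seq)


if __name__ == "__main__":
    line = "\t".join([
        "YAL069W", "49", "47", "-308.65", "-374.59", "-1014.59", "85.38",
        "YAL069W", "ACCACATGCCATACTCACCCTCACTTGTATACTGATTTTACGTACGC",
    ])
    record = parse_oligo_line(line)
    print(record.name, record.position, record.tm, record.is_specific)
```

        Non-specific entries are kept as raw strings here; splitting them further into names and thermodynamic values is left out to keep the sketch short.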
 

Rejected sequences
        When OligoArray cannot find an oligo for an input sequence, this input sequence is copied to a new file (RejectedSeq.fas by default). A sequence may be rejected because no oligonucleotide has a Tm inside the Tm range chosen by the user. It can also be rejected because the sequence is full of secondary structure.

       This file is in FASTA format, so you can use it directly as input for a new design (remember to rename it before starting the new design, otherwise you may overwrite it!)

Log file
       OligoArray automatically generates a log file that describes the design process step by step. The main value of this file is that it explains why the design failed for a given sequence. This can help you figure out whether a parameter was too stringent and needs to be relaxed for a second run. Here is an example:

Start YAL069W
Running Blast (-e 38.121849817022465)... DONE
YAL069W 271     rejected due to low percent of GC: 31.0
YAL069W 264     rejected due to low Tm: 81.32269931995535
YAL069W 263     rejected due to low percent of GC: 34.0
Updating thermodata... DONE
Folding CATCATTATCCACATTTTGATATCTATATCTCATTCGGCGGTCCCAA... DONE
Testing specificity... DONE
YAL069W 258     Non specific oligo ignored at this time
Updating thermodata... DONE
Folding ACGCCCATCATTATCCACATTTTGATATCTATATCTCATTCGGCGGT... DONE
Testing specificity... DONE
YAL069W 253     Non specific oligo ignored at this time
YAL069W 248     rejected due to low Tm: 80.55241311065993
YAL069W 247     rejected due to low Tm: 80.53499309801441
YAL069W 246     rejected due to low percent of GC: 34.0
YAL069W 241     rejected due to low percent of GC: 34.0
Updating thermodata... DONE
Folding TGTGCCATTTACCCATAACGCCCATCATTATCCACATTTTGATATCT... DONE
Testing specificity... DONE
YAL069W 236     Non specific oligo ignored at this time
Updating thermodata... DONE
Folding TACCCTGTGCCATTTACCCATAACGCCCATCATTATCCACATTTTGA... DONE
Testing specificity... DONE
YAL069W 231     Non specific oligo ignored at this time
Updating thermodata... DONE
Folding GTCTATACCCTGTGCCATTTACCCATAACGCCCATCATTATCCACAT... DONE
Testing specificity... DONE
YAL069W 226     Non specific oligo ignored at this time
Updating thermodata... DONE
Folding CAGCGGTCTATACCCTGTGCCATTTACCCATAACGCCCATCATTATC... DONE
Testing specificity...