You can use OligoArray in two ways: through a Graphical User Interface (described just below) or from the command line.
OligoArray Graphical User Interface
To launch the GUI, just start OligoArray without any argument.
Before running OligoArray, you may change any of the following parameters (you may also be interested in my tips on oligo design for microarrays):
- Oligonucleotide length: Here you can set the oligonucleotide length. Please enter an integer between 0 and 100.
- Distance 5' - Stop: Here you can define the maximal distance allowed between the oligo 5' end and the 3' end of the target sequence. This option is useful if you plan to label your probe using a cDNA synthesis step based on oligo-dT. If you use random primers or other techniques to prepare your probes, you may want to disable this limit by entering a large number (e.g. 10000). Please enter a positive integer.
- Tm range: The range of Tm values that you accept for the oligonucleotide (positive integers only). For a description of how Tm values are computed, click here.
- Max. Tm for Structure: Any oligonucleotide containing a secondary structure stable at a temperature above this threshold will be automatically rejected. For a description of how secondary structure is predicted, click here.
- Max. number of oligonucleotide: The number of oligos that you want to select for each sequence in the input file. Please enter a positive integer.
- 5' tag: You may want to use an oligo for both microarray spotting and another application. Here you can add a short sequence, such as a restriction site, at the 5' end of the oligo. This sequence will be included during specificity computation and secondary structure prediction, but not during oligo Tm computation. You can enter only G, A, T or C, 5' to 3'.
- 3' tag: Same as the 5' tag, but for the 3' end of the oligo.
- Prohibited sequences: Here you can enter a list of sequences that you do not want to see in the oligonucleotide sequence. Any oligo containing one of these sequences will be rejected. This is useful if you want to avoid oligos containing a given restriction site or a stretch of the same nucleotide (e.g. GGGG or longer). You can enter more than one sequence, using a semicolon (;) to separate them. Please enter only G, A, T or C, 5' to 3'.
- Select sequence file: Here, you can select your input file. For more details about the input file format, please click here.
- Select Blast database: Here, you can select the Blast database used for specificity computation. When you create a Blast database from a file containing a set of sequences, more than one file is created. To avoid confusion, OligoArray expects you to select the file with the .nsq suffix.
- Save As: Please choose a name for the main result file (described here).
- Run: Just click on that button to start the design process. No more actions are expected from the user.
- Cancel: Cancels the run once the design of the current sequence is completed (so it may take a few seconds to take effect).
- Exit: Stops the program and closes the window.
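The prohibited-sequence rule described above can be illustrated with a small Python sketch. It is not OligoArray's own code: the function names `parse_prohibited` and `is_rejected` are hypothetical, and the matching is a plain case-insensitive substring test, which is an assumption about how the filter behaves.

```python
from typing import List


def parse_prohibited(field: str) -> List[str]:
    """Split a semicolon-separated entry such as "GGGG;GAATTC" into sequences."""
    sequences = [s.strip().upper() for s in field.split(";") if s.strip()]
    for seq in sequences:
        # Only G, A, T or C are accepted, as stated for the GUI field.
        if set(seq) - set("GATC"):
            raise ValueError(f"only G, A, T or C are allowed: {seq!r}")
    return sequences


def is_rejected(oligo: str, prohibited: List[str]) -> bool:
    """Return True if the oligo contains any prohibited sequence (assumed rule)."""
    oligo = oligo.upper()
    return any(seq in oligo for seq in prohibited)
```

For example, `is_rejected("ATGGGGGTA", parse_prohibited("GGGG;GAATTC"))` is true because the oligo contains a run of four G.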
You can also start OligoArray from the command line. Just type:
java -jar OligoArray.jar seqFile blastDB saveAs oligoLength distance TmRange TmRange(+/-) maxTm mxNbOligo listProhibited linker5prime linker3prime
All arguments are required. They are described just above in the section concerning the GUI. If you do not want to add extra sequences or reject any prohibited sequences, please enter an empty string ("") for the corresponding argument.
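Since all twelve arguments are positional, it is easy to get their order wrong. The following Python sketch assembles the argument list in the order given above. The helper `build_command` and the file names are hypothetical. Reading `TmRange` and `TmRange(+/-)` as a central Tm and a half-width, and joining prohibited sequences with semicolons as in the GUI, are assumptions. The default values are the ones quoted for the yeast demo design.

```python
import shlex


def build_command(seq_file, blast_db, save_as, oligo_length=50, distance=1000,
                  tm=87, tm_plus_minus=5, max_structure_tm=63, max_oligos=1,
                  prohibited=(), tag_5prime="", tag_3prime="",
                  jar="OligoArray.jar"):
    """Return the java command as a list, arguments in the documented order."""
    return [
        "java", "-jar", jar,
        seq_file, blast_db, save_as,
        str(oligo_length), str(distance),
        str(tm), str(tm_plus_minus),      # assumed meaning: Tm and +/- range
        str(max_structure_tm), str(max_oligos),
        ";".join(prohibited),             # empty string when nothing is prohibited
        tag_5prime, tag_3prime,           # empty strings when no tag is wanted
    ]


# Hypothetical file names, for illustration only.
cmd = build_command("yeast_orf.fas", "yeast_cdna.fas.nsq", "oligos.fas")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` avoids shell quoting problems with the empty-string arguments.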
Input Sequence File
This file should contain all the sequences you want to process. A Fasta format is expected: one line starting with ">" for the comment, with the sequence starting on the following line. There is an important limitation concerning the comment line: for a given sequence, the comment line must be identical in the input file and in the input Blast database. I strongly suggest using only one word, a unique identifier, on this line.
As an example, here are the first two entries of the file containing the 6343 yeast ORFs used as a demo on this web site. The file may be accessed here (9 MB).
Local Limited Blast Database
Oligonucleotide specificity is computed by searching for similar sequences in a database using the Blast program (Altschul et al. 1997 NAR 25(17):3389-402) available from NCBI. In order to reduce search time and increase sensitivity, I strongly suggest creating a new local Blast database limited to the sequences of interest. If you want to design oligonucleotides for gene expression studies in your favorite model, you should create a database containing only transcribed sequences from this organism. This is easy to do for an organism with a fully sequenced and annotated genome. Otherwise, you should include as many known sequences as possible. For the human genome, I usually use the 90,000 to 95,000 unique Unigene sequences.
If you want to design oligonucleotides for every gene of an organism, you can use the input file to create the Blast database. Here are some explanations on how to format a Blast database, and here is the input file used to create the yeast Blast database used for the demo in the paper. It contains the 6343 yeast ORF sequences flanked by 30 nucleotides of 5' and 3' UTR.
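Because the comment lines must match between the input file and the file used to build the Blast database, it can help to check them before starting a long design. The Python sketch below is only an illustration: the function names are hypothetical, and it checks the three points suggested above (a single word, a unique identifier, present in the database file).

```python
def fasta_ids(lines):
    """Yield the comment line (without the leading '>') of each Fasta entry."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()


def check_ids(input_lines, database_lines):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen = set()
    db_ids = set(fasta_ids(database_lines))
    for comment in fasta_ids(input_lines):
        if len(comment.split()) != 1:
            problems.append(f"not a single word: {comment!r}")
        if comment in seen:
            problems.append(f"duplicate identifier: {comment!r}")
        seen.add(comment)
        if comment not in db_ids:
            problems.append(f"missing from database file: {comment!r}")
    return problems
```

Both arguments can be open file objects, for instance the input sequence file and the Fasta file from which the Blast database was formatted.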
oligos
This is the file containing the main data. I use a Fasta format (see below). The comment line contains information related to the oligo; the fields are tab-delimited for easy parsing. First, I report the input sequence name, followed by the distance between the oligo 5' end and the input sequence 3' end (80 in the first example). I also report the oligo melting temperature (85 degrees C) and the Tm of the putative secondary structure (51.2 degrees C). Then I report the target(s) of this oligo: first what I call family members, then possible weak cross-hybridizations (see here for a definition of family and cross-hybridization).
In the first example (YAL064W), I show a perfect oligonucleotide. There is only a single target and no cross-hybridization (reported as "null"). The second oligo (YAL063C) represents a gene family. There is no way to discriminate between the members, so this oligonucleotide will represent four genes: YAL063C, YAL065C, YAR050W and YHR211W. This oligonucleotide can also hybridize with a sequence from YOL087C, but this hybridization is weaker.
The second oligo has four main targets. This does not exclude the possibility that an oligo designed for one of the three other genes (YAL065C, YAR050W, YHR211W) is specific for that gene and, unlike the oligo designed for YAL063C, will not cross-hybridize to the other family members. When two genes have very similar sequences but one sequence is longer than the other, the oligo designed for the shorter sequence will hybridize with both sequences. However, the oligo designed for the longer sequence will be chosen in the part that is specific to the longer sequence. So by using these two oligos, it may be possible to determine the expression level of both genes.
The oligo sequence is reported 5' to 3', and its strand is the same as that of the input sequence. So, if your sequence file contains transcribed sequences, the oligonucleotide sequence is the same as the mRNA sequence. If you want to use labeled cDNA as probe, you have to use the sequence from the output file to order the oligonucleotides.
>YAL064W 80 85 51.2 YAL064W null
>YAL063C 190 87 50.0 YAL063C YAL065C YAR050W YHR211W YOL087C
>YAL062W 50 89 61.8 YAL062W null
>YAL061W 250 83 39.2 YAL061W null
>YAL060W 450 86 52.9 YAL060W null
A set of 50-mer oligonucleotides designed for the yeast Saccharomyces cerevisiae can be accessed here. Parameters were set to: Length = 50 mer, Max distance = 1000 nt, Tm = 87+/-5 degrees, Max Tm for structure = 63, one oligo per input sequence, and no prohibited sequence or tag sequence. The Blast database was built on yeast_cdna.fas, a file containing the 6343 yeast ORFs flanked by 30 nt of 5' and 3' UTR. The design was done using the input file yeast_orf.fas.
Rejected sequences
When OligoArray cannot find an oligo for an input sequence, this input sequence is copied to a new file (RejectedSeq.fas). A sequence may be rejected because no oligonucleotide has a Tm inside the Tm range chosen by the user. It can also be rejected because the sequence is full of secondary structure.
I use a Fasta format for this file, so you can use it as the input for a new design (remember to rename it before starting the new design!).
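Returning to the oligos file: its tab-delimited comment lines can be read with a few lines of Python. This is a sketch under a stated assumption about the layout, which the text does not spell out: six tab-separated fields, with the family members space-separated inside the fifth field and the cross-hybridizations (or "null") in the sixth. The names `OligoHeader` and `parse_header` are hypothetical; adjust the splitting if your file differs.

```python
from typing import List, NamedTuple


class OligoHeader(NamedTuple):
    name: str
    distance: int              # oligo 5' end to input sequence 3' end
    tm: float                  # oligo melting temperature
    structure_tm: float        # Tm of the putative secondary structure
    family: List[str]
    cross_hybridization: List[str]


def parse_header(line: str) -> OligoHeader:
    """Parse one comment line; raises ValueError if fewer than six fields."""
    fields = line.rstrip("\n").lstrip(">").split("\t")
    name, distance, tm, structure_tm, family, cross = fields[:6]
    return OligoHeader(
        name=name,
        distance=int(distance),
        tm=float(tm),
        structure_tm=float(structure_tm),
        family=family.split(),
        # "null" is how the file reports the absence of cross-hybridization.
        cross_hybridization=[] if cross == "null" else cross.split(),
    )
```

Filtering the parsed headers for entries with a single family member and no cross-hybridization would then select the oligos that are specific to one gene.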