How to Use OligoArray?






Run OligoArray
 

There are two ways to use OligoArray: through a Graphical User Interface (described just below) or from the command line.
OligoArray Graphical User Interface
 

To launch the GUI, just start OligoArray without any argument.

Before running OligoArray, you may change any parameter (you may also be interested in my tips on oligo design for microarrays):


Buttons:


Command line
 

To start OligoArray from the command line, type:

java -jar OligoArray.jar seqFile blastDB saveAs oligoLength distance TmRange TmRange(+/-) maxTm mxNbOligo listProhibited linker5prime linker3prime

All arguments are required. They are described just above in the section concerning the GUI. If you do not want to add extra sequences or reject prohibited sequences, enter an empty string ("") for the corresponding argument.
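As an illustration, the command above can be assembled programmatically so that no positional argument is forgotten. The Python sketch below is not part of OligoArray: the function name `build_oligoarray_command`, the output name `oligos.txt`, and the example values are assumptions loosely based on the yeast demo settings given on this page. It also assumes that TmRange is the target Tm, TmRange(+/-) its tolerance, and maxTm the maximum Tm allowed for secondary structure.

```python
import shlex

# Positional order taken from the command line shown above.
ARGUMENT_ORDER = [
    "seqFile", "blastDB", "saveAs", "oligoLength", "distance",
    "TmRange", "TmRange(+/-)", "maxTm", "mxNbOligo",
    "listProhibited", "linker5prime", "linker3prime",
]


def build_oligoarray_command(values, jar="OligoArray.jar"):
    """Return the argument list for `java -jar OligoArray.jar ...`.

    `values` maps each name in ARGUMENT_ORDER to its value. Optional
    sequences that are not wanted must be given as empty strings.
    """
    missing = [name for name in ARGUMENT_ORDER if name not in values]
    if missing:
        raise ValueError("missing arguments: " + ", ".join(missing))
    return ["java", "-jar", jar] + [str(values[name]) for name in ARGUMENT_ORDER]


# Example values (assumptions, loosely based on the yeast demo settings).
demo_values = {
    "seqFile": "yeast_orf.fas",
    "blastDB": "yeast_cdna.fas",
    "saveAs": "oligos.txt",   # hypothetical output name
    "oligoLength": 50,
    "distance": 1000,
    "TmRange": 87,
    "TmRange(+/-)": 5,
    "maxTm": 63,
    "mxNbOligo": 1,
    "listProhibited": "",     # no prohibited sequences
    "linker5prime": "",       # no 5' linker
    "linker3prime": "",       # no 3' linker
}

command = build_oligoarray_command(demo_values)
# shlex.join quotes the empty strings so they survive copy-pasting into a shell.
print(shlex.join(command))
```

To launch the run from Python, the list could be passed unchanged to `subprocess.run(command)`; passing a list rather than a single string keeps each empty string as its own argument.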
 



Input Files
 
Input Sequence File
        This file should contain all the sequences you want to process, in Fasta format: one comment line starting with ">", followed by the sequence, which starts on the next line. There is an important limitation concerning the comment line: for a given sequence, the comment line must be identical in the input file and in the input Blast database. I strongly suggest using a single word, a unique identifier, on this line.

        As an example, here are the first two entries of the file containing the 6343 yeast ORFs used as a demo on this web site. The file may be accessed here (9 MB).

>YAL069W
ATGATCGTAAATAACACACACGTGCTTACCCTACCACTTTATACCACCACCACATGCCATACTCACCCTC
ACTTGTATACTGATTTTACGTACGCACACGGATGCTACAGTATATACCATCTCAAACTTACCCTACTCTC
AGATTCCACTTCACTCCATGGCCCATCTCTCACTGAATCAGTACCAAATGCACTCACATCATTATGCACG
GCACTTGCCTCAGCGGTCTATACCCTGTGCCATTTACCCATAACGCCCATCATTATCCACATTTTGATAT
CTATATCTCATTCGGCGGTCCCAAATATTGTATAA
>YAL068C
ATGGTCAAATTAACTTCAATCGCCGCTGGTGTCGCTGCCATCGCTGCTACTGCTTCTGCAACCACCACTC
TAGCTCAATCTGACGAAAGAGTCAACTTGGTGGAATTGGGTGTCTACGTCTCTGATATCAGAGCTCACTT
AGCCCAATACTACATGTTCCAAGCCGCCCACCCAACTGAAACCTACCCAGTCGAAGTTGCTGAAGCCGTT
TTCAACTACGGTGACTTCACCACCATGTTGACCGGTATTGCTCCAGACCAAGTGACCAGAATGATCACCG
GTGTTCCATGGTACTCCAGCAGATTAAAGCCAGCCATCTCCAGTGCTCTATCCAAGGACGGTATCTACAC
TATCGCAAACTAG
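Since the comment line should ideally be a single unique word, it can be worth checking an input file before starting a design. The following Python sketch is one possible check, not something OligoArray provides; the function names `read_fasta_headers` and `check_headers` are hypothetical, and the example sequences are shortened versions of the two entries above.

```python
def read_fasta_headers(lines):
    """Return the comment lines (without the leading '>') in file order."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def check_headers(headers):
    """Return a list of human-readable problems found in the comment lines."""
    problems = []
    seen = set()
    for header in headers:
        # A recommended comment line is exactly one word.
        if len(header.split()) != 1:
            problems.append(f"not a single word: {header!r}")
        # Each identifier should appear only once in the file.
        if header in seen:
            problems.append(f"duplicate identifier: {header!r}")
        seen.add(header)
    return problems


# Shortened versions of the two yeast entries shown above.
example = """>YAL069W
ATGATCGTAAATAACACACACGTGCTTACCC
>YAL068C
ATGGTCAAATTAACTTCAATCGCCGCTGGTG
""".splitlines()

headers = read_fasta_headers(example)
print(headers)
print(check_headers(headers))
```

An empty list of problems means every comment line is a single word and no identifier is repeated.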
 
 

Local Limited Blast Database
        Oligonucleotide specificity is computed by searching for similar sequences in a database using the Blast program (Altschul et al. 1997, NAR 25(17):3389-402), available from NCBI. To reduce search time and increase sensitivity, I strongly suggest creating a new local Blast database limited to the sequences of interest. If you want to design oligonucleotides for gene expression studies in your favorite model organism, you should create a database containing only transcribed sequences from this organism. This is easy to do for an organism with a fully sequenced and annotated genome. Otherwise, include as many known sequences as possible. For the human genome, I usually use the 90,000 to 95,000 unique UniGene sequences.

        If you want to design oligonucleotides for every gene of an organism, you will use the input file to create the Blast database. Here are some explanations on how to format a Blast database, and here is the input file used to create the yeast Blast database used for the demo in the paper. It contains the 6343 yeast ORF sequences, each flanked by 30 nucleotides of 5' and 3' UTR.
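Because comment lines must be identical in the input file and in the file used to build the Blast database, a quick comparison of the two Fasta files can catch mismatches early. The Python sketch below is only an illustration under that assumption; the function names `fasta_headers` and `headers_missing_from_database` and the file names are hypothetical, and the demonstration writes two tiny temporary files with shortened sequences.

```python
import os
import tempfile


def fasta_headers(path):
    """Collect the comment lines (without '>') of a Fasta file as a set."""
    with open(path) as handle:
        return {line[1:].strip() for line in handle if line.startswith(">")}


def headers_missing_from_database(input_path, database_fasta_path):
    """Return input comment lines with no identical line in the database source."""
    return sorted(fasta_headers(input_path) - fasta_headers(database_fasta_path))


# Small self-contained demonstration with shortened sequences.
with tempfile.TemporaryDirectory() as folder:
    input_path = os.path.join(folder, "input.fas")
    database_path = os.path.join(folder, "database.fas")
    with open(input_path, "w") as handle:
        handle.write(">YAL069W\nATGATCGTAA\n>YAL068C\nATGGTCAAAT\n")
    with open(database_path, "w") as handle:
        handle.write(">YAL069W\nATGATCGTAA\n")
    missing = headers_missing_from_database(input_path, database_path)
    print(missing)
```

Any identifier reported here is present in the input file but has no identical comment line in the database source file.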




Output Files
oligos
        This is the file containing the main data. I use a Fasta format (see below). The comment line contains information related to the oligo; its fields are tab-delimited for easy parsing. First, I report the input sequence name, followed by the distance between the oligo 5' end and the input sequence 3' end (80 in the first example). I also report the oligo melting temperature (85 degrees C) and the putative secondary structure Tm (51.2 degrees C). Then I report the target(s) of this oligo: first what I call family members, then possible weak cross-hybridizations (see here for a definition of family and cross-hybridization).

        In the first example (YAL064W), I show a perfect oligonucleotide. There is only a single target and no cross-hybridization (reported as "null"). The second oligo (YAL063C) represents a gene family. There is no way to discriminate between the members, so this oligonucleotide will represent four genes: YAL063C, YAL065C, YAR050W and YHR211W. This oligonucleotide can also hybridize with a sequence from YOL087C, but this hybridization is weaker (see here for a definition of family and cross-hybridization).

        The second oligo has four main targets. This does not exclude the possibility that an oligo designed for one of the three other genes (YAL065C, YAR050W, YHR211W) is specific for that gene and, unlike the oligo designed for YAL063C, will not cross-hybridize to the other family members. When two genes have very similar sequences but one sequence is longer than the other, the oligo designed for the shorter sequence will hybridize with both sequences. However, the oligo designed for the longer sequence will be chosen in the part that is specific to the longer sequence. So by using these two oligos, it may be possible to determine the expression level of both genes.

        The oligo sequence is reported 5' to 3' and is on the same strand as the input sequence. So, if your sequence file contains transcribed sequences, the oligonucleotide sequence is the same as the mRNA sequence. If you want to use labeled cDNA as a probe, order the oligonucleotides using the sequence exactly as it appears in the output file.

>YAL064W        80      85      51.2    YAL064W     null
TAACGCTGCTCTTCAAATCACGGACGCAATCAGTACTTGTACCTAATTTT
>YAL063C        190     87      50.0    YAL063C YAL065C YAR050W YHR211W         YOL087C
CACCAAGAGTCTAACAAGTTCCGGGTTGAGTACTATGTCGCAACAGCCTC
>YAL062W        50      89      61.8    YAL062W     null
GCTTCGTCATGGTGGCTGACGCAATGCTTGACCAGGGAGACGTTTTTTAG
>YAL061W        250     83      39.2    YAL061W     null
GATTGACATTGATAGAGCAAGACATATGATAACGGGCAGAGTCAACATTG
>YAL060W        450     86      52.9    YAL060W     null
GGCGTTGAGGTGTTCAATCCCTCCAAGCACGGTCATAAATCTATAGAGAT
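Since the comment line is tab-delimited, the oligo file is straightforward to parse. The Python sketch below shows one way to do it; it is not part of OligoArray. The names `OligoRecord` and `parse_oligo_file` are hypothetical, and the sketch assumes that each comment line has exactly the six fields described above, that several family members or cross-hybridizations within one field are separated by spaces, and that "null" marks an empty list.

```python
from typing import List, NamedTuple


class OligoRecord(NamedTuple):
    name: str                        # input sequence name
    distance: int                    # oligo 5' end to input sequence 3' end
    tm: float                        # oligo melting temperature
    structure_tm: float              # putative secondary structure Tm
    targets: List[str]               # family members
    cross_hybridizations: List[str]  # weaker cross-hybridizations
    sequence: str                    # oligo sequence, 5' to 3'


def parse_oligo_file(lines):
    """Parse comment/sequence line pairs into OligoRecord tuples."""
    records = []
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line[1:].split("\t")
        elif line.strip() and header is not None:
            name, distance, tm, structure_tm, targets, cross = header[:6]
            # Assumption: "null" means no cross-hybridization was found.
            cross_list = [] if cross.strip() == "null" else cross.split()
            records.append(OligoRecord(
                name, int(distance), float(tm), float(structure_tm),
                targets.split(), cross_list, line.strip(),
            ))
            header = None
    return records


# The first two entries above, written with explicit tab characters.
example = [
    ">YAL064W\t80\t85\t51.2\tYAL064W\tnull",
    "TAACGCTGCTCTTCAAATCACGGACGCAATCAGTACTTGTACCTAATTTT",
    ">YAL063C\t190\t87\t50.0\tYAL063C YAL065C YAR050W YHR211W\tYOL087C",
    "CACCAAGAGTCTAACAAGTTCCGGGTTGAGTACTATGTCGCAACAGCCTC",
]

records = parse_oligo_file(example)
for record in records:
    print(record.name, record.targets, record.cross_hybridizations)
```

A record with a single target and an empty cross-hybridization list corresponds to what is called a perfect oligonucleotide above.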

        A set of 50-mer oligonucleotides designed for the yeast Saccharomyces cerevisiae can be accessed here. Parameters were set to: Length = 50 mer, Max distance = 1000 nt, Tm = 87 +/- 5 degrees C, Max Tm for structure = 63, one oligo per input sequence, no prohibited sequence and no tag sequence. The Blast database was built on yeast_cdna.fas, a file containing the 6343 yeast ORFs flanked by 30 nt of 5' and 3' UTR. The design was done using the input file yeast_orf.fas.
 
 

Rejected sequences
        When OligoArray cannot find an oligo for an input sequence, this input sequence is copied into a new file (RejectedSeq.fas). A sequence may be rejected because no oligonucleotide has a Tm inside the Tm range chosen by the user. It can also be rejected because the sequence is full of secondary structure.

I use a Fasta format for this file, so you can use it as the input file for a new design (remember to rename it before starting the new design!).
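The renaming step can be scripted so that it is not forgotten between two designs. The Python sketch below is only a convenience example; the function name `set_aside_rejected` and the new file name are hypothetical, and the demonstration runs in a temporary folder with a made-up file.

```python
import tempfile
from pathlib import Path


def set_aside_rejected(folder, new_name, rejected_name="RejectedSeq.fas"):
    """Rename the rejected-sequence file and return its new path."""
    source = Path(folder) / rejected_name
    target = Path(folder) / new_name
    if target.exists():
        # Refuse to replace a file kept from an earlier design.
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target


# Demonstration in a temporary folder with a made-up rejected sequence.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "RejectedSeq.fas").write_text(">YAL069W\nATGATCGTAA\n")
    renamed = set_aside_rejected(folder, "rejected_round1.fas")
    renamed_exists = renamed.exists()
    original_exists = (Path(folder) / "RejectedSeq.fas").exists()
    print(renamed.name)
```

The renamed file can then be given as the seqFile argument of the next run, for example with a different Tm range.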