Laboratory Information Management System
The “General” tab is used to upload and manage genomes. The “Assembly” tab is used to create assemblies of contigs and align them to chromosomes.
From the GPM main page, click on the “General” tab on the top toolbar.
Before: all contigs are unplaced.
After: contigs are numbered and assigned to chromosomes.
To view the entire chromosome and all of its assigned contigs, click on the Chromosome number.
To view two assemblies side by side, which makes visualization and gap filling easier, select a companion assembly from the drop-down menu in chromosome view.
It will appear on top of the target assembly. Note that some contigs will not be aligned with their start and end points on the reference genome and will have to be moved.
Here is an example of what it looks like when a contig needs to be moved.
The alignment (red and yellow lines) of the example contig (top blue line) begins much earlier in the reference genome (middle green line) than where the contig is currently placed.
The general idea is to determine where the alignment begins by following the path of the red and yellow lines to where they intersect with the reference genome, and then move the contig to that point.
By scrolling to the left we can determine where that point of origin is.
From here, scroll back to the contig to be moved, right click on the blue portion, and select “Edit Contig”.
In this example, the alignment point of origin was approximately 40,150,000 bp. We will change the “Estimated Position” field to this value.
Shown is the new placement at 40,150,000 bp. The contig now aligns well with its counterpart contig from the companion assembly.
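The reasoning behind the “Estimated Position” value can be sketched in a few lines of Python. This is only an illustration of the idea, not how GPM computes anything internally: the `Hit` type, the `estimate_position` function, and the second sample hit are hypothetical, and coordinates are assumed to be 1-based.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment block between a contig and the reference (1-based)."""
    contig_start: int  # where the block starts on the contig
    ref_start: int     # where the same block starts on the reference


def estimate_position(hits: list[Hit]) -> int:
    """Project the contig's first base onto the reference using the leftmost hit."""
    if not hits:
        raise ValueError("no alignment hits to estimate from")
    first = min(hits, key=lambda h: h.ref_start)
    # Shift back by the part of the contig that lies before the aligned block.
    return max(1, first.ref_start - (first.contig_start - 1))


# Illustrative values: the first hit mirrors the 40,150,000 bp example above.
hits = [Hit(contig_start=1, ref_start=40_150_000),
        Hit(contig_start=120_001, ref_start=40_270_000)]
print(estimate_position(hits))  # 40150000
```

When the alignment starts partway into the contig, the estimate is shifted left by the unaligned prefix so that the aligned blocks line up with the reference.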
To view a single contig and the sequence(s) that comprise it, simply click on the specified contig number.
Shown in green is the sequence that contig 5 in this example is composed of. Note that this contig is made of a single sequence. Multi-sequence contigs will be covered later under “Gap Filling and Merging Contigs”.
In this instance a gap on the bottom assembly will be filled using a sequence from the top assembly.
The blue circle emphasizes a break or gap between two contigs.
The purple square shows that the sequence in the top assembly bridges the gap on the bottom and shares alignment in the reference genome on both sides of the gap.
Meeting these criteria allows the top sequence to be used to fill the bottom gap.
There are three pieces of information needed to fill this type of gap.
First, the unique identification number of the sequence being used to fill the gap must be known. This is found by left clicking on the sequence (green portion) of the top contig.
Next, determine the contig number of the first piece of the gap, labeled “2”. To find it, right click the blue portion of the first contig and select “Edit Contig”. Here the contig number is “Ctg56”.
Last is the number of the second contig in the bottom assembly, on the far side of the gap. Circled in blue, it is “Ctg57”. With these three pieces of information we can now fill the gap.
To fill the gap, again right click on the initial contig before the gap, in this case Ctg56, and select “Edit Contig”. In the “Insert Sequence” field, enter the identification number of the sequence being used to bridge the gap. Finally, check the box labeled “Append”, type in the number of the contig that follows the gap, and click “Save”. The gap has now been filled, but redundant alignments still need to be filtered out of the new contig.
Closing the chromosome view, we can now see that the new contig 56 contains 3 different sequences.
Entering contig view by clicking on contig 56, we can see three sequences present.
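The bookkeeping behind this fill can be pictured with a small Python model. It is a sketch under stated assumptions rather than GPM's actual data structures: the `Contig` class, the `fill_gap` function, and all sequence identification numbers below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contig:
    number: int
    sequence_ids: list[int] = field(default_factory=list)


def fill_gap(first: Contig, bridge_id: int, second: Contig) -> Contig:
    """Merge: sequences of the first contig, the bridging sequence, then the second contig."""
    return Contig(first.number, first.sequence_ids + [bridge_id] + second.sequence_ids)


# Hypothetical sequence IDs; only the contig numbers come from the example.
ctg56 = Contig(56, [1001])
ctg57 = Contig(57, [1002])
merged = fill_gap(ctg56, 2001, ctg57)
print(merged.number, len(merged.sequence_ids))  # 56 3
```

The merged contig keeps the number of the first contig and holds three sequences in order, which matches what the contig view shows for contig 56.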
In order to visualize the alignment between sequences, they must be BLASTed against each other.
This is done by right clicking the first sequence labeled “1”, selecting the alignment tab, and selecting BLAST2SEQ.
Now alignment between each of the sequences can be visualized.
The redundant overlap can now be filtered out of the combined contig.
To do this, right click the yellow alignment and select “Smart Redundancy Filter” for each overlapping portion.
Here we can see the final result of the gap filling and redundancy filtering process for contig 56.
The white or clear sections of the sequences are being hidden, while the green portions are being used in the final contig 56.
Before merging
After merging
If a misassembly is made while merging contigs or filling gaps, it can be corrected with the “Break Contig” function.
To use it, open the contig being modified, right click on any sequence in the contig, and select “Break Contig”.
Drag and drop the “Break From Here” block to the point where the contig needs to be separated and click “Break”.
The separated pieces will receive new contig identification numbers.
Sometimes, when attempting to fill gaps, a sequence will need to be re-oriented prior to redundancy filtering.
To do this, right click the appropriate sequence and click “Flip Sequence”.
The result often looks much more manageable and can be more efficiently filtered for redundancy afterwards.
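For readers who want to see what re-orienting a DNA sequence means at the base level, here is a minimal Python sketch. It assumes that flipping corresponds to taking the reverse complement, which is the usual meaning of changing a sequence's orientation; the source does not describe what “Flip Sequence” does internally, and the `reverse_complement` function is illustrative only.

```python
# Complement table for DNA bases in both cases; N (unknown base) stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read from the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCCN"))  # NGGCAT
```

Applying the function twice returns the original sequence, so a flip can always be undone by flipping again.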
Occasionally misassemblies occur, and some contigs will be much longer than they should be, or contigs will appear to be missing from certain chromosomes.
With the alignment checker function, it is easy to determine if a long sequence with little identity to the reference genome belongs elsewhere.
Right click on the sequence of interest and select “Alignment Checker” from the alignment tab.
Similar to the BLAST2SEQ function, this BLASTs the designated sequence. However, the “Sequence-2” field is left blank, which causes the designated sequence to be BLASTed against all of the sequences in the genome.
The output can be easily filtered by a variety of parameters which should provide information as to where the indicated sequence might belong.
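The same kind of filtering can be done outside the interface if the hits are available as text. The sketch below assumes the hits are in the standard 12-column BLAST tabular layout (outfmt 6); whether GPM exports its results in this layout is an assumption, and the function names, thresholds, and sample rows are made up for illustration.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular format (outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tsv_text, min_identity=95.0, min_length=1000):
    """Keep hits with at least the given percent identity and alignment length."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS, delimiter="\t")
    return [row for row in reader
            if float(row["pident"]) >= min_identity and int(row["length"]) >= min_length]


def aligned_length_by_subject(hits):
    """Total aligned length per subject sequence, largest first."""
    totals = {}
    for hit in hits:
        totals[hit["sseqid"]] = totals.get(hit["sseqid"], 0) + int(hit["length"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up hits for illustration only.
ROWS = [
    ("seq1", "ctgA", "99.1", "5000", "45", "2", "1", "5000", "100", "5099", "0.0", "9000"),
    ("seq1", "ctgB", "99.5", "12000", "60", "3", "5001", "17000", "1", "12000", "0.0", "21000"),
    ("seq1", "ctgB", "98.7", "3000", "39", "1", "17001", "20000", "12001", "15000", "0.0", "5400"),
    ("seq1", "ctgC", "82.0", "800", "144", "5", "300", "1099", "50", "849", "1e-50", "600"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)
print(aligned_length_by_subject(filter_hits(SAMPLE)))
```

The subject with the most high-identity aligned bases is a reasonable first candidate for where the sequence belongs.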
Often many small (10-400 kb) contigs remain unplaced after an assembly is finished. These can be downloaded and BLASTed, either individually or all together, to determine whether they are due to contamination.
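Once the unplaced contigs have been downloaded, picking out those in the 10-400 kb range before BLASTing them can be scripted. The following Python sketch assumes the download is a plain FASTA file; the `read_fasta` and `select_by_length` functions and the demo records are hypothetical.

```python
def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def select_by_length(records, min_bp=10_000, max_bp=400_000):
    """Keep records whose sequence length falls within [min_bp, max_bp]."""
    return [(name, seq) for name, seq in records if min_bp <= len(seq) <= max_bp]


# Made-up records: 500 bp, 20,000 bp, and 500,000 bp.
demo = ">small\n" + "A" * 500 + "\n>mid\n" + "ACGT" * 5000 + "\n>big\n" + "G" * 500_000 + "\n"
print([name for name, _ in select_by_length(read_fasta(demo))])  # ['mid']
```

The selected records can then be written back out as FASTA and submitted to BLAST against a database of likely contaminants.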