Chapter 31 Practical Perl program

31.1 Add annotation information to DESeq2 results

Imaging we have a table that stores that differentially expressed genes information.

It includes three columns (tab-delimited):

cat data/DEG_list.txt
## #gene_id log2fc  p-val
## gene1    2   0.01
## gene2    3   0.04
## gene3    -2  0.06
## gene4    -8  0.001

We have another table which has the annotation information of each gene:

cat data/gene_annotation.txt
## gene_id  gene_shortname
## gene1    ROS
## gene2    WRKY
## gene3    ZmCCT
## gene4    WRKY1
## gene5    WRKY2
## gene6    wrky
## gene10   MCU

31.2 Merge overlap genomic regions

#!/usr/bin/perl -w
use strict;

my ($DMR, $out) = ;

open DMR, $DMR or die "File not found: ";
open OUT, "+>$out" or die "";

#read  a  line,
# go to next line
#     check overlap
#      if overlap: merge two regions. Then go to the next line.
#      if not overlap: print whatever we have now; then start from a line
while(my $region = <DMR>){
    chomp $region;
    my ($chrom1, $start1, $end1) = split(/\t/, $region);
    BLOCK:{
        my $new_region = <DMR>;

        if(!$new_region){
            print OUT "$chrom1\t$start1\t$end1\n";
        }else{
            chomp $new_region;
            my ($chrom2, $start2, $end2) = split(/\t/, $new_region);
             ## start1===============end1
             ##           start2=====================end2
             ## start1===============================new_end1
            if($chrom2 eq $chrom1 && $start2 >= $start1 && $start2 <=$end1){
                $end1 = $end2;
                redo BLOCK;
            }else{
                print OUT "$chrom1\t$start1\t$end1\n";
                ($chrom1, $start1, $end1) = ($chrom2, $start2, $end2);
                redo BLOCK;
            }
        }
    }
}
cat data/DMR_region.txt
## chr1 100 200
## chr1 150 250
## chr1 200 300
## chr1 500 600
## chr1 550 650
## chr2 300 800
## chr2 400 1000
## chr3 500 1000
perl code_perl/genomic_coordinate_merge.pl  data/DMR_region.txt  data/DMR_region_merged.txt 
cat data/DMR_region_merged.txt 
## chr1 100 300
## chr1 500 650
## chr2 300 1000
## chr3 500 1000