Chapter 26 Varible in Perl

Perl provides three kinds of variables: scalars, arrays, and hash(aka associative arrays). The initial character of the name identifies the particular type of variable and, hence, its functionality.

Type Character Example Is a name for:
Scalar $ $length An individual value (number or string)
Array @ @gene_list A list of values, keyed by number
Hash % %gene_annotation A group of values, keyed by string

26.1 Scalar variable

In Perl, scalar variables can be used to store mainly two types of data: string and numbers.

Perl does not differentiate between a number and a string, nor does it differentiate between integers and reals.

In order to tell the computer what to print, we need to use variables. In Perl, the name of a scalar variable starts with the dollar sign $. You can assign either a number or a string to it.

#!/usr/bin/perl
use warnings;
use strict;

#assign two strings to two variables
my $dna_seq1 = "ACCTCGGTACAGTGAATGGGAAACGTAGCTGAT";
my $dna_seq2 = "TGCCGATCGTAATAGCTCGCTATCTAGCTCGATCGTCGTA";

#Returns the length in characters of the value of EXPR
my $dna_length1 = length $dna_seq1;
my $dna_length2 = length $dna_seq2;

print "The length of first DNA sequence ($dna_seq1) is: $dna_length1\n";
print "The length of the Second DNA sequence ($dna_seq2) is: $dna_length2\n";
perl code_perl/variable_assign.pl
## The length of first DNA sequence (ACCTCGGTACAGTGAATGGGAAACGTAGCTGAT) is: 33
## The length of the Second DNA sequence (TGCCGATCGTAATAGCTCGCTATCTAGCTCGATCGTCGTA) is: 40

Here are the explanations of this script.

# assign two DNA sequences to two variables
my $dna1 = "CTCGACCAGGACGATGAATGGGCGATGAAAATCT";
my $dna2 = "CGCTAAACGCTAAACCCTAAACGCTAAACCTCTGAATCCTTAATCGCT";

The first line here is a comment. In Perl, # (pound sign) is the comment character. The comments can be used to document the program and improve the readability. During the execution, the comments will be ignored. The second and third lines declare two string scalar variables ($dna1 and $dna2) to store two DNA sequences.

#Returns the length in characters of the value of EXPR
my $dna_length1 = length $dna_seq1; 
my $dna_length2 = length $dna_seq2;

The code above has one comment line and declares two integer scalar variables to store the return values of built-on function length.

Now you can output the variables to see what are stored in them using the following code:

print "The length of first DNA sequence ($dna_seq1) is: $dna_length1\n";
print "The length of the Second DNA sequence ($dna_seq2) is: $dna_length2\n\n";

This script shows you what string scalar varibles and integer scalr varibles.

26.2 Arithmetic operations in Perl

Most arithmetic operators are binary operators; this means they take two arguments. Unary operators only take one argument. Arithmetic operators are very simple and often transparent.

Here we’re mainly going to talk about basic arithmetic opertators including addition (+), substraction (-), multiplication (*), division (/) and the modulus operation (%). Modulus (%) returns the remainder of a division (/) operation.

## If we know the lengths of two sequences and we want to calulate the sum of the lengths.
my $dna_length1 = 100;
my $dna_length2 = 200;
my $total_length = $dna_length1 + $dna_length2;
print "The total length of these two sequences is $total_length \n";

my $length_diff = $dna_length1 - $dna_length2; 
print "The length difference is $length_diff\n";

my $average_length = $total_length / 2;
print "The average length of two DNA sequences is $average_length \n";
## The total length of these two sequences is 300 
## The length difference is -100
## The average length of two DNA sequences is 150

The code above shows us how to

#Imaging the CC content of first DNA sequence is 0.5.
#How many CC do we have in first DNA?
my $cg_content = 0.5;
my $cg_number = $dna_length1 * 0.5;
#Imaging we have a 10 bps DNA sequnces,
#how many possible DNA sequences do we have?
my $dna_nucleotide = "ATCG";
my $dna_nucleo_number = length $dna_nucleotide;
my $dna_length = 10;
my $possible_number = $dna_nucleo_number ** $dna_length;
print "We have $possible_number possibilities.\n";
## We have 1048576 possibilities.

26.2.1 Shorthand operations

The expression $x += 3; is the shorthand version of $x = $x + 3;, they have exactly the same result:

use strict;
use warnings;

my $dna_length = 10;
print "DNA length: $dna_length\n";

$dna_length += 3;
print "DNA length: $dna_length after '\$dna_length += 3'\n";

$dna_length -= 3;  
print "DNA length: $dna_length after '\$dna_length -= 3'\n";
## DNA length: 10
## DNA length: 13 after '$dna_length += 3'
## DNA length: 10 after '$dna_length -= 3'

26.2.2 Auto increment and auto decrement

++ and -- are provided for the auto increment and auto decrement operators. They increase and decrease respectively the value of a scalar variable by 1.

use strict;
use warnings;

my $dna_length = 10;
print "DNA length: $dna_length\n";

$dna_length ++;
print "DNA length: $dna_length after '\$dna_length ++'\n";

$dna_length --;
print "DNA length: $dna_length after `\$dna_length --`\n";
## DNA length: 10
## DNA length: 11 after '$dna_length ++'
## DNA length: 10 after `$dna_length --`

26.2.3 inf and -inf in Perl

In Bioinformaticics analysis, you may need to calculate fold change of expression between two samples (let’s say sample 1/sample 2). If sample 2 has an expression level of 0,

Deviding a zero will lead an eror message: Illegal division by zero at ...

print 2/0, "\n";

But Perl can recognize special values inf,+inf,-inf`:

When we use them in a multiplication, we get “inf” again:

$result = "inf" * 100;
print "$result\n";
## Inf
$result = "-inf" * 0;
print "$result\n";
## NaN
$result = "-inf" * 1;
print "$result\n";
## -Inf

Dividing by an infinity variant returns its limit, which is 0:

$result = 1/inf;
print "$result\n";
## 0
$result = 1/-inf;
print "$result\n";
## 0

You might see -0 if the underlying C library handles that. It’s the same as regular 0.

26.3 use strict; user warnings and my

For starters, use strict; (and to a lesser extent, use warnings;) helps find typos in variable names. Even experienced programmers make such errors. A common case is forgetting to rename an instance of a variable when cleaning up or refactoring code.

Using use strict; use warnings; catches many errors sooner than they would be caught otherwise, which makes it easier to find the root causes of the errors. The root cause might be the need for an error or validation check, and that can happen regardless of programmer skill.

What’s good about Perl warnings is that they are rarely spurious, so there’s next to no cost to using them.

In the script below, $dna_lenght2 is a typo. If you run this script, it will give you the output without any error message, although it’s not the right output.

#!/usr/bin/perl

#assign 2 numbers to 2 variable
$dna1 = "CTCGACCAGGACGATGAATGGGCGATGAAAATCT";
$dna2 = "CGCTAAACGCTAAACCCTAAACGCTAAACCTCTGAATCCTTAATCGCT";

#Returns the length in characters of the value of EXPR
$dna_length1 = length $dna1;
$dna_length2 = length $dna2;

print "First DNA: $dna1; Length: $dna_length1\n";
print "Second DNA: $dna2; Length: $dna_length2\n";

#calculate the total length of two DNA sequences
$tot_length = $dna_length1 + $dna_lenght2;
print "The total length of two DNA sequences is $tot_length \n";

Let’s try to run this script:

perl code_perl/var_assign_no_strict_warnings.pl
First DNA: CTCGACCAGGACGATGAATGGGCGATGAAAATCT; Length: 34
Second DNA: CGCTAAACGCTAAACCCTAAACGCTAAACCTCTGAATCCTTAATCGCT; Length: 48
The total length of two DNA sequences is 34 

So the script above is supposed to output the length of two DNA sequences and the sum of the lengths.

In the chunk of code above, $dna_lenght2 is an empty varible without storing any information. By defaut, Perl considers this as ZERO when doing plus operation. Although there was no error message given here, we infact have an incorrect output.

If we add use strict and use warnings, we need to decare each variable in the script. Let us see what will happen if we have an typo.

#!/usr/bin/perl
use warnings;
use strict;

#assign 2 numbers to 2 variable
my $dna1 = "CTCGACCAGGACGATGAATGGGCGATGAAAATCT";
my $dna2 = "CGCTAAACGCTAAACCCTAAACGCTAAACCTCTGAATCCTTAATCGCT";

#Returns the length in characters of the value of EXPR
my $dna_length1 = length $dna1;
my $dna_length2 = length $dna2;

print "First DNA: $dna1; Length: $dna_length1\n";
print "Second DNA: $dna2; Length: $dna_length2\n";

#calculate the total length of two DNA sequences
my $tot_length = $dna_length1 + $dna_lenght2;
print "The total length of two DNA sequences is $tot_length \n";
Global symbol "$dna_lenght2" requires explicit package name (did you forget to declare "my $dna_lenght2"?) at code_perl/var_assign_strict_warnings.pl line 17.
Execution of code_perl/var_assign_strict_warnings.pl aborted due to compilation errors.

Now if we run this script, we encounter error mesage and the script can’t be sucessfuly excuted.

Global symbol "$dna_lenght2" requires explicit package name 
(did you forget to declare "my $dna_lenght2"?) at code_perl/
var_assign_strict_warnings.pl line 17.
Execution of code_perl/var_assign_strict_warnings.pl aborted
 due to compilation errors.

26.4 Array

26.4.1 Init an array

my @base_pair = ('A', 'T', 'C', "C", "G");
print "";
print @base_pair, "\n";
print join("\t", @base_pair), "\n";
## ATCCG
## A    T   C   C   G

26.4.2 Array index

my @base_pair = ('A', 'T', 'C', "C", "G");
### Extract the first element in the array
my $first_base = $base_pair[0];
print "First base is: $first_base\n\n"; 

### Extract the last element in the array
my $last_base = $base_pair[4];
print "Last base is: $last_base\n\n"; 

### Extract the last element in the array using index `-1`
my $last_base = $base_pair[-1];
print "Last base using '-1' is: $last_base\n\n"; 

### Extract the last element in the array using index `$#`
my $last_base = $base_pair[$#base_pair];
print "Last base using '\$#' is : $last_base\n\n"; 
## First base is: A
## 
## Last base is: G
## 
## Last base using '-1' is: G
## 
## Last base using '$#' is : G

26.4.3 Length of the array

To get the length of the array, you can use scalar @array or just array.

my @gene_expr = (1, 3, 10);
my $len = scalar @gene_expr;  ## or @gene_expr
print "Array has: @gene_expr\n";
print "Length of array: ", $len, "\n";
## Array has: 1 3 10
## Length of array: 3

Another way is to use $#array. $#array will return the index of the last element. Since the index starts with 0, to get the length we use $#array + 1.

If the length of array is empty, $#array will return -1.

my @gene_expr = ();
my $len = $#gene_expr + 1;
print "Array has $len element\n";
print "Length of array: ", $len, "\n";
## Array has 0 element
## Length of array: 0

26.4.4 Sort Arrays in Perl

26.4.4.1 Sort alphebetically

my @base_pair = ('A', 'T', 'C', "C", "G");
my @sorted_bp = sort @base_pair;
print "Array before sorted: ", "@base_pair", "\n";
print "Array after sorted: ", "@sorted_bp", "\n";
## Array before sorted: A T C C G
## Array after sorted: A C C G T

26.4.4.2 Sort numerically

To sort an array numerically, we use spaceship operator: <=>.

my @genome_coor = (100, 300, 200, 500);
my @sorted_coor = sort {$a <=> $b} @genome_coor;

print "Array before sorted: ", "@genome_coor", "\n";
print "Array after sorted: ", "@sorted_coor", "\n";
## Array before sorted: 100 300 200 500
## Array after sorted: 100 200 300 500

Similarly, array can be also sorted numerically in decreasing order.

my @genome_coor = (100, 300, 200, 500);
## {$a <=> $b} is modified as {$b <=> $a} 
my @sorted_coor = sort {$b <=> $a} @genome_coor;  

print "Array before sorted: ", "@genome_coor", "\n";
print "Array after sorted: ", "@sorted_coor", "\n";
## Array before sorted: 100 300 200 500
## Array after sorted: 500 300 200 100

26.4.5 Use push, pop, shift and unshift in Perl

26.5 Hash in Perl

A Perl hash is defined by key-value pairs. Perl stores elements of a hash in such an optimal way that you can look up its values based on keys very fast.

With the array, you use indices to access its elements. However, you must use descriptive keys to access hash’s element. A hash is sometimes referred to as an associative array.

Like a scalar or an array variable, a hash variable has its own prefix. A hash variable must begin with a percent sign ( %). The prefix % looks like key/value pair so remember this trick to name the hash variables.

The following example defines a simple hash.

my %gene_info = ("gene1"=>"ROS",
                 "gene2"=>"WRKY",
                 "gene3"=>"WRKY"
                );

print "$gene_info{gene1}\n";
print "$gene_info{gene2}\n";
print "$gene_info{gene3}\n";
## ROS
## WRKY
## WRKY
my %gene_info = ("gene1"=>"ROS",
                 "gene2"=>"WRKY",
                 "gene3"=>"WRKY"
                );
my @hash_key  = keys %gene_info;
@hash_key = sort @hash_key;
#print "Keys of \%gene_info: ", @hash_key, "\n";
print "Keys of \%gene_info: ", join("\t", sort keys %gene_info), "\n";
#print "Keys of \%gene_info: ", keys %gene_info, "\n";
## Keys of %gene_info: gene1    gene2   gene3
my %gene_info = ("gene1"=>"ROS",
                 "gene2"=>"WRKY",
                 "gene3"=>"WRKY"
                );
foreach(sort keys %gene_info){
    print "The value of  is: ", $gene_info{}, "\n";
}
## The value of gene1 is: ROS
## The value of gene2 is: WRKY
## The value of gene3 is: WRKY

26.5.1 Sort keys numerically using sort {$a<=>$b}

my %read_dep = ("1"=>"100",
                 "2"=>"200",
                 "3"=>"50",
                 "10"=>"20",
                );

### sort in increasing order
foreach(sort {$a<=>$b} keys %read_dep){
    print "The value of  is: ", $read_dep{}, "\n";
}

print "\nSort in descending order: \n";
### sort {$b<=>$a} if you want to sort in descending order python
foreach(sort {$b<=>$a} keys %read_dep){
    print "The value of  is: ", $read_dep{}, "\n";
}
## The value of 1 is: 100
## The value of 2 is: 200
## The value of 3 is: 50
## The value of 10 is: 20
## 
## Sort in descending order: 
## The value of 10 is: 20
## The value of 3 is: 50
## The value of 2 is: 200
## The value of 1 is: 100

26.5.2 Use exists function on a hash

Given an expression that specifies an element of a hash, exists returns true if the specified element in the hash has ever been initialized, even if the corresponding value is undefined.

my %gene_info = ("gene1"=>"ROS",
                 "gene2"=>"WRKY",
                 "gene3"=>"WRKY"
                );
if(exists $gene_info{"gene1"}){
    print "The key 'gene1' exists in \%gene_info\n";
}
## The key 'gene1' exists in %gene_info