Quick Review of Python Programming Basics

Asst. Prof. Teerasak E-kobon

11 Jan 2021

The previous Python class covered the basics of Python programming, which are quickly reviewed in this chapter. Programs (also called code or scripts) work with different types of data. Simple values such as integers, floats, complex numbers, booleans (True or False), and None are immutable, whereas a variable is a name that can be reassigned to a different value at any time. Python has no separate character type; a single character is a string of length one. There are several types of data collection, including the string ("ATGCAATT"), tuple (A = ('A', 'T', 'G', 'C')), list (A = [1, 2, 3, 4]), set (A = {'A', 'T', 'G', 'C'}), and dictionary (A = {"UAA":"STOP", "UGA":"STOP", "UAG":"STOP"}).
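The data types listed above can be tried out in a few lines. This is only a minimal sketch; all variable names are made up for illustration.

```python
# Simple immutable values
count = 4              # integer
gcFraction = 0.5       # float
z = 2 + 3j             # complex number
isDna = True           # boolean
nothing = None         # the None value

# A variable can be re-bound to a new value at any time.
count = count + 1
print(count)           # 5

# Collections
dna = "ATGCAATT"                                         # string (immutable)
bases = ('A', 'T', 'G', 'C')                             # tuple (immutable)
numbers = [1, 2, 3, 4]                                   # list (mutable)
baseSet = {'A', 'T', 'G', 'C', 'A'}                      # set: duplicates are dropped
stops = {"UAA": "STOP", "UGA": "STOP", "UAG": "STOP"}    # dictionary

numbers.append(5)          # lists can be changed in place
print(numbers)             # [1, 2, 3, 4, 5]
print(len(baseSet))        # 4
print(stops["UGA"])        # STOP

tupleIsImmutable = False
try:
    bases[0] = 'U'         # tuples cannot be changed in place
except TypeError as err:
    tupleIsImmutable = True
    print("TypeError:", err)
```

The last lines show the practical difference between a list and a tuple: appending to the list succeeds, while assigning to a tuple element raises a TypeError.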

Several operations can process these data: arithmetic operators (+, -, *, /, // (floor division), % (remainder), ** (power)) and their augmented-assignment forms (+=, *=), comparison operators (>, <, ==), the assignment operator (=), and many built-in functions and methods introduced previously (such as index(), [ : ] for indexing and slicing, in for membership checking, not in for non-membership, print(), find(), join(), split(), range(), etc.). Control statements allow conditional checking (if, if...else, and if...elif...else) and repetition (for and while loops, with continue and break to skip an iteration or leave the loop). Importing modules with the import statement allows the programmer to organize and reuse existing code.
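A short sketch of these operators and control statements, assuming Python 3 (where / always gives a float). The sequence and variable names are made up for illustration.

```python
import math                  # import gives access to code in an existing module

print(7 / 2, 7 // 2, 7 % 2, 2 ** 3)   # 3.5 3 1 8
print(math.sqrt(16))                  # 4.0

seq = "ATGCAATT"
print(seq.index('G'))        # 2
print(seq[0:3])              # ATG
print('GCA' in seq)          # True
print(seq.find('X'))         # -1 (find returns -1 when nothing is found)

codons = "ATG-CAA-TT".split('-')
print(codons)                # ['ATG', 'CAA', 'TT']
print(''.join(codons))       # ATGCAATT

# for loop with if...elif...else, continue, and break
atCount = 0
for base in seq:
    if base == 'C':
        break                # leave the loop at the first C
    elif base == 'G':
        continue             # skip G and go on to the next base
    else:
        atCount += 1         # augmented assignment
print(atCount)               # 2

# while loop that adds up 0 + 1 + 2 + 3
total = 0
i = 0
while i < 4:
    total += i
    i += 1
print(total)                 # 6
print(list(range(4)))        # [0, 1, 2, 3]
```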

Python programmers can create new functions with the def statement (def functionName(arguments):); a function takes the provided arguments/parameters and returns its output with return. Functions that serve related tasks can be grouped in the same module. For example, a function that computes the reverse complement of a sequence:

Example 1 A function for reverse complement

def reverseComplement(sequence, isDna=True):
    # Python 3: str.maketrans replaces the Python 2 function string.maketrans
    if isDna:
        sequence = sequence.replace('U', 'T')
        transTable = str.maketrans('ATGC', 'TACG')
    else:
        sequence = sequence.replace('T', 'U')
        transTable = str.maketrans('AUGC', 'UACG')
    # swap each base for its complement, then reverse the string
    complement = sequence.translate(transTable)
    reverseComp = complement[::-1]
    return reverseComp
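The translation-table mechanism behind Example 1 can be inspected on its own. The sketch below assumes Python 3, where the table is built with str.maketrans (string.maketrans exists only in Python 2); the sequence is made up for illustration.

```python
# Build a translation table that maps each DNA base to its complement.
table = str.maketrans('ATGC', 'TACG')

# The table is a dictionary from character codes to character codes.
print(table[ord('A')] == ord('T'))   # True

seq = 'ATGCAATT'
complement = seq.translate(table)    # characters not in the table are left unchanged
print(complement)                    # TACGTTAA

# A slice with step -1 reads the string backwards.
print(complement[::-1])              # AATTGCAT
```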

Question 1: Do you understand this code example? Can you execute the code in your Python terminal? Does the function work when you test it?

Python can also read and write large files. The open() function opens a file and returns a file object, read() and write() read from and write to that object, and close() closes it. The mode passed to open() is 'r' for reading the file content and 'w' for writing. For example, we can read a FASTA file by using the following readFastaFile() function.
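Before the FASTA-specific function, the basic open/write/read/close cycle can be sketched as follows. The file name demo.txt and its content are hypothetical, and the file is placed in a temporary directory so that nothing existing is overwritten.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Mode 'w' creates the file (or overwrites an existing one).
fileObj = open(path, 'w')
fileObj.write('ATGCAATT\n')
fileObj.write('GGCCTTAA\n')
fileObj.close()

# Mode 'r' opens the file for reading; read() returns the whole content as one string.
fileObj = open(path, 'r')
content = fileObj.read()
fileObj.close()
print(content)

# The with statement closes the file automatically, even if an error occurs.
with open(path, 'r') as fileObj:
    lines = [line.rstrip() for line in fileObj]
print(lines)   # ['ATGCAATT', 'GGCCTTAA']
```

Looping over the file object line by line, as in the last part, avoids loading a large file into memory all at once.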

Example 2 Function to read a FASTA file

def readFastaFile(fileName):
    # mode 'r' (the old 'rU' mode was removed in Python 3.11)
    fileObj = open(fileName, 'r')
    sequences = []
    seqFragments = []
    for line in fileObj:
        if line.startswith('>'):
            # found start of next sequence
            if seqFragments:
                sequence = ''.join(seqFragments)
                sequences.append(sequence)
            seqFragments = []
        else:
            # found more of existing sequence
            seq = line.rstrip() # remove newline character
            seqFragments.append(seq)
    if seqFragments:
        # should be the case if file is not empty
        sequence = ''.join(seqFragments)
        sequences.append(sequence)
    fileObj.close()
    return sequences

Question 2: Can you try this function on your computer with any FASTA file downloaded from the NCBI database? How would you explain the code?

Similarly, data can be written to a file with the write() method, as shown in the example below. The writeFastaSeqs() function writes multiple sequences to one FASTA file.

Example 3 A function to create a multi-FASTA file

def writeFastaSeqs(comments, sequences, fastaFile, width=60):
    fileObj = open(fastaFile, 'w')
    for i, seq in enumerate(sequences):
        # split the sequence into lines of at most width characters
        numLines = 1 + (len(seq)-1)//width
        seqLines = [seq[width*x:width*(x+1)] for x in range(numLines)]
        seq = '\n'.join(seqLines)
        # write one record per sequence: the comment line, then the wrapped sequence
        fileObj.write('> %s\n%s\n' % (comments[i], seq))
    fileObj.close()
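The line-wrapping arithmetic in Example 3 can be checked in isolation. In this sketch the sequence and the width of 10 are made up so that the result is easy to verify by eye.

```python
seq = 'ATGCAATTGGCCTTAAGGCATCG'   # 23 characters, made up for this demo
width = 10

# Number of lines needed: 23 characters at 10 per line gives 3 lines.
numLines = 1 + (len(seq) - 1) // width
print(numLines)   # 3

# Slice out each line; the last slice is simply shorter.
seqLines = [seq[width * x:width * (x + 1)] for x in range(numLines)]
print(seqLines)   # ['ATGCAATTGG', 'CCTTAAGGCA', 'TCG']

# Joining with newline characters gives the text that is written to the file.
print('\n'.join(seqLines))
```

Writing the count as 1 + (len(seq) - 1) // width rather than len(seq) // width + 1 avoids an extra empty line when the sequence length is an exact multiple of the width.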

Question 3: Can you run this function to generate a multi-FASTA file using your own arguments for the comments (the text on the line starting with '>'), the sequences, and the file name?

Question 4: If we want to read data from a CSV file (in which the values are separated by commas), can you suggest the code and demonstrate how to do this?

At the end of Python 311 and 312, the idea of object-oriented programming (OOP) was introduced. A Python object consists of attributes (data) and methods (functions that belong to the object). Objects of the same kind are described by a class, from which new objects can be instantiated and then modified. Whenever a new object is created, the class's constructor method (def __init__(self):) is called automatically. OOP supports solving a complex problem by breaking it into smaller tasks that are handled by different methods of one class or by different classes. Consider the following example of molecule, protein, and amino acid classes.
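Before the larger example, the vocabulary above (class, attribute, method, constructor, instantiation) can be seen in a minimal sketch. The DnaSequence class and its gcContent() method are hypothetical and are not part of the course code.

```python
class DnaSequence:
    # Hypothetical class, used only to illustrate the vocabulary.

    def __init__(self, name, bases):
        # The constructor runs automatically when an object is created.
        self.name = name      # attribute
        self.bases = bases    # attribute

    def gcContent(self):
        # Method: fraction of bases that are G or C.
        gc = self.bases.count('G') + self.bases.count('C')
        return gc / len(self.bases)


# Instantiate an object of the class and call a method on it.
seqObj = DnaSequence('demo', 'ATGCGC')
print(seqObj.name)          # demo
print(seqObj.gcContent())   # 4 of the 6 bases are G or C

# Attributes of an existing object can be modified.
seqObj.bases = 'ATAT'
print(seqObj.gcContent())   # 0.0
```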

Example 4 Codes for the molecule, protein, and amino acid classes

class Molecule:
    def __init__(self, name):
        if not name:
            raise Exception('name must be set to something')
        self.name = name

    def getName(self):
        return self.name

    def getCapitalisedName(self):
        name = self.getName()
        return name.capitalize()

class Protein(Molecule):
    def __init__(self, name, sequence):
        # reuse the Molecule constructor to check and store the name
        Molecule.__init__(self, name)
        # build one AminoAcid object per one-letter code
        self.aminoAcids = []
        for code in sequence:
            aminoAcid = AminoAcid(code)
            self.aminoAcids.append(aminoAcid)

    def getAminoAcids(self):
        return self.aminoAcids

    def getSequence(self):
        return [aminoAcid.code for aminoAcid in self.aminoAcids]

    def getMass(self):
        mass = 18.02 # N-terminus H and C-terminus OH
        aminoAcids = self.getAminoAcids()
        for aminoAcid in aminoAcids:
            mass += aminoAcid.getMass()
        return mass

class AminoAcid:
    # residue masses keyed by one-letter code, shared by all AminoAcid objects
    massDict = { "A": 71.07, "R":156.18, "N":114.08, "D":115.08,
                 "C":103.10, "Q":128.13, "E":129.11, "G": 57.05,
                 "H":137.14, "I":113.15, "L":113.15, "K":128.17,
                 "M":131.19, "F":147.17, "P": 97.11, "S": 87.07,
                 "T":101.10, "W":186.20, "Y":163.17, "V": 99.13 }
    acceptableCodes = set(massDict.keys())

    def __init__(self, code):
        # reject any code that has no entry in massDict
        if code not in self.acceptableCodes:
            text = 'code = "%s", must be in list %s'
            raise Exception(text % (code, sorted(self.acceptableCodes)))
        self.code = code

    def getMass(self):
        return self.massDict[self.code]

Question 5: Can you explain how the three classes work? Please also demonstrate how to use them.

Assignment 1: For this chapter's activity, each student must answer/explain Questions 1-5 and submit the code for each question in Google Classroom.

******************