Todos Santos Computational Biology and Genomics 2018

Introduction to Python

In this exercise, we will go through some of the basics of Python using Jupyter Notebook, the Python interpreter, and scripts.

For a web-based book that loosely follows the exercises below, see Python for Everybody by Charles R. Severance (https://www.py4e.com/html3/).

OUTLINE

  1. Python basics.
  2. Values, variables, and expressions.
  3. Conditional statements.
  4. Further Python training.

Part 1. Python basics

See Python for Everybody, chapter 1 (https://www.py4e.com/html3/).

Jupyter Notebook

We will use Jupyter Notebook to demonstrate some of the basics of Python programming. Jupyter Notebook provides an interactive, user-friendly way of writing and testing snippets of code, but for more complex programs, scripts (described below) are much more efficient.

The print() function

For our first piece of code, we will have Python output a message using the print() function (to execute the code in a code cell, press Shift + Return):

In [1]:
print('Hello, world!')
Hello, world!

Python is very particular about syntax. Try removing the parentheses:

In [2]:
print 'Hello, world!'
  File "<ipython-input-2-788c64630141>", line 1
    print 'Hello, world!'
                        ^
SyntaxError: Missing parentheses in call to 'print'

Try removing the quotation marks:

In [3]:
print(Hello, world!)
  File "<ipython-input-3-0f666b979d54>", line 1
    print(Hello, world!)
                      ^
SyntaxError: invalid syntax

Try changing the single quotes to double quotes:

In [4]:
print("Hello, world!")
Hello, world!

Notice that Python doesn't distinguish between single and double quotes. But when delimiting a string, the style must match on either end ('string' or "string", not 'string"). Sometimes it's useful to use one or the other, for example when the string itself contains a quotation mark.
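As a small illustration of when one quote style is more convenient than the other, here is a minimal sketch. The variable names and messages are made up for this example:

```python
# Double quotes let the string contain an apostrophe without any escaping.
message_a = "It's a DNA sequence."

# Single quotes let the string contain double quotes.
message_b = 'The codon "ATG" encodes methionine.'

print(message_a)
print(message_b)
```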

If we want the message to span multiple lines, we can use triple quotes:

In [5]:
print('''Hello,
world!''')
Hello,
world!

The Python interpreter

Open a terminal and type python --version at the prompt ($). If your default version of Python is 2.x and you've installed Python 3.x, you can invoke it by typing python3. Python 2 and Python 3 have important differences, so be sure to use Python 3 or learn these differences.

Start the interpreter by typing python (or python3). At the interpreter prompt (>>>), use the print() function to print the phrase "I love Python!".

Python scripts

The Jupyter notebook and the Python interpreter are great for writing and testing snippets of code, but when we want to actually create a Python program we write the instructions in a file, often called a script, using text editing software (e.g. gedit, TextWrangler, or Notepad++).

Open a text editor and create a new document. Save the document as first_script.py in a new folder called python_scripts or some other descriptive name. The .py extension is the conventional way of designating a file as a Python script, but on Unix-based operating systems it is not required to execute the script.

Everything in the file will be interpreted as code, unless it's commented out with a #.

Write a Python script that outputs the message "My first Python script!". The text editing software you use should color the code to make it easier to read. Adding the .py extension should be sufficient to trigger syntax coloring within the file.

To execute the script from the command line, type python (or python3, if Python v2 is the default) followed by the name of the script (e.g. python3 first_script.py).
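As a rough sketch of what a commented script might look like, consider the file below. The file name example_script.py and the message are placeholders chosen for this illustration, not part of the exercise:

```python
# example_script.py (hypothetical file name)
# Lines that start with # are comments and are ignored by Python.

message = 'Hello from a script!'  # text after a # on a code line is also a comment
print(message)
```

Assuming the file is saved under that name, it could be run with python3 example_script.py.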

Math

Python does math in a fairly intuitive way. For example, what is 1 plus 1?

In [6]:
1+1
Out[6]:
2

What is 1 divided by 2?

In [7]:
1/2
Out[7]:
0.5

What is 2 times 3?

In [8]:
2*3
Out[8]:
6

What is 2 cubed ('to the power of' uses the syntax **)?

In [9]:
2**3
Out[9]:
8

Try some more complex math. Does Python follow conventional rules for precedence of mathematical operations?

In [10]:
10-3*2
Out[10]:
4

In math, we often work with variables, such as x and y. Try creating a variable x and assigning a value to it, just like you would if you were doing an algebra problem.

In [11]:
x = 4

Now use the variable in an equation.

In [12]:
x*2
Out[12]:
8

Try returning the value of the variable using the print() function.

In [13]:
print('x')
x

If you followed the syntax used for the print function in the earlier example, you probably got as output exactly what you entered. What happens if you remove the quotation marks?

In [14]:
print(x)
4

Try returning the value of a math operation using the print() function. Test what happens when you include or exclude quotation marks.

In [15]:
print('4*2')
4*2
In [16]:
print(4*2)
8

Write a script that does some basic math, such as calculating the square root of 9, stores the result as a variable, such as x, and then prints the result to the terminal. Name the script something like math_test.py and save it in the scripts folder you created earlier.

Modify the above script so that the print function specifies that the value being returned is the variable x (or whatever you called it). For example, the output might be: x = 3. Hint: commas are used to separate the things you want the print function to output, such as a literal string (e.g. 'x =') and a variable (e.g. x).
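To illustrate the comma syntax from the hint without solving the exercise itself, here is a minimal sketch that uses a different calculation. The variable name total is arbitrary:

```python
# Store the result of a calculation in a variable.
total = 2 ** 5

# Comma-separated items are printed in order, separated by a single space.
print('total =', total)
```

The quoted part is printed literally, while the unquoted name is replaced by the value it holds.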


Part 2. Values, variables, and expressions

See Python for Everybody, chapter 2 (https://www.py4e.com/html3/02-variables).

Values

We have now seen examples of two types of values: integers and strings. As demonstrated using the print() function, these two types of values are interpreted differently. Strings, which are just sequences of characters, such as Hello, world!, have the designation str. Integers, which are of course whole numbers, are one type of numerical value, of type int. Floating-point numbers (numbers containing a decimal point) belong to a second numerical type called float. Numbers of either type can be written in quotes and treated as strings, but a string is not treated as a number, as we demonstrated using the print() function.

Variables

Variables can be assigned numbers (either int or float) or strings (str). For example, we can create a variable dna, and assign a sequence of As, Cs, Ts, and Gs to it as follows (recall that variables are assigned with the syntax variable_name = value):

In [17]:
dna = 'ACTG'

Because we are assigning a string to the variable dna, as with the print() function, the value has to be in quotes. What if we were assigning a number to the variable dna?

In [18]:
dna = '5'

Things start to get a little bit tricky. If we include quotes around a number, then it becomes a string, and a string no longer has a numerical value. So even though the variable dna may appear to be a number, whether it behaves as one depends on whether it was assigned with or without quotes.

Assign a number to a variable without quotes:

In [19]:
num = 5

Use the variable in a mathematical operation:

In [20]:
num + 5
Out[20]:
10

Now try assigning a number to a variable with quotes:

In [21]:
num = '5'

Try using the variable in a mathematical operation:

In [22]:
num + 5
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-a928a9003eed> in <module>()
----> 1 num + 5

TypeError: Can't convert 'int' object to str implicitly

To figure out what type of value a variable is, use type(variable_name):

In [23]:
type(num)
Out[23]:
str
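The three value types described above can be checked with type() and converted with str() and int(). The following is a minimal sketch with arbitrary example values. Note that when type() is wrapped in print(), the type is displayed as <class 'str'> rather than the bare str shown by the notebook:

```python
print(type(42))      # an integer
print(type(0.5))     # a floating-point number
print(type('ACTG'))  # a string

# Convert between types explicitly.
as_text = str(42)      # the string '42'
as_number = int('42')  # the integer 42

print(as_text + '1')   # string concatenation
print(as_number + 1)   # integer addition
```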

Variable names can be just about anything in Python, but they cannot start with a number or contain spaces or special characters (underscores are OK). It is best to give variables descriptive names and use lowercase letters; in particular, the first letter should be lowercase. Although it is permissible, it is confusing to give a variable the same name as a function (such as print), and Python has ~30 special keywords that are off limits.
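One way to check these naming rules is with the standard library's keyword module and the string method isidentifier(). This is only a sketch, and the candidate names are made up for the example. The exact number of keywords varies between Python versions:

```python
import keyword

# The full list of reserved keywords for the running Python version.
print(keyword.kwlist)

# Reserved words cannot be used as variable names.
print(keyword.iskeyword('for'))  # True
print(keyword.iskeyword('dna'))  # False

# isidentifier() tests whether a string has the form of a valid name.
print('dna_seq'.isidentifier())  # True
print('2nd_seq'.isidentifier())  # False, starts with a number
print('my seq'.isidentifier())   # False, contains a space
```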

Statements

In Python, a statement is any code that can be executed. 1+1 is a statement. dna = "ATGCC" is also a statement. Statements can be thought of as any action or command.

Expressions

An expression is code that represents a value. A number or a string is an expression. 1+1 is an expression. Basically, anything that has a value is an expression. It can be a single value or a combination of values, variables, operators, and calls to functions, as long as it boils down to a single value. Expressions are things that need to be evaluated. As we saw earlier, in interactive mode the value of an expression is returned upon hitting enter, but in a script it is not. Any section of code within a statement that evaluates to a value is an expression. In the example below, 5+7 is an expression within an assignment statement:

In [24]:
val = 5+7

Operators

Earlier we introduced several mathematical operators: +, -, /, *, and **. The operator % is called the modulo operator; it returns the remainder of a division.

What is the remainder of 11/3?

In [25]:
11%3
Out[25]:
2

It may not seem particularly useful now, but when working with large datasets, it can come in handy.
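As a hedged illustration of where a remainder can be useful, the sketch below checks whether a number of nucleotides divides evenly into codons and how many samples end up on a partly filled 96-well plate. The numbers and variable names are invented for the example:

```python
# Does a 100-nucleotide sequence divide evenly into 3-nucleotide codons?
n_nucleotides = 100
leftover_nt = n_nucleotides % 3
print(leftover_nt)  # 1 nucleotide is left over

# How many of 250 samples land on the last, partly filled 96-well plate?
n_samples = 250
last_plate = n_samples % 96
print(last_plate)   # 58
```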

We saw the plus operator, +, in a mathematical context, but it can also be used to concatenate strings:

In [26]:
'ATG' + 'CTG'
Out[26]:
'ATGCTG'

The input() function

Python's input function - input() - allows you to collect input from a user and then perform actions on that input:

In [27]:
input()
ATG
Out[27]:
'ATG'

Not particularly useful on its own, but the input can also be stored as a variable or directly incorporated into a function.

Try storing input as a variable:

In [28]:
x = input()
ATG

Now print the user input using the print() function.

In [29]:
print(x)
ATG

What if we wanted to print a literal string and the value of a variable? Python actually makes it somewhat complicated because literal strings require quotes and variables are not interpolated if contained in quotes. We first encountered this problem above.

Print the following statement: Your sequence is variable., where variable is the value assigned to the variable by the user. There are several ways to do this. Earlier we used commas, for example:

In [30]:
print('Your sequence is', x, '.')
Your sequence is ATG .

Alternatively, you can use + signs to join strings and variables, which avoids the spaces that print() inserts between comma-separated items. The syntax is as follows: print("string" + variable + "string"). A drawback to this approach is that it does not work with int and float type values. But int and float values can be converted to string type using str(value) within most statements.

In [31]:
print('Your sequence is ' + x + '.')
Your sequence is ATG.
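The str() conversion mentioned above can be sketched as follows. The variable names count and message are arbitrary:

```python
count = 3  # an int, which cannot be joined to a string with + directly

# Convert the number to a string before concatenating.
message = 'The sequence has ' + str(count) + ' nucleotides.'
print(message)
```

Leaving out str() here would raise a TypeError, much like the num + 5 example earlier.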

Write a script that prompts the user for an RNA sequence and prints a friendly message with the sequence to the terminal.

Modify the script from above to prompt the user for two separate RNA sequences and then concatenate the sequences and print the result.


Part 3. Conditional statements

See Python for Everybody, chapter 3 (https://www.py4e.com/html3/03-conditional).

If we want to evaluate whether something is true or false and then perform different actions depending on the outcome, we can use if-else statements. An if-else statement, which is a common feature of most programming languages, is a conditional statement: if some condition is true, do something; else, do something different. The else part of the statement is optional.

Let's assign a number input by the user to a variable, such as number. This time, we'll include a user friendly prompt.

In [32]:
number = input('Enter a number:')
Enter a number:5

Now, we will write an if statement to determine if the number falls within a particular range. The Python syntax for an if-else statement is:

if condition:
    block of code to execute
else:
    block of code to execute

Note the colon after the condition and the indentation of the code that is to be executed if the condition is met. Indentation is Python's way of delimiting blocks of code within conditional statements and in some other contexts that we will cover later in the course. Let us determine if the variable number is <10:

In [33]:
if number < 10:
    print('The value is less than 10')
else:
    print('The value is greater than or equal to 10.')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-6cecd2aded81> in <module>()
----> 1 if number < 10:
      2     print('The value is less than 10')
      3 else:
      4     print('The value is greater than or equal to 10.')

TypeError: unorderable types: str() < int()

You probably got an error message. Look closely at the error message and try to decipher the problem.

By default, the input() function treats user input as a string, even if it is a number. So we have to specifically tell Python to treat the value as a number. It doesn't matter when we do that as long as it is not after we use the value. To specify a value as an integer, use the syntax int(value):

In [34]:
number = int(input('Enter a number:'))
Enter a number:5
In [35]:
if number < 10:
    print('The value is less than 10')
else:
    print('The value is greater than or equal to 10.')
The value is less than 10

Or we could convert the value to an integer within the if statement itself:

In [36]:
if int(number) < 10:
    print('The value is less than 10')
else:
    print('The value is greater than or equal to 10.')
The value is less than 10

We can include an else-if statement, with the syntax elif, to distinguish between greater than 10 and equal to 10:

In [37]:
if int(number) < 10:
    print('The value is less than 10')
elif int(number) > 10:
    print('The value is greater than 10')
else:
    print('The value is 10.')
The value is less than 10

But what if your number has a decimal point?

Recall that numbers containing decimal points belong to a different class called float so we need to specify that the value belongs to the type float.

Again store a number containing a decimal point as a variable but this time specify that the input should be treated as a floating-point number using float():

In [38]:
number = float(input('Enter a number:'))
Enter a number:5.5

Now try to use it in an if statement:

In [39]:
if number < 10:
    print('The value is less than 10')
else:
    print('The value is greater than or equal to 10.')
The value is less than 10

Iterations

It is often necessary to repeat an operation multiple times. In Python there are several ways to do this.

The for loop allows you to loop over a defined list of objects. Contrast this with the while loop, which is open ended. for loops are typically used when we want to repeat a block of code a fixed number of times.

for loops have the following general structure:

for item in sequence:
    block of code to execute

This is a good time to introduce a new function: range(). The range() function allows you to specify a range of numbers to iterate through using the following syntax: range(start, stop[, step]). Essentially, it generates a sequence of numbers from start up to (but not including) stop, at optional step intervals, which is generally iterated over in a for loop.

Let's look at a real example:

In [40]:
for n in range(0,10):
    print(n)
0
1
2
3
4
5
6
7
8
9

Notice it goes up to the stop value but does not include it.

By default, if you don't specify a starting number, 0 is used:

In [41]:
for n in range(10):
    print(n)
0
1
2
3
4
5
6
7
8
9

Let's print every odd number between 1 and 10:

In [42]:
for n in range(1,10,2):
    print(n)
1
3
5
7
9

What would range(-10, 10, 2) return?

In [43]:
for n in range(-10, 10, 2):
    print(n)
-10
-8
-6
-4
-2
0
2
4
6
8
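To inspect what range() produces without writing a loop, the sequence can be converted to a list with list(). This is a small sketch with arbitrary variable names. It also shows that a negative step counts downward:

```python
first_five = list(range(5))             # start defaults to 0
every_third = list(range(0, 9, 3))      # step of 3, stop value excluded
counting_down = list(range(10, 0, -2))  # a negative step counts down

print(first_five)     # [0, 1, 2, 3, 4]
print(every_third)    # [0, 3, 6]
print(counting_down)  # [10, 8, 6, 4, 2]
```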

The while loop is similar in Python and bash:

while some condition is true:
    block of code to execute

Let's look at an example:

In [44]:
n = 10
while n > 0:
    print(n)
    n -= 1
10
9
8
7
6
5
4
3
2
1
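A while loop is a natural fit when the number of repetitions is not known in advance. As a sketch, assume an idealized PCR in which the number of copies doubles every cycle. The loop below counts how many cycles are needed to reach at least 1000 copies from a single template. The variable names and the threshold are made up for the example:

```python
copies = 1  # start from a single template molecule
cycles = 0

# Keep doubling until there are at least 1000 copies.
while copies < 1000:
    copies *= 2  # idealized doubling per cycle
    cycles += 1

print(cycles, copies)  # 10 1024
```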

Membership and Comparison Operators

If we want to determine if something is in a list or string, we can use the in operator:

In [45]:
seq = 'UAG'
'U' in seq
Out[45]:
True

Conversely, we can use the operator not in to determine if something is not in a list or string:

In [46]:
seq = 'UAG'
'T' not in seq
Out[46]:
True

We can also use in to iterate through a string or list:

In [47]:
for nt in seq:
    print(nt)
U
A
G

If we want to determine if something has a particular value, we can use the == operator:

In [48]:
seq = 'AUG'
seq == 'AUG'
Out[48]:
True
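These operators are most useful as conditions in if statements. The following sketch classifies a codon. The variable names are arbitrary, and the square brackets create a list of the three stop codons:

```python
codon = 'UAG'

if codon == 'AUG':                       # == tests for equality
    label = 'start codon'
elif codon in ['UAA', 'UAG', 'UGA']:     # in tests for membership in the list
    label = 'stop codon'
else:
    label = 'other codon'

print(label)  # stop codon
```

Note the difference between = (assignment) and == (comparison); using = inside the condition of an if statement is a syntax error.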

Putting it all together

Let's assign a sequence to a variable and then determine the reverse complement of it using if statements embedded within a for loop:

In [49]:
seq = 'ATGCGG'
revcomp = '' # start with an empty string
for nt in seq:
    if nt == 'A':
        revcomp = 'T' + revcomp
    elif nt == 'T':
        revcomp = 'A' + revcomp       
    elif nt == 'C':
        revcomp = 'G' + revcomp
    elif nt == 'G':
        revcomp = 'C' + revcomp
    else:
        print('Non DNA character encountered!')
        break # exit out of the loop
print(revcomp)
CCGCAT

The previous example, although not a very pythonic way of doing things, illustrates that coding involves a lot of problem solving. break is a way to end a loop prematurely.
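For comparison, one more idiomatic approach is sketched below. It relies on the built-in string methods str.maketrans() and translate() and on slice notation, none of which have been covered yet, so treat it as a preview rather than the expected solution. Unlike the loop above, it silently leaves non-DNA characters unchanged:

```python
seq = 'ATGCGG'

# Build a lookup table mapping each nucleotide to its complement.
complement_table = str.maketrans('ACGT', 'TGCA')

# Complement every nucleotide, then reverse the string with [::-1].
revcomp = seq.translate(complement_table)[::-1]
print(revcomp)  # CCGCAT
```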

The len() function

We can calculate the length of the sequence using the len() function:

In [50]:
length = len(seq)
print(length)
6

Python has many built in functions that make it simple to do routine tasks. Much of Python programming centers around developing custom functions.
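As a preview of what a custom function can look like, here is a minimal sketch. The function name gc_content is hypothetical, and the function assumes a non-empty sequence, since an empty one would cause a division by zero:

```python
def gc_content(seq):
    """Return the fraction of nucleotides in seq that are G or C."""
    gc = 0
    for nt in seq:
        if nt in 'GC':  # count G and C nucleotides
            gc += 1
    return gc / len(seq)

print(gc_content('ATGCGG'))
```

Once defined, the function can be called on any sequence, just like the built-in len().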

Revisiting the bash script, iscript.sh

Write a Python script, similar to the bash script from our previous exercise, that computes the reverse, complement, reverse complement, and length of a sequence.

Practice Problems

Use the table of comparison operators below for the problems.

OPERATOR    FUNCTION

<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          equal
!=          not equal
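Each comparison in the table evaluates to either True or False. The sketch below uses two arbitrary values to show every operator once:

```python
a = 96
b = 384

print(a < b)    # True
print(a <= 96)  # True
print(a > b)    # False
print(a >= b)   # False
print(a == b)   # False
print(a != b)   # True

# Collect the results in a list in the same order as above.
results = [a < b, a <= 96, a > b, a >= b, a == b, a != b]
```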

  1. PCR is often done in 96-well plates. More high-throughput PCR assays can be done in plates containing 4 times as many reaction wells. In the Python interpreter, calculate how many PCRs can be done in a plate with 4 times 96 wells and save the value as a variable. Print the variable, along with a message, to the terminal window while in the Python interpreter.

  2. Write a script that assigns the values 96 and 4 to two separate variables and then uses them in a math expression to calculate the product. Print the results to the terminal window.

  3. If you didn't already do so, modify the script from above with comments describing what is being done at each step.

  4. Modify the script above so that it prints the results to the terminal window with a friendly message spanning two lines.

  5. Modify the script above so that it evaluates the math expression within the print function.

  6. Open a new Jupyter notebook and create a sample problem with a markdown cell posing a question, such as the math operation from above, and a code cell solving the question using Python. A third cell may be needed to print or return the answer to the question depending on how you set it up.

  7. Prompt the user for a number and print a message if the number is positive and a different message if the number is negative.
  8. Modify your code from the previous problem to also print the absolute value of the number only if it is negative.
  9. Prompt the user for a number and identify whether the number is odd or even using only operators that we've covered so far.
  10. Prompt the user for a DNA sequence and print the sequence with 5' and 3' appended to either end (e.g. user enters ATGCT, program prints 5' ATGCT 3'). Hint: use a backslash (\) to escape special characters.
  11. Prompt the user for two separate sequences of DNA, concatenate them into one, and print the concatenated sequence with a descriptive message.
  12. Prompt the user for two sequences and print the concatenation of the two sequences using a single statement with only one line of code.
  13. Write a script that prompts the user for a sequence and then calculates the length and prints the result to the terminal.

  14. Write a script that prompts the user for a number and then tests whether the number is less than or equal to 10 and prints the result to the terminal.

  15. Write a script that prompts the user for a number and then calculates the absolute value of the number and prints the absolute value to the screen.

  16. Write a script that prompts the user for a sequence and then tests whether the sequence is DNA or RNA.


Part 4. Further Python training

Work through chapters 1-11 of Python for Everybody. Be sure to do all exercises.

Explore other educational resources at https://wiki.python.org/moin/BeginnersGuide/Programmers

Practice, practice, practice.

Search the web for your problems. Stack Overflow is often a helpful resource. If you have a question, chances are someone has already answered it.