Todos Santos Computational Biology and Genomics 2018
In this exercise, we will go through some of the basics of Python using Jupyter Notebook, the Python interpreter, and scripts.
For a web-based book that loosely follows the exercises below, see Python for Everybody by Charles R. Severance (https://www.py4e.com/html3/).
See Python for Everybody, chapter 1 (https://www.py4e.com/html3/).
We will use Jupyter Notebook to demonstrate some of the basics of Python programming. Jupyter Notebook provides an interactive, user-friendly way of writing and testing snippets of code, but for more complex programs, scripts (described below) are much more efficient.
print() function
For our first code, we will have Python output a message using the print() function (to execute code in the code boxes, type shift + return):
print('Hello, world!')
Python is very particular about syntax. Try removing the parentheses:
print 'Hello, world!'
Try removing the quotation marks:
print(Hello, world!)
Try changing the single quotes to double quotes:
print("Hello, world!")
Notice that Python doesn't distinguish between single and double quotes. But when delimiting a string, the style must match on either end ('string' or "string", not 'string"). Sometimes it's useful to use one or the other.
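As a brief illustration of when one quote style is more convenient than the other, the sketch below wraps a string containing an apostrophe in double quotes and a string containing double quotes in single quotes. The variable names and messages are made up for this example.

```python
# Double quotes let the string contain an apostrophe (single quote).
prime_end = "The 5' end of the strand"

# Single quotes let the string contain double quotes.
quoted = 'The start codon is "ATG"'

print(prime_end)
print(quoted)
```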
If we want the message to span multiple lines, we can use triple quotes:
print('''Hello,
world!''')
Open a terminal and type python --version at the prompt ($). If your default version of Python is 2.x, and you've installed Python version 3.x, you can invoke it by typing python3. Python 2 and Python 3 have important differences, so be sure to use Python 3 or learn these differences.
Start the interpreter by typing python (or python3) in the terminal. At the prompt (>>>), use the print() function to print the phrase "I love Python!".
The Jupyter notebook and the Python interpreter are great for writing and testing snippets of code, but when we want to actually create a Python program we write the instructions in a file, often called a script, using text editing software (e.g. gedit, TextWrangler, or Notepad++).
Open a text editor and create a new document. Save the document as first_script.py in a new folder called python_scripts or some other descriptive name. The .py extension is the conventional way of designating a file as a Python script, but on Unix-based operating systems it is not required in order to execute the script.
Everything in the file will be interpreted as code, unless it is commented out with a #.
Write a Python script that outputs the message "My first Python script!". The text editing software you use should color the code to make it easier to read. Adding the .py extension should be sufficient to trigger syntax coloring within the file.
To execute the script from the command line, type python (or python3, if Python v2 is the default) followed by the name of the script (e.g. python3 first_script.py).
Python does math in a fairly intuitive way. For example, what is 1 plus 1?
1+1
What is 1 divided by 2?
1/2
What is 2 times 3?
2*3
What is 2 cubed ('to the power of' uses the syntax **)?
2**3
Try some more complex math. Does Python follow conventional rules for precedence of mathematical operations?
10-3*2
In math, we often work with variables, such as x and y. Try creating a variable x and assigning a value to it, just like you would if you were doing an algebra problem.
x = 4
Now use the variable in an equation.
x*2
Try returning the value of the variable using the print() function.
print('x')
If you followed the syntax used for the print() function in the earlier example, you probably got as output exactly what you entered. What happens if you remove the quotation marks?
print(x)
Try returning the value of a math operation using the print() function. Test what happens when you include or exclude quotation marks.
print('4*2')
print(4*2)
Write a script that does some basic math, such as calculating the square root of 9, stores the result in a variable, such as x, and then prints the result to the terminal. Name the script something like math_test.py and save it in the scripts folder you created earlier.
Modify the above script so that within the print() function it specifies that the value being returned is the variable x (or whatever you called it). For example, the output might be: x = 3. Hint: commas are used to separate the different things you want to return using the print() function, such as a literal string (e.g. x =) and a variable (e.g. x).
See Python for Everybody, chapter 2 (https://www.py4e.com/html3/02-variables).
We have now seen examples of two types of values: integers and strings. As demonstrated using the print() function, these two types of values are interpreted differently. Strings, which are just sequences of characters, such as Hello, world!, have the designation str. Integers, which are of course whole numbers, are one type of numerical value of type int. Floating-point numbers (numbers containing a decimal point) belong to a second type called float. Numbers, both int and float type, can be treated as strings but strings cannot be treated as numbers, as we demonstrated using the print() function.
Variables can be assigned numbers (either int or float) or strings (str). For example, we can create a variable dna, and assign a sequence of As, Cs, Ts, and Gs to it as follows (recall that variables are assigned with the syntax variable_name = value):
dna = 'ACTG'
Because we are assigning a string to the variable dna, as with the print() function, the value has to be in quotes. What if we were assigning a number to the variable dna?
dna = '5'
Things start to get a little bit tricky. If we include quotes around a number, then it becomes a string, and a string no longer has a numerical value. So even though the variable dna may appear to hold a number, whether it actually does depends on whether it was assigned with or without quotes.
Assign a number to a variable without quotes:
num = 5
Use the variable in a mathematical operation:
num + 5
Now try assigning a number to a variable with quotes:
num = '5'
Try using the variable in a mathematical operation:
num + 5
To figure out what type of value a variable is, use type(variable_name):
type(num)
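To make the three types concrete, here is a small sketch that checks the type of an int, a float, and a str, and then converts between them with int() and str(). The variable names are chosen only for this illustration.

```python
whole = 5        # no decimal point: int
decimal = 5.0    # decimal point: float
text = '5'       # quotes: str

print(type(whole))
print(type(decimal))
print(type(text))

# Converting between types
as_number = int(text)   # the string '5' becomes the integer 5
as_text = str(whole)    # the integer 5 becomes the string '5'

print(as_number + 5)    # numeric addition
print(as_text + '5')    # string concatenation
```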
Variable names can be just about anything in Python, but they cannot start with a number or contain spaces or special characters (underscores are ok). It is best to give variables descriptive names and use lowercase letters; in particular, the first letter should be lowercase. Although it is permissible, it is confusing to give a variable the same name as a function (such as print), because the name then no longer refers to the function. Python also has roughly 35 reserved keywords that cannot be used as variable names.
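If you are unsure whether a name is allowed, Python can tell you. The sketch below uses the standard library keyword module and the string method isidentifier(); the candidate names are made up for the example.

```python
import keyword

# The full list of reserved words for the running Python version
print(keyword.kwlist)

# 'for' is a reserved keyword; 'print' is a function name, not a keyword
print(keyword.iskeyword('for'))
print(keyword.iskeyword('print'))

# isidentifier() checks the naming rules (no leading digit, no spaces)
print('seq_length'.isidentifier())
print('2nd_seq'.isidentifier())
```

Note that isidentifier() only checks the spelling rules, so a name should pass that check and also not be a keyword.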
In Python, a statement is any code that can be executed. 1+1 is a statement. dna = "ATGCC" is also a statement. Statements can be thought of as any action or command.
An expression is code that represents a value. An expression can be a number or a string. 1+1 is an expression. Basically, anything that has a value is an expression. It can be a single value or a combination of values, variables, operators, and calls to functions, as long as it boils down to a single value. Expressions are things that need to be evaluated. As we saw earlier, in interactive mode the value of an expression is returned upon hitting enter, but in a script it is not. Any section of code that evaluates to a value within a statement is an expression.
val = 5+7
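The distinction can be seen in a short script-style sketch. The name doubled is introduced only for this illustration.

```python
val = 5 + 7          # a statement; the part to the right of = is an expression
doubled = val * 2    # val * 2 is an expression whose value is stored in doubled

val * 2              # in a script, this value is computed but never displayed
print(val * 2)       # print() is needed to display it
```

Run as a script, only the last line produces output. Entered line by line in the interpreter, the bare expression val * 2 would also echo its value.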
Earlier we introduced several mathematical operators: +, -, /, *. The operator % is called the modulus (or modulo) operator; it returns the remainder of a division.
What is the remainder of 11/3?
11%3
It may not seem particularly useful now, but when working with large datasets, it can come in handy.
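Two typical uses are sketched below: checking whether a sequence length divides evenly into codons, and finding how many samples are left over after filling a plate. The numbers are invented for the example.

```python
seq_length = 1000   # hypothetical sequence length in nucleotides
samples = 100       # hypothetical number of samples
wells = 96          # wells in a standard plate

# A remainder of 0 would mean the length divides evenly into codons.
leftover_nt = seq_length % 3
print(leftover_nt)

# Samples that do not fit on one full plate
leftover_samples = samples % wells
print(leftover_samples)

# The remainder after dividing by 2 tells even from odd numbers
print(7 % 2)
print(8 % 2)
```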
We saw the plus operator, +, in a mathematical context, but it can also be used to concatenate strings:
'ATG' + 'CTG'
input() function
Python's input function - input() - allows you to collect input from a user and then perform actions on that input:
input()
Not particularly useful on its own, but the input can also be stored as a variable or directly incorporated into a function.
Try storing input as a variable:
x = input()
Now print the user input using the print() function.
print(x)
What if we wanted to print a literal string and the value of a variable? Python actually makes it somewhat complicated because literal strings require quotes and variables are not interpolated if contained in quotes. We first encountered this problem above.
Print the following statement: Your sequence is variable., where variable is the value assigned to the variable by the user. There are several ways to do this. Earlier we used commas, for example:
print('Your sequence is', x, '.')
Alternatively, you can use + signs to join strings and variables. The syntax is as follows: print("string" + variable + "string"). A drawback to this approach is that it does not work directly with int and float type values. But int and float values can be converted to string type using str(value) within most statements.
print('Your sequence is ' + x + '.')
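The following sketch shows the str() conversion in practice. The variables seq, copies, and fraction are made-up stand-ins for values a user might supply.

```python
seq = 'AUGGCC'
copies = 3          # int
fraction = 0.5      # float

# print('Copies: ' + copies) would raise a TypeError,
# because + cannot join a str and an int.
message = 'Copies: ' + str(copies) + ', fraction: ' + str(fraction)
print(message)

# Strings can be joined directly, with no conversion needed.
print('Your sequence is ' + seq + '.')
```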
Write a script that prompts the user for an RNA sequence and prints a friendly message with the sequence to the terminal.
Modify the script from above to prompt the user for two separate RNA sequences and then concatenate the sequences and print the result.
See Python for Everybody, chapter 3 (https://www.py4e.com/html3/03-conditional).
If we want to evaluate whether something is true or false and then perform different actions depending on the outcome, we can use if-else statements. An if-else statement, which is a common feature of most programming languages, is a conditional statement: if some condition is true, do something; else, do something different. The else part of the statement is optional.
Let's assign a number input by the user to a variable, such as number. This time, we'll include a user-friendly prompt.
number = input('Enter a number:')
Now, we will write an if statement to determine if the number falls within a particular range. The Python syntax for an if-else statement is:
if condition:
    block of code to execute
else:
    block of code to execute
Note the colon after the conditional statement and the indentation of code that is to be executed if the condition is met. Indentation is Python's way of delimiting blocks of code within conditional statements and in some other contexts that we will cover later in the course. Let us determine if the variable number is <10:
if number < 10:
    print('The value is less than 10')
else:
    print('The value is greater than or equal to 10.')
You probably got an error message. Look closely at the error message and try to decipher the problem.
By default, the input() function treats user input as a string, even if it is a number. So we have to specifically tell Python to treat the value as a number. It doesn't matter when we do that, as long as it happens before the value is used as a number. To convert a value to an integer, use the syntax int(value):
number = int(input('Enter a number:'))
if number < 10:
    print('The value is less than 10')
else:
    print('The value is greater than or equal to 10.')
Or we could convert the value to an integer within the if statement:
if int(number) < 10:
    print('The value is less than 10')
else:
    print('The value is greater than or equal to 10.')
We can include an else-if statement, with the syntax elif, to distinguish between greater than 10 and equal to 10:
if int(number) < 10:
    print('The value is less than 10')
elif int(number) > 10:
    print('The value is greater than 10')
else:
    print('The value is 10.')
But what if your number has a decimal point?
Recall that numbers containing decimal points belong to a different class called float, so we need to specify that the value belongs to the type float.
Again store a number containing a decimal point as a variable, but this time specify that the input should be treated as a floating-point number using float():
number = float(input('Enter a number:'))
Now try to use it in an if statement:
if number < 10:
    print('The value is less than 10')
else:
    print('The value is greater than or equal to 10.')
It is often necessary to repeat an operation multiple times. In Python there are several ways to do this.
The for loop allows you to loop over a defined sequence of objects. Contrast this with the while loop, which is open ended. for loops are typically used when we want to repeat a block of code a fixed number of times. for loops have the following general structure:
for item in sequence:
    block of code to execute
This is a good time to introduce a new function: range(). The range() function allows you to specify a range of numbers to iterate through using the following syntax: range(start, stop[, step]). Essentially, it generates a sequence of numbers from start up to (but not including) stop, at optional step intervals, which is generally iterated over in for loops.
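To see exactly which numbers a given call produces without writing a loop, the range can be converted to a list with list(). This is a small sketch; the variable names are only for illustration.

```python
default_start = list(range(5))        # start defaults to 0
start_stop = list(range(2, 6))        # stop value itself is excluded
with_step = list(range(0, 20, 5))     # every fifth number
counting_down = list(range(5, 0, -1)) # a negative step counts down

print(default_start)   # [0, 1, 2, 3, 4]
print(start_stop)      # [2, 3, 4, 5]
print(with_step)       # [0, 5, 10, 15]
print(counting_down)   # [5, 4, 3, 2, 1]
```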
Let's look at a real example:
for n in range(0,10):
    print(n)
Notice it goes up to the stop value but does not include it.
By default, if you don't specify a starting number, 0 is used:
for n in range(10):
    print(n)
Let's print every odd number between 1 and 10:
for n in range(1,10,2):
    print(n)
What would range(-10, 10, 2) return?
for n in range(-10, 10, 2):
    print(n)
The while loop is similar in Python and bash:
while some condition is true:
    block of code to execute
Let's look at an example:
n = 10
while n > 0:
    print(n)
    n -= 1
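A while loop is a natural fit when the number of repetitions is not known in advance. As a sketch, the example below counts how many idealized PCR cycles (each doubling the number of copies) are needed to go from a single template molecule to at least 1000 copies. The starting value and threshold are invented for illustration.

```python
copies = 1     # starting number of template molecules (made-up value)
cycles = 0

while copies < 1000:
    copies = copies * 2   # each cycle ideally doubles the number of copies
    cycles += 1

print(cycles, copies)
```

The loop stops as soon as the condition copies < 1000 is no longer true, here after 10 cycles with 1024 copies.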
If we want to determine if something is in a list or string, we can use the in operator:
seq = 'UAG'
'U' in seq
Conversely, we can use the operator not in to determine if something is not in a list or string:
seq = 'UAG'
'T' not in seq
We can also use in to iterate through a string or list:
for nt in seq:
    print(nt)
If we want to determine if something has a particular value, we can use the == operator:
seq = 'AUG'
seq == 'AUG'
Let's assign a sequence to a variable and then determine the reverse complement of it using if statements embedded within a for loop:
seq = 'ATGCGG'
revcomp = '' # start with an empty string
for nt in seq:
    if nt == 'A':
        revcomp = 'T' + revcomp
    elif nt == 'T':
        revcomp = 'A' + revcomp
    elif nt == 'C':
        revcomp = 'G' + revcomp
    elif nt == 'G':
        revcomp = 'C' + revcomp
    else:
        print('Non DNA character encountered!')
        break # exit out of the loop
print(revcomp)
The previous example, although not a very pythonic way of doing things, illustrates that coding involves a lot of problem solving. break is a way to end a loop prematurely.
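For comparison, one possible more pythonic version is sketched below. It uses the built-in string methods str.maketrans() and translate() together with a reversing slice, none of which have been introduced yet, so treat it as a preview and not as the expected solution. The names complement_table and complement are chosen for this example.

```python
seq = 'ATGCGG'

# Map each base to its complement: A<->T, C<->G
complement_table = str.maketrans('ATCG', 'TAGC')

complement = seq.translate(complement_table)
revcomp = complement[::-1]   # a slice with step -1 reverses the string

print(revcomp)
```

Unlike the loop above, this version does not stop at non-DNA characters; any character missing from the table is passed through unchanged.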
len() function
We can calculate the length of the sequence using the len() function:
length = len(seq)
print(length)
Python has many built in functions that make it simple to do routine tasks. Much of Python programming centers around developing custom functions.
iscript.sh
Write a Python script, similar to the bash script from our previous exercise, that computes the reverse, complement, reverse complement, and length of a sequence.
Use the table of comparison operators below for the problems.
OPERATOR  FUNCTION
<         less than
<=        less than or equal to
>         greater than
>=        greater than or equal to
==        equal
!=        not equal
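Each comparison operator produces either True or False, which is what an if or while statement tests. A quick sketch with two made-up values:

```python
x = 96
y = 384

print(x < y)     # True
print(x <= 96)   # True
print(x > y)     # False
print(x >= y)    # False
print(x == 96)   # True
print(x != y)    # True
```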
PCR is often done in 96-well plates. Higher-throughput PCR assays can be done in plates containing 4 times as many reaction wells. In the Python interpreter, calculate how many PCRs can be done in a plate with 4 times 96 wells and save the value as a variable. Print the variable, along with a message, to the terminal window while in the Python interpreter.
Write a script that assigns the values 96 and 4 to two separate variables and then uses them in a math expression to compute the product. Print the result to the terminal window.
If you didn't already do so, modify the script from above with comments describing what is being done at each step.
Modify the script above so that it prints the results to the terminal window with a friendly message spanning two lines.
Modify the script above so that it evaluates the math expression within the print() function.
Open a new Jupyter notebook and create a sample problem with a markdown cell posing a question, such as the math operation from above, and a code cell solving the question using Python. A third cell may be needed to print or return the answer to the question depending on how you set it up.
Write a script that prompts the user for a sequence and then calculates the length and prints the result to the terminal.
Write a script that prompts the user for a number and then tests whether the number is less than or equal to 10 and prints the result to the terminal.
Write a script that prompts the user for a number and then calculates the absolute value of the number and prints the absolute value to the screen.
Write a script that prompts the user for a sequence and then tests whether the sequence is DNA or RNA.
Work through chapters 1-11 of Python for Everybody. Be sure to do all exercises.
Explore other educational resources at https://wiki.python.org/moin/BeginnersGuide/Programmers
Practice, practice, practice.
Search the web for your problems. Stack Overflow is often a reliable resource. If you have a question, someone has probably already answered it.