Background

Many computational tools used on large biological data sets are designed for Linux servers and must be run from the command line. Although many users initially find command-line computing more frustrating than working with programs that have a graphical user interface (GUI), there are many circumstances where the UNIX command line is better suited to automating large-scale analyses. In addition, many key software tools simply do not have a GUI, making command-line skills a necessity.

Objectives

In this exercise, we will attempt to demystify the computer terminal and introduce some common command-line tools.

Software and Dependencies

Protocol

1. Establish ssh session with Linux server

For many of the exercises in this workshop, we will be connecting to a remote Linux server. If you are working on a Mac, open the Terminal application and use ssh to connect to this server as follows, where XXXUSERXXX is your user name and NAME_OF_SERVER is the server we will be using in the workshop. You will then be prompted for the password that we provide in the workshop.

ssh XXXUSERXXX@NAME_OF_SERVER

If you are working on a Windows machine, you will want to open an ssh client such as PuTTY to initiate an ssh session with this server.

Once you have successfully connected, you should see a command line interface that looks like the window shown below. You will be entering commands at the “command prompt”, which is indicated by $.


2. Change directories

When you are in a command-line session, you are always working in a specific location or directory on the machine. When you first ssh into a remote server, you will be in your home directory. Let’s move to a different directory. This would be the equivalent of clicking on different folders on your own computer. We have created a directory called TodosSantos in your home directory, and that directory has another directory inside of it called unix_commands. To move into that directory enter the following command.

cd TodosSantos/unix_commands


3. Determine current working directory

If you want to know your current location on the server, you can use the pwd (print working directory) command. Enter that command to confirm you have successfully changed directories.

pwd
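As a minimal sketch of how cd and pwd work together, the following commands move down into a nested directory and back up again, printing the location after each move. The scratch, outer, and inner names are hypothetical and exist only for this illustration. They are created in a temporary location so that nothing in your workshop directories is affected.

```shell
# Remember where we started so we can return at the end.
start=$(pwd)

# Hypothetical scratch directories, created only for this illustration.
scratch=$(mktemp -d)
mkdir -p "$scratch/outer/inner"

cd "$scratch/outer/inner"   # move down two levels using one path
pwd                         # the reported path ends in outer/inner

cd ..                       # move up one level to the parent directory
pwd                         # the reported path now ends in outer

cd "$start"                 # return to where we began
pwd
```

Entering cd with no directory name returns you to your home directory, which is a quick way to recover if you lose track of where you are.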


4. Create new file and make a copy of it

To create a new blank file, you can use the touch command.

touch blank_file.txt

You can copy a file with the cp command followed by the name of the file to be copied and the name of the new file. Note that if you copy a file to a filename that already exists, the existing file will be overwritten without warning, so be careful. Let’s make a copy of the (blank) file and call it birthday.txt.

cp blank_file.txt birthday.txt
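The overwrite behavior of cp is easy to see with two small files. This is a sketch only: the file names a.txt and b.txt are hypothetical, and everything happens in a temporary scratch directory rather than in your workshop directories.

```shell
# Hypothetical scratch location and file names, used only for this illustration.
scratch=$(mktemp -d)

echo "original" > "$scratch/a.txt"
echo "keep me" > "$scratch/b.txt"

# b.txt already exists, so cp replaces its contents without asking.
cp "$scratch/a.txt" "$scratch/b.txt"

# The text "keep me" is gone; b.txt now holds a copy of a.txt.
cat "$scratch/b.txt"
```

If you would rather be asked before anything is replaced, cp -i prompts for confirmation whenever the destination file already exists.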


5. List files in current directory

To confirm that we just created the two new files, use the ls (list) command to list all files and directories in the current working directory. You should see the names of your two new files.

ls

You can also get more detailed information about the files in the current directory, including file size, modification dates, ownership, and read/write/execute permission with the -l option.

ls -l

Also, you can include any “hidden” files in your list with the -a option.

ls -a

Note that we are not explicitly stating the directory for which we want to list files. If we do not specify the directory, it is assumed that we are asking for the current working directory (i.e., the one we are in).

We could also specify a directory. For example, ~ is a shorthand way to refer to your home directory, so the following command will list the files in your home directory.

ls ~
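Options and a directory name can be combined in a single ls command. The sketch below assumes a temporary scratch directory containing one ordinary file and one hidden file (a name that starts with a dot). Both file names are hypothetical.

```shell
# Hypothetical scratch directory and file names, used only for this illustration.
scratch=$(mktemp -d)
touch "$scratch/visible.txt" "$scratch/.hidden_settings"

# Plain ls skips names that begin with a dot.
ls "$scratch"

# -a includes hidden files, along with the . and .. entries.
ls -a "$scratch"

# Options can be combined: -la gives the detailed listing for all files.
ls -la "$scratch"
```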

Another useful shorthand is . which refers to your present directory. So the following command should report the same output as the ls command by itself.

ls .

In contrast, .. refers to the “parent” directory of your current working directory (i.e., one level up in the file system). So try the following command to see what it reports.

ls ..

All of the examples so far represent shorthand ways to refer to locations on the machine. However, you can always refer to the “path” for a location, which consists of a series of directory names separated by / characters. Paths can be specified starting from the current working directory (a “relative” path) or starting from the “root” of the entire file system (an “absolute” path). You will often need to do this when referring to files or programs that are not in your current working directory.

The pwd command that you used in part 3 above should have given you the full absolute path to your current working directory. Enter the ls command with that full path and confirm that it produces the same output as ls . and ls by itself.
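The difference between relative and absolute paths can be sketched as follows. The project, data, and reads.txt names are hypothetical and are created in a temporary scratch directory. The block returns to your starting directory when it finishes.

```shell
# Remember the starting directory so we can return to it.
start=$(pwd)

# Hypothetical directory layout, created only for this illustration.
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"
touch "$scratch/project/data/reads.txt"

cd "$scratch"

# Relative path: interpreted starting from the current working directory.
ls project/data

# Absolute path: starts at the root (/), so it works from any location.
ls "$scratch/project/data"

cd "$start"
```

Both ls commands list the same file. The relative form only works while you are in the scratch directory, whereas the absolute form works from anywhere on the machine.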


6. Create a new directory and move file into it

To generate a new directory you can use the mkdir command. Let’s make a new directory called unix_temp.

mkdir unix_temp

You can use the mv command to move a file (you can also rename a file when you move it or even leave it in the same place and rename it). Let’s move the birthday.txt file into the unix_temp directory that we just created.

mv birthday.txt unix_temp

Confirm that you successfully moved the file by listing the files in unix_temp with ls.

ls unix_temp
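As noted above, mv can also rename a file, either in place or as part of a move. Here is a minimal sketch using hypothetical file and directory names (notes.txt, archive) in a temporary scratch directory, so your workshop files are left untouched.

```shell
# Hypothetical scratch location and names, used only for this illustration.
scratch=$(mktemp -d)
touch "$scratch/notes.txt"
mkdir "$scratch/archive"

# Rename in place: same directory, new name.
mv "$scratch/notes.txt" "$scratch/notes_old.txt"

# Move and rename in one step by giving a new name at the destination.
mv "$scratch/notes_old.txt" "$scratch/archive/notes_final.txt"

# Only the archive directory remains at the top level of the scratch directory.
ls "$scratch"
ls "$scratch/archive"
```

Like cp, mv silently replaces a destination file that already exists, and mv -i asks for confirmation first.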

Now, move into the directory we just created.

cd unix_temp


7. Add content to file

echo is a simple command that just prints back whatever it is given. Try it out by entering echo followed by your name.

echo YOUR NAME HERE
## YOUR NAME HERE

You should see that this printed your name to the screen. By default, most programs print their output to the screen. However, you can redirect the output to a file by using > or >> followed by the filename. There is a very important difference between these two options. > will replace all the current contents of that file with the output (or create a new file if one does not already exist). In contrast, >> will append the output to the bottom of the content that is already in the existing file (and will also create the file if it does not exist).

So let’s repeat the above command but this time redirect the output to our new birthday.txt file.

echo YOUR NAME HERE > birthday.txt

And now add a line that states your birthday. Notice that we will now use >> so we do not erase what we added in the previous command.

echo birthday: YOUR BIRTHDAY HERE >> birthday.txt

Then add a line that states what day of the week you were born on. If you do not know that, you can look it up with the command cal YOUR_MONTH YOUR_YEAR.

echo born on: YOUR DAY OF THE WEEK >> birthday.txt
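The difference between > and >> is worth seeing side by side. The following sketch writes to a hypothetical file called demo.txt in a temporary scratch directory, so your birthday.txt file is not touched.

```shell
# Hypothetical scratch file, used only for this illustration.
scratch=$(mktemp -d)
demo="$scratch/demo.txt"

echo "first" > "$demo"     # the file does not exist yet, so > creates it
echo "second" >> "$demo"   # >> appends, so the file now has two lines
echo "third" > "$demo"     # > replaces everything that was in the file

# Only the last line written with > survives.
cat "$demo"
## third
```

Accidentally typing > when you meant >> is a common way to lose data, so it is worth pausing before pressing enter on a redirect.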


8. View contents of file

Let’s confirm that we successfully added content to our birthday.txt file. There are multiple ways to view the contents of a file. First, the cat (concatenate) command will print the contents of a file to the screen.

cat birthday.txt
## YOUR NAME HERE
## birthday: YOUR BIRTHDAY HERE
## born on: YOUR DAY OF THE WEEK

The cat command is a fine option for a small file like this, but for a large file it can take a very long time and print far more content than you can read at once. In such cases, less is a better program because it lets you view a file one section at a time and page through it.

less birthday.txt

To exit a less session and go back to the command prompt, type q.


9. Edit file with a command line text editor

Often you will want to edit a text file. Just like you might use programs like TextEdit, Notepad, or Microsoft Word, you can use text editors that you call from the command line. One of the simplest is called nano. Open your file in nano with the following command.

nano birthday.txt

You should see a window like the one below.



Use the arrow keys to navigate to the bottom of the text. Then type “age: XX years old”, where XX is your age. To save your changes, type ctrl-o and hit return/enter. Then exit the nano session by typing ctrl-x. You can confirm that you have successfully edited your file by viewing its contents with cat or less as above.


10. Delete file

To remove your file (permanently; there is no undoing this!), you can use the rm command as follows. Note that if you wanted to delete an entire directory and all its contents (be even more careful with this!), you would use rm -r followed by the directory name.

rm birthday.txt

Then use ls to confirm that the file is now gone.
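Removing a directory with rm -r can be sketched safely in a temporary location. The old_results directory and the run1.txt and run2.txt files are hypothetical and are created only so that there is something to delete.

```shell
# Hypothetical scratch directory and contents, used only for this illustration.
scratch=$(mktemp -d)
mkdir "$scratch/old_results"
touch "$scratch/old_results/run1.txt" "$scratch/old_results/run2.txt"

# rm by itself refuses to remove a directory.
# The -r option removes the directory and everything inside it.
rm -r "$scratch/old_results"

# The scratch directory is now empty apart from the . and .. entries.
ls -a "$scratch"
```

For extra safety, rm -i asks for confirmation before each removal, which can be useful while you are still getting used to the command line.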


11. Practice UNIX commands

Try using the commands that you just learned to accomplish the following tasks. During the exercise, try using the following extra features:

  • Up-arrow to recall the commands from previous lines
  • tab to complete the name of a file or program name after typing part of it
  • ctrl+a to go to the beginning of a command line
  • ctrl+e to go to the end of a line
  • ctrl+u to delete text you have typed at the command line

New exercise

  1. Create new directory named dna within the unix_temp directory.

  2. Create a new file named dna.txt within the dna directory. Confirm that your new file and new directory are located where you expect.

  3. Insert a 12-nt DNA sequence in the 5’-3’ direction into the dna.txt file.

  4. Insert 12 pipes (i.e., '||||||||||||' – be sure to include quotes around the pipes because pipes have a special meaning from the command line) into the dna.txt file.

  5. Add the reverse complement of the original DNA sequence in the 3’-5’ direction to the file.

  6. Display the contents of the dna.txt file in the terminal. Are the sequences properly basepaired?

  7. From the command line, delete the entire dna directory (including the dna.txt file).

  8. Confirm that the dna directory was deleted.

  9. Once you have completed the exercise, change back into the unix_commands directory. Remember that .. is a shortcut reference to the “parent directory” of your current directory. Practice using some of the commands in this summary file. Commands for viewing or manipulating files can be practiced on the miRNA.fa file. Try executing the welcome.sh script from the command line.
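The quoting rule mentioned in step 4 of the exercise can be illustrated with a short sketch. It uses a hypothetical file called pipes.txt in a temporary scratch directory and only four pipes, so it does not give away the exercise. The wc command, which counts lines when given the -l option, is used here simply as a second program to receive input.

```shell
# Hypothetical scratch file, used only for this illustration.
scratch=$(mktemp -d)

# Quoted: the pipes are treated as ordinary text and written to the file as-is.
echo '||||' > "$scratch/pipes.txt"
cat "$scratch/pipes.txt"
## ||||

# Unquoted: a single | connects two commands, so the output of echo
# becomes the input of wc, which counts the lines it receives.
echo ACGT | wc -l
```

Without the quotes, the shell would try to interpret each | as a connection between commands and report a syntax error instead of writing the characters to the file.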