Many computational tools used on large biological data sets are designed for Linux servers and must be run from the command line. Although many users initially find command-line computing more frustrating than working with programs that have a graphical user interface (GUI), the UNIX command line is often better suited to automating large-scale analyses. In addition, many key software tools simply do not have a GUI, making command-line skills a necessity.
In this exercise, we will attempt to demystify the computer terminal and introduce some common command-line tools.
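ssh: connect to a remote server
cd: change the working directory
pwd: print the working directory
ls: list files and directories
mkdir: create a new directory
touch: create a new blank file
cp: copy a file
mv: move or rename a file
echo: print text to the screen
> or >>: redirect output to a file (overwrite or append)
cat or less: view the contents of a file
rm: delete a file. To delete a directory, use the -r option.
For many of the exercises in this workshop, we will be connecting to a remote Linux server. If you are working on a Mac, open the Terminal application and use ssh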
to connect to this server as follows, where XXXUSERXXX is your user name and NAME_OF_SERVER is the server we will be using in the workshop. You will also be prompted for the password that we provide in the workshop.
ssh XXXUSERXXX@NAME_OF_SERVER
If you are working on a Windows machine, you will want to open an ssh client such as PuTTY to initiate an ssh session with this server.
Once you have successfully connected, you should see a command line interface that looks like the window shown below. You will be entering commands at the “command prompt”, which is indicated by $
.
When you are in a command-line session, you are always working in a specific location or directory on the machine. When you first ssh into a remote server, you will be in your home directory. Let’s move to a different directory. This would be the equivalent of clicking on different folders on your own computer. We have created a directory called TodosSantos
in your home directory, and that directory has another directory inside of it called unix_commands
. To move into that directory enter the following command.
cd TodosSantos/unix_commands
If you want to know your current location on the server, you can use the pwd
(print working directory) command. Enter that command to confirm you have successfully changed directories.
pwd
To create a new blank file, you can use the touch
command.
touch blank_file.txt
You can copy a file with the cp
command followed by the name of the file to be copied and the name of the new file. Note that if you copy a file to a filename that already exists, the existing file will be overwritten (so be careful). Let’s make a copy of the (blank) file and call it birthday.txt
.
cp blank_file.txt birthday.txt
To confirm that we just created the two new files, use the ls
(list) command to list all files and directories in the current working directory. You should see the names of your two new files.
ls
You can also get more detailed information about the files in the current directory, including file size, modification dates, ownership, and read/write/execute permission with the -l
option.
ls -l
Also, you can include any “hidden” files in your list with the -a
option.
ls -a
Note that we are not explicitly stating the directory for which we want to list files. If we do not specify the directory, it is assumed that we are asking for the current working directory (i.e., the one we are in).
We could also specify a directory. For example, ~
is a shorthand way to refer to your home directory, so the following command will list the files in your home directory.
ls ~
Another useful shorthand is .
which refers to your present directory. So the following command should report the same output as the ls
command by itself.
ls .
In contrast, ..
refers to the “parent” directory of your current working directory (i.e., one level up in the file system). So try the following command to see what it reports.
ls ..
All of the examples so far represent shorthand ways to refer to locations on the machine. However, you can always refer to the “path” for a location, which consists of a series of directory names separated by /
characters. Paths can be specified starting from the current working directory (a “relative” path) or starting from the “root” of the entire file system (an “absolute” path). You will often need to do this when referring to files or programs that are not in your current working directory.
The pwd
command that you used earlier should have given you the full absolute path to your current working directory. Enter the ls
command with that full path and confirm that it produces the same output as ls .
and ls
by itself.
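The comparison described above can be sketched as follows. This is a minimal illustration that assumes a bash-like shell. The $(pwd) construct, which substitutes the output of pwd into the command, is not covered elsewhere in this exercise and is used here only to avoid typing the absolute path by hand.

```shell
# Three ways to list the same directory; all should print the same names.
ls            # no directory given, so the current working directory is assumed
ls .          # "." is shorthand for the current working directory
ls "$(pwd)"   # $(pwd) is replaced by the absolute path that pwd prints

# A relative path and an absolute path to the parent directory.
ls ..            # relative: one level up from the current directory
ls "$(pwd)/.."   # absolute: starts from the root of the file system
```

If you prefer, you can instead type out the path that pwd reported after ls; the result should be the same.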
To generate a new directory you can use the mkdir
command. Let’s make a new directory called unix_temp
.
mkdir unix_temp
You can use the mv
command to move a file (you can also rename a file when you move it or even leave it in the same place and rename it). Let’s move the birthday.txt
file into the unix_temp
directory that we just created.
mv birthday.txt unix_temp
Confirm that you successfully moved the file by listing the files in unix_temp
with ls
.
ls unix_temp
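The mv command can also rename a file, as mentioned above. The following is a small sketch of both cases. The names draft.txt, final.txt, notes.txt, and archive_demo are hypothetical and are not part of the workshop files.

```shell
# Hypothetical file and directory names, used only for illustration.
touch draft.txt                       # create an empty practice file
mv draft.txt final.txt                # same directory, new name: a rename
mkdir archive_demo                    # a practice directory to move the file into
mv final.txt archive_demo/notes.txt   # move and rename in a single step
ls archive_demo                       # lists notes.txt
```

The practice directory archive_demo is left in place so you can inspect it. It can be deleted later with the rm command described at the end of this exercise.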
Now, move into the directory we just created.
cd unix_temp
echo
is a simple command that just prints back whatever it is given. Try it out by entering echo
followed by your name.
echo YOUR NAME HERE
## YOUR NAME HERE
You should see that this printed your name to the screen. By default, most programs will print their output to the screen. However, you can redirect the output to a file by using >
or >>
followed by the filename. There is a very important difference between these two options. >
will replace all the current contents of that file with the output (or create a new file if one does not already exist). In contrast, >>
will append the output to the bottom of the content that is already in the existing file.
So let’s repeat the above command but this time redirect the output to our new birthday.txt
file.
echo YOUR NAME HERE > birthday.txt
And now add a line that states your birthday. Notice that we will now use >>
so we do not erase what we added in the previous command.
echo birthday: YOUR BIRTHDAY HERE >> birthday.txt
Then add a line that states what day of the week you were born on. If you do not know that, you can look it up with the cal command, giving the month as a number (1 to 12) and the year in full (all four digits): cal YOUR_MONTH YOUR_YEAR
.
echo born on: YOUR DAY OF THE WEEK >> birthday.txt
Let’s confirm that we successfully added content to our birthday.txt
file. There are multiple ways to view the contents of a file. First, the cat
(concatenate) command will print the contents of a file to the screen.
cat birthday.txt
## YOUR NAME HERE
## birthday: YOUR BIRTHDAY HERE
## born on: YOUR DAY OF THE WEEK
The cat
command is a fine option for a small file like this, but for a large file it could take a very long time and print too much content at once. In such cases, less
is a better program because it lets you view a file one section at a time and page through it.
less birthday.txt
To exit a less
session and go back to the command prompt, type q
.
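Now that you can view files, it may help to see the difference between the two redirection operators side by side. The following sketch uses a hypothetical scratch file named redirect_demo.txt so that your birthday.txt file is not affected.

```shell
# redirect_demo.txt is a hypothetical scratch file, used only for illustration.
echo first line > redirect_demo.txt     # creates the file with one line
echo second line >> redirect_demo.txt   # appends, so the file now has two lines
cat redirect_demo.txt                   # prints "first line" then "second line"

echo third line > redirect_demo.txt     # overwrites the previous contents
cat redirect_demo.txt                   # prints only "third line"
```

Because > silently replaced the first two lines, it is worth double-checking which operator you have typed before pressing return/enter.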
Often you will want to edit a text file. Just like you might use programs like TextEdit, Notepad, or Microsoft Word, you can use text editors that you call from the command line. One of the simplest is called nano
. Open your file in nano
with the following command.
nano birthday.txt
You should see a window like the one below.
Use the arrow keys to navigate to the bottom of the text. Then type “age: XX years old”, where XX is your age. To save your changes, type ctrl-o
and hit return/enter. Then exit the nano
session by typing ctrl-x
. You can confirm that you have successfully edited your file by viewing its contents with cat
or less
as above.
To remove your file (permanently; there is no undoing this!), you can use the rm
command as follows. Note that if you wanted to delete an entire directory and all its contents (be even more careful with this!), you would use rm -r
followed by the directory name.
rm birthday.txt
Then use ls
to confirm that the file is now gone.
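Deleting a directory with rm -r can be sketched as follows. The directory old_results and the files inside it are hypothetical; they are created here only so that there is something safe to delete.

```shell
# old_results is a hypothetical practice directory, created only to be deleted.
mkdir old_results
touch old_results/run1.txt old_results/run2.txt
ls old_results       # lists run1.txt and run2.txt

rm -r old_results    # removes the directory and everything inside it
ls                   # old_results is no longer listed
```

As with rm on a single file, there is no undo, so check the directory name carefully before running rm -r.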
Try using the commands that you just learned to accomplish the following tasks. During the exercise, try using the following extra features:
tab: complete the name of a file or program after typing part of it
ctrl+a: go to the beginning of the command line
ctrl+e: go to the end of the command line
ctrl+u: delete text you have typed at the command line
New exercise
Create a new directory named dna
within the unix_temp
directory.
Create a new file named dna.txt
within the dna
directory. Confirm that your new file and new directory are located where you expect.
Insert a 12-nt DNA sequence in the 5’-3’ direction into the dna.txt
file.
Insert 12 pipes (i.e., '||||||||||||'
– be sure to include quotes around the pipes because pipes have a special meaning on the command line) into the dna.txt
file.
Add the reverse complement of the original DNA sequence in the 3’-5’ direction to the file.
Display the contents of the dna.txt
file in the terminal. Are the sequences properly base-paired?
From the command line, delete the entire dna
directory (including the dna.txt
file).
Confirm that the dna
directory was deleted.
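Once you have attempted the exercise yourself, you can compare your approach with the sketch below. It is only one possible solution, it assumes you are working inside the unix_temp directory, and the DNA sequence is an arbitrary example chosen for illustration.

```shell
# One possible solution; the sequence ATGCGTACCTGA is an arbitrary example.
mkdir dna                              # new directory inside the current directory
touch dna/dna.txt                      # new blank file inside dna
ls dna                                 # confirms that dna.txt is where we expect

echo ATGCGTACCTGA > dna/dna.txt        # 12-nt sequence, written 5'-3'
echo '||||||||||||' >> dna/dna.txt     # quoted so the shell does not interpret the pipes
echo TACGCATGGACT >> dna/dna.txt       # complementary strand, written 3'-5'
cat dna/dna.txt                        # each base should sit below its partner

rm -r dna                              # delete the directory and the file inside it
ls                                     # dna should no longer be listed
```

Note that only the first echo uses >, while the other two use >> so that each new line is added below the previous one.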
Once you have completed the exercise, change back into the unix_commands
directory. Remember that ..
is a shortcut reference to the “parent directory” for your current directory. Practice using some of the commands in this summary file. Commands for viewing or manipulating files can be practiced on the miRNA.fa
file. Try executing the welcome.sh
script from the command line.