As you begin to develop your own scripts (in Bash, R, Python, Perl, etc.) for performing bioinformatic analyses, three challenges will quickly emerge: 1) keeping track of different versions of code as you edit and add to your scripts, 2) publishing and sharing your code with the research community, and 3) collaborating with other researchers in such a way that multiple individuals can modify parts of the code at the same time. GitHub offers a means to address all three of these challenges. GitHub is built on software called Git, which performs “version control” by tracking the history of changes to files. GitHub itself is a centralized location for storing code in repositories, which can be made publicly available. These repositories can then be “cloned” to local machines so that many users can run your code. You can also modify your files and upload them again to the GitHub repository, which stores each version and captures the timeline of editing. Although we will not address collaborative development of code in this exercise, GitHub also offers tools to manage and merge alternative versions of code that are being modified in parallel by different developers.
The goal of this exercise will be to introduce you to the basic functionality of GitHub. If you do not already have one, you will create a GitHub account. You will then create a new public repository on GitHub and clone it to your local machine, where you will edit it and add content. Finally, you will push those modifications back to the public GitHub repository so that your changes are available to the world.
If you do not already have a GitHub account, use a web browser and navigate to the GitHub website. Follow the instructions to create an account. The core functionality of a GitHub account is free, so do not choose any options that require paying money.
Using a web browser, navigate to your GitHub repository page (https://github.com/XXXUSERXXX?tab=repositories), where XXXUSERXXX is the user name you just created for your GitHub account. You will also need to be signed in to your account, which you already will be if you just created the account in step 1.
From this page, click the “New” button to create a new repository.
On the resulting screen…
Open a terminal session. If you have the Git command-line tools installed on your machine, you can perform this exercise locally. If not, you can ssh to the Linux server for the workshop.
Now, change directories (cd) to a location where you would like to download the GitHub repository that you just made.
Then enter the following command to download your repository, where XXXUSERXXX is your GitHub user name.
git clone https://github.com/XXXUSERXXX/todos_santos_test.git
Once this action completes, you should see that it has created a new directory called todos_santos_test, which contains a single file called README.md (technically, there are also some hidden files in a directory called .git, which Git will use to track changes within this directory).
Open the README.md file in the text editor of your choice. This is a simple text file written in the language Markdown. This language has a relatively user-friendly and readable syntax that can be converted to HTML for displaying to users on the GitHub website. You would typically use this document to provide information and instructions to users of your code. For our purposes today, just add any text you want (e.g., your name, the date, whatever is on your mind, etc.). But make sure to add something and then save the file. If you like, you can do this from the command line:
cd todos_santos_test
echo adding some new content >> README.md
In addition to editing the README.md file, let’s add a new file to this folder. This can also be done with the echo command (or just by copying any file of your choosing into the directory).
echo some text of your choosing > new_file.txt
Now the repository on your local machine is different from the one you cloned from GitHub. If you want to update GitHub to reflect your changes, there is a simple series of commands to follow.
First, let’s confirm that Git is properly tracking our edits with the following command.
git status
You should see that your README.md file is flagged as being modified and new_file.txt is flagged as being an untracked (i.e., new) file.
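For a more compact view, git status also accepts a --short flag. The sketch below is only an illustration and is not part of the exercise: it builds a throwaway repository in a temporary directory, repeats the two edits from above, and prints the short status. The directory, the initial README.md text, and the identity values are placeholders chosen for this example.

```shell
# Illustration only: a throwaway repository stands in for your clone,
# so nothing here touches todos_santos_test.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet

# Placeholder identity, supplied through environment variables for this session only.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Start from a committed README.md, as in a fresh clone.
echo "# todos_santos_test" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Repeat the two edits from the exercise.
echo adding some new content >> README.md
echo some text of your choosing > new_file.txt

# " M" marks a modified tracked file; "??" marks an untracked file.
short_status=$(git status --short)
echo "$short_status"
```

In the short format, each line starts with a two-character status code followed by the file name.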
Now enter the following command to stage your changes so that they are included the next time you “commit”. The -A flag applies this command to all files (you could also add individual files of your choosing).
git add -A
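If you would rather stage files one at a time, git add also accepts file names, and git diff --cached --name-only lists what is currently staged. The following is a minimal sketch in a scratch repository; the temporary directory and the identity values are placeholders for this example, not part of the exercise.

```shell
# Illustration only: work in a throwaway repository.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet

# Placeholder identity for this session only.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Recreate the state of the exercise: one committed file, then two edits.
echo "# todos_santos_test" > README.md
git add README.md
git commit --quiet -m "Initial commit"
echo adding some new content >> README.md
echo some text of your choosing > new_file.txt

# Stage only README.md; new_file.txt stays untracked for now.
git add README.md
staged=$(git diff --cached --name-only)
echo "staged after 'git add README.md':"
echo "$staged"

# Stage everything that remains, which is what git add -A does in one step.
git add -A
staged_all=$(git diff --cached --name-only)
echo "staged after 'git add -A':"
echo "$staged_all"
```

Staging files individually is useful when you want a commit to contain only some of your edits.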
Now commit your changes with the git commit command. The -m flag allows you to add a message associated with this commit. It is always a good idea to give a short message that indicates the main point of this update.
git commit -m 'My first GitHub commit!'
[Note: the first time you make a commit from a new computer, you may be prompted to run git config to set user.name and user.email. Follow the instructions in the prompt to do so. Then re-run the git commit command above.]
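The commands that Git asks for typically look like the sketch below. The name and email are placeholders. Adding --global after git config applies the settings to every repository for your user account; this example omits it and works in a scratch repository, so running it leaves your real configuration untouched.

```shell
# Illustration only: a throwaway repository, so the settings stay local to it.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet

# Placeholder identity; substitute your own name and the email tied to your GitHub account.
# Use "git config --global ..." instead to apply these to all repositories on this machine.
git config user.name "Example User"
git config user.email "user@example.com"

# Read the values back to confirm that they were stored.
configured_name=$(git config user.name)
configured_email=$(git config user.email)
echo "$configured_name <$configured_email>"
```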
Finally, push these changes to GitHub. This command will actually update the public GitHub repository based on the changes you committed above.
git push
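To rehearse the whole add, commit, and push cycle without touching your GitHub repository, the sketch below uses a local bare repository as a stand-in for GitHub. The stand-in remote is an assumption of this example, as are the directory names seed and remote.git and the identity values; everything is created in a temporary directory.

```shell
# Illustration only: everything lives in a temporary directory.
work=$(mktemp -d)
cd "$work"

# Placeholder identity for this session only.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Build a stand-in "GitHub" repository that contains only README.md.
git init --quiet seed
echo "# todos_santos_test" > seed/README.md
git -C seed add README.md
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed remote.git

# Clone, edit, stage, commit, and push, following the steps of the exercise.
git clone --quiet remote.git todos_santos_test
cd todos_santos_test
echo adding some new content >> README.md
echo some text of your choosing > new_file.txt
git add -A
git commit --quiet -m 'My first GitHub commit!'
git push --quiet

# The stand-in remote should now hold two commits and both files.
remote_commits=$(git -C "$work/remote.git" rev-list --count HEAD)
remote_files=$(git -C "$work/remote.git" ls-tree --name-only HEAD)
echo "commits on remote: $remote_commits"
echo "$remote_files"
```

Because the clone records where it came from, plain git push knows which remote and branch to update, which is also why no extra arguments are needed in the exercise.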
Now return to your repository on GitHub (and reload the page if necessary). You should now see that it contains the updated README.md file and the new new_file.txt file. In addition, if you click on the link to “2 commits”, you can see that you can navigate through both the old version and the updated version of the repository.
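The same history can also be inspected from the command line. As a sketch, again in a scratch repository with placeholder names and identity, git log --oneline lists the commits and git show HEAD~1:README.md prints the file as it was one commit earlier.

```shell
# Illustration only: a throwaway repository with two commits.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet

# Placeholder identity for this session only.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

echo "# todos_santos_test" > README.md
git add README.md
git commit --quiet -m "Initial commit"
echo adding some new content >> README.md
git add -A
git commit --quiet -m 'My first GitHub commit!'

# One line per commit, newest first.
git log --oneline

# Contents of README.md in the previous commit and in the current one.
old_readme=$(git show HEAD~1:README.md)
new_readme=$(git show HEAD:README.md)
echo "old version:"
echo "$old_readme"
echo "new version:"
echo "$new_readme"
```

Here HEAD refers to the most recent commit and HEAD~1 to the commit before it.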
The steps we just learned are sufficient for posting your scripts on GitHub (something that journals are increasingly likely to require when you publish your research) and for keeping track of the history of updates as you edit your code over time. However, Git and GitHub have much more functionality for managing coding projects, especially when you are working together with other researchers. If you would like to start exploring these additional functions, you could start with this introductory tutorial from GitHub.