Background

As you begin to develop your own scripts (in Bash, R, Python, Perl, etc.) for performing bioinformatic analyses, three challenges will quickly emerge: 1) keeping track of different versions of code as you edit and add to your scripts, 2) publishing and sharing your code with the research community, and 3) collaborating with other researchers in such a way that multiple individuals can modify parts of the code at the same time. GitHub offers a means to address all three of these challenges. GitHub is based on software called Git, which performs “version control” by tracking the history of changes in a file. GitHub is a centralized location to store code in publicly available repositories (sometimes called ‘repos’ for short). These repositories can then be “cloned” to local machines so that many users can run your code. You can also modify your files and re-upload them to the GitHub repository, which will store multiple versions and capture the timeline of editing. Although we will not address collaborative development of code in this exercise, GitHub also offers tools to manage and merge alternative versions of code that are being modified in parallel by different developers.

Objectives

The goal of this exercise will be to introduce you to the basic functionality of GitHub. If you do not already have one, you will create a GitHub account. You will then create a new public repository on GitHub and clone it to your local machine, where you will edit it and add content. Finally, you will push those modifications back to the public GitHub repository so that your changes are available to the world.

Software and Dependencies

Protocol

1. Create a GitHub account

If you do not already have a GitHub account, use a web browser and navigate to the GitHub website. Follow the instructions to create an account. The core functionality of a GitHub account is free, so do not choose any options that require paying money. You can skip over the questions about how you will use GitHub. Just sign up for the free version.


2. Create a new GitHub repository

From your GitHub homepage, click the “Create Repository” button. You can also reach your repositories page by choosing the “Your repositories” option from the pull-down menu under your user icon.



On the resulting screen…

  • Give your repository a name such as “todos_santos_test”.
  • Check the option to “Add a README file”.
  • Finally, click the “Create repository” button.



3. Clone your newly created repository

First, copy the address for cloning your newly created repository from the “Code” dropdown menu (HTTPS option).

Open a terminal session. If you have the Git command-line tools installed on your machine, you can perform this exercise locally. If not, you can ssh to the Linux server for the workshop.
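
A quick way to find out whether Git is available on the current machine is sketched below. The variable name git_status is only illustrative.

```shell
# Report whether the git command is on the PATH and, if so, which version it is.
if command -v git >/dev/null 2>&1; then
    git_status="installed: $(git --version)"
else
    git_status="not installed"
fi
echo "$git_status"
```

If this reports "not installed", use the workshop server instead.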

Now, change directories (cd) to a location where you would like to download the GitHub repository that you just made.

Then enter the following command to download your repository, replacing GITHUB_REPO with the address you copied above.

git clone GITHUB_REPO

Once this action completes, you should see that it has created a new directory called todos_santos_test, which contains a single file called README.md (technically, there are also some hidden files in a directory called .git, which Git will use to track changes within this directory).
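
To confirm what the clone produced, something like the following can be run from the directory where you ran git clone. It is a minimal sketch that assumes the repository name todos_santos_test used in this exercise; the variables repo_dir and clone_found are only illustrative.

```shell
# Inspect the cloned directory, assuming it is named todos_santos_test.
repo_dir="todos_santos_test"
if [ -d "$repo_dir/.git" ]; then
    ls -a "$repo_dir"               # lists the contents, including the hidden .git directory
    git -C "$repo_dir" remote -v    # shows the address the clone came from
    clone_found="yes"
else
    echo "No clone found at $repo_dir; run this from the directory where you ran git clone."
    clone_found="no"
fi
```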


4. Edit and add files to the repository

Open the README.md file in the text editor of your choice. This is a simple text file written in the language Markdown. This language has a relatively user-friendly and readable syntax that can be converted to HTML for displaying to users on the GitHub website. You would typically use this document to provide information and instructions to users of your code. For our purposes today, just add any text you want (e.g., your name, the date, whatever is on your mind, etc.). Do this with a text editor and make sure to save the file.


In addition to editing the README.md, let’s add a new file to this folder. This can be done with the echo command (or just by copying any file of your choosing into the directory).

cd todos_santos_test
echo "any text of your choosing" > new_file.txt


Before pushing your changes to the GitHub repository, you will need to do a couple of things to configure the current machine for interacting with your GitHub account. Enter the following commands:

git config --global user.email YOUR_EMAIL_ADDRESS
git config --global user.name YOUR_GITHUB_USERNAME
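
To check which values Git has stored, the same command can be run without a new value, in which case it prints the current setting. The sketch below only reads the configuration; the "(not set)" fallback text is an illustrative choice, not Git output.

```shell
# Print the identity Git will attach to commits; "(not set)" means the value is missing.
email=$(git config --global user.email || echo "(not set)")
name=$(git config --global user.name || echo "(not set)")
echo "user.email: $email"
echo "user.name:  $name"
```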

You will also need to generate a personal access token by following these instructions. For step 8 in these instructions, you should check the repo box. Save the personal access token that is generated. You will use it like a password.


5. Push your edits to GitHub

The repository on your local machine now differs from the one you cloned from GitHub. To update GitHub to reflect your changes, follow this short series of commands.

First, let’s confirm that Git is properly tracking our edits with the following command.

git status

You should see that your README.md file is flagged as being modified and new_file.txt is flagged as being an untracked (i.e., new) file.


Now enter the following command to add your changes such that they are included next time you “commit” your changes. The -A flag applies this command to all files (you could also add individual files of your choosing).

git add -A
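
As an illustration of adding individual files rather than everything, the sketch below stages only one of two new files. To keep it self-contained, it assumes a throwaway repository in a temporary directory (the scratch variable) instead of your real todos_santos_test clone.

```shell
# Stage one file at a time instead of everything with -A.
# Assumption: a throwaway repository stands in for todos_santos_test.
scratch=$(mktemp -d)
git init -q "$scratch"
echo "readme text" > "$scratch/README.md"
echo "any text of your choosing" > "$scratch/new_file.txt"

# Stage only new_file.txt; README.md is left out of the next commit.
git -C "$scratch" add new_file.txt

# Short status: "A" marks a newly staged file, "??" marks an untracked file.
staged=$(git -C "$scratch" status --short)
echo "$staged"
```

In your own repository, the equivalent would be git add followed by the names of the files you want to include.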


Now commit your changes with the git commit command. The -m flag allows you to add a message associated with this commit. It is always a good idea to give a short message that indicates the main point of this update.

git commit -m 'My first GitHub commit!'


Finally, push these changes to GitHub. This command will actually update the public GitHub repository based on the changes you committed above. Use your GitHub account name for the Username, but use your personal access token generated above (not your GitHub password) for the Password. Also, there are ways to cache your login credentials so that you do not have to enter your username and personal access token each time.

git push
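
One possible way to cache credentials is Git's built-in "cache" credential helper, which keeps them in memory for a limited time. The sketch below assumes that helper is available on your system (it typically is on Linux; macOS and Windows usually rely on other helpers), and the one-hour timeout is an arbitrary choice.

```shell
# Keep credentials in memory for one hour (3600 seconds) after they are first entered.
# Assumption: the "cache" helper is available on this machine.
git config --global credential.helper 'cache --timeout=3600'

# Confirm the setting that was stored.
helper=$(git config --global credential.helper)
echo "credential.helper: $helper"
```

After this, the next git push should still prompt once, and later pushes within the hour should not.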


Now return to your repository on GitHub (and reload the page if necessary). You should now see that it contains the updated README.md file and the new new_file.txt file. In addition, if you click on the link to “2 commits”, you can see that you can navigate through both the old version and the updated version of the repository.
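
The whole cycle from steps 3 to 5 can also be rehearsed offline. The sketch below is not part of the exercise itself: it assumes a local "bare" repository in a temporary directory as a stand-in for GitHub, and it uses a placeholder identity through environment variables so that your global configuration is left alone. All directory names and the commit_count variable are illustrative.

```shell
# Offline rehearsal of the clone / edit / add / commit / push cycle.
# Assumption: a local bare repository stands in for GitHub, so no account or token is needed.
sandbox=$(mktemp -d)

# Placeholder identity for this rehearsal only; the global Git configuration is not touched.
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Build the stand-in remote with one commit containing a README.
git init -q "$sandbox/seed"
echo "# todos_santos_test" > "$sandbox/seed/README.md"
git -C "$sandbox/seed" add -A
git -C "$sandbox/seed" commit -q -m 'Initial commit'
git clone -q --bare "$sandbox/seed" "$sandbox/remote.git"

# Clone, edit, add, commit, and push, as in steps 3 to 5.
git clone -q "$sandbox/remote.git" "$sandbox/todos_santos_test"
echo "Edited during the rehearsal" >> "$sandbox/todos_santos_test/README.md"
echo "any text of your choosing" > "$sandbox/todos_santos_test/new_file.txt"
git -C "$sandbox/todos_santos_test" add -A
git -C "$sandbox/todos_santos_test" commit -q -m 'My first rehearsal commit'
git -C "$sandbox/todos_santos_test" push -q

# The stand-in remote should now list two commits, newest first.
git -C "$sandbox/remote.git" log --oneline
commit_count=$(git -C "$sandbox/remote.git" rev-list --count HEAD)
```

The git log --oneline command shown here also works inside your real clone and gives a command-line view of the same commit history that GitHub displays.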




The steps we just learned are sufficient for posting your scripts on GitHub (something that journals are increasingly likely to require when you publish your research) and for keeping track of the history of updates as you edit your code over time. However, Git and GitHub have much more functionality for managing coding projects, especially when you are working together with other researchers. If you would like to start exploring these additional functions, you could start with this introductory tutorial from GitHub.