Creating and Tracking with Git

Overview

Teaching: 20 min
Exercises: 20 min
Questions
  • Where does Git store information?

  • How do I record changes in Git?

  • How do I record notes about what changes I made and why?

Objectives
  • Create a local Git repository.

  • Go through the modify-add-commit cycle for one or more files.

Now that Git is configured, let’s use it. We’ll work in a folder called TheDataShop, so let’s create that directory (or folder) on the Desktop and then move into it. Use the pwd command to check where you are, and the cd command to change your working directory.

$ pwd

# go to Desktop
$ cd Desktop

# make folder
$ mkdir TheDataShop

# go into that directory
$ cd TheDataShop

# check TheDataShop contents (list files)
$ ls

There is nothing, as expected. To show hidden files, add the flag -a.

$ ls -a 
./  ../ 

At this point we have the expected output. Let’s add two files to this folder: the agenda and a notes file. Download the agenda (TDS-Agenda.docx) and add it to your directory. Then use the command below to create a file named “notes.txt” that contains the text “Day 1 notes”.

$ echo "Day 1 notes" > notes.txt

#Now, listing contents, we see the added file.
$ ls
notes.txt     TDS-Agenda.docx

Let’s read the contents of the file with the cat command.

$ cat notes.txt

We can now ask whether our new file, notes.txt, is being tracked. We can do this with the git status command.

$ git status
fatal: Not a git repository (or any of the parent directories): .git

This message means that the TheDataShop folder is not under the control of Git, and none of the documents within this folder are being tracked.

To place a folder under Git control, we need to initialize our TheDataShop folder to make it a repository—a place where Git can store versions of our files:

#check that you are in TheDataShop
$ pwd

#check the contents
$ ls

#initialize TheDataShop directory with Git
$ git init
Initialized empty Git repository in .../TheDataShop/.git/

#check contents to see the added directory
$ ls -a
./  notes.txt
../  TDS-Agenda.docx
.git/

A folder that contains a .git sub-directory (in this case, TheDataShop) is called a repository. Git uses this special .git sub-directory to store all the information about the project, including the history of the tracked files and sub-directories located within the project’s directory. If we ever delete the .git sub-directory, we will lose the project’s history.
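To make the role of the .git sub-directory more concrete, here is a small sketch that initializes a repository in a throwaway directory and lists what Git put there. The scratch directory created with mktemp is our own assumption, chosen so that nothing in TheDataShop is touched; the exact entries listed can vary between Git versions.

```shell
set -e

# Assumption: work in a throwaway directory, not in TheDataShop.
scratch=$(mktemp -d)
cd "$scratch"

git init --quiet

# List what Git created. Entries such as HEAD, config, objects and refs
# are typical, but the exact list depends on your Git version.
ls .git

# In the default layout, HEAD is a small text file that names the branch
# the next commit will be recorded on.
cat .git/HEAD
```

Deleting this .git directory would delete everything Git knows about the project, which is why the history is lost if it is removed.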

We can check that everything is set up correctly by asking Git to tell us the status of our project. Let’s try the git status command now.

$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        TDS-Agenda.docx
        notes.txt

nothing added to commit but untracked files present (use "git add" to track)

You can see that initializing a directory makes it visible to Git.

Activity 3A: Places to Create Git Repositories

Along with tracking information for your workshop (the project we have already created), suppose you would also like to track information about each lesson. Despite a collaborator’s concerns, you create a Lesson1 project inside your TheDataShop directory with the following sequence of commands:

$ cd ..             # goes up a directory
$ cd TheDataShop    # go into TheDataShop directory, which is already a Git repository
$ ls -a          # ensure the .git sub-directory is still present in the TheDataShop directory
$ mkdir Lesson1    # make a sub-directory TheDataShop/Lesson1
$ cd Lesson1       # go into Lesson1 sub-directory
$ git init       # make the Lesson1 sub-directory a Git repository
$ ls -a          # ensure the .git sub-directory is present indicating we have created a new Git repository

Is the git init command, run inside the Lesson1 sub-directory, required for tracking files stored in the Lesson1 sub-directory?

Solution

No. You don’t need to make the Lesson1 sub-directory into a Git repository because the TheDataShop repository will track all files, sub-directories, and sub-directory files under the TheDataShop directory. Thus, in order to track all information about Lesson1, you only needed to add the Lesson1 sub-directory to the TheDataShop directory.

Additionally, Git repositories can interfere with each other if they are “nested” in the directory of another: The outer repository will try to version-control the inner repository. Therefore, it’s best to create each new Git repository in a separate directory. To be sure that there is no conflicting repository in the directory, check the output of git status. If it looks like the following, you are good to go to create a new repository as shown above:

$ git status
fatal: Not a git repository (or any of the parent directories): .git
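Another way to see which repository, if any, already covers a directory is git rev-parse --show-toplevel, which prints the top-level directory of the enclosing repository. The sketch below uses throwaway stand-ins for TheDataShop and Lesson1 in a temporary directory (our assumption, so your real folders are left alone).

```shell
set -e

# Assumption: a temporary directory stands in for TheDataShop.
# pwd -P resolves symbolic links so the path matches what Git reports.
outer=$(cd "$(mktemp -d)" && pwd -P)
cd "$outer"
git init --quiet

# A plain sub-directory, standing in for Lesson1 (no git init inside it).
mkdir Lesson1
cd Lesson1

# Ask Git which repository contains the current directory.
# It reports the outer directory, so Lesson1 is already covered.
toplevel=$(git rev-parse --show-toplevel)
echo "$toplevel"
```

If the command fails with the "fatal: Not a git repository" message instead, no enclosing repository exists and it is safe to run git init there.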

Correcting git init Mistakes

Since a nested repository is redundant and may cause confusion down the road, you would like to remove the nested repository. How can you undo your last git init in the Lesson1 sub-directory?

Solution – USE WITH CAUTION!

To recover from this little mistake, just remove the .git folder in the Lesson1 sub-directory by running the following command from inside the TheDataShop directory:

$ rm -rf Lesson1/.git

But be careful! Running this command in the wrong directory will remove the entire Git history of a project you might want to keep. Therefore, always check your current directory using the command pwd.
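One way to build that check into the removal itself is sketched below. It recreates the nested situation in a throwaway directory (an assumption made so the example cannot damage a real project), prints the current directory, and only deletes Lesson1/.git if that path actually exists.

```shell
set -e

# Assumption: a temporary directory stands in for TheDataShop.
outer=$(mktemp -d)
cd "$outer"
git init --quiet            # the repository we want to keep
mkdir Lesson1
git init --quiet Lesson1    # the accidental nested repository

# Confirm where we are before deleting anything.
pwd

# Delete only the nested .git directory, and only if it is really there.
nested="Lesson1/.git"
if [ -d "$nested" ]; then
    rm -rf "$nested"
    echo "removed $nested"
else
    echo "no $nested here; nothing removed"
fi
```

Because the path names Lesson1 explicitly, the outer .git directory is never a target of the rm command.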

If you are still in Lesson1, navigate back to TheDataShop using the cd command. Recall that git status told us which files are in the directory and what their status is. In our case, Git reported that notes.txt (and the agenda, if you added it) is untracked. The “untracked files” message means that there’s a file in the directory that Git isn’t keeping track of. Git also told us to use the git add command to start tracking a file:

$ git add notes.txt

$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

        new file:   notes.txt

The current version of notes.txt is now staged, that is, ready to be recorded by Git. The git status output above confirms that Git has noticed the new file. To record the staged version of notes.txt, we use the git commit command.

#commit changes
$ git commit -m "Start notes for data workshop"

[master (root-commit) 76604e5] Start notes for data workshop
 1 file changed, 1 insertion(+)
 create mode 100644 notes.txt

Git insists that we add files to the set we want to commit before actually committing anything. This allows us to commit our changes in stages and capture changes in logical portions rather than only large batches. For example, suppose we’re adding a few citations to relevant research to our thesis. We might want to commit those additions, and the corresponding bibliography entries, but not commit some of our work drafting the conclusion (which we haven’t finished yet).

To allow for this, Git has a special staging area where it keeps track of things that have been added to the current change-set but not yet committed. When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit (or revision), and its short identifier is 76604e5. (Your commit will have a different identifier.)
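The staging idea can be sketched with two files, only one of which is ready to be recorded. The file names, the scratch directory, and the user.name and user.email values below are hypothetical placeholders chosen for illustration; the identity is set only for the throwaway repository.

```shell
set -e

# Assumption: a throwaway repository with a placeholder identity.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Workshop Learner"
git config user.email "learner@example.com"

# Two pieces of work: one finished, one still in progress.
echo "Smith 2018" > citations.txt
echo "Unfinished draft" > conclusion.txt

# Stage and commit only the finished file.
git add citations.txt
git commit --quiet -m "Add first citation"

# The unfinished file is left out of the commit and is still untracked.
# In the short status format, ?? marks an untracked file.
status=$(git status --short)
echo "$status"
```

Only what was added with git add ended up in the commit; conclusion.txt can be staged and committed later, once it is ready.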

We use the -m flag (for “message”) to record a short, descriptive, and specific comment that will help us remember later on what we did and why. If we just run git commit without the -m option, Git will launch BBEdit (or whatever other text editor you have configured as core.editor) so that we can write a longer message.

Good commit messages start with a brief (<50 characters) summary of changes made in the commit. If you want to go into more detail, add a blank line between the summary line and your additional notes.
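A summary line plus further notes can also be written without opening an editor: when -m is given more than once, Git joins the values as separate paragraphs with a blank line between them. The sketch below assumes a throwaway repository and a placeholder identity; the wording of the second paragraph is our own illustration.

```shell
set -e

# Assumption: a throwaway repository with a placeholder identity.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Workshop Learner"
git config user.email "learner@example.com"

echo "Day 1 notes" > notes.txt
git add notes.txt

# First -m: the short summary line. Second -m: the additional notes.
git commit --quiet \
    -m "Start notes for data workshop" \
    -m "Create notes.txt to hold daily notes from the workshop sessions."

# %s is the summary (subject) line, %b is the body that follows the blank line.
subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "$subject"
echo "$body"
```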

Now run git status again.

#check status
$ git status
On branch master
nothing to commit, working tree clean

Everything is up to date!

You can check the history of your commits with git log. This command lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the git commit command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created:

$ git log

commit 76604e57022d52b2ebf3c92d0f5358354850fa37 (HEAD -> master)
Author: pow123 <peace@uta.edu>
Date:   Sun Feb 24 15:15:00 2018 -0600

    Start notes for data workshop
    
# or, for a more compact view with one line per commit
$ git log --oneline
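To see the reverse chronological order in action, the following sketch makes two commits in a throwaway repository and then lists them. The second commit message and the placeholder identity are assumptions made for illustration.

```shell
set -e

# Assumption: a throwaway repository with a placeholder identity.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Workshop Learner"
git config user.email "learner@example.com"

# First pass through the modify-add-commit cycle.
echo "Day 1 notes" > notes.txt
git add notes.txt
git commit --quiet -m "Start notes for data workshop"

# Second pass: change the file, stage it, commit it.
echo "Day 2 notes" >> notes.txt
git add notes.txt
git commit --quiet -m "Add day 2 notes"

# One line per commit, newest first.
git log --oneline

# The message of the most recent commit only.
newest=$(git log -1 --format=%s)
```

The commit made last appears at the top of the listing.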

In summary, the steps to track changes in your documents with Git are: initialize the repository once with git init, stage the files you want to record with git add, and save the staged snapshot with git commit.

The output of git log tells you the history of your changes. Your commit messages are very important: if you ever want to restore an old version of a document, they will help you pick out the version you want.

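Your versions (or commits) have unique identifiers. In addition, the most recent commit can be referred to as HEAD. It is also good practice to always review changes before saving them. We do this using git diff. Given only a file name, it shows the differences between the current state of the file and the most recently staged or committed version. Given two commit identifiers, as below, it shows how the file differs between those two commits (at this point both refer to the same commit, so nothing is printed until you have made further commits):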

#you can specify only first few characters of the commit identifier.
$ git diff 76604e5 HEAD notes.txt
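The sketch below shows how git diff behaves before and after staging a change. It runs in a throwaway repository with a placeholder identity (both assumptions), and the "Day 2 notes" line is a hypothetical edit.

```shell
set -e

# Assumption: a throwaway repository with a placeholder identity.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Workshop Learner"
git config user.email "learner@example.com"

echo "Day 1 notes" > notes.txt
git add notes.txt
git commit --quiet -m "Start notes for data workshop"

# Edit the file without staging it.
echo "Day 2 notes" >> notes.txt

# Review the unstaged change: the added line is prefixed with +.
git diff notes.txt
unstaged=$(git diff --name-only)

# After staging, plain git diff has nothing left to report ...
git add notes.txt
after_add=$(git diff --name-only)

# ... and --staged compares the staged version with the last commit.
staged=$(git diff --staged --name-only)
echo "staged for commit: $staged"
```

Reviewing with git diff before git add, and with git diff --staged before git commit, covers both stages of the cycle.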

Now, let’s see how to turn an existing directory into a git repository. You might want to track files for some of your existing projects. Let’s create a new directory: git_github on your Desktop. How will you place this directory under Git control?

#Go to Desktop
$ cd ..

#make the directory
$ mkdir git_github

#change into the Directory
$ cd ~/Desktop/git_github

#Add a file
$ echo "Testing Testing" > testfile.txt

#initialize
$ git init

#check status
$ git status

#prepare files for tracking
$ git add .

#commit changes 
$ git commit -m "added git_github directory"

#check commit history
$ git log

Now every file in this directory is being tracked. Notice that git add . stages everything in the current directory, so several files and folders can go into the same commit. If you want to know which files were included in a commit:

# print the list of files that are part of a given commit
# (replace 8af5f83 with an identifier from your own git log)
$ git show --name-only 8af5f83
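If you do not want to look up an identifier, HEAD can stand in for the most recent commit. The sketch below assumes a throwaway repository, a placeholder identity, and a hypothetical docs sub-directory; --pretty="" suppresses the commit header so that only file names are printed.

```shell
set -e

# Assumption: a throwaway repository with a placeholder identity.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Workshop Learner"
git config user.email "learner@example.com"

# One file at the top level and one inside a (hypothetical) sub-directory.
echo "Testing Testing" > testfile.txt
mkdir docs
echo "More testing" > docs/readme.txt

# Stage everything in the current directory and commit it in one go.
git add .
git commit --quiet -m "added git_github directory"

# List only the names of the files recorded in the latest commit.
files=$(git show --name-only --pretty="" HEAD)
printf '%s\n' "$files"
```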

As you continue working throughout this workshop, you will be adding new directories and files to this repository. Let’s try it.

Activity 3C: Tracking File Changes

Make a file called git_steps.txt inside git_github. In it, record the Git commands you need to start tracking your changes. Save this file and commit it to the git_github repository.

NOTE: git init should only be run one time in the root directory of your project.

Activity 3D: Committing Multiple Files

The staging area can hold changes from any number of files that you want to commit as a single snapshot.

  1. Add some text to git_steps.txt.
  2. Create a new file github_steps.txt and add text there.
  3. Add changes from both files to the staging area, and commit those changes.

Solution

First we add to git_steps.txt then we create github_steps.txt:

$ open -t git_steps.txt    # macOS: opens the file in the default text editor; any editor will do
$ cat git_steps.txt
Steps to using Git
$ echo "Steps to using GitHub" > github_steps.txt
$ cat github_steps.txt
Steps to using GitHub

Now you can add both files to the staging area. We can do that in one line:

$ git add git_steps.txt github_steps.txt

Or with multiple commands:

$ git add git_steps.txt
$ git add github_steps.txt

Now the files are ready to commit. You can check that using git status. If you are ready to commit use:

$ git commit -m "Added steps for Git and GitHub"
[master e6a7d94] Added steps for Git and GitHub
 2 files changed, 2 insertions(+)
 create mode 100644 git_steps.txt
 create mode 100644 github_steps.txt

Optional Activity: Modifying Files

  • Write a three-line biography for yourself in a file called about_me.txt and commit your changes.
  • Modify one line and add a fourth line.
  • Display the differences between its updated state and its original state.

Solution

Create your biography file about_me.txt using BBEdit or another text editor.

$ git add about_me.txt
$ git commit -m "Adding biography file"

Modify the file as described (modify one line, add a fourth line). To display the differences between its updated state and its original state, use git diff:

$ git diff about_me.txt

Optional: Author and Committer

For each of the commits you have done, Git stored your name twice. You are named as the author and as the committer. You can observe that by telling Git to show you more information about your last commits:

$ git log --format=full

When committing you can name someone else as the author:

$ git commit --author="Anne Example <anne@example.net>"

Create a new repository and create two commits: one without the --author option and one naming a colleague of yours as the author. Run git log and git log --format=full. Think about how this could help you collaborate with your colleagues.

Solution

$ git add about_me.txt
$ git commit -m "Update my bio." --author="Anne Example <anne@example.net>"
[master 963e793] Update my bio.
Author: Anne Example <anne@example.net>
1 file changed, 2 insertions(+), 2 deletions(-)

$ git log --format=full
commit 963e7931f16d91f1559197cb91d819fb87ca06e1
Author: Anne Example <anne@example.net>
Commit: pow123 <peace@uta.edu>

    Update my bio.

commit aaa3271e5e26f75f11892718e83a3e2743fab8ea
Author: pow123 <peace@uta.edu>
Commit: pow123 <peace@uta.edu>

    My initial bio.
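For a more compact comparison of the two roles, git log accepts a custom format in which %an stands for the author name and %cn for the committer name. The sketch below assumes a throwaway repository in which "Workshop Learner" is a placeholder for your own configured identity.

```shell
set -e

# Assumption: a throwaway repository; the identity below is a placeholder
# for whoever is actually running the commands (the committer).
cd "$(mktemp -d)"
git init --quiet
git config user.name "Workshop Learner"
git config user.email "learner@example.com"

# First commit: author and committer are the same person.
echo "My initial bio." > about_me.txt
git add about_me.txt
git commit --quiet -m "My initial bio."

# Second commit: credit a colleague as the author.
echo "One more line." >> about_me.txt
git add about_me.txt
git commit --quiet -m "Update my bio." --author="Anne Example <anne@example.net>"

# Short identifier, author name and committer name, one commit per line.
git log --format='%h  author: %an  committer: %cn'

latest=$(git log -1 --format='%an|%cn')
first=$(git log --format='%an|%cn' | tail -n 1)
```

The author is whoever wrote the change, while the committer is whoever recorded it in the repository, so a colleague's contribution can be credited even when you make the commit.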

Key Points

  • git init initializes a repository.

  • Git stores all of its repository data in the .git directory.