31 Fork and clone
Use fork and clone to get a copy of someone else’s repo if there’s any chance you will want to propose a change to the owner, i.e. send a pull request. If you are waffling between “just clone” and “fork and clone”, go with “fork and clone”.
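We want to achieve this: a fork of the repo in your GitHub account, a local clone of that fork, and a connection from the local clone back to the original repo.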
Below we show a couple of methods for fork and clone and you should pick one:
- Use a combination of the browser, command line Git, and RStudio
- Via usethis::create_from_github()
Vocabulary: OWNER/REPO refers to what we call the source repo, owned by OWNER, who is not you.
YOU/REPO refers to your fork, i.e. your remote copy of the source repo, on GitHub.
This is the same vocabulary used elsewhere, such as the chapter on common remote configurations.
This is a good time to navigate to the GitHub repo of interest, i.e. the source repo OWNER/REPO.
31.1 Fork and clone without usethis
I assume you’re already visiting the source repo in the browser. In the upper right hand corner, click Fork.
This creates a copy of REPO in your GitHub account and takes you there in the browser.
Now we are looking at YOU/REPO.
Clone YOU/REPO, which is your copy of the repo, a.k.a. your fork, to your local machine.
Make sure to clone your repo, not the source repo.
Elsewhere, we describe multiple methods for cloning a remote repo.
Pick one:
- The cloning instructions in Existing project, GitHub first cover usethis and RStudio.
- The cloning instructions in Connect to GitHub show how to do this with command line Git.
Make a conscious decision about the local destination directory and HTTPS vs SSH URL.
31.1.1 Finish the fork and clone setup
If you stop at this point, you have what I regard as an incomplete setup, described elsewhere as “fork (salvageable)”.
This is sad, because there is no direct connection between your local copy of the repo and the source repo OWNER/REPO.
There are two more recommended pieces of setup:
- Configure the source repo as the upstream remote
- Configure your local main branch (or whatever the default is) to track upstream/main, not origin/main
The nickname upstream can technically be whatever you want.
There is a strong tradition of using upstream in this context and, even though I have better ideas, I believe it is best to conform.
Every book, blog post, and Stack Overflow thread that you read will use upstream here.
Save your psychic energy for other things.
These steps make it easier for you to stay current with developments in the source repo.
We talk more below about why you should never commit to the default branch, e.g. main, when you’re working in a fork (see 31.4).
31.1.2 Configure the upstream remote
The first step is to get the URL of the source repo OWNER/REPO.
Navigate to the source repo on GitHub.
It is easy to get to from your fork, YOU/REPO, via the “forked from” link in the upper left.
Use the big green “Code” button to get the URL for OWNER/REPO on your clipboard.
Be intentional about whether you copy the HTTPS or SSH URL.
You can configure the upstream remote with command line Git, usethis, or RStudio.
Here’s how to use command line Git in a shell:
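As a minimal sketch, assuming you copied the HTTPS URL, the key command is git remote add. The throwaway repo created with mktemp and git init is only a stand-in so the snippet can run anywhere; in your real clone, run just the git remote add line from inside the repo.

```shell
# Throwaway repo standing in for your local clone (illustration only)
cd "$(mktemp -d)"
git init -q

# Register the source repo under the nickname "upstream"
git remote add upstream https://github.com/OWNER/REPO.git

# Print the URL that was just recorded for "upstream"
git remote get-url upstream
```

If you chose SSH instead, substitute the SSH URL you copied from GitHub.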
usethis::use_git_remote() allows you to configure a Git remote.
Execute this in R:
usethis::use_git_remote(
  name = "upstream",
  url = "https://github.com/OWNER/REPO.git"
)
Finally, you can do this in RStudio, although it feels a bit odd. Click on “New Branch” in the Git pane (“two purple boxes and a white square”).
This will reveal a button to “Add Remote”.
Click it.
Enter upstream as the remote name and paste the URL for OWNER/REPO that you got from GitHub.
Click “Add”.
Decline the opportunity to add a new branch by clicking “Cancel”.
Regardless of how you configured upstream, do this in a shell:
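A minimal sketch of a check you might run here, assuming the aim is to confirm that both remotes are registered. The throwaway repo and the two git remote add lines only simulate a clone of your fork; in your real repo, git remote -v is all you need.

```shell
# Throwaway repo that mimics a clone of your fork (illustration only)
cd "$(mktemp -d)"
git init -q
git remote add origin https://github.com/YOU/REPO.git
git remote add upstream https://github.com/OWNER/REPO.git

# List every remote with its fetch and push URLs
git remote -v
```

You should see origin pointing at YOU/REPO and upstream pointing at OWNER/REPO.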
31.1.3 Set upstream tracking branch for the default branch
This is optional but highly recommended for most fork and clone situations.
We’re going to set upstream/main from the source repo as the upstream tracking branch of local main.
(If your default branch has a different name, substitute accordingly.)
This is desirable so that a simple git pull pulls from the source repo, not from your fork.
It also means a simple git push will (attempt to) push to the source repo, which will almost always be rejected since you probably do not have permission.
This failure will alert you to the fact that you’re doing something questionable, while it’s still easy to back out.
First, fetch info for the upstream remote.
This is especially important if you just configured upstream for the first time.
The two commands below do the same thing; the first is just shorthand for the second. Do this with command line Git in a shell:
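Here is a minimal sketch of the fetch and the tracking-branch step. So that it can run without GitHub, it rehearses them in a throwaway directory where two local bare repos (the made-up OWNER-REPO.git and YOU-REPO.git) stand in for the source repo and your fork. In your real clone, only the commands under "The steps from this section" matter; git branch -u is the short form of git branch --set-upstream-to.

```shell
# Throwaway directory; everything below is a rehearsal, not your real repo
work="$(mktemp -d)"
cd "$work"

# Stand-in for the source repo, with one commit on main
git init -q source
cd source
git symbolic-ref HEAD refs/heads/main
git -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "Initial commit"
cd "$work"

# Bare copies play the roles of OWNER/REPO and YOU/REPO (the fork)
git clone -q --bare source OWNER-REPO.git
git clone -q --bare source YOU-REPO.git

# Clone the "fork" (it becomes origin), then add the "source" as upstream
git clone -q YOU-REPO.git REPO
cd REPO
git remote add upstream "$work/OWNER-REPO.git"

# The steps from this section
git fetch -q upstream
git branch -u upstream/main                  # short form
git branch --set-upstream-to upstream/main   # long form, same effect

# Inspect the result: main should now track upstream/main
git branch -vv
```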
If you found this fork and clone workflow long and tedious, consider using usethis::create_from_github() next time!
31.2 usethis::create_from_github("OWNER/REPO", fork = TRUE)
The usethis package has a convenience function, create_from_github(), that can do “fork and clone” (as well as just clone).
The fork argument controls whether the source repo is cloned or fork-and-cloned.
Note that create_from_github(fork = TRUE) requires that you have configured a GitHub personal access token.
I assume you’re already visiting the source repo in the browser. Now click the big green button that says “<> Code”. Copy a clone URL to your clipboard. If you’re taking our default advice, copy the HTTPS URL. But if you’re opting for SSH, then make sure to copy the SSH URL.
You can execute this next command in any R session. If you use RStudio, then do this in the R console of any RStudio instance. In either case, after successful completion, you should find yourself in the new project that is the local repo connected to your fork.
usethis::create_from_github(
  "https://github.com/OWNER/REPO",
  destdir = "~/path/to/where/you/want/the/local/repo/",
  fork = TRUE
)
The first argument is repo_spec and it accepts the GitHub repo specification in various forms.
In particular, you can use the URL we just copied for the source repo.
The destdir argument specifies the parent directory where you want the new folder (and local Git repo) to live.
If you don’t specify destdir, usethis defaults to some very conspicuous place, like your desktop.
If you like to keep Git repos in a certain folder on your computer, you can personalize this default by setting the usethis.destdir option in your .Rprofile.
The fork argument specifies whether to clone (fork = FALSE) or fork and clone (fork = TRUE).
You often don’t need to specify fork and can just enjoy the default behaviour, which is governed by your permissions on the source repo.
By default, fork = FALSE if you can push to the source repo and fork = TRUE if you cannot.
Here is what that might look like (note that we’re accepting the default behaviour for many arguments):
usethis::create_from_github("https://github.com/OWNER/REPO")
#> ℹ Defaulting to 'https' Git protocol
#> ✔ Setting `fork = TRUE`
#> ✔ Creating '/some/path/to/local/REPO/'
#> ✔ Forking 'OWNER/REPO'
#> ✔ Cloning repo from 'https://github.com/YOU/REPO.git' into '/some/path/to/local/REPO'
#> ✔ Setting active project to '/some/path/to/local/REPO'
#> ℹ Default branch is 'main'
#> ✔ Adding 'upstream' remote: 'https://github.com/OWNER/REPO.git'
#> ✔ Pulling changes from 'upstream/main'.
#> ✔ Setting remote tracking branch for local 'main' branch to 'upstream/main'
#> ✔ Setting active project to '<no active project>'
For an RStudio user, create_from_github(fork = TRUE) does all of this:
- Forks the source repo on GitHub.
- Clones your fork to a new local repo (and RStudio Project). This configures your fork as the origin remote.
- Configures the source repo as the upstream remote.
- Sets the upstream tracking branch for main (or whatever the default branch is) to upstream/main.
- Opens a new RStudio instance in the new local repo (and RStudio Project).
31.3 Engage with the new repo
If you used usethis::create_from_github() or did fork and clone via Existing project, GitHub first, you are probably in an RStudio Project for this new repo.
Regardless, get yourself into this project, whatever that means for you, using your usual method.
Explore the new repo in some suitable way. If it is a package, you could run the tests or check it. If it is a data analysis project, run a script or render an Rmd. Convince yourself that you have gotten the code.
You should now be in the perfect position to sync up with ongoing developments in the source repo and to propose new changes via a pull request from your fork.
You can use the commands below to review more of the nitty gritty Git details of your fork and clone setup:
- Command line Git in a shell:
  - git remote -v
  - git remote show origin (or upstream)
  - git branch -vv
- In R:
  - usethis::git_remotes()
  - usethis::git_sitrep()
In the shell, git remote -v should reveal that your remotes are configured like so:
origin https://github.com/YOU/REPO.git (fetch)
origin https://github.com/YOU/REPO.git (push)
upstream https://github.com/OWNER/REPO.git (fetch)
upstream https://github.com/OWNER/REPO.git (push)
Comparable info is available in R with usethis::git_remotes():
git_remotes()
#> $origin
#> [1] "https://github.com/YOU/REPO.git"
#>
#> $upstream
#> [1] "https://github.com/OWNER/REPO.git"
In the shell, with the default branch checked out, git branch -vv should reveal that upstream/main is the upstream tracking branch: the line for main shows [upstream/main] in square brackets after the commit hash.
All of this info about remotes and branches is also included in the rich information reported with usethis::git_sitrep().
31.4 Don’t mess with main
Here is some parting advice for how to work in a fork and clone situation.
If you make any commits in your local repository, I strongly recommend that you work in a new branch, not main (or whatever the default branch is called).
Put another way: do not make commits to main of a repo you have forked.
If you commit to main in a repo you don’t own, it creates a divergence between that branch’s history in the source repo and in your repo.
Nothing but pain will come from this.
(If you’ve already done this, we discuss how to fix the situation in Um, what if I did touch main?.)
When you treat main as read-only, it makes life much easier when you want to pull upstream work into your copy.
The OWNER of REPO will also be happier to receive your pull request from a non-main branch.
For more detail, this Q&A on Stack Overflow is helpful: Why is it bad practice to commit to your fork’s master branch?.
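A minimal sketch of this habit, rehearsed in a throwaway repo so it can run anywhere. The branch name fix-typo and the empty commits are made up for illustration; in your real clone you would only need the git checkout -b step before you start committing.

```shell
# Throwaway repo standing in for your local clone (illustration only)
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "Initial commit"

# Do new work on a branch, never on main
git checkout -q -b fix-typo
git -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "Fix typo"

# main is untouched: list the commits that exist on fix-typo but not on main
git log --oneline main..fix-typo
```

Because main never receives your own commits, it can keep following upstream/main without diverging, and the pull request comes from the separate branch.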