Git is a widely used version control system. It can be used to store versions of any text files (such as code or reports) so that you can track changes over time and revert to previous versions if needed. This is very useful when working on your own project, but near-essential when working collaboratively on research software.
The design paradigm behind Git is that of multiple, self-contained repositories, in which all code within a repository is part of the same project. When you copy (clone) a repository you take everything. In contrast, when working with older, more centralised version control systems, you would only copy the data you needed (and always refer back to the central repository for version information). The Git paradigm is useful for distributed, open-source development (it was, after all, designed for developing the Linux kernel), but it could lead to unnecessary duplication of code across projects, which is why Git submodules were developed.
Git submodules enable you to use the contents of one Git repository inside another, so that you can reuse code without repeating yourself.
Some use-cases for this include:
- You have developed (and maybe are continuing to develop) some code, “myCode”, that you are using in “analysis A” and “analysis B”. You would like each analysis to be in a separate Git repository so that you can reference them individually; perhaps they need different versions of “myCode”.
- You have created a Python module and an R library for your research. Both use a C library that you have written; keeping the C library in sync between both your Python and R repositories is getting hard and it would be better to have them call the C library in its own repository.
- Your code has become large, diverse, or complex, and could benefit from being split up, especially if users may only want to use a small part of it. By splitting it into smaller projects with their own repositories, and having one main repository that accesses the others, it would be easier for people to use.
Submodules enable you to load one repository from within another. The parent repository records which commit of the submodule it uses, so each project can pin the version it needs. This enables you to build a library of useful functions in a single repository, kept separate from your project repositories, which simply load the library repository as required.
Here are some basic commands to get you started using submodules:
To add an existing Git repository to a project as a submodule:
git submodule add https://github.com/example/example_repo local_directory
This creates the local directory, populates it with code from the remote repository, and creates a .gitmodules file containing the remote mapping information. Both the .gitmodules file and the new submodule entry are staged for you, and can be committed as normal in your Git repository, retaining this link.
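To see what this looks like end to end, here is a minimal sketch that can be run safely in a throwaway directory. The repository names (mylib, project), the path libs/mylib, and the helper function make_repo are all made up for illustration, and a local path stands in for the remote URL. The protocol.file.allow setting is only there because recent Git versions refuse local-path submodules by default; it should not be needed for an https:// URL.

```shell
#!/bin/sh
set -e

# Work in a temporary directory so nothing touches real projects.
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: create a repository containing one committed file.
make_repo() {
    git init -q "$1"
    git -C "$1" config user.email "you@example.com"
    git -C "$1" config user.name "Example User"
    echo "$2" > "$1/$3"
    git -C "$1" add "$3"
    git -C "$1" commit -q -m "Add $3"
}

# A stand-in library repository and a project that will use it.
make_repo mylib 'say_hello() { echo "hello"; }' lib.sh
make_repo project "My analysis" README

cd project

# Add the library as a submodule under libs/mylib.
# (A local path is used here in place of a remote URL.)
git -c protocol.file.allow=always submodule add "$work/mylib" libs/mylib

# .gitmodules records the path and URL of the submodule.
cat .gitmodules

# Both .gitmodules and the submodule entry are already staged.
git status --short

# Commit them like any other change.
git commit -q -m "Add mylib as a submodule"
```

After the commit, the project repository stores only a pointer to a specific commit of mylib, not a copy of its files.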
To activate a submodule after cloning a project:
git submodule init
git submodule update
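The following sketch shows these two commands in context. It builds a small project with a submodule, clones it, and then activates the submodule in the clone. As before, the names mylib, project, clone, and make_repo are hypothetical, a local path stands in for a remote URL, and protocol.file.allow is only set because recent Git versions block local-path submodules by default.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: create a repository containing one committed file.
make_repo() {
    git init -q "$1"
    git -C "$1" config user.email "you@example.com"
    git -C "$1" config user.name "Example User"
    echo "$2" > "$1/$3"
    git -C "$1" add "$3"
    git -C "$1" commit -q -m "Add $3"
}

# Set up a project that already contains a submodule.
make_repo mylib 'say_hello() { echo "hello"; }' lib.sh
make_repo project "My analysis" README
cd project
git -c protocol.file.allow=always submodule add "$work/mylib" libs/mylib
git commit -q -m "Add mylib as a submodule"
cd ..

# A plain clone fetches the project but not the submodule contents.
git clone -q project clone
cd clone

# A leading '-' in the status output marks a submodule that is not yet initialised.
git submodule status

# Register the submodule from .gitmodules, then fetch and check out
# the commit that the project has recorded for it.
git submodule init
git -c protocol.file.allow=always submodule update

# The library files are now present in the clone.
ls libs/mylib
```

The two steps can also be combined into a single command, git submodule update --init.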
An example of using submodules in practice can be found in this repository of an example workflow for the BioExcel CWL Example Simulation System project. This uses git submodules to load the required BioExcel Building Blocks library – avoiding having to duplicate the library while also easing access to this tool for researchers by not requiring them to install the libraries themselves.
Git submodules have many features, including allowing recursive submodules, tracking specific branches, and pushing changes made in the submodule back to the original repository. Information on how to use these submodule features is available in the Git online documentation.
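As a taste of two of these features, the sketch below adds a submodule that tracks a named branch, clones the project together with its submodules in one step, and then moves the submodule forward to the newest commit on the tracked branch. It is only an illustration under assumptions: the names mylib, project, clone, and make_repo are hypothetical, local paths stand in for remote URLs, and protocol.file.allow is set only because recent Git versions block local-path submodules by default.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: create a repository containing one committed file.
make_repo() {
    git init -q "$1"
    git -C "$1" config user.email "you@example.com"
    git -C "$1" config user.name "Example User"
    echo "$2" > "$1/$3"
    git -C "$1" add "$3"
    git -C "$1" commit -q -m "Add $3"
}

make_repo mylib "version 1" VERSION
make_repo project "My analysis" README

# Find the name of the library's current branch so the submodule can track it.
branch=$(git -C mylib symbolic-ref --short HEAD)

# -b records the branch to track in .gitmodules.
cd project
git -c protocol.file.allow=always submodule add -b "$branch" "$work/mylib" libs/mylib
git commit -q -m "Add mylib as a submodule"
cd ..

# Clone the project and all of its submodules in one step.
git -c protocol.file.allow=always clone -q --recurse-submodules project clone
cat clone/libs/mylib/VERSION

# Meanwhile, the library gains a new commit.
echo "version 2" > mylib/VERSION
git -C mylib commit -q -am "Bump version"

# In the clone, move the submodule to the newest commit on the tracked branch...
cd clone
git config user.email "you@example.com"
git config user.name "Example User"
git -c protocol.file.allow=always submodule update --remote libs/mylib
cat libs/mylib/VERSION

# ...and record the new submodule commit in the project.
git add libs/mylib
git commit -q -m "Update mylib to latest"
```

Until that final commit is made, the project still points at the older commit of the library, so updates are always an explicit choice. For submodules that themselves contain submodules, git submodule update --init --recursive does the same job at every level.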