The Problem
A client of mine recently had a major release of their product, which changed their service signup and onboarding URLs. This is a ten-year-old company, so they have over 30 actively maintained repos on their GitHub account. Most of them had the now-incorrect signup URL in their README.md files.
I'm sure there must be a mass clone/push tool out there already for GitHub, but it also seemed like an ideal time to write a little Python script that would allow me to slurp down a whole bunch of git repos and subsequently firehose them back up to GitHub (or presumably any other git server).
I figured this would be the process:
- Using a txt file list of repo URIs, iterate over them and clone them all.
- Create a working branch and make the updates.
- Iterate over all the cloned repos and run add, commit and push.
Step 2 was going to be pretty quick as it was just a find/replace for any instances of the signup URL across all of the cloned repos. I also wanted to double-check the work before pushing. For this reason, I decided I would make this a manual step.
The Approach
So my work was to perform the mass clone and the mass push. Running each of those by hand 30 times is just the kind of tedium I don't need.
After some checking and reading, and a little help from Gemini, I decided that I could do what I needed with the subprocess, os and pathlib libraries. pathlib would be handy to work with the file system. os would allow me to navigate around the file system. subprocess would take care of invoking the git commands from my script.
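To make the role of subprocess concrete, here is a minimal sketch of a helper that runs a command inside a given directory. The names run_in and run_git are my own illustrations, not the script's actual functions, and the sketch passes cwd= to subprocess.run rather than changing directory with os, which is a swapped-in alternative to navigating around the file system.

```python
import subprocess
from pathlib import Path


def run_in(directory: Path, *command: str) -> subprocess.CompletedProcess:
    """Run a command inside `directory` and capture its output as text."""
    return subprocess.run(
        list(command),
        cwd=directory,        # run there without changing this script's own cwd
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=False,          # leave it to the caller to inspect returncode
    )


def run_git(repo_dir: Path, *args: str) -> subprocess.CompletedProcess:
    """Hypothetical convenience wrapper: run `git <args>` inside a repo."""
    return run_in(repo_dir, "git", *args)
```

Something like run_git(Path("repos/repo-1"), "status") would then return an object whose returncode and stdout can be checked before moving on.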
The Slurp of Clones
I decided on a directory layout like this:
root
└── repos
The repos directory would be where I would stuff all the cloned repos. So my script would simply read my txt file, loop over the git URLs, and invoke git clone for each one.
So all in all, a very simple script.
The fun thing for me was doing a list comprehension for the first time.
    with open('../repourls.txt', 'r') as file:
        url_list = file.readlines()

    url_list = [line.strip() for line in url_list]
The first bit of the code snippet above opens my txt file with the repo URLs and reads its lines. But what if there are leading or trailing spaces in any of the lines? Well, we can remove that whitespace (including the trailing newline that readlines leaves on each line) with the strip method. But what a nice bit of syntactic sugar to take the list, go over each line, call strip on that line, and finally assign all the lines back to the initial url_list variable.
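For completeness, here is a rough sketch of how the whole clone loop could look around that list comprehension. It is not the original script: read_urls, repo_name and clone_all are hypothetical names, and skipping blank lines and already-cloned directories are my own additions.

```python
import subprocess
from pathlib import Path


def read_urls(list_path: Path) -> list[str]:
    """Return the stripped, non-empty lines of the URL list file."""
    with open(list_path, "r") as file:
        return [line.strip() for line in file if line.strip()]


def repo_name(url: str) -> str:
    """Directory name git clone picks by default: last path part, minus '.git'."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_all(urls: list[str], repos_dir: Path) -> list[str]:
    """Clone every URL into repos_dir and return the URLs that failed."""
    repos_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        if (repos_dir / repo_name(url)).exists():
            continue  # already cloned on an earlier run
        result = subprocess.run(["git", "clone", url], cwd=repos_dir)
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Calling clone_all(read_urls(Path("../repourls.txt")), Path("repos")) would then do the slurp in one go, and the returned list shows which URLs need a second look.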
The Firehose of Push
Once I'd done my fixes to all the relevant README.md files, I had a bunch of directories which had changes and others without.
root
└── repos
    ├── repo-1
    ├── repo-2
    └── repo-3
In fact, some had changes to a README.ja.md file, as this client operates in Japan. So for my mass push I needed to:
- Scan through each directory.
- If changes were detected, call add on the changed README.md files.
- Make the commit.
- Pause to check for anything that didn't look right and have the user confirm the push.
- If confirmed, push.
- Move on to the next directory.
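One way to handle the "if changes were detected" step is to ask git directly. The sketch below assumes git status --porcelain, whose short format puts two status characters and a space in front of each path. The names parse_porcelain, changed_readmes and README_NAMES are made up for illustration, and I'm not claiming this is how the original script detected changes.

```python
import subprocess
from pathlib import Path

# Assumption: only these two file names matter for this change.
README_NAMES = ("README.md", "README.ja.md")


def parse_porcelain(output: str) -> list[str]:
    """Pull the paths out of `git status --porcelain` output.

    Each line is two status characters, one space, then the path.
    """
    return [line[3:] for line in output.splitlines() if len(line) > 3]


def changed_readmes(repo_dir: Path) -> list[str]:
    """Return the README files git reports as modified or untracked."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return [path for path in parse_porcelain(result.stdout) if path in README_NAMES]
```

A repo where changed_readmes returns an empty list can simply be skipped in the loop.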
The notable thing about this script was how much I relied on git. For instance, I could have checked if there was a README.ja.md file and only then attempted to call git add. But what happens if you call add on a file with no changes? Well... nothing. So, since this was really only meant to be a quick solution and not production-ready code, I just called add and didn't worry about it.
But I didn't want to accidentally throw extra files into the commit, which is why I was specifically calling add on the files themselves and not a blanket git add .
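Here is a hedged sketch of what the add, commit, confirm and push sequence for one repo might look like. It is an illustration, not the script I actually ran: push_readme_changes is a hypothetical name, the confirm and run parameters exist only so the function can be exercised without touching a real repo, and git push origin HEAD is my assumption for pushing the current working branch.

```python
import subprocess
from pathlib import Path

# Assumption: only these two file names should ever be staged.
README_NAMES = ("README.md", "README.ja.md")


def push_readme_changes(repo_dir, message, confirm=input, run=subprocess.run):
    """Stage only the README files, commit, and push once the user confirms.

    Returns True only if the push ran and succeeded.
    """
    # One add per file, never `git add .`. If a file is missing, git add
    # exits non-zero for that call; the result is deliberately ignored here.
    for name in README_NAMES:
        run(["git", "add", name], cwd=repo_dir)

    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir).returncode == 0:
        return False

    run(["git", "commit", "-m", message], cwd=repo_dir)
    # Print a summary of the new commit so it can be checked before pushing.
    run(["git", "show", "--stat", "HEAD"], cwd=repo_dir)

    if confirm(f"Push {repo_dir.name}? [y/N] ").strip().lower() != "y":
        return False
    return run(["git", "push", "origin", "HEAD"], cwd=repo_dir).returncode == 0
```

Looping over the subdirectories of repos and calling this on each one gives the firehose, with a pause per repo. Anything other than "y" at the prompt leaves the commit local, so it can be amended before pushing.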
The End Result
This little project was quite fun to get working. It has also saved me some time in the long run, as I've already used it a couple of times now to make changes (I put the wrong URL in the first time :no_mouth:). I also have another mass change to roll out that I'll be using this with.
If you're wondering how I produced the list of 30ish git URLs, that was in another project where I'd written some Python to access the GitHub API. I'll try to write that up in another blog post. That project was about producing a GitHub org audit.