pchittum

Quick Python Script for Mass Git Changes

The Problem

A client of mine recently had a major release of their product which saw changes to their service signup and onboarding URLs. This is a ten-year-old company. As such, they have over 30 actively maintained repos on their GitHub account. Most of them had the now incorrect signup URL in their README.md files.

I'm sure there must be a mass clone/push tool out there already for GitHub, but it also seemed like an ideal time to write a little Python script that would allow me to slurp down a whole bunch of git repos and subsequently firehose them back up to GitHub (or presumably any other git server).

I figured this would be the process:

  1. Using a txt file list of repo URIs, iterate over them and clone them all.
  2. Create a working branch and make the updates.
  3. Iterate over all the cloned repos and run add, commit and push.

Step 2 was going to be pretty quick as it was just a find/replace for any instances of the signup URL across all of the cloned repos. I also wanted to double-check the work before pushing. For this reason, I decided I would make this a manual step.

The Approach

So my work was to script the mass clone and the mass push. Running each of those commands by hand 30 times is just the kind of tedium I don't need.

After some checking and reading, and a little help from Gemini, I decided I could do what I needed with the subprocess, os and pathlib modules from the standard library. pathlib would be handy for working with file system paths. os would let me move around the file system. subprocess would take care of invoking the git commands from my script.
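As a rough sketch of how those three modules can fit together (this is not the original script, and the helper name run_in_dir is hypothetical): pathlib remembers where we started, os moves into the target directory, and subprocess runs the command there.

```python
import os
import subprocess
from pathlib import Path


def run_in_dir(command, directory):
    """Run a command from inside `directory`, then return to where we started."""
    start = Path.cwd()
    os.chdir(directory)
    try:
        # capture_output/text give us stdout and stderr back as strings.
        return subprocess.run(command, capture_output=True, text=True)
    finally:
        # Always go back, even if the command could not be started.
        os.chdir(start)
```

For example, run_in_dir(["git", "status"], "repos/repo-1") would run git status inside that clone. Passing cwd= to subprocess.run is an alternative that avoids changing directory at all.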

The Slurp of Clones

I decided on a directory layout like this:

root
  |
  repos

The repos directory would be where I would stuff all the cloned repos. So my script would simply read my txt file, loop over the git URLs, and invoke git clone for each one.

So all in all, a very simple script.
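A minimal sketch of what such a clone loop might look like, under a few assumptions: the function name clone_all and its run parameter are hypothetical (the latter only exists so the loop can be exercised without touching the network), and the URLs have already been read from the text file.

```python
import subprocess
from pathlib import Path


def clone_all(urls, repos_dir, run=subprocess.run):
    """Clone each URL into repos_dir; return the URLs whose clone failed."""
    repos_dir = Path(repos_dir)
    repos_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        if not url:  # skip blank lines from the text file
            continue
        # Running from inside repos_dir makes git create repos_dir/<repo-name>.
        result = run(["git", "clone", url], cwd=repos_dir)
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Calling clone_all(url_list, "repos") would then clone everything into the repos directory and hand back any URLs worth retrying.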

The fun thing for me was doing a list comprehension for the first time.

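# Read every line of the URL list (one repo URL per line).
with open('../repourls.txt', 'r') as file:
    url_list = file.readlines()

# Remove surrounding whitespace, including the trailing newline, from each line.
url_list = [line.strip() for line in url_list]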

The first bit of the code snippet above opens my txt file with the repo URLs and reads its lines. But readlines keeps the trailing newline on each line, and what if there are leading or trailing spaces in any of the lines? Well, we can remove that whitespace with the strip method. The list comprehension is a nice bit of syntactic sugar for this: it goes over each line in the list, calls strip on that line, and assigns the resulting list back to the initial url_list variable.
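To make the comparison concrete, here is a small sketch showing the same cleanup written as an explicit loop and as a list comprehension. The example.com URLs are made-up sample data, and the filtered variant at the end is an optional extra that is not part of the snippet above.

```python
# Made-up sample data standing in for the lines read from the text file.
raw_lines = [
    "  https://example.com/org/repo-1.git\n",
    "\n",
    "https://example.com/org/repo-2.git  \n",
]

# Explicit loop: build the cleaned list one item at a time.
stripped_loop = []
for line in raw_lines:
    stripped_loop.append(line.strip())

# List comprehension: the same result in a single expression.
stripped_comp = [line.strip() for line in raw_lines]

# Optional variant: also drop lines that are empty after stripping.
non_blank = [line.strip() for line in raw_lines if line.strip()]

print(stripped_loop == stripped_comp)  # True
print(non_blank)
# ['https://example.com/org/repo-1.git', 'https://example.com/org/repo-2.git']
```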

The Firehose of Push

Once I'd done my fixes to all the relevant README.md files, some of the repo directories had changes and others didn't.

root
  |
  repos
    |
    repo-1
    repo-2
    repo-3    

In fact, some also had changes to a README.ja.md file, as this client operates in Japan. So for my mass push I needed to:

  • Scan through each directory.
  • If changes were detected, call add on the changed README.md files.
  • Make the commit.
  • Pause to check for anything that didn't look right and have the user confirm to push.
  • If confirmed, push.
  • Move on to the next directory.
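The steps above can be sketched roughly as follows. This is an illustration rather than the original script: the names git, push_changed_repos, confirm and run are hypothetical, using git status --porcelain to detect changes is an assumption, and so is pushing the current working branch with --set-upstream origin HEAD.

```python
import subprocess
from pathlib import Path

README_FILES = ["README.md", "README.ja.md"]  # the two files named in the post


def git(args, repo, run=subprocess.run):
    """Run one git command inside `repo` and return the completed process."""
    return run(["git", *args], cwd=repo, capture_output=True, text=True)


def push_changed_repos(repos_dir, message, confirm=input, run=subprocess.run):
    """Commit README changes in each clone and push after confirmation."""
    pushed = []
    for repo in sorted(Path(repos_dir).iterdir()):
        if not repo.is_dir():
            continue
        # `git status --porcelain` prints nothing when the working tree is clean.
        status = git(["status", "--porcelain"], repo, run)
        if not status.stdout.strip():
            continue
        for name in README_FILES:
            # Stage only the named files, never everything in the directory.
            git(["add", name], repo, run)
        git(["commit", "-m", message], repo, run)
        # Show what changed so the user can spot anything unexpected.
        print(f"{repo.name}:\n{status.stdout}")
        answer = confirm(f"Push {repo.name}? [y/N] ")
        if answer.strip().lower() == "y":
            git(["push", "--set-upstream", "origin", "HEAD"], repo, run)
            pushed.append(repo.name)
    return pushed
```

A call such as push_changed_repos("repos", "Fix signup URL") would then walk every clone and ask before each push. Answering anything other than y leaves the commit in place locally without pushing it.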

The notable thing about this script was how much I relied on git. For instance, I could have checked if there was a README.ja.md file and only then attempted to call git add. But what happens if you call add on a file with no changes? Well...nothing. So, since this was really only meant to be a quick solution and not production-ready code, I just called add and didn't worry about it.
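For comparison, the more defensive variant mentioned above could look roughly like this. The helper name existing_files is hypothetical. It narrows the list of README names down to the ones actually present in a clone, so git add is only ever called on real files.

```python
from pathlib import Path


def existing_files(repo, candidates):
    """Return only the candidate file names that actually exist in `repo`."""
    repo = Path(repo)
    return [name for name in candidates if (repo / name).is_file()]
```

Skipping this check is workable in a quick script as long as the git commands are run through subprocess.run without check=True: in that mode a non-zero exit code from git add does not raise an exception, so one failed add does not stop the loop.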

But I didn't want to accidentally throw extra files into the commit, which is why I called add on the specific files themselves rather than running a blanket git add . command.

The End Result

This little project was quite fun to get working. It has also saved me some time, as I've already used it a couple of times now to make changes (I put the wrong URL in the first time :no_mouth:). I also have another mass change to roll out that I'll be using it for.

If you're wondering how I produced the list of 30ish git URLs, that was in another project where I'd written some Python to access the GitHub API. I'll try to write that up in another post. That project was about producing a GitHub org audit.
