How to effectively search the Git log

Searching the Git log calls for quite different approaches depending on whether you want to find a specific bit of code that was committed or a change described in a commit message. With this in mind, we can come up with two personas:

  • Project Manager
  • Developer

Both have perfectly valid reasons to search the Git log, but usually for different purposes.

For instance, the developer might want to quickly search for a specific bit of code instead of having to a) search the codebase and then b) run git blame to find out who introduced the change and when. The Project Manager, on the other hand, might simply need answers about high-level changes, like "When was feature X introduced?".

Let's take a few examples.

The Project Manager way

Alright. Say we wish to know when support for Zend assertions was added to our project. Simple. We can type:

$ git log -S "Zend"
commit a97d68e75e1eb15fb64d9b192cc6c73cafa7dd98  
Author: Aurelien Navarre <[email protected]>  
Date:   Tue Sep 6 18:27:17 2016 +0200

    Manage Zend assertions in dev and prod mode

As a Project Manager, I don't really need to know more, do I? With this I can reference a specific Git commit, see its author and date, and confirm from the commit message that this is indeed what I'm looking for.

The problem is that -S is not only case-sensitive, it also searches the code changes themselves (it looks for commits that add or remove occurrences of the string), not the commit messages. E.g.

$ git log -S "zend_extension"
commit dfa75be74a5bd550fd19e9d09277c7ad62f76071  
Author: Aurelien Navarre <[email protected]>  
Date:   Fri Jun 3 19:34:17 2016 +0200

    Add Xdebug

Our commit message says 'Add Xdebug', but the code changed in this commit contains 'zend_extension'. Nope, not what we're looking for.
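The behaviour is easy to reproduce. The following is a minimal sketch, assuming a POSIX shell and a Git version that supports git -C; it builds a throwaway repository whose file, contents and messages are hypothetical (loosely mirroring the Xdebug commit above) and compares -S with --grep.

```shell
#!/bin/sh
# Sandbox repository; every file, line and message here is hypothetical.
repo=$(mktemp -d)
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
git -C "$repo" init -q

# An unrelated first commit, so the interesting one has a parent to diff against.
printf '%s\n' "Sandbox" > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m "Initial commit"

# The message never mentions zend_extension, but the code it adds does.
printf '%s\n' "zend_extension = xdebug.so" > "$repo/xdebug.ini"
git -C "$repo" add xdebug.ini
git -C "$repo" commit -q -m "Add Xdebug"

# -S (the "pickaxe") matches commits whose diff changes the number of
# occurrences of the string; the commit message plays no part.
pickaxe_hits=$(git -C "$repo" log -S zend_extension --format=%s)
# The pickaxe is case-sensitive: the upper-case spelling finds nothing.
upper_hits=$(git -C "$repo" log -S ZEND_EXTENSION --format=%s)
# --grep searches commit messages only, so it finds nothing here either.
grep_hits=$(git -C "$repo" log --grep zend_extension --format=%s)

printf '%s\n' "-S zend_extension:     $pickaxe_hits"
printf '%s\n' "-S ZEND_EXTENSION:     $upper_hits"
printf '%s\n' "--grep zend_extension: $grep_hits"
rm -rf "$repo"
```

Run as is, -S should report only "Add Xdebug", while the upper-case pickaxe and --grep should both come back empty.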

We need to proceed differently, then. Let's try git log with its built-in --grep option, which searches commit messages.

  • Passing -i makes our search case-insensitive.
  • Passing --oneline is purely cosmetic, so you can safely ignore it.
$ for i in zend zend_assertions; do git log -i --oneline --grep "$i"; done
a97d68e Manage Zend assertions in dev and prod mode  

See what we just did here? Searching for 'zend' and 'zend_assertions' no longer returns two different results, but only one: the commit whose message contains the string 'zend', as we expected.
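As an aside, the loop isn't strictly required: git log accepts --grep more than once and lists any commit whose message matches at least one of the patterns (--all-match requires all of them instead). The sketch below compares both forms on a throwaway repository; apart from the two messages taken from this post, the third commit message is hypothetical, made up so that one commit matches both patterns.

```shell
#!/bin/sh
# Sandbox repository with empty commits: only the messages matter here.
repo=$(mktemp -d)
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
git -C "$repo" init -q
git -C "$repo" commit -q --allow-empty -m "Add Xdebug"
git -C "$repo" commit -q --allow-empty -m "Manage Zend assertions in dev and prod mode"
# Hypothetical message that matches both 'zend' and 'zend_assertions'.
git -C "$repo" commit -q --allow-empty -m "Enable zend_assertions in dev"

# The loop from the post: one git log per pattern, so a commit whose
# message matches both patterns is printed twice.
looped=$(for p in zend zend_assertions; do
  git -C "$repo" log -i --format=%s --grep "$p"
done)

# A single git log with two --grep options: the patterns are ORed
# together and each matching commit is listed once.
combined=$(git -C "$repo" log -i --format=%s --grep zend --grep zend_assertions)

printf '%s\n' "Loop:" "$looped"
printf '%s\n' "Single call:" "$combined"
rm -rf "$repo"
```

With the loop, the "Enable zend_assertions in dev" commit should appear twice; with the single call, once.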

The developer way

Now, let's take a different approach and find which commit introduced a specific code change: typically something a developer wants to know, without getting matches on commit messages instead.

I tried the git grep command, but there's one thing it doesn't return by default: the Git hash, since it searches the working tree. And really, what's the point of searching a Git-managed codebase without being able to find both the searched string and the corresponding commit ID?

$ git grep -i "zend"
orchestration/playbooks/drupal-dev.yml:- name: Check the Zend assertions status  
orchestration/playbooks/drupal-dev.yml:  shell: grep "^zend.assertions = -1" "{{ php_ini }}" || echo "on"  
orchestration/playbooks/drupal-dev.yml:  register: zend_assertions_enable  
orchestration/playbooks/drupal-dev.yml:  changed_when: zend_assertions_enable.stdout != "on"  
orchestration/playbooks/drupal-dev.yml:- name: Turn on Zend assertions  
(snipped)

So, even though git grep offers a good deal of what we're used to with GNU grep, it's not quite what we want here.

Feeding it the output of git rev-list --all gets us the information we need, but it's a mess.

$ git grep -i "zend" $(git rev-list --all)
b0a6d1f6de67ff05fece6cc5fed8da39177107d7:orchestration/playbooks/drupal-dev.yml:- name: Check the Zend assertions status  
b0a6d1f6de67ff05fece6cc5fed8da39177107d7:orchestration/playbooks/drupal-dev.yml:  shell: grep "^zend.assertions = -1" "{{ php_ini }}" || echo "on"  
b0a6d1f6de67ff05fece6cc5fed8da39177107d7:orchestration/playbooks/drupal-dev.yml:  register: zend_assertions_enable  
b0a6d1f6de67ff05fece6cc5fed8da39177107d7:orchestration/playbooks/drupal-dev.yml:  changed_when: zend_assertions_enable.stdout != "on"  
b0a6d1f6de67ff05fece6cc5fed8da39177107d7:orchestration/playbooks/drupal-dev.yml:- name: Turn on Zend assertions  
(snipped)
a97d68e75e1eb15fb64d9b192cc6c73cafa7dd98:orchestration/playbooks/drupal-dev.yml:- name: Check the Zend assertions status  
(snipped)

Why? Because it returns the matching lines for every single commit whose tree contains them, which duplicates results and makes them hard to read. E.g. the string "Turn on Zend assertions" is returned 9 times here.

$ git grep -i "zend" $(git rev-list --all) | grep -c "Turn on Zend assertions"
9  
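The duplication follows from how the command works: git rev-list --all lists every commit, and git grep then searches the complete tree of each one, so a line that never changes is reported once per commit containing it. A minimal sketch on a throwaway repository (hypothetical file names and messages) in which the line is added once and then survives three unrelated commits:

```shell
#!/bin/sh
# Sandbox repository; file names, contents and messages are hypothetical.
repo=$(mktemp -d)
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
git -C "$repo" init -q

# The line is added exactly once...
printf '%s\n' "- name: Turn on Zend assertions" > "$repo/drupal-dev.yml"
git -C "$repo" add drupal-dev.yml
git -C "$repo" commit -q -m "Manage Zend assertions"

# ...and then three unrelated commits leave it untouched.
for n in 1 2 3; do
  printf '%s\n' "note $n" >> "$repo/notes.txt"
  git -C "$repo" add notes.txt
  git -C "$repo" commit -q -m "Unrelated change $n"
done

# git grep over every commit (the rev-list output is left unquoted on
# purpose so each hash becomes a separate argument) reports the
# unchanged line once per commit.
hits=$(git -C "$repo" grep -i "zend" $(git -C "$repo" rev-list --all) \
  | grep -c "Turn on Zend assertions")
printf '%s\n' "Matches across all commits: $hits"
rm -rf "$repo"
```

Here the count should be 4, one per commit, even though the line was only ever introduced once.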

Since I still can't get exactly what I need, I use the one-liner below instead.

$ for i in $(git log --pretty=format:'%h' -100); do echo "$i"; git show "$i" | grep -Ev '(diff --git|index |\+\+\+|---|@@|Author: |Date: |commit )' | grep -i --color=auto "zend"; done
aa81ee3  
95a705e  
0bcd55c  
a97d68e  
    Manage Zend assertions in dev and prod mode
+- name: Check the Zend assertions status
+  shell: grep "^zend.assertions = -1" "{{ php_ini }}" || echo "on"
+  register: zend_assertions_enable
+  changed_when: zend_assertions_enable.stdout != "on"
+- name: Turn on Zend assertions
+    regexp: '^(.*)zend.assertions = -1(.*)$'
+    replace: '\1zend.assertions = 1\2'
+  when: zend_assertions_enable.stdout != "on"
+- name: Check the Zend assertions status
+  shell: grep "^zend.assertions = 1" "{{ php_ini }}" || echo "off"
+  register: zend_assertions_disable
+  changed_when: zend_assertions_disable.stdout != "off"
+- name: Turn off Zend assertions
+    regexp: '^(.*)zend.assertions = 1(.*)$'
+    replace: '\1zend.assertions = -1\2'
+  when: zend_assertions_disable.stdout != "off"
1a35536  
(snipped)
36c87a2  
-      zend_extension = /usr/lib/php5/{{ php_extensions }}/xdebug.so
+      zend_extension = {{ php_extensions }}/xdebug.so
2f2cbda  
(snipped)

It simply loops over an arbitrary number of commits (the last 100 here), prints each abbreviated hash, then greps that commit's git show output once the diff headers and commit metadata have been filtered out. A matching line therefore only shows up for the commits whose message or diff contains it, rather than for every commit in the history. All that's left is for me to run git show <hash> once I think I've found the relevant commit. Sure, it's an ugly and overly complex one-liner, but it gets the job done, at least until someone points me to the better solution I'm pretty sure exists.
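One candidate for that better solution, offered as a sketch rather than a drop-in replacement, is git log -G<regex>, which looks for commits whose diff contains added or removed lines matching the regular expression. That is close to what the loop filters for, minus the commit-message matches, and unlike -S it also catches a commit that only modifies a matching line without changing how many times the string occurs. The repository below is a hypothetical throwaway one, loosely modelled on the Xdebug path change shown above; the [Zz] character class provides the case-insensitivity instead of relying on -i.

```shell
#!/bin/sh
# Sandbox repository; file names, paths and messages are hypothetical.
repo=$(mktemp -d)
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
git -C "$repo" init -q

printf '%s\n' "Sandbox" > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m "Initial commit"

# Add the line...
printf '%s\n' "zend_extension = /usr/lib/php5/xdebug.so" > "$repo/xdebug.ini"
git -C "$repo" add xdebug.ini
git -C "$repo" commit -q -m "Add Xdebug"

# ...then modify it: zend_extension still occurs exactly once in the file.
printf '%s\n' "zend_extension = {{ php_extensions }}/xdebug.so" > "$repo/xdebug.ini"
git -C "$repo" commit -q -a -m "Use the php_extensions variable"

# -G: commits with an added or removed line matching the regex.
g_hits=$(git -C "$repo" log -G '[Zz]end' --format='%h %s')
# -S: only commits that change the number of occurrences of the string.
s_hits=$(git -C "$repo" log -S zend_extension --format=%s)

printf '%s\n' "-G '[Zz]end':" "$g_hits"
printf '%s\n' "-S zend_extension:" "$s_hits"
rm -rf "$repo"
```

-G should list both Xdebug commits, while -S should only find "Add Xdebug". On a real repository, adding -p (e.g. git log -p -G '[Zz]end' -100) also prints the matching patches, which may make the final git show step unnecessary.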

Aurelien Navarre


Lyon, France