For developers who live on the command line, a shell prompt (bash or zsh) and an editor (vi or emacs) are the only IDE (integrated development environment) we need. For searching trees of source code, grep and maybe find will suit our needs.
grep will search whatever files you may have, but it wasn’t designed for source code, and so it doesn't always deliver optimal results. Although grep is ubiquitous, there are other useful ways to search your source code. I created one of them, ack, back in 2004, and another, ag, was adapted from ack in 2012. Let’s take a look at what they offer.
What is ack?
First, ack doesn't require specifying filenames to search. It assumes that you want to start in the current directory and descend through the tree (grep's -r switch), searching through all text files, while ignoring version control directories like .svn and .git. These are the behaviors a programmer most commonly wants, so ack makes them the default.
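For comparison, here is a rough sketch of what those defaults look like when spelled out for grep. It assumes a grep that supports --exclude-dir (GNU and BSD grep do), and the scratch directory and file names are made up for illustration.

```shell
# Build a tiny throwaway tree (hypothetical file names) to search.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/.git"
printf 'print("hello")\n' > "$demo/src/app.py"
printf 'print in a VCS internals file\n' > "$demo/.git/notes"

# Roughly what ack does by default, written out for grep:
#   -r  recurse, -n  show line numbers, -I  skip binary files,
#   --exclude-dir  skip version control directories.
matches=$(cd "$demo" && grep -rnI --exclude-dir=.git --exclude-dir=.svn print .)
echo "$matches"

# With ack, the same search is simply:  ack print
rm -rf "$demo"
```

Only the match in src/app.py is reported; the file under .git is skipped.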
The first big difference you'll see with ack is that its output is more oriented to human readability. Compare grepping for print in a source tree with searching for print with ack: ack's output is colorized, and results from the same file are grouped together visually. It's designed to help programmers quickly and easily see the matches and tell where they came from.
ack knows that you probably have a lot of different kinds of source code in your tree, and that you often don't want to see results from a certain file. ack lets you specify files to include or exclude. If you want to search only Java files, then use the --java switch. If you want to ignore HTML and PHP files, use --nohtml --nophp.
A handy option for dealing with case-insensitive matches was taken from the vim editor: smartcase. With --smart-case, ack will make the search case-insensitive, as if you'd specified -i, if the term you're searching for is in all lowercase. This is an option you typically have in your .ackrc file, rather than specifying it on the command line.
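A minimal sketch of such a file follows. It is written to a scratch directory (a hypothetical location chosen for this example) so that it doesn't overwrite a real ~/.ackrc. An .ackrc lists one switch per line, and lines starting with # are treated as comments.

```shell
# Write a sample .ackrc into a scratch directory (hypothetical location)
# so the example does not touch a real ~/.ackrc.
rc_dir=$(mktemp -d)
cat > "$rc_dir/.ackrc" <<'EOF'
# Ignore case unless the pattern contains an uppercase letter
--smart-case
EOF
cat "$rc_dir/.ackrc"
```

With this setting in your own .ackrc, searching for foo would also match Foo and FOO, while searching for Foo would match only that exact capitalization.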
Don’t worry about learning an entirely new set of options. Although ack adds many new features, it maintains the most common options that you’re used to in grep, such as -i for case-insensitive, -w for word searching, -C for context, and so on.
Along comes ag
Geoff Greer was a happy ack user, but he wanted something even faster. So, he created ag, The Silver Searcher, with most of the features of ack, but rewritten in C. He used pthreads for parallelization, mmap for optimizing file I/O, and other speedy features, but at the expense of portability. (A Windows port was started but has not been kept up.)
Geoff made ag look exactly like ack, and at first ack and ag had almost identical feature sets, but over time ag added new features and ack moved to 2.0, and their feature sets diverged. Let’s look at some of the more interesting features unique to each tool.
ack lets you define your own filetypes
If you have source code files of a type that ack doesn't know about, you can specify it with the --type-set switch. Say you have a lot of files written in COBOL, which is one of the few languages ack doesn’t support by default. To tell ack that .cob extensions are COBOL, just add this to your .ackrc file:
--type-set=cobol:ext:cob
You can also tell ack to base filetypes on an exact filename match, a regular expression match against the filename, or even a regex match against the first line of the file. Run ack --dump to see the default filetype specifications and examples of how to use --type-set and --type-add.
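As a hedged sketch, the four kinds of filter might look like this in an .ackrc. The type names (buildfile, rakefile, perlscript) and patterns are invented for illustration, and the file is written to a scratch directory rather than a real ~/.ackrc. Check the output of ack --dump on your version for the exact forms it accepts.

```shell
# Scratch location (hypothetical) for a sample .ackrc.
types_dir=$(mktemp -d)
cat > "$types_dir/.ackrc" <<'EOF'
# ext: match on file extension
--type-set=cobol:ext:cob
# is: match on an exact filename
--type-set=buildfile:is:Makefile
# match: regex against the filename
--type-set=rakefile:match:/^Rakefile$/
# firstlinematch: regex against the first line of the file
--type-set=perlscript:firstlinematch:/^#!.*\bperl/
EOF
cat "$types_dir/.ackrc"
```

Once a type is defined this way, it can be used like a built-in one, for example --cobol to include it or --nocobol to exclude it.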
ag doesn't allow specifying your own filetypes. If you want to add a new filetype, you have to modify the source and rebuild.
ag lets you search within compressed files
ag's --search-zip flag treats compressed files as if they were normal text files. This has long been a feature request for ack, but there's no way to implement it without sacrificing portability.
ack offers custom output
ack allows you to output your matches in any format you want using the --output switch. Say you were searching for #include files in your C code. You would simply run:
$ ack '#include'
src/util.c
1:#include <ctype.h>
2:#include <string.h>
3:#include <stdio.h>
...
But what if you just wanted to see the filenames that were included? You'd form a capture group with parentheses and output just that, like so:
$ ack '#include <(.+)>' --output='$1'
src/util.c
1:ctype.h
2:string.h
3:stdio.h
Eliminate the filenames with -h and pipe the output to sort -u, and you can get a deduped list of all include files in the project:
$ ack '#include <(.+)>' --output='$1' -h | sort -u
ctype.h
dirent.h
errno.h
fcntl.h
...
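For comparison, here is a sketch of roughly the same result using only grep, sed, and sort, which may be handy in scripts on machines where ack isn't installed. The scratch directory and C files are invented for the example, and this is only one of several ways to write the pipeline.

```shell
# Two throwaway C files (hypothetical names) with overlapping includes.
demo=$(mktemp -d)
printf '#include <stdio.h>\n#include <ctype.h>\n' > "$demo/util.c"
printf '#include <stdio.h>\n#include <errno.h>\n' > "$demo/main.c"

# grep: -r recurse, -h drop filenames, -o print only the matched text,
#       -E use extended regular expressions.
# sed:  keep only the captured header name.
# sort -u: dedupe.
includes=$(grep -rhoE '#include <[^>]+>' "$demo" \
    | sed -E 's/#include <([^>]+)>/\1/' \
    | sort -u)
echo "$includes"
rm -rf "$demo"
```

The ack version does the capture and the reformatting in a single command, which is the point of --output.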
This feature is powerful because you can put any Perl expression inside the value for --output.
ag offers editor integration
ag has excellent support for integrating with various editors. ag's --ackmate switch outputs results in a format that the TextMate editor can understand, and --vimgrep does the same for vim. ack doesn't support this, but can emulate it using the --output option.
ack allows project-level .ackrc files
ack allows you to have project-specific .ackrc files. Say you have a project that uses COBOL files, but you don't want to have the custom --type-set settings in your global /etc/ackrc or your local ~/.ackrc. You can put a .ackrc file at the root directory of your project and put the settings in there. If you put this .ackrc under version control, then everyone working on your project automatically gets the COBOL --type-set settings as well.
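A small sketch of that layout follows. The project directory, the src subdirectory, and payroll.cob are all hypothetical, and the search only runs if ack happens to be installed.

```shell
# Hypothetical project layout in a scratch directory.
proj=$(mktemp -d)
mkdir -p "$proj/src"
printf '       PERFORM MAIN-LOOP.\n' > "$proj/src/payroll.cob"

# Project-level settings live in .ackrc at the project root.
printf '%s\n' '--type-set=cobol:ext:cob' > "$proj/.ackrc"

# From inside the project, --cobol should now be understood.
# The explicit "." keeps ack searching the directory rather than stdin.
if command -v ack >/dev/null 2>&1; then
    (cd "$proj/src" && ack --cobol -l PERFORM .) || true
else
    echo "ack is not installed; skipping the search"
fi
```

Committing the .ackrc alongside the source is what shares the setting with the rest of the team.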
ag reuses your VCS's ignore files
Version control systems typically have some way to ignore files in the VCS. ag will check those files for you and ignore them in searches. If your .gitignore says to ignore all files with .html extension, then ag will ignore them as well. ack lacks this feature.
grep, ack, or ag, which one should you use?
There's no wrong answer. Each tool has its own strengths, and you should use whichever best fits your needs. This quick cheat sheet can help inform your decision:
grep
- Available on all Unix-like systems by default, but not on Windows.
- Everyone knows it, and it should be used for scripting purposes.
ack
- Very portable, runs on any system that runs Perl, including Windows
- Ignores backup files, binary files, your VCS’s work files, and other unwanteds
- True Perl regular expressions, not PCRE, because it's written in Perl
- Flexible output with the --output option
- User-definable file types
- Project-level configuration
ag
- Very fast
- Uses your VCS's ignore files to know what to ignore
- Searches compressed files
- Better editor integration
- Not as portable; Windows version is out of date
That's just a quick comparison. Checking the --help output of each of the tools will show more similarities and differences.
Remember that you're not restricted to using only one tool. Feel free to use whatever is most appropriate at any given time: ack or ag for searching source code, and grep to search other text files. Of course, there are plenty more search tools to explore as well, including the new Platinum Searcher, pt, which is written in Go. For info on more alternatives, check out http://beyondgrep.com/more-tools/.
Note: ack 1.x had different, sometimes-confusing default searching behaviors. Also, versions of ack 2 between 2.00 and 2.11 had a serious security hole that was fixed in 2.12. Please make sure to use 2.12 or higher.