I spend more time working with text than anything else. The multi-monitor, high-resolution graphics revolution hasn’t brought me graphics, just dozens of windows full of text. If you’re a software developer, chances are you are swimming in text too. Source code, documentation, configuration files, templates, logs–they are all searchable text. For special purpose tasks, like searching Java classes, I rely on my IDE, but for many things I run a command in a terminal.
In this post I’ll show you the most powerful command line tools I’ve found for searching and editing text.
Searching Text
regex: grep, ack, & Silver Searcher
The 1973 Unix search command, grep, is still a popular and powerful tool for searching text. It uses regular expressions (also called regex) to find lines that match a pattern. Once you get good with grep, that experience will carry over to many other tools and text editors. This simple command searches every file in the current directory and its subdirectories:
grep -r 'PATTERN' .
Grep is starting to show its age, however. Most implementations have not been updated with modern Perl-like regular expressions. Grep’s recursive file search is also difficult to control and often searches files, e.g. binary files, that aren’t interesting. Ack solves those two problems and adds more attractive output. It’s an even simpler command than grep:
ack 'PATTERN'
Silver Searcher is a very fast Ack clone. It can be more difficult to install, but the speed-up is surprisingly nice–I never thought Ack was slow until I tried Silver Searcher.
ag 'PATTERN'
All three commands support a few options I use all the time:
- -i: ignore case
- -w: match whole words only
- -C4: show 4 lines on both sides of the match
- -A4: show 4 lines after the match
- -B4: show 4 lines before the match
- -l: only show the filenames containing a match
- -v: invert the match, showing lines that don’t match
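To see what these options do without risking a real project, here’s a small sketch that builds a throwaway directory of made-up files and runs grep with several of them. It assumes a POSIX shell with mktemp. I stick to grep here because ack and ag may not be installed.

```shell
#!/bin/sh
# Sketch: try the common options on made-up files in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'Error: disk full\nerror handler ready\nno errors today\nall good\n' > "$dir/app.log"
printf 'nothing to see here\n' > "$dir/notes.txt"

# -i: case-insensitive, so "Error", "error" and "errors" all match (3 lines)
ignore_case=$(grep -c -i 'error' "$dir/app.log")
# -w: whole words only, so "errors" no longer matches (1 line)
whole_word=$(grep -c -w 'error' "$dir/app.log")
# -l: list only the names of files that contain a match
files=$(grep -r -l 'error' "$dir")
# -v: invert the match, keeping lines that do not contain "error"
inverted=$(grep -v 'error' "$dir/app.log")
# -A1: show each matching line plus one line after it
after=$(grep -A1 'disk' "$dir/app.log")

echo "$ignore_case $whole_word"
echo "$files"
echo "$inverted"
echo "$after"
```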
perl & ruby
Occasionally, a regex search doesn’t work well because the search is too complex or requires more than simple matching. Awk and Perl are both useful for these searches. Here’s an example of Perl printing lines that contain a number greater than 100 (strictly, it checks the first run of three or more digits on each line):
perl -n -e '/(\d{3,})/ && $1 > 100 && print' *
I’ll show Perl for editing files later; I find more uses for it than Awk. If you’d rather not touch Perl even for command line stuff, similar searches can be performed with Ruby:
ruby -n -e 'puts $_ if ($_ =~ /(\d{3,})/ && $1.to_i > 100)' *
Finding Changes In Text
Often when I’m searching text, I’m not looking at the text; I’m looking at the changes. That requires history. The original Unix diff command figures out changes by comparing an old file and a new file. Unlike grep, I rarely use diff because I rarely have an old copy of a file: version control has replaced all the manual backups that I used to do. Learning diff is still useful though, because its unified output format (-u) is still popular today, and git diff uses it too:
diff -r -u OLD-DIRECTORY NEW-DIRECTORY
The diff command that I normally use is the git diff subcommand. Most often I use it to look at working copy changes:
git diff .
Both diff commands support options I use all the time:
- -w: ignore whitespace changes
- -U4: show 4 lines of context on both sides of a change
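A sketch of diff -r -u and -w on two throwaway directories. The file names and contents are invented; the point is that -w makes a spacing-only change disappear from the output while a real edit still shows up as a pair of - and + lines.

```shell
#!/bin/sh
# Sketch: compare two made-up directory trees with diff.
old=$(mktemp -d)
new=$(mktemp -d)
trap 'rm -rf "$old" "$new"' EXIT

printf 'host = localhost\nport = 8080\n' > "$old/app.conf"
printf 'host = localhost\nport = 9090\n' > "$new/app.conf"
printf 'a b\n' > "$old/spacing.txt"
printf 'a   b\n' > "$new/spacing.txt"

# -r recurses into both trees; -u prints unified format ("-" old line, "+" new line).
full=$(diff -r -u "$old" "$new")
# Adding -w ignores whitespace, so the spacing-only change to spacing.txt disappears.
no_ws=$(diff -r -u -w "$old" "$new")

echo "$full"
echo "$no_ws"
```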
Searching the output of diff sounds kind of bizarre, but it’s useful because diff can generate a lot of output. I often just search the diff output in my terminal (command-K, run diff, command-F to search), but piping diff into grep works too:
git diff -U10 . | grep -C10 'PATTERN'
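The same trick works with plain diff when there’s no repository around. This sketch generates made-up input with seq (assumed to be available), widens the diff context with -U10, and then lets grep -C2 trim the output down to the neighborhood of the keyword.

```shell
#!/bin/sh
# Sketch: search a diff for one keyword while keeping a little context around it.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Old file: numbers 1..50. New file: identical except line 25 gains a keyword.
seq 1 50 > "$dir/old.txt"
sed 's/^25$/25 timeout/' "$dir/old.txt" > "$dir/new.txt"

# -U10 gives a wide hunk; grep -C2 keeps only the lines near "timeout".
hits=$(diff -U10 "$dir/old.txt" "$dir/new.txt" | grep -C2 'timeout')
printf '%s\n' "$hits"
```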
Finding Files
Our file organization systems haven’t changed much, and the original Unix find command is still really useful. It will find all files with a given name or attribute in a directory:
# files named *.java
find . -name '*.java'
# files modified less than an hour ago
find . -type f -mtime -1h
# Java files modified less than an hour ago
find . -name '*.java' -mtime -1h
It simply prints the list of matching files, which, as we’ll see, makes it incredibly useful for controlling other tools.
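One portability note: a unit suffix like “-mtime -1h” is understood by BSD find (including macOS), but GNU find’s -mtime only counts whole days. Both accept -mmin, which takes minutes, so a sketch meant to run on either looks like this. The files are made up, and touch -t backdates one of them.

```shell
#!/bin/sh
# Sketch: portable "modified in the last hour" search using -mmin.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

touch "$dir/Fresh.java" "$dir/notes.txt"
# Backdate one file to 1 January 2000 (touch -t [[CC]YY]MMDDhhmm).
touch -t 200001010000 "$dir/Stale.java"

# Java files modified less than 60 minutes ago
recent=$(find "$dir" -name '*.java' -mmin -60)
echo "$recent"
```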
Both Ack and Silver Searcher support the “-l” option to only show the names of files containing a match, instead of the matching text. This also works well with the file type options of both commands.
ack -l 'PATTERN'
ag -l 'PATTERN'
# files named either *.html or *.htm
ag --html -l
# files named *.java and containing the enum keyword
ag --java -l -w 'enum'
Once you have a list of files that you want to work on, you can use xargs to run another COMMAND on every file:
find . -name 'FILE' -print0 | xargs -0 COMMAND
# display the files I changed today (in the last 8 hours) that contain the word foo
find . -type f -mtime -8h -print0 | xargs -0 grep -l -w foo
The “-print0” and “-0” options separate file names with a null character instead of whitespace, so files with special characters (mainly spaces) work properly. I’ve gotten in the habit of avoiding command line backticks and “$(find . -name 'FILE')” because those don’t handle files with spaces correctly.
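To see the problem concretely, this sketch creates a made-up file with a space in its name and searches it both ways. The unquoted $(find ...) version hands grep the broken-up pieces of the name and finds nothing, while -print0 | xargs -0 finds the file.

```shell
#!/bin/sh
# Sketch: why -print0 | xargs -0 matters for file names with spaces.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'foo\n' > "$dir/meeting notes.txt"
printf 'bar\n' > "$dir/plain.txt"

# Unsafe: the shell splits "meeting notes.txt" into two words, so grep
# tries to open ".../meeting" and "notes.txt", which don't exist.
unsafe=$(grep -l -w foo $(find "$dir" -name '*.txt') 2>/dev/null)

# Safe: find ends each name with a NUL byte and xargs -0 splits only on NUL.
safe=$(find "$dir" -name '*.txt' -print0 | xargs -0 grep -l -w foo)

echo "unsafe: $unsafe"
echo "safe: $safe"
```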
Ack and Silver Searcher also support safe handling of files with spaces:
ack -l --print0 'PATTERN' | xargs -0 COMMAND
ag -l -0 'PATTERN' | xargs -0 COMMAND
Changing Text with Batch Edits
Software is usually well-factored and easy to change by hand; some people get by with Notepad, after all. Eventually, though, we all run into a big, tedious change. Naming convention changes in particular can be impossible for an IDE and mind-numbing for a human. Learning how to write editor macros can help; learning a batch editor can help even more. Sed and Awk are the classic Unix batch editors, but in my experience Perl has more powerful one-liners: its reputation as a Swiss Army chainsaw is well deserved.
Perl normally batch edits a file one line at a time. This regex search and replace can’t cross line boundaries or use multi-line context:
perl -p -i -e 's/PATTERN/REPLACEMENT/g' FILES ...
The option “-0777” forces Perl to read entire files into memory so multi-line regex matches work. Perl’s regex implementation is actually more powerful than traditional regular expressions, so you can do things like recursively match text or perform computations on matches.
perl -0777 -p -i -e 's/PATTERN/REPLACEMENT/mg' FILES ...
Here’s an example that adds a correctly indented @Timed annotation to any method that already has a @UnitOfWork annotation, but only if the @Timed annotation isn’t already there:
# [ \t]* captures the indentation without swallowing a blank line above @UnitOfWork
perl -0777 -p -i \
  -e 's/(^([ \t]*)\@UnitOfWork.*\n)(?![ \t]*\@Timed)/$1$2\@Timed\n/mg' \
  *.java
If you want to make that change to every Java file in a directory tree, Perl combines well with find:
find . -name '*.java' -print0 | xargs -0 \
  perl -0777 -p -i \
  -e 's/(^([ \t]*)\@UnitOfWork.*\n)(?![ \t]*\@Timed)/$1$2\@Timed\n/mg'
Making Changes Quickly and Safely
The Perl commands for editing multiple files, especially when combined with a find command, can make huge changes very quickly. When you make a mistake, that mistake can easily wipe out your entire project. To protect yourself against that, you absolutely must use version control.
If you aren’t already working under version control (why not?!) or aren’t using git, create a temporary git repo to easily undo bad changes:
git init
git add .
git commit -m originals
Here’s the approach I use when making batch edits. It’s fast partly because it’s so safe that I don’t mind experimenting with the text until I get the changes I want:
- Run a batch edit command
- Examine changes with “git diff .”
- Undo bad changes with “git checkout -- .”
- If more work is needed, run “git add .” and go back to step 1
- Otherwise, when the changes are complete, commit or delete the temporary git repo