I use the terminal for a lot of my work, so when I need to process output from other tools, I have a lot of options. I started out using shell scripts, but eventually moved on to scripting languages: first Perl, then Ruby, with occasional Python. Lately I’ve been getting familiar with my shell (bash) again, finding new ways to stretch its usefulness as a tool on its own, without pulling in typical auxiliary support from grep, sed, or awk.
Bash is a big program, and the bulky man page can hide some gems. One area where it’s quite capable, which I only recently dug into, is string handling. Except it’s not really called that, so you might not have noticed it. As James Coglan rightfully puts it:
The thing about “rtfm” is that bash stuff is ungooglable unless you know the jargon for the thing you’re trying to do.
— jcoglan.txt (@jcoglan) January 26, 2014
In this case, what I’m going to show is a couple of useful bits from the “Parameter Expansion” section of the bash man page. And though I otherwise enjoy the stress reduction of telling my shell to man bash, I actually learned these tricks by reading the Advanced Bash-Scripting Guide, which usually calls this “Parameter Substitution”. It even elaborates my point about knowing what to look for:
Bash supports a surprising number of string manipulation operations. Unfortunately, these tools lack a unified focus.
The ABSG is a great guide, with detailed examples (like pattern matching) and a handy table for the (reasonably-named) “String Operations”. Let’s look at a few of them.
Remove Shortest or Longest Match from Beginning or End
My first example of interesting string tools in bash are # and %. They’re both for removing parts of a string, and they can be used singly or doubly:
# removes the shortest match from the beginning
## removes the longest match from the beginning
% removes the shortest match from the end
%% removes the longest match from the end
I’ll paraphrase some examples from another rich bash guide to effect some useful path munging:
var=/Users/karlin/git/langton_loops/index.html.erb
echo ${var} # => /Users/karlin/git/langton_loops/index.html.erb
echo ${var#*.} # => html.erb
echo ${var##*.} # => erb
echo ${var%/*.*} # => /Users/karlin/git/langton_loops
file=${var##/*/} # => index.html.erb
echo ${file%.*} # => index.html
echo ${file%%.*} # => index
Since these symbols say little about the behavior they trigger, the same guide points out a helpful mnemonic:
The # key is on the left side of the $ key and operates from the left, while % is to right
I remember that by visualizing this shift-laden dance across the second row:
# ## $ %% %
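To tie the four operators together, here is a minimal sketch that wraps them in small path helpers. The function names (path_dir, path_file, path_ext, path_stem) are hypothetical and chosen only for illustration. They assume the path contains at least one slash and one dot, and they do not handle edge cases such as trailing slashes the way dirname and basename do.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from any library.

# Directory part: drop the shortest trailing match of "/*".
path_dir() { printf '%s\n' "${1%/*}"; }

# File name: drop the longest leading match of "*/".
path_file() { printf '%s\n' "${1##*/}"; }

# Final extension: drop the longest leading match of "*.".
path_ext() { printf '%s\n' "${1##*.}"; }

# Name without any extension: take the file name,
# then drop the longest trailing match of ".*".
path_stem() {
  local file=${1##*/}
  printf '%s\n' "${file%%.*}"
}

var=/Users/karlin/git/langton_loops/index.html.erb
path_dir "$var"   # => /Users/karlin/git/langton_loops
path_file "$var"  # => index.html.erb
path_ext "$var"   # => erb
path_stem "$var"  # => index
```

Quoting the expansions keeps paths with spaces intact, which the unquoted echo form would split into separate words.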
Replace First or All Substring Matches
Rather than just remove parts of the string, you can also tell bash to replace them by adding some slashes to your expansion. One slash replaces the first match, two slashes replace every match:
var='1,2 3,4 5,6'
# The backslash keeps & literal; bash 5.2+ otherwise treats an unquoted & as "the matched text"
echo ${var/,/\&} # => 1&2 3,4 5,6
echo ${var//,/\&} # => 1&2 3&4 5&6
This has single-handedly reduced my use of both sed and perl on the command-line, where those tools seemed to spend most of their time replacing bits of strings for me.
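As a rough illustration of the kind of sed one-liner this can stand in for, the sketch below renames a file title. The value of title is made up for the example. It also uses two related forms from the same section of the bash manual: a pattern starting with # must match at the beginning of the value, and one starting with % must match at the end.

```shell
#!/usr/bin/env bash
title='my draft post.md'   # hypothetical file name, for illustration only

echo "${title/ /_}"        # first space only  => my_draft post.md
echo "${title// /_}"       # every space       => my_draft_post.md
echo "${title// /}"        # empty replacement deletes matches => mydraftpost.md
echo "${title/#my/our}"    # anchored at start => our draft post.md
echo "${title/%.md/.txt}"  # anchored at end   => my draft post.txt
```

The patterns here are shell globs, not regular expressions, so the dot in .md matches only a literal dot.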
But… why?
I recently wanted a quick way to get the IP of a Mac in a Ruby script. I wrote this:
puts %x{ifconfig en0 inet}[/\sinet ([^ ]*)/,1]
Yeah! Sure, that works. But it’s not really that far from this:
ip=$(ifconfig en0 inet); ip=${ip##*inet }; echo ${ip%% *}
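Broken into steps, the same idea might look like the sketch below. The helper name inet_addr and the sample text are assumptions made for illustration: the sample only imitates the general shape of ifconfig output, so the snippet runs on machines without an en0 interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first word after the last "inet " in the text.
inet_addr() {
  local text=$1
  text=${text##*inet }          # drop everything through the last "inet "
  printf '%s\n' "${text%% *}"   # drop everything from the first space onward
}

# Made-up sample in the general shape of `ifconfig en0 inet` output.
sample='en0: flags=8863<UP,BROADCAST,RUNNING> mtu 1500
    inet 192.168.1.20 netmask 0xffffff00 broadcast 192.168.1.255'

inet_addr "$sample"   # => 192.168.1.20

# On a machine that has the interface:
#   inet_addr "$(ifconfig en0 inet)"
```

If the text contains no "inet " at all, the first expansion leaves it unchanged and the function simply prints its first word, so a real script would want to check for that case.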
In my opinion, knowing many ways of wrangling strings means I’ll be more likely to choose the right one depending on context.
This is great. Shell scripting is a skill that I think most developers give short shrift but is incredibly valuable. I wasn’t even aware of these string manipulation operators and thanks for the links to resources as well. Fun article.
Thanks Mike, I’m glad it was helpful!
I totally agree that shell scripting skills are valuable. Even if you are getting shell-like work done in a favorite scripting language, you’d usually end up configuring environments and executing the process through your shell. Knowing the basics of string ops and for loops opens up a whole category of tasks to quick completion in the shell, without firing up an editor at all.
Great blog. I have a question on spooling duplicate strings. In the example EDI file below, I am interested in spooling out lines 2 and 3, which are duplicates, for analysis. For testing purposes, let’s call the file test.txt. Could you please assist with the approach I should use?
N1*12324*01212015
REF*59*abcdefg
REF*59*abcdefg
XYZ*IL*3407
REF*59
Perl has an EDI library.
Been using it for 10 years+