# String Tricks that Bash Knows

I use the terminal for a lot of my work, so when I need to process output from other tools, I have a lot of options. I started out using shell scripts, but eventually moved on to scripting languages — first with Perl, then Ruby, with occasional Python. Lately I’ve been getting familiar with my shell (`bash`) again, finding new ways to stretch its usefulness as a tool on its own, without pulling in typical auxiliary support from `grep`, `sed`, or `awk`.

Bash is a big program, and the bulky `man` page can hide some gems. One area where it’s quite capable, which I only recently dug into, is **string handling**. Except it’s not really called that, so you might not have noticed it. As James Coglan rightfully puts it:

In this case, what I’m going to show is a couple of useful bits from the “Parameter Expansion” section of the bash man page. And though I otherwise enjoy the stress reduction of telling my shell to `man bash`, I actually learned these tricks by reading the [Advanced Bash-Scripting Guide](http://tldp.org/LDP/abs/html/), which usually calls this [“Parameter Substitution”](http://tldp.org/LDP/abs/html/parameter-substitution.html#PARAMSUBREF). It even elaborates on my point about knowing what to look for:

> Bash supports a surprising number of string manipulation operations. Unfortunately, these tools lack a unified focus.

The ABSG is a great guide, with detailed examples (like [pattern matching](http://tldp.org/LDP/abs/html/parameter-substitution.html#PATTMATCHING)) and a handy table for the (reasonably-named) [“String Operations”](http://tldp.org/LDP/abs/html/refcards.html#AEN22664). Let’s look at a few of them.

## Remove Shortest or Longest Match from Beginning or End

My first examples of interesting string tools in `bash` are `#` and `%`. Both remove part of a string that matches a glob pattern, and each can be used singly or doubled:

* `#` removes the shortest match from the beginning
* `##` removes the longest match from the beginning
* `%` removes the shortest match from the end
* `%%` removes the longest match from the end

I’ll paraphrase some examples from [another rich bash guide](http://www.softpanorama.org/Scripting/Shellorama/Reference/string_operations_in_shell.shtml) to effect some useful path munging:

    var=/Users/karlin/git/langton_loops/index.html.erb
    echo ${var}         # => /Users/karlin/git/langton_loops/index.html.erb
    echo ${var#*.}      # => html.erb
    echo ${var##*.}     # => erb
    echo ${var%/*.*}    # => /Users/karlin/git/langton_loops

    file=${var##/*/}    # => index.html.erb
    echo ${file%.*}     # => index.html
    echo ${file%%.*}    # => index

Since these symbols say little about the behavior they trigger, the same guide points out a helpful mnemonic:

> The # key is on the left side of the $ key and operates from the left, while % is to right

I remember that by visualizing this shift-laden dance across the second row:

    # ## $ %% %
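To see the four operators side by side, here is a minimal sketch that wraps each one in a small function. The function names (`dir_of`, `file_of`, `stem_of`, `ext_of`) and the sample path are made up for illustration; they are not part of bash or of either guide.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, named here only for illustration.

dir_of()  { printf '%s\n' "${1%/*}"; }     # %  : trim the shortest "/..." from the end
file_of() { printf '%s\n' "${1##*/}"; }    # ## : trim the longest ".../" from the start
stem_of() { local f=${1##*/}; printf '%s\n' "${f%%.*}"; }   # %% : trim every extension
ext_of()  { local f=${1##*/}; printf '%s\n' "${f#*.}"; }    # #  : trim through the first dot

path=/srv/site/views/about.html.erb    # made-up path
echo "dir:  $(dir_of "$path")"     # dir:  /srv/site/views
echo "file: $(file_of "$path")"    # file: about.html.erb
echo "stem: $(stem_of "$path")"    # stem: about
echo "ext:  $(ext_of "$path")"     # ext:  html.erb
```

One caveat with this approach: when the pattern does not match, the expansion returns the value unchanged, so `ext_of` on a name with no dot gives back the whole name rather than an empty string.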

## Replace First or All Substring Matches

Rather than just removing parts of a string, you can also replace them by adding slashes to the expansion: `${var/pattern/replacement}` replaces the first match, and `${var//pattern/replacement}` replaces every match:

    var='1,2 3,4 5,6'
    # The backslash keeps & literal; bash 5.2+ otherwise expands a bare & to the matched text.
    echo ${var/,/\&}    # => 1&2 3,4 5,6
    echo ${var//,/\&}   # => 1&2 3&4 5&6

This has single-handedly reduced my use of both `sed` and `perl` on the command line, where those tools seemed to spend most of their time replacing bits of strings for me.
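As one more illustration of the single-slash and double-slash forms, here is a rough sketch of the kind of job I might otherwise hand to `sed`. The `slugify` function and the sample title are hypothetical, invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a title into a filename-friendly slug.
slugify() {
  local s=$1
  s=${s// /-}    # double slash: replace every space with a hyphen
  s=${s//,/}     # empty replacement: delete every comma
  printf '%s\n' "$s"
}

title='String Tricks, that Bash Knows'   # made-up input
echo "${title/ /_}"    # single slash: String_Tricks, that Bash Knows
slugify "$title"       # String-Tricks-that-Bash-Knows
```

The patterns here are globs, not regular expressions, so this only covers the simpler substitutions; anything that needs capture groups is still a job for `sed` or a scripting language.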

## But… why?

I recently wanted a quick way to get the IP of a Mac in a Ruby script. I wrote this:

    puts %x{ifconfig en0 inet}[/\sinet ([^ ]*)/,1]

Yeah! Sure, that works. But it’s not really that far from this:

    ip=$(ifconfig en0 inet); ip=${ip##*inet }; echo ${ip%% *}
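The two trimming steps in that one-liner are easier to follow when run against fixed text. This is a sketch under assumptions: `first_inet` is a hypothetical helper, and the sample string is made up to resemble what `ifconfig en0 inet` prints on a Mac, so it runs without a network interface.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the address that follows "inet " out of ifconfig-style text.
first_inet() {
  local out=$1
  out=${out##*inet }            # ##: drop everything through the last "inet "
  printf '%s\n' "${out%% *}"    # %%: drop everything from the first space onward
}

# Made-up sample, shaped like `ifconfig en0 inet` output.
sample='en0: flags=8863<UP,BROADCAST,RUNNING> mtu 1500
    inet 192.0.2.10 netmask 0xffffff00 broadcast 192.0.2.255'

first_inet "$sample"    # prints 192.0.2.10
```

In real use the call would be `first_inet "$(ifconfig en0 inet)"`. Because `##` removes the longest match, an interface with several `inet` lines would yield the address on the last one.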

In my opinion, knowing many ways of wrangling strings means I’ll be more likely to choose the right one depending on context.
 

Conversation
  • Mike Hall says:

    This is great. Shell scripting is a skill that I think most developers give short shrift but is incredibly valuable. I wasn’t even aware of these string manipulation operators and thanks for the links to resources as well. Fun article.

    • Karlin Fox says:

      Thanks Mike, I’m glad it was helpful!

      I totally agree that shell scripting skills are valuable. Even if you are getting shell-like work done in a favorite scripting language, you’d usually end up configuring environments and executing the process through your shell. Knowing the basics of string ops and for loops opens up a whole category of tasks to quick completion in the shell, without firing up an editor at all.

  • eric stewart says:

    Great blog, I have a question on spooling duplicate strings. Example of EDI file below I am interested in spooling out for analysis line 2 & 3 that are duplicates. For purposes of test lets call the file name test.txt. Could you please assist with the approach I should use..

    N1*12324*01212015
    REF*59*abcdefg
    REF*59*abcdefg
    XYZ*IL*3407

    REF*59

    • Eddie 7 says:

      Perl has an EDI library.
      Been using it for 10 years+
