Free Republic
Browse · Search
General/Chat
Topics · Post Article


Pattern Matching In Bash
Linux Journal ^ | 15 April 2019 | Mitch Frazier

Posted on 04/22/2019 4:44:05 AM PDT by ShadowAce

Wildcards have been around forever. Some even claim they appear in the hieroglyphics of the ancient Egyptians. Wildcards allow you to specify succinctly a pattern that matches a set of filenames (for example, *.pdf to get a list of all the PDF files). Wildcards are also often referred to as glob patterns (or when using them, as "globbing"). But glob patterns have uses beyond just generating a list of useful filenames. The bash man page refers to glob patterns simply as "Pattern Matching".

 

First, let's do a quick review of bash's glob patterns. In addition to the simple wildcard characters that are fairly well known, bash also has extended globbing, which adds several more pattern operators. These extended operators are enabled via the extglob option (shopt -s extglob).

Pattern        Description
*              Match zero or more characters
?              Match any single character
[...]          Match any of the characters in a set
?(patterns)    Match zero or one occurrences of the patterns (extglob)
*(patterns)    Match zero or more occurrences of the patterns (extglob)
+(patterns)    Match one or more occurrences of the patterns (extglob)
@(patterns)    Match one occurrence of the patterns (extglob)
!(patterns)    Match anything that doesn't match one of the patterns (extglob)

 

For example:

$ ls
a.jpg  b.gif  c.png  d.pdf  ee.pdf

$ ls *.jpg
a.jpg

$ ls ?.pdf
d.pdf

$ ls [ab]*
a.jpg  b.gif

$ shopt -s extglob   # turn on extended globbing

$ ls ?(*.jpg|*.gif)
a.jpg  b.gif

$ ls !(*.jpg|*.gif)   # not a jpg or a gif
c.png  d.pdf  ee.pdf
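
The session above can be tried out in a scratch directory rather than among real files. The following is only a sketch: the helper name expand_in and the temporary directory are assumptions made for illustration and do not come from the article.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended globbing before any extglob pattern is expanded

# expand_in DIR PATTERN: print, one per line, the names in DIR that match PATTERN.
# (Hypothetical helper, for illustration only.)
expand_in() {
    local dir=$1 pattern=$2
    (
        cd "$dir" || exit 1
        # $pattern is deliberately unquoted so bash performs pathname expansion on it.
        printf '%s\n' $pattern
    )
}

# Recreate the article's sample files in a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir"/{a.jpg,b.gif,c.png,d.pdf,ee.pdf}

expand_in "$demo_dir" '?(*.jpg|*.gif)'   # a.jpg and b.gif
expand_in "$demo_dir" '!(*.jpg|*.gif)'   # c.png, d.pdf and ee.pdf
```

The patterns are passed in single quotes so that they reach the helper unexpanded and are only matched against the scratch directory.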

When I first used extended globbing, many of the patterns didn't seem to do what I thought they ought to do. For example, it appeared to me that, given a.jpg, the pattern ?(*.jpg|a.jpg) should not match, because a.jpg matched both patterns, and the ? is "zero or one", right? Wrong. My confusion was due to a misreading of the description: it's not the filename that can match only once, it's the pattern that can match only once. Think of it in terms of regular expressions:

Glob           Regular Expression Equivalent   Description
?(patterns)    (regex)?                        Match an optional regex
*(patterns)    (regex)*                        Match zero or more occurrences of a regex
+(patterns)    (regex)+                        Match one or more occurrences of a regex
@(patterns)    (regex)                         Match the regex (one occurrence)

 

So, for example:

$ ls *.pdf
ee.pdf  e.pdf  .pdf

$ ls ?(e).pdf   # zero or one "e" allowed
e.pdf  .pdf

$ ls *(e).pdf   # zero or more "e"s allowed
ee.pdf  e.pdf  .pdf

$ ls +(e).pdf   # one or more "e"s allowed
ee.pdf  e.pdf

$ ls @(e).pdf   # only one e allowed
e.pdf
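
To see the glob-to-regex correspondence without involving the filesystem, the same names can be tested as plain strings inside [[ ]]. This is a rough sketch; the function name compare and the anchored regexes are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
shopt -s extglob

# compare NAME GLOB REGEX: test NAME against a glob and against a regex
# that is assumed to be equivalent, then print both verdicts.
compare() {
    local name=$1 glob=$2 regex=$3 g=no r=no
    [[ $name == $glob ]]  && g=yes   # unquoted right-hand side of == is a glob pattern
    [[ $name =~ $regex ]] && r=yes   # unquoted right-hand side of =~ is an extended regex
    echo "glob=$g regex=$r"
}

compare e.pdf  '?(e).pdf' '^(e)?\.pdf$'   # glob=yes regex=yes
compare ee.pdf '?(e).pdf' '^(e)?\.pdf$'   # glob=no regex=no
compare ee.pdf '+(e).pdf' '^(e)+\.pdf$'   # glob=yes regex=yes
```

Note that a glob always has to match the whole string, which is why the regexes here are anchored with ^ and $.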

And while I'm comparing glob patterns to regular expressions, there's an important point to be made that may not be immediately obvious: glob patterns are just another syntax for doing pattern matching in general in bash. And you can use them in a number of different places, including [[ ]] conditional expressions, case statements and parameter expansions, as the following examples show.

The following example uses pattern matching in the expression of an if statement to test whether a variable has a value of "something" or "anything":

$ shopt -s extglob

$ a=something
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
yes

$ a=anything
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
yes

$ a=nothing
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
no
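
One thing worth keeping in mind with the test above is that +(...) allows repeats, so it accepts more than just "something" and "anything". The sketch below contrasts it with @(...), which allows exactly one occurrence; the helper name matches is an assumption made for illustration.

```shell
#!/usr/bin/env bash
shopt -s extglob

# matches VALUE PATTERN: print yes if VALUE matches the glob PATTERN, else no.
matches() {
    local value=$1 pattern=$2
    if [[ $value == $pattern ]]; then echo yes; else echo no; fi
}

matches something    '+(some|any)thing'   # yes
matches someanything '+(some|any)thing'   # yes: "some" followed by "any" is two occurrences
matches someanything '@(some|any)thing'   # no: only one occurrence is allowed
matches anything     '@(some|any)thing'   # yes
```

If the intent is to accept only those two exact words, @(some|any)thing is probably the closer fit.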

The following example uses pattern matching in a case statement to determine whether a file is an image file:

shopt -s extglob   # enable extended globbing
for f in "$@"      # quoted so names containing spaces stay intact
do
    case $f in
    !(*.gif|*.jpg|*.png))  # ! == does not match
        echo "Not an image: $f"
        ;;
    *)
        echo "Image: $f"
        ;;
    esac
done

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
Image: a.jpg
Image: b.gif
Image: c.png
Not an image: d.pdf
Not an image: e.pdf

In the example above, the pattern !(*.gif|*.jpg|*.png) will match a filename if it's not a gif, jpg or png.
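
The same check can also be phrased positively, which some readers may find easier to follow. This is a sketch only: the function name classify and the sample names are assumptions, and it uses @(...) on the extension rather than the article's negated pattern.

```shell
#!/usr/bin/env bash
shopt -s extglob   # needs to be on before bash parses the case patterns below

# classify NAME: report whether NAME ends in one of the listed image extensions.
classify() {
    case $1 in
        *.@(gif|jpg|png)) echo "Image: $1" ;;
        *)                echo "Not an image: $1" ;;
    esac
}

for f in a.jpg b.gif d.pdf "holiday photo.png"; do
    classify "$f"
done
```

Quoting "$f" in the loop keeps a name with a space in it, such as the last one, as a single argument.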

The following example uses pattern matching in a %% parameter expansion to remove the extension from all image files:

shopt -s extglob
for f in "$@"
do
    echo "${f%%*(.gif|.jpg|.png)}"
done

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a
b
c
d.pdf
e.pdf
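
Glob patterns work with the other parameter expansion operators as well, not just %%. The sketch below shows a few of them on a made-up path; the path and the variable names are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
shopt -s extglob

path=/tmp/photos/IMG_0042.backup.jpg   # hypothetical path, not from the article

name=${path##*/}               # remove longest prefix matching */   -> IMG_0042.backup.jpg
stem=${name%.@(gif|jpg|png)}   # remove one trailing image extension -> IMG_0042.backup
first=${name%%.*}              # remove longest suffix matching .*   -> IMG_0042
masked=${name//+([0-9])/N}     # replace every run of digits with N  -> IMG_N.backup.jpg

printf '%s\n' "$name" "$stem" "$first" "$masked"
```

Here # and ## strip from the front, % and %% strip from the end, and // replaces every match; the doubled forms take the longest match.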

A feature that I just recently became aware of is that you can do the above action in one fell swoop: if you use "*" or "@" as the variable name, the transformation is done on all the command-line arguments at once. [Note to self: always read the last half of the paragraph from now on]:

shopt -s extglob
echo ${*%%*(.gif|.jpg|.png)}

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a b c d.pdf e.pdf

And that works on arrays too:

shopt -s extglob
array=("$@")   # quoted so each argument becomes exactly one element
echo ${array[*]%%*(.gif|.jpg|.png)}

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a b c d.pdf e.pdf
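
When the results need to stay separate rather than be echoed as one line, the quoted [@] form keeps each element intact. A minimal sketch, assuming some sample names (including one with a space) that are not from the article:

```shell
#!/usr/bin/env bash
shopt -s extglob

files=("a.jpg" "my photo.gif" "d.pdf")   # hypothetical sample names

# The removal is applied to every element, and the quoted [@] form keeps
# each result as a single word even when it contains a space.
stems=("${files[@]%.@(gif|jpg|png)}")

printf '<%s>\n' "${stems[@]}"
echo "count: ${#stems[@]}"
```

With an unquoted expansion, "my photo" would be split into two words, so the quoted form is the safer default when the results feed another command.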

The biggest takeaway here is to stop thinking of wildcards as a mechanism just to get a list of filenames and start thinking of them as glob patterns that can be used to do general pattern matching in your bash scripts. Think of glob patterns as regular expressions in a different language.


Any code found in my articles should be considered licensed as follows:

# Copyright 2019 Mitch Frazier <mitch -at- linuxjournal.com>
#
# This software may be used and distributed according to the terms of the
# MIT License or the GNU General Public License version 2 (or any later version).


TOPICS: Computers/Internet
KEYWORDS: bash; linux

1 posted on 04/22/2019 4:44:05 AM PDT by ShadowAce

To: rdb3; Calvinist_Dark_Lord; JosephW; Only1choice____Freedom; Ernest_at_the_Beach; martin_fierro; ...

2 posted on 04/22/2019 4:44:27 AM PDT by ShadowAce (Linux - The Ultimate Windows Service Pack)

To: ShadowAce

If I knew more about computers I would understand this. As it is I am “bash”ing my head against a brick wall.

CC


3 posted on 04/22/2019 4:54:50 AM PDT by Celtic Conservative (My cats are more amusing than 200 channels worth of TV)

To: ShadowAce

4 posted on 04/22/2019 5:49:01 AM PDT by Flick Lives

To: Flick Lives

LOL! xkcd is great.


5 posted on 04/22/2019 6:11:02 AM PDT by dinodino

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson