Friday, March 11, 2022

sed command on OSX not supporting '+' or '?' by default

A regex pattern has 3 ways to indicate how many times a character occurs:

"c*"    matches the letter 'c'    0 to N times
"c+"    matches the letter 'c'    1 to N times
"c?"    matches the letter 'c'    0 or 1 time

For example, suppose you want to replace "p...c" (one or more 'p' followed by a 'c') in the input string INSTR="aaabbbcccpppccc"

On Linux (GNU sed), you do    echo $INSTR | sed 's/p\+c/_/'

The same command does not work on OSX. It fails silently: nothing matches and the input is printed unchanged.

$ echo $INSTR | sed 's/p\+c/_/'

aaabbbcccpppccc

$ echo $INSTR | sed 's/p+c/_/'

aaabbbcccpppccc


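This is because sed interprets the pattern as a basic regex by default, and on OSX neither '+' nor '\+' acts as a repetition operator there (the '\+' form that works on Linux is a GNU extension). Instead, we ask for extended regex with -E: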

$ echo $INSTR | sed -E 's/p+c/_/'

aaabbbccc_cc
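
If the same command has to run unchanged on both Linux and OSX, one option is to stay in basic regex and avoid '+' altogether. The following is a minimal sketch, assuming a POSIX-conforming sed: "pp*c" and the interval form "p\{1,\}c" both spell "one or more 'p', then 'c'". The variable names OUT1 and OUT2 are only for illustration.

```shell
INSTR="aaabbbcccpppccc"

# "pp*c": one literal p, then zero or more p, then c (plain basic regex)
OUT1=$(echo "$INSTR" | sed 's/pp*c/_/')

# "p\{1,\}c": interval expression meaning at least one p, then c
OUT2=$(echo "$INSTR" | sed 's/p\{1,\}c/_/')

echo "$OUT1"
echo "$OUT2"
```

Both lines should print aaabbbccc_cc, the same result as the -E version above.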

We don't want '*' here: 'p*' also matches zero 'p', so the first 'c' of the early ccc gets matched.

$ echo $INSTR | sed -E 's/p*c/_/'

aaabbb_ccpppccc

This in fact gives the same result as 's/z*c/_/', because 'z*' can match zero times.
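
A quick way to check that claim is to run both patterns and compare the outputs. This is only a sketch; WITH_P and WITH_Z are illustrative variable names, not anything sed requires.

```shell
INSTR="aaabbbcccpppccc"

# p* may match zero p, so the leftmost match is just the first c
WITH_P=$(echo "$INSTR" | sed -E 's/p*c/_/')

# z never occurs in the input, so z* always matches zero times
WITH_Z=$(echo "$INSTR" | sed -E 's/z*c/_/')

if [ "$WITH_P" = "$WITH_Z" ]; then
    echo "same result: $WITH_P"
else
    echo "different: $WITH_P vs $WITH_Z"
fi
```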

Similarly, to indicate 0 or 1 time with '?', use -E as well:

$ echo $INSTR | sed -E 's/p?c/_/'

aaabbb_ccpppccc

$ echo $INSTR | sed -E 's/b?c/_/'

aaabb_ccpppccc
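
For '?' there is also a basic regex spelling that does not need -E. The sketch below assumes a POSIX-conforming sed and uses the interval "\{0,1\}" in place of '?'. OUT_P and OUT_B are illustrative names.

```shell
INSTR="aaabbbcccpppccc"

# "p\{0,1\}c": zero or one p, then c
OUT_P=$(echo "$INSTR" | sed 's/p\{0,1\}c/_/')

# "b\{0,1\}c": zero or one b, then c (the leftmost match is "bc")
OUT_B=$(echo "$INSTR" | sed 's/b\{0,1\}c/_/')

echo "$OUT_P"
echo "$OUT_B"
```

These should print aaabbb_ccpppccc and aaabb_ccpppccc, matching the two -E results above.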





