s/// returns out of place newline


I'm trying to use PCRE to reorder the content of an md5 file. For each line, I want the filename without the path then the hash. The best command I've come up with is:

$ perl -pe 's|^([[:alnum:]]+).*?([^/]+)$|$2 $1|' DCIM.md5

The input file (DCIM.md5) is produced by md5sum on Linux. It looks like this:

e26ff03dc1bac80226e200c0c63d17a2  ./Path1/IMG_20150201_160548.jpg
01f92572e4c6f2ea42bd904497e4f939  ./Path 2/IMG_20150204_190528.jpg
afce027c977944188b4f97c5dd1bd101  ./Path3/Path 4/IMG_20151011_193008.jpg

Here is how I read the expression:

  1. The hash is matched by the first group ([[:alnum:]]+) in the regular expression.
  2. Then the spaces and the path to the file are matched by .*?.
  3. Then the filename is matched by ([^/]+).
  4. The expression is anchored with ^ (apparently unnecessary here) and $. Without the $, the expression does not output what I expect.
  5. I use | rather than / as the delimiter to avoid escaping the slashes in file paths.

That command returns:

IMG_20150201_160548.jpg
 e26ff03dc1bac80226e200c0c63d17a2IMG_20150204_190528.jpg
 01f92572e4c6f2ea42bd904497e4f939IMG_20151011_193008.jpg
 afce027c977944188b4f97c5dd1bd101IMG_20151011_195133.jpg

The matching is correct, the output sequence is correct (filename without path then hash) but the spacing is not: there's a newline after the filename. I expect it after the hash, like this:

IMG_20150201_160548.jpg e26ff03dc1bac80226e200c0c63d17a2
IMG_20150204_190528.jpg 01f92572e4c6f2ea42bd904497e4f939
IMG_20151011_193008.jpg afce027c977944188b4f97c5dd1bd101

It seems to me that my command outputs the newline character, but I don't know how to change this behavior. Or possibly the problem comes from the shell, not the command?

Finally, some version information:

$ perl -version
This is perl 5, version 22, subversion 1 (v5.22.1) built for i686-linux-gnu-thread-multi-64int
(with 69 registered patches, see perl -V for more detail)


[^/] matches any character that is not a slash, including newlines. Maybe try [^\n/].

2 Answers

12

[^/]+ matches newlines, so the newline at the end of each input line becomes part of $2, which is printed first. There is no newline in $1, so nothing terminates the transformed $_ and the next line's output runs straight on from the hash.

Solution: Read up on the -l option from perlrun. In particular:

-l[octnum] enables automatic line-ending processing. It has two separate effects. First, it automatically chomps $/ (the input record separator) when used with -n or -p. Second, it assigns $\ (the output record separator) to have the value of octnum so that any print statements will have that separator added back on. If octnum is omitted, sets $\ to the current value of $/ .

13

Use [^/\n] instead of [^/], so the filename group cannot match the trailing newline:

perl -pe 's|^([[:alnum:]]+).*?([^/\n]+)$|$2 $1|' DCIM.md5
