I'm trying to use PCRE to reorder the content of an md5 file. For each line, I want the filename without the path then the hash. The best command I've come up with is:
$ perl -pe 's|^([[:alnum:]]+).*?([^/]+)$|$2 $1|' DCIM.md5
The input file (DCIM.md5) is produced by md5sum on Linux. It looks like this:
e26ff03dc1bac80226e200c0c63d17a2 ./Path1/IMG_20150201_160548.jpg
01f92572e4c6f2ea42bd904497e4f939 ./Path 2/IMG_20150204_190528.jpg
afce027c977944188b4f97c5dd1bd101 ./Path3/Path 4/IMG_20151011_193008.jpg
- The hash is matched by the first group, ([[:alnum:]]+).
- Then the spaces and the path to the file are matched by .*?
- Then the filename is matched by the second group, ([^/]+).
- The expression is enclosed with ^ (apparently not necessary here) and $. Without the $, the expression does not output what I expect.
- I use | as a separator to avoid escaping / in file paths.
That command returns:
IMG_20150201_160548.jpg
e26ff03dc1bac80226e200c0c63d17a2IMG_20150204_190528.jpg
01f92572e4c6f2ea42bd904497e4f939IMG_20151011_193008.jpg
afce027c977944188b4f97c5dd1bd101IMG_20151011_195133.jpg
The matching is correct, the output sequence is correct (filename without path then hash) but the spacing is not: there's a newline after the filename. I expect it after the hash, like this:
IMG_20150201_160548.jpg e26ff03dc1bac80226e200c0c63d17a2
IMG_20150204_190528.jpg 01f92572e4c6f2ea42bd904497e4f939
IMG_20151011_193008.jpg afce027c977944188b4f97c5dd1bd101
It seems to me that my command outputs the newline character, but I don't know how to change this behavior. Or possibly the problem comes from the shell, not the command?
Finally, some version information:
$ perl -version
This is perl 5, version 22, subversion 1 (v5.22.1) built for i686-linux-gnu-thread-multi-64int
(with 69 registered patches, see perl -V for more detail)
[^/]+ matches newlines, so the one at the end of each input line becomes part of $2, which gets printed out first. (And there's no newline in $1, so none gets printed at the end of your transformed line.)
Solution: Read up on the -l option from perlrun. In particular:
-l[octnum] enables automatic line-ending processing. It has two separate effects. First, it automatically chomps $/ (the input record separator) when used with -n or -p. Second, it assigns $\ (the output record separator) to have the value of octnum so that any print statements will have that separator added back on. If octnum is omitted, sets $\ to the current value of $/ .