```shell
# filepaths.txt is a file with thousands of lines
cat filepaths.txt | xargs -n 1 basename
```
The command above takes a while (seconds) to finish. A file with thousands of lines is usually not considered a large volume of data. Why is `xargs` slow here?
After reading an SO post, it turns out that `xargs` in the above command runs `basename` thousands of times, which is why the performance is bad. Can it be faster?
According to `man xargs`:

> xargs reads items from the standard input … delimited by blanks … or newlines and executes the command … followed by items read from standard input. The command line for command is built up until it reaches a system-defined limit (unless the -n and -L options are used). … In general, there will be many fewer invocations of command than there were items in the input. This will normally have significant performance benefits.
This means `xargs` can pass a batch of “items” to the command in a single invocation. Unfortunately, the `-n 1` option forces `xargs` to pass just one “item” at a time. To make it fast, drop `-n 1` and use the `-a` option of `basename`, which lets `basename` handle multiple arguments at once.
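The difference is easy to see with `echo` in place of `basename`: with `-n 1`, every input line becomes a separate invocation; without it, `xargs` batches all the lines into one call. A small sketch:

```shell
# Three separate echo invocations, one item each:
printf 'a\nb\nc\n' | xargs -n 1 echo
# a
# b
# c

# One echo invocation receiving all three items:
printf 'a\nb\nc\n' | xargs echo
# a b c
```

Each `echo` invocation prints one line, so the line count of the output reveals how many times the command was run.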
```shell
$ time cat filepaths.txt | xargs -n 1 basename > /dev/null

real    0m2.409s
user    0m0.044s
sys     0m0.332s

$ time cat filepaths.txt | xargs basename -a > /dev/null

real    0m0.004s
user    0m0.000s
sys     0m0.000s
```
About 600 times faster (2.409 s vs. 0.004 s).
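To reproduce the comparison without the original file, a throwaway `filepaths.txt` can be generated first; the path pattern below is made up:

```shell
# Generate 5000 hypothetical file paths, one per line
seq 5000 | sed 's|.*|/tmp/data&/report-&.txt|' > filepaths.txt
head -n 1 filepaths.txt
# /tmp/data1/report-1.txt
```

The absolute timings will differ from machine to machine, but the gap between `-n 1` and `basename -a` should remain large.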
`--show-limits`
```shell
$ cat /dev/null | xargs --show-limits --no-run-if-empty
Your environment variables take up 2027 bytes
POSIX upper limit on argument length (this system): 2093077
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2091050
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647
```
It shows that `xargs` can feed a lot of bytes to the command at once: up to 2091050 bytes on this system, though by default it uses a 131072-byte command buffer.
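One way to see that buffer at work: since every output line of `xargs echo` corresponds to one `echo` invocation, a thousand short items that easily fit in the default buffer produce a single line (a sketch assuming GNU xargs defaults):

```shell
# All 1000 numbers fit into one command buffer -> a single echo invocation
seq 1000 | xargs echo | wc -l
# 1

# Forcing one item per invocation -> 1000 echo runs
seq 1000 | xargs -n 1 echo | wc -l
# 1000
```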
`-P`

Some commands can usefully be executed in parallel too; see the `-P` option.
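For example, `-P` could be combined with the batched `basename -a` call; the process count and batch size (`-n`) below are arbitrary, and with parallel workers the output order is no longer guaranteed:

```shell
# Two parallel basename processes, two paths per batch
printf '/a/one\n/a/two\n/a/three\n/a/four\n' | xargs -P 2 -n 2 basename -a
# prints one, two, three, four (order may vary)
```

For a CPU-light command like `basename`, parallelism is unlikely to help much; it pays off when each invocation does real work.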