Process Substitution

2019 Dec 19

From man bash, for “process substitution” it says

It takes the form of <(list) or >(list)

Note: there is no space between < or > and (.

Also, it says,

The process list is run with its input or output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to the current command as the result of the expansion.

The <(list) Form

$ diff <(echo a) <(echo b)
1c1
< a
---
> b

Usually the diff takes two files and compares them. The process substitution here, <(echo a), creates a file in /dev/fd, for example /dev/fd/63. The stdout of echo a command is connected to /dev/fd/63. Meanwhile, /dev/fd/63 is used as an input file/parameter of diff command. Similar for <(echo b). After Bash does the substitution, the command is like diff /dev/fd/63 /dev/fd/64. In diff’s point of view, it just compares two normal files.

In this example, one advantage of process substitution is eliminating the need of temporary files, like

$ echo a > tmp.a && echo b > tmp.b \ 
    && diff tmp.a tmp.b            \
    && rm tmp.{a,b}

\ The >(list) Form

$ echo . | tee >(ls)

Similar, Bash creates a file in /dev/fd when it sees >(ls). Again, let’s say the file is /dev/fd/63. Bash connects /dev/fd/63 with stdin of ls command, also the file /dev/fd/63 is used as a parameter of tee command. The tee views /dev/fd/63 as a normal file. tee writes content, here is ., into the file, and the content will “pipe” into the stdin of ls.

Compare with Pipe

Pipe, cmd-a | cmd-b, basically just passes stdout of the command on the left to the stdin of the command on the right. Its data flow is restricted, which is from left to right.

Process substitution has more freedom.

# use process substitution
$ grep -f <(echo hello) file-a
hello
# use pipe
$ echo hello | xargs -I{} grep {} file-a
hello

And for commands like diff <(echo a) <(echo b), it’s not easy to be done by pipe.

More Examples

$ paste <(echo hello) <(echo world)
hello   world

How it works

From man bash,

Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files.

Read more about named pipe and /dev/fd.

For /dev/fd,

The main use of the /dev/fd files is from the shell. It allows programs that use pathname arguments to handle standard input and standard output in the same manner as other pathnames.

In my OS (Ubuntu), the Bash uses /dev/fd for process substitution.

$ ls -l <(echo .)
lr-x------ 1 user user 64 12月 19 11:19 /dev/fd/63 -> pipe:[4926830]

Bash replaces <(echo .) with /dev/fd/63. The above command is like ls -l /dev/fd/63.

Or find the backing file via,

$ echo <(true)
/dev/fd/63

(After my Bash does the substitution, the command becomes echo /dev/fd/63, which outputs /dev/fd/63.)

CLI

Daily Dev Log

2019 Jun 24

CLI

“foo –help” vs “man foo”

We just can’t remember options of CLI tools. In most cases, --help, like grep --help, is the go-to way to look for help.

For example, if you forget what -H of grep does.

$ grep --help | grep -- -H
  -H, --with-filename       print the file name for each match

man is too formal, wordy and overwhelming, comparing with --help. Usually, we can find what we want in the output of --help, without turning to man.

Another advantage of --help is that it’s always available, and built in the command itself. If you have ever used Git Bash for Windows, you will find that man is not available there. Relying on man only means you have to google the man(ual) in the browsers.

Or to save some typing, add a bash function into ~/.bashrc.

h() { $1 --help; }

Then type h grep to show the help.

CLI

xargs is Slow

2019 May 29
# filepaths.txt is a file with thousands lines
cat filepaths.txt | xargs -n 1 basename

It takes a while (seconds) to finish running the above command. A file with thousands lines usually is not considered as a big volume. Why is xargs slow in the above command?

After read a SO post, it turns out xargs in the above command runs basename thousands times, therefore it has bad performance.

Can it be faster?

According to man xargs,

xargs reads items from the standard input … delimited by blanks … or newlines and executes the command … followed by items read from standard input. The command line for command is built up until it reaches a system-defined limit (unless the -n and -L options are used). … In general, there will be many fewer invocations of command than there were items in the input.
This will normally have significant performance benefits.

It means xargs can pass a batch of “items” to the command. Unfortunately, the -n 1 option in the command forces xargs to just take one “item” a time. To make it fast, use the -a option of basename, which let basename be able to handle multiple arguments at once.

time cat filepaths.txt | xargs -n 1 basename > /dev/null 

real    0m2.409s
user    0m0.044s
sys     0m0.332s
time cat filepaths.txt | xargs basename -a > /dev/null 

real    0m0.004s
user    0m0.000s
sys     0m0.000s

Thousands times faster.

–show-limits
cat /dev/null | xargs --show-limits --no-run-if-empty

Your environment variables take up 2027 bytes
POSIX upper limit on argument length (this system): 2093077
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2091050
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647

It shows xargs can feed a lot bytes into the command once (2091050 bytes here).

-P

Some commands can usefully be executed in parallel too; see the -P option.

CLI

Daily Dev Log

2019 Jan 24

CLI

“su - app” vs “su app”

From man su,

   -, -l, --login
      Provide an environment similar to what the user would expect had the user logged in directly.

So with su - app, after switch to the user app, you end up in the user’s HOME directory, and have the user’s ~/.bash_profile (not ~/.bashrc) executed.

Tools like RVM need a “login shell”.

RVM by default adds itself currently to ~/.bash_profile file

So if use su app, RVM will not be ready there for you.

CLI

Daily Dev Log

2019 Jan 15

CLI

Convert Line Endings from DOS/Windows Style to Unix/Linux Style

tr -d '\015' <DOS-file >UNIX-file

For what character \015 is, see man 7 ascii. Or, there is a command called ascii, if it’s installed, type ascii '\015'.

More ascii Commands

$ ascii '\r'
ASCII 0/13 is decimal 013, hex 0d, octal 015, bits 00001101: called ^M, CR
Official name: Carriage Return
C escape: '\r'
Other names: 

Search Manuals

-k Search the short descriptions and manual page names for the keyword

$ man -k ascii
ascii (1)            - report character aliases
ascii (7)            - ASCII character set encoded in octal, decimal, and hexadecimal
...

Pitfalls

Do Not Use the Same File to Redirect Input and Output

tr -d '\015' <DOS-file >DOS-file

The above command will delete all content in the file!

From man bash,

[n]>word, if it does exist it is truncated to zero size.

(How did I find the file back? Luckily, the working directory is managed by Dropbox, and I found it back in the Dropbox.)

CLI

grep Command Examples

2019 Jan 1

First, grep –help lists most of its options, which is the go-to command for most grep questions.

Like most CLI tools, options of grep can be combined. For example, -io is same as -i -o, -A3 is same as -A 3. Also, the options can be anywhere in the command.

$ grep hello a.txt -i --color

Stop after first match

$ grep -m 1 search-word file

-m, –max-count=NUM stop after NUM matches

Only print the 1000th match.

$ grep -m1000 search-word file | tail -n1
$ grep -l search-word *.txt

-l, –files-with-matches print only names of FILEs containing matches

It’s useful when you grep lots of files and only care about names of matched files.

Find unmatched files

-L, –files-without-match print only names of FILEs containing no match

-L is the opposite of -l option. It outputs the files which don’t contain the word to search.

$ grep -L search-word *.txt

Show line number of matched lines

$ grep -n search-word file

-n, –line-number print line number with output lines

Don’t output filename when grep multiple files

When grep multiple files, by default filename is included in the output. Like,

$ grep hello *.txt
a.txt:hello
b.txt:hello

Use -h to not output filenames.

$ grep -h hello *.txt
hello
hello

-h, –no-filename suppress the file name prefix on output

Search in “binary” files

Sometimes, a text file may contains a few non-printable characters, which makes grep consider it as a “binary” file. grep doesn’t print matched lines for a “binary” file.

$ printf "hello\000" > test.txt
$ grep hello test.txt 
Binary file test.txt matches

Use -a to let grep know the file should be seen as a “text” file.

$ grep -a hello test.txt 
hello

-a, –text equivalent to –binary-files=text

Search in directories

-r, –recursive like –directories=recurse

-R, –dereference-recursive likewise, but follow all symlinks

Without specifying a directory, grep searches in current working directory by default.

$ grep -R hello
b.md:hello
a.txt:hello

Specify directories.

$ grep -R hello tmp/ tmp2/
tmp/b.md:hello
tmp/a.txt:hello
tmp2/b.md:hello
tmp2/a.txt:hello

–include=FILE_PATTERN search only files that match FILE_PATTERN

Use --include to tell grep the pattern of the filenames you’re interested in.

$ grep -R hello --include="*.md"
b.md:hello

-i, –ignore-case ignore case distinctions

$ grep -i Hello a.txt 
hello
HELLO

The pattern to search begins with - (hyphen)

$ grep -- -hello a.txt
-hello

To know what -L option does.

$ grep --help | grep -- -L
  -L, --files-without-match  print only names of FILEs containing no match
CLI

Daily Dev Log

2018 Dec 12

CLI

comm, find lines in one file but not in another file

# fine lines only in file-a
comm -23 file-a file-b

From comm --help,

-2 suppress column 2 (lines unique to FILE2)

-3 suppress column 3 (lines that appear in both files)

# fine lines both in file-a and file-b
comm -12 file-a file-b

Google keywords: “linux command two file not contain” hit link

CLI

在Windows上安装tmux

2018 Apr 11

Windows上的Git BASH提供了大部分常用的Linux命令行工具,比如grep、sed等,但是并没有提供tmux。 实际上Git for Windows提供了包管理(package management)功能,

Git for Windows is based on MSYS2 which bundles Arch Linux’ Pacman tool for dependency management.

借助pacman,Git for Windows可以安装额外的命令行工具,比如tmux。 但是,在Git BASH里,pacman并没有默认开启

This is intended. We do not ship pacman with Git for Windows. If you are interested in a fully fledged package manager maintained environment you have to give the Git for Windows SDK a try.

需要安装Git for Windows SDK来开启pacman。 安装好之后,打开Git SDK(和Git Bash一样,是一个终端模拟器),

$ pacman -Ss tmux

会找到两个包,

msys/tmux 2.6-1
	A terminal multiplexer
msys/tmux-git 2.5.94.g73b9328c-1 
	A terminal multiplexer
$ pacman -S msys/tmux-git

安装的时候可能会报下面的错误,

$ pacman -S msys/tmux
warning: database file for 'git-for-windows-mingw32' does not exist
error: failed to prepare transaction (could not find database)

打开/etc/pacman.conf文件,注释掉下面的行即可,

#[git-for-windows]
#Server = https://wingit.blob.core.windows.net/x86-64

#[git-for-windows-mingw32]
#Server = https://wingit.blob.core.windows.net/i686

安装好之后,就可以在Windows上(Git SDK)使用tmux了。

tmux on Windows

pacman的用法可参见Git for Windows的Wiki

环境:Windows 10

(如果发现某些程序,比如ssh,报错,可以尝试用pacman -Syu升级所有package。)

CLI