Daily Dev Log: "--help" vs. "man"

2019 Jun 24

We just can’t remember options of CLI tools. In most cases, --help, like grep --help, is the go-to way to look for help.

For example, if you forget what the -H option of grep does:

$ grep --help | grep -- -H
  -H, --with-filename       print the file name for each match

man is too formal, wordy and overwhelming compared with --help. Usually, we can find most of what we want in the output of --help without turning to man.

In some environments, man pages may not be available. One such example is Git Bash for Windows: its commands don’t have man pages. Relying only on man there means you have to google the man page in a browser.

To save some typing, a bash function can also be added to ~/.bashrc.

h() { "$1" --help; }

Then type h grep to show the help.
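A small variation, as a sketch: look up a single option instead of reading the whole help text. The name hopt is made up for this example, and it assumes the command prints its help text when given --help (to stdout or stderr).

```shell
# hopt CMD OPT: print only the --help lines of CMD that mention OPT.
# "hopt" is a hypothetical helper name, not a standard command.
hopt() {
  # 2>&1: some tools print their help text to stderr.
  # -F: treat OPT as a fixed string; -e: allow OPT to start with a hyphen.
  "$1" --help 2>&1 | grep -F -e "$2"
}
```

For example, hopt grep -H does the same lookup as the grep --help | grep -- -H command above.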

Note that different commands, or different variants of a command, may print help text with different verbosity. For example, the built-in grep on OS X prints help text like below.

$ grep --help
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
	[-e pattern] [-f file] [--binary-files=value] [--color=when]
	[--context[=num]] [--directories=action] [--label] [--line-buffered]
	[--null] [pattern] [file ...]

That’s much shorter than the Linux grep’s help text. (Most CLI tools provide at least this level of help text.)

xargs is Slow

2019 May 29

# filepaths.txt is a file with thousands of lines
cat filepaths.txt | xargs -n 1 basename

The above command takes a while (seconds) to finish. A file with thousands of lines is usually not considered a big volume. Why is xargs slow here?

After reading an SO post, it turns out that xargs in the above command runs basename thousands of times, hence the bad performance.

Can it be faster?

According to man xargs,

xargs reads items from the standard input … delimited by blanks … or newlines and executes the command … followed by items read from standard input. The command line for command is built up until it reaches a system-defined limit (unless the -n and -L options are used). … In general, there will be many fewer invocations of command than there were items in the input.
This will normally have significant performance benefits.

It means xargs can pass a batch of “items” to the command. Unfortunately, the -n 1 option in the command forces xargs to take just one “item” at a time. To make it fast, drop -n 1 and use the -a option of basename, which lets basename handle multiple arguments at once.

time cat filepaths.txt | xargs -n 1 basename > /dev/null 

real    0m2.409s
user    0m0.044s
sys     0m0.332s

time cat filepaths.txt | xargs basename -a > /dev/null

real    0m0.004s
user    0m0.000s
sys     0m0.000s

Hundreds of times faster (about 600x in this run).
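To see the batching at work, here is a quick sketch that counts invocations. It uses echo as a stand-in for basename and seq to generate 1000 items; both are my choices for the demo, not part of the original command.

```shell
# Each invocation of echo prints exactly one line, so the line count
# equals the number of times xargs ran the command.
seq 1 1000 | xargs -n 1 echo | wc -l   # one invocation per item
seq 1 1000 | xargs echo | wc -l        # items batched into as few invocations as possible
```

The first pipeline reports 1000 lines. The second reports 1, since 1000 short items easily fit into a single command line.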

--show-limits

cat /dev/null | xargs --show-limits --no-run-if-empty

Your environment variables take up 2027 bytes
POSIX upper limit on argument length (this system): 2093077
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2091050
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647

It shows that xargs can feed a lot of bytes into the command at once: up to 2091050 bytes on this system, although the command buffer it actually uses by default is 131072 bytes.

-P

Some commands can usefully be executed in parallel too; see the -P option.
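A minimal sketch of what -P buys, using sleep as a stand-in for a slow command. The run_sleeps name is made up for this example, and date +%s is assumed to be available for timing.

```shell
# run_sleeps N: run four 1-second sleeps with at most N running at a time,
# then print the elapsed wall-clock time in whole seconds.
# "run_sleeps" is a hypothetical helper name.
run_sleeps() {
  start=$(date +%s)
  printf '1\n1\n1\n1\n' | xargs -n 1 -P "$1" sleep
  end=$(date +%s)
  echo "$((end - start))"
}

run_sleeps 1   # one at a time: about 4 seconds
run_sleeps 4   # up to four in parallel: about 1 second
```

Note that -P only helps when the command is invoked several times, so it is typically combined with -n or -L.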

Fix an Escaped Hyperlink Bug

2019 Mar 26

There was a bug in the timeline page. Hyperlinks (<a>) were unintentionally escaped and showed up in the page like below.

if a feedback is submitted in <a href="/about">About</a> page, ...

The fix is rather simple as below.

-    <li><%= p %></li>
+    <li><%= p.html_safe %></li>

Basically, html_safe tells Rails not to HTML-escape a string by claiming the string is “safe” (it does nothing to the string but set a flag). html_safe should not be called on any user input, otherwise you’ll be at risk of XSS attacks. For this site, the fix is safe since all the content in the timeline page is entered solely by me, not by other, possibly malicious, users.

More about html_safe

html_safe returns an ActiveSupport::SafeBuffer, a String subclass designed to prevent XSS attacks. SafeBuffer behaves almost the same as String, except on concatenation: when an “unsafe” String B is appended to a “safe” SafeBuffer A, B is HTML-escaped before concatenation. By default, a plain String is not marked safe.

Rails uses SafeBuffer to prevent XSS attacks. For the ERB below,

  <li><%= p %></li>

Rails translates it into something like,

  '<li>'.html_safe + p + '</li>'.html_safe

Therefore, p, which may be from user input, is HTML escaped when concatenated, and rendered safely in the result page.

More than One Way in Ruby

Using raw has the same effect, but shows the intention much more clearly.

  <li><%= raw p %></li>

And one more, <%== is equivalent to raw, in case you really want to save some keystrokes.

will_paginate with Bootstrap

2019 Jan 29

will_paginate doesn’t come with Bootstrap style pagination by default. However, as its doc says, it does support customization by providing your own LinkRenderer.

to customize HTML output of will_paginate, you’ll need to subclass WillPaginate::ActionView::LinkRenderer

Below is a simple implementation that renders pagination links with Bootstrap 4 CSS components.

# app/helpers/application_helper.rb

module ApplicationHelper
  def will_paginate(collection_or_options = nil, options = {})
    if collection_or_options.is_a? Hash
      options, collection_or_options = collection_or_options, nil
    end
    unless options[:renderer]
      options = options.merge :renderer => BootstrapRenderer
    end
    super *[collection_or_options, options].compact
  end

  class BootstrapRenderer < WillPaginate::ActionView::LinkRenderer
    protected
    def html_container(html)
      tag :nav, tag(:ul, html, class: "pagination pagination-sm"), container_attributes
    end

    def page_number(page)
      tag :li, link(page, page, rel: rel_value(page), class: 'page-link'),
        class: (page == current_page ? 'page-item active' : 'page-item')
    end

    def previous_or_next_page(page, text, classname)
      tag :li, link(text, page || '#', class: 'page-link'),
        class: ['page-item', classname, ('disabled' unless page)].join(' ')
    end
  end
end

# app/views/posts/index.html.erb

<%# use 'text-center' to center inline-blocks (<ul>) within <nav> %>
<%= will_paginate @posts, params: @will_paginate_params, class: 'text-center' %>

Daily Dev Log: "su - app" vs. "su app"

2019 Jan 24

From man su,

   -, -l, --login
      Provide an environment similar to what the user would expect had the user logged in directly.

So with su - app, after switching to the user app, you end up in the user’s HOME directory, and the user’s ~/.bash_profile (not ~/.bashrc) is executed.

Tools like RVM need a “login shell”.

RVM by default adds itself currently to ~/.bash_profile file

So if you use su app, RVM will not be ready for you after su.
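A quick way to check which kind of shell you ended up in, assuming the user’s shell is bash (shopt and its login_shell option are bash-specific):

```shell
# Print whether the current bash session is a login shell.
# login_shell is a read-only option that bash sets itself at startup.
shopt -q login_shell && echo "login shell" || echo "non-login shell"
```

Run it once after su - app and once after su app to compare.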

Daily Dev Log: Avoid the Pitfall of Using the Same File to Redirect Input and Output

2019 Jan 15

Pitfalls

Do Not Use the Same File to Redirect Input and Output

tr -d '\015' <DOS-file >DOS-file

The above command will delete all content in the file! The shell truncates the output file before tr even starts reading it.

From man bash,

[n]>word, if it does exist it is truncated to zero size.

(How did I get the file back? Luckily, the working directory is managed by Dropbox, and I recovered it from there.)
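If the result really has to end up in the same file, one way around the pitfall is to go through a temporary file. A minimal sketch, where strip_cr is a made-up helper name and mktemp is assumed to be available:

```shell
# strip_cr FILE: delete carriage returns (\015) from FILE, keeping the result in FILE.
# "strip_cr" is a hypothetical helper name.
strip_cr() {
  tmp=$(mktemp) || return 1
  # Write to a temporary file first: FILE is only read here, never truncated.
  if tr -d '\015' <"$1" >"$tmp"; then
    # tr has finished reading, so it is now safe to overwrite FILE.
    cat "$tmp" >"$1"
  fi
  rm -f "$tmp"
}
```

Copying the content back with cat, rather than moving the temporary file over the original, keeps the original file’s permissions.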

CLI

Convert Line Endings from DOS/Windows Style to Unix/Linux Style

tr -d '\015' <DOS-file >UNIX-file

(For what character \015 is, see man 7 ascii or ascii '\015' if the ascii command is installed.)

More ascii Command Examples

$ ascii '\r'
ASCII 0/13 is decimal 013, hex 0d, octal 015, bits 00001101: called ^M, CR
Official name: Carriage Return
C escape: '\r'
Other names: 

Search Manuals

-k Search the short descriptions and manual page names for the keyword

$ man -k ascii
ascii (1)            - report character aliases
ascii (7)            - ASCII character set encoded in octal, decimal, and hexadecimal
...

Fix a Maven Dependency Conflict

2019 Jan 4

The Dependency Conflict Cannot be Resolved

A project I work on uses both “com.google.sitebricks:sitebricks”, a web framework that is no longer actively developed, and “org.drools:drools-compiler” as dependencies, both of which in turn depend on “org.mvel:mvel2”. Maven used to be able to find a version of mvel2 that satisfied both dependencies.

However, when the “drools-compiler” dependency was upgraded to a newer version (“7.13.0.Final”), something unfortunate happened. “drools-compiler:7.13.0.Final” uses an mvel2 version that is incompatible with the one sitebricks uses: “drools-compiler” uses some new APIs from the newer mvel2, but that version of mvel2 removes some APIs that “sitebricks” uses. In this case, Maven cannot resolve the dependency conflict, since there is NO version of mvel2 that satisfies both “drools-compiler:7.13.0.Final” and “sitebricks:*”.

A Fix

Use the Maven Shade Plugin to package sitebricks and its mvel2 dependency into a “shaded jar”, and also use the plugin to “relocate” the mvel2 classes inside the “shaded jar”. “Relocation” here means moving the mvel2 classes from the package “org.mvel2” to some other package like “org.shaded.mvel2”, and modifying the bytecode of the sitebricks classes that refer to mvel2 classes accordingly, so that these “mvel2 classes” do not conflict with those used by “drools-compiler”.

Create a Maven project/module for the “shaded jar”.

<artifactId>sitebricks-shaded</artifactId>
<packaging>jar</packaging>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <finalName>sitebricks-shaded-${project.version}</finalName>
            <relocations>
              <relocation>
                <pattern>org.mvel2</pattern>
                <shadedPattern>org.shaded.mvel2</shadedPattern>
              </relocation>
            </relocations>
            <artifactSet>
              <includes>
                <include>org.mvel:mvel2</include>
                <include>com.google.sitebricks:*</include>
              </includes>
            </artifactSet>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
<dependencies>
  <dependency>
    <groupId>com.google.sitebricks</groupId>
    <artifactId>sitebricks</artifactId>
  </dependency>
</dependencies>

In the project, use the “sitebricks-shaded” as a dependency instead.

<dependency>
  <groupId>com.foo.bar</groupId>
  <artifactId>sitebricks-shaded</artifactId>
  <version>${project.version}</version>
</dependency>
<dependency>
  <groupId>org.drools</groupId>
  <artifactId>drools-compiler</artifactId>
  <version>${drools.version}</version>
</dependency>

And one more step.

Miss Newline Characters When "cat" Text Files

2019 Jan 4

cat is often used to concatenate text files into a single file. In most cases, cat works fine, like below.

$ echo line 1 > file1.txt
$ echo line 2 > file2.txt
$ cat file{1,2}.txt
line 1
line 2

However, if some of the files to be concatenated don’t end with a newline character, concatenating them with cat may not produce the expected file.

# -n, let echo not add the trailing newline character
$ echo -n line 1 > file1.txt
$ echo line 2 > file2.txt
$ cat file{1,2}.txt
line 1line 2

Note that in the above example, file1.txt doesn’t end with a newline, so when the two files are concatenated there is no newline between them. This may not be the expected result. For example, say we have multiple large text files, where every line in each file is a user ID, and we want to concatenate them into one file to feed into a processing program at once. If some of the files don’t end with a newline, using cat may generate malformed user IDs like user-id-foouser-id-bar. If the input volume is huge, these problematic IDs are unlikely to be spotted by eye.

If the newlines between files are important in your case, using awk is safer. (awk 1 prints every line followed by a newline, including a last line that lacks one.)

# -n, let echo not add the trailing newline character
$ echo -n line 1 > file1.txt
$ echo line 2 > file2.txt
$ awk 1 file{1,2}.txt
line 1
line 2

See this SO answer.

Also, it’s a good idea to tune text editors to always show non-printable characters like the newline. Or, use cat -e, which prints invisible characters and a $ for the newline.

$ cat -e file1.txt | tail -1
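To check many files up front, the last byte of each file can be tested directly. A minimal sketch, where ends_with_newline is a made-up helper name:

```shell
# ends_with_newline FILE: succeed if FILE is empty or its last byte is a newline.
# "ends_with_newline" is a hypothetical helper name.
ends_with_newline() {
  # tail -c 1 prints the last byte; wc -l counts 1 only if that byte is a newline.
  # The command substitution is left unquoted so padding from wc is dropped.
  [ ! -s "$1" ] || [ $(tail -c 1 "$1" | wc -l) -eq 1 ]
}
```

For example, ends_with_newline file1.txt || echo "file1.txt has no trailing newline" reports the problematic file from the example above.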

grep Command Examples

2019 Jan 1

First, grep --help lists most of the options, so it is the go-to command for most grep questions.

Like most CLI tools, grep lets you combine options. For example, -io is the same as -i -o, and -A3 is the same as -A 3. Also, the options can be anywhere in the command.

$ grep hello a.txt -i --color

Stop after first match

$ grep -m 1 search-word file

-m, --max-count=NUM stop after NUM matches

Only print the 1000th match.

$ grep -m1000 search-word file | tail -n1

Print only names of matched files

$ grep -l search-word *.txt

-l, --files-with-matches print only names of FILEs containing matches

It’s useful when you grep lots of files and only care about names of matched files.

Find unmatched files

-L, --files-without-match print only names of FILEs containing no match

-L is the opposite of -l option. It outputs the files which don’t contain the word to search.

$ grep -L search-word *.txt

Show line number of matched lines

$ grep -n search-word file

-n, --line-number print line number with output lines

Don’t output filename when grep multiple files

When grepping multiple files, the filename is included in the output by default. Like,

$ grep hello *.txt
a.txt:hello
b.txt:hello

Use -h to not output filenames.

$ grep -h hello *.txt
hello
hello

-h, --no-filename suppress the file name prefix on output

Search in “binary” files

Sometimes, a text file may contain a few non-printable characters, which makes grep consider it a “binary” file. grep doesn’t print matched lines for a “binary” file.

$ printf "hello\000" > test.txt
$ grep hello test.txt 
Binary file test.txt matches

Use -a to let grep know the file should be seen as a “text” file.

$ grep -a hello test.txt 
hello

-a, --text equivalent to --binary-files=text

Search in directories

-r, --recursive like --directories=recurse

-R, --dereference-recursive likewise, but follow all symlinks

Without specifying a directory, grep searches in current working directory by default.

$ grep -R hello
b.md:hello
a.txt:hello

Specify directories.

$ grep -R hello tmp/ tmp2/
tmp/b.md:hello
tmp/a.txt:hello
tmp2/b.md:hello
tmp2/a.txt:hello

--include=FILE_PATTERN search only files that match FILE_PATTERN

Use --include to tell grep the pattern of the filenames you’re interested in.

$ grep -R hello --include="*.md"
b.md:hello

-i, --ignore-case ignore case distinctions

$ grep -i Hello a.txt 
hello
HELLO

The pattern to search begins with - (hyphen)

$ grep -- -hello a.txt
-hello

To find out what the -L option does:

$ grep --help | grep -- -L
  -L, --files-without-match  print only names of FILEs containing no match

Use pattern file

-f FILE, --file=FILE Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the -e (--regexp) option, search for all patterns given. The empty file contains zero patterns, and therefore matches nothing.

$ cat test.txt
111
222
333

$ cat patterns.txt
111
333

$ grep -f patterns.txt test.txt
111
333

NOTE: Do not put an empty line in the pattern file. An empty line is an empty pattern, which matches every line, so the pattern file would match everything. It’s easy to leave empty lines at the end of the pattern file by mistake.
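A small sketch of the problem and one way to guard against it. The file names are only for the demo, and patterns.clean.txt is a name I made up:

```shell
printf '111\n222\n333\n' > test.txt
printf '111\n\n333\n' > patterns.txt     # the second line is empty

# The empty pattern matches every line, so this prints all of test.txt.
grep -f patterns.txt test.txt

# Drop empty lines from the pattern file, then search again.
grep -v '^$' patterns.txt > patterns.clean.txt
grep -f patterns.clean.txt test.txt      # prints only 111 and 333
```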

Use -c or --count to print only the count of matching lines. For example, the command below counts the lines containing the <OrderLine> tag in each file under the current directory.

$ grep "<OrderLine>" -c -R . 

It outputs like below.

./order-1.xml:3
./order-2.xml:9
./order-3.xml:1

To sort the output by count, sort numerically on the second field.

$ grep "<OrderLine>" -c -R . | sort -t : -k 2,2n
./order-3.xml:1
./order-1.xml:3
./order-2.xml:9

Daily Dev Log: Find Lines in One File but Not in Another

2018 Dec 12

We can use comm to find lines in one file but not in another file.

# find lines only in file-a
comm -23 file-a file-b

From comm --help,

-2 suppress column 2 (lines unique to FILE2)

-3 suppress column 3 (lines that appear in both files)

So, to find lines that exist in both file-a and file-b:

comm -12 file-a file-b
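One thing to keep in mind: comm expects both inputs to be sorted. A small end-to-end sketch with unsorted demo files (the contents and the .sorted names are made up):

```shell
printf 'b\na\nc\n' > file-a
printf 'c\nd\nb\n' > file-b

# comm needs sorted input, so sort each file first.
sort file-a > file-a.sorted
sort file-b > file-b.sorted

comm -23 file-a.sorted file-b.sorted   # lines only in file-a: a
comm -13 file-a.sorted file-b.sorted   # lines only in file-b: d
comm -12 file-a.sorted file-b.sorted   # lines in both files: b and c
```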

Google keywords: “linux command two file not contain” hit link