The jq Command Examples
- The keys builtin function
- Array/Object Value Iterator: .[]
- exp as $x | ... and String Interpolation
- More Complex Expression in String Interpolation
- Array construction: []
- Object Construction: {}
- Object Construction: {} and Array construction: []
- The sort function
- The sort_by function
- Select/Filter
- Multiple Conditions in select
Some jq examples. All quotes are from the jq manual.
A sample JSON file is below.
$ cat sample.json
{
"apple-weight": [
60
],
"orange-weight": [
50
],
"banana-weight": [
20,
35
]
}
The keys builtin function
$ jq '. | keys' sample.json
[
"apple-weight",
"banana-weight",
"orange-weight"
]
The builtin function keys, when given an object, returns its keys in an array.
Array/Object Value Iterator: .[]
$ jq '. | keys[]' sample.json
"apple-weight"
"banana-weight"
"orange-weight"
If you use the .[index] syntax, but omit the index entirely, it will return all of the elements of an array.
Running .[] with the input [1,2,3] will produce the numbers as three separate results, rather than as a single array.
You can also use this on an object, and it will return all the values of the object.
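For example, running .[] on the sample object produces its three values, one result per key (using -c for compact output):
$ jq -c '.[]' sample.json
[60]
[50]
[20,35]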
exp as $x | ... and String Interpolation
$ jq '. | keys[] as $k | "\($k), \(.[$k])"' sample.json
"apple-weight, [60]"
"banana-weight, [20,35]"
"orange-weight, [50]"
The expression exp as $x | ... means: for each value of expression exp, run the rest of the pipeline with the entire original input, and with $x set to that value. Thus as functions as something of a foreach loop.
The '. | keys[] as $k | "\($k), \(.[$k])"' means: for each value of . | keys[], which are “apple-weight”, “banana-weight” and “orange-weight”, run the rest of the pipeline, i.e. "\($k), \(.[$k])", which is string interpolation.
String interpolation - \(foo)
More Complex Expression in String interpolation
$ jq '. | keys[] as $k | "\($k), \(.[$k][0])" ' sample.json
"apple-weight, 60"
"banana-weight, 20"
"orange-weight, 50"
\(.[$k][0]) is replaced with the value of .[$k][0], e.g. .["apple-weight"][0] for the first key.
Array construction: []
$ jq -c '. | keys[] as $k | [$k, .[$k][0]] ' sample.json
["apple-weight",60]
["banana-weight",20]
["orange-weight",50]
$ jq '[ . | keys[] as $k | [$k, .[$k][0]] ] ' sample.json
[
[
"apple-weight",
60
],
[
"banana-weight",
20
],
[
"orange-weight",
50
]
]
If you have a filter X that produces four results, then the expression [X] will produce a single result, an array of four elements.
The . | keys[] as $k | [$k, .[$k][0]] produces three results; enclosing it with [] produces an array of these three elements.
Object Construction: {}
$ jq ' . | keys[] as $k | {category: $k, weight: .[$k][0]} ' sample.json
{
"category": "apple-weight",
"weight": 60
}
{
"category": "banana-weight",
"weight": 20
}
{
"category": "orange-weight",
"weight": 50
}
Object Construction: {} and Array construction: []
$ jq '[ . | keys[] as $k | {category: $k, weight: .[$k][0]} ] ' sample.json
[
{
"category": "apple-weight",
"weight": 60
},
{
"category": "banana-weight",
"weight": 20
},
{
"category": "orange-weight",
"weight": 50
}
]
The sort function
$ jq '[ . | keys[] as $k | [$k, .[$k][0]] ] | sort ' sample.json
[
[
"apple-weight",
60
],
[
"banana-weight",
20
],
[
"orange-weight",
50
]
]
The sort function sorts its input, which must be an array.
Values are sorted in the following order: null, false, true, …
The [ . | keys[] as $k | [$k, .[$k][0]] ] is an array of three elements, each of which is itself an array. These three elements, according to the manual, are sorted “in lexical order”.
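As a quick illustration of that ordering, with an ad-hoc input instead of the sample file:
$ echo '[true, 1, "a", null]' | jq -c 'sort'
[null,true,1,"a"]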
The sort_by function
$ jq '[ . | keys[] as $k | {category: $k, weight: .[$k][0]} ] | sort_by(.weight) ' sample.json
[
{
"category": "banana-weight",
"weight": 20
},
{
"category": "orange-weight",
"weight": 50
},
{
"category": "apple-weight",
"weight": 60
}
]
sort_by(foo) compares two elements by comparing the result of foo on each element.
The [ . | keys[] as $k | {category: $k, weight: .[$k][0]} ] is an array of three objects. The | sort_by(.weight) sorts these three objects by comparing their weight property. The final result is still an array, but sorted.
Select/Filter
$ jq '[ . | keys[] as $k | {category: $k, weight: .[$k][0]} ] | sort_by(.weight) | .[] | select(.weight >= 50) ' sample.json
{
"category": "orange-weight",
"weight": 50
}
{
"category": "apple-weight",
"weight": 60
}
The function select(foo) produces its input unchanged if foo returns true for that input, and produces no output otherwise.
The [ . | keys[] as $k | {category: $k, weight: .[$k][0]} ] | sort_by(.weight) produces a sorted array. The following .[], i.e. the array iterator, feeds select(.weight >= 50) with the three elements of that array. The final result is the elements whose weight is equal to or larger than 50.
The command below, using map, produces the same result.
$ jq '[ . | keys[] as $k | {category: $k, weight: .[$k][0]} ] | sort_by(.weight) | map(select(.weight >= 50)) ' sample.json
[
{
"category": "orange-weight",
"weight": 50
},
{
"category": "apple-weight",
"weight": 60
}
]
Multiple Conditions in select
$ jq '[ . | keys[] as $k | {category: $k, weight: .[$k][0]} ] | sort_by(.weight) | .[] | select( (.weight >= 50) and (.weight < 60)) ' sample.json
{
"category": "orange-weight",
"weight": 50
}
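Conditions can also be combined with or and not. For example, to select the complement of the range above (a sketch following the same pattern):
$ jq '[ . | keys[] as $k | {category: $k, weight: .[$k][0]} ] | sort_by(.weight) | .[] | select( (.weight < 50) or (.weight >= 60)) ' sample.json
{
  "category": "banana-weight",
  "weight": 20
}
{
  "category": "apple-weight",
  "weight": 60
}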
Sort .csv Files by Columns in Command Line
The sort command can be used to sort .csv files by specific columns.
Here is an example .csv file.
$ cat orders.csv
user,date,product,amount,unit price
user-2,2020-05-11,product-2,2,500
user-3,2020-04-11,product-1,2,600
user-1,2020-06-11,product-3,2,100
user-1,2020-06-21,product-1,6,600
user-1,2020-04-12,product-3,2,100
To sort orders by highest unit price, run the command below.
$ sort -r --field-separator=',' --key=5 -n orders.csv
user-3,2020-04-11,product-1,2,600
user-1,2020-06-21,product-1,6,600
user-2,2020-05-11,product-2,2,500
user-1,2020-06-11,product-3,2,100
user-1,2020-04-12,product-3,2,100
user,date,product,amount,unit price
The --field-separator option (or -t) specifies , as the field separator character. By default, sort uses blank space as the field separator.
The --key=5 option lets sort use the fifth field of each line to sort the lines.
The -n option sorts numerically, and -r sorts in reverse order.
To keep the header of the .csv file at the very first row after sorting, process substitution can be used.
$ cat <(head -1 orders.csv) \
<(tail -n +2 orders.csv|sort -r --field-separator=',' --key=5 -n)
user,date,product,amount,unit price
user-3,2020-04-11,product-1,2,600
user-1,2020-06-21,product-1,6,600
user-2,2020-05-11,product-2,2,500
user-1,2020-06-11,product-3,2,100
user-1,2020-04-12,product-3,2,100
To sort orders by highest unit price and then by amount, provide multiple --key options as below.
$ cat <(head -1 orders.csv) \
<(tail -n +2 orders.csv|sort -r -t ',' -k 5 -k 4 -n)
user,date,product,amount,unit price
user-1,2020-06-21,product-1,6,600
user-3,2020-04-11,product-1,2,600
user-2,2020-05-11,product-2,2,500
user-1,2020-06-11,product-3,2,100
user-1,2020-04-12,product-3,2,100
The format of the value of --key can be a bit more complex. For example, to sort orders by the day of the order date, run the command below.
$ sort -t , -n -k 2.9 orders.csv
user,date,product,amount,unit price
user-1,2020-06-11,product-3,2,100
user-2,2020-05-11,product-2,2,500
user-3,2020-04-11,product-1,2,600
user-1,2020-04-12,product-3,2,100
user-1,2020-06-21,product-1,6,600
The -k 2.9 means that for each line, sort uses the string starting from the ninth character of the second field to the end of the line.
The -k 2.9,5 means sort only looks at the string starting from the ninth character of the second field and ending at the last character of the fifth field.
The -k 2.9,5.2 means sort only looks at the string starting from the ninth character of the second field and ending at the second character of the fifth field.
For more details, check man sort.
Process Substitution
From man bash, for “process substitution” it says:
It takes the form of <(list) or >(list)
Note: there is no space between < or > and (.
Also, it says,
The process list is run with its input or output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to the current command as the result of the expansion.
The <(list) Form
$ diff <(echo a) <(echo b)
1c1
< a
---
> b
Usually diff takes two files and compares them.
The process substitution here, <(echo a), creates a file in /dev/fd, for example /dev/fd/63. The stdout of the echo a command is connected to /dev/fd/63. Meanwhile, /dev/fd/63 is used as an input file/parameter of the diff command. Similar for <(echo b).
After Bash does the substitution, the command is like diff /dev/fd/63 /dev/fd/64. From diff’s point of view, it just compares two normal files.
In this example, one advantage of process substitution is eliminating the need for temporary files, like
$ echo a > tmp.a && echo b > tmp.b \
&& diff tmp.a tmp.b \
&& rm tmp.{a,b}
The >(list) Form
$ echo . | tee >(ls)
Similarly, Bash creates a file in /dev/fd when it sees >(ls). Again, let’s say the file is /dev/fd/63. Bash connects /dev/fd/63 to the stdin of the ls command, and the file /dev/fd/63 is also used as a parameter of the tee command. tee views /dev/fd/63 as a normal file: it writes the content, here ., into the file, and the content “pipes” into the stdin of ls.
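A more practical use of the >(list) form (a sketch; the filename is made up): show a command’s output on the terminal while saving a compressed copy at the same time.
$ ls -l | tee >(gzip > listing.gz)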
Compare with Pipe
A pipe, as in cmd-a | cmd-b, basically just passes the stdout of the command on the left to the stdin of the command on the right. Its data flow is restricted: from left to right. Process substitution has more freedom.
# use process substitution
$ grep -f <(echo hello) file-a
hello
# use pipe
$ echo hello | xargs -I{} grep {} file-a
hello
And for commands like diff <(echo a) <(echo b), it’s not easy to achieve the same with a pipe.
More Examples
$ paste <(echo hello) <(echo world)
hello world
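Another handy one (assuming two unsorted text files file-a and file-b): compare their contents ignoring line order, by sorting both on the fly.
$ diff <(sort file-a) <(sort file-b)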
How it works
From man bash,
Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files.
Read more about named pipe and /dev/fd.
For /dev/fd,
The main use of the /dev/fd files is from the shell. It allows programs that use pathname arguments to handle standard input and standard output in the same manner as other pathnames.
On my OS (Ubuntu), Bash uses /dev/fd for process substitution.
$ ls -l <(echo .)
lr-x------ 1 user user 64 12月 19 11:19 /dev/fd/63 -> pipe:[4926830]
Bash replaces <(echo .) with /dev/fd/63. The above command is like ls -l /dev/fd/63.
Or find the backing file via,
$ echo <(true)
/dev/fd/63
(After my Bash does the substitution, the command becomes echo /dev/fd/63, which outputs /dev/fd/63.)
Daily Dev Log: "--help" vs. "man"
We just can’t remember all the options of CLI tools. In most cases, --help, like grep --help, is the go-to way to look for help. For example, if you forget what -H of grep does:
$ grep --help | grep -- -H
-H, --with-filename print the file name for each match
man is too formal, wordy and overwhelming compared with --help. Usually, we can find most of what we want in the output of --help, without turning to man.
In some environments, man pages may not be available at all. One such example is Git Bash for Windows: its commands don’t have man pages. Relying on man only means you have to google the man(ual) pages in a browser.
To save some typing, a Bash function can be added to ~/.bashrc.
h() { $1 --help; }
Then type h grep to show the help.
Note that different commands or different variants of a command may print help text in different verbosity.
For example, the grep shipped with OS X prints help text like below.
$ grep --help
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
It’s much shorter than the Linux grep’s help text. (Most CLI tools support at least this level of help text.)
xargs is Slow
# filepaths.txt is a file with thousands of lines
cat filepaths.txt | xargs -n 1 basename
It takes a while (seconds) to finish running the above command. A file with thousands of lines is usually not considered a big volume. Why is xargs slow here?
After reading a SO post, it turns out that xargs in the above command runs basename thousands of times, hence the bad performance.
Can it be faster?
According to man xargs,
xargs reads items from the standard input … delimited by blanks … or newlines and executes the command … followed by items read from standard input. The command line for command is built up until it reaches a system-defined limit (unless the -n and -L options are used). … In general, there will be many fewer invocations of command than there were items in the input.
This will normally have significant performance benefits.
It means xargs can pass a batch of “items” to the command. Unfortunately, the -n 1 option in the command forces xargs to take just one “item” at a time.
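The batching is easy to see in miniature:
$ seq 3 | xargs echo
1 2 3
$ seq 3 | xargs -n 1 echo
1
2
3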
To make it fast, use the -a option of basename, which lets basename handle multiple arguments at once.
time cat filepaths.txt | xargs -n 1 basename > /dev/null
real 0m2.409s
user 0m0.044s
sys 0m0.332s
time cat filepaths.txt | xargs basename -a > /dev/null
real 0m0.004s
user 0m0.000s
sys 0m0.000s
Thousands of times faster.
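If basename -a isn’t available, a single sed process gives a similar speedup. A sketch: the pattern simply strips everything up to the last / on each line, which is what basename does for these paths.
$ time sed 's|.*/||' filepaths.txt > /dev/null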
--show-limits
cat /dev/null | xargs --show-limits --no-run-if-empty
Your environment variables take up 2027 bytes
POSIX upper limit on argument length (this system): 2093077
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2091050
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647
It shows xargs can feed a lot of bytes into the command at once (2091050 bytes here).
-P
Some commands can usefully be executed in parallel too; see the -P option.
Daily Dev Log: "su - app" vs. "su app"
From man su,
-, -l, --login Provide an environment similar to what the user would expect had the user logged in directly.
So with su - app, after switching to the user app, you end up in that user’s HOME directory, and have the user’s ~/.bash_profile (not ~/.bashrc) executed.
Tools like RVM need a “login shell”:
RVM by default adds itself currently to ~/.bash_profile file
So if you use su app, RVM will not be ready for you after the su.
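A quick way to see the difference (assuming a user named app exists; run from root’s shell, so the exact paths are illustrative):
$ su app -c pwd
/root
$ su - app -c pwd
/home/app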
Daily Dev Log: Avoid the Pitfall of Using the Same File to Redirect Input and Output
Pitfalls
Do Not Use the Same File to Redirect Input and Output
tr -d '\015' <DOS-file >DOS-file
The above command will delete all content in the file!
From man bash, about output redirection [n]>word:
If the file does not exist it is created; if it does exist it is truncated to zero size.
(How did I find the file back? Luckily, the working directory is managed by Dropbox, and I found it back in the Dropbox.)
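A safe alternative is writing to a temporary file and moving it over, or using sponge from moreutils (if installed), which soaks up all of its input before opening the output file:
$ tr -d '\015' <DOS-file >tmp-file && mv tmp-file DOS-file
$ tr -d '\015' <DOS-file | sponge DOS-file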
Convert Line Endings from DOS/Windows Style to Unix/Linux Style
tr -d '\015' <DOS-file >UNIX-file
(For what character \015 is, see man 7 ascii, or ascii '\015' if the ascii command is installed.)
More ascii Command Examples
$ ascii '\r'
ASCII 0/13 is decimal 013, hex 0d, octal 015, bits 00001101: called ^M, CR
Official name: Carriage Return
C escape: '\r'
Other names:
Search Manuals
-k Search the short descriptions and manual page names for the keyword
$ man -k ascii
ascii (1) - report character aliases
ascii (7) - ASCII character set encoded in octal, decimal, and hexadecimal
...
Missing Newline Characters When "cat"-ing Text Files
cat is often used to concatenate text files into one single file. In most cases, cat works fine, like below.
$ echo line 1 > file1.txt
$ echo line 2 > file2.txt
$ cat file{1,2}.txt
line 1
line 2
However, if some of the files to be concatenated don’t end with a newline character, using cat to concatenate them may not generate the expected file.
# -n, let echo not add the trailing newline character
$ echo -n line 1 > file1.txt
$ echo line 2 > file2.txt
$ cat file{1,2}.txt
line 1line 2
Note that in the above example, file1.txt doesn’t end with a newline, so when the two files are concatenated there is no newline between them.
This may not be the expected result. For example, say we have multiple large text files, where every line in each file is a user ID, and we want to concatenate these files into one file to be fed into a processing program at once. If some of the files don’t end with a newline, using cat may generate ill-formed user IDs like user-id-foouser-id-bar. If the input volume is huge, these problematic IDs usually would not be detected by human eyes.
If the newlines between files are important in your case, using awk is safer: awk treats each line as a record and always terminates its output records with a newline.
# -n, let echo not add the trailing newline character
$ echo -n line 1 > file1.txt
$ echo line 2 > file2.txt
$ awk 1 file{1,2}.txt
line 1
line 2
See this SO answer.
Also, it’s a good idea to tune text editors to always show non-printable characters like the newline. Or, use cat -e, which prints invisible characters and a $ for the newline.
$ cat -e file1.txt | tail -1
line 1
No $ appears at the end, which shows file1.txt has no trailing newline.
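For contrast, file2.txt does end with a newline, so cat -e marks it with a $:
$ cat -e file2.txt | tail -1
line 2$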
grep Command Examples
- Stop after first match
- Print only filename if match
- Find unmatched files
- Show line number of matched lines
- Don’t output filename when grep multiple files
- Search in “binary” files
- Search in directories
- Ignore case when searching
- The pattern to search begins with - (hyphen)
- Use pattern file
- Print only count of matching lines
First, grep --help lists most of its options; it’s the go-to command for most grep questions.
Like most CLI tools, grep options can be combined. For example, -io is the same as -i -o, and -A3 is the same as -A 3. Also, the options can appear anywhere in the command.
$ grep hello a.txt -i --color
Stop after first match
$ grep -m 1 search-word file
-m, --max-count=NUM stop after NUM matches
To print only the 1000th match:
$ grep -m1000 search-word file | tail -n1
Print only filename if match
$ grep -l search-word *.txt
-l, --files-with-matches print only names of FILEs containing matches
It’s useful when you grep lots of files and only care about names of matched files.
Find unmatched files
-L, --files-without-match print only names of FILEs containing no match
-L is the opposite of the -l option. It outputs the files which don’t contain the word to search.
$ grep -L search-word *.txt
Show line number of matched lines
$ grep -n search-word file
-n, --line-number print line number with output lines
Don’t output filename when grep multiple files
When grepping multiple files, the filename is included in the output by default. Like,
$ grep hello *.txt
a.txt:hello
b.txt:hello
Use -h to not output filenames.
$ grep -h hello *.txt
hello
hello
-h, --no-filename suppress the file name prefix on output
Search in “binary” files
Sometimes, a text file may contain a few non-printable characters, which makes grep consider it a “binary” file. grep doesn’t print matched lines for a “binary” file.
$ printf "hello\000" > test.txt
$ grep hello test.txt
Binary file test.txt matches
Use -a to let grep know the file should be seen as a “text” file.
$ grep -a hello test.txt
hello
-a, --text equivalent to --binary-files=text
Search in directories
-r, --recursive like --directories=recurse
-R, --dereference-recursive likewise, but follow all symlinks
Without specifying a directory, grep searches in the current working directory by default.
$ grep -R hello
b.md:hello
a.txt:hello
Specify directories.
$ grep -R hello tmp/ tmp2/
tmp/b.md:hello
tmp/a.txt:hello
tmp2/b.md:hello
tmp2/a.txt:hello
--include=FILE_PATTERN search only files that match FILE_PATTERN
Use --include to tell grep the pattern of the filenames you’re interested in.
$ grep -R hello --include="*.md"
b.md:hello
Ignore case when searching
-i, --ignore-case ignore case distinctions
$ grep -i Hello a.txt
hello
HELLO
The pattern to search begins with - (hyphen)
$ grep -- -hello a.txt
-hello
To know what the -L option does:
$ grep --help | grep -- -L
-L, --files-without-match print only names of FILEs containing no match
Use pattern file
-f FILE, --file=FILE Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the -e (--regexp) option, search for all patterns given. The empty file contains zero patterns, and therefore matches nothing.
$ cat test.txt
111
222
333
$ cat patterns.txt
111
333
$ grep -f patterns.txt test.txt
111
333
NOTE: Do not put an empty line, i.e. a line with \n only, in the pattern file. Otherwise, the pattern file would match every line, since the empty pattern matches every line. It’s easy to make the mistake of leaving empty lines at the end of the pattern file.
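One way to guard against this, reusing process substitution from earlier, is to filter empty lines out of the pattern file on the fly:
$ grep -f <(grep -v '^$' patterns.txt) test.txt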
Print only count of matching lines
Use -c
, or --count
to print only count of matching lines.
For example, the command below finds the count of the <OrderLine> tag in files of the current directory.
$ grep "<OrderLine>" -c -R .
It outputs something like below.
./order-1.xml:3
./order-2.xml:9
./order-3.xml:1
To sort the output by the count, use a command like below (for multi-digit counts, add -n to sort numerically).
$ grep "<OrderLine>" -c -R . | sort -t : -k 2
./order-3.xml:1
./order-1.xml:3
./order-2.xml:9
Daily Dev Log: Find Lines in One File but Not in Another
We can use comm to find lines that are in one file but not in another. Note that comm expects both input files to be sorted.
# find lines only in file-a
comm -23 file-a file-b
From comm --help,
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files)
So to find lines that exist in both file-a and file-b:
comm -12 file-a file-b
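If the files aren’t sorted yet, process substitution (covered earlier) helps:
# find lines only in file-a, sorting both files on the fly
$ comm -23 <(sort file-a) <(sort file-b)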
Google keywords: “linux command two file not contain”