The sort
command can be used to sort .csv files by specific columns.
Have an example .csv file like below.
$ cat orders.csv
user,date,product,amount,unit price
user-2,2020-05-11,product-2,2,500
user-3,2020-04-11,product-1,2,600
user-1,2020-06-11,product-3,2,100
user-1,2020-06-21,product-1,6,600
user-1,2020-04-12,product-3,2,100
To sort orders by highest unit price, run the command below.
$ sort -r --field-separator=',' --key=5 -n orders.csv
user-3,2020-04-11,product-1,2,600
user-1,2020-06-21,product-1,6,600
user-2,2020-05-11,product-2,2,500
user-1,2020-06-11,product-3,2,100
user-1,2020-04-12,product-3,2,100
user,date,product,amount,unit price
The --field-separator
option (or -t
) specifies ,
as the field separator character. By default, sort
considers
blank space as the field separator character.
The --key=5
let sort
use the fifth field of lines to sort the lines.
The -n
is to sort numerically, and -r
is to sort in reverse order.
To fix the headers of the .csv file at the very first row after sorting, process substitution can be used.
$ cat <(head -1 orders.csv) \
<(tail -n +2 orders.csv|sort -r --field-separator=',' --key=5 -n)
user,date,product,amount,unit price
user-3,2020-04-11,product-1,2,600
user-1,2020-06-21,product-1,6,600
user-2,2020-05-11,product-2,2,500
user-1,2020-06-11,product-3,2,100
user-1,2020-04-12,product-3,2,100
To sort orders by highest unit price and amount, provide multiple --key
options as below.
$ cat <(head -1 orders.csv) \
<(tail -n +2 orders.csv|sort -r -t ',' -k 5 -k 4 -n)
user,date,product,amount,unit price
user-1,2020-06-21,product-1,6,600
user-3,2020-04-11,product-1,2,600
user-2,2020-05-11,product-2,2,500
user-1,2020-06-11,product-3,2,100
user-1,2020-04-12,product-3,2,100
The format of value of --field-separator
could be a bit more complex.
For example, to sort orders by the day of order date, run the command below.
$ sort -t , -n -k 2.9 orders.csv
user,date,product,amount,unit price
user-1,2020-06-11,product-3,2,100
user-2,2020-05-11,product-2,2,500
user-3,2020-04-11,product-1,2,600
user-1,2020-04-12,product-3,2,100
user-1,2020-06-21,product-1,6,600
The -k 2.9
means for each line sort
uses strings which starts from the ninth position of the second field till the end of the line.
The -k 2.9,5
means for each line sort
only looks at strings which starts from the ninth position of the second field and ends at the last character
of the fifth field.
The -k 2.9,5.2
means sort
only looks at strings which starts from the ninth position of the second field and ends at the second character
of the fifth field.
For more details, check the man sort
.