grep Command Examples
- Stop after first match
- Print only filename if match
- Find unmatched files
- Show line number of matched lines
- Don’t output filename when grep multiple files
- Search in “binary” files
- Search in directories
- Ignore case when search
- The pattern to search begins with - (hyphen)
- Use pattern file
- Print only count of matching lines
First, grep --help lists most of grep's options, and it is the go-to command for most grep questions.
Like most CLI tools, grep lets you combine options.
For example, -io is the same as -i -o, and -A3 is the same as -A 3.
Also, with GNU grep, the options can appear anywhere in the command, even after the file name.
$ grep hello a.txt -i --color
Stop after first match
$ grep -m 1 search-word file
-m, --max-count=NUM stop after NUM matches
Print only the 1000th matching line. (If the file has fewer than 1000 matching lines, this prints the last one instead.)
$ grep -m1000 search-word file | tail -n1
Print only filename if match
$ grep -l search-word *.txt
-l, --files-with-matches print only names of FILEs containing matches
This is useful when you grep many files and only care about the names of the files that match.
Find unmatched files
-L, --files-without-match print only names of FILEs containing no match
-L is the opposite of the -l option.
It prints the names of the files that do not contain the search word.
$ grep -L search-word *.txt
Show line number of matched lines
$ grep -n search-word file
-n, --line-number print line number with output lines
Don’t output filename when grep multiple files
When grep searches multiple files, it prefixes each output line with the file name by default. For example,
$ grep hello *.txt
a.txt:hello
b.txt:hello
Use -h to suppress the file names.
$ grep -h hello *.txt
hello
hello
-h, --no-filename suppress the file name prefix on output
Search in “binary” files
Sometimes a text file contains a few non-printable characters (a NUL byte, for example), which makes grep treat it as a “binary” file. grep does not print the matched lines of a “binary” file.
$ printf "hello\000" > test.txt
$ grep hello test.txt
Binary file test.txt matches
Use -a to tell grep that the file should be treated as a “text” file.
$ grep -a hello test.txt
hello
-a, --text equivalent to --binary-files=text
Search in directories
-r, --recursive like --directories=recurse
-R, --dereference-recursive likewise, but follow all symlinks
If no directory is given, a recursive grep searches the current working directory by default.
$ grep -R hello
b.md:hello
a.txt:hello
To search specific directories, list them after the pattern.
$ grep -R hello tmp/ tmp2/
tmp/b.md:hello
tmp/a.txt:hello
tmp2/b.md:hello
tmp2/a.txt:hello
--include=FILE_PATTERN search only files that match FILE_PATTERN
Use --include to tell grep which file names you are interested in.
$ grep -R hello --include="*.md"
b.md:hello
Ignore case when search
-i, --ignore-case ignore case distinctions
$ grep -i Hello a.txt
hello
HELLO
The pattern to search begins with - (hyphen)
$ grep -- -hello a.txt
-hello
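The -- marks the end of the options, so whatever follows it is treated as the pattern even if it starts with a hyphen.
For example, to find out what the -L option does: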
$ grep --help | grep -- -L
-L, --files-without-match print only names of FILEs containing no match
Use pattern file
-f FILE, --file=FILE Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the -e (--regexp) option, search for all patterns given. The empty file contains zero patterns, and therefore matches nothing.
$ cat test.txt
111
222
333
$ cat patterns.txt
111
333
$ grep -f patterns.txt test.txt
111
333
NOTE: Do not put an empty line, i.e. a line that contains only \n, in the pattern file.
An empty line is an empty pattern, and the empty pattern matches every line, so the pattern file would match everything.
It is easy to leave empty lines at the end of the pattern file by mistake.
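The pitfall is easy to reproduce. The sketch below is only an illustration: it works in a throwaway directory, and the file names (test.txt, patterns.txt, patterns.clean.txt) are made up for the demo. It first greps with a pattern file that ends in an accidental empty line, then strips the empty lines with grep -v '^$' and greps again.

```shell
# Work in a temporary directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
printf '111\n222\n333\n' > "$workdir/test.txt"

# The pattern file ends with an accidental empty line.
printf '111\n333\n\n' > "$workdir/patterns.txt"

# The empty pattern matches every line, so all three lines come back.
with_empty=$(grep -f "$workdir/patterns.txt" "$workdir/test.txt")

# Remove empty lines from the pattern file, then search again.
grep -v '^$' "$workdir/patterns.txt" > "$workdir/patterns.clean.txt"
cleaned=$(grep -f "$workdir/patterns.clean.txt" "$workdir/test.txt")

printf 'with empty line:\n%s\n' "$with_empty"
printf 'after cleaning:\n%s\n' "$cleaned"

rm -r "$workdir"
```

Cleaning the pattern file before use is one possible guard; checking it by eye works just as well for small files.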
Print only count of matching lines
Use -c or --count to print only the count of matching lines.
For example, the command below counts the lines that contain an <OrderLine> tag in each file under the current directory.
$ grep "<OrderLine>" -c -R .
The output looks like this.
./order-1.xml:3
./order-2.xml:9
./order-3.xml:1
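To sort the output by count, sort numerically (-n) on the second colon-separated field. Without -n the counts are compared as text, so 10 would sort before 3.
$ grep "<OrderLine>" -c -R . | sort -t : -k 2 -n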
./order-3.xml:1
./order-1.xml:3
./order-2.xml:9
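Note that -c counts matching lines, not occurrences, so a line with two <OrderLine> tags is counted once. If the tags themselves need counting, one possible approach is to print each match on its own line with -o and count those lines with wc -l. The sketch below uses a made-up order.xml in a temporary directory to show the difference.

```shell
workdir=$(mktemp -d)

# The first line holds two tags, the second line holds one.
printf '<OrderLine>a</OrderLine><OrderLine>b</OrderLine>\n<OrderLine>c</OrderLine>\n' \
    > "$workdir/order.xml"

# -c counts the lines that contain at least one match.
line_count=$(grep -c "<OrderLine>" "$workdir/order.xml")

# -o prints every match on its own line; wc -l then counts the matches.
# tr strips the padding that some wc implementations add.
occurrence_count=$(grep -o "<OrderLine>" "$workdir/order.xml" | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $occurrence_count"

rm -r "$workdir"
```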
Daily Dev Log: Find Lines in One File but Not in Another
We can use comm to find the lines that are in one file but not in another. comm expects both files to be sorted.
# find lines only in file-a
comm -23 file-a file-b
From comm --help:
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files)
So, to find the lines that exist in both file-a and file-b:
comm -12 file-a file-b
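Because comm compares sorted input, unsorted files need a sort first. The sketch below is a minimal illustration with made-up file contents in a temporary directory; the names a.sorted and b.sorted are arbitrary.

```shell
workdir=$(mktemp -d)

# Two small unsorted files.
printf 'pear\napple\nfig\n' > "$workdir/file-a"
printf 'fig\nkiwi\napple\n' > "$workdir/file-b"

# Sort both files before handing them to comm.
sort "$workdir/file-a" > "$workdir/a.sorted"
sort "$workdir/file-b" > "$workdir/b.sorted"

# Lines only in file-a (suppress columns 2 and 3).
only_in_a=$(comm -23 "$workdir/a.sorted" "$workdir/b.sorted")

# Lines in both files (suppress columns 1 and 2).
in_both=$(comm -12 "$workdir/a.sorted" "$workdir/b.sorted")

printf 'only in file-a:\n%s\n' "$only_in_a"
printf 'in both:\n%s\n' "$in_both"

rm -r "$workdir"
```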
Google keywords: “linux command two file not contain” hit link
Inject a Method Interceptor in Guice
I recently made the mistake of creating an object (a MethodInterceptor) with new, the plain old way, in a Guice configuration.
As a result, the object’s @Inject-annotated fields, such as Logger, were left as null.
public class FooInterceptor implements MethodInterceptor {
    @Inject
    private Logger logger;

    @Override
    public Object invoke(MethodInvocation invocation) throws Throwable {
        // NPE if logger was not injected correctly
        logger.info("start to invoke");
        return invocation.proceed();
    }
}

public class BarModule extends AbstractModule {
    @Override
    protected void configure() {
        bindInterceptor(Matchers.subclassesOf(OrderApi.class),
                Matchers.any(),
                // the interceptor is created with new, the plain old way
                new FooInterceptor());
    }
}
Solution
The Guice wiki clearly states that requestInjection() should be used to inject a “method interceptor”.
How do I inject a method interceptor?
In order to inject dependencies in an AOP MethodInterceptor, use requestInjection() alongside the standard bindInterceptor() call.
public class NotOnWeekendsModule extends AbstractModule {
    protected void configure() {
        MethodInterceptor interceptor = new WeekendBlocker();
        // for injection of a "method interceptor"
        requestInjection(interceptor);
        bindInterceptor(any(), annotatedWith(NotOnWeekends.class), interceptor);
    }
}
Some Thoughts
Once you decide to use a dependency injection framework, like Guice here, avoid creating Java objects with new, the plain old way. Otherwise, you may well fail to set up the objects’ dependencies correctly.
Secondly, use constructor injection for mandatory dependencies, like Logger here.
With constructor injection it is impossible to forget a dependency, even if the object is constructed the plain old way, because the constructor cannot be called without it.
(However, too many dependencies injected via the constructor make the constructor look a bit ugly.)
Browsers Ignore Change in Hosts File
I modified C:\Windows\System32\drivers\etc\hosts for a local test.
However, the browser did not respect the change in the hosts file.
Eventually I found that this was due to the proxy settings on my machine: when a proxy is configured, the proxy resolves the host names, so the local hosts file is not consulted.
The change in hosts took effect once I unchecked all proxy configuration in
Control Panel -> Internet Options -> Connections -> LAN settings.
If connections in your network are forced through a proxy (for example, in a corporate network), you can try to let some domains bypass the proxy by adding them to LAN settings -> Proxy server -> Advanced -> Exceptions.
OneFeed is Serving HTTPS
OneFeed is now served over HTTPS. Its certificate is signed by Let’s Encrypt, “a free, automated, and open Certificate Authority”. With Let’s Encrypt’s ACME (Automatic Certificate Management Environment) protocol and a client that implements it, requesting and renewing the site certificate is done automatically.
The ACME protocol defines several challenges that a protocol client can use to prove that it (the host running the client) owns the domain. The protocol also defines how to request, renew, and revoke certificates. Because the interaction between the ACME server (the CA) and the client (your site) is clearly defined, the whole process can be automated. Certbot is a recommended ACME client.
To set up HTTPS on this site, I used this great post as a reference. The basic steps are:
- Use a certbot Docker image to get a certificate from Let’s Encrypt for the first time.
- Update the web server configuration to use the certificate.
- Set up a cron job to auto-renew the certificate.
Use Docker Images to Get Certificate
Certbot is in active development. Using the certbot Docker image (the latest image by default) means we do not have to bother with updating certbot to the newest version. The nginx Docker image is used to set up a basic web server that fulfills the ACME challenges, so that the production web server’s configuration stays untouched while a certificate is requested. (Also, since OneFeed already lives in Docker containers, using docker/docker-compose was an easy decision.) The containers used in this step are discarded as soon as the certificate has been fetched for the first time.
Update Configuration of the Production Web Server
Just google how to set up HTTPS on your web server. For OneFeed, HTTPS is set up on an nginx server.
Set Up a Cron Job to Renew and Reload/Restart Web Server
The reference post uses docker kill --signal=HUP production-nginx-container to send a signal to the nginx process in the nginx container, which makes the server reload.
However, OneFeed does not use a plain nginx container but passenger-docker, so it uses docker-compose restart to reload the certificate instead.
# No -it here: cron provides no TTY, so docker run -t would fail.
0 23 * * * docker run --rm --name certbot-renew \
-v /CERTBOT_VOLUME/etc/letsencrypt:/etc/letsencrypt \
-v /CERTBOT_VOLUME/var/lib/letsencrypt:/var/lib/letsencrypt \
-v /CERTBOT_VOLUME/data/letsencrypt:/data/letsencrypt \
-v /CERTBOT_VOLUME/var/log/letsencrypt:/var/log/letsencrypt \
certbot/certbot renew --webroot -w /data/letsencrypt --quiet \
&& cd YOUR_DOCKER_COMPOSE_WORKING_DIR \
&& docker-compose restart
X-Forwarded-For, Forwarded, X-Real-IP and Nginx
X-Forwarded-For
The HTTP header X-Forwarded-For can be used to get the IP address of the REAL client, especially in a network with proxies and load balancers.
The X-Forwarded-For (XFF) header is a de-facto standard header for identifying the originating IP address of a client connecting to a web server through an HTTP proxy or a load balancer. When traffic is intercepted between clients and servers, server access logs contain the IP address of the proxy or load balancer only. To see the original IP address of the client, the X-Forwarded-For request header is used.
The syntax is,
X-Forwarded-For: <client>, <proxy1>, <proxy2>
X-Forwarded-For: 203.0.113.195, 70.41.3.18, 150.172.238.178
When an HTTP request passes through a proxy, the proxy (if it respects this header) appends to X-Forwarded-For the IP address of the peer it received the request from, i.e. the client or the previous proxy.
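Given that layout, the leftmost entry is the address reported for the original client and the rightmost entry is the one added by the nearest proxy. The sketch below shows one way to pick them out of a header value; the function names leftmost_forwarded_for and rightmost_forwarded_for are made up for this example. Keep in mind that the leftmost entries are supplied by the client side and can be forged, so only entries added by proxies you control can be trusted.

```shell
# Print the first (leftmost) address of an X-Forwarded-For value.
leftmost_forwarded_for() {
    # Keep the text before the first comma and drop any spaces.
    printf '%s\n' "$1" | cut -d, -f1 | tr -d ' '
}

# Print the last (rightmost) address of an X-Forwarded-For value.
rightmost_forwarded_for() {
    # Split on commas, strip spaces from the last field, and print it.
    printf '%s\n' "$1" | awk -F, '{ gsub(/ /, "", $NF); print $NF }'
}

header='203.0.113.195, 70.41.3.18, 150.172.238.178'
echo "reported client: $(leftmost_forwarded_for "$header")"
echo "added by nearest proxy: $(rightmost_forwarded_for "$header")"
```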
Forwarded
However, X- headers are no longer recommended:
Custom proprietary headers can be added using the 'X-' prefix, but this convention was deprecated in June 2012.
So a standardized and enhanced header, Forwarded (RFC 7239), was introduced.
# the original request is from 192.0.2.60, and passed through proxy 203.0.113.43
Forwarded: for=192.0.2.60; proto=http; by=203.0.113.43
# a client can also append an obfuscated identifier like "secret" here; the server can
# then use it to validate the integrity of the client.
Forwarded: for=23.45.67.89;secret=egah2CGj55fSJFs, for=10.1.2.3
X-Real-IP
Another somewhat related header is X-Real-IP, which contains a single IP address.
You can find it, for example, in the Nginx docs (ngx_http_proxy_module doc, ngx_http_realip_module doc).
Install tmux on Windows
Git BASH on Windows provides most of the commonly used Linux command-line tools, such as grep and sed, but it does not provide tmux. In fact, Git for Windows comes with package management:
Git for Windows is based on MSYS2 which bundles Arch Linux’ Pacman tool for dependency management.
With pacman, Git for Windows can install additional command-line tools, such as tmux. However, pacman is not enabled by default in Git BASH:
This is intended. We do not ship pacman with Git for Windows. If you are interested in a fully fledged package manager maintained environment you have to give the Git for Windows SDK a try.
To enable pacman, install the Git for Windows SDK. After the installation, open Git SDK (a terminal emulator, just like Git Bash) and run:
$ pacman -Ss tmux
This finds two packages:
msys/tmux 2.6-1
A terminal multiplexer
msys/tmux-git 2.5.94.g73b9328c-1
A terminal multiplexer
$ pacman -S msys/tmux-git
The installation may fail with the following error:
$ pacman -S msys/tmux
warning: database file for 'git-for-windows-mingw32' does not exist
error: failed to prepare transaction (could not find database)
To fix it, open /etc/pacman.conf and comment out the following lines:
#[git-for-windows]
#Server = https://wingit.blob.core.windows.net/x86-64
#[git-for-windows-mingw32]
#Server = https://wingit.blob.core.windows.net/i686
After the installation, tmux can be used on Windows (in Git SDK).
For pacman usage, see the Git for Windows wiki.
Environment: Windows 10
(If some programs, such as ssh, report errors, try upgrading all packages with pacman -Syu.)
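Removing Elements from the List Returned by Arrays.asList Throws an Exception
Calling remove() or removeAll() on the List returned by Arrays.asList(...) throws an UnsupportedOperationException.
According to the javadoc of Arrays.asList(...), the List implementation returned by this method is backed by an array.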
Returns a fixed-size list backed by the specified array. (Changes to the returned list “write through” to the array.)
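This List implementation is a private static class of Arrays, and nearly all of its methods simply delegate to an internal array member, E[].
Arrays do not support removal, so remove() throws an exception.
In fact, it is best to avoid removal on any array-based List implementation.
ArrayList does support remove(), but its implementation shifts the rest of the internal array by copying it, which hurts performance.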
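An Implementation of Asynchronous Refresh for Guava Cache
Guava’s cache provides a refresh feature. After a specified interval, Guava can (lazily) update the cache. The default refresh implementation is synchronous: while one thread is updating the cache, other threads wait. See the javadoc of LoadingCache and CacheLoader for details.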
LoadingCache
void refresh(K key)
… Loading is asynchronous only if CacheLoader.reload(K, V) was overridden with an asynchronous implementation.
CacheLoader
public ListenableFuture<V> reload(K key, V oldValue) throws Exception
… This implementation synchronously delegates to load(K).
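Below is an asynchronous CacheLoader implementation. While one thread refreshes the cache, other threads do not have to wait and can keep using the old cached value.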