The contains() Method in Java Collection Is Not "Type Safe"

2020 May 8
Currency currency;
//...
if(currency.getSupportedCountries().contains(country)) {
    //...
}

The Currency.getSupportedCountries() returns a Collection. Originally, the returned Collection was Collection<Country>. The country object in the above if-condition was of type Country. The program has been well tested and worked as expected.

However, due to whatever reason, the getSupportedCountries() is refactored to return a Collection<String>. The Java compiler complains nothing about the refactor. But the if-condition now is never true in any cases, since the equals() method of String has no idea about the equality with Country and vice versa. A bug! It’s hard to detect this kind of bug, if the code is not well covered by unit tests or end-to-end tests.

In this sense, the contains() method in Java Collection is not type safe.

How to Avoid

First, never change the behavior of an API when refactor. In the above case, the signature of the getSupportedCountries() API has changed. This is a breaking change, which usually causes the client code fails to compile. Unfortunately, in above case the client code doesn’t fail fast in the compile phase. It’s better to add new API like getSupportedCountryCodes() which returns a Collection<String>, and @Deprecated the old API, which can be further deleted some time later.

Second, make code fully covered by test cases as much as possible. Test cases can detect the bug earlier in the test phase.

Why contains() Is Not Generic

Why contains() is not designed as contains(E o), but as contains(Object o)? There are already some discussion on this design in StackOverflow, like this one and this one. It’s said it’s no harm to let methods like contains() in Collection be no generic. Being no generic, the contains() can accept a parameter of another type, which is seen as a “flexible” design. However, the above case shows that this design does have harm and cannot give developers enough confidence.

A method accepting a parameter of Object means it accepting any type, which is too “dynamic” for a “static” language.

Another question is why a static language needs a “root” Object?

Daily Dev Log: Avoid the Pitfall of Using the Same File to Redirect Input and Output

2019 Jan 15

Pitfalls

Do Not Use the Same File to Redirect Input and Output

tr -d '\015' <DOS-file >DOS-file

The above command will delete all content in the file!

From man bash,

[n]>word, if it does exist it is truncated to zero size.

(How did I find the file back? Luckily, the working directory is managed by Dropbox, and I found it back in the Dropbox.)

CLI

Convert Line Endings from DOS/Windows Style to Unix/Linux Style

tr -d '\015' <DOS-file >UNIX-file

(For what character \015 is, see man 7 ascii or ascii '\015' if the ascii command is installed.)

More ascii Command Examples

$ ascii '\r'
ASCII 0/13 is decimal 013, hex 0d, octal 015, bits 00001101: called ^M, CR
Official name: Carriage Return
C escape: '\r'
Other names: 

Search Manuals

-k Search the short descriptions and manual page names for the keyword

$ man -k ascii
ascii (1)            - report character aliases
ascii (7)            - ASCII character set encoded in octal, decimal, and hexadecimal
...

Miss Newline Characters When "cat" Text Files

2019 Jan 4

The cat is often used to concatenate text files into one single file. In most cases, the cat works fine like below.

$ echo line 1 > file1.txt
$ echo line 2 > file2.txt
$ cat file{1,2}.txt
line 1
line 2

However, if some of files to be concatenated don’t end with the newline character, using cat to concatenate files may not generate expected file.

# -n, let echo not add the trailing newline character
$ echo -n line 1 > file1.txt
$ echo line 2 > file2.txt
$ cat file{1,2}.txt
line 1line 2

Note that in the above example, file1.txt doesn’t end with newline, so when two files concatenated there is no newline between them. This may not be the expected result. For example, we have multiple large text files. Every line in each file is a user ID. We want to concatenate these files into one file to be fed into a processing program at once. If some of files are not ended with newline, using cat may generate ill user IDs like user-id-foouser-id-bar. If the input volume is huge, these problematic IDs usually would not be detected by human eyes.

If the newlines between files is important in your case, using awk is safer.

# -n, let echo not add the trailing newline character
$ echo -n line 1 > file1.txt
$ echo line 2 > file2.txt
$ $ awk 1 file{1,2}.txt
line 1
line 2

See this SO answer.

Also, it’s a good idea to tune text editors to always show non-printable characters like the newline. Or, use cat -e, which prints invisible characters and a $ for the newline.

$ cat -e file1.txt | tail -1

在文本编辑器里显示空白字符

2016 Oct 28

下午同事遇到一个bug,数据库始终连接不上。 网络检查正常(用数据库的client可以正常连接)、配置文件也是“正常的”(和其他可以正确连接的同事的配置“一摸一样”)。 最后发现是配置文件中的数据库密码的结尾多了几个空格😓。

所以最好在编辑器/IDE里显示空白字符。绝大部分编辑器都是支持空白字符显示的,包括Vim。

另一个空格相关的bug