Best practices for turning Java objects into byte streams and back into Java objects.
Check the rest of the series here.
Item 85: Prefer alternatives to Java serialization
“[Serialization] was a horrible mistake in 1997.” – Mark Reinhold at the Ask The Architect Interview at Devoxx Belgium 2018.
The problem with Java serialization is in fact not the serialization part, but the deserialization part. When you invoke readObject on an ObjectInputStream, you can instantiate objects of almost any type on the classpath, as long as the type implements the Serializable marker interface. In the process, this method can execute code from any of those types.
“The best way to avoid serialization exploits is never to deserialize anything.”
“There is no reason to use Java serialization in any new system you write.” Instead, use cross-platform structured-data representations like JSON or Protobuf.
If Java serialization cannot be avoided for any reason, you should “never deserialize untrusted data” and should use the object deserialization filtering added in Java 9 (java.io.ObjectInputFilter), which lets you accept or reject the classes found in a stream by means of a blacklist or a whitelist. Prefer whitelisting, because that way you allow only the known types and reject everything else.
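As a rough illustration of whitelisting, the sketch below installs a filter on a single ObjectInputStream. The class names AllowListDemo, Point and Unexpected, and the helper methods, are hypothetical and exist only for this example; a real application would list its own permitted types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Set;

class AllowListDemo {
    // Hypothetical types used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static class Unexpected implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // The whitelist: only these classes may appear in the stream.
    private static final Set<Class<?>> PERMITTED = Set.of(Point.class);

    static final ObjectInputFilter WHITELIST = info -> {
        Class<?> type = info.serialClass();
        if (type == null) {
            // Not a class check (for example a back-reference), so do not decide here.
            return ObjectInputFilter.Status.UNDECIDED;
        }
        return PERMITTED.contains(type)
                ? ObjectInputFilter.Status.ALLOWED
                : ObjectInputFilter.Status.REJECTED;
    };

    static byte[] serialize(Object value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(WHITELIST); // must be set before the first read
            return in.readObject();
        } catch (IOException e) {
            // A rejected class surfaces as java.io.InvalidClassException (an IOException).
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point p = (Point) deserialize(serialize(new Point(1, 2)));
        System.out.println("accepted Point " + p.x + "," + p.y);
        try {
            deserialize(serialize(new Unexpected()));
        } catch (UncheckedIOException e) {
            System.out.println("rejected: " + e.getCause());
        }
    }
}
```

The filter here is attached to one stream only. This is a design choice for the sketch; a filter can also be configured for the whole JVM, which this example does not cover.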
Item 86: Implement Serializable with great caution
“A major cost of implementing Serializable is that it decreases the flexibility to change a class’s implementation once it has been released.”
The serialized form (or byte-stream encoding) of a class essentially becomes part of its exported API. This is especially dangerous if you use the default Java serialized form, because then the private and package-private instance fields of the class effectively become part of your public API. If you change some of the class’s internals and serialize an instance, clients trying to deserialize it with the old version of the class will fail. This is even worse if you do not specify your own serialVersionUID, since the automatically computed value changes with almost any change to the class.
The cost of testing also increases with each new release of a Serializable class, because you now also have to test that the serialization-deserialization process succeeds between the new and the old release, in both directions. Implementing Serializable also opens up the security holes mentioned in Item 85.
“Inner classes should not implement Serializable” because they use compiler-generated synthetic fields to reference enclosing instances and local variables from enclosing scopes, so their default serialized form is ill-defined. Static member classes, however, can implement Serializable.
Item 87: Consider using a custom serialized form
Default Java serialization should not be your default choice. Accept the default serialized form only when it is almost identical to the custom form you would design yourself.
“The default serialized form is likely to be appropriate if an object’s physical representation is identical to its logical content.” This means that if your class has implementation details that do not represent the logical data the client uses and cares about, you should not rely on off-the-shelf Java serialization. All of the object’s internals would end up in the byte-stream encoding (the serialized form). That makes those implementation details effectively part of your public API, and you cannot change them without breaking the whole serialization-deserialization process (see Item 86). Even when the default serialized form is appropriate, you should often still provide a readObject method to ensure object invariants and security. Declare every field that can be transient as transient. Keep in mind that transient fields are initialized to their default values when the instance is deserialized (null for object references, zero for numeric primitives, false for boolean). Again, provide a readObject method to set proper values if the defaults are not acceptable.
To apply your custom serialization logic, simply add private instance methods readObject and writeObject to your class. Although private, writeObject effectively defines a public API (the serialized form), and it should be documented as such. Use the @serial tag on fields and the @serialData tag on writeObject to tell the Javadoc utility to place this documentation on the serialized forms page. Also, if the class synchronizes access to its state, the writeObject method should be marked as synchronized as well, so that a consistent object state is written.
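A minimal sketch of a custom serialized form follows, assuming a hypothetical NameList class whose physical representation (a partially filled array) differs from its logical content (the names in order). The class, its methods and the roundTrip helper are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class NameList implements Serializable {
    private static final long serialVersionUID = 1L;

    // Physical representation: an array with spare capacity. Both fields are
    // transient, so none of this leaks into the serialized form.
    private transient String[] names = new String[4];
    private transient int size;

    synchronized void add(String name) {
        if (size == names.length) {
            names = Arrays.copyOf(names, size * 2);
        }
        names[size++] = name;
    }

    synchronized String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return names[index];
    }

    synchronized int size() {
        return size;
    }

    /**
     * @serialData the number of names (int), followed by each name (String) in order
     */
    private synchronized void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // no non-transient fields today, kept for future flexibility
        out.writeInt(size);
        for (int i = 0; i < size; i++) {
            out.writeObject(names[i]);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        if (count < 0) {
            throw new InvalidObjectException("negative size: " + count);
        }
        // Field initializers do not run during deserialization, so rebuild the state here.
        // The array grows on demand instead of trusting the count read from the stream.
        names = new String[4];
        size = 0;
        for (int i = 0; i < count; i++) {
            add((String) in.readObject());
        }
    }

    // Helper for the illustration only: serialize to memory and read back.
    static NameList roundTrip(NameList original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (NameList) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because only the count and the elements are written, the internal array could later be replaced by another structure without changing the serialized form.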
“Regardless of what serialized form you choose, declare an explicit serial version UID in every serializable class you write.” The private static final serialVersionUID field can hold any long value and does not have to be unique across classes. If serialized instances of an old version of the class already exist and that version did not declare a serialVersionUID, the new version has to declare the value that was automatically generated for the old one. You can get that computed value by running the serialver CLI utility on the old version of the class.
“Do not change the serial version UID unless you want to break compatibility with all existing serialized instances of a class.”
Item 88: Write readObject methods defensively
Just as you would check constructor arguments for validity and make defensive copies of parameters where necessary, the same has to be done in a readObject method. The reason is that readObject is effectively a public API: it is something like a constructor that receives only a byte stream as its parameter. If a readObject method fails to do this, class invariants can easily be violated.
“When an object is deserialized, it is critical to defensively copy any field containing an object reference that a client must not possess.” Every immutable serializable class with mutable components (or, more generally, any class whose object reference fields have to remain private) should make defensive copies of those fields in its readObject method. This is basically what you would do in a normal constructor or getter when clients of the class must not obtain a reference to an internal field. Unfortunately, these fields cannot be final, because readObject could not then assign the defensive copies to them. After copying, check the fields for validity and throw InvalidObjectException if a check fails.
“To summarize, anytime you write a readObject method, adopt the mindset that you are writing a public constructor that must produce a valid instance regardless of what byte stream it is given.”
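The defensive approach can be sketched with a hypothetical immutable Period class that holds two mutable java.util.Date fields. The class and its roundTrip helper are illustrative assumptions, not code from the original text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Date;

final class Period implements Serializable {
    private static final long serialVersionUID = 1L;

    // Not final, so that readObject can replace them with defensive copies.
    private Date start;
    private Date end;

    Period(Date start, Date end) {
        // Copy first, then validate the copies.
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0) {
            throw new IllegalArgumentException(this.start + " is after " + this.end);
        }
    }

    Date start() { return new Date(start.getTime()); }

    Date end() { return new Date(end.getTime()); }

    // Treated like a public constructor whose only argument is a byte stream.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (start == null || end == null) {
            throw new InvalidObjectException("start and end are required");
        }
        // Defensive copies: the stream may hold extra references to these Date objects.
        start = new Date(start.getTime());
        end = new Date(end.getTime());
        // Invariant check comes after copying, as in the constructor.
        if (start.compareTo(end) > 0) {
            throw new InvalidObjectException(start + " is after " + end);
        }
    }

    // Helper for the illustration only: serialize to memory and read back.
    static Period roundTrip(Period original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Period) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The readObject method mirrors the constructor step for step, which keeps the two entry points easy to compare during review.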
Item 89: For instance control, prefer enum types to readResolve
If a class implemented as a singleton implements the Serializable interface, it stops being a singleton. The reason is that the readObject method, whether explicit or default, returns a newly created instance, which is not the same as the one created by the singleton implementation at class initialization time.
One way to deal with this is the readResolve feature, which lets you substitute another instance for the one created by readObject. If a class provides a readResolve method with the proper declaration, it is invoked on the newly created object after deserialization, and the object reference it returns is returned instead of the one newly created by readObject.
“If you depend on readResolve for instance control, all instance fields with object reference types must be declared transient”. The reason is that with non-transient fields, a determined attacker can use a carefully crafted stream to obtain a reference to the deserialized object before readResolve is invoked. Also, since you ignore the deserialized object altogether and return the one created when the class was initialized, you do not need these fields to be serialized.
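The readResolve approach might look like the sketch below. The names ReadResolveDemo and Registry are hypothetical; the singleton deliberately has no instance fields, so there is nothing that would need to be declared transient.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ReadResolveDemo {
    // Hypothetical singleton used only for this illustration.
    static final class Registry implements Serializable {
        private static final long serialVersionUID = 1L;

        static final Registry INSTANCE = new Registry();

        private Registry() { }

        // Invoked on the freshly deserialized object; its return value replaces that object.
        private Object readResolve() {
            return INSTANCE;
        }
    }

    // Serialize to memory and read back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(Registry.INSTANCE);
        System.out.println(copy == Registry.INSTANCE); // prints "true"
    }
}
```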
A better and preferred way to implement a serializable instance-controlled class (such as a singleton) is as a single-element enum type. This way, Java guarantees that there cannot be another instance besides the declared constant.
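A single-element enum needs no readResolve at all, as this small sketch suggests. The names EnumSingletonDemo and Settings are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class EnumSingletonDemo {
    // Hypothetical singleton; enums are serializable without any extra code.
    enum Settings {
        INSTANCE;

        String describe() {
            return "the one and only " + name();
        }
    }

    // Serialize to memory and read back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(Settings.INSTANCE);
        System.out.println(copy == Settings.INSTANCE); // prints "true"
    }
}
```

Enum constants are serialized by name and resolved back to the existing constant, which is why the identity check holds.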
Item 90: Consider serialization proxies instead of serialized instances
To avoid the serialization downsides mentioned throughout this chapter, you can use a technique known as the serialization proxy pattern. This pattern is simple and powerful. The idea is to serialize not your full-blown class with all of its implementation details, but a simple data object that represents only the logical state of your class. You can view this object as the serialization DTO of your main class.
You implement this pattern by providing a private static nested class that represents the logical state of its enclosing type (your main class). It is this nested class that actually gets serialized. The nested class should have a constructor that receives an instance of the enclosing type, so it can copy the fields it needs, and, like your main class, it has to implement the Serializable interface. You tell the serialization mechanism to serialize this other object by providing a private writeReplace method in the enclosing class. When the serialization process starts on your main class, this method is invoked, and in it you create and return an instance of the nested class, which is then serialized in place of the main object. Finally, provide a readResolve method in the nested class so that you can create and return an instance of the enclosing type. Remember, the readResolve method is invoked on the deserialized object and allows us to return some other object instead of the deserialized one.
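The pattern described above can be sketched as follows, assuming a hypothetical DateRange class with a start and an end date. The readObject method that rejects direct deserialization of the enclosing class is not mentioned in the text above; it is added here as a commonly recommended safeguard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.time.LocalDate;

final class DateRange implements Serializable {
    private static final long serialVersionUID = 1L;

    // With a proxy, the fields of the main class can stay final.
    private final LocalDate start;
    private final LocalDate end;

    DateRange(LocalDate start, LocalDate end) {
        if (start.isAfter(end)) {
            throw new IllegalArgumentException(start + " is after " + end);
        }
        this.start = start;
        this.end = end;
    }

    LocalDate start() { return start; }

    LocalDate end() { return end; }

    // The nested class holds only the logical state and is what gets serialized.
    private static final class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final LocalDate start;
        private final LocalDate end;

        SerializationProxy(DateRange range) {
            this.start = range.start;
            this.end = range.end;
        }

        // Rebuilds the real object through its constructor, so invariants are rechecked.
        private Object readResolve() {
            return new DateRange(start, end);
        }
    }

    // Tells the serialization mechanism to write the proxy instead of this object.
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // Safeguard (an addition for this sketch): a stream that claims to contain
    // a DateRange directly is rejected.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    // Helper for the illustration only: serialize to memory and read back.
    static DateRange roundTrip(DateRange original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DateRange) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Since deserialization goes through the ordinary constructor, the validation logic lives in one place only.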
Generally, use this pattern whenever you need to provide readObject or writeObject methods and your class has nontrivial invariants.