@Transactional(readOnly=true) with Hibernate, MaxScale and Galera

Feb 16, 2022

Conclusion

Some extra configuration is required when using Spring Boot @Transactional(readOnly=true) annotations together with Hibernate, MaxScale and Galera. Transactions otherwise become readwrite and all traffic is sent to to the master MariaDB node. This can be solved by opening an extra MaxScale port that's read only and using a more advanced JDBC url.

The JDBC url should look something like this:
jdbc:mariadb:replication://maxscale:3306,maxscale:3308/database?assureReadOnly=true

The additional MaxScale configuration should look something like this:

[Read-Only-Service]
type=service
router=readconnroute
servers=master,slave1,slave2
router_options=slave

[Read-Only-Listener]
type=listener
service=Read-Only-Service
port=3308

The Use Case

Like a true hacker, I want to make the following technology play nice together (while using rad stock photos):

Spring Boot: Specifically when using the @Transactional(readOnly=true) annotation.
Hibernate ORM: https://hibernate.org/orm/
MariaDB Connector/J: https://mariadb.com/kb/en/about-mariadb-connector-j/
MaxScale: https://mariadb.com/kb/en/maxscale/
Galera: https://galeracluster.com/

A Galera MariaDB cluster can be used for high availability as well as scalability. I focus on scalability in this case. I want to leverage the slaves in the Galera cluster for read only SQL statements.

I have a MaxScale database proxy in front of the Galera cluster. The goal with that is decoupling the database infrastructure from our Java application development. For example I can add more slaves to the Galera cluster without also having to add them to the JDBC urls scattered all over our micro services.

I will be using MariaDB Connector/J as the JDBC implementation to establish connection between Java and MaxScale. That's the suggested way since MaxScale is also a MariaDB product.

Finally I'll be using Spring Boot and Hibernate. I want to use the convenient Spring annotation @Transactional(readOnly=true). Reads should then go to the slave nodes in the Galera cluster.

I want to use Spring Data JPA rather than Hibernate explicitly. This is the common way to do it. If you pick "Spring Data JPA" at https://start.spring.io/ you are actually using Hibernate under the hood:

screenshot of https://start.spring.io highlighting the dependency Spring Data JPA

The Problem

All SQL statements will be sent to the master Galera node in the naive setup.

You'd think that this JDBC url would be enough (but it isn't):
jdbc:mariadb://maxscale:3306/database

The getting started guides, such as https://mariadb.com/docs/deploy/maxscale-mariadbmon-readwritesplit/, describes how to set up MaxScale with the readwritesplit router. This router looks at the SQL and tries to discern if it's read or write and routes to master or slave accordingly.

That sounds simple enough, but for some reason it goes wrong and all is sent to the master node 😕.

Software Layers

skyscraper as image for multiple layers of software

We have multiple layers software trying to collaborate here. Somewhere along the way the information is lost that we want read only traffic. To understand how and why that happens I will look closer at the different layers and their responsibilities.

Software Layer: Spring

Spring supplies us with the annotation @Transactional(readOnly=true). According to their documentation this annotation is a big deal: "Comprehensive transaction support is among the most compelling reasons to use the Spring Framework" (https://docs.spring.io/spring-framework/docs/4.2.x/spring-framework-reference/html/transaction.html).

This annotation is part of the Spring Framework transaction abstraction. This abstraction has multiple implementations. Spring does not do persistance itself. In our case we will be using the Hibernate implementation.

Hibernate is it's own library and Spring uses that of course, but there's also the glue on the Spring side, such as: org.springframework.orm.hibernate5.HibernateTransactionManager.

Setting the readOnly=true in @Transactional(readOnly=true) sends some hints to Hibernate and it's underlying DataSource. The most interesting part is setReadOnly(true) will be called on the underlying JDBC java.sql.Connection.

Software Layer: Hibernate

Hibernate ORM does all of the Object Relational Mapping.

Hibernate itself does not care that much about @Transactional(readOnly=true). Some optimizations are in place https://vladmihalcea.com/spring-read-only-transaction-hibernate-optimization/, but when it comes to transaction handling Hibernate does not do as much as you might think.

Hibernate just uses java.sql.Connection#setAutoCommit(false) to start transactions. That results in the SQL SET autocommit=0. I personally expected Hibernate to do START TRANSACTION or START TRANSACTION READ ONLY but that is NOT at all the case. And Hibernate does NOT do any java.sql.Connection#setReadOnly(true/false), that read only state setting on the JDBC connection is rather handled by Spring as written above.

Software Layer: MariaDB Connector/J

The JDBC implementation for MariaDB is called MariaDB Connector/J. Or in Maven terminology something like:

<dependency>
    <groupId>org.mariadb.jdbc</groupId>
    <artifactId>mariadb-java-client</artifactId>
</dependency>

And its job is to implement the interface like: public class MariaDbConnection implements java.sql.Connection.

So how does it handle java.sql.Connection#setReadOnly(true/false)? That is the key question.

Turns out that setReadOnly does two things:

Choose the target database node when in replication mode.
Potentially execute SET SESSION TRANSACTION READ ONLY to make the connection read only.

When you run in a master slave database environment you should apparently use a mode called replication (https://mariadb.com/kb/en/about-mariadb-connector-j/#failover-and-load-balancing-modes) and those JDBC urls look something like: jdbc:mariadb:replication://master:3306,slave1:3306,slave2:3306/database. And .readOnly(true) would make it pick one of the slaves. But in our case we use a MaxScale proxy that hides the master and slaves behind... and I'll get into that later.

The connection MAY be made read only by executing SET SESSION TRANSACTION READ ONLY. Interestingly that's NOT done by default in versions 1.X and 2.X of MariaDB Connector/J. For it to happen you must both use replication mode and use the special flag assureReadOnly like: jdbc:mariadb:replication://master:3306,slave1:3306,slave2:3306/database?assureReadOnly=true

https://mariadb.com/kb/en/about-mariadb-connector-j/#failover-and-load-balancing-parameters: assureReadOnly: When this parameter enabled when a Failover and Load Balancing Mode is in use, and a read-only connection is made to a host, assure that this connection is in read-only mode by setting the session to read-only. Default to false.

So notably here we lose information about the read only state for this first time. It happens since we are using MaxScale to hide our replication infrastructure from the Java application, and so the MariaDbConnection throws all kinds of information away.

How could we go about solving that? Perhaps by repeating the one and only MaxScale host as both master and slave in the JDBC url like this? jdbc:mariadb:replication://maxscale:3306,maxscale:3306/database?assureReadOnly=true

SPOILER: That still does not work 😱.

Software Layer: MaxScale

So MaxScale is an intelligent database proxy. But it's not intelligent enough for the URL jdbc:mariadb:replication://maxscale:3306,maxscale:3306/database?assureReadOnly=true to work.

In fact the java.sql.Connection will be made read only in the sense that SET SESSION TRANSACTION READ ONLY is executed. But MaxScale version 6 is not smart enough. So I wrote a 🐛 bug report about that on their Jira https://jira.mariadb.org/browse/MXS-3980 and it turns out that MaxScale 7 (yet to be released) will handle this.

So what can we do for now?

We can set up a second MaxScale read only listener. Listeners in MaxScale are basically ports. The default port for MySQL/MariaDB is 3306. In most examples I've found 3308 seem to be used for read only ports.

Here's an example in the official GitHub repository: https://github.com/mariadb-corporation/MaxScale/blob/6.2/docker/maxscale.cnf.d/example.cnf

So the additional MaxScale configuration should look something like this:

[Read-Only-Service]
type=service
router=readconnroute
servers=master,slave1,slave2
router_options=slave

[Read-Only-Listener]
type=listener
service=Read-Only-Service
port=3308

I'm not an expert but it seems wise to include the master among the servers and use router_options=slave since the master will only be used as a last resort if all slaves are down. https://mariadb.com/kb/en/mariadb-maxscale-25-readconnroute/ A server assigned as a slave of a master. If all slaves are down, but the master is still available, then the router will use the master.

So will a JDBC url like this do the trick? jdbc:mariadb:replication://maxscale:3306,maxscale:3308/database?assureReadOnly=true

YES! Fu***n finally! That works 🎉!

Software Layer: Galera

So apparently Galera is not to blame here.

Though we need to be mindful that Galera supports automatic failover. If the master goes down a slave is promoted and we must configure MaxScale to support that.

A big point with the Galera cluster in this story is the scalability. In short the purpose of having slaves is to read from them, and now we can. Even the SET SESSION TRANSACTION READ ONLY will reach down to the slave and allow for optimizations. A readwrite transaction can't be as well optimized as a readonly transaction, even if it turns out to only contain SELECT statements.

Recent and Future Software Versions

A new version 3.X of MariaDB Connector/J was recently released. In the release notes https://mariadb.com/kb/en/mariadb-connector-j-303-release-notes/ it turns out that option assureReadOnly has now been removed. It defaults to true now, but you must use a replication:// JDBC url for it to kick in.

So the URL you want would look like this on 3.X: jdbc:mariadb:replication://maxscale:3306,maxscale:3308/database

In the future when MaxScale 7 is released we no longer need the second port 3308 and the URL becomes even simpler: jdbc:mariadb:replication://maxscale:3306,maxscale:3306/database

Debugging Tools

Assuming that you (yes you! the reader!) has a similar problem with you application, what are some good tools for debugging the problem?

The IntelliJ debugger is always great, but additionally I found that dedicated REST endpoints for testing and Wireshark was very useful.

Debugging Tool: Dedicated REST Endpoints

The gist of the idea is to create endpoints with various @Transactional annotations that execute the SQL: SELECT @@hostname;.

We can then curl against these endpoints a few times each and see that they use the slave nodes in the Galera cluster when we expect them to.

I have some sample code below you can take inspiration from. It's just three files:

TransactionalHostnameController.java

package tech.olof.troubleshooting;

import lombok.RequiredArgsConstructor;
import org.springframework.security.access.annotation.Secured;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

/**
 * Controller and it's corresponding service are meant for troubleshooting with curl by humans:
 * curl "http://localhost:8080/troubleshooting/transactionalhostname/v1/none"
 * curl "http://localhost:8080/troubleshooting/transactionalhostname/v1/readonly"
 * curl "http://localhost:8080/troubleshooting/transactionalhostname/v1/readwrite"
 */
@RestController
@RequestMapping(value = "/troubleshooting/transactionalhostname/v1")
@RequiredArgsConstructor
public class TransactionalHostnameController {
    private final TransactionalHostnameService transactionalHostnameService;

    @GetMapping("/none")
    public String getWithNone() {
        return transactionalHostnameService.getWithNone();
    }

    @GetMapping("/readonly")
    public String getWithReadOnly() {
        return transactionalHostnameService.getWithReadOnly();
    }

    @GetMapping("/readwrite")
    public String getWithReadWrite() {
        return transactionalHostnameService.getWithReadWrite();
    }
}

TransactionalHostnameService.java

package tech.olof.troubleshooting;

public interface TransactionalHostnameService {
    String getWithNone();
    String getWithReadOnly();
    String getWithReadWrite();
}

TransactionalHostnameServiceImpl.java

package tech.olof.troubleshooting;

import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.Query;

@Component
public class TransactionalHostnameServiceImpl implements TransactionalHostnameService {
    @PersistenceContext
    private EntityManager entityManager;

    private String getHostname() {
        Query nativeQuery = entityManager.createNativeQuery("SELECT @@hostname;");
        return (String) nativeQuery.getSingleResult();
    }

    @Transactional(propagation = Propagation.NEVER)
    @Override
    public String getWithNone() {
        return getHostname();
    }

    @Transactional(propagation = Propagation.REQUIRED, readOnly = true)
    @Override
    public String getWithReadOnly() {
        return getHostname();
    }

    @Transactional(propagation = Propagation.REQUIRED, readOnly = false)
    @Override
    public String getWithReadWrite() {
        return getHostname();
    }
}

You could also make yourself a bash script to do the curl calls:

#!/usr/bin/env bash

BASEURL="http://localhost:8080/troubleshooting/transactionalhostname/v1"

function test-endpoint {
    local ENDPOINT="${1}"
    echo "ENDPOINT: ${ENDPOINT}"
    for i in {1..10}; do
        local RESULT="$(curl --silent "${BASEURL}/${ENDPOINT}")"
        echo "${RESULT}"
    done
    echo ""
}

test-endpoint "none"
test-endpoint "readonly"
test-endpoint "readwrite"

And hopefully output would look something like this:

ENDPOINT: none
slave2
slave2
slave2
slave2
slave2
slave2
slave1
slave2
slave2
slave1

ENDPOINT: readonly
slave1
slave1
slave1
slave1
slave2
slave2
slave2
slave2
slave1
slave1

ENDPOINT: readwrite
master
master
master
master
master
master
master
master
master
master

Debugging Tool: Wireshark

🦈 With Wireshark we can see the actual SQL queries sent to the server.

Notably https://github.com/ttddyy/datasource-proxy is NOT enough in this case (otherwise big fan of that one). We must see the SQL queries sent by the JDBC layer and datasource-proxy is too high up in the software layers to catch that.

The command line tool is called tshark and can be installed on macOS with:

brew install wireshark

Next you must figure out which network interface to listen to. It's probably en0.

This command is useful to figure that out on macOS:

networksetup -listallhardwareports

Anyhow the tshark command goes like this:

tshark -i "en0" -Y "mysql.command==3" -T "fields" -e "mysql.query" > output.txt

-i "en0": This is where you say which interface wireshark should listen at.
-Y "mysql.command==3" adds a filter to only see the interesting mysql commands.
-T "fields" choses the simple output format "fields".
-e "mysql.query" and the only "field" we want is the query itself.

"Where's my SET SESSION TRANSACTION READ ONLY statements?" you might be wondering. Perhaps you start the wireshark capture too late? You should start wireshark before your Spring Boot application. Since there's connection pools at play the connections often get's created earlier than you might think. Directly att Spring Boot application startup.

Okay, thanks, bye.

Thank you for reading!
Remember to hit like 💖 and subscribe 🔔 if you liked this content.
Please leave a comment in the section below.

(jk, no such features on this blog fam)