Beware of the Reconnect Using CMAN

Sven Illert

The Oracle Connection Manager (CMAN) is a useful addition to your Oracle ecosystem that can serve many purposes. It can act as a proxy between different networks, serve as a traffic director, or simply reduce the network configuration needed on the client side. The last of these helps when your application cannot handle Data Guard connections properly, because CMAN can hide that complexity.

I have set up many configurations where two connection managers operate behind a load balancer and serve traffic to two or more Exadata systems in the backend, where databases are protected by Data Guard across multiple data centers or regions.

When it comes to configuring the connection manager, you normally have two options for how CMAN can find the database.

  1. Configure the listener_networks parameter in the database itself and specify the address of the connection manager as a remote listener.

  2. Configure the connection manager with the next_hop parameter so that CMAN knows where to look for the requested service names.

Approach No. 1 is the most common and well-known, but with many databases it becomes configuration-heavy and more prone to errors. The second approach is easier to set up if only a limited number of systems work in the backend, for example a typical MAA architecture with two Exadata systems.

For the latter approach, you would normally specify the next_hop parameter as shown in the following listing:

(next_hop=
  (DESCRIPTION=
    (CONNECT_TIMEOUT=5)(RETRY_COUNT=50)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3)
    (ADDRESS_LIST=(LOAD_BALANCE=on)
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.1.10)(PORT=2484))
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.1.11)(PORT=2484))
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.1.12)(PORT=2484))
    )
    (ADDRESS_LIST=(LOAD_BALANCE=on)
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.2.10)(PORT=2484))
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.2.11)(PORT=2484))
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.2.12)(PORT=2484))
    )
  )
)

At first glance, this configuration looks typical. One might expect that, in the event of a disconnect from the database due to a service outage, the connection manager would handle the reconnection automatically. However, this expectation is incorrect.

Such an event terminates the client connection, and the client itself must open a new connection to the connection manager. With the configuration above, CMAN then also retries its own connection to the database for that new client connection.

If the database is completely offline and you have the above timeout and retry parameters configured on both the client and the connection manager side, this can lead to a connect storm. Each client makes up to 50 connection attempts (RETRY_COUNT=50), which in turn lead to 50 CMAN sessions, each trying to reconnect for at least 150 seconds (50 retries with a RETRY_DELAY of 3 seconds).

Very quickly, the system can become overloaded, and nobody will be able to connect to any of the databases accessed through the connection manager.
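To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the worst case described above: every client attempt opens its own CMAN session, and every CMAN session uses all of its attempts against the backend. The function name storm_estimate and the figure of 200 clients are purely illustrative and not taken from any real system.

```python
# Rough upper-bound estimate of the connect storm during a full outage.
# Assumption: client attempts and CMAN attempts multiply, i.e. each client
# attempt creates one CMAN session that exhausts its own attempts.

RETRY_COUNT = 50  # from the next_hop descriptor
RETRY_DELAY = 3   # seconds, from the next_hop descriptor

# Time one CMAN session spends in retry delays alone.
RETRY_WINDOW_S = RETRY_COUNT * RETRY_DELAY  # 150 seconds


def storm_estimate(clients: int, client_attempts: int, cman_attempts: int) -> tuple[int, int]:
    """Return (CMAN sessions, connection attempts against the backend)."""
    cman_sessions = clients * client_attempts
    backend_attempts = cman_sessions * cman_attempts
    return cman_sessions, backend_attempts


if __name__ == "__main__":
    # Hypothetical: 200 clients, retries configured on both sides.
    print(storm_estimate(200, RETRY_COUNT, RETRY_COUNT))  # -> (10000, 500000)
    # Same 200 clients with a single attempt on both sides.
    print(storm_estimate(200, 1, 1))                      # -> (200, 200)
```

Even if the real numbers stay well below this upper bound, the comparison shows how quickly the retries on both sides multiply.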

My recommendation is not to use any connection failover parameters at all, so that each connection attempt is made only once.
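As a minimal sketch, the next_hop setting from above could look like this once RETRY_COUNT and RETRY_DELAY are removed. Keeping the two timeout parameters is an assumption on my part: they limit how long a single attempt may wait rather than triggering additional attempts, but you may decide to drop them as well.

```text
(next_hop=
  (DESCRIPTION=
    (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)
    (ADDRESS_LIST=(LOAD_BALANCE=on)
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.1.10)(PORT=2484))
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.1.11)(PORT=2484))
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.1.12)(PORT=2484))
    )
    (ADDRESS_LIST=(LOAD_BALANCE=on)
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.2.10)(PORT=2484))
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.2.11)(PORT=2484))
      (ADDRESS=(PROTOCOL=TCPS)(HOST=172.20.2.12)(PORT=2484))
    )
  )
)
```

The same idea would apply to the connect descriptor on the client side, so that a failed attempt is reported back instead of being retried at both layers.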