Client Connection Recovery and Failover
RabbitMQ client recovery is an application responsibility. The broker accepts reconnections, but the application must still select endpoints, retry failed operations, acknowledge messages, and process them idempotently.
Use this guide to harden producers and consumers against node restarts, rolling upgrades, transient network failures, and site-level endpoint changes.
Design Principles
Use the following design principles for production clients:
- Configure more than one broker address when the client library supports it.
- Use separate connections for producers and consumers so that a publishing connection blocked by broker flow control does not stall message consumption.
- Use heartbeats and connection timeouts so that dead connections are detected promptly instead of hanging.
- Enable automatic recovery when the client library provides it.
- Use publisher confirms for producers.
- Use manual acknowledgements and idempotent processing for consumers.
- Treat site-level failover as an application configuration change even if node-level automatic recovery is enabled.
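The manual-acknowledgement and idempotency principles can be sketched as a consumer callback. This is a minimal sketch, not a definitive implementation: the class name IdempotentHandler is hypothetical, it assumes producers set a unique message_id on every message, and it keeps seen ids in memory, where a production system would more likely use a durable store. The callback signature follows pika's on_message_callback convention.

```python
from collections import OrderedDict


class IdempotentHandler:
    """Wrap a processing function so redelivered messages are acked, not reprocessed."""

    def __init__(self, process, max_seen=10_000):
        self._process = process
        # Assumption: an in-memory record of processed ids is enough for this sketch.
        self._seen = OrderedDict()
        self._max_seen = max_seen

    def __call__(self, channel, method, properties, body):
        # Assumption: producers set a unique message_id on every message.
        message_id = properties.message_id
        if message_id is not None and message_id in self._seen:
            # Duplicate delivery (for example after a reconnect): ack without reprocessing.
            channel.basic_ack(delivery_tag=method.delivery_tag)
            return
        try:
            self._process(body)
        except Exception:
            # Requeue so the message is not lost. A dead-letter policy may suit
            # messages that fail repeatedly better than requeueing.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
            return
        if message_id is not None:
            self._seen[message_id] = True
            if len(self._seen) > self._max_seen:
                self._seen.popitem(last=False)  # evict the oldest id
        # Acknowledge only after processing has succeeded.
        channel.basic_ack(delivery_tag=method.delivery_tag)
```

Acknowledging after processing means a crash between the two steps causes a redelivery rather than a loss, which is why the duplicate check is needed.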
Java Example
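The RabbitMQ Java client supports automatic connection and topology recovery. Both are controlled on ConnectionFactory through setAutomaticRecoveryEnabled and setTopologyRecoveryEnabled, and setNetworkRecoveryInterval sets the delay between reconnection attempts.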
Use publisher confirms on producer channels and check confirmation failures in application code.
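The confirm-and-check pattern is the same in any client. In the Java client it is available through Channel.confirmSelect() and Channel.waitForConfirmsOrDie(). The sketch below shows the pattern in Python with pika, because it is short and can be tested without a broker. The names PublishNotConfirmed, open_confirming_channel, and publish_confirmed are hypothetical. With a pika BlockingChannel in confirm mode, confirm_errors would typically be (pika.exceptions.UnroutableError, pika.exceptions.NackError).

```python
class PublishNotConfirmed(Exception):
    """Raised when the broker does not confirm a publish."""


def open_confirming_channel(connection):
    """Open a channel and put it in publisher-confirm mode."""
    channel = connection.channel()
    channel.confirm_delivery()  # pika BlockingChannel: enable publisher confirms
    return channel


def publish_confirmed(channel, exchange, routing_key, body, confirm_errors):
    """Publish one message on a confirm-mode channel and fail loudly if it is rejected.

    confirm_errors is the tuple of exception types the client library raises
    for an unroutable or negatively acknowledged publish.
    """
    try:
        # mandatory=True asks the broker to return messages it cannot route.
        channel.basic_publish(
            exchange=exchange, routing_key=routing_key, body=body, mandatory=True
        )
    except confirm_errors as exc:
        raise PublishNotConfirmed(
            f"publish to {exchange!r} with key {routing_key!r} was not confirmed"
        ) from exc
```

Raising a dedicated exception keeps a failed confirmation from being silently swallowed; the caller decides whether to retry, buffer, or alert.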
Python Example
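With pika, implement an explicit reconnect loop and recreate channels and consumers after every connection failure. A pika BlockingConnection does not recover on its own, so the application must rebuild the connection and everything created on it.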
If your consumer depends on declared topology, recreate or verify the topology after reconnect according to the behavior of your client library.
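A minimal sketch of such a reconnect loop follows. The function names consume_forever and run_with_pika, the queue name orders, the host names, and the heartbeat, prefetch, and backoff values are all assumptions chosen for illustration. The loop takes a connect callable so that it can be exercised without a broker; run_with_pika shows one way to wire it to pika.

```python
import time


def consume_forever(connect, on_message, queue="orders", retry_errors=(Exception,),
                    prefetch=10, max_failures=None, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Rebuild connection, channel, topology, and consumer after each failure.

    Waits with capped exponential backoff between attempts. Returns when
    start_consuming() returns normally (a clean stop).
    """
    failures = 0
    while True:
        try:
            connection = connect()
            channel = connection.channel()
            # Re-declare the topology this consumer depends on. Declaring an
            # existing queue with the same arguments is a no-op on the broker.
            channel.queue_declare(queue=queue, durable=True)
            channel.basic_qos(prefetch_count=prefetch)
            # Exactly one registration per fresh channel, so reconnects do not
            # accumulate duplicate consumers.
            channel.basic_consume(queue=queue, on_message_callback=on_message,
                                  auto_ack=False)
            failures = 0  # setup succeeded, so reset the backoff
            channel.start_consuming()
            return
        except retry_errors:
            failures += 1
            if max_failures is not None and failures >= max_failures:
                raise
            sleep(min(max_delay, base_delay * 2 ** (failures - 1)))


def run_with_pika(on_message):
    """Wire the loop to pika. Host names below are hypothetical."""
    import pika  # imported here so the loop above has no hard dependency on pika

    hosts = ["rabbit-1.example.internal", "rabbit-2.example.internal"]
    params = [
        pika.ConnectionParameters(host=h, heartbeat=30, blocked_connection_timeout=60)
        for h in hosts
    ]
    consume_forever(
        # BlockingConnection accepts a sequence of parameters and tries each in order.
        lambda: pika.BlockingConnection(params),
        on_message,
        retry_errors=(pika.exceptions.AMQPConnectionError,),
    )
```

Because the channel is created inside the loop, every reconnect starts from a clean channel with a single consumer, and the queue declaration runs again before consumption resumes.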
Endpoint Failover Strategy
Use one of the following endpoint strategies:
Automatic recovery across nodes in one cluster does not automatically switch the application to a different DR site. Site failover still requires configuration, DNS, or service-discovery changes.
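The distinction between node-level recovery and site-level failover can be sketched as configuration. In this minimal sketch the client only ever receives the node list of the active site, so automatic recovery rotates among nodes inside that site, and switching sites requires an explicit configuration change. The site names, host names, the RABBITMQ_ACTIVE_SITE variable, and the active_endpoints function are all hypothetical.

```python
import os

# Hypothetical site layout: each site is one cluster with its own node list.
SITES = {
    "primary": ["rabbit-a1.example.internal", "rabbit-a2.example.internal"],
    "dr": ["rabbit-b1.example.internal", "rabbit-b2.example.internal"],
}


def active_endpoints(sites=SITES, env=os.environ, default_site="primary"):
    """Return (site, hosts) for the site selected by configuration.

    The client is given only these hosts, so node-level recovery never
    crosses into another site by itself.
    """
    site = env.get("RABBITMQ_ACTIVE_SITE", default_site)
    if site not in sites:
        raise ValueError(f"unknown site {site!r}; expected one of {sorted(sites)}")
    return site, list(sites[site])
```

Failing fast on an unknown site name avoids a client that silently keeps using the previous site after a mistyped failover change.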
Verification Checklist
After you change client recovery settings, verify that:
- The application can connect to more than one broker address.
- Publishers use confirms and fail visibly when publishes are not accepted.
- Consumers use manual acknowledgement and can restart cleanly.
- The reconnect loop does not create duplicate consumer registrations.
- TLS settings remain valid for every endpoint that the client can use.
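The first item in the checklist can be partly automated. The sketch below probes each configured address with a plain TCP connection using only the standard library; the function name probe_endpoints and the example addresses are hypothetical. It confirms network reachability only and does not validate TLS certificates, credentials, or the AMQP handshake, which still need a real client connection to verify.

```python
import socket


def probe_endpoints(endpoints, timeout=3.0):
    """Try a TCP connection to each (host, port) pair and report the outcome."""
    results = {}
    for host, port in endpoints:
        try:
            # The connection is closed again as soon as it has been established.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = "reachable"
        except OSError as exc:
            results[(host, port)] = f"unreachable: {exc}"
    return results
```

For example, probe_endpoints([("rabbit-1.example.internal", 5671), ("rabbit-2.example.internal", 5671)]) returns one entry per address, so an endpoint that only works from some hosts shows up before a failover depends on it.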