Marathon offers multiple ways to run (Docker-containerized) applications, the most popular ones being
- BRIDGE-networked containers with dynamic high ports exposed
- HOST-networked containers with host machine ports
- containers with dedicated IP addresses ([IP-per-task](https://mesosphere.github.io/marathon/docs/ip-per-task.html)).
Traefik tries to detect the configured mode and route traffic to the right IP addresses. It is possible to force using task hosts with the `forceTaskHostname` option.
Given the complexity of the subject, it is possible that the heuristic fails.
Apart from filing an issue and waiting for the feature request / bug report to get addressed, one workaround for such situations is to customize the Marathon template file to the individual needs.
This does _not_ require rebuilding Traefik but only to point the `filename` configuration parameter to a customized version of the `marathon.tmpl` file on Traefik startup.
Traefik also attempts to determine the right port (which is a [non-trivial matter in Marathon](https://mesosphere.github.io/marathon/docs/ports.html)).
Following is the order by which Traefik tries to identify the port (the first one that yields a positive result will be used):
Some Marathon applications may expose multiple ports. Traefik supports creating one so-called _segment_ per port using [segment labels](/configuration/backends/marathon#applications-with-multiple-ports-segment-labels).
For instance, assume that a Marathon application exposes a web API on port 80 and an admin interface on port 8080. It would then be possible to make each service available by specifying the following Marathon labels:
```
traefik.web.port=80
```
```
traefik.admin.port=8080
```
(Note that the service names `web` and `admin` can be chosen arbitrarily.)
Technically, Traefik will create one pair of frontend and backend configurations for each service.
It is possible to define [readiness checks](https://mesosphere.github.io/marathon/docs/readiness-checks.html) (available since Marathon version 1.1) per application and have Marathon take these into account during the startup phase.
The idea is that each application provides an HTTP endpoint that Marathon queries periodically during an ongoing deployment in order to mark the associated readiness check result as successful if and only if the endpoint returns a response within the configured HTTP code range.
Beginning with version 1.4, Traefik respects readiness check results if the Traefik option is set and checks are configured on the applications accordingly.
It is on the order of one readiness check timeout interval (as configured on the application specification) and guarantees that non-ready tasks do not receive traffic prematurely.
If readiness checks are not possible, a current mitigation strategy is to enable [retries](/configuration/commons#retry-configuration) and make sure that a sufficient number of healthy application tasks exist so that one retry will likely hit one of those.
It is possible to install a [termination handler](https://mesosphere.github.io/marathon/docs/health-checks.html) (available since Marathon version 1.3) with each application whose responsibility it is to delay the shutdown process long enough until the backend has been taken out of load-balancing rotation with reasonable confidence (i.e., Traefik has received an update from the Marathon event bus, recomputes the available Marathon backends, and applies the new configuration).
Specifically, each termination handler should install a signal handler listening for a SIGTERM signal and implement the following steps on signal reception:
1. Keep accepting HTTP requests for a certain period of time.
1. Stop accepting new connections.
1. Finish serving any in-flight requests.
1. Shut down.
Traefik already ignores Marathon tasks whose state does not match `TASK_RUNNING`; since terminating tasks transition into the `TASK_KILLING` and eventually `TASK_KILLED` state, there is nothing further that needs to be done on Traefik's end.
How long HTTP requests should continue to be accepted in step 2 depends on how long Traefik needs to receive and process the Marathon configuration update.
Under regular operational conditions, it should be on the order of seconds, with 10 seconds possibly being a good default value.
Paired with termination handlers, they would cover for those cases where either the termination sequence or Traefik cannot complete their part of the orchestration process in time.
For that reason, the Treafik health check provides an additional check that responds more rapidly and does not require a configuration reload to happen.
Additionally, it protects from cases that the Marathon health check may not be able to cover, such as a network split.
It may seem obvious to reuse the Marathon health checks as a signal to Traefik whether an application should be taken into load-balancing rotation or not.
Apart from the increased latency a failing health check may have, a major problem with this is is that Marathon does not persist the health check results.
Consequently, if a master re-election occurs in the Marathon clusters, all health check results will revert to the _unknown_ state, effectively causing all applications inside the cluster to become unavailable and leading to a complete cluster failure.
Re-elections do not only happen during regular maintenance work (often requiring rolling upgrades of the Marathon nodes) but also when the Marathon leader fails spontaneously.
As such, there is no way to handle this situation deterministically.
Finally, Marathon health checks are not mandatory (the default is to use the task state as reported by Mesos), so requiring them for Traefik would raise the entry barrier for Marathon users.
Traefik used to use the health check results as a strict requirement but moved away from it as [users reported the dramatic consequences](https://github.com/containous/traefik/issues/653).
Additionally, the approach is less flexible compared to a custom termination handler since only the latter allows for the implementation of custom termination sequences that go beyond simple request draining (e.g., persisting a snapshot state to disk prior to terminating).