# Diego Metrics

A list of component-level metrics emitted by Diego. Contributors interested in adding new metrics should visit our [contributor doc](contributing.md#Metrics) for a list of code conventions we follow.

* [Auctioneer](#auctioneer)
* [BBS](#bbs)
* [Locket](#locket)
* [Rep](#rep)
* [Route Emitter](#route-emitter)
* [SSH Proxy](#ssh-proxy)
* [General Golang metrics](#general-golang-metrics)

## Auctioneer

| Metric                                      | Description                                                                                                                                                | Unit             |
| ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----             |
| `AuctioneerFailedCellStateRequests`         | Cumulative number of cells the auctioneer failed to query for state. Emitted during each auction.                                                          | number           |
| `AuctioneerFetchStatesDuration`             | Time the auctioneer took to fetch state from all the cells when running its auction. Emitted during each auction.                                          | ns               |
| `AuctioneerLRPAuctionsFailed`               | Cumulative number of LRP instances that the auctioneer failed to place on Diego cells. Emitted during each auction.                                        | number           |
| `AuctioneerLRPAuctionsStarted`              | Cumulative number of LRP instances that the auctioneer successfully placed on Diego cells. Emitted during each auction.                                    | number           |
| `AuctioneerTaskAuctionsFailed`              | Cumulative number of Tasks that the auctioneer failed to place on Diego cells.  Emitted during each auction.                                               | number           |
| `AuctioneerTaskAuctionsStarted`             | Cumulative number of Tasks that the auctioneer successfully placed on Diego cells. Emitted during each auction.                                            | number           |
| `LockHeld`                                  | Whether an auctioneer holds the auctioneer lock (in locket): 1 means the lock is held, and 0 means the lock was lost. Emitted periodically by the active auctioneer.  | 0 or 1 (boolean) |
| `LockHeld.` `v1-locks-auctioneer_lock`      | Whether an auctioneer holds the auctioneer lock (in consul): 1 means the lock is held, and 0 means the lock was lost. Emitted periodically by the active auctioneer.  | 0 or 1 (boolean) |
| `LockHeldDuration.` `v1-locks-auctioneer_lock` | Time the active auctioneer has held the auctioneer lock. Emitted periodically by the active auctioneer.                                                 | ns               |
| `RequestCount`                              | Cumulative number of requests the auctioneer has handled through its API.  Emitted periodically.                                                           | number           |
| `RequestLatency`                            | Time the auctioneer took to handle requests to its API endpoints. Emitted when the auctioneer handles requests.                                            | ns               |

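Because the auction metrics above are cumulative counters, a rate has to be derived from the difference between two samples. The following is a minimal sketch, not part of Diego: the dict-of-samples format and the function names are hypothetical, and it assumes that a counter which decreases has restarted from zero.

```python
def interval_delta(previous, current):
    """Return the increase of a cumulative counter between two samples.

    Assumption: if the value went down, the emitting process restarted
    and the counter began again from zero.
    """
    if current < previous:
        return current
    return current - previous


def lrp_failure_ratio(prev, curr):
    """Fraction of LRP placements that failed between two samples.

    `prev` and `curr` are dicts keyed by metric name (hypothetical format).
    Returns None when no LRP auctions completed in the interval.
    """
    started = interval_delta(prev["AuctioneerLRPAuctionsStarted"],
                             curr["AuctioneerLRPAuctionsStarted"])
    failed = interval_delta(prev["AuctioneerLRPAuctionsFailed"],
                            curr["AuctioneerLRPAuctionsFailed"])
    total = started + failed
    if total == 0:
        return None
    return failed / total
```

The same delta approach applies to the other cumulative metrics in this document, such as `RequestCount`.
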
## BBS

| Metric                                                | Description                                                                                                                                                                                                                     | Unit              |   |
|-------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|---|
| `BBSMasterElected`                                    | Emitted once when the BBS is elected as master.                                                                                                                                                                                 | number (always 1) |   |
| `ConvergenceLRPDuration`                              | Time the BBS took to run its LRP convergence pass. Emitted every time LRP convergence runs.                                                                                                                                     | ns                |   |
| `ConvergenceLRPPreProcessingActualLRPsDeleted`        | Cumulative number of times the BBS has detected and deleted a malformed ActualLRP in its LRP convergence pass. Emitted periodically.                                                                                            | number            |   |
| `ConvergenceLRPPreProcessingMalformedRunInfos`        | Cumulative number of times the BBS has detected a malformed DesiredLRP RunInfo in its LRP convergence pass. Emitted periodically.                                                                                               | number            |   |
| `ConvergenceLRPPreProcessingMalformedSchedulingInfos` | Cumulative number of times the BBS has detected a malformed DesiredLRP SchedulingInfo in its LRP convergence pass. Emitted periodically.                                                                                        | number            |   |
| `ConvergenceLRPPreProcessingOrphanedRunInfos`         | Cumulative number of times the BBS has detected and deleted an orphaned DesiredLRP RunInfo in its LRP convergence pass. Emitted periodically.                                                                                   | number            |   |
| `ConvergenceLRPRuns`                                  | Cumulative number of times BBS has run its LRP convergence pass. Emitted periodically.                                                                                                                                          | number            |   |
| `ConvergenceTaskDuration`                             | Time the BBS took to run its Task convergence pass. Emitted every time Task convergence runs.                                                                                                                                   | ns                |   |
| `ConvergenceTaskRuns`                                 | Cumulative number of times the BBS has run its Task convergence pass. Emitted periodically.                                                                                                                                     | number            |   |
| `ConvergenceTasksKicked`                              | Cumulative number of times the BBS has updated a Task during its Task convergence pass. Emitted periodically.                                                                                                                   | number            |   |
| `ConvergenceTasksPruned`                              | Cumulative number of times the BBS has deleted a malformed Task during its Task convergence pass. Emitted periodically.                                                                                                         | number            |   |
| `CrashedActualLRPs`                                   | Total number of LRP instances that have crashed. Emitted periodically.                                                                                                                                                          | number            |   |
| `CrashingDesiredLRPs`                                 | Total number of DesiredLRPs that have at least one crashed instance. Emitted periodically.                                                                                                                                      | number            |   |
| `Domain.` `<domain-name>`                             | Whether the `<domain-name>` domain is up-to-date, so that instances from that domain have been synchronized with DesiredLRPs for Diego to run. 1 means the domain is up-to-date, no data means it is not. Emitted periodically. | always 1 when present |   |
| `EncryptionDuration`                                  | Time the BBS took to ensure all BBS records are encrypted with the current active encryption key. Emitted each time a BBS becomes the active master.                                                                            | ns                |   |
| `LRPsClaimed`                                         | Total number of LRP instances that have been claimed by some cell. Emitted periodically.                                                                                                                                        | number            |   |
| `LRPsDesired`                                         | Total number of LRP instances desired across all LRPs. Emitted periodically.                                                                                                                                                    | number            |   |
| `LRPsExtra`                                           | Total number of LRP instances that are no longer desired but still have a BBS record. Emitted periodically.                                                                                                                     | number            |   |
| `LRPsMissing`                                         | Total number of LRP instances that are desired but have no record in the BBS.  Emitted periodically.                                                                                                                            | number            |   |
| `LRPsRunning`                                         | Total number of LRP instances that are running on cells. Emitted periodically.                                                                                                                                                  | number            |   |
| `LRPsUnclaimed`                                       | Total number of LRP instances that have not yet been claimed by a cell. Emitted periodically.                                                                                                                                   | number            |   |
| `LockHeld`                                            | Whether a BBS holds the BBS lock (in locket): 1 means the lock is held, and 0 means the lock was lost. Emitted periodically by the active BBS server.                                                                           | 0 or 1 (boolean)  |   |
| `LockHeld.` `v1-locks-bbs_lock`                       | Whether a BBS holds the BBS lock (in consul): 1 means the lock is held, and 0 means the lock was lost. Emitted periodically by the active BBS server.                                                                           | 0 or 1 (boolean)  |   |
| `LockHeldDuration.` `v1-locks-bbs_lock`               | Time the active BBS has held the BBS lock (in consul). Emitted periodically by the active BBS server.                                                                                                                           | ns                |   |
| `MigrationDuration`                                   | Time the BBS took to run migrations against its persistence store. Emitted each time a BBS becomes the active master.                                                                                                           | ns                |   |
| `OpenFileDescriptors`                                 | Current (non-cumulative) number of open file descriptors held by the BBS. Emitted periodically.                                                                                                                                 | number            |   |
| `RequestCount`                                        | Cumulative number of requests the BBS has handled through its API. Emitted periodically.                                                                                                                                        | number            |   |
| `RequestLatency`                                      | Maximum amount of time the BBS took to handle a request to one of its API endpoints over a 60-second interval. Emitted every 60 seconds.                                                                                        | ns                |   |
| `TasksCompleted`                                      | Total number of Tasks that have completed. Emitted periodically.                                                                                                                                                                | number            |   |
| `TasksPending`                                        | Total number of Tasks that have not yet been placed on a cell. Emitted periodically.                                                                                                                                            | number            |   |
| `TasksResolving`                                      | Total number of Tasks locked for deletion. Emitted periodically.                                                                                                                                                                | number            |   |
| `TasksRunning`                                        | Total number of Tasks running on cells. Emitted periodically.                                                                                                                                                                   | number            |   |
| `TasksSucceeded`                                      | Cumulative number of Tasks that completed successfully. **Note:** This metric has a `cell-id` tag that can be used to get the per-cell metric.                                                                                  | number            |   |
| `TasksFailed`                                         | Cumulative number of Tasks that failed. **Note:** This metric has a `cell-id` tag that can be used to get the per-cell metric.                                                                                                  | number            |   |
| `TasksStarted`                                        | Cumulative number of Tasks that have started so far. **Note:** This metric has a `cell-id` tag that can be used to get the per-cell metric.                                                                                     | number            |   |

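The `Domain.<domain-name>` metric signals staleness by absence: a value of 1 means the domain is up-to-date, and missing data means it is not. A consumer therefore has to track when each domain was last seen. The sketch below illustrates one way to do that; the function name, the `last_seen` mapping, and the age threshold are all hypothetical and not part of Diego.

```python
def stale_domains(expected, last_seen, now, max_age_seconds):
    """Return the expected domains with no recent `Domain.<name>` data point.

    expected:        iterable of domain names the operator expects to be fresh
    last_seen:       dict mapping domain name to the timestamp (in seconds)
                     of the most recent data point received for it
    now:             current timestamp in seconds
    max_age_seconds: how long a domain may go unseen before it counts as stale
                     (an assumed, operator-chosen threshold)
    """
    stale = []
    for name in expected:
        seen = last_seen.get(name)
        if seen is None or now - seen > max_age_seconds:
            stale.append(name)
    return sorted(stale)
```

For example, with `expected = ["cf-apps", "cf-tasks"]` and data points received only for `cf-apps`, the function reports `cf-tasks` as stale. The domain names here are example values.
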


## Locket

| Metric                                               | Description                                                                                                                                                                                   | Unit   |
| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----   |
| `ActiveLocks`                                        | Total number of active locks. Emitted periodically.                                                                                                                                           | number |
| `ActivePresences`                                    | Total number of active presences. Emitted periodically.                                                                                                                                       | number |
| `DBOpenConnections`                                  | Number of open connections to the SQL database.                                                                                                                                               | number |
| `DBQueriesFailed`                                    | Cumulative number of SQL queries that failed.                                                                                                                                                 | number |
| `DBQueriesInFlight`                                  | Number of SQL queries currently in flight.                                                                                                                                                    | number |
| `DBQueriesTotal`                                     | Cumulative number of SQL queries executed, including `BEGIN`, `COMMIT`, and `ROLLBACK` statements.                                                                                            | number |
| `DBQueriesSucceeded`                                 | Cumulative number of SQL queries that finished successfully.                                                                                                                                  | number |
| `DBQueryDurationMax`                                 | Maximum duration of all queries that have run in the last 60 seconds. Emitted every 60 seconds.                                                                                               | ns     |
| `LocksExpired`                                       | Cumulative number of locks that have expired. Emitted when a lock is expired.                                                                                                                 | number |
| `PresenceExpired`                                    | Cumulative number of presences that have expired. Emitted when a presence is expired.                                                                                                         | number |
| `RequestsStarted`                                    | Cumulative number of requests of a particular type that have been made. Currently tracking `Lock`, `Release`, `Fetch`, and `FetchAll` requests. Emitted every 60 seconds.                     | number |
| `RequestsSucceeded`                                  | Cumulative number of requests of a particular type that have completed successfully. Currently tracking `Lock`, `Release`, `Fetch`, and `FetchAll` requests. Emitted every 60 seconds.        | number |
| `RequestsFailed`                                     | Cumulative number of requests of a particular type that have failed for any reason. Currently tracking `Lock`, `Release`, `Fetch`, and `FetchAll` requests. Emitted every 60 seconds.         | number |
| `RequestsInFlight`                                   | Number of requests of a particular type currently being handled by locket. Currently tracking `Lock`, `Release`, `Fetch`, and `FetchAll` requests. Emitted every 60 seconds.                  | number |
| `RequestLatencyMax`                                  | Maximum request latency emitted by a request of a particular type in the last 60 seconds. Currently tracking `Lock`, `Release`, `Fetch`, and `FetchAll` requests. Emitted every 60 seconds.   | ns     |


## Rep

| Metric                                               | Description                                                                                                                                                | Unit             |
| ---------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----             |
| `CapacityRemainingDisk`                              | Remaining amount of disk available for this cell to allocate to containers.  Emitted periodically.                                                         | mebibytes        |
| `CapacityRemainingMemory`                            | Remaining amount of memory available for this cell to allocate to containers.  Emitted periodically.                                                       | mebibytes        |
| `CapacityTotalDisk`                                  | Total amount of disk available for this cell to allocate to containers. Emitted periodically.                                                              | mebibytes        |
| `CapacityTotalMemory`                                | Total amount of memory available for this cell to allocate to containers.  Emitted periodically.                                                           | mebibytes        |
| `CapacityAllocatedDisk`                              | Amount of disk allocated to containers on this cell.  Emitted periodically.                                                                                | mebibytes        |
| `CapacityAllocatedMemory`                            | Amount of memory allocated to containers on this cell.  Emitted periodically.                                                                              | mebibytes        |
| `ContainerUsageDisk`                                 | Amount of disk used by containers on this cell.  Emitted periodically.                                                                                     | mebibytes        |
| `ContainerUsageMemory`                               | Amount of memory used by containers on this cell.  Emitted periodically.                                                                                   | mebibytes        |
| `CapacityRemainingContainers`                        | Remaining number of containers this cell can host. Emitted periodically.                                                                                   | number           |
| `CapacityTotalContainers`                            | Total number of containers this cell can host. Emitted periodically.                                                                                       | number           |
| `ContainerCount`                                     | Number of containers hosted on the cell. Emitted periodically.                                                                                             | number           |
| `CredCreationFailedCount`                            | Count of failed instance identity credential creations. Emitted after every failed credential creation.                                                    | number           |
| `CredCreationSucceededCount`                         | Count of successful instance identity credential creations. Emitted after every successful credential creation.                                            | number           |
| `CredCreationSucceededDuration`                      | Time the rep took to create instance identity credentials. Emitted after every successful credential creation.                                             | ns               |
| `GardenContainerCreationDuration`                    | Time the rep's Garden backend took to create a container. Emitted after every successful container creation. (Deprecated)                                  | ns               |
| `GardenContainerCreationSucceededDuration`           | Time the rep's Garden backend took to create a container. Emitted after every successful container creation.                                               | ns               |
| `GardenContainerCreationFailedDuration`              | Time the rep's Garden backend took to create a container. Emitted after every failed container creation.                                                   | ns               |
| `GardenContainerDestructionSucceededDuration`        | Time the rep's Garden backend took to destroy a container. Emitted after every successful container destruction.                                           | ns               |
| `GardenContainerDestructionFailedDuration`           | Time the rep's Garden backend took to destroy a container. Emitted after every failed container destruction.                                               | ns               |
| `RepBulkSyncDuration`                                | Time the cell rep took to synchronize the ActualLRPs it has claimed with its actual garden containers. Emitted periodically by each rep.                   | ns               |
| `StalledGardenDuration`                              | Time the rep waited for its garden backend to become healthy during startup. Emitted only if garden is not responsive when the rep starts up.              | ns               |
| `StartingContainerCount`                             | Number of containers currently in a Reserved, Initializing, or Created state. Emitted periodically.                                                        | number           |
| `StrandedEvacuatingActualLRPs`                       | Evacuating ActualLRPs that timed out during the evacuation process. Emitted when evacuation doesn't complete successfully.                                 | number           |
| `UnhealthyCell`                                      | Whether the cell has failed to pass its healthcheck against the garden backend.  0 signifies healthy, and 1 signifies unhealthy. Emitted periodically.     | 0 or 1 (boolean) |
| `VolmanMountDuration`                                | Time volman took to mount a volume. Emitted by each rep when volumes are mounted.                                                                          | ns               |
| `VolmanMountDurationFor`                             | Time volman took to mount a volume with a specific volume driver. Emitted by each rep when volumes are mounted.                                            | ns               |
| `VolmanMountErrors`                                  | Count of failed volume mounts. Emitted periodically by each rep.                                                                                           | number           |
| `VolmanUnmountDuration`                              | Time volman took to unmount a volume. Emitted by each rep when volumes are unmounted.                                                                      | ns               |
| `VolmanUnmountDurationFor`                           | Time volman took to unmount a volume with a specific volume driver. Emitted by each rep when volumes are unmounted.                                        | ns               |
| `VolmanUnmountErrors`                                | Count of failed volume unmounts. Emitted periodically by each rep.                                                                                         | number           |

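The capacity metrics above can be combined to show how much of a cell's capacity is already allocated. This is a minimal sketch with a hypothetical function name; it assumes that allocated capacity equals total minus remaining, and that both inputs use the same unit (mebibytes for disk and memory).

```python
def allocated_percent(total, remaining):
    """Percentage of a cell's capacity already allocated to containers.

    total:     e.g. a `CapacityTotalMemory` sample
    remaining: e.g. a `CapacityRemainingMemory` sample
    Returns None when the total is not positive, to avoid dividing by zero.
    """
    if total <= 0:
        return None
    return 100.0 * (total - remaining) / total
```

The same calculation works for disk (`CapacityTotalDisk`, `CapacityRemainingDisk`) and for container counts (`CapacityTotalContainers`, `CapacityRemainingContainers`).
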
## Route Emitter

| Metric                                            | Description                                                                                                                                                            | Unit             |
| ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------- |
| `AddressCollisions`                               | Number of detected conflicting routes. A conflicting route is a set of two distinct instances with the same IP address on the routing table.                           | number           |
| `ConsulDownMode`                                  | Whether the route-emitter is unable to connect to Consul: 1 means the route-emitter is in Consul-down mode, and 0 means it can reach Consul.                           | 0 or 1 (boolean) |
| `HTTPRouteCount`                                  | Number of HTTP route associations (route-endpoint pairs) in the route-emitter's routing table. Emitted periodically when emitter is in local mode.                     | number           |
| `HTTPRouteNATSMessagesEmitted`                    | Cumulative number of HTTP routing messages the route-emitter sends over NATS to the gorouter.                                                                          | number           |
| `InternalRouteNATSMessagesEmitted`                | Cumulative number of internal routing messages the route-emitter sends over NATS to the service discovery controller.                                                  | number           |
| `LockHeld.` `v1-locks-route_emitter_lock`         | Whether a route-emitter holds its Consul lock: 1 means the lock is held, and 0 means the lock was lost. Emitted periodically by the active route-emitter.              | 0 or 1 (boolean) |
| `LockHeldDuration.` `v1-locks-route_emitter_lock` | Time the active route-emitter has held the Consul lock. Emitted periodically by the active route-emitter.                                                              | ns               |
| `RouteEmitterSyncDuration`                        | Time the route-emitter took to perform its synchronization pass. Emitted periodically.                                                                                 | ns               |
| `RoutesRegistered`                                | Cumulative number of NATS route registrations emitted from the route-emitter as it reacts to changes to LRPs.                                                          | number           |
| `RoutesSynced`                                    | Cumulative number of route registrations emitted from the route-emitter during its periodic route-table emission.                                                      | number           |
| `RoutesTotal`                                     | Number of combined HTTP and TCP route associations (route-endpoint pairs) in the route-emitter's routing table. Emitted periodically.                                  | number           |
| `RoutesUnregistered`                              | Cumulative number of NATS route unregistrations emitted from the route-emitter as it reacts to changes to LRPs.                                                        | number           |
| `TCPRouteCount`                                   | Number of TCP route associations (route-endpoint pairs) in the route-emitter's routing table. Emitted periodically when emitter is in local mode.                      | number           |
| `MessagesEmitted`                                 | Cumulative number of routing messages the route-emitter sends over NATS. Deprecated in favor of `HTTPRouteNATSMessagesEmitted` and `InternalRouteNATSMessagesEmitted`. | number           |

## SSH Proxy

| Metric                                      | Description                                                                                                                                                | Unit   |
| ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----   |
| `ssh-connections`                           | Total number of SSH connections an SSH proxy has established. Emitted periodically by each SSH proxy.                                                      | number |

## General Golang metrics

These metrics are automatically emitted by dropsonde on all the Diego components.

| Metric                                      | Description                                                                                                                                                | Unit   |
| ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----   |
| `memoryStats.lastGCPauseTimeNS`             | Amount of time the Golang process paused for garbage collection.                                                                                           | ns     |
| `memoryStats.numBytesAllocatedHeap`         | Number of bytes the Golang process has allocated on the heap.                                                                                              | bytes  |
| `memoryStats.numBytesAllocatedStack`        | Number of bytes the Golang process has allocated on the stack.                                                                                             | bytes  |
| `memoryStats.numBytesAllocated`             | Total number of bytes allocated by the Golang process.                                                                                                     | bytes  |
| `memoryStats.numFrees`                      | Number of memory deallocations the Golang process has performed.                                                                                           | number |
| `memoryStats.numMallocs`                    | Number of memory allocations the Golang process has performed.                                                                                             | number |
| `numCPUS`                                   | Number of CPU cores available for the Golang process to use.                                                                                               | number |
| `numGoRoutines`                             | Number of goroutines the Golang process is running.                                                                                                        | number |
