Monitoring using Prometheus¶
Example prometheus.yml¶
lakeFS exposes metrics through the same port used by the lakeFS service, using the standard /metrics path.
An example could look like this:
prometheus.yml
Metrics exposed by lakeFS¶
By default, Prometheus exports metrics with OS process information like memory and CPU. It also includes Go-specific metrics such as details about GC and a number of goroutines. You can learn about these default metrics in this post.
In addition, lakeFS exposes the following metrics to help monitor your deployment:
| Name in Prometheus | Description | Labels | 
|---|---|---|
| api_requests_total | lakeFS API requests (counter) | code: http status method: http method | 
| api_request_duration_seconds | Durations of lakeFS API requests (histogram) | operation: name of API operation code: http status | 
| gateway_request_duration_seconds | lakeFS S3-compatible endpoint request (histogram) | operation: name of gateway operation code: http status | 
| s3_operation_duration_seconds | Outgoing S3 operations (histogram) | operation: operation name error: "true" if error, "false" otherwise | 
| gs_operation_duration_seconds | Outgoing Google Storage operations (histogram) | operation: operation name error: "true" if error, "false" otherwise | 
| azure_operation_duration_seconds | Outgoing Azure storage operations (histogram) | operation: operation name error: "true" if error, "false" otherwise | 
| kv_request_duration_seconds | Durations of KV requests(histogram) | operation: name of KV operation type: KV type(dynamodb, postgres, etc) | 
| dynamo_request_duration_seconds | Time spent doing DynamoDB requests | operation: DynamoDB operation name | 
| dynamo_consumed_capacity_total | The capacity units consumed by operation | operation: DynamoDB operation name | 
| dynamo_failures_total | The total number of errors while working for kv store | operation: DynamoDB operation name | 
| pgxpool_acquire_count | PostgreSQL cumulative count of successful acquires from the pool | db_name default to the kv table name (kv) | 
| pgxpool_acquire_duration_ns | PostgreSQL total duration of all successful acquires from the pool in nanoseconds | db_name default to the kv table name (kv) | 
| pgxpool_acquired_conns | PostgreSQL number of currently acquired connections in the pool | db_name default to the kv table name (kv) | 
| pgxpool_canceled_acquire_count | PostgreSQL cumulative count of acquires from the pool that were canceled by a context | db_name default to the kv table name (kv) | 
| pgxpool_constructing_conns | PostgreSQL number of conns with construction in progress in the pool | db_name default to the kv table name (kv) | 
| pgxpool_empty_acquire | PostgreSQL cumulative count of successful acquires from the pool that waited for a resource to be released or constructed because the pool was empty | db_name default to the kv table name (kv) | 
| pgxpool_idle_conns | PostgreSQL number of currently idle conns in the pool | db_name default to the kv table name (kv) | 
| pgxpool_max_conns | PostgreSQL maximum size of the pool | db_name default to the kv table name (kv) | 
| pgxpool_total_conns | PostgreSQL total number of resources currently in the pool | db_name default to the kv table name (kv) | 
Example queries¶
Note
when using Prometheus functions like rate or increase, results are extrapolated and may not be exact.
99th percentile of API request latencies
```
sum by (operation)(histogram_quantile(0.99, rate(api_request_duration_seconds_bucket[1m]))) ```
50th percentile of S3-compatible API latencies
Number of errors in outgoing S3 requests
