Metrics
There are two types of metrics exporters in an Xinference cluster:
Supervisor metrics exporter at <endpoint>/metrics, e.g. http://127.0.0.1:9997/metrics.
Worker metrics exporter on each worker node. Its host and port can be set with the --metrics-exporter-host and --metrics-exporter-port options of the xinference-local or xinference-worker command.
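To read these exporters from a script, a minimal sketch using only the Python standard library follows. It assumes the endpoints serve the Prometheus text exposition format (one `name{labels} value` sample per line, with `#` comment lines). `parse_metrics` and `fetch_metrics` are hypothetical helpers, not part of Xinference, and the parser deliberately skips edge cases such as escaped quotes or `}` inside label values.

```python
import re
import urllib.request

# Matches a sample line: metric name, optional {labels}, value, optional timestamp.
# Metric names may contain colons, e.g. xinference:time_to_first_token_ms.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+\S+)?\s*$'
)
# Matches one key="value" label pair; escaped quotes in values are not handled.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition text into {(name, sorted label pairs): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        labels = tuple(sorted(_LABEL_RE.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


def fetch_metrics(url, timeout=5.0):
    """Download the raw exposition text from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For the supervisor, a call would look like `parse_metrics(fetch_metrics("http://127.0.0.1:9997/metrics"))`. For a worker, substitute the host and port passed to --metrics-exporter-host and --metrics-exporter-port. In production, a Prometheus server scraping these endpoints is usually the better choice; this sketch is only for quick inspection.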
Supervisor Metrics
exceptions_total_counter (counter): Total number of requests that generated an exception
requests_total_counter (counter): Total number of requests received
responses_total_counter (counter): Total number of responses sent
status_codes_counter (counter): Total number of responses, counted per response status code
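Because these supervisor metrics are counters, each value is cumulative since the process started; a single reading says little on its own. A minimal sketch of turning two scrapes into a per-second rate and an error ratio is shown below. The helper names `counter_rate` and `error_ratio` are hypothetical, and the reset handling is a simplified two-sample version of what Prometheus's `rate()` does.

```python
def counter_rate(previous, current, elapsed_s):
    """Per-second increase of a cumulative counter between two scrapes."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # A counter only goes up; a smaller value means the process restarted,
    # so the current value is the whole increase since the reset.
    increase = current - previous if current >= previous else current
    return increase / elapsed_s


def error_ratio(exceptions_delta, requests_delta):
    """Fraction of requests in a window that raised an exception."""
    return exceptions_delta / requests_delta if requests_delta else 0.0
```

For example, if requests_total_counter goes from 1200 to 1500 over 60 seconds, `counter_rate(1200, 1500, 60)` gives 5.0 requests/s, and if exceptions_total_counter grew by 3 in the same window, `error_ratio(3, 300)` gives 0.01.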
Worker Metrics
xinference:generate_tokens_per_s (gauge): Generation throughput in tokens/s.
xinference:input_tokens_total_counter (counter): Total number of input tokens.
xinference:output_tokens_total_counter (counter): Total number of output tokens.
xinference:time_to_first_token_ms (gauge): First token latency in ms.
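Since each worker runs its own exporter, a cluster-wide view means combining readings from several workers. The sketch below assumes each worker scrape has already been parsed into a dict mapping `(name, labels)` to a value, with `labels` a tuple of `(key, value)` pairs, and groups series by a `model` label. That label name is a hypothetical assumption; check the labels your workers actually export. `combine_workers` is likewise a hypothetical helper.

```python
from collections import defaultdict


def combine_workers(worker_samples, metric, label="model", how=sum):
    """Combine one metric across workers, grouped by the value of one label.

    worker_samples: list of per-worker dicts {(name, labels): value}.
    label: label to group by; "model" is an assumed, hypothetical label name.
    how: reduction applied to each group's values (sum, max, ...).
    """
    grouped = defaultdict(list)
    for samples in worker_samples:
        for (name, labels), value in samples.items():
            if name == metric:
                # Series without the label are grouped under "".
                grouped[dict(labels).get(label, "")].append(value)
    return {key: how(values) for key, values in grouped.items()}
```

The reduction should match the metric type: summing the xinference:output_tokens_total_counter counters across workers gives a cluster total, whereas summing the xinference:time_to_first_token_ms gauge would be meaningless, so a reduction such as `max` (worst worker) fits latency better.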