Metric - Lin

Each component of LinDB provides self-monitoring metrics to help users understand running status.

By default, LinDB periodically stores the latest self-monitoring metric data in the _internal database.

Metrics fall into the following categories:

General: general metrics such as CPU, memory, and network, applicable to Root, Broker, and Storage;
Broker: Broker internal monitoring metrics;
Storage: Storage internal monitoring metrics;

All metrics are labeled with global tags as follows:

node: component's node;

TIP

Because LinDB supports multiple storage clusters (Storage) under one compute cluster (Broker), metrics reported under Storage carry an additional 'namespace' tag that identifies the storage cluster.
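
As an illustration of how the global tags can be used, the sketch below groups values by 'namespace' and 'node'. The sample layout, tag values, and the helper name sum_by_tags are assumptions made for this example; they are not LinDB's storage format or API.

```python
from collections import defaultdict

# Hypothetical samples; this dict layout is an assumption for illustration,
# not LinDB's storage or wire format.
SAMPLES = [
    {"tags": {"namespace": "cluster-a", "node": "10.0.0.1"}, "field": "write_metrics", "value": 120.0},
    {"tags": {"namespace": "cluster-a", "node": "10.0.0.2"}, "field": "write_metrics", "value": 80.0},
    {"tags": {"namespace": "cluster-b", "node": "10.0.1.1"}, "field": "write_metrics", "value": 50.0},
]


def sum_by_tags(samples, field, tag_keys):
    """Sum the values of one field, grouped by the given tag keys."""
    totals = defaultdict(float)
    for sample in samples:
        if sample["field"] != field:
            continue
        # A missing tag is treated as an empty string so grouping still works.
        key = tuple(sample["tags"].get(k, "") for k in tag_keys)
        totals[key] += sample["value"]
    return dict(totals)


# Totals per storage cluster, then per node within each cluster.
per_cluster = sum_by_tags(SAMPLES, "write_metrics", ["namespace"])
per_node = sum_by_tags(SAMPLES, "write_metrics", ["namespace", "node"])
```

Grouping by 'namespace' alone gives one total per storage cluster, while adding 'node' separates the individual nodes inside each cluster.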

General

Go Runtime

Metric Name	Tags	Fields	Description
lindb.runtime	-	go_goroutines	the number of goroutines
lindb.runtime	-	go_threads	the number of records in the thread creation profile
lindb.runtime.mem	-	alloc	bytes of allocated heap objects
		total_alloc	cumulative bytes allocated for heap objects
		sys	the total bytes of memory obtained from the OS
		lookups	the number of pointer lookups performed by the runtime
		mallocs	the cumulative count of heap objects allocated
		frees	the cumulative count of heap objects freed
		heap_alloc	bytes of allocated heap objects
		heap_sys	bytes of heap memory obtained from the OS
		heap_idle	bytes in idle (unused) spans
		heap_inuse	bytes in in-use spans
		heap_released	bytes of physical memory returned to the OS
		heap_objects	the number of allocated heap objects
		stack_inuse	bytes in stack spans
		stack_sys	bytes of stack memory obtained from the OS
		mspan_inuse	bytes of allocated mspan structures
		mspan_sys	bytes of memory obtained from the OS for mspan
		mcache_inuse	bytes of allocated mcache structures
		mcache_sys	bytes of memory obtained from the OS for mcache structures
		buck_hash_sys	bytes of memory in profiling bucket hash tables
		gc_sys	bytes of memory in garbage collection metadata
		other_sys	bytes of memory in miscellaneous off-heap runtime allocations
		next_gc	the target heap size of the next GC cycle
		last_gc	the time the last garbage collection finished
		gc_cpu_fraction	the fraction of this program's available CPU time used by the GC since the program started
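
Several of the lindb.runtime.mem fields are easier to read in combination. The sketch below derives three values using the relationships described in Go's runtime.MemStats documentation. The function name heap_summary and the sample numbers are hypothetical, and the input is assumed to be a plain mapping of field names to values.

```python
MIB = 1024 * 1024


def heap_summary(mem):
    """Derive a few readable values from lindb.runtime.mem fields."""
    return {
        # Idle memory that could be returned to the OS but is still retained.
        "retained_idle_bytes": mem["heap_idle"] - mem["heap_released"],
        # Memory dedicated to size classes but not currently holding objects.
        "size_class_slack_bytes": mem["heap_inuse"] - mem["heap_alloc"],
        # Objects allocated and not yet freed.
        "live_objects": mem["mallocs"] - mem["frees"],
    }


# Hypothetical values for illustration only.
sample = {
    "heap_idle": 40 * MIB,
    "heap_released": 10 * MIB,
    "heap_inuse": 60 * MIB,
    "heap_alloc": 52 * MIB,
    "mallocs": 1_000_000,
    "frees": 900_000,
}
summary = heap_summary(sample)
```

A steadily growing retained_idle_bytes value suggests the runtime is holding memory it no longer needs, while a large size_class_slack_bytes value is an upper bound on fragmentation.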

System

Metric Name	Tags	Fields	Description
lindb.monitor.system.cpu_stat	-	idle	CPU time that's not actively being used
		nice	CPU time used by processes that have a positive niceness
		system	CPU time used by the kernel
		user	CPU time used by user space processes
		irq	CPU time spent servicing hardware interrupts
		steal	Percentage of time a virtual CPU waits for a real CPU
		softirq	CPU time spent servicing software interrupts (softirqs)
		iowait	CPU time spent waiting for I/O operations to complete
lindb.monitor.system.mem_stat	-	total	Total amount of RAM on this system
		used	RAM used by programs
		free	Free RAM
		usage	Percentage of RAM used by programs
lindb.monitor.system.disk_usage_stats	-	total	Total amount of disk
		used	Disk used by programs
		free	Free disk
		usage	Percentage of disk used by programs
lindb.monitor.system.disk_inodes_stats	-	total	Total number of inodes
		used	Inodes in use
		free	Free inodes
		usage	Percentage of inodes in use
lindb.monitor.system.net_stat	interface	bytes_sent	number of bytes sent
		bytes_recv	number of bytes received
		packets_sent	number of packets sent
		packets_recv	number of packets received
		errin	total number of errors while receiving
		errout	total number of errors while sending
		dropin	total number of incoming packets which were dropped
		dropout	total number of outgoing packets which were dropped (always 0 on OSX and BSD)
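
The lindb.monitor.system.cpu_stat fields can be combined into a single utilization figure. The sketch below computes the non-idle share of CPU time; it assumes all fields share one unit over the same interval, and it counts iowait as busy, which is a convention choice rather than something this page defines. The function name cpu_busy_ratio and the sample values are hypothetical.

```python
CPU_FIELDS = ("idle", "nice", "system", "user", "irq", "steal", "softirq", "iowait")


def cpu_busy_ratio(cpu):
    """Return the non-idle share of CPU time as a value between 0 and 1."""
    total = sum(cpu[name] for name in CPU_FIELDS)
    if total <= 0:
        # No data for the interval; report 0 instead of dividing by zero.
        return 0.0
    return (total - cpu["idle"]) / total


# Hypothetical values for illustration only.
cpu_sample = {
    "idle": 70.0, "nice": 0.0, "system": 10.0, "user": 15.0,
    "irq": 1.0, "steal": 0.0, "softirq": 1.0, "iowait": 3.0,
}
busy = cpu_busy_ratio(cpu_sample)
```

Because the result is a ratio, it stays the same whether the fields are reported as time or as percentages, as long as they share one unit.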

Network

Metric Name	Tags	Fields	Description
lindb.traffic.tcp	addr	accept_conns	accept total count
		accept_failures	accept failure
		active_conns	current active connections
		reads	read total count
		read_bytes	read byte size
		read_failures	read failure
		writes	write total count
		write_bytes	write byte size
		write_failures	write failure
		close_conns	close total count
		close_failures	close failure
lindb.traffic.grpc_client.unary	grpc_service grpc_method	failures	grpc unary client handle msg failure
lindb.traffic.grpc_client.unary.duration	grpc_service grpc_method	histogram	grpc unary client handle msg duration
lindb.traffic.grpc_server.unary	grpc_service grpc_method	failures	grpc unary server handle msg failure
lindb.traffic.grpc_server.unary.duration	grpc_service grpc_method	histogram	grpc unary server handle msg duration
lindb.traffic.grpc_client.stream	grpc_service grpc_method	msg_received_failures	grpc client receive msg failure
lindb.traffic.grpc_client.stream	grpc_service grpc_method	msg_sent_failures	grpc client send msg failure
lindb.traffic.grpc_client.stream.received_duration	grpc_service grpc_method	histogram	grpc client receive msg duration, including receive total count and handle duration
lindb.traffic.grpc_client.stream.sent_duration	grpc_service grpc_method	histogram	grpc client send msg duration, including send total count
lindb.traffic.grpc_server.stream	grpc_service grpc_method	msg_received_failures	grpc server receive msg failure
lindb.traffic.grpc_server.stream	grpc_service grpc_method	msg_sent_failures	grpc server send msg failure
lindb.traffic.grpc_server.stream.received_duration	grpc_service grpc_method	histogram	grpc server receive msg duration, including receive total count and handle duration
lindb.traffic.grpc_server.stream.sent_duration	grpc_service grpc_method	histogram	grpc server send msg duration, including send total count
lindb.traffic.grpc_server	-	panics	panic when grpc server handle request

Concurrent

Metric Name	Tags	Fields	Description
lindb.concurrent.pool	pool_name	workers_alive	current workers count in use
		workers_created	workers created count since start
		workers_killed	workers killed count since start
		tasks_consumed	number of tasks consumed by workers
		tasks_rejected	number of tasks rejected by the pool
		tasks_panic	number of tasks that panicked during execution
lindb.concurrent.pool.tasks_waiting_duration	pool_name	histogram	task waiting time
lindb.concurrent.pool.tasks_executing_duration	pool_name	histogram	task executing time with waiting period
lindb.concurrent.limit	type	throttle_requests	number of requests throttled after reaching the max concurrency
		timeout_requests	number of requests that timed out while pending
		processed	number of processed requests

Coordinator

Metric Name	Tags	Fields	Description
lindb.coordinator.state_manager	type,coordinator	handle_events	handle coordinator event success count
		handle_event_failures	handle coordinator event failure count
		panics	panic count when handling coordinator events

Query

Applicable to Root, Broker.

Metric Name	Tags	Fields	Description
lindb.query	-	created_tasks	create query tasks
		alive_tasks	tasks currently executing (alive)
		expire_tasks	tasks expired after a long period without a response
		emitted_responses	responses emitted to the parent node
		omitted_responses	responses omitted because the task was evicted
lindb.task.transport	-	sent_requests	send request successfully
		sent_requests_failures	send request failure
		sent_responses	send response successfully
		sent_responses_failures	send response failure

Broker

Metric Name	Tags	Fields	Description
lindb.master.shard.leader	-	elections	shard leader elect successfully
lindb.master.shard.leader	-	elect_failures	shard leader elect failure
lindb.master.controller	-	failovers	master fail over successfully
		failover_failures	master fail over failure
		reassigns	master reassign successfully
		reassign_failures	master reassign failure
lindb.http.ingest_duration	path	histogram	ingest duration(include count)
lindb.ingestion.proto	-	data_corrupted	corrupted when parse
		ingested_metrics	ingested metrics
		read_bytes	read data bytes
		dropped_metrics	drop metrics when append
lindb.ingestion.flat	-	data_corrupted	corrupted when parse
		ingested_metrics	ingested metrics
		read_bytes	read data bytes
		dropped_metrics	drop metrics when append
	size	block	read data block size
lindb.ingestion.influx	-	data_corrupted	corrupted when parse
		ingested_metrics	ingested metrics
		ingested_fields	ingested fields
		read_bytes	read data bytes
		dropped_metrics	drop metrics when append
		dropped_fields	drop fields when append
lindb.broker.database.write	db	out_of_time_range	timestamp of metrics out of acceptable write time range
lindb.broker.database.write	db	shard_not_found	shard not found count
lindb.broker.family.write	db	active_families	number of current active replica family channel
		batch_metrics	batch into memory chunk success count
		batch_metrics_failures	batch into memory chunk failure count
		pending_send	number of pending send message
		send_success	send message success count
		send_failures	send message failure count
		send_size	bytes of send message
		retry	retry count
		retry_drop	number of messages dropped after too many retries
		create_stream	create replica stream success count
		create_stream_failures	create replica stream failure count
		close_stream	close replica stream success count
		close_stream_failures	close replica stream failure count
		leader_changed	shard leader changed

Storage

Metric Name	Tags	Fields	Description
lindb.storage.wal	db shard	receive_write_bytes	receive write request bytes(broker->leader)
		write_wal	write wal successfully(broker->leader)
		write_wal_failures	write wal failure(broker->leader)
		receive_replica_bytes	receive replica request bytes(storage leader->follower)
		replica_wal	replica wal successfully(storage leader->follower)
		replica_wal_failures	replica wal failure(storage leader->follower)
lindb.storage.replicator.runner	type db shard	active_replicators	number of current active local replicators
		replica_panics	replica panic count
		consume_msg	get message successfully count
		consume_msg_failures	get message failure count
		replica_lag	replica lag message count
		replica_bytes	bytes of replica data
		replicas	replica success count
lindb.storage.replica.local	db shard	decompress_failures	decompress message failure count
		replica_failures	replica failure count
		replica_rows	row number of replica
		ack_sequence	ack persist sequence count
		invalid_sequence	invalid replica sequence count
lindb.storage.replica.remote	db shard	not_ready	remote replicator channel not ready
		follower_offline	remote follower node offline
		need_close_last_stream	need close last stream, when do re-connection
		close_last_stream_failures	close last stream failure
		create_replica_cli	create replica client successfully
		create_replica_cli_failures	create replica client failure
		create_replica_stream	create replica stream successfully
		create_replica_stream_failures	create replica stream failure
		get_last_ack_failures	get last ack sequence from remote follower failure
		reset_follower_append_idx	reset follower append index successfully
		reset_follower_append_idx_failures	reset follower append index failure
		reset_append_idx	reset current leader local append index
		reset_replica_idx	reset current leader replica index successfully
		reset_replica_failures	reset current leader replica index failure
		send_msg	send replica msg successfully
		send_msg_failures	send replica msg failure
		receive_msg	receive replica resp successfully
		receive_msg_failures	receive replica resp failure
		ack_sequence	ack replica successfully sequence count
		invalid_ack_sequence	get wrong replica ack sequence from follower
lindb.tsdb.indexdb	db	build_inverted_index	build inverted index count
lindb.tsdb.memdb	db	allocated_pages	allocate temp memory page successfully
lindb.tsdb.memdb	db	allocate_page_failures	allocate temp memory page failure
lindb.tsdb.database	db	metadb_flush_failures	flush metadata database failure
lindb.tsdb.database.metadb_flush_duration	db	histogram	flush metadata database duration(include count)
lindb.tsdb.metadb	db	gen_metric_ids	generate metric id successfully
		gen_metric_id_failures	generate metric id failure
		gen_tag_key_ids	generate tag key id successfully
		gen_tag_key_id_failures	generate tag key id failure
		gen_field_ids	generate field id successfully
		gen_field_id_failures	generate field id failure
		gen_tag_value_ids	generate tag value id successfully
		gen_tag_value_id_failures	generate tag value id failure
lindb.tsdb.shard	db shard	active_families	number of current active families
		write_batches	write batch count
		write_metrics	write metric success count
		write_fields	write field data point success count
		write_metrics_failures	write metric failures
		memdb_total_size	total memory size of memory database
		active_memdbs	number of current active memory database
		memdb_flush_failures	flush memory database failure
		lookup_metric_meta_failures	lookup meta of metric failure
		indexdb_flush_failures	flush index database failure
lindb.tsdb.shard.memdb_flush_duration	db shard	histogram	flush memory database duration(include count)
lindb.tsdb.shard.indexdb_flush_duration	db shard	indexdb_flush_duration	flush index database duration(include count)
lindb.kv.table.cache	-	evicts	evict reader from cache
		cache_hits	get reader hit cache
		cache_misses	get reader miss cache
		closes	close reader successfully
		close_failures	close reader failure
		active_readers	number of active reader in cache
lindb.kv.table.read	-	gets	get data by key successfully
		get_failures	get data by key failures
		read_bytes	bytes of read data
		mmaps	map file successfully
		mmap_failures	map file failure
		unmmaps	unmap file successfully
		unmmap_failures	unmap file failure
lindb.kv.table.write	-	bad_keys	add bad key count
		add_keys	add key successfully
		write_bytes	bytes of write data
lindb.kv.compaction	type	compacting	number of compacting jobs
lindb.kv.compaction	type	failure	compact failure
lindb.kv.compaction.duration	type	histogram	compact duration(include count)
lindb.kv.flush	-	flushing	number of flushing jobs
lindb.kv.flush	-	failure	flush job failure
lindb.kv.flush.duration	-	histogram	flush duration(include count)
lindb.storage.query	-	metric_queries	execute metric query successfully(just plan it)
		metric_query_failures	execute metric query failure
		meta_queries	metadata query successfully
		meta_query_failures	metadata query failure
		omitted_requests	omitted requests (task does not belong to the current node, wrong stream, etc.)
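
Paired success and miss counters in this table can be turned into ratios. As a minimal sketch, the function below computes a reader cache hit ratio from the cache_hits and cache_misses fields of lindb.kv.table.cache. It assumes both values are counts taken over the same time window; the function name hit_ratio and the sample numbers are hypothetical.

```python
def hit_ratio(cache_hits, cache_misses):
    """Return hits / (hits + misses), or None when there were no lookups."""
    lookups = cache_hits + cache_misses
    if lookups == 0:
        # No lookups in the window, so a ratio would be meaningless.
        return None
    return cache_hits / lookups


# Hypothetical counts for illustration only.
ratio = hit_ratio(cache_hits=900, cache_misses=100)
```

Returning None for an empty window keeps "no traffic" distinct from "every lookup missed", which would otherwise both appear as 0.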
