Time series data comes with a lot of terminology, and some of it can cause confusion. This page is a glossary that defines terms related to the use of OpenTSDB.
Cardinality is a mathematical term defined as the number of elements in a set. In database lingo, it is often used to refer to the number of unique items in an index. With regard to OpenTSDB, it can refer to:
Due to the nature of the OpenTSDB storage schema, metrics with higher cardinality may take longer to return results during query execution than those with lower cardinality. For example, suppose metric foo has the tag name datacenter with 100 possible values for datacenter, while metric bar has the tag host with 50,000 possible values for host. Metric bar has a higher cardinality than foo: 50,000 possible time series for bar and only 100 for foo.
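As a rough illustration of the first sense of cardinality, the sketch below counts the unique time series (metric plus tag name/value pairs) per metric. The sample observations and the helper name series_cardinality are hypothetical and are not part of OpenTSDB.

```python
from collections import defaultdict

# Hypothetical observations: (metric, tags) pairs as they might be written.
observations = [
    ("foo", {"datacenter": "dc1"}),
    ("foo", {"datacenter": "dc2"}),
    ("foo", {"datacenter": "dc1"}),  # repeats an existing series
    ("bar", {"host": "web01"}),
    ("bar", {"host": "web02"}),
    ("bar", {"host": "web03"}),
]


def series_cardinality(observations):
    """Count the unique (metric, tag set) combinations for each metric."""
    seen = defaultdict(set)
    for metric, tags in observations:
        # A frozenset of the tag pairs is hashable and ignores tag order.
        seen[metric].add(frozenset(tags.items()))
    return {metric: len(series) for metric, series in seen.items()}


cardinality = series_cardinality(observations)
print(cardinality)  # {'foo': 2, 'bar': 3}
```

In this toy data set, bar has the higher cardinality because it has more distinct tag values, which mirrors the foo and bar comparison above at a much smaller scale.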
An OpenTSDB compaction takes multiple columns in an HBase row and merges them into a single column to reduce disk space. This is not to be confused with HBase compactions where multiple edits to a region are merged into one. OpenTSDB compactions can occur periodically for a TSD after data has been written, or during a query.
Each of the metrics above can be recorded as a number at a specific time. For example, we could record that Sue worked 8 hours at the end of each day. Or that "mylogo.jpg" was downloaded 400 times in the past hour. Thus a datapoint consists of:
A metric is simply the name of a quantitative measurement. Metrics include things like:
Note
Notice that a metric does not include a specific number or a time. That is because a metric is just a label for what you are measuring. The actual measurements are called datapoints, as you'll see later.
Unfortunately OpenTSDB requires metrics to be named as a single, long word without spaces. Thus metrics are usually recorded using "dotted notation". For example, the metrics above would have names like:
A metric should be descriptive of what is being measured, but with OpenTSDB, it should not be too specific. Instead, it is better to use tags to differentiate and organize different items that may share a common metric. Tags are pairs of words that provide a means of associating a metric with a specific item. Each pair consists of a tagk that represents the group or category of the following tagv, which represents a specific item, object, location or other noun.
Expanding on the metric examples above:
A tagk of employee with the employees' names as the tagv. These would be recorded as employee=sue, employee=john, etc.
A tagk of file to arrive at file=logo.jpg or file=index.php.
A tagk of region to get region=new_england or region=north_west.
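To make the tagk=tagv pairing concrete, here is a minimal sketch that renders a metric, timestamp, value and tag set as a single line in the style of OpenTSDB's telnet put command. The metric name hours.worked, the timestamp and the helper format_put are assumptions made for illustration; consult the OpenTSDB documentation for the exact input format your version accepts.

```python
def format_put(metric, timestamp, value, tags):
    """Render one measurement as a telnet-style 'put' line (illustrative only)."""
    if any(ch.isspace() for ch in metric):
        raise ValueError("metric names must not contain spaces")
    if not tags:
        raise ValueError("at least one tagk=tagv pair is expected")
    # Sort the tag names so the same tag set always yields the same line.
    tag_part = " ".join(f"{tagk}={tagv}" for tagk, tagv in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_part}"


# Hypothetical metric name and timestamp (seconds since the Unix epoch).
line = format_put("hours.worked", 1356998400, 8, {"employee": "sue"})
print(line)  # put hours.worked 1356998400 8 employee=sue
```

Keeping the metric generic (hours.worked) and moving the employee into a tag lets one metric cover every employee rather than needing a separate metric for each person.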
A collection of two or more data points for a single metric and group of tag name/value pairs.
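The sketch below shows how data points could be grouped into time series by their metric and tag name/value pairs. It is an in-memory model for explanation only and does not reflect how OpenTSDB stores data; the DataPoint type, the helper names and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class DataPoint(NamedTuple):
    metric: str
    tags: Tuple[Tuple[str, str], ...]  # sorted (tagk, tagv) pairs
    timestamp: int                     # seconds since the Unix epoch
    value: float


def make_point(metric, tags, timestamp, value):
    """Build a DataPoint whose tag pairs are sorted and therefore hashable."""
    return DataPoint(metric, tuple(sorted(tags.items())), timestamp, value)


def group_into_series(points):
    """Group points by (metric, tags); each group is one time series."""
    series = defaultdict(list)
    for point in points:
        series[(point.metric, point.tags)].append((point.timestamp, point.value))
    for samples in series.values():
        samples.sort()  # order each series by timestamp
    return dict(series)


points = [
    make_point("hours.worked", {"employee": "sue"}, 1356998400, 8),
    make_point("hours.worked", {"employee": "john"}, 1356998400, 7),
    make_point("hours.worked", {"employee": "sue"}, 1357084800, 6),
]
series = group_into_series(points)
sue_key = ("hours.worked", (("employee", "sue"),))
print(len(series))      # 2
print(series[sue_key])  # [(1356998400, 8), (1357084800, 6)]
```

Both of Sue's data points land in the same series because they share a metric and a tag set, while John's data point forms a separate series under the same metric.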
Timestamps are simply the absolute time when a value for a given metric was recorded.
A value represents the actual numeric measurement of the given metric. One of our employees, Sue, worked 8 hours yesterday, thus the value would be 8. There were 1,024 downloads of logo.jpg from our webserver in the past hour. And 12 inches of snow fell in New England today.
© 2010–2016 The OpenTSDB Authors
Licensed under the GNU LGPLv2.1+ and GPLv3+ licenses.
http://opentsdb.net/docs/build/html/user_guide/definitions.html