Add Azure Monitor output plugin by gunnaraasen · Pull Request #4089 · influxdata/telegraf

gunnaraasen · 2018-04-30T21:25:29Z

Required for all PRs:

Signed CLA.
Associated README.md updated.
Has appropriate unit tests.

This is a new output plugin for Azure Monitor. I will be adding more unit tests soon.

danielnelson · 2018-05-01T02:25:01Z

plugins/outputs/azuremonitor/README.md

+
+```
+# Configuration for sending aggregate metrics to Azure Monitor
+[[outputs.azuremonitor]]


Not required, but consider renaming this to azure_monitor.

danielnelson · 2018-05-01T02:25:28Z

plugins/outputs/azuremonitor/README.md

+## specified, the plugin will attempt to retrieve the resource ID
+## of the VM via the instance metadata service (optional if running 
+## on an Azure VM with MSI)
+#resourceId = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachines/<vm-name>"


Use snake_case for all options.

danielnelson · 2018-05-01T04:19:46Z

plugins/outputs/azuremonitor/azuremetadata.go

+	}
+
+	if resp.StatusCode >= 300 || resp.StatusCode < 200 {
+		return nil, fmt.Errorf("Post Error. HTTP response code:%d message:%s, content: %s",


Should this be a "GET Error"?

danielnelson · 2018-05-01T04:21:35Z

plugins/outputs/azuremonitor/azuremetadata.go

+	}
+
+	if resp.StatusCode >= 300 || resp.StatusCode < 200 {
+		return nil, fmt.Errorf("Post Error. HTTP response code:%d message:%s reply:\n%s",


GET Error, also convert to a single line error string since these will end up in the logging output.
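A minimal sketch of the single-line error suggested here. The helper name `getError` and its parameters are hypothetical and not part of the PR; it only shows one way to collapse a multi-line reply before it reaches the log output.

```go
package main

import (
	"fmt"
	"strings"
)

// getError builds a one-line error for a failed metadata GET request.
// strings.Fields splits on any run of whitespace, so joining the pieces
// with a single space removes embedded newlines and tabs from the reply.
// The function name and parameters are illustrative assumptions.
func getError(statusCode int, status, reply string) error {
	oneLine := strings.Join(strings.Fields(reply), " ")
	return fmt.Errorf("GET error: HTTP response code: %d, message: %s, reply: %s",
		statusCode, status, oneLine)
}

func main() {
	err := getError(404, "404 Not Found", "{\n  \"error\": \"not found\"\n}")
	fmt.Println(err)
}
```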

danielnelson · 2018-05-01T04:22:58Z

plugins/outputs/azuremonitor/azuremetadata_test.go

+	// 	t.Logf("metadata is \n%v", metadata)
+	// }
+
+	//fmt.Printf("metadata is \n%v", metadata)


Don't forget to clear out this code.

danielnelson · 2018-05-01T06:04:25Z

plugins/outputs/azuremonitor/azuremonitor.go

+	metadataService  *AzureInstanceMetadata
+	instanceMetadata *VirtualMachineMetadata
+	msiToken         *msiToken
+	msiResource      string


Make this a package const

danielnelson · 2018-05-01T06:05:32Z

plugins/outputs/azuremonitor/azuremonitor.go

+	instanceMetadata *VirtualMachineMetadata
+	msiToken         *msiToken
+	msiResource      string
+	bearerToken      string


Use directly from msiToken

danielnelson · 2018-05-01T06:06:07Z

plugins/outputs/azuremonitor/azuremonitor.go

+	msiToken         *msiToken
+	msiResource      string
+	bearerToken      string
+	expiryWatermark  time.Duration


I see this being used but where is it set?

danielnelson · 2018-05-01T06:06:46Z

plugins/outputs/azuremonitor/azuremonitor.go

+	expiryWatermark  time.Duration
+
+	oauthConfig *adal.OAuthConfig
+	adalToken   adal.OAuthTokenProvider


This I see being set but not used.

danielnelson · 2018-05-01T06:07:58Z

plugins/outputs/azuremonitor/azuremonitor.go

+	period      time.Duration
+	delay       time.Duration
+	periodStart time.Time
+	periodEnd   time.Time


I think these two times can just be locals

asheniam · 2018-05-01T17:08:00Z

plugins/outputs/azuremonitor/azuremonitor.go

+		return nil, fmt.Errorf("Error authenticating: %v", err)
+	}
+
+	metricsEndpoint := fmt.Sprintf("https://%s.monitoring.azure.com%s/metrics",


One thing to be careful about here: because custom metrics are in preview on the Azure Monitor side, we don't support all regions of public Azure. We will only be available in a few regions as part of the preview, so not all of these endpoints will exist.

asheniam · 2018-05-01T17:15:24Z

plugins/outputs/azuremonitor/azuremonitor.go

+		return nil, fmt.Errorf("Error authenticating: %v", err)
+	}
+
+	metricsEndpoint := fmt.Sprintf("https://%s.monitoring.azure.com%s/metrics",


For any of the endpoints that the output plugin communicates with (e.g. https://.monitoring.azure.com), we should ideally move them all into a single place in the code rather than sprinkling them across files. This will help future-proof the plugin so it can more easily support other Azure clouds down the road (Azure Germany, Azure China, Azure US Government), which will all have their own endpoints.
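One way to keep endpoints in a single place, sketched under assumptions: the format string is the one from the diff above, while the constant name `metricsURLFormat` and the helper `metricsURL` are hypothetical. Endpoints for other Azure clouds are deliberately left out because they are not given in this thread.

```go
package main

import "fmt"

// metricsURLFormat is the only endpoint template that appears in the
// reviewed code. Keeping it in one place means that supporting another
// Azure cloud later would only require changes here.
const metricsURLFormat = "https://%s.monitoring.azure.com%s/metrics"

// metricsURL is a hypothetical helper; region and resourceID are
// assumed to come from user configuration or instance metadata.
func metricsURL(region, resourceID string) string {
	return fmt.Sprintf(metricsURLFormat, region, resourceID)
}

func main() {
	fmt.Println(metricsURL("eastus", "/subscriptions/<subscription_id>"))
}
```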

gunnaraasen · 2018-05-07T16:57:37Z

@danielnelson I've updated this branch to use the running output aggregator pattern you suggested. I moved the original PR code to the ga-azure-monitor-original branch. I will add some tests if you think the architecture makes sense now.

danielnelson · 2018-05-07T22:30:51Z

Looks good, like how you dealt with not having the filter for old metrics. Gives me some ideas for improving this with the normal aggregators.

asheniam

Adding feedback on the Azure Monitor output plugin

asheniam · 2018-06-11T01:37:37Z

plugins/outputs/azuremonitor/azuremonitor.go

+  #resource_id = "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<vm_name>"
+  ## Azure region to publish metrics against.  Defaults to eastus.
+  ## Leave blank to automatically query the region via MSI.
+  #region = "useast"


Use "eastus" as the default, not "useast". "useast" will not work -- it won't resolve to any monitoring endpoint.

asheniam · 2018-06-11T01:38:41Z

plugins/outputs/azuremonitor/azuremonitor.go

+}
+
+const (
+	defaultRegion          string = "eastus"


We shouldn't have a default region constant. This value either needs to come from instance metadata or come from user configuration. The region must match the region of the Azure resource ID and can't be guessed.

asheniam · 2018-06-11T01:39:29Z

plugins/outputs/azuremonitor/azuremonitor.go

+		return err
+	}
+
+	req.Header.Set("Authorization", "Bearer "+a.msiToken.AccessToken)


If we are not using MSI, where are we setting a.adalToken?

asheniam · 2018-06-11T01:40:40Z

plugins/outputs/azuremonitor/azuremonitor.go

+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode >= 300 || resp.StatusCode < 200 {


Why are we only calling fmt.Errorf for status codes in the [300, 200) range? We should also follow this pattern for any errors encountered in the 4xx or 5xx range.
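For reference, a small sketch that evaluates the condition from the diff above against a few status codes. The helper name `isFailure` is hypothetical. As written, the condition treats every code outside 200-299 as a failure, which includes 4xx and 5xx.

```go
package main

import "fmt"

// isFailure mirrors the condition in the diff above: anything at or
// above 300, or below 200, is treated as a failed request.
func isFailure(statusCode int) bool {
	return statusCode >= 300 || statusCode < 200
}

func main() {
	for _, code := range []int{200, 204, 301, 404, 500} {
		fmt.Printf("%d failure=%v\n", code, isFailure(code))
	}
}
```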

asheniam · 2018-06-11T01:41:59Z

plugins/outputs/azuremonitor/azuremonitor.go

+		Data: &azureMonitorData{
+			BaseData: &azureMonitorBaseData{
+				Metric:         m.Name(),
+				Namespace:      "default",


Any reason we chose to set the metric namespace to "default"? It might be good to let users override this via config, but as a default it might be better to go with a value that has "Telegraf" in the namespace.

asheniam · 2018-06-11T01:46:50Z

plugins/outputs/azuremonitor/azuremonitor.go

+	for _, m := range azmetrics {
+		// Azure Monitor accepts new batches of points in new-line delimited
+		// JSON, following RFC 4288 (see https://github.com/ndjson/ndjson-spec).
+		jsonBytes, err := json.Marshal(&m)


How large can azmetrics be? I believe the Azure Monitor metric API has a max request body size of 4MB. If we exceed this limit, we should issue multiple POST requests
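A rough sketch of the splitting suggested here, assuming the 4 MB figure from this comment (it should be confirmed against the Azure Monitor documentation). The function `batchLines` and the constant `maxBodySize` are hypothetical names; each returned batch would be sent as its own POST request.

```go
package main

import (
	"bytes"
	"fmt"
)

// maxBodySize is an assumed limit taken from the review comment (4 MB).
const maxBodySize = 4 * 1024 * 1024

// batchLines packs newline-delimited JSON records into request bodies
// that stay at or below limit bytes. A single record larger than limit
// is still emitted alone so the caller can decide how to handle it.
func batchLines(records [][]byte, limit int) [][]byte {
	var batches [][]byte
	var buf bytes.Buffer
	for _, rec := range records {
		// +1 accounts for the newline written after each record.
		if buf.Len() > 0 && buf.Len()+len(rec)+1 > limit {
			// Copy the bytes, because Reset reuses the buffer's storage.
			batches = append(batches, append([]byte(nil), buf.Bytes()...))
			buf.Reset()
		}
		buf.Write(rec)
		buf.WriteByte('\n')
	}
	if buf.Len() > 0 {
		batches = append(batches, append([]byte(nil), buf.Bytes()...))
	}
	return batches
}

func main() {
	records := [][]byte{[]byte(`{"a":1}`), []byte(`{"b":2}`), []byte(`{"c":3}`)}
	// A tiny limit of 16 bytes is used here only to force a split;
	// real callers would pass maxBodySize.
	for i, b := range batchLines(records, 16) {
		fmt.Printf("batch %d: %q\n", i, b)
	}
}
```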

asheniam · 2018-06-11T01:48:29Z

plugins/outputs/azuremonitor/azuremonitor.go

+  # timeout = "5s"
+
+  ## Whether or not to use managed service identity.
+  #use_managed_service_identity = true


Nit: In the sample configuration, we should make it clear that when use_managed_service_identity is false, the user must supply resource_id, region, azure_subscription, azure_tenant, azure_client_id, and azure_client_secret. These become mandatory parameters.
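The requirement could also be enforced at startup. The sketch below is an assumption about how such a check might look; the `config` struct and its `validate` method are hypothetical and only the option names come from this comment.

```go
package main

import (
	"fmt"
	"strings"
)

// config holds only the options named in the review comment. The
// struct and field names are hypothetical, not the plugin's own types.
type config struct {
	UseMSI            bool
	ResourceID        string
	Region            string
	AzureSubscription string
	AzureTenant       string
	AzureClientID     string
	AzureClientSecret string
}

// validate returns an error naming every option that is missing when
// managed service identity is disabled.
func (c config) validate() error {
	if c.UseMSI {
		return nil
	}
	required := []struct{ name, value string }{
		{"resource_id", c.ResourceID},
		{"region", c.Region},
		{"azure_subscription", c.AzureSubscription},
		{"azure_tenant", c.AzureTenant},
		{"azure_client_id", c.AzureClientID},
		{"azure_client_secret", c.AzureClientSecret},
	}
	var missing []string
	for _, r := range required {
		if r.value == "" {
			missing = append(missing, r.name)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required options: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	fmt.Println(config{UseMSI: false, Region: "eastus"}.validate())
}
```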

asheniam · 2018-06-11T01:50:03Z

plugins/outputs/azuremonitor/azuremonitor.go

+	useMsi              bool              `toml:"use_managed_service_identity"`
+	ResourceID          string            `toml:"resource_id"`
+	Region              string            `toml:"region"`
+	Timeout             internal.Duration `toml:"Timeout"`


Nit: Should this be lower case "timeout" instead of "Timeout"?

asheniam · 2018-06-11T01:50:51Z

plugins/outputs/azuremonitor/azuremonitor.go

+	AzureTenantID       string            `toml:"azure_tenant"`
+	AzureClientID       string            `toml:"azure_client_id"`
+	AzureClientSecret   string            `toml:"azure_client_secret"`
+	StringAsDimension   bool              `toml:"string_as_dimension"`


What is StringAsDimension? There is no such option in the sample config.

asheniam · 2018-06-11T01:53:08Z

plugins/outputs/azuremonitor/azuremonitor.go

+		return &AzureMonitor{
+			StringAsDimension: false,
+			Timeout:           internal.Duration{Duration: time.Second * 5},
+			Region:            defaultRegion,


As mentioned earlier, we shouldn't treat region special with a default region. This should be treated as the rest of the configuration -- either it comes from instance metadata or from user supplied config.

asheniam · 2018-06-11T02:29:44Z

@danielnelson - I saw the milestone changed from 1.7 to 1.8. What is the timeline for 1.8?

danielnelson · 2018-06-12T00:35:43Z

I believe 1.8 will be finished around the end of August? Does that sound right @russorat?

danielnelson · 2018-06-12T20:34:56Z

plugins/outputs/azuremonitor/azuremonitor.go

+			continue
+		}
+		for id := range a.cache[tbucket] {
+			a.cache[tbucket][id].updated = false


If the interval is lower than the 1m aggregation period, this can get set to false before the metric has had a chance to be returned in Push(), since you may have multiple calls to Reset() before the metric is returned. This causes the plugin not to write any metrics unless you set the flush_interval to at least 1m.

danielnelson · 2018-06-14T23:50:23Z

plugins/outputs/azuremonitor/azuremonitor.go

+
+	// Pull region and resource identifier
+	err := a.GetInstanceMetadata()
+	if err != nil && a.ResourceID == "" && a.Region == "" {


I think this should be || instead of &&, we should always return if err != nil, so that we are sure to see all errors and I think we need both of these set.

Also, if Region and ResourceID are set beforehand, maybe we can skip this function completely?
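A sketch of the control flow described here, under assumptions: `resolve`, `lookupFunc`, and `errUnavailable` are hypothetical names, and the metadata call is replaced by an injected function so the example stays self-contained. The lookup is skipped when both values are configured, any lookup error is returned, and both values must be present at the end.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable simulates a failed metadata lookup in this sketch.
var errUnavailable = errors.New("metadata service not available")

// lookupFunc stands in for the instance metadata call; it is a
// hypothetical signature, not the plugin's actual function.
type lookupFunc func() (region, resourceID string, err error)

// resolve returns the configured values when both are set, and only
// falls back to the metadata lookup otherwise. Any lookup error is
// returned, and both values must be non-empty afterwards.
func resolve(region, resourceID string, lookup lookupFunc) (string, string, error) {
	if region != "" && resourceID != "" {
		return region, resourceID, nil
	}
	mRegion, mResourceID, err := lookup()
	if err != nil {
		return "", "", fmt.Errorf("unable to query instance metadata: %v", err)
	}
	if region == "" {
		region = mRegion
	}
	if resourceID == "" {
		resourceID = mResourceID
	}
	if region == "" || resourceID == "" {
		return "", "", errors.New("region and resource_id must be set or discoverable")
	}
	return region, resourceID, nil
}

func main() {
	failing := func() (string, string, error) { return "", "", errUnavailable }

	// Fully specified: the lookup is never called.
	region, id, err := resolve("eastus", "/subscriptions/example", failing)
	fmt.Println(region, id, err)

	// Partially specified: the lookup error surfaces to the caller.
	_, _, err = resolve("eastus", "", failing)
	fmt.Println(err)
}
```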

danielnelson · 2018-06-14T23:52:27Z

plugins/outputs/azuremonitor/azuremetadata.go

+}
+
+// GetInstanceMetadata retrieves metadata about the current Azure VM
+func (a *AzureMonitor) GetInstanceMetadata() error {


I think ideally this function would return region, resource, error. The calling function would combine these with the plugin settings and create the url. Also, consider passing in the client and making this and the function above a free function.

danielnelson · 2018-06-15T00:04:04Z

plugins/outputs/azuremonitor/azuremonitor.go

+	if err != nil && a.ResourceID == "" && a.Region == "" {
+		return fmt.Errorf("E! No resource id specified, and Azure Instance metadata service not available.  If not running on an Azure VM, provide a value for resource_id")
+	}
+


Once we have the final discovered URL, I think we should add a log message with the final version at debug level.

danielnelson · 2018-07-19T18:02:03Z

plugins/outputs/azure_monitor/README.md

+   Identity](https://docs.microsoft.com/en-us/azure/active-directory/msi-overview)
+   for more details. Only available on ARM-based resources.
+
+**Note: As shown above, the last option (#5) is the preferred way to


I guess this should be number 4

danielnelson · 2018-07-19T18:04:08Z

plugins/outputs/azure_monitor/README.md

+**Note: As shown above, the last option (#5) is the preferred way to
+authenticate when running Telegraf on Azure VMs. The VMs will need to be given
+access to the Azure Monitor to publish custom metrics. Instructions on how to
+grant access can be found [here]()**


Don't forget to add the link target. I suggest just turning the whole sentence into a link:

[Instructions on how to grant access](http://example.org).

danielnelson · 2018-07-19T18:04:37Z

plugins/outputs/azure_monitor/README.md

+If Telegraf is not running on a virtual machine or the VM Instance Metadata service is not available, the following variables are required for the output to function.
+
+* region
+* resourceId


resource_id

danielnelson · 2018-07-19T18:11:17Z

plugins/outputs/azure_monitor/README.md

+
+This plugin will send custom metrics to Azure Monitor.
+Azure Monitor has a metric resolution of one minute.
+To handle this in Telegraf, the Azure Monitor output plugin will automatically aggregates metrics into one minute buckets, which are then sent to Azure Monitor on every flush interval.


I know you are trying to do the "semantic linefeeds" style, but can you make sure to wrap all the lines at no more than 78 chars. As an aside, I don't really care for this style, I find it hard to read the plain text and the diff advantage is small as changes can be handled using --word-diff.

danielnelson · 2018-07-19T18:15:40Z

plugins/outputs/azure_monitor/README.md

+
+### Configuration:
+
+```


For this can you run telegraf --usage azure_monitor and use the output (minus the Description text).

danielnelson · 2018-07-19T22:43:06Z

plugins/outputs/azure_monitor/azure_monitor.go

+		return err
+	}
+
+	// req, err := http.NewRequest("POST", a.url, bytes.NewBuffer(body))


Remove commented out code

danielnelson · 2018-07-19T22:45:32Z

plugins/outputs/azure_monitor/azure_monitor.go

+	// refresh the token if needed.
+	req, err = autorest.CreatePreparer(a.auth.WithAuthorization()).Prepare(req)
+	if err != nil {
+		return fmt.Errorf("E! [outputs.azure_monitor] Unable to fetch authentication credentials: %v", err)


This is an error, not a log message, so don't add the log level or module: unable to fetch authentication....

danielnelson · 2018-07-19T22:46:34Z

plugins/outputs/azure_monitor/azure_monitor.go

+	}
+
+	if resp.StatusCode < 200 || resp.StatusCode > 299 {
+		return fmt.Errorf("E! Failed to write: %v", string(rbody))


Remove log level, start with lowercase. I would not include the body as it could be very long.

danielnelson · 2018-07-19T22:59:36Z

plugins/outputs/azure_monitor/azure_monitor.go

+			continue
+		}
+		for id := range a.cache[tbucket] {
+			a.cache[tbucket][id].updated = false


I think you should reset this in Push, this will be more resilient against reordered calls even though they shouldn't happen in the current implementation: Push -> Add -> Reset. Also, Push feels like the more appropriate place to clear the update flag.
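A simplified sketch of clearing the flag in Push. The `aggregate` and `cache` types are hypothetical stand-ins that ignore the plugin's time buckets; the point is only that the flag is cleared at the moment the value is handed off, so extra Reset calls in between cannot hide an unsent update.

```go
package main

import "fmt"

// aggregate is a hypothetical stand-in for the plugin's cached entry.
type aggregate struct {
	name    string
	sum     float64
	updated bool
}

// cache maps a series identifier to its running aggregate.
type cache map[string]*aggregate

// Add folds a value into the aggregate and marks it as changed.
func (c cache) Add(name string, v float64) {
	a, ok := c[name]
	if !ok {
		a = &aggregate{name: name}
		c[name] = a
	}
	a.sum += v
	a.updated = true
}

// Push returns a copy of every aggregate changed since the previous
// Push and clears the flag as each value is handed off.
func (c cache) Push() []aggregate {
	var out []aggregate
	for _, a := range c {
		if !a.updated {
			continue
		}
		out = append(out, *a)
		a.updated = false
	}
	return out
}

func main() {
	c := cache{}
	c.Add("cpu", 1)
	c.Add("cpu", 2)
	fmt.Println(len(c.Push())) // one changed aggregate
	fmt.Println(len(c.Push())) // nothing changed since the last Push
}
```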

danielnelson · 2018-07-19T23:12:16Z

plugins/outputs/azure_monitor/azure_monitor_test.go

@@ -0,0 +1,275 @@
+package azure_monitor


Let us know when the tests are ready.

glinton

Appears all that's left is to add your tests and update your branch

glinton · 2018-08-13T21:26:09Z

plugins/outputs/azure_monitor/azure_monitor_test.go

+		fields  fields
+		wantErr bool
+	}{
+		// TODO: Add test cases.


gunnaraasen force-pushed the ga-azure-monitor branch from e4f0f68 to aecf5bb Compare April 30, 2018 22:06

danielnelson added the new plugin label May 1, 2018

danielnelson added this to the 1.7.0 milestone May 1, 2018

danielnelson reviewed May 1, 2018


asheniam reviewed May 1, 2018


karolz-ms mentioned this pull request May 2, 2018

Microsoft Application Insights output plugin #4010

Merged

3 tasks

gunnaraasen force-pushed the ga-azure-monitor branch from aecf5bb to 40c37aa Compare May 7, 2018 15:59

gunnaraasen force-pushed the ga-azure-monitor branch from 46e7131 to 4b8d0ad Compare May 24, 2018 17:11

danielnelson modified the milestones: 1.7.0, 1.8.0 Jun 3, 2018

asheniam reviewed Jun 11, 2018


danielnelson reviewed Jun 12, 2018


danielnelson reviewed Jun 15, 2018


gunnaraasen force-pushed the ga-azure-monitor branch 4 times, most recently from 3639d05 to 50854bf Compare July 11, 2018 15:55

danielnelson reviewed Jul 19, 2018


gunnaraasen force-pushed the ga-azure-monitor branch 2 times, most recently from 08d6e68 to 5de442b Compare August 8, 2018 18:07

glinton suggested changes Aug 13, 2018


plugins/outputs/azure_monitor/azure_monitor_test.go Outdated

fields fields

wantErr bool

}{

// TODO: Add test cases.

glinton Aug 13, 2018

Add tests

danielnelson and others added 3 commits August 29, 2018 06:25

Add idea for an output that aggregates before adding to metric buffer

28bf19d

Starting on azure monitor metrics integration with MSI auth

c092804

Output: Azure Monitor: Initial aggregated metric implementation

f596bd6

gunnaraasen added 7 commits August 29, 2018 06:32

Finish authorization refactor

1560f22

More refactoring

9340102

Finish auth refactor and address remaining PR feedback

92600fc

Rename output to azure_monitor along with README updates

6cf2d06

Address PR feedback

16afd80

Remove temporarily remove tests to build package

0255e11

Fix namespace issue

fb70450

gunnaraasen force-pushed the ga-azure-monitor branch from 7760b87 to fb70450 Compare August 29, 2018 15:45

danielnelson added 6 commits August 31, 2018 14:18

Remove trailing whitespace

55631dd

Add selfstat for metric_outside_window

acceeef

Adjust spacing in sample config

f4b21ca

Only pull instance metadata if not fully specified

0052401

Add testutils for comparing telegraf.Metric

0df4b6b

Add tests for aggregation functions

00c11b9

glinton approved these changes Sep 4, 2018


danielnelson added 8 commits September 4, 2018 14:30

Add tests for write method of azure monitor output

8e4d345

Use dedicated aggregate type instead of Metric

949bfe9

Fix testutil metrics testcase

b277b5f

Call Add/Push/Write sequentially

c112dbe

Fix error if tag value is empty

a8e8ced

Update readme

1a7c64b

Update region list in readme

ed1f080

Update license of dependencies

e21494a

danielnelson merged commit f70d651 into master Sep 5, 2018

danielnelson deleted the ga-azure-monitor branch September 5, 2018 21:50

rgitzel pushed a commit to rgitzel/telegraf that referenced this pull request Oct 17, 2018

Add Azure Monitor output plugin (influxdata#4089)

ff34788

otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019

Add Azure Monitor output plugin (influxdata#4089)

ad6388d

otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019

Add Azure Monitor output plugin (influxdata#4089)

ccd60f3

dupondje pushed a commit to dupondje/telegraf that referenced this pull request Apr 22, 2019

Add Azure Monitor output plugin (influxdata#4089)

034601e

athoune pushed a commit to bearstech/telegraf that referenced this pull request Apr 17, 2020

Add Azure Monitor output plugin (influxdata#4089)

978850e

