Conversation
e4f0f68 to
aecf5bb
Compare
|
|
||
| ``` | ||
| # Configuration for sending aggregate metrics to Azure Monitor | ||
| [[outputs.azuremonitor]] |
There was a problem hiding this comment.
Not required but consider renaming azure_monitor.
| ## specified, the plugin will attempt to retrieve the resource ID | ||
| ## of the VM via the instance metadata service (optional if running | ||
| ## on an Azure VM with MSI) | ||
| #resourceId = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachines/<vm-name>" |
There was a problem hiding this comment.
Use snake_case for all options.
| } | ||
|
|
||
| if resp.StatusCode >= 300 || resp.StatusCode < 200 { | ||
| return nil, fmt.Errorf("Post Error. HTTP response code:%d message:%s, content: %s", |
There was a problem hiding this comment.
Should this be a "GET Error"?
| } | ||
|
|
||
| if resp.StatusCode >= 300 || resp.StatusCode < 200 { | ||
| return nil, fmt.Errorf("Post Error. HTTP response code:%d message:%s reply:\n%s", |
There was a problem hiding this comment.
GET Error, also convert to a single line error string since these will end up in the logging output.
| // t.Logf("metadata is \n%v", metadata) | ||
| // } | ||
|
|
||
| //fmt.Printf("metadata is \n%v", metadata) |
There was a problem hiding this comment.
Don't forget to clear out this code.
| metadataService *AzureInstanceMetadata | ||
| instanceMetadata *VirtualMachineMetadata | ||
| msiToken *msiToken | ||
| msiResource string |
There was a problem hiding this comment.
Make this a package const
| instanceMetadata *VirtualMachineMetadata | ||
| msiToken *msiToken | ||
| msiResource string | ||
| bearerToken string |
There was a problem hiding this comment.
Use directly from msiToken
| msiToken *msiToken | ||
| msiResource string | ||
| bearerToken string | ||
| expiryWatermark time.Duration |
There was a problem hiding this comment.
I see this being used but where is it set?
| expiryWatermark time.Duration | ||
|
|
||
| oauthConfig *adal.OAuthConfig | ||
| adalToken adal.OAuthTokenProvider |
There was a problem hiding this comment.
This I see being set but not used.
| period time.Duration | ||
| delay time.Duration | ||
| periodStart time.Time | ||
| periodEnd time.Time |
There was a problem hiding this comment.
I think these two times can just be locals
| return nil, fmt.Errorf("Error authenticating: %v", err) | ||
| } | ||
|
|
||
| metricsEndpoint := fmt.Sprintf("https://%s.monitoring.azure.com%s/metrics", |
There was a problem hiding this comment.
One thing to be careful here -- As custom metrics is in preview on the Azure Monitor side, we don't support all regions of public Azure. We are only going to be available for a few regions as part of the reviews so not all these endpoints will exist.
| return nil, fmt.Errorf("Error authenticating: %v", err) | ||
| } | ||
|
|
||
| metricsEndpoint := fmt.Sprintf("https://%s.monitoring.azure.com%s/metrics", |
There was a problem hiding this comment.
For any of the endpoints that the output plugin is communicating with (ex. https://.monitoring.azure.com, we should ideally move all these endpoints into single place in the code, not sprinkle them across all files. This will help future proof the plugin to more easily support other Azure clouds down the road (Azure Germany, Azure China, Azure US Government) which will all have their own endpoints.
aecf5bb to
40c37aa
Compare
|
@danielnelson I've updated this branch to use the running output aggregator pattern you suggested. I moved the original PR code to the |
|
Looks good, like how you dealt with not having the filter for old metrics. Gives me some ideas for improving this with the normal aggregators. |
46e7131 to
4b8d0ad
Compare
asheniam
left a comment
There was a problem hiding this comment.
Adding feedback on the Azure Monitor output plugin
| #resource_id = "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<vm_name>" | ||
| ## Azure region to publish metrics against. Defaults to eastus. | ||
| ## Leave blank to automatically query the region via MSI. | ||
| #region = "useast" |
There was a problem hiding this comment.
Use "eastus" as the default, not "useast". "useast" will not work -- it won't resolve to any monitoring endpoint.
| } | ||
|
|
||
| const ( | ||
| defaultRegion string = "eastus" |
There was a problem hiding this comment.
We shouldn't have a default region constant. This value either needs to come from instance metadata or come from user configuration. The region must match the region of the Azure resource ID and can't be guessed.
| return err | ||
| } | ||
|
|
||
| req.Header.Set("Authorization", "Bearer "+a.msiToken.AccessToken) |
There was a problem hiding this comment.
If we are not using MSI, where are we setting a.adalToken?
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode >= 300 || resp.StatusCode < 200 { |
There was a problem hiding this comment.
Why are we only doing fmt.Errorf for status codes in the [300, 200) range. We should also follow this pattern for any errors encountered in the 4xx or 5xx range
| Data: &azureMonitorData{ | ||
| BaseData: &azureMonitorBaseData{ | ||
| Metric: m.Name(), | ||
| Namespace: "default", |
There was a problem hiding this comment.
Any reason we choose to set the metric namespace to be "default"? This might be good for users to override via config but as a default, it might be better to go with a value which has "Telegraf" in the namespace.
| for _, m := range azmetrics { | ||
| // Azure Monitor accepts new batches of points in new-line delimited | ||
| // JSON, following RFC 4288 (see https://github.com/ndjson/ndjson-spec). | ||
| jsonBytes, err := json.Marshal(&m) |
There was a problem hiding this comment.
How large can azmetrics be? I believe the Azure Monitor metric API has a max request body size of 4MB. If we exceed this limit, we should issue multiple POST requests
| # timeout = "5s" | ||
|
|
||
| ## Whether or not to use managed service identity. | ||
| #use_managed_service_identity = true |
There was a problem hiding this comment.
Nit: In the sample configuration, we should make it clear that when use_managed_service_identify is false, it's required that the user supply resource_id, region, azure_subscription, azure_tenant, azure_client_id, and azure_client_secret. These become mandatory parameters.
| useMsi bool `toml:"use_managed_service_identity"` | ||
| ResourceID string `toml:"resource_id"` | ||
| Region string `toml:"region"` | ||
| Timeout internal.Duration `toml:"Timeout"` |
There was a problem hiding this comment.
Nit: Should this be lower case "timeout" instead of "Timeout"?
| AzureTenantID string `toml:"azure_tenant"` | ||
| AzureClientID string `toml:"azure_client_id"` | ||
| AzureClientSecret string `toml:"azure_client_secret"` | ||
| StringAsDimension bool `toml:"string_as_dimension"` |
There was a problem hiding this comment.
What is StringAsDimension? There isn't such example in the sample config?
| return &AzureMonitor{ | ||
| StringAsDimension: false, | ||
| Timeout: internal.Duration{Duration: time.Second * 5}, | ||
| Region: defaultRegion, |
There was a problem hiding this comment.
As mentioned earlier, we shouldn't treat region special with a default region. This should be treated as the rest of the configuration -- either it comes from instance metadata or from user supplied config.
|
@danielnelson - I saw the milestone changed from 1.7 to 1.8. What is the timeline for 1.8? |
|
I believe 1.8 will be finished around the end of August? Does that sounds right @russorat? |
| continue | ||
| } | ||
| for id := range a.cache[tbucket] { | ||
| a.cache[tbucket][id].updated = false |
There was a problem hiding this comment.
If the interval is lower than the 1m aggregation period, this can get set to false before the metric has had a chance to be returned in Push() since you may have multiple calls to Reset() before the metric is returned. This causes the plugin not to write any metrics unless you set the set the flush_interval to at least 1m.
|
|
||
| // Pull region and resource identifier | ||
| err := a.GetInstanceMetadata() | ||
| if err != nil && a.ResourceID == "" && a.Region == "" { |
There was a problem hiding this comment.
I think this should be || instead of &&, we should always return if err != nil, so that we are sure to see all errors and I think we need both of these set.
Also, if Region and ResourceID are set beforehand, maybe we can skip this function completely?
| } | ||
|
|
||
| // GetInstanceMetadata retrieves metadata about the current Azure VM | ||
| func (a *AzureMonitor) GetInstanceMetadata() error { |
There was a problem hiding this comment.
I think ideally this function would return region, resource, error. The calling function would combine these with the plugin settings and create the url. Also, consider passing in the client and making this and the function above a free function.
| if err != nil && a.ResourceID == "" && a.Region == "" { | ||
| return fmt.Errorf("E! No resource id specified, and Azure Instance metadata service not available. If not running on an Azure VM, provide a value for resource_id") | ||
| } | ||
|
|
There was a problem hiding this comment.
Once we have the final discovered URL, I think we should add a log message with the final version at debug level.
3639d05 to
50854bf
Compare
| Identity](https://docs.microsoft.com/en-us/azure/active-directory/msi-overview) | ||
| for more details. Only available on ARM-based resources. | ||
|
|
||
| **Note: As shown above, the last option (#5) is the preferred way to |
There was a problem hiding this comment.
I guess this should be number 4
| **Note: As shown above, the last option (#5) is the preferred way to | ||
| authenticate when running Telegraf on Azure VMs. The VMs will need to be given | ||
| access to the Azure Monitor to publish custom metrics. Instructions on how to | ||
| grant access can be found [here]()** |
There was a problem hiding this comment.
Don't forget to add the link target. I suggest just turning the whole sentence into a link:
[Instructions on how to grant access](http://example.org).
| If Telegraf is not running on a virtual machine or the VM Instance Metadata service is not available, the following variables are required for the output to function. | ||
|
|
||
| * region | ||
| * resourceId |
|
|
||
| This plugin will send custom metrics to Azure Monitor. | ||
| Azure Monitor has a metric resolution of one minute. | ||
| To handle this in Telegraf, the Azure Monitor output plugin will automatically aggregates metrics into one minute buckets, which are then sent to Azure Monitor on every flush interval. |
There was a problem hiding this comment.
I know you are trying to do the "semantic linefeeds" style, but can you make sure to wrap all the lines at no more than 78 chars. As an aside, I don't really care for this style, I find it hard to read the plain text and the diff advantage is small as changes can be handled using --word-diff.
|
|
||
| ### Configuration: | ||
|
|
||
| ``` |
There was a problem hiding this comment.
For this can you run telegraf --usage azure_monitor and use the output (minus the Description text).
| return err | ||
| } | ||
|
|
||
| // req, err := http.NewRequest("POST", a.url, bytes.NewBuffer(body)) |
There was a problem hiding this comment.
Remove commented out code
| // refresh the token if needed. | ||
| req, err = autorest.CreatePreparer(a.auth.WithAuthorization()).Prepare(req) | ||
| if err != nil { | ||
| return fmt.Errorf("E! [outputs.azure_monitor] Unable to fetch authentication credentials: %v", err) |
There was a problem hiding this comment.
This is an error, not a log message, so don't add the log level or module: unable to fetch authentication....
| } | ||
|
|
||
| if resp.StatusCode < 200 || resp.StatusCode > 299 { | ||
| return fmt.Errorf("E! Failed to write: %v", string(rbody)) |
There was a problem hiding this comment.
Remove log level, start with lowercase. I would not include the body as it could be very long.
| continue | ||
| } | ||
| for id := range a.cache[tbucket] { | ||
| a.cache[tbucket][id].updated = false |
There was a problem hiding this comment.
I think you should reset this in Push, this will be more resilient against reordered calls even though they shouldn't happen in the current implementation: Push -> Add -> Reset. Also, Push feels like the more appropriate place to clear the update flag.
| @@ -0,0 +1,275 @@ | |||
| package azure_monitor | |||
There was a problem hiding this comment.
Let us know when the tests are ready.
08d6e68 to
5de442b
Compare
glinton
left a comment
There was a problem hiding this comment.
Appears all that's left is to add your tests and update your branch
| fields fields | ||
| wantErr bool | ||
| }{ | ||
| // TODO: Add test cases. |
7760b87 to
fb70450
Compare
Required for all PRs:
This is a new output plugin for Azure Monitor. I will be adding more unit tests soon.