Conversation

crystall-bitquill
Contributor

@crystall-bitquill crystall-bitquill commented Oct 10, 2023

Summary

Add additional logging for EFM (Enhanced Failure Monitoring).

Description

  • Adds logging in the MonitorImpl class for previously unhandled exceptions and for calls to startMonitoring made after monitoring has already stopped.
  • Changes the monitoring loop to continue running if an unhandled exception is thrown.
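
A minimal sketch of the loop behavior described in the two points above, not the actual MonitorImpl code. The class name `MonitorLoopSketch`, the `MonitorStep` interface, and the `maxIterations` bound are hypothetical and exist only to keep the example self-contained: an `InterruptedException` ends the loop, while any other exception is logged and the loop keeps running.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class MonitorLoopSketch {
  private static final Logger LOGGER = Logger.getLogger(MonitorLoopSketch.class.getName());

  /** Hypothetical stand-in for one pass of the monitoring loop. */
  @FunctionalInterface
  interface MonitorStep {
    void run() throws Exception;
  }

  /**
   * Runs the step up to maxIterations times (a bound added only so the sketch terminates).
   * Returns the number of unhandled exceptions that were logged.
   */
  static int runLoop(final MonitorStep step, final int maxIterations) {
    int loggedFailures = 0;
    for (int i = 0; i < maxIterations; i++) {
      try {
        step.run();
      } catch (final InterruptedException intEx) {
        // Interruption is treated as the signal to stop monitoring.
        LOGGER.warning("Monitoring thread was interrupted: " + intEx.getMessage());
        Thread.currentThread().interrupt(); // restore the flag for the caller
        break;
      } catch (final Exception ex) {
        // Any other exception is logged, and the loop continues with the next pass.
        loggedFailures++;
        LOGGER.log(Level.WARNING, "Unhandled exception in monitoring thread", ex);
      }
    }
    return loggedFailures;
  }

  public static void main(final String[] args) {
    final int[] calls = {0};
    final int failures = runLoop(() -> {
      calls[0]++;
      if (calls[0] == 2) {
        throw new IllegalStateException("simulated failure");
      }
    }, 5);
    System.out.println("passes=" + calls[0] + ", loggedFailures=" + failures);
  }
}
```

Catching `InterruptedException` before the general `Exception` clause keeps thread shutdown distinct from failures that should only be reported.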

Related to #675

Additional Reviewers

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

} finally {
  if (this.monitoringConn != null) {
    try {
      this.monitoringConn.close();
    } catch (final SQLException ex) {
      // ignore
      LOGGER.warning(
Contributor

Hmm I'm not sure this message adds extra clarity.

@@ -94,6 +94,9 @@ public MonitorImpl(

@Override
public void startMonitoring(final MonitorConnectionContext context) {
  if (this.stopped) {
    LOGGER.warning(() -> Messages.get("MonitorImpl.monitorIsStopped"));
Contributor

@sergiyvamz sergiyvamz Oct 10, 2023

Can you add monitoring host name to the message?

@crystall-bitquill force-pushed the issue-675 branch 2 times, most recently from 4ee88ef to 10ba058 on October 10, 2023 23:48
@davecramer
Contributor

Do we have an issue to refer to that prompted this change?

@crystall-bitquill
Contributor Author

> Do we have an issue to refer to that prompted this change?

Yes, #675. I've updated the PR description.

@@ -188,6 +188,9 @@ MonitorThreadContainer.emptyNodeKeys=Provided node keys are empty.

# Monitor Impl
MonitorImpl.contextNullWarning=Parameter 'context' should not be null.
MonitorImpl.interruptedExceptionDuringMonitoring=Monitoring thread for node {0} was interrupted: {1}
MonitorImpl.exceptionDuringMonitoring=Unhandled exception in monitoring thread for node {0}: {1}
MonitorImpl.monitorIsStopped=Monitoring has already stopped for node {0}.
Contributor

NIT: Monitor was or is already stopped?

      () -> Messages.get(
          "MonitorImpl.interruptedExceptionDuringMonitoring",
          new Object[] {this.hostSpec.getHost(), intEx.getMessage()}));
} catch (final Exception ex) {
Contributor

Are we expecting any other kind of exception? (We did not catch it beforehand.)

Contributor Author

We're adding extra logging in case an exception we hadn't accounted for stops this method and goes undetected. This was prompted by the OOM error that was noticed after too many context objects were added to the newContexts queue.
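
To illustrate the failure mode: once the monitoring thread has exited, nothing drains the queue, so every later startMonitoring call adds a context that is never removed. The sketch below is hypothetical (`StoppedMonitorSketch`, the `Object` contexts, and the boolean return value are illustration only). It assumes a stopped monitor rejects the context after logging the warning; this conversation does not show whether MonitorImpl does the same after its warning.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.logging.Logger;

class StoppedMonitorSketch {
  private static final Logger LOGGER = Logger.getLogger(StoppedMonitorSketch.class.getName());

  // Unbounded queue: it only shrinks while a monitoring thread is consuming from it.
  private final Queue<Object> newContexts = new ConcurrentLinkedQueue<>();
  private volatile boolean stopped = false;

  /** Returns true if the context was queued, false if the monitor had already stopped. */
  boolean startMonitoring(final Object context) {
    if (this.stopped) {
      // Assumption for this sketch: log and reject instead of queueing.
      LOGGER.warning("Monitoring has already stopped; context was not queued.");
      return false;
    }
    this.newContexts.add(context);
    return true;
  }

  /** Marks the monitor as stopped, as would happen when the monitoring thread exits. */
  void stop() {
    this.stopped = true;
  }

  int pendingContexts() {
    return this.newContexts.size();
  }

  public static void main(final String[] args) {
    final StoppedMonitorSketch monitor = new StoppedMonitorSketch();
    monitor.startMonitoring(new Object());
    monitor.stop();
    monitor.startMonitoring(new Object()); // rejected with a warning
    System.out.println("pending=" + monitor.pendingContexts());
  }
}
```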

      new Object[] {this.hostSpec.getHost(), intEx.getMessage()}));
} catch (final Exception ex) {
  // do nothing; exit thread
  LOGGER.warning(
Contributor

I think we need to change the code so that such unhandled exceptions are logged but the monitoring thread/loop keeps running.

@crystall-bitquill changed the title from "chore: add additional logging for efm" to "fix: continue monitoring unless InterruptedException is thrown" on Oct 17, 2023
@crystall-bitquill force-pushed the issue-675 branch 2 times, most recently from eabc10f to 81530fe on October 21, 2023 01:59
    this.activeContexts.add(monitorContext);
    break;
  }
  synchronized (monitorContext) {
Contributor

Does this really have to be synchronized? It is a local variable

Contributor

It's a local variable, but we add it to queues in MonitorImpl, which runs in a separate thread. The monitoring thread may change the internal state of the context. The idea was to synchronize on it to avoid multi-threading collisions. It seems this intent isn't properly implemented here, so it needs to be addressed in a separate PR.
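
A small sketch of that idea, with hypothetical names (`SharedContextSketch`, `Context`, `failureCount`) rather than the driver's real classes. The reference is a local variable, but the object it points to is also reachable from a second thread, so both threads take the object's monitor before touching its state.

```java
class SharedContextSketch {
  /** Hypothetical context whose mutable state is touched by two threads. */
  static final class Context {
    private int failureCount = 0; // guarded by synchronized (context)
  }

  /** Increments the shared counter from the calling thread and from a second thread. */
  static int incrementFromTwoThreads(final int perThread) {
    // Local reference, but the object is handed to another thread below.
    final Context context = new Context();
    final Runnable work = () -> {
      for (int i = 0; i < perThread; i++) {
        synchronized (context) {
          context.failureCount++;
        }
      }
    };

    final Thread monitorThread = new Thread(work, "monitor-sketch");
    monitorThread.start();
    work.run(); // the calling thread mutates the same object concurrently

    try {
      monitorThread.join();
    } catch (final InterruptedException ex) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for the monitor thread", ex);
    }

    synchronized (context) {
      return context.failureCount;
    }
  }

  public static void main(final String[] args) {
    System.out.println("total=" + incrementFromTwoThreads(10_000));
  }
}
```

Locking on the context only helps if every reader and writer of that state uses the same lock; a block that synchronizes in one place while other code paths access the object unguarded gives no protection.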

Contributor

Why not fix it in this PR?

Contributor Author

After some investigation, it seems like the current implementation is working and a fix isn't required.

@crystall-bitquill force-pushed the issue-675 branch 2 times, most recently from a29d9c4 to 249a8b8 on October 24, 2023 19:36
@crystall-bitquill changed the title from "fix: continue monitoring unless InterruptedException is thrown" to "fix: continue monitoring if unhandled Exception is thrown" on Oct 24, 2023
@crystall-bitquill force-pushed the issue-675 branch 2 times, most recently from ddc8a54 to 6b1e5c4 on October 24, 2023 19:40
chore: add additional logging for efm
4 participants