Allow sending logs to stdout by using STDOUT_LOG env var #334


Merged
merged 2 commits into postgresml:main on Feb 28, 2023

Conversation


@magec magec commented Feb 28, 2023

Hi!

We are currently running PgCat in production 💃 (for a few hours now at least), thanks for the good work!

Two things have come up during the deployment:

  • Logs are all written to stderr: We expect error logging to go to stderr and warnings/info/etc. to go to stdout. We monitor the output of every piece of software we run and have alarms set up to detect an increase in error logging. The current logging implementation does not allow fine-grained control over this.

  • Stats filled up the buffer: There is a buffer for sending stats in the code, currently set to 100_000. When we started using this, during traffic peaks the buffer filled up, left the system in an inconsistent state, and produced a flood of messages like
    [2023-02-23T12:09:59.609264Z WARN pgcat::stats] Got event ServerActive { client_id: 1612304972, server_id: 1350173897 } for unregistered server Started. Our current deployment reaches 100k clients connected to 4 different instances of pgcat, so at high traffic that buffer can fill up.

We propose a solution here where the logger can be configured to also log to stdout (maintaining backward compatibility), and where the stats buffer is increased (5x, from 100_000 to 500_000).

Logging mechanism

The idea was to use a special kind of logger that allows sending logs to different targets depending on the log level.

By default, if nothing is set, it acts as a regular env_logger and sends everything to standard error.

If the STDOUT_LOG environment variable is defined, it is used to configure the standard-output logger.

The behavior is:

  • If the message is an error, it is written to standard error.
  • If it is not an error and it matches the log level of the standard-output logger (the STDOUT_LOG env var), it is sent to standard output.
  • Otherwise, it is sent to the stderr logger, which logs it or not depending on the value of the RUST_LOG env var.

To summarize: if no STDOUT_LOG env var is present, the logger behaves like the default one. If STDOUT_LOG is set, everything except errors that matches the log level set in STDOUT_LOG is sent to stdout. You can also have a more esoteric configuration, for example RUST_LOG=debug and STDOUT_LOG=info: errors go to stderr, warns and infos go to stdout, and debugs go to stderr.
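For illustration, here is a minimal sketch of that routing. It is not the exact code in this PR; it assumes the log and env_logger crates and builds two env_logger instances, one targeting stdout (configured from STDOUT_LOG) and one targeting stderr (configured from RUST_LOG). Names like MultiLogger and init_logging are illustrative only.

```rust
use env_logger::{Builder, Logger, Target};
use log::{Level, LevelFilter, Log, Metadata, Record};

/// Hypothetical logger that routes records to stdout or stderr
/// depending on level and the STDOUT_LOG / RUST_LOG env vars.
struct MultiLogger {
    stdout_logger: Option<Logger>, // built from STDOUT_LOG, writes to stdout
    stderr_logger: Logger,         // built from RUST_LOG, writes to stderr
}

impl Log for MultiLogger {
    fn enabled(&self, metadata: &Metadata) -> bool {
        self.stderr_logger.enabled(metadata)
            || self
                .stdout_logger
                .as_ref()
                .map(|l| l.enabled(metadata))
                .unwrap_or(false)
    }

    fn log(&self, record: &Record) {
        // Errors always go to stderr.
        if record.level() == Level::Error {
            self.stderr_logger.log(record);
            return;
        }
        // Non-errors go to stdout if they match the STDOUT_LOG filter...
        if let Some(stdout_logger) = &self.stdout_logger {
            if stdout_logger.enabled(record.metadata()) {
                stdout_logger.log(record);
                return;
            }
        }
        // ...otherwise fall back to the stderr logger (RUST_LOG decides).
        self.stderr_logger.log(record);
    }

    fn flush(&self) {
        self.stderr_logger.flush();
        if let Some(l) = &self.stdout_logger {
            l.flush();
        }
    }
}

fn init_logging() {
    let stderr_logger = Builder::from_env("RUST_LOG").target(Target::Stderr).build();

    // Only build the stdout logger when STDOUT_LOG is set,
    // so the default behavior stays unchanged.
    let stdout_logger = std::env::var("STDOUT_LOG").ok().map(|filter| {
        Builder::new()
            .parse_filters(&filter)
            .target(Target::Stdout)
            .build()
    });

    log::set_boxed_logger(Box::new(MultiLogger {
        stdout_logger,
        stderr_logger,
    }))
    .unwrap();
    // Let the inner loggers do the filtering.
    log::set_max_level(LevelFilter::Trace);
}
```

With RUST_LOG=debug and STDOUT_LOG=info, an info record matches the stdout logger and goes to stdout, while a debug record falls through to the stderr logger and goes to stderr.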

In a previous contribution I was told that 'We are not supporting env vars for configuration yet', but I implemented it this way because this is a special case: it is the logging system. If you think the name of the env var is not the right one, or you have another proposal for the implementation, just let me know. Still, independently of the implementation, I think logging everything to stderr is not flexible enough; that is actually why stderr and stdout exist separately.

If we agree on a merge, I'll write documentation in the README for whatever was agreed upon.

@@ -160,7 +161,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     let client_server_map: ClientServerMap = Arc::new(Mutex::new(HashMap::new()));

     // Statistics reporting.
-    let (stats_tx, stats_rx) = mpsc::channel(100_000);
+    let (stats_tx, stats_rx) = mpsc::channel(500_000);
Contributor
great find! @levkk do you think we could set up an mpmc channel for this to avoid this bottleneck?

@levkk levkk commented Feb 28, 2023

Not sure how mpmc would help here, since we'd have to dedup the events in the end anyway. The events count things like statistics: how many clients are connected, queries & transactions sent, etc.

The current metrics implementation is a bit lax in its accuracy guarantees. I'm thinking more and more that we should switch to atomics instead and count what we need on each ConnectionPool. Since atomics can be read and written very quickly from many threads, I think it'll work nicely.

This change is good, however, to keep us going. 500_000 should be enough, and the only constraint here is memory. The events themselves should be pretty small, and since this is just a limit to help absorb burst workloads, I think it will continue to work.
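For context, the atomics approach mentioned above could look roughly like this. This is a hedged sketch, not code from pgcat; the type, field, and method names are made up for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-pool counters, updated lock-free from many tasks
/// instead of sending events through a bounded channel.
#[derive(Default)]
struct PoolStats {
    queries: AtomicU64,
    transactions: AtomicU64,
    clients_connected: AtomicU64,
}

impl PoolStats {
    fn record_query(&self) {
        self.queries.fetch_add(1, Ordering::Relaxed);
    }

    fn record_transaction(&self) {
        self.transactions.fetch_add(1, Ordering::Relaxed);
    }

    fn client_connected(&self) {
        self.clients_connected.fetch_add(1, Ordering::Relaxed);
    }

    fn client_disconnected(&self) {
        self.clients_connected.fetch_sub(1, Ordering::Relaxed);
    }

    /// A periodic reporter reads the counters directly.
    fn snapshot(&self) -> (u64, u64, u64) {
        (
            self.queries.load(Ordering::Relaxed),
            self.transactions.load(Ordering::Relaxed),
            self.clients_connected.load(Ordering::Relaxed),
        )
    }
}
```

With counters like these, a reporter reads the values on a timer instead of draining a channel, so traffic bursts can no longer overflow a stats buffer.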

@levkk levkk left a comment

This is great!

@levkk levkk merged commit 9241df1 into postgresml:main Feb 28, 2023