newlog: add filter, dedup and counter functions #4702

Open
wants to merge 3 commits into
base: master

Conversation

europaul
Contributor

Newlog will have 3 ways to reduce the amount of logs:

  • filter: filter logs out based on the source code line that produced them
  • counter: count the number of logs produced by a specific source code line. Add that number to the first occurrence of the log and remove the rest
  • deduplicator (for errors only): record the last X errors in a sliding window and remove duplicates

The benchmark for newlogd with the new features is in the dedup_test.go file. It shows that CPU and RAM usage increase by a factor of 3 when the features are enabled. For that reason they can be disabled by setting the deduplication window size to 0 and not providing anything to the filter and counter functions.
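To make the deduplicator concrete, here is a minimal sketch of a sliding window over the last X error messages, where a window size of 0 turns the feature off. The `entry` type, the `dedupWindow` type, and the choice to record duplicates in the window as well are assumptions made for illustration; they are not taken from the PR's dedup.go.

```go
package main

import "fmt"

// entry is a hypothetical, simplified log record (not the newlogd type).
type entry struct {
	Severity string
	Content  string
}

// dedupWindow remembers the last `size` error messages in a ring buffer.
type dedupWindow struct {
	size  int
	ring  []string       // recorded messages, overwritten cyclically once full
	next  int            // slot that will be written next
	count map[string]int // occurrences of each message currently in the ring
}

func newDedupWindow(size int) *dedupWindow {
	return &dedupWindow{size: size, count: make(map[string]int)}
}

// keep reports whether e should be kept. Non-error entries always pass,
// and a window size of 0 disables deduplication entirely.
func (w *dedupWindow) keep(e entry) bool {
	if w.size == 0 || e.Severity != "error" {
		return true
	}
	duplicate := w.count[e.Content] > 0
	if len(w.ring) < w.size {
		// Window not full yet: just record the message.
		w.ring = append(w.ring, e.Content)
	} else {
		// Window full: evict the oldest message, then overwrite its slot.
		old := w.ring[w.next]
		w.count[old]--
		if w.count[old] == 0 {
			delete(w.count, old)
		}
		w.ring[w.next] = e.Content
	}
	w.next = (w.next + 1) % w.size
	w.count[e.Content]++
	return !duplicate
}

func main() {
	w := newDedupWindow(2)
	for _, msg := range []string{"A", "A", "B", "C", "A"} {
		e := entry{Severity: "error", Content: msg}
		fmt.Printf("%s kept=%v\n", msg, w.keep(e))
	}
}
```

In this sketch a message is reported again once it has slid out of the window, which is what bounds the memory use to X entries.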

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds new features to reduce log volume in Newlog by implementing three log reduction mechanisms: filtering, deduplication, and counting.

  • Introduces a deduplication feature for error logs using a ring buffer.
  • Implements a log counter that appends occurrence counts to log entries.
  • Adds configurable log filtering and updates global configuration settings accordingly.
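As an illustration of the counter described above, the following sketch folds every entry from a counted source line into its first occurrence and drops the rest. The `logLine` type, its `Count` field, and the `"file.go:42"` source format are hypothetical and chosen only for this example; the PR's counter.go may represent the count differently.

```go
package main

import "fmt"

// logLine is a hypothetical, simplified log record.
type logLine struct {
	Source  string // source code line that produced the log, e.g. "file.go:42"
	Content string
	Count   int // occurrences folded into this entry; 0 if the source is not counted
}

// countAndCollapse keeps only the first entry for each counted source and
// stores the total number of occurrences on that first entry.
// Entries from sources that are not in `counted` pass through unchanged.
func countAndCollapse(in []logLine, counted map[string]struct{}) []logLine {
	out := make([]logLine, 0, len(in))
	first := make(map[string]int) // source -> index in out of its first occurrence
	for _, l := range in {
		if _, ok := counted[l.Source]; !ok {
			out = append(out, l)
			continue
		}
		if idx, seen := first[l.Source]; seen {
			out[idx].Count++ // later occurrence: bump the count and drop the entry
			continue
		}
		l.Count = 1
		first[l.Source] = len(out)
		out = append(out, l)
	}
	return out
}

func main() {
	in := []logLine{
		{Source: "a.go:1", Content: "retrying"},
		{Source: "b.go:2", Content: "started"},
		{Source: "a.go:1", Content: "retrying"},
		{Source: "a.go:1", Content: "retrying"},
	}
	counted := map[string]struct{}{"a.go:1": {}}
	for _, l := range countAndCollapse(in, counted) {
		fmt.Printf("%s %q count=%d\n", l.Source, l.Content, l.Count)
	}
}
```

An empty `counted` map leaves the input untouched, which matches the idea that the counter is disabled when nothing is configured for it.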

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/newlog/cmd/dedup.go Adds deduplication functions for real‐time log reduction
pkg/newlog/cmd/counter.go Implements a counter to tag and suppress duplicate log entries
pkg/newlog/cmd/dedup_test.go Provides tests for deduplication logic
pkg/newlog/cmd/filter.go Introduces log filtering using configurable filename filters
pkg/newlog/cmd/newlogd.go Updates main log processing to integrate the new deduplication and filtering logic
pkg/pillar/types/global.go Adds new global settings keys for deduplication, log counting, and filtering
pkg/newlog/cmd/newlogd_test.go Minor test update for gzip log parsing

@europaul europaul force-pushed the log-filtering-and-dedup branch from c0462b9 to e947722 Compare March 21, 2025 20:07
@europaul europaul requested a review from Copilot March 21, 2025 20:08
@Copilot Copilot AI left a comment

Pull Request Overview

This pull request adds three new log reduction mechanisms—filtering, deduplication, and counting—to help reduce excessive logs. Key changes include adding deduplication functionality with a sliding window (pkg/newlog/cmd/dedup.go), implementing log counting (pkg/newlog/cmd/counter.go), introducing log filtering (pkg/newlog/cmd/filter.go), and integrating these features via configuration into the log compression routine (pkg/newlog/cmd/newlogd.go).
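A rough sketch of the filtering step, assuming the filter is a set of source locations held in an `atomic.Value` so that it can be swapped when the configuration changes. The `filenameFilter` name and the `map[string]any` type match the expression quoted later in this review; the `logEntry` fields, the `setFilter` helper, and the `"file.go:123"` key format are assumptions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// logEntry is a hypothetical, simplified log record.
type logEntry struct {
	Filename string // source location of the log call (key format assumed)
	Content  string
}

// filenameFilter holds a map[string]any used as a set of locations to drop.
// Storing a fresh map on every update lets readers load it without a lock.
var filenameFilter atomic.Value

// setFilter replaces the current filter set (hypothetical helper).
func setFilter(sources []string) {
	m := make(map[string]any, len(sources))
	for _, s := range sources {
		m[s] = struct{}{}
	}
	filenameFilter.Store(m)
}

// filterOut reports whether the entry should be dropped.
// If no filter was ever stored, the assertion yields a nil map and
// nothing is filtered.
func filterOut(e *logEntry) bool {
	m, _ := filenameFilter.Load().(map[string]any)
	_, drop := m[e.Filename]
	return drop
}

func main() {
	setFilter([]string{"noisy.go:10"})
	for _, e := range []logEntry{
		{Filename: "noisy.go:10", Content: "tick"},
		{Filename: "main.go:5", Content: "started"},
	} {
		fmt.Printf("%s filtered out: %v\n", e.Filename, filterOut(&e))
	}
}
```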

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/newlog/cmd/dedup.go New deduplication logic with a sliding window for error log entries
pkg/newlog/cmd/counter.go New log counting functionality to annotate log entries
pkg/newlog/cmd/dedup_test.go Test cases for deduplication with an identified iteration bug
pkg/newlog/cmd/filter.go New log filtering mechanism based on filename criteria
pkg/pillar/types/global.go Added configuration items for deduplication and filtering settings
pkg/newlog/cmd/newlogd.go Integrated log deduplication, counting, and filtering into file compression
pkg/newlog/cmd/newlogd_test.go Test scaffolding for log compression and integration testing

@europaul
Contributor Author

Waiting on #4703 to be merged to import the newer version of pillar with the right global config parameters.

@europaul europaul force-pushed the log-filtering-and-dedup branch from a2357dc to e2ff9b3 Compare March 28, 2025 09:56
@europaul
Contributor Author

Yetus seems to be bailing out again because the PR includes too many updated vendor dependencies.

@rene
Contributor

rene commented Mar 28, 2025

@eriknordmark could you rebase this PR, go modules for pkg/newlog were updated by dependabot...

@europaul europaul force-pushed the log-filtering-and-dedup branch from e2ff9b3 to aac93d6 Compare March 28, 2025 14:46
@europaul
Contributor Author

@eriknordmark could you rebase this PR, go modules for pkg/newlog were updated by dependabot...

@rene done

Contributor

@eriknordmark eriknordmark left a comment

I see yetus/golangcilint is failing with the output
pkg/newlog/level=error msg="Running error: context loading failed: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: downloading go1.24 (linux/amd64)\ngo: download go1.24 for linux/amd64: toolchain not available\n"
pkg/newlog/level=warning msg="Failed to discover go env: failed to run 'go env': exit status 1"

@europaul
Contributor Author

europaul commented Mar 28, 2025

I see yetus/golangcilint is failing with the output pkg/newlog/level=error msg="Running error: context loading failed: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: downloading go1.24 (linux/amd64)\ngo: download go1.24 for linux/amd64: toolchain not available\n" pkg/newlog/level=warning msg="Failed to discover go env: failed to run 'go env': exit status 1"

@eriknordmark where can you see this? I cannot find the error

nvm, I found it in scan-results artefact

@rene
Contributor

rene commented Mar 29, 2025

@europaul , LGTM, just left a few comments....


// deduplicateLogs can be used to deduplicate logs on the fly reading from a channel
// and writing to another channel
func deduplicateLogs(in <-chan inputEntry, out chan<- inputEntry) {
Contributor

Where is this actually used outside of tests?

Contributor Author

my first attempt was to dedup logs on the fly as they come into the newlogd - then this function was used. I'm still not sure which is the better way: this or deduplicating while compressing - that's why I kept the function

Contributor

We agreed to remove this and the tests as it is unused (at least right now)

Contributor Author

Done
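For reference, the on-the-fly approach discussed in this thread (deduplicating entries as they arrive instead of while compressing) could look roughly like the sketch below. Only the function signature comes from the snippet under review; the `inputEntry` fields, the `windowSize` constant, and the `collect` helper are hypothetical.

```go
package main

import "fmt"

// inputEntry is a hypothetical, simplified version of the newlogd type.
type inputEntry struct {
	severity string
	content  string
}

// windowSize is an assumed constant; the PR makes the window configurable.
const windowSize = 3

// deduplicateLogs forwards entries from in to out, dropping error entries
// whose content already appears among the last windowSize errors.
// It closes out once in is closed and drained.
func deduplicateLogs(in <-chan inputEntry, out chan<- inputEntry) {
	defer close(out)
	window := make([]string, 0, windowSize)
	for e := range in {
		if e.severity != "error" {
			out <- e
			continue
		}
		duplicate := false
		for _, seen := range window {
			if seen == e.content {
				duplicate = true
				break
			}
		}
		if len(window) == windowSize {
			// Slide the window: drop the oldest message.
			copy(window, window[1:])
			window = window[:windowSize-1]
		}
		window = append(window, e.content)
		if !duplicate {
			out <- e
		}
	}
}

// collect runs the pipeline over a fixed slice and returns what came out.
func collect(entries []inputEntry) []inputEntry {
	in := make(chan inputEntry)
	out := make(chan inputEntry)
	go deduplicateLogs(in, out)
	go func() {
		for _, e := range entries {
			in <- e
		}
		close(in)
	}()
	var kept []inputEntry
	for e := range out {
		kept = append(kept, e)
	}
	return kept
}

func main() {
	kept := collect([]inputEntry{
		{"error", "disk full"},
		{"info", "disk full"},
		{"error", "disk full"},
		{"error", "link down"},
	})
	for _, e := range kept {
		fmt.Println(e.severity, e.content)
	}
}
```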

Comment on lines 1338 to 1339
if dirName == uploadDevDir {
if len(filenameFilter.Load().(map[string]any)) != 0 || len(logCounter) != 0 || dedupWindowSize.Load() != 0 {
Contributor

Suggested change
- if dirName == uploadDevDir {
-     if len(filenameFilter.Load().(map[string]any)) != 0 || len(logCounter) != 0 || dedupWindowSize.Load() != 0 {
+ if dirName == uploadDevDir &&
+     len(filenameFilter.Load().(map[string]any)) != 0 ||
+     len(logCounter) != 0 ||
+     dedupWindowSize.Load() != 0 {

Contributor Author

do you think it's more readable?

Contributor

Yes, because it does not make the reader expect an else-branch for the inner if.
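One detail to watch when flattening the nested condition: in Go, `&&` binds tighter than `||`, so `a && b || c || d` parses as `(a && b) || c || d`. The flattened form only matches the nested one if the `||` group is parenthesized. The sketch below uses hypothetical boolean parameters standing in for the directory check and the three feature checks.

```go
package main

import "fmt"

// nested mirrors the original two-level if.
func nested(isDevDir, hasFilter, hasCounter, hasDedup bool) bool {
	if isDevDir {
		if hasFilter || hasCounter || hasDedup {
			return true
		}
	}
	return false
}

// flatUnparenthesized mirrors the flattened condition without parentheses.
// It parses as (isDevDir && hasFilter) || hasCounter || hasDedup.
func flatUnparenthesized(isDevDir, hasFilter, hasCounter, hasDedup bool) bool {
	return isDevDir && hasFilter || hasCounter || hasDedup
}

// flatParenthesized is the single-condition form equivalent to nested.
func flatParenthesized(isDevDir, hasFilter, hasCounter, hasDedup bool) bool {
	return isDevDir && (hasFilter || hasCounter || hasDedup)
}

func main() {
	// Not the dev directory, but the counter is configured.
	fmt.Println("nested:             ", nested(false, false, true, false))
	fmt.Println("flat, no parens:    ", flatUnparenthesized(false, false, true, false))
	fmt.Println("flat, parenthesized:", flatParenthesized(false, false, true, false))
}
```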

continue // we don't care about the error here
}
var useEntry bool
if useEntry = !filterOut(&logEntry); !useEntry {
Contributor

Suggested change
- if useEntry = !filterOut(&logEntry); !useEntry {
+ if useEntry = filterOut(&logEntry); useEntry {

Contributor

or perhaps directly:

Suggested change
- if useEntry = !filterOut(&logEntry); !useEntry {
+ if filterOut(&logEntry) {

Contributor Author

then the variable here should be called dontUseEntry, and that breaks the nice flow a little :)

Contributor

double negation will break every flow ;-)

also filterOut returns a dontUseEntry not a useEntry. Perhaps filterOut is wrong then?

Contributor

Fine for me.

Contributor Author

@europaul europaul Mar 31, 2025

also filterOut returns a dontUseEntry not a useEntry

that's exactly why it's negated :)
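To compare the two spellings from this thread side by side, here is a small sketch. It assumes that the body of the `if` skips the entry, which the quoted snippet does not show, and it uses a hypothetical `filterOut` that drops a single hard-coded source location.

```go
package main

import "fmt"

// logEntry is a hypothetical, simplified log record.
type logEntry struct {
	Filename string
	Content  string
}

// filterOut is a stand-in for the real function: true means "drop this entry".
func filterOut(e *logEntry) bool {
	return e.Filename == "noisy.go:10"
}

// keepNegated uses the form from the PR: useEntry is the negation of filterOut.
func keepNegated(entries []logEntry) []logEntry {
	var kept []logEntry
	for i := range entries {
		var useEntry bool
		if useEntry = !filterOut(&entries[i]); !useEntry {
			continue // assumed body: skip filtered entries
		}
		kept = append(kept, entries[i])
	}
	return kept
}

// keepDirect uses the direct form suggested in review, without the variable.
func keepDirect(entries []logEntry) []logEntry {
	var kept []logEntry
	for i := range entries {
		if filterOut(&entries[i]) {
			continue
		}
		kept = append(kept, entries[i])
	}
	return kept
}

func main() {
	entries := []logEntry{
		{Filename: "noisy.go:10", Content: "tick"},
		{Filename: "main.go:5", Content: "started"},
	}
	fmt.Println(len(keepNegated(entries)), len(keepDirect(entries)))
}
```

Both loops select the same entries; the difference is only whether a `useEntry` variable stays available for later steps in the loop body.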

@europaul europaul force-pushed the log-filtering-and-dedup branch from aac93d6 to 71d0a86 Compare March 31, 2025 17:26
@github-actions github-actions bot requested a review from eriknordmark March 31, 2025 17:27
@europaul europaul force-pushed the log-filtering-and-dedup branch from 71d0a86 to 19061e6 Compare March 31, 2025 17:32
@europaul
Contributor Author

I see yetus/golangcilint is failing with the output pkg/newlog/level=error msg="Running error: context loading failed: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: downloading go1.24 (linux/amd64)\ngo: download go1.24 for linux/amd64: toolchain not available\n" pkg/newlog/level=warning msg="Failed to discover go env: failed to run 'go env': exit status 1"

@eriknordmark the key to solving this issue was to set go version in go.mod file to 1.24.1 (include the patch). Apparently it's a known problem golang/go#62278 (comment)

@europaul
Contributor Author

Our current version of yetus is having problems with go 1.24, that's why I created another PR #4728 to upgrade yetus - it should work fine then (tested locally with mini-yetus ❤️ @shjala )

@europaul
Contributor Author

europaul commented Mar 31, 2025

Folks, I would appreciate it if someone could kick off Eden tests! Thank you 😊

Contributor

@rene rene left a comment

Run tests

@europaul europaul force-pushed the log-filtering-and-dedup branch from bf52a32 to 0b68040 Compare April 1, 2025 08:33
@europaul
Contributor Author

europaul commented Apr 1, 2025

rebased to check that yetus no longer fails

@europaul europaul force-pushed the log-filtering-and-dedup branch from 0b68040 to 6481b77 Compare April 3, 2025 16:16
@eriknordmark
Contributor

I see a set of yetus failures in the summary in https://github.com/lf-edge/eve/actions/runs/14247833424?pr=4702
Can some of those be fixed? (I don't know about the missing include complaints, but the other go code ones should be fixable.)

europaul added 3 commits April 4, 2025 11:54
This commit updates the go version to 1.24 and packages eve-api and
eve/pkg/pillar to latest versions.

Signed-off-by: Paul Gaiduk <[email protected]>

Newlog will have 3 ways to reduce the amount of logs:
- filter: filter logs out based on the source code line that
produced them
- counter: count the number of logs produced by a specific
source code line. Add that number to the first occurrence of the log
and remove the rest
- deduplicator (for errors only): record the last X errors in a sliding
window and remove duplicates

The benchmarking of the newlogd with the new features is in the
dedup_test.go file. It shows that CPU and RAM usage increase by a factor
of 3 when the features are enabled. So they can be disabled by setting
the deduplication window size to 0 and not providing anything to the
filter and counter functions.

Signed-off-by: Paul Gaiduk <[email protected]>

Fix warnings like "redefinition of the built-in function" in newlogd.go
and "should not use dot imports" in newlogd_test.go.

Signed-off-by: Paul Gaiduk <[email protected]>
@europaul europaul force-pushed the log-filtering-and-dedup branch from 6481b77 to 77b1d15 Compare April 4, 2025 10:04
@europaul
Contributor Author

europaul commented Apr 4, 2025

@eriknordmark

Can some of those be fixed?

Done

I don't know about the missing include complaints

These will be fixed as a part of a separate effort to make yetus work.

@rene
Contributor

rene commented Apr 7, 2025

@europaul , the four Eden smoke tests are consistently failing in the Log tests: https://github.com/lf-edge/eve/actions/runs/14266695558/job/40034257107?pr=4702#step:3:1454

Apparently due to a timeout, but it requires an investigation to figure out the root cause....

4 participants