[ML] add new normalize_above parameter to p_value significant terms heuristic #78833

Merged
benwtrent merged 3 commits into elastic:master from benwtrent:feature/ml-p_value-add-new-normalizing-param on Oct 12, 2021

Conversation

@benwtrent

Member

This commit adds the new normalize_above parameter to the p_value significant
terms heuristic.

This parameter allows for consistent significance results at various scales. When a total count (either inside or outside the background set) is above normalize_above, both the total count and the count of documents containing the term are scaled by normalize_above/count, where count is the term count in the set or the total set size.
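
The scaling described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `normalize_counts` and its arguments are hypothetical, it assumes `count` is the total set size, and it assumes a positive `normalize_above`.

```python
from typing import Tuple


def normalize_counts(
    term_count: float, total_count: float, normalize_above: float
) -> Tuple[float, float]:
    """Scale (term_count, total_count) when total_count exceeds normalize_above.

    Hypothetical helper for illustration only; assumes normalize_above > 0
    and that the scaling factor is normalize_above / total_count.
    """
    if total_count <= normalize_above:
        # At or below the threshold: counts are left unchanged.
        return term_count, total_count
    # Above the threshold: shrink both counts by the same factor so that
    # the term's proportion is preserved and the total becomes normalize_above.
    scaled_term = term_count * normalize_above / total_count
    return scaled_term, normalize_above


# Same proportion (0.5%) at two different scales.
small = normalize_counts(5, 1_000, 1_000)
large = normalize_counts(50, 10_000, 1_000)
```

Under these assumptions both calls yield the same pair of counts, which is the sense in which results stay consistent across scales: the term's proportion is kept while the effective sample size is capped.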

elasticmachine added the Team:ML label (Meta label for the ML team) on Oct 7, 2021
@elasticmachine

Collaborator

Pinging @elastic/ml-core (Team:ML)

benwtrent force-pushed the feature/ml-p_value-add-new-normalizing-param branch from 103926f to ef26aa9 on October 7, 2021 14:27

@droberts195 droberts195 left a comment

LGTM apart from a couple of nits

Comment thread docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc Outdated
Comment thread docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc Outdated
benwtrent merged commit 843fa42 into elastic:master on Oct 12, 2021
benwtrent deleted the feature/ml-p_value-add-new-normalizing-param branch on October 12, 2021 14:38
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 12, 2021
…euristic (elastic#78833)

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 12, 2021
benwtrent added a commit that referenced this pull request Oct 12, 2021
…euristic (#78833) (#78999)


Labels

>enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v7.16.0, v8.0.0-beta1

4 participants