Support mem type in nodes hot_threads API #72850

Merged
grcevski merged 7 commits into elastic:master from easyice:hot_threads_with_mem
Sep 29, 2021

Conversation

easyice commented May 7, 2021

Contributor

from #70345

This PR adds a mem report type to the hot_threads API, which can help us find out which threads have allocated a lot of memory.

Closes #70345

@elasticsearchmachine elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label May 7, 2021
@jtibshirani jtibshirani added the :Core/Infra/Core Core issues without another label label May 18, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label May 18, 2021
@elasticmachine

Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@williamrandolph

Contributor

@easyice I'm sorry for our delay in response. We meant to discuss this issue in a recent meeting, but it slipped off the agenda. We will have a team discussion about it soon, and if it's something the wider team believes to be useful, we'll have someone do a PR review. Thank you for your contribution!

easyice commented Jun 10, 2021

Contributor Author

@williamrandolph Thank you for the reply. It is not urgent.

@williamrandolph

Contributor

While we like the idea here, and would find it useful in many cases (for example, in debugging ingest processors), we have some concerns about the implementation, and specifically about the use of the com.sun.management.ThreadMXBean class.

First, the class itself has a warning in its Javadoc: "Platform-specific management interface for the thread system of the Java virtual machine. This platform extension is only available to a thread implementation that supports this extension." Do you know anything about which platforms support this extension? In which cases would you expect this not to work?

Second, we know that future versions of the JDK will remove access to certain internal JDK APIs. See JEP 396 and JEP 403. Will those changes affect this code?

Third, there is a setThreadAllocatedMemoryEnabled method that can enable or disable this memory report. Will users want to be able to configure this? Do you know of any performance implications to enabling or disabling it?

@rdnm rdnm removed the discuss label Jun 23, 2021
easyice commented Jun 23, 2021

Contributor Author

Thanks, @williamrandolph. It seems possible that this extension will be removed, and it is not clear whether it would be reimplemented in some other way.

  1. The Javadoc does not say. I tested it and it is supported on the HotSpot JVM, but I am not sure about other JVMs.
  2. This extension does seem to be at risk of removal. Can we perform runtime checks like this?
    from: https://stackoverflow.com/questions/13876636/how-to-access-threadmx-of-com-sun-management-threadmxbean-on-jenkins-junit-tes
private static boolean enableBeanInspection = true;
private ThreadMXBean tBean = null;
private com.sun.management.ThreadMXBean sunBean = null;

public ThreadInspector() {
    // Ensure beans are null if we can't / don't want to use them
    if(enableBeanInspection) {
        tBean = ManagementFactory.getThreadMXBean();
        if(tBean instanceof com.sun.management.ThreadMXBean) { 
            sunBean = (com.sun.management.ThreadMXBean)tBean;
        }

        if(tBean.isThreadCpuTimeSupported()) {
            if(!tBean.isThreadCpuTimeEnabled()) {
                tBean.setThreadCpuTimeEnabled(true);
            }
        } else {
            tBean = null;
        }

        if(sunBean != null && sunBean.isThreadAllocatedMemorySupported()) {
            if(!sunBean.isThreadAllocatedMemoryEnabled()) {
                sunBean.setThreadAllocatedMemoryEnabled(true);
            }
        } else {
            sunBean = null;
        }
    }
}

protected long getThreadTime() {
    if(tBean != null) {
        return tBean.getThreadCpuTime(threadId);
    }
    return -1;
}

protected long getThreadMemory() {
    if(sunBean != null) {
        return sunBean.getThreadAllocatedBytes(threadId);
    }
    return -1;
}
  3. I ran a simple benchmark: calling getThreadAllocatedBytes 10k times on my Mac notebook took 10 ms, so each call takes about 1 microsecond.
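
The runtime check and the timing measurement described in this comment can be tried with a small standalone program. This is only a sketch, not the PR's code: the class name AllocatedBytesProbe and its methods are hypothetical, and the measured cost will vary by machine and JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class AllocatedBytesProbe {

    /** Returns the platform extension bean, or null when this JVM does not provide or support it. */
    static com.sun.management.ThreadMXBean sunBeanOrNull() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean sunBean = (com.sun.management.ThreadMXBean) bean;
            if (sunBean.isThreadAllocatedMemorySupported()) {
                return sunBean;
            }
        }
        return null;
    }

    /** Average cost in nanoseconds of one getThreadAllocatedBytes call, or -1 if unsupported or disabled. */
    static long averageCallNanos(int calls) {
        com.sun.management.ThreadMXBean sunBean = sunBeanOrNull();
        if (sunBean == null || sunBean.isThreadAllocatedMemoryEnabled() == false) {
            return -1;
        }
        long threadId = Thread.currentThread().getId();
        long sink = 0; // accumulate results so the loop body cannot be discarded
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += sunBean.getThreadAllocatedBytes(threadId);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative total");
        }
        return elapsed / calls;
    }

    public static void main(String[] args) {
        System.out.println("average ns per call: " + averageCallNanos(10_000));
    }
}
```

A single unwarmed loop like this is a rough estimate at best; a benchmark harness would give more trustworthy numbers.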

@grcevski grcevski left a comment

Contributor

I think this is a very good idea and as far as I can tell the API is here to stay for now. It's in the Java 17 LTS release and with proper checks in place we can return an unsupported message to the users, in case the API is removed from the JDK in the future.

I have requested some changes to make the report more clear and user friendly. Please let me know if some of the comments don't make sense, or if you have any questions. We also have unit tests for HotThreads now, it would be great to extend those for the new memory mode.

Thanks!

Contributor

I think it's best to check isThreadAllocatedMemorySupported and isThreadAllocatedMemoryEnabled, and then throw a similar 'unsupported exception' when using this API in memory mode if they aren't both on. This way, if we can't produce meaningful information for the end user, we'd be letting them know that we can't.

Alternatively, if monitoring of thread allocations is supported but not enabled, we could enable/disable it in try/finally inside this method, but this will require that we wrap the call to setThreadAllocatedMemoryEnabled inside a doPrivileged call. Please see: #77935 for reference.

Both approaches are good, if allocation monitoring is supported it's typically on by default.
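
Both options from this comment can be sketched in a few lines. The sketch below is an illustration under assumptions, not the merged code: MemModeGuard and its method names are hypothetical, UnsupportedOperationException stands in for whatever exception the API would actually raise, and the doPrivileged wrapping mentioned above is left out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class MemModeGuard {

    /** Returns the extension bean, or throws when allocation figures cannot be produced. */
    static com.sun.management.ThreadMXBean requireAllocationSupport() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean sunBean = (com.sun.management.ThreadMXBean) bean;
            if (sunBean.isThreadAllocatedMemorySupported()) {
                return sunBean;
            }
        }
        throw new UnsupportedOperationException("thread allocated memory is not supported on this JVM");
    }

    /** Reads the current thread's allocated bytes, enabling monitoring only for the duration of the call. */
    static long currentThreadAllocatedBytes() {
        com.sun.management.ThreadMXBean sunBean = requireAllocationSupport();
        boolean wasEnabled = sunBean.isThreadAllocatedMemoryEnabled();
        try {
            if (wasEnabled == false) {
                sunBean.setThreadAllocatedMemoryEnabled(true);
            }
            return sunBean.getThreadAllocatedBytes(Thread.currentThread().getId());
        } finally {
            // restore the previous setting so the call has no lasting side effect
            if (wasEnabled == false) {
                sunBean.setThreadAllocatedMemoryEnabled(false);
            }
        }
    }
}
```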

@easyice easyice Sep 22, 2021

Contributor Author

Thanks for your suggestion, it is very necessary. I will fix the code.

Contributor

I think it would be better if we reported the amount of memory allocated per thread in bytes, and avoid reporting percentages of total memory used.

The call to getHeapMemoryUsage() will give us info on how much of the absolute overall heap is currently occupied by objects that are either alive or dead. However, the amount of calculated allocated memory per thread is only what the thread allocated for the last 'interval' we slept for, e.g. 500ms. The percentages reported will be inaccurate.

We could get more accurate in reporting the percentages, by capturing the memory usage before and after the call to Thread.sleep(interval.millis()). However, the JVM comes with many Garbage Collector(GC) technologies, some of the more recent ones are concurrent, which means that they perform at least part of the collection process while the application threads are running.

Essentially, any GC interference would make the reported before/after 'used memory' unstable. For accurate percentages we'd need to rely on no GC interference which isn't possible.
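
The per-interval figure discussed here is simply the difference between two readings of a thread's cumulative allocation counter. A minimal sketch, assuming a HotSpot-style JVM; AllocationDelta, allocatedDuring and measureBusyThread are hypothetical names, and the busy worker thread exists only to give the demo something to measure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class AllocationDelta {
    static volatile byte[] sink; // published so the JIT cannot elide the allocations

    /** Bytes allocated by the given thread while the caller sleeps for intervalMillis. */
    static long allocatedDuring(com.sun.management.ThreadMXBean bean, long threadId, long intervalMillis)
        throws InterruptedException {
        long before = bean.getThreadAllocatedBytes(threadId);
        Thread.sleep(intervalMillis);
        long after = bean.getThreadAllocatedBytes(threadId);
        // -1 means the thread is not alive or monitoring is off; report 0 rather than a bogus delta
        return (before < 0 || after < 0) ? 0 : after - before;
    }

    /** Demo: measures a worker that allocates in a loop. Returns -1 when the extension is unavailable. */
    static long measureBusyThread(long intervalMillis) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean == false) {
            return -1;
        }
        com.sun.management.ThreadMXBean sunBean = (com.sun.management.ThreadMXBean) bean;
        if (sunBean.isThreadAllocatedMemorySupported() == false || sunBean.isThreadAllocatedMemoryEnabled() == false) {
            return -1;
        }
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            started.countDown();
            while (Thread.currentThread().isInterrupted() == false) {
                sink = new byte[1024];
            }
        });
        worker.setDaemon(true);
        worker.start();
        started.await(); // make sure the worker is running before the first reading
        try {
            return allocatedDuring(sunBean, worker.getId(), intervalMillis);
        } finally {
            worker.interrupt();
        }
    }
}
```

Because the counter only ever grows, the delta is unaffected by garbage collection, which is why an absolute byte count is a steadier figure than a percentage of heap usage.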

Contributor

Small request here for more user friendly reporting. Would you consider reporting the bytes value with a print function that reduces the number of digits by applying byte units like KB, MB, GB? Also, I'd request a change of the wording 'usage by' to 'allocated by'. We can only tell how many bytes each thread has allocated, but we can't really tell how much memory they actually use or hold onto through reference chains, e.g. caches.

Contributor Author

That's a very good suggestion. Could I change the message to something like this: "xxxKB allocated by thread ...."?

Contributor

Yes, that's a great choice.
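
The agreed wording can be illustrated with a small formatter. This is a sketch with hypothetical names (AllocationFormat, humanBytes, headline); the unit labels and rounding are assumptions and do not reproduce the exact output of the merged change.

```java
import java.util.Locale;

final class AllocationFormat {
    private static final String[] UNITS = { "B", "KB", "MB", "GB", "TB" };

    /** Renders a byte count with a binary unit, e.g. 1536 becomes "1.5KB". Negative input is treated as 0. */
    static String humanBytes(long bytes) {
        double value = Math.max(0, bytes);
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        // whole bytes need no decimal; larger units keep one
        String pattern = unit == 0 ? "%.0f%s" : "%.1f%s";
        return String.format(Locale.ROOT, pattern, value, UNITS[unit]);
    }

    /** Builds a report headline using the 'allocated by' wording rather than 'usage by'. */
    static String headline(long bytes, String threadName) {
        return String.format(Locale.ROOT, "%s allocated by thread '%s'", humanBytes(bytes), threadName);
    }
}
```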

Contributor

I think your suggestion to use an approach similar to the 'ThreadInspector' to return the com.sun.management.ThreadMX is good. It would be better if it wasn't statically initialized so that we can return an error that we don't support this kind of report in an API error message, for clients that run Elasticsearch with other JVMs.

Contributor Author

I asked the JDK team at my company, and they gave me another way (reflection) that avoids importing the com.sun package. I think both approaches are good.

import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

import java.lang.management.ThreadMXBean;

public class Test {
    public static void main(String[] args) {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean(); 
        try { 
            Method getBytes = threadMXBean.getClass().getMethod("getThreadAllocatedBytes", long.class); 
            getBytes.setAccessible(true); 

            long threadId = Thread.currentThread().getId();
            long bytes = (long)getBytes.invoke(threadMXBean, threadId);
            System.out.println(bytes);
        } catch (Throwable e) { 
            System.out.println(e);
        }
    }
}

@grcevski grcevski Sep 22, 2021

Contributor

Both approaches are good, we import the com.sun packages in a few places already in Elasticsearch, so it's not unusual. We actually do something similar to the above (with reflection) to get access to the HotSpotDiagnosticMXBean class in JvmInfo.java here:

@SuppressWarnings("unchecked") Class<? extends PlatformManagedObject> clazz =

You can borrow some code from there if you'd like, HotSpotDiagnosticMXBean is in the same package as com.sun.management.ThreadMXBean.
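
One possible variation on the reflection approach is to look the method up on the public com.sun.management.ThreadMXBean interface instead of on the bean's implementation class. The assumption here is that on newer JDKs with strong encapsulation, reflecting on the internal implementation class and calling setAccessible may be rejected, while the exported interface stays reachable. ReflectiveAllocatedBytes is a hypothetical name, and returning 0 on every failure path is a design choice of this sketch, not a statement about JvmInfo.java or the merged code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.lang.reflect.Method;

final class ReflectiveAllocatedBytes {
    private static final String SUN_BEAN = "com.sun.management.ThreadMXBean";

    /** Returns the bytes allocated by the thread, or 0 when the figure is unavailable for any reason. */
    static long allocatedBytes(long threadId) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        try {
            // resolve the extension interface by name so there is no compile-time com.sun dependency
            Class<?> sunInterface = Class.forName(SUN_BEAN);
            if (sunInterface.isInstance(bean) == false) {
                return 0;
            }
            Method getBytes = sunInterface.getMethod("getThreadAllocatedBytes", long.class);
            long bytes = (long) getBytes.invoke(bean, threadId);
            // the JDK reports -1 for dead threads or disabled monitoring; never surface a negative value
            return Math.max(0, bytes);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(allocatedBytes(Thread.currentThread().getId()));
    }
}
```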

@easyice easyice force-pushed the hot_threads_with_mem branch from 3554d4a to e634f2c Compare September 24, 2021 03:46
easyice commented Sep 24, 2021

Contributor Author

@grcevski I have fixed the code per the review and added some tests to the HotThreadsTests class. It seems difficult to mock getThreadAllocatedBytes, so I didn't write a unit test in HotThreadsTests#testInnerDetect for the mem type. If you have other suggestions, I can continue to improve the code. Thanks!

@easyice easyice requested a review from grcevski September 26, 2021 00:44

@grcevski grcevski left a comment

Contributor

This looks great now. LGTM!

I left a few minor comments, if you could fix them, that would be great.

While reviewing the code I noticed a few spacing issues, we tend to put a space between ){ or before catch in } catch, as well as after a comma, e.g. , 2500L. If you can check to see if the style is met everywhere it would be really great.

Thanks for this contribution!

sb.append(String.format(Locale.ROOT, "%n%4.1f%% (%s out of %s) %s usage by thread '%s'%n",
percent, TimeValue.timeValueNanos(time), interval, type.getTypeValue(), threadName));

if (type.equals(ReportType.MEM)) {

Contributor

Small remark here, since type is an Enum type now, it would be better if we checked with type == ReportType.MEM, this way we can have the compiler do the checking for us.
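
A short illustration of the point about enum comparison. The ReportType enum below is a stand-in declared only for this sketch, and EnumComparison is a hypothetical class name.

```java
final class EnumComparison {
    // stand-in for the report type enum discussed in the review
    enum ReportType { CPU, WAIT, BLOCK, MEM }

    /** Identity comparison: safe when type is null, and type-checked at compile time. */
    static boolean isMem(ReportType type) {
        return type == ReportType.MEM;
    }

    /** equals() works for non-null values but throws NullPointerException when type is null. */
    static boolean isMemViaEquals(ReportType type) {
        return type.equals(ReportType.MEM);
    }
}
```

With ==, comparing a ReportType against an unrelated type such as a String is rejected by the compiler, whereas equals(Object) would accept it and silently return false.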


try {
long bytes = (long)getThreadAllocatedBytes.invoke(threadMXBean, id);
if (bytes < 0){

Contributor

I think it's a good idea to check for the value being less than 0. Since we return 0 in all error cases, can we please return Math.max(0, bytes) at line 81 here? Having some erroneous value that's negative would confuse our users.

@grcevski

Contributor

@elasticmachine test this please

easyice commented Sep 28, 2021

Contributor Author

@grcevski Thanks for the review, the comments are very helpful. I have fixed them:

  1. Reformatted the code; spaces have been added where needed
  2. Changed equals to ==
  3. Return 0 if getThreadAllocatedBytes fails

@grcevski

Contributor

@elasticmachine test this please

@grcevski

Contributor

@elasticmachine update branch

@grcevski

Contributor

@elasticmachine test this please

@grcevski grcevski left a comment

Contributor

Hi @easyice, I apologize for requesting more changes, but while doing my final review before merge I realized that we might log many warning messages in the logs. Can you simply remove the logging statements and turn the log.debug code into an assert?

I hope you don't mind, I also will spend some time tomorrow to see if we can find a way to unit test some more of the code. If I find a way I'll write some test suggestions.

Again, thank you for all this great work!

continue;
}
result.put(threadIds[i], new ThreadTimeAccumulator(threadInfos[i], cpuTime));
//put to result when getThreadAllocatedBytes return -1

Contributor

Can we please remove this comment? It doesn't seem like it's relevant anymore.

Contributor Author

ok 👌


public static boolean isThreadAllocatedMemorySupported() {
if (isThreadAllocatedMemorySupported == null) {
logger.warn("isThreadAllocatedMemorySupported is not available");

@grcevski grcevski Sep 28, 2021

Contributor

I think it's best if we remove the logger.warn messages. I just realized that getThreadAllocatedBytes is called for every thread in any report mode, which means that on a JVM that doesn't support this API we'll log these warnings in the Elasticsearch logs for as many threads as we have running. It's probably not the best.

Contributor Author

Yeah, it really is an issue. I will remove the log messages. If getThreadAllocatedBytes is not supported, the exception message seems enough:

ElasticsearchException("thread allocated memory is not supported on this JDK");

I also added a check so that getThreadAllocatedBytes is only called in mem mode. What do you think of this?


try {
long bytes = (long) getThreadAllocatedBytes.invoke(threadMXBean, id);
if (bytes < 0) {

Contributor

Would you please consider turning this into an assert statement instead of a runtime check with logger.debug?

Contributor Author

ok

easyice commented Sep 29, 2021

Contributor Author

@grcevski Thanks for the review again. The issues are fixed and I added some unit tests. Please review again, thank you so much!

@easyice easyice requested a review from grcevski September 29, 2021 09:00

@grcevski grcevski left a comment

Contributor

Hi @easyice,

These changes look amazing! I really like the tests you've added.

There's only one small adjustment I'd suggest, can you please pass SunThreadInfo as argument to the innerDetect and getAllValidThreadInfos, and then remove the newly added public method?

I think we should follow the pattern we already have for unit testing with the ThreadMXBean. If we exposed sunThreadInfo as public method, it would be available elsewhere in Elasticsearch and the API would be confusing.

private int threadElementsSnapshotCount = 10;
private ReportType type = ReportType.CPU;
private boolean ignoreIdleThreads = true;
private SunThreadInfo sunThreadInfo = new SunThreadInfo();

Contributor

We can remove this field if we use arguments to pass the SunThreadInfo object.

}

// Used for testing
public HotThreads sunThreadInfo(SunThreadInfo sunThreadInfo) {

Contributor

Please remove this method as it is public and will be confusing to users of the HotThreads class inside Elasticsearch.

}
//put to result when getThreadAllocatedBytes return -1
- long allocatedBytes = SunThreadInfo.getThreadAllocatedBytes(threadIds[i]);
+ long allocatedBytes = type == ReportType.MEM ? sunThreadInfo.getThreadAllocatedBytes(threadIds[i]) : 0;

Contributor

Add a SunThreadInfo sunThreadInfo argument to getAllValidThreadInfos right after ThreadMXBean to pass either the real sunThreadInfo or the mocked one.

}

- if (type == ReportType.MEM && SunThreadInfo.isThreadAllocatedMemorySupported() == false) {
+ if (type == ReportType.MEM && sunThreadInfo.isThreadAllocatedMemorySupported() == false) {

Contributor

Add an argument SunThreadInfo sunThreadInfo to innerDetect after ThreadMXBean to pass in the real sunThreadInfo from detect() or the mocked one during testing. This way we don't need to keep a reference of SunThreadInfo and expose a public or protected method on HotThreads.

Contributor Author

That's a good idea, I have changed the code
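
The pattern requested in this thread, passing the collaborator as an argument instead of exposing a public setter, might look roughly like the following. Everything here is a simplified, hypothetical stand-in: HotThreadsSketch, AllocationSource and the report string are not Elasticsearch classes or output, and the real methods take more parameters.

```java
final class HotThreadsSketch {
    enum ReportType { CPU, MEM }

    /** Hypothetical stand-in for the SunThreadInfo helper; tests can subclass it. */
    static class AllocationSource {
        boolean isSupported() {
            return true;
        }

        long allocatedBytes(long threadId) {
            return 0;
        }
    }

    /** Entry point: supplies the real collaborator, so callers never see it. */
    static String detect(ReportType type, long[] threadIds) {
        return innerDetect(type, threadIds, new AllocationSource());
    }

    /** Package-private seam for tests: the collaborator arrives as an argument, not a field. */
    static String innerDetect(ReportType type, long[] threadIds, AllocationSource source) {
        if (type == ReportType.MEM && source.isSupported() == false) {
            throw new UnsupportedOperationException("thread allocated memory is not supported on this JDK");
        }
        long total = 0;
        for (long id : threadIds) {
            // only query allocations in mem mode, so other modes never touch the extension
            total += type == ReportType.MEM ? source.allocatedBytes(id) : 0;
        }
        return total + " bytes allocated";
    }
}
```

A test can then call innerDetect with a stub source and assert on the result, without any public or protected hook on the class itself.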

@grcevski

Contributor

@elasticmachine test this please

@easyice easyice requested a review from grcevski September 29, 2021 15:09

@grcevski grcevski left a comment

Contributor

LGTM! Thanks again!

@grcevski grcevski added auto-backport Automatically create backport pull requests when merged v7.16.0 v8.0.0 >enhancement labels Sep 29, 2021
@grcevski grcevski merged commit 5ff82e2 into elastic:master Sep 29, 2021
grcevski pushed a commit to grcevski/elasticsearch that referenced this pull request Sep 29, 2021
Add new HotThreads report type to capture allocated memory
per Elasticsearch thread.
grcevski added a commit that referenced this pull request Sep 29, 2021
Backport of #72850

Add new HotThreads report type to capture allocated memory
per Elasticsearch thread.

Co-authored-by: zhangchao <80152403@qq.com>

Labels

auto-backport Automatically create backport pull requests when merged :Core/Infra/Core Core issues without another label >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Core/Infra Meta label for core/infra team v7.16.0 v8.0.0-beta1

Development

Successfully merging this pull request may close these issues.

Add Per thread memory allocations in hot_threads API

10 participants