Support mem type in nodes hot_threads API#72850
Conversation
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
|
@easyice I'm sorry for our delay in response. We meant to discuss this issue in a recent meeting, but it slipped off the agenda. We will have a team discussion about it soon, and if it's something the wider team believes to be useful, we'll have someone do a PR review. Thank you for your contribution! |
|
@williamrandolph Thank you for reply, It is not urgent |
|
While we like the idea here, and would find it useful in many cases (for example, in debugging ingest processors), we have some concerns about the implementation, and specifically about the use of the First, the class itself has a warning in its Javadoc: "Platform-specific management interface for the thread system of the Java virtual machine. This platform extension is only available to a thread implementation that supports this extension." Do you know anything about which platforms support this extension? In which cases would you expect this not to work? Second, we know that future versions of the JDK will remove access to certain internal JDK APIs. See JEP 396 and JEP 403. Will those changes affect this code? Third, there is a |
|
Thanks for @williamrandolph, this extension seem possible will be removed, but it is not clear if it will be implemented in any other way
|
grcevski
left a comment
There was a problem hiding this comment.
I think this is a very good idea and as far as I can tell the API is here to stay for now. It's in the Java 17 LTS release and with proper checks in place we can return an unsupported message to the users, in case the API is removed from the JDK in the future.
I have requested some changes to make the report more clear and user friendly. Please let me know if some of the comments don't make sense, of if you have any questions. We also have unit tests for HotThreads now, it would be great to extend those for the new memory mode.
Thanks!
There was a problem hiding this comment.
I think it's best to check if isThreadAllocatedMemorySupported and isThreadAllocatedMemoryEnabled and then throw a similar 'unsupported exception' when using this API in memory mode, if they aren't both on. This way if we can't produce meaningful information for the end user, we'd be letting them know that know we can't.
Alternatively, If monitoring of thread allocations is supported, but it's not enabled, we could enable/disable it in try/finally inside this method, but this will require that we wrap the call to setThreadAllocatedMemoryEnabled inside doPriviledged call. Please see: #77935 for reference.
Both approaches are good, if allocation monitoring is supported it's typically on by default.
There was a problem hiding this comment.
Thanks for you suggestion, it is very necessary, i will fix the code
There was a problem hiding this comment.
I think it would be better if we reported the amount of memory allocated per thread in bytes, and avoid reporting percentages of total memory used.
The call to getHeapMemoryUsage() will give us info on how much of the absolute overall heap is currently occupied by objects that are either alive or dead. However, the amount of calculated allocated memory per thread is only what the thread allocated for the last 'interval' we slept for, e.g. 500ms. The percentages reported will be inaccurate.
We could get more accurate in reporting the percentages, by capturing the memory usage before and after the call to Thread.sleep(interval.millis()). However, the JVM comes with many Garbage Collector(GC) technologies, some of the more recent ones are concurrent, which means that they perform at least part of the collection process while the application threads are running.
Essentially, any GC interference would make the reported before/after 'used memory' unstable. For accurate percentages we'd need to rely on no GC interference which isn't possible.
There was a problem hiding this comment.
Small request here for more user friendly reporting. Would you consider reporting the bytes value with a print function that reduces the number of digits by applying byte units like KB, MB, GB? Also, I'd request a change of the wording 'usage by' to 'allocated by'. We can only tell how many bytes each thread has allocated, but we can't really tell how much memory they actually use or hold onto through reference chains, e.g. caches.
There was a problem hiding this comment.
That's a very good suggestion, Could i change the message to something like this : "xxxKB allocated by thread ...." ?
There was a problem hiding this comment.
Yes, that's a great choice.
There was a problem hiding this comment.
I think your suggestion to use an approach similar to the 'ThreadInspector' to return the com.sun.management.ThreadMX is good. It would be better if it wasn't statically initialized so that we can return an error that we don't support this kind of report in an API error message, for clients that run Elasticsearch with other JVMs.
There was a problem hiding this comment.
I asked for my JDK team in my company, they give me an other way (reflection) to avoid import com.sun package, i think both approaches are good
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;
import java.lang.management.ThreadMXBean;
public class Test {
public static void main(String[] args) {
ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
try {
Method getBytes = threadMXBean.getClass().getMethod("getThreadAllocatedBytes", long.class);
getBytes.setAccessible(true);
long threadId = Thread.currentThread().getId();
long bytes = (long)getBytes.invoke(threadMXBean, threadId);
System.out.println(bytes);
} catch (Throwable e) {
System.out.println(e);
}
}
}
There was a problem hiding this comment.
Both approaches are good, we import the com.sun packages in a few places already in Elasticsearch, so it's not unusual. We actually do something similar to the above (with reflection) to get access to the HotSpotDiagnosticMXBean class in JvmInfo.java here:
You can borrow some code from there if you'd like, HotSpotDiagnosticMXBean is in the same package as com.sun.management.ThreadMXBean.
3554d4a to
e634f2c
Compare
|
@grcevski I had fix the codes for review, and add some Test in HotThreadsTests class . but it seems difficult to mock |
grcevski
left a comment
There was a problem hiding this comment.
This looks great now. LGTM!
I left few minor comments, if you could fix them, that would be great.
While reviewing the code I noticed a few spacing issues, we tend to put a space between ){ or before catch in } catch, as well as after a comma, e.g. , 2500L. If you can check to see if the style is met everywhere it would be really great.
Thanks for this contribution!
| sb.append(String.format(Locale.ROOT, "%n%4.1f%% (%s out of %s) %s usage by thread '%s'%n", | ||
| percent, TimeValue.timeValueNanos(time), interval, type.getTypeValue(), threadName)); | ||
|
|
||
| if (type.equals(ReportType.MEM)) { |
There was a problem hiding this comment.
Small remark here, since type is an Enum type now, it would be better if we checked with type == ReportType.MEM, this way we can have the compiler do the checking for us.
|
|
||
| try { | ||
| long bytes = (long)getThreadAllocatedBytes.invoke(threadMXBean, id); | ||
| if (bytes < 0){ |
There was a problem hiding this comment.
I think it's a good idea to check for the value being less than 0. Since we return 0 in all error cases, can we please return Math.max(0, bytes) at line 81 here? Having some erroneous value that's negative would confuse our users.
|
@elasticmachine test this please |
|
@grcevski Thanks for review, the comments is very helpful, I had fixed them:
|
|
@elasticmachine test this please |
|
@elasticmachine update branch |
|
@elasticmachine test this please |
grcevski
left a comment
There was a problem hiding this comment.
Hi @easyice, I apologize for requesting more changes, but while doing my final review before merge I realized that we might log many warning messages in the logs. Can you simply remove the logging statements and turn the log.debug code into an assert?
I hope you don't mind, I also will spend some time tomorrow to see if we can find a way to unit test some more of the code. If I find a way I'll write some test suggestions.
Again, thank you for all this great work!
| continue; | ||
| } | ||
| result.put(threadIds[i], new ThreadTimeAccumulator(threadInfos[i], cpuTime)); | ||
| //put to result when getThreadAllocatedBytes return -1 |
There was a problem hiding this comment.
Can we please remove this comment? It doesn't seem like it's relevant anymore.
|
|
||
| public static boolean isThreadAllocatedMemorySupported() { | ||
| if (isThreadAllocatedMemorySupported == null) { | ||
| logger.warn("isThreadAllocatedMemorySupported is not available"); |
There was a problem hiding this comment.
I think it's best if we remove the logger.warn messages. I just realized that getThreadAllocatedBytes is called for every thread in any report mode, which means that on a JVM that doesn't support this API we'll log these warnings in the Elasticsearch logs for as many threads as we have running. It's probably not the best.
There was a problem hiding this comment.
Yeah, It is really a issue, i will remove the log message, if getThreadAllocatedBytes is not support, the exception message seems enough:
ElasticsearchException("thread allocated memory is not supported on this JDK");
I also add a judgement, only call getThreadAllocatedBytes in mem mode, what do you think of this?
|
|
||
| try { | ||
| long bytes = (long) getThreadAllocatedBytes.invoke(threadMXBean, id); | ||
| if (bytes < 0) { |
There was a problem hiding this comment.
Would you please consider turning this into an assert statement instead of a runtime check with logger.debug?
|
@grcevski Thanks for review again, the issues is fixed, and added some unit test , please review again, thank you so much! |
grcevski
left a comment
There was a problem hiding this comment.
Hi @easyice,
These changes look amazing! I really like the tests you've added.
There's only one small adjustment I'd suggest, can you please pass SunThreadInfo as argument to the innerDetect and getAllValidThreadInfos, and then remove the newly added public method?
I think we should follow the pattern we already have for unit testing with the ThreadMXBean. If we exposed sunThreadInfo as public method, it would be available elsewhere in Elasticsearch and the API would be confusing.
| private int threadElementsSnapshotCount = 10; | ||
| private ReportType type = ReportType.CPU; | ||
| private boolean ignoreIdleThreads = true; | ||
| private SunThreadInfo sunThreadInfo = new SunThreadInfo(); |
There was a problem hiding this comment.
We can remove this field if we use arguments to pass the SunThreadInfo object.
| } | ||
|
|
||
| // Used for testing | ||
| public HotThreads sunThreadInfo(SunThreadInfo sunThreadInfo) { |
There was a problem hiding this comment.
Please remove this method as it is public and will be confusing to users of the HotThreads class inside Elasticsearch.
| } | ||
| //put to result when getThreadAllocatedBytes return -1 | ||
| long allocatedBytes = SunThreadInfo.getThreadAllocatedBytes(threadIds[i]); | ||
| long allocatedBytes = type == ReportType.MEM ? sunThreadInfo.getThreadAllocatedBytes(threadIds[i]) : 0; |
There was a problem hiding this comment.
Add an SunThreadInfo sunThreadInfo argument to getAllValidThreadInfos right after ThreadMxBean to pass either the real sunThreadInfo or the mocked one.
| } | ||
|
|
||
| if (type == ReportType.MEM && SunThreadInfo.isThreadAllocatedMemorySupported() == false) { | ||
| if (type == ReportType.MEM && sunThreadInfo.isThreadAllocatedMemorySupported() == false) { |
There was a problem hiding this comment.
Add an argument SunThreadInfo sunThreadInfo to innerDetect after ThreadMXBean to pass in the real sunThreadInfo from detect() or the mocked one during testing. This way we don't need to keep a reference of SunThreadInfo and expose a public or protected method on HotThreads.
There was a problem hiding this comment.
That's a good idea, I have changed the code
|
@elasticmachine test this please |
Add new HotThreads report type to capture allocated memory per Elasticsearch thread.
from #70345
this PR add memory type in hot_threads API, this can be help us find out which thread allocated many memory
Closes #70345