A Tricky ANR Interview Question: An In-Depth Look at ANRs Caused by a Crash

张开发
2026/4/14 2:36:30, 15 min read


Analyzing an ANR caused by a crash

Log analysis

The ANR was raised because a MotionEvent was never finished on the InputDispatcher side:

```
01-12 12:01:15.435   522  2179 I am_anr  : [0,2142,com.example.linjw.dagger2demo,818462534,Input dispatching timed out (3c73418 com.example.linjw.dagger2demo/com.example.linjw.dagger2demo.SearchActivity (server) is not responding. Waited 5005ms for MotionEvent(deviceId=4, eventTime=162728390000, source=TOUCHSCREEN | STYLUS | BLUETOOTH_STYLUS, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00000000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=22.8, yPrecision=11.1, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (1011.9, 359.0)]), policyFlags=0x62000000)]
```

The log shows that a MotionEvent sent to process 2142 had still not finished after 5 s. The natural first guess is that the main thread was stuck, so let's check what the main thread of 2142 was actually doing. The log shows that more than 5 seconds before the ANR, the main thread of 2142 had already hit a NullPointerException and printed "Shutting down VM".

So what does this NullPointerException crash have to do with the ANR? To answer that we need to look at the dispatch flow. The rule is simple: if InputDispatcher sends an event and does not receive a finish from the app within 5 s, an ANR is raised. Let's first walk through how an app receives an event in the normal case.

Normal dispatch flow

What we see here is only the Java-side stack. The call actually originates on the native side, in frameworks/base/core/jni/android_view_InputEventReceiver.cpp, and lands in frameworks/base/core/java/android/view/InputEventReceiver.java:

```java
// Called from native code.
@SuppressWarnings("unused")
@UnsupportedAppUsage(maxTargetSdk = Build.VERSION_CODES.R, trackingBug = 170729553)
private void dispatchInputEvent(int seq, InputEvent event) {
    mSeqMap.put(event.getSequenceNumber(), seq);
    onInputEvent(event);
}
```

onInputEvent has a default implementation, and that default simply calls finishInputEvent. finishInputEvent eventually reaches InputConsumer::sendFinishedSignal, which delivers the finish to InputDispatcher.

Which subclass actually handles the event? It ends up in WindowInputEventReceiver inside ViewRootImpl (frameworks/base/core/java/android/view/ViewRootImpl.java). Once handling completes, finishInputEvent is called.

But what if, as in the demo, an exception is thrown during dispatch and finishInputEvent is never reached? Does that mean the finish can never be sent? To answer that we have to dig into the native code.

Source analysis after the exception

Once the exception occurs, is the finish really never sent? In current versions, yes. When an exception is detected, skipCallbacks is set to true, but nothing in the current code calls sendFinishedSignal when skipCallbacks is true. Older versions did:

```
commit 3bdcdd8531781569d501e7023c22e25e2bae0dd1
Author: Jeff Brown <jeffbrown@google.com>
Date:   Tue Apr 10 20:36:07 2012 -0700

    Be more careful about exceptions in input callbacks.

    consumeEvents() may be called reentrantly so we need to be
    careful when handling exceptions. When called directly through JNI,
    the exception should be allowed to bubble up to the caller. When
    called from a Looper callback, the exception should be recorded
    on the MessageQueue and bubbled when the call to nativePollOnce()
    returns.

    Bug: 6312938
    Change-Id: Ief5e315802f586aa85af7eef1bd6e9bea4ce24ab
```

```diff
diff --git a/core/jni/android_view_InputEventReceiver.cpp b/core/jni/android_view_InputEventReceiver.cpp
index 348437d0f34a..8f6f5f4966a3 100644
--- a/core/jni/android_view_InputEventReceiver.cpp
+++ b/core/jni/android_view_InputEventReceiver.cpp
@@ -52,8 +52,7 @@ public:
     status_t initialize();
     status_t finishInputEvent(uint32_t seq, bool handled);
-    status_t consumeEvents(bool consumeBatches);
-    static int handleReceiveCallback(int receiveFd, int events, void* data);
+    status_t consumeEvents(JNIEnv* env, bool consumeBatches);
 
 protected:
     virtual ~NativeInputEventReceiver();
@@ -68,6 +67,8 @@ private:
     const char* getInputChannelName() {
         return mInputConsumer.getChannel()->getName().string();
     }
+
+    static int handleReceiveCallback(int receiveFd, int events, void* data);
 };
 
@@ -128,11 +129,13 @@ int NativeInputEventReceiver::handleReceiveCallback(int receiveFd, int events, v
         return 1;
     }
 
-    status_t status = r->consumeEvents(false /*consumeBatches*/);
+    JNIEnv* env = AndroidRuntime::getJNIEnv();
+    status_t status = r->consumeEvents(env, false /*consumeBatches*/);
+    r->mMessageQueue->raiseAndClearException(env, "handleReceiveCallback");
     return status == OK || status == NO_MEMORY ? 1 : 0;
 }
 
-status_t NativeInputEventReceiver::consumeEvents(bool consumeBatches) {
+status_t NativeInputEventReceiver::consumeEvents(JNIEnv* env, bool consumeBatches) {
 #if DEBUG_DISPATCH_CYCLE
     ALOGD("channel '%s' ~ Consuming input events, consumeBatches=%s.", getInputChannelName(),
             consumeBatches ? "true" : "false");
@@ -142,7 +145,7 @@ status_t NativeInputEventReceiver::consumeEvents(bool consumeBatches) {
         mBatchedInputEventPending = false;
     }
 
-    JNIEnv* env = AndroidRuntime::getJNIEnv();
+    bool skipCallbacks = false;
     // (part omitted)
-            if (!inputEventObj) {
-                ALOGW("channel '%s' ~ Failed to obtain event object.", getInputChannelName());
-                mInputConsumer.sendFinishedSignal(seq, false);
-                continue;
-            }
             default:
                 assert(false); // InputConsumer should prevent this from ever happening
                 inputEventObj = NULL;
             }
+            if (inputEventObj) {
 #if DEBUG_DISPATCH_CYCLE
-            ALOGD("channel '%s' ~ Dispatching input event.", getInputChannelName());
+                ALOGD("channel '%s' ~ Dispatching input event.", getInputChannelName());
 #endif
-            env->CallVoidMethod(mReceiverObjGlobal,
-                    gInputEventReceiverClassInfo.dispatchInputEvent, seq, inputEventObj);
-
-            env->DeleteLocalRef(inputEventObj);
+                env->CallVoidMethod(mReceiverObjGlobal,
+                        gInputEventReceiverClassInfo.dispatchInputEvent, seq, inputEventObj);
+                if (env->ExceptionCheck()) {
+                    ALOGE("Exception dispatching input event.");
+                    skipCallbacks = true;
+                }
+            } else {
+                ALOGW("channel '%s' ~ Failed to obtain event object.", getInputChannelName());
+                skipCallbacks = true;
+            }
+        }
-            if (mMessageQueue->raiseAndClearException(env, "dispatchInputEvent")) {
+        if (skipCallbacks) {
             mInputConsumer.sendFinishedSignal(seq, false);
         }
     }
```

This commit does call sendFinishedSignal when skipCallbacks is set, yet the current code no longer does. Why? After some digging, it turns out that the following commit from Xiaomi removed the sendFinishedSignal call for the skipCallbacks == true case:

```
commit a6c3e088c64c2ccb7bf68a013a026af386f462a5
Author: chenxinyu <chenxinyu7@xiaomi.com>
Date:   Tue Nov 16 20:42:13 2021 +0800

    Delete skipCallbacks when Exception dispatchInputEvent
    beacuse calling finishInputEvent twice will cause Native Crash

    If there is an exception, finishInputEvent method will be called,
    then NativeInputEventReceiver also send finish signal,
    will cause a native crash,
    Abort message: Could not find consume time for seq=xxxx

    [1] https://cs.android.com/android/platform/superproject/+/master:frameworks/base/core/jni/android_view_InputEventReceiver.cpp;l=441?q=InputEventRe&ss=android%2Fplatform%2Fsuperproject:frameworks%2F
    [2] https://cs.android.com/android/platform/superproject/+/master:frameworks/native/libs/input/InputTransport.cpp;l=1259?q=InputTRAN&ss=android%2Fplatform%2Fsuperproject:frameworks%2F

    Signed-off-by: chenxinyu <chenxinyu7@xiaomi.com>
    Change-Id: Ib834e2a960741f7fa33a0661c67f305af0db517a
    Merged-In: Ib834e2a960741f7fa33a0661c67f305af0db517a
```

```diff
diff --git a/core/jni/android_view_InputEventReceiver.cpp b/core/jni/android_view_InputEventReceiver.cpp
index a699f912806d..7d0f60adeb5c 100644
--- a/core/jni/android_view_InputEventReceiver.cpp
+++ b/core/jni/android_view_InputEventReceiver.cpp
@@ -447,10 +447,6 @@ status_t NativeInputEventReceiver::consumeEvents(JNIEnv* env,
                     skipCallbacks = true;
                 }
             }
-
-            if (skipCallbacks) {
-                mInputConsumer.sendFinishedSignal(seq, false);
-            }
         }
     }
```

The Xiaomi commit is clearly a fix for a native crash: in some scenarios sendFinishedSignal was called twice for the same event, and the second call crashed. The relevant source is in frameworks/native/libs/input/InputTransport.cpp:

```cpp
status_t InputConsumer::sendUnchainedFinishedSignal(uint32_t seq, bool handled) {
    InputMessage msg;
    msg.header.type = InputMessage::Type::FINISHED;
    msg.header.seq = seq;
    msg.body.finished.handled = handled;
    // This getConsumeTime call is where the native crash comes from.
    msg.body.finished.consumeTime = getConsumeTime(seq);
    status_t result = mChannel->sendMessage(&msg);
    if (result == OK) {
        // Remove the consume time if the socket write succeeded. We will not need to ack this
        // message anymore. If the socket write did not succeed, we will try again and will still
        // need consume time.
        popConsumeTime(seq);
    }
    return result;
}

// This is the function that aborts the process.
nsecs_t InputConsumer::getConsumeTime(uint32_t seq) const {
    auto it = mConsumeTimes.find(seq);
    // Consume time will be missing if either finishInputEvent is called twice, or if it was
    // called for the wrong (synthetic?) input event. Either way, it is a bug that should be fixed.
    LOG_ALWAYS_FATAL_IF(it == mConsumeTimes.end(), "Could not find consume time for seq=%" PRIu32,
                        seq);
    return it->second;
}
```

So is the ANR really a side effect of Xiaomi's crash fix? If we revert the Xiaomi commit, does the ANR go away?

We reverted the commit and tested. The event that triggered the exception does get its finish sent to InputDispatcher, and if the screen is not touched again, no ANR occurs. But as soon as the app receives another touch event, the ANR happens anyway. Why? The log already gives the answer: the exception shut down the VM.

```
01-12 11:25:51.469  2627  2627 D AndroidRuntime: Shutting down VM
01-12 11:25:51.474  2627  2627 E AndroidRuntime: FATAL EXCEPTION: main
01-12 11:25:51.474  2627  2627 E AndroidRuntime: Process: com.example.linjw.dagger2demo, PID: 2627
```

After that, the main thread can no longer run and receive input events. A Perfetto trace of the threads shows the same thing: right after the exception the main thread goes straight into DetachCurrentThread, so it never receives another event.

To summarize: once the exception is thrown, the main thread cannot keep running. Even if we restore the old behavior of calling sendFinishedSignal when skipCallbacks is true, only the current event is finished normally and the ANR is merely avoided for the moment. The main thread has already stopped, so any new event cannot be received and will still end in an ANR. Strictly speaking, then, the Xiaomi change does not affect the final ANR outcome.

Thoughts

Is there a better way to make this change? Now that the background of the Xiaomi commit is clear, here is a personal opinion.

The root problem is that sendFinishedSignal is called twice. The first call removes the consume time for that seq, so the second call fails to find it in getConsumeTime, and the fatal error seen in the log is raised deliberately. The fix could therefore live inside sendFinishedSignal itself: report the problem through the return value or an error log, let the call fail without sending anything to InputDispatcher, and resolve the duplicate gently instead of crashing natively the moment a second call is detected.

The fix also should not rely on making sure callers never invoke it twice, because as the code evolves it is hard to guarantee that no path will ever do so. Moreover, the sendFinishedSignal call in android_view_InputEventReceiver.cpp is a last-resort cleanup for the exception case: even when the app throws, it can still tell InputDispatcher that the event is finished, so no ANR is raised for that event.

Original article: https://mp.weixin.qq.com/s/C4WUkXVFhAhLs4gDin1CpA
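The gentler handling suggested in the closing section (reject a duplicate finish through a return value and a log line, rather than aborting) can be sketched in isolation. This is a minimal standalone model, not AOSP code: `FinishTracker`, `FinishResult`, `recordConsume` and `sentCount` are hypothetical names introduced only for illustration, and the actual channel write is replaced by a counter.

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Hypothetical result type: the caller learns whether the finish was sent.
enum class FinishResult { kSent, kUnknownSeq };

// Hypothetical stand-in for the consume-time bookkeeping kept per input channel.
class FinishTracker {
public:
    // Remember when the event with this sequence number was consumed.
    void recordConsume(uint32_t seq, int64_t consumeTimeNs) {
        mConsumeTimes[seq] = consumeTimeNs;
    }

    // Tolerant variant: a finish for an unknown or already-finished seq is
    // logged and rejected instead of aborting the process.
    FinishResult sendFinishedSignal(uint32_t seq, bool handled) {
        auto it = mConsumeTimes.find(seq);
        if (it == mConsumeTimes.end()) {
            std::fprintf(stderr, "finish ignored: no consume time for seq=%u\n",
                         static_cast<unsigned>(seq));
            return FinishResult::kUnknownSeq;
        }
        // Assumption: a real implementation would write the FINISHED message
        // (including `handled` and the consume time) to the channel here.
        (void)handled;
        mConsumeTimes.erase(it);  // the first successful finish removes the entry
        ++mSentCount;
        return FinishResult::kSent;
    }

    // Number of finishes that were actually "sent" in this model.
    int sentCount() const { return mSentCount; }

private:
    std::unordered_map<uint32_t, int64_t> mConsumeTimes;
    int mSentCount = 0;
};
```

In this model, a Java-side finishInputEvent followed by a native fallback finish for the same seq yields `kSent` and then `kUnknownSeq`, so exactly one finish goes out and nothing aborts. As the article concludes, this only affects the event that threw; it does not revive a main thread that has already shut down.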
