Java/Native Crash and ANR Handling Flows in Android

1. Java crash handling flow in Android

Reference: "Android8.0 系统异常处理流程" (Android 8.0 system exception handling flow), 此男子淡漠, CSDN blog

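Java handles uncaught exceptions through Thread.UncaughtExceptionHandler, and Android implements that interface for the same purpose. The default system handler is installed in RuntimeInit.commonInit(), which runs when the SystemServer process starts (the same method also runs in app processes forked from Zygote).

When Zygote starts SystemServer it calls ZygoteInit.forkSystemServer(), which calls handleSystemServerProcess() to set up the new process, and that chain eventually reaches RuntimeInit.commonInit():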

frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

protected static final void commonInit() {

Thread.setUncaughtExceptionPreHandler(new LoggingHandler());

// Install the default uncaught-exception handler: KillApplicationHandler

Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler());

...

}
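To see the mechanism in isolation, here is a minimal plain-Java sketch (not Android code) of installing a process-wide default handler. The class name UncaughtHandlerDemo and the helper runCrashingThread are made up for illustration; unlike KillApplicationHandler, the handler here only records the exception message instead of reporting to AMS and killing the process.

```java
import java.util.concurrent.atomic.AtomicReference;

final class UncaughtHandlerDemo {
    /** Runs a thread that throws and returns the message seen by the default handler. */
    static String runCrashingThread(String message) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // Install a process-wide fallback handler, similar in spirit to what commonInit() does.
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> seen.set(error.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException(message);
            });
            worker.start();
            // The handler runs on the dying thread, so it has finished once join() returns.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Restore whatever handler was installed before the demo.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runCrashingThread("boom"));
    }
}
```

The handler is invoked on the thread that threw, which is why KillApplicationHandler can safely kill its own process from inside uncaughtException().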

KillApplicationHandler is defined in the same file:

frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {

public void uncaughtException(Thread t, Throwable e) {

try {

...

// 1. mApplicationObject identifies the current application

ActivityManager.getService().handleApplicationCrash(

mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));

} ...

finally {

// Whatever happens, make sure the crashed process does not survive

Process.killProcess(Process.myPid());

System.exit(10);

}

Note: ActivityManager.getService() returns a Binder proxy for ActivityManagerService (AMS), so the call crosses into system_server. Next, see how AMS handles it in handleApplicationCrash():

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

public void handleApplicationCrash(IBinder app,

ApplicationErrorReport.ParcelableCrashInfo crashInfo) {

ProcessRecord r = findAppProcess(app, "Crash");

final String processName = app == null ? "system_server"

: (r == null ? "unknown" : r.processName);

handleApplicationCrashInner("crash", r, processName, crashInfo);

}

void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,

ApplicationErrorReport.CrashInfo crashInfo) {

// 1. Write the crash information to the event log

EventLog.writeEvent(EventLogTags.AM_CRASH, Binder.getCallingPid(),

UserHandle.getUserId(Binder.getCallingUid()), processName,

r == null ? -1 : r.info.flags,

crashInfo.exceptionClassName,

crashInfo.exceptionMessage,

crashInfo.throwFileName,

crashInfo.throwLineNumber);

addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);

// 2.

mAppErrors.crashApplication(r, crashInfo);

}

Notes: comment 1 records the crash in the event log; comment 2 calls AppErrors.crashApplication().

frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {

final int callingPid = Binder.getCallingPid();

final int callingUid = Binder.getCallingUid();

final long origId = Binder.clearCallingIdentity();

try {

// Delegate to crashApplicationInner

crashApplicationInner(r, crashInfo, callingPid, callingUid);

} finally {

Binder.restoreCallingIdentity(origId);

}

Next, crashApplicationInner():

frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,

int callingPid, int callingUid) {

...

synchronized (mService) {

// 1. If an IActivityController is set and it has handled the error, no dialog is shown

if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,

timeMillis, callingPid, callingUid)) {

return;

}

...

AppErrorDialog.Data data = new AppErrorDialog.Data();

data.result = result;

data.proc = r;

...

// 2. Send SHOW_ERROR_UI_MSG to AMS's mUiHandler, which shows an error dialog telling the user that the process crashed

final Message msg = Message.obtain();

msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;

task = data.task;

msg.obj = data;

mService.mUiHandler.sendMessage(msg);

}

// 3. AppErrorResult.get() blocks in wait(); once the user dismisses the dialog, AppErrorResult.set() calls notifyAll() to wake this thread.

// Two threads are involved: crashApplicationInner runs on the Binder thread that received the call; the dialog runs on AMS's UI thread

int res = result.get();

Intent appErrorIntent = null;

MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_CRASH, res);

// 4. Act on the result of the user's choice

if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {

res = AppErrorDialog.FORCE_QUIT;

}

synchronized (mService) {

// Stop showing crash dialogs for this app

if (res == AppErrorDialog.MUTE) {

stopReportingCrashesLocked(r);

}

// Try to restart the process

if (res == AppErrorDialog.RESTART) {

mService.removeProcessLocked(r, false, true, "crash");

if (task != null) {

try {

mService.startActivityFromRecents(task.taskId,

ActivityOptions.makeBasic().toBundle());

} ...

}

// Force-stop the process

if (res == AppErrorDialog.FORCE_QUIT) {

long orig = Binder.clearCallingIdentity();

try {

// Kill it with fire!

mService.mStackSupervisor.handleAppCrashLocked(r);

if (!r.persistent) {

mService.removeProcessLocked(r, false, false, "crash");

mService.mStackSupervisor.resumeFocusedStackTopActivityLocked();

}

} finally {

Binder.restoreCallingIdentity(orig);

}

// Stop the process and report the error

if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {

appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);

}

...

}

if (appErrorIntent != null) {

try {

// Launch the error-report UI

mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));

} catch (ActivityNotFoundException e) {

Slog.w(TAG, "bug report receiver dissappeared", e);

}

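Notes: at comment 1, any crash observer gets the first chance to handle the crash. The observer is registered through AMS's setActivityController(); if it handles the crash, no error dialog is shown. At comment 2, SHOW_ERROR_UI_MSG is sent to AMS's mUiHandler to request the error dialog. At comment 3, AppErrorResult.get() blocks the calling thread. Two threads are involved here: crashApplicationInner runs on the Binder thread that received the call, while the dialog is shown on AMS's UI thread. AppErrorResult is covered in more detail later. Once the user acts on the dialog, or the dialog times out, get() wakes up and returns the result. At comment 4, the result determines what happens next, for example force-stopping or restarting the process.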
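The result handling at comment 4 can be summarized with a small sketch. The enum and the describe() helper below are hypothetical stand-ins for the AppErrorDialog integer constants, and the returned strings only paraphrase the branches in the listing; this is not the AMS implementation.

```java
final class CrashDialogDecision {
    /** Hypothetical stand-in for the AppErrorDialog result constants. */
    enum Result { FORCE_QUIT, FORCE_QUIT_AND_REPORT, RESTART, MUTE, TIMEOUT, CANCEL }

    /** A timeout or a cancelled dialog is treated exactly like "force quit". */
    static Result normalize(Result raw) {
        return (raw == Result.TIMEOUT || raw == Result.CANCEL) ? Result.FORCE_QUIT : raw;
    }

    /** Paraphrases which branch of crashApplicationInner would run for a result. */
    static String describe(Result raw) {
        switch (normalize(raw)) {
            case MUTE:
                return "stop reporting crashes for this app";
            case RESTART:
                return "remove the process and relaunch its task from recents";
            case FORCE_QUIT:
                return "clean up the crashed app and remove the process";
            case FORCE_QUIT_AND_REPORT:
                return "build the error-report intent and start the report UI";
            default:
                return "no action";
        }
    }

    public static void main(String[] args) {
        for (Result r : Result.values()) {
            System.out.println(r + " -> " + describe(r));
        }
    }
}
```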

Showing the crash dialog and handling the user's choice

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

final class UiHandler extends Handler {

@Override

public void handleMessage(Message msg) {

switch (msg.what) {

// Show the crash dialog

case SHOW_ERROR_UI_MSG: {

mAppErrors.handleShowAppErrorUi(msg);

ensureBootCompleted();

} break;

// Show the ANR dialog

case SHOW_NOT_RESPONDING_UI_MSG: {

mAppErrors.handleShowAnrUi(msg);

ensureBootCompleted();

} break;

...

}

UiHandler handles both the crash dialog and the ANR dialog. This section follows the crash dialog, which is again handled by the AppErrors class:

frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void handleShowAppErrorUi(Message msg) {

...

synchronized (mService) {

ProcessRecord proc = data.proc;

AppErrorResult res = data.result;

// 1. A crash dialog is already showing for this process, so do not show another

if (proc != null && proc.crashDialog != null) {

if (res != null) {

res.set(AppErrorDialog.ALREADY_SHOWING);

}

return;

}

...

final boolean crashSilenced = mAppsNotReportingCrashes != null &&

mAppsNotReportingCrashes.contains(proc.info.packageName);

if ((mService.canShowErrorDialogs() || showBackground) && !crashSilenced) {

// 2. Create the crash dialog

proc.crashDialog = new AppErrorDialog(mContext, mService, data);

} else {

// 3. AMS forbids error dialogs or the device is asleep, so no dialog is shown

if (res != null) {

res.set(AppErrorDialog.CANT_SHOW);

}

// 4. Call Dialog.show() to display the crash dialog

if(data.proc.crashDialog != null) {

data.proc.crashDialog.show();

}

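Notes: comment 1 checks whether a crash dialog is already showing for the crashed process; if so, nothing more is shown. At comment 2, the dialog is created when the device is awake and AMS allows crash dialogs; otherwise comment 3 reports that the dialog cannot be shown. Comment 4 is reached when a dialog was created, so Dialog.show() is called directly. The res.set() calls at comments 1 and 3 operate on the AppErrorResult created in crashApplicationInner. That method blocked in result.get() after asking AMS to show the dialog; calling set() wakes the Binder thread, which continues and evaluates the result.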

Here is how AppErrorResult implements get() and set():

frameworks/base/services/core/java/com/android/server/am/AppErrorResult.java

final class AppErrorResult {

public void set(int res) {

synchronized (this) {

mHasResult = true;

// 1. set() stores the value in mResult

mResult = res;

// 2. notifyAll() wakes every thread waiting on this object's monitor

notifyAll();

}

public int get() {

synchronized (this) {

while (!mHasResult) {

try {

// 3. wait() blocks the current thread until set() is called

wait();

} catch (InterruptedException e) {

}

// 4. Return mResult

return mResult;

}

boolean mHasResult = false;

int mResult;

}

get() blocks the calling thread; set() updates mResult and wakes the waiting threads. Execution then resumes after wait() inside get(), which returns the value stored by set().
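The hand-off between the two threads can be reproduced outside Android. The sketch below is a simplified stand-in for AppErrorResult (the class name BlockingResult and the demo() helper are assumptions for illustration): one thread blocks in get() while another, playing the role of the UI thread, delivers the value through set().

```java
final class BlockingResult {
    private boolean hasResult = false;
    private int result;

    /** Stores the value and wakes every thread waiting in get(). */
    synchronized void set(int value) {
        hasResult = true;
        result = value;
        notifyAll();
    }

    /** Blocks until set() has been called, then returns the stored value. */
    synchronized int get() {
        // Loop to guard against spurious wakeups, as the original does.
        while (!hasResult) {
            try {
                wait();
            } catch (InterruptedException ignored) {
                // Keep waiting; the original also swallows the interrupt.
            }
        }
        return result;
    }

    /** Simulates a "UI thread" delivering a result to a blocked caller. */
    static int demo(int value) {
        BlockingResult pending = new BlockingResult();
        Thread uiThread = new Thread(() -> pending.set(value));
        uiThread.start();
        return pending.get();
    }

    public static void main(String[] args) {
        System.out.println("result = " + demo(7));
    }
}
```

Because the flag is checked while holding the monitor, it does not matter whether set() runs before or after the caller reaches get().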

Handling the user's choice or a timeout after the dialog appears

frameworks/base/services/core/java/com/android/server/am/AppErrorDialog.java

@Override

public void onClick(View v) {

// 1. Decide the action from the clicked view

switch (v.getId()) {

// Request a restart of the process

case com.android.internal.R.id.aerr_restart:

mHandler.obtainMessage(RESTART).sendToTarget();

break;

// Request an error report

case com.android.internal.R.id.aerr_report:

mHandler.obtainMessage(FORCE_QUIT_AND_REPORT).sendToTarget();

break;

// Request closing the crash dialog and killing the process

case com.android.internal.R.id.aerr_close:

mHandler.obtainMessage(FORCE_QUIT).sendToTarget();

break;

// Request muting further crash dialogs

case com.android.internal.R.id.aerr_mute:

mHandler.obtainMessage(MUTE).sendToTarget();

break;

default:

break;

}

// 2. On receiving the message, call setResult() and dismiss the dialog

private final Handler mHandler = new Handler() {

public void handleMessage(Message msg) {

setResult(msg.what);

dismiss();

}

};

private void setResult(int result) {

synchronized (mService) {

if (mProc != null && mProc.crashDialog == AppErrorDialog.this) {

mProc.crashDialog = null;

}

// 3. Call AppErrorResult.set() to wake the blocked thread and pass on the user's choice

mResult.set(result);

mHandler.removeMessages(TIMEOUT);

}

As shown above, mResult.set() finally wakes the blocked thread, which then continues executing:

frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,

int callingPid, int callingUid) {

...

// 3. Block until the dialog times out or the user acts on it

int res = result.get();

// 4. Act on the result of the user's choice

...

}

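Follow-up cleanup

As the flow above shows, a crashed process always ends up being killed, after which AMS still has cleanup work to do.

First, recall part of how a process registers with AMS after it starts: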

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

// After a process starts, its ActivityThread attaches to AMS

private final boolean attachApplicationLocked(IApplicationThread thread,

int pid) {

...

final String processName = app.processName;

try {

// 1. Create the death recipient (the "obituary" receiver)

AppDeathRecipient adr = new AppDeathRecipient(

app, pid, thread);

thread.asBinder().linkToDeath(adr, 0);

app.deathRecipient = adr;

}

...

}

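When the process attaches, AMS registers a death recipient on the process's IApplicationThread binder.

As a result, once the crashed process is killed, AppDeathRecipient.binderDied() is called back. The source shows that binderDied() in turn calls appDiedLocked().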
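The death-notification pattern can be imitated in plain Java. In this sketch, FakeRemote and DeathRecipient are hypothetical stand-ins for a Binder object and IBinder.DeathRecipient; real Binder death notifications are delivered by the Binder driver, which the sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

final class DeathLinkDemo {
    /** Hypothetical stand-in for IBinder.DeathRecipient. */
    interface DeathRecipient {
        void binderDied();
    }

    /** Hypothetical stand-in for a remote process's binder object. */
    static final class FakeRemote {
        private final List<DeathRecipient> recipients = new ArrayList<>();

        void linkToDeath(DeathRecipient recipient) {
            recipients.add(recipient);
        }

        /** Simulates the remote process dying: every linked recipient is notified once. */
        void die() {
            List<DeathRecipient> toNotify = new ArrayList<>(recipients);
            recipients.clear();
            for (DeathRecipient recipient : toNotify) {
                recipient.binderDied();
            }
        }
    }

    /** Returns the sequence of events: attach first, cleanup only after death. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        FakeRemote app = new FakeRemote();
        // The "server" registers its cleanup when the process attaches.
        app.linkToDeath(() -> log.add("cleanup after death"));
        log.add("process attached");
        app.die();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```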

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,

boolean fromBinderDied) {

...

// 1. If the process has not been killed yet, kill it

if (!app.killed) {

if (!fromBinderDied) {

killProcessQuiet(pid);

}

killProcessGroup(app.uid, pid);

app.killed = true;

}

if (app.pid == pid && app.thread != null &&

app.thread.asBinder() == thread.asBinder()) {

...

// 2.

handleAppDiedLocked(app, false, true);

...

} ...

}

Notes: comment 1 kills the process if it is still alive; comment 2 is where the key handling of the app's death happens.

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

private final void handleAppDiedLocked(ProcessRecord app,

boolean restarting, boolean allowRestart) {

int pid = app.pid;

// 1. Wind down the process's services, ContentProviders, BroadcastReceivers, and so on

boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1,

false );

if (!kept && !restarting) {

removeLruProcessLocked(app);

if (pid > 0) {

ProcessList.remove(pid);

}

...

// 2. Check whether the process still had visible activities

boolean hasVisibleActivities = mStackSupervisor.handleAppDiedLocked(app);

// Clear the activity list

app.activities.clear();

...

try {

if (!restarting && hasVisibleActivities

&& !mStackSupervisor.resumeFocusedStackTopActivityLocked()) {

// 3. If the crashed process had visible activities, AMS still makes sure all visible activities are running, which may restart the process

mStackSupervisor.ensureActivitiesVisibleLocked(null, 0, !PRESERVE_WINDOWS);

}

} finally {

mWindowManager.continueSurfaceLayout();

}

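Notes: the important part of comment 1 concerns bound services in the crashed process: the links between each service and its clients are torn down, and clients whose importance is too low may be killed outright. Comment 2 checks whether the app still had visible activities. At comment 3, the system must keep visible activities running, so it may start the process again.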

2. Native crash handling flow in Android

Reference: "Android稳定性系列8 Native crash处理流程" (Android stability series 8: native crash handling flow), liuwg1226的专栏, CSDN blog

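System-wide, crashes fall into three kinds: framework/app crashes, native crashes, and kernel crashes.

(1) A crash in the framework or app layer (a Java-level crash) is usually caused by an uncaught exception.

(2) A kernel crash is in many cases a kernel panic, typically caused by a driver or hardware fault.

(3) A native crash is a crash at the C/C++ level, the layer between the framework and the Linux kernel. It is the subject of the rest of this section.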

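While system_server starts up, startOtherServices() launches the remaining system services, and at this point it also creates a NativeCrashListener object (a Thread subclass) that listens for native crash events. The listener waits on a socket for debuggerd to connect and then handles the event: NativeCrashListener#run() leads to AMS#handleApplicationCrashInner(), which processes the crash.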

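The main tasks of NativeCrashListener:

(1) Create the server socket "/data/system/ndebugsocket";

(2) Wait for the socket client (debuggerd) to connect;

(3) Call NativeCrashListener#consumeNativeCrashData to process the native crash information;

(4) Acknowledge the connection by writing a reply message back to the debuggerd process.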
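As an illustration of step (3), the sketch below parses a crash packet. The layout is an assumption, not something stated in this article: an 8-byte header holding pid and signal as big-endian 32-bit integers, followed by NUL-terminated report text. The class NativeCrashPacket and its encode() helper are hypothetical and exist only to make the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class NativeCrashPacket {
    final int pid;
    final int signal;
    final String report;

    NativeCrashPacket(int pid, int signal, String report) {
        this.pid = pid;
        this.signal = signal;
        this.report = report;
    }

    /** Parses [pid:int32 BE][signal:int32 BE][report bytes][NUL] (assumed layout). */
    static NativeCrashPacket parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        int pid = buf.getInt();
        int signal = buf.getInt();
        int start = buf.position();
        int end = start;
        // The report runs until the first NUL byte or the end of the data.
        while (end < data.length && data[end] != 0) {
            end++;
        }
        String report = new String(data, start, end - start, StandardCharsets.UTF_8);
        return new NativeCrashPacket(pid, signal, report);
    }

    /** Builds a packet in the same assumed layout, for the demo only. */
    static byte[] encode(int pid, int signal, String report) {
        byte[] text = report.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + text.length + 1) // ByteBuffer is big-endian by default
                .putInt(pid)
                .putInt(signal)
                .put(text)
                .put((byte) 0)
                .array();
    }

    public static void main(String[] args) {
        NativeCrashPacket p = parse(encode(7441, 11, "backtrace text"));
        System.out.println("Read pid=" + p.pid + " signal=" + p.signal);
    }
}
```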

The core of native crash handling is done by the debuggerd daemon. To understand a native crash, start from the program entry point: _start in begin.S, which calls __linker_init.

2.1 begin.S

arch/arm/begin.S

ENTRY(_start)

mov r0, sp

// Entry point [see section 2.2]

bl __linker_init

mov pc, r0

END(_start)

2.2 __linker_init
linker.cpp

extern "C" ElfW(Addr) __linker_init(void* raw_args) {

KernelArgumentBlock args(raw_args);

ElfW(Addr) linker_addr = args.getauxval(AT_BASE);

...

// [see section 2.3]

ElfW(Addr) start_address = __linker_init_post_relocation(args, linker_addr);

return start_address;

}

2.3 __linker_init_post_relocation
linker.cpp

static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args, ElfW(Addr) linker_base) {

...

// Sanitize the environment.

__libc_init_AT_SECURE(args);

// Initialize system properties

__system_properties_init();

// [see section 2.4]

debuggerd_init();

...

}

2.4 debuggerd_init
linker/debugger.cpp

__LIBC_HIDDEN__ void debuggerd_init() {

struct sigaction action;

memset(&action, 0, sizeof(action));

sigemptyset(&action.sa_mask);

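// Fatal-signal handler; it goes on to call send_debuggerd_packet

action.sa_sigaction = debuggerd_signal_handler;

// SA_RESTART: a syscall interrupted by the signal is restarted automatically

// SA_SIGINFO: the handler receives the extra siginfo_t argument

action.sa_flags = SA_RESTART | SA_SIGINFO;

// Use the alternate signal stack (if available) so that stack overflows can be caught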

action.sa_flags |= SA_ONSTACK;

sigaction(SIGABRT, &action, nullptr);

sigaction(SIGBUS, &action, nullptr);

sigaction(SIGFPE, &action, nullptr);

sigaction(SIGILL, &action, nullptr);

sigaction(SIGPIPE, &action, nullptr);

sigaction(SIGSEGV, &action, nullptr);

#if defined(SIGSTKFLT)

sigaction(SIGSTKFLT, &action, nullptr);

#endif

sigaction(SIGTRAP, &action, nullptr);

}

2.6 send_debuggerd_packet
linker/debugger.cpp

static void send_debuggerd_packet(siginfo_t* info) {

// The mutex prevents several crashing threads from talking to debuggerd at the same time

static pthread_mutex_t crash_mutex = PTHREAD_MUTEX_INITIALIZER;

int ret = pthread_mutex_trylock(&crash_mutex);

if (ret != 0) {

    if (ret == EBUSY) {

      __libc_format_log(ANDROID_LOG_INFO, "libc",

          "Another thread contacted debuggerd first; not contacting debuggerd.");

      // Wait for the other thread to release the lock, then acquire it

      pthread_mutex_lock(&crash_mutex);

    }

    return;

}

// Open the socket channel to debuggerd

int s = socket_abstract_client(DEBUGGER_SOCKET_NAME, SOCK_STREAM | SOCK_CLOEXEC);

...

debugger_msg_t msg;

msg.action = DEBUGGER_ACTION_CRASH;

msg.tid = gettid();

msg.abort_msg_address = reinterpret_cast<uintptr_t>(g_abort_message);

msg.original_si_code = (info != nullptr) ? info->si_code : 0;

// Send the DEBUGGER_ACTION_CRASH message to the debuggerd server

ret = TEMP_FAILURE_RETRY(write(s, &msg, sizeof(msg)));

if (ret == sizeof(msg)) {

    char debuggerd_ack;

    // Block until the debuggerd server replies

    ret = TEMP_FAILURE_RETRY(read(s, &debuggerd_ack, 1));

    int saved_errno = errno;

    notify_gdb_of_libraries();

    errno = saved_errno;

}

close(s);

}

What this function does:

Calls socket_abstract_client to open a socket channel to debuggerd;

Sends a message with action = DEBUGGER_ACTION_CRASH to the debuggerd server;

Blocks waiting for the reply from the debuggerd server.

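Next, see how the debuggerd server handles DEBUGGER_ACTION_CRASH.

The debuggerd server

After the debuggerd daemon starts, it keeps waiting for socket clients to connect. When a native crash occurs, the crashing process sends debuggerd a message with action = DEBUGGER_ACTION_CRASH.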

2.7 do_server

/debuggerd/debuggerd.cpp

static int do_server() {

...

for (;;) {

    sockaddr_storage ss;

    sockaddr* addrp = reinterpret_cast<sockaddr*>(&ss);

    socklen_t alen = sizeof(ss);

    // Wait for a client to connect

    int fd = accept4(s, addrp, &alen, SOCK_CLOEXEC);

    if (fd == -1) {

      continue; // accept failed

    }

    // Handle the request sent by the crashing process

    handle_request(fd);

}

return 0;

}

The call chain eventually reaches worker_process(): the server handles each client request in a child process.

/debuggerd/debuggerd.cpp

static void worker_process(int fd, debugger_request_t& request) {

std::string tombstone_path;

int tombstone_fd = -1;

switch (request.action) {

    case DEBUGGER_ACTION_CRASH:

      // Open the tombstone file

      tombstone_fd = open_tombstone(&tombstone_path);

      if (tombstone_fd == -1) {

        exit(1); // cannot open the tombstone file, so exit this process

      }

      break;

    ...

}

...

if (!attach_gdb) {

    // Report the crash to AMS

    activity_manager_write(request.pid, crash_signal, amfd, *amfd_data.get());

}

...

}

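The whole procedure is fairly involved; only the attach_gdb = false path is described here:

(1) For DEBUGGER_ACTION_CRASH, call open_tombstone and continue;

(2) Call ptrace to attach to the target process;

(3) Call BacktraceMap::Create to prepare the backtrace;

(4) For DEBUGGER_ACTION_CRASH, call activity_manager_connect;

(5) Call drop_privileges to drop privileges;

(6) Run the dump through perform_dump;

(7) For fatal signals such as SIGBUS, call engrave_tombstone(), the core function;

(8) Call activity_manager_write to report the crash to AMS;

(9) Call ptrace to detach from the target process;

(10) For DEBUGGER_ACTION_CRASH, send SIGKILL to the target tid.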

Note: activity_manager_connect() opens the socket connection to the upper-layer ActivityManager. The server side of the "/data/system/ndebugsocket" socket is created and started in NativeCrashListener.java.

3. ANR handling flow in Android

Reference: "深入探索Android稳定性优化" (An in-depth look at Android stability optimization), Android开发中文站

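ANR (Application Not Responding) means an application has stopped responding. Android requires certain events to complete within a fixed time window; if no effective response arrives in that window, or the response takes too long, an ANR is raised. A dialog then usually appears telling the user that the application is not responding, and the user can choose to keep waiting or to force close it.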

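Types of ANR:

(1) KeyDispatchTimeout (5 seconds): a key or touch event is not handled in time (usually because the UI main thread is doing slow work; this is the most common kind).

(2) BroadcastTimeout (10 seconds): a broadcast is not dispatched and handled within 10 s (usually because onReceive() runs too long).

(3) ServiceTimeout (20 seconds): a Service does not finish starting and executing within 20 s.

(4) ContentProviderTimeout (10 seconds): a ContentProvider does not finish its work within 10 s.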
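Every ANR type above boils down to the same question: did the work finish before its deadline? The sketch below shows that idea in plain Java using a Future with a timeout. It is a swapped-in technique for illustration only: this article does not show how AMS arms its timeouts, and the names AnrWatchdogSketch, finishesWithin, and sleepQuietly are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AnrWatchdogSketch {
    /** Returns true if the work ends within timeoutMs, false if the deadline passes first. */
    static boolean finishesWithin(Runnable work, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> pending = executor.submit(work);
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Deadline missed: this is the "not responding" case.
            return false;
        } catch (ExecutionException e) {
            // The work ended in time by throwing; that is a crash, not a hang.
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupt the worker if it is still running and release the thread.
            executor.shutdownNow();
        }
    }

    /** Sleeps without propagating interruption, to simulate slow work. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast work in time: " + finishesWithin(() -> { }, 1000));
        System.out.println("slow work in time: " + finishesWithin(() -> sleepQuietly(2000), 100));
    }
}
```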

ActivityManagerService.appNotResponding() is called when an application stops responding, that is, when an ANR occurs:

/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

    final void appNotResponding(ProcessRecord app, ActivityRecord activity,

            ActivityRecord parent, boolean aboveSystem, final String annotation) {

        ...

        updateCpuStatsNow(); // first update of the CPU statistics

        synchronized (this) {

          // PowerManager.reboot() can block for a long time, so ignore ANRs during shutdown

          if (mShuttingDown) {

              return;

          } else if (app.notResponding) {

              return;

          } else if (app.crashing) {

              return;

          }

          // Record the ANR in the event log

          EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,

                  app.processName, app.info.flags, annotation);

          // Add the ANR process to firstPids

          firstPids.add(app.pid);

          int parentPid = app.pid;

          // Add the system_server process to firstPids

          if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);

          for (int i = mLruProcesses.size() - 1; i >= 0; i--) {

              ProcessRecord r = mLruProcesses.get(i);

              if (r != null && r.thread != null) {

                  int pid = r.pid;

                  if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {

                      if (r.persistent) {

                          firstPids.add(pid); // persistent processes go into firstPids

                      } else {

                          lastPids.put(pid, Boolean.TRUE); // all other processes go into lastPids

                      }

                  }

              }

          }

        }

        // Build the ANR message that is written to the main log

        StringBuilder info = new StringBuilder();

        info.setLength(0);

        info.append("ANR in ").append(app.processName);

        if (activity != null && activity.shortComponentName != null) {

          info.append(" (").append(activity.shortComponentName).append(")");

        }

        info.append("\n");

        info.append("PID: ").append(app.pid).append("\n");

        if (annotation != null) {

            info.append("Reason: ").append(annotation).append("\n");

        }

        if (parent != null && parent != activity) {

            info.append("Parent: ").append(parent.shortComponentName).append("\n");

        }

        // Create the CPU tracker object

        final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

        // Dump the stack traces

        File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker,

                lastPids, NATIVE_STACKS_OF_INTEREST);

        updateCpuStatsNow(); // second update of the CPU statistics

        // Record the current CPU usage of each process

        synchronized (mProcessCpuTracker) {

            cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);

        }

        // Record the current CPU load

        info.append(processCpuTracker.printCurrentLoad());

        info.append(cpuInfo);

        // Record CPU usage since the ANR time

        info.append(processCpuTracker.printCurrentState(anrTime));

        // Log the ANR reason together with CPU usage and load

        Slog.e(TAG, info.toString());

        // Save the traces file and CPU usage to DropBox, i.e. the /data/system/dropbox directory

        addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,

                cpuInfo, tracesFile, null);

        synchronized (this) {

            ...

            // Background ANR: kill the process directly

            if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {

                app.kill("bg anr", true);

                return;

            }

            // Mark the app as not responding and look up the error-report receiver

            makeAppNotRespondingLocked(app,

                    activity != null ? activity.shortComponentName : null,

                    annotation != null ? "ANR " + annotation : "ANR",

                    info.toString());

            // Rename the traces file

            String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);

            if (tracesPath != null && tracesPath.length() != 0) {

                // traceRenameFile = "/data/anr/traces.txt"

                File traceRenameFile = new File(tracesPath);

                String newTracesPath;

                int lpos = tracesPath.lastIndexOf (".");

                if (-1 != lpos)

                    // New traces file = /data/anr/traces_<process name>_<current date>.txt

                    newTracesPath = tracesPath.substring (0, lpos) + "_" + app.processName + "_" + mTraceDateFormat.format(new Date()) + tracesPath.substring (lpos);

                else

                    newTracesPath = tracesPath + "_" + app.processName;

                traceRenameFile.renameTo(new File(newTracesPath));

            }

            // Show the ANR dialog

            Message msg = Message.obtain();

            HashMap map = new HashMap();

            msg.what = SHOW_NOT_RESPONDING_MSG;

            msg.obj = map;

            msg.arg1 = aboveSystem ? 1 : 0;

            map.put("app", app);

            if (activity != null) {

                map.put("activity", activity);

            }

            // Send the SHOW_NOT_RESPONDING_MSG message to the UI thread

            mUiHandler.sendMessage(msg);

        }

   }
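The trace-renaming branch in the listing above can be isolated as a pure function. The sketch below mirrors that string logic; the class and method names are hypothetical, and the timestamp is passed in as a plain string rather than formatted with mTraceDateFormat.

```java
final class TracePathSketch {
    /**
     * Inserts "_<processName>_<timestamp>" before the extension of tracesPath.
     * If the path has no '.', only "_<processName>" is appended, as in the listing.
     */
    static String renamedTracePath(String tracesPath, String processName, String timestamp) {
        int dot = tracesPath.lastIndexOf('.');
        if (dot != -1) {
            return tracesPath.substring(0, dot)
                    + "_" + processName
                    + "_" + timestamp
                    + tracesPath.substring(dot);
        }
        return tracesPath + "_" + processName;
    }

    public static void main(String[] args) {
        // The process name and timestamp are made-up sample values.
        System.out.println(renamedTracePath("/data/anr/traces.txt", "com.example.app", "20240101"));
    }
}
```

Keeping one file per process and timestamp avoids later ANRs overwriting the traces of earlier ones.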

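When an ANR occurs, the following steps run in order:

(1) Write the ANR reason to the event log. The am_anr entry in the event log is therefore the closest timestamp to the moment the ANR was triggered;

(2) Collect and dump the stack traces of every thread in the important processes; this step is time-consuming;

(3) Output per-process CPU usage and the CPU load;

(4) Save the traces file and the CPU usage information to DropBox, that is, the /data/system/dropbox directory;

(5) Depending on the process type, either kill the process silently in the background or show a dialog to the user.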

The ANR dump covers the traces of these processes:

(1) firstPids: the ANR process first, then system_server, then all persistent processes;

(2) Native list: the mediaserver, sdcard, and surfaceflinger processes under /system/bin/;

(3) lastPids: every process in mLruProcesses that is not in firstPids.
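The ordering of firstPids and lastPids can be sketched as a small pure function. Proc and partition() are hypothetical names; the loop mirrors the walk over mLruProcesses in the appNotResponding() listing, from the end of the list backwards, and leaves out the r.thread check.

```java
import java.util.ArrayList;
import java.util.List;

final class AnrPidOrdering {
    /** Hypothetical, minimal stand-in for ProcessRecord. */
    static final class Proc {
        final int pid;
        final boolean persistent;

        Proc(int pid, boolean persistent) {
            this.pid = pid;
            this.persistent = persistent;
        }
    }

    /** Returns a two-element list: [firstPids, lastPids]. */
    static List<List<Integer>> partition(int anrPid, int systemServerPid, List<Proc> lru) {
        List<Integer> firstPids = new ArrayList<>();
        List<Integer> lastPids = new ArrayList<>();
        firstPids.add(anrPid);                        // the ANR process always comes first
        if (systemServerPid != anrPid) {
            firstPids.add(systemServerPid);           // then system_server
        }
        for (int i = lru.size() - 1; i >= 0; i--) {   // walk the LRU list from the end
            Proc p = lru.get(i);
            if (p.pid <= 0 || p.pid == anrPid || p.pid == systemServerPid) {
                continue;
            }
            if (p.persistent) {
                firstPids.add(p.pid);                 // persistent processes are dumped early
            } else {
                lastPids.add(p.pid);                  // everything else is dumped last
            }
        }
        List<List<Integer>> result = new ArrayList<>();
        result.add(firstPids);
        result.add(lastPids);
        return result;
    }

    public static void main(String[] args) {
        List<Proc> lru = new ArrayList<>();
        lru.add(new Proc(10, false));
        lru.add(new Proc(20, true));
        lru.add(new Proc(100, false));
        System.out.println(partition(100, 1, lru));
    }
}
```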

4. Summary:

(1) Java/native crash entry point: AMS#handleApplicationCrashInner (note: a native crash in an app process also ends up here; NativeCrashListener#consumeNativeCrashData starts a NativeCrashReporter, which calls into AMS).

(2) Native crash entry point: NativeCrashListener#consumeNativeCrashData (note: crashes of native daemons such as wpa_supplicant arrive here).

// Key log lines:

NativeCrashListener: Read pid=7441 signal=11

/system/bin/tombstoned: Tombstone written to: /data/tombstones/tombstone_04

BootReceiver: Copying /data/tombstones/tombstone_04 to DropBox (SYSTEM_TOMBSTONE)

(3) ANR entry point: AMS#appNotResponding

Note: the dialog is shown only when the corresponding setting is enabled in Developer options.
