Tales from a Core File

Month: October 2011

Last Monday was the illumos hack-a-thon. There, I worked with Matt Amdur on adding tab completion support to mdb, the illumos modular debugger. The hack-a-thon was wildly successful and a lot of fun. Over the next few days I hope to put together an entry that gives an overview of the projects that were worked on. During the hack-a-thon, Matt and I created a working prototype that would complete types and members using ::print, but there was still some good work left for us to do. One of the challenges we faced was some unexpected behavior whenever the mdb pager was invoked. We saw different behavior depending on which action you took from the pager: continue, quit, or abort.

If you take a look at the source code, you’ll see that sometimes we’ll leave this function by calling longjmp(3C). There are a lot of different places where we call setjmp(3C) and sigsetjmp(3C) in mdb, so tracking down where we were going just by looking at the source code would be tricky. So, we want to answer the question: where are we jumping to? There are a few different ways we can do this (and more that aren’t listed):

  1. Inspect the source code
  2. Use mdb to debug mdb
  3. Use the DTrace pid provider and trace a certain number of instructions before we assume we’ve gotten there
  4. Use the DTrace pid provider and look at the jmp_buf to get the address of where we were jumping

Ultimately, I decided to go with option four, knowing that I would have to solve this problem again at some point in the future. The first step is to look at the definition of jmp_buf. For the full definition, take a look at setjmp_iso.h. Here’s the snippet that actually defines the type:

    #if defined(__i386) || defined(__amd64) || \
    	defined(__sparc) || defined(__sparcv9)
    #if defined(_LP64) || defined(_I32LPx)
    typedef long	jmp_buf[_JBLEN];
    #else
    typedef int	jmp_buf[_JBLEN];
    #endif
    #else
    #error "ISA not supported"
    #endif

Basically, the jmp_buf is just an array where we store some of the registers. Unfortunately, the typedef alone doesn’t tell us which slot holds the address we will jump to. So instead, we need to take a look at the implementation. setjmp is implemented in assembly for the particular architecture. Here it is for x86 and amd64. Now that we have the implementation, let’s figure out what to do. As a heads up, if you’re looking at any of these .s files, the numbers are in base 10, whereas the mdb output shows them in hex. Let’s take a quick look at the longjmp source for a 32-bit system and dig into what’s going on and how we know what to do:

    	ENTRY(longjmp)
    	movl	4(%esp),%edx	/ first parameter after return addr
    	movl	8(%esp),%eax	/ second parameter
    	movl	0(%edx),%ebx	/ restore ebx
    	movl	4(%edx),%esi	/ restore esi
    	movl	8(%edx),%edi	/ restore edi
    	movl	12(%edx),%ebp	/ restore caller's ebp
    	movl	16(%edx),%esp	/ restore caller's esp
    	movl	24(%edx), %ecx
    	test	%ecx, %ecx	/ test flag word
    	jz	1f
    	xorl	%ecx, %ecx	/ if set, clear ul_siglink
    	movl	%ecx, %gs:UL_SIGLINK
    1:
    	test	%eax,%eax	/ if val != 0
    	jnz	1f		/ 	return val
    	incl	%eax		/ else return 1
    1:
    	jmp	*20(%edx)	/ return to caller
    	SET_SIZE(longjmp)

The function is pretty well commented, so we can follow along pretty easily. Basically, we load the address of the jmp_buf that was passed in into %edx, and then jump to the address stored 0x14 (20) bytes into it. So now we know exactly where the address we’re returning to is kept. With this in hand, we only have two tasks left: transforming this address into a function and offset, and doing this all with a simple DTrace script. Solving the first problem is actually pretty easy. We can just use the DTrace uaddr function, which will translate the address into a symbol and offset for us. The script itself is now an exercise in copyin and arithmetic. Here’s the main part of the script:

        /*
         * Given a sigbuf translate that into where the longjmp is taking us.
         * On i386 the address is 0x14 into the jmp_buf.
         * On amd64 the address is 0x38 into the jmp_buf.
         */
        uaddr(curpsinfo->pr_dmodel == PR_MODEL_ILP32 ?
            *(uint32_t *)copyin(arg0 + 0x14, sizeof (uint32_t)) :
            *(uint64_t *)copyin(arg0 + 0x38, sizeof (uint64_t)));

Now, if we run this, here’s what we get:

[root@bh1-build2 (bh1) /var/tmp]# dtrace -s longjmp.d $(pgrep -z rm mdb)
dtrace: script 'longjmp.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  8  69580                    longjmp:entry   mdb`mdb_run+0x38

Now we know exactly where we ended up after the longjmp(), and this method will work on both 32-bit and 64-bit x86 systems. If you’d like the script, you can download it from here.
