Just the other day I was working with Ryan Dahl on debugging an issue he hit while working on adding support for Crankshaft — the new JIT for Google’s v8 — for SunOS. This came about from Bryan’s discovery of what can happen when magic collides. Now, this is a rather delicate operation and there is a lot of special stuff that is going on. Since Ryan and I had an interesting little debugging session and both learned something, I thought I’d share a bit of what was going on with an explanation.
As part of Crankshaft, v8 fires a signal to do some of its profiling. The code in bleeding edge for src/platform-solaris.cc currently looks like:
    static void ProfilerSignalHandler(int signal, siginfo_t* info, void* context) {
      USE(info);
      if (signal != SIGPROF) return;
      if (active_sampler_ == NULL || !active_sampler_->IsActive()) return;
      if (vm_tid_ != pthread_self()) return;

      TickSample sample_obj;
      TickSample* sample = CpuProfiler::TickSampleEvent();
      if (sample == NULL) sample = &sample_obj;

      // Extracting the sample from the context is extremely machine dependent.
      ucontext_t* ucontext = reinterpret_cast<ucontext_t*>(context);
      mcontext_t& mcontext = ucontext->uc_mcontext;
      sample->state = Top::current_vm_state();

    #if V8_HOST_ARCH_IA32
      sample->pc = reinterpret_cast<Address>(mcontext.gregs[KDIREG_EIP]);
      sample->sp = reinterpret_cast<Address>(mcontext.gregs[KDIREG_ESP]);
      sample->fp = reinterpret_cast<Address>(mcontext.gregs[KDIREG_EBP]);
    #elif V8_HOST_ARCH_X64
      sample->pc = reinterpret_cast<Address>(mcontext.gregs[KDIREG_RIP]);
      sample->sp = reinterpret_cast<Address>(mcontext.gregs[KDIREG_RSP]);
      sample->fp = reinterpret_cast<Address>(mcontext.gregs[KDIREG_RBP]);
    #else
      UNIMPLEMENTED();
    #endif
      active_sampler_->SampleStack(sample);
      active_sampler_->Tick(sample);
    }
Those of you who have spent a long time working with SunOS might notice what’s wrong with this right away. In some ways, though, it’s not quite so obvious, so let’s talk about what’s happening.
This code is being used as a signal handler, specifically for SIGPROF. If we pull up the manual page for sigaction(2), the Solaris version has the following comment in its notes section:
    NOTES
         The handler routine can be declared:

           void handler (int sig, siginfo_t *sip, ucontext_t *ucp);

         The sig argument is the signal number. The sip argument is a
         pointer (to space on the stack) to a siginfo_t structure, which
         provides additional detail about the delivery of the signal.
         The ucp argument is a pointer (again to space on the stack) to
         a ucontext_t structure (defined in <sys/ucontext.h>) which
         contains the context from before the signal. It is not
         recommended that ucp be used by the handler to restore the
         context from before the signal delivery.

    SunOS 5.11          Last change: 23 Mar 2005
When a signal is delivered on an x86 UNIX system, a program stops what it is currently doing and, if there is a signal handler, executes the code for the signal handler and then returns to what it was previously doing (this is a bit more complicated in a multi-threaded program). We generally describe this as a signal interrupting the thread in question. The third argument to the handler is a context, which is all the information necessary to describe where a user program is executing. If we peek our heads into <sys/ucontext.h> on an x86-based system (the SPARC version is different), we will find the following declaration for the structure (with a few #ifdefs along for the ride):
    #if !defined(_XPG4_2) || defined(__EXTENSIONS__)
    struct ucontext {
    #else
    struct __ucontext {
    #endif
            unsigned long   uc_flags;
            ucontext_t      *uc_link;
            sigset_t        uc_sigmask;
            stack_t         uc_stack;
            mcontext_t      uc_mcontext;
            long            uc_filler[5];   /* see ABI spec for Intel386 */
    };
Here we are specifically interested in the mcontext, which is what v8 is using. To best understand what the mcontext is, I took a look at what The Open Group defines for ucontext.h in SUSv2. They have the following to say about it:
mcontext_t uc_mcontext a machine-specific representation of the saved context
More specifically the mcontext_t has two members. From <sys/regset.h> we get:
    /*
     * Structure mcontext defines the complete hardware machine state.
     * (This structure is specified in the i386 ABI suppl.)
     */
    typedef struct {
            gregset_t       gregs;          /* general register set */
            fpregset_t      fpregs;         /* floating point register set */
    } mcontext_t;
Well, that’s exactly what v8 is looking for. In the code snippet above, v8 saves three registers that describe where the interrupted program was executing:
- The Base Pointer – ebp for i386 and rbp on amd64
- The Instruction Pointer – eip for i386 and rip for amd64
- The Stack Pointer – esp for i386 and rsp on amd64
Now keeping track of what each of these does can be quite confusing, so let’s do a quick review.
The instruction pointer holds the address of the next instruction that the CPU should execute for this program. The base pointer and stack pointer are, unfortunately, not quite as intuitive. The stack grows from high addresses toward low addresses. The stack pointer tells us where the stack currently ends: it holds the lowest address in use, so to store a new value we decrement it and write there. When we use the stack, we break it up into what are called stack frames. A stack frame contains everything necessary to run a function: arguments to the function, copies of registers that are expected to be saved, the address of the instruction to return to after the function completes (the saved eip), and a pointer to the previous stack frame. The ebp points into the current stack frame.
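If the prose is hard to picture, a toy model may help. The sketch below is not how the hardware or v8 does it; it simulates a downward-growing stack in a plain array, with sp and fp as array indices rather than real addresses. All of the names (ToyStack, toy_call and so on) are invented for this example, and there is no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

/* A toy stack: mem[STACK_WORDS - 1] is the highest "address". */
typedef struct {
    uintptr_t mem[STACK_WORDS];
    size_t sp;  /* index of the most recently pushed word; STACK_WORDS when empty */
    size_t fp;  /* index of the current frame's saved-fp slot; STACK_WORDS when no frame */
} ToyStack;

static void toy_init(ToyStack *s) {
    s->sp = STACK_WORDS;
    s->fp = STACK_WORDS;
}

/* Pushing decrements the stack pointer first, then stores. */
static void toy_push(ToyStack *s, uintptr_t value) {
    s->mem[--s->sp] = value;
}

/* Entering a function: save the return address, save the caller's
 * frame pointer, then point fp at the new frame. */
static void toy_call(ToyStack *s, uintptr_t return_address) {
    toy_push(s, return_address);
    toy_push(s, (uintptr_t)s->fp);
    s->fp = s->sp;
}

/* Leaving a function: discard locals, restore the caller's frame
 * pointer, and hand back the saved return address. */
static uintptr_t toy_return(ToyStack *s) {
    s->sp = s->fp;
    s->fp = (size_t)s->mem[s->sp++];
    return s->mem[s->sp++];
}

/* Follow the chain of saved frame pointers, the way a stack-walking
 * profiler would, and count how many frames are live. */
static int toy_frame_depth(const ToyStack *s) {
    int depth = 0;
    size_t fp = s->fp;
    while (fp != STACK_WORDS) {
        depth++;
        fp = (size_t)s->mem[fp];
    }
    return depth;
}
```

The frame walk in toy_frame_depth is the reason a profiler cares about getting the frame pointer and stack pointer right: start from a bogus value and the chain leads somewhere that is not a stack at all.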
After this brief interlude, we now return to the code that we were working on in v8’s src/platform-solaris.cc. Every so often that code would segfault. After a bit of debugging and comparing the registers before the interrupt was taken with those in the mcontext, we found that we were using the wrong value! If you look back, you’ll see that we’re using macros with the prefix KDIREG. These generally come from <sys/kdi_regs.h>. The definitions used are architecture dependent: for x86 they are found in <ia32/sys/kdi_regs.h> and for amd64 in <amd64/sys/kdi_regs.h>. This is the interface that kmdb uses for operating.
In this context, kdi stands for the Kernel/Debugger Interface, so these definitions are meant for structures that use that interface. When we specified KDIREG_ESP, the value we pulled out of the mcontext was actually that of the register ECX. ECX can be used as a general-purpose register and historically CX was used for loop counters, so the chances that we’re getting an invalid address are pretty high.
However, it turned out that it was not too hard to use the correct registers. The answer was right in front of us in <sys/regset.h>:
    /*
     * The names and offsets defined here are specified by i386 ABI suppl.
     */

    #define SS              18      /* only stored on a privilege transition */
    #define UESP            17      /* only stored on a privilege transition */
    #define EFL             16
    #define CS              15
    #define EIP             14
    #define ERR             13
    #define TRAPNO          12
    #define EAX             11
    #define ECX             10
    #define EDX             9
    #define EBX             8
    #define ESP             7
    #define EBP             6
    #define ESI             5
    #define EDI             4
    #define DS              3
    #define ES              2
    #define FS              1
    #define GS              0
This led us to making the obvious substitutions:
    #if V8_HOST_ARCH_IA32
      sample->pc = reinterpret_cast<Address>(mcontext.gregs[EIP]);
      sample->sp = reinterpret_cast<Address>(mcontext.gregs[ESP]);
      sample->fp = reinterpret_cast<Address>(mcontext.gregs[EBP]);
Well, actually it was almost too obvious, because it segfaulted as well, in the same location. However, instead of using address 0xf (a reasonable value for ECX), it now had 0x0 in the ESP register! Now wait a minute: this register is supposed to tell us where the stack currently ends. That’s not right. If the stack ends at address 0, we’re in a lot of trouble.
Now, on Solaris x86/amd64 we take interrupts on the stack. These days, most systems use a 1:1 threading model (for reasons why, ask Bryan or read his paper), so for each userland thread there is a corresponding kernel thread, which means that each thread has a stack in both userland and the kernel. So here ESP could really be called KESP, referring to the ESP of the kernel thread. What we are actually interested in is the ESP for userland, which is the register UESP.
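To see why the choice of index matters so much, here is a small mock-up of the situation. It is only a sketch: the indices are copied from the i386 part of <sys/regset.h> quoted earlier but renamed with a TOY_ prefix so they cannot clash with any system header, the array stands in for a real gregset_t, and the UESP value is a made-up userland stack address. The 0xf and 0x0 values are the ones we saw while debugging.

```c
#include <stdint.h>

/* Indices as listed in the i386 section of <sys/regset.h>, renamed. */
enum {
    TOY_EBP = 6,
    TOY_ESP = 7,
    TOY_ECX = 10,
    TOY_EIP = 14,
    TOY_UESP = 17,
    TOY_SS = 18,
    TOY_NGREG = TOY_SS + 1
};

typedef uint32_t toy_greg_t;

/* Hypothetical userland stack address, chosen only for illustration. */
#define TOY_USER_STACK 0x08047000u

/* Fill a mock register set the way it looked during the debugging session. */
static void toy_fill_gregs(toy_greg_t gregs[TOY_NGREG]) {
    int i;
    for (i = 0; i < TOY_NGREG; i++) {
        gregs[i] = 0;
    }
    gregs[TOY_ECX] = 0xf;              /* what the KDIREG index handed us */
    gregs[TOY_ESP] = 0x0;              /* what the plain ESP slot held */
    gregs[TOY_UESP] = TOY_USER_STACK;  /* the userland stack pointer */
}

/* Read a "stack pointer" using whichever index the caller believes is right. */
static toy_greg_t toy_read_sp(const toy_greg_t *gregs, int index) {
    return gregs[index];
}
```

Only the TOY_UESP read yields something a profiler could sensibly walk; the other two hand back values that look like registers but are useless as stack addresses.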
Now that we know that we need to be using UESP, I took another look at the header file and found the following snippet:
    /* aliases for portability */

    #if defined(__amd64)

    #define REG_PC  REG_RIP
    #define REG_FP  REG_RBP
    #define REG_SP  REG_RSP
    #define REG_PS  REG_RFL
    #define REG_R0  REG_RAX
    #define REG_R1  REG_RDX

    #else   /* __i386 */

    #define REG_PC  EIP
    #define REG_FP  EBP
    #define REG_SP  UESP
    #define REG_PS  EFL
    #define REG_R0  EAX
    #define REG_R1  EDX
One of the nice things about these aliases is that they make it easier to write code that works across both the x86 and amd64 architectures. Of course, this doesn’t really work when looking at SPARC platforms, because the ABI and calling conventions are different due to the differences in CPU architecture. This is one of the things that I personally enjoy about SunOS. Defining these more portable aliases is really helpful, and if we ever get a 128-bit processor for some reason, those macros will be extended to make sense for it as well. Those portable definitions allowed us to take the architecture ifdefs and replace them with the following three lines:
      sample->pc = reinterpret_cast<Address>(mcontext.gregs[REG_PC]);
      sample->sp = reinterpret_cast<Address>(mcontext.gregs[REG_SP]);
      sample->fp = reinterpret_cast<Address>(mcontext.gregs[REG_FP]);
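Outside of v8, the same idea can be sketched in plain C. The RegisterSample struct and sample_from_context function below are hypothetical names for this example, not v8 code. The register reads are only compiled on SunOS for x86 and amd64, where the REG_PC, REG_SP and REG_FP aliases come from <sys/regset.h>; on any other platform the sketch deliberately returns zeros with supported set to 0 rather than guessing at that system's register names.

```c
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ucontext.h>

/* Hypothetical container for the three registers a sampler cares about. */
typedef struct {
    uintptr_t pc;
    uintptr_t sp;
    uintptr_t fp;
    int supported;  /* 1 if the registers were actually read, 0 otherwise */
} RegisterSample;

/* Pull pc/sp/fp out of a saved context using the portable aliases.
 * No per-architecture #if is needed between i386 and amd64. */
static RegisterSample sample_from_context(const ucontext_t *uc) {
    RegisterSample s;
    memset(&s, 0, sizeof(s));
#if defined(__sun) && (defined(__i386) || defined(__amd64))
    s.pc = (uintptr_t)uc->uc_mcontext.gregs[REG_PC];
    s.sp = (uintptr_t)uc->uc_mcontext.gregs[REG_SP];
    s.fp = (uintptr_t)uc->uc_mcontext.gregs[REG_FP];
    s.supported = 1;
#else
    (void)uc;  /* other platforms name their registers differently */
#endif
    return s;
}
```

In a real handler the ucontext_t pointer would be the third argument passed to the signal handler, cast from void *.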
That’s about it for our little trip down to sys/regset.h. The fix should hopefully land in v8 shortly (it may even have landed by the time I get around to posting this). It should be fun to play around with node and a proper Crankshaft on v8.