Even though the 386 has a 4-byte word size, some instructions move data in larger chunks. In particular, float64 loads move 8 bytes, and some SSE2 operations (as yet unused) move 16. A float64 load from an address that is only 4-byte aligned is significantly more expensive than one from an 8-byte-aligned address, and the same is probably true of SSE2.

We should make sure that data symbols larger than 4 bytes (e.g., float64 constants) are aligned on 8-byte boundaries in the data segment. (Fairly easy.)

We should also make sure that the stack pointer can be relied upon to be 8-byte aligned (harder). That means making every stack frame size ≡ -4 (mod 8), so that the 4-byte caller PC pushed by CALL brings the total to a multiple of 8, and starting new goroutines with a properly aligned stack pointer.

It might be worth using 16-byte stack alignment instead. OS X requires 16-byte alignment for its own ABI, presumably because it matters for SSE2.
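The alignment arithmetic proposed above can be sketched as follows. This is a minimal illustration, not the actual compiler or linker code: the helper names `alignDataOffset`, `alignFrameSize`, and `alignInitialSP` are hypothetical, and the sketch assumes a 4-byte return PC pushed by CALL and an 8-byte target stack alignment.

```go
package main

import "fmt"

// Hypothetical constants for this sketch (not taken from the Go toolchain).
const (
	ptrSize    = 4 // 386 word size; CALL pushes a 4-byte return PC
	stackAlign = 8 // proposed stack alignment; could be raised to 16
)

// alignDataOffset rounds a data-segment offset up to the alignment a symbol
// of the given size needs: 8 bytes for anything wider than a word, else 4.
func alignDataOffset(off, size int64) int64 {
	align := int64(ptrSize)
	if size > ptrSize {
		align = 8
	}
	return (off + align - 1) &^ (align - 1)
}

// alignFrameSize rounds a frame size up so that frame+ptrSize is a multiple
// of stackAlign, i.e. frame ≡ -ptrSize (mod stackAlign). If SP is aligned in
// the caller, it is aligned again once the callee has allocated its frame.
func alignFrameSize(frame int64) int64 {
	total := (frame + ptrSize + stackAlign - 1) &^ (stackAlign - 1)
	return total - ptrSize
}

// alignInitialSP rounds a new goroutine's starting stack pointer down to
// stackAlign (stacks grow downward, so rounding down stays in bounds).
func alignInitialSP(sp uintptr) uintptr {
	return sp &^ (stackAlign - 1)
}

func main() {
	// A float64 that would land at offset 4 is pushed to offset 8;
	// a 4-byte int at offset 4 stays put.
	fmt.Println(alignDataOffset(4, 8)) // 8
	fmt.Println(alignDataOffset(4, 4)) // 4

	// Frame sizes become 4 mod 8: 0->4, 4->4, 5->12, 12->12, 20->20.
	for _, f := range []int64{0, 4, 5, 12, 20} {
		fmt.Println(f, "->", alignFrameSize(f))
	}

	fmt.Println(alignInitialSP(0x1003)) // 4096 (0x1000)
}
```

Switching to 16-byte stack alignment would only mean setting `stackAlign = 16`, which makes frame sizes ≡ 12 (mod 16) under the same rule.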