Well, so when you run SSA, it tends to break up variable lifetimes into smaller chunks, so they’ll go in and out of registers… Right now they’re still homed to the same stack slot, but over this basic block a variable might be in a register, and then a little while later it might be in a different register, so we have to emit debugging information that describes that value’s movement in and out of registers. That’s something that we really want to get done for 1.10.
We have been getting more and more trouble with loops. So I’ve mentioned that the cooperative scheduling in Go is enforced by the compiler, and right now it’s kind of lightly enforced. It enforces it when you enter a function or method. But if you are running in a tight loop that has no function calls within it, it does not enforce any cooperation there. This is a particular problem – Rhys Hiltner mentioned this in his tutorial or his talk, also at GopherCon – where the garbage collector needs to interrupt all the threads right at the beginning of a GC, just for a few microseconds, but it does need to interrupt all of them. It does this by asking them to reschedule themselves; they all reschedule, they discover that a garbage collection is in progress, and they go stand in a corner and wait till the GC does its thing and then says “Yeah, back in the pool. Go!”
[00:16:21.10] Then there’s this one guy running a tight loop, and the GC tries to tap him on the shoulder and he does not respond… And does not respond, and does not respond, and does not respond, so everything hangs up, and it can be an appreciable fraction of your pause time for GC. In some rare cases, it can be long.
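To make the "one guy running a tight loop" concrete, here is a minimal sketch in Go. It assumes a runtime that only offers a preemption point at function entry, as described above. The names `spin` and `spinCooperative` are hypothetical and exist only for illustration. The first loop contains no calls, so under that assumption nothing inside it gives the scheduler a chance to step in. The second yields by hand with `runtime.Gosched`.

```go
package main

import (
	"fmt"
	"runtime"
)

// spin is a tight loop with no function calls in its body. Under the
// assumption that preemption is only checked at function entry, a
// goroutine running this loop cannot be interrupted until it returns.
func spin(n int) int {
	x := 0
	for i := 0; i < n; i++ {
		x += i & 3
	}
	return x
}

// spinCooperative does the same work but yields to the scheduler by
// hand once every `every` iterations. This is a manual workaround,
// not what the compiler itself emits.
func spinCooperative(n, every int) int {
	x := 0
	for i := 0; i < n; i++ {
		x += i & 3
		if i%every == every-1 {
			runtime.Gosched() // explicit cooperation point
		}
	}
	return x
}

func main() {
	fmt.Println(spin(1000) == spinCooperative(1000, 100))
}
```

Both functions compute the same value. The only difference is whether the loop body ever gives the scheduler an opening.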
We need to fix that, and we need to change the compiler to add a preemption check on every loop back-edge. The problem with that is it slows down your loops a little bit, and some loops it slows down a lot. So there’s follow-up work to try to figure out if we can mitigate this cost using a clever implementation. We have already tried loop unrolling, and for whatever reason it was not helpful. Either we did it wrong… We probably did it wrong, because we did it in a very bloody-minded way. Just take the loop, don’t get smart about the indexing or anything, just do the check over and over. I wanna say we made two copies of the body but checked after every execution of the body, so twice per loop iteration, whereas in many counted loops you could say “Well, I’m gonna unroll by two, increment by two, and then I’ll worry about the odd case at the end.” We didn’t do anything that clever… So that’s also for 1.10, dealing with that and the knock-on problems there.
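The difference between the two unrolling strategies mentioned here can be sketched by hand in Go. This is a source-level illustration, not the compiler's actual transformation. `preemptRequested`, `preemptCheck`, and the three `sum...` functions are hypothetical names, and the check is simulated with an atomic flag plus a counter so the number of checks can be observed.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// preemptRequested is a hypothetical stand-in for whatever flag the
// runtime would set when it wants a goroutine to yield.
var preemptRequested int32

// checks counts how often the simulated check ran (not goroutine-safe;
// this sketch is single-threaded).
var checks int

// preemptCheck models the check inserted into a loop.
func preemptCheck() {
	checks++
	if atomic.LoadInt32(&preemptRequested) != 0 {
		runtime.Gosched()
	}
}

// sumChecked is the plain loop: one check per element.
func sumChecked(xs []int) int {
	s := 0
	for i := 0; i < len(xs); i++ {
		s += xs[i]
		preemptCheck()
	}
	return s
}

// sumNaiveUnrolled copies the body twice but still checks after every
// copy, so it performs exactly as many checks as sumChecked.
func sumNaiveUnrolled(xs []int) int {
	s, i := 0, 0
	for i < len(xs) {
		s += xs[i]
		i++
		preemptCheck()
		if i >= len(xs) {
			break
		}
		s += xs[i]
		i++
		preemptCheck()
	}
	return s
}

// sumUnrolledByTwo steps by two with a single check per pair and
// handles the odd element after the loop without a check.
func sumUnrolledByTwo(xs []int) int {
	s, i := 0, 0
	for ; i+1 < len(xs); i += 2 {
		s += xs[i]
		s += xs[i+1]
		preemptCheck()
	}
	if i < len(xs) {
		s += xs[i]
	}
	return s
}

// countChecks resets the counter, runs f, and reports the sum and the
// number of simulated checks.
func countChecks(f func([]int) int, xs []int) (sum, n int) {
	checks = 0
	sum = f(xs)
	return sum, checks
}

func main() {
	xs := []int{1, 2, 3, 4, 5, 6, 7}
	for _, f := range []func([]int) int{sumChecked, sumNaiveUnrolled, sumUnrolledByTwo} {
		s, n := countChecks(f, xs)
		fmt.Println(s, n)
	}
}
```

For a seven-element slice, the first two versions run the check seven times and the third runs it three times, which is the saving the cleverer unrolling is after. Whether that saving shows up as real speed in compiled code is a separate question that this sketch does not measure.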
The garbage collector guys are looking into whether they can make generational collection work, and that will add a write barrier that’s gonna be on all the time, which will then motivate us to look a lot harder at write barrier optimizations. I don’t know who’s gonna be doing that. It might be me, it might be somebody else, but we’re certainly motivated to look at it.