This makes codegen faster (by perhaps 10-20% in the case of Jit64,
I didn't measure too closely), which helps speed up NBA Live 2005
a little. But the game still has serious performance issues.
My change in 867cd99 caused farcode to fill up much more than before.
Most games still ran fine, but certain games would have multiple
cache clears per minute, on top of being overall slow.
Maybe there's something prettier we can do about this than just
allocating more RAM, but we have the resource budget for making
Dolphin use more RAM, so I consider increasing the size of the
cache to be a good solution at least for the time being.
At least for Prince of Persia: The Forgotten Sands, 48 MB isn't
enough to stop the cache clears, but 64 MB is. (I only played the
game for a few minutes when testing, though.)
Being able to preserve the address register is useful for the
next commit, and W0 is the address register used for loads. Saving
the address register used for stores, W1, was already supported.
If a host register has been newly allocated for the destination
guest register, and the load triggers an exception, we must make
sure to not write the old value in the host register into ppcState.
This commit achieves this by not marking the register as dirty
until after the load is done.
We implement this by first rounding to nearest integer using the current
rouding mode, then converting this value from floating point to an integral
value.
GCC complains about float_emit being null when inlining
ByteswapAfterLoad into MMIOLoadToReg. ByteswapAfterLoad
does dereference float_emit, but only when passing FLAG_FLOAT,
which MMIOLoadToReg has an assert for and does not support.
Also cleaning up some unnecessarily specified namespaces while
I'm at it.
Reusing the slowmem handlers of existing blocks meshes badly
with reusing the empty space left when destroying blocks.
I don't think reusing slowmem handlers was much of a gain anyway.
This is done entirely through interpreter fallbacks. It would
probably be possible to implement this using host exception
handlers instead, but I think it would be a lot of complexity
for a rarely used feature, so let's not do it for now.
For performance reasons, there are two settings for this feature:
One setting which does enables just what True Crime: New York City
needs and one setting which enables it all. The latter makes
almost all float instructions fall back to the interpreter.
If W0 is locked when fpr.RW is called, the indirectly called
ConvertSingleToDoubleLower may need to emit a push+pop, so it's
better for fresx/frsqrtex to call RW before locking W0 than after.