Commit Graph

63 Commits

Author SHA1 Message Date
Pierre Bourdon
3570c7f03a Reformat all the things. Have fun with merge conflicts. 2016-06-24 10:43:46 +02:00
Lioncash
d9a16f7c9c Interpreter: Remove unnecessary includes from Interpreter.h
Previously the JIT code relied on indirect inclusion from this header,
this gets rid of that.
2016-01-10 18:51:12 -05:00
Pringo
aaa67ca57e Update Issue Tracker Link in Comment 2015-12-02 09:01:28 -08:00
degasus
bceab37801 PPCAnalyst: Mark function as static 2015-08-09 11:15:44 +02:00
Tillmann Karras
0d3acbd9c7 PPCAnalyst: drop needless forward declarations 2015-08-06 10:39:43 +02:00
Scott Mansell
a31ebb9bcd PPCAnalyst: Don't swap instruction which might cause interrupts.
fcmpo and fcmpu can be matched by the REORDER_CMP pass, as they set CR0
and they can cause interrupts if the fpu is disabled.
So we add an extra check to make sure op a is an integer op too.
2015-07-07 22:37:48 +12:00
degasus
c375111076 Options: merge SCoreStartupParameter into SConfig 2015-06-12 19:07:45 +02:00
Tillmann Karras
30ebb2459e Set copyright year to when a file was created 2015-05-25 13:22:31 +02:00
Tillmann Karras
cefcb0ace9 Update license headers to GPLv2+ 2015-05-25 13:22:31 +02:00
Lioncash
a11bbe6fea PPCTables: Remove FL_OUT_S.
This is unused, and since it had the same value as FL_OUT_D, it was unnecessarily setting the rS register as an output, even on instructions that only have FL_OUT_D set.
2015-03-03 16:23:28 -05:00
Lioncash
7c244766dc Interpreter: Use correct destination for psq_l, psq_lx, psq_lu, and psq_lux.
Gekko manual defines these as storing to rD, not rS.

Also removed FL_OUT_FLOAT_S, since nothing uses it now.
2015-02-21 21:20:41 -05:00
magumagu
985087c7e2 Make all unknown opcodes behave consistently.
Consistently fall back to the interpreter for unknown instructions, and
make sure GetOpInfo() always returns a non-null pointer.
2015-02-11 22:18:33 -08:00
magumagu
e136c8a066 PowerPC: misc cleanup. 2015-02-11 13:56:36 -08:00
magumagu
ac54c6a4e2 Make address translation respect the CPU translation mode.
The PowerPC CPU has bits in MSR (DR and IR) which control whether
addresses are translated. We should respect these instead of mixing
physical addresses and translated addresses into the same address space.

This is mostly mass-renaming calls to memory accesses APIs from places
which expect address translation to use a different version from those
which do not expect address translation.

This does very little on its own, but it's the first step to a correct BAT
implementation.
2015-02-11 13:56:22 -08:00
magumagu
0f96a0104e Merge pull request #1752 from Buddybenj/clean-up
Clean Up
2015-02-10 11:39:14 -08:00
Lioncash
e07679114b Use emplace_* functions where in-place construction is preferable 2015-02-04 11:39:08 -05:00
Benjamin Przybocki
4f324ad742 Clean Up 2015-01-24 17:10:21 -06:00
CarlKenner
0ab1517134 Skip zeroes that sometimes pad function to 16 byte boundary (eg. Donkey Kong Country Returns).
This fixes function detection in the debugger, and prevents functions showing up as four bytes inside another function.
2015-01-22 02:00:35 +10:30
Fiora
e8cfcd3aeb JIT: make instruction merging generic
Now it should be easier to merge more than 2-instruction-long sequences.
Also correct some minor inconsistencies in behavior between instruction
merging cases.
2015-01-11 09:11:18 -08:00
Fiora
8237004448 JIT: optimize for the common case of unquantized psq_l/st
Optimistically assume used GQRs are 0 in blocks that only use one GQR, and
bail at the start of the block and recompile if that assumption fails.

Many games use almost entirely unquantized stores (e.g. Rebel Strike, Sonic
Colors), so this will likely be a big performance improvement across the board
for games with heavy use of paired singles.
2015-01-10 14:14:43 -08:00
Dolphin Bot
89b7f1057f Merge pull request #1804 from FioraAeterna/fastermmu2_master
MMU: various improvements, bugfixes, optimizations
2015-01-07 00:49:58 +01:00
Fiora
8f7c799794 JIT: catch illegal instruction errors
Still crash, but at least give a message informing the world that it happened.
2015-01-06 11:06:49 -08:00
Fiora
e85f0ff179 MMU: fix problems with blocks that cross vmem page boundaries
In rare cases, this can result in a violation of the JIT block cache constraint
that blocks must end in the same place. This can cause instability, lockups,
due to blocks not properly being invalidated properly.
l Please enter the commit message for your changes. Lines starting
2015-01-05 10:46:04 -08:00
Fiora
c2ed29fe0d MemmapFunctions: various MMU optimizations
Small TLB lookup optimizations: this is the hot path for MMU code, so try to
make it better.

Template the TLB lookup functions based on the lookup type (opcode, data,
no exception).

Clean up the Read/Write functions and make them more consistent.

Add an early-exit path for MMU accesses to ReadFromHardware/WriteToHardware.
2015-01-05 10:34:55 -08:00
Fiora
e7a49ae5f3 Eliminate some spammy log messages in MMU mode
dcbz: just don't use GetPointer, that can't be right anyways
ppcanalyst: don't print "instruction hex 0" messages in MMU mode, where ISIs
are expected.
2014-12-21 12:41:44 -08:00
skidau
693f413364 Updated C bit on TLB cache hits.
Added TLB state to the save state file.
2014-12-05 14:29:13 +11:00
Ryan Houdek
08660c89ad Fix register usage detection in PPCAnalyst.
lmw/stmw weren't properly setting input and output registers since they use multiple registers.
dcbz was just missing a flag in the instruction tables.
2014-12-02 16:12:33 -06:00
Fiora
72c96c20d3 JIT: more optimizing of float ops based on known input characteristics
If the inputs are both float singles, and the top half is known to be identical
to the bottom half, we can use packed arithmetic instead of scalar to skip
the movddup.

This is slower on a few rather old CPUs, plus the Atom+Silvermont, so detect
Atom and disable it in that case.

Also avoid PPC_FP on stores if we know that the output came from a float op.
2014-11-29 11:33:11 -08:00
Fiora
7df50b0710 JIT: skip weird fmul rounding if the input is known to be single precision 2014-11-29 11:30:51 -08:00
Fiora
97fba41860 JIT: merge fcmpx and cror
Almost all uses of boolean condition-register ops in real code seem to be
the combination fcmpx + cror (e.g. for <= or >=). This merges the two.
2014-10-29 00:30:27 -07:00
comex
b6a7438053 Add BitSet and, as a test, convert some JitRegCache stuff to it.
This is a higher level, more concise wrapper for bitsets which supports
efficiently counting and iterating over set bits.  It's similar to
std::bitset, but the latter does not support efficient iteration (and at
least in libc++, the count algorithm is subpar, not that it really
matters).  The converted uses include both bitsets and, notably,
considerably less efficient regular arrays (for in/out registers in
PPCAnalyst).

Unfortunately, this may slightly pessimize unoptimized builds.
2014-10-25 16:56:51 -04:00
Fiora
7ba9a8537b JIT: add basic register allocation heuristics
Should be at least a bit better than the previous LRU approach. Currently
has two basic components: whether a register is dirty (dirty registers need
to be stored, so clobbering them hurts more) and how many other registers will
be used between now and the next time a register gets used.

Also don't pre-load values that don't need to be in registers.
2014-10-09 20:09:14 -07:00
Fiora
8fe730194b JIT: load registers if they're going to be used later in the block 2014-10-03 11:58:04 -07:00
skidau
4b37fdfa45 Added a CompileExceptionCheck function to the JitInterface and re-routed the existing code to utilise the interface. 2014-09-27 20:16:26 +10:00
Fiora
bfab5f1e91 JIT: generic branch merging
Why merge just cmps and rlwinm when we can merge ALL the branches?
2014-09-24 12:34:18 -07:00
Fiora
f103234e2b JIT: flush a register if it won't be used for the rest of the block
This should dramatically reduce code size in the case of blocks with
lots of branches, and certainly doesn't hurt elsewhere either.

This can probably be improved a good bit through smarter tracking of register
usage, e.g. discarding registers that are going to be overwritten, but this
is a good start and should help reduce code size and register pressure.
Unlike that sort of change, this is a "safe" patch; it only flushes registers,
which can't affect correctness, unlike actually discarding data.

As part of this, refactor PPCAnalyst to support distinguishing between
float and integer registers (to properly handle instructions that access
both, like floating-point loads and stores).

Also update every instruction in the interpreter flags table I could find
that didn't have all the correct flags.
2014-09-22 16:00:25 -07:00
Fiora
08ac10d00a PPCAnalyst/JIT: add ability to easily toggle branch and carry merging 2014-09-13 13:48:24 -07:00
Fiora
54129a8ca5 PPCAnalyst: refactor, add carry op reordering and non-cmp reordering
Tries as hard as possible to push carry-using operations (like addc and adde)
next to each other. Refactor the instruction reordering to be more flexible
and allow multiple passes.

353 -> 192 x86 instructions on a carry-heavy code block in Pokemon Puzzle.
12% faster overall in Pokemon Puzzle; probably less in typical games (Virtual
Console games seem to be carry-heavy for some reason; maybe a different
compiler?)
2014-09-13 13:48:23 -07:00
Fiora
45d84605a9 JIT64: optimize carry calculations further
Keep carry flags in the x86 flags register if used in the next instruction.
2014-09-13 13:48:20 -07:00
Fiora
bea2504a51 JIT64: optimize carry calculations
Omit carry calculations that get overwritten later in the block before they're
used. Very common in the case of srawix and friends.
2014-09-13 13:47:43 -07:00
Ryan Houdek
b8d4834cb1 Fix the return value of PPCAnalyst.
In situations where conditional continue isn't supported + if a JIT doesn't implement a instruction that has the FL_ENDBLOCK flag. This would cause an
infinite loop.
In reality all the JITs should implement every FL_ENDBLOCK instruction regardless, but JITIL doesn't implement tw/twi which are FL_ENDBLOCK
instructions.
2014-09-10 21:33:17 -05:00
Rachel Bryk
f93aa7087c Kill Core::g_CoreStartupParameter. 2014-09-09 00:24:49 -04:00
Fiora
07e0c917c6 Revert "JIT64: optimize CA calculations" 2014-09-05 10:26:30 -07:00
Fiora
3aa40dab00 JIT64: optimize carry calculations
Omit carry calculations that get overwritten later in the block before they're
used. Very common in the case of srawix and friends.
2014-09-01 20:41:48 -07:00
Lioncash
beb95b75ca PPCAnalyst: Use std::swap instead of making a temporary variable 2014-08-30 18:32:09 -04:00
Lioncash
eb535be874 Core: Clean up brace placements 2014-08-30 18:06:49 -04:00
Fiora
7dbc623dc0 JIT: Initial FPRF support
Doesn't support all the FPSCR flags, just the FPRF ones.
Add PPCAnalyzer support to remove unnecessary FPRF calculations.

POV-ray benchmark with enableFPRF forced on for an extreme comparison:
Before: 1500s
After, fmul/fmadd only: 728s
After, all float: 753s

In real games that use FPRF, like F-Zero GX, FPRF previously cost a few percent
of total runtime.

Since FPRF is so much faster now, if enableFPRF is set, just do it for every
float instruction, not just fmul/fmadd like before. I don't know if this will
fix any games, but there's little good reason not to.
2014-08-26 10:57:03 -07:00
Fiora
5c0145f71b PPCAnalyzer: move num_instructions initialization to correct place
Much of the PPC Analyzer code (e.g. instruction reordering for merging
branches) wasn't actually being run.
2014-08-21 11:19:23 -07:00
Lioncash
4759510f70 Get rid of instances of "using namespace std;" in the project 2014-08-17 02:05:33 -04:00
Lioncash
a46a500b94 Core: Fix warnings on Linux related to the JIT 2014-08-02 16:15:20 -04:00