Commit Graph

2439 Commits

Author SHA1 Message Date
Skyler Saleh
f567fd93b9 Apple M1: Removed unavailable CPU core dialog box
Removed the unavailable CPU core dialog box that asked users to change their
selected CPU core to one that is available. Instead, Dolphin now just overrides
the core to the default, and logs that it performed the override.
2021-05-22 15:25:18 -07:00
Skyler Saleh
f92ccd5058 Apple M1: Fix bug that could cause crash with MMU
Added a Common::JITPageWriteDisableExecuteEnable() that could be missed when
a memory exception is triggered by the running game.
2021-05-22 15:25:18 -07:00
Skyler Saleh
948764d37b Apple M1: Build, Analytics, and Memory Management
Analytics:
- Incorporated fix to allow the full set of analytics that was recommended by
  spotlightishere

BuildMacOSUniversalBinary:
- The x86_64 slice for a universal binary is now built for 10.12
- The universal binary build script now can be configured though command line
  options instead of modifying the script itself.
- os.system calls were replaced with equivalent subprocess calls
- Formatting was reworked to be more PEP 8 compliant
- The script was refactored to make it more modular
- The com.apple.security.cs.disable-library-validation entitlement was removed

Memory Management:
- Changed the JITPageWrite*Execute*() functions to incorporate support for
  nesting

Other:
- Fixed several small lint errors
- Fixed doc and formatting mistakes
- Several small refactors to make things clearer
2021-05-22 15:25:17 -07:00
Skyler Saleh
4ecb3084b7 Apple M1 Support for MacOS
This commit adds support for compiling Dolphin for ARM on MacOS so that it can
run natively on the M1 processors without running through Rosseta2 emulation
providing a 30-50% performance speedup and less hitches from Rosseta2.

It consists of several key changes:

- Adding support for W^X allocation(MAP_JIT) for the ARM JIT
- Adding the machine context and config info to identify the M1 processor
- Additions to the build system and docs to support building universal binaries
- Adding code signing entitlements to access the MAP_JIT functionality
- Updating the MoltenVK libvulkan.dylib to a newer version with M1 support
2021-05-22 15:25:17 -07:00
Scott Mansell
610613ee76 Use correct mask for Fake VMem
Shouldn't have any behaviour change for regular usage as both masks are 32MB
by default.
But fixes theoretical buffer overrun when memory size override is used.
2021-05-23 05:54:02 +12:00
JosJuice
68a5fc55d2 Interpreter: Fix fctiwx rounding
The interpreter implementation of fctiwx was treating rounding
mode 0 as "round to nearest, ties towards zero", which is not
an actual IEEE-754 rounding mode. The IBM document mentioned
in a comment at the top of the function, on the other hand,
treats rounding mode 0 as "round to nearest, ties to even",
which makes more sense.

This fixes one of JMC's console-recorded F-Zero GX replays on
JitArm64. (JitArm64 uses an interpreter fallback for fctiwx.)
2021-05-22 17:28:04 +02:00
Mai M
1054abc9cc
Merge pull request #9712 from JosJuice/jitarm64-fmul-rounding
JitArm64: Fix fmul rounding issues
2021-05-20 10:25:02 -04:00
Mai M
5949a19fe6
Merge pull request #9714 from JosJuice/jitarm64-convert-fmov
JitArm64: Prefer using FMOV when doing single/double conversion
2021-05-20 10:24:36 -04:00
Mai M
6958df5967
Merge pull request #9695 from JosJuice/jitarm64-fres
JitArm64: Implement fres and frsqrte
2021-05-20 10:23:49 -04:00
Mai M
539c2cb00e
Merge pull request #9667 from Sintendo/jit64divwx2
Jit64: Minor divwx optimizations
2021-05-20 10:22:54 -04:00
JosJuice
11be2314fe JitArm64: Fix fmul rounding issues
This is a port of 4f18f60 to JitArm64.
2021-05-15 23:27:34 +02:00
JosJuice
66e912a252 PPCAnalyst: Treat frspx output as single 2021-05-15 23:27:33 +02:00
JosJuice
77afb0f4c3 PPCAnalyst: Apply "bitexact" analysis to fprIsSingle
This lets us set fprIsSingle to true in more cases.
2021-05-15 23:27:33 +02:00
JosJuice
e5f2dcd891 JitArm64: Implement FPRF updates for fres+frsqrte 2021-05-15 19:21:17 +02:00
JosJuice
4b3fda7906 JitArm64: Implement frsqrte 2021-05-15 19:21:15 +02:00
JosJuice
85226e09f0 JitArm64: Implement fres 2021-05-15 19:16:32 +02:00
JosJuice
8c12068a03 JitArm64: Prefer using FMOV when doing single/double conversion
FMOV is faster than INS and ties UMOV. (On all CPUs I checked,
at least. It certainly shouldn't be slower, though.)
2021-05-15 18:56:40 +02:00
JosJuice
b980797a16 PPCAnalyst: Fix broken bitexact analysis
`code` points to the first instruction in the block, not the
current instruction.
2021-05-15 18:19:04 +02:00
Mat M
24b9a64c11
Merge pull request #9690 from Sintendo/jit64divwux
Jit64: divwux - Prefer three-operand IMUL
2021-05-13 06:42:14 -04:00
JosJuice
bfe8b1068d JitArm64: Implement FPRF updates 2021-05-13 11:51:00 +02:00
Sintendo
2cafa0a960 Jit64: divwux - Prefer three-operand IMUL
By taking advantage of three-operand IMUL, we can eliminate a MOV
instruction. This is a small code size win. However, due to IMUL sign
extending the immediate value to 64 bits, we can only apply this when
the magic number's most significant bit is zero.

To ensure this can actually happen, we also minimize the magic number by
checking for trailing zeroes.

Example (Unsigned division by 18)
Before:
41 BE E4 38 8E E3    mov         r14d,0E38E38E4h
4D 0F AF F5          imul        r14,r13
49 C1 EE 24          shr         r14,24h

After:
4D 69 F5 39 8E E3 38 imul        r14,r13,38E38E39h
49 C1 EE 22          shr         r14,22h
2021-05-06 19:54:33 +02:00
JosJuice
b305e4cfc1 JitArm64: Fix JitRegister::Register call for cstd
Seems like I made a little copy-paste error.
2021-05-06 00:20:47 +02:00
Léo Lam
51bf2dca21
Merge pull request #9675 from JosJuice/jit64-div-80000000
Jit64: Fix UB/infinite loop when compiling division by 0x80000000
2021-04-26 23:50:27 +02:00
JosJuice
7d4b87e7ae Jit64: Fix UB/infinite loop when compiling division by 0x80000000 2021-04-26 23:42:03 +02:00
JosJuice
ac679eb24d
Merge pull request #9666 from leoetlino/jit-block-hashtable
Jit: Optimize block link queries by using hash tables
2021-04-25 18:45:41 +02:00
JosJuice
69c14d6ec3 JitArm64: Fix frspx with single precision source
I haven't observed this breaking any game, but it didn't match
the behavior of the interpreter as far as I could tell from
reading the code, in that denormals weren't being flushed.
2021-04-25 15:56:59 +02:00
JosJuice
54451ac731 JitArm64: Use ConvertSingleToDoubleLower in RW when faster 2021-04-25 15:56:59 +02:00
JosJuice
9d6263f306 JitArm64: Add unit tests for single/double conversion 2021-04-25 15:56:58 +02:00
JosJuice
2a9d88739c JitArm64: Skip accurate single/double conversion if store-safe 2021-04-25 15:56:58 +02:00
JosJuice
1d106ceaf5 JitArm64: Optimize ConvertSingleToDouble, part 2
If we can prove that FCVT will provide a correct conversion,
we can use FCVT. This makes the common case a bit faster
and the less likely cases (unfortunately including zero,
which FCVT actually can convert correctly) a bit slower.
2021-04-25 15:56:19 +02:00
JosJuice
018e247624 JitArm64: Optimize ConvertSingleToDouble, part 1 2021-04-25 15:56:19 +02:00
JosJuice
28e4869c43 JitArm64: Optimize ConvertDoubleToSingle 2021-04-25 15:56:19 +02:00
JosJuice
6e0a5876ef JitArm64: Use accurate single/double conversions
Our old conversion approach became a lot more inaccurate when
enabling flush-to-zero, to the point of obviously breaking games.
2021-04-25 15:56:19 +02:00
JosJuice
39eccf6603 JitArm64: Call RW before FCMPE in fselx
Needed because the next commit will make RW clobber flags.
2021-04-25 15:56:19 +02:00
JosJuice
949686bbe7 JitArm64: Factor out single/double conversion code to functions
Preparation for following commits.

This commit intentionally doesn't touch paired stores,
since paired stores are supposed to flush to zero.
(Consistent with Jit64.)
2021-04-25 15:56:19 +02:00
JosJuice
fdf7744a53 JitArm64: Move float conversion code out of EmitBackpatchRoutine
This simplifies some of the following commits. It does require
an extra register, but hey, we have 32 of them.

Something I think would be nice to add to the register cache
in the future is the ability to keep both the single and double
version of a guest register in two different host registers
when that is useful. That way, the extra register we write to
here can be read by a later instruction, saving us from
having to perform the same conversion again.
2021-04-25 15:56:19 +02:00
Léo Lam
aa3a96f048
Merge pull request #9644 from JosJuice/jit-fallback-discard
Jits: Fix interpreter fallback handling of discarded registers
2021-04-25 13:20:41 +02:00
JosJuice
b3b5016f54 Jits: Fix interpreter fallback handling of discarded registers
When the interpreter writes to a discarded register, its type
must be changed so that it is no longer considered discarded.

Fixes a 62ce1c7 regression.
2021-04-25 13:01:40 +02:00
Sintendo
47e16133e5 Jit64: divwx - Eliminate XOR for constant dividend
We normally check for division by zero to know if we should set the
destination register to zero with a XOR. However, when the divisor and
destination registers are the same the explicit zeroing can be omitted.
In addition, some of the surrounding branching can be simplified as
well.

Before:
45 85 FF             test        r15d,r15d
75 05                jne         normal_path
45 33 FF             xor         r15d,r15d
EB 0C                jmp         done
normal_path:
B8 5A 00 00 00       mov         eax,5Ah
99                   cdq
41 F7 FF             idiv        eax,r15d
44 8B F8             mov         r15d,eax
done:

After:
45 85 FF             test        r15d,r15d
74 0C                je          done
B8 5A 00 00 00       mov         eax,5Ah
99                   cdq
41 F7 FF             idiv        eax,r15d
44 8B F8             mov         r15d,eax
done:
2021-04-24 21:32:21 +02:00
Sintendo
abc4c8f601 Jit64: divwx - Eliminate MOV for division by power of 2
Division by a power of two can be slightly improved when the
destination and dividend registers are the same.

Before:
8B C6                mov         eax,esi
85 C0                test        eax,eax
8D 70 03             lea         esi,[rax+3]
0F 49 F0             cmovns      esi,eax
C1 FE 02             sar         esi,2

After:
85 F6                test        esi,esi
8D 46 03             lea         eax,[rsi+3]
0F 48 F0             cmovs       esi,eax
C1 FE 02             sar         esi,2
2021-04-24 19:28:23 +02:00
Sintendo
246adf0d6d Jit64: divwx - Eliminate MOV for division by 2
When destination and input registers match, a redundant MOV instruction
can be eliminated.

Before:
8B C7                mov         eax,edi
8B F8                mov         edi,eax
C1 EF 1F             shr         edi,1Fh
03 F8                add         edi,eax
D1 FF                sar         edi,1

After:
8B C7                mov         eax,edi
C1 EF 1F             shr         edi,1Fh
03 F8                add         edi,eax
D1 FF                sar         edi,1
2021-04-24 18:53:21 +02:00
Léo Lam
c812ab6a63
Jit: Optimize block link queries by using hash tables
Repeated erase() + iteration on a std::multimap is extremely slow.

Slow enough that it causes a 7 second long stutter during some
transitions in F-Zero X (a N64 VC game that triggers many, many icache
invalidations).

And slow enough that JitBaseBlockCache::DestroyBlock shows up on a
flame graph as taking >50% of total CPU time on the CPU-GPU thread:
https://i.imgur.com/vvqiFL6.png

This commit optimises those block link queries by replacing the
std::multimap (which is typically implemented with red-black trees)
with hash tables.

Master: https://i.imgur.com/vvqiFL6.png / 7s stutters
(starting from 5.0-2021 and with branch following disabled)

This commit: https://i.imgur.com/hAO74fy.png / ~0.7s stutters, which
is pretty close to 5.0 stable. (5.0-2021 introduced the performance
regression and it is especially noticeable when branch following
is disabled, which is the case for all N64 VC games since 5.0-8377.)
2021-04-24 17:20:59 +02:00
Lioncash
adebc499f9 Jit64: Indicate explicit [[fallthrough]] within load helper 2021-04-19 17:37:44 -04:00
JMC47
5322256065
Merge pull request #9625 from leoetlino/mmu-sdr-update
MMU: Fix SDR updates being silently dropped in some cases
2021-04-06 20:23:13 -04:00
Pokechu22
dad309d365 Disable ICache emulation for some games
Specifically, 'Scooby-Doo! Mystery Mayhem', 'Scooby-Doo! Unmasked', 'Ed, Edd n Eddy: The Mis-Edventures', and the Wii version of 'Happy Feet'.

The JIT cache causes problems with emulated icache invalidation in these games, resulting in areas failing to load.
2021-04-06 12:44:10 -07:00
Léo Lam
49edd5f482
MMU: Remove a bunch of useless swaps
The swaps are confusing and don't accomplish much.

It was originally written like this:

u32 pte = bswap(*(u32*)&base_mem[pteg_addr]);

then bswap was changed to Common::swap32, and then the array access
was replaced with Memory::Read_U32, leading to the useless swaps.
2021-04-06 18:25:29 +02:00
Léo Lam
960d957f4f
MMU: Fix SDR updates being silently dropped in some cases
While 6xx_pem.pdf §7.6.1.1 mentions that the number of trailing
zeros in HTABORG must be equal to the number of trailing ones
in the mask (i.e. HTABORG must be properly aligned), this is actually
not a hard requirement. Real hardware will just OR the base address
anyway. Ignoring SDR changes would lead to incorrect emulation.

Logging a warning instead of dropping the SDR update silently is a
saner behaviour.
2021-04-06 18:25:09 +02:00
JMC47
5222a4b7e5
Merge pull request #9585 from JosJuice/jitarm64-skip-carry
JitArm64: Skip calculating carry flag when not needed
2021-04-06 04:41:16 -04:00
JMC47
99d43362e6
Merge pull request #9351 from JosJuice/discard-registers
Jits: Discard registers which we know will be overwritten
2021-04-06 04:40:26 -04:00
Pokechu22
004dfd1586 Replace uses of cassert with Common/Assert.h 2021-04-02 10:18:18 -07:00