This PR aims to reduce the memory usage in the CPU page table by moving
GPU specific parameters into a child class. This saves 1Gb of Memory for
most games.
An implementation of the cemuhook motion/touch protocol, this adds the
ability for users to connect several different devices to citra to send
direct motion and touch data to citra.
Co-Authored-By: jroweboy <jroweboy@gmail.com>
We relies on UNREACHABLE's noreturn attribute to eliminate parent's "no return value" warning. However, this was wrapped in a `if(!false)` block, which compilers may not unfold to recognize the noreturn nature.
Makes the header more general for other potential algorithms in the
future. While we're at it, include a missing <functional> include to
satisfy the use of std::less.
This was related to the source allocator being passed into the
constructor potentially having a different type than allocator being
constructed.
We simply need to provide a constructor to handle this case.
This resolves issues related to the allocator causing debug builds on
MSVC to fail.
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
This commit ensures that all backing memory allocated for the Guest CPU
is aligned to 256 bytes. This due to how gpu memory works and the heavy
constraints it has in the alignment of physical memory.
Instead of storing all block width, height and depths in their shifted
form:
block_width = 1U << block_shift;
Store them like they are provided by the emulated hardware (their
block_shift form). This way we can avoid doing the costly
Common::AlignUp operation to align texture sizes and drop CPU integer
divisions with bitwise logic (defined in Common::AlignBits).
Avoids potentially performing multiple reallocations (depending on the
size of the input data) by reserving the necessary amount of memory
ahead of time.
This is trivially doable, so there's no harm in it.
These can be generified together by using a concept type to designate
them. This also has the benefit of not making copies of potentially very
large arrays.
Allows for things such as:
auto rect = Common::Rectangle{0, 0, 0, 0};
as opposed to being required to explicitly write out the underlying
type, such as:
auto rect = Common::Rectangle<int>{0, 0, 0, 0};
The only requirement for the deduction is that all constructor arguments
be the same type.
nullptr was being returned in the error case, which, at a glance may
seem perfectly OK... until you realize that std::string has the
invariant that it may not be constructed from a null pointer. This
means that if this error case was ever hit, then the application would
most likely crash from a thrown exception in std::string's constructor.
Instead, we can change the function to return an optional value,
indicating if a failure occurred.
Makes the parameter ordering consistent, and also makes the filename
parameter a std::string. A std::string would be constructed anyways with
the previous code, as IOFile's only constructor with a filepath is one
taking a std::string.
We can also make WriteStringToFile's string parameter utilize a
std::string_view for the string, making use of our previous changes to
IOFile.
We don't need to force the usage of a std::string here, and can instead
use a std::string_view, which allows writing out other forms of strings
(e.g. C-style strings) without any unnecessary heap allocations.
Uses arithmetic that can be identified more trivially by compilers for
optimizations. e.g. Rather than shifting the halves of the value and
then swapping and combining them, we can swap them in place.
e.g. for the original swap32 code on x86-64, clang 8.0 would generate:
mov ecx, edi
rol cx, 8
shl ecx, 16
shr edi, 16
rol di, 8
movzx eax, di
or eax, ecx
ret
while GCC 8.3 would generate the ideal:
mov eax, edi
bswap eax
ret
now both generate the same optimal output.
MSVC used to generate the following with the old code:
mov eax, ecx
rol cx, 8
shr eax, 16
rol ax, 8
movzx ecx, cx
movzx eax, ax
shl ecx, 16
or eax, ecx
ret 0
Now MSVC also generates a similar, but equally optimal result as clang/GCC:
bswap ecx
mov eax, ecx
ret 0
====
In the swap64 case, for the original code, clang 8.0 would generate:
mov eax, edi
bswap eax
shl rax, 32
shr rdi, 32
bswap edi
or rax, rdi
ret
(almost there, but still missing the mark)
while, again, GCC 8.3 would generate the more ideal:
mov rax, rdi
bswap rax
ret
now clang also generates the optimal sequence for this fallback as well.
This is a case where MSVC unfortunately falls short, despite the new
code, this one still generates a doozy of an output.
mov r8, rcx
mov r9, rcx
mov rax, 71776119061217280
mov rdx, r8
and r9, rax
and edx, 65280
mov rax, rcx
shr rax, 16
or r9, rax
mov rax, rcx
shr r9, 16
mov rcx, 280375465082880
and rax, rcx
mov rcx, 1095216660480
or r9, rax
mov rax, r8
and rax, rcx
shr r9, 16
or r9, rax
mov rcx, r8
mov rax, r8
shr r9, 8
shl rax, 16
and ecx, 16711680
or rdx, rax
mov eax, -16777216
and rax, r8
shl rdx, 16
or rdx, rcx
shl rdx, 16
or rax, rdx
shl rax, 8
or rax, r9
ret 0
which is pretty unfortunate.
Allows the compiler to inform when the result of a swap function is
being ignored (which is 100% a bug in all usage scenarios). We also mark
them noexcept to allow other functions using them to be able to be
marked as noexcept and play nicely with things that potentially inspect
"nothrowability".
Including every OS' own built-in byte swapping functions is kind of
undesirable, since it adds yet another build path to ensure compilation
succeeds on.
Given we only support clang, GCC, and MSVC for the time being, we can
utilize their built-in functions directly instead of going through the
OS's API functions.
This shrinks the overall code down to just
if (msvc)
use msvc's functions
else if (clang or gcc)
use clang/gcc's builtins
else
use the slow path
The template type here is actually a forwarding reference, not an rvalue
reference in this case, so it's more appropriate to use std::forward to
preserve the value category of the type being moved.
Makes the return type consistently uniform (like the intrinsics we're
wrapping). This also conveniently silences a truncation warning within
the kernel multi_level_queue.
Since C++17, the introduction of deduction guides for locking facilities
means that we no longer need to hardcode the mutex type into the locks
themselves, making it easier to switch mutex types, should it ever be
necessary in the future.
Many of these functions are carried over from Dolphin (where they aren't
used anymore). Given these have no use (and we really shouldn't be
screwing around with OS-specific thread scheduler handling from the
emulator, these can be removed.
The function for setting the thread name is left, however, since it can
have debugging utility usages.
When #2247 was created, thread_queue_list.h was the only user of
boost-related code, however #2252 moved the page table struct into
common, which makes use of Boost.ICL, so we need to add the dependency
to the common library's link interface again.
We really don't need to pull in several headers of boost related
machinery just to perform the erase-remove idiom (particularly with
C++20 around the corner, which adds universal container std::erase and
std::erase_if, which we can just use instead).
With this, we don't need to link in anything boost-related into common.
This makes the class much more flexible and doesn't make performing
copies with classes that contain a bitfield member a pain.
Given BitField instances are only intended to be used within unions, the
fact the full storage value would be copied isn't a big concern (only
sizeof(union_type) would be copied anyways).
While we're at it, provide defaulted move constructors for consistency.
Moves local global state into the Impl class itself and initializes it
at the creation of the instance instead of in the function.
This makes it nicer for weakly-ordered architectures, given the
CreateEntry() class won't need to have atomic loads executed for each
individual call to the CreateEntry class.
Makes it consistent with the regular standard containers in terms of
size representation. This also gets rid of dependence on our own
type aliases, removing the need for an include.
The necessity of this parameter is dubious at best, and in 2019 probably
offers completely negligible savings as opposed to just leaving this
enabled. This removes it and simplifies the overall interface.
This is compromise for swap type being used in union. A union has deleted default constructor if it has at least one variant member with non-trivial default constructor, and no variant member of T has a default member initializer. In the use case of Bitfield, all variant members will be the swap type on endianness mismatch, which would all have non-trivial default constructor if default value is specified, and non of them can have member initializer
Original reason:
As Windows multi-byte character codec is unspecified while we always assume std::string uses UTF-8 in our code base, this can output gibberish when the string contains non-ASCII characters. ::OutputDebugStringW combined with Common::UTF8ToUTF16W is preferred here.
While admirable as a means to ensure immutability, this has the
unfortunate downside of making the class non-movable. std::move cannot
actually perform a move operation if the provided operand has const data
members (std::move acts as an operation to "slide" resources out of an
object instance). Given Barrier contains move-only types such as
std::mutex, this can lead to confusing error messages if an object ever
contained a Barrier instance and said object was attempted to be moved.
This is also unused and superceded by standard functionality. The
standard library provides std::this_thread::sleep_for(), which provides
a much more flexible interface, as different time units can be used with
it.
This is an old function that's no longer necessary. C++11 introduced
proper threading support to the language and a thread ID can be
retrieved via std::this_thread::get_id() if it's ever needed.
This is an analog of BitSet from Dolphin that was introduced to allow
iterating over a set of bits. Given it's currently unused, and given
that std::bitset exists, we can remove this. If it's ever needed in the
future it can be brought back.
Xbyak is currently entirely unused. Rather than carting it along, remove
it and get rid of a dependency. If it's ever needed in the future, then
it can be re-added (and likely be more up to date at that point in
time).
Currently, there's no way to specify if an assertion should
conditionally occur due to unimplemented behavior. This is useful when
something is only partially implemented (e.g. due to ongoing RE work).
In particular, this would be useful within the graphics code.
The rationale behind this is it allows a dev to disable unimplemented
feature assertions (which can occur in an unrelated work area), while
still enabling regular assertions, which act as behavior guards for
conditions or states which must not occur. Previously, the only way a
dev could temporarily disable asserts, was to disable the regular
assertion macros, which has the downside of also disabling, well, the
regular assertions which hold more sanitizing value, as opposed to
unimplemented feature assertions.
Currently, this was only performing a logging call, which doesn't
actually invoke any assertion behavior. This is unlike
UNIMPLEMENTED_MSG, which *does* assert.
This makes the expected behavior uniform across both macros.
Storing signed type causes the following behaviour: extractValue can do overflow/negative left shift. Now it only relies on two implementation-defined behaviours (which are almost always defined as we want): unsigned->signed conversion and signed right shift
An old function from Dolphin. This is also unused, and pretty inflexible
when it comes to printing out different data types (for example, one
might not want to print out an array of u8s but a different type
instead. Given we use fmt, there's no need to keep this implementation
of the function around.
This is an unused hold-over from Dolphin that was primarily used to
parse values out of the .ini files. Given we already have libraries that
do this for us, we don't need to keep this around.
Everything from here is completely unused and also written with the
notion of supporting 32-bit architecture variants in mind. Given the
Switch itself is on a 64-bit architecture, we won't be supporting 32-bit
architectures. If we need specific allocation functions in the future,
it's likely more worthwhile to new functions for that purpose.
Like with TelemetryJson, we can make the implementation details private
and avoid the need to expose httplib to external libraries that need to
use the Client class.
* Added a context menu on the buttons including Clear & Restore Default
* Allow clearing (unsetting) inputs. Added a Clear All button
* Allow restoring a single input to default (instead of all)
operator+ for std::string creates an entirely new string, which is kind
of unnecessary here if we just want to append a null terminator to the
existing one.
Reduces the total amount of potential allocations that need to be done
in the logging path.
First of all they are foundamentally broken. As our convention is that std::string is always UTF-8, these functions assume that the multi-byte character version of TString (std::string) from windows is also in UTF-8, which is almost always wrong. We are not going to build multi-byte character build, and even if we do, this dirty work should be handled by frontend framework early.
We always use unicode internally. Any dirty work of conversion with other codec should be handled by frontend framework (Qt). Further more, ShiftJIS/CP1252 are not special (they are not code set used by 3ds, or any guest/host dependencies we have), so there is no reason to specifically include them
* Stubbed IRS
Currently we have no ideal way of implementing IRS. For the time being we should have the functions stubbed until we come up with a way to emulate IRS properly.
* Added IRS to logging backend
* Forward declared shared memory for irs
MSVC 19.11 (A.K.A. VS 15.3)'s C++ standard library implements P0154R1
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html)
which defines two new constants within the <new> header, std::hardware_destructive_interference_size
and std::hardware_constructive_interference_size.
std::hardware_destructive_interference_size defines the minimum
recommended offset between two concurrently-accessed objects to avoid
performance degradation due to contention introduced by the
implementation (with the lower-bound being at least alignof(max_align_t)).
In other words, the minimum offset between objects necessary to avoid
false-sharing.
std::hardware_constructive_interference_size on the other hand defines
the maximum recommended size of contiguous memory occupied by two
objects accessed wth temporal locality by concurrent threads (also
defined to be at least alignof(max_align_t)). In other words the maximum
size to promote true-sharing.
So we can simply use this facility to determine the ideal alignment
size. Unfortunately, only MSVC supports this right now, so we need to
enclose it within an ifdef for the time being.
Multi-line doc comments still need the '<' after the ///, otherwise it's
treated as a regular comment and makes the original doc comment broken
in viewers, IDEs, etc. While we're at it, also fix some typos in the
comments.
The previous form of initializing done here is a C-ism, an empty set of
braces is sufficient for initializing (and doesn't potentially cause
missing brace warnings, given the first member of the struct is a COORD
struct).
Previously core itself was the library containing the code to gather
common information (build info, CPU info, and OS info), however all of
this isn't core-dependent and can be moved to the common code and use
the common interfaces. We can then just call those functions from the
core instead.
This will allow replacing our CPU detection with Xbyak's which has
better detection facilities than ours. It also keeps more
architecture-dependent code in common instead of core.
These currently aren't used and contain commented out source code that
corresponds to Dolphin's JIT. Given our CPU code is organized quite
differently, we shouldn't be keeping this around (at the moment it just
adds to compile times marginally).
The filter is returned via const reference, so this was making a
pointless copy of the entire filter every time a message was being
pushed into the logger instance.
Note: according to cppreference it is necessary to convert char to unsigned char when using std::tolower and std::toupper, otherwise the behaviour would be undefined.
There's no need to perform the resize separately here, since the
constructor allows presizing the buffer.
Also move the empty string check before the construction of the string
to make the early out more straightforward.
This is equivalent to doing:
push_back(std::string(""));
which is likely not to cause issues, assuming a decent std::string
implementation with small-string optimizations implemented in its
design, however it's still a little unnecessary to copy that buffer
regardless. Instead, we can use emplace_back() to directly construct the
empty string within the std::vector instance, eliminating any possible
overhead from the copy.
We can just use the variant of std::string's replace() function that can
replace an occurrence with N copies of the same character, eliminating
the need to allocate a std::string containing a buffer of spaces.
We can just leverage std::unique_ptr to automatically close these for us
in error cases instead of jumping to the end of the function to call
fclose on them.
Instead of using an unsigned int as a parameter and expecting a user to
always pass in the correct values, we can just convert the enum into an
enum class and use that type as the parameter type instead, which makes
the interface more type safe.
We also get rid of the bookkeeping "NUM_" element in the enum by just
using an unordered map. This function is generally low-frequency in
terms of calls (and I'd hope so, considering otherwise would mean we're
slamming the disk with IO all the time) so I'd consider this acceptable
in this case.
This avoids a redundant std::string construction if a key doesn't exist
in the map already.
e.g.
data[key] requires constructing a new default instance of the value in
the map (but this is wasteful, since we're already setting something
into the map over top of it).
Allows avoiding constructing std::string instances, since this only
reads an arbitrary sequence of characters.
We can also make ParseFilterRule() internal, since it doesn't depend on
any private instance state of Filter
These can just use a view to a string since its only comparing against
two names in both cases for matches. This avoids constructing
std::string instances where they aren't necessary.
These are unused and essentially don't provide much benefit either. If
we ever need rotation functions, these can be introduced in a way that
they don't sit in a common_* header and require a bunch of ifdefing to
simply be available
Android and macOS have supported thread_local for quite a while, but
most importantly is that we don't even really need it. Instead of using
a thread-local buffer, we can just return a non-static buffer as a
std::string, avoiding the need for that quality entirely.
These operators don't modify internal class state, so they can be made
const member functions. While we're at it, drop the unnecessary inline
keywords. Member functions that are defined in the class declaration are
already inline by default.
This provides the equivalent behavior, but without as much boilerplate.
While we're at it, explicitly default the move constructor, since we
have a move-assignment operator defined.
* More improvements to GDBStub
- Debugging of threads should work correctly with source and assembly level stepping and modifying registers and memory, meaning threads and callstacks are fully clickable in VS.
- List of modules is available to the client, with assumption that .nro and .nso are backed up by an .elf with symbols, while deconstructed ROMs keep N names.
- Initial support for floating point registers.
* Tidy up as requested in PR feedback
* Tidy up as requested in PR feedback
* Add VfsFile and VfsDirectory classes
* Finish abstract Vfs classes
* Implement RealVfsFile (computer fs backend)
* Finish RealVfsFile and RealVfsDirectory
* Finished OffsetVfsFile
* More changes
* Fix import paths
* Major refactor
* Remove double const
* Use experimental/filesystem or filesystem depending on compiler
* Port partition_filesystem
* More changes
* More Overhaul
* FSP_SRV fixes
* Fixes and testing
* Try to get filesystem to compile
* Filesystem on linux
* Remove std::filesystem and document/test
* Compile fixes
* Missing include
* Bug fixes
* Fixes
* Rename v_file and v_dir
* clang-format fix
* Rename NGLOG_* to LOG_*
* Most review changes
* Fix TODO
* Guess 'main' to be Directory by filename
Without this, it's possible to get compilation failures in the (rare) scenario where
a container is used to store a bunch of live IOFile instances, as they may be using
std::move_if_noexcept under the hood. Given these definitely don't throw exceptions
this is also not incorrect to add either.
Ensure that the actual types being passed in are trivially copyable. The internal
call to ReadArray() and WriteArray() will always succeed, since they're passed a pointer to char*
which is always trivially copyable.
The minimum clang/GCC versions we support already support this. We can also
remove is_standard_layout(), as fread and fwrite only require the type to be
trivially copyable.
These are all unused and the Write() ones should arguably not even be in the interface. There are better ways to provide this if we ever need it (like iterators).
Additionally, when updating fmtlib, there was a change in fmtlib broke
how the old logging macro was overloaded, so this works around that by
just naming the fmtlib macro impl something different
swap{16,32,64} are defined as macros on the two, but client code
tries to invoke them as Common::swap{16,32,64}, which naturally
doesn't work. This hack redefines the macros as inline functions
in the Common namespace: the bodies of the functions are the
same as the original macros, but relying on OS-specific
implementation details like this is of course brittle.
Add a new set of logging macros based on fmtlib
Similar but not exactly the same as https://github.com/citra-emu/citra/pull/3533
Citra currently uses a different version of fmt, which does not support FMT_VARIADIC so
make_args is used instead. On the other hand, yuzu uses fmt 4.1.0 which doesn't have make_args yet
so FMT_VARIADIC is used.
* Updated the audout:u and IAudioOut, now it might work with RetroArch without trigger an assert, however it's not the ideal implementation
* Updated the audout:u and IAudioOut, now it might work with RetroArch without trigger an assert, however it's not the ideal implementation
* audout:u OpenAudioOut implementation and IAudioOut cmd 1,2,3,4,5 implementation
* using an enum for audio_out_state as well as changing its initialize to member initializer list
* Minor fixes, added Service_Audio for LOG_*, changed PcmFormat enum to EnumClass
* Minor fixes, added Service_Audio for LOG_*, changed PcmFormat enum to EnumClass
* added missing Audio loggin subclass, minor fixes, clang comment breakline
* Solving backend logging conflict
* minor fix
* Fixed duplicated Service NVDRV in backend.cpp, my bad
* Added nvmemp, Added /dev/nvhost-ctrl, SetClientPID now stores pid
* used clang-format-3.9 instead
* lowercase pid
* Moved nvmemp handlers to cpp
* Removed unnecessary logging for NvOsGetConfigU32. Cleaned up log and changed to LOG_DEBUG
* using std::arrays instead of c arrays
* nvhost get config now uses std::array completely
* added pid logging back
* updated cmakelist
* missing includes
* added array, removed memcpy
* clang-format6.0