Mirror of https://git.suyu.dev/suyu/suyu
Synced 2025-12-10 14:52:05 -06:00

Compare commits: 2 commits, Chaffidy-d...textureima

| Author | SHA1 | Date |
|---|---|---|
| | fa80f8f9f8 | |
| | 89468f90a2 | |
@@ -9,6 +9,9 @@ SPDX-License-Identifier: GPL-3.0-or-later
 We're in need of developers. Please join our chat below or DM a dev if you want to contribute!
 
 This repo is currently based on Yuzu EA 4176 but the code will be rewritten for legal and performance reasons.
+
+Support the original suyu developer team [here](https://discord.gg/79B6wqFPnc).
+
 <hr />
 
 <h1 align="center">
@@ -1,85 +0,0 @@
-# Suyu Bug Fixes Plan
-
-## 1. Game-specific issues
-
-### Approach:
-- Analyze logs and crash reports for the affected games (e.g., Echoes of Wisdom, Tears of the Kingdom, Shin Megami Tensei V).
-- Identify common patterns or specific hardware/API calls causing issues.
-- Implement game-specific workarounds if necessary.
-
-### TODO:
-- [ ] Review game-specific issues in the issue tracker
-- [ ] Analyze logs and crash reports
-- [ ] Implement fixes for each game
-- [ ] Test fixes thoroughly
-
-## 2. Crashes
-
-### Approach:
-- Implement better error handling and logging throughout the codebase.
-- Add more robust null checks and boundary checks.
-- Review and optimize memory management.
-
-### TODO:
-- [ ] Implement a centralized error handling system
-- [ ] Add more detailed logging for crash-prone areas
-- [ ] Review and improve memory management in core emulation components
-
-## 3. Shader caching and performance issues
-
-### Approach:
-- Optimize shader compilation process.
-- Implement background shader compilation to reduce stuttering.
-- Review and optimize the caching mechanism.
-
-### TODO:
-- [ ] Profile shader compilation and identify bottlenecks
-- [ ] Implement asynchronous shader compilation
-- [ ] Optimize shader cache storage and retrieval
-- [ ] Implement shader pre-caching for known games
-
-## 4. Missing features
-
-### Approach:
-- Prioritize missing features based on user demand and technical feasibility.
-- Implement support for additional file formats (NSZ, XCZ).
-- Add custom save data folder selection.
-
-### TODO:
-- [ ] Implement NSZ and XCZ file format support
-- [ ] Add UI option for custom save data folder selection
-- [ ] Update relevant documentation
-
-## 5. Add-ons and mods issues
-
-### Approach:
-- Review the current implementation of add-ons and mods support.
-- Implement a more robust system for managing and applying mods.
-- Improve compatibility checks for mods.
-
-### TODO:
-- [ ] Review and refactor the current mod system
-- [ ] Implement better mod management UI
-- [ ] Add compatibility checks for mods
-- [ ] Improve documentation for mod creators
-
-## 6. General optimization
-
-### Approach:
-- Profile the emulator to identify performance bottlenecks.
-- Optimize core emulation components.
-- Implement multi-threading where appropriate.
-
-### TODO:
-- [ ] Conduct thorough profiling of the emulator
-- [ ] Optimize CPU-intensive operations
-- [ ] Implement or improve multi-threading in suitable components
-- [ ] Review and optimize memory usage
-
-## Testing and Quality Assurance
-
-- Implement a comprehensive test suite for core emulation components.
-- Set up continuous integration to run tests automatically.
-- Establish a structured QA process for testing game compatibility and performance.
-
-Remember to update the relevant documentation and changelog after implementing these fixes. Prioritize the issues based on their impact on user experience and the number of affected users.

Binary file not shown.
Before Width: | Height: | Size: 249 KiB
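The "Implement asynchronous shader compilation" item in the plan above can be sketched with standard C++ futures: compile off the render thread, poll without blocking, and keep using a fallback until the result is ready. This is a hedged illustration only — `CompiledShader`, `CompileShader`, and `AsyncShaderJob` are hypothetical names for this sketch, not suyu's actual shader pipeline.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-ins: a compiled-shader record and an expensive compile call.
struct CompiledShader {
    std::string id;
    bool valid;
};

CompiledShader CompileShader(const std::string& source) {
    // Stand-in for a slow driver compile; real code would call the GPU driver.
    return CompiledShader{source, !source.empty()};
}

// Runs the compile on a worker thread; the renderer polls Ready() each frame
// and draws with a placeholder shader until the real one is available.
class AsyncShaderJob {
public:
    explicit AsyncShaderJob(std::string source)
        : future_{std::async(std::launch::async, CompileShader, std::move(source))} {}

    // Non-blocking poll: true once the worker has finished compiling.
    bool Ready() const {
        return future_.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

    // Blocks until the compile finishes, then hands over the result.
    CompiledShader Take() {
        return future_.get();
    }

private:
    std::future<CompiledShader> future_;
};
```

The design point is that the render loop never waits on the compiler; stutter is traded for a brief visual fallback.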
@@ -14,7 +14,6 @@
 #include "common/common_types.h"
 #include "core/file_sys/vfs/vfs_types.h"
-#include "libretro.h"
 
 namespace Core::Frontend {
 class EmuWindow;
@@ -141,25 +140,6 @@ enum class SystemResultStatus : u32 {
     ErrorLoader, ///< The base for loader errors (too many to repeat)
 };
 
-class LibretroWrapper {
-public:
-    LibretroWrapper();
-    ~LibretroWrapper();
-
-    bool LoadCore(const std::string& core_path);
-    bool LoadGame(const std::string& game_path);
-    void Run();
-    void Reset();
-    void Unload();
-
-    // Implement other libretro API functions as needed
-
-private:
-    void* core_handle;
-    retro_game_info game_info;
-    // Add other necessary libretro-related members
-};
-
 class System {
 public:
     using CurrentBuildProcessID = std::array<u8, 0x20>;
@@ -476,17 +456,9 @@ public:
     /// Applies any changes to settings to this core instance.
     void ApplySettings();
 
-    // New methods for libretro support
-    bool LoadLibretroCore(const std::string& core_path);
-    bool LoadLibretroGame(const std::string& game_path);
-    void RunLibretroCore();
-    void ResetLibretroCore();
-    void UnloadLibretroCore();
-
 private:
     struct Impl;
     std::unique_ptr<Impl> impl;
-    std::unique_ptr<LibretroWrapper> libretro_wrapper;
 };
 
 } // namespace Core
@@ -26,6 +26,24 @@ std::shared_ptr<EventType> CreateEvent(std::string name, TimedCallback&& callback)
     return std::make_shared<EventType>(std::move(callback), std::move(name));
 }
 
+struct CoreTiming::Event {
+    s64 time;
+    u64 fifo_order;
+    std::weak_ptr<EventType> type;
+    s64 reschedule_time;
+    heap_t::handle_type handle{};
+
+    // Sort by time, unless the times are the same, in which case sort by
+    // the order added to the queue
+    friend bool operator>(const Event& left, const Event& right) {
+        return std::tie(left.time, left.fifo_order) > std::tie(right.time, right.fifo_order);
+    }
+
+    friend bool operator<(const Event& left, const Event& right) {
+        return std::tie(left.time, left.fifo_order) < std::tie(right.time, right.fifo_order);
+    }
+};
+
 CoreTiming::CoreTiming() : clock{Common::CreateOptimalClock()} {}
 
 CoreTiming::~CoreTiming() {
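The `Event` comparators added in the hunk above order events by deadline and break ties by `fifo_order`, so two events scheduled for the same tick fire in the order they were scheduled. A minimal standard-library sketch of the same tie-break (`Ev` and `DispatchOrder` are illustrative names, not suyu's):

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Illustrative stand-in for CoreTiming::Event's ordering: earlier time first;
// for equal times, the event scheduled first (lower fifo_order) wins.
struct Ev {
    std::int64_t time;
    std::uint64_t fifo_order;
    friend bool operator<(const Ev& l, const Ev& r) {
        // std::tie compares lexicographically: time first, then fifo_order.
        return std::tie(l.time, l.fifo_order) < std::tie(r.time, r.fifo_order);
    }
};

// Sorts a batch of events into the order they would be dispatched.
inline std::vector<Ev> DispatchOrder(std::vector<Ev> events) {
    std::sort(events.begin(), events.end());
    return events;
}
```

The same comparator, flipped through `std::greater<>`, is what makes the fibonacci heap in this diff behave as a min-heap on `(time, fifo_order)`.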
@@ -69,7 +87,7 @@ void CoreTiming::Pause(bool is_paused) {
 }
 
 void CoreTiming::SyncPause(bool is_paused) {
-    if (is_paused == paused && paused_set == is_paused) {
+    if (is_paused == paused && paused_set == paused) {
         return;
     }
 
@@ -94,7 +112,7 @@ bool CoreTiming::IsRunning() const {
 
 bool CoreTiming::HasPendingEvents() const {
     std::scoped_lock lock{basic_lock};
-    return !event_queue.empty();
+    return !(wait_set && event_queue.empty());
 }
 
 void CoreTiming::ScheduleEvent(std::chrono::nanoseconds ns_into_future,
@@ -103,8 +121,8 @@ void CoreTiming::ScheduleEvent(std::chrono::nanoseconds ns_into_future,
         std::scoped_lock scope{basic_lock};
         const auto next_time{absolute_time ? ns_into_future : GetGlobalTimeNs() + ns_into_future};
 
-        event_queue.emplace_back(Event{next_time.count(), event_fifo_id++, event_type});
-        std::push_heap(event_queue.begin(), event_queue.end(), std::greater<>());
+        auto h{event_queue.emplace(Event{next_time.count(), event_fifo_id++, event_type, 0})};
+        (*h).handle = h;
     }
 
     event.Set();
@@ -118,9 +136,9 @@ void CoreTiming::ScheduleLoopingEvent(std::chrono::nanoseconds start_time,
         std::scoped_lock scope{basic_lock};
         const auto next_time{absolute_time ? start_time : GetGlobalTimeNs() + start_time};
 
-        event_queue.emplace_back(
-            Event{next_time.count(), event_fifo_id++, event_type, resched_time.count()});
-        std::push_heap(event_queue.begin(), event_queue.end(), std::greater<>());
+        auto h{event_queue.emplace(
+            Event{next_time.count(), event_fifo_id++, event_type, resched_time.count()})};
+        (*h).handle = h;
     }
 
     event.Set();
@@ -131,11 +149,17 @@ void CoreTiming::UnscheduleEvent(const std::shared_ptr<EventType>& event_type,
     {
         std::scoped_lock lk{basic_lock};
 
-        event_queue.erase(
-            std::remove_if(event_queue.begin(), event_queue.end(),
-                           [&](const Event& e) { return e.type.lock().get() == event_type.get(); }),
-            event_queue.end());
-        std::make_heap(event_queue.begin(), event_queue.end(), std::greater<>());
+        std::vector<heap_t::handle_type> to_remove;
+        for (auto itr = event_queue.begin(); itr != event_queue.end(); itr++) {
+            const Event& e = *itr;
+            if (e.type.lock().get() == event_type.get()) {
+                to_remove.push_back(itr->handle);
+            }
+        }
+
+        for (auto& h : to_remove) {
+            event_queue.erase(h);
+        }
 
         event_type->sequence_number++;
     }
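The rewritten `UnscheduleEvent` above collects heap handles in a first pass and erases in a second, because erasing elements while iterating the heap would invalidate the traversal. The same collect-then-erase pattern can be shown with a standard-library analogy, using `std::multiset` iterators as stable "handles" — this models the pattern only, not the actual Boost.Heap API.

```cpp
#include <set>
#include <vector>

// Collect-then-erase: gather stable handles (here, multiset iterators) for all
// matching entries in one pass, then erase them in a second pass. In std::multiset,
// erasing one iterator leaves the other saved iterators valid.
int EraseMatching(std::multiset<int>& queue, int value) {
    std::vector<std::multiset<int>::iterator> to_remove;
    for (auto itr = queue.begin(); itr != queue.end(); ++itr) {
        if (*itr == value) {
            to_remove.push_back(itr);
        }
    }
    for (auto& h : to_remove) {
        queue.erase(h);
    }
    return static_cast<int>(to_remove.size());
}
```

With `boost::heap::fibonacci_heap` the saved `handle_type` plays the role these iterators play here, and `erase(handle)` removes the element without re-heapifying the whole container.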
@@ -148,7 +172,7 @@ void CoreTiming::UnscheduleEvent(const std::shared_ptr<EventType>& event_type,
 
 void CoreTiming::AddTicks(u64 ticks_to_add) {
     cpu_ticks += ticks_to_add;
-    downcount -= static_cast<s64>(ticks_to_add);
+    downcount -= static_cast<s64>(cpu_ticks);
 }
 
 void CoreTiming::Idle() {
@@ -156,7 +180,7 @@ void CoreTiming::Idle() {
 }
 
 void CoreTiming::ResetTicks() {
-    downcount.store(MAX_SLICE_LENGTH, std::memory_order_release);
+    downcount = MAX_SLICE_LENGTH;
 }
 
 u64 CoreTiming::GetClockTicks() const {
@@ -177,38 +201,48 @@ std::optional<s64> CoreTiming::Advance() {
     std::scoped_lock lock{advance_lock, basic_lock};
     global_timer = GetGlobalTimeNs().count();
 
-    while (!event_queue.empty() && event_queue.front().time <= global_timer) {
-        Event evt = std::move(event_queue.front());
-        std::pop_heap(event_queue.begin(), event_queue.end(), std::greater<>());
-        event_queue.pop_back();
-
-        if (const auto event_type = evt.type.lock()) {
-            const auto evt_time = evt.time;
-            const auto evt_sequence_num = event_type->sequence_number;
-
-            basic_lock.unlock();
-
-            const auto new_schedule_time = event_type->callback(
-                evt_time, std::chrono::nanoseconds{GetGlobalTimeNs().count() - evt_time});
-
-            basic_lock.lock();
-
-            if (evt_sequence_num != event_type->sequence_number) {
-                continue;
-            }
-
-            if (new_schedule_time.has_value() || evt.reschedule_time != 0) {
-                const auto next_schedule_time = new_schedule_time.value_or(
-                    std::chrono::nanoseconds{evt.reschedule_time});
-
-                auto next_time = evt.time + next_schedule_time.count();
-                if (evt.time < pause_end_time) {
-                    next_time = pause_end_time + next_schedule_time.count();
-                }
-
-                event_queue.emplace_back(Event{next_time, event_fifo_id++, evt.type,
-                                               next_schedule_time.count()});
-                std::push_heap(event_queue.begin(), event_queue.end(), std::greater<>());
-            }
-        }
-    }
+    while (!event_queue.empty() && event_queue.top().time <= global_timer) {
+        const Event& evt = event_queue.top();
+
+        if (const auto event_type{evt.type.lock()}) {
+            const auto evt_time = evt.time;
+            const auto evt_sequence_num = event_type->sequence_number;
+
+            if (evt.reschedule_time == 0) {
+                event_queue.pop();
+
+                basic_lock.unlock();
+
+                event_type->callback(
+                    evt_time, std::chrono::nanoseconds{GetGlobalTimeNs().count() - evt_time});
+
+                basic_lock.lock();
+            } else {
+                basic_lock.unlock();
+
+                const auto new_schedule_time{event_type->callback(
+                    evt_time, std::chrono::nanoseconds{GetGlobalTimeNs().count() - evt_time})};
+
+                basic_lock.lock();
+
+                if (evt_sequence_num != event_type->sequence_number) {
+                    // Heap handle is invalidated after external modification.
+                    continue;
+                }
+
+                const auto next_schedule_time{new_schedule_time.has_value()
+                                                  ? new_schedule_time.value().count()
+                                                  : evt.reschedule_time};
+
+                // If this event was scheduled into a pause, its time now is going to be way
+                // behind. Re-set this event to continue from the end of the pause.
+                auto next_time{evt.time + next_schedule_time};
+                if (evt.time < pause_end_time) {
+                    next_time = pause_end_time + next_schedule_time;
+                }
+
+                event_queue.update(evt.handle, Event{next_time, event_fifo_id++, evt.type,
+                                                     next_schedule_time, evt.handle});
+            }
+        }
+    }
@@ -216,7 +250,7 @@ std::optional<s64> CoreTiming::Advance() {
     }
 
     if (!event_queue.empty()) {
-        return event_queue.front().time;
+        return event_queue.top().time;
     } else {
         return std::nullopt;
     }
@@ -235,7 +269,7 @@ void CoreTiming::ThreadLoop() {
 #ifdef _WIN32
         while (!paused && !event.IsSet() && wait_time > 0) {
             wait_time = *next_time - GetGlobalTimeNs().count();
-            if (wait_time >= 1'000'000) { // 1ms
+            if (wait_time >= timer_resolution_ns) {
                 Common::Windows::SleepForOneTick();
             } else {
 #ifdef ARCHITECTURE_x86_64
@@ -256,8 +290,10 @@ void CoreTiming::ThreadLoop() {
         } else {
             // Queue is empty, wait until another event is scheduled and signals us to
             // continue.
+            wait_set = true;
             event.Wait();
         }
+        wait_set = false;
     }
 
     paused_set = true;
@@ -291,4 +327,10 @@ std::chrono::microseconds CoreTiming::GetGlobalTimeUs() const {
     return std::chrono::microseconds{Common::WallClock::CPUTickToUS(cpu_ticks)};
 }
 
+#ifdef _WIN32
+void CoreTiming::SetTimerResolutionNs(std::chrono::nanoseconds ns) {
+    timer_resolution_ns = ns.count();
+}
+#endif
+
 } // namespace Core::Timing
@@ -11,7 +11,8 @@
 #include <optional>
 #include <string>
 #include <thread>
-#include <vector>
+
+#include <boost/heap/fibonacci_heap.hpp>
 
 #include "common/common_types.h"
 #include "common/thread.h"
@@ -42,6 +43,18 @@ enum class UnscheduleEventType {
     NoWait,
 };
 
+/**
+ * This is a system to schedule events into the emulated machine's future. Time is measured
+ * in main CPU clock cycles.
+ *
+ * To schedule an event, you first have to register its type. This is where you pass in the
+ * callback. You then schedule events using the type ID you get back.
+ *
+ * The s64 ns_late that the callbacks get is how many ns late it was.
+ * So to schedule a new event on a regular basis:
+ * inside callback:
+ *     ScheduleEvent(period_in_ns - ns_late, callback, "whatever")
+ */
 class CoreTiming {
 public:
     CoreTiming();
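The doc comment added above prescribes rescheduling with `period_in_ns - ns_late`: if a callback fired late, the next firing is scheduled that much sooner, so the long-run cadence stays on phase instead of drifting. A small sketch of that arithmetic (`NextDelay` is an illustrative helper, not part of the API; the clamp for a callback more than a full period behind is this sketch's assumption):

```cpp
#include <cstdint>

// Drift-free rescheduling per the CoreTiming doc comment: if the callback ran
// ns_late nanoseconds behind its deadline, schedule the next firing
// period_in_ns - ns_late from now. Clamped to zero if we are more than a
// whole period behind (an assumption of this sketch).
std::int64_t NextDelay(std::int64_t period_in_ns, std::int64_t ns_late) {
    const std::int64_t delay = period_in_ns - ns_late;
    return delay > 0 ? delay : 0;
}
```

For example, a 1 ms periodic event that fired 250 µs late would re-arm itself 750 µs out, landing back on the original 1 ms grid.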
@@ -53,56 +66,99 @@ public:
     CoreTiming& operator=(const CoreTiming&) = delete;
     CoreTiming& operator=(CoreTiming&&) = delete;
 
+    /// CoreTiming begins at the boundary of timing slice -1. An initial call to Advance() is
+    /// required to end slice - 1 and start slice 0 before the first cycle of code is executed.
     void Initialize(std::function<void()>&& on_thread_init_);
 
+    /// Clear all pending events. This should ONLY be done on exit.
     void ClearPendingEvents();
 
+    /// Sets if emulation is multicore or single core, must be set before Initialize
     void SetMulticore(bool is_multicore_) {
         is_multicore = is_multicore_;
     }
 
+    /// Pauses/Unpauses the execution of the timer thread.
     void Pause(bool is_paused);
 
+    /// Pauses/Unpauses the execution of the timer thread and waits until paused.
     void SyncPause(bool is_paused);
 
+    /// Checks if core timing is running.
     bool IsRunning() const;
 
+    /// Checks if the timer thread has started.
     bool HasStarted() const {
         return has_started;
     }
 
+    /// Checks if there are any pending time events.
     bool HasPendingEvents() const;
 
+    /// Schedules an event in core timing
     void ScheduleEvent(std::chrono::nanoseconds ns_into_future,
                        const std::shared_ptr<EventType>& event_type, bool absolute_time = false);
 
+    /// Schedules an event which will automatically re-schedule itself with the given time, until
+    /// unscheduled
     void ScheduleLoopingEvent(std::chrono::nanoseconds start_time,
                               std::chrono::nanoseconds resched_time,
                               const std::shared_ptr<EventType>& event_type,
                               bool absolute_time = false);
 
     void UnscheduleEvent(const std::shared_ptr<EventType>& event_type,
                          UnscheduleEventType type = UnscheduleEventType::Wait);
 
     void AddTicks(u64 ticks_to_add);
 
     void ResetTicks();
 
     void Idle();
 
     s64 GetDowncount() const {
-        return downcount.load(std::memory_order_relaxed);
+        return downcount;
     }
 
+    /// Returns the current CNTPCT tick value.
     u64 GetClockTicks() const;
 
+    /// Returns the current GPU tick value.
     u64 GetGPUTicks() const;
 
+    /// Returns current time in microseconds.
     std::chrono::microseconds GetGlobalTimeUs() const;
 
+    /// Returns current time in nanoseconds.
     std::chrono::nanoseconds GetGlobalTimeNs() const;
 
+    /// Checks for events manually and returns time in nanoseconds for next event, threadsafe.
     std::optional<s64> Advance();
 
+#ifdef _WIN32
+    void SetTimerResolutionNs(std::chrono::nanoseconds ns);
+#endif
+
 private:
-    struct Event {
-        s64 time;
-        u64 fifo_order;
-        std::shared_ptr<EventType> type;
-        bool operator>(const Event& other) const {
-            return std::tie(time, fifo_order) > std::tie(other.time, other.fifo_order);
-        }
-    };
+    struct Event;
 
     static void ThreadEntry(CoreTiming& instance);
     void ThreadLoop();
 
     void Reset();
 
     std::unique_ptr<Common::WallClock> clock;
-    std::atomic<s64> global_timer{0};
-    std::vector<Event> event_queue;
-    std::atomic<u64> event_fifo_id{0};
+
+    s64 global_timer = 0;
+
+#ifdef _WIN32
+    s64 timer_resolution_ns;
+#endif
+
+    using heap_t =
+        boost::heap::fibonacci_heap<CoreTiming::Event, boost::heap::compare<std::greater<>>>;
+
+    heap_t event_queue;
+    u64 event_fifo_id = 0;
 
     Common::Event event{};
     Common::Event pause_event{};
@@ -117,12 +173,20 @@ private:
     std::function<void()> on_thread_init{};
 
     bool is_multicore{};
-    std::atomic<s64> pause_end_time{};
+    s64 pause_end_time{};
 
-    std::atomic<u64> cpu_ticks{};
-    std::atomic<s64> downcount{};
+    /// Cycle timing
+    u64 cpu_ticks{};
+    s64 downcount{};
 };
 
+/// Creates a core timing event with the given name and callback.
+///
+/// @param name The name of the core timing event to create.
+/// @param callback The callback to execute for the event.
+///
+/// @returns An EventType instance representing the created event.
+///
 std::shared_ptr<EventType> CreateEvent(std::string name, TimedCallback&& callback);
 
 } // namespace Core::Timing
@@ -1,12 +1,6 @@
 // SPDX-FileCopyrightText: Copyright 2018 yuzu Emulator Project
 // SPDX-License-Identifier: GPL-2.0-or-later
 
-#include <algorithm>
-#include <atomic>
-#include <memory>
-#include <thread>
-#include <vector>
-
 #include "common/fiber.h"
 #include "common/microprofile.h"
 #include "common/scope_exit.h"
@@ -30,7 +24,6 @@ void CpuManager::Initialize() {
     num_cores = is_multicore ? Core::Hardware::NUM_CPU_CORES : 1;
     gpu_barrier = std::make_unique<Common::Barrier>(num_cores + 1);
 
-    core_data.resize(num_cores);
     for (std::size_t core = 0; core < num_cores; core++) {
         core_data[core].host_thread =
             std::jthread([this, core](std::stop_token token) { RunThread(token, core); });
@@ -38,10 +31,10 @@ void CpuManager::Initialize() {
 }
 
 void CpuManager::Shutdown() {
-    for (auto& data : core_data) {
-        if (data.host_thread.joinable()) {
-            data.host_thread.request_stop();
-            data.host_thread.join();
+    for (std::size_t core = 0; core < num_cores; core++) {
+        if (core_data[core].host_thread.joinable()) {
+            core_data[core].host_thread.request_stop();
+            core_data[core].host_thread.join();
         }
     }
 }
@@ -73,7 +66,12 @@ void CpuManager::HandleInterrupt() {
     Kernel::KInterruptManager::HandleInterrupt(kernel, static_cast<s32>(core_index));
 }
 
+///////////////////////////////////////////////////////////////////////////////
+///                               MultiCore                                 ///
+///////////////////////////////////////////////////////////////////////////////
+
 void CpuManager::MultiCoreRunGuestThread() {
+    // Similar to UserModeThreadStarter in HOS
     auto& kernel = system.Kernel();
     auto* thread = Kernel::GetCurrentThreadPointer(kernel);
     kernel.CurrentScheduler()->OnThreadStart();
@@ -90,6 +88,10 @@ void CpuManager::MultiCoreRunGuestThread() {
 }
 
 void CpuManager::MultiCoreRunIdleThread() {
+    // Not accurate to HOS. Remove this entire method when singlecore is removed.
+    // See notes in KScheduler::ScheduleImpl for more information about why this
+    // is inaccurate.
+
     auto& kernel = system.Kernel();
     kernel.CurrentScheduler()->OnThreadStart();
 
@@ -103,6 +105,10 @@ void CpuManager::MultiCoreRunIdleThread() {
     }
 }
 
+///////////////////////////////////////////////////////////////////////////////
+///                              SingleCore                                 ///
+///////////////////////////////////////////////////////////////////////////////
+
 void CpuManager::SingleCoreRunGuestThread() {
     auto& kernel = system.Kernel();
     auto* thread = Kernel::GetCurrentThreadPointer(kernel);
@@ -148,16 +154,19 @@ void CpuManager::PreemptSingleCore(bool from_running_environment) {
         system.CoreTiming().Advance();
         kernel.SetIsPhantomModeForSingleCore(false);
     }
-    current_core.store((current_core + 1) % Core::Hardware::NUM_CPU_CORES, std::memory_order_release);
+    current_core.store((current_core + 1) % Core::Hardware::NUM_CPU_CORES);
     system.CoreTiming().ResetTicks();
     kernel.Scheduler(current_core).PreemptSingleCore();
 
+    // We've now been scheduled again, and we may have exchanged schedulers.
+    // Reload the scheduler in case it's different.
     if (!kernel.Scheduler(current_core).IsIdle()) {
         idle_count = 0;
     }
 }
 
 void CpuManager::GuestActivate() {
+    // Similar to the HorizonKernelMain callback in HOS
     auto& kernel = system.Kernel();
     auto* scheduler = kernel.CurrentScheduler();
 
@@ -175,19 +184,27 @@ void CpuManager::ShutdownThread() {
 }
 
 void CpuManager::RunThread(std::stop_token token, std::size_t core) {
+    /// Initialization
     system.RegisterCoreThread(core);
-    std::string name = is_multicore ? "CPUCore_" + std::to_string(core) : "CPUThread";
+    std::string name;
+    if (is_multicore) {
+        name = "CPUCore_" + std::to_string(core);
+    } else {
+        name = "CPUThread";
+    }
     MicroProfileOnThreadCreate(name.c_str());
     Common::SetCurrentThreadName(name.c_str());
     Common::SetCurrentThreadPriority(Common::ThreadPriority::Critical);
     auto& data = core_data[core];
     data.host_context = Common::Fiber::ThreadToFiber();
 
+    // Cleanup
     SCOPE_EXIT {
         data.host_context->Exit();
         MicroProfileOnThreadExit();
     };
 
+    // Running
     if (!gpu_barrier->Sync(token)) {
         return;
     }
@@ -23,9 +23,4 @@ void LoopProcess(Core::System& system) {
     ServerManager::RunServer(std::move(server_manager));
 }
 
-bool IsFirmwareVersionSupported(u32 version) {
-    // Add support for firmware version 18.0.0
-    return version <= 180000; // 18.0.0 = 180000
-}
-
 } // namespace Service::Set
@@ -1,117 +0,0 @@
-#include "core/libretro_wrapper.h"
-#include "nintendo_library/nintendo_library.h"
-#include <dlfcn.h>
-#include <stdexcept>
-#include <cstring>
-#include <iostream>
-
-namespace Core {
-
-LibretroWrapper::LibretroWrapper() : core_handle(nullptr), nintendo_library(std::make_unique<Nintendo::Library>()) {}
-
-LibretroWrapper::~LibretroWrapper() {
-    Unload();
-}
-
-bool LibretroWrapper::LoadCore(const std::string& core_path) {
-    core_handle = dlopen(core_path.c_str(), RTLD_LAZY);
-    if (!core_handle) {
-        std::cerr << "Failed to load libretro core: " << dlerror() << std::endl;
-        return false;
-    }
-
-    // Load libretro core functions
-#define LOAD_SYMBOL(S) S = reinterpret_cast<decltype(S)>(dlsym(core_handle, #S)); \
-    if (!S) { \
-        std::cerr << "Failed to load symbol " #S ": " << dlerror() << std::endl; \
-        Unload(); \
-        return false; \
-    }
-
-    LOAD_SYMBOL(retro_init)
-    LOAD_SYMBOL(retro_deinit)
-    LOAD_SYMBOL(retro_api_version)
-    LOAD_SYMBOL(retro_get_system_info)
-    LOAD_SYMBOL(retro_get_system_av_info)
-    LOAD_SYMBOL(retro_set_environment)
-    LOAD_SYMBOL(retro_set_video_refresh)
-    LOAD_SYMBOL(retro_set_audio_sample)
-    LOAD_SYMBOL(retro_set_audio_sample_batch)
-    LOAD_SYMBOL(retro_set_input_poll)
-    LOAD_SYMBOL(retro_set_input_state)
-    LOAD_SYMBOL(retro_set_controller_port_device)
-    LOAD_SYMBOL(retro_reset)
-    LOAD_SYMBOL(retro_run)
-    LOAD_SYMBOL(retro_serialize_size)
-    LOAD_SYMBOL(retro_serialize)
-    LOAD_SYMBOL(retro_unserialize)
-    LOAD_SYMBOL(retro_load_game)
-    LOAD_SYMBOL(retro_unload_game)
-
-#undef LOAD_SYMBOL
-
-    if (!nintendo_library->Initialize()) {
-        std::cerr << "Failed to initialize Nintendo Library" << std::endl;
-        Unload();
-        return false;
-    }
-
-    retro_init();
-    return true;
-}
-
-bool LibretroWrapper::LoadGame(const std::string& game_path) {
-    if (!core_handle) {
-        std::cerr << "Libretro core not loaded" << std::endl;
-        return false;
-    }
-
-    game_info.path = game_path.c_str();
-    game_info.data = nullptr;
-    game_info.size = 0;
-    game_info.meta = nullptr;
-
-    if (!retro_load_game(&game_info)) {
-        std::cerr << "Failed to load game through libretro" << std::endl;
-        return false;
-    }
-
-    if (!nintendo_library->LoadROM(game_path)) {
-        std::cerr << "Failed to load ROM through Nintendo Library" << std::endl;
-        return false;
-    }
-
-    return true;
-}
-
-void LibretroWrapper::Run() {
-    if (core_handle) {
-        retro_run();
-        nintendo_library->RunFrame();
-    } else {
-        std::cerr << "Cannot run: Libretro core not loaded" << std::endl;
-    }
-}
-
-void LibretroWrapper::Reset() {
-    if (core_handle) {
-        retro_reset();
-        // Add any necessary reset logic for Nintendo Library
-    } else {
-        std::cerr << "Cannot reset: Libretro core not loaded" << std::endl;
-    }
-}
-
-void LibretroWrapper::Unload() {
-    if (core_handle) {
-        retro_unload_game();
-        retro_deinit();
-        dlclose(core_handle);
-        core_handle = nullptr;
-    }
-    nintendo_library->Shutdown();
-}
-
-// Add implementations for other libretro functions as needed
-
-} // namespace Core
@@ -1,53 +0,0 @@
-#pragma once
-
-#include <string>
-#include <memory>
-
-// Forward declaration
-namespace Nintendo {
-class Library;
-}
-
-struct retro_game_info;
-
-namespace Core {
-
-class LibretroWrapper {
-public:
-    LibretroWrapper();
-    ~LibretroWrapper();
-
-    bool LoadCore(const std::string& core_path);
-    bool LoadGame(const std::string& game_path);
-    void Run();
-    void Reset();
-    void Unload();
-
-private:
-    void* core_handle;
-    retro_game_info game_info;
-    std::unique_ptr<Nintendo::Library> nintendo_library;
-
-    // Libretro function pointers
-    void (*retro_init)();
-    void (*retro_deinit)();
-    unsigned (*retro_api_version)();
-    void (*retro_get_system_info)(struct retro_system_info *info);
-    void (*retro_get_system_av_info)(struct retro_system_av_info *info);
-    void (*retro_set_environment)(void (*)(unsigned, const char*));
-    void (*retro_set_video_refresh)(void (*)(const void*, unsigned, unsigned, size_t));
-    void (*retro_set_audio_sample)(void (*)(int16_t, int16_t));
-    void (*retro_set_audio_sample_batch)(size_t (*)(const int16_t*, size_t));
-    void (*retro_set_input_poll)(void (*)());
-    void (*retro_set_input_state)(int16_t (*)(unsigned, unsigned, unsigned, unsigned));
-    void (*retro_set_controller_port_device)(unsigned, unsigned);
-    void (*retro_reset)();
-    void (*retro_run)();
-    size_t (*retro_serialize_size)();
-    bool (*retro_serialize)(void*, size_t);
-    bool (*retro_unserialize)(const void*, size_t);
-    bool (*retro_load_game)(const struct retro_game_info*);
-    void (*retro_unload_game)();
-};
-
-} // namespace Core
(File diff suppressed because it is too large)
@@ -1,149 +0,0 @@
-// SPDX-FileCopyrightText: Copyright 2024 suyu Emulator Project
-// SPDX-License-Identifier: GPL-2.0-or-later
-
-#include <algorithm>
-#include <memory>
-#include <string>
-#include <vector>
-
-#include "common/logging/log.h"
-#include "core/core.h"
-#include "core/file_sys/content_archive.h"
-#include "core/file_sys/patch_manager.h"
-#include "core/file_sys/registered_cache.h"
-#include "core/hle/service/filesystem/filesystem.h"
-#include "core/loader/loader.h"
-#include "core/memory.h"
-#include "core/nintendo_switch_library.h"
-
-namespace Core {
-
-/**
- * NintendoSwitchLibrary class manages the operations related to installed games
- * on the emulated Nintendo Switch, including listing games, launching them,
- * and providing additional functionality inspired by multi-system emulation.
- */
-class NintendoSwitchLibrary {
-public:
-    explicit NintendoSwitchLibrary(Core::System& system) : system(system) {}
-
-    struct GameInfo {
-        u64 program_id;
-        std::string title_name;
-        std::string file_path;
-        u32 version;
-    };
-
-    [[nodiscard]] std::vector<GameInfo> GetInstalledGames() {
-        std::vector<GameInfo> games;
-        const auto& cache = system.GetContentProvider().GetUserNANDCache();
-
-        for (const auto& [program_id, content_type] : cache.GetAllEntries()) {
-            if (content_type == FileSys::ContentRecordType::Program) {
-                const auto title_name = GetGameName(program_id);
-                const auto file_path = cache.GetEntryUnparsed(program_id, FileSys::ContentRecordType::Program);
-                const auto version = GetGameVersion(program_id);
-
-                if (!title_name.empty() && !file_path.empty()) {
-                    games.push_back({program_id, title_name, file_path, version});
-                }
-            }
-        }
-
-        return games;
-    }
-
-    [[nodiscard]] std::string GetGameName(u64 program_id) {
-        const auto& patch_manager = system.GetFileSystemController().GetPatchManager(program_id);
-        const auto metadata = patch_manager.GetControlMetadata();
-
-        if (metadata.first != nullptr) {
-            return metadata.first->GetApplicationName();
-        }
-
-        return "";
-    }
-
-    [[nodiscard]] u32 GetGameVersion(u64 program_id) {
-        const auto& patch_manager = system.GetFileSystemController().GetPatchManager(program_id);
-        return patch_manager.GetGameVersion().value_or(0);
-    }
-
-    [[nodiscard]] bool LaunchGame(u64 program_id) {
-        const auto file_path = system.GetContentProvider().GetUserNANDCache().GetEntryUnparsed(program_id, FileSys::ContentRecordType::Program);
-
-        if (file_path.empty()) {
-            LOG_ERROR(Core, "Failed to launch game. File not found for program_id={:016X}", program_id);
-            return false;
-        }
-
-        const auto loader = Loader::GetLoader(system, file_path);
-        if (!loader) {
-            LOG_ERROR(Core, "Failed to create loader for game. program_id={:016X}", program_id);
-            return false;
-        }
-
-        // Check firmware compatibility
-        if (!CheckFirmwareCompatibility(program_id)) {
-            LOG_ERROR(Core, "Firmware version not compatible with game. program_id={:016X}", program_id);
-            return false;
-        }
-
-        const auto result = system.Load(*loader);
-        if (result != ResultStatus::Success) {
-            LOG_ERROR(Core, "Failed to load game. Error: {}, program_id={:016X}", result, program_id);
-            return false;
-        }
-
-        LOG_INFO(Core, "Successfully launched game. program_id={:016X}", program_id);
-        return true;
-    }
-
-    bool CheckForUpdates(u64 program_id) {
-        // TODO: Implement update checking logic
-        return false;
-    }
-
-    bool ApplyUpdate(u64 program_id) {
-        // TODO: Implement update application logic
-        return false;
-    }
-
-    bool SetButtonMapping(const std::string& button_config) {
-        // TODO: Implement button mapping logic
-        return false;
-    }
-
-    bool CreateSaveState(u64 program_id, const std::string& save_state_name) {
-        // TODO: Implement save state creation
-        return false;
-    }
-
-    bool LoadSaveState(u64 program_id, const std::string& save_state_name) {
-        // TODO: Implement save state loading
-        return false;
-    }
-
-    void EnableFastForward(bool enable) {
-        // TODO: Implement fast forward functionality
-    }
-
-    void EnableRewind(bool enable) {
-        // TODO: Implement rewind functionality
-    }
-
-private:
-    const Core::System& system;
-
-    bool CheckFirmwareCompatibility(u64 program_id) {
-        // TODO: Implement firmware compatibility check
-        return true;
-    }
-};
-
-// Use smart pointer for better memory management
-std::unique_ptr<NintendoSwitchLibrary> CreateNintendoSwitchLibrary(Core::System& system) {
-    return std::make_unique<NintendoSwitchLibrary>(system);
-}
-
-} // namespace Core
@@ -1,33 +0,0 @@
-// SPDX-FileCopyrightText: Copyright 2024 suyu Emulator Project
-// SPDX-License-Identifier: GPL-2.0-or-later
-
-#pragma once
-
-#include <string>
-#include <vector>
-
-#include "common/common_types.h"
-
-namespace Core {
-
-class System;
-
-class NintendoSwitchLibrary {
-public:
-    struct GameInfo {
-        u64 program_id;
-        std::string title;
-        std::string file_path;
-    };
-
-    explicit NintendoSwitchLibrary(Core::System& system);
-
-    std::vector<GameInfo> GetInstalledGames();
-    std::string GetGameName(u64 program_id);
-    bool LaunchGame(u64 program_id);
-
-private:
-    Core::System& system;
-};
-
-} // namespace Core
@@ -1,72 +0,0 @@
-#include "nintendo_library.h"
-#include <iostream>
-
-namespace Nintendo {
-
-Library::Library() : initialized(false) {}
-
-Library::~Library() {
-    if (initialized) {
-        Shutdown();
-    }
-}
-
-bool Library::Initialize() {
-    if (initialized) {
-        return true;
-    }
-
-    // Add initialization code here
-    // For example, setting up emulation environment, loading system files, etc.
-
-    std::cout << "Nintendo Library initialized" << std::endl;
-    initialized = true;
-    return true;
-}
-
-void Library::Shutdown() {
-    if (!initialized) {
-        return;
-    }
-
-    // Add cleanup code here
-
-    std::cout << "Nintendo Library shut down" << std::endl;
-    initialized = false;
-}
-
-bool Library::LoadROM(const std::string& rom_path) {
-    if (!initialized) {
-        std::cerr << "Nintendo Library not initialized" << std::endl;
-        return false;
-    }
-
-    // Add code to load and validate the ROM file
-    current_rom = rom_path;
-    std::cout << "ROM loaded: " << rom_path << std::endl;
-    return true;
-}
-
-bool Library::RunFrame() {
-    if (!initialized || current_rom.empty()) {
-        std::cerr << "Cannot run frame: Library not initialized or no ROM loaded" << std::endl;
-        return false;
-    }
-
-    // Add code to emulate one frame of the game
-    // This is where the core emulation logic would go
-
-    return true;
-}
-
-void Library::SetVideoBuffer(void* buffer, int width, int height) {
-    // Add code to set up the video buffer for rendering
-    std::cout << "Video buffer set: " << width << "x" << height << std::endl;
-}
-
-void Library::SetAudioBuffer(void* buffer, int size) {
-    // Add code to set up the audio buffer for sound output
-    std::cout << "Audio buffer set: " << size << " bytes" << std::endl;
-}
-
-} // namespace Nintendo
@@ -1,31 +0,0 @@
-#pragma once
-
-#include <string>
-#include <vector>
-
-namespace Nintendo {
-
-class Library {
-public:
-    Library();
-    ~Library();
-
-    bool Initialize();
-    void Shutdown();
-
-    // Add methods for Nintendo-specific functionality
-    bool LoadROM(const std::string& rom_path);
-    bool RunFrame();
-    void SetVideoBuffer(void* buffer, int width, int height);
-    void SetAudioBuffer(void* buffer, int size);
-
-    // Add more methods as needed
-
-private:
-    // Add private members for internal state
-    bool initialized;
-    std::string current_rom;
-    // Add more members as needed
-};
-
-} // namespace Nintendo
@@ -33,7 +33,6 @@
 #include "video_core/memory_manager.h"
 #include "video_core/renderer_base.h"
 #include "video_core/shader_notify.h"
-#include "video_core/optimized_rasterizer.h"
 
 namespace Tegra {
 
@@ -41,46 +40,515 @@ struct GPU::Impl {
|
|||||||
explicit Impl(GPU& gpu_, Core::System& system_, bool is_async_, bool use_nvdec_)
|
explicit Impl(GPU& gpu_, Core::System& system_, bool is_async_, bool use_nvdec_)
|
||||||
: gpu{gpu_}, system{system_}, host1x{system.Host1x()}, use_nvdec{use_nvdec_},
|
: gpu{gpu_}, system{system_}, host1x{system.Host1x()}, use_nvdec{use_nvdec_},
|
||||||
shader_notify{std::make_unique<VideoCore::ShaderNotify>()}, is_async{is_async_},
|
shader_notify{std::make_unique<VideoCore::ShaderNotify>()}, is_async{is_async_},
|
||||||
gpu_thread{system_, is_async_}, scheduler{std::make_unique<Control::Scheduler>(gpu)} {
|
gpu_thread{system_, is_async_}, scheduler{std::make_unique<Control::Scheduler>(gpu)} {}
|
||||||
Initialize();
|
|
||||||
}
|
|
||||||
|
|
||||||
~Impl() = default;
|
~Impl() = default;
|
||||||
|
|
||||||
void Initialize() {
|
std::shared_ptr<Control::ChannelState> CreateChannel(s32 channel_id) {
|
||||||
// Initialize the GPU memory manager
|
auto channel_state = std::make_shared<Tegra::Control::ChannelState>(channel_id);
|
||||||
memory_manager = std::make_unique<Tegra::MemoryManager>(system);
|
channels.emplace(channel_id, channel_state);
|
||||||
|
scheduler->DeclareChannel(channel_state);
|
||||||
// Initialize the command buffer
|
return channel_state;
|
||||||
command_buffer.reserve(COMMAND_BUFFER_SIZE);
|
|
||||||
|
|
||||||
// Initialize the fence manager
|
|
||||||
fence_manager = std::make_unique<FenceManager>();
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// ... (previous implementation remains the same)
|
void BindChannel(s32 channel_id) {
|
||||||
|
if (bound_channel == channel_id) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
auto it = channels.find(channel_id);
|
||||||
|
ASSERT(it != channels.end());
|
||||||
|
bound_channel = channel_id;
|
||||||
|
current_channel = it->second.get();
|
||||||
|
|
||||||
|
rasterizer->BindChannel(*current_channel);
|
||||||
|
}
|
||||||
|
|
||||||
|
std::shared_ptr<Control::ChannelState> AllocateChannel() {
|
||||||
|
return CreateChannel(new_channel_id++);
|
||||||
|
}
|
||||||
|
|
||||||
|
void InitChannel(Control::ChannelState& to_init, u64 program_id) {
|
||||||
|
to_init.Init(system, gpu, program_id);
|
||||||
|
to_init.BindRasterizer(rasterizer);
|
||||||
|
rasterizer->InitializeChannel(to_init);
|
||||||
|
}
|
||||||
|
|
||||||
|
void InitAddressSpace(Tegra::MemoryManager& memory_manager) {
|
||||||
|
memory_manager.BindRasterizer(rasterizer);
|
||||||
|
}
|
||||||
|
|
||||||
|
void ReleaseChannel(Control::ChannelState& to_release) {
|
||||||
|
UNIMPLEMENTED();
|
||||||
|
}
|
||||||
|
|
||||||
/// Binds a renderer to the GPU.
|
/// Binds a renderer to the GPU.
|
||||||
void BindRenderer(std::unique_ptr<VideoCore::RendererBase> renderer_) {
|
void BindRenderer(std::unique_ptr<VideoCore::RendererBase> renderer_) {
|
||||||
renderer = std::move(renderer_);
|
renderer = std::move(renderer_);
|
||||||
rasterizer = std::make_unique<VideoCore::OptimizedRasterizer>(system, gpu);
|
rasterizer = renderer->ReadRasterizer();
|
||||||
host1x.MemoryManager().BindInterface(rasterizer.get());
|
host1x.MemoryManager().BindInterface(rasterizer);
|
||||||
host1x.GMMU().BindRasterizer(rasterizer.get());
|
host1x.GMMU().BindRasterizer(rasterizer);
|
||||||
}
|
}
|
||||||
|
|
||||||
// ... (rest of the implementation remains the same)
|
/// Flush all current written commands into the host GPU for execution.
|
||||||
|
void FlushCommands() {
|
||||||
|
rasterizer->FlushCommands();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Synchronizes CPU writes with Host GPU memory.
|
||||||
|
void InvalidateGPUCache() {
|
||||||
|
std::function<void(PAddr, size_t)> callback_writes(
|
||||||
|
[this](PAddr address, size_t size) { rasterizer->OnCacheInvalidation(address, size); });
|
||||||
|
system.GatherGPUDirtyMemory(callback_writes);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Signal the ending of command list.
|
||||||
|
void OnCommandListEnd() {
|
||||||
|
rasterizer->ReleaseFences(false);
|
||||||
|
Settings::UpdateGPUAccuracy();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Request a host GPU memory flush from the CPU.
|
||||||
|
template <typename Func>
|
||||||
|
[[nodiscard]] u64 RequestSyncOperation(Func&& action) {
|
||||||
|
std::unique_lock lck{sync_request_mutex};
|
||||||
|
const u64 fence = ++last_sync_fence;
|
||||||
|
sync_requests.emplace_back(action);
|
||||||
|
return fence;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Obtains current flush request fence id.
|
||||||
|
[[nodiscard]] u64 CurrentSyncRequestFence() const {
|
||||||
|
return current_sync_fence.load(std::memory_order_relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
void WaitForSyncOperation(const u64 fence) {
|
||||||
|
std::unique_lock lck{sync_request_mutex};
|
||||||
|
sync_request_cv.wait(lck, [this, fence] { return CurrentSyncRequestFence() >= fence; });
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Tick pending requests within the GPU.
|
||||||
|
void TickWork() {
|
||||||
|
std::unique_lock lck{sync_request_mutex};
|
||||||
|
while (!sync_requests.empty()) {
|
||||||
|
auto request = std::move(sync_requests.front());
|
||||||
|
sync_requests.pop_front();
|
||||||
|
sync_request_mutex.unlock();
|
||||||
|
request();
|
||||||
|
current_sync_fence.fetch_add(1, std::memory_order_release);
|
||||||
|
sync_request_mutex.lock();
|
||||||
|
sync_request_cv.notify_all();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a reference to the Maxwell3D GPU engine.
|
||||||
|
[[nodiscard]] Engines::Maxwell3D& Maxwell3D() {
|
||||||
|
ASSERT(current_channel);
|
||||||
|
return *current_channel->maxwell_3d;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a const reference to the Maxwell3D GPU engine.
|
||||||
|
[[nodiscard]] const Engines::Maxwell3D& Maxwell3D() const {
|
||||||
|
ASSERT(current_channel);
|
||||||
|
return *current_channel->maxwell_3d;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a reference to the KeplerCompute GPU engine.
|
||||||
|
[[nodiscard]] Engines::KeplerCompute& KeplerCompute() {
|
||||||
|
ASSERT(current_channel);
|
||||||
|
return *current_channel->kepler_compute;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a reference to the KeplerCompute GPU engine.
|
||||||
|
[[nodiscard]] const Engines::KeplerCompute& KeplerCompute() const {
|
||||||
|
ASSERT(current_channel);
|
||||||
|
return *current_channel->kepler_compute;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a reference to the GPU DMA pusher.
|
||||||
|
[[nodiscard]] Tegra::DmaPusher& DmaPusher() {
|
||||||
|
ASSERT(current_channel);
|
||||||
|
return *current_channel->dma_pusher;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a const reference to the GPU DMA pusher.
|
||||||
|
[[nodiscard]] const Tegra::DmaPusher& DmaPusher() const {
|
||||||
|
ASSERT(current_channel);
|
||||||
|
return *current_channel->dma_pusher;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a reference to the underlying renderer.
|
||||||
|
[[nodiscard]] VideoCore::RendererBase& Renderer() {
|
||||||
|
return *renderer;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a const reference to the underlying renderer.
|
||||||
|
[[nodiscard]] const VideoCore::RendererBase& Renderer() const {
|
||||||
|
return *renderer;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a reference to the shader notifier.
|
||||||
|
[[nodiscard]] VideoCore::ShaderNotify& ShaderNotify() {
|
||||||
|
return *shader_notify;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a const reference to the shader notifier.
|
||||||
|
[[nodiscard]] const VideoCore::ShaderNotify& ShaderNotify() const {
|
||||||
|
return *shader_notify;
|
||||||
|
}
|
||||||
|
|
||||||
|
[[nodiscard]] u64 GetTicks() const {
|
||||||
|
u64 gpu_tick = system.CoreTiming().GetGPUTicks();
|
||||||
|
|
||||||
|
if (Settings::values.use_fast_gpu_time.GetValue()) {
|
||||||
|
gpu_tick /= 256;
|
||||||
|
}
|
||||||
|
|
||||||
|
return gpu_tick;
|
||||||
|
}
|
||||||
|
|
||||||
|
[[nodiscard]] bool IsAsync() const {
|
||||||
|
return is_async;
|
||||||
|
}
|
||||||
|
|
||||||
|
[[nodiscard]] bool UseNvdec() const {
|
||||||
|
return use_nvdec;
|
||||||
|
}
|
||||||
|
|
||||||
|
void RendererFrameEndNotify() {
|
||||||
|
system.GetPerfStats().EndGameFrame();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Performs any additional setup necessary in order to begin GPU emulation.
|
||||||
|
/// This can be used to launch any necessary threads and register any necessary
|
||||||
|
/// core timing events.
|
||||||
|
void Start() {
|
||||||
|
Settings::UpdateGPUAccuracy();
|
||||||
|
gpu_thread.StartThread(*renderer, renderer->Context(), *scheduler);
|
||||||
|
}
|
||||||
|
|
||||||
|
void NotifyShutdown() {
|
||||||
|
std::unique_lock lk{sync_mutex};
|
||||||
|
shutting_down.store(true, std::memory_order::relaxed);
|
||||||
|
sync_cv.notify_all();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Obtain the CPU Context
|
||||||
|
void ObtainContext() {
|
||||||
|
if (!cpu_context) {
|
||||||
|
cpu_context = renderer->GetRenderWindow().CreateSharedContext();
|
||||||
|
}
|
||||||
|
cpu_context->MakeCurrent();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Release the CPU Context
|
||||||
|
void ReleaseContext() {
|
||||||
|
cpu_context->DoneCurrent();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Push GPU command entries to be processed
|
||||||
|
void PushGPUEntries(s32 channel, Tegra::CommandList&& entries) {
|
||||||
|
gpu_thread.SubmitList(channel, std::move(entries));
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Notify rasterizer that any caches of the specified region should be flushed to Switch memory
|
||||||
|
void FlushRegion(DAddr addr, u64 size) {
|
||||||
|
gpu_thread.FlushRegion(addr, size);
|
||||||
|
}
|
||||||
|
|
||||||
|
VideoCore::RasterizerDownloadArea OnCPURead(DAddr addr, u64 size) {
|
||||||
|
auto raster_area = rasterizer->GetFlushArea(addr, size);
|
||||||
|
if (raster_area.preemtive) {
|
||||||
|
return raster_area;
|
||||||
|
}
|
||||||
|
raster_area.preemtive = true;
|
||||||
|
const u64 fence = RequestSyncOperation([this, &raster_area]() {
|
||||||
|
rasterizer->FlushRegion(raster_area.start_address,
|
||||||
|
raster_area.end_address - raster_area.start_address);
|
||||||
|
});
|
||||||
|
gpu_thread.TickGPU();
|
||||||
|
WaitForSyncOperation(fence);
|
||||||
|
return raster_area;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Notify rasterizer that any caches of the specified region should be invalidated
|
||||||
|
void InvalidateRegion(DAddr addr, u64 size) {
|
||||||
|
gpu_thread.InvalidateRegion(addr, size);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool OnCPUWrite(DAddr addr, u64 size) {
|
||||||
|
return rasterizer->OnCPUWrite(addr, size);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Notify rasterizer that any caches of the specified region should be flushed and invalidated
|
||||||
|
void FlushAndInvalidateRegion(DAddr addr, u64 size) {
|
||||||
|
gpu_thread.FlushAndInvalidateRegion(addr, size);
|
||||||
|
}
|
||||||
|
|
||||||
|
void RequestComposite(std::vector<Tegra::FramebufferConfig>&& layers,
|
||||||
|
std::vector<Service::Nvidia::NvFence>&& fences) {
|
||||||
|
size_t num_fences{fences.size()};
|
||||||
|
size_t current_request_counter{};
|
||||||
|
{
|
||||||
|
std::unique_lock<std::mutex> lk(request_swap_mutex);
|
||||||
|
if (free_swap_counters.empty()) {
|
||||||
|
current_request_counter = request_swap_counters.size();
|
||||||
|
request_swap_counters.emplace_back(num_fences);
|
||||||
|
} else {
|
||||||
|
current_request_counter = free_swap_counters.front();
|
||||||
|
request_swap_counters[current_request_counter] = num_fences;
|
||||||
|
free_swap_counters.pop_front();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
const auto wait_fence =
|
||||||
|
RequestSyncOperation([this, current_request_counter, &layers, &fences, num_fences] {
|
||||||
|
auto& syncpoint_manager = host1x.GetSyncpointManager();
|
||||||
|
if (num_fences == 0) {
|
||||||
|
renderer->Composite(layers);
|
||||||
|
}
|
||||||
|
const auto executer = [this, current_request_counter, layers_copy = layers]() {
|
||||||
|
{
|
||||||
|
std::unique_lock<std::mutex> lk(request_swap_mutex);
|
||||||
|
if (--request_swap_counters[current_request_counter] != 0) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
free_swap_counters.push_back(current_request_counter);
|
||||||
|
}
|
||||||
|
renderer->Composite(layers_copy);
|
||||||
|
};
|
||||||
|
for (size_t i = 0; i < num_fences; i++) {
|
||||||
|
syncpoint_manager.RegisterGuestAction(fences[i].id, fences[i].value, executer);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
gpu_thread.TickGPU();
|
||||||
|
WaitForSyncOperation(wait_fence);
|
||||||
|
}
|
||||||
|
|
||||||
|
std::vector<u8> GetAppletCaptureBuffer() {
|
||||||
|
std::vector<u8> out;
|
||||||
|
|
||||||
|
const auto wait_fence =
|
||||||
|
RequestSyncOperation([&] { out = renderer->GetAppletCaptureBuffer(); });
|
||||||
|
gpu_thread.TickGPU();
|
||||||
|
WaitForSyncOperation(wait_fence);
|
||||||
|
|
||||||
|
return out;
|
||||||
|
}

    GPU& gpu;
    Core::System& system;
    Host1x::Host1x& host1x;

    std::unique_ptr<VideoCore::RendererBase> renderer;
-   std::unique_ptr<VideoCore::OptimizedRasterizer> rasterizer;
+   VideoCore::RasterizerInterface* rasterizer = nullptr;
    const bool use_nvdec;

-   // ... (rest of the member variables remain the same)
+   s32 new_channel_id{1};

    /// Shader build notifier
    std::unique_ptr<VideoCore::ShaderNotify> shader_notify;

    /// When true, we are about to shut down emulation session, so terminate outstanding tasks
    std::atomic_bool shutting_down{};

    std::array<std::atomic<u32>, Service::Nvidia::MaxSyncPoints> syncpoints{};
    std::array<std::list<u32>, Service::Nvidia::MaxSyncPoints> syncpt_interrupts;

    std::mutex sync_mutex;
    std::mutex device_mutex;

    std::condition_variable sync_cv;

    std::list<std::function<void()>> sync_requests;
    std::atomic<u64> current_sync_fence{};
    u64 last_sync_fence{};
    std::mutex sync_request_mutex;
    std::condition_variable sync_request_cv;

    const bool is_async;

    VideoCommon::GPUThread::ThreadManager gpu_thread;
    std::unique_ptr<Core::Frontend::GraphicsContext> cpu_context;

    std::unique_ptr<Tegra::Control::Scheduler> scheduler;
    std::unordered_map<s32, std::shared_ptr<Tegra::Control::ChannelState>> channels;
    Tegra::Control::ChannelState* current_channel;
    s32 bound_channel{-1};

    std::deque<size_t> free_swap_counters;
    std::deque<size_t> request_swap_counters;
    std::mutex request_swap_mutex;
};

-// ... (rest of the implementation remains the same)
+GPU::GPU(Core::System& system, bool is_async, bool use_nvdec)
+    : impl{std::make_unique<Impl>(*this, system, is_async, use_nvdec)} {}

GPU::~GPU() = default;
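The `GPU` class above is a textbook pimpl: the public type owns a `std::unique_ptr<Impl>` and forwards every call, so the header never exposes the implementation's members. A minimal, self-contained sketch of the idiom (with a hypothetical `Widget` class standing in for `GPU`):

```cpp
#include <cassert>
#include <memory>
#include <string>

// Pimpl idiom: the public class holds only an opaque pointer, keeping the
// header stable while the implementation's data members stay hidden.
class Widget {
public:
    Widget();
    ~Widget(); // must be defined where Impl is a complete type
    std::string Name() const;

private:
    struct Impl;
    std::unique_ptr<Impl> impl;
};

// Normally in the .cpp file; inlined here so the example is self-contained.
struct Widget::Impl {
    std::string name{"widget"};
};

Widget::Widget() : impl{std::make_unique<Impl>()} {}
Widget::~Widget() = default; // Impl is complete at this point

std::string Widget::Name() const {
    return impl->name;
}
```

The out-of-line destructor matters: `GPU::~GPU() = default;` lives in the .cpp for exactly this reason, since `std::unique_ptr` needs a complete `Impl` to delete it.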
std::shared_ptr<Control::ChannelState> GPU::AllocateChannel() {
    return impl->AllocateChannel();
}

void GPU::InitChannel(Control::ChannelState& to_init, u64 program_id) {
    impl->InitChannel(to_init, program_id);
}

void GPU::BindChannel(s32 channel_id) {
    impl->BindChannel(channel_id);
}

void GPU::ReleaseChannel(Control::ChannelState& to_release) {
    impl->ReleaseChannel(to_release);
}

void GPU::InitAddressSpace(Tegra::MemoryManager& memory_manager) {
    impl->InitAddressSpace(memory_manager);
}

void GPU::BindRenderer(std::unique_ptr<VideoCore::RendererBase> renderer) {
    impl->BindRenderer(std::move(renderer));
}

void GPU::FlushCommands() {
    impl->FlushCommands();
}

void GPU::InvalidateGPUCache() {
    impl->InvalidateGPUCache();
}

void GPU::OnCommandListEnd() {
    impl->OnCommandListEnd();
}

u64 GPU::RequestFlush(DAddr addr, std::size_t size) {
    return impl->RequestSyncOperation(
        [this, addr, size]() { impl->rasterizer->FlushRegion(addr, size); });
}

u64 GPU::CurrentSyncRequestFence() const {
    return impl->CurrentSyncRequestFence();
}

void GPU::WaitForSyncOperation(u64 fence) {
    return impl->WaitForSyncOperation(fence);
}

void GPU::TickWork() {
    impl->TickWork();
}

/// Gets a mutable reference to the Host1x interface
Host1x::Host1x& GPU::Host1x() {
    return impl->host1x;
}

/// Gets an immutable reference to the Host1x interface.
const Host1x::Host1x& GPU::Host1x() const {
    return impl->host1x;
}

Engines::Maxwell3D& GPU::Maxwell3D() {
    return impl->Maxwell3D();
}

const Engines::Maxwell3D& GPU::Maxwell3D() const {
    return impl->Maxwell3D();
}

Engines::KeplerCompute& GPU::KeplerCompute() {
    return impl->KeplerCompute();
}

const Engines::KeplerCompute& GPU::KeplerCompute() const {
    return impl->KeplerCompute();
}

Tegra::DmaPusher& GPU::DmaPusher() {
    return impl->DmaPusher();
}

const Tegra::DmaPusher& GPU::DmaPusher() const {
    return impl->DmaPusher();
}

VideoCore::RendererBase& GPU::Renderer() {
    return impl->Renderer();
}

const VideoCore::RendererBase& GPU::Renderer() const {
    return impl->Renderer();
}

VideoCore::ShaderNotify& GPU::ShaderNotify() {
    return impl->ShaderNotify();
}

const VideoCore::ShaderNotify& GPU::ShaderNotify() const {
    return impl->ShaderNotify();
}

void GPU::RequestComposite(std::vector<Tegra::FramebufferConfig>&& layers,
                           std::vector<Service::Nvidia::NvFence>&& fences) {
    impl->RequestComposite(std::move(layers), std::move(fences));
}

std::vector<u8> GPU::GetAppletCaptureBuffer() {
    return impl->GetAppletCaptureBuffer();
}

u64 GPU::GetTicks() const {
    return impl->GetTicks();
}

bool GPU::IsAsync() const {
    return impl->IsAsync();
}

bool GPU::UseNvdec() const {
    return impl->UseNvdec();
}

void GPU::RendererFrameEndNotify() {
    impl->RendererFrameEndNotify();
}

void GPU::Start() {
    impl->Start();
}

void GPU::NotifyShutdown() {
    impl->NotifyShutdown();
}

void GPU::ObtainContext() {
    impl->ObtainContext();
}

void GPU::ReleaseContext() {
    impl->ReleaseContext();
}

void GPU::PushGPUEntries(s32 channel, Tegra::CommandList&& entries) {
    impl->PushGPUEntries(channel, std::move(entries));
}

VideoCore::RasterizerDownloadArea GPU::OnCPURead(PAddr addr, u64 size) {
    return impl->OnCPURead(addr, size);
}

void GPU::FlushRegion(DAddr addr, u64 size) {
    impl->FlushRegion(addr, size);
}

void GPU::InvalidateRegion(DAddr addr, u64 size) {
    impl->InvalidateRegion(addr, size);
}

bool GPU::OnCPUWrite(DAddr addr, u64 size) {
    return impl->OnCPUWrite(addr, size);
}

void GPU::FlushAndInvalidateRegion(DAddr addr, u64 size) {
    impl->FlushAndInvalidateRegion(addr, size);
}

} // namespace Tegra
@@ -1,221 +0,0 @@
#include "video_core/optimized_rasterizer.h"
#include "common/settings.h"
#include "video_core/gpu.h"
#include "video_core/memory_manager.h"
#include "video_core/engines/maxwell_3d.h"

namespace VideoCore {

OptimizedRasterizer::OptimizedRasterizer(Core::System& system, Tegra::GPU& gpu)
    : system{system}, gpu{gpu}, memory_manager{gpu.MemoryManager()} {
    InitializeShaderCache();
}

OptimizedRasterizer::~OptimizedRasterizer() = default;

void OptimizedRasterizer::Draw(bool is_indexed, u32 instance_count) {
    MICROPROFILE_SCOPE(GPU_Rasterization);

    PrepareRendertarget();
    UpdateDynamicState();

    if (is_indexed) {
        DrawIndexed(instance_count);
    } else {
        DrawArrays(instance_count);
    }
}

void OptimizedRasterizer::Clear(u32 layer_count) {
    MICROPROFILE_SCOPE(GPU_Rasterization);

    PrepareRendertarget();
    ClearFramebuffer(layer_count);
}

void OptimizedRasterizer::DispatchCompute() {
    MICROPROFILE_SCOPE(GPU_Compute);

    PrepareCompute();
    LaunchComputeShader();
}

void OptimizedRasterizer::ResetCounter(VideoCommon::QueryType type) {
    query_cache.ResetCounter(type);
}

void OptimizedRasterizer::Query(GPUVAddr gpu_addr, VideoCommon::QueryType type,
                                VideoCommon::QueryPropertiesFlags flags, u32 payload,
                                u32 subreport) {
    query_cache.Query(gpu_addr, type, flags, payload, subreport);
}

void OptimizedRasterizer::FlushAll() {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    FlushShaderCache();
    FlushRenderTargets();
}

void OptimizedRasterizer::FlushRegion(DAddr addr, u64 size, VideoCommon::CacheType which) {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    if (which == VideoCommon::CacheType::All || which == VideoCommon::CacheType::Unified) {
        FlushMemoryRegion(addr, size);
    }
}

bool OptimizedRasterizer::MustFlushRegion(DAddr addr, u64 size, VideoCommon::CacheType which) {
    if (which == VideoCommon::CacheType::All || which == VideoCommon::CacheType::Unified) {
        return IsRegionCached(addr, size);
    }
    return false;
}

RasterizerDownloadArea OptimizedRasterizer::GetFlushArea(DAddr addr, u64 size) {
    return GetFlushableArea(addr, size);
}

void OptimizedRasterizer::InvalidateRegion(DAddr addr, u64 size, VideoCommon::CacheType which) {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    if (which == VideoCommon::CacheType::All || which == VideoCommon::CacheType::Unified) {
        InvalidateMemoryRegion(addr, size);
    }
}

void OptimizedRasterizer::OnCacheInvalidation(PAddr addr, u64 size) {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    InvalidateCachedRegion(addr, size);
}

bool OptimizedRasterizer::OnCPUWrite(PAddr addr, u64 size) {
    return HandleCPUWrite(addr, size);
}

void OptimizedRasterizer::InvalidateGPUCache() {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    InvalidateAllCache();
}

void OptimizedRasterizer::UnmapMemory(DAddr addr, u64 size) {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    UnmapGPUMemoryRegion(addr, size);
}

void OptimizedRasterizer::ModifyGPUMemory(size_t as_id, GPUVAddr addr, u64 size) {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    UpdateMappedGPUMemory(as_id, addr, size);
}

void OptimizedRasterizer::FlushAndInvalidateRegion(DAddr addr, u64 size,
                                                   VideoCommon::CacheType which) {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    if (which == VideoCommon::CacheType::All || which == VideoCommon::CacheType::Unified) {
        FlushAndInvalidateMemoryRegion(addr, size);
    }
}

void OptimizedRasterizer::WaitForIdle() {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    WaitForGPUIdle();
}

void OptimizedRasterizer::FragmentBarrier() {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    InsertFragmentBarrier();
}

void OptimizedRasterizer::TiledCacheBarrier() {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    InsertTiledCacheBarrier();
}

void OptimizedRasterizer::FlushCommands() {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    SubmitCommands();
}

void OptimizedRasterizer::TickFrame() {
    MICROPROFILE_SCOPE(GPU_Synchronization);

    EndFrame();
}

void OptimizedRasterizer::PrepareRendertarget() {
    const auto& regs{gpu.Maxwell3D().regs};
    const auto& framebuffer{regs.framebuffer};

    render_targets.resize(framebuffer.num_color_buffers);
    for (std::size_t index = 0; index < framebuffer.num_color_buffers; ++index) {
        render_targets[index] = GetColorBuffer(index);
    }

    depth_stencil = GetDepthBuffer();
}

void OptimizedRasterizer::UpdateDynamicState() {
    const auto& regs{gpu.Maxwell3D().regs};

    UpdateViewport(regs.viewport_transform);
    UpdateScissor(regs.scissor_test);
    UpdateDepthBias(regs.polygon_offset_units, regs.polygon_offset_clamp,
                    regs.polygon_offset_factor);
    UpdateBlendConstants(regs.blend_color);
    UpdateStencilFaceMask(regs.stencil_front_func_mask, regs.stencil_back_func_mask);
}

void OptimizedRasterizer::DrawIndexed(u32 instance_count) {
    const auto& draw_state{gpu.Maxwell3D().draw_manager->GetDrawState()};
    const auto& index_buffer{memory_manager.ReadBlockUnsafe(draw_state.index_buffer.Address(),
                                                            draw_state.index_buffer.size)};

    shader_cache.BindComputeShader();
    shader_cache.BindGraphicsShader();

    DrawElementsInstanced(draw_state.topology, draw_state.index_buffer.count,
                          draw_state.index_buffer.format, index_buffer.data(), instance_count);
}

void OptimizedRasterizer::DrawArrays(u32 instance_count) {
    const auto& draw_state{gpu.Maxwell3D().draw_manager->GetDrawState()};

    shader_cache.BindComputeShader();
    shader_cache.BindGraphicsShader();

    DrawArraysInstanced(draw_state.topology, draw_state.vertex_buffer.first,
                        draw_state.vertex_buffer.count, instance_count);
}

void OptimizedRasterizer::ClearFramebuffer(u32 layer_count) {
    const auto& regs{gpu.Maxwell3D().regs};
    const auto& clear_state{regs.clear_buffers};

    if (clear_state.R || clear_state.G || clear_state.B || clear_state.A) {
        ClearColorBuffers(clear_state.R, clear_state.G, clear_state.B, clear_state.A,
                          regs.clear_color[0], regs.clear_color[1], regs.clear_color[2],
                          regs.clear_color[3], layer_count);
    }

    if (clear_state.Z || clear_state.S) {
        ClearDepthStencilBuffer(clear_state.Z, clear_state.S, regs.clear_depth,
                                regs.clear_stencil, layer_count);
    }
}

void OptimizedRasterizer::PrepareCompute() {
    shader_cache.BindComputeShader();
}

void OptimizedRasterizer::LaunchComputeShader() {
    const auto& launch_desc{gpu.KeplerCompute().launch_description};
    DispatchCompute(launch_desc.grid_dim_x, launch_desc.grid_dim_y, launch_desc.grid_dim_z);
}

} // namespace VideoCore
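Nearly every method in the deleted file opens with `MICROPROFILE_SCOPE(...)`, a scope marker from the microprofile library. A hypothetical sketch of what such a marker boils down to (this is not microprofile's implementation, just the underlying RAII idea): a stack object measures how long the enclosing scope took and accumulates it into a per-category counter.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <unordered_map>

// Global counter table: category name -> accumulated nanoseconds.
inline std::unordered_map<std::string, std::int64_t>& ProfileCounters() {
    static std::unordered_map<std::string, std::int64_t> counters;
    return counters;
}

// RAII scope timer: construction records the start time, destruction adds
// the elapsed time to the category's counter.
class ScopeTimer {
public:
    explicit ScopeTimer(std::string category)
        : category_{std::move(category)}, start_{std::chrono::steady_clock::now()} {}

    ~ScopeTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        ProfileCounters()[category_] +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    }

private:
    std::string category_;
    std::chrono::steady_clock::time_point start_;
};
```

Because the timer is a local object, the scope is measured correctly on every exit path, including early returns and exceptions.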
@@ -1,73 +0,0 @@
#pragma once

#include <memory>
#include <vector>
#include "common/common_types.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/engines/maxwell_3d.h"

namespace Core {
class System;
}

namespace Tegra {
class GPU;
class MemoryManager;
}

namespace VideoCore {

class ShaderCache;
class QueryCache;

class OptimizedRasterizer final : public RasterizerInterface {
public:
    explicit OptimizedRasterizer(Core::System& system, Tegra::GPU& gpu);
    ~OptimizedRasterizer() override;

    void Draw(bool is_indexed, u32 instance_count) override;
    void Clear(u32 layer_count) override;
    void DispatchCompute() override;
    void ResetCounter(VideoCommon::QueryType type) override;
    void Query(GPUVAddr gpu_addr, VideoCommon::QueryType type,
               VideoCommon::QueryPropertiesFlags flags, u32 payload, u32 subreport) override;
    void FlushAll() override;
    void FlushRegion(DAddr addr, u64 size, VideoCommon::CacheType which) override;
    bool MustFlushRegion(DAddr addr, u64 size, VideoCommon::CacheType which) override;
    RasterizerDownloadArea GetFlushArea(DAddr addr, u64 size) override;
    void InvalidateRegion(DAddr addr, u64 size, VideoCommon::CacheType which) override;
    void OnCacheInvalidation(PAddr addr, u64 size) override;
    bool OnCPUWrite(PAddr addr, u64 size) override;
    void InvalidateGPUCache() override;
    void UnmapMemory(DAddr addr, u64 size) override;
    void ModifyGPUMemory(size_t as_id, GPUVAddr addr, u64 size) override;
    void FlushAndInvalidateRegion(DAddr addr, u64 size, VideoCommon::CacheType which) override;
    void WaitForIdle() override;
    void FragmentBarrier() override;
    void TiledCacheBarrier() override;
    void FlushCommands() override;
    void TickFrame() override;

private:
    void PrepareRendertarget();
    void UpdateDynamicState();
    void DrawIndexed(u32 instance_count);
    void DrawArrays(u32 instance_count);
    void ClearFramebuffer(u32 layer_count);
    void PrepareCompute();
    void LaunchComputeShader();

    Core::System& system;
    Tegra::GPU& gpu;
    Tegra::MemoryManager& memory_manager;

    std::unique_ptr<ShaderCache> shader_cache;
    std::unique_ptr<QueryCache> query_cache;

    std::vector<RenderTargetConfig> render_targets;
    DepthStencilConfig depth_stencil;

    // Add any additional member variables needed for the optimized rasterizer
};

} // namespace VideoCore
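The header above declares `OptimizedRasterizer` as a `final` class overriding a pure-virtual `RasterizerInterface`. A minimal, self-contained sketch of that shape (with hypothetical stand-in types, not the real suyu interface):

```cpp
#include <cassert>
#include <memory>

// Pure-virtual interface, like VideoCore::RasterizerInterface above.
struct Rasterizer {
    virtual ~Rasterizer() = default;
    virtual void Draw(bool is_indexed, unsigned instance_count) = 0;
};

// A `final` implementation; this stub just counts submitted instances.
class NullRasterizer final : public Rasterizer {
public:
    void Draw(bool is_indexed, unsigned instance_count) override {
        (void)is_indexed;
        draws += instance_count;
    }

    unsigned draws = 0;
};
```

Callers hold the interface type and never see the concrete class, which is exactly how `GPU` stores its rasterizer pointer.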
@@ -12,7 +12,6 @@
 #include "core/frontend/framebuffer_layout.h"
 #include "video_core/gpu.h"
 #include "video_core/rasterizer_interface.h"
-#include "video_core/optimized_rasterizer.h"

 namespace Core::Frontend {
 class EmuWindow;
@@ -46,8 +45,6 @@ public:
     [[nodiscard]] virtual RasterizerInterface* ReadRasterizer() = 0;
-
-    [[nodiscard]] virtual OptimizedRasterizer* ReadOptimizedRasterizer() = 0;

     [[nodiscard]] virtual std::string GetDeviceVendor() const = 0;

     // Getter/setter functions:

File diff suppressed because it is too large
@@ -23,7 +23,6 @@
 #include "video_core/renderer_opengl/gl_query_cache.h"
 #include "video_core/renderer_opengl/gl_shader_cache.h"
 #include "video_core/renderer_opengl/gl_texture_cache.h"
-#include "video_core/optimized_rasterizer.h"

 namespace Core::Memory {
 class Memory;
@@ -73,7 +72,8 @@ private:
     TextureCache& texture_cache;
 };

-class RasterizerOpenGL : public VideoCore::OptimizedRasterizer {
+class RasterizerOpenGL : public VideoCore::RasterizerInterface,
+                         protected VideoCommon::ChannelSetupCaches<VideoCommon::ChannelInfo> {
 public:
     explicit RasterizerOpenGL(Core::Frontend::EmuWindow& emu_window_, Tegra::GPU& gpu_,
                               Tegra::MaxwellDeviceMemoryManager& device_memory_,
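The restored `RasterizerOpenGL` declaration combines a public interface base with a `protected` templated mixin (`ChannelSetupCaches`). A small sketch of that "public interface plus protected mixin" shape, using hypothetical stand-in types rather than the real suyu classes:

```cpp
#include <cassert>

// Public interface base: the only surface callers ever see.
struct Interface {
    virtual ~Interface() = default;
    virtual int Id() const = 0;
};

// Protected mixin: shared bookkeeping the derived class uses internally,
// deliberately kept out of the public API via protected inheritance.
template <typename Info>
class ChannelSetup {
protected:
    Info info{};
};

struct ChannelInfo {
    int channel = 7;
};

class Rasterizer final : public Interface, protected ChannelSetup<ChannelInfo> {
public:
    int Id() const override {
        return info.channel; // mixin state is accessible inside the class
    }
};
```

Protected inheritance means a `Rasterizer*` cannot be converted to `ChannelSetup<ChannelInfo>*` by outside code, so the channel bookkeeping stays an implementation detail.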
@@ -24,7 +24,6 @@
 #include "video_core/renderer_vulkan/vk_update_descriptor.h"
 #include "video_core/vulkan_common/vulkan_memory_allocator.h"
 #include "video_core/vulkan_common/vulkan_wrapper.h"
-#include "video_core/optimized_rasterizer.h"

 namespace Core {
 class System;
@@ -74,7 +73,7 @@ private:
     Scheduler& scheduler;
 };

-class RasterizerVulkan final : public VideoCore::OptimizedRasterizer,
+class RasterizerVulkan final : public VideoCore::RasterizerInterface,
                                protected VideoCommon::ChannelSetupCaches<VideoCommon::ChannelInfo> {
 public:
     explicit RasterizerVulkan(Core::Frontend::EmuWindow& emu_window_, Tegra::GPU& gpu_,
@@ -3,18 +3,9 @@
 #include <algorithm>
 #include <array>
-#include <atomic>
-#include <filesystem>
-#include <fstream>
-#include <mutex>
-#include <thread>
 #include <vector>

 #include "common/assert.h"
-#include "common/fs/file.h"
-#include "common/fs/path_util.h"
-#include "common/logging/log.h"
-#include "common/thread_worker.h"
 #include "shader_recompiler/frontend/maxwell/control_flow.h"
 #include "shader_recompiler/object_pool.h"
 #include "video_core/control/channel_state.h"
@@ -28,288 +19,233 @@

 namespace VideoCommon {

-constexpr size_t MAX_SHADER_CACHE_SIZE = 1024 * 1024 * 1024; // 1GB
-
-class ShaderCacheWorker : public Common::ThreadWorker {
-public:
-    explicit ShaderCacheWorker(const std::string& name) : ThreadWorker(name) {}
-    ~ShaderCacheWorker() = default;
-
-    void CompileShader(ShaderInfo* shader) {
-        Push([shader]() {
-            // Compile shader here
-            // This is a placeholder for the actual compilation process
-            std::this_thread::sleep_for(std::chrono::milliseconds(10));
-            shader->is_compiled.store(true, std::memory_order_release);
-        });
-    }
-};
-
-class ShaderCache::Impl {
-public:
-    explicit Impl(Tegra::MaxwellDeviceMemoryManager& device_memory_)
-        : device_memory{device_memory_}, workers{CreateWorkers()} {
-        LoadCache();
-    }
-
-    ~Impl() {
-        SaveCache();
-    }
-
-    void InvalidateRegion(VAddr addr, size_t size) {
-        std::scoped_lock lock{invalidation_mutex};
-        InvalidatePagesInRegion(addr, size);
-        RemovePendingShaders();
-    }
-
-    void OnCacheInvalidation(VAddr addr, size_t size) {
-        std::scoped_lock lock{invalidation_mutex};
-        InvalidatePagesInRegion(addr, size);
-    }
-
-    void SyncGuestHost() {
-        std::scoped_lock lock{invalidation_mutex};
-        RemovePendingShaders();
-    }
-
-    bool RefreshStages(std::array<u64, 6>& unique_hashes);
-    const ShaderInfo* ComputeShader();
-    void GetGraphicsEnvironments(GraphicsEnvironments& result,
-                                 const std::array<u64, NUM_PROGRAMS>& unique_hashes);
-
-    ShaderInfo* TryGet(VAddr addr) const {
-        std::scoped_lock lock{lookup_mutex};
-
-        const auto it = lookup_cache.find(addr);
-        if (it == lookup_cache.end()) {
-            return nullptr;
-        }
-        return it->second->data;
-    }
-
-    void Register(std::unique_ptr<ShaderInfo> data, VAddr addr, size_t size) {
-        std::scoped_lock lock{invalidation_mutex, lookup_mutex};
-
-        const VAddr addr_end = addr + size;
-        Entry* const entry = NewEntry(addr, addr_end, data.get());
-
-        const u64 page_end = (addr_end + SUYU_PAGESIZE - 1) >> SUYU_PAGEBITS;
-        for (u64 page = addr >> SUYU_PAGEBITS; page < page_end; ++page) {
-            invalidation_cache[page].push_back(entry);
-        }
-
-        storage.push_back(std::move(data));
-
-        device_memory.UpdatePagesCachedCount(addr, size, 1);
-    }
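The page loops in `Register` (and the invalidation paths below) all use the same shift arithmetic: the first page is `addr >> PAGEBITS`, and one-past-the-last page rounds the end address up before shifting. A minimal sketch with illustrative constants (the real values come from `SUYU_PAGEBITS`/`SUYU_PAGESIZE`; 4 KiB pages are an assumption here):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative page geometry, standing in for SUYU_PAGEBITS / SUYU_PAGESIZE.
constexpr std::uint64_t PAGE_BITS = 12; // 4 KiB pages (assumed)
constexpr std::uint64_t PAGE_SIZE = 1ULL << PAGE_BITS;

// Index of the first page touched by an address.
constexpr std::uint64_t FirstPage(std::uint64_t addr) {
    return addr >> PAGE_BITS;
}

// One-past-the-last page touched by [addr, addr + size): round the end
// address up to a page boundary before shifting, as the loops above do.
constexpr std::uint64_t PageEnd(std::uint64_t addr, std::uint64_t size) {
    return (addr + size + PAGE_SIZE - 1) >> PAGE_BITS;
}
```

Iterating `for (page = FirstPage(addr); page < PageEnd(addr, size); ++page)` then visits every page the region overlaps, including a final partially covered page.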
-
-private:
-    std::vector<std::unique_ptr<ShaderCacheWorker>> CreateWorkers() {
-        const size_t num_workers = std::thread::hardware_concurrency();
-        std::vector<std::unique_ptr<ShaderCacheWorker>> workers;
-        workers.reserve(num_workers);
-        for (size_t i = 0; i < num_workers; ++i) {
-            workers.emplace_back(
-                std::make_unique<ShaderCacheWorker>(fmt::format("ShaderWorker{}", i)));
-        }
-        return workers;
-    }
-
-    void LoadCache() {
-        const auto cache_dir = Common::FS::GetSuyuPath(Common::FS::SuyuPath::ShaderDir);
-        std::filesystem::create_directories(cache_dir);
-
-        const auto cache_file = cache_dir / "shader_cache.bin";
-        if (!std::filesystem::exists(cache_file)) {
-            return;
-        }
-
-        std::ifstream file(cache_file, std::ios::binary);
-        if (!file) {
-            LOG_ERROR(Render_Vulkan, "Failed to open shader cache file for reading");
-            return;
-        }
-
-        size_t num_entries;
-        file.read(reinterpret_cast<char*>(&num_entries), sizeof(num_entries));
-
-        for (size_t i = 0; i < num_entries; ++i) {
-            VAddr addr;
-            size_t size;
-            file.read(reinterpret_cast<char*>(&addr), sizeof(addr));
-            file.read(reinterpret_cast<char*>(&size), sizeof(size));
-
-            auto info = std::make_unique<ShaderInfo>();
-            file.read(reinterpret_cast<char*>(info.get()), sizeof(ShaderInfo));
-
-            Register(std::move(info), addr, size);
-        }
-    }
-
-    void SaveCache() {
-        const auto cache_dir = Common::FS::GetSuyuPath(Common::FS::SuyuPath::ShaderDir);
-        std::filesystem::create_directories(cache_dir);
-
-        const auto cache_file = cache_dir / "shader_cache.bin";
-        std::ofstream file(cache_file, std::ios::binary | std::ios::trunc);
-        if (!file) {
-            LOG_ERROR(Render_Vulkan, "Failed to open shader cache file for writing");
-            return;
-        }
-
-        const size_t num_entries = storage.size();
-        file.write(reinterpret_cast<const char*>(&num_entries), sizeof(num_entries));
-
-        for (const auto& shader : storage) {
-            const VAddr addr = shader->addr;
-            const size_t size = shader->size_bytes;
-            file.write(reinterpret_cast<const char*>(&addr), sizeof(addr));
-            file.write(reinterpret_cast<const char*>(&size), sizeof(size));
-            file.write(reinterpret_cast<const char*>(shader.get()), sizeof(ShaderInfo));
-        }
-    }
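The removed `LoadCache`/`SaveCache` above serialize whole objects with raw `read`/`write` of `sizeof(ShaderInfo)` bytes, which is only well-defined for trivially copyable types (a `ShaderInfo` holding a `std::atomic` or pointers would not round-trip safely). A minimal sketch of the safe version of that scheme, restricted to a hypothetical POD record type and enforced with a `static_assert`:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical fixed-layout cache record; a stand-in, not suyu's ShaderInfo.
struct CacheEntry {
    std::uint64_t addr;
    std::uint64_t size;
};
// Raw byte-wise I/O is only valid for trivially copyable records.
static_assert(std::is_trivially_copyable_v<CacheEntry>);

// Write a count header followed by the raw records.
std::string Serialize(const std::vector<CacheEntry>& entries) {
    std::ostringstream out(std::ios::binary);
    const std::uint64_t count = entries.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    out.write(reinterpret_cast<const char*>(entries.data()),
              static_cast<std::streamsize>(count * sizeof(CacheEntry)));
    return out.str();
}

// Read the count header, then bulk-read that many records back.
std::vector<CacheEntry> Deserialize(const std::string& blob) {
    std::istringstream in(blob, std::ios::binary);
    std::uint64_t count{};
    in.read(reinterpret_cast<char*>(&count), sizeof(count));
    std::vector<CacheEntry> entries(count);
    in.read(reinterpret_cast<char*>(entries.data()),
            static_cast<std::streamsize>(count * sizeof(CacheEntry)));
    return entries;
}
```

Using a fixed-width `std::uint64_t` for the count (rather than `size_t`, as the removed code did) also keeps the on-disk format stable across 32- and 64-bit builds.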
-
-    void InvalidatePagesInRegion(VAddr addr, size_t size) {
-        const VAddr addr_end = addr + size;
-        const u64 page_end = (addr_end + SUYU_PAGESIZE - 1) >> SUYU_PAGEBITS;
-        for (u64 page = addr >> SUYU_PAGEBITS; page < page_end; ++page) {
-            auto it = invalidation_cache.find(page);
-            if (it == invalidation_cache.end()) {
-                continue;
-            }
-            InvalidatePageEntries(it->second, addr, addr_end);
-        }
-    }
-
-    void RemovePendingShaders() {
-        if (marked_for_removal.empty()) {
-            return;
-        }
-        // Remove duplicates
-        std::sort(marked_for_removal.begin(), marked_for_removal.end());
-        marked_for_removal.erase(std::unique(marked_for_removal.begin(), marked_for_removal.end()),
-                                 marked_for_removal.end());
-
-        std::vector<ShaderInfo*> removed_shaders;
-
-        std::scoped_lock lock{lookup_mutex};
-        for (Entry* const entry : marked_for_removal) {
-            removed_shaders.push_back(entry->data);
-
-            const auto it = lookup_cache.find(entry->addr_start);
-            ASSERT(it != lookup_cache.end());
-            lookup_cache.erase(it);
-        }
-        marked_for_removal.clear();
-
-        if (!removed_shaders.empty()) {
-            RemoveShadersFromStorage(removed_shaders);
-        }
-    }
-
-    void InvalidatePageEntries(std::vector<Entry*>& entries, VAddr addr, VAddr addr_end) {
-        size_t index = 0;
-        while (index < entries.size()) {
-            Entry* const entry = entries[index];
-            if (!entry->Overlaps(addr, addr_end)) {
-                ++index;
-                continue;
-            }
-
-            UnmarkMemory(entry);
-            RemoveEntryFromInvalidationCache(entry);
-            marked_for_removal.push_back(entry);
-        }
-    }
-
-    void RemoveEntryFromInvalidationCache(const Entry* entry) {
-        const u64 page_end = (entry->addr_end + SUYU_PAGESIZE - 1) >> SUYU_PAGEBITS;
-        for (u64 page = entry->addr_start >> SUYU_PAGEBITS; page < page_end; ++page) {
-            const auto entries_it = invalidation_cache.find(page);
-            ASSERT(entries_it != invalidation_cache.end());
-            std::vector<Entry*>& entries = entries_it->second;
-
-            const auto entry_it = std::find(entries.begin(), entries.end(), entry);
-            ASSERT(entry_it != entries.end());
-            entries.erase(entry_it);
-        }
-    }
-
-    void UnmarkMemory(Entry* entry) {
-        if (!entry->is_memory_marked) {
-            return;
-        }
-        entry->is_memory_marked = false;
-
-        const VAddr addr = entry->addr_start;
-        const size_t size = entry->addr_end - addr;
-        device_memory.UpdatePagesCachedCount(addr, size, -1);
-    }
-
-    void RemoveShadersFromStorage(const std::vector<ShaderInfo*>& removed_shaders) {
-        storage.erase(
-            std::remove_if(storage.begin(), storage.end(),
-                           [&removed_shaders](const std::unique_ptr<ShaderInfo>& shader) {
-                               return std::find(removed_shaders.begin(), removed_shaders.end(),
-                                                shader.get()) != removed_shaders.end();
-                           }),
-            storage.end());
-    }
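`RemoveShadersFromStorage` above is the classic erase/remove_if idiom: `std::remove_if` shifts every kept element to the front and returns an iterator to the new logical end, then `erase` trims the leftover tail in one call. The same shape reduced to plain ints:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Keep every value of `storage` that is NOT listed in `removed`, using the
// erase/remove_if idiom: one linear pass plus a single tail erase.
std::vector<int> RemoveValues(std::vector<int> storage, const std::vector<int>& removed) {
    storage.erase(
        std::remove_if(storage.begin(), storage.end(),
                       [&removed](int value) {
                           return std::find(removed.begin(), removed.end(), value) !=
                                  removed.end();
                       }),
        storage.end());
    return storage;
}
```

Erasing elements one by one inside a loop would be quadratic and risk iterator invalidation; the idiom avoids both, which is why the shader cache uses it for bulk removal.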
|
|
||||||
Entry* NewEntry(VAddr addr, VAddr addr_end, ShaderInfo* data) {
|
|
||||||
auto entry = std::make_unique<Entry>(Entry{addr, addr_end, data});
|
|
||||||
Entry* const entry_pointer = entry.get();
|
|
||||||
|
|
||||||
lookup_cache.emplace(addr, std::move(entry));
|
|
||||||
return entry_pointer;
|
|
||||||
}
|
|
||||||
|
|
||||||
Tegra::MaxwellDeviceMemoryManager& device_memory;
|
|
||||||
std::vector<std::unique_ptr<ShaderCacheWorker>> workers;
|
|
||||||
|
|
||||||
mutable std::mutex lookup_mutex;
|
|
||||||
std::mutex invalidation_mutex;
|
|
||||||
|
|
||||||
std::unordered_map<VAddr, std::unique_ptr<Entry>> lookup_cache;
|
|
||||||
std::unordered_map<u64, std::vector<Entry*>> invalidation_cache;
|
|
||||||
std::vector<std::unique_ptr<ShaderInfo>> storage;
|
|
||||||
std::vector<Entry*> marked_for_removal;
|
|
||||||
};
|
|
||||||
|
|
||||||
ShaderCache::ShaderCache(Tegra::MaxwellDeviceMemoryManager& device_memory_)
|
|
||||||
: impl{std::make_unique<Impl>(device_memory_)} {}
|
|
||||||
|
|
||||||
ShaderCache::~ShaderCache() = default;
|
|
||||||
 void ShaderCache::InvalidateRegion(VAddr addr, size_t size) {
-    std::scoped_lock lock{invalidation_mutex};
-    InvalidatePagesInRegion(addr, size);
-    RemovePendingShaders();
+    impl->InvalidateRegion(addr, size);
 }

 void ShaderCache::OnCacheInvalidation(VAddr addr, size_t size) {
-    std::scoped_lock lock{invalidation_mutex};
-    InvalidatePagesInRegion(addr, size);
+    impl->OnCacheInvalidation(addr, size);
 }

 void ShaderCache::SyncGuestHost() {
-    std::scoped_lock lock{invalidation_mutex};
-    RemovePendingShaders();
+    impl->SyncGuestHost();
 }

-ShaderCache::ShaderCache(Tegra::MaxwellDeviceMemoryManager& device_memory_)
-    : device_memory{device_memory_} {}

 bool ShaderCache::RefreshStages(std::array<u64, 6>& unique_hashes) {
-    auto& dirty{maxwell3d->dirty.flags};
-    if (!dirty[VideoCommon::Dirty::Shaders]) {
-        return last_shaders_valid;
-    }
-    dirty[VideoCommon::Dirty::Shaders] = false;
-
-    const GPUVAddr base_addr{maxwell3d->regs.program_region.Address()};
-    for (size_t index = 0; index < Tegra::Engines::Maxwell3D::Regs::MaxShaderProgram; ++index) {
-        if (!maxwell3d->regs.IsShaderConfigEnabled(index)) {
-            unique_hashes[index] = 0;
-            continue;
-        }
-        const auto& shader_config{maxwell3d->regs.pipelines[index]};
-        const auto program{static_cast<Tegra::Engines::Maxwell3D::Regs::ShaderType>(index)};
-        if (program == Tegra::Engines::Maxwell3D::Regs::ShaderType::Pixel &&
-            !maxwell3d->regs.rasterize_enable) {
-            unique_hashes[index] = 0;
-            continue;
-        }
-        const GPUVAddr shader_addr{base_addr + shader_config.offset};
-        const std::optional<VAddr> cpu_shader_addr{gpu_memory->GpuToCpuAddress(shader_addr)};
-        if (!cpu_shader_addr) {
-            LOG_ERROR(HW_GPU, "Invalid GPU address for shader 0x{:016x}", shader_addr);
-            last_shaders_valid = false;
-            return false;
-        }
-        const ShaderInfo* shader_info{TryGet(*cpu_shader_addr)};
-        if (!shader_info) {
-            const u32 start_address{shader_config.offset};
-            GraphicsEnvironment env{*maxwell3d, *gpu_memory, program, base_addr, start_address};
-            shader_info = MakeShaderInfo(env, *cpu_shader_addr);
-        }
-        shader_infos[index] = shader_info;
-        unique_hashes[index] = shader_info->unique_hash;
-    }
-    last_shaders_valid = true;
-    return true;
+    return impl->RefreshStages(unique_hashes);
 }

 const ShaderInfo* ShaderCache::ComputeShader() {
-    const GPUVAddr program_base{kepler_compute->regs.code_loc.Address()};
-    const auto& qmd{kepler_compute->launch_description};
-    const GPUVAddr shader_addr{program_base + qmd.program_start};
-    const std::optional<VAddr> cpu_shader_addr{gpu_memory->GpuToCpuAddress(shader_addr)};
-    if (!cpu_shader_addr) {
-        LOG_ERROR(HW_GPU, "Invalid GPU address for shader 0x{:016x}", shader_addr);
-        return nullptr;
-    }
-    if (const ShaderInfo* const shader = TryGet(*cpu_shader_addr)) {
-        return shader;
-    }
-    ComputeEnvironment env{*kepler_compute, *gpu_memory, program_base, qmd.program_start};
-    return MakeShaderInfo(env, *cpu_shader_addr);
+    return impl->ComputeShader();
 }

 void ShaderCache::GetGraphicsEnvironments(GraphicsEnvironments& result,
                                           const std::array<u64, NUM_PROGRAMS>& unique_hashes) {
-    size_t env_index{};
-    const GPUVAddr base_addr{maxwell3d->regs.program_region.Address()};
-    for (size_t index = 0; index < NUM_PROGRAMS; ++index) {
-        if (unique_hashes[index] == 0) {
-            continue;
-        }
-        const auto program{static_cast<Tegra::Engines::Maxwell3D::Regs::ShaderType>(index)};
-        auto& env{result.envs[index]};
-        const u32 start_address{maxwell3d->regs.pipelines[index].offset};
-        env = GraphicsEnvironment{*maxwell3d, *gpu_memory, program, base_addr, start_address};
-        env.SetCachedSize(shader_infos[index]->size_bytes);
-        result.env_ptrs[env_index++] = &env;
-    }
+    impl->GetGraphicsEnvironments(result, unique_hashes);
 }

 ShaderInfo* ShaderCache::TryGet(VAddr addr) const {
-    std::scoped_lock lock{lookup_mutex};
-
-    const auto it = lookup_cache.find(addr);
-    if (it == lookup_cache.end()) {
-        return nullptr;
-    }
-    return it->second->data;
+    return impl->TryGet(addr);
 }

 void ShaderCache::Register(std::unique_ptr<ShaderInfo> data, VAddr addr, size_t size) {
-    std::scoped_lock lock{invalidation_mutex, lookup_mutex};
-
-    const VAddr addr_end = addr + size;
-    Entry* const entry = NewEntry(addr, addr_end, data.get());
-
-    const u64 page_end = (addr_end + SUYU_PAGESIZE - 1) >> SUYU_PAGEBITS;
-    for (u64 page = addr >> SUYU_PAGEBITS; page < page_end; ++page) {
-        invalidation_cache[page].push_back(entry);
-    }
-
-    storage.push_back(std::move(data));
-
-    device_memory.UpdatePagesCachedCount(addr, size, 1);
+    impl->Register(std::move(data), addr, size);
 }

-void ShaderCache::InvalidatePagesInRegion(VAddr addr, size_t size) {
-    const VAddr addr_end = addr + size;
-    const u64 page_end = (addr_end + SUYU_PAGESIZE - 1) >> SUYU_PAGEBITS;
-    for (u64 page = addr >> SUYU_PAGEBITS; page < page_end; ++page) {
-        auto it = invalidation_cache.find(page);
-        if (it == invalidation_cache.end()) {
-            continue;
-        }
-        InvalidatePageEntries(it->second, addr, addr_end);
-    }
-}
-
-void ShaderCache::RemovePendingShaders() {
-    if (marked_for_removal.empty()) {
-        return;
-    }
-    // Remove duplicates
-    std::ranges::sort(marked_for_removal);
-    marked_for_removal.erase(std::unique(marked_for_removal.begin(), marked_for_removal.end()),
-                             marked_for_removal.end());
-
-    boost::container::small_vector<ShaderInfo*, 16> removed_shaders;
-
-    std::scoped_lock lock{lookup_mutex};
-    for (Entry* const entry : marked_for_removal) {
-        removed_shaders.push_back(entry->data);
-
-        const auto it = lookup_cache.find(entry->addr_start);
-        ASSERT(it != lookup_cache.end());
-        lookup_cache.erase(it);
-    }
-    marked_for_removal.clear();
-
-    if (!removed_shaders.empty()) {
-        RemoveShadersFromStorage(removed_shaders);
-    }
-}
-
-void ShaderCache::InvalidatePageEntries(std::vector<Entry*>& entries, VAddr addr, VAddr addr_end) {
-    size_t index = 0;
-    while (index < entries.size()) {
-        Entry* const entry = entries[index];
-        if (!entry->Overlaps(addr, addr_end)) {
-            ++index;
-            continue;
-        }
-
-        UnmarkMemory(entry);
-        RemoveEntryFromInvalidationCache(entry);
-        marked_for_removal.push_back(entry);
-    }
-}
-
-void ShaderCache::RemoveEntryFromInvalidationCache(const Entry* entry) {
-    const u64 page_end = (entry->addr_end + SUYU_PAGESIZE - 1) >> SUYU_PAGEBITS;
-    for (u64 page = entry->addr_start >> SUYU_PAGEBITS; page < page_end; ++page) {
-        const auto entries_it = invalidation_cache.find(page);
-        ASSERT(entries_it != invalidation_cache.end());
-        std::vector<Entry*>& entries = entries_it->second;
-
-        const auto entry_it = std::ranges::find(entries, entry);
-        ASSERT(entry_it != entries.end());
-        entries.erase(entry_it);
-    }
-}
-
-void ShaderCache::UnmarkMemory(Entry* entry) {
-    if (!entry->is_memory_marked) {
-        return;
-    }
-    entry->is_memory_marked = false;
-
-    const VAddr addr = entry->addr_start;
-    const size_t size = entry->addr_end - addr;
-    device_memory.UpdatePagesCachedCount(addr, size, -1);
-}
-
-void ShaderCache::RemoveShadersFromStorage(std::span<ShaderInfo*> removed_shaders) {
-    // Remove them from the cache
-    std::erase_if(storage, [&removed_shaders](const std::unique_ptr<ShaderInfo>& shader) {
-        return std::ranges::find(removed_shaders, shader.get()) != removed_shaders.end();
-    });
-}
-
-ShaderCache::Entry* ShaderCache::NewEntry(VAddr addr, VAddr addr_end, ShaderInfo* data) {
-    auto entry = std::make_unique<Entry>(Entry{addr, addr_end, data});
-    Entry* const entry_pointer = entry.get();
-
-    lookup_cache.emplace(addr, std::move(entry));
-    return entry_pointer;
-}
-
-const ShaderInfo* ShaderCache::MakeShaderInfo(GenericEnvironment& env, VAddr cpu_addr) {
-    auto info = std::make_unique<ShaderInfo>();
-    if (const std::optional<u64> cached_hash{env.Analyze()}) {
-        info->unique_hash = *cached_hash;
-        info->size_bytes = env.CachedSizeBytes();
-    } else {
-        // Slow path, not really hit on commercial games
-        // Build a control flow graph to get the real shader size
-        Shader::ObjectPool<Shader::Maxwell::Flow::Block> flow_block;
-        Shader::Maxwell::Flow::CFG cfg{env, flow_block, env.StartAddress()};
-        info->unique_hash = env.CalculateHash();
-        info->size_bytes = env.ReadSizeBytes();
-    }
-    const size_t size_bytes{info->size_bytes};
-    const ShaderInfo* const result{info.get()};
-    Register(std::move(info), cpu_addr, size_bytes);
-    return result;
-}

 } // namespace VideoCommon
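The comparison above hides the shader cache's state behind an `Impl` object and turns every public `ShaderCache` method into a one-line delegation. A minimal, self-contained sketch of that pimpl (pointer-to-implementation) shape, using simplified stand-in types rather than the real suyu ones (`VAddr`, `ShaderInfo`, and the method set are pared down here for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

using VAddr = std::uint64_t; // stand-in for the real address type

struct ShaderInfo {
    std::uint64_t unique_hash{};
};

class ShaderCache {
public:
    ShaderCache() : impl{std::make_unique<Impl>()} {}

    // Public methods only forward; all real work happens in Impl.
    void Register(std::unique_ptr<ShaderInfo> data, VAddr addr) {
        impl->Register(std::move(data), addr);
    }
    ShaderInfo* TryGet(VAddr addr) const {
        return impl->TryGet(addr);
    }

private:
    // Private state and logic live behind this boundary. In a real pimpl the
    // struct is only declared in the header and defined in the .cpp, so
    // members (mutexes, workers, caches) can change without touching callers.
    struct Impl {
        void Register(std::unique_ptr<ShaderInfo> data, VAddr addr) {
            lookup_cache.emplace(addr, std::move(data));
        }
        ShaderInfo* TryGet(VAddr addr) const {
            const auto it = lookup_cache.find(addr);
            return it == lookup_cache.end() ? nullptr : it->second.get();
        }
        std::map<VAddr, std::unique_ptr<ShaderInfo>> lookup_cache;
    };
    std::unique_ptr<Impl> impl;
};
```

The payoff mirrored in the diff: once everything routes through `impl`, the implementation side is free to grow fields such as worker threads and extra mutexes while the public interface stays byte-for-byte stable.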