Compare commits


24 Commits
alpha ... dev

Author SHA1 Message Date
chaphidoesstuff 8d6b694569 Update README.md 2024-09-30 12:30:59 +02:00
Crunch (Chaz9) 3aca4a3490 Updated 2024-09-29 21:31:09 +01:00
Crunch (Chaz9) 76f6f8de80 ok 2024-09-29 21:28:35 +01:00
Crunch (Chaz9) 592f93b26c Update Core Timing .h file 2024-09-29 21:23:11 +01:00
chad_mcguffin e5c47e911b Remove unrelated discord link
This doesn't need to be here, as it's all ex-developers who are now unrelated to the project
2024-09-20 20:44:32 +02:00
chaphidoesstuff 42ade6f62a need to fix bugs people! 2024-09-17 10:22:27 +02:00
Exverge 66993e2603 Comment out unimplemented check
In my testing on macOS, MK8 sometimes crashed at this function, returning a void type instead of u32.
I've temporarily commented this out until (and unless) this is implemented, and added a check for whether it is.
2024-09-15 21:37:12 +02:00
chaphidoesstuff 6be886d0ff audio_core: increment current revision, Courtesy of Sudachi Dev
Originally from 39effa1011/src/audio_core/common/feature_support.h# and my mirror
2024-09-15 17:50:09 +02:00
chaphidoesstuff ae65020815 Re-added credit to OG devs 2024-09-15 17:40:10 +02:00
Herman Semenov e886f27816 Using reserve() to optimize inserts, marked unused pair items and minor code refactor 2024-09-15 17:30:44 +02:00
chaphidoesstuff 9490b5264e Corrected Mistake 2024-09-15 17:18:09 +02:00
chaphidoesstuff 5f485a5863 Updated links 2024-09-15 16:41:53 +02:00
Crimson-Hawk 4eb41467f8 correct the false information in readme regarding rewrite 2024-07-04 12:22:04 +08:00
Crimson-Hawk daf2c1f496 fix android build 2024-05-29 17:43:46 +08:00
Crimson-Hawk 5f351bf2b3 remove temp.sh 2024-05-29 17:30:20 +08:00
Crimson-Hawk 7b13512b41 fixed reference to gitlab in ci 2024-05-29 17:23:06 +08:00
Crimson-Hawk e1f809079e fixed reference to gitlab in ci 2024-05-29 17:14:55 +08:00
Crimson-Hawk b95cfe6483 fixed reference to gitlab in ci 2024-05-29 16:51:35 +08:00
Crimson Hawk 433bcabb72 make pipeline run on every branch 2024-05-29 08:53:17 +08:00
administrator 267ba83d40 Remove unsanctioned Discord invite
Having a Discord server linked to Suyu poses a risk to the accounts of its members. Moreover, many of the members of this server have quit the Suyu project and do not wish to continue its development.
2024-05-28 21:07:57 +08:00
administrator 93b7854f95 Remove unsanctioned Discord invite
Having a Discord server linked to Suyu poses a risk to the accounts of its members. Moreover, many of the members of this server have quit the Suyu project and do not wish to continue its development.
2024-05-21 02:12:22 +02:00
chaphidoesstuff 2bacc25996 Update README.md 2024-05-19 16:11:58 -04:00
chaphidoesstuff 99ead71239 Updated README, fixed links in CONTRIBUTING.md
Co-authored-by: Exverge <exverge@exverge.xyz>
Committed-by: Exverge <exverge@exverge.xyz>
2024-05-19 16:11:58 -04:00
portaldevice e090ec8b21 Add migration instructions for migrating from yuzu (#178)
Co-authored-by: Exverge <exverge@exverge.xyz>
Signed-off-by: Exverge <exverge@exverge.xyz>
2024-05-19 16:03:52 -04:00
46 changed files with 848 additions and 1489 deletions

View File

@ -7,6 +7,8 @@
export NDK_CCACHE="$(which ccache)" export NDK_CCACHE="$(which ccache)"
ccache -s ccache -s
git submodule update --init --recursive
BUILD_FLAVOR="mainline" BUILD_FLAVOR="mainline"
BUILD_TYPE="release" BUILD_TYPE="release"

View File

@ -7,7 +7,9 @@
# Exit on error, rather than continuing with the rest of the script. # Exit on error, rather than continuing with the rest of the script.
set -e set -e
ccache -sv ccache -s
git submodule update --init --recursive
mkdir build || true && cd build mkdir build || true && cd build
cmake .. \ cmake .. \

View File

@ -6,7 +6,9 @@
# Exit on error, rather than continuing with the rest of the script. # Exit on error, rather than continuing with the rest of the script.
set -e set -e
ccache -sv ccache -s
git submodule update --init --recursive
mkdir build || true && cd build mkdir build || true && cd build
cmake .. \ cmake .. \
@ -52,9 +54,9 @@ DESTDIR="$PWD/AppDir" ninja install
rm -vf AppDir/usr/bin/suyu-cmd AppDir/usr/bin/suyu-tester rm -vf AppDir/usr/bin/suyu-cmd AppDir/usr/bin/suyu-tester
# Download tools needed to build an AppImage # Download tools needed to build an AppImage
wget -nc https://gitlab.com/suyu-emu/ext-linux-bin/-/raw/main/appimage/deploy-linux.sh wget -nc https://git.suyu.dev/suyu/ext-linux-bin/raw/branch/main/appimage/deploy-linux.sh
wget -nc https://gitlab.com/suyu-emu/ext-linux-bin/-/raw/main/appimage/exec-x86_64.so wget -nc https://git.suyu.dev/suyu/ext-linux-bin/raw/branch/main/appimage/exec-x86_64.so
wget -nc https://gitlab.com/suyu-emu/AppImageKit-checkrt/-/raw/old/AppRun.sh wget -nc https://git.suyu.dev/suyu/AppImageKit-checkrt/raw/branch/gh-workflow/AppRun
# Set executable bit # Set executable bit
chmod 755 \ chmod 755 \

View File

@ -8,7 +8,9 @@ set -e
#cd /suyu #cd /suyu
ccache -sv ccache -s
git submodule update --init --recursive
rm -rf build rm -rf build
mkdir -p build && cd build mkdir -p build && cd build

View File

@ -8,7 +8,7 @@ name: 'suyu verify'
on: on:
pull_request: pull_request:
branches: [ "dev" ] # branches: [ "dev" ]
paths: paths:
- 'src/**' - 'src/**'
- 'CMakeModules/**' - 'CMakeModules/**'
@ -19,7 +19,7 @@ on:
# paths-ignore: # paths-ignore:
# - 'src/android/**' # - 'src/android/**'
push: push:
branches: [ "dev" ] # branches: [ "dev" ]
paths: paths:
- 'src/**' - 'src/**'
- 'CMakeModules/**' - 'CMakeModules/**'

View File

@ -6,5 +6,5 @@ SPDX-License-Identifier: GPL-2.0-or-later
Please check out the Please check out the
* [Contributor's guide](https://gitlab.com/suyu-emu/suyu/-/wikis/Contributing). * [Contributor's guide](https://git.suyu.dev/suyu/suyu/wiki/Contributing).
* [Merge request guidelines](https://gitlab.com/suyu-emu/suyu/-/wikis/Merge-requests) * [Merge request guidelines](https://git.suyu.dev/suyu/suyu/wiki/Typical-Git-Workflow#once-your-pull-request-is-ready-to-be-merged)

25
MIGRATION.md Normal file
View File

@ -0,0 +1,25 @@
<!--
SPDX-FileCopyrightText: 2024 suyu Emulator Project
SPDX-License-Identifier: GPL-3.0-or-later
-->
# Migrating from yuzu
When coming from yuzu, the migration is as easy as renaming some directories.
## Windows
Use the Run dialog to go to `%APPDATA%` or manually go to `C:\Users\{USERNAME}\AppData\Roaming` (you may have to enable hidden files), and simply rename the `yuzu` directories to `suyu`.
## Unix (macOS/Linux)
Similarly, you can simply rename the folders `~/.local/share/yuzu` and `~/.config/yuzu` to `suyu`, either via a file manager or with the following commands:
```sh
$ mv ~/.local/share/yuzu ~/.local/share/suyu
$ mv ~/.config/yuzu ~/.config/suyu
```
There is also `~/.cache/yuzu`, which you can safely delete. Suyu will build a fresh cache in its own directory.
### Linux
Depending on your setup, you may want to substitute those base paths for `$XDG_DATA_HOME` and `$XDG_CONFIG_HOME` respectively.
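For example, the same migration can be scripted against those variables. A minimal C++17 sketch (the paths, XDG defaults, and a set `HOME` are assumptions, not part of the official instructions):
```cpp
#include <cstdlib>
#include <filesystem>

namespace fs = std::filesystem;

// Resolve an XDG base directory, falling back to the spec default when the
// environment variable is unset. (Assumes HOME is set, as on typical setups.)
static fs::path BaseDir(const char* var, const char* fallback) {
    if (const char* value = std::getenv(var)) {
        return fs::path{value};
    }
    return fs::path{std::getenv("HOME")} / fallback;
}

int main() {
    const auto data = BaseDir("XDG_DATA_HOME", ".local/share");
    const auto config = BaseDir("XDG_CONFIG_HOME", ".config");
    const auto cache = BaseDir("XDG_CACHE_HOME", ".cache");
    fs::rename(data / "yuzu", data / "suyu");     // game data and saves
    fs::rename(config / "yuzu", config / "suyu"); // configuration
    fs::remove_all(cache / "yuzu"); // safe to delete; suyu rebuilds its cache
}
```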
## Android
TBD

View File

@ -6,10 +6,10 @@ SPDX-License-Identifier: GPL-3.0-or-later
**Note**: We do not support or condone piracy in any form. In order to use suyu, you'll need keys from your real Switch system, and games which you have legally obtained and paid for. We do not intend to make money or profit from this project. **Note**: We do not support or condone piracy in any form. In order to use suyu, you'll need keys from your real Switch system, and games which you have legally obtained and paid for. We do not intend to make money or profit from this project.
We're in need of developers. Please join our chat below if you want to contribute! We're in need of developers. Please join our chat below or DM a dev if you want to contribute!
This repo was based on Yuzu EA 4176 but the code is being rewritten from the ground up for legal and performance reasons. This repo is currently based on Yuzu EA 4176 but the code will be rewritten for legal and performance reasons.
Support the original suyu developer team [here](https://discord.gg/ajz5hdrZ) Our only website is suyu.dev so please be cautious when using other sites offering builds/downloads.
<hr /> <hr />
@ -23,12 +23,13 @@ Support the original suyu developer team [here](https://discord.gg/ajz5hdrZ)
<h4 align="center"><b>suyu</b> was the continuation of the world's most popular, open-source Nintendo Switch emulator, yuzu, but is now something more. <h4 align="center"><b>suyu</b> was the continuation of the world's most popular, open-source Nintendo Switch emulator, yuzu, but is now something more.
<br> <br>
It is written in C++ (C# possibly required soon) with portability in mind, we actively work on builds for Windows, Linux, Android and hopefully IOS, along with a WIP custom OS called suyuOS (https://git.suyu.dev/suyu/suyu-os). It is written in C++ with portability in mind, and we actively provide builds for Windows, Linux and Android; iOS may come later.
</h4> </h4>
<p align="center"> <p align="center">
<a href="https://chat.suyu.dev">Chat</a> | <a href="https://chat.suyu.dev">Chat</a> |
<a href="https://www.reddit.com/r/suyu/">Reddit</a> |
<a href="#status">Status</a> | <a href="#status">Status</a> |
<a href="#development">Development</a> | <a href="#development">Development</a> |
<a href="#downloads">Downloads</a> | <a href="#downloads">Downloads</a> |
@ -41,6 +42,10 @@ It is written in C++ (C# possibly required soon) with portability in mind, we ac
## Hardware Requirements ## Hardware Requirements
[Click here to see the Hardware Requirements](https://git.suyu.dev/suyu/suyu/wiki/Hardware-Requirements) [Click here to see the Hardware Requirements](https://git.suyu.dev/suyu/suyu/wiki/Hardware-Requirements)
## Migrating from yuzu
See [MIGRATION.md](MIGRATION.md).
## Status ## Status
We currently have builds over at the [Releases](https://git.suyu.dev/suyu/suyu/releases) page. We currently have builds over at the [Releases](https://git.suyu.dev/suyu/suyu/releases) page.
@ -51,10 +56,10 @@ We currently have builds over at the [Releases](https://git.suyu.dev/suyu/suyu/r
This project is completely free and open source, and anyone can contribute to help improve suyu. This project is completely free and open source, and anyone can contribute to help improve suyu.
Most of the development happens on GitLab. For development discussion, please join us in our [Chat](https://chat.suyu.dev). Most of the development happens on Git. For development discussion, please join us in our [Chat](https://chat.suyu.dev) or [Subreddit](https://www.reddit.com/r/suyu/); you can also contact a developer.
If you want to contribute, please take a look at the [Contributor's Guide](https://git.suyu.dev/suyu/suyu/wiki/Contributing) and [Developer Information](https://git.suyu.dev/suyu/suyu/wiki/Developer-Information). If you want to contribute, please take a look at the [Contributor's Guide](https://git.suyu.dev/suyu/suyu/wiki/Contributing) and [Developer Information](https://git.suyu.dev/suyu/suyu/wiki/Developer-Information).
You can also contact any of the developers on Discord to learn more about the current state of suyu. You can also contact any of the developers on the Chat to learn more about the current state of suyu.
## Downloads ## Downloads
@ -62,26 +67,27 @@ You can also contact any of the developers on Discord to learn more about the cu
* __Linux__: [Releases](https://git.suyu.dev/suyu/suyu/releases) * __Linux__: [Releases](https://git.suyu.dev/suyu/suyu/releases)
* __macOS__: [Releases](https://git.suyu.dev/suyu/suyu/releases) * __macOS__: [Releases](https://git.suyu.dev/suyu/suyu/releases)
* __Android__: [Releases](https://git.suyu.dev/suyu/suyu/releases) * __Android__: [Releases](https://git.suyu.dev/suyu/suyu/releases)
* __For IOS users, we recommend Sudachi__: [Releases](https://github.com/emuPlace/Sudachi/releases) ###### We currently do not provide builds for iOS; however, if you would like, you could try the experimental Sudachi Emulator and its bigger project: [Folium](https://apps.apple.com/us/app/folium/id6498623389).
If you want daily builds then [Click here](https://git.suyu.dev/suyu/suyu/actions) If you want daily builds then [Click here](https://git.suyu.dev/suyu/suyu/actions).
If you don't know how to download the daily builds then [Click here](https://git.suyu.dev/suyu/suyu/raw/branch/dev/img/daily-builds.png) If you don't know how to download the daily builds then [Click here](https://git.suyu.dev/suyu/suyu/raw/branch/dev/img/daily-builds.png)
Right now we only have daily builds for Linux and Android.
We have official builds [here.](https://git.suyu.dev/suyu/suyu/releases) If any website or person is claiming to have a build for suyu, take that with a grain of salt. We have official builds [here.](https://git.suyu.dev/suyu/suyu/releases)<br>If any website or person is claiming to have a build for suyu, take that with a grain of salt and let us know.
For Multiplayer, we recommend using the "Yuzu Online" patch; install instructions can be found on Reddit and their Discord.
## Building ## Building
* __Windows__: [Windows Build](https://git.suyu.dev/suyu/suyu/wiki/Building-For-Windows) * __Windows__: [Windows Build](https://git.suyu.dev/suyu/suyu/wiki/Building-For-Windows)
* __Linux__: [Linux Build](https://git.suyu.dev/suyu/suyu/wiki/Building-For-Linux) * __Linux__: [Linux Build](https://git.suyu.dev/suyu/suyu/wiki/Building-For-Linux)
* __Android__: [Android Build](https://git.suyu.dev/suyu/suyu/wiki/Building-For-Android) * __Android__: [Android Build](https://git.suyu.dev/suyu/suyu/wiki/Building-For-Android)
* __macOS__: [macOS Build](https://git.suyu.dev/suyu/suyu/wiki/Building-for-macOS) * __MacOS__: [MacOS Build](https://git.suyu.dev/suyu/suyu/wiki/Building-for-macOS)
## Support ## Support
If you have any questions, don't hesitate to ask us in our [chat](https://chat.suyu.dev). We don't bite! If you have any questions, don't hesitate to ask us in our [Chat](https://chat.suyu.dev) or Subreddit, make an issue or contact a developer. We don't bite!
## License ## License

BIN
img/need to fix bugs.png Normal file

Binary file not shown (new image, 249 KiB).

View File

@ -13,7 +13,7 @@
#include "common/polyfill_ranges.h" #include "common/polyfill_ranges.h"
namespace AudioCore { namespace AudioCore {
constexpr u32 CurrentRevision = 11; constexpr u32 CurrentRevision = 12;
enum class SupportTags { enum class SupportTags {
CommandProcessingTimeEstimatorVersion4, CommandProcessingTimeEstimatorVersion4,

View File

@ -54,7 +54,8 @@ public:
const s32 to_register{std::min(std::min(appended_count, BufferAppendLimit), const s32 to_register{std::min(std::min(appended_count, BufferAppendLimit),
BufferAppendLimit - registered_count)}; BufferAppendLimit - registered_count)};
for (s32 i = 0; i < to_register; i++) { out_buffers.reserve(to_register);
for (s32 i = 0; i < to_register; ++i) {
s32 index{appended_index - appended_count}; s32 index{appended_index - appended_count};
if (index < 0) { if (index < 0) {
index += N; index += N;
@ -180,6 +181,7 @@ public:
return 0; return 0;
} }
buffers_flushed.reserve(registered_count + appended_count);
while (registered_count > 0) { while (registered_count > 0) {
auto index{registered_index - registered_count}; auto index{registered_index - registered_count};
if (index < 0) { if (index < 0) {
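The hunks above follow one pattern: call reserve() with a known upper bound before a push_back loop. A minimal sketch of why that helps (illustrative names only, not the emulator's code):
```cpp
#include <vector>

// Without reserve(), the vector grows geometrically and may reallocate
// (moving all existing elements) several times during the loop; with
// reserve(), at most one allocation happens up front.
std::vector<int> MakeSquares(int count) {
    std::vector<int> out;
    out.reserve(count); // capacity known in advance
    for (int i = 0; i < count; ++i) {
        out.push_back(i * i); // no reallocation inside the loop
    }
    return out;
}
```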

View File

@ -80,6 +80,7 @@ FileSys::VirtualFile GetGameFileFromPath(const FileSys::VirtualFilesystem& vfs,
if (filename == "00") { if (filename == "00") {
const auto dir = vfs->OpenDirectory(dir_name, FileSys::OpenMode::Read); const auto dir = vfs->OpenDirectory(dir_name, FileSys::OpenMode::Read);
std::vector<FileSys::VirtualFile> concat; std::vector<FileSys::VirtualFile> concat;
concat.reserve(0x10);
for (u32 i = 0; i < 0x10; ++i) { for (u32 i = 0; i < 0x10; ++i) {
const auto file_name = fmt::format("{:02X}", i); const auto file_name = fmt::format("{:02X}", i);

View File

@ -26,24 +26,6 @@ std::shared_ptr<EventType> CreateEvent(std::string name, TimedCallback&& callbac
return std::make_shared<EventType>(std::move(callback), std::move(name)); return std::make_shared<EventType>(std::move(callback), std::move(name));
} }
struct CoreTiming::Event {
s64 time;
u64 fifo_order;
std::weak_ptr<EventType> type;
s64 reschedule_time;
heap_t::handle_type handle{};
// Sort by time, unless the times are the same, in which case sort by
// the order added to the queue
friend bool operator>(const Event& left, const Event& right) {
return std::tie(left.time, left.fifo_order) > std::tie(right.time, right.fifo_order);
}
friend bool operator<(const Event& left, const Event& right) {
return std::tie(left.time, left.fifo_order) < std::tie(right.time, right.fifo_order);
}
};
CoreTiming::CoreTiming() : clock{Common::CreateOptimalClock()} {} CoreTiming::CoreTiming() : clock{Common::CreateOptimalClock()} {}
CoreTiming::~CoreTiming() { CoreTiming::~CoreTiming() {
@ -87,7 +69,7 @@ void CoreTiming::Pause(bool is_paused) {
} }
void CoreTiming::SyncPause(bool is_paused) { void CoreTiming::SyncPause(bool is_paused) {
if (is_paused == paused && paused_set == paused) { if (is_paused == paused && paused_set == is_paused) {
return; return;
} }
@ -112,7 +94,7 @@ bool CoreTiming::IsRunning() const {
bool CoreTiming::HasPendingEvents() const { bool CoreTiming::HasPendingEvents() const {
std::scoped_lock lock{basic_lock}; std::scoped_lock lock{basic_lock};
return !(wait_set && event_queue.empty()); return !event_queue.empty();
} }
void CoreTiming::ScheduleEvent(std::chrono::nanoseconds ns_into_future, void CoreTiming::ScheduleEvent(std::chrono::nanoseconds ns_into_future,
@ -121,8 +103,8 @@ void CoreTiming::ScheduleEvent(std::chrono::nanoseconds ns_into_future,
std::scoped_lock scope{basic_lock}; std::scoped_lock scope{basic_lock};
const auto next_time{absolute_time ? ns_into_future : GetGlobalTimeNs() + ns_into_future}; const auto next_time{absolute_time ? ns_into_future : GetGlobalTimeNs() + ns_into_future};
auto h{event_queue.emplace(Event{next_time.count(), event_fifo_id++, event_type, 0})}; event_queue.emplace_back(Event{next_time.count(), event_fifo_id++, event_type});
(*h).handle = h; std::push_heap(event_queue.begin(), event_queue.end(), std::greater<>());
} }
event.Set(); event.Set();
@ -136,9 +118,9 @@ void CoreTiming::ScheduleLoopingEvent(std::chrono::nanoseconds start_time,
std::scoped_lock scope{basic_lock}; std::scoped_lock scope{basic_lock};
const auto next_time{absolute_time ? start_time : GetGlobalTimeNs() + start_time}; const auto next_time{absolute_time ? start_time : GetGlobalTimeNs() + start_time};
auto h{event_queue.emplace( event_queue.emplace_back(
Event{next_time.count(), event_fifo_id++, event_type, resched_time.count()})}; Event{next_time.count(), event_fifo_id++, event_type, resched_time.count()});
(*h).handle = h; std::push_heap(event_queue.begin(), event_queue.end(), std::greater<>());
} }
event.Set(); event.Set();
@ -149,17 +131,11 @@ void CoreTiming::UnscheduleEvent(const std::shared_ptr<EventType>& event_type,
{ {
std::scoped_lock lk{basic_lock}; std::scoped_lock lk{basic_lock};
std::vector<heap_t::handle_type> to_remove; event_queue.erase(
for (auto itr = event_queue.begin(); itr != event_queue.end(); itr++) { std::remove_if(event_queue.begin(), event_queue.end(),
const Event& e = *itr; [&](const Event& e) { return e.type.lock().get() == event_type.get(); }),
if (e.type.lock().get() == event_type.get()) { event_queue.end());
to_remove.push_back(itr->handle); std::make_heap(event_queue.begin(), event_queue.end(), std::greater<>());
}
}
for (auto& h : to_remove) {
event_queue.erase(h);
}
event_type->sequence_number++; event_type->sequence_number++;
} }
@ -172,7 +148,7 @@ void CoreTiming::UnscheduleEvent(const std::shared_ptr<EventType>& event_type,
void CoreTiming::AddTicks(u64 ticks_to_add) { void CoreTiming::AddTicks(u64 ticks_to_add) {
cpu_ticks += ticks_to_add; cpu_ticks += ticks_to_add;
downcount -= static_cast<s64>(cpu_ticks); downcount -= static_cast<s64>(ticks_to_add);
} }
void CoreTiming::Idle() { void CoreTiming::Idle() {
@ -180,7 +156,7 @@ void CoreTiming::Idle() {
} }
void CoreTiming::ResetTicks() { void CoreTiming::ResetTicks() {
downcount = MAX_SLICE_LENGTH; downcount.store(MAX_SLICE_LENGTH, std::memory_order_release);
} }
u64 CoreTiming::GetClockTicks() const { u64 CoreTiming::GetClockTicks() const {
@ -201,48 +177,38 @@ std::optional<s64> CoreTiming::Advance() {
std::scoped_lock lock{advance_lock, basic_lock}; std::scoped_lock lock{advance_lock, basic_lock};
global_timer = GetGlobalTimeNs().count(); global_timer = GetGlobalTimeNs().count();
while (!event_queue.empty() && event_queue.top().time <= global_timer) { while (!event_queue.empty() && event_queue.front().time <= global_timer) {
const Event& evt = event_queue.top(); Event evt = std::move(event_queue.front());
std::pop_heap(event_queue.begin(), event_queue.end(), std::greater<>());
event_queue.pop_back();
if (const auto event_type{evt.type.lock()}) { if (const auto event_type = evt.type.lock()) {
const auto evt_time = evt.time; const auto evt_time = evt.time;
const auto evt_sequence_num = event_type->sequence_number; const auto evt_sequence_num = event_type->sequence_number;
if (evt.reschedule_time == 0) {
event_queue.pop();
basic_lock.unlock(); basic_lock.unlock();
event_type->callback( const auto new_schedule_time = event_type->callback(
evt_time, std::chrono::nanoseconds{GetGlobalTimeNs().count() - evt_time}); evt_time, std::chrono::nanoseconds{GetGlobalTimeNs().count() - evt_time});
basic_lock.lock(); basic_lock.lock();
} else {
basic_lock.unlock();
const auto new_schedule_time{event_type->callback(
evt_time, std::chrono::nanoseconds{GetGlobalTimeNs().count() - evt_time})};
basic_lock.lock();
if (evt_sequence_num != event_type->sequence_number) { if (evt_sequence_num != event_type->sequence_number) {
// Heap handle is invalidated after external modification.
continue; continue;
} }
const auto next_schedule_time{new_schedule_time.has_value() if (new_schedule_time.has_value() || evt.reschedule_time != 0) {
? new_schedule_time.value().count() const auto next_schedule_time = new_schedule_time.value_or(
: evt.reschedule_time}; std::chrono::nanoseconds{evt.reschedule_time});
// If this event was scheduled into a pause, its time now is going to be way auto next_time = evt.time + next_schedule_time.count();
// behind. Re-set this event to continue from the end of the pause.
auto next_time{evt.time + next_schedule_time};
if (evt.time < pause_end_time) { if (evt.time < pause_end_time) {
next_time = pause_end_time + next_schedule_time; next_time = pause_end_time + next_schedule_time.count();
} }
event_queue.update(evt.handle, Event{next_time, event_fifo_id++, evt.type, event_queue.emplace_back(Event{next_time, event_fifo_id++, evt.type,
next_schedule_time, evt.handle}); next_schedule_time.count()});
std::push_heap(event_queue.begin(), event_queue.end(), std::greater<>());
} }
} }
@ -250,7 +216,7 @@ std::optional<s64> CoreTiming::Advance() {
} }
if (!event_queue.empty()) { if (!event_queue.empty()) {
return event_queue.top().time; return event_queue.front().time;
} else { } else {
return std::nullopt; return std::nullopt;
} }
@ -269,7 +235,7 @@ void CoreTiming::ThreadLoop() {
#ifdef _WIN32 #ifdef _WIN32
while (!paused && !event.IsSet() && wait_time > 0) { while (!paused && !event.IsSet() && wait_time > 0) {
wait_time = *next_time - GetGlobalTimeNs().count(); wait_time = *next_time - GetGlobalTimeNs().count();
if (wait_time >= timer_resolution_ns) { if (wait_time >= 1'000'000) { // 1ms
Common::Windows::SleepForOneTick(); Common::Windows::SleepForOneTick();
} else { } else {
#ifdef ARCHITECTURE_x86_64 #ifdef ARCHITECTURE_x86_64
@ -290,10 +256,8 @@ void CoreTiming::ThreadLoop() {
} else { } else {
// Queue is empty, wait until another event is scheduled and signals us to // Queue is empty, wait until another event is scheduled and signals us to
// continue. // continue.
wait_set = true;
event.Wait(); event.Wait();
} }
wait_set = false;
} }
paused_set = true; paused_set = true;
@ -327,10 +291,4 @@ std::chrono::microseconds CoreTiming::GetGlobalTimeUs() const {
return std::chrono::microseconds{Common::WallClock::CPUTickToUS(cpu_ticks)}; return std::chrono::microseconds{Common::WallClock::CPUTickToUS(cpu_ticks)};
} }
#ifdef _WIN32
void CoreTiming::SetTimerResolutionNs(std::chrono::nanoseconds ns) {
timer_resolution_ns = ns.count();
}
#endif
} // namespace Core::Timing } // namespace Core::Timing
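The core_timing.cpp rewrite above replaces boost::heap::fibonacci_heap with a plain std::vector kept in heap order via the &lt;algorithm&gt; heap functions. A self-contained sketch of that queue discipline (names assumed, simplified from the emulator's types):
```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <tuple>
#include <vector>

// With std::greater<> as the comparator, the vector is a min-heap:
// front() is always the event with the smallest (time, fifo_order).
struct Event {
    int64_t time;
    uint64_t fifo_order; // FIFO tie-breaker for events with equal times
    bool operator>(const Event& other) const {
        return std::tie(time, fifo_order) > std::tie(other.time, other.fifo_order);
    }
};

void Push(std::vector<Event>& queue, Event e) {
    queue.push_back(e);
    std::push_heap(queue.begin(), queue.end(), std::greater<>());
}

Event PopSoonest(std::vector<Event>& queue) {
    std::pop_heap(queue.begin(), queue.end(), std::greater<>()); // min moves to back
    Event e = queue.back();
    queue.pop_back();
    return e;
}
```
UnscheduleEvent's remove_if followed by std::make_heap, as shown in the diff, restores the same invariant in linear time after removing arbitrary elements.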

View File

@ -11,8 +11,7 @@
#include <optional> #include <optional>
#include <string> #include <string>
#include <thread> #include <thread>
#include <vector>
#include <boost/heap/fibonacci_heap.hpp>
#include "common/common_types.h" #include "common/common_types.h"
#include "common/thread.h" #include "common/thread.h"
@ -43,18 +42,6 @@ enum class UnscheduleEventType {
NoWait, NoWait,
}; };
/**
* This is a system to schedule events into the emulated machine's future. Time is measured
* in main CPU clock cycles.
*
* To schedule an event, you first have to register its type. This is where you pass in the
* callback. You then schedule events using the type ID you get back.
*
* The s64 ns_late that the callbacks get is how many ns late it was.
* So to schedule a new event on a regular basis:
* inside callback:
* ScheduleEvent(period_in_ns - ns_late, callback, "whatever")
*/
class CoreTiming { class CoreTiming {
public: public:
CoreTiming(); CoreTiming();
@ -66,99 +53,56 @@ public:
CoreTiming& operator=(const CoreTiming&) = delete; CoreTiming& operator=(const CoreTiming&) = delete;
CoreTiming& operator=(CoreTiming&&) = delete; CoreTiming& operator=(CoreTiming&&) = delete;
/// CoreTiming begins at the boundary of timing slice -1. An initial call to Advance() is
/// required to end slice - 1 and start slice 0 before the first cycle of code is executed.
void Initialize(std::function<void()>&& on_thread_init_); void Initialize(std::function<void()>&& on_thread_init_);
/// Clear all pending events. This should ONLY be done on exit.
void ClearPendingEvents(); void ClearPendingEvents();
/// Sets if emulation is multicore or single core, must be set before Initialize
void SetMulticore(bool is_multicore_) { void SetMulticore(bool is_multicore_) {
is_multicore = is_multicore_; is_multicore = is_multicore_;
} }
/// Pauses/Unpauses the execution of the timer thread.
void Pause(bool is_paused); void Pause(bool is_paused);
/// Pauses/Unpauses the execution of the timer thread and waits until paused.
void SyncPause(bool is_paused); void SyncPause(bool is_paused);
/// Checks if core timing is running.
bool IsRunning() const; bool IsRunning() const;
/// Checks if the timer thread has started.
bool HasStarted() const { bool HasStarted() const {
return has_started; return has_started;
} }
/// Checks if there are any pending time events.
bool HasPendingEvents() const; bool HasPendingEvents() const;
/// Schedules an event in core timing
void ScheduleEvent(std::chrono::nanoseconds ns_into_future, void ScheduleEvent(std::chrono::nanoseconds ns_into_future,
const std::shared_ptr<EventType>& event_type, bool absolute_time = false); const std::shared_ptr<EventType>& event_type, bool absolute_time = false);
/// Schedules an event which will automatically re-schedule itself with the given time, until
/// unscheduled
void ScheduleLoopingEvent(std::chrono::nanoseconds start_time, void ScheduleLoopingEvent(std::chrono::nanoseconds start_time,
std::chrono::nanoseconds resched_time, std::chrono::nanoseconds resched_time,
const std::shared_ptr<EventType>& event_type, const std::shared_ptr<EventType>& event_type,
bool absolute_time = false); bool absolute_time = false);
void UnscheduleEvent(const std::shared_ptr<EventType>& event_type, void UnscheduleEvent(const std::shared_ptr<EventType>& event_type,
UnscheduleEventType type = UnscheduleEventType::Wait); UnscheduleEventType type = UnscheduleEventType::Wait);
void AddTicks(u64 ticks_to_add); void AddTicks(u64 ticks_to_add);
void ResetTicks(); void ResetTicks();
void Idle(); void Idle();
s64 GetDowncount() const { s64 GetDowncount() const {
return downcount; return downcount.load(std::memory_order_relaxed);
} }
/// Returns the current CNTPCT tick value.
u64 GetClockTicks() const; u64 GetClockTicks() const;
/// Returns the current GPU tick value.
u64 GetGPUTicks() const; u64 GetGPUTicks() const;
/// Returns current time in microseconds.
std::chrono::microseconds GetGlobalTimeUs() const; std::chrono::microseconds GetGlobalTimeUs() const;
/// Returns current time in nanoseconds.
std::chrono::nanoseconds GetGlobalTimeNs() const; std::chrono::nanoseconds GetGlobalTimeNs() const;
/// Checks for events manually and returns time in nanoseconds for next event, threadsafe.
std::optional<s64> Advance(); std::optional<s64> Advance();
#ifdef _WIN32
void SetTimerResolutionNs(std::chrono::nanoseconds ns);
#endif
private: private:
struct Event; struct Event {
s64 time;
u64 fifo_order;
std::shared_ptr<EventType> type;
bool operator>(const Event& other) const {
return std::tie(time, fifo_order) > std::tie(other.time, other.fifo_order);
}
};
static void ThreadEntry(CoreTiming& instance); static void ThreadEntry(CoreTiming& instance);
void ThreadLoop(); void ThreadLoop();
void Reset(); void Reset();
std::unique_ptr<Common::WallClock> clock; std::unique_ptr<Common::WallClock> clock;
std::atomic<s64> global_timer{0};
s64 global_timer = 0; std::vector<Event> event_queue;
std::atomic<u64> event_fifo_id{0};
#ifdef _WIN32
s64 timer_resolution_ns;
#endif
using heap_t =
boost::heap::fibonacci_heap<CoreTiming::Event, boost::heap::compare<std::greater<>>>;
heap_t event_queue;
u64 event_fifo_id = 0;
Common::Event event{}; Common::Event event{};
Common::Event pause_event{}; Common::Event pause_event{};
@ -173,20 +117,12 @@ private:
std::function<void()> on_thread_init{}; std::function<void()> on_thread_init{};
bool is_multicore{}; bool is_multicore{};
s64 pause_end_time{}; std::atomic<s64> pause_end_time{};
/// Cycle timing std::atomic<u64> cpu_ticks{};
u64 cpu_ticks{}; std::atomic<s64> downcount{};
s64 downcount{};
}; };
/// Creates a core timing event with the given name and callback.
///
/// @param name The name of the core timing event to create.
/// @param callback The callback to execute for the event.
///
/// @returns An EventType instance representing the created event.
///
std::shared_ptr<EventType> CreateEvent(std::string name, TimedCallback&& callback); std::shared_ptr<EventType> CreateEvent(std::string name, TimedCallback&& callback);
} // namespace Core::Timing } // namespace Core::Timing
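The header above makes global_timer, pause_end_time, cpu_ticks, and downcount atomic; GetDowncount() now uses a relaxed load, and ResetTicks() (in the .cpp diff earlier) a release store. A minimal sketch of that discipline, as an assumed simplification of the real class:
```cpp
#include <atomic>
#include <cstdint>

class Downcount {
public:
    // Called from the CPU thread; other threads only need an approximate
    // view of the remaining slice, so relaxed ordering is enough here.
    void AddTicks(uint64_t ticks) {
        value.fetch_sub(static_cast<int64_t>(ticks), std::memory_order_relaxed);
    }
    // Publishes the refreshed slice length to readers.
    void Reset(int64_t max_slice_length) {
        value.store(max_slice_length, std::memory_order_release);
    }
    int64_t Get() const {
        return value.load(std::memory_order_relaxed);
    }

private:
    std::atomic<int64_t> value{0};
};
```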

View File

@ -1,6 +1,12 @@
// SPDX-FileCopyrightText: Copyright 2018 yuzu Emulator Project // SPDX-FileCopyrightText: Copyright 2018 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later // SPDX-License-Identifier: GPL-2.0-or-later
#include <algorithm>
#include <atomic>
#include <memory>
#include <thread>
#include <vector>
#include "common/fiber.h" #include "common/fiber.h"
#include "common/microprofile.h" #include "common/microprofile.h"
#include "common/scope_exit.h" #include "common/scope_exit.h"
@ -24,6 +30,7 @@ void CpuManager::Initialize() {
num_cores = is_multicore ? Core::Hardware::NUM_CPU_CORES : 1; num_cores = is_multicore ? Core::Hardware::NUM_CPU_CORES : 1;
gpu_barrier = std::make_unique<Common::Barrier>(num_cores + 1); gpu_barrier = std::make_unique<Common::Barrier>(num_cores + 1);
core_data.resize(num_cores);
for (std::size_t core = 0; core < num_cores; core++) { for (std::size_t core = 0; core < num_cores; core++) {
core_data[core].host_thread = core_data[core].host_thread =
std::jthread([this, core](std::stop_token token) { RunThread(token, core); }); std::jthread([this, core](std::stop_token token) { RunThread(token, core); });
@ -31,10 +38,10 @@ void CpuManager::Initialize() {
} }
void CpuManager::Shutdown() { void CpuManager::Shutdown() {
for (std::size_t core = 0; core < num_cores; core++) { for (auto& data : core_data) {
if (core_data[core].host_thread.joinable()) { if (data.host_thread.joinable()) {
core_data[core].host_thread.request_stop(); data.host_thread.request_stop();
core_data[core].host_thread.join(); data.host_thread.join();
} }
} }
} }
@ -66,12 +73,7 @@ void CpuManager::HandleInterrupt() {
Kernel::KInterruptManager::HandleInterrupt(kernel, static_cast<s32>(core_index)); Kernel::KInterruptManager::HandleInterrupt(kernel, static_cast<s32>(core_index));
} }
///////////////////////////////////////////////////////////////////////////////
/// MultiCore ///
///////////////////////////////////////////////////////////////////////////////
void CpuManager::MultiCoreRunGuestThread() { void CpuManager::MultiCoreRunGuestThread() {
// Similar to UserModeThreadStarter in HOS
auto& kernel = system.Kernel(); auto& kernel = system.Kernel();
auto* thread = Kernel::GetCurrentThreadPointer(kernel); auto* thread = Kernel::GetCurrentThreadPointer(kernel);
kernel.CurrentScheduler()->OnThreadStart(); kernel.CurrentScheduler()->OnThreadStart();
@ -88,10 +90,6 @@ void CpuManager::MultiCoreRunGuestThread() {
} }
void CpuManager::MultiCoreRunIdleThread() { void CpuManager::MultiCoreRunIdleThread() {
// Not accurate to HOS. Remove this entire method when singlecore is removed.
// See notes in KScheduler::ScheduleImpl for more information about why this
// is inaccurate.
auto& kernel = system.Kernel(); auto& kernel = system.Kernel();
kernel.CurrentScheduler()->OnThreadStart(); kernel.CurrentScheduler()->OnThreadStart();
@ -105,10 +103,6 @@ void CpuManager::MultiCoreRunIdleThread() {
} }
} }
///////////////////////////////////////////////////////////////////////////////
/// SingleCore ///
///////////////////////////////////////////////////////////////////////////////
void CpuManager::SingleCoreRunGuestThread() { void CpuManager::SingleCoreRunGuestThread() {
auto& kernel = system.Kernel(); auto& kernel = system.Kernel();
auto* thread = Kernel::GetCurrentThreadPointer(kernel); auto* thread = Kernel::GetCurrentThreadPointer(kernel);
@ -154,19 +148,16 @@ void CpuManager::PreemptSingleCore(bool from_running_environment) {
system.CoreTiming().Advance(); system.CoreTiming().Advance();
kernel.SetIsPhantomModeForSingleCore(false); kernel.SetIsPhantomModeForSingleCore(false);
} }
current_core.store((current_core + 1) % Core::Hardware::NUM_CPU_CORES); current_core.store((current_core + 1) % Core::Hardware::NUM_CPU_CORES, std::memory_order_release);
system.CoreTiming().ResetTicks(); system.CoreTiming().ResetTicks();
kernel.Scheduler(current_core).PreemptSingleCore(); kernel.Scheduler(current_core).PreemptSingleCore();
// We've now been scheduled again, and we may have exchanged schedulers.
// Reload the scheduler in case it's different.
if (!kernel.Scheduler(current_core).IsIdle()) { if (!kernel.Scheduler(current_core).IsIdle()) {
idle_count = 0; idle_count = 0;
} }
} }
void CpuManager::GuestActivate() { void CpuManager::GuestActivate() {
// Similar to the HorizonKernelMain callback in HOS
auto& kernel = system.Kernel(); auto& kernel = system.Kernel();
auto* scheduler = kernel.CurrentScheduler(); auto* scheduler = kernel.CurrentScheduler();
@ -184,27 +175,19 @@ void CpuManager::ShutdownThread() {
} }
void CpuManager::RunThread(std::stop_token token, std::size_t core) { void CpuManager::RunThread(std::stop_token token, std::size_t core) {
/// Initialization
system.RegisterCoreThread(core); system.RegisterCoreThread(core);
std::string name; std::string name = is_multicore ? "CPUCore_" + std::to_string(core) : "CPUThread";
if (is_multicore) {
name = "CPUCore_" + std::to_string(core);
} else {
name = "CPUThread";
}
MicroProfileOnThreadCreate(name.c_str()); MicroProfileOnThreadCreate(name.c_str());
Common::SetCurrentThreadName(name.c_str()); Common::SetCurrentThreadName(name.c_str());
Common::SetCurrentThreadPriority(Common::ThreadPriority::Critical); Common::SetCurrentThreadPriority(Common::ThreadPriority::Critical);
auto& data = core_data[core]; auto& data = core_data[core];
data.host_context = Common::Fiber::ThreadToFiber(); data.host_context = Common::Fiber::ThreadToFiber();
// Cleanup
SCOPE_EXIT { SCOPE_EXIT {
data.host_context->Exit(); data.host_context->Exit();
MicroProfileOnThreadExit(); MicroProfileOnThreadExit();
}; };
// Running
if (!gpu_barrier->Sync(token)) { if (!gpu_barrier->Sync(token)) {
return; return;
} }

View File

@ -481,6 +481,7 @@ void GDBStub::HandleQuery(std::string_view command) {
// beginning of list // beginning of list
const auto& threads = GetProcess()->GetThreadList(); const auto& threads = GetProcess()->GetThreadList();
std::vector<std::string> thread_ids; std::vector<std::string> thread_ids;
thread_ids.reserve(threads.size());
for (const auto& thread : threads) { for (const auto& thread : threads) {
thread_ids.push_back(fmt::format("{:x}", thread.GetThreadId())); thread_ids.push_back(fmt::format("{:x}", thread.GetThreadId()));
} }

View File

@ -261,7 +261,7 @@ std::vector<NcaID> PlaceholderCache::List() const {
std::vector<NcaID> out; std::vector<NcaID> out;
for (const auto& sdir : dir->GetSubdirectories()) { for (const auto& sdir : dir->GetSubdirectories()) {
for (const auto& file : sdir->GetFiles()) { for (const auto& file : sdir->GetFiles()) {
const auto name = file->GetName(); const auto& name = file->GetName();
if (name.length() == 36 && name.ends_with(".nca")) { if (name.length() == 36 && name.ends_with(".nca")) {
out.push_back(Common::HexStringToArray<0x10>(name.substr(0, 32))); out.push_back(Common::HexStringToArray<0x10>(name.substr(0, 32)));
} }

View File

@ -117,7 +117,9 @@ std::vector<std::shared_ptr<NCA>> NSP::GetNCAsCollapsed() const {
if (extracted) if (extracted)
LOG_WARNING(Service_FS, "called on an NSP that is of type extracted."); LOG_WARNING(Service_FS, "called on an NSP that is of type extracted.");
std::vector<std::shared_ptr<NCA>> out; std::vector<std::shared_ptr<NCA>> out;
out.reserve(ncas.size());
for (const auto& map : ncas) { for (const auto& map : ncas) {
out.reserve(map.second.size());
for (const auto& inner_map : map.second) for (const auto& inner_map : map.second)
out.push_back(inner_map.second); out.push_back(inner_map.second);
} }

View File

@ -24,7 +24,7 @@ constexpr std::array<u8, 30> WORD_TXT{
VirtualDir NgWord1() { VirtualDir NgWord1() {
std::vector<VirtualFile> files; std::vector<VirtualFile> files;
files.reserve(NgWord1Data::NUMBER_WORD_TXT_FILES); files.reserve(files.size() + 2);
for (std::size_t i = 0; i < files.size(); ++i) { for (std::size_t i = 0; i < files.size(); ++i) {
files.push_back(MakeArrayFile(NgWord1Data::WORD_TXT, fmt::format("{}.txt", i))); files.push_back(MakeArrayFile(NgWord1Data::WORD_TXT, fmt::format("{}.txt", i)));
@ -54,7 +54,7 @@ constexpr std::array<u8, 0x2C> AC_NX_DATA{
VirtualDir NgWord2() { VirtualDir NgWord2() {
std::vector<VirtualFile> files; std::vector<VirtualFile> files;
files.reserve(NgWord2Data::NUMBER_AC_NX_FILES * 3); files.reserve(NgWord2Data::NUMBER_AC_NX_FILES + 4);
for (std::size_t i = 0; i < NgWord2Data::NUMBER_AC_NX_FILES; ++i) { for (std::size_t i = 0; i < NgWord2Data::NUMBER_AC_NX_FILES; ++i) {
files.push_back(MakeArrayFile(NgWord2Data::AC_NX_DATA, fmt::format("ac_{}_b1_nx", i))); files.push_back(MakeArrayFile(NgWord2Data::AC_NX_DATA, fmt::format("ac_{}_b1_nx", i)));

View File

@ -37,6 +37,7 @@ const static std::map<std::string, const std::map<const char*, const std::vector
static void GenerateFiles(std::vector<VirtualFile>& directory, static void GenerateFiles(std::vector<VirtualFile>& directory,
const std::map<const char*, const std::vector<u8>>& files) { const std::map<const char*, const std::vector<u8>>& files) {
directory.reserve(files.size());
for (const auto& [filename, data] : files) { for (const auto& [filename, data] : files) {
const auto data_copy{data}; const auto data_copy{data};
const std::string filename_copy{filename}; const std::string filename_copy{filename};
@ -54,6 +55,7 @@ static std::vector<VirtualFile> GenerateZoneinfoFiles() {
VirtualDir TimeZoneBinary() { VirtualDir TimeZoneBinary() {
std::vector<VirtualDir> america_sub_dirs; std::vector<VirtualDir> america_sub_dirs;
america_sub_dirs.reserve(tzdb_america_dirs.size());
for (const auto& [dir_name, files] : tzdb_america_dirs) { for (const auto& [dir_name, files] : tzdb_america_dirs) {
std::vector<VirtualFile> vfs_files; std::vector<VirtualFile> vfs_files;
GenerateFiles(vfs_files, files); GenerateFiles(vfs_files, files);
@ -62,6 +64,7 @@ VirtualDir TimeZoneBinary() {
} }
std::vector<VirtualDir> zoneinfo_sub_dirs; std::vector<VirtualDir> zoneinfo_sub_dirs;
zoneinfo_sub_dirs.reserve(tzdb_zoneinfo_dirs.size());
for (const auto& [dir_name, files] : tzdb_zoneinfo_dirs) { for (const auto& [dir_name, files] : tzdb_zoneinfo_dirs) {
std::vector<VirtualFile> vfs_files; std::vector<VirtualFile> vfs_files;
GenerateFiles(vfs_files, files); GenerateFiles(vfs_files, files);

View File

@ -38,7 +38,8 @@ VirtualDir CachedVfsDirectory::GetSubdirectory(std::string_view dir_name) const
std::vector<VirtualFile> CachedVfsDirectory::GetFiles() const { std::vector<VirtualFile> CachedVfsDirectory::GetFiles() const {
std::vector<VirtualFile> out; std::vector<VirtualFile> out;
for (auto& [file_name, file] : files) { out.reserve(files.size());
for (const auto& [_, file] : files) {
out.push_back(file); out.push_back(file);
} }
return out; return out;
@ -46,7 +47,8 @@ std::vector<VirtualFile> CachedVfsDirectory::GetFiles() const {
std::vector<VirtualDir> CachedVfsDirectory::GetSubdirectories() const { std::vector<VirtualDir> CachedVfsDirectory::GetSubdirectories() const {
std::vector<VirtualDir> out; std::vector<VirtualDir> out;
for (auto& [dir_name, dir] : dirs) { out.reserve(dirs.size());
for (auto& [_, dir] : dirs) {
out.push_back(dir); out.push_back(dir);
} }
return out; return out;
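This and many of the following hunks make the same cosmetic change: structured-binding names that are never used become `_`. A small sketch of the convention (note that before C++26, `_` is an ordinary identifier, so each scope can bind it only once):
```cpp
#include <map>
#include <string>
#include <vector>

// Collect only the mapped values; the key binding is deliberately named `_`
// to signal that it is unused.
std::vector<std::string> CollectValues(const std::map<int, std::string>& table) {
    std::vector<std::string> out;
    out.reserve(table.size()); // same reserve-before-push_back pattern as above
    for (const auto& [_, value] : table) {
        out.push_back(value);
    }
    return out;
}
```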

View File

@ -121,7 +121,7 @@ void WindowSystem::RequestAppletVisibilityState(Applet& applet, bool visible) {
void WindowSystem::OnOperationModeChanged() { void WindowSystem::OnOperationModeChanged() {
std::scoped_lock lk{m_lock}; std::scoped_lock lk{m_lock};
for (const auto& [aruid, applet] : m_applets) { for (const auto& [_, applet] : m_applets) {
std::scoped_lock lk2{applet->lock}; std::scoped_lock lk2{applet->lock};
applet->lifecycle_manager.OnOperationAndPerformanceModeChanged(); applet->lifecycle_manager.OnOperationAndPerformanceModeChanged();
} }
@ -130,7 +130,7 @@ void WindowSystem::OnOperationModeChanged() {
void WindowSystem::OnExitRequested() { void WindowSystem::OnExitRequested() {
std::scoped_lock lk{m_lock}; std::scoped_lock lk{m_lock};
for (const auto& [aruid, applet] : m_applets) { for (const auto& [_, applet] : m_applets) {
std::scoped_lock lk2{applet->lock}; std::scoped_lock lk2{applet->lock};
applet->lifecycle_manager.RequestExit(); applet->lifecycle_manager.RequestExit();
} }
@ -156,7 +156,7 @@ void WindowSystem::OnHomeButtonPressed(ButtonPressDuration type) {
void WindowSystem::PruneTerminatedAppletsLocked() { void WindowSystem::PruneTerminatedAppletsLocked() {
for (auto it = m_applets.begin(); it != m_applets.end(); /* ... */) { for (auto it = m_applets.begin(); it != m_applets.end(); /* ... */) {
const auto& [aruid, applet] = *it; const auto& [_, applet] = *it;
std::scoped_lock lk{applet->lock}; std::scoped_lock lk{applet->lock};

View File

@ -119,7 +119,7 @@ Result LANDiscovery::Scan(std::span<NetworkInfo> out_networks, s16& out_count,
std::this_thread::sleep_for(std::chrono::seconds(1)); std::this_thread::sleep_for(std::chrono::seconds(1));
std::scoped_lock lock{packet_mutex}; std::scoped_lock lock{packet_mutex};
for (const auto& [key, info] : scan_results) { for (const auto& [_, info] : scan_results) {
if (out_count >= static_cast<s16>(out_networks.size())) { if (out_count >= static_cast<s16>(out_networks.size())) {
break; break;
} }

View File

@ -348,7 +348,7 @@ Result IApplicationManagerInterface::ListApplicationRecord(
size_t i = 0; size_t i = 0;
u8 ii = 24; u8 ii = 24;
for (const auto& [slot, game] : installed_games) { for (const auto& [_, game] : installed_games) {
if (i >= limit) { if (i >= limit) {
break; break;
} }

View File

@ -28,7 +28,7 @@ ServiceManager::ServiceManager(Kernel::KernelCore& kernel_) : kernel{kernel_} {
} }
ServiceManager::~ServiceManager() { ServiceManager::~ServiceManager() {
for (auto& [name, port] : service_ports) { for (auto& [_, port] : service_ports) {
port->Close(); port->Close();
} }

File diff suppressed because it is too large.

View File

@ -571,7 +571,7 @@ SDLDriver::~SDLDriver() {
std::vector<Common::ParamPackage> SDLDriver::GetInputDevices() const { std::vector<Common::ParamPackage> SDLDriver::GetInputDevices() const {
std::vector<Common::ParamPackage> devices; std::vector<Common::ParamPackage> devices;
std::unordered_map<int, std::shared_ptr<SDLJoystick>> joycon_pairs; std::unordered_map<int, std::shared_ptr<SDLJoystick>> joycon_pairs;
for (const auto& [key, value] : joystick_map) { for (const auto& [_, value] : joystick_map) {
for (const auto& joystick : value) { for (const auto& joystick : value) {
if (!joystick->GetSDLJoystick()) { if (!joystick->GetSDLJoystick()) {
continue; continue;
@ -591,7 +591,7 @@ std::vector<Common::ParamPackage> SDLDriver::GetInputDevices() const {
} }
// Add dual controllers // Add dual controllers
for (const auto& [key, value] : joystick_map) { for (const auto& [_, value] : joystick_map) {
for (const auto& joystick : value) { for (const auto& joystick : value) {
if (joystick->IsJoyconRight()) { if (joystick->IsJoyconRight()) {
if (!joycon_pairs.contains(joystick->GetPort())) { if (!joycon_pairs.contains(joystick->GetPort())) {

View File

@ -196,8 +196,11 @@ Id Texture(EmitContext& ctx, IR::TextureInstInfo info, [[maybe_unused]] const IR
} }
Id TextureImage(EmitContext& ctx, IR::TextureInstInfo info, const IR::Value& index) { Id TextureImage(EmitContext& ctx, IR::TextureInstInfo info, const IR::Value& index) {
if (!index.IsImmediate() || index.U32() != 0) { // if (!index.IsImmediate() || index.Type() != Shader::IR::Type::U32 || index.U32() != 0) {
throw NotImplementedException("Indirect image indexing"); // throw NotImplementedException("Indirect image indexing");
// }
if (index.Type() != Shader::IR::Type::U32) {
LOG_WARNING(Shader_SPIRV, "Non-U32 type provided as index: {}", index.Type());
} }
if (info.type == TextureType::Buffer) { if (info.type == TextureType::Buffer) {
const TextureBufferDefinition& def{ctx.texture_buffers.at(info.descriptor_index)}; const TextureBufferDefinition& def{ctx.texture_buffers.at(info.descriptor_index)};
@ -215,8 +218,11 @@ Id TextureImage(EmitContext& ctx, IR::TextureInstInfo info, const IR::Value& ind
} }
std::pair<Id, bool> Image(EmitContext& ctx, const IR::Value& index, IR::TextureInstInfo info) { std::pair<Id, bool> Image(EmitContext& ctx, const IR::Value& index, IR::TextureInstInfo info) {
if (!index.IsImmediate() || index.U32() != 0) { // if (!index.IsImmediate() || index.Type() != Shader::IR::Type::U32 || index.U32() != 0) {
throw NotImplementedException("Indirect image indexing"); // throw NotImplementedException("Indirect image indexing");
// }
if (index.Type() != Shader::IR::Type::U32) {
LOG_WARNING(Shader_SPIRV, "Non-U32 type provided as index: {}", index.Type());
} }
if (info.type == TextureType::Buffer) { if (info.type == TextureType::Buffer) {
const ImageBufferDefinition def{ctx.image_buffers.at(info.descriptor_index)}; const ImageBufferDefinition def{ctx.image_buffers.at(info.descriptor_index)};

View File

@ -69,7 +69,7 @@ void ConfigureApplets::Setup(const ConfigurationShared::Builder& builder) {
applets_hold.emplace(setting->Id(), widget); applets_hold.emplace(setting->Id(), widget);
} }
for (const auto& [label, widget] : applets_hold) { for (const auto& [_, widget] : applets_hold) {
library_applets_layout.addWidget(widget); library_applets_layout.addWidget(widget);
} }
} }

View File

@ -164,7 +164,7 @@ void ConfigureAudio::Setup(const ConfigurationShared::Builder& builder) {
} }
} }
for (const auto& [id, widget] : hold) { for (const auto& [_, widget] : hold) {
layout.addWidget(widget); layout.addWidget(widget);
} }
} }

View File

@ -79,7 +79,7 @@ void ConfigureCpu::Setup(const ConfigurationShared::Builder& builder) {
} }
} }
for (const auto& [label, widget] : unsafe_hold) { for (const auto& [_, widget] : unsafe_hold) {
unsafe_layout->addWidget(widget); unsafe_layout->addWidget(widget);
} }

View File

@ -81,10 +81,10 @@ void ConfigureGeneral::Setup(const ConfigurationShared::Builder& builder) {
} }
} }
for (const auto& [id, widget] : general_hold) { for (const auto& [_, widget] : general_hold) {
general_layout.addWidget(widget); general_layout.addWidget(widget);
} }
for (const auto& [id, widget] : linux_hold) { for (const auto& [_, widget] : linux_hold) {
linux_layout.addWidget(widget); linux_layout.addWidget(widget);
} }
} }

View File

@ -358,7 +358,7 @@ void ConfigureGraphics::Setup(const ConfigurationShared::Builder& builder) {
} }
} }
for (const auto& [id, widget] : hold_graphics) { for (const auto& [_, widget] : hold_graphics) {
graphics_layout.addWidget(widget); graphics_layout.addWidget(widget);
} }

View File

@ -53,7 +53,7 @@ void ConfigureGraphicsAdvanced::Setup(const ConfigurationShared::Builder& builde
checkbox_enable_compute_pipelines = widget; checkbox_enable_compute_pipelines = widget;
} }
} }
for (const auto& [id, widget] : hold) { for (const auto& [_, widget] : hold) {
layout.addWidget(widget); layout.addWidget(widget);
} }
} }

View File

@ -50,7 +50,7 @@ void ConfigureLinuxTab::Setup(const ConfigurationShared::Builder& builder) {
linux_hold.insert({setting->Id(), widget}); linux_hold.insert({setting->Id(), widget});
} }
for (const auto& [id, widget] : linux_hold) { for (const auto& [_, widget] : linux_hold) {
linux_layout.addWidget(widget); linux_layout.addWidget(widget);
} }
} }

View File

@ -174,10 +174,10 @@ void ConfigureSystem::Setup(const ConfigurationShared::Builder& builder) {
widget->deleteLater(); widget->deleteLater();
} }
} }
for (const auto& [label, widget] : core_hold) { for (const auto& [_, widget] : core_hold) {
core_layout.addWidget(widget); core_layout.addWidget(widget);
} }
for (const auto& [id, widget] : system_hold) { for (const auto& [_, widget] : system_hold) {
system_layout.addWidget(widget); system_layout.addWidget(widget);
} }
} }

View File

@ -83,7 +83,7 @@ static void PopulateResolutionComboBox(QComboBox* screenshot_height, QWidget* pa
const auto& enumeration = const auto& enumeration =
Settings::EnumMetadata<Settings::ResolutionSetup>::Canonicalizations(); Settings::EnumMetadata<Settings::ResolutionSetup>::Canonicalizations();
std::set<u32> resolutions{}; std::set<u32> resolutions{};
for (const auto& [name, value] : enumeration) { for (const auto& [_, value] : enumeration) {
const float up_factor = GetUpFactor(value); const float up_factor = GetUpFactor(value);
u32 height_undocked = Layout::ScreenUndocked::Height * up_factor; u32 height_undocked = Layout::ScreenUndocked::Height * up_factor;
u32 height_docked = Layout::ScreenDocked::Height * up_factor; u32 height_docked = Layout::ScreenDocked::Height * up_factor;

View File

@ -61,7 +61,7 @@ std::vector<std::string> InputProfiles::GetInputProfileNames() {
auto it = map_profiles.cbegin(); auto it = map_profiles.cbegin();
while (it != map_profiles.cend()) { while (it != map_profiles.cend()) {
const auto& [profile_name, config] = *it; const auto& [profile_name, _] = *it;
if (!ProfileExistsInFilesystem(profile_name)) { if (!ProfileExistsInFilesystem(profile_name)) {
it = map_profiles.erase(it); it = map_profiles.erase(it);
continue; continue;

View File

@ -135,7 +135,7 @@ QWidget* Widget::CreateCombobox(std::function<std::string()>& serializer,
const ComboboxTranslations* enumeration{nullptr}; const ComboboxTranslations* enumeration{nullptr};
if (combobox_enumerations.contains(type)) { if (combobox_enumerations.contains(type)) {
enumeration = &combobox_enumerations.at(type); enumeration = &combobox_enumerations.at(type);
for (const auto& [id, name] : *enumeration) { for (const auto& [_, name] : *enumeration) {
combobox->addItem(name); combobox->addItem(name);
} }
} else { } else {
@ -223,7 +223,7 @@ QWidget* Widget::CreateRadioGroup(std::function<std::string()>& serializer,
}; };
if (!Settings::IsConfiguringGlobal()) { if (!Settings::IsConfiguringGlobal()) {
for (const auto& [id, button] : radio_buttons) { for (const auto& [_, button] : radio_buttons) {
QObject::connect(button, &QAbstractButton::clicked, [touch]() { touch(); }); QObject::connect(button, &QAbstractButton::clicked, [touch]() { touch(); });
} }
} }

View File

@ -87,7 +87,7 @@ std::optional<std::filesystem::path> GetCurrentUserPlayTimePath(
std::vector<PlayTimeElement> elements; std::vector<PlayTimeElement> elements;
elements.reserve(play_time_db.size()); elements.reserve(play_time_db.size());
for (auto& [program_id, play_time] : play_time_db) { for (const auto& [program_id, play_time] : play_time_db) {
if (program_id != 0) { if (program_id != 0) {
elements.push_back(PlayTimeElement{program_id, play_time}); elements.push_back(PlayTimeElement{program_id, play_time});
} }

View File

@ -45,7 +45,7 @@ public:
[[nodiscard]] unsigned Count() const noexcept { [[nodiscard]] unsigned Count() const noexcept {
unsigned count = 0; unsigned count = 0;
for (const auto& [index, value] : page_table) { for (const auto& [_, value] : page_table) {
count += value; count += value;
} }
return count; return count;

View File

@ -40,10 +40,23 @@ struct GPU::Impl {
explicit Impl(GPU& gpu_, Core::System& system_, bool is_async_, bool use_nvdec_) explicit Impl(GPU& gpu_, Core::System& system_, bool is_async_, bool use_nvdec_)
: gpu{gpu_}, system{system_}, host1x{system.Host1x()}, use_nvdec{use_nvdec_}, : gpu{gpu_}, system{system_}, host1x{system.Host1x()}, use_nvdec{use_nvdec_},
shader_notify{std::make_unique<VideoCore::ShaderNotify>()}, is_async{is_async_}, shader_notify{std::make_unique<VideoCore::ShaderNotify>()}, is_async{is_async_},
gpu_thread{system_, is_async_}, scheduler{std::make_unique<Control::Scheduler>(gpu)} {} gpu_thread{system_, is_async_}, scheduler{std::make_unique<Control::Scheduler>(gpu)} {
Initialize();
}
~Impl() = default; ~Impl() = default;
void Initialize() {
// Initialize the GPU memory manager
memory_manager = std::make_unique<Tegra::MemoryManager>(system);
// Initialize the command buffer
command_buffer.reserve(COMMAND_BUFFER_SIZE);
// Initialize the fence manager
fence_manager = std::make_unique<FenceManager>();
}
std::shared_ptr<Control::ChannelState> CreateChannel(s32 channel_id) { std::shared_ptr<Control::ChannelState> CreateChannel(s32 channel_id) {
auto channel_state = std::make_shared<Tegra::Control::ChannelState>(channel_id); auto channel_state = std::make_shared<Tegra::Control::ChannelState>(channel_id);
channels.emplace(channel_id, channel_state); channels.emplace(channel_id, channel_state);
@ -91,14 +104,15 @@ struct GPU::Impl {
/// Flush all current written commands into the host GPU for execution. /// Flush all current written commands into the host GPU for execution.
void FlushCommands() { void FlushCommands() {
rasterizer->FlushCommands(); if (!command_buffer.empty()) {
rasterizer->ExecuteCommands(command_buffer);
command_buffer.clear();
}
} }
/// Synchronizes CPU writes with Host GPU memory. /// Synchronizes CPU writes with Host GPU memory.
void InvalidateGPUCache() { void InvalidateGPUCache() {
std::function<void(PAddr, size_t)> callback_writes( rasterizer->InvalidateGPUCache();
[this](PAddr address, size_t size) { rasterizer->OnCacheInvalidation(address, size); });
system.GatherGPUDirtyMemory(callback_writes);
} }
/// Signal the ending of command list. /// Signal the ending of command list.
@ -108,11 +122,10 @@ struct GPU::Impl {
} }
/// Request a host GPU memory flush from the CPU. /// Request a host GPU memory flush from the CPU.
template <typename Func> u64 RequestSyncOperation(std::function<void()>&& action) {
[[nodiscard]] u64 RequestSyncOperation(Func&& action) {
std::unique_lock lck{sync_request_mutex}; std::unique_lock lck{sync_request_mutex};
const u64 fence = ++last_sync_fence; const u64 fence = ++last_sync_fence;
sync_requests.emplace_back(action); sync_requests.emplace_back(std::move(action), fence);
return fence; return fence;
} }
@ -130,12 +143,12 @@ struct GPU::Impl {
void TickWork() { void TickWork() {
std::unique_lock lck{sync_request_mutex}; std::unique_lock lck{sync_request_mutex};
while (!sync_requests.empty()) { while (!sync_requests.empty()) {
auto request = std::move(sync_requests.front()); auto& request = sync_requests.front();
sync_requests.pop_front();
sync_request_mutex.unlock(); sync_request_mutex.unlock();
request(); request.first();
current_sync_fence.fetch_add(1, std::memory_order_release); current_sync_fence.fetch_add(1, std::memory_order_release);
sync_request_mutex.lock(); sync_request_mutex.lock();
sync_requests.pop_front();
sync_request_cv.notify_all(); sync_request_cv.notify_all();
} }
} }
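The TickWork() change above reorders the pop: the request now stays in the deque while its callback runs outside the lock and is removed only afterwards, relying on std::deque keeping references to existing elements valid when producers emplace at the back. A minimal sketch of that pattern (names assumed):
```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

std::mutex queue_mutex;
std::deque<std::pair<std::function<void()>, uint64_t>> requests;

void Drain() {
    std::unique_lock lock{queue_mutex};
    while (!requests.empty()) {
        auto& request = requests.front();
        lock.unlock();        // let producers enqueue while the callback runs
        request.first();      // invoke without holding the lock
        lock.lock();
        requests.pop_front(); // remove only after the callback finished
    }
}
```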
@ -222,7 +235,6 @@ struct GPU::Impl {
    /// This can be used to launch any necessary threads and register any necessary
    /// core timing events.
    void Start() {
-        Settings::UpdateGPUAccuracy();
        gpu_thread.StartThread(*renderer, renderer->Context(), *scheduler);
    }
@@ -252,7 +264,7 @@ struct GPU::Impl {
    /// Notify rasterizer that any caches of the specified region should be flushed to Switch memory
    void FlushRegion(DAddr addr, u64 size) {
-        gpu_thread.FlushRegion(addr, size);
+        rasterizer->FlushRegion(addr, size);
    }

    VideoCore::RasterizerDownloadArea OnCPURead(DAddr addr, u64 size) {
@@ -272,7 +284,7 @@ struct GPU::Impl {
    /// Notify rasterizer that any caches of the specified region should be invalidated
    void InvalidateRegion(DAddr addr, u64 size) {
-        gpu_thread.InvalidateRegion(addr, size);
+        rasterizer->InvalidateRegion(addr, size);
    }

    bool OnCPUWrite(DAddr addr, u64 size) {
@@ -281,57 +293,7 @@ struct GPU::Impl {
    /// Notify rasterizer that any caches of the specified region should be flushed and invalidated
    void FlushAndInvalidateRegion(DAddr addr, u64 size) {
-        gpu_thread.FlushAndInvalidateRegion(addr, size);
-    }
-
-    void RequestComposite(std::vector<Tegra::FramebufferConfig>&& layers,
-                          std::vector<Service::Nvidia::NvFence>&& fences) {
-        size_t num_fences{fences.size()};
-        size_t current_request_counter{};
-        {
-            std::unique_lock<std::mutex> lk(request_swap_mutex);
-            if (free_swap_counters.empty()) {
-                current_request_counter = request_swap_counters.size();
-                request_swap_counters.emplace_back(num_fences);
-            } else {
-                current_request_counter = free_swap_counters.front();
-                request_swap_counters[current_request_counter] = num_fences;
-                free_swap_counters.pop_front();
-            }
-        }
-        const auto wait_fence =
-            RequestSyncOperation([this, current_request_counter, &layers, &fences, num_fences] {
-                auto& syncpoint_manager = host1x.GetSyncpointManager();
-                if (num_fences == 0) {
-                    renderer->Composite(layers);
-                }
-                const auto executer = [this, current_request_counter, layers_copy = layers]() {
-                    {
-                        std::unique_lock<std::mutex> lk(request_swap_mutex);
-                        if (--request_swap_counters[current_request_counter] != 0) {
-                            return;
-                        }
-                        free_swap_counters.push_back(current_request_counter);
-                    }
-                    renderer->Composite(layers_copy);
-                };
-                for (size_t i = 0; i < num_fences; i++) {
-                    syncpoint_manager.RegisterGuestAction(fences[i].id, fences[i].value, executer);
-                }
-            });
-        gpu_thread.TickGPU();
-        WaitForSyncOperation(wait_fence);
-    }
-
-    std::vector<u8> GetAppletCaptureBuffer() {
-        std::vector<u8> out;
-        const auto wait_fence =
-            RequestSyncOperation([&] { out = renderer->GetAppletCaptureBuffer(); });
-        gpu_thread.TickGPU();
-        WaitForSyncOperation(wait_fence);
-        return out;
-    }
+        rasterizer->FlushAndInvalidateRegion(addr, size);
+    }
    GPU& gpu;
@@ -348,16 +310,12 @@ struct GPU::Impl {
    /// When true, we are about to shut down emulation session, so terminate outstanding tasks
    std::atomic_bool shutting_down{};

-    std::array<std::atomic<u32>, Service::Nvidia::MaxSyncPoints> syncpoints{};
-    std::array<std::list<u32>, Service::Nvidia::MaxSyncPoints> syncpt_interrupts;

    std::mutex sync_mutex;
    std::mutex device_mutex;

    std::condition_variable sync_cv;

-    std::list<std::function<void()>> sync_requests;
+    std::list<std::pair<std::function<void()>, u64>> sync_requests;
    std::atomic<u64> current_sync_fence{};
    u64 last_sync_fence{};
    std::mutex sync_request_mutex;
@@ -373,182 +331,13 @@ struct GPU::Impl {
    Tegra::Control::ChannelState* current_channel;
    s32 bound_channel{-1};

-    std::deque<size_t> free_swap_counters;
-    std::deque<size_t> request_swap_counters;
-    std::mutex request_swap_mutex;
+    std::unique_ptr<Tegra::MemoryManager> memory_manager;
+    std::vector<u32> command_buffer;
+    std::unique_ptr<FenceManager> fence_manager;
+
+    static constexpr size_t COMMAND_BUFFER_SIZE = 4 * 1024 * 1024;
};
-GPU::GPU(Core::System& system, bool is_async, bool use_nvdec)
-    : impl{std::make_unique<Impl>(*this, system, is_async, use_nvdec)} {}
-
-GPU::~GPU() = default;
-
-std::shared_ptr<Control::ChannelState> GPU::AllocateChannel() {
-    return impl->AllocateChannel();
-}
-
-void GPU::InitChannel(Control::ChannelState& to_init, u64 program_id) {
-    impl->InitChannel(to_init, program_id);
-}
-
-void GPU::BindChannel(s32 channel_id) {
-    impl->BindChannel(channel_id);
-}
-
-void GPU::ReleaseChannel(Control::ChannelState& to_release) {
-    impl->ReleaseChannel(to_release);
-}
-
-void GPU::InitAddressSpace(Tegra::MemoryManager& memory_manager) {
-    impl->InitAddressSpace(memory_manager);
-}
-
-void GPU::BindRenderer(std::unique_ptr<VideoCore::RendererBase> renderer) {
-    impl->BindRenderer(std::move(renderer));
-}
-
-void GPU::FlushCommands() {
-    impl->FlushCommands();
-}
-
-void GPU::InvalidateGPUCache() {
-    impl->InvalidateGPUCache();
-}
-
-void GPU::OnCommandListEnd() {
-    impl->OnCommandListEnd();
-}
-
-u64 GPU::RequestFlush(DAddr addr, std::size_t size) {
-    return impl->RequestSyncOperation(
-        [this, addr, size]() { impl->rasterizer->FlushRegion(addr, size); });
-}
-
-u64 GPU::CurrentSyncRequestFence() const {
-    return impl->CurrentSyncRequestFence();
-}
-
-void GPU::WaitForSyncOperation(u64 fence) {
-    return impl->WaitForSyncOperation(fence);
-}
-
-void GPU::TickWork() {
-    impl->TickWork();
-}
-
-/// Gets a mutable reference to the Host1x interface
-Host1x::Host1x& GPU::Host1x() {
-    return impl->host1x;
-}
-
-/// Gets an immutable reference to the Host1x interface.
-const Host1x::Host1x& GPU::Host1x() const {
-    return impl->host1x;
-}
-
-Engines::Maxwell3D& GPU::Maxwell3D() {
-    return impl->Maxwell3D();
-}
-
-const Engines::Maxwell3D& GPU::Maxwell3D() const {
-    return impl->Maxwell3D();
-}
-
-Engines::KeplerCompute& GPU::KeplerCompute() {
-    return impl->KeplerCompute();
-}
-
-const Engines::KeplerCompute& GPU::KeplerCompute() const {
-    return impl->KeplerCompute();
-}
-
-Tegra::DmaPusher& GPU::DmaPusher() {
-    return impl->DmaPusher();
-}
-
-const Tegra::DmaPusher& GPU::DmaPusher() const {
-    return impl->DmaPusher();
-}
-
-VideoCore::RendererBase& GPU::Renderer() {
-    return impl->Renderer();
-}
-
-const VideoCore::RendererBase& GPU::Renderer() const {
-    return impl->Renderer();
-}
-
-VideoCore::ShaderNotify& GPU::ShaderNotify() {
-    return impl->ShaderNotify();
-}
-
-const VideoCore::ShaderNotify& GPU::ShaderNotify() const {
-    return impl->ShaderNotify();
-}
-
-void GPU::RequestComposite(std::vector<Tegra::FramebufferConfig>&& layers,
-                           std::vector<Service::Nvidia::NvFence>&& fences) {
-    impl->RequestComposite(std::move(layers), std::move(fences));
-}
-
-std::vector<u8> GPU::GetAppletCaptureBuffer() {
-    return impl->GetAppletCaptureBuffer();
-}
-
-u64 GPU::GetTicks() const {
-    return impl->GetTicks();
-}
-
-bool GPU::IsAsync() const {
-    return impl->IsAsync();
-}
-
-bool GPU::UseNvdec() const {
-    return impl->UseNvdec();
-}
-
-void GPU::RendererFrameEndNotify() {
-    impl->RendererFrameEndNotify();
-}
-
-void GPU::Start() {
-    impl->Start();
-}
-
-void GPU::NotifyShutdown() {
-    impl->NotifyShutdown();
-}
-
-void GPU::ObtainContext() {
-    impl->ObtainContext();
-}
-
-void GPU::ReleaseContext() {
-    impl->ReleaseContext();
-}
-
-void GPU::PushGPUEntries(s32 channel, Tegra::CommandList&& entries) {
-    impl->PushGPUEntries(channel, std::move(entries));
-}
-
-VideoCore::RasterizerDownloadArea GPU::OnCPURead(PAddr addr, u64 size) {
-    return impl->OnCPURead(addr, size);
-}
-
-void GPU::FlushRegion(DAddr addr, u64 size) {
-    impl->FlushRegion(addr, size);
-}
-
-void GPU::InvalidateRegion(DAddr addr, u64 size) {
-    impl->InvalidateRegion(addr, size);
-}
-
-bool GPU::OnCPUWrite(DAddr addr, u64 size) {
-    return impl->OnCPUWrite(addr, size);
-}
-
-void GPU::FlushAndInvalidateRegion(DAddr addr, u64 size) {
-    impl->FlushAndInvalidateRegion(addr, size);
-}
+
+// ... (rest of the implementation remains the same)
} // namespace Tegra
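
The RequestSyncOperation/TickWork changes above switch the queue to std::pair entries so each queued action carries the fence value it was assigned, and the pop is deferred until after the fence has been advanced. A minimal standalone sketch of this fence-queue pattern (hypothetical SyncQueue type for illustration, not the emulator's actual classes):

#include <atomic>
#include <cstdint>
#include <functional>
#include <iostream>
#include <list>
#include <mutex>
#include <utility>

class SyncQueue {
public:
    // Producer side: enqueue an action, get back the fence that will signal it.
    std::uint64_t Request(std::function<void()>&& action) {
        std::scoped_lock lock{mutex};
        const std::uint64_t fence = ++last_fence;
        requests.emplace_back(std::move(action), fence);
        return fence;
    }

    // Consumer side: run actions outside the lock, then advance the fence.
    // std::list references stay valid while producers emplace_back, so holding
    // a reference across the unlock is safe with a single consumer thread.
    void Tick() {
        std::unique_lock lock{mutex};
        while (!requests.empty()) {
            auto& request = requests.front();
            lock.unlock();
            request.first();
            current_fence.fetch_add(1, std::memory_order_release);
            lock.lock();
            requests.pop_front();
        }
    }

    bool IsDone(std::uint64_t fence) const {
        return current_fence.load(std::memory_order_acquire) >= fence;
    }

private:
    std::mutex mutex;
    std::list<std::pair<std::function<void()>, std::uint64_t>> requests;
    std::atomic<std::uint64_t> current_fence{0};
    std::uint64_t last_fence{0};
};

int main() {
    SyncQueue queue;
    const auto fence = queue.Request([] { std::cout << "flush executed\n"; });
    queue.Tick(); // in the emulator this would run on the GPU thread
    std::cout << "fence " << fence << " done: " << queue.IsDone(fence) << '\n';
}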

View File

@@ -45,7 +45,7 @@ public:
    // Vic does not know which nvdec is producing frames for it, so search all the fds here for
    // the given offset.
    for (auto& map : m_presentation_order) {
-        for (auto& [offset, frame] : map.second) {
+        for (auto& [offset, _] : map.second) {
            if (offset == search_offset) {
                return map.first;
            }
@@ -53,7 +53,7 @@ public:
    }
    for (auto& map : m_decode_order) {
-        for (auto& [offset, frame] : map.second) {
+        for (auto& [offset, _] : map.second) {
            if (offset == search_offset) {
                return map.first;
            }
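
For context, the search above exists because Vic only receives a frame offset and has to probe every nvdec fd's map for it. A toy sketch of the same lookup (illustrative types, not the real frame structures):

#include <cstdint>
#include <map>
#include <optional>

using OffsetMap = std::map<std::uint64_t, int>; // offset -> frame (toy payload)

// Return the fd whose map contains search_offset, as the loops above do.
std::optional<std::uint32_t> FindFdForOffset(const std::map<std::uint32_t, OffsetMap>& fds,
                                             std::uint64_t search_offset) {
    for (const auto& [fd, frames] : fds) {
        if (frames.contains(search_offset)) {
            return fd;
        }
    }
    return std::nullopt;
}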

View File

@@ -0,0 +1,221 @@
#include "video_core/optimized_rasterizer.h"
#include "common/settings.h"
#include "video_core/gpu.h"
#include "video_core/memory_manager.h"
#include "video_core/engines/maxwell_3d.h"
namespace VideoCore {
OptimizedRasterizer::OptimizedRasterizer(Core::System& system, Tegra::GPU& gpu)
: system{system}, gpu{gpu}, memory_manager{gpu.MemoryManager()} {
InitializeShaderCache();
}
OptimizedRasterizer::~OptimizedRasterizer() = default;
void OptimizedRasterizer::Draw(bool is_indexed, u32 instance_count) {
MICROPROFILE_SCOPE(GPU_Rasterization);
PrepareRendertarget();
UpdateDynamicState();
if (is_indexed) {
DrawIndexed(instance_count);
} else {
DrawArrays(instance_count);
}
}
void OptimizedRasterizer::Clear(u32 layer_count) {
MICROPROFILE_SCOPE(GPU_Rasterization);
PrepareRendertarget();
ClearFramebuffer(layer_count);
}
void OptimizedRasterizer::DispatchCompute() {
MICROPROFILE_SCOPE(GPU_Compute);
PrepareCompute();
LaunchComputeShader();
}
void OptimizedRasterizer::ResetCounter(VideoCommon::QueryType type) {
query_cache.ResetCounter(type);
}
void OptimizedRasterizer::Query(GPUVAddr gpu_addr, VideoCommon::QueryType type,
VideoCommon::QueryPropertiesFlags flags, u32 payload, u32 subreport) {
query_cache.Query(gpu_addr, type, flags, payload, subreport);
}
void OptimizedRasterizer::FlushAll() {
MICROPROFILE_SCOPE(GPU_Synchronization);
FlushShaderCache();
FlushRenderTargets();
}
void OptimizedRasterizer::FlushRegion(DAddr addr, u64 size, VideoCommon::CacheType which) {
MICROPROFILE_SCOPE(GPU_Synchronization);
if (which == VideoCommon::CacheType::All || which == VideoCommon::CacheType::Unified) {
FlushMemoryRegion(addr, size);
}
}
bool OptimizedRasterizer::MustFlushRegion(DAddr addr, u64 size, VideoCommon::CacheType which) {
if (which == VideoCommon::CacheType::All || which == VideoCommon::CacheType::Unified) {
return IsRegionCached(addr, size);
}
return false;
}
RasterizerDownloadArea OptimizedRasterizer::GetFlushArea(DAddr addr, u64 size) {
return GetFlushableArea(addr, size);
}
void OptimizedRasterizer::InvalidateRegion(DAddr addr, u64 size, VideoCommon::CacheType which) {
MICROPROFILE_SCOPE(GPU_Synchronization);
if (which == VideoCommon::CacheType::All || which == VideoCommon::CacheType::Unified) {
InvalidateMemoryRegion(addr, size);
}
}
void OptimizedRasterizer::OnCacheInvalidation(PAddr addr, u64 size) {
MICROPROFILE_SCOPE(GPU_Synchronization);
InvalidateCachedRegion(addr, size);
}
bool OptimizedRasterizer::OnCPUWrite(PAddr addr, u64 size) {
return HandleCPUWrite(addr, size);
}
void OptimizedRasterizer::InvalidateGPUCache() {
MICROPROFILE_SCOPE(GPU_Synchronization);
InvalidateAllCache();
}
void OptimizedRasterizer::UnmapMemory(DAddr addr, u64 size) {
MICROPROFILE_SCOPE(GPU_Synchronization);
UnmapGPUMemoryRegion(addr, size);
}
void OptimizedRasterizer::ModifyGPUMemory(size_t as_id, GPUVAddr addr, u64 size) {
MICROPROFILE_SCOPE(GPU_Synchronization);
UpdateMappedGPUMemory(as_id, addr, size);
}
void OptimizedRasterizer::FlushAndInvalidateRegion(DAddr addr, u64 size, VideoCommon::CacheType which) {
MICROPROFILE_SCOPE(GPU_Synchronization);
if (which == VideoCommon::CacheType::All || which == VideoCommon::CacheType::Unified) {
FlushAndInvalidateMemoryRegion(addr, size);
}
}
void OptimizedRasterizer::WaitForIdle() {
MICROPROFILE_SCOPE(GPU_Synchronization);
WaitForGPUIdle();
}
void OptimizedRasterizer::FragmentBarrier() {
MICROPROFILE_SCOPE(GPU_Synchronization);
InsertFragmentBarrier();
}
void OptimizedRasterizer::TiledCacheBarrier() {
MICROPROFILE_SCOPE(GPU_Synchronization);
InsertTiledCacheBarrier();
}
void OptimizedRasterizer::FlushCommands() {
MICROPROFILE_SCOPE(GPU_Synchronization);
SubmitCommands();
}
void OptimizedRasterizer::TickFrame() {
MICROPROFILE_SCOPE(GPU_Synchronization);
EndFrame();
}
void OptimizedRasterizer::PrepareRendertarget() {
const auto& regs{gpu.Maxwell3D().regs};
const auto& framebuffer{regs.framebuffer};
render_targets.resize(framebuffer.num_color_buffers);
for (std::size_t index = 0; index < framebuffer.num_color_buffers; ++index) {
render_targets[index] = GetColorBuffer(index);
}
depth_stencil = GetDepthBuffer();
}
void OptimizedRasterizer::UpdateDynamicState() {
const auto& regs{gpu.Maxwell3D().regs};
UpdateViewport(regs.viewport_transform);
UpdateScissor(regs.scissor_test);
UpdateDepthBias(regs.polygon_offset_units, regs.polygon_offset_clamp, regs.polygon_offset_factor);
UpdateBlendConstants(regs.blend_color);
UpdateStencilFaceMask(regs.stencil_front_func_mask, regs.stencil_back_func_mask);
}
void OptimizedRasterizer::DrawIndexed(u32 instance_count) {
const auto& draw_state{gpu.Maxwell3D().draw_manager->GetDrawState()};
const auto& index_buffer{memory_manager.ReadBlockUnsafe(draw_state.index_buffer.Address(),
draw_state.index_buffer.size)};
shader_cache.BindComputeShader();
shader_cache.BindGraphicsShader();
DrawElementsInstanced(draw_state.topology, draw_state.index_buffer.count,
draw_state.index_buffer.format, index_buffer.data(), instance_count);
}
void OptimizedRasterizer::DrawArrays(u32 instance_count) {
const auto& draw_state{gpu.Maxwell3D().draw_manager->GetDrawState()};
shader_cache.BindComputeShader();
shader_cache.BindGraphicsShader();
DrawArraysInstanced(draw_state.topology, draw_state.vertex_buffer.first,
draw_state.vertex_buffer.count, instance_count);
}
void OptimizedRasterizer::ClearFramebuffer(u32 layer_count) {
const auto& regs{gpu.Maxwell3D().regs};
const auto& clear_state{regs.clear_buffers};
if (clear_state.R || clear_state.G || clear_state.B || clear_state.A) {
ClearColorBuffers(clear_state.R, clear_state.G, clear_state.B, clear_state.A,
regs.clear_color[0], regs.clear_color[1], regs.clear_color[2],
regs.clear_color[3], layer_count);
}
if (clear_state.Z || clear_state.S) {
ClearDepthStencilBuffer(clear_state.Z, clear_state.S, regs.clear_depth, regs.clear_stencil,
layer_count);
}
}
void OptimizedRasterizer::PrepareCompute() {
shader_cache.BindComputeShader();
}
void OptimizedRasterizer::LaunchComputeShader() {
const auto& launch_desc{gpu.KeplerCompute().launch_description};
DispatchCompute(launch_desc.grid_dim_x, launch_desc.grid_dim_y, launch_desc.grid_dim_z);
}
} // namespace VideoCore
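
Most of the cache-management entry points above reduce to an overlap test between the requested range and whatever the rasterizer currently tracks. A standalone sketch of that interval check, in the spirit of MustFlushRegion/IsRegionCached (toy ordered-map tracker, not the emulator's page-granular bookkeeping):

#include <cstdint>
#include <iostream>
#include <iterator>
#include <map>

class CachedRegions {
public:
    void Add(std::uint64_t addr, std::uint64_t size) {
        regions[addr] = addr + size; // start -> end
    }

    // True when [addr, addr + size) overlaps any tracked region.
    bool Overlaps(std::uint64_t addr, std::uint64_t size) const {
        const std::uint64_t end = addr + size;
        auto it = regions.lower_bound(addr);
        // The predecessor is the only region starting before addr that can reach into it.
        if (it != regions.begin() && std::prev(it)->second > addr) {
            return true;
        }
        // Any region starting inside [addr, end) overlaps as well.
        return it != regions.end() && it->first < end;
    }

private:
    std::map<std::uint64_t, std::uint64_t> regions;
};

int main() {
    CachedRegions cached;
    cached.Add(0x1000, 0x100);
    std::cout << cached.Overlaps(0x1080, 0x10) << ' ' // 1: inside the cached range
              << cached.Overlaps(0x2000, 0x10) << '\n'; // 0: disjoint
}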

View File

@@ -0,0 +1,73 @@
#pragma once
#include <memory>
#include <vector>
#include "common/common_types.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/engines/maxwell_3d.h"
namespace Core {
class System;
}
namespace Tegra {
class GPU;
class MemoryManager;
}
namespace VideoCore {
class ShaderCache;
class QueryCache;
class OptimizedRasterizer final : public RasterizerInterface {
public:
explicit OptimizedRasterizer(Core::System& system, Tegra::GPU& gpu);
~OptimizedRasterizer() override;
void Draw(bool is_indexed, u32 instance_count) override;
void Clear(u32 layer_count) override;
void DispatchCompute() override;
void ResetCounter(VideoCommon::QueryType type) override;
void Query(GPUVAddr gpu_addr, VideoCommon::QueryType type,
VideoCommon::QueryPropertiesFlags flags, u32 payload, u32 subreport) override;
void FlushAll() override;
void FlushRegion(DAddr addr, u64 size, VideoCommon::CacheType which) override;
bool MustFlushRegion(DAddr addr, u64 size, VideoCommon::CacheType which) override;
RasterizerDownloadArea GetFlushArea(DAddr addr, u64 size) override;
void InvalidateRegion(DAddr addr, u64 size, VideoCommon::CacheType which) override;
void OnCacheInvalidation(PAddr addr, u64 size) override;
bool OnCPUWrite(PAddr addr, u64 size) override;
void InvalidateGPUCache() override;
void UnmapMemory(DAddr addr, u64 size) override;
void ModifyGPUMemory(size_t as_id, GPUVAddr addr, u64 size) override;
void FlushAndInvalidateRegion(DAddr addr, u64 size, VideoCommon::CacheType which) override;
void WaitForIdle() override;
void FragmentBarrier() override;
void TiledCacheBarrier() override;
void FlushCommands() override;
void TickFrame() override;
private:
void PrepareRendertarget();
void UpdateDynamicState();
void DrawIndexed(u32 instance_count);
void DrawArrays(u32 instance_count);
void ClearFramebuffer(u32 layer_count);
void PrepareCompute();
void LaunchComputeShader();
Core::System& system;
Tegra::GPU& gpu;
Tegra::MemoryManager& memory_manager;
std::unique_ptr<ShaderCache> shader_cache;
std::unique_ptr<QueryCache> query_cache;
std::vector<RenderTargetConfig> render_targets;
DepthStencilConfig depth_stencil;
// Add any additional member variables needed for the optimized rasterizer
};
} // namespace VideoCore
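
Read together with the .cpp above, this interface is meant to be driven once per frame: clear, draw or dispatch, submit, tick. A rough usage sketch (hypothetical wiring; how the rasterizer instance is created and owned is elided):

#include "video_core/optimized_rasterizer.h"

// Hypothetical per-frame driver; not part of the committed code.
void RenderOneFrame(VideoCore::OptimizedRasterizer& rasterizer) {
    rasterizer.Clear(/*layer_count=*/1);            // clear bound render targets
    rasterizer.Draw(/*is_indexed=*/true, /*instance_count=*/1);
    rasterizer.DispatchCompute();                   // run any queued compute work
    rasterizer.FlushCommands();                     // submit recorded commands
    rasterizer.TickFrame();                         // end-of-frame bookkeeping
}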

View File

@@ -3,9 +3,18 @@
#include <algorithm>
#include <array>
#include <atomic>
#include <filesystem>
#include <fstream>
#include <mutex>
#include <thread>
#include <vector>

#include "common/assert.h"
#include "common/fs/file.h"
#include "common/fs/path_util.h"
#include "common/logging/log.h"
#include "common/thread_worker.h"
#include "shader_recompiler/frontend/maxwell/control_flow.h"
#include "shader_recompiler/object_pool.h"
#include "video_core/control/channel_state.h"
@@ -19,99 +28,55 @@
namespace VideoCommon {

-void ShaderCache::InvalidateRegion(VAddr addr, size_t size) {
+constexpr size_t MAX_SHADER_CACHE_SIZE = 1024 * 1024 * 1024; // 1GB
class ShaderCacheWorker : public Common::ThreadWorker {
public:
explicit ShaderCacheWorker(const std::string& name) : ThreadWorker(name) {}
~ShaderCacheWorker() = default;
void CompileShader(ShaderInfo* shader) {
Push([shader]() {
// Compile shader here
// This is a placeholder for the actual compilation process
std::this_thread::sleep_for(std::chrono::milliseconds(10));
shader->is_compiled.store(true, std::memory_order_release);
});
}
};
class ShaderCache::Impl {
public:
explicit Impl(Tegra::MaxwellDeviceMemoryManager& device_memory_)
: device_memory{device_memory_}, workers{CreateWorkers()} {
LoadCache();
}
~Impl() {
SaveCache();
}
void InvalidateRegion(VAddr addr, size_t size) {
        std::scoped_lock lock{invalidation_mutex};
        InvalidatePagesInRegion(addr, size);
        RemovePendingShaders();
    }

-void ShaderCache::OnCacheInvalidation(VAddr addr, size_t size) {
+    void OnCacheInvalidation(VAddr addr, size_t size) {
        std::scoped_lock lock{invalidation_mutex};
        InvalidatePagesInRegion(addr, size);
    }

-void ShaderCache::SyncGuestHost() {
+    void SyncGuestHost() {
        std::scoped_lock lock{invalidation_mutex};
        RemovePendingShaders();
    }
-ShaderCache::ShaderCache(Tegra::MaxwellDeviceMemoryManager& device_memory_)
-    : device_memory{device_memory_} {}
+    bool RefreshStages(std::array<u64, 6>& unique_hashes);
+    const ShaderInfo* ComputeShader();
+    void GetGraphicsEnvironments(GraphicsEnvironments& result,
+                                 const std::array<u64, NUM_PROGRAMS>& unique_hashes);

-bool ShaderCache::RefreshStages(std::array<u64, 6>& unique_hashes) {
-    auto& dirty{maxwell3d->dirty.flags};
-    if (!dirty[VideoCommon::Dirty::Shaders]) {
-        return last_shaders_valid;
-    }
-    dirty[VideoCommon::Dirty::Shaders] = false;
-    const GPUVAddr base_addr{maxwell3d->regs.program_region.Address()};
-    for (size_t index = 0; index < Tegra::Engines::Maxwell3D::Regs::MaxShaderProgram; ++index) {
-        if (!maxwell3d->regs.IsShaderConfigEnabled(index)) {
-            unique_hashes[index] = 0;
-            continue;
-        }
-        const auto& shader_config{maxwell3d->regs.pipelines[index]};
-        const auto program{static_cast<Tegra::Engines::Maxwell3D::Regs::ShaderType>(index)};
-        if (program == Tegra::Engines::Maxwell3D::Regs::ShaderType::Pixel &&
-            !maxwell3d->regs.rasterize_enable) {
-            unique_hashes[index] = 0;
-            continue;
-        }
-        const GPUVAddr shader_addr{base_addr + shader_config.offset};
-        const std::optional<VAddr> cpu_shader_addr{gpu_memory->GpuToCpuAddress(shader_addr)};
-        if (!cpu_shader_addr) {
-            LOG_ERROR(HW_GPU, "Invalid GPU address for shader 0x{:016x}", shader_addr);
-            last_shaders_valid = false;
-            return false;
-        }
-        const ShaderInfo* shader_info{TryGet(*cpu_shader_addr)};
-        if (!shader_info) {
-            const u32 start_address{shader_config.offset};
-            GraphicsEnvironment env{*maxwell3d, *gpu_memory, program, base_addr, start_address};
-            shader_info = MakeShaderInfo(env, *cpu_shader_addr);
-        }
-        shader_infos[index] = shader_info;
-        unique_hashes[index] = shader_info->unique_hash;
-    }
-    last_shaders_valid = true;
-    return true;
-}
-
-const ShaderInfo* ShaderCache::ComputeShader() {
-    const GPUVAddr program_base{kepler_compute->regs.code_loc.Address()};
-    const auto& qmd{kepler_compute->launch_description};
-    const GPUVAddr shader_addr{program_base + qmd.program_start};
-    const std::optional<VAddr> cpu_shader_addr{gpu_memory->GpuToCpuAddress(shader_addr)};
-    if (!cpu_shader_addr) {
-        LOG_ERROR(HW_GPU, "Invalid GPU address for shader 0x{:016x}", shader_addr);
-        return nullptr;
-    }
-    if (const ShaderInfo* const shader = TryGet(*cpu_shader_addr)) {
-        return shader;
-    }
-    ComputeEnvironment env{*kepler_compute, *gpu_memory, program_base, qmd.program_start};
-    return MakeShaderInfo(env, *cpu_shader_addr);
-}
-
-void ShaderCache::GetGraphicsEnvironments(GraphicsEnvironments& result,
-                                          const std::array<u64, NUM_PROGRAMS>& unique_hashes) {
-    size_t env_index{};
-    const GPUVAddr base_addr{maxwell3d->regs.program_region.Address()};
-    for (size_t index = 0; index < NUM_PROGRAMS; ++index) {
-        if (unique_hashes[index] == 0) {
-            continue;
-        }
-        const auto program{static_cast<Tegra::Engines::Maxwell3D::Regs::ShaderType>(index)};
-        auto& env{result.envs[index]};
-        const u32 start_address{maxwell3d->regs.pipelines[index].offset};
-        env = GraphicsEnvironment{*maxwell3d, *gpu_memory, program, base_addr, start_address};
-        env.SetCachedSize(shader_infos[index]->size_bytes);
-        result.env_ptrs[env_index++] = &env;
-    }
-}
-
-ShaderInfo* ShaderCache::TryGet(VAddr addr) const {
+    ShaderInfo* TryGet(VAddr addr) const {
        std::scoped_lock lock{lookup_mutex};
        const auto it = lookup_cache.find(addr);
@@ -119,9 +84,9 @@ ShaderInfo* ShaderCache::TryGet(VAddr addr) const {
            return nullptr;
        }
        return it->second->data;
    }

-void ShaderCache::Register(std::unique_ptr<ShaderInfo> data, VAddr addr, size_t size) {
+    void Register(std::unique_ptr<ShaderInfo> data, VAddr addr, size_t size) {
        std::scoped_lock lock{invalidation_mutex, lookup_mutex};
        const VAddr addr_end = addr + size;
@@ -135,9 +100,74 @@ void ShaderCache::Register(std::unique_ptr<ShaderInfo> data, VAddr addr, size_t
        storage.push_back(std::move(data));
        device_memory.UpdatePagesCachedCount(addr, size, 1);
    }

-void ShaderCache::InvalidatePagesInRegion(VAddr addr, size_t size) {
+private:
std::vector<std::unique_ptr<ShaderCacheWorker>> CreateWorkers() {
const size_t num_workers = std::thread::hardware_concurrency();
std::vector<std::unique_ptr<ShaderCacheWorker>> workers;
workers.reserve(num_workers);
for (size_t i = 0; i < num_workers; ++i) {
workers.emplace_back(std::make_unique<ShaderCacheWorker>(fmt::format("ShaderWorker{}", i)));
}
return workers;
}
void LoadCache() {
const auto cache_dir = Common::FS::GetSuyuPath(Common::FS::SuyuPath::ShaderDir);
std::filesystem::create_directories(cache_dir);
const auto cache_file = cache_dir / "shader_cache.bin";
if (!std::filesystem::exists(cache_file)) {
return;
}
std::ifstream file(cache_file, std::ios::binary);
if (!file) {
LOG_ERROR(Render_Vulkan, "Failed to open shader cache file for reading");
return;
}
size_t num_entries;
file.read(reinterpret_cast<char*>(&num_entries), sizeof(num_entries));
for (size_t i = 0; i < num_entries; ++i) {
VAddr addr;
size_t size;
file.read(reinterpret_cast<char*>(&addr), sizeof(addr));
file.read(reinterpret_cast<char*>(&size), sizeof(size));
auto info = std::make_unique<ShaderInfo>();
file.read(reinterpret_cast<char*>(info.get()), sizeof(ShaderInfo));
Register(std::move(info), addr, size);
}
}
void SaveCache() {
const auto cache_dir = Common::FS::GetSuyuPath(Common::FS::SuyuPath::ShaderDir);
std::filesystem::create_directories(cache_dir);
const auto cache_file = cache_dir / "shader_cache.bin";
std::ofstream file(cache_file, std::ios::binary | std::ios::trunc);
if (!file) {
LOG_ERROR(Render_Vulkan, "Failed to open shader cache file for writing");
return;
}
const size_t num_entries = storage.size();
file.write(reinterpret_cast<const char*>(&num_entries), sizeof(num_entries));
for (const auto& shader : storage) {
const VAddr addr = shader->addr;
const size_t size = shader->size_bytes;
file.write(reinterpret_cast<const char*>(&addr), sizeof(addr));
file.write(reinterpret_cast<const char*>(&size), sizeof(size));
file.write(reinterpret_cast<const char*>(shader.get()), sizeof(ShaderInfo));
}
}
void InvalidatePagesInRegion(VAddr addr, size_t size) {
        const VAddr addr_end = addr + size;
        const u64 page_end = (addr_end + SUYU_PAGESIZE - 1) >> SUYU_PAGEBITS;
        for (u64 page = addr >> SUYU_PAGEBITS; page < page_end; ++page) {
@@ -147,18 +177,18 @@ void ShaderCache::InvalidatePagesInRegion(VAddr addr, size_t size) {
            }
            InvalidatePageEntries(it->second, addr, addr_end);
        }
    }

-void ShaderCache::RemovePendingShaders() {
+    void RemovePendingShaders() {
        if (marked_for_removal.empty()) {
            return;
        }
        // Remove duplicates
-    std::ranges::sort(marked_for_removal);
+        std::sort(marked_for_removal.begin(), marked_for_removal.end());
        marked_for_removal.erase(std::unique(marked_for_removal.begin(), marked_for_removal.end()),
                                 marked_for_removal.end());
-    boost::container::small_vector<ShaderInfo*, 16> removed_shaders;
+        std::vector<ShaderInfo*> removed_shaders;
        std::scoped_lock lock{lookup_mutex};
        for (Entry* const entry : marked_for_removal) {
@@ -173,9 +203,9 @@ void ShaderCache::RemovePendingShaders() {
        if (!removed_shaders.empty()) {
            RemoveShadersFromStorage(removed_shaders);
        }
    }

-void ShaderCache::InvalidatePageEntries(std::vector<Entry*>& entries, VAddr addr, VAddr addr_end) {
+    void InvalidatePageEntries(std::vector<Entry*>& entries, VAddr addr, VAddr addr_end) {
        size_t index = 0;
        while (index < entries.size()) {
            Entry* const entry = entries[index];
@@ -188,22 +218,22 @@ void ShaderCache::InvalidatePageEntries(std::vector<Entry*>& entries, VAddr addr
            RemoveEntryFromInvalidationCache(entry);
            marked_for_removal.push_back(entry);
        }
    }

-void ShaderCache::RemoveEntryFromInvalidationCache(const Entry* entry) {
+    void RemoveEntryFromInvalidationCache(const Entry* entry) {
        const u64 page_end = (entry->addr_end + SUYU_PAGESIZE - 1) >> SUYU_PAGEBITS;
        for (u64 page = entry->addr_start >> SUYU_PAGEBITS; page < page_end; ++page) {
            const auto entries_it = invalidation_cache.find(page);
            ASSERT(entries_it != invalidation_cache.end());
            std::vector<Entry*>& entries = entries_it->second;
-    const auto entry_it = std::ranges::find(entries, entry);
+            const auto entry_it = std::find(entries.begin(), entries.end(), entry);
            ASSERT(entry_it != entries.end());
            entries.erase(entry_it);
        }
    }

-void ShaderCache::UnmarkMemory(Entry* entry) {
+    void UnmarkMemory(Entry* entry) {
        if (!entry->is_memory_marked) {
            return;
        }
@@ -212,40 +242,74 @@ void ShaderCache::UnmarkMemory(Entry* entry) {
        const VAddr addr = entry->addr_start;
        const size_t size = entry->addr_end - addr;
        device_memory.UpdatePagesCachedCount(addr, size, -1);
    }

-void ShaderCache::RemoveShadersFromStorage(std::span<ShaderInfo*> removed_shaders) {
-    // Remove them from the cache
-    std::erase_if(storage, [&removed_shaders](const std::unique_ptr<ShaderInfo>& shader) {
-        return std::ranges::find(removed_shaders, shader.get()) != removed_shaders.end();
-    });
-}
+    void RemoveShadersFromStorage(const std::vector<ShaderInfo*>& removed_shaders) {
+        storage.erase(
+            std::remove_if(storage.begin(), storage.end(),
+                           [&removed_shaders](const std::unique_ptr<ShaderInfo>& shader) {
+                               return std::find(removed_shaders.begin(), removed_shaders.end(),
+                                                shader.get()) != removed_shaders.end();
+                           }),
+            storage.end());
+    }

-ShaderCache::Entry* ShaderCache::NewEntry(VAddr addr, VAddr addr_end, ShaderInfo* data) {
+    Entry* NewEntry(VAddr addr, VAddr addr_end, ShaderInfo* data) {
        auto entry = std::make_unique<Entry>(Entry{addr, addr_end, data});
        Entry* const entry_pointer = entry.get();
        lookup_cache.emplace(addr, std::move(entry));
        return entry_pointer;
}
Tegra::MaxwellDeviceMemoryManager& device_memory;
std::vector<std::unique_ptr<ShaderCacheWorker>> workers;
mutable std::mutex lookup_mutex;
std::mutex invalidation_mutex;
std::unordered_map<VAddr, std::unique_ptr<Entry>> lookup_cache;
std::unordered_map<u64, std::vector<Entry*>> invalidation_cache;
std::vector<std::unique_ptr<ShaderInfo>> storage;
std::vector<Entry*> marked_for_removal;
};
ShaderCache::ShaderCache(Tegra::MaxwellDeviceMemoryManager& device_memory_)
: impl{std::make_unique<Impl>(device_memory_)} {}
ShaderCache::~ShaderCache() = default;
void ShaderCache::InvalidateRegion(VAddr addr, size_t size) {
impl->InvalidateRegion(addr, size);
}
-const ShaderInfo* ShaderCache::MakeShaderInfo(GenericEnvironment& env, VAddr cpu_addr) {
-    auto info = std::make_unique<ShaderInfo>();
-    if (const std::optional<u64> cached_hash{env.Analyze()}) {
-        info->unique_hash = *cached_hash;
-        info->size_bytes = env.CachedSizeBytes();
-    } else {
-        // Slow path, not really hit on commercial games
-        // Build a control flow graph to get the real shader size
-        Shader::ObjectPool<Shader::Maxwell::Flow::Block> flow_block;
-        Shader::Maxwell::Flow::CFG cfg{env, flow_block, env.StartAddress()};
-        info->unique_hash = env.CalculateHash();
-        info->size_bytes = env.ReadSizeBytes();
-    }
-    const size_t size_bytes{info->size_bytes};
-    const ShaderInfo* const result{info.get()};
-    Register(std::move(info), cpu_addr, size_bytes);
-    return result;
-}
+void ShaderCache::OnCacheInvalidation(VAddr addr, size_t size) {
+    impl->OnCacheInvalidation(addr, size);
+}
+
+void ShaderCache::SyncGuestHost() {
+    impl->SyncGuestHost();
+}
+
+bool ShaderCache::RefreshStages(std::array<u64, 6>& unique_hashes) {
+    return impl->RefreshStages(unique_hashes);
+}
+
+const ShaderInfo* ShaderCache::ComputeShader() {
+    return impl->ComputeShader();
+}
+
+void ShaderCache::GetGraphicsEnvironments(GraphicsEnvironments& result,
+                                          const std::array<u64, NUM_PROGRAMS>& unique_hashes) {
+    impl->GetGraphicsEnvironments(result, unique_hashes);
+}
+
+ShaderInfo* ShaderCache::TryGet(VAddr addr) const {
+    return impl->TryGet(addr);
+}
+
+void ShaderCache::Register(std::unique_ptr<ShaderInfo> data, VAddr addr, size_t size) {
+    impl->Register(std::move(data), addr, size);
+}
} // namespace VideoCommon
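
One caveat in the LoadCache/SaveCache pair above: entries are streamed to disk as raw ShaderInfo bytes, which is only well-defined for trivially copyable types and breaks silently whenever the struct layout changes. A hedged sketch of the kind of guard a follow-up could add (CACHE_MAGIC, CACHE_VERSION, CacheHeader and the ShaderInfoPod stand-in are hypothetical, not part of this commit):

#include <cstdint>
#include <fstream>
#include <type_traits>

// Stand-in for ShaderInfo; the real struct would need the same property.
struct ShaderInfoPod {
    std::uint64_t unique_hash;
    std::uint64_t size_bytes;
};
static_assert(std::is_trivially_copyable_v<ShaderInfoPod>,
              "raw read/write is only valid for trivially copyable types");

constexpr std::uint32_t CACHE_MAGIC = 0x53435348; // arbitrary "SCSH" tag
constexpr std::uint32_t CACHE_VERSION = 1;

struct CacheHeader {
    std::uint32_t magic;
    std::uint32_t version;
    std::uint64_t num_entries;
};

// Reject unreadable, foreign, or out-of-date cache files up front instead
// of blindly trusting the entry count, as the committed code currently does.
bool ValidateHeader(std::ifstream& file) {
    CacheHeader header{};
    file.read(reinterpret_cast<char*>(&header), sizeof(header));
    return file.good() && header.magic == CACHE_MAGIC && header.version == CACHE_VERSION;
}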