This page highlights the workings of my implementation of an instrumented profiler
Here’s what the profiler looks like:
The profiler is split into a few different classes, they are:
- ProfileLogScope
- Profiler
- ProfilerReport
- ProfilerSample
Each profiler class provides specific functionality required by the profiler. Each class breaks down the required profiler functionality by breaking down the various aspects of the profiler and handling the required logic within itself. By encapsulating the required features in their own respective classes, the profiler can be expanded to add new functionality as and when required as new class implementations that tie into the Profiler interface.
Profile Log Scope
Tier 1 of the profiler was to simply be able to observe different sections of execution logic and determine the time taken for execution. Below is the class implementation of ProfileLogScope:
std::string GetProfileTimeString(double seconds); //------------------------------------------------------------------------------------------------------------------------------ class LogProfileScope { public: LogProfileScope(char const *name) { m_name = name; m_start_time = GetCurrentTimeHPC(); } ~LogProfileScope() { uint64_t duration = GetCurrentTimeHPC() - m_start_time; DebuggerPrintf("%s took %s (%llu HPC Units) \n", m_name, GetProfileTimeString((double)duration).c_str(), (uint64_t)duration); } uint64_t m_start_time; char const* m_name; }; #define COMBINE1(X,Y) X##Y // helper macro #define COMBINE(X,Y) COMBINE1(X,Y) #define PROFILE_LOG_SCOPE( tag ) LogProfileScope COMBINE(__scopeLog, __LINE__) ## ( tag ) #define PROFILE_LOG_SCOPE_FUNCTION() PROFILE_LOG_SCOPE(__FUNCTION__);
The usage of Profile Log Scope is highlighted below:
void App::LoadGameBlackBoard() { PROFILE_LOG_SCOPE("App::LoadGameBlackBoard"); const char* xmlDocPath = "Data/Gameplay/GameConfig.xml"; tinyxml2::XMLDocument gameconfig; gameconfig.LoadFile(xmlDocPath); if(gameconfig.ErrorID() != tinyxml2::XML_SUCCESS) { printf("\n >> Error loading XML file from %s ", xmlDocPath); printf("\n >> Error ID : %i ", gameconfig.ErrorID()); printf("\n >> Error line number is : %i", gameconfig.ErrorLineNum()); printf("\n >> Error name : %s", gameconfig.ErrorName()); ERROR_AND_DIE(">> Error loading GameConfig XML file ") return; } else { XMLElement* rootElement = gameconfig.RootElement(); g_gameConfigBlackboard.PopulateFromXmlElementAttributes(*rootElement); } }
The output of the code above in the output window of visual studio can be seen as:
App::LoadGameBlackBoard took 0.000001 milliseconds (7520 HPC Units)
Profiler
The profiler implementation serves as an interface that handles the logic to maintain profiled samples and generate reports for the desired frame from a buffer of frame history
To perform these functions, the Profiler interface is implemented as follows:
class Profiler { public: Profiler(); ~Profiler(); //------------------------------------------------------------------------------------------------------------------------------ //Methods bool ProfilerInitialize(); void ProfilerShutdown(); void ProfilerSetMaxHistoryTime(double seconds); void ProfilerPause(); void ProfilerResume(); void ProfilerTogglePause(); void ProfilerPush(const char* label); void ProfilerPop(); void ProfilerUpdate(); void ShowProfilerTimeline(); void PopulateGraphData(float* floatArray, float* allocArray, int& timeArraySize, int& allocArraySize, float& maxTime, float& maxAlloc); void MakeTimelineWindow(); void MakeAllocationsWindow(); void EraseOldTrees(); void ProfilerAllocation(size_t byteSize = 0); void ProfilerFree(); void ProfilerBeginFrame(const char* label = "Frame"); void ProfilerEndFrame(); bool IsProfilerOpen(); // We can only really 'view' a complete tree // these functions return the most recently finished tree // use a shared_mutex for accessing trees from the system (as it will try to destroy old ones constantly) // `history` is how many frames back we should try to get ProfilerSample_T* ProfilerAcquirePreviousTree(std::thread::id id, uint history = 0); ProfilerSample_T* ProfilerAcquirePreviousTreeForCallingThread(uint history = 0); void ProfilerReleaseTree(ProfilerSample_T* node); //------------------------------------------------------------------------------------------------------------------------------ // Static Methods static Profiler* CreateInstance(); static void DestroyInstance(); static Profiler* GetInstance(); //------------------------------------------------------------------------------------------------------------------------------ // Dev Console Events static bool Command_PauseProfiler(EventArgs& args); static bool Command_ResumeProfiler(EventArgs& args); static bool Command_ProfilerReport(EventArgs& args); private: ProfilerSample_T* AllocateNode(); void FreeTree(ProfilerSample_T* root); void FreeNode(ProfilerSample_T* node); void RepopulateReportData(); double m_maxHistoryTime = 3; bool m_isPaused = false; bool m_showTimeline = false; std::vector<ProfilerSample_T*> m_History; std::shared_mutex m_HistoryLock; size_t m_AllowedSize = 1048576; //1024 KibiBytes or 1 MebiByte //------------------------------------------------------------------------------------------------------------------------------ //ImGUI values float* m_timeArray; float* m_allocArray; float* m_timeArrayStart; float* m_allocArrayStart; float m_maxAlloc = 0; float m_maxTime = 0; int m_timeArraySize = 0; int m_allocArraySize = 0; float m_reportFrameNum = 0; };
class ProfilerLogObject { public: ProfilerLogObject(char const *label) { gProfiler->ProfilerPush(label); } ~ProfilerLogObject() { gProfiler->ProfilerPop(); } };
To use the profiler, the developer must first call the ProfilerBeginFrame function to tell the profiler the start of the profiler frame and call the ProfilerEndFrame function to indicate the end of the current profiler frame. Within each frame, there needs to be a call to ProfilerPush and ProfilerPop when profiling sections of code. Each ProfilerPush allocates new memory to store the ProfilerSample_T object that contains sample information such as the number of allocations performed, the call stack as well as the size of the allocations.
Profiler Sample
To create a sample to be used by profiler as a record of various code executions performed in the profiled frame, we incorporate a structure that contains some frame specific information. The ProfilerSample_T struct is highlighted below:
struct ProfilerSample_T { //Parent Node ProfilerSample_T* m_parent = nullptr; //Node's children ProfilerSample_T* m_lastChild = nullptr; ProfilerSample_T* m_prevSibling = nullptr; char m_label[64]; //Timing uint64_t m_startTime = 0; uint64_t m_endTime = 0; std::thread::id m_threadID; uint m_refCount = 0; // memory // alloc_count, byte_count int m_allocCount = 0; size_t m_allocationSizeInBytes = 0; int m_freeCount = 0; size_t m_freeSizeInBytes = 0; void AddChild(ProfilerSample_T* child); };
This sample gives enough context over memory usage within the scope of the Profile frame being recorded.
Profiler Report
To generate a report, we make use of a ProfilerReport class which stores a collection of report nodes that were generated for each profile frame. The report node performs all the sorting logic to generate required profiler views and retrieves the sample data from ProfilerSample_T. The implementation is highlighted below:
enum ProfilerReportMode { TREE_VIEW, FLAT_VIEW }; //------------------------------------------------------------------------------------------------------------------------------ class ProfilerReportNode { public: ProfilerReportNode(ProfilerSample_T* node, ProfilerReportNode* parent = nullptr); ~ProfilerReportNode(); void GetChildrenFromSampleRoot(ProfilerSample_T* root, ProfilerReportNode* parent); void SortByTotalTime(); void SortBySelfTime(); void GetSelfTime(); ProfilerReportNode* m_parent = nullptr; int m_allocationCount = 0; size_t m_allocationSize = 0; int m_freeCount = 0; size_t m_freedSize = 0; uint m_numCalls = 0U; float m_totalPercent = 0.0f; float m_selfPercent = 0.0f; char m_label[64]; //Optional: uint64_t m_totalTimeHPC = 0U; double m_totalTime = 0; double m_selfTime = 0; double m_avgTime = 0; //Avg time per call including children double m_avgSelfTime = 0; //Avg time per call not including children double m_maxTime = 0; //longest a certain sample took in the collection double m_maxTimeSelf = 0; //longest a certain sample took in the collection but without including it's children //My kids std::vector<ProfilerReportNode> m_children; bool operator==(const ProfilerReportNode& compare) const; }; bool ComparatorTotalTimeSort(ProfilerReportNode* elementA, ProfilerReportNode* elementB); bool ComparatorSelfTimeSort(ProfilerReportNode* elementA, ProfilerReportNode* elementB);
class ProfilerReport { public: ProfilerReport(); ~ProfilerReport(); ProfilerReportNode* GetFrameInHistory(uint history = 1); ProfilerReportNode* GetRoot(); static ProfilerReport* CreateInstance(); static ProfilerReport* GetInstance(); static void DestroyInstance(); void DrawTreeViewAsImGUIWidget(uint history); void DrawFlatViewAsImGUIWidget(uint history); void SetReportMode(ProfilerReportMode mode); ProfilerReportMode GetMode() { return m_activeMode; } void SetRoot(ProfilerReportNode* root); //NOTE: This is only for the UI buttons and is to be accessed only by ProfilerReportNode int m_sortTotalTime = 0; int m_sortSelfTime = 0; private: void InitializeReporter(); void GenerateTreeFromFrame(ProfilerSample_T* root); void GenerateFlatFromFrame(ProfilerSample_T* root); void AddToFlatViewVector(ProfilerReportNode& rootNode); void PopulateTreeForImGUI(ProfilerReportNode* root); void PopulateFlatForImGUI(); static bool Command_ProfilerToggleMode(EventArgs& args); static bool Command_ProfilerReportFrame(EventArgs& args); ProfilerReportNode* m_root = nullptr; uint m_lastHistoryFrame = 0U; ProfilerReportMode m_activeMode = TREE_VIEW; int m_clicked = 0; std::vector<ProfilerReportNode*> m_flatViewVector; void GenerateFlatViewVector(ProfilerReportNode* m_root); };
The profiler uses Dear ImGUI to create the profiler view that is initiated from the developer console. The view displays a histogram for the time taken during execution as well as a corresponding allocations graph to highlight the memory usage within the profiled frame.
On clicking the profiler frame, we can obtain a tree view as well as a flat view of the frames profiled sorted by their execution times and allocations if needed.
Retrospective
What went well 🙂
- Implementing Profiler logic
- A clear breakdown of Report and Samples
- Abstracting irrelevant data from Profiler to maintain a hierarchy of used classes
What went wrong 🙁
- Encountered bugs when using uints especially when using bad defaults to check against
- Encountered issues when modifying imGUI code to allow for clicks on histograms
What I would do differently 🙂
- Implement a simpler profiler widget
- Allow for some navigation and options on the widget to switch memory tracking modes