Prodigy Engine uses some custom allocators and overloaded new and delete operators to perform memory tracking.
Here’s an image of the development console displaying memory tracking information:
Based on the desired use case, Prodigy Engine supports the following allocators:
- Untracked Allocator
- Tracked Allocator
- Templated Untracked Allocator for STL
- Block Allocator
- Object Allocator
Tracked and Untracked Allocators
The Untracked and Tracked allocators are the default allocators used for Release and Debug builds respectively. They both inherit from an Internal Allocator class. Below is the implementation for said allocators:
//------------------------------------------------------------------------------------------------------------------------------ class InternalAllocator { public: virtual void* Allocate(size_t size) = 0; virtual void Free(void* ptr) = 0; template <typename T, typename ...ARGS> T* Create(ARGS&& ...args) { void* mem = this->Allocate(sizeof(T)); if (mem != nullptr) { // new in place //We want to perfect forward our arguments new (mem) T(std::forward<ARGS>(args)...); } return (T*)mem; } template <typename T> void Destroy(T* obj) { if (obj != nullptr) { obj->~T(); this->Free(obj); } } };
//------------------------------------------------------------------------------------------------------------------------------ class UntrackedAllocator : public InternalAllocator { public: virtual void* Allocate(size_t size) final; virtual void Free(void* ptr) final; static UntrackedAllocator* CreateInstance(); static void DestroyInstance(); static UntrackedAllocator* GetInstance(); };
//------------------------------------------------------------------------------------------------------------------------------ class TrackedAllocator : public InternalAllocator { public: virtual void* Allocate(size_t size) final; virtual void Free(void* ptr) final; static TrackedAllocator* CreateInstance(); static void DestroyInstance(); static TrackedAllocator* GetInstance(); };
Both Tracked and Untracked allocators use a singleton paradigm where a global allocator is created and is responsible for allocations throughout runtime.
Templated Untracked Allocator
Wrote a templated untracked allocator that can be used with STL object allocations. This allocator only exists on a container by value, this means each container has its own instance of this allocator. It would, however, be treated as a namespace or a static and won’t use any internal state.
template <typename T> struct TemplatedUntrackedAllocator { TemplatedUntrackedAllocator() = default; template <class U> constexpr TemplatedUntrackedAllocator(TemplatedUntrackedAllocator<U> const&) noexcept {} // allocator needs to define these types; // the "type" is not as important as the name // the stdlib is expecting these names - remember templates are a form of duck-typing // these three are fairly standard typedef T value_type; typedef size_t size_type; typedef std::ptrdiff_t difference_type; // give hints on how the allocator is implemented so that containers can optmize around it typedef std::true_type propagate_on_container_move_assignment; // when moving - does the allocator local state move with it? typedef std::true_type is_always_equal; // can optimize some containers (allocator of this type is always equal to others of its type) T* allocate(size_t byte_size) { UNUSED(byte_size); return (T*) ::malloc(sizeof(T)); } void deallocate(T* ptr, size_t byte_size) { UNUSED(byte_size); ::free(ptr); } }; template<typename T, class U> bool operator==(TemplatedUntrackedAllocator<T> const&, TemplatedUntrackedAllocator<U> const&) { return true; } template<typename T, class U> bool operator!=(TemplatedUntrackedAllocator<T> const&, TemplatedUntrackedAllocator<U> const&) { return false; }
Although clever, I almost always end up using the standard UntrackedAllocator in my engine which performs the same functionality as the allocator above but without the overhead of being templated.
Block Allocator
Allocator used to assign blocks of memory of desired size using malloc. This allocator is very useful especially in cases where memory usage may vary during run time. A good example of which would be the instrumented profiler that may require less or more memory depending on the number of system frames being profiler per game frame.
Below is the implementation of the block allocator:
typedef unsigned int uint; //------------------------------------------------------------------------------------------------------------------------------ struct Block_T { Block_T* next; }; //------------------------------------------------------------------------------------------------------------------------------ struct Chunck_T { Chunck_T* next; }; // Single-Threaded Block Allocator class BlockAllocator : public InternalAllocator { public: //This Initialization takes a base allocator to sub allocate from //The allocation can grow as long as the base allocator can allocate bool Initialize(InternalAllocator* base, size_t blockSize, size_t alignment, uint blocksPerChunk); //Takes a static buffer of fixed size. This allocation is not allowed to grow bool Initialize(void* buffer, size_t bufferSize, size_t blockSize, size_t alignment); void Deinitialize(); //Interface methods virtual void* Allocate(size_t size) final; // works as long as size <= block_size virtual void Free(void* ptr) final; //------------------------------------------------------------------------------------------------------------------------------ //Static methods static BlockAllocator* CreateInstance(); static void DestroyInstance(); static BlockAllocator* GetInstance(); private: //Block allocator can allocate and free single blocks void* AllocateBlock(); void FreeBlock(void* ptr); //Allocated a chunk and splits into blocks //NOTE: Will fail if there is no base allocator provided bool AllocateChunk(); void BreakUpChunk(void* buffer); void PushFreeBlock(Block_T* block); Block_T* PopFreeBlock(); private: //Sub allocator to use for allocation InternalAllocator* m_base = nullptr; Block_T* m_freeBlocks = nullptr; Chunck_T* m_chunkList = nullptr; size_t m_alignment; size_t m_blockSize; size_t m_blocksPerChunk; size_t m_bufferSize; // AsyncBlockAllocator std::mutex m_chunkLock; std::mutex m_blockLock; };
#include "Engine/Allocators/BlockAllocator.hpp" #include "Engine/Core/MemTracking.hpp" //------------------------------------------------------------------------------------------------------------------------------ typedef uint8_t byte; BlockAllocator* gBlockAllocator = nullptr; //------------------------------------------------------------------------------------------------------------------------------ bool BlockAllocator::Initialize(InternalAllocator* base, size_t blockSize, size_t alignment, uint blocksPerChunk) { m_base = base; m_blockSize = blockSize; m_alignment = alignment; m_blocksPerChunk = blocksPerChunk; m_freeBlocks = nullptr; m_chunkList = nullptr; AllocateChunk(); return true; } //------------------------------------------------------------------------------------------------------------------------------ bool BlockAllocator::Initialize(void* buffer, size_t bufferSize, size_t blockSize, size_t alignment) { // infer class members based on parameters m_blocksPerChunk = bufferSize/ blockSize; m_bufferSize = bufferSize; m_blockSize = blockSize; m_alignment = alignment; m_base = nullptr; m_freeBlocks = nullptr; // allocating blocks from a chunk // may move this to a different method later; BreakUpChunk(buffer); if (m_freeBlocks != nullptr) { return true; } else { return false; } } //------------------------------------------------------------------------------------------------------------------------------ void BlockAllocator::Deinitialize() { //If m_base is not nullptr, we have chunks if (m_base) { std::scoped_lock chunkLock(m_chunkLock); Chunck_T* list = nullptr; // free chunks while (m_chunkList != nullptr) { list = m_chunkList; m_chunkList = m_chunkList->next; m_base->Free(list); } } //Else condition is normal cleanup std::scoped_lock blockLock(m_blockLock); //Reset members m_base = nullptr; m_freeBlocks = nullptr; m_blockSize = 0U; m_blocksPerChunk = 0U; } //------------------------------------------------------------------------------------------------------------------------------ void* BlockAllocator::Allocate(size_t size) { if (size <= m_blockSize) { return AllocateBlock(); } else { return nullptr; } } //------------------------------------------------------------------------------------------------------------------------------ void BlockAllocator::Free(void* ptr) { FreeBlock(ptr); } //------------------------------------------------------------------------------------------------------------------------------ void* BlockAllocator::AllocateBlock() { Block_T* block = PopFreeBlock(); while (block == nullptr) { if (!AllocateChunk()) { return nullptr; } block = PopFreeBlock(); } return block; } //------------------------------------------------------------------------------------------------------------------------------ void BlockAllocator::FreeBlock(void* ptr) { Block_T* block = (Block_T*)ptr; PushFreeBlock(block); } //------------------------------------------------------------------------------------------------------------------------------ BlockAllocator* BlockAllocator::CreateInstance() { if (gBlockAllocator == nullptr) { gBlockAllocator = new BlockAllocator(); } return gBlockAllocator; } //------------------------------------------------------------------------------------------------------------------------------ void BlockAllocator::DestroyInstance() { if (gBlockAllocator != nullptr) { delete gBlockAllocator; gBlockAllocator = nullptr; } } //------------------------------------------------------------------------------------------------------------------------------ BlockAllocator* BlockAllocator::GetInstance() { if (gBlockAllocator == nullptr) { CreateInstance(); } return gBlockAllocator; } //------------------------------------------------------------------------------------------------------------------------------ bool BlockAllocator::AllocateChunk() { //We can't allocate a chunk if we don't have a sub allocator if (m_base == nullptr) { return false; } if (m_chunkLock.try_lock()) { //Allocate a chunk of memory if the base allocator is able to size_t chunkSize = m_blocksPerChunk * m_blockSize + sizeof(Block_T); Chunck_T* chunk = (Chunck_T*)m_base->Allocate(chunkSize); if (chunk == nullptr) { return false; } //Track this chunk so we can free this later chunk->next = m_chunkList; m_chunkList = chunk; //Break up newly allocated chunk BreakUpChunk(chunk + 1); m_chunkLock.unlock(); } return true; } //------------------------------------------------------------------------------------------------------------------------------ void BlockAllocator::BreakUpChunk(void* buffer) { byte* buf = (byte*)buffer; Block_T* first = (Block_T*)buf; Block_T* head = nullptr; for (uint i = 0; i < m_blocksPerChunk; ++i) { Block_T* node = (Block_T*)buf; buf += m_blockSize; node->next = head; head = node; } { //Lock so we are thread safe std::scoped_lock lock(m_blockLock); first->next = m_freeBlocks; m_freeBlocks = head; } } //------------------------------------------------------------------------------------------------------------------------------ void BlockAllocator::PushFreeBlock(Block_T* block) { std::scoped_lock blockLock(m_blockLock); block->next = m_freeBlocks; m_freeBlocks = block; } //------------------------------------------------------------------------------------------------------------------------------ Block_T* BlockAllocator::PopFreeBlock() { std::scoped_lock blockLock(m_blockLock); Block_T* head = m_freeBlocks; if (head != nullptr) { m_freeBlocks = head->next; } return head; }
The block allocator also inherits from internal allocator and performs tracking on the memory allocated.
Object Allocator
This is a templated allocator that allocates memory of the size of the required object. The implementation inherits from the Block Allocator and performs allocations of size T where T represents the type of the object for which memory is to be allocated.
template <typename OBJ> class ObjectAllocator : private BlockAllocator { public: void Initialize(InternalAllocator* parent, uint blocksPerChunk) { return BlockAllocator::Initialize(parent, sizeof(OBJ), alignof(OBJ), blocksPerChunk); } void Deinitialize() { BlockAllocator::Deinitialize(); } virtual void* Allocate(size_t size) { return BlockAllocator::Allocate(size); } virtual void* Free(void* ptr) { return BlockAllocator::Free(ptr); } template <typename ...ARGS> OBJ* Create(ARGS ...args) { void* mem = Allocate(sizeof(OBJ)); if (mem != nullptr) { return new(mem) OBJ(std::forward(args)...); } else { return nullptr; } } void Destroy(OBJ* object) { Free(object); } };
These are the different allocators used in Prodigy Engine but the memory tracking is achieved by overloading the new and delete operators.
Memory Tracking and Call Stacks
To track memory and call stacks, the Prodigy Engine uses a simple memory tracker that overloads operators new and delete and keeps some global values that store the number of allocations, allocation sizes and amount of memory freed for the project.
Below is the implementation of the MemTracker in Prodigy Engine:
struct MemTrackInfo_T { void* m_originalPointer; size_t m_byteSize; Callstack m_callstack; }; struct LogTrackInfo_T { uint m_numAllocations; size_t m_allocationSizeInBytes; Callstack m_callstack; }; // Human parse able data std::string GetSizeString(size_t byte_count); void* UntrackedAlloc(size_t byte_count); void UntrackedFree(void* ptr); void* TrackedAlloc(size_t byte_count); void TrackedFree(void* ptr); void TrackAllocation(void* allocation, size_t byte_count); void UntrackAllocation(void* allocation); size_t MemTrackGetLiveAllocationCount(); size_t MemTrackGetLiveByteCount(); void MemTrackLogLiveAllocations();
//------------------------------------------------------------------------------------------------------------------------------ void* operator new(size_t size) { return TrackedAlloc(size); } //------------------------------------------------------------------------------------------------------------------------------ void operator delete(void* ptr) { TrackedFree(ptr); } //------------------------------------------------------------------------------------------------------------------------------ void* operator new(size_t size, InternalAllocator& allocator) { return allocator.Allocate(size); } //------------------------------------------------------------------------------------------------------------------------------ void operator delete(void* ptr, InternalAllocator& allocator) { return allocator.Free(ptr); } //------------------------------------------------------------------------------------------------------------------------------ void* UntrackedAlloc(size_t size) { return ::malloc(size); } //------------------------------------------------------------------------------------------------------------------------------ void UntrackedFree(void* ptr) { return ::free(ptr); } //------------------------------------------------------------------------------------------------------------------------------ void* TrackedAlloc(size_t byte_count) { // One suggestion and example on how to break up this function // based on build config; #if !defined(MEM_TRACKING) return UntrackedAlloc(byte_count); #else #if (MEM_TRACKING == MEM_TRACK_ALLOC_COUNT) ++gTotalAllocations; ++tTotalAllocations; void* allocation = ::malloc(byte_count); return allocation; #elif (MEM_TRACKING == MEM_TRACK_VERBOSE) ++gTotalAllocations; gTotalBytesAllocated += byte_count; ++tTotalAllocations; tTotalBytesAllocated += byte_count; void* allocation = ::malloc(byte_count); TrackAllocation(allocation, byte_count); return allocation; #endif #endif } //------------------------------------------------------------------------ void TrackedFree(void* ptr) { #if (MEM_TRACKING == MEM_TRACK_ALLOC_COUNT) if (ptr == nullptr) return; --gTotalAllocations; ++tTotalFrees; return ::free(ptr); #elif (MEM_TRACKING == MEM_TRACK_VERBOSE) --gTotalAllocations; ++tTotalFrees; UntrackAllocation(ptr); #endif } //------------------------------------------------------------------------ void TrackAllocation(void* allocation, size_t byte_count) { #if (MEM_TRACKING == MEM_TRACK_VERBOSE) Callstack callstack = CallstackGet(); MemTrackInfo_T info; info.m_byteSize = byte_count; info.m_callstack = callstack; info.m_originalPointer = allocation; { std::scoped_lock lock(GetMemTrackerLock()); GetMemTrakingMap()[allocation] = info; } #endif } //------------------------------------------------------------------------ void UntrackAllocation(void* allocation) { { std::scoped_lock lock(GetMemTrackerLock()); auto mapIterator = GetMemTrakingMap().find(allocation); ::free(allocation); if (mapIterator != GetMemTrakingMap().end()) { gTotalBytesAllocated -= mapIterator->second.m_byteSize; tTotalBytesFreed += mapIterator->second.m_byteSize; GetMemTrakingMap().erase(mapIterator); } } }
This is accompanied by a CallStack class that shows the call stack for the allocation requested:
#define MAX_TRACE 64 #define TRACE_MAX_FUNCTION_NAME_LENGTH 1024 typedef unsigned int uint; class Callstack { public: void* m_trace[MAX_TRACE]; // execution pointers representing where we are in code; uint m_depth = 0; unsigned long m_hash = 0; }; Callstack CallstackGet(uint skip_frames = 0); // Convert a callstack to strings std::vector<std::string> GetCallstackToString(Callstack const& cs);
//------------------------------------------------------------------------------------------------------------------------------ Callstack CallstackGet(uint skip_frames /*= 0*/) { Callstack stackTraceObject; stackTraceObject.m_depth = CaptureStackBackTrace(skip_frames, MAX_TRACE, stackTraceObject.m_trace, &stackTraceObject.m_hash); return stackTraceObject; } //------------------------------------------------------------------------------------------------------------------------------ std::vector<std::string> GetCallstackToString(Callstack const& callStack) { HANDLE pHandle = GetCurrentProcess(); SymInitialize(pHandle, nullptr, true); std::vector<std::string> callStackStrings; PIMAGEHLP_SYMBOL64 symbol = (PIMAGEHLP_SYMBOL64)malloc(sizeof(PIMAGEHLP_SYMBOL64) + (TRACE_MAX_FUNCTION_NAME_LENGTH - 1) * sizeof(TCHAR)); memset(symbol, 0, sizeof(PIMAGEHLP_SYMBOL64) + (TRACE_MAX_FUNCTION_NAME_LENGTH - 1) * sizeof(TCHAR)); symbol->MaxNameLength = TRACE_MAX_FUNCTION_NAME_LENGTH; symbol->SizeOfStruct = sizeof(PIMAGEHLP_SYMBOL64); PDWORD64 displacement = 0; IMAGEHLP_LINE64 *line = (IMAGEHLP_LINE64 *)malloc(sizeof(IMAGEHLP_LINE64)); memset(line, 0, sizeof(IMAGEHLP_LINE64)); line->SizeOfStruct = sizeof(IMAGEHLP_LINE64); for (uint stackTraceIndex = 0; stackTraceIndex < callStack.m_depth - 2; ++stackTraceIndex) { bool symResult = SymGetSymFromAddr64(pHandle, (DWORD64)callStack.m_trace[stackTraceIndex], displacement, (PIMAGEHLP_SYMBOL64)symbol); if (symResult) { DWORD pdwDisplacement = 0; bool lineResult = SymGetLineFromAddr64(pHandle, (DWORD64)callStack.m_trace[stackTraceIndex], &pdwDisplacement, line); if (lineResult) { DebuggerPrintf("%s(%d):%s \n", line->FileName, line->LineNumber, symbol->Name); callStackStrings.push_back(line->FileName); } else { DebuggerPrintf("Error from SymGetSymFromAddr64: %lu.\n", GetLastError()); } } else { DebuggerPrintf("Error from SymFromAddr: %lu.\n", GetLastError()); } } return callStackStrings; }
With this, Prodigy Engine can perform tracking and use this data when required in the Profiler.