Custom Allocators and Memory Tracking

Prodigy Engine uses some custom allocators and overloaded new and delete operators to perform memory tracking.

Here’s an image of the development console displaying memory tracking information:

View of Memory Tracking window showing total allocations and size of the allocations

Based on the desired use case, Prodigy Engine supports the following allocators:

Untracked Allocator
Tracked Allocator
Templated Untracked Allocator for STL
Block Allocator
Object Allocator

Tracked and Untracked Allocators

The Untracked and Tracked allocators are the default allocators used for Release and Debug builds respectively. They both inherit from an Internal Allocator class. Below is the implementation for said allocators:

//------------------------------------------------------------------------------------------------------------------------------
class InternalAllocator  
{
public:
	
	virtual void*	Allocate(size_t size) = 0;
	virtual void	Free(void* ptr) = 0;

	template <typename T, typename ...ARGS>
	T* Create(ARGS&& ...args)
	{
		void* mem = this->Allocate(sizeof(T));
		if (mem != nullptr) 
		{
			// new in place
			//We want to perfect forward our arguments
			new (mem) T(std::forward<ARGS>(args)...);
		}

		return (T*)mem;
	}

	template <typename T> 
	void Destroy(T* obj)
	{
		if (obj != nullptr) 
		{
			obj->~T();
			this->Free(obj);
		}
	}
};

//------------------------------------------------------------------------------------------------------------------------------
class UntrackedAllocator : public InternalAllocator
{
public:
	virtual void*					Allocate(size_t size) final;
	virtual void					Free(void* ptr) final;

	static	UntrackedAllocator*		CreateInstance();
	static	void					DestroyInstance();
	static	UntrackedAllocator*		GetInstance();
};

//------------------------------------------------------------------------------------------------------------------------------
class TrackedAllocator : public InternalAllocator
{
public:
	virtual void*				Allocate(size_t size) final;
	virtual void				Free(void* ptr) final;

	static	TrackedAllocator*	CreateInstance();
	static	void				DestroyInstance();
	static	TrackedAllocator*	GetInstance();
};

Both Tracked and Untracked allocators use a singleton paradigm where a global allocator is created and is responsible for allocations throughout runtime.

Templated Untracked Allocator

Wrote a templated untracked allocator that can be used with STL object allocations. This allocator only exists on a container by value, this means each container has its own instance of this allocator. It would, however, be treated as a namespace or a static and won’t use any internal state.

template <typename T>
struct TemplatedUntrackedAllocator
{
	TemplatedUntrackedAllocator() = default;

	template <class U>
	constexpr TemplatedUntrackedAllocator(TemplatedUntrackedAllocator<U> const&) noexcept {}

	// allocator needs to define these types;
	// the "type" is not as important as the name
	// the stdlib is expecting these names - remember templates are a form of duck-typing
	// these three are fairly standard
	typedef T               value_type;
	typedef size_t          size_type;
	typedef std::ptrdiff_t  difference_type;

	// give hints on how the allocator is implemented so that containers can optmize around it 
	typedef std::true_type  propagate_on_container_move_assignment;   // when moving - does the allocator local state move with it?
	typedef std::true_type  is_always_equal;                          // can optimize some containers (allocator of this type is always equal to others of its type)                         

	T* allocate(size_t byte_size) 
	{ 
		UNUSED(byte_size);
		return (T*) ::malloc(sizeof(T)); 
	}

	void deallocate(T* ptr, size_t byte_size) 
	{ 
		UNUSED(byte_size);
		::free(ptr); 
	}
};

template<typename T, class U>
bool operator==(TemplatedUntrackedAllocator<T> const&, TemplatedUntrackedAllocator<U> const&)
{ 
	return true; 
}

template<typename T, class U>
bool operator!=(TemplatedUntrackedAllocator<T> const&, TemplatedUntrackedAllocator<U> const&)
{ 
	return false; 
}

Although clever, I almost always end up using the standard UntrackedAllocator in my engine which performs the same functionality as the allocator above but without the overhead of being templated.

Block Allocator

Allocator used to assign blocks of memory of desired size using malloc. This allocator is very useful especially in cases where memory usage may vary during run time. A good example of which would be the instrumented profiler that may require less or more memory depending on the number of system frames being profiler per game frame.

Below is the implementation of the block allocator:

typedef unsigned int uint;

//------------------------------------------------------------------------------------------------------------------------------
struct Block_T
{
	Block_T* next;
};

//------------------------------------------------------------------------------------------------------------------------------
struct Chunck_T
{
	Chunck_T* next;
};

// Single-Threaded Block Allocator
class BlockAllocator : public InternalAllocator
{
public:
	//This Initialization takes a base allocator to sub allocate from
	//The allocation can grow as long as the base allocator can allocate
	bool						Initialize(InternalAllocator* base,
								size_t blockSize,
								size_t alignment,
								uint blocksPerChunk);

	//Takes a static buffer of fixed size. This allocation is not allowed to grow
	bool						Initialize(void* buffer,
								size_t bufferSize,
								size_t blockSize,
								size_t alignment);

	void						Deinitialize();

	//Interface methods
	virtual void*				Allocate(size_t size) final; // works as long as size <= block_size
	virtual void				Free(void* ptr) final;

	//------------------------------------------------------------------------------------------------------------------------------
	//Static methods
	static	BlockAllocator*		CreateInstance();
	static	void				DestroyInstance();
	static	BlockAllocator*		GetInstance();

private:

	//Block allocator can allocate and free single blocks
	void*						AllocateBlock();
	void						FreeBlock(void* ptr);

	//Allocated a chunk and splits into blocks
	//NOTE: Will fail if there is no base allocator provided
	bool						AllocateChunk();
	void						BreakUpChunk(void* buffer);

	void						PushFreeBlock(Block_T* block);
	Block_T*					PopFreeBlock();

private: 

	//Sub allocator to use for allocation
	InternalAllocator*			m_base = nullptr;

	Block_T*					m_freeBlocks = nullptr;
	Chunck_T*					m_chunkList = nullptr;  

	size_t						m_alignment;
	size_t						m_blockSize;
	size_t						m_blocksPerChunk;

	size_t						m_bufferSize;

	// AsyncBlockAllocator
	std::mutex					m_chunkLock;
	std::mutex					m_blockLock;
};

#include "Engine/Allocators/BlockAllocator.hpp"
#include "Engine/Core/MemTracking.hpp"

//------------------------------------------------------------------------------------------------------------------------------
typedef uint8_t byte;
BlockAllocator* gBlockAllocator = nullptr;

//------------------------------------------------------------------------------------------------------------------------------
bool BlockAllocator::Initialize(InternalAllocator* base, size_t blockSize, size_t alignment, uint blocksPerChunk)
{
	m_base = base;
	m_blockSize = blockSize;
	m_alignment = alignment;
	m_blocksPerChunk = blocksPerChunk;

	m_freeBlocks = nullptr;
	m_chunkList = nullptr;

	AllocateChunk();  

	return true;
}

//------------------------------------------------------------------------------------------------------------------------------
bool BlockAllocator::Initialize(void* buffer, size_t bufferSize, size_t blockSize, size_t alignment)
{
	// infer class members based on parameters
	m_blocksPerChunk = bufferSize/ blockSize;
	m_bufferSize = bufferSize;
	m_blockSize = blockSize;
	m_alignment = alignment;

	m_base = nullptr;
	m_freeBlocks = nullptr;

	// allocating blocks from a chunk
	// may move this to a different method later; 
	BreakUpChunk(buffer);

	if (m_freeBlocks != nullptr)
	{
		return true;
	}
	else
	{
		return false;
	}
}

//------------------------------------------------------------------------------------------------------------------------------
void BlockAllocator::Deinitialize()
{
	//If m_base is not nullptr, we have chunks
	if (m_base) 
	{
		std::scoped_lock chunkLock(m_chunkLock);

		Chunck_T* list = nullptr;

		// free chunks
		while (m_chunkList != nullptr) 
		{
			list = m_chunkList;
			m_chunkList = m_chunkList->next;
			m_base->Free(list);
		}
	} //Else condition is normal cleanup

	std::scoped_lock blockLock(m_blockLock);

	//Reset members
	m_base = nullptr;
	m_freeBlocks = nullptr;
	m_blockSize = 0U;
	m_blocksPerChunk = 0U;
}

//------------------------------------------------------------------------------------------------------------------------------
void* BlockAllocator::Allocate(size_t size)
{
	if (size <= m_blockSize)
	{
		return AllocateBlock();
	}
	else
	{
		return nullptr;
	}
}

//------------------------------------------------------------------------------------------------------------------------------
void BlockAllocator::Free(void* ptr)
{
	FreeBlock(ptr);
}

//------------------------------------------------------------------------------------------------------------------------------
void* BlockAllocator::AllocateBlock()
{
	Block_T* block = PopFreeBlock();
	
	while (block == nullptr) 
	{
		if (!AllocateChunk()) 
		{
			return nullptr;
		}

		block = PopFreeBlock();
	}

	return block;
}

//------------------------------------------------------------------------------------------------------------------------------
void BlockAllocator::FreeBlock(void* ptr)
{
	Block_T* block = (Block_T*)ptr;
	PushFreeBlock(block);
}

//------------------------------------------------------------------------------------------------------------------------------
BlockAllocator* BlockAllocator::CreateInstance()
{
	if (gBlockAllocator == nullptr)
	{
		gBlockAllocator = new BlockAllocator();
	}

	return gBlockAllocator;
}

//------------------------------------------------------------------------------------------------------------------------------
void BlockAllocator::DestroyInstance()
{
	if (gBlockAllocator != nullptr)
	{
		delete gBlockAllocator;
		gBlockAllocator = nullptr;
	}
}

//------------------------------------------------------------------------------------------------------------------------------
BlockAllocator* BlockAllocator::GetInstance()
{
	if (gBlockAllocator == nullptr)
	{
		CreateInstance();
	}

	return gBlockAllocator;
}

//------------------------------------------------------------------------------------------------------------------------------
bool BlockAllocator::AllocateChunk()
{
	//We can't allocate a chunk if we don't have a sub allocator
	if (m_base == nullptr) 
	{
		return false;
	}

	if (m_chunkLock.try_lock()) 
	{
		//Allocate a chunk of memory if the base allocator is able to 
		size_t chunkSize = m_blocksPerChunk * m_blockSize + sizeof(Block_T);

		Chunck_T* chunk = (Chunck_T*)m_base->Allocate(chunkSize);
		if (chunk == nullptr) 
		{
			return false;
		}

		//Track this chunk so we can free this later
		chunk->next = m_chunkList;
		m_chunkList = chunk;

		//Break up newly allocated chunk
		BreakUpChunk(chunk + 1);
		m_chunkLock.unlock();
	}

	return true;
}

//------------------------------------------------------------------------------------------------------------------------------
void BlockAllocator::BreakUpChunk(void* buffer)
{
	byte* buf = (byte*)buffer;
	Block_T* first = (Block_T*)buf;
	Block_T* head = nullptr;

	for (uint i = 0; i < m_blocksPerChunk; ++i) 
	{
		Block_T* node = (Block_T*)buf;
		buf += m_blockSize;

		node->next = head;
		head = node;
	}

	{
		//Lock so we are thread safe
		std::scoped_lock lock(m_blockLock);
		
		first->next = m_freeBlocks;
		m_freeBlocks = head;
	}
}

//------------------------------------------------------------------------------------------------------------------------------
void BlockAllocator::PushFreeBlock(Block_T* block)
{
	std::scoped_lock blockLock(m_blockLock);

	block->next = m_freeBlocks;
	m_freeBlocks = block;
}

//------------------------------------------------------------------------------------------------------------------------------
Block_T* BlockAllocator::PopFreeBlock()
{
	std::scoped_lock blockLock(m_blockLock);

	Block_T* head = m_freeBlocks;
	if (head != nullptr) 
	{
		m_freeBlocks = head->next;
	}

	return head;
}

The block allocator also inherits from internal allocator and performs tracking on the memory allocated.

Object Allocator

This is a templated allocator that allocates memory of the size of the required object. The implementation inherits from the Block Allocator and performs allocations of size T where T represents the type of the object for which memory is to be allocated.

template <typename OBJ>
class ObjectAllocator : private BlockAllocator
{
public:
	void Initialize(InternalAllocator* parent, uint blocksPerChunk)
	{
		return BlockAllocator::Initialize(parent, sizeof(OBJ), alignof(OBJ), blocksPerChunk);
	}

	void Deinitialize()
	{
		BlockAllocator::Deinitialize();
	}

	virtual void* Allocate(size_t size)
	{
		return BlockAllocator::Allocate(size);
	}

	virtual void* Free(void* ptr)
	{
		return BlockAllocator::Free(ptr);
	}

	template <typename ...ARGS>
	OBJ* Create(ARGS ...args)
	{
		void* mem = Allocate(sizeof(OBJ));
		if (mem != nullptr)
		{
			return new(mem) OBJ(std::forward(args)...);
		}
		else
		{
			return nullptr;
		}
	}

	void Destroy(OBJ* object)
	{
		Free(object);
	}
};

These are the different allocators used in Prodigy Engine but the memory tracking is achieved by overloading the new and delete operators.

Memory Tracking and Call Stacks

To track memory and call stacks, the Prodigy Engine uses a simple memory tracker that overloads operators new and delete and keeps some global values that store the number of allocations, allocation sizes and amount of memory freed for the project.

Below is the implementation of the MemTracker in Prodigy Engine:

struct MemTrackInfo_T
{
	void* m_originalPointer;
	size_t m_byteSize;
	Callstack m_callstack;
};

struct LogTrackInfo_T
{
	uint m_numAllocations;
	size_t m_allocationSizeInBytes;
	Callstack m_callstack;
};

// Human parse able data
std::string			GetSizeString(size_t byte_count);

void*				UntrackedAlloc(size_t byte_count);
void				UntrackedFree(void* ptr);

void*				TrackedAlloc(size_t byte_count);
void				TrackedFree(void* ptr);

void				TrackAllocation(void* allocation, size_t byte_count);
void				UntrackAllocation(void* allocation);

size_t				MemTrackGetLiveAllocationCount();
size_t				MemTrackGetLiveByteCount();
void				MemTrackLogLiveAllocations();

//------------------------------------------------------------------------------------------------------------------------------
void* operator new(size_t size)
{
	return TrackedAlloc(size);
}

//------------------------------------------------------------------------------------------------------------------------------
void operator delete(void* ptr)
{
	TrackedFree(ptr);
}

//------------------------------------------------------------------------------------------------------------------------------
void* operator new(size_t size, InternalAllocator& allocator)
{
	return allocator.Allocate(size);
}

//------------------------------------------------------------------------------------------------------------------------------
void operator delete(void* ptr, InternalAllocator& allocator)
{
	return allocator.Free(ptr);
}

//------------------------------------------------------------------------------------------------------------------------------
void* UntrackedAlloc(size_t size)
{
	return ::malloc(size);
}

//------------------------------------------------------------------------------------------------------------------------------
void UntrackedFree(void* ptr)
{
	return ::free(ptr);
}

//------------------------------------------------------------------------------------------------------------------------------
void* TrackedAlloc(size_t byte_count)
{
	// One suggestion and example on how to break up this function
	// based on build config; 
#if !defined(MEM_TRACKING)
	return UntrackedAlloc(byte_count);
#else
	#if (MEM_TRACKING == MEM_TRACK_ALLOC_COUNT)
		++gTotalAllocations;
		++tTotalAllocations;

		void* allocation = ::malloc(byte_count);
		return allocation;
	#elif (MEM_TRACKING == MEM_TRACK_VERBOSE)
		++gTotalAllocations;
		gTotalBytesAllocated += byte_count;

		++tTotalAllocations;
		tTotalBytesAllocated += byte_count;

		void* allocation = ::malloc(byte_count);
		TrackAllocation(allocation, byte_count);
		return allocation;
	#endif
#endif
}

//------------------------------------------------------------------------
void TrackedFree(void* ptr)
{
#if (MEM_TRACKING == MEM_TRACK_ALLOC_COUNT)
	if (ptr == nullptr)
		return;

	--gTotalAllocations;
	++tTotalFrees;
	return ::free(ptr);
#elif (MEM_TRACKING == MEM_TRACK_VERBOSE)
	--gTotalAllocations;

	++tTotalFrees;

	UntrackAllocation(ptr);
#endif
}

//------------------------------------------------------------------------
void TrackAllocation(void* allocation, size_t byte_count)
{
#if (MEM_TRACKING == MEM_TRACK_VERBOSE)
	Callstack callstack = CallstackGet();
	
	MemTrackInfo_T info;

	info.m_byteSize = byte_count;
	info.m_callstack = callstack;
	info.m_originalPointer = allocation;
	{
		std::scoped_lock lock(GetMemTrackerLock());
		GetMemTrakingMap()[allocation] = info;
	}
#endif
}

//------------------------------------------------------------------------
void UntrackAllocation(void* allocation)
{
	{
		std::scoped_lock lock(GetMemTrackerLock());
		auto mapIterator = GetMemTrakingMap().find(allocation);
		::free(allocation);
		if (mapIterator != GetMemTrakingMap().end())
		{
			gTotalBytesAllocated -= mapIterator->second.m_byteSize;
			tTotalBytesFreed += mapIterator->second.m_byteSize;
			GetMemTrakingMap().erase(mapIterator);
		}
	}
}

This is accompanied by a CallStack class that shows the call stack for the allocation requested:

#define MAX_TRACE 64
#define TRACE_MAX_FUNCTION_NAME_LENGTH 1024

typedef unsigned int uint;

class Callstack
{
public:
	void* m_trace[MAX_TRACE];	// execution pointers representing where we are in code; 
	uint m_depth = 0;			
	unsigned long m_hash = 0;
};

Callstack CallstackGet(uint skip_frames = 0);
// Convert a callstack to strings
std::vector<std::string> GetCallstackToString(Callstack const& cs);

//------------------------------------------------------------------------------------------------------------------------------
Callstack CallstackGet(uint skip_frames /*= 0*/)
{
	Callstack stackTraceObject;	
	stackTraceObject.m_depth = CaptureStackBackTrace(skip_frames, MAX_TRACE, stackTraceObject.m_trace, &stackTraceObject.m_hash);

	return stackTraceObject;
}

//------------------------------------------------------------------------------------------------------------------------------
std::vector<std::string> GetCallstackToString(Callstack const& callStack)
{
	HANDLE pHandle = GetCurrentProcess();

	SymInitialize(pHandle, nullptr, true);
	std::vector<std::string> callStackStrings;
	
	PIMAGEHLP_SYMBOL64 symbol = (PIMAGEHLP_SYMBOL64)malloc(sizeof(PIMAGEHLP_SYMBOL64) + (TRACE_MAX_FUNCTION_NAME_LENGTH - 1) * sizeof(TCHAR));
	memset(symbol, 0, sizeof(PIMAGEHLP_SYMBOL64) + (TRACE_MAX_FUNCTION_NAME_LENGTH - 1) * sizeof(TCHAR));
	symbol->MaxNameLength = TRACE_MAX_FUNCTION_NAME_LENGTH;
	symbol->SizeOfStruct = sizeof(PIMAGEHLP_SYMBOL64);
	PDWORD64 displacement = 0;
	IMAGEHLP_LINE64 *line = (IMAGEHLP_LINE64 *)malloc(sizeof(IMAGEHLP_LINE64));
	memset(line, 0, sizeof(IMAGEHLP_LINE64));
	line->SizeOfStruct = sizeof(IMAGEHLP_LINE64);

	for (uint stackTraceIndex = 0; stackTraceIndex < callStack.m_depth - 2; ++stackTraceIndex)
	{
		bool symResult = SymGetSymFromAddr64(pHandle, (DWORD64)callStack.m_trace[stackTraceIndex], displacement, (PIMAGEHLP_SYMBOL64)symbol);
		if (symResult)
		{
			DWORD pdwDisplacement = 0;
			bool lineResult = SymGetLineFromAddr64(pHandle, (DWORD64)callStack.m_trace[stackTraceIndex], &pdwDisplacement, line);
			if (lineResult)
			{
				DebuggerPrintf("%s(%d):%s \n", line->FileName, line->LineNumber, symbol->Name);
				callStackStrings.push_back(line->FileName);
			}
			else
			{
				DebuggerPrintf("Error from SymGetSymFromAddr64: %lu.\n", GetLastError());
			}
		}
		else
		{
			DebuggerPrintf("Error from SymFromAddr: %lu.\n", GetLastError());
		}
	}

	return callStackStrings;
}

With this, Prodigy Engine can perform tracking and use this data when required in the Profiler.