Author Topic: code review #1  (Read 7400 times)

Offline djemon_lda

code review #1
« on: October 13, 2013, 03:18:49 pm »
OK, so yesterday in EU4 I got a little too greedy: I kicked the shit out of a neighbouring country and its allies and took a little too much from them (including vassalizing one of them by force and taking a bit of land from another), so they formed a coalition against me. To recover before yet another war, I turned the game off and decided I needed to relax in a different way.

I'm aiming to fix the reaction-fire mechanics, so I wanted to use a priority queue, but then I realized that a regular priority queue doesn't meet all of my needs in my project, so it likely wouldn't fit yours here either. Hence I got into the fun of building a nice heap that meets most of my needs (I will still need to extend it a little for my project) and sharing it.

I've used __forceinline where I wanted to be totally sure the compiler will inline the code. I also tried to lift things out where possible, which is why I made static every method that doesn't use a field or call a non-static method; not passing the "this" pointer saves time. That way I got both: speed, as if all of it were typed directly into the calling code, and readability. Thanks to this I could do things like bitwise shifts instead of multiplying/dividing by 2, because they are hidden behind meaningful names that describe my intentions.

The reason I did this in three ways (defining a method outside the class, inside the class, and inside the class with __forceinline) is that I wanted, respectively: to be sure the compiler will generate a function call, to suggest that a method can be inlined if the compiler sees that as an optimization, and to make sure that a piece of code will be inlined.
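
To make that concrete, here is a minimal sketch of the three styles (the class and method names are made up for illustration; they are not from the heap below):

Code: [Select]
struct Example
{
    // 1) declared here, defined outside the class: the compiler will normally emit a real call
    int DefinedOutside(int x) const;

    // 2) defined inside the class: implicitly inline, a hint that inlining is welcome
    int DefinedInside(int x) const { return x + _offset; }

    // 3) defined inside and marked __forceinline: insist that the body gets inlined;
    //    static because it touches no fields, and the shift hides "multiply by 2" behind a name
    __forceinline static int DoubleOf(int x) { return x << 1; }

private:
    int _offset;
};

int Example::DefinedOutside(int x) const
{
    return x - _offset;
}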

I know you are compiling on different platforms, so this should do the trick :)

Code: [Select]
// MSVC already has __forceinline as a keyword, so only define the fallback elsewhere
#ifndef _MSC_VER
#define __forceinline inline
#endif

so here it is:


Code: [Select]
template<
typename DataType,
typename HeapIndexer = DirectIndexer,
typename HeapRelation = std::greater_equal<DataType>,
template <typename> class IndexParameterDefinition = PassByValue,
template <typename> class DataParameterDefinition = PassByValue
>
class Heap
{
private:
static const size_t TopIndex = 1;
public:
typedef typename IndexParameterDefinition<typename HeapIndexer::IndexingType>::PassedValue Index;
typedef typename DataParameterDefinition<DataType>::PassedValue Data;
Heap(const std::vector<DataType> data);

__forceinline const size_t& Size()const
{
return _heapSize;
}

const DataType& operator[](const Index indexToBrowseWith)const;

const DataType& Top()const
{
return _heapData[TopIndex];
}
void SetValue(const Index index, Data dataType);
__forceinline void Pop()
{
RemoveInternal(TopIndex);
HeapDown(TopIndex);
}
void Remove(const Index index)
{
const size_t indexToRemoveAt = _indexer.ToRawIndex(index);
RemoveInternal(indexToRemoveAt);
HeapifyInternal(indexToRemoveAt);
}
const bool IsEmpty()const
{
return _heapSize == 0;
}
void Clear();
private:

void HeapifyInternal(const size_t index);
void HeapDown(const size_t index);
size_t GetIndexOfHighestInTercet(const size_t index)const;
void HeapUp(const size_t index);
void Swap(const size_t leftIndex, const size_t rightIndex);
void SetMockOnZeroIndex();
void RemoveInternal(const size_t index)
{
Swap(index, _heapSize);
_heapSize--;
}
bool ShouldLeftsideBeHigherInHeap(const size_t leftsideIndex, const size_t rightsideIndex)const
{
return _shouldLeftGoUp(_heapData[leftsideIndex],_heapData[rightsideIndex]);
}
bool NeedsUpHeaping(const size_t index)const
{
const size_t parent = GetParent(index);
// if index == 1, then parent will be equal to 0. this is the only situation where an index doesn't have a legal parent.
return parent && _shouldLeftGoUp(_heapData[index],_heapData[parent]);
}

__forceinline static size_t GetLeftChild(const size_t index)
{
return index << 1; // multiply by 2
}
__forceinline static size_t GetRightChild(const size_t index)
{
return GetLeftChild(index) + 1; // GetLeftChild(...) is forced inline, so no overhead.
}
__forceinline static size_t GetParent(const size_t index)
{
return index >> 1; // divide by 2
}


__forceinline bool IsValidChild(const size_t index)const
{
return index <= _heapSize;
}

// an assumption that allows the optimization:
// the index of the parent given to this method is always acquired
// by using the GetParent method on a legal index inside the heap.
__forceinline static bool IsParentOnHeap(const size_t indexOfAParent)
{
// the index is unsigned, so 'indexOfAParent' yields the same bool value as 'indexOfAParent > 0'
return indexOfAParent;
}

std::vector<DataType> _heapData;
HeapIndexer _indexer;
HeapRelation _shouldLeftGoUp;
size_t _heapSize;
};

template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
const DataType& Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::operator[](const Index indexToBrowseWith)const
{
return _heapData[ _indexer.ToRawIndex(indexToBrowseWith) ];
}
template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
void Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::Swap(const size_t leftIndex, const size_t rightIndex)
{
std::swap(_heapData[leftIndex], _heapData[rightIndex]);
_indexer.Swap(leftIndex, rightIndex);
}
template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
void Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::HeapUp(const size_t index)
{
size_t reciever = GetParent(index);
size_t donor = index;
while(IsParentOnHeap(reciever) && ShouldLeftsideBeHigherInHeap(donor, reciever)) // a small optimization
{
Swap(reciever, donor);
donor = reciever;
reciever = GetParent(reciever);
}
}

template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
size_t Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::GetIndexOfHighestInTercet(const size_t index)const
{
size_t higherItemIndex = index;
const size_t leftChild = GetLeftChild(index);
const size_t rightChild = GetRightChild(index);
if(IsValidChild(leftChild) && ShouldLeftsideBeHigherInHeap(leftChild, higherItemIndex))
{
higherItemIndex = leftChild;
}
if(IsValidChild(rightChild) && ShouldLeftsideBeHigherInHeap(rightChild, higherItemIndex))
{
higherItemIndex = rightChild;
}
return higherItemIndex;
}

template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
void Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::HeapifyInternal(const size_t index)
{
if(NeedsUpHeaping(index))
{
HeapUp(index);
}else
{
HeapDown(index);
}
}

template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
void Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::HeapDown(const size_t index)
{
size_t donor = index;
size_t reciever = GetIndexOfHighestInTercet(donor);
while(donor != reciever)
{
Swap(donor,reciever);
donor = reciever;
reciever = GetIndexOfHighestInTercet(reciever);
}
}
template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
void Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::SetValue(const Index index, Data dataType)
{
const size_t internalIndex = _indexer.ToRawIndex(index);
_heapData[internalIndex] = dataType;
HeapifyInternal(internalIndex);
}
template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::Heap(const std::vector<DataType> data)
: _heapSize(data.size())
{
// add a mock value at index 0
SetMockOnZeroIndex();
for(auto currentDataItem = data.begin(); currentDataItem != data.end(); ++currentDataItem)
{
_heapData.push_back(*currentDataItem);
}
const size_t halfOfHeap = _heapSize / 2;
for(size_t currentIndex = halfOfHeap; currentIndex >= TopIndex; --currentIndex)
{
HeapDown(currentIndex);
}
}
template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
void Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::Clear()
{
_heapData.clear();
_heapSize = 0;
_indexer.Clear();
SetMockOnZeroIndex();

}

template<
typename DataType,
typename HeapIndexer,
typename HeapRelation,
template <typename> class IndexParameterDefinition,
template <typename> class DataParameterDefinition
>
void Heap<DataType, HeapIndexer, HeapRelation,IndexParameterDefinition,DataParameterDefinition>::SetMockOnZeroIndex()
{
_indexer.Add (Index(), 0);
_heapData.push_back(DataType());
}

I've written two simple heap indexers:
the default one, which allows direct indexing and takes just one byte of memory :)
Code: [Select]
struct DirectIndexer
{
typedef size_t IndexingType;
__forceinline size_t ToRawIndex(const size_t& indirectIndex)const
{
return indirectIndex;
}
// no need to handle mapping in a direct indexer :)
// calls to these methods should be removed from the binary when compiled with -O2
void Add(const size_t, const size_t){}
void Swap(const size_t, const size_t){}
void Clear(){}
};
a templated mapped indexer (for example, to be able to index the things in the heap with strings, points or whatever you may wish) - this one weighs 40 bytes
Code: [Select]
template<typename IndexType>
struct Indexer
{
typedef IndexType IndexingType;
__forceinline size_t ToRawIndex(const IndexType& indirectIndex)
{
return _indexToInternalMap[indirectIndex];
}
void Add(const IndexType& indirectIndex, const size_t directIndex)
{
_indexToInternalMap[indirectIndex] = directIndex;
_internalToIndexMap.push_back( indirectIndex );
}
void Swap(const size_t leftDirectIndex, const size_t rightDirectIndex)
{
const IndexType& leftIndirectIndex = _internalToIndexMap[leftDirectIndex];
const IndexType& rightIndirectIndex = _internalToIndexMap[rightDirectIndex];
std::swap(_internalToIndexMap[leftDirectIndex], _internalToIndexMap[rightDirectIndex]);
std::swap(_indexToInternalMap[leftIndirectIndex], _indexToInternalMap[rightIndirectIndex]);
}
void Clear()
{
_internalToIndexMap.clear();
_indexToInternalMap.clear();
}
private:
std::map<IndexType,size_t> _indexToInternalMap;
std::vector<IndexType> _internalToIndexMap;
};

I've also given the option to choose how parameters are passed to the public methods, whether by value or by reference, thanks to these two guys:
Code: [Select]
template<typename T>
struct PassByValue
{
typedef T PassedValue;
};
Code: [Select]
template<typename T>
struct PassByReference
{
typedef T& PassedValue;
};
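
For example (a quick sketch; the alias name is made up), a heap of strings indexed by string keys, where the data is handed to the public methods by reference instead of by value:

Code: [Select]
// the index slot stays PassByValue here, only the data is passed by reference
typedef Heap<
    std::string,                      // DataType: the values stored in the heap
    Indexer<std::string>,             // HeapIndexer: maps external string keys to raw positions
    std::greater_equal<std::string>,  // HeapRelation: max-heap ordering
    PassByValue,                      // how the Index is passed to SetValue/Remove/operator[]
    PassByReference                   // how the Data is passed to SetValue
> StringKeyedMaxHeap;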
Finally, I've made simplified templates, because mostly what changes is the type of the data held and the type of the index, so I've made these four guys, all with the default way of passing values to the public methods and with the relation set according to their names.

Code: [Select]
template<typename DataType>
class MaxHeap : public Heap<DataType>{};
Code: [Select]
template<typename DataType>
class MinHeap : public Heap<DataType,DirectIndexer,std::less_equal<DataType>>
{
public:
MinHeap(const std::vector<DataType> data) : Heap(data){}
};
Code: [Select]
template<typename DataType, typename IndexType>
class IndexedMaxHeap : public Heap<DataType, Indexer<IndexType>>{};
Code: [Select]
template<typename DataType, typename IndexType>
class IndexedMinHeap : public Heap<DataType, Indexer<IndexType>, std::less_equal<DataType>>{};
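
Roughly how I expect the simple ones to be used (a small usage sketch; the variable names are made up):

Code: [Select]
std::vector<int> reactionScores;
reactionScores.push_back(7);
reactionScores.push_back(3);
reactionScores.push_back(11);

MinHeap<int> queue(reactionScores); // the constructor heapifies the data

while(!queue.IsEmpty())
{
    const int lowest = queue.Top(); // smallest value first, because of std::less_equal
    // ... act on 'lowest' ...
    queue.Pop();
}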


I've run a lot of automated, randomized tests. I hit especially hard the part that removes items from an arbitrary index and exercised that mechanism over 500 million times, on small and medium heap sizes (<50 and 10000 elements respectively) and with higher or lower duplicate-value ratios (by capping values at 50 or at 32 thousand, the max rand() value). In total there were over 1,000,000,000 tests of all the heap's mechanisms.

The method I used to check that the heap property is preserved at all times is:

Code: [Select]
template<typename T>
bool IsChildInRightRelationsWithParent(const T& child, const T& parent)
{
#ifdef _MIN_HEAP_
return child >= parent;
#else
return child <= parent;
#endif
}

template<typename T>
bool RecurrentCheck(const T& heap, const size_t index)
{
const size_t size = heap.Size();
const size_t leftChild = index * 2;
const size_t rightChild = leftChild + 1;

const bool hasLeftChild = leftChild <= size;
const bool hasRightChild = rightChild <= size;

const bool leftChildOK = (hasLeftChild && IsChildInRightRelationsWithParent(heap[leftChild] , heap[index])) || !hasLeftChild;
const bool rightChildOK = ((hasRightChild) && IsChildInRightRelationsWithParent(heap[rightChild] , heap[index])) || !hasRightChild;

const bool rightIsOk = (!hasRightChild) || RecurrentCheck(heap, rightChild);
const bool leftIsOk = (!hasLeftChild) || RecurrentCheck(heap, leftChild);
const bool isEverythingAllRight = leftChildOK && rightChildOK && leftIsOk && rightIsOk;
return isEverythingAllRight;
}

template<typename T>
bool CheckHeap(const T& heap)
{
return RecurrentCheck(heap, 1);
}

I've used a trick here that avoids branching in the recursive check function.
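
For completeness, the randomized driver boils down to something like this (a simplified sketch of the idea, not the exact harness; _MIN_HEAP_ has to be defined to match the heap being checked):

Code: [Select]
#include <cassert>
#include <cstdlib>
#include <vector>

// MinHeap and CheckHeap come from the code above
void RandomizedRemoveTest(const size_t elementCount, const int valueCap)
{
    std::vector<int> values;
    for(size_t i = 0; i < elementCount; ++i)
    {
        values.push_back(rand() % valueCap); // a low cap produces lots of duplicates
    }

    MinHeap<int> heap(values);
    assert(CheckHeap(heap));

    while(!heap.IsEmpty())
    {
        heap.Remove(1 + rand() % heap.Size()); // remove from an arbitrary position
        assert(CheckHeap(heap));
    }
}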

If something looks suspicious, or like an error, post about it and I will verify and explain the reasoning behind it.
regards,
djemon_lda

Offline Yankes

Re: code review #1
« Reply #1 on: October 13, 2013, 05:40:46 pm »
Quote
I've used __forceinline where I wanted to be totally sure the compiler will inline the code. I also tried to lift things out where possible, which is why I made static every method that doesn't use a field or call a non-static method; not passing the "this" pointer saves time. That way I got both: speed, as if all of it were typed directly into the calling code, and readability. Thanks to this I could do things like bitwise shifts instead of multiplying/dividing by 2, because they are hidden behind meaningful names that describe my intentions.
Isn't inline enough? In g++ I see a difference only in debug builds, where speed is not important (I once got a 20x speedup after optimization :)).
Did you encounter performance degradation in a release build without this keyword?

Offline myk002

Re: code review #1
« Reply #2 on: October 13, 2013, 09:00:27 pm »
Yeah -- The compiler is almost always better at deciding when inline (or __inline or any of the other variants) should be used, even when it is not specified in the code at all.  What are the numbers when you run your tests with and without any inlining (at -O2)?

Just so I understand the context, too, what are the requirements for this class?  How are they different from those for std::priority_queue?  Which use cases are you optimizing for?

Offline djemon_lda

Re: code review #1
« Reply #3 on: October 17, 2013, 02:39:28 am »
Well, if you are measuring time before and after a big number of operations by just running it on your computer, you're doing it wrong. The information you get is biased by every other open application in your system that uses your cache, steals processor time from your test run, and so on. It is a lot more fruitful to compare the generated assembly.

Quote
Isn't inline enough?
As explained above - certainty versus hoping the compiler does what I need. I don't want to waste cache on the definition of that method.

Quote
Yeah -- The compiler is almost always better at deciding when inline
Yeah -- general slogans are often moot, just like the "premature optimization" one, for instance. Cache optimizations, for example, are something a compiler will basically never do for you, while they are very rewarding. The same goes for things like precise placement of memory barriers when you declare a variable volatile: even if your volatile is respected by the compiler, you still have no clue whether it will merely refrain from statement reordering at the compiler level, or whether it will also stop the CPU from reordering it.


Quote
what are the requirements for this class
You would definitely have noticed that if you had looked into the code. The heap I've made is indexed for external usage: I wanted the ability to change priorities online, and to remove arbitrary priorities from the heap.


What is rather disappointing is that no one has taken the effort to read the code. The heap is missing one simple detail, and I'm sure anyone taking a brief look would notice it. Well, that indicates something about the readers after all.

Offline Yankes

Re: code review #1
« Reply #4 on: October 17, 2013, 10:27:25 pm »
Quote
Well, if you are measuring time before and after a big number of operations by just running it on your computer, you're doing it wrong. The information you get is biased by every other open application in your system that uses your cache, steals processor time from your test run, and so on. It is a lot more fruitful to compare the generated assembly.
I doubt that comparing 20s and 1s can be biased, especially if you repeat the test and the results are similar; then you can draw some conclusions.
Of course, that wasn't a case of the measured code being optimized away entirely, which can happen in a naive test.
Quote
As explained above - certainty versus hoping the compiler does what I need. I don't want to waste cache on the definition of that method.
What if there were multiple calls to this function? Depending on how many of those calls were on the hot/cold path, you could lose more cache to the inlined copies than to the function calls plus one function definition. Without profiling on real data you can't answer that. Sometimes it can be irrelevant because the hot spot is in a different place.

Offline djemon_lda

Re: code review #1
« Reply #5 on: October 18, 2013, 11:54:13 am »
@Yankes: sorry, I didn't express myself clearly: what I meant was that the measurement in my case would be very biased, even if it were done many times, because the difference between the variants (no inlining, inline, __forceinline) might be hard to see in such tests.

In final optimization it is always done in context - then you choose between optimizing for code size and optimizing for speed (and sometimes size-optimized code is faster). I have also been told (not tested empirically) that in some game-dev studios the whole code of the final build is merged into one big monster by a script and then compiled, in the hope that the compiler sees some specific code relations it can then optimize.

Also, as I understood it (correct me if I'm wrong), that was no optimization vs -O2, which is a different story from just deciding whether to inline or not. Of the (code-level, not algorithmic) optimizations that sound crazy at first, I mostly like cache optimizations and things like using aligned floats for faster floating-point computation. I wasn't barbaric enough to write asm blocks for anything other than hooking into functions :P

Quote
What if there were multiple calls to this function?
That is a valid question :)
The __forceinline-marked methods get inlined into methods that are always invoked as real calls, because I marked with __forceinline mostly the methods used by non-inlined methods. In assembly, code like this:

Code: [Select]
int something = 0; // placeholder value

__forceinline int B() { return something; }

void A()
{
// stuff...
B();
// other stuff...
}

int main()
{
A();
A();
A();
}
would look something like this:

Code: [Select]
stuff in asm
B in asm
other stuff in asm

entry point:
call A in asm
call A in asm
call A in asm

So this is why I am not afraid about the cache in the case of my force-inlined methods: they are really simple, and private (not used outside the heap class).

Thanks for your questions. I'm also waiting for something related to the code itself - and for someone to point out what's missing from it :P

Offline Yankes

Re: code review #1
« Reply #6 on: October 18, 2013, 10:51:11 pm »
OK, I see your point. I think this is a kind of style, like proper indentation: it doesn't change the meaning of the code, but I hate working with code that doesn't use it.

Offline djemon_lda

Re: code review #1
« Reply #7 on: October 19, 2013, 03:49:10 pm »
Yeah, you could put it precisely that way :)

I just prefer to have things like index << 2 hidden behind a force-inlined method that fully expresses my intentions through its name.