Author Topic: Random generator skewed  (Read 3555 times)

Offline Alpha Centauri Bear

  • Colonel
  • ****
  • Posts: 466
    • View Profile
Random generator skewed
« on: October 17, 2022, 09:53:42 pm »
Hello, fellow player and programmers.

I have been trying to optimize dogfight parameters for better game experience and stumbled across strange statistical skew. At low accuracy numbers craft weapon seems to miss more than expected.

Here is my test. I modified parameters so that I need exactly 100 bullets to down an USO (weapon damage and accuracy against USO damage capacity). I also set USO size to 2 to avoid size accuracy modifier for simplicity sake. This is not that much of statistics but I assumed with 100 tries the standard deviation of about 10 should ensure that most of results would fall into 3 sigma interval (30 in either direction). Pretty crude but I can reproduce multiple tries as necessary or anyone else can check it out too.

Here are results for different accuracy. Sometimes measured twice on different sets of parameters to add to variability.
accuracycount1count2
100106
8096
7094
60109105
50105
40124114
30125
20148135
10142130
5145
4147

As you can see, accuracy of 50% and above is within statistically acceptable variance. Whereas going down below it becomes more and more skewed toward more bullet needed. These results are way beyond 3-sigma variance. So I conclude it is definitely skewed.

I checked the coded and didn't find anything outright wrong.

This function, though, feels kind of fishy for me. It generates 64 byte long then convert it to double then divide this double to the (max long as double divided by interval). That may result in quite significant rounding issues.
https://github.com/OpenXcom/OpenXcom/blob/e541af4923fdf99b36b8e90dbbc1482aa73458b3/src/Engine/RNG.cpp#L90

Offline Alpha Centauri Bear

  • Colonel
  • ****
  • Posts: 466
    • View Profile
Re: Random generator skewed
« Reply #1 on: October 17, 2022, 11:12:27 pm »
If not much trouble, RNG should we rewritten in more accurate manner. I can volunteer for code modification if needed.

Offline Meridian

  • Global Moderator
  • Commander
  • *****
  • Posts: 9098
    • View Profile
Re: Random generator skewed
« Reply #2 on: October 17, 2022, 11:20:56 pm »
The code you mentioned is not even used in dogfight.

Before we need to rewrite it in a more accurate manner, you'll need to provide more rigorous evidence it's skewed.

Offline Alpha Centauri Bear

  • Colonel
  • ****
  • Posts: 466
    • View Profile
Re: Random generator skewed
« Reply #3 on: October 18, 2022, 05:23:19 am »
Agree.

However, that is the problem with random generators. They are random and require tons of statistics to prove any skewness. Besides, it could be not even randomness but something else.
For example, RNG uses seed that seems to be saved. So if I replay same fight over and over again, I get the same exactly results. Need to understand how to make it more randomized. Any idea if debug mode may disable or override current seed?

You are right about the code. This one is used. I confused types.
https://github.com/OpenXcom/OpenXcom/blob/e541af4923fdf99b36b8e90dbbc1482aa73458b3/src/Engine/RNG.cpp#L78



Here is a little bit more rigorous test. Not sure if rigorous enough, though. I modified parameters so that each bullet carries 0-1 damage, 5% accuracy and corresponding USO damage capacity so it should take 1000 bullets on average to down it. That should bring the standard deviation of binomial distribution to:
Code: [Select]
sigma = sqrt(1000 * 0.05 * 0.95) = 7
So we should expect 99% of the results to fall within of 7*3 = 20 of average or in the 980 - 1020 range.

I started new game and just waited for USOs one by one. Never repeated a test with same one to make sure I am not falling to repeated seed problem. Here are 5 observations;

accuracy = 5%
0823
0824
0726
1052
1044

Well, I stand corrected. There is no systematic skew to specifically reduce accuracy for low values. However, there is still extremely high spread way beyond 3 sigmas. Let me do more observations.
« Last Edit: October 18, 2022, 06:03:40 am by Alpha Centauri Bear »

Offline Meridian

  • Global Moderator
  • Commander
  • *****
  • Posts: 9098
    • View Profile
Re: Random generator skewed
« Reply #4 on: October 18, 2022, 10:52:36 am »
I'll add a small test with 4% RNG.

Code:

Code: [Select]
int sample_size = 100;
int hits = 0;
for (int tries = 0; tries < sample_size; tries++)
{
if (RNG::percent(4)) hits++;
}

Results with different sample sizes:

sample_size = 100; hits = 2 = 2.00%
sample_size = 1000; hits = 35 = 3.50%
sample_size = 10000; hits = 401 = 4.01%
sample_size = 100000; hits = 3927 = 3.92%
sample_size = 1000000; hits = 40054 = 4.01%
sample_size = 10000000; hits = 399864 = 4.00%
sample_size = 100000000; hits = 4001746 = 4.00%

Offline Alpha Centauri Bear

  • Colonel
  • ****
  • Posts: 466
    • View Profile
Re: Random generator skewed
« Reply #5 on: October 18, 2022, 03:36:59 pm »
Thank you for that, M. That was exactly what I would do to test it. This seems to be completely fine and I must just be using too small sample size to account for these deviation.

Let me close this question.

Offline Empiro

  • Captain
  • ***
  • Posts: 88
    • View Profile
Re: Random generator skewed
« Reply #6 on: September 30, 2023, 02:24:13 am »
Most random number generators work well, but it's the use of the random numbers that may be problematic. I remember a while back I took a look at the code, and noticed at least one or two places where there were off-by-one errors when it came to generating alien missions (the random number should have been [0, x - 1], but the game used [1, x]), slightly biasing larger values. I don't know if that has since been fixed.

Offline Meridian

  • Global Moderator
  • Commander
  • *****
  • Posts: 9098
    • View Profile
Re: Random generator skewed
« Reply #7 on: September 30, 2023, 10:36:47 am »
I'm not aware of any such issue, present or past.

Just guessing you might be referencing the WeightedOptions, in which case it is intentional and correct.

Offline Empiro

  • Captain
  • ***
  • Posts: 88
    • View Profile
Re: Random generator skewed
« Reply #8 on: October 04, 2023, 12:27:27 am »
I took another look at the code. If it is intentional, then it could be misleading:

Quote
/**
 * Generates a random integer number, inclusive.
 */
int RandomState::generate(int min, int max)
{
   return (int)(next() % (max - min + 1) + min);
}

Within WeightOptions, the selection code is:
Quote
    size_t var = RNG::generate(0, _totalWeight);
   auto ii = _choices.begin();
   for (; ii != _choices.end(); ++ii)
   {
      if (var <= ii->second)
         break;
      var -= ii->second;
   }

If we take a super simple example, with exactly 3 options each with weight 1, then _totalWeight is 3, but var can be any one of 4 values:

0, 1, 2, 3

Both 0 and 1 would return the 0th option, then 2 returns the next option, and 3 returns the final option.

It's a slight bias toward the 0th option and probably would go unnoticed in most X-Com games, since the mission weights tend to total to 100. I can see mods that use small sets of options with small weights being noticeably biased, however.

It might be worthwhile to do a bigger refactor and change generate() to return a non-inclusive end (by removing the +1 in generate()). Most random number generators behave in that manner because it eliminates a lot of +1s and -1s scattered through the code to make things work. For example, percent() would call generate(0, 100), and end - start always gives you the number of distinct possible values for generate.

Offline Yankes

  • Global Moderator
  • Commander
  • *****
  • Posts: 3350
    • View Profile
Re: Random generator skewed
« Reply #9 on: October 04, 2023, 01:21:04 am »
No, `RNG::generate` work correctly, and changing it would break any other code that use it.
Only correct solution is fix only this usage of random as it should return values `0,1,2` or `1,2,3`.
This mean more correct would be `RNG::generate(1, _totalWeight);` here if `_totalWeight > 0`

Offline Empiro

  • Captain
  • ***
  • Posts: 88
    • View Profile
Re: Random generator skewed
« Reply #10 on: October 04, 2023, 07:36:43 am »
No, `RNG::generate` work correctly, and changing it would break any other code that use it.
Only correct solution is fix only this usage of random as it should return values `0,1,2` or `1,2,3`.
This mean more correct would be `RNG::generate(1, _totalWeight);` here if `_totalWeight > 0`

Yes, your suggestion would definitely fix it here.

However, from professional experience, a random number function that generates inclusive end-bounds is prone to bugs and is generally more inconvenient to use. This is due to often needing to specify -1 or starting from 1 (0-based indexing is more common in languages, and mixing both leads to bugs). I mentioned a refactor, so we definitely would need to go to each call and fixing it if need be. I wouldn't be surprised if other off-by-one errors weren't also present.

Changing the signature of generate() would bring it in-line with widely-used industry standards.

Example Java - nextInt generates values [0, bound) : https://docs.oracle.com/javase/8/docs/api/java/util/Random.html#nextInt-int-

Example C# (generates exclusive end bound): https://learn.microsoft.com/en-us/dotnet/api/system.random.next?view=net-7.0#system-random-next(system-int32-system-int32

Example Go (same as Java): https://pkg.go.dev/math/rand#Int31n

Also a side note: the generate() function as implemented here actually has a tiny skew even if used properly. It's because it's unlikely that max + 1 - min neatly divides into the range of next(). When you take the %, it slightly biases the values to the lower end of the requested range. However, this bias is so tiny that unless you're doing a very careful statistical analysis or max + 1 - min is huge (like close to 2^30), you'll never notice it.

Offline Yankes

  • Global Moderator
  • Commander
  • *****
  • Posts: 3350
    • View Profile
Re: Random generator skewed
« Reply #11 on: October 04, 2023, 04:36:42 pm »
Changing `generate` is no-go as lot of other code really on its current behavior, and before change like this EVERY current usage (and future in OTHER branches) need be cheeked.

Beside C++ standard use inclusive range, because other wise you can't have `[0,MAX_INT]` correctly handled, simply each type of API have its strong and weaker sides. If you create new code and do not like current generate it would be lot better to add new function that have exactly behavior you propose.

Offline Meridian

  • Global Moderator
  • Commander
  • *****
  • Posts: 9098
    • View Profile
Re: Random generator skewed
« Reply #12 on: October 04, 2023, 05:38:46 pm »
Within WeightOptions, the selection code is:

You are right, there seem to indeed be a few places which use (0, x) instead of (1, x) in the context of "weighted options".
For example the WeightedOptions.cpp
(I actually didn't check exactly this one, even though my previous answer would suggest so, I checked a few other places with "weighted options" which all used (1, x), my bad.)

I will go through all calls of RNG::generate with Yankes and the OXC devs and correct the ones that need correction.

PS: I will also extract this discussion into a separate thread, since it's just openxcom bugs, not skewed rng