Getting realistic output from the simulated device

Using the simulated device from lightonml, I'm receiving output corresponding to the formula |Rx|^2, with R sampled from a complex normal distribution (float values). But the real OPU returns uint8, and the mathematical model that generates these values is not very clear. So |Rx|^2 was somehow squashed between 0 and 255 and discretized. How can I make the simulated device return values like the real OPU?
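
For reference, here is a minimal sketch of the model I mean (dimensions, seed and scaling are illustrative only, not the library's defaults):

```python
import numpy as np

# Sketch of the simulated model: y = |Rx|^2 with R a complex Gaussian random
# matrix. Dimensions, seed and scaling are illustrative only.
rng = np.random.default_rng(0)
d_in, d_out = 1_000, 2_000
R = rng.standard_normal((d_out, d_in)) + 1j * rng.standard_normal((d_out, d_in))
x = rng.integers(0, 2, size=d_in)        # binary input, roughly 50% ones
y_sim = np.abs(R @ x) ** 2               # float output of the simulated device
```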

Since it is difficult to model interference with billions of coefficients, the best we have found is to rescale the output to [0, 255] and quantize afterward. Rescaling is similar to tweaking the exposure time on the output device, and quantization is what happens in the hardware.
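
A minimal sketch of that rescale-then-quantize idea, applied to the float output of the simulated device (the choice of a percentile as the saturation point is my assumption here, not the exact hardware behaviour):

```python
import numpy as np

def quantize_like_opu(y_sim: np.ndarray, saturation_percentile: float = 99.0) -> np.ndarray:
    """Rescale the simulated float output to [0, 255] and quantize to uint8."""
    scale = np.percentile(y_sim, saturation_percentile)   # plays the role of the exposure setting
    y = np.clip(y_sim / scale, 0.0, 1.0) * 255.0          # squash into [0, 255]
    return np.round(y).astype(np.uint8)                   # discretize like the camera
```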


Is it possible to retrieve this normalization constant beforehand to rescale the real OPU's output back? It is quite uncomfortable when you have adjusted the hyperparameters of a model for the simulated OPU, where the output is a real number somewhere between 0 and 2, and in practice you get OPU output in [0, 255].

Dear Bogdan, you're right. However, there are some cases where we prefer to return the raw OPU output instead of a rescaled version. We figured it is easy for users to rescale it themselves afterward, but we may add this option in future versions.

Given your use case in a NN pipeline, I guess it’s better for you to rescale the output of the real OPU to have a nicer range.

To have (roughly) the same range between simulated and real, you can rescale the output of the real OPU by dividing it by 21. There will still be differences in the distributions (the simulated one has more mass close to zero) but the range of values you see should be about the same. Note that this calibration is done for an input with 50% ones, so if your input is very sparse or very dense this normalization might change.
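
Something along these lines (the helper name is mine; 21 is the calibration factor mentioned above, measured for inputs with roughly 50% ones):

```python
import numpy as np

# 21 is the calibration factor for inputs with ~50% ones; it may change for
# very sparse or very dense inputs.
CALIBRATION = 21.0

def rescale_real_opu(y_real: np.ndarray) -> np.ndarray:
    """Bring the real OPU's uint8 output roughly into the simulated device's range."""
    return y_real.astype(np.float32) / CALIBRATION
```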


Yes, I clearly understand that you would not (and should not) change the library for me :). And I'm trying to do exactly what you proposed - rescale the OPU's output. But I don't quite understand how a constant scaling could work. Intuitively, if we have a 2n-dimensional input (with a 0.5 chance of a 1, of course), the receiver should get more light than when the input is n-dimensional (with the same rate of ones). So the rescaling probably has to depend on the input dimensionality.
PS. Did I understand correctly that a very dense data vector and a very sparse one will have different normalizations?

I see your point.

If your dimension is relatively small (in the thousands/tens of thousands), the mean of the output will not change much for a 2x change in the dimension.

It might change a lot between a sample with e.g. 10% ones and one with 90% ones, but I think this is a relatively rare occurrence, and it would also change quite a bit with a properly normalized, high-precision multiplication.

My recommendation would be to randomly sample a batch from your data and use it to compute the normalization constant at the beginning. If your dataset is not pathologically distributed (e.g. highly non-iid), this should be enough to ensure nice behavior for all other batches.
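
As a sketch of what I mean (the names and the transform call are placeholders for your own lightonml pipeline, not the library's API):

```python
import numpy as np

def estimate_normalization(y_batch: np.ndarray) -> float:
    """Estimate a rescaling constant from one batch of real-OPU output.

    `y_batch` is assumed to be the uint8 output of the real OPU on a batch
    sampled at random from the dataset; dividing by this constant maps the
    batch mean to 1.
    """
    return float(y_batch.astype(np.float64).mean())

# Usage sketch (placeholder names):
# idx = np.random.choice(len(X), size=256, replace=False)
# y_batch = my_opu_transform(X[idx])
# c = estimate_normalization(y_batch)
# y_normalized = my_opu_transform(X_new) / c
```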