scri.utilities

Functions

`bump_function`(x, x0, x1, x2, x3[, y0, y12, y3])	Return a smooth bump function that is constant outside (x0, x3) and inside (x1, x2).
`fletcher32`(data)	Compute the Fletcher-32 checksum of an array
`multishuffle`(shuffle_widths[, forward])	Construct functions to "multi-shuffle" data
`transition_function`(x, x0, x1[, y0, y1, ...])	Return a smooth function that is constant outside (x0, x1).
`transition_function_derivative`(x, x0, x1[, ...])	Return derivative of the transition function
`transition_to_constant`(f, t, t1, t2)	Smoothly transition from the function to a constant.
`xor_timeseries`(c)	XOR a time-series of data in place
`xor_timeseries_reverse`(c)	Reverse the XOR of a time-series of data in place

scri.utilities.bump_function(x, x0, x1, x2, x3, y0=0.0, y12=1.0, y3=0.0)[source]

Return a smooth bump function that is constant outside (x0, x3) and inside (x1, x2).

This uses the standard C^infinity function with derivatives of compact support to transition between the given values. By default, this is a standard bump function that is 0 outside of (x0, x3), and is 1 inside (x1, x2), but the constant values can all be adjusted optionally.

Parameters:

x: array_like: One-dimensional monotonic array of floats.
x0: float: Value before which the output will equal y0.
x1, x2: float: Values between which the output will equal y12.
x3: float: Value after which the output will equal y3.
y0: float [defaults to 0.0]: Value of the output before x0.
y12: float [defaults to 1.0]: Value of the output after x1 but before x2.
y3: float [defaults to 0.0]: Value of the output after x3.
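
As a rough illustration of the construction described above, the following sketch builds a bump from two smooth steps using only the standard library. The names smooth_step and bump_sketch are hypothetical, the exp(-1/t)-based step is the common textbook choice and is assumed here rather than taken from scri's source, and the sketch works on scalars rather than arrays.

```python
import math

def smooth_step(x, x0, x1):
    """Smooth (C^infinity) step: 0 for x <= x0, 1 for x >= x1. Hypothetical helper."""
    tau = (x - x0) / (x1 - x0)
    if tau <= 0.0:
        return 0.0
    if tau >= 1.0:
        return 1.0
    exponent = 1.0 / tau - 1.0 / (1.0 - tau)
    if exponent > 700.0:
        # exp() would overflow; the step is 0 to double precision here.
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))

def bump_sketch(x, x0, x1, x2, x3, y0=0.0, y12=1.0, y3=0.0):
    """Rise from y0 to y12 on (x0, x1), then go from y12 to y3 on (x2, x3)."""
    if x < x2:
        return y0 + (y12 - y0) * smooth_step(x, x0, x1)
    return y12 + (y3 - y12) * smooth_step(x, x2, x3)

# Sample the default bump (0 outside, 1 on the plateau) at a few points.
samples = [bump_sketch(x, 0.0, 1.0, 2.0, 3.0) for x in (-1.0, 0.5, 1.5, 2.5, 4.0)]
```

The two steps never overlap because x1 <= x2 is assumed, so the plateau value y12 is reached exactly between them.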

scri.utilities.fletcher32(data)[source]

Compute the Fletcher-32 checksum of an array

This checksum is very easy to implement from scratch and very fast.

Note that there is no universal agreement on the naming of the Fletcher checksum variants. This version uses 16-bit input words, 32-bit accumulators, a block size of 360, and a modulus of 65_535.

Parameters:

data: ndarray: This array can have any dtype, but must be able to be viewed as uint16.

Returns:

checksum: uint32
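
To make the stated parameters concrete, here is a minimal pure-Python sketch of a Fletcher-32 checksum with 16-bit words, a block size of 360, and a modulus of 65_535. The name fletcher32_sketch, the bytes input, and the little-endian word order are assumptions made for this illustration; this is not scri's implementation, which operates on NumPy arrays.

```python
import struct

def fletcher32_sketch(raw, block_size=360, modulus=65_535):
    """Fletcher-32 over little-endian 16-bit words (hypothetical helper)."""
    if len(raw) % 2:
        raise ValueError("input must contain a whole number of 16-bit words")
    words = struct.unpack(f"<{len(raw) // 2}H", raw)
    c0 = 0  # running sum of the words
    c1 = 0  # running sum of the running sums
    for start in range(0, len(words), block_size):
        for word in words[start:start + block_size]:
            c0 += word
            c1 += c0
        # Reducing once per block is what lets fixed-width 32-bit
        # accumulators avoid overflow; Python integers never overflow,
        # so the blocking here only mirrors that structure.
        c0 %= modulus
        c1 %= modulus
    return (c1 << 16) | c0

checksum = fletcher32_sketch(b"abcdef")
```

Deferring the modulus to the end of each block gives the same result as reducing after every word, because both sums are only ever compared modulo 65_535.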

scri.utilities.multishuffle(shuffle_widths, forward=True)[source]

Construct functions to “multi-shuffle” data

The standard “shuffle” algorithm (as found in HDF5, for example) takes an array of numbers and shuffles their bytes so that all bytes of a given significance are stored together — the first byte of all the numbers are stored contiguously, then the second byte of all the numbers, and so on. The motivation for this is that — with reasonably smooth data — bytes in the same byte position in sequential numbers are usually more related to each other than they are to other bytes within the same number, which means that shuffling results in better compression of the data.

There is no reason that shuffling can only work byte-by-byte, however. There is also a “bitshuffle” algorithm, which works in the same way, but collecting bits rather than bytes. More generally, we could vary the number of bits stored together as we move along the numbers. For example, we could store the first 8 bits of each number, followed by the next 4 bits of each number, etc. This is the “multi-shuffle” algorithm.

With certain types of data, this can reduce the compressed data size significantly. For example, with float data for which successive values have been XOR-ed, the sign bit will very rarely change, the next 11 bits (representing the exponent) and a few of the following bits (representing the highest-significance digits) will typically be highly correlated, while as we move to lower significance there will be less correlation. Thus, we might shuffle the first 8 bits together, followed by the next 8, then the next 4, the next 4, the next 2, and so on — decreasing the shuffle width as we go. The shuffle_widths input might look like [8, 8, 4, 4, 2, 2, 1, 1, 1, 1, …].

There are also some cases where we see correlation increasing again at low significance. For example, if a number results from cancellation — the subtraction of two numbers much larger than their difference — then its lower-significance bits will be 0. If we then multiply that by some integer (e.g., for normalization), there may be some very correlated but nonzero pattern. In either case, compression might improve if the values at the end of our shuffle_widths list increase.

Parameters:

shuffle_widths: list of integers: These integers represent the number of bits in each piece of each number that is shuffled, starting from the highest significance, and proceeding to the lowest. The sum of these numbers must be the total bit width of the numbers that will be given as input — which must currently be 8, 16, 32, or 64. There is no restriction on the individual widths, but note that if they do not fit evenly into 8-bit bytes, the result is unlikely to compress well.
forward: bool [defaults to True]: If True, the returned function will shuffle data; if False, the returned function will reverse this process — unshuffle.

Returns:

shuffle_func: numba JIT function: This function takes just one parameter (the array to be shuffled) and returns the shuffled array. Note that the input array must be flat (have just one dimension), and will be viewed as an array of unsigned integers of the input bit width. This can affect the shape of the array and the order of its elements, so you should ensure that this process results in an array of numbers in the order that you want. For example, if you have a 2-d array of floats a that varies more smoothly along the first dimension, you might pass np.ravel(a.view(np.uint64), 'F'), where 'F' represents Fortran order, in which the first index varies fastest.
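
The idea of grouping bits by significance can be sketched in pure Python on a list of unsigned integers. The function names, the explicit bit_width argument, and the most-significant-bit-first packing of the output are all assumptions made for this illustration; the bit layout produced by scri's JIT-compiled functions may differ.

```python
def multishuffle_sketch(values, shuffle_widths, bit_width):
    """Gather each bit group from all values, highest significance first."""
    if sum(shuffle_widths) != bit_width:
        raise ValueError("shuffle_widths must sum to bit_width")
    bits = []
    shift = bit_width
    for width in shuffle_widths:
        shift -= width
        mask = (1 << width) - 1
        for value in values:
            piece = (value >> shift) & mask
            bits.extend((piece >> k) & 1 for k in range(width - 1, -1, -1))
    # Repack the bit stream into words of the original width.
    out = []
    for i in range(len(values)):
        word = 0
        for bit in bits[i * bit_width:(i + 1) * bit_width]:
            word = (word << 1) | bit
        out.append(word)
    return out

def multiunshuffle_sketch(shuffled, shuffle_widths, bit_width):
    """Invert multishuffle_sketch."""
    bits = []
    for word in shuffled:
        bits.extend((word >> k) & 1 for k in range(bit_width - 1, -1, -1))
    values = [0] * len(shuffled)
    position = 0
    for width in shuffle_widths:
        for i in range(len(values)):
            piece = 0
            for bit in bits[position:position + width]:
                piece = (piece << 1) | bit
            position += width
            values[i] = (values[i] << width) | piece
    return values

# Two 8-bit numbers shuffled in 4-bit groups: high nibbles first, then low.
shuffled = multishuffle_sketch([0xAB, 0xCD], [4, 4], 8)
restored = multiunshuffle_sketch(shuffled, [4, 4], 8)
```

In this example the high nibbles A and C end up adjacent, followed by the low nibbles B and D, which is the grouping that helps a compressor when bits of equal significance are correlated.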

scri.utilities.transition_function(x, x0, x1, y0=0.0, y1=1.0, return_indices=False)[source]

Return a smooth function that is constant outside (x0, x1).

This uses the standard smooth (C^infinity) function with derivatives of compact support to transition between the two values, being constant outside of the transition region (x0, x1).

Parameters:

x: array_like: One-dimensional monotonic array of floats.
x0: float: Value before which the output will equal y0.
x1: float: Value after which the output will equal y1.
y0: float [defaults to 0.0]: Value of the output before x0.
y1: float [defaults to 1.0]: Value of the output after x1.
return_indices: bool [defaults to False]: If True, return the array and the indices (i0, i1) at which the transition occurs, such that t[:i0]==y0 and t[i1:]==y1, where t is the returned array.
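
The behavior described by these parameters can be sketched with the standard library alone. The name transition_sketch, the exp(-1/t)-based formula, the use of plain lists, and the (array, i0, i1) return layout are assumptions for illustration, not scri's implementation.

```python
import bisect
import math

def _step(tau):
    """Smooth step on the unit interval: 0 at tau <= 0, 1 at tau >= 1."""
    if tau <= 0.0:
        return 0.0
    if tau >= 1.0:
        return 1.0
    exponent = 1.0 / tau - 1.0 / (1.0 - tau)
    if exponent > 700.0:  # exp() would overflow; the step is 0 here
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))

def transition_sketch(x, x0, x1, y0=0.0, y1=1.0, return_indices=False):
    """Go smoothly from y0 to y1 over (x0, x1); x must be increasing."""
    i0 = bisect.bisect_right(x, x0)  # every x[:i0] is <= x0
    i1 = bisect.bisect_left(x, x1)   # every x[i1:] is >= x1
    out = [y0] * i0
    for xi in x[i0:i1]:
        out.append(y0 + (y1 - y0) * _step((xi - x0) / (x1 - x0)))
    out.extend([y1] * (len(x) - i1))
    return (out, i0, i1) if return_indices else out

grid = [0.0, 1.0, 1.5, 2.0, 3.0]
values, i0, i1 = transition_sketch(grid, 1.0, 2.0, y0=2.0, y1=4.0, return_indices=True)
```

The returned indices bracket the transition region, so values[:i0] holds only y0 and values[i1:] holds only y1.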

scri.utilities.transition_function_derivative(x, x0, x1, y0=0.0, y1=1.0)[source]

Return derivative of the transition function

This function simply returns the derivative of transition_function with respect to the x parameter. The parameters are the same as those of that function, except that return_indices is not accepted.

Parameters:

x: array_like: One-dimensional monotonic array of floats.
x0: float: Value before which the transition function equals y0, so that the derivative is zero.
x1: float: Value after which the transition function equals y1, so that the derivative is zero.
y0: float [defaults to 0.0]: Value of the transition function before x0.
y1: float [defaults to 1.0]: Value of the transition function after x1.
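
For the common exp(-1/t)-based step s(tau) = 1 / (1 + exp(1/tau - 1/(1 - tau))), with tau = (x - x0) / (x1 - x0), the chain rule gives ds/dtau = s (1 - s) (1/tau^2 + 1/(1 - tau)^2). The sketch below evaluates this for a scalar x. The function name and the choice of formula are assumptions for illustration and are not taken from scri's source.

```python
import math

def transition_derivative_sketch(x, x0, x1, y0=0.0, y1=1.0):
    """Derivative with respect to x of a smooth step from y0 to y1 on (x0, x1)."""
    tau = (x - x0) / (x1 - x0)
    if tau <= 0.0 or tau >= 1.0:
        return 0.0  # the step is constant outside the transition region
    exponent = 1.0 / tau - 1.0 / (1.0 - tau)
    if abs(exponent) > 700.0:
        return 0.0  # the step is flat to double precision this close to an edge
    s = 1.0 / (1.0 + math.exp(exponent))
    ds_dtau = s * (1.0 - s) * (1.0 / tau**2 + 1.0 / (1.0 - tau) ** 2)
    # Chain rule: d(tau)/dx = 1 / (x1 - x0), and the step is scaled by (y1 - y0).
    return (y1 - y0) * ds_dtau / (x1 - x0)

slope_at_midpoint = transition_derivative_sketch(0.5, 0.0, 1.0)
```

At the midpoint of a unit-width, unit-height transition this gives a slope of 2, since s = 1/2 and tau = 1/2 there.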

scri.utilities.transition_to_constant(f, t, t1, t2)[source]

Smoothly transition from the function to a constant.

This works (implicitly) by multiplying the derivative of f with the transition function, and then integrating. Using integration by parts, this simplifies to multiplying f itself by the transition function, and then subtracting the integral of f times the derivative of the transition function. This integral is effectively restricted to the region (t1, t2). Note that the final value (after t2) will depend on the precise values of t1 and t2, and the behavior of f in between.

Parameters:

f: array_like: One-dimensional array corresponding to the following t parameter.
t: array_like: One-dimensional monotonic array of floats.
t1: float: Value before which the output will equal f.
t2: float: Value after which the output will be constant.
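
The integration-by-parts form described above can be sketched as follows, writing T for a transition function that falls from 1 (before t1) to 0 (after t2): the output is f T minus the running integral of f dT/dt. The function names, the exp(-1/t)-based step, and the trapezoid rule are assumptions chosen for this illustration; scri's own implementation may integrate differently.

```python
import math

def _step_and_slope(tau):
    """Smooth step from 0 to 1 on (0, 1) and its derivative with respect to tau."""
    if tau <= 0.0:
        return 0.0, 0.0
    if tau >= 1.0:
        return 1.0, 0.0
    g = 1.0 / tau - 1.0 / (1.0 - tau)
    if g > 700.0:
        return 0.0, 0.0
    if g < -700.0:
        return 1.0, 0.0
    s = 1.0 / (1.0 + math.exp(g))
    return s, s * (1.0 - s) * (1.0 / tau**2 + 1.0 / (1.0 - tau) ** 2)

def transition_to_constant_sketch(f, t, t1, t2):
    """Return f*T - integral(f * dT/dt), with T falling from 1 to 0 on (t1, t2)."""
    out = []
    integral = 0.0   # running trapezoid estimate of the integral of f * dT/dt
    previous = 0.0   # integrand at the previous sample
    for i, (fi, ti) in enumerate(zip(f, t)):
        s, ds = _step_and_slope((ti - t1) / (t2 - t1))
        T = 1.0 - s
        dT = -ds / (t2 - t1)
        integrand = fi * dT
        if i > 0:
            integral += 0.5 * (integrand + previous) * (ti - t[i - 1])
        previous = integrand
        out.append(fi * T - integral)
    return out

# f(t) = t on [0, 10], frozen smoothly between t1 = 2 and t2 = 6.
t = [0.01 * i for i in range(1001)]
f = list(t)
g = transition_to_constant_sketch(f, t, 2.0, 6.0)
```

For this linear f, the exact final value is t1 + (t2 - t1)/2 = 4, which shows how the constant depends on t1, t2, and the behavior of f in between.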

scri.utilities.xor_timeseries(c)[source]

XOR a time-series of data in place

Assumes time varies along the first dimension of the input array, but any number of other dimensions are supported.

This function leaves the first time step unchanged, but replaces each successive time step with its XOR against the preceding time step, so that only the bits that have changed are stored. This transformation is useful when storing the data because it allows for greater compression in many cases.

Note that the direction in which this operation is done matters. This function starts with the last time, changes that data in place, and proceeds to earlier times. To undo this transformation, we need to start at early times and proceed to later times.

The function xor_timeseries_reverse achieves the opposite transformation, recovering the original data with bit-for-bit accuracy.
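
The ordering described above can be illustrated on a plain list of 64-bit patterns. The helper names here are hypothetical and the sketch uses Python integers rather than the arrays that scri operates on, but the direction of each loop follows the description: backward for the forward transform, forward for its inverse.

```python
import struct

def float_bits(x):
    """Return the 64 bits of a Python float as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def xor_forward(c):
    """XOR each entry with its predecessor in place, starting from the last entry."""
    for i in range(len(c) - 1, 0, -1):
        c[i] ^= c[i - 1]

def xor_reverse(c):
    """Undo xor_forward in place, starting from the first entry."""
    for i in range(1, len(c)):
        c[i] ^= c[i - 1]

original = [float_bits(v) for v in (1.0, 1.25, 1.5)]
encoded = list(original)
xor_forward(encoded)   # sign and exponent bits cancel between neighbors
decoded = list(encoded)
xor_reverse(decoded)   # recovers the original bit patterns exactly
```

Working backward in the forward transform ensures that each entry is XOR-ed with its still-unmodified predecessor; working forward in the inverse ensures that each entry is XOR-ed with its already-restored predecessor.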

scri.utilities.xor_timeseries_reverse(c)[source]

Reverse the XOR of a time-series of data in place

This function reverses the effects of xor_timeseries. See that function’s docstring for details.