=head1 NAME

PDL::Graphics::Prima::Internals - a space to put documentation on the internals

This is very disorganized at the moment. My apologies.

=head1 OVERVIEW

Here is an overview of the plotting infrastructure to help keep your head
straight. The data type of each attribute is indicated after its name, and
information that is only meant to be used internally is in parentheses. At
the moment the tree is not quite accurate and needs updating. I'm sorry. :-(

 Plotting Widget
  |- title string
  |- backColor colorValue
  |- replotDuration float in milliseconds
  |- x and y axes
  |   |- min float
  |   |- max float
  |   |- lowerEdge integer
  |   |- upperEdge integer
  |   |- scaling, a class name or an object
  |   |   |- $self->compute_ticks($min, $max)
  |   |   |- $self->transform($min, $max, $data)
  |   |   |- $self->inv_transform($min, $max, $data)
  |   |   |- $self->sample_evenly($min, $max, $N_values)
  |   |   |- $self->is_valid_extremum($value)
  |   |- (minValue float)
  |   |- (minAuto boolean)
  |   |- (maxValue float)
  |   |- (maxAuto boolean)
 #|   |- (pixel_extent int)
 #|   |- $self->pixel_extent([$new_extent])
 ?|   |- $self->recompute_min_auto()
 ?|   |- $self->recompute_max_auto()
  |   |- $self->update_edges()
 ?|   |- $self->minmax_with_padding($data)
  |   |- $self->reals_to_relatives($data)
  |   |- $self->relatives_to_reals($data)
  |   |- $self->pixels_to_relatives($data)
  |   |- $self->relatives_to_pixels($data)
  |   |- $self->reals_to_pixels($data)
  |   |- $self->pixels_to_reals($data)
  |- dataSets (name => data)
  |   |- xs (floats)
  |   |- ys (floats)
  |   |- plotType
  |   |   |- type-specific data
  |   |   |- $self->xmin($dataset, $widget)
  |   |   |- $self->xmax($dataset, $widget)
  |   |   |- $self->ymin($dataset, $widget)
  |   |   |- $self->ymax($dataset, $widget)
  |   |   |- $self->draw($dataset, $widget)
  |   |- $self->get_data_as_pixels($widget)
  |   |- $self->extremum($name, $comparator, $widget)
  |   |- $self->compute_min_max_for($axis_name)
  |- $self->get_image
  |- $self->copy_to_clipboard
  |- $self->save_to_file

=for details
XXX working here
XXX see also: Axis.pm recompute_max_auto, recompute_min_auto

=for motivation
The major issue with determining automatic scaling is that it must reconcile
two distinct units of measure: the scale of the data on the one hand, and
screen pixels of padding on the other. Furthermore, large padding on one side
can impact the scaling on the other. Determining the correct min and max so
that the pixel padding gets respected is, to the best of my knowledge, not a
simple matter of linear algebra.

=for first-naive-implementation
The first naive implementation, which was my first shot at solving the problem
of automatic scaling, is to get the min/max of the data as well as the min/max
padding. You do this for all the datasets and then take the most extreme
values as your guess. The problem with this approach is that it can
overestimate the extrema (i.e. produce guesses that are too wide), leading to
plots that are not ideal. For example, suppose you have two datasets: a line
plot with a wide range and a blob plot with a very narrow range but a large
blob size (e.g. 40 pixels). Using this method, you would allow for a 40-pixel
padding on the most extreme data of the line plot, which could lead to extra
and unnecessary white space. However, it is quite fast compared with the
second naive implementation. The complexity of this method is O(n), where n is
the number of data points.
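
Here is a minimal sketch of this first approach in plain Perl. The dataset
accessors C<min_max> and C<max_padding> are hypothetical stand-ins, not part
of the real dataSet API:

 # First naive approach: global min/max plus the worst-case padding.
 my ($min, $max, $padding) = (9**9**9, -9**9**9, 0);
 for my $dataset (values %$datasets) {
     my ($ds_min, $ds_max) = $dataset->min_max;    # hypothetical
     $min = $ds_min if $ds_min < $min;
     $max = $ds_max if $ds_max > $max;
     my $ds_pad = $dataset->max_padding;           # pixels; hypothetical
     $padding = $ds_pad if $ds_pad > $padding;
 }
 # Finally, widen [$min, $max] so that $padding pixels fit on each side.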

=for second-naive-implementation
The second naive implementation is an iterative approach in which you guess at
the min and max that will display all of the data and plot types. You then run
through all the data points and see if any of them do not fit within the
min/max. If you find anything that doesn't fit, you widen your bounds and
repeat the search. Although the whitespace padding would be correctly computed
using this algorithm, the method is computationally inefficient and could be
terribly slow for very large datasets. In the worst case, I believe this
algorithm would be O(n**2) or maybe even O(n**3), or it would need data
structures of size O(n).
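
A minimal sketch of this iterative approach, where C<initial_guess> and
C<pixel_edges_for> are hypothetical helpers (the latter returns the data range
a point needs, given its padding and the current guess):

 # Second naive approach: widen the bounds until every point fits.
 my ($min, $max) = initial_guess();   # e.g. the plain data min/max
 my $changed = 1;
 while ($changed) {
     $changed = 0;
     for my $point (@all_points) {    # re-scans every point each pass
         my ($needed_min, $needed_max)
             = pixel_edges_for($point, $min, $max);
         if ($needed_min < $min) { $min = $needed_min; $changed = 1 }
         if ($needed_max > $max) { $max = $needed_max; $changed = 1 }
     }
 }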

=for better-implementation-analysis
My proposed algorithm is a combination of the two naive implementations. It is
slower than the first but will almost always be as fast as or faster than the
second, with much better scaling properties. For large datasets the
computational complexity is O(n) while the memory footprint stays small. Yet
it gives bounds as accurate as the second naive implementation and much better
than the worst-case bounds of the first.
=for better-implementation-overview
Upon inspetion of the second naive implementation, it becomes clear that we can
greatly reduce the amount of time spent checking our guesses of the min and the
max by noting that we only need to keep track of the most extreme values for a
given amount of padding. In other words, if we have many data points that need a
padding of 10 pixels, we only need to keep track of their minimum and maximum
values. For example, suppose we have three blobs all with radii 10 pixels and
with x-values of 10, 12, 13, and 17. We know that if we determine a plotting
minimum that can accomodate the left edge of point with x = 10, the certainly
the point at x = 12 will fit within that minimum because we know it has the same
padding. Similarly, we only need to know that the maximum x-value of the set is
17. If we can accomodate a max of 17, the others will certainly fit within those
bounds. Coming back to the implementation, I can scan through all the data
keeping track of the minimum and maximum values for each value of pixel-padding.
That is, I keep track of the min and max x-values for a padding of 10 pixels,
and I seperately track the min and max x-values for a padding of 11, 12, 20, or
200 pixels as they arise. In the end, I have no more than a few hundred extrema
for different pixel paddings that I need to combine properly. Since I could
potentially try to plot millions of data points, this reduces the amount of data
that needs to be processed from millions of data points to only a few hundred.

=for better-implementation-collection
The better implementation works as follows. First choose a maximum padding
that you care about for the purposes of determining the scaling; 500 pixels
seems like a reasonable number, but 2000 is just as feasible for the purposes
of the algorithm. Allocate two arrays with that many elements, for the min and
max values respectively. Then run through all the datasets. For each data
point, get that point's requested pixel padding as well as its value. Use the
padding as the array offset and look up the currently known minimum and
maximum values for that pixel padding. If the point is more extreme, replace
the old extremum with the current value. This is only slightly slower than
computing the min/max values required by both naive implementations, and it
requires very little memory.
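
A sketch of that collection pass, with a hypothetical C<padding_and_value>
helper standing in for the real per-point lookup:

 # Index candidate extrema by pixel padding. However many points we
 # scan, we keep at most $max_padding + 1 minima and maxima.
 my $max_padding = 500;
 my @mins  = (9**9**9)  x ($max_padding + 1);
 my @maxes = (-9**9**9) x ($max_padding + 1);
 for my $dataset (values %$datasets) {
     for my $point ($dataset->points) {                 # hypothetical
         my ($pad, $value) = padding_and_value($point); # hypothetical
         $pad = $max_padding if $pad > $max_padding;
         $mins[$pad]  = $value if $value < $mins[$pad];
         $maxes[$pad] = $value if $value > $maxes[$pad];
     }
 }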

=for better-implementation-first-pruning
The next step is optional but will likely speed up the iterative process. The
plot will probably have only a handful of distinct pixel-paddings, so running
through all 500 (or however many you allocated) is a waste of time. As such,
the next step is to find the largest pixel padding represented in the
collection and then find all smaller pixel paddings whose extrema are more
extreme than the extrema of the larger paddings, as sketched below. In the
end, you have a collection of extrema which you can think of as a pyramid: the
lowest pixel-padding is associated with the most extreme value, and the
highest pixel-padding is associated with the least extreme value.
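
A sketch of that pruning pass for the minima, continuing from the C<@mins>
array of the previous sketch (the maxima are handled the same way with the
comparison reversed):

 # Walk from the largest padding down, keeping an entry only if it is
 # more extreme than everything kept so far (which all have larger
 # paddings). Untouched buckets still hold the 9**9**9 sentinel and
 # are weeded out automatically.
 my @pyramid;            # [$padding, $min_value], decreasing padding
 my $most_extreme = 9**9**9;
 for my $pad (reverse 0 .. $#mins) {
     next if $mins[$pad] >= $most_extreme;   # dominated; weed it out
     $most_extreme = $mins[$pad];
     push @pyramid, [ $pad, $most_extreme ];
 }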

=for better-implementation-iterating
XXX working here
Work with arrays, in which case the index itself is equal to the needed
padding. Build a doubly-linked list with a structure patterned after

 padding    => number  # padding of interest
 data       => float   # min/max for this padding
 curr_value => float   # computed extent
 next       => pointer # next (smaller) padding

Also keep track of the tail, the current min, and the current max. The linked
list is initially assembled in order of decreasing padding (largest padding on
top).

Here's something important, which you will need to get your head around, and
which I will illustrate with an example. Suppose we have two paddings, 10
pixels and 5 pixels, and we're trying to compute the minimum. If the minimum
data value with a pixel padding of 5 is 2.2 and the minimum data value with a
pixel padding of 10 is 2.1, then the padding of 10 must lead to a smaller
plotting minimum than the padding of 5. As such, we can remove the pixel
padding of 5 from the list. I call this weeding out the values. The result is
that as we go through the list in order of decreasing padding, the data values
become more extreme, like a pyramid.

The argument I just made about the paddings of 5 and 10 pixels only took their
data values and the sort order of the paddings into account. It did not take
the actual padding values into account. The next stage, which is iterative,
begins to account for the effect of the different padding values.

With the pyramid in hand, examine the tail values for both the min and the
max. Each of these has a padding associated with it. Estimate the min and max
by assuming that the tail values, together with their paddings, represent the
most extreme values of the data set; this is a conservative estimate. With
this estimate in hand, run through the list and compute the min or max
associated with each list element, taking the padding and the current scaling
into account, and storing the result in curr_value. Then weed out the list
using curr_value, and repeat the procedure of this paragraph until the tails
of both the min and the max lists stop changing.

An important feature of this algorithm is that the min/max values begin with
very conservative estimates and become more extreme with each round.
Furthermore, the pyramid data structure is arranged so that with each round
the width of the top end of the pyramid grows more than the width of the
bottom end.
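
Here is a sketch of that iteration for the minimum, reusing the C<@pyramid>
array from the pruning sketch with a third slot per entry for C<curr_value>.
It assumes linear scaling, with C<$extent> holding the axis length in pixels
and C<$virtual_min>/C<$virtual_max> holding the current estimates; the maximum
is handled symmetrically:

 my $changed = 1;
 while ($changed) {
     $changed = 0;
     my $span = $virtual_max - $virtual_min;
     # Compute the min that would put each entry's value exactly its
     # padding's worth of pixels above the lower edge at this scaling:
     $_->[2] = $_->[1] - $_->[0] * $span / $extent for @pyramid;
     # Weed: drop entries whose computed min is less extreme than that
     # of an entry with a larger padding.
     my @kept;
     for my $entry (@pyramid) {
         push @kept, $entry if !@kept or $entry->[2] < $kept[-1][2];
     }
     @pyramid = @kept;
     # The tail is now the binding constraint; adopt it and repeat.
     if ($pyramid[-1][2] != $virtual_min) {
         $virtual_min = $pyramid[-1][2];
         $changed = 1;
     }
 }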

To actually implement this scheme, I will require that datasets monitor their
own data and report a data/padding list upon request. How the datasets monitor
their data is entirely up to them. (I am considering using an on_change slice,
which I secretly insert over the user's piddle by modifying the @_ argument
array, to efficiently monitor changes.)

In order to properly handle bad values, I need a function that can look for
bad values across many piddles, tens of piddles. I believe I can achieve this
by writing a function that takes, say, 20 piddles, and wrapping it in Perl
code that supplies null piddles when you only need to call the function with
10 piddles. The function would be called collate_min_max_for_many, and its
calling convention would look like this:

 ($min, $max) = collate_min_max_for_many(N_to_return, N_buckets, $index, $p1, $p2, ...)

This is best illustrated with the blobs plot type, since it would use a
nontrivial value for the index. If I wanted to compute the collated min and
max for the x-data, taking potential bad values for y, the x- and y-radii, and
the colors into account, I would call the function like so:

 my ($blob_x_min, $blob_x_max)
     = collate_min_max_for_many(
         1,               # return min/max for $x only
         $widget->width,  # number of buckets: one per pixel of width
         $xRadii,         # the index (pixel padding of each point)
         $x,              # x-data for which to find min/max
         $y,              # \
         $yRadii,         # |- ignore x-values if any of these are bad
         $colors,         # /
     );

In general, the function returns the collated minima and maxima of the first N
of the M supplied piddles, where N <= M and M < 20, and the returned piddles
have dimensions N x whatever.

=head1 RESPONSIVE PP CODE

If you create a long-running piece of PDL::PP code and you want your GUI to
remain responsive, you have a couple of options. Here's how you would call
Prima's yield function from within your C code:

 Code => q{
     /* Need this to keep the app responsive */
     SV * Prima_app = get_sv("::application", 0);
     ...
     threadloop %{
         ...
         /* Keep the system responsive */
         if (Prima_app != NULL) {
             dSP;
             ENTER;
             SAVETMPS;
             PUSHMARK(SP);
             /* Invoke yield as a class method on Prima::Application */
             XPUSHs(sv_2mortal(newSVpv("Prima::Application", 18)));
             PUTBACK;
             call_pv("Prima::Application::yield", G_DISCARD);
             FREETMPS;
             LEAVE;
         }
     %}
 },

Alternatively, you could explicitly require a callback from your user, in
which case you would have an OtherPars section. In this example, I pass the
current progress as the first argument of the function call, and I actually
take the callback's return value into account: if the callback returns zero,
the PDL function quits immediately:

 OtherPars => 'SV * code_ref',
 Code => q{
     int k;
     int total_count;
     int total_expected;
     ...
     threadloop %{
         ...
         /* calculate total_expected and track total_count */
         ...
         /* execute the callback, passing the current progress */
         dSP;
         ENTER;
         SAVETMPS;
         PUSHMARK(SP);
         XPUSHs(sv_2mortal(newSVnv(
             (double) total_count / (double) total_expected)));
         PUTBACK;
         k = call_sv($COMP(code_ref), G_SCALAR);
         SPAGAIN;
         if (k != 1)
             croak("Expected exactly one return value from the callback");
         k = POPi;
         PUTBACK;
         FREETMPS;
         LEAVE;
         /* quit if the callback returned zero */
         if (k == 0) goto QUIT;
     %}
     QUIT: k++;  /* no-op so the label has a statement to label */
 }, # close the Code block

Here's a use of that function that lets Prima call its update function and
which quits the PDL function if it takes more than five seconds:

 use Time::HiRes qw(gettimeofday tv_interval);
 my $start_time = [gettimeofday];
 my_pp_func(... args ..., sub {
     # Update the UI:
     $::application->yield;
     # Quit if it's taking too long
     return 0 if tv_interval($start_time, [gettimeofday]) > 5;
     return 1;
 });

This is a general callback that takes note of your response: if your callback
returns zero, the function quits immediately (and, since you told the function
to quit, it's your job to ignore the return result). The callback itself
receives the current progress, a number between 0 and 1, as its only argument.

=head1 AUTHOR

David Mertens (dcmertens.perl@gmail.com)

=head1 SEE ALSO

This is a component of L<PDL::Graphics::Prima>. This library is composed of
many modules, including:

=over

=item L<PDL::Graphics::Prima>

Defines the Plot widget for use in Prima applications

=item L<PDL::Graphics::Prima::Axis>

Specifies the behavior of axes (but not the scaling)

=item L<PDL::Graphics::Prima::DataSet>

Specifies the behavior of DataSets

=item L<PDL::Graphics::Prima::Internals>

A dumping ground for my partial documentation of some of the more complicated
stuff. It's not organized, so you probably shouldn't read it.

=item L<PDL::Graphics::Prima::Limits>

Defines the lm:: namespace

=item L<PDL::Graphics::Prima::Palette>

Specifies a collection of different color palettes

=item L<PDL::Graphics::Prima::PlotType>

Defines the different ways to visualize your data

=item L<PDL::Graphics::Prima::ReadLine>

Encapsulates all interaction with the L<Term::ReadLine> family of modules.

=item L<PDL::Graphics::Prima::Scaling>

Specifies different kinds of scaling, including linear and logarithmic

=item L<PDL::Graphics::Prima::Simple>

Defines a number of useful functions for generating simple and not-so-simple
plots

=back

=head1 LICENSE AND COPYRIGHT

Copyright (c) 2011-2012 David Mertens. All rights reserved.

This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=cut