Forum: >>> Magnum BBS <<<

Statistics of problem-solution pairs

From David Stork@21:1/5 to All on Wed Mar 1 17:57:44 2023

Are there any large-scale simulation studies of the statistics of symbolic problem-solution pairs?

Let's consider just symbolic integration of some class of functions, for instance the class covered by the Risch algorithm (exponentials, logarithms, trigonometric functions, addition, subtraction, multiplication, and division). Suppose we quantify the "size" of an integration problem by the number of leaves in its tree-based representation, and likewise for the problem's antiderivative.

Problems of a given size may have solutions spanning a range of sizes, of course. Thus there is some statistical distribution of the sizes, and thus a statistical relation (perhaps correlation) between problem and solution sizes.

So if we double the size of an integration problem, will the size of its solution double? Or will it increase faster than linearly, or slower? What about the variance in sizes, or other statistics?
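
One way to make the "faster or slower than linear" question concrete, sketched here under the assumption that solution size follows a rough power law in problem size, is to fit the slope of log(solution size) against log(problem size). A slope near 1 would suggest linear growth, above 1 faster, below 1 slower. The function name scaling_exponent is hypothetical, and the numbers below are a synthetic sanity check, not measured data.

```python
import math

def scaling_exponent(problem_sizes, solution_sizes):
    """Least-squares slope of log(solution size) against log(problem size)."""
    xs = [math.log(p) for p in problem_sizes]
    ys = [math.log(s) for s in solution_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic check only: sizes constructed to grow exactly quadratically.
problems = [2, 4, 8, 16]
solutions = [3 * p ** 2 for p in problems]
print(round(scaling_exponent(problems, solutions), 6))  # 2.0
```

If the real data show no power-law trend at all, the fitted slope is not meaningful by itself; it would need to be read together with the scatter around the fit.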

Naturally there is an infinite number of integration problems of a given size, so any simulation study will have inherent uncertainties. Nevertheless, this seems like an interesting, and potentially fruitful, class of simulation problems, for which we must use large-scale computer-algebra systems.

Anyone interested in working on this?

--David G. Stork

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Albert Rich@21:1/5 to David Stork on Wed Mar 1 19:12:43 2023

From my experience implementing symbolic integrators, there is little to no correlation between the sizes of integrands and the sizes of their optimal antiderivatives. Derivatives of relatively small expressions can be huge. Conversely, antiderivatives of relatively small expressions can be huge.

Albert

From David Stork@21:1/5 to Albert Rich on Wed Mar 1 22:04:18 2023

Albert,

My experience over decades suggests instead that there IS (or at least MAY BE) interesting structure in the relation between problems and solutions, and a principled exploratory study might give unexpected results.

Here's one:

Integrate[((c + d Tan[e + f x])^{5/2}(a + b Tan[e + f x] + c Tan[e + f x]^2))/(a + b Tan[ e + f x])^{9/2},x]

has a solution that takes 36 Mbytes to write out!

--David Stork

From Albert Rich@21:1/5 to David Stork on Thu Mar 2 00:35:30 2023

Your example validates my point. The antiderivative of this relatively small integrand is huge. But the antiderivatives of many other small integrands are small. Thus no correlation.

I recommend you use parentheses rather than curly brackets around fractional exponents, since curly brackets designate lists in Mathematica.

BTW, the valid antiderivative that Rubi (the Rule-Based Integrator) returns for your example has a leaf count of only 789. So if you are planning to use antiderivative leaf size in your research, I suggest using the size of optimal antiderivatives like those Rubi usually delivers. Rubi is freely available for downloading at https://rulebasedintegration.org/

Albert

From Richard Fateman@21:1/5 to David Stork on Sun Apr 9 13:30:31 2023

While there are easy ways of sometimes bounding the size of the derivative of an expression, you rapidly run out of rules for more general "problems/solutions". For instance, here is a problem:
expand ((x+1)^(2^n)).
Characterize the number of terms as a function of the integer n.

Simulation of random expression problems could (in general) provide super-exponential growth in size.
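
For the expansion problem above, a brute-force check for small n can be sketched in plain Python by squaring the polynomial x + 1 repeatedly. The dense coefficient-list representation and the function names are illustrative assumptions. The counts agree with 2^n + 1, which is what the binomial theorem gives, since every coefficient of (x+1)^m is nonzero.

```python
def square(coeffs):
    """Square a polynomial given as a dense coefficient list (lowest degree first)."""
    result = [0] * (2 * len(coeffs) - 1)
    for i, a in enumerate(coeffs):
        for j, b in enumerate(coeffs):
            result[i + j] += a * b
    return result

def expanded_term_count(n):
    """Number of nonzero terms in the expansion of (x + 1)^(2^n)."""
    poly = [1, 1]  # coefficients of x + 1
    for _ in range(n):
        poly = square(poly)  # each squaring doubles the exponent
    return sum(1 for c in poly if c != 0)

for n in range(5):
    print(n, expanded_term_count(n))  # counts are 2, 3, 5, 9, 17, i.e. 2^n + 1
```

So the number of terms in the output doubles (roughly) each time n increases by one, while the unexpanded input barely changes.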

Another study of "statistics" might be this: given a free on-line computer algebra system that does some task (you can pick indefinite integration), what is the distribution of inputs from [random?] clients? I suspect you will get (a) homework problems and (b) people just trying it out to see if it works. People can ask ChatGPT to do math, and it sometimes gets the right answer. It sometimes doesn't.

A (much) earlier experiment we ran at Berkeley, TILU, collected some problems (maybe a few hundred?) and mostly found that people were unlikely to master the first stage of the problem: getting the syntax right. Thus we collected stuff like
sin x, sinx, sin(x), Sin(x), Sin[x], SinX.

As for whether this is interesting or not, I would not expect simulation, where you write a program P to generate problems, to reveal much other than the behavior of program P.

RJF
