• Haskell, anyone?

    From Hannah@21:1/5 to Paul Rubin on Sun Nov 29 00:01:04 2015
    Hi!

    Paul Rubin <no.email@nospam.invalid> wrote:
    [Note to whoever: this continues a discussion about functional
    programming that started on comp.lang.ada but got off-topic for that
group so I've migrated it here. Please don't crosspost unless there's significant Ada content in your post.]

    Hadrien Grasland <hadrien.grasland@gmail.com> writes:
    For example, if you wanted to explain the factorial to a student who
    has never been exposed to the concept, chances are that your first
    approach wouldn't be to write on a blackboard

    "fact 1 = 1 -- [order of these two lines switched by me -Paul]
    fact n = n * fact (n-1)"

    But that's the mathematical definition of the factorial--this isn't
grade school. I assume I'm explaining it to a professional trying to do high-assurance programming so of course that's the definition I'd use. Treating the above as Haskell code that purports to compute the
    factorial, it's easier to reason about than the imperative loop version:

    Let P(n) be the proposition "fact(n)=n!". We want to verify that P(n)
    is true for all n. We prove it by induction. First, P(1) is obviously
    true. Second, assume P(n) is true. So by substitution, fact(n+1) =
(n+1) * fact(n). And since P(n) is true by the induction hypothesis, fact(n+1) = (n+1) * n! which is (n+1)!, which means we've proved P(n) => P(n+1). Therefore by induction, forall n. P(n), QED, so we've verified
    the code, and this was very easy.

    I'm out of my depth here, but I think to do the same thing with
imperative code is much messier, you have to put preconditions and postconditions around every statement to prove the loop invariant that
    you're going through factorials, etc. I guess you can do that with
    SPARK but I don't know how it's done.

At the level of everyday coding though, going from loops to recursion is nothing to be bothered about, trust me. It's like when people freak out
    by whitespace as syntax when they first see Python, or by the
parentheses when they first see Lisp. You get used to it quickly and it stops mattering. Or anyway, if it's really an obstacle, FP probably
    isn't for you.

OK, some things may turn out not to be so difficult, as many recursion
patterns can be abstracted away into higher-order functions, and many
such functions already exist. Like:
    forM_ [1..10] $ \a -> print a
    (monadic "for loop" over a list, bind each item to a, "execute" print
    on each a in "sequence" [i.e. connected with monadic >> under the hood,
    which *does* mean sequencing in the IO monad]).
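
A couple more standard examples of the same idea, using nothing beyond
the Prelude (forM_ itself comes from Control.Monad):

mapM_ print [1..10]    -- the same loop, without naming each item
sum [1..10]            -- sum is itself a fold under the hood
foldr (+) 0 [1..10]    -- the explicit fold that sum abstracts

Each of these would otherwise be a hand-written recursive definition.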

    [...]

    Hannah.

  • From Paul Rubin@21:1/5 to Hadrien Grasland on Sat Nov 28 12:40:52 2015
    [Note to whoever: this continues a discussion about functional
    programming that started on comp.lang.ada but got off-topic for that
    group so I've migrated it here. Please don't crosspost unless there's significant Ada content in your post.]

    Hadrien Grasland <hadrien.grasland@gmail.com> writes:
    For example, if you wanted to explain the factorial to a student who
    has never been exposed to the concept, chances are that your first
    approach wouldn't be to write on a blackboard

    "fact 1 = 1 -- [order of these two lines switched by me -Paul]
    fact n = n * fact (n-1)"

    But that's the mathematical definition of the factorial--this isn't
    grade school. I assume I'm explaining it to a professional trying to do high-assurance programming so of course that's the definition I'd use.
    Treating the above as Haskell code that purports to compute the
    factorial, it's easier to reason about than the imperative loop version:

    Let P(n) be the proposition "fact(n)=n!". We want to verify that P(n)
    is true for all n. We prove it by induction. First, P(1) is obviously
    true. Second, assume P(n) is true. So by substitution, fact(n+1) =
    (n+1) * fact(n). And since P(n) is true by the induction hypothesis,
    fact(n+1) = (n+1) * n! which is (n+1)!, which means we've proved P(n) => P(n+1). Therefore by induction, forall n. P(n), QED, so we've verified
    the code, and this was very easy.
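
Verification aside, you can also have a machine spot-check P(n) on
random inputs. This is a sketch using the QuickCheck library (prop_fact
and the restriction to positive n are my choices, and of course a
passing test is evidence, not a proof):

import Test.QuickCheck (Positive(..), quickCheck)

fact :: Integer -> Integer
fact 1 = 1
fact n = n * fact (n - 1)

-- P(n): fact n equals n!, with n! spelled as product [1..n]
prop_fact :: Positive Integer -> Bool
prop_fact (Positive n) = fact n == product [1 .. n]

main :: IO ()
main = quickCheck prop_fact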

    I'm out of my depth here, but I think to do the same thing with
    imperative code is much messier, you have to put preconditions and postconditions around every statement to prove the loop invariant that
    you're going through factorials, etc. I guess you can do that with
    SPARK but I don't know how it's done.

    At the level of everyday coding though, going from loops to recursion is nothing to be bothered about, trust me. It's like when people freak out
    by whitespace as syntax when they first see Python, or by the
    parentheses when they first see Lisp. You get used to it quickly and it
    stops mattering. Or anyway, if it's really an obstacle, FP probably
    isn't for you.

    In any case, if you are using a machine which can only perform one
    binary operation at a time, you will always need an accumulator to
    compute the final result.

    I think that's an advanced, low level concept though. There's tons of
    magic in Javascript: JIT compilation, garbage collection, built-in
    objects that make RPC calls, etc. But there's zillions of
    not-so-experienced Javascript programmers writing crappy web pages all
over the world, who never think about that stuff and have never
    heard of an accumulator. A programmer with deep knowledge doing
    bleeding edge systems has to understand what the machine is doing, but
    someone just trying to bang out everyday tasks can usually stay at the
    high level. No accumulators.
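
For readers who haven't met the term: in Haskell, an accumulator is
just an extra argument that carries the partial result. A minimal
sketch, defined for n >= 1 like the original fact:

-- the naive fact leaves a chain of pending multiplications;
-- here go carries the partial product along instead
factAcc :: Integer -> Integer
factAcc n = go n 1
  where
    go 1 acc = acc
    go k acc = go (k - 1) (k * acc)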

I would gladly challenge this claim by picking two large groups of
10-year-olds who have never been exposed to imperative programming yet,

    I don't really know about this and it doesn't concern me too much. It
    might be easier to start one way or the other, but either way you pick
    up more technique as you gain experience. By the time you're evaluating different approaches to formal software verification (what this thread
    was originally about), you're outside the range of what most
    10-year-olds are capable of doing, so it's reasonable to assume more
    fluency.

    I'm writing programs for massively parallel GPU processors, and last
    time I checked Blelloch's scan was still one of the most efficient
    algorithms

    That's outside my area and maybe a niche, but I know some Haskell and FP
    work has been done on this stuff (Data Parallel Haskell is inspired by Blelloch's NESL nested parallel work):

http://conal.net/blog/posts/parallel-tree-scanning-by-composition
http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf
https://wiki.haskell.org/GHC/Data_Parallel_Haskell/References

    There is also an array library with GPU support:

    http://hackage.haskell.org/package/accelerate
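
The sequential specification of a scan is just a prefix sum, a
one-liner in plain Haskell; Blelloch's algorithm computes the same
thing in O(log n) parallel steps:

scanl1 (+) [3, 1, 4, 1, 5]    -- evaluates to [3, 4, 8, 9, 14]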

    Well, I'm learning OCaml using a pretty nice MOOC which has been set
up by my former university. If you're interested, the address is

OCaml is close enough to what I've been doing already that I think I'm
    ok with just the online docs and the irc channel, plus hacking code.
    Thanks though. I might look at some online math classes sometime.

  • From Richard@21:1/5 to All on Sun Nov 29 18:44:17 2015
On 2015-11-29 at 12:12, Hadrien Grasland wrote:

    In imperative programming, recursion is just one tool in your
    toolbox among many others. When you meet a problem which recursion
    fits best, such as quicksort or tree traversal, you use recursion.
    When you encounter a problem which loops fit best, such as traversing
    a data structure while accumulating some intermediary state or
    mutating a data structure, you use loops. When neither approach has a
    clear advantage, you use loops because they are easier to
    understand.

But couldn't the same point be made when comparing structured control constructs with the goto statement? Some algorithms can be expressed in
    a simpler way using labels and goto statements.

For example, the following pseudocode shows how a stream of characters delimited by a new line character could be processed using structured
    control statements:

read (c)
WHILE c # new line DO
   process (c)
   read (c)
END

    Using labels and goto, it could be programmed like this:

loop:
   read (c)
   IF c = new line THEN GOTO end
   process (c)
   GOTO loop
end:

    At first sight, the second version is simpler and easier to understand:
    The read procedure is only called at one position, and it is immediately
    clear where the value of variable c is modified and used. In the
    structured version there are two (redundant) read calls, and variable c
    is assigned twice with its value used textually before the second
    assignment.

    Nevertheless, most developers would agree that the structured version is
    better because the control flow is visible immediately and there is no
    danger of some external code jumping into the loop with unclear
    consequences.

Similarly, implementing an algorithm with iteration instead of
recursion could be simpler and clearer at first sight. Nevertheless, it
could actually be better to use the recursive version in a pure
functional programming language, because then one can be sure that
    (instances of) variables are assigned only once and there can be no
    unexpected behavior caused by uninitialized variables and side effects.
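
To make that concrete, the character-processing example above could be
written recursively in Haskell roughly like this (process is a
hypothetical stand-in for the pseudocode's process (c)); note that each
c is a fresh binding that is never reassigned:

import Control.Monad (unless)

-- hypothetical stand-in for the pseudocode's "process (c)"
process :: Char -> IO ()
process = putChar

-- read characters until a newline, processing each one;
-- every c is a fresh, single-assignment binding
processLine :: IO ()
processLine = do
  c <- getChar
  unless (c == '\n') $ do
    process c
    processLine

main :: IO ()
main = processLine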

[...] constructs, like C++ or Python. Do not force developers to use the
    functional approach when it is suboptimal, and trust them to pick
    the option that fits their problem domain best.

In my experience, you cannot trust developers to pick the best option:
    they will use whatever the programming languages offers.

    Richard

  • From Hadrien Grasland@21:1/5 to All on Sun Nov 29 03:12:22 2015
On Saturday, November 28, 2015 at 21:40:58 UTC+1, Paul Rubin wrote:
    For example, if you wanted to explain the factorial to a student who
    has never been exposed to the concept, chances are that your first
    approach wouldn't be to write on a blackboard

    "fact 1 = 1 -- [order of these two lines switched by me -Paul]
    fact n = n * fact (n-1)"

    But that's the mathematical definition of the factorial--this isn't
    grade school. I assume I'm explaining it to a professional trying to do high-assurance programming so of course that's the definition I'd use. Treating the above as Haskell code that purports to compute the
    factorial, it's easier to reason about than the imperative loop version:

    Let P(n) be the proposition "fact(n)=n!". We want to verify that P(n)
    is true for all n. We prove it by induction. First, P(1) is obviously
    true. Second, assume P(n) is true. So by substitution, fact(n+1) =
    (n+1) * fact(n). And since P(n) is true by the induction hypothesis, fact(n+1) = (n+1) * n! which is (n+1)!, which means we've proved P(n) => P(n+1). Therefore by induction, forall n. P(n), QED, so we've verified
    the code, and this was very easy.

    I'm out of my depth here, but I think to do the same thing with
    imperative code is much messier, you have to put preconditions and postconditions around every statement to prove the loop invariant that
    you're going through factorials, etc. I guess you can do that with
    SPARK but I don't know how it's done.

I'm not sure I understand. Here, you show an example of functional code verification which is done by hand, then you say that it would be much more difficult to do it automatically with imperative code. That looks like a comparison of apples to oranges to
me.

I can verify imperative code manually, too. In fact, that's one of my favorite debugging techniques: sitting down, taking a pen and a piece of paper, carefully reading through the code I've just written, manually going through a few iterations with a
    simplified input, and wondering if it does what I want.

    But if you want to go into automated verification, you need to somehow instruct the formal proof system about what you're trying to do, and this is the messy part. For simple functions like factorial, you will effectively end up duplicating your code in
    the proof system's own language for preconditions/postconditions/invariants.

    To the best of my knowledge, this is true no matter whether you're using a functional or imperative programming language. It's a fundamental limitation of formal proof (and possibly a reason why it has failed to become mainstream so far, outside of
    niches like critical systems).


    At the level of everyday coding though, going from loops to recursion is nothing to be bothered about, trust me. It's like when people freak out
    by whitespace as syntax when they first see Python, or by the
    parentheses when they first see Lisp. You get used to it quickly and it stops mattering. Or anyway, if it's really an obstacle, FP probably
    isn't for you.

As I said on CLA, I agree that recursive code is relatively easy to write. My concern is how easy to *read* it is after a couple of months of the developer working on another project, plus code bitrot.

    In imperative programming, recursion is just one tool in your toolbox among many others. When you meet a problem which recursion fits best, such as quicksort or tree traversal, you use recursion. When you encounter a problem which loops fit best, such as
    traversing a data structure while accumulating some intermediary state or mutating a data structure, you use loops. When neither approach has a clear advantage, you use loops because they are easier to understand.

    But in functional programming, it seems to me that people will often try to shoehorn recursion into problems which it doesn't fit well at all. This will result in complex code, where recursive functions have parameters which only serve to accumulate
    state changes, and where multi-step algorithms are implemented using complex nested trees of pattern matching, possibly further obscured by gratuitous use of higher-order functions.
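
To be fair, the well-behaved form of that pattern has a name. Here is a
sketch of the same accumulating loop written twice, first by hand and
then as the strict left fold it really is (using only Data.List):

import Data.List (foldl')

-- explicit accumulator threading, by hand
sumAcc :: [Int] -> Int
sumAcc = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- the same pattern, named
sumFold :: [Int] -> Int
sumFold = foldl' (+) 0

My complaint is about the cases which do not reduce to such a named pattern.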

It is this kind of process which I am criticizing: it results in code which is about as easy to read as imperative code where all control flow is done using goto statements, and I am absolutely not convinced that codebases doing this will remain
    readable once they aren't fresh in a developer's mind anymore.

    Hence my opinion that the best attitude is probably that of functional programming languages embracing imperative constructs, like OCaml, or imperative programming languages embracing functional constructs, like C++ or Python. Do not force developers to
    use the functional approach when it is suboptimal, and trust them to pick the option that fits their problem domain best.


    In any case, if you are using a machine which can only perform one
    binary operation at a time, you will always need an accumulator to
    compute the final result.

    I think that's an advanced, low level concept though. There's tons of
    magic in Javascript: JIT compilation, garbage collection, built-in
    objects that make RPC calls, etc. But there's zillions of
    not-so-experienced Javascript programmers writing crappy web pages all
over the world, who never think about that stuff and have never
    heard of an accumulator. A programmer with deep knowledge doing
    bleeding edge systems has to understand what the machine is doing, but someone just trying to bang out everyday tasks can usually stay at the
    high level. No accumulators.

    I'm sorry, I just cannot let Javascript be categorized as a beginner-friendly programming language. When a programming language lets you, without even an implementation warning, do atrocious things at runtime like rewriting the standard library or
    summing values of different types, it may give the illusion of ease of use, but painful debugging experience will quickly correct this false impression.

    Anyway, if you want to write even mildly efficient code, you need to have at least a basic idea of what the implementation is doing. Otherwise, you end up overloading the GC with thousands of throwaway objects, as in beginner-level Java code. Or you end
    up crashing your program with stack overflows as soon as the datasets become non-trivial, as in beginner-level functional code. Or you end up using plenty of indirection in your data access when you could have used fast tables instead. And so on.
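
The classic instance of that functional stack overflow, for the record;
the exact behavior depends on the GHC version and optimization flags,
but the principle stands:

import Data.List (foldl')

-- foldl builds ten million unevaluated thunks before forcing any of
-- them, which can exhaust the stack; foldl' forces the accumulator at
-- each step and runs in constant space
sumLazy, sumStrict :: Integer
sumLazy   = foldl  (+) 0 [1 .. 10 ^ 7]
sumStrict = foldl' (+) 0 [1 .. 10 ^ 7]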

    Not caring about the implementation is a luxury that is only available when your program requires far smaller compute capabilities than what the underlying machine+OS can provide. Though I will agree that it is certainly nice when you can do it :)


    I'm writing programs for massively parallel GPU processors, and last
    time I checked Blelloch's scan was still one of the most efficient algorithms

    That's outside my area and maybe a niche, but I know some Haskell and FP
    work has been done on this stuff (Data Parallel Haskell is inspired by Blelloch's NESL nested parallel work):

http://conal.net/blog/posts/parallel-tree-scanning-by-composition
http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf
https://wiki.haskell.org/GHC/Data_Parallel_Haskell/References

    There is also an array library with GPU support:

    http://hackage.haskell.org/package/accelerate

    Here, my point was not that it is impossible to use functional algorithms on GPUs, only that sometimes, mutable data structures and associated algorithms are the most effective way to achieve some goals on shared-memory architectures.

    For example, in this post that you mentioned...

    http://conal.net/blog/posts/parallel-tree-scanning-by-composition

    ...it takes the author 10 A4 pages of code (accompanied by a bit of discussion and mathematical concepts) in order to implement an algorithm which I can describe in one schematic, then implement in a few dozen lines of code which anyone familiar with
    imperative constructs can understand. I hope we can agree that the signal-to-noise ratio is not in favor of the functional approach here.

    The reason, I believe, can be found in one core tenet of eXtreme Programming and agile development in general: never use an algorithm which is more general than what you really need. Instead, generalize algorithms only when you find yourself doing
    something closely related to what you've done before.

It seems to me that more often than not, the functional approach is rather to take the most general abstraction that will possibly fit, then painstakingly go through the difficult process of shoehorning this complex abstraction into a simple problem. It
is reminiscent of how Java developers will sometimes design extremely complex class hierarchies to solve problems which are addressed by 10 lines of procedural code.

  • From Hadrien Grasland@21:1/5 to All on Mon Nov 30 01:02:15 2015
On Sunday, November 29, 2015 at 18:44:21 UTC+1, Richard wrote:
On 2015-11-29 at 12:12, Hadrien Grasland wrote:

    In imperative programming, recursion is just one tool in your
    toolbox among many others. When you meet a problem which recursion
    fits best, such as quicksort or tree traversal, you use recursion.
    When you encounter a problem which loops fit best, such as traversing
    a data structure while accumulating some intermediary state or
    mutating a data structure, you use loops. When neither approach has a
    clear advantage, you use loops because they are easier to
    understand.

But couldn't the same point be made when comparing structured control constructs with the goto statement? Some algorithms can be expressed in
    a simpler way using labels and goto statements.

For example, the following pseudocode shows how a stream of characters delimited by a new line character could be processed using structured
    control statements:

read (c)
WHILE c # new line DO
   process (c)
   read (c)
END

    Using labels and goto, it could be programmed like this:

loop:
   read (c)
   IF c = new line THEN GOTO end
   process (c)
   GOTO loop
end:

    At first sight, the second version is simpler and easier to understand:
    The read procedure is only called at one position, and it is immediately clear where the value of variable c is modified and used. In the
    structured version there are two (redundant) read calls, and variable c
    is assigned twice with its value used textually before the second
    assignment.

    Nevertheless, most developers would agree that the structured version is better because the control flow is visible immediately and there is no
    danger of some external code jumping into the loop with unclear
    consequences.

    I would say that the problem with goto is that it is usually too powerful as an abstraction. Every structured control flow primitive may be implemented on top of "if" and "goto", and many more styles of control flow can be implemented on top of goto (see
    e.g. how C programmers use goto as a replacement for exception handling). This also means, as a necessary consequence, that code using goto can do just about anything: jump in and out of variable scopes, bypass initialization instructions, etc. Which is,
    after all, goto's main problem.

    So for me, goto is like the C preprocessor: a dangerous and highly imperfect fix for language deficiencies, in this case missing control flow.

For example, in the case of the loop you outlined above, Ada proposes a nice solution in the form of loop + exit when. This allows you to tell a developer reading your code that the loop you are writing uses nonstandard loop control flow. Using this
primitive, an idiomatic Ada equivalent to your code would look like this:

loop
   read (c);
   exit when c = new line;
   process (c);
end loop;

    With this construct, you avoid code duplication in the first part of the loop, which can in some cases be longer than one instruction. And for cases where you really need to implement even more complex control flow in the form of nested loops, Ada also
    allows you to name your loops and specify which loop you are exiting.

    I wonder why other programming languages did not copy that idea, as it is surely a lot nicer than the mess of breaks, gotos and returns that can plague C-derived programming languages in similar situations.

    Anyway, my point is, if your programming language actually took the time to offer a good design for loop early exit, the power of goto is overkill, and its use is thus solely harmful to readability. I am not convinced that the same is true for all
    imperative constructs.


Similarly, implementing an algorithm with iteration instead of
recursion could be simpler and clearer at first sight. Nevertheless, it
could actually be better to use the recursive version in a pure
functional programming language, because then one can be sure that
(instances of) variables are assigned only once and there can be no unexpected behavior caused by uninitialized variables and side effects.

All popular compilers these days warn about code that reads from uninitialized variables, so one cannot really use an uninitialized variable without asking for it anymore. It's a former problem which has been (mostly) solved by toolchain improvements,
like warnings about incomplete pattern matching in functional languages.

    Conversely, initializing variables whose content is not available yet, as is somewhat suggested by programming textbooks and made mandatory by some programming languages, should be considered harmful IMO. It leads people to put nonsensical content inside
    of a variable "since it needs to be initialized", and thus bypass this extremely useful toolchain warning. The result is useless memory traffic and obscured code workflow during debugging.
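
Haskell's answer to this, for comparison, is to make "no value yet" a
state of its own rather than a dummy value. A small sketch with a
hypothetical settings lookup:

-- Nothing means "not available yet"; the type forces every caller to
-- handle that case instead of trusting a dummy initial value
findPort :: [(String, Int)] -> Maybe Int
findPort settings = lookup "port" settings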

    As for side effects, well, they are a necessary evil really: can you describe any useful and complete program (where complete means runnable by an average operating system) which can perform its work without any kind of external input and output?


[...] constructs, like C++ or Python. Do not force developers to use the functional approach when it is suboptimal, and trust them to pick
    the option that fits their problem domain best.

In my experience, you cannot trust developers to pick the best option:
    they will use whatever the programming languages offers.

    That can be true, but I also believe that no programming language or tool can stop careless developers. It has been tried again and again, and it has always failed.

    If you suppress global variables, people will replace them by a gigantic "state" record that is passed as a parameter of every function and returned as a result. Same end result, with more useless memory traffic (actually, "pure" OO programming languages
    like Java could sometimes be said to revolve around this :) ).

    If you suppress goto, break, and other forms of tricky control flow, people will misuse exception handling in order to get it back.

    If you make it mandatory for a function to declare which exceptions it may be throwing, people will end up either declaring that all functions can throw <insert exception base class here> or, worse, swallowing exceptions silently.

At some point, you always need to rely on developer education and trust your users in order to move forward. Tools can help a careful programmer make fewer mistakes, and notice them more quickly, but they cannot stop a programmer who is actively trying to
cheat the system and do careless things.
