• How to escape a relative URL string with arguments for usage in a HTTP

    From R.Wieser@21:1/5 to All on Wed Nov 24 09:53:59 2021
    Hello all,

    As in the subject line : I've got a relative URL path with arguments in it,
    and I need its special chars escaped so it can be used in an HTTP GET line
    (example: "/some+part/another+part?arg1=foo+42%23brown&arg2=bar+fly").
    The thing is that I can't seem to find an API function for it.

    The ShlwApi UrlEscape function doesn't even seem to want to touch the
    arguments on a full URL, and the WinInet InternetCreateUrl function does not want to function with only the path and arguments parts being provided.

    Looking at what docs.microsoft.com says about them does not show any leads either.

    In other words, does someone know which function I'm supposed to use to
    create an escaped relative URL with arguments ?

    Or am I supposed to (again) just roll my own ...

    Regards,
    Rudy Wieser

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Charlie Gibbs@21:1/5 to R.Wieser on Wed Nov 24 16:57:28 2021
    On 2021-11-24, R.Wieser <address@not.available> wrote:

    Hello all,

    As in the subject line : I've got a relative URL path with arguments in it,
    and I need its special chars escaped so it can be used in an HTTP GET line (example: "/some+part/another+part?arg1=foo+42%23brown&arg2=bar+fly"). The thing is that I can't seem to find an API function for it.

    The ShlwApi UrlEscape function doesn't even seem to want to touch the arguments on a full URL, and the WinInet InternetCreateUrl function does not want to function with only the path and arguments parts being provided.

    Looking at what docs.microsoft.com says about them does not show any leads either.

    In other words, does someone know which function I'm supposed to use to create an escaped relative URL with arguments ?

    Or am I supposed to (again) just roll my own ...

    Maybe I'm different in that I've been writing string parsing code
    for decades, but for me it's much faster to roll my own than go
    through all the stuff you've described above (and you still haven't
    found a solution yet).

    Don't be afraid to do your own parsing. It's often simpler than
    figuring out how to use some proprietary API. And it's actually
    kind of fun once you get into it.

    Besides, you might want your program to run on a Linux box someday...

    --
    /~\ Charlie Gibbs | Microsoft is a dictatorship.
    \ / <cgibbs@kltpzyxm.invalid> | Apple is a cult.
    X I'm really at ac.dekanfrus | Linux is anarchy.
    / \ if you read it the right way. | Pick your poison.

  • From R.Wieser@21:1/5 to All on Wed Nov 24 20:09:50 2021
    Charlie,

    Maybe I'm different in that I've been writing string parsing
    code for decades, but for me it's much faster to roll my own
    than go through all the stuff you've described above (and you
    still haven't found a solution yet).

    :-) Yes, rolling my own encoding will most likely work ... I think. But
    then I will need to spend a /lot/ of time figuring out the requirements[1],
    and, after implementing it, testing the result.

    [1] Currently I can't even seem to find if a space char must be
    percent-escaped or should become a "+" character. The mentioned functions
    do the former, FF here does the latter ...

    Also, I still consider myself to be a hobbyist and assume that those MS guys are /way/ better at their jobs. Even though I've been disappointed a few
    times in that regard before and now again I still can't shake that feeling.
    Go figure ...

    Don't be afraid to do your own parsing. It's often simpler than
    figuring out how to use some proprietary API.

    In this case the usage of the mentioned functions is not the problem, as
    that can be found at docs.microsoft.com . The problem is that they simply don't
    do what I think they should be doing. It often feels as if they write their functions with a certain goal in mind, not at all to be general purpose ...

    And it's actually kind of fun once you get into it.

    Yes, it is. But at moments I would like to be "lazy" and just use an available, standard function for it.

    Besides, you might want your program to run on a Linux box someday...

    Alas, I don't think that that will ever happen. I'm writing Assembly using Win32 API functions, and neither translates well.

    But yes, I might one day write something like it on a Linux box. I hope a Raspberry Pi counts as one ? :-)

    Regards,
    Rudy Wieser

  • From Tavis Ormandy@21:1/5 to R.Wieser on Wed Nov 24 20:14:59 2021
    On 2021-11-24, R.Wieser wrote:
    In other words, does someone know which function I'm supposed to use to create an escaped relative URL with arguments ?

    I think it would be helpful if you gave an example of the output you
    wanted and the input you have.

    It sounds like you're looking for InternetCombineUrlA (with
    an empty lpszBaseUrl), but it seems unlikely you knew about
    InternetCreateUrl and not that one... :)

    Tavis.


    --
    _o) $ lynx lock.cmpxchg8b.com
    /\\ _o) _o) $ finger taviso@sdf.org
    _\_V _( ) _( ) @taviso

  • From R.Wieser@21:1/5 to All on Wed Nov 24 23:52:35 2021
    Tavis,

    I think it would be helpful if you gave an example of the output
    you wanted

    Blimey, I thought I did : [quote]example: "/some+part/another+part?arg1=foo+42%23brown&arg2=bar+fly"[/quote]

    and the input you have.

    I think you misunderstood : I do not have some specific input, I need arbitrary inputs to be correctly (combined and) escaped into valid relative URLs
    with arguments.

    It sounds like you're looking for InternetCombineUrlA
    (with an empty lpszBaseUrl)

    I saw it but ignored it as I assumed it would just combine. But no, it
    escapes too. Alas, it has the same flaw UrlEscape has : it ignores
    everything after the (first) question mark.

    Ohhh... Interesting - Using InternetCombineUrlA :

    Input: "/some#part/other stuff?arg=my cat# a fragment"
    Output: "/some?arg=my cat# a fragment#part/other stuff"

    Notice that the part from the first hash up to the question mark (*not* the following slash) is moved to the end of the string. Of course, now that the moved part is to the right of the question mark, it's not escaped either.

    Also unexpected output when I provide the relative path as the first, and
    the argument string as the second one :

    Part1: "/some?part/other part"
    Part2: "?arg1=my dog"
    Output: "/some?arg1=my dog"

    Notice that from "part1" everything after the question mark has just
    disappeared (without an error), and that the space in "part2" is still unescaped.


    Also consider the cases below, all of which produce garbage output when InternetCreateUrl is used (which accepts the path part and arguments as separate strings, which /should/ make lots of stuff easier) :

    Path: "some?folder"
    Args: "?arg1=my dog"
    Fail: first question mark is not escaped.

    Path: "some folder"
    Args: "?arg=my?dog"
    Fail: last question mark is not escaped.

    Path: "some folder"
    Args: "#fragment"
    Fail: hash mark *is* escaped when at that position it shouldn't be.

    Path: "some folder"
    Args: "arg=dog"
    Fail: the "Args" part is directly concatenated to the "Path" part (without
    the insertion of a question mark).

    I hope that's enough info. :-)

    Regards,
    Rudy Wieser

  • From Tavis Ormandy@21:1/5 to R.Wieser on Thu Nov 25 03:15:10 2021
    On 2021-11-24, R.Wieser wrote:
    Tavis,

    I think it would be helpful if you gave an example of the output
    you wanted

    Blimey, I thought I did : [quote]example: "/some+part/another+part?arg1=foo+42%23brown&arg2=bar+fly"[/quote]

    Sure, but the point was it wasn't clear if that is what you *want* or
    what you *have*. If it is what you *want*, then what do you have?

    Your reply makes it clear that this is what you *want* - good - but
    you didn't give an example of what you have.


    and the input you have.

    I think you misunderstood : I do not have some specific input, I need arbitrary inputs to be correctly (combined and) escaped into valid relative URLs with arguments.

    Umm.. I think that's obvious :)

    What I don't understand is why you can't show what specific input you
    have that should produce the specific output above? e.g. "I have a
    cracked URL in a URL_COMPONENTS, lpszUrlPath is /foobar/".

    It sounds like you're looking for InternetCombineUrlA
    (with an empty lpszBaseUrl)

    I saw it but ignored it as I assumed it would just combine. But no, it escapes too. Alas, it has the same flaw UrlEscape has : it ignores
    everything after the (first) question mark.

    Ohhh... Interesting - Using InternetCombineUrlA :

    Input: "/some#part/other stuff?arg=my cat# a fragment"
    Output: "/some?arg=my cat# a fragment#part/other stuff"

    Notice that the part from the first hash up to the question mark (*not* the following slash) is moved to the end of the string. Of course, now that the moved part is to the right of the question mark, it's not escaped either.

    It's helpful that you showed the input you have, but again, it
    would be helpful if you show the output you *wanted* too :)

    Is the URL already cracked, i.e. you know which part is a fragment and a
    query?

    Tavis.

    --
    _o) $ lynx lock.cmpxchg8b.com
    /\\ _o) _o) $ finger taviso@sdf.org
    _\_V _( ) _( ) @taviso

  • From JJ@21:1/5 to R.Wieser on Thu Nov 25 10:01:48 2021
    On Wed, 24 Nov 2021 20:09:50 +0100, R.Wieser wrote:

    [1] Currently I can't even seem to find if a space char must be percent-escaped or should become a "+" character. The mentioned functions do the former, FF here does the latter ...

    Whether "+" is accepted as an escaped space depends on the server script, i.e. not every server (script) supports "+" as a space escape.
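    To illustrate the point, here is a minimal sketch in Python (not the Win32 assembly discussed in this thread) using the standard urllib.parse module. It only shows how the two space conventions behave with a decoder that knows about "+" and one that doesn't; which of the two a given server script uses is an assumption you would have to check.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "my cat"

# Two ways to escape the same space character.
percent_form = quote(value, safe="")  # space becomes %20
plus_form = quote_plus(value)         # space becomes +

# A decoder that only understands percent-escapes leaves "+" alone.
strict_from_plus = unquote(plus_form)
# A form-aware decoder turns "+" back into a space.
form_from_plus = unquote_plus(plus_form)

# "%20" survives both kinds of decoder unchanged in meaning.
strict_from_percent = unquote(percent_form)
form_from_percent = unquote_plus(percent_form)

print(percent_form, plus_form)
print(repr(strict_from_plus), repr(form_from_plus))
print(repr(strict_from_percent), repr(form_from_percent))
```

    Under these assumptions, "%20" is the form that decodes to a space either way, while "+" only does so when the receiving side applies form decoding.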

  • From R.Wieser@21:1/5 to All on Thu Nov 25 09:17:27 2021
    JJ,

    Whether "+" is accepted as an escaped space depends on the server script,
    i.e. not every server (script) supports "+" as a space escape.

    What I did was fully client-side, using FireFox 52.

    Though what I posted came from an HTML form element, where spaces in the
    path were indeed percent encoded, but spaces in the argument part were encoded as plus signs.

    When I entered the same in the URL bar all spaces got percent encoded - but none of the plus signs I introduced got encoded, regardless of whether they were in the path
    or argument parts.

    Not really consistent behaviour, :-(

    I've also got absolutely no idea how a webserver / future me is supposed to deal with that ...
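    One way a receiving side could deal with it, sketched in Python with the standard urllib.parse module. The function name decode_request_target is made up for this example, and the sketch assumes the query part follows HTML form conventions (where "+" means space) while the path does not.

```python
from urllib.parse import urlsplit, unquote, parse_qsl

def decode_request_target(target):
    """Split a relative request target and decode each part by its own rules.

    Hypothetical helper: the path is decoded with plain percent-decoding,
    the query with form decoding ("+" is treated as a space).
    """
    parts = urlsplit(target)
    path = unquote(parts.path)  # "+" in the path stays a literal plus
    args = parse_qsl(parts.query, keep_blank_values=True)
    return path, args

path, args = decode_request_target(
    "/some%20part/a+b?arg1=foo+42%23brown&arg2=1%2B1"
)
print(path)
print(args)
```

    With these rules a literal plus sign in an argument has to arrive as "%2B", otherwise it comes out as a space.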

    Regards,
    Rudy Wieser

  • From R.Wieser@21:1/5 to All on Thu Nov 25 11:15:56 2021
    Tavis,

    Sure, but the point was it wasn't clear if that is what you *want* or
    what you *have*.

    Neither. What I *want* is to send a GET followed by a string that can
    again be split into its basic parts by a webserver. What I *have* is two strings, one containing the URL path, and the other the arguments and/or fragment.

    Umm.. I think that's obvious :)

    As you asked for a "before encoding" example - which doesn't really exist -
    I wasn't so sure about that anymore.

    What I don't understand is why you can't show what specific input
    you have that should produce the specific output above?

    Perhaps that is because I have not yet found a function which will actually, you know, encode the whole URL ? :-p

    What I showed is what I gathered, from the information I googled, the result of the encoding could/would look like. I did that encoding by hand,
    starting with
    "/some part/another part?arg1=foo 42#brown&arg2=bar fly".

    It's helpful that you showed the input you have, but again, it
    would be helpful if you show the output you *wanted* too :)

    That's the wrong question.

    It's not about what *I* want, but what the rules (whatever they are) decree
    it should look like - so it can be broken up by the webserver into the exact same parts as provided by the client program.

    But I'll try :

    Input: "/some#part/other stuff?arg=my cat# a fragment"
    Output: "/some?arg=my cat# a fragment#part/other stuff"

    Expected output ? Either an error because a fragment cannot be part of the first part, or something like this : "/some%23part/other+stuff?arg=my+cat#+a+fragment"

    Another example :
    Input: "/part1#part2/part3#part4?arg1=data1#part5&arg2=data2#part6"

    This one is problematic : how the <beep> do I know if that last "#part6" is
    a fragment, or just a part of "arg2" ? I've not found anything even
    wanting to touch that question ...

    But, other than just throwing an error because of ambiguity, there are at
    least three outputs I can think of :

    Naive:
    "/part1%23part2/part3%23part4?arg1=data1%23part5&arg2=data2%23part6"

    Making a guess: "/part1%23part2/part3%23part4?arg1=data1%23part5&arg2=data2#part6"

    Fragment combining (a-la InternetCombineUrlA): "/part1/part3?arg1=data1&arg2=data2#part5#part4#part2#part6"

    The first one will cause problems if the last part was actually meant as a fragment. In the same way, the second one causes a problem when it was meant
    as part of "arg2". The same goes for the third one (the "#part2" could be
    part of the folder name). Also, I've not seen any indication that multiple fragments are allowed or used anywhere.

    The only solution I see is to provide the path, argument and fragment as separate strings, so that all three can be encoded /before/ gluing them together (using their respective delimiters).
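    A minimal sketch of that encode-first, glue-afterwards approach, written in Python with the standard urllib.parse module rather than Win32 assembly. The function name build_relative_url and its parameters are invented for this example. It assumes the arguments arrive as (name, value) pairs rather than one raw string (otherwise "&" and "=" would be ambiguous too), and it uses "%20" for spaces everywhere.

```python
from urllib.parse import quote, urlencode

def build_relative_url(path, params=None, fragment=None):
    """Encode path, arguments and fragment separately, then join them.

    Hypothetical helper for illustration only.
    """
    # Encode each path segment on its own: "/" stays a delimiter, while
    # "?", "#" and spaces inside a segment are percent-escaped.
    url = "/".join(quote(segment, safe="") for segment in path.split("/"))
    if params:
        # quote_via=quote gives "%20" for spaces instead of "+".
        url += "?" + urlencode(params, quote_via=quote)
    if fragment is not None:
        url += "#" + quote(fragment, safe="")
    return url

print(build_relative_url(
    "/some part/another part",
    [("arg1", "foo 42#brown"), ("arg2", "bar fly")],
))
print(build_relative_url("/some#part/other stuff", [("arg", "my cat")], " a fragment"))
```

    Because every "?" and "#" that belongs to the data is escaped before the delimiters are added, the first unescaped "?" and "#" in the result can only be the ones the function inserted itself.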

    Is the URL already cracked, i.e. you know which part is a fragment
    and a query?

    I've been ninja-ed :-)

    No, I do not *know*. But I do know that determining which is what is
    already a problem ...

    "InternetCrackUrl" doesn't separate them either. Whatever starts with
    a "?" or "#" is returned in the "extra" part.
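    For comparison, a short Python sketch of what a generic splitter does with the ambiguous example from earlier in this message. It uses urlsplit from the standard urllib.parse module, which treats the first "#" as the start of the fragment and only then looks for a "?" in what is left. This is shown as one possible reading, not as what the Win32 functions do.

```python
from urllib.parse import urlsplit

raw = "/part1#part2/part3#part4?arg1=data1#part5&arg2=data2#part6"
parts = urlsplit(raw)

# Everything after the first "#" ends up in the fragment, including the
# "?" and the later "#" characters; no query is found at all.
print("path:    ", parts.path)
print("query:   ", parts.query)
print("fragment:", parts.fragment)
```

    So a splitter can always produce *an* answer, but it cannot tell whether that answer matches what the sender meant.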

    By the way, FireFox doesn't really know either. When entering hash symbols
    into the "input" boxes of an HTML "form" element the hash symbols get
    percent encoded.

    But put them into the "action" part, and everything starting from them will, for that part, just disappear.

    The same happens when hash symbols are used in the URL bar - even though the URL bar does not reflect that throwing-away change. <whut?>

    Damn .. It looked to be so easy, just finding the right function. :-)

    Regards,
    Rudy Wieser

  • From Tavis Ormandy@21:1/5 to R.Wieser on Thu Nov 25 15:39:08 2021
    On 2021-11-25, R.Wieser wrote:
    It's helpful that you showed the input you have, but again, it
    would be helpful if you show the output you *wanted* too :)

    That's the wrong question.

    It's not about what *I* want, but what the rules (whatever they are) decree
    it should look like - so it can be broken up by the webserver into the exact same parts as provided by the client program.

    I know how URL encoding works ;-)

    I was asking those questions for a reason. Imagine that you had the
    string "/foo/bar?a=x#z", you could encode this a bunch of different
    ways.

    "/foo/bar?a=x%23z" -- ?a=x#z is the query, there is no fragment
    "/foo/bar?a=x" -- ?a=x is the query, #z was a fragment (removed, not sent to server)
    "/foo/bar%3fa=x" -- bar?a=x was part of the path, #z was a fragment
    "/foo/bar%3fa=x%23z" -- everything was a (weird) path component
    etc.

    For each of these cases, you need to know what the components are, so I
    asked you what you have and what you want. For example, if you got this
    URL from a user, then it needs to be cracked first - if you generated
    it, then the encoding needs to happen *before* you create it, etc, etc.
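    The different readings of "/foo/bar?a=x#z" can be reproduced with a small Python sketch using the standard urllib.parse module. The variable names are made up for this example, and keeping "=" and "&" unescaped is a choice made here to match the strings listed above, not a rule. Python emits uppercase hex digits, so "%3F" here corresponds to "%3f" above.

```python
from urllib.parse import quote, urlsplit

raw = "/foo/bar?a=x#z"

# Reading 1: "a=x#z" is the query and there is no fragment.
path, query = raw.split("?", 1)
reading1 = quote(path) + "?" + quote(query, safe="=&")

# Reading 2: standard split, "a=x" is the query and "z" is a fragment.
# The fragment is not sent to the server, so it is dropped.
parts = urlsplit(raw)
reading2 = quote(parts.path) + "?" + quote(parts.query, safe="=&")

# Reading 3: "bar?a=x" is a path segment and "z" is a fragment.
path_only, _fragment = raw.split("#", 1)
reading3 = quote(path_only, safe="/=")

# Reading 4: the whole string is one (weird) path.
reading4 = quote(raw, safe="/=")

for label, value in [
    ("query, no fragment", reading1),
    ("query + fragment", reading2),
    ("path + fragment", reading3),
    ("all path", reading4),
]:
    print(f"{label:20} {value}")
```

    The input string is identical in all four cases; only the decision about which characters are delimiters changes, and that decision has to come from outside the string.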

    I think I've run out of patience for this, sorry!

    Tavis.

    --
    _o) $ lynx lock.cmpxchg8b.com
    /\\ _o) _o) $ finger taviso@sdf.org
    _\_V _( ) _( ) @taviso

  • From R.Wieser@21:1/5 to All on Thu Nov 25 18:23:18 2021
    Tavis,

    I was asking those questions for a reason. Imagine that you had
    the string "/foo/bar?a=x#z", you could encode this a bunch of
    different ways.

    I know. That is what I tried to make clear in my previous message with
    those examples.

    For example, if you got this URL from a user, then it needs to
    be cracked first

    /How/ I get the involved parts is fully outside the scope. My only interest was-and-is how to combine them.

    And yes, I did think about that too, as I tried to make clear here :

    The only solution I see is to provide the path, argument and fragment
    as separate strings, so that all three can be encoded /before/ gluing
    them together (using their respective delimiters).

    I think I've run out of patience for this, sorry!

    My apologies for not allowing you to muddy the waters. I've had bad experiences (multiple) with that (responders running off pursuing
    their own ideas, never actually addressing the stated problem).

    Thanks for trying anyway.

    I think I can conclude that encoding a URL is a mess. Not because it's difficult but because, as both of us have shown, it can be done in too many ways.

    Regards,
    Rudy Wieser

  • From R.Wieser@21:1/5 to All on Mon Nov 29 20:59:56 2021
    In other words, does someone know which function I'm supposed to use to create an escaped relative URL with arguments ?

    Or am I supposed to (again) just roll my own ...

    Hmmm... After having tried to come up with something, I have to admit my defeat : there is simply no single solution possible that will cover all use-cases.

    The ShlwApi UrlEscape function doesn't even seem to want to touch the arguments on a full URL

    But at least I figured out why it doesn't want to do that.

    Oh well. Another lesson of "looks to be easy - but it isn't".

    Regards,
    Rudy Wieser
