• Generate hash from canonicalized xml

    From sky@21:1/5 to All on Mon Aug 17 06:25:29 2015
    I am trying to generate a hash to sign an XML document using xmllint and openssl on Linux. These are the requirements:

    * canonicalise (c14n) the XML document
    * generate a 160-bit binary secure hash from the canonicalised XML
    using the SHA-1 algorithm
    * encode the binary data using base-64 to produce a 28-character string
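
    The last two requirements (SHA-1, then base-64) can be checked separately from the canonicalisation step. A minimal Python sketch, assuming the canonical bytes are already in hand; `sha1_base64` is a hypothetical helper name used only for illustration:

```python
import base64
import hashlib


def sha1_base64(canonical_bytes: bytes) -> str:
    """Return the base-64 text of the 20-byte SHA-1 digest of the input."""
    digest = hashlib.sha1(canonical_bytes).digest()  # 160 bits = 20 bytes
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    sample = b"<Body></Body>"  # placeholder input, not the real document
    encoded = sha1_base64(sample)
    print(encoded, len(encoded))
```

    A 20-byte digest always encodes to 28 base-64 characters ending in a single "=", so a result of any other length suggests the hex form of the digest was encoded instead of the binary form.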

    The hash should be calculated over the <Body> of the document, excluding the <IRmark> element. The hash for the example XML file is HH5368bVAEspgZvoWiL21hI76rs=, but I am not able to reproduce it using xmllint and openssl. Here are the steps I'm using:

    # Remove <IRmark>
    sed -i '/^[ \t]*<IRmark*/d' fps.xml

    # The hash should only be on the Body and excluding IRmark
    xmllint --xpath "//*[name()='Body']" --c14n fps.xml | openssl dgst -binary -sha1 | openssl enc -base64

    The XML file is tab-indented, so you will need to use xmllint to format it before calculating the hash. XMLLINT_INDENT is set to a tab (ctrl-v, tab).

    $ XMLLINT_INDENT=$'\t' xmllint --format input.xml > fps.xml

    I am getting a hash of 8IcGWQYYJO3IvuahyOORRCZgmXs=,
    but this is not correct; it should be HH5368bVAEspgZvoWiL21hI76rs=

    <?xml version="1.0" encoding="UTF-8"?>
    <GovTalkMessage xmlns="http://www.govtalk.gov.uk/CM/envelope">
    <EnvelopeVersion>2.0</EnvelopeVersion>
    <Header>
    <MessageDetails>
    <Class>HMRC-PAYE-RTI-FPS</Class>
    <Qualifier>request</Qualifier>
    <Function>submit</Function>
    <CorrelationID/>
    <Transformation>XML</Transformation>
    <GatewayTest>1</GatewayTest>
    </MessageDetails>
    <SenderDetails>
    <IDAuthentication>
    <SenderID>ISV635</SenderID>
    <Authentication>
    <Method>clear</Method>
    <Role>principal</Role>
    <Value>testing1</Value>
    </Authentication>
    </IDAuthentication>
    </SenderDetails>
    </Header>
    <GovTalkDetails>
    <Keys>
    <Key Type="TaxOfficeNumber">635</Key>
    <Key Type="TaxOfficeReference">A635</Key>
    </Keys>
    <TargetDetails>
    <Organisation>IR</Organisation>
    </TargetDetails>
    <ChannelRouting>
    <Channel>
    <URI>Your 4 digit vendor ID</URI>
    <Product>Your product name</Product>
    <Version>Your product version</Version>
    </Channel>
    <Timestamp>2016-03-20T12:00:00</Timestamp>
    </ChannelRouting>
    </GovTalkDetails>
    <Body>
    <IRenvelope xmlns="http://www.govtalk.gov.uk/taxation/PAYE/RTI/FullPaymentSubmission/15-16/1">
    <IRheader>
    <Keys>
    <Key Type="TaxOfficeNumber">635</Key>
    <Key Type="TaxOfficeReference">A635</Key>
    </Keys>
    <PeriodEnd>2016-04-05</PeriodEnd>
    <DefaultCurrency>GBP</DefaultCurrency>
    <IRmark Type="generic">HH5368bVAEspgZvoWiL21hI76rs=</IRmark>
    <Sender>Employer</Sender>
    </IRheader>
    <FullPaymentSubmission>
    <EmpRefs>
    <OfficeNo>635</OfficeNo>
    <PayeRef>A635</PayeRef>
    <AORef>635PC00000000</AORef>
    <ECON>E3567891A</ECON>
    <COTAXRef>1111111111</COTAXRef>
    </EmpRefs>
    <RelatedTaxYear>15-16</RelatedTaxYear>
    <Employee>
    <EmployeeDetails>
    <NINO>AB164231A</NINO>
    <Name>
    <Ttl>Mr</Ttl>
    <Fore>Alan</Fore>
    <Sur>Example</Sur>
    </Name>
    <Address>
    <Line>1 The Lane</Line>
    <Line>Shipley</Line>
    <Line>West Yorkshire</Line>
    <UKPostcode>BD17 2AD</UKPostcode>
    </Address>
    <BirthDate>1996-10-28</BirthDate>
    <Gender>M</Gender>
    </EmployeeDetails>
    <Employment>
    <Starter>
    <StartDate>2015-04-08</StartDate>
    <StartDec>B</StartDec>
    </Starter>
    <PayId>123-A03</PayId>
    <FiguresToDate>
    <TaxablePay>1445.00</TaxablePay>
    <TotalTax>283.40</TotalTax>
    </FiguresToDate>
    <Payment>
    <BacsHashCode>1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef</BacsHashCode>
    <PayFreq>M1</PayFreq>
    <PmtDate>2015-06-30</PmtDate>
    <MonthNo>3</MonthNo>
    <PeriodsCovered>1</PeriodsCovered>
    <HoursWorked>E</HoursWorked>
    <TaxCode>410L</TaxCode>
    <TaxablePay>815.00</TaxablePay>
    <PayAfterStatDedns>702.36</PayAfterStatDedns>
    <TaxDeductedOrRefunded>94.40</TaxDeductedOrRefunded>
    </Payment>
    <NIlettersAndValues>
    <NIletter>M</NIletter>
    <GrossEarningsForNICsInPd>815.00</GrossEarningsForNICsInPd>
    <GrossEarningsForNICsYTD>2260.00</GrossEarningsForNICsYTD>
    <AtLELYTD>1443.00</AtLELYTD>
    <LELtoPTYTD>546.00</LELtoPTYTD>
    <PTtoUAPYTD>271.00</PTtoUAPYTD>
    <UAPtoUELYTD>0.00</UAPtoUELYTD>
    <TotalEmpNICInPd>0.00</TotalEmpNICInPd>
    <TotalEmpNICYTD>0.00</TotalEmpNICYTD>
    <EmpeeContribnsInPd>18.24</EmpeeContribnsInPd>
    <EmpeeContribnsYTD>32.52</EmpeeContribnsYTD>
    </NIlettersAndValues>
    </Employment>
    </Employee>
    <Employee>
    <EmployeeDetails>
    <Name>
    <Ttl>Mr</Ttl>
    <Fore>John</Fore>
    <Fore>Edward</Fore>
    <Sur>Surname</Sur>
    </Name>
    <Address>
    <Line>45 High Street</Line>
    <Line>Gosforth</Line>
    <Line>Newcastle upon Tyne</Line>
    <Line>Tyne and Wear</Line>
    <UKPostcode>NE1 7XF</UKPostcode>
    </Address>
    <BirthDate>1964-05-11</BirthDate>
    <Gender>M</Gender>
    </EmployeeDetails>
    <Employment>
    <PayId>123-A02</PayId>
    <IrrEmp>yes</IrrEmp>
    <FiguresToDate>
    <TaxablePay>3750.00</TaxablePay>
    <TotalTax>1687.50</TotalTax>
    </FiguresToDate>
    <Payment>
    <BacsHashCode>ef1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcd</BacsHashCode>
    <PayFreq>M1</PayFreq>
    <PmtDate>2015-06-20</PmtDate>
    <LateReason>H</LateReason>
    <MonthNo>3</MonthNo>
    <PeriodsCovered>1</PeriodsCovered>
    <HoursWorked>B</HoursWorked>
    <TaxCode>D1</TaxCode>
    <TaxablePay>1250.00</TaxablePay>
    <PayAfterStatDedns>627.83</PayAfterStatDedns>
    <TaxDeductedOrRefunded>562.50</TaxDeductedOrRefunded>
    </Payment>
    <NIlettersAndValues>
    <NIletter>A</NIletter>
    <GrossEarningsForNICsInPd>0.00</GrossEarningsForNICsInPd>
    <GrossEarningsForNICsYTD>2500.00</GrossEarningsForNICsYTD>
    <AtLELYTD>962.00</AtLELYTD>
    <LELtoPTYTD>364.00</LELtoPTYTD>
    <PTtoUAPYTD>1174.00</PTtoUAPYTD>
    <UAPtoUELYTD>0.00</UAPtoUELYTD>
    <TotalEmpNICInPd>0.00</TotalEmpNICInPd>
    <TotalEmpNICYTD>162.02</TotalEmpNICYTD>
    <EmpeeContribnsInPd>0.00</EmpeeContribnsInPd>
    <EmpeeContribnsYTD>140.88</EmpeeContribnsYTD>
    </NIlettersAndValues>
    <NIlettersAndValues>
    <NIletter>D</NIletter>
    <SCON>S1111111M</SCON>
    <GrossEarningsForNICsInPd>1250.00</GrossEarningsForNICsInPd>
    <GrossEarningsForNICsYTD>1250.00</GrossEarningsForNICsYTD>
    <AtLELYTD>481.00</AtLELYTD>
    <LELtoPTYTD>182.00</LELtoPTYTD>
    <PTtoUAPYTD>587.00</PTtoUAPYTD>
    <UAPtoUELYTD>0.00</UAPtoUELYTD>
    <TotalEmpNICInPd>54.86</TotalEmpNICInPd>
    <TotalEmpNICYTD>54.86</TotalEmpNICYTD>
    <EmpeeContribnsInPd>59.67</EmpeeContribnsInPd>
    <EmpeeContribnsYTD>59.67</EmpeeContribnsYTD>
    </NIlettersAndValues>
    </Employment>
    </Employee>
    </FullPaymentSubmission>
    </IRenvelope>
    </Body>
    </GovTalkMessage>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter Flynn@21:1/5 to sky on Sun Aug 30 21:57:36 2015
    On 08/17/2015 02:25 PM, sky wrote:
    > I am trying to generate a hash to sign an XML document using xmllint and openssl on Linux. These are the requirements:
    >
    > * canonicalise (c14n) the XML document
    > * generate a 160-bit binary secure hash from the canonicalised XML
    >   using the SHA-1 algorithm
    > * encode the binary data using base-64 to produce a 28-character string

    As well as canonicalisation, I would turn all TABs and newlines to
    spaces, get rid of all white-space between element tags, and then
    subset what is left, e.g.

    xmllint --c14n fps.xml | tr '\011\012' '\040\040' |\
    sed -e "s+>[\ ]*<+><+g" | lxgrep Body - | lxreplace -q IRmark -d

    You can get lxgrep and lxreplace from the LTxml2 toolkit at https://www.ltg.ed.ac.uk/software/ltxml2/

    > The XML file is tab-indented, so you will need to use xmllint to
    > format it before calculating the hash. XMLLINT_INDENT is set to a tab (ctrl-v, tab).

    Getting rid of the white-space in element content avoids this problem.
    It's not something you would want to do if you don't know whether
    any particular element might in fact contain mixed content (in the
    absence of a schema or DTD declaration, the parser cannot tell), but if
    you know from your business rules that no element does, this can save a
    lot of aggravation.
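
    The effect of the tr and sed stages, and the mixed-content caveat, can be seen on two small strings. This is only a rough Python imitation of those two stages (not of xmllint, lxgrep or lxreplace); `squeeze_markup_whitespace` is a made-up name:

```python
import re


def squeeze_markup_whitespace(xml_text: str) -> str:
    """Flatten TABs and newlines to spaces, then drop space-only gaps between tags."""
    flattened = xml_text.translate(str.maketrans("\t\n", "  "))
    return re.sub(r">[ ]*<", "><", flattened)


if __name__ == "__main__":
    # Element-only content: the indentation disappears, the data is untouched.
    print(squeeze_markup_whitespace(
        "<Name>\n\t<Ttl>Mr</Ttl>\n\t<Fore>Alan</Fore>\n</Name>"))
    # Mixed content: the space between </b> and <i> is significant but is lost.
    print(squeeze_markup_whitespace("<p>see <b>this</b> <i>now</i></p>"))
```

    The second call shows why this is only safe when the business rules rule out mixed content: the words "this" and "now" end up adjacent.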

    Be careful of generating hashes from streams that do or do not end with
    a newline character: they will be different.
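
    A quick way to see the difference, sketched in Python with a placeholder string standing in for the real canonical output (the helper name `sha1_base64` is an assumption for illustration):

```python
import base64
import hashlib


def sha1_base64(data: bytes) -> str:
    """Base-64 text of the binary SHA-1 digest of data."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


# Placeholder input; the real input would be the canonicalised <Body>.
without_newline = b"<Body></Body>"
with_newline = without_newline + b"\n"

if __name__ == "__main__":
    print(sha1_base64(without_newline))
    print(sha1_base64(with_newline))
```

    The two printed values differ, so it is worth checking (for example with a hex dump) whether the last tool in a pipeline emits a final newline before the data reaches the hash.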

    ///Peter
