{"id":638,"date":"2021-08-26T11:23:33","date_gmt":"2021-08-26T09:23:33","guid":{"rendered":"https:\/\/nubisoft.io\/blog\/?p=638"},"modified":"2023-05-22T12:31:15","modified_gmt":"2023-05-22T10:31:15","slug":"how-to-deal-with-non-reproducible-bugs-in-java-web-applications","status":"publish","type":"post","link":"https:\/\/nubisoft.io\/blog\/how-to-deal-with-non-reproducible-bugs-in-java-web-applications\/","title":{"rendered":"How to deal with non-reproducible bugs in Java web applications?"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Non-reproducible bugs is probably the most challenging <a rel=\"noreferrer noopener\" href=\"https:\/\/www.professionalqa.com\/non-reproducible-bug\" data-type=\"URL\" data-id=\"https:\/\/www.professionalqa.com\/non-reproducible-bug\" target=\"_blank\">category<\/a> of software bugs. To solve such kind of a problem you need to be like a Sherlock Holmes and use all your knowledge and experience to put all pieces of the puzzle together. In this blog post, we&#8217;ll show you how we squashed real-life bug from that category. <\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Introduction<\/h3>\n\n\n\n<p>In one of our <a href=\"https:\/\/nubisoft.io\/blog\/nubisoft-is-transforming-the-software-development-game-for-smes\/\" data-type=\"post\" data-id=\"1066\">healthcare projects<\/a> we&#8217;ve been integrating with a\u00a0government service for collection, analysis, and sharing of medical information about patients, and because it&#8217;s very sensitive data whole communication had to be encrypted and secured with digital certificates. Since it was a SOAP Web Service we decided to use Apache CXF with Merlin as a Crypto interface implementation because it&#8217;s a reliable and widely used tool in the Java world.<\/p>\n\n\n\n<p>Once the solution was delivered to the customers and deployed onto production in the next few months medical practitioners were using it in their everyday work without any issues. But one day our monitoring dashboard in New Relic started getting red. There were errors in communication with that government SOAP Web Service. Heavy cup of coffee and we were ready to start the investigation. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Investigation<\/h3>\n\n\n\n<p>Doing the logs analysis we&#8217;ve found the error:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">org.apache.wss4j.common.ext.WSSecurityException: The private key for the supplied alias does not exist in the keystore\n\tat org.apache.wss4j.common.crypto.Merlin.getPrivateKey(Merlin.java:767)\n\tat org.apache.wss4j.dom.message.WSSecSignature.computeSignature(WSSecSignature.java:548)\n\tat org.apache.wss4j.dom.action.SignatureAction.execute(SignatureAction.java:154)\n\tat org.apache.wss4j.dom.handler.WSHandler.doSenderAction(WSHandler.java:234)\n\tat org.apache.cxf.ws.security.wss4j.WSS4JOutInterceptor.access$100(WSS4JOutInterceptor.java:54)\n\tat org.apache.cxf.ws.security.wss4j.WSS4JOutInterceptor$WSS4JOutInterceptorInternal.handleMessageInternal(WSS4JOutInterceptor.java:261)\n\tat org.apache.cxf.ws.security.wss4j.WSS4JOutInterceptor$WSS4JOutInterceptorInternal.handleMessage(WSS4JOutInterceptor.java:135)\n\tat org.apache.cxf.ws.security.wss4j.WSS4JOutInterceptor$WSS4JOutInterceptorInternal.handleMessage(WSS4JOutInterceptor.java:122)\n\tat org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)\n\tat org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:514)\n\tat org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:423)\n\tat org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:324)\n\tat org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:277)\n\tat org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:96)\n\tat org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:139)\n\tat .....\nCaused by: java.security.UnrecoverableKeyException: Get Key failed: Given final block not properly padded. Such issues can arise \nif a bad key is used during decryption.\n\tat sun.security.pkcs12.PKCS12KeyStore.engineGetKey(PKCS12KeyStore.java:455)\n\tat java.security.KeyStore.getKey(KeyStore.java:1023)\n\tat org.apache.wss4j.common.crypto.Merlin.getPrivateKey(Merlin.java:756)\n\t... 193 common frames omitted\nCaused by: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise \nif a bad key is used during decryption.\n\tat com.sun.crypto.provider.CipherCore.unpad(CipherCore.java:975)\n\tat com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1056)\n\tat com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853)\n\tat com.sun.crypto.provider.PKCS12PBECipherCore.implDoFinal(PKCS12PBECipherCore.java:405)\n\tat com.sun.crypto.provider.PKCS12PBECipherCore$PBEWithSHA1AndDESede.engineDoFinal(PKCS12PBECipherCore.java:437)\n\tat javax.crypto.Cipher.doFinal(Cipher.java:2168)\n\tat sun.security.pkcs12.PKCS12KeyStore.lambda$engineGetKey$0(PKCS12KeyStore.java:396)\n\tat sun.security.pkcs12.PKCS12KeyStore$RetryWithZero.run(PKCS12KeyStore.java:285)\n\tat sun.security.pkcs12.PKCS12KeyStore.engineGetKey(PKCS12KeyStore.java:390)\n\t... 195 common frames omitted\n<\/pre>\n\n\n\n<p>As our application was deployed on <a href=\"https:\/\/nubisoft.io\/blog\/how-to-enable-kubernetes-cluster-autoscaling-in-aws\/\" data-type=\"post\" data-id=\"761\">Kubernetes with horizontal scaling<\/a> we checked other instances of that microservice and their corresponding logs. We were really surprised when realized that this signing error was occurring only on this specific instance. Moreover, the issue couldn\u2019t be replicated neither on development nor on the staging environment.<\/p>\n\n\n\n<p>So how could we find out the root cause of the issue?<\/p>\n\n\n\n<p>The only clue we had was the stack trace we found in logs, so we&#8217;ve taken a closer look at it again!<\/p>\n\n\n\n<p>Our investigation started by attaching the debugger to the Docker container. We&#8217;ve chosen  the following line to put a breakpoint at and it was really surprising when we actually didn\u2019t hit it:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">sun.security.pkcs12.PKCS12KeyStore.engineGetKey(PKCS12KeyStore.java:390<\/pre>\n\n\n\n<p>So essentially we discovered that there was a different execution path on our local machine. To confirm that and find the diverge point we were debugging the whole stack trace step by step. As we can see from logs, the exception was thrown at <\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">org.apache.wss4j.common.crypto.Merlin.getPrivateKey(Merlin.java:756) <\/pre>\n\n\n\n<p>so we stopped the debugger at this line and investigated the state of program variables. It was a good call as we noticed that <em>KeyStoreSpi<\/em> implementation was different from what we observed in the stack trace. But why was it different?<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized is-style-default\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/nubisoft.io\/blog\/wp-content\/uploads\/2021\/08\/Screenshot-from-2021-08-02-16-48-08.png\" alt=\"\" class=\"wp-image-662\" width=\"1656\" height=\"169\" srcset=\"https:\/\/nubisoft.io\/blog\/wp-content\/uploads\/2021\/08\/Screenshot-from-2021-08-02-16-48-08.png 1656w, https:\/\/nubisoft.io\/blog\/wp-content\/uploads\/2021\/08\/Screenshot-from-2021-08-02-16-48-08-300x31.png 300w, https:\/\/nubisoft.io\/blog\/wp-content\/uploads\/2021\/08\/Screenshot-from-2021-08-02-16-48-08-1024x105.png 1024w, https:\/\/nubisoft.io\/blog\/wp-content\/uploads\/2021\/08\/Screenshot-from-2021-08-02-16-48-08-768x78.png 768w, https:\/\/nubisoft.io\/blog\/wp-content\/uploads\/2021\/08\/Screenshot-from-2021-08-02-16-48-08-1536x157.png 1536w, https:\/\/nubisoft.io\/blog\/wp-content\/uploads\/2021\/08\/Screenshot-from-2021-08-02-16-48-08-1200x122.png 1200w\" sizes=\"auto, (max-width: 1656px) 100vw, 1656px\" \/><\/figure>\n\n\n\n<p>The answer was hidden inside the <em>sun.security.jca.ProviderList.getService(String type, String name) <\/em>method which source you can see below:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"java\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">    \/**\n     * Return a Service describing an implementation of the specified\n     * algorithm from the Provider with the highest precedence that\n     * supports that algorithm. Return null if no Provider supports this\n     * algorithm.\n     *\/\n    public Service getService(String type, String name) {\n        for (int i = 0; i &lt; configs.length; i++) {\n            Provider p = getProvider(i);\n            Service s = p.getService(type, name);\n            if (s != null) {\n                return s;\n            }\n        }\n        return null;\n    }<\/pre>\n\n\n\n<p>This method returns the first Java KeyStore provider that supports the claimed <em>pkcs12<\/em> algorithm, however, the repetitive order of the provider list is not guaranteed. In our case, we registered <a rel=\"noreferrer noopener\" href=\"https:\/\/www.bouncycastle.org\/\" data-type=\"URL\" data-id=\"https:\/\/www.bouncycastle.org\/\" target=\"_blank\">Bouncy Castle<\/a> as an additional provider, but sometimes it wasn&#8217;t registered at index 0 and that was causing problems.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Solution<\/h3>\n\n\n\n<p>Fortunately, with the help of <a href=\"https:\/\/ws.apache.org\/wss4j\/config.html\">Apache WSS4J documentation<\/a> we&#8217;ve found a solution, wherein you can set KeyStore provider explicitly.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"java\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">        Properties sig_props = new Properties()\n        sig_props.put(\"org.apache.ws.security.crypto.merlin.keystore.provider\", 'BC')\n        \/\/ another signature properties...\n\n        Map&lt;String, Object> outProps = new HashMap&lt;String, Object>()\n        outProps.put(WSHandlerConstants.ACTION, WSHandlerConstants.SIGNATURE)\n        outProps.put(\"properties\", sig_props)\n        outProps.put(WSHandlerConstants.SIG_PROP_REF_ID, \"properties\")\n        \/\/ another out interceptor properties properties...\n\n        Endpoint cxfEndpoint = ((Client) wsClient).getEndpoint()\n        cxfEndpoint.getOutInterceptors().add(new WSS4JOutInterceptor(outProps))<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p>Troubleshooting non-reproducible bugs in Java-based web applications it&#8217;s a hard task. It may consume a lot of time and make you very frustrated. Remembering few basic rules may save you hours of work and few white hairs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>gather and analyze application logs<\/li>\n\n\n\n<li>monitor your application at runtime<\/li>\n\n\n\n<li>read stack traces carefully<\/li>\n\n\n\n<li>debugger is your friend!<\/li>\n<\/ul>\n\n\n\n<p>Happy debugging!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Non-reproducible bugs is probably the most challenging category of software bugs. To solve such kind of a problem you need to be like a Sherlock Holmes and use all your knowledge and experience to put all pieces of the puzzle together. In this blog post, we&#8217;ll show you how we squashed real-life bug from that [&hellip;]<\/p>\n","protected":false},"author":7,"featured_media":661,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_case_study_excerpt":"","footnotes":""},"categories":[221,3],"tags":[213,215,210,212,216],"class_list":["post-638","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-healthcare","category-software-development","tag-apache-wss4j","tag-pkcs12-signature","tag-platforma-p1","tag-soap-signature","tag-ws-signature"],"_links":{"self":[{"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/posts\/638","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/comments?post=638"}],"version-history":[{"count":34,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/posts\/638\/revisions"}],"predecessor-version":[{"id":1261,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/posts\/638\/revisions\/1261"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/media\/661"}],"wp:attachment":[{"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/media?parent=638"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/categories?post=638"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/tags?post=638"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}