Categories
Docker Kubernetes

Kubernetes Upgrade fails with timeout

What the heck? The latest upgrade of my Kubernetes cluster gave me headaches – not only because it failed with a timeout, but mainly because the root cause was not obvious. In fact, the Kubernetes maintainers made an infrastructure change a long time ago but failed to communicate it properly to their users.

But before we start the rant, let’s check what happened: I tried to upgrade from v1.18.2 to v1.18.14. This happened:

timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests

So I re-ran the upgrade with verbosity turned on. No additional information. What I saw was that the kube-apiserver wouldn’t come up – and no log file gave a reason why. I asked Google – very little information, but one hint: the image pull could have been failing.

Another search revealed that the Kubernetes maintainers had changed their repository from gcr.io/google_containers to k8s.gcr.io – presumably a long time ago. Checking my cluster more thoroughly, I found that the old repository was still being used. But why did my cluster not know about the new one? I had upgraded through every version since the beginning.

The next search was for information on how to change it – nothing in the Kubernetes docs (WTF!), but a hint in some change request: you need to edit the kubeadm-config ConfigMap in your kube-system namespace. There you’ll find the repository address. Changing it to the correct name finally did the trick and the upgrade succeeded.
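For the record, this is roughly how to inspect and fix the setting – a sketch only; in v1.18 the relevant key is imageRepository inside the ClusterConfiguration of that ConfigMap:

# show the current kubeadm cluster configuration
kubectl -n kube-system get configmap kubeadm-config -o yaml

# edit it and change the imageRepository entry
# from gcr.io/google_containers to k8s.gcr.io
kubectl -n kube-system edit configmap kubeadm-config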

But the more I think about this challenge, the angrier I get.

  1. How can such an essential change not be communicated more prominently – especially since the old repository was abandoned with v1.18.6, the last image version in the old repo? Every upgrade document since v1.18 should carry a warning that the old repo is out of order, along with a link to the change procedure.
  2. Why does the error message not tell anything useful? The stack trace says nothing about what actually happened.
  3. And why – for God’s sake – does the upgrade procedure itself not check for this essential change? Especially since v1.18.7.

This way of maintaining software is very unprofessional. Kubernetes is the foundation of so many production systems now that such an essential change must be taken more seriously by the maintainers. Breaking the upgrade procedure endangers all these systems, and neither proper communication nor risk mitigation was in place.

I need to stress that upgrading Kubernetes is always risky. I have experienced many issues in the past that blocked an upgrade. Most of them were better documented, so I could resolve them. But this infrastructure change is a sign of poor risk management, and I hope they will do much better next time.

Categories
Applications Bugzilla for Java CSV Compiling Miscellaneous Projects RS Library RsBudget Templating

The End of Atlassian JIRA

Atlassian announced the end of their various licensed stand-alone products. This heavily affects several of my projects, especially my Open Source projects. That’s why I am now preparing the migration away from Atlassian products. Of course, I could stay with Atlassian by using their Cloud offerings. I like their products, as they address my needs like no other products on the market. However, there are some downsides to staying with Atlassian:

  • A migration of all existing data is unavoidable and will cost time and effort. It is not clear whether the Cloud product configuration would match my needs.
  • My Open Source projects are already code-hosted by GitHub. That’s why GitHub is the natural migration target for them.
  • I want to have full control over my CI/CD pipelines. A cloud Bamboo solution would take away a lot of freedom, and I am not sure whether the various secrets I require during build and development would stay on my servers and only there.
  • Long-term availability of issues and documentation is not guaranteed should I ever be forced to abandon projects.

So, all issue trackers have been migrated to GitHub as of today. The JIRA server has been shut down. Please refer to the respective GitHub repositories in case you need to report an issue or require support for any of my projects.

Bamboo will be migrated to Jenkins. However, I am experiencing some performance issues when starting Jenkins, so this migration will take a while longer. This does not affect your activities when using any of the projects.

I deeply regret having to take this decision and would have loved to stay with Atlassian JIRA and Bamboo.

Categories
Kubernetes

Kubernetes Service names in HELM templates

Based on my previous post, here is a snippet that correctly produces the full DNS name of a service in the cluster from within the same namespace.

{{/*
Makes a full hostname from the given string if it's not one already or an IP address.
Attaches ".<namespace>.svc.cluster.local" to the end and includes the release name if required.
Please note that you need to call this template with (dict "Context" . "Value" "your-value")
*/}}
{{- define "prefix.serviceName" -}}
{{- if include "prefix.isIpAddress" .Value }}
    {{- print .Value }}
{{- else -}}
    {{- $parts := splitList "." .Value -}}
    {{- if gt (len $parts) 1 -}}
        {{- print .Value }}
    {{- else -}}
        {{- if eq .Context.Chart.Name .Context.Release.Name -}}
            {{- printf "%s.%s.svc.cluster.local" .Value .Context.Release.Namespace }}
        {{- else -}}
            {{- printf "%s-%s.%s.svc.cluster.local" .Context.Release.Name .Value .Context.Release.Namespace }}
        {{- end -}}
    {{- end -}}
{{- end -}}
{{- end -}}

Please note that using the template is a bit more cumbersome due to some Go templating limitations:

serviceName-anIpAddress:  {{ include "prefix.serviceName" (dict "Context" . "Value" "1.2.3.4") }}
serviceName-anIpAddress2: {{ include "prefix.serviceName" (dict "Context" . "Value" "1.0.3.4") }}
serviceName-NoIpAddress:  {{ include "prefix.serviceName" (dict "Context" . "Value" "1.2.3.4.5") }}
serviceName-NoIpAddress2: {{ include "prefix.serviceName" (dict "Context" . "Value" "hello") }}
serviceName-NoIpAddress3: {{ include "prefix.serviceName" (dict "Context" . "Value" "hello.svc") }}
serviceName-NoIpAddress4: {{ include "prefix.serviceName" (dict "Context" . "Value" "hello.svc.cluster.local") }}
serviceName-NoIpAddress5: {{ include "prefix.serviceName" (dict "Context" . "Value" "1") }}

The template needs access to the root context, which is why the dict function is used to pass both the context and the actual, plain service name.

Feel free to adjust the function if you need another namespace as an argument – a sketch of such a variant follows below.
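A minimal sketch of that adjustment, assuming a hypothetical optional "Namespace" key in the dict (release-name prefixing is left out for brevity):

{{/*
Variant: like prefix.serviceName, but an optional "Namespace" entry
in the dict overrides the release namespace.
*/}}
{{- define "prefix.serviceNameNs" -}}
{{- $ns := default .Context.Release.Namespace .Namespace -}}
{{- if include "prefix.isIpAddress" .Value -}}
    {{- print .Value -}}
{{- else if gt (len (splitList "." .Value)) 1 -}}
    {{- print .Value -}}
{{- else -}}
    {{- printf "%s.%s.svc.cluster.local" .Value $ns -}}
{{- end -}}
{{- end -}}

It would then be called like {{ include "prefix.serviceNameNs" (dict "Context" . "Value" "hello" "Namespace" "other-ns") }}.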

Categories
Kubernetes

HELM template to detect IP address

I needed to detect whether the content of a variable is an IP address or not. The function is probably not perfect, but it fulfills the basic need:

{{/*
Test if the given value is an IP address
*/}}
{{- define "prefix.isIpAddress" -}}
{{- $rc := . -}}
{{- $parts := splitList "." . -}}
{{- if eq (len $parts) 4 -}}
    {{- range $parts -}}
        {{- if and (not (atoi .)) (ne . "0") -}}
            {{- $rc = "" -}}
        {{- end -}}
    {{- end -}}
{{- else -}}
    {{- $rc = "" -}}
{{- end -}}
{{- print $rc }}
{{- end -}}

The function at least detects these values correctly:

{{ include "prefix.isIpAddress" "1.2.3.4" }}
{{ include "prefix.isIpAddress" "1.0.3.4" }}
{{ include "prefix.isIpAddress" "1.2.3.4.5" }}
{{ include "prefix.isIpAddress" "hello" }}
{{ include "prefix.isIpAddress" "hello.svc" }}
{{ include "prefix.isIpAddress" "hello.svc.tld.com" }}
Categories
Kubernetes

IPv6 with Kubernetes

Awwww – so much work I had put into setting up a Kubernetes cluster (this blog will run there in a few days). I set up the pods and containers, cron jobs, services, and so on. Then I started renewing my SSL certificates from LetsEncrypt. This renewal failed – hilariously, with nothing but this terse error message:

Timeout

What? I can reach my websites. Did I miss something? I checked connectivity: the IP addresses were right, the ACME challenge directory was reachable as required by LetsEncrypt, and DNS was working properly. So why couldn’t the LetsEncrypt servers reach my cluster? I soon found out that they prefer IPv6 over IPv4 – I had both enabled. But the IPv6 connection failed, from everywhere. Ping6, though, succeeded.
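My quick checks looked roughly like this (the hostname is a placeholder):

# ICMPv6 works ...
ping6 -c 3 www.example.com

# ... but a TCP connection over IPv6 times out
curl -6 -v http://www.example.com/.well-known/acme-challenge/test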

Further analysis revealed that Kubernetes was not able to expose IPv6 services at all (at least not at the time of my research). What to do now? All my work was based on the assumption that both IPv4 and IPv6 would be available – but with Kubernetes they are not. Of course, I could move my reverse proxy out of Kubernetes and put it in front of the cluster. But that would mean more work, as all the automation scripts for LetsEncrypt would need to be rebased and tested again and again. Leaving aside the disadvantage of no longer having everything self-contained in containers. There had to be another solution.

Luckily, there was an easy solution: socat. It’s a small Linux tool that copies network traffic from one socket to another. So it was quickly set up with a systemd unit (sock_80.service):

[Unit]
Description=socat Service 80
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/bin/socat -lf /var/log/socat80.log TCP6-LISTEN:80,reuseaddr,fork,bind=[ip6-address-goes-here] TCP4:ip4-address-goes-here:80
Restart=on-abort

[Install]
WantedBy=multi-user.target

That’s it. I enabled the unit (systemctl enable sock_80.service), reloaded systemd (systemctl daemon-reload), and started the service (systemctl start sock_80). Voilà! IPv6 traffic is now forwarded to IPv4. I repeated the same for port 443 and the setup was done. And the LetsEncrypt servers are happy too 🙂
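For reference, the commands in one block (sock_443.service is assumed to be a copy of the unit above with both ports changed to 443):

systemctl daemon-reload
systemctl enable sock_80.service
systemctl start sock_80.service

# repeat for HTTPS
systemctl enable sock_443.service
systemctl start sock_443.service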

Categories
Apache Linux Perl

How to automate LetsEncrypt

A new service is born: Let’s Encrypt. It offers free SSL certificates that you can use for web servers, email servers, or whatever service you want to secure with TLS. This blog post presents my strategy for automating certificate creation and renewal. Please install Let’s Encrypt on your web server box before you start following the presented strategy.

The key to success is to have Let’s Encrypt run without any further interaction. I use webroot authentication, which allows me to leave the productive web service up and running while the certificates are being issued or renewed. Therefore, I created a file named “myserver.ini” in the folder /etc/letsencrypt. This configuration file contains all details required for the certification process:

rsa-key-size = 4096
authenticator = webroot
webroot-path = /path/to/webroot/
server = https://acme-v01.api.letsencrypt.org/directory
renew-by-default = True
agree-tos
email = <my-email-address>
domains = domain1.com, domain2.com

The second component of my strategy is the central piece: a script called “renewCertificates.pl”:

#!/usr/bin/perl
 
my $DOMAINS = {
    'myserver' => {
        'configFile' => '/etc/letsencrypt/myserver.ini',
        'leSubDir'   => 'domain1.com',
        'certDir'    => '/var/www/domain1.com/certs',
    },
};
 
my $domain;
my $renewed = 0;
chdir ('/usr/local/letsencrypt');
foreach $domain (keys(%{$DOMAINS})) {
    print "INFO  - $domain - START\n";
    my $cmd = '/usr/local/scripts/checkCertExpiry.sh 30 '.$DOMAINS->{$domain}->{'certDir'}.'/cert.pem >/dev/null';
    my $rc = system($cmd);
    if ($rc) {
        $cmd = './letsencrypt-auto certonly --config '.$DOMAINS->{$domain}->{'configFile'}.' --renew-by-default';
        $rc = system($cmd);
        if (!$rc) {
            $cmd = 'cp /etc/letsencrypt/live/'.$DOMAINS->{$domain}->{'leSubDir'}.'/* '.$DOMAINS->{$domain}->{'certDir'}.'/';
            $rc = system($cmd);
            if ($rc) {
                print "ERROR - $domain - Cannot deploy\n";
            } else {
                print "INFO  - $domain - Deployed\n";
                $renewed = 1;
            }
        } else {
            print "ERROR - $domain - Cannot generate certificates\n";
        }
    } else {
        print "INFO  - $domain - Certificate does not expire within 30 days\n";
    }
    print "INFO  - $domain - END");
}
 
if ($renewed) {
   system("/etc/init.d/apache2 reload");
}
 
exit 0;

This script allows renewal of multiple certificates by supporting multiple configurations, which are described in the $DOMAINS hash at the top of the script. leSubDir is the subdirectory that Let’s Encrypt creates during the certification process – it is the name of the first domain specified in the configuration file, here: domain1.com. certDir is the target path the certificates will be deployed to.

A second script supports this procedure by telling whether a certificate will expire within a certain number of days (see the checkCertExpiry.sh call in the script above):

#!/bin/bash
 
# First parameter: number of days within which the certificate may expire
DAYS=$1

# Second parameter: path to the certificate file
target=$2
if [ ! -f "$target" ]; then
    echo "Certificate does not exist (RC=2)"
    exit 2;
fi
 
openssl x509 -checkend $(( 86400 * $DAYS )) -enddate -in "$target" >/dev/null 2>&1
expiry=$?
if [ $expiry -eq 0 ]; then
    echo "Certificate will not expire (RC=0)"
    exit 0
else
    echo "Certificate will expire (RC=1)"
    exit 1
fi

This script returns 0 when the given certificate will not expire within the given period; otherwise it returns a non-zero value. The Perl script above therefore renews certificates only within 30 days before their expiration.
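Called manually, it behaves like this (using the paths from the Perl script above):

/usr/local/scripts/checkCertExpiry.sh 30 /var/www/domain1.com/certs/cert.pem
echo $?   # 0 = valid for more than 30 days, 1 = expires within 30 days, 2 = file missing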

The last piece is the Apache configuration to be used for these domains (it belongs in the respective VirtualHost section):

    SSLEngine on
    SSLCertificateFile /var/www/domain1.com/certs/cert.pem
    SSLCertificateKeyFile /var/www/domain1.com/certs/privkey.pem
    SSLCertificateChainFile /var/www/domain1.com/certs/fullchain.pem

I run the central Perl script above daily and do not need to worry about certificates anymore 🙂
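A matching cron entry could look like this (script path and schedule are my assumptions – adjust them to your installation):

# /etc/cron.d/renew-certificates
30 3 * * * root /usr/local/scripts/renewCertificates.pl >>/var/log/renewCertificates.log 2>&1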

Categories
Eclipse Java RsBudget

RsBudget 2.0 released

It’s done. My first official Eclipse/RCP application is out. RsBudget is an expense tracker for everyone. I have been developing it for three years while constantly using it for private purposes – that’s how it grew to the functionality it has today: I simply used the previous versions to feel and learn what was missing. Now it’s up to you to tell me what needs to be done next (a few tasks are already waiting ;)).

The application still misses some features, e.g. nice graphical statistics. But I don’t regard them as must-haves so far. They will be added in future versions; some will be available as commercial add-ons later.

The main features are:

  • General Expense Planning
  • Monthly Expense Planning, Tracking and Control
  • Categorization of expenses
  • Comparison of planned and actual values
  • Free text field for personal notes for each month
  • Forecasting of balances and profit/loss
  • Statistics and History
  • Export of transaction records to Excel and CSV
  • Multi-language support (English and German)
  • Online Help
  • Online Update

RsBudget runs on all major desktop platforms (Windows, MacOS, Linux) with Java 7 installed. Just download your version here!

Categories
CSV Eclipse Java RS Library

Eclipse RCP Common Feature launched

Good news for all Eclipse developers who want to use some of my projects in their own Eclipse/RCP applications. I bundled several modules and projects into a Luna Eclipse feature plug-in – called RCP Common Feature.

You will need to add the update site http://download.ralph-schuster.eu/rcp-updates/luna/releases/ in your IDE and install the feature as you would with any other Eclipse feature plug-in.

These are the modules and projects currently bundled:

Furthermore, there are three more plug-ins available, specific to Eclipse/E4 UI and logging. The feature plug-in is released under the LGPL V3 license (as are all projects bundled in it).

Categories
Upload Maven Plugin

Plugin to Publish Eclipse P2 Repositories

I am currently working on publishing my first RCP application, based on Eclipse/E4 (Kepler). One of the major topics is automating the complete build and publish process. Tycho already does a good job there – in fact, it works perfectly :). I set up my Bamboo instance to build the application without any interaction.

However, Tycho does not yet offer a way to publish the final P2 repositories. I understand this, as publishing can be a difficult job when it comes to the various P2 repository flavours (combined, single etc.). The Maven 3 Deploy plugin is not suited for this job either, as far as I have discovered. So I faced the problem of publishing snapshots and final releases automatically. The net didn’t come up with any automated solution; most folks just recommended performing that step manually.

So I created the Upload Files Maven Plugin. Its job is to upload any file(s) to the repository valid for an artifact. Users simply define their repositories as they did previously (in the <distributionManagement> section of their POM) and then add the plugin to their lifecycle. Here is an example of how to do this:

   <build>
      <plugins>
         <plugin>
            <groupId>eu.ralph-schuster</groupId>
            <artifactId>uploadfiles-maven-plugin</artifactId>
            <version>1.1.0</version>
            <executions>
               <execution>
                  <goals>
                     <goal>upload</goal>
                  </goals>
                  <phase>deploy</phase>
               </execution>
            </executions>
            <configuration>
               <path>target/repository</path>
               <targetPath>.</targetPath>
            </configuration>
         </plugin>
         ...
      </plugins>
      ...
   </build>

This will upload the P2 repository (created by Tycho) to your defined server repository. As the plugin uses Wagon for this task, you can use protocols such as SCP, FTP, WebDAV etc.
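For completeness, the plugin picks up the target server from the usual repository definition – id, host and path below are placeholders:

   <distributionManagement>
      <repository>
         <id>p2-server</id>
         <url>scp://example.com/var/www/p2/releases</url>
      </repository>
   </distributionManagement>

Depending on your Maven version, SCP may additionally require the wagon-ssh extension in your build.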

You might also want to disable the default install and deploy goals of Maven for your project:

   <properties>
      <maven.install.skip>true</maven.install.skip>
      <maven.deploy.skip>true</maven.deploy.skip>
   </properties>

One nice feature is the execution of pre and post commands on your remote server (tested with SCP only; other protocols might not support this). This can be used, e.g., to clean the server repository before uploading the new one:

            <configuration>
               ...
               <preCommands>
                  <preCommand>rm -rf /path/to/repository/*</preCommand>
               </preCommands>
            </configuration>

These commands are executed in the user’s home directory, so please be careful :). There are several ways to handle errors of such commands – just check out the Goal documentation. It also lists other options you might find useful.

Version 1.1.1 will add a small variable substitution for commands, e.g. you can access the user name and base path of your server repository. This would allow a configuration such as:

            <configuration>
               ...
               <preCommands>
                  <preCommand>rm -rf $repository.basepath/*</preCommand>
               </preCommands>
            </configuration>

However, if you want to use that feature now, you must use the 1.1.1-SNAPSHOT version. It is not officially released yet, as I still want to gain more experience with publishing P2 repositories before finally releasing it.

Feedback is welcome…

PS: Maven 3.1.1 is required to use this plugin!
PPS: Version 1.1.1 has meanwhile been released.

Categories
Typo3

Typo3 and RealURL

I run various sites with Typo3 and the RealURL extension. Mysteriously, it sometimes happens that frontend links stop working after updating Typo3 itself or any extension. Diagnosing the problem is complicated because RealURL behaves in a very hidden and strange way. Even worse: there seems to be no rule as to which links can be decoded and which cannot.

The main things you have to do if you get the “segment X is not a postVarSet variable” error:

  1. Make sure you are logged off from the backend
  2. Make sure you give RealURL a chance to fill the path cache by clicking through your frontend from top to bottom!

What happens internally is this: once the path cache has been emptied, it will only be refilled while you are logged off from the backend. I have no idea why RealURL implements this behaviour, but there is a special condition check before the cache update that makes sure the user is not logged into the backend.

Furthermore, the cache is only updated when “encoding” takes place (ID-to-path translation). That means you will get the postVarSet error when you request a page from your frontend before RealURL has encoded its link. That’s why you have to click “down” your website from top to bottom, so that all links are created before they are ever requested by your browser.

So next time, just follow these two simple rules and your RealURL extension will behave as expected.