Tinkerbell is a tool open sourced recently by Packet, an Equinix company, the company I work for.
It is a provisioner for bare metal. You can switch servers on and off via API, executing workflows and install operating systems on a server that does not have one!
Tinkerbell is in its early days as open source project but the concept is battle tested from 6 years of production use internally at Packet.
I am excited to learn a lot of the cool technologies that are making datacenters working, but I am not here to write about it1.
One of my recent tasks2 was about end to end testing the Vagrant Setup tutorial3 we wrote.
I like the idea! The Setup tutorial is important for our community because it is the entry point for a lot of people and having a consistent way to test its accuracy is crucial.
It is also a quick way to get a valuable end to end test running that covers the entire project, at a high level.
Tinkerbell is under development and it is easy to make mistakes and break things at this point, we have to know when it happens. Tinkerbell requires virtualisation capabilities, and we do not have an end to end testing framework for that yet.
Tell me more about the test itself
It is a long tasks but let’s summarize it (have a look at the tutorial, it helps to read this article moving forward):
- The script has to start a vagrant machine called provisioner
- When the provisioner is up it has to exec via ssh a docker-compose command that starts a bunch of containers, one of those is Tink grpc server
- When Tinkerbell is up and running we have to do a bunch of things like: a. Register a new hardware b. Create a template c. Create the workflow that will get executed in the worker from a template
- Start the worker
- Wait and check if the workflows executes as expected.
NOTE: the test should clean up after itself, Vagrant is not ideal to get parallelization of VMs, and we do not support it. A dirty environment will break future tests as it is today.
How to write this test
There are a million way to write end to end test the one I evaluated are bash and Go.
The project is in Go, Tinkerbell serves a gRPC server and a client, I thought it was a good idea to write everything in Go to try the client itself and because it is easier to coordinate long running actions with channels and context compared with bash for example. Or at least that’s what I think.
I can also keep the code inside the testing
framework that Go provides keeping
the test closer to the code and the developers that contribute to the project,
compared with a random scripts.sh
.
I am not sure if this will be useful in the future but one of my goal was to serve a clean API and a small framework that can be used to write other tests that starts from the Vagrant setup. This is the API I designed:
type Vagrant struct {}
func Up(ctx context.Context, opts ...VagrantOpt) (*Vagrant, error) {}
func (v *Vagrant) Destroy(ctx context.Context) error {}
func (v *Vagrant) Exec(ctx context.Context, args ...string) ([]byte, error) {}
Consistency is important, developers who knows vagrant or that will have to fix
the tests coming from the tutorial will know Up
, Destroy
and Exec
because
those verbs are used by Vagrant and in the documentation itself.
Even for Go developers Exec
is not a new function, os/exec
4 exists and it
does a similar job, the one I wrote is over ssh.
This library now has its own repository: gianarb/vagrant-go.
Go challenges and tips and tricks
I would like to share some of the challenges I faced when writing the Vagrant framework and some tips useful for this task.
Opt are great!
I have to say options are great! It is a well known pattern in Go and it translates to:
ctx := context.Background()
machine, err := vagrant.Up(ctx,
vagrant.WithLogger(t.Logf),
vagrant.WithMachineName("provisioner"),
vagrant.WithWorkdir("../../deploy/vagrant"),
)
if err != nil {
t.Fatal(err)
}
It allowed me to add new options and to tune the Vagrant struct with strong default. If you never used it, do it! It is pretty easy, you need an interface like this:
type VagrantOpt func(*Vagrant)
In this way you can write as many With
function you need:
func WithStderr(s io.ReadWriter) VagrantOpt {
return func(v *Vagrant) {
v.Stderr = s
}
}
func RunAsync() VagrantOpt {
return func(v *Vagrant) {
v.async = true
}
}
I execute the opts as part of the Up
function:
func Up(ctx context.Context, opts ...VagrantOpt) (*Vagrant, error) {
const (
defaultVagrantBin = "vagrant"
defaultName = "vagrant"
defaultWorkdir = "."
)
v := &Vagrant{
VagrantBinPath: defaultVagrantBin,
Name: defaultName,
Workdir: defaultWorkdir,
log: func(format string, args ...interface{}) {
fmt.Println(fmt.Sprintf(format, args))
},
}
for _, opt := range opts {
opt(v)
}
// ...
}
test segmentation with packages
I don’t want to run the vagrant end to end tests as part of the default test suite because they take too much time and they require Vagrant installed. They do not even run in CI in the same way unit test works, but I will get to it later.
I learned that packages that starts with _
does not get executed when using
something like ./...
.
I wrote the framework and tests as part of the package:
./test/_vagrant/
./vagrant.go
./vagrant_test.go
In this way to run the tests you have to explicitly call the package out:
$ go test ./test/_vagrant
Observability or “what is going on?”
Go has its own way to print logs during the execution of the tests:
$ go test -v ./...
It works because testing
has a function called t.Log
and t.Logf
. Those
functions watches the -v
flags. To be complaint with that and to keep the
Vagrant
struct agnostic I wrote a WithLogger
:
func WithLogger(log func(string, ...interface{})) VagrantOpt {
return func(v *Vagrant) {
v.log = log
}
}
The function it accepts as a argument is t.Logf
.
Continuous Integrations runs with verbosity enabled for this task because it is
long and complicated, the logging prints all the outputs from the vagrant up
and destroy
command, and the stdout for the exec
over ssh, it gives a very
good overview about what is going on.
Stdout and Stdin, buffer and loggers
I don’t have a lot to say about this other than: “it was very hard to do!!”. The code that fixed my problems can be summarized in this way:
stderrPipe, err := cmd.StderrPipe()
if err != nil {
return nil, fmt.Errorf("exec error: %v", err)
}
stdoutPipe, err := cmd.StdoutPipe()
if err != nil {
return nil, fmt.Errorf("exec error: %v", err)
}
go v.pipeOutput(ctx, fmt.Sprintf("%s stderr", cmd.String()), bufio.NewScanner(stderrPipe))
go v.pipeOutput(ctx, fmt.Sprintf("%s stdout", cmd.String()), bufio.NewScanner(stdoutPipe))
err = cmd.Start()
func (v *Vagrant) pipeOutput(ctx context.Context, name string, scanner *bufio.Scanner) {
for scanner.Scan() {
select {
case <-ctx.Done():
return
default:
v.log("[pipeOutput %s] %s", name, scanner.Text())
}
}
}
Kill process and subprocess
There are a lot of process going on when creating or destroying a VM with
Vagrant. There is VirtualBox for example, and we have an edge case for the
worker machine because the up
commands technically never ends, it is in
pending until you destroy
the machine. But you can’t run multiple commands
against the same machine because up
holds a lock and it blocks destroy
to
execute. os/exec
helps here but you have to tune it a little bit:
cmd := exec.CommandContext(ctx, v.VagrantBinPath, args...)
cmd.Dir = v.Workdir
cmd.Stdout = v.Stdout
cmd.Stderr = v.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
Now when killing cmd
the subprocess terminates as well.
Continuous Integration
We decided to go with GitHub Actions with a self running runner, in this way we can use Packet bare metal that supports virtualisation.
As I told you I don’t want this test to run for all the commit, or for all the pull request because it is time and resource consuming. It is also risky, so I want maintainers to decide when to trigger it.
That’s why it gets triggered with a GitHub label:
name: Setup with Vagrant on Packet
on:
push:
pull_request:
types: [labeled]
jobs:
vagrant-setup:
if: contains(github.event.pull_request.labels.*.name, 'ci-check/vagrant-setup')
runs-on: self-hosted
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Vagrant Test
run: |
export VAGRANT_DEFAULT_PROVIDER="virtualbox"
go test -v ./test/_vagrant
This is what it takes to make the process working!! And I am still surprised it
is so easy! When a contributor label a PR with ci-check/vagrant-setup
the
process starts. My idea was to remove the label straight away, but I am
blocked.
An alternative that we are evaluating is to run it as a cronjob5 as well.
Testing is the real power
E2E testing are fun to write because they bring a lot of challenges in terms of coordination and stability. You have to write good code in order to make them stable. I hope you learned something from my experience and if you have any question let me know here. I am happy to go deeper on some of those topics based on your suggestions.