Gianluca Arbezzano - GianArb2024-03-10T18:48:25+00:00https://gianarb.itGianluca Arbezzanohttps://gianarb.itgianarb92@gmail.comLinkerd jumped on the bandwagonBuoyant is the company https://gianarb.it/img/gianarb.png2024-02-22T06:10:10+00:002024-02-22T06:10:10+00:00https://gianarb.it/blog/linkerd-jumped-on-the-bandwagon<p>I am not here to say whether a service mesh is useful or not; I am sure it depends.
The <a href="https://www.techtarget.com/searchitoperations/news/366570820/Linkerd-service-mesh-production-users-will-soon-have-to-pay">press “Linkerd service mesh production users will soon have to
pay”</a>
reports that we have lost another opensource project, or at least that there is
new drama in town.</p>
<p>Nothing bad, but it comes at the perfect time, right alongside the meme we are
all looking at <a href="https://www.reddit.com/r/github/comments/1at9br4/i_am_new_to_github_and_i_have_lots_to_say/">these days</a>. The timing is so perfect that I had to check the
calendar to make sure it was not April 1st.</p>
<p><img src="/img/i-am-new-to-github-i-have-alot-to-say.png" alt="I am new to github and I have a lot to say" /></p>
<p>If you want your exe you will need to pay Buoyant for that!</p>
<p>I am being too shallow: only the stable releases will be behind Buoyant! In
practice they are doing what everyone else does: they used GitHub as a platform
for category creation, community building, and so on. Now it is time for the
company to monetize.</p>
<p>Nothing to be surprised about: this is the evolution of the opensource
ecosystem, or at least that’s what companies founded by VCs and relying on
GitHub for marketing want us to believe. We have all worked for some of those
in the last 10 years.</p>
<p>Since I do opensource and I value this ecosystem, I am wondering how such a
decision can be taken and communicated so poorly; the definition of opensource
per se should prevent situations like this. How can a company on its own define
and change the release management for an opensource project? The company is not
even named after the opensource project!</p>
<p>Can a single company say something like this? What are the contributors and
maintainers doing? At this point, is Linkerd sustainable as an opensource
project beyond Buoyant? Probably not.</p>
<p>The Cloud Native Computing Foundation vouches for Linkerd; thankfully they
gather and organize stats about their opensource projects. <a href="https://linkerd.devstats.cncf.io/d/5/companies-table?orgId=1">So let’s have a
look</a>: in 2023
they counted 128,856 contributions to the project, 112,721 from the same
company, Buoyant Inc., followed by 1,400 contributions made by independent
contributors, 484 from the CNCF and 313 from Microsoft Inc. I won’t calculate
the percentage because it paints an unpleasant picture of opensource in my
opinion.</p>
<p>Anyway, this can be an opportunity for you if you are one of those independent
contributors. I spent a couple of minutes reading the <a href="https://news.ycombinator.com/item?id=39459102">HackerNews
thread</a> spawned by this press
release, and here are some numbers:</p>
<blockquote>
<p>Their new offering (BEL) would be around $14k/mo for our org (though they
say discounts are available), with 90 days notice. That’s a rather large
chunk of change I didn’t request in our 2024 budget, for a cost-category
that didn’t exist before.</p>
</blockquote>
<p>If you want to market yourself as an alternative to Buoyant, helping companies
stay on top of their Linkerd installation, you just have to ask for less than
2k USD per cluster, and with that you are helping Linkerd stay opensource while
avoiding vendor lock-in.</p>
<p>NOTE: this is not legal advice; I am not sure if you can do it, nowadays
opensource licenses are madness.</p>
<p>Open your editor, smash out some HTML and CSS: now is the time to use that
service mesh related domain you bought a few years back!</p>
How I discover new codebasesStrategies I use when I want to contribute to new codebaseshttps://gianarb.it/img/gianarb.png2023-10-24T06:10:10+00:002023-10-24T06:10:10+00:00https://gianarb.it/blog/how-to-contribute-to-new-codebases<p><img src="/img/you-gen-a-new-job-no-docs-read-the-code-meme.jpg" alt="" /></p>
<p>Today is my time to be honest with you. I think this meme describes a lot of
places and codebases that I had to deal with or that I contributed to.</p>
<p>I don’t want to tell you why, because there are many reasons you can end up
with a blob of undocumented code, and that could be an article on its own, but
I want to share the strategy I use to figure it out.</p>
<p>Why am I the right person to do so? Because I change jobs frequently (not
because of undocumented code, obviously) and I like to contribute to opensource
software that I use; sometimes I end up contributing to small undocumented
libraries, or to overly documented massive projects.</p>
<ol>
<li>Take a look at the CI/CD system</li>
</ol>
<p>Many applications have a basic CI/CD system: to run tests, to check code
formatting, and sometimes to build the software itself. When I don’t even know
the language, because I am contributing to a codebase developed in a language I
am not familiar with, the CI/CD teaches me a lot about the toolchain I need in
order to be effective. Moving forward I tend to replace tools I don’t like with
alternatives I am more familiar with, but at the beginning CI/CD pipelines,
makefiles, npm packages or equivalent files are gold. Worst case, if they don’t
have unit tests or don’t build their code in CI/CD, I at least end up knowing
what they use to format their code; it does not seem like much, but it usually
points to the required tech stack.</p>
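As a sketch of this first pass, here is the kind of helper I would run from the repo root. The candidate paths are common conventions (GitHub Actions, GitLab CI, CircleCI, Make, npm, Rust, Go), not something any given repo is guaranteed to have:

```shell
# Hypothetical helper: list the files that usually reveal a repo's toolchain.
# The paths below are conventions, not guarantees.
find_toolchain() {
  repo="${1:-.}"
  for f in .github/workflows .gitlab-ci.yml .circleci/config.yml Makefile package.json Cargo.toml go.mod; do
    [ -e "$repo/$f" ] && echo "$f"
  done
  return 0
}
```

Whatever shows up tells you which tools to install before touching anything else.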
<ol>
<li>Dockerfile</li>
</ol>
<p>Dockerfiles are useful for figuring out the dependency tree and the system
dependencies, which can teach you a lot about the codebase you are dealing
with. It is also useful to figure out if my teammates are familiar with
containers, or if they are more old-style cmake and <code>./configure</code> kind of people.</p>
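A quick way to extract that signal, as a sketch (the Dockerfile path you pass in is an assumption, and the package-manager list is just the common ones):

```shell
# Hypothetical helper: pull the base image and system packages out of a Dockerfile.
dockerfile_summary() {
  grep -E '^(FROM|RUN +(apt-get|apk|yum|dnf))' "$1"
}
```

The FROM line pins the distribution, and the install lines are essentially the system dependency list.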
<ol>
<li>The entrypoint, look for that!</li>
</ol>
<p><code>fn main</code>, <code>func main</code>, <code>index.php</code>: look for the entrypoint! If I see more than
one entrypoint I am in a monorepo; if there is only one, it is a single
application. If I can’t find one, maybe it is a library, but libraries should
have an entrypoint as well, so look for <code>Option</code> or <code>Configure</code> classes or
objects. If you find one, the class using it is often the library entrypoint.</p>
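My entrypoint hunt usually starts with grep; a sketch (the globs assume Go and Rust code, adjust for your language):

```shell
# Hypothetical helper: list files containing a main function.
# More than one hit often means a monorepo; zero hits suggests a library.
find_entrypoints() {
  repo="${1:-.}"
  grep -rl --include='*.go' 'func main(' "$repo"
  grep -rl --include='*.rs' 'fn main(' "$repo"
  return 0
}
```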
<ol>
<li>Run the test suite</li>
</ol>
<p>I like to run code locally when I can because it makes things a bit more real.
It validates that I figured out the right toolchain and that I am starting from
a trusted checkpoint.</p>
<ol>
<li>I need an easy win</li>
</ol>
<p>Why are you looking at this codebase? Do not miss the why! If it is your first
day at work and you got assigned an apparently easy bug to fix, this is your
goal, so try to figure out the right path: you know how to build the software
now, and you know how to run the test suite, so leverage that while running the
entire application is still an unknown or is not possible.</p>
<p>I am not saying that tests should be the end goal for you, because there are
codebases with zero or useless tests, but they can be helpful at the beginning
as a north star. If they are unreliable or absent, point 5 is still valid, but
there is no escape: the entrypoint needs to be discovered and used to validate
that your change has the right impact.</p>
<p>When I feel brave, when I have figured out the part of the code I want to
contribute to, and when there are no tests, I write a small script that imports
that path so I can run that subset of the code in isolation, quickly and
repeatedly, without too much noise. At some point it can even be turned into a
unit test.</p>
Kubernetes is finally just a utilityKubernetes is a tool, not a religion. It can teach you a lot about scalability and resiliency, but if your business is not driven by this specific tool you should not overlook it.https://gianarb.it/img/gianarb.png2023-08-02T10:08:27+00:002023-08-02T10:08:27+00:00https://gianarb.it/blog/kubernetes-is-finally-just-an-utility<p>Kubernetes and cloud-native are topics I spent a good part of my professional
life contributing to. I wrote some code, operators, articles, talks, libraries,
and I have been a member of the awesome Release Team for a few iterations.
Kubernetes helped me a lot to grow as a developer and improved my ability to
collaborate with people all over the world.</p>
<p>I know how to operate it, how it is written, the components it is made of, and
so on, but I do not work for a company that makes money out of it anymore.</p>
<p>I worked for cloud providers making money building software that integrates
with Kubernetes. I worked for companies that made money and took investments
saying: “Our solution runs natively on Kubernetes”. They didn’t age well and I
am not surprised at all.</p>
<p>If you are like me and your company does not profit from Kubernetes, do not
look at it as something important. It is good to learn about it and to operate
it, but that’s it, just like you do with Linux, systemd, git, or everything
else, because that’s what it is. I have spoken with many people who told me
that Kubernetes is complex. I get it, but it can’t be that complicated; it
can’t be more complicated than systemd. There are cloud providers and companies
that you can pay to get a fully functioning Kubernetes endpoint up and running
to interact with. I read articles about how the EC2 service works, and how and
why they built it, but that’s it. I don’t feel bad about using tools or
services without knowing all the details about how they are made.</p>
<p>I don’t want to discourage you from using Kubernetes or contributing to it. I
advocate for the good practices that Kubernetes enforces and teaches, but
that’s the best it does. Today, after two years of not touching it, I installed
kind and kubectl, and I wrote 352 lines of YAML that I successfully applied to
a Kubernetes cluster, because I am working with a potential customer that runs
on Azure and we picked Kubernetes as the common language. I think this is its
superpower: a technology capable of improving collaboration and breaking
barriers is a gift that we should protect. And it does not require me to know
about CNI, CRI, CTO, and kubelet (can you guess the wrong one?).</p>
<p>Last week we expanded our solution from AWS, where I built the infrastructure
out of autoscaling groups, Launch Templates, EC2, load balancers, and duct
tape, to GCP, where I decided to use GKE Autopilot, all driven by Terraform.
Not Flux or Helm, two technologies that I never used, but Terraform, because
the IaC solution we have is 100% based on it and I didn’t feel the need for
something else. GKE because it makes sense, it is quick, and I don’t have the
experience on GCP that I have on AWS to operate at a “lower level” in a
reasonable time.</p>
<p>The solution we have on AWS is too expensive because there are a million tiny
details, when building with simple components like the ones I mentioned, that
are painful to figure out, at least for me, or simply not interesting from a
business point of view. This is why I am probably moving to ECS pretty soon.
Not EKS, because EKS does not look as simple as GKE Autopilot.</p>
<p>The developer I want to be, and the one I like to work with, puts effort into
finding the right solution based on their current context, building all the
surroundings that will make a difference in the game we all have to play:
evolution and time. It requires knowledge and skills exceeding a specific
technology or trend. There are similarities with woodworking, where complex
cuts or repetitive tasks require jigs, and those need to be built with
accuracy because they can drive the success of the primary project.</p>
You should avoid Meetup.comPlease avoid Meetup.com. You can do better on your ownhttps://gianarb.it/img/gianarb.png2022-11-24T10:08:27+00:002022-11-24T10:08:27+00:00https://gianarb.it/blog/avoid-meetup-com<p>A few years ago I decided to spend some of my time organizing a Meetup about
cloud computing in Turin, Italy. I got support from the Cloud Native Computing
Foundation in the form of a paid Meetup.com account, plus food and drinks. I organized
various events in Turin and Milan as well. We had a good time and I worked with
different companies to help them share what they are building or passionate
about.</p>
<p>I work remotely and it was for me the best way to connect with other people in
my area working on similar challenges. The events were in English because a
video maker was there recording and editing videos to share on the CNCF blog,
or with the companies or communities the speaker was involved with.</p>
<p>COVID-19 changed our daily routine drastically, as you know, and I embraced
virtual events as many others did. We gained good popularity and reached 800
people registered to our meetup group, but we missed the locality part of all
of this. In the meantime I decided to take a break as an active organizer,
leaving my spot to another person who had supported me a lot in the day-to-day
operations. I left my role as CNCF Ambassador, and in the meantime the CNCF
decided to move all those communities out of Meetup.com to their internal
platform.</p>
<p>I don’t want to comment on their internal platform, but the migration was left
to the organizers; at some point we were maintaining events on both platforms,
with the end of 2022 as the deadline to close the Meetup.com group.</p>
<p>In the meantime I tried to export the list of people who trusted me as an
organizer, to import them elsewhere, but the attendees belong to Meetup.com and
there is not much you can do about it: Meetup.com locks you in.</p>
<p>What about deleting a Meetup group? Well, apparently you can remove all the
organizers and leave it in limbo until Meetup.com removes it, or until
somebody else claims to be the new organizer.</p>
<p>Really! You work for a couple of years to build a group of people who trust you
and your way of respecting their time, and when you decide that it is time to
move on, the best Meetup offers is to leave those people on their own.
Obviously, just as happens with DNS, on the last day before termination
somebody else claimed to be the new organizer and took over the meetup group to
share their own events. Luckily it was a person I had collaborated with in the
past and I was able to become the organizer again. It took me two hours to do
the right thing: I had to kick all 800 members out of the meetup group one by
one, leaving an empty group that nobody has any reason to claim.</p>
<p>It is my responsibility as an organizer to take care of what happens to the
people who trusted me, and I think leaving them on their own to the first
person who finds themselves in the right place at the right time is wrong; you
should not trust a platform that forces this behavior.</p>
<p>Avoid Meetup.com, you can do better! Set up a mailing list, build a static
website on GitHub.com, write a few lines of whatever language you want to learn
and expose an HTTP server.</p>
<p>I like to share what I do and to experiment. In the last 10 years I organized
meetups, I tried to self-publish a book, I wrote on my blog, and many of you
trusted me, leaving your emails with me and sharing your time. The only
reasonable thing I can do is to clean up after myself when I am done. A few
years ago, when I realized that the book I was trying to write didn’t make much
sense, I deleted the 2000 people who had registered to receive updates about
it, because it was the right thing to do. If you find yourself in a similar
situation, do the right thing: clean up after yourself.</p>
From Ubuntu to NixOS the story of a Mastodon migrationTwitter is not at its best. Developers, like many others, are looking for an alternative. Mastodon, with its decentralization and feeling of ownership, is rising in popularity. I started with a hand-crafted self-hosted Ubuntu server because I felt the pressure to join as early as possible, but the end goal was to use NixOS. This is the story of how I moved my Mastodon instance to NixOShttps://gianarb.it/img/1280px-NixOS_logo.png2022-11-24T10:08:27+00:002022-11-24T10:08:27+00:00https://gianarb.it/blog/from-ubuntu-to-nixos-history-of-a-mastodon-migration<p>Do you know that Elon Musk bought Twitter for a lot of money? As a consequence many people are trying to figure out what to do. Developers quickly turned to Mastodon.</p>
<p>I decided to self host my server, you can interact with me on Mastodon as <a href="https://m.gianarb.it/@gianarb">@gianarb@m.gianarb.it</a>.</p>
<p>I do not have strong opinions about a decentralized system, I think it is another way to build a distributed system, experimenting with it is an opportunity, nothing more right now. I never liked the idea to sell my identity for free on social media but having a presence online proved to be crucial for my career and I don’t want to miss that.</p>
<p>Mastodon pushes many people, myself included, to ask: “should I host my own server?”. In my opinion it is an important question because it forces us to get our hands dirty again. We all know how comfortable GitHub Pages is. You can set up your own static website in a minute, for free, but it lowered my enthusiasm for technology because it makes things too easy. If you answered “yes” and now you are hands down trying to run your own Mastodon, I hope you are having fun and that you are learning something that raises your excitement for how computers work. “Hosting more of my own things” was on my bucket list, and Mastodon pushed me down the stairs.</p>
<h2 id="at-the-beginning-it-was-all-about-ubuntu">At the beginning it was all about Ubuntu</h2>
<p>NixOS has been my go-to for everything for the last two years, but I am not good at it. I tried to run my own Mastodon for a few days but I was not getting anywhere: I got stuck trying to figure out how to properly manage secrets, the machine lifecycle, how to deploy, how to interact with tootctl; everything was a big unknown. Mastodon itself was a big unknown too. So I decided to step back and run my own instance following a random blog post: <a href="https://www.linuxbabe.com/ubuntu/how-to-install-mastodon-on-ubuntu">“How to Install Mastodon on Ubuntu 22.04/20.04 Serves”</a>. Not sure if it is the best one out there, but it gave me a Mastodon to play with in 10 minutes. No need to tell me about infrastructure as code, immutability and so on; this environment taught me how all of this crap works, Mastodon is a bit more familiar now, and my end goal is still to figure out how to run it with NixOS.</p>
<h2 id="build-a-migration-plan">Build a migration plan</h2>
<p><a href="https://page.romeov.me/posts/setting-up-mastodon-with-nixos/">“Setting up your own Mastodon instance with Hetzner and NixOS”</a> by romeov explained how to get Mastodon running on NixOS. A few lines of configuration and the NixOS Mastodon Module configures Postgres, Redis, Nginx with TLS, and Mastodon itself for me. It is not the only way to go, the module supports running dedicated pools of those services as well but for my single user and single server configuration it is more than enough. So I started planning how to migrate my own server following the official <a href="https://docs.joinmastodon.org/admin/migrating/">Mastodon documentation</a> and it ended up looking like this:</p>
<ol>
<li>Provision a very basic NixOS instance (called beetroot from now on)</li>
<li>Stop mastodon services (web, sidekiq, stream) in the Ubuntu box</li>
<li>Take a backup of Postgres with the suggested command: <code>pg_dump -Fc mastodon_production -f backup.dump</code></li>
<li>Create a tar.gz archive for the system directory in Mastodon</li>
<li>Move the archive and the sql backup to beetroot via tailscale file: <code>tailscale file cp public-system.tar.gz beetroot:</code></li>
<li>Get the two files from beetroot via tailscale: <code>tailscale file get .</code></li>
<li>Untar the system directory</li>
<li>Stop the mastodon systemd services, drop the mastodon database on beetroot, and replace it with the backup from the Ubuntu server</li>
<li>Restart the mastodon services via systemd and have fun</li>
</ol>
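The Ubuntu-side half of the plan can be condensed into one sketch. The systemd service names, the <code>public/system</code> path, and the beetroot tailscale hostname are assumptions from my setup, not something the Mastodon docs prescribe; adapt them before trusting this:

```shell
# Hypothetical run of steps 2-5: stop Mastodon, dump Postgres, archive the
# system directory, and ship both files to beetroot over tailscale.
backup_and_ship() {
  target="${1:-beetroot}"
  systemctl stop mastodon-web mastodon-sidekiq mastodon-streaming
  pg_dump -Fc mastodon_production -f backup.dump
  tar -czf public-system.tar.gz -C /home/mastodon/live/public system
  tailscale file cp backup.dump public-system.tar.gz "$target":
}
```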
<h2 id="how-it-went">How it went</h2>
<p>The plan was solid! <a href="https://pony.social/@cult">CULTPONY</a> looked at it briefly as well, so we are good!</p>
<p>But you know, in reality there are many unknowns. There is only one way to figure them out: time to stop making plans and start breaking them!</p>
<pre><code class="language-nix">services.mastodon = {
enable = true;
localDomain = "PUT-YOUR-DOMAIN-HERE e.g. computing.social";
configureNginx = true;
smtp.fromAddress = "";
};
</code></pre>
<p>First, when I initialized the NixOS Mastodon module, it started an Nginx server because Mastodon requires TLS. It uses Let’s Encrypt for that, which requires the DNS record to point to the NixOS instance, otherwise Let’s Encrypt won’t be able to close the loop. But I can’t point the DNS to a not-yet-ready instance, because who knows whether I am going to be able to make it today, tomorrow or never! I decided to tell the Mastodon module to skip the Nginx configuration for now by setting <code>services.mastodon.configureNginx = false;</code></p>
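For reference, this is roughly the temporary configuration I ran with (the domain is a placeholder, and the comment marks my assumption about when to flip the flag back):

```nix
services.mastodon = {
  enable = true;
  localDomain = "PUT-YOUR-DOMAIN-HERE";
  # Keep Nginx, and therefore the Let's Encrypt challenge, off until the
  # DNS record points at this instance; then flip this back to true.
  configureNginx = false;
  smtp.fromAddress = "";
};
```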
<p>Technically there is <a href="https://discourse.nixos.org/t/nixos-deploy-in-a-vm-how-to-test-https-website-acme-lets-encrypt/8876">another way</a> to do it, but it did not work for me and I still don’t know why. Let me know if you figure it out, because it would be way more comfortable to get a self-signed certificate so we can test without having to change DNS.</p>
<p>In the process of making the tar archive for the system directory, I saw it contained a directory called cache, huge, multiple GBs. Cache to me means ephemeral, easy to rebuild, safe to wipe. So I wiped it! To be fair, I knew I was doing something stupid. And I knew the dirty way to go requires at least moving the directory and keeping it around until the realization hits that this cache holds something important that should not be lost! Too late! I lost it, and my Mastodon instance was then empty of all the avatars and profile images, not a great start. After some googling and some struggling, <a href="https://github.com/mastodon/mastodon/discussions/21305#discussioncomment-4218030">I was able to build it back</a> (if you have a more official answer for this issue, let us know there). My 2 cents: move the cache folder around, it is way easier than figuring out how to get it back. Oh, here is another 2 cents: remember to check file permissions when you do this (guess why I know).</p>
<p>Everything was now set and ready to receive traffic, so I pointed the DNS to the new server, set my hosts file to get routed to it quickly, changed <code>services.mastodon.configureNginx</code> back to true, and waited.</p>
<p>OK! This is how it went, after a day of struggling obviously! The last time I used Postgres was probably 6 years ago. pg_dump and pg_restore are easy, but I had to figure out how to authenticate properly. Ubuntu was set up to run over 127.0.0.1; the NixOS Mastodon Module by default provisions Postgres with <a href="https://www.postgresql.org/docs/current/auth-trust.html">trust auth</a> and with a socket entrypoint. It means that authentication does not require a password and is based on a UNIX user. For example, the Linux user postgres owns and has access to the databases owned and managed by Postgres. The NixOS Mastodon module creates a mastodon user in Linux with access to the mastodon files (the system directory for example) and with access to its own mastodon Postgres database. Nothing that looks like rocket science, but still, it took me some time to figure it all out.</p>
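Once that clicked, the restore boiled down to running everything as the mastodon UNIX user so the socket trust auth applies. A sketch, assuming the module’s defaults where both the database and its owner are named mastodon:

```shell
# Hypothetical restore sequence on the NixOS box: no passwords anywhere,
# the UNIX user itself is the credential.
restore_db() {
  sudo -u mastodon dropdb mastodon
  sudo -u mastodon createdb -O mastodon mastodon
  sudo -u mastodon pg_restore -d mastodon backup.dump
}
```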
<p>How to manage passwords in NixOS is a question I don’t feel comfortable answering yet, and it blocked me at the beginning when I was trying to set up my own instance, for example when I wanted to manage the tailscale auth key automatically, or when thinking about how to manage the connection between mastodon web and postgres. Currently my answer is to avoid passwords. It works for now, but I know it won’t be the right answer for the following articles in this series, which will probably be titled “Mastodon monitoring: a success story”, where I will share how to configure the monitoring and observability pipeline for my instance with Grafana Cloud. But this is a story for another time.</p>
<p>Point 7 of the migration plan was about untarring the system directory, but I realized I didn’t know where to place it. <a href="https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/web-apps/mastodon.nix#L32">Looking at the NixOS module</a>, there is a path for that:</p>
<pre><code class="language-terminal">PAPERCLIP_ROOT_PATH = "/var/lib/mastodon/public-system";
</code></pre>
<p>But what does it look like? And what is PAPERCLIP_ROOT_PATH? Is it really what I think it is? It was not clear to me, and only <code>/var/lib/mastodon</code> was there in the system, because the public-system folder gets created when Mastodon is actually in use. So I had to take a step back, and I created a vanilla e2e working Mastodon instance to figure it out. In the end it <strong>obviously</strong> looks like it should, but who knew that!</p>
<pre><code class="language-terminal">[nix-shell:/var/lib/mastodon]# tree -L 2
.
├── public-system
│ ├── accounts
│ ├── cache
│ ├── custom_emojis
│ └── media_attachments
└── secrets
</code></pre>
<h2 id="show-me-the-code">Show me the code</h2>
<p>Currently I have published the NixOS configuration for beetroot as part of my <a href="https://github.com/gianarb/dotfiles/tree/main/nixos/machines/beetroot">dotfiles</a>, along with the other NixOS configurations for my Thelio workstation and for the Asus Zenbook I use at home. It uses <a href="https://nixos.wiki/wiki/Flakes">flakes</a> and <a href="https://github.com/serokell/deploy-rs">deploy-rs</a>. It targets a Linode shared CPU virtual machine, and that’s why, as you can see in the hardware-configuration, NixOS detected QEMU as the hardware.</p>
<pre><code class="language-nix">deploy.nodes.beetroot = {
hostname = "139.162.167.171";
sshUser = "root";
profiles.system = {
user = "root";
path = deploy-rs.lib.x86_64-linux.activate.nixos
self.nixosConfigurations.production;
};
};
</code></pre>
<p>Do not ask me about my deploy preference when it comes to Nix; deploy-rs is just the one I figured out. I may switch to NixOps because it is a bit more standard; they work similarly from a configuration standpoint, but in theory deploy-rs is designed with profiles in mind, to deploy single users, something that I don’t think I need. It works well enough for now.</p>
<p>If you look inside the flake.nix file you see two different nixosConfigurations, production and vm, both importing the same <code>configuration.nix</code>. Production is deployed via deploy-rs, and vm is used for testing purposes with <code>nixos-rebuild build-vm --flake .#vm</code></p>
<p>I didn’t find a good use for it just yet; I am currently blocked by the ACME certificate, and because I am lazy. I am not sure if it is needed for a Linode Shared CPU instance, since it is a VM as well and detects QEMU as the hypervisor. Time will help me figure it out.</p>
<p>At the beginning I developed this configuration outside of my dotfiles, mainly because I didn’t know what to expect from it. Now that Mastodon is up and running and this configuration is in use, I feel more confident. Even if I have a lot I still want to do, I decided to move it into my dotfiles to have access to the other NixOS components there. I need to add a secret to authenticate to Grafana Cloud, probably with <a href="https://github.com/ryantm/agenix">agenix</a> in its own private repo imported via flake, so I won’t have my password shared with you all (forgive me, it is not you, it is me, or something else, I don’t know). I want to move the cache directory and postgres data to a ZFS pool as well, but not now; right now I want to enjoy my running instance.</p>
<h2 id="now-what">Now what?</h2>
<p>This is everything I have learned so far migrating from Ubuntu to NixOS. I want to be clear: even if the core of this article looks like a bunch of mistakes, I am not frustrated. I think the NixOS Mastodon Module is comfortable to use and well written. The challenges I described come from a rusty and inexperienced ops person. The module lacks documentation around operational experience, how to use it and what it provides, but it is reasonable, and I hope these notes will help improve it and will push me to contribute back to the official documentation.</p>
<p>When I mentioned Prometheus and Grafana I shared that I am thinking of writing a series of posts about this topic; these are the ones I currently have in the pipeline:</p>
<ul>
<li>Monitoring success story (probably with a deep dive in Password management on its own)</li>
<li>NixOS configuration, GitOps and machine lifecycle (this is about how I manage my NixOS configuration, how I deploy NixOS and so on)</li>
<li>Data management with ZFS</li>
<li>Mastodon update from 3.x to 4.0</li>
</ul>
<p>Your support and interest will push me forward in writing all of them, so let me know what you think about this one and the following topics, and whether you would like to read something else, like my journey with Linode, since I decided to try it out by running this Mastodon instance there.</p>
<p>I would like to thank all the writers behind the documentation, articles, and GitHub discussions I have linked, and all the GitHub issues, StackOverflow questions, and GitHub repositories I have looked at to resolve my unknowns; sharing is caring! Thanks <a href="https://hachyderm.io/@hazelweakly">@hazelweakly</a> for your early review!</p>
My workflow with NixOS. How do I work with itIn the last two years I picked up NixOS as a tool I want to use. The learning curve is steep, but I think I have found a workflow that I likehttps://gianarb.it/img/1280px-NixOS_logo.png2022-09-12T10:08:27+00:002022-09-12T10:08:27+00:00https://gianarb.it/blog/my-workflow-with-nixos<h2 id="some-context">Some context</h2>
<p>Coding is fun when you can figure out the right workflow. There is nothing fun
about writing software in a way that is not sustainable or that does not spark
joy.</p>
<p>I started to use Nix and NixOS almost two years ago, in a previous job in
a totally different context.</p>
<p>Back then we had to quickly and often provision operating systems, build
software, and so on. Since I moved back to writing software, and writing Rust,
I have to admit that building my code or shipping operating systems is not
something I have to do very often, but I decided to keep learning and fighting
against NixOS because it fits my mindset.</p>
<p>Recently I resumed a few NUCs I kept in a box, because everybody deserves a
home lab, and a good home lab deserves some netbooting, so it was time to play
with NixOS on something that is not my workstation or my laptop.</p>
<h2 id="the-workflow">The workflow</h2>
<p>Nix is code, finally. It means that there are libraries, you can import them,
run tests, and execute such code. YAML and JSON, in my experience, at some
point become a limitation, or they create friction: you end up with an
easy-to-break template engine.</p>
<p>I decided to invest some time to figure out how to use flakes. And this is
where I am so far:</p>
<pre><code class="language-nix">{
description = "A generic and minimal netbooting OS for my homelab";
inputs =
{
nixpkgs.url = "github:NixOS/nixpkgs/nixos-22.05";
};
outputs = { self, nixpkgs, ... }:
let
system = "x86_64-linux";
in
{
nixosConfigurations = {
generic = nixpkgs.lib.nixosSystem {
inherit system;
modules = [
./configuration.nix
];
};
};
packages.${system}.netboot = nixpkgs.legacyPackages.${system}.symlinkJoin {
name = "netboot";
paths = with self.nixosConfigurations.generic.config.system.build; [
netbootRamdisk
kernel
netbootIpxeScript
];
preferLocalBuild = true;
};
};
}
</code></pre>
<p>I am not the right person to tell you what all of this does because I am not an
expert and it is the outcome of many videos on YouTube, questions on
discourse.nixos.org, articles and beers, a lot of beers.</p>
<p>The <code>outputs</code> part describes what I want to build and as you can
see there are two outcomes. One is <code>nixosConfigurations</code>;
potentially it can contain more than one NixOS description, but right now I
have a single one called <code>generic</code>, and as you can see it imports a
module called <code>configuration.nix</code>. You can see it as a ready-to-go
NixOS provisioned as I want. It is 99% a copy-paste of a traditional
<code>configuration.nix</code> file as you may know it. The one I use comes
from the <a href="https://nixos.wiki/wiki/Netboot">“Netbooting Wiki” on NixOS.org</a>.</p>
<pre><code class="language-nix">{ config, pkgs, lib, modulesPath, ... }: with lib; {
  imports = [
    (modulesPath + "/installer/netboot/netboot-base.nix")
  ];

  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-sfdbsrbs"
  ];

  ## Some useful options for setting up a new system
  services.getty.autologinUser = mkForce "root";
  environment.systemPackages = [ pkgs.tailscale ];
  networking.dhcpcd.enable = true;
  services.openssh.enable = true;
  services.tailscale.enable = true;
  hardware.cpu.intel.updateMicrocode =
    lib.mkDefault config.hardware.enableRedistributableFirmware;

  systemd.services.tailscale-autoconnect = {
    description = "Automatic connection to Tailscale";
    # make sure tailscale is running before trying to connect to tailscale
    after = [ "network-pre.target" "tailscale.service" ];
    wants = [ "network-pre.target" "tailscale.service" ];
    wantedBy = [ "multi-user.target" ];
    # set this service as a oneshot job
    serviceConfig.Type = "oneshot";
    # have the job run this shell script
    script = with pkgs; ''
      # wait for tailscaled to settle
      sleep 2
      # check if we are already authenticated to tailscale
      status="$(${tailscale}/bin/tailscale status -json | ${jq}/bin/jq -r .BackendState)"
      if [ $status = "Running" ]; then # if so, then do nothing
        exit 0
      fi
      # otherwise authenticate with tailscale
      ${tailscale}/bin/tailscale up -authkey tskey-really
    '';
  };

  networking.firewall = {
    checkReversePath = "loose";
    enable = true;
    trustedInterfaces = [ "tailscale0" ];
    allowedUDPPorts = [ config.services.tailscale.port ];
  };

  system.stateVersion = "22.05";
}
</code></pre>
<p>The only difference compared with a traditional non-flake configuration is the import:</p>
<pre><code>imports = [
  (modulesPath + "/installer/netboot/netboot-base.nix")
];
</code></pre>
<p>The module system provides the utility variable <code>modulesPath</code> as
a shortcut for accessing the modules of the nixpkgs flake input.</p>
<p>This OS does a few simple things:</p>
<ul>
<li>Sets up a public SSH key for the root user that I can use to SSH into the server.</li>
<li>Registers itself to Tailscale.</li>
</ul>
<p>The output <code>nixosConfigurations</code> is consumed via <code>nixos-rebuild</code>.
It took me some time to figure out that <code>nixos-rebuild</code>, used the right way, does not replace my current operating system.
Do not run <code>nixos-rebuild switch</code> if you don’t want to screw up your local NixOS! Instead you can build this
operating system into the <code>./result</code> directory via:</p>
<pre><code>$ nixos-rebuild build --flake .#generic
</code></pre>
<p>A single configuration can describe different NixOS systems; that’s why you
have to identify what you want to build with <code>.#generic</code>.</p>
<p>The second output builds the same OS but it shapes the content of the
<code>./result</code> directory as I want it (I am not sure if I need it but this is what
the NixOS netbooting wiki does, so far so good).</p>
<p>To build it you can use <code>nix build</code>:</p>
<pre><code>$ nix build .#netboot
</code></pre>
<p>Pretty cool! I can tar.gz that and ship it where I want. Straightforward.</p>
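<p>The shipping step can be a single <code>tar</code> invocation. This is a sketch, not my actual pipeline: <code>demo-result</code> and its file names are made-up stand-ins for <code>./result</code> so the snippet can run anywhere, while <code>-h</code> matters on a real build because the entries in <code>./result</code> are symlinks into <code>/nix/store</code>:</p>
<pre><code class="language-sh">#!/bin/sh
# Sketch: package the `nix build .#netboot` output for a TFTP/HTTP server.
# demo-result is a stand-in for ./result; the file names are hypothetical.
set -e
mkdir -p demo-result
touch demo-result/bzImage demo-result/initrd demo-result/netboot.ipxe
# -h dereferences symlinks so the archive contains real files, not store links
tar -czhf netboot.tar.gz -C demo-result .
tar -tzf netboot.tar.gz
</code></pre>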
<h2 id="how-to-run-this-vm">How to run this VM</h2>
<p>Do you know how boring and time consuming it is to test a new operating system?</p>
<p>If you want to do it on real hardware you have to set it up, and if you want to
use QEMU you have a few days in front of you to remember all the flags you need, how
to bridge the guest with the host and who knows what. I tried for a few days
and I failed, until I discovered:</p>
<pre><code>$ nixos-rebuild build-vm --flake .#generic
building the system configuration...
Done. The virtual machine can be started by running /nix/store/dk4i22xmacnxxdmgvjhlyain5spb11yn-nixos-vm/bin/run-nixos-vm
</code></pre>
<p>Pure gold! If you run the <code>run-nixos-vm</code> script a QEMU virtual machine will
appear ready for you to test your operating system. Kind of cool! I can even
see it showing up in the Tailscale admin console!</p>
<p>A zero-friction experience that boosts my ability to try what I am working on.</p>
<h2 id="integration-tests">Integration tests</h2>
<p>Nix provides a testing framework, which I started using recently. It spins up
one or more virtual machines and asserts that they work as expected. I wrote
a test that looks for the Tailscale network interface:</p>
<pre><code class="language-nix">let
  nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/archive/0f8f64b54ed07966b83db2f20c888d5e035012ef.tar.gz";
  pkgs = import nixpkgs { };
in
pkgs.nixosTest
({
  system = "x86_64-linux";
  nodes.machine = import ./configuration.nix;
  testScript = ''
    start_all()
    machine.succeed("sleep 5")
    machine.succeed(
      "ifconfig | grep tailscale0",
    )
  '';
})
</code></pre>
<p>This test uses the same <code>configuration.nix</code> I used to generate my netbooting
NixOS. It starts a node called <code>machine</code> and, via a Python script, runs the shell
command <code>ifconfig | grep tailscale0</code>. I am sure I can do better than <code>sleep 5</code>,
but as I said, I am far from being good at this.</p>
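<p>For example, the test driver exposes wait helpers that should make the <code>sleep</code> unnecessary. A sketch I have not battle-tested; <code>tailscaled.service</code> is, as far as I can tell, the unit that <code>services.tailscale.enable</code> creates:</p>
<pre><code class="language-nix">testScript = ''
  start_all()
  # block until systemd reports the unit as active, instead of sleeping
  machine.wait_for_unit("tailscaled.service")
  machine.succeed("ifconfig | grep tailscale0")
'';
</code></pre>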
<p>You can use this approach to run assertions on multiple nodes;
here is an example from nix.dev: <a href="https://nix.dev/tutorials/integration-testing-using-virtual-machines">“Integration testing using virtual machines
(VMs)”</a>.</p>
<h2 id="steep-learning-curve">Steep learning curve</h2>
<p>Everyone agrees that Nix and NixOS are not easy technologies to pick up, and I
can confirm it: there are articles, blogs, and dotfiles available everywhere, but
they all look different and it is hard to figure out whether they are new or old,
or how to apply them to your use case.</p>
<p>Flakes are an attempt from the community to standardize all of that, and much
more. We will see!</p>
<p>It is also true that motivation and context can flatten the curve. My plan is to
write more about this topic since I am trying to spin up and automate a home
lab.</p>
<p>I have to figure out how to do secret management but as soon as I have
it sorted out I will share my homelab configuration as I share my laptops
configuration in my <a href="https://github.com/gianarb/dotfiles/tree/main/nixos">dotfiles</a>.</p>
<p>Stay tuned.</p>
Website redesign and goodbye BootstrapI managed to remove Bootstrap from my website! You should not read this posthttps://gianarb.it/img/gianarb.png2021-12-17T10:08:27+00:002021-12-17T10:08:27+00:00https://gianarb.it/blog/redesign-goodbye-bootstrap<p>I managed to remove Bootstrap from my website! For me, it was the example of vendor lock-in.</p>
<p>A few years ago the trend was to use a cloud provider, but not enough to feel locked into a particular one. In practice, compute was the only service allowed to be used. Everything else was an attempt from the devil to keep you down. The cause was lack of trust. Compute didn’t matter that much, it was not making anything more complicated, it was somebody else’s virtual machine.</p>
<p>Now it is clear that services like object stores, managed databases, queue systems, machine learning, or serverless are the secret of success when it comes to cloud providers, because those services are stable and available for you to use quickly with zero operational effort. It gets harder to move to another vendor, but there is not a lot you can do to avoid that. Only motivation avoids vendor lock-in.</p>
<p>For me it was the same: I tried many times to remove Bootstrap from my website, but I never cared enough about the outcome, because I am not a designer and… really, I don’t care.</p>
<p>Friday 17th December something changed! I had time and I was up for something boring, so I made it! I replaced Bootstrap with a couple of CSS classes.</p>
<h2 id="goodbye-navbar">Goodbye navbar</h2>
<p>I decided to remove the navigation bar at the top of the website. First, I don’t know how to make one on my own. Second, the number of pages was limited to three, not enough to justify a real menu. Now a post contains the content and nothing more; very clean, and I think it helps stay focused on what matters.</p>
<p>I left a small link to get back to the list of the other posts I wrote, and that’s it. This is not a magazine and I won’t make any money out of this website, so there is no reason to drive you to other articles. Also, the people who read what I write are good enough with computers to figure out what they want on their own.</p>
<h2 id="your-browser-is-cooler-than-me">Your browser is cooler than me</h2>
<p>No extra fonts or font sizes; I tried to limit the number of HTML tags, a win for accessibility. I trust your browser’s ability to interpret HTML, and the fonts you have installed should be enough to read what I have to write! Let’s get back to simple things.</p>
<h2 id="adv-or-not-adv">ADV or not ADV</h2>
<p>I removed Google Analytics months ago. Is it time to remove ads? I didn’t make much money from this website, probably not enough to justify a banner and that JavaScript. I earned enough to run this website for another 10 years and I think it is enough! I reached sustainability! And you don’t need to thank me! I am the first one using Brave as a browser, blocking noisy banners. I think the majority of the people reading my posts do the same.</p>
<p>The downside is that joining the Carbon network was a goal I had a few years ago, because there is a number you have to reach in order to enter the program. Ads are the only source of vanity numbers for this project, and I kind of like to use $ for this, even if it is a bit more than “nothing a month”. I don’t know; if you have feedback, let me know!</p>
<p>My feeling is that 2022 will be all about finding other sources of income that do not come from me spending hours writing code. Not sure if spending time writing articles counts just yet. Probably not what I want.</p>
<h2 id="whats-next">What’s next</h2>
<p>Not much; my readers are not many, and this year I didn’t have much to write about. A website with fewer components will enable me to experiment a little bit more. An item on my invisible todo list is to write yet another static site generator. The world needs more Rust code… who knows.</p>
<p>Probably it won’t happen. I don’t have special needs, but if it happens, you will notice!</p>
<h2 id="credit">Credit</h2>
<p>I think I read too much of what Drew DeVault writes! The minimalism of this initiative comes from his <a href="https://drewdevault.com/">website</a>. My friend <a href="https://fntlnz.wtf">Lorenzo</a> is a person I admire when it comes to CSS. He is good with eBPF as well, but not as good!</p>
<p>Have a great Christmas!! Relax and do what you like with the people you love!</p>
How I started with NixOSI played with NixOS for the last couple of months. This is a story about how I picked it up, or how I should have done it.https://gianarb.it/img/1280px-NixOS_logo.png2021-10-01T10:08:27+00:002021-10-01T10:08:27+00:00https://gianarb.it/blog/how-i-started-with-nixos<p>I frequently change operating system and distribution, moving between macOS and Linux, because I haven’t married any of them yet.</p>
<p>Just before having a MacBook again, I was an ArchLinux user, a happy one. I have to admit it was not that different compared with other distributions, at least as a user. Yes, fewer packages installed, a few services; please don’t freak out, as I wrote, I enjoyed it.</p>
<p>I see value in describing your desires as code, learning from other people sharing their code, importing or copy-pasting it in different places.</p>
<p>Developers do that all day. I am a representative, I hope what I write will match my desires. After so many years I am full of hope.</p>
<p>With this in mind, Arch, Debian, or Ubuntu do not make a difference. It is all about the package manager. NixOS and Nix looked to me as a step forward in this sense.</p>
<p>I decided to end my vacation with macOS earlier. I picked up my personal Asus Zenbook 3 from its box to install NixOS.</p>
<p>Coming from ArchLinux the NixOS installation process is similar, we are on our own:</p>
<ol>
<li>Format disks</li>
<li>Write partition table</li>
<li>Mount partitions</li>
<li>And so on</li>
</ol>
<p>The main difference comes when you run <code>nixos-generate-config</code>:</p>
<pre><code># nixos-generate-config --root /mnt
</code></pre>
<p>The command tries its best to detect kernel modules for your hardware, mount points, and so on. This phase is a great time to start your first fight of many with NixOS.
The generated files will be <code>/mnt/etc/nixos/configuration.nix</code> and <code>/mnt/etc/nixos/hardware-configuration.nix</code>. Open them to validate that they make sense. Don’t worry: it is a Linux distribution; if something is missing, it will tell us.
The <code>hardware-configuration.nix</code> file, as the name suggests, identifies your hardware.</p>
<p>Not everything can be detected yet. I use <code>luks</code> to encrypt my disks; the generated <code>hardware-configuration</code> needs a bit of help to figure it out.</p>
<pre><code class="language-nix">boot.initrd.luks.devices = {
  root = {
    device = "/dev/nvme0n1p2";
    name = "root";
    preLVM = true;
    allowDiscards = true;
  };
};
</code></pre>
<p>Nix as a programming language takes a bit of practice, but NixOS is different. Many people share their configuration on GitHub, a boost in productivity.
I keep a list of NixOS configurations and Nix-related repositories that I look at when I don’t know how to solve a particular issue. <a href="https://github.com/gianarb/dotfiles/tree/master/nixos#credits">I really think you should do the same</a> because nobody wants to spend a day fixing their laptop, even worse if it is the one you use at work.</p>
<h2 id="start-simple">Start simple</h2>
<p>My end goal was to check out my NixOS configuration as part of my dotfiles in a git repository. Too much when you don’t even know how NixOS works.</p>
<p>I put this goal aside for a few weeks, and my new goal was to get my laptop working in all its parts. The complicated part, which I didn’t completely solve, is audio: it works, but the volume control is not as good as it should be. You can check the configuration I use in my dotfiles, but the solution does not matter.</p>
<h2 id="check-it-out">Check it out</h2>
<p>When I was happy with my configuration, it was time to finally move it to its final destination. I joined the “stable era” of my Nix configuration; everything was good enough and it was not changing constantly. The perfect time for some refactoring.</p>
<p>I decided to use my <a href="https://github.com/gianarb/dotfiles">dotfiles repository</a> with a <code>nixos</code> subdirectory. This is the one I had when I first moved the configuration from my local environment to Git:</p>
<pre><code>$ tree -L 1 ./nixos
./nixos
└── machines
└── AsusZenbook
├── configuration.nix
└── hardware-configuration.nix
</code></pre>
<p>Those <code>*.nix</code> files are a copy of the ones I have in <code>/etc/nixos</code>.</p>
<p>Now I had to teach NixOS where the new configuration is. There are various ways; I decided to delete everything inside <code>/etc/nixos/configuration.nix</code>, leaving only an <code>import</code> of the configuration I moved as part of my dotfiles.</p>
<p>NOTE: I clone my dotfiles at <code>/home/gianarb/.dotfiles</code>.</p>
<p>I didn’t need <code>/etc/nixos/hardware-configuration.nix</code> and this is the content for my <code>/etc/nixos/configuration.nix</code>:</p>
<pre><code class="language-nix">{ config, ... }:
{
  imports = ["/home/gianarb/.dotfiles/nixos/machines/AsusZenbook/configuration.nix"];
}
</code></pre>
<h2 id="pick-a-second-use-case">Pick a second use case</h2>
<p>I got a new Thelio System76 workstation (thanks to EraDB) and it was the perfect opportunity to re-use my fresh NixOS configuration and my new skills.</p>
<p>At this point I am still working from my Asus Zenbook, but it is time to create a new <code>./machines/thelio</code> directory without a <code>hardware-configuration.nix</code>, only with the <code>configuration.nix</code> in there. The idea is to start extracting what you want to reuse from your first machine into its own files that can be imported wherever you want.</p>
<p>I started from my user, because it is a common desire to reuse the same user across different machines. That’s why I have a <code>users</code> subdirectory in my dotfiles.</p>
<pre><code>gianarb@huge ~/.dotfiles (master=) $ cat nixos/users/gianarb/default.nix
{ config, inputs, lib, pkgs, ... }:
with lib;
{
  # Define a user account. Don't forget to set a password with ‘passwd’.
  users.users.gianarb = {
    isNormalUser = true;
    uid = 1000;
    createHome = true;
    extraGroups = [
      "root"
      "wheel"
      "networkmanager"
      "video"
      "dbus"
      "audio"
      "sound"
      "pulse"
      "input"
      "lp"
      "docker"
    ];
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEKy/Uk6P2qaDtZJByQ+7i31lqUAw9xMDZ5LFEamIe6l"
    ];
  };
}
</code></pre>
<p>I imported it in both machines as we did previously for the whole configuration, and I split out other applications like <code>i3</code>, my audio configuration, vscode, and so on. You can find all of them inside the <code>applications</code> directory:</p>
<pre><code>$ tree -L 1 nixos/applications/
nixos/applications/
├── i3.nix
├── sound-pipewire.nix
├── sound-pulse.nix
├── steam.nix
├── sway.nix
├── tailscale.nix
└── vscode.nix
</code></pre>
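<p>To give an idea, a machine configuration then becomes little more than a list of imports. A sketch based on my layout; the exact relative paths are from memory, so double-check them against the repository:</p>
<pre><code class="language-nix"># machines/thelio/configuration.nix (sketch)
{ config, pkgs, ... }:
{
  imports = [
    ../../users/gianarb
    ../../applications/i3.nix
    ../../applications/tailscale.nix
  ];
}
</code></pre>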
<p>Double-checking that the refactoring worked is just a matter of re-building NixOS:</p>
<pre><code># nixos-rebuild test
# nixos-rebuild switch
</code></pre>
<h2 id="time-to-install-nixos-the-second-target">Time to install NixOS on the second target</h2>
<p>I had everything I needed to re-install NixOS with my configuration on another target. It was time to set up a USB stick and boot the Thelio from it.
The system I want is described as Nix configuration; the installation looks the same as the one we have done, or the one described in the documentation, but at this point we do not need the generated configuration. We have our own.
The only part we need, the first time, is the <code>hardware-configuration.nix</code>.</p>
<ol>
<li>When you have booted from USB you can do what you did previously, and what is explained in the <a href="https://nixos.org/manual/nixos/stable/#sec-installation">NixOS installation guide</a>: format and partition the disk.</li>
<li>When the disk layout is done you can mount it to <code>/mnt</code> and clone/download the git repository with your Nix configuration. I usually clone it where I want it to end up: <code>/home/gianarb/.dotfiles</code>.</li>
<li>Time to run <code>nixos-generate-config</code> as you do all the time</li>
<li>Replace <code>/etc/nixos/configuration.nix</code> with the <code>import</code>, and copy the generated <code>hardware-configuration.nix</code> to your <code>machines</code> folder.</li>
<li>The last step is to open the <code>hardware-configuration.nix</code> and figure out if it makes sense for your hardware.</li>
<li>When you are happy with it you can run <code>nixos-install</code> and it will install the system from the configuration you have just declared.</li>
</ol>
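<p>Steps 2 and 4 condensed into a few lines of shell. This is a sketch that targets a scratch directory standing in for <code>/mnt</code>, so it is safe to run anywhere; the <code>thelio</code> path is an assumption based on my layout:</p>
<pre><code class="language-sh">#!/bin/sh
# Sketch of steps 2 and 4. MNT is a scratch directory; on a real install it is /mnt.
set -e
MNT=./mnt-demo
# step 2: put the dotfiles where they will end up (a real install clones the repo)
mkdir -p "$MNT/home/gianarb/.dotfiles" "$MNT/etc/nixos"
# step 4: configuration.nix becomes a single import into the dotfiles
cat > "$MNT/etc/nixos/configuration.nix" <<'EOF'
{ config, ... }:
{
  imports = ["/home/gianarb/.dotfiles/nixos/machines/thelio/configuration.nix"];
}
EOF
cat "$MNT/etc/nixos/configuration.nix"
</code></pre>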
<p>If it sounds like a convoluted process, it can be simplified, but I haven’t invested in that yet! I don’t want to reinstall them all the time. You can read this article if you want to erase your laptop every day: <a href="https://grahamc.com/blog/erase-your-darlings">“Erase your darlings” by Graham Christensen</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>You just read about my journey with NixOS. With a centralized repository I can assemble, compile, and ship images to run on AWS, or ISOs I can PXE boot.
I can also build a NixOS derivation to use as an installation driver, for example one that clones my dotfiles.</p>
<p>As a next project I want to build an ISO that I can flash onto a Raspberry Pi that will act as a media hub for my speakers, playing Bluetooth audio or Spotify playlists via <a href="https://github.com/dtcooper/raspotify">raspotify</a>.</p>
How I tricked the cable mafia with PXE. Install OpenWRT on APU4dNo matter how many cables or dongle do you have they are never enough. The best you can do is to trick the system. I tried Pixiecore to PXE boot Alpine on my APU4d installing OpenWRT to it.https://gianarb.it/img/gianarb.png2021-04-06T10:08:27+00:002021-04-06T10:08:27+00:00https://gianarb.it/blog/home-made-router-msata-netboot<p>I am too lazy to buy a cable or another adapter, but not too lazy to buy an APU4d: a piece of hardware specialized for networking, with an AMD Embedded G-series GX-412TC, widely used in routers.</p>
<p>I got it directly from the manufacturer <a href="https://www.pcengines.ch/apu4d4.htm">PC Engines</a>, with a serial-USB cable and a <a href="https://it.aliexpress.com/item/32443776508.html">Huawei LTE miniPCI chip</a>.</p>
<p>I also got the <a href="https://www.pcengines.ch/msata16g.htm">16GB mSata SSD module</a> because you never know: a router with a 16GB SSD and 4GB of RAM sounds like an opportunity to run more tools on it!</p>
<p class="text-center"><img src="/img/apu4d.jpeg" alt="Picture of the APU4D board from PC Engine" class="img-fluid w-75" /></p>
<p>I assembled all of it nicely at my desk. It was too late when I realized I don’t know how to flash an mSATA SSD, because I don’t have the proper cabling…</p>
<p>No matter how big the box with all my cables and dongles is, I will never own the one I need. It is a mantra nobody can escape. The best you can do is to trick the system.</p>
<p>Luckily for me, the APU4d supports PXE booting, and we know how cool that is: the perfect opportunity to try <a href="https://github.com/danderson/netboot/blob/master/pixiecore/README.api.md">pixiecore</a> and have some fun with netbooting.</p>
<p>It worked. If all of this sounds unreasonable, you need to remember that most likely you are right. But you know how much I like simple tools. Pixiecore was on my radar.</p>
<h2 id="get-what-you-need">Get what you need</h2>
<p>First of all, I installed Pixiecore. It is a Go binary; you can run it as a Docker container, or you can compile it with <code>go build</code>, but I decided to use a Nix shell:</p>
<pre><code class="language-bash">nix-shell -p pixiecore
</code></pre>
<p>In practice, it is a program that helps you serve what a piece of hardware needs to PXE boot over the network; for example, it serves iPXE scripts and runs a TFTP server. It is light and not intrusive: you can keep your DHCP server and, if you like, even implement an API to drive how and what to PXE boot dynamically. Today I have to boot only one server in a very boring network, and my solution is already over-engineered, so I decided to run it in static mode:</p>
<pre><code class="language-bash">sudo pixiecore boot ./vmlinuz-vanilla initramfs-vanilla \
  --cmdline='console=ttyS0,115200n8 alpine_repo=http://dl-cdn.alpinelinux.org/alpine/v3.9/main/ modloop=http://dl-cdn.alpinelinux.org/alpine/v3.9/releases/x86/netboot-3.9.6/modloop-vanilla'
</code></pre>
<p>The first two arguments of the command line are the Alpine kernel and the init ramdisk. I got them directly from the <a href="http://dl-cdn.alpinelinux.org/alpine/v3.9/releases/x86/netboot-3.9.6">Alpine repository</a>.</p>
<p>The <code>--cmdline</code> option can be used to pass configuration to the operating system. See the <a href="https://wiki.alpinelinux.org/wiki/PXE_boot">Alpine netboot wiki page</a> for the various options supported by the init script.</p>
<p>Now that the PXE distribution tool was set, I powered on the APU4d board. By default, it tries to boot from a couple of different devices; the last one is PXE mode.</p>
<pre><code class="language-console">sudo pixiecore boot \
./vmlinuz-vanilla initramfs-vanilla \
--cmdline='console=ttyS0,115200n8 ssh_key=https://github.com/gianarb.keys alpine_repo=http://dl-cdn.alpinelinux.org/alpine/v3.9/main/ modloop=http://dl-cdn.alpinelinux.org/alpine/v3.9/releases/x86/netboot-3.9.6/modloop-vanilla'
Password:
[DHCP] Offering to boot 00:0d:b9:5a:3e:10
[DHCP] Offering to boot 00:0d:b9:5a:3e:10
[TFTP] Sent "00:0d:b9:5a:3e:10/4" to 192.168.1.87:55360
[DHCP] Offering to boot 00:0d:b9:5a:3e:10
[HTTP] Sending ipxe boot script to 192.168.1.87:29233
[HTTP] Sent file "kernel" to 192.168.1.87:29233
[HTTP] Sent file "initrd-0" to 192.168.1.87:29233
</code></pre>
<p><code>192.168.1.87</code> is the IP the APU4 got from my DHCP. Everything is working, and from the serial port I see Alpine booting. The <code>root</code> password is <code>root</code>! Classy!</p>
<h2 id="time-to-install-openwrt">Time to install OpenWRT</h2>
<p>I had never used OpenWRT before. It is a Linux distribution for routers. You can even flash it to TP-Link or Netgear devices, if supported, at your own risk.</p>
<p>Anyway, since I am now running Alpine in memory on my APU4d, I have a functional operating system and access to the device. I can use traditional tools like <code>dd</code> to write OpenWRT directly to disk, manipulate partitions, and so on… I followed the blog post <a href="https://teklager.se/en/knowledge-base/openwrt-installation-instructions/">“OpenWRT installation instructions for APU2/APU3/APU4 boards”</a> written by TekLager.</p>
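<p>The core of that guide is a plain <code>dd</code> from the in-memory Alpine to the mSATA disk. Here is a sketch that writes to a scratch file instead of a real block device; on the board, <code>TARGET</code> would be something like <code>/dev/sda</code> (verify it first, <code>dd</code> does not forgive), and the 4MB zero file stands in for the real OpenWRT image:</p>
<pre><code class="language-sh">#!/bin/sh
# Sketch of the flash step. TARGET is a scratch file here; on the APU4d it
# would be the mSATA block device, which dd overwrites without asking.
set -e
TARGET=./fake-msata.img
dd if=/dev/zero of=openwrt.img bs=1M count=4 2>/dev/null  # stand-in for the real image
dd if=openwrt.img of="$TARGET" bs=4M conv=fsync 2>/dev/null
cmp -s openwrt.img "$TARGET" &amp;&amp; echo "flashed OK"
</code></pre>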
<h2 id="conclusion">Conclusion</h2>
<p>My router is up and running. I was able to reach the administrative Web UI. I haven’t used it yet because I have to relocate it to my new house, so I am sure you will read more about it in future articles.</p>
<p>Pixiecore was on my TODO list because these days hardware and datacenter automation take a good part of my daily work. Its support for an external API makes it a great alternative to provide an installation environment like <a href="https://github.com/tinkerbell/hook">Hook</a> (the one we developed with <a href="https://github.com/tinkerbell">Tinkerbell</a>) without having to onboard the full Tinkerbell stack; in particular, I can avoid <a href="https://docs.tinkerbell.org/services/boots/">Boots</a> when not needed.</p>
DIY Board management control for an Intel NUC: power controlThis is my first experimentation with reading and understanding a schematic. I hooked up an Intel NUC to a Raspberry PI to get control over its power lifecycle. I see it as a very simple board management control (BMC)https://gianarb.it/img/gianarb.png2021-03-14T10:08:27+00:002021-03-14T10:08:27+00:00https://gianarb.it/blog/homelab-diy-bmc-intel-nuc<p>I want to start this article with a disclaimer: what follows is not a tutorial or a guide. Do what you want, but do not blame me if you fry your Intel NUC (they do not taste good).</p>
<p>When it comes to hardware and datacenters, I am not an expert. I was born and raised in the cloud, and recently I joined Equinix Metal (previously Packet). That’s why my interest changed, and I now have a disassembled NUC and a multimeter on my desk.</p>
<p>If you don’t know the origin of that PCB, I wrote a piece about my homelab for the Equinix Metal blog: <a href="https://metal.equinix.com/blog/building-an-ephemeral-homelab/">“Building an Ephemeral Homelab.”</a></p>
<p>Long story short, almost one year ago, straight after joining Equinix Metal, I got a couple of NUCs and Nvidia Jetsons to play with, fully cabled in a 1U brick. Cool, but I have to admit it is only helpful for experimentation. It was cheap and the boards themselves are old. But this is everything I need: something I can break without feeling too bad.</p>
<p>When it comes to fully fledged servers, you quickly learn that they are made of building blocks, where an essential one is the baseboard management controller (BMC). Think of it as a small, low-consumption PC that manages its big brother, the actual server. Servers are loud and consume a lot of power; that’s why you have a BMC whose only responsibility is to manage the expensive server. It can power it on and off, and monitor its status with metrics like voltage and temperature. It can even select the boot device, which is handy if you want to enter PXE mode, for example, to manage your server without touching it.</p>
<p>The BMC is wired to the server; you don’t have to power it separately. Usually you only have to hook it with an RJ45 to a switch. Extremely functional, but because those who have access to the BMC can take control of the actual server, it is an excellent idea to place its NIC in a dedicated VLAN.</p>
<h3 id="time-to-hack">Time to hack</h3>
<p>My homelab arrived cabled with a relay controllable from an outside board like an Arduino or a Raspberry Pi. Switching on or off the relay cuts the power brutally, almost like directly pulling the board’s power cable off.</p>
<p>NUCs do not have a BMC and do not consume much, so there is no point in having another computer controlling them, but hey, this is my home lab, and we are after something here.</p>
<p>I downloaded my <a href="https://www.intel.com/content/dam/support/us/en/documents/boardsandkits/NUC5CPYB_NUC5PPYB_TechProdSpec11.pdf">board’s schematic</a> a few months ago, and from time to time, I look at it for inspiration. I studied electronics at high school, and Arduino got invented in my region, but I don’t think it counts.</p>
<p>I want to switch my boards on and off properly without leaving my desk, because I am lazy. I want to use a Raspberry Pi for this job because I can write code in any language I know. Spoiler alert: for this prototype I have used ~5 lines of Bash.</p>
<p><img src="/img/bmc_pi_front_panel_spec.png" alt="Picture coming from the NUC schematic. It describes the pinout of the front panel. It exposes a power switch and a few output pins to get the power status from the NUC" class="img-fluid d-block mx-auto" /></p>
<p>— Picture front panel schematics —</p>
<p>During one of many rounds of randomly reading the table of contents, I saw a Front Panel Header exposed by the NUC that says: “Power/Sleep LED Header”. It looks like there is a way to connect a LED to the NUC to see its status. Fun! Nothing complicated: 1 the board is on, 0 the board is off. The LED can be replaced with a GPIO on the Raspberry Pi (I used GPIO22) hooked to a few lines of Bash (as a prototype) to read the actual value from the NUC. I used this guide: <a href="https://raspberrypi-aa.github.io/session2/bash.html">“Bash Control of GPIO Ports.”</a> I used tmux so I can leave it running in the background:</p>
<pre><code class="language-sh">#!/bin/bash
tmux new-session -d -s power_status
tmux send-keys -t power_status "watch -n 1 'cat /sys/class/gpio/gpio22/value >> /tmp/current_power_status'" C-m
tmux detach -s power_status
</code></pre>
<p>Now that the first mini-circuit is done and I am a bit more confident, I kept reading: “Power Switch Header.”</p>
<blockquote>
<p>Pins 6 and 8 can be connected to a front panel momentary-contact power switch. The switch must pull the SW_ON# pin to ground for at least 50 ms to signal the power supply to switch on or off.</p>
</blockquote>
<p>This sounds easy: I cabled PIN 6 from the NUC to GPIO 17 on the RPI and PIN 8 to ground, and with two bash scripts I figured it all out!</p>
<p><img src="/img/bmc_pi_prototype.jpg" alt="A picture of a Raspberry PI cabled to an Intel NUC board to control the power lifecycle" class="img-fluid d-block mx-auto" /></p>
<pre><code class="language-terminal">
root@raspberrypi:~/power_swtich# ls
log.sh  poweroff.sh  poweron.sh
root@raspberrypi:~/power_swtich# cat poweroff.sh
#!/bin/bash
echo "0" > /sys/class/gpio/gpio17/value
root@raspberrypi:~/power_swtich# cat poweron.sh
#!/bin/bash
echo "1" > /sys/class/gpio/gpio17/value
sleep 0.2
echo "0" > /sys/class/gpio/gpio17/value
sleep 0.2
echo "1" > /sys/class/gpio/gpio17/value
</code></pre>
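<p>The two scripts boil down to one operation: simulate a momentary press of the power button by pulling SW_ON# low for at least 50 ms and releasing it. Here is a sketch of that pulse. GPIO 17 and the 0.2 s hold match the scripts above; taking the value file as a parameter is my own generalization so the logic can be exercised without the hardware:</p>
<pre><code class="language-sh">#!/bin/sh
# Simulate a momentary press of the front panel power switch:
# pull SW_ON# low for at least 50 ms, then release it.
press_power_button() {
  value_file="$1"  # /sys/class/gpio/gpio17/value on the Raspberry PI
  echo 0 > "$value_file"  # pull the pin to ground
  sleep 0.2               # hold for at least 50 ms, per the spec
  echo 1 > "$value_file"  # release
}

# Demonstration against a plain file; on the PI, pass the real sysfs path.
press_power_button /tmp/demo_gpio_value
cat /tmp/demo_gpio_value  # prints 1
</code></pre>
<p>On the PI, <code>press_power_button /sys/class/gpio/gpio17/value</code> should cover both cases, since per the spec the same press signals the supply to switch on or off.</p>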
<h3 id="power-the-raspberry-pi">Power the Raspberry PI</h3>
<p>Half of me likes the idea of getting a Raspberry PI or equivalent for each NUC, pretending that it is a BMC (I have other things I want to do with it, more at the end of the article). Either way, I like the idea of powering the Raspberry PI from the NUC itself to save a power supply and a cable. If you looked carefully at the front panel picture, you probably noticed that PIN 9 is a +5V_DC (2A), just enough to power an RPI via GPIO. But you need to know that GPIO, unlike USB, does not implement any safety protection: if you supply an incorrect voltage, the RPI will burn.</p>
<p>Anyway, PIN 9 is not what I am looking for because it goes up to +5 V only when the NUC is on. We want the RPI powered even when the NUC is off (but still plugged into the power supply).</p>
<p>The NUC has a header called: “Auxiliary power connector” that does just what I need! I hooked it all up, and we have power.</p>
<p><img src="/img/rpi_bmc_auxiliary_power_spec.png" alt="This is an image I took from the NUC schematics. It describes how the
"Auxiliary power connector" works, and how it can be hooked to another power
source to switch it on" class="img-fluid d-block mx-auto" /></p>
<h3 id="conclusion">Conclusion</h3>
<p>I can’t tell if this is, or ever will be, a BMC, but I quite like where this is going, and I had fun. Short term, I can hook more NUCs to the same RPI and play with it. I can rewrite the bash scripts you saw in another language, exposing something like an HTTP API that I can interact with programmatically.</p>
<p>But I am after something better that I am not sure I can figure out. There is something else: I want to visualize output from the NUC. With Tinkerbell, I already have some control over the machine lifecycle because the NUC is capable of PXE booting. I can inject an in-memory operating system (Linux) and SSH into a NUC even if it does not have an operating system installed. But I want more; I want to look at the BIOS and things like that. An “easy” solution is an HDMI dongle: I can get an HDMI video capture, hook it up to the RPI, and forward the NUC output with VNC or something like that; I can do something similar to forward what I type with a keyboard. A better solution is to use a serial console. Unfortunately, my board does not expose one. Joel, one of my colleagues at Equinix Metal, told me that my CPU most likely has it, and he is correct (according to the CPU schematic), but the board does not have a header that I can use. But this is a story for a future article (if we figure it out).</p>
<p><a href="https://pixabay.com/photos/measuring-equipment-electronic-2622334/" class="small">Hero image from Pixabay</a></p>
Nix for developersNix is slowly sneaking in my toolchain as a developer bringing back the joy of provisioning. Not as an exercise of translation between technologies and environment but as the art of building your own environment. https://gianarb.it/img/gianarb.png2021-01-25T10:08:27+00:002021-01-25T10:08:27+00:00https://gianarb.it/blog/nix-for-developers-sneaking-in-my-toolchain<h2 id="nix-is-slowly-sneaking-in-my-toolchain">Nix is slowly sneaking in my toolchain.</h2>
<p>Currently, a lot of my colleagues use Nix. It is a package manager that runs on Linux and macOS. It is versatile. I will show you more about it moving forward, but for now, think of it as a replacement for APT, YUM, and Homebrew that works on both Mac and Linux. It is also a build system, but I haven’t used it much for that just yet.</p>
<h3 id="not-tight-to-an-operating-system">Not tied to an operating system</h3>
<p>For me, this is already a huge benefit. From time to time, for no reason, I end up switching from Mac to Linux and vice versa. It usually happens because I change the place where I work, and the policy forces me to make some weird decisions.</p>
<p>When I finished college, my parents bought me my first laptop; it was a Macbook Pro, but for the first two or three jobs, I used Linux because Macs were too expensive for my employers, and the main reason for owning a Mac was that back then I was not a fully fledged developer. I used to do video editing, play with Photoshop, and so on. I quickly learned that I couldn’t match two colors nicely together, and Linux was just enough for me as a developer: a CLI, Vim, and tools like that. When I was in Dublin, a Macbook Pro was the only available option; at InfluxData, I had a Thinkpad (the best laptop I ever had). Currently, I work on a Macbook Pro again because the non-Apple options available were pretty low in terms of performance.</p>
<p>Now that you know my struggle for laptop and operating system consistency, a tool that works on both sounds appealing.</p>
<p>Nix has its own Linux distribution called NixOS. I am slowly having a look at it, but it is not a topic for this article.</p>
<h3 id="declarative-environment">Declarative environment</h3>
<p>The open-source project I have contributed to most consistently over the years is my <a href="https://github.com/gianarb/dotfiles">dotfiles</a> repository. I am probably the only person who knows how to run it, but it contains the configuration for the various tools I use.</p>
<p>I have to admit that I would like to install it consistently and quickly on any of the on-demand servers I spin up, but I am too lazy for it. Anyway, I like that approach because I describe what I want, and I can consistently get it everywhere. Nix gives me the same possibility, and it does not use a specification language like YAML or JSON; it uses its own dialect.</p>
<p>It is a lazy, pure, and functional language. It is pretty awkward, I have to say, at least for my background. I haven’t fully figured it out yet, but the more I use it, the better it sticks in my mind.</p>
<p>I am also not that good when it comes to picking up new languages, it takes me some time, and I have to practice with them.</p>
<p>The good thing is that there are plenty of tutorials, each targeting a different audience. Do you like to be driven by example? There is <a href="https://nixos.wiki/wiki/Nix_Expression_Language">“Nix by example.”</a> You have time, and you want a more traditional <a href="https://nixos.org/manual/nix/stable/#ch-expression-language">reference manual</a>? They have you covered.</p>
<p>The fact that I don’t have to fight with the template engine makes me happy.</p>
<h3 id="it-is-all-based-on-git">It is all based on Git.</h3>
<p>I have used Git since my first day at my first job. I was a solo developer, and my remote repository was not GitHub but a USB stick.</p>
<p>The Nix package set is a GitHub repository. You can have your own, or you can use <a href="https://github.com/NixOS/nixpkgs">nixpkgs</a>. You can even merge multiple ones, or import your own derivations (derivation is what Nix calls a package).</p>
<p>The fact that all you need to look at every package and its definition is a text editor and Git to clone a repository gives me a friendly feeling.</p>
<p>Based on how you want to define your environment, you can pin all the packages you are installing to a specific commit SHA from the package manager repository:</p>
<pre><code class="language-nix">let _pkgs = import <nixpkgs> { };
in
{ pkgs ?
    import
      (_pkgs.fetchFromGitHub {
        owner = "NixOS";
        repo = "nixpkgs";
        # branch@date: nixpkgs-unstable@2021-01-25
        rev = "ce7b327a52d1b82f82ae061754545b1c54b06c66";
        sha256 = "1rc4if8nmy9lrig0ddihdwpzg2s8y36vf20hfywb8hph5hpsg4vj";
      }) { }
}:
with pkgs;
</code></pre>
<p>Very powerful.</p>
<h3 id="environment-composition-with-nix">Environment composition with Nix</h3>
<p>I am not sure if “environment composition” makes any sense, but it sounds descriptive to me. Nix is user and project aware.</p>
<p>With nix-env, you can install packages as a user. With nix-shell, you can manipulate your system at the project level. If you add NixOS to this chain, you get free customization at the operating system layer.</p>
<p>Currently, nix-shell is the tool I know best, and I am in love with it.</p>
<p>I haven’t experienced all those composition levels yet; I am currently writing my home-manager configuration file to solve my dotfiles repository’s required dependencies. Right now, I don’t have a way to install them automatically. I am not sure if that’s the right layer for such a problem yet, but I will figure it out soon.</p>
<h3 id="project-level-sandboxing-with-nix-shell">Project level sandboxing with nix-shell</h3>
<p>With a combination of symlinks and who knows what else, nix-shell gives you a sandboxed environment with only the dependencies you need for your project. When you run nix-shell, it looks for a file called shell.nix that describes the needed dependencies, environment variables, and so on. By default, you get all the commands and utilities you have on your system plus the ones you declared for that project. If you have Go 1.15 on your system but want 1.13 for a single project, nix-shell can make it happen. Tinkerbell has a <a href="https://github.com/tinkerbell/tink/blob/master/shell.nix">shell.nix</a> in almost all of its repositories.</p>
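<p>For reference, a minimal shell.nix has roughly this shape. This is a sketch rather than the Tinkerbell one; the attribute names (<code>go</code>, <code>gnumake</code>) are the usual nixpkgs ones:</p>
<pre><code class="language-nix"># shell.nix — declare the tools and variables the project needs
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  buildInputs = [
    pkgs.go
    pkgs.gnumake
  ];

  # Environment variables visible inside the sandboxed shell
  CGO_ENABLED = "0";
}
</code></pre>
<p>Running <code>nix-shell</code> in a directory containing this file drops you into a shell where those tools are on the PATH.</p>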
<p>For some particular scenarios, I use Docker containers in development. But with Nix, I can remove that extra layer. I use containers and images to ship and run my applications on Kubernetes. Removing that layer in development decreases the need for volume mounting and port forwarding, the debugger works much more comfortably, and performance is what your hardware provides, without virtualization if you are on a Mac.</p>
<p>Containers in development remain my way to go for dependencies that I don’t care about, will never modify, or that hold state, such as databases. But it is a joy to develop “locally.”</p>
<h3 id="everything-can-be-nixyfied">Everything can be “nixyfied”</h3>
<p>Passing the flag <code>--pure</code> to nix-shell means it won’t rely on the system-installed packages but only on the ones specified in shell.nix. It is a great way to validate that the declaration you wrote for your project can work everywhere you can run Nix. It makes continuous delivery what it should be: a way to run workflows. Today it is not like that; for me, it is a constant translation exercise between Jenkinsfile, bash, and YAML for GitHub Actions, Drone, or Travis. With Nix, you declare the environment, and you can run it everywhere. For example, you can set a shebang in your scripts, leaving to nix-shell the responsibility of satisfying the dependencies it needs:</p>
<pre><code class="language-sh">#!/usr/bin/env nix-shell
#!nix-shell -i bash ../shell.nix
make deploy
</code></pre>
<p>If you don’t want to translate from Nix to GitHub Actions, there is an action that installs Nix; combined with the right shebang, you can reuse the shell.nix description for your project. I do that in <a href="https://github.com/gianarb/tinkie/blob/master/.github/workflows/ci.yaml">gianarb/tinkie</a>:</p>
<pre><code class="language-yaml">name: For each commit and PR
on:
  push:
  pull_request:
jobs:
  validation:
    runs-on: Ubuntu-20.04
    env:
      CGO_ENABLED: 0
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - uses: cachix/install-nix-action@v12
        with:
          nix_path: nixpkgs=channel:nixos-unstable
      - run: ./hack/build-and-deploy.sh
</code></pre>
<p>As you can see, I am not using actions to install the dependencies I need; I use <code>cachix/install-nix-action@v12</code> to get Nix, and everything is managed just as I do locally. One less thing I have to maintain, I suppose.</p>
<p>Mitchell Hashimoto uses Nix to provision his virtual machine, happily enjoying a Linux environment for development on a macOS host.</p>
<p class="text-center"><img src="/img/mitchellh-tweet-nixos.png" alt="Mitchel Hashimoto tweet: I switched my primary dev environment to a graphical NixOS VM on a macOS host. It has been wonderful. I can keep the great GUI ecosystem of macOS, but all dev tools are in a full screen VM. One person said “it basically replaced your terminal app” which is exactly how it feels." class="img-fluid" /></p>
<h3 id="conclusion">Conclusion</h3>
<p>I tend to avoid complications, and I am picky when it comes to the number of tools I have in my toolchain, but after a few months of observation, I think Nix deserves a place in my daily workflow. I just scratched the Nix surface; I didn’t even write my first derivation yet.</p>
<p>It brings back the joy I had a few years ago provisioning infrastructure, which I have lost in the last few years.</p>
<p class="small">Hero image via <a href="https://medium.com/@robinbb/what-is-nix-38375ed59484">Medium.com</a></p>
Evolution of a loglineThis is a story that represent the evolution I had in thinking and writing logs for an application. It highlights why they are important as a communication mechanism from your application and the outside. Explaining what I think are the responsibility we have as a developer when writing logs.https://gianarb.it/img/gianarb.png2021-01-15T10:08:27+00:002021-01-15T10:08:27+00:00https://gianarb.it/blog/evolution-of-a-logline<p>This story represents the evolution of a logline for myself when I write and
interpret it.</p>
<p>Late in 2012, I worked as a developer for a software agency in Turin,
specialized in software for tour operators. It was my second “real” job and the
first one not as a solo developer. Exciting time! An AJAX application with PHP
and MySQL backend running as a service developed mainly by a single person, the
lead developer. I interpreted the log back then as the equivalent of a save
point in a game. The tail was the primary tool to figure out what was going on;
a logline was helpful to figure out that a lot of customers were reaching a
particular line of code. The interpretation of the situation was up to humans.</p>
<p>Every developer involved in the project participated in adding loglines to the
codebase, either directly, developing the code, or indirectly when chatting about a
particular feature over lunch or doing a code review. Building context from
an unknown log line was a useless exercise because the lead was always there
to help you figure out what that logline was supposed to tell you.</p>
<p>Even the stream’s speed was a crucial metric to figure out the sanity of the
application. Were the logs too fast? The application was under heavy load, and
probably it was slow, not fast enough, or not as smooth as usual; well, something
was going on, and it was not good!</p>
<p>You can judge this story as impractical but not as unusual. This approach does
not scale; it has an unmeasurable risk of “bus factor.” But if you don’t have a
panel in your Grafana dashboard representing the distribution of loglines, you
should add one. Just for fun.</p>
<h2 id="bus-factor">Bus Factor</h2>
<p>Bus factor represents the risk of centralizing knowledge and responsibility in
a single location. If the lead in my story resigns or gets hit by a bus, nobody
will build context from “not that descriptive” log lines as quickly as he could.
And the “speed of tail” requires being very familiar with the stream. Sharing
knowledge and responsibility across the company, writing documentation, and
doing staff rotations are standard techniques that mitigate such risk.</p>
<h2 id="automation">Automation</h2>
<p>When interpreting your application’s state requires a human, it is tough to
build automation for it. Standardization in the way your application
communicates with the outside is another way to spread knowledge in a team,
allowing you to write automation for it.</p>
<p>The format of a logline is the protocol to develop.</p>
<p>The format has to be parsable and usable by automation. You have to see logs as
points in time, as time series, more than as something you carefully
watch and try to interpret yourself.</p>
<p>A logline that looks like this:</p>
<pre><code>1610107485 New user inserted in the DB with ID=1234
</code></pre>
<p>Will become:</p>
<pre><code>time=1610107485 service="db" action="insert" id=1234 resource="user"
</code></pre>
<p>You can add a message that can be used to communicate with a person, such as
<code>msg="new user registered"</code>, but I am not sure it is mandatory; you can compose it later.</p>
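<p>Composing the human-readable message later is cheap once the fields are there. A toy sketch (the sed extraction and the composed wording are mine, matched to the example line above):</p>
<pre><code class="language-sh">#!/bin/sh
# Build a human message from the structured fields after the fact.
line='time=1610107485 service="db" action="insert" id=1234 resource="user"'
action=$(printf '%s' "$line" | sed -n 's/.*action="\([^"]*\)".*/\1/p')
resource=$(printf '%s' "$line" | sed -n 's/.*resource="\([^"]*\)".*/\1/p')
echo "new $resource ${action}ed"  # prints: new user inserted
</code></pre>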
<p>Today we do this exercise with ElasticSearch, applying full-text algorithms and
tokenizing the message. It is expensive, and it hides the developer’s
responsibility when it comes to consciously describing the current state of the
system with a logline. No, they are not random printfs anymore. You can even emit
them as JSON if you prefer.</p>
<p>Or, more generally, see a logline as a description of a particular point in time via
key-value pairs that you can aggregate, visualize, and use to drive powerful
automation. I work a lot in the cloud field; for me, reconciliation, or a
system’s ability to repair itself, is often based on those pieces of information.
If you want to go deeper on this topic, structured logging is what you should
look for.</p>
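<p>Once a logline is a set of key-value pairs, even plain text tools can drive this kind of aggregation, no full-text tokenization required. A toy sketch over lines shaped like the example above (the extra events are invented for the demo):</p>
<pre><code class="language-sh">#!/bin/sh
# Count the "insert" actions straight from structured (logfmt-style) loglines.
printf '%s\n' \
  'time=1610107485 service="db" action="insert" resource="user" id=1234' \
  'time=1610107486 service="db" action="insert" resource="user" id=1235' \
  'time=1610107490 service="db" action="delete" resource="user" id=1234' \
  | grep -c 'action="insert"'  # prints 2
</code></pre>
<p>The same key-value shape feeds a Grafana panel or a reconciliation loop just as easily as it feeds grep.</p>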
<h2 id="flexibility">Flexibility</h2>
<p>Having a logging library that allows you to do structured logging is a
must-have, and there is no single answer about the number of key-value pairs you need.
The overall goal is to diagnose issues and learn the behavior of an application
from those points. It is not something you use only when having problems. Those
loglines are the tool we have to figure out what a piece of code we
didn’t write is doing in production. In a highly distributed system, a logline
with the right fields, such as hostname, PID, region, Git SHA, or version, can
pinpoint where the application having problems is running without looking
across many dashboards, Kubernetes UIs, and CLIs.</p>
<p>Parsing and manipulating a structured log is more convenient than random text
that has to be parsed and tokenized, but everything has a limit, so you have to
find the right balance based on experience. It is another never-ending iterative
process that we can call the evolution of a logline!</p>
<p><img src="/img/watch-4638673_1280.jpg" alt="This picture represents the time it takes to learn something new. It is a
picture of an open book with an old clock." class="img-fluid" /></p>
<h2 id="conclusion">Conclusion</h2>
<ul>
<li>A logline is the way you teach your application how to communicate with the
outside.</li>
<li>Communication is useful in many fields. It is an opportunity to learn
something new or a way to communicate that we are in trouble. Same as logs,
use them as an opportunity to learn how a system works overall.</li>
<li>As a developer, do not see a logline as a random printf. The way it is
structured and articulated improves the quality of communication between your
application and the outside world.</li>
<li>A logline is not fire-and-forget but an entity that evolves over time.</li>
<li>Logs represent the internal state of your application at some point in time
and somewhere in your codebase.</li>
</ul>
<p>Recently I spoke with <a href="https://twitter.com/lizthegrey">Liz</a> and
<a href="https://twitter.com/shelbyspees">Shelby</a> from HoneyComb about observability and
monitoring during
<a href="https://www.heavybit.com/library/podcasts/o11ycast/ep-32-managing-hardware-with-gianluca-arbezzano-of-equinix-metal/?utm_campaign=coschedule&utm_source=twitter&utm_medium=heavybit&utm_content=Ep.%20%2332,%20Managing%20Hardware%20with%20Gianluca%20Arbezzano%20of%20Equinix%20Metal">o11ycast</a>
a podcast about observability if you want to know more about this topic.</p>
<p class="small">Hero image via <a href="https://pixabay.com/illustrations/dna-string-biology-3d-1811955/">Pixabay</a></p>
Kubenetes v1.20 the docker deprecation dilemma in practicethere are many discussions going on Twitter about why Kubernetes v1.20 deprecated Docker and dockershim as default runtime. But it was a well thought an planned effort. Nothing to really worry about and here I will go over the process of updating Kubenetes from 1.19 to 1.20https://gianarb.it/img/gianarb.png2020-12-03T10:08:27+00:002020-12-03T10:08:27+00:00https://gianarb.it/blog/kubernetes-1-20-dockershim-in-practice<p>Kubernetes v1.20 is not yet out but there is already a lot going on behind the
scenes. The main reason is the deprecation of Docker as the default runtime.</p>
<p>I won’t go too deep into the theory because at this point I think it is well
covered elsewhere. But a few things:</p>
<ol>
<li>If you run Docker, you run containerd. That’s it. Even if you didn’t know, or
you don’t like the idea.</li>
<li>The Container Runtime Interface (CRI) has been around for a good amount of time, and its
goal was to decouple the orchestrator (Kubernetes) from other
business like running containers. Who cares about containers, in the end.</li>
<li>Deprecating dockershim, or at least removing it from the kubelet itself, is the
right thing to do.</li>
</ol>
<p>More about this topic:</p>
<ul>
<li><a href="https://kubernetes.io/blog/2020/12/02/dockershim-faq/">Dockershim Deprecation FAQ</a></li>
<li><a href="https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/">Don’t Panic: Kubernetes and Docker</a></li>
</ul>
<p>I want to tell you how it works in practice. And this article contains my
experience updating a Kubernetes cluster from v1.19 to v1.20.</p>
<p>So I created a two node clusters on <a href="https://console.equinix.com/">Equinix
Metal</a> running Ubuntu using this simple script and
kubeadm.</p>
<pre><code class="language-bash">#!/bin/bash
apt-get update
apt-get install -y vim git
apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
vim \
git \
software-properties-common
releaseName=$(lsb_release -cs)
if [ "$releaseName" == "groovy" ]
then
releaseName="focal"
fi
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
apt-key fingerprint 0EBFCD88
add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
${releaseName} \
test"
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io
apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
</code></pre>
<p>I placed that script as CloudInit for my two nodes and I ran <code>kubeadm init/join</code>
to get my cluster.</p>
<pre><code class="language-terminal"># kubectl get node
NAME STATUS ROLES AGE VERSION
gianarb-k8s Ready master 2m11s v1.19.4
gianarb-k8s01 Ready <none> 91s v1.19.4
</code></pre>
<p>I have installed Flannel and now the nodes are ready. That’s it. That’s how I
measure success here.</p>
<p>It is now time to update to v1.20, so I downloaded the binaries from the
official release storage.</p>
<pre><code class="language-terminal"># wget https://dl.k8s.io/v1.20.0-rc.0/kubernetes-server-linux-amd64.tar.gz
# tar xzvf ./kubernetes-server-linux-amd64.tar.gz
# ./kubernetes/server/bin/kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.0-rc.0", GitCommit:"3321f00ed...
</code></pre>
<p>I checked the available upgrade plan from <code>kubeadm</code> with the flag
<code>--allow-experimental-upgrades</code>; if you do this process after v1.20 is
officially released, you won’t need that flag.</p>
<pre><code class="language-terminal">./kubernetes/server/bin/kubeadm upgrade plan --allow-experimental-upgrades
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.19.4
[upgrade/versions] kubeadm version: v1.20.0-rc.0
[upgrade/versions] Latest stable version: v1.19.4
[upgrade/versions] Latest stable version: v1.19.4
[upgrade/versions] Latest version in the v1.19 series: v1.19.4
[upgrade/versions] Latest version in the v1.19 series: v1.19.4
I1203 21:50:13.860850 59152 version.go:251] remote version is much newer: ....
W1203 21:50:14.100325 59152 version.go:101] could not fetch a Kubernetes ....
W1203 21:50:14.100362 59152 version.go:102] falling back to the local client version: v1.20.0-rc.0
[upgrade/versions] Latest experimental version: v1.20.0-rc.0
[upgrade/versions] Latest experimental version: v1.20.0-rc.0
[upgrade/versions] Latest : v1.19.5-rc.0
[upgrade/versions] Latest : v1.19.5-rc.0
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT AVAILABLE
kubelet 2 x v1.19.4 v1.20.0-rc.0
Upgrade to the latest experimental version:
COMPONENT CURRENT AVAILABLE
kube-apiserver v1.19.4 v1.20.0-rc.0
kube-controller-manager v1.19.4 v1.20.0-rc.0
kube-scheduler v1.19.4 v1.20.0-rc.0
kube-proxy v1.19.4 v1.20.0-rc.0
CoreDNS 1.7.0 1.7.0
etcd 3.4.13-0 3.4.13-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.20.0-rc.0 --allow-release-candidate-upgrades
_____________________________________________________________________
The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.
API GROUP CURRENT VERSION PREFERRED VERSION MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io v1alpha1 v1alpha1 no
kubelet.config.k8s.io v1beta1 v1beta1 no
_____________________________________________________________________
</code></pre>
<p>As you can see from the output, <code>v1.20.0-rc.0</code> is available, so it is time
to apply that plan and roll out the upgrade.</p>
<pre><code class="language-terminal"># ./kubernetes/server/bin/kubeadm upgrade apply v1.20.0-rc.0 --allow-release-candidate-upgrades
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.20.0-rc.0"
[upgrade/versions] Cluster version: v1.19.4
[upgrade/versions] kubeadm version: v1.20.0-rc.0
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
...
</code></pre>
<p>Time to check the kubelet with journalctl:</p>
<pre><code class="language-terminal">#journalctl -xe -u kubelet
I1203 21:58:49.648539 108749 server.go:416] Version: v1.20.0-rc.0
I1203 21:58:49.649168 108749 server.go:837] Client rotation is on, will bootstrap in background
I1203 21:58:49.651975 108749 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
I1203 21:58:49.653428 108749 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
I1203 21:58:49.764200 108749 server.go:645] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
I1203 21:58:49.764717 108749 container_manager_linux.go:274] container manager verified user specified cgroup-root exists: []
I1203 21:58:49.764743 108749 container_manager_linux.go:279] Creating Container Manager object based on Node Config: {Ru...
I1203 21:58:49.764862 108749 topology_manager.go:120] [topologymanager] Creating topology manager with none policy per container scope
I1203 21:58:49.764874 108749 container_manager_linux.go:310] [topologymanager] Initializing Topology Manager with none policy and container-level scope
I1203 21:58:49.764881 108749 container_manager_linux.go:315] Creating device plugin manager: true
W1203 21:58:49.765010 108749 kubelet.go:297] Using dockershim is deprecated, please consider using a full-fledged CRI implementation
</code></pre>
<p>We finally got it <code>Using dockershim is deprecated, please consider using a
full-fledged CRI implementation</code>. But everything still works.</p>
<pre><code class="language-terminal"># kubectl get node
NAME STATUS ROLES AGE VERSION
gianarb-k8s Ready control-plane,master 17m v1.20.0-rc.0
gianarb-k8s01 Ready <none> 16m v1.19.4
</code></pre>
<p>There is something cool as well: the role is now <code>control-plane</code> and <code>master</code>. I
am sure at some point we will deprecate <code>master</code> as well, and this is just a
transition phase, the same as is happening with dockershim.</p>
<p>We don’t want that warning, so it is now time to change the default
configuration of <code>containerd</code>, which, as you know, has been sitting there behind
Docker almost forever.</p>
<pre><code class="language-terminal"># cat /etc/containerd/config.toml
...
# comment this line because we need cri enable
# disabled_plugins = ["cri"]
...
</code></pre>
<p>We need to enable the <code>cri</code> plugin; by default it is disabled when installing
docker-ce via the Docker repositories because <code>dockerd</code> does not need a CRI, but
Kubernetes obviously does. Now you can restart the service with <code>systemctl</code>,
and we have to tell the kubelet that it now has to use <code>containerd</code>.</p>
<pre><code class="language-terminal"># mkdir -p /etc/systemd/system/kubelet.service.d/
# cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/0-containerd.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
EOF
</code></pre>
<p>After a systemd daemon reload and a kubelet.service restart we are back again.</p>
<pre><code># kubectl get node
NAME STATUS ROLES AGE VERSION
gianarb-k8s Ready control-plane,master 23m v1.20.0-rc.0
gianarb-k8s01 Ready <none> 23m v1.19.4
</code></pre>
<p>Exercise for you: have a look at the kubelet logs. The warning is not there
anymore, and you are good to go.</p>
Lessons learned working as site reliability engineerI want to share a few lessons I learned working three years as site reliability engineer. I kept the focus on the one I think are reusable, and that made me a better developer because reliability just as everything is an everybody businesshttps://gianarb.it/img/gianarb.png2020-11-26T10:08:27+00:002020-11-26T10:08:27+00:00https://gianarb.it/blog/lessons-learned-working-as-sre<p>I worked for three years at InfluxData as Site Reliability Engineer. When
onboarding a new role in this jungle called “career” in information technology,
you should be ready to learn what that role means.</p>
<p>Along the way, I developed new skills and mastered a few that I already had, but
they were not widely used elsewhere.</p>
<p>My title is not Site Reliability Engineer anymore, but an IT career is like
roulette, and everything you learn comes back at some point. So I want to
share the skills I think are essential when working as a Site Reliability Engineer
and that I feel are worth keeping in your toolchain even when your title
changes.</p>
<h3 id="ability-to-develop-a-friendly-environment-for-yourself">Ability to develop a friendly environment for yourself</h3>
<p>One of my goals as a Site Reliability Engineer is to quickly support developers
having trouble with their code at scale.</p>
<p>Another one is to figure out criticalities when it comes to on-call.</p>
<p>In both cases I am far from my local environment, usually interacting with
something much more complicated. Having something you can call familiar helps.
It can be whatever:</p>
<ul>
<li>A few bash scripts that wrap other commands with a UX hard to remember</li>
<li>A CLI tool you wrote with your team</li>
<li>A directory where you can quickly go and write notes about what is going on
for future use</li>
<li>A set of bullet points or a runbook you know is rock solid and can drive you
where you want to go.</li>
</ul>
<p>Those are just a few tricks I use. If you have yours and you want to share them
as a comment, please do it!</p>
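<p>To make the first bullet concrete, here is a sketch (in Go rather than bash, and with made-up aliases and kubectl invocations, purely for illustration) of a tiny wrapper that expands short, memorable aliases into the long command lines you never remember under pressure:</p>

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// shortcuts maps memorable aliases to the full, hard-to-remember
// command lines they expand to. Both the aliases and the commands
// are illustrative examples, not tools from this article.
var shortcuts = map[string][]string{
	"nodes":    {"kubectl", "get", "nodes", "-o", "wide"},
	"pending":  {"kubectl", "get", "pods", "--all-namespaces", "--field-selector=status.phase=Pending"},
	"restarts": {"kubectl", "get", "pods", "--sort-by=.status.containerStatuses[0].restartCount"},
}

// expand turns an alias into the command line it wraps; the bool
// reports whether the alias is known.
func expand(alias string) ([]string, bool) {
	cmd, ok := shortcuts[alias]
	return cmd, ok
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sre <alias>; known aliases:")
		for a := range shortcuts {
			fmt.Println("  ", a)
		}
		return
	}
	args, ok := expand(os.Args[1])
	if !ok {
		fmt.Printf("unknown alias %q\n", os.Args[1])
		return
	}
	// Hand off to the real command, wiring up the terminal.
	c := exec.Command(args[0], args[1:]...)
	c.Stdout, c.Stderr, c.Stdin = os.Stdout, os.Stderr, os.Stdin
	if err := c.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

<p>The point is not the tool itself but that the mapping lives in one reviewable, testable place instead of your shell history.</p>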
<p>This is an essential skill that everyone has to master, but I really learned
it as an SRE, when I had to act quickly; now I do my best to develop my
workflows and a working environment I like. These days I am trying Nix and, in
particular, nix-shell, because it helps me customize my environment without the
overhead of a Docker container.</p>
<p>This may sound time-consuming. Many projects have a README describing the
tools and requirements to contribute to or build the project. Why do I need my
own way? Well, I am not saying you should start from zero, but when I glue it
together with the flavor I like, I code better and am happier. So for me, it is
a big YES!</p>
<h3 id="troubleshoot-like-a-ninja">Troubleshoot like a ninja</h3>
<p>Starting from the same purpose as before, a Site Reliability Engineer looks at
the code when it runs in production, and production is a scary and dangerous
place. As a developer, if you are lucky and smart, you try to focus on one
application at a time; yes, it probably has many dependencies, but still, code
moves one line at a time.</p>
<p>In production, with concurrency and thousands of requests happening almost
simultaneously, things get pretty messy. Having the ability and the right tools
to slice and dice from different points of view and perspectives, from an entire
region down to a specific application, requires operational experience and
training.</p>
<p>A good exercise is to have the desire to troubleshoot everything. Does a
teammate have a question about a system in production? Go and help them. Visit
logs, traces, and dashboards even when everything looks quiet, if you are lucky
enough to have a definition of “quiet.”</p>
<p>Another thing I do, more in general, is to follow the best. There is a lot
about this topic in the form of books, talks, and similar. Read them, but even
more importantly, follow those who master these topics every day:
<a href="https://twitter.com/rakyll">@rakyll</a>,
<a href="https://twitter.com/brendangregg">@brendangregg</a>,
<a href="https://twitter.com/relix42">@relix42</a>,
<a href="https://twitter.com/lizthegrey">@lizthegrey</a>,
<a href="https://twitter.com/lauralifts">@lauralifts</a>. Please do not follow them on
Twitter only, but look at their GitHub as well; sometimes, a small project that
works reliably and well for us is gold.</p>
<h3 id="think-about-code-debuggability-in-production">Think about code debuggability in production</h3>
<p>Like everything you read so far, this is always essential because, as I said,
Site Reliability Engineer is just a role with a subset of responsibilities and
objectives, but we are not silos; everything matters: a feature has to be
usable, good-looking, and functional. Code review for me became a lot more
about: “Is this code understandable in production?”, “What do I want it to tell
me when running at scale?”, “How does this trace look?”, “Is this log useful,
and how does it impact the overall context?”.</p>
<p>Those questions strongly shaped my mindset when working as an SRE, but they
made me a better developer. I still try to answer them when coding or doing
code reviews.</p>
<h3 id="conclusion">Conclusion</h3>
<p>Titles are just titles; reading them will help others know the set of skills
you leveraged most, but that’s it, and it is not always true. You are not
married to your title, and if you are curious about various aspects of our
work, you will change many of them. The right balance of all those skills will
make you unique.</p>
Vanity URL for Go mod with zero infrastructureA lot of Go modules today are hosted on GitHub. But you can setup vanity URL using a custom domain. This is good to decouple your library from GitHub. The best part is that it does not require any infrastructure, if you don't want to.https://gianarb.it/img/gianarb.png2020-11-13T10:08:27+00:002020-11-13T10:08:27+00:00https://gianarb.it/blog/go-mod-vanity-url<p>This post is about how I renamed a Go module from
github.com/something/somethingelse to go.gianarb.it/somethingelse. It requires
zero infrastructure, just a static site that can run on GitHub.</p>
<p>I used one of the many projects I have that nobody cares about
<a href="https://github.com/gianarb/go-irc">gianarb/go-irc</a>.</p>
<h3 id="why">Why</h3>
<p>You know, it is one of those ideas you have in your mind for ages, but who
cares? At least for me, since I don’t have any cool open source project under
my name.</p>
<p>I am not one of those people who suffer when realizing that it does not
scale. If you end up having a project that gets traction and is tied to
github.com because you didn’t think about another way to go, you are stuck. And
even if GitHub today is cool, it won’t stay cool forever.</p>
<p>Filippo Valsorda <a href="https://twitter.com/FiloSottile">@FiloSottile</a> today
<a href="https://twitter.com/FiloSottile/status/1327240411266641920">tweeted</a> about this
topic, and I looked at how he set up filippo.io/age to solve this little
dilemma.</p>
<h3 id="goals">Goals</h3>
<p>This is not about how to escape from GitHub but about setting up a “vanity
URL” that won’t lock you or your project to GitHub. It does not require any
infrastructure, just a domain that you can point to a GitHub Pages site.</p>
<h3 id="prerequisite">Prerequisite</h3>
<ol>
<li>Create a DNS record that points as CNAME to <code><github-handle>.github.io</code>. I
used go.gianarb.it</li>
<li>Create a repository; it will be the home for your static site. Mine is
<a href="https://github.com/gianarb/go-libraries">gianarb/go-libraries</a></li>
<li>Set the repository up to be a <a href="https://pages.github.com/">GitHub page</a> and
enable HTTPS. You can enable it via the repository Settings; we will push
HTML files to it directly, so I used the master branch as the GitHub page’s
source.</li>
</ol>
<h3 id="add-your-first-library">Add your first library.</h3>
<p>If your library is already using go mod, you have to change the module name to
the new one. In my case, from github.com/gianarb/go-irc to go.gianarb.it/irc. I
just searched and replaced with my editor across the whole project. Renaming a
module is a breaking change; I am not sure how to avoid or mitigate that; if you
know, let me know!</p>
<p>You can push a new file to your static site repository; I called mine irc:</p>
<pre><code class="language-html"><html>
<head>
<meta name="go-import" content="go.gianarb.it/irc git https://github.com/gianarb/go-irc">
<meta http-equiv="refresh" content="0;URL='https://github.com/gianarb/go-irc'">
</head>
<body>
Redirecting you to the <a href="https://github.com/gianarb/go-irc">project page</a>...
</body>
</html>
</code></pre>
<p>Replace your URL accordingly; as soon as you push this file and GitHub
publishes it to your page, you will be able to import <code>go.gianarb.it/irc</code>.</p>
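<p>If you are curious what happens behind the scenes: <code>go get go.gianarb.it/irc</code> fetches the page with <code>?go-get=1</code> appended and reads that <code>go-import</code> meta tag. Here is a simplified sketch of that resolution step (a naive string-based parser for illustration; the real go tool uses a proper HTML tokenizer):</p>

```go
package main

import (
	"fmt"
	"strings"
)

// goImport holds the three fields of a go-import meta tag:
// the module prefix, the VCS, and the repository URL.
type goImport struct {
	Prefix, VCS, RepoURL string
}

// parseGoImport extracts the go-import directive from an HTML page,
// the same information the go tool looks for when resolving a vanity
// import path.
func parseGoImport(html string) (goImport, error) {
	const marker = `name="go-import" content="`
	i := strings.Index(html, marker)
	if i < 0 {
		return goImport{}, fmt.Errorf("no go-import meta tag found")
	}
	rest := html[i+len(marker):]
	j := strings.Index(rest, `"`)
	if j < 0 {
		return goImport{}, fmt.Errorf("unterminated content attribute")
	}
	// The content attribute is "<prefix> <vcs> <repo-url>".
	fields := strings.Fields(rest[:j])
	if len(fields) != 3 {
		return goImport{}, fmt.Errorf("want 3 fields, got %d", len(fields))
	}
	return goImport{Prefix: fields[0], VCS: fields[1], RepoURL: fields[2]}, nil
}

func main() {
	page := `<meta name="go-import" content="go.gianarb.it/irc git https://github.com/gianarb/go-irc">`
	gi, err := parseGoImport(page)
	if err != nil {
		panic(err)
	}
	fmt.Printf("module %s is served from %s over %s\n", gi.Prefix, gi.RepoURL, gi.VCS)
}
```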
<h3 id="conclusion">Conclusion</h3>
<p>This method works as a safeguard if you decide to move your code out of
GitHub. The static site can be deployed to Netlify or S3, or served by Nginx.
It does not need to stay on GitHub.</p>
<p>Same for your code, if you decide to move from GitHub to GitLab you can do it
transparently.</p>
What is Tinkerbell?I want to share with you what is Tinkerbell. An open source project I help maintaining, developed by Equinix Metal. Tinkerbell helps you to manage your hardware and datacenter programmatically via an API.https://gianarb.it/img/gianarb.png2020-11-06T10:08:27+00:002020-11-06T10:08:27+00:00https://gianarb.it/blog/what-is-tinkerbell<p>First things first, Tinkerbell is an open-source project mainly written in Go
that comes from PacketHost, now Equinix Metal. Equinix Metal is a cloud provider
that serves bare metal servers. No virtual machines, no high-level services, I
said bare metal! Imagine a colocation that you can rent per hour.</p>
<p>Tinkerbell is the software Equinix Metal dreamed about as an internal
provisioner for datacenter automation. They took their internal provisioner,
removed any PacketHost references and business-specific code, and pushed it to
GitHub for the community to enjoy the same technologies.</p>
<p>The project is a number of micro-services that provide various functionality
to configure hardware and provision both the operating system and additional
software through its workflow engine.</p>
<h2 id="what-the-project-provides">What the project provides</h2>
<ul>
<li>The first micro-service is called
<a href="https://github.com/tinkerbell/boots">boots</a>. This Tinkerbell service
provides a DHCP and a TFTP server to tell a piece of hardware (a server) what
to do when netbooting; it provides this information through the iPXE
project.</li>
<li>Tinkerbell serves a CLI that you can use to interact with a control plane
that serves HTTP and gRPC APIs. The service which does all those things is in
the <a href="https://github.com/tinkerbell/tink">tink</a> repository, and it provides
three binaries: tink-server (the control plane), tink-cli (the command-line
interface), and tink-worker.</li>
<li>Tinkerbell provides an operating system that runs in memory, it is based on
Alpine, and it is called <a href="https://github.com/tinkerbell/osie">Osie</a>. This
in-memory operating system runs directly on the hardware you want to
provision, and it runs the tink-worker.</li>
<li>Once the in-memory Osie has started, it launches tink-worker, which in turn
communicates with the control plane (tink-server), asking for any work that
has to be done on that server. This unit of work is called a workflow.</li>
<li><a href="https://github.com/tinkerbell/hegel">Hegel</a> is a metadata server, comparable
to the AWS EC2 metadata service or the Equinix Metal one; the majority of cloud
vendors provide this type of service, so you should have it as well! It is
crucial when running scripts on a particular server because you can get
concrete variables from it, such as the operating system it runs, its IPs,
location, and so on.</li>
</ul>
<h2 id="the-end-goal">The end goal</h2>
<p>Tinkerbell’s end goal is to bring a piece of hardware to life.</p>
<h2 id="workflow-and-template">Workflow and template</h2>
<p>A template is a specification file that describes what we want to execute. A
workflow starts from a template, and it has a particular target. Templates are
reusable; workflows are a single execution and can’t be reused. The single unit
of work in a template is called an action. You can have as many actions as you
want in a template, and each action runs in its own Docker container.</p>
<h2 id="action">Action</h2>
<p>As mentioned above, actions are Docker containers, which means that you can
build each action in isolation in the language you want: Python, Bash, Go,
Rust, or whatever you can run in a container.</p>
<p>You may think that Docker sounds like overhead; however, it was a natural
decision based on how we could use the container concept in operations. The
concept of build, pull, and push has become commonplace within development
environments, and we think it can work well in operational environments too.
Building containers that hold operational tasks in isolation, and enhancing
that with testing and simplified execution, is a clear benefit. It is an
effective way to move code around in a reusable way without having to reinvent
the distribution model. Some of the actions you will see very often in a
Tinkerbell workflow may be:</p>
<ul>
<li>Disk related actions: mounting a disk, wiping it, or setting up a partition
table to boot an operating system</li>
<li>Downloading an Operating System like Ubuntu, Debian, NixOS, CentOS</li>
<li>Copying an operating system into a partition</li>
</ul>
<p>But you will be able to write actions related to your business:</p>
<ul>
<li>Notify a particular API when provisioning fails</li>
<li>Attempt a recovery</li>
<li>Observe and mark the status of your provisioning</li>
<li>Who knows! There are no limitations here.</li>
</ul>
<h2 id="how-a-template-and-a-workflow-looks-like">What a template and a workflow look like</h2>
<p>Unfortunately, there are not many examples, but as maintainers, the next three
months will be all about public workflows and reusable actions.</p>
<p>Kinvolk wrote a blog post about <a href="https://kinvolk.io/blog/2020/10/provisioning-flatcar-container-linux-with-tinkerbell/">how to provision
Flatcar</a>
on bare metal with Tinkerbell.</p>
<p>The Tinkerbell documentation <a href="https://tinkerbell.org/examples/hello-world/">has an
example</a> of a “hello world”
template.</p>
<p><a href="https://www.fransvanberckel.nl/">Frans van Berckel</a> wrote a workflow for
<a href="https://github.com/fransvanberckel/debian-workflow">CentOS</a> and
<a href="https://github.com/fransvanberckel/debian-workflow">Debian</a>.</p>
<p>One of my next projects will be to write a workflow that won’t install an
operating system. It will start something like k3s or k8s directly on Osie for
my ephemeral homelab! I am not sure it makes sense or will ever work, but I
think it is an excellent example: “it is not all about having a persisted and
traditional operating system these days.”</p>
<h2 id="how-to-get-started">How to get started</h2>
<p>We put a fair amount of effort into a
<a href="https://github.com/tinkerbell/sandbox">sandbox</a> project and setup guide. You
can run it <a href="https://tinkerbell.org/docs/setup/local-with-vagrant/">locally with
Vagrant</a> or on <a href="https://tinkerbell.org/docs/setup/terraform/">Equinix
Metal</a>.</p>
<p>Aaron Ramblings wrote a blog post, <a href="https://geekgonecrazy.com/2020/09/07/tinkerbell-or-ipxe-boot-on-ovh/">“Tinkerbell or iPXE boot on
OVH”</a>,
using the sandbox to run Tinkerbell on OVH! I am still surprised when I read it
because he experimented with the sandbox at a very early stage of the project,
and if he was able to run the sandbox on OVH, it can run almost anywhere else
(at least the control plane part).</p>
<h2 id="next-steps">Next steps</h2>
<p>With the help of our community, we recently improved our continuous integration
pipeline to build all the projects for various architectures: <code>linux/386</code>,
<code>linux/amd64</code>, <code>linux/arm/v6</code>, <code>linux/arm/v7</code>, <code>linux/arm64</code>, leveraging Docker
buildx, QEMU, and GitHub Actions. My goal was to be able to run the provisioner
on a Raspberry Pi. Because, as I wrote before, my homelab tends to go away, get
moved, or get disconnected, I think the only thing I can keep running reliably
today is a Raspberry Pi. So I want to run the control plane on a Raspberry Pi.
I presume there are smarter things to do with multi-arch, but let’s be honest;
we all have a Raspberry Pi leftover somewhere.</p>
<p>We use the sandbox project as a way to release Tinkerbell as a whole. We pin
all the various dependencies, such as Boots, Hegel, tink-server, the CLI, and
Osie, and when they all pass the integration tests, we tag a new release. The
generated artifacts are containers for now; we want to provide plain binaries
as well, so you can run Tinkerbell as you like, even without containers. At
some point, we will tag and manage each component independently, but for now,
it is a lot of effort.</p>
<p>Releasing new workflows is something we are working on already. So stay tuned!</p>
<p>Another project is available in the Tinkerbell GitHub organization that I
didn’t mention because it is not hooked into the stack yet, as we are working
on its version two. <a href="https://github.com/tinkerbell/pbnj">PBNJ</a> provides
a standard API to interact with various BMCs via IPMI (Intelligent Platform
Management Interface). Having this kind of ability in a datacenter is essential
because we want to drive things like reboot, restart, and power off for each
server programmatically, and even as part of a workflow.</p>
<h2 id="conclusion">Conclusion</h2>
<p>There is already huge demand for bare metal, and with the growth driven by
things like 5G, dedicated GPUs/FPGAs, HPC, and the need for constant, expected
performance and security boundaries, it is only going to grow. A recent report
by <a href="https://www.mordorintelligence.com/industry-reports/bare-metal-cloud-market">Mordor
Intelligence</a>
states, “The bare metal cloud market was valued at USD 1.75 billion in 2019 and
expected to reach USD 10.56 billion by 2025,” which clearly shows a growing
demand for a modern platform to provision bare-metal infrastructure.</p>
<p>Datacenter management is hard, and that’s why the public cloud got so much
traction. For many companies and products, managing hardware is unnecessary and
a distraction, but when it becomes a requirement, or when you think it is
strategic to manage your own hardware, Tinkerbell and its community come to the
rescue.</p>
<p class="alert alert-info">A big thank you goes to <a href="https://twitter.com/thebsdbox">Dan</a> for his review and
support writing this article!</p>
<h2 id="more-i-want-more">More, I want more!</h2>
<p><a href="https://www.youtube.com/watch?v=Y04eCSKaQCc">Dan and Jeremy had a conversation</a>
about netbooting and bare metal provisioning. It is available on YouTube, you
should really have a look at it!</p>
<p><a href="https://www.youtube.com/watch?v=QxpKnMGywTU">Alex Ellis and Mark Coleman recorded a
video</a> setting up and using Tinkerbell.
The video is a bit out of date, and they did not use the new sandbox project
because it was not available at that time, but it is still good and valuable!</p>
Reactive planning in Golang. Reach a desired number adding and subtracting random numbersAn example about how to write reactive planning in Go. Code and step by step solution for an exercise I developed to learn plannerhttps://gianarb.it/img/gianarb.png2020-10-26T10:08:27+00:002020-10-26T10:08:27+00:00https://gianarb.it/blog/reactive-plan-golang-example<p>Ciao! A few months ago, probably a year, I wrote a small library called
<a href="https://github.com/gianarb/planner">planner</a>. It comes from my experience using
reactive planning and Kubernetes. I am really in love with this way of writing
code because it sounds very reliable to me.</p>
<p>Over the last couple of days, I decided to write documentation for it, so now
it is presentable. I streamed it on <a href="https://twitch.tv/gianarb">Twitch</a>, if you
like watching people code!</p>
<p>As part of the library’s readme, I wrote a small program, and I left a couple of
exercises to the reader. With this article, I want to solve them.</p>
<p>You can follow this article and try it yourself, starting from
<a href="https://play.golang.com/p/0LuIoMtp10f">play.golang.com</a>.</p>
<pre><code class="language-golang">package main
import (
"context"
"time"
"github.com/gianarb/planner"
"go.uber.org/zap"
)
func main() {
ctx, done := context.WithTimeout(context.Background(), 10*time.Second)
defer done()
countPlan := &CountPlan{
Target: 20,
}
scheduler := planner.NewScheduler()
scheduler.WithLogger(initLogger())
scheduler.Execute(ctx, countPlan)
}
type CountPlan struct {
Target int
current int
}
func (p *CountPlan) Create(ctx context.Context) ([]planner.Procedure, error) {
if p.current < p.Target {
return []planner.Procedure{&AddNumber{plan: p}}, nil
}
return nil, nil
}
func (p *CountPlan) Name() string {
return "count_plan"
}
type AddNumber struct {
plan *CountPlan
}
func (a *AddNumber) Name() string {
return "add_number"
}
func (a *AddNumber) Do(ctx context.Context) ([]planner.Procedure, error) {
a.plan.current = a.plan.current + 1
return nil, nil
}
func initLogger() *zap.Logger {
cfg := zap.NewProductionConfig()
cfg.Encoding = "console"
l, _ := cfg.Build()
return l
}
</code></pre>
<p>This program tries to reach the <code>Target</code> (20 in this case) adding numbers to the
current state. If you execute this program as it is you will get the following
logs:</p>
<pre><code class="language-console">1.257894e+09 info planner@v0.0.1/scheduer.go:41 Started execution plan count_plan {"execution_id": "98d28eed-9b3b-4ad8-bfbd-1b5338d1a649"}
1.257894e+09 info planner@v0.0.1/scheduer.go:59 Plan executed without errors. {"execution_id": "98d28eed-9b3b-4ad8-bfbd-1b5338d1a649", "execution_time": "0s", "step_executed": 20}
</code></pre>
<p>As you can see, the scheduler executed the plan <code>count_plan</code> successfully, and
it took 20 steps to get there (<code>step_executed: 20</code>).</p>
<p>Reasonable, because, as you can see, the <code>CountPlan.Create</code> function returns an
<code>AddNumber</code> procedure, and that procedure only adds 1 to the current state. It
is just a counter; let’s make it a bit more fun. I want to add or subtract
random numbers until the target is reached. The program adds when the current
state is below the target and subtracts when it is above; if they are equal, we
are done. This is a simple way to simulate something that has to adapt, too
simple to sound cool but still understandable.</p>
<h3 id="change-the-addnumber-to-use-a-randomly-generated-number">Change the AddNumber to use a randomly generated number.</h3>
<p>We need to change the AddNumber in order to add not 1 but a random number. Let’s
do it:</p>
<pre><code class="language-go">var random *rand.Rand
func initRandom() {
s1 := rand.NewSource(time.Now().UnixNano())
random = rand.New(s1)
}
</code></pre>
<p>At this point, we can use <code>random</code> as part of the <code>AddNumber.Do</code> function.</p>
<pre><code class="language-go">type AddNumber struct {
plan *CountPlan
}
func (a *AddNumber) Name() string {
return "add_number"
}
func (a *AddNumber) Do(ctx context.Context) ([]planner.Procedure, error) {
a.plan.current = a.plan.current + random.Intn(10)
return nil, nil
}
</code></pre>
<p>For simplicity, I am taking a random number between 0 and 10. What happens now?
The problem is that we can go above the target, so we have to make our
<code>CountPlan.Create</code> function and our logic a bit more complicated.</p>
<h2 id="evolve-the-create-function-to-subtract-numbers-from-the-current-state">Evolve the Create function to subtract numbers from the current state</h2>
<pre><code class="language-go">func (p *CountPlan) Create(ctx context.Context) ([]planner.Procedure, error) {
if p.current < p.Target {
return []planner.Procedure{&AddNumber{plan: p}}, nil
} else if p.current > p.Target {
return []planner.Procedure{&SubtractNumber{plan: p}}, nil
}
return nil, nil
}
</code></pre>
<p>When we go above the target, the plan subtracts a random number, and it keeps
going until we get to it. <code>SubtractNumber</code> does the opposite of what
<code>AddNumber</code> does: it subtracts a random number between 0 and 10.</p>
<pre><code class="language-go">type SubtractNumber struct {
plan *CountPlan
}
func (a *SubtractNumber) Name() string {
return "subtract_number"
}
func (a *SubtractNumber) Do(ctx context.Context) ([]planner.Procedure, error) {
a.plan.current = a.plan.current - random.Intn(10)
return nil, nil
}
</code></pre>
<p>You can run the result <a href="https://play.golang.com/p/JDuizzUI86M">here</a>, and you
will see that, based on the random numbers it adds or subtracts, the number of
executed steps changes.</p>
<p>NOTE: the Golang playground always starts from the same time; in my example,
I use time as the seed. For this reason, to see a variation in the number of
steps, you will have to run the code locally.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This is probably too straightforward an example, but imagine that your target
is not fixed and varies based on external factors: your house’s temperature,
with this program as a thermostat that has to keep your room at the desired
temperature; or the number of instances running at your cloud provider that you
have to keep balanced. This last use case is the exact problem I solved writing
<a href="https://github.com/gianarb/keepit">keepit</a>, a replica set for
<a href="https://metal.equinix.com">Equinix Metal</a> servers. I used planner, so check it out.</p>
<p>One thing I didn’t highlight in this example: this pattern gives you an
excellent way to measure how reliable your program is. Think about it this way:
you can programmatically handle errors by returning a procedure, or more than
one, that can mitigate the error itself. It can be a “sleep for 5 minutes and
retry,” or you can do something more complicated, and as long as the plan keeps
returning work to do, you will have the opportunity to succeed. I extracted a
highlight from the <a href="https://www.twitch.tv/videos/780401570">Twitch stream</a>
rambling about this.</p>
<p>Have a nice week!</p>
Your release workflow is code, it is just about timeI think code is some way has to win against specification languages or DSL or even from languages that are not easy to move around like bash. The Kubernetes sig-release is migrating a bunch of scripts from bash to Go and I think it is the right way to go. You will do the same, it is just about time.https://gianarb.it/img/gianarb.png2020-10-21T10:08:27+00:002020-10-21T10:08:27+00:00https://gianarb.it/blog/release-workflow-as-code<p>20th October 2020 is the day I released Kubernetes for the first time. To be
precise, I piloted, with the help of sig-release, Kubernetes <code>v1.20.0-alpha.3</code>. One of
the reasons I am happy to work with sig-release is to learn and see how such a
significant project gets released reliably and continuously by a group of
people coming from different backgrounds, jobs, and locations.</p>
<p>The first lesson you would notice as soon as you join the SIG meetings more
frequently, or as soon as you start contributing, is the general effort in
converting what used to be bash scripts to Go.</p>
<p>Now, don’t fight over the languages themselves; I think the story is
reasonable. You start small, and when it comes to releasing code, a lot happens
in somebody’s terminal. That’s why many of the release workflows I saw in my
life are a mix of Makefiles and bash scripts.</p>
<p>I don’t think it scales, because it is hard to get error handling, retry
logic, and testing right in bash. Maybe I am just not good enough with BASH,
and I know there are testing libraries for it, like
<a href="https://github.com/sstephenson/bats">bats</a>, for example.</p>
<p>Anyway, I have to admit, I feel good enough with BASH, but I code way better
in Go, PHP, and probably even JavaScript. Also, I am sure this is a feeling I
share with many people, and more in general, the Kubernetes development
community is very fluent in Golang.</p>
<p>Anyway, let’s treat the code that empowers the release lifecycle as
application code, just as sig-release is doing with Kubernetes: documentation,
testing, user experience, and so on. Develop useful libraries that can be
encapsulated in command-line tools, APIs, or bots.</p>
<p>There is a BASH script called
<a href="https://github.com/kubernetes/release/blob/master/testgridshot">testgridshot</a>
that takes snapshots from <a href="http://testgrid.k8s.io/">Testgrid</a>,
uploads them to Google Cloud, and outputs markdown that can be copy-pasted as a
comment in <a href="https://github.com/kubernetes/sig-release/issues/1296">the issue we use to track every
release</a>. We run it to
take a snapshot of the various testing pipelines’ status at the time of a release.</p>
<p>testgridshot is the only BASH one I have had to interact with so far, and it
didn’t work because of some environmental issues with my laptop. Coincidence?
This can be solved by running it as a container, or by having a binary
statically compiled with all the needed dependencies.</p>
<p><a href="https://twitter.com/comedordexis">Carlos</a> is currently working on rewriting
testgridshot in Golang; it will be usable as a command-line interface, and I
think it will be even better to encapsulate it as a Prow capability.</p>
<p><a href="https://github.com/kubernetes/test-infra/tree/master/prow">Prow</a> is the
Kubernetes CI/CD system. It can trigger jobs for particular actions, and almost
everything you see happening on GitHub when using <code>/</code> commands like <code>/open</code> and <code>/assign</code>
is a Prow responsibility.</p>
<p>Testgridshot is useful during a release cut. The cut starts from a GitHub
issue; as we saw, it would be very convenient to have a command like
/testgridshot available, leaving Prow the responsibility to comment.</p>
<p>Now, the takeaway hidden in the word <strong>encapsulate</strong>: it is great to have both
a CLI and a Prow command. Go becomes your baseline, where the operational
experience lives. All the rest is UX, and you can have as many of those as you
want.</p>
<p>I am not writing this because you should stop and move all your BASH to
something else, but I experienced this myself. I see this little story from the
Kubernetes SIG Release as confirmation that it’s easy to block ourselves as
release engineers because there is a BASH script that we don’t want to rewrite;
after all, it has been like that forever. But the project is not the same as on
day one, the team grew or changed, and it is reasonable for a workflow to
follow this evolution.</p>
How bare metal provisioning works in theoryWhat I learned about how bare metal provisioning works developing tinkerbell.https://gianarb.it/img/gianarb.png2020-10-08T10:08:27+00:002020-10-08T10:08:27+00:00https://gianarb.it/blog/how-bare-metal-works-in-theory<p>I am sure you heard about bare metal. Clouds are made of bare metal, for example.</p>
<p>The art of bringing an inanimate piece of metal like a server to life, turning it into something useful, is something I have been learning since I joined <a href="https://metal.equinix.com">Equinix Metal</a> in May.</p>
<p>Let me make a comparison with something you are probably familiar with. Do you know why Kubernetes is hard? Because there is no single Kubernetes. It is a glue of an unknown number of pieces working together to help you deploy your application.</p>
<p>Bare metal is almost the same: hundreds of different providers, server sizes, architectures, and chips that in some way you have to bring to life.</p>
<p>Luckily, there are some common concepts we can work with. When a server boots, it runs a BIOS that looks in different places for something to run:</p>
<ol>
<li>It looks for a hard drive</li>
<li>It looks for external storage like a USB stick or a CD-Rom</li>
<li>It looks for help from your network (netbooting)</li>
</ol>
<p>Options one and two are not realistic if the end goal is a hands-free, reliable solution. I am sure cloud providers do not have people running around with USB sticks containing operating systems and firmware.</p>
<h2 id="netbooting">Netbooting</h2>
<p>I spoke about <a href="https://gianarb.it/blog/first-journeys-with-netboot-ipxe">my first experience netbooting Ubuntu</a> on my blog. That article is hands-on, with reproducible code. Here is the theory.</p>
<p>When it comes to netbooting, you have to know what PXE means. Preboot Execution Environment is a standardized client/server environment that boots when no operating system is found and helps an administrator boot an operating system remotely. Don’t think about this OS as the one you have on your laptop; I mean, technically it is, but the one you run there, or on a server, is persisted; that’s why you have files that survive a reboot.</p>
<p>The one you start with PXE runs in memory, and from there, you have to figure out how to get the persisted OS you will run on your machine.</p>
<p>When the in-memory operating system is up and running, you can do everything you are capable of with Ubuntu, Alpine, CentOS, or Debian. In practice, what people tend to do is run applications and scripts to format a disk with the right partition layout and install the final operating system.</p>
<p>Pretty cool. PXE is kind of old, and for that reason, it is burned into a lot of different NICs. You will hear a lot more about iPXE, a “new” PXE implementation. What is cool about those is the <code>chain</code> function. From one PXE/iPXE environment, you can chain another PXE/iPXE environment. That’s how you get from PXE (which usually runs by default on a lot of hardware; if you have a NUC, you run it) to iPXE.</p>
<pre><code>chain --autofree https://boot.netboot.xyz/ipxe/netboot.xyz.lkrn
</code></pre>
<p>iPXE supports many more protocols you can download an OS from, such as TFTP, FTP, HTTP/S, NFS…</p>
<p>This is an example of iPXE script:</p>
<pre><code>#!ipxe
dhcp net0
set base-url http://archive.ubuntu.com/ubuntu/dists/focal/main/installer-amd64/current/legacy-images/netboot/ubuntu-installer/amd64/
kernel ${base-url}/linux console=ttyS1,115200n8
initrd ${base-url}/initrd.gz
boot
</code></pre>
<p>The first command, <code>dhcp net0</code>, gets an IP for your hardware from the DHCP. <code>kernel</code> and <code>initrd</code> set the kernel and the initial ramdisk to run in memory.</p>
<p><code>boot</code> starts the <code>kernel</code> and the <code>initrd</code> you just set.</p>
<p>There is more, but this is what I find myself using most often.</p>
<h3 id="infrastructure">Infrastructure</h3>
<p>To netboot successfully, you need to distribute a couple of things:</p>
<ol>
<li>An iPXE script</li>
<li>The operating system you want to run (kernel and initrd)</li>
</ol>
<h3 id="workflow">Workflow</h3>
<ol>
<li>Server starts</li>
<li>There is nothing to boot in the HD</li>
<li>Starts netbooting</li>
<li>It makes a DHCP request to get network configuration, and the DHCP returns the TFTP address with the location of the iPXE binary</li>
<li>iPXE starts and makes another DHCP request; the response contains the URL of the iPXE scripts with the commands you saw above</li>
<li>At this point, iPXE runs the script, downloads the kernel, and the initrd with the protocol you specified, and it runs the in-memory operating system.</li>
</ol>
<p>Pretty cool!</p>
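The handoff in steps 4 to 6 usually involves a small HTTP service that renders iPXE scripts. Below is a minimal sketch in Go; the endpoint path, base URL, and console arguments are invented for illustration, not taken from any real provisioner.

```go
package main

import (
	"fmt"
	"net/http"
)

// ipxeScript renders a minimal iPXE script like the one in the article.
// The base URL is a placeholder for wherever kernel/initrd are hosted.
func ipxeScript(baseURL string) string {
	return fmt.Sprintf(`#!ipxe
dhcp net0
kernel %[1]s/linux console=ttyS1,115200n8
initrd %[1]s/initrd.gz
boot
`, baseURL)
}

// scriptHandler is the endpoint the DHCP response would point iPXE at.
func scriptHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, ipxeScript("http://example.com/ubuntu"))
}

func main() {
	// A real service would register scriptHandler with
	// http.HandleFunc("/ipxe", scriptHandler) and call ListenAndServe;
	// for the sketch, just print the rendered script.
	fmt.Print(ipxeScript("http://example.com/ubuntu"))
}
```

iPXE would fetch this URL (handed out via DHCP) and run the rendered script, chaining into the kernel and initrd.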
<h2 id="the-in-memory-operating-system">The in-memory operating system</h2>
<p>The in-memory operating system can be as smart as you like; you can build your one, for example, starting from Ubuntu or Alpine. Size counts here because it has to fit in memory.</p>
<p>When the operating system starts, it runs as PID 1, what is called <code>init</code>. It is an executable located in the ramdisk at <code>/init</code>. It can be as complicated as you like: a binary that downloads commands to execute from a centralized location, or a bash script that formats the local disk and installs the final operating system.</p>
<p>What I am trying to say is that you have to make the in-memory operating system useful for your purpose. If you use vanilla Alpine or Ubuntu, the init script will just start a bash shell, which is not that useful.</p>
<h2 id="dhcp">DHCP</h2>
<p>As you saw, the DHCP server plays an important role. It is the first point of contact between inanimate hardware and the world. If you control what the DHCP server does, you can, for example, register servers and monitor their health.</p>
<p>Imagine you are at your laptop, expecting a hundred new servers in one of your datacenters: by monitoring the DHCP requests, you will know when they are plugged into the network.</p>
<h2 id="containers-what">Containers what?</h2>
<p>Containers are a comfortable way to distribute and run applications without having to know how to run them. Think about this scenario: your in-memory operating system runs Docker at boot. The <code>init</code> script at this point can pull and run a Docker container with your logic for partitioning the disk and installing an operating system, or it runs some workload and exits, leaving space for the next boot (a bit like serverless, but with servers, way cooler).</p>
<p>Or the Docker container can run a more complex application that reaches a centralized server, which dispatches a list of actions to execute via a REST or gRPC API. Those actions can be declared and stored by you.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The chain of tools and interactions to get from a piece of metal to something that runs a workload is not that long. Controlling all the steps and tools along the way provides the ability to provision cold servers from zero to something that developers know how to use.</p>
<p>Ok, I lied to you. This is not just theory. This is how <a href="https://tinkerbell.org">Tinkerbell</a> works.</p>
<p class="small">This post was originally posted on <a href="https://dev.to/gianarb/how-bare-metal-provisioning-works-in-theory-1e4e">dev.to</a>.</p>
<h1><a href="https://gianarb.it/blog/thinking-in-systems-donella-meadows-review">Thinking in Systems written by Donella H. Meadows</a></h1>
<p class="small">2020-09-12 · Review of the book Thinking in Systems written by Donella Meadows</p>
<p><a href="https://amzn.to/3khu7k9">“Thinking in Systems”</a> acted on me as a <strong>reinforcing loop</strong>. The motivation to explore the ability to think of software as loops and systems showed up while reading those pages.</p>
<blockquote>
<p>The second kind of feedback loop is amplifying, reinforcing, self-multiplying,
snowballing– a vicious or virtuous circle that can cause healthy growth or
runaway destruction. It is called a reinforcing feedback loop…</p>
<p>Thinking in systems - Donella H. Meadows</p>
</blockquote>
<p class="text-center"><img src="/img/thinking-in-systems-book.jpg" alt="Thinking in Systems's book cover" class="img-fluid mh-25" /></p>
<p>It is a great way to put words close to concepts I am exploring in practice. It
is not a book written for developers, but you know, it works, and you can apply
what you read everywhere.</p>
<p>The words come from <a href="https://en.wikipedia.org/wiki/Donella_Meadows">Donella Meadows</a>, the author of this book: a scientist, author, teacher, and systems analyst.</p>
<p>If you enjoyed <a href="https://gianarb.it/blog/reactive-planning-and-reconciliation-in-go">“Reactive planning and reconciliation in Go”</a>, this book is a great way to go deeper into the topic without reading a long book; in the end, it is less than 200 pages.</p>
<p>Systems are everywhere: the water cycle, our body’s ability to recover, evolution. This book helps you understand how to describe and spot them, giving you the ability to see systems when writing software or when looking for alternative solutions.</p>
<p>From my experience, when you can simplify a problem into a system, you get an entity capable of balancing itself, a more resilient solution, and a repeatable workflow.</p>
<p>Self-balancing like a thermostat, for example; resilient like a Kubernetes reconciliation loop; repeatable like the water cycle or an idempotent server provisioning.</p>
<h1><a href="https://gianarb.it/blog/maintainer-life-be-enabler">Maintainer life, be an enabler</a></h1>
<p class="small">2020-08-20 · Being an enabler is important in my daily job. It is a skill I learned as an open source maintainer.</p>
<p>Being an enabler is not something that matters only in open source or as a maintainer, but it is a skill I have personally learned as one.</p>
<p>Whether I am building a sustainable open source project or working with a team, developing the right code or a new feature myself is often not that useful in the long term. I think you get to something better more quickly when people collaborate together effectively.</p>
<p>The maintainer’s role is to enable other people to contribute successfully.</p>
<p>You have to switch from “let’s write documentation” to “how do I create a workflow that enables contributors to write documentation.” I started with docs because I think they are crucial. It is easier to write documentation while you are writing the code for a new feature. And we know a developer prefers to write code; as a maintainer, you need to create a workflow that allows the contributor to write documentation quickly when writing code. In practice, mark a PR with a label <code>needs-doc</code> and make it a requirement for the PR to be merged. The maintainer has to design a rock-solid structure for the documentation. In this way, the contributor won’t spend two days trying to figure out where to add the documentation for their feature.</p>
<p>You can’t ask a contributor to create an entire test suite or to write documentation if you don’t have one. But from a solid foundation, it is reasonable to ask a contributor to keep at least the same quality level.</p>
<p>You don’t write all the tests; you create and maintain the continuous delivery pipeline required to help contributors stay compliant. Is your project suffering from low test coverage? Do not waste time writing all the tests yourself; the codebase is large, and pull requests are flowing continuously. You have to stay focused on developing a system that brings and keeps you where you want to be: good coverage, in this case.</p>
<p>In practice, you can create another label <code>needs-tests</code> to notify the contributor that their work won’t be merged until tests are added (the plural is crucial: tests!). You can use something like <a href="https://codecov.io/">codecov</a> in your CI to put numbers on the situation. Invest time in making sure that tests are easy to write; write a doc in the contributing file highlighting how to write a good test. If a package is too hard to test, you can write a few tests that other people can use as a starting point, or a reusable set of utility functions.</p>
<p>Being a facilitator or an enabler is a lot of work. If you feel less effective because you wrote 90% of the codebase at this point and could write the documentation and tests yourself in a couple of days, you are wrong. Or at least, from my experience, you can do it, but the outcome will be worse in quality compared with the one you can build from a solid foundation in a collaborative environment.</p>
<h1><a href="https://gianarb.it/blog/interface-segreation-in-action-with-go">Interface segregation in action with Go</a></h1>
<p class="small">2020-08-20 · It takes a couple of hours to get a hello world up and running in a new language, but it takes ages to learn it deeply. Even if Go has an affordable learning curve, some concepts take time to stick. Interfaces are everywhere, and this flexibility makes them crucial to writing maintainable Go code.</p>
<p>Everybody should write an article about Go interfaces! I don’t know why I waited so long for mine!</p>
<p>Go interfaces are your best friends when it comes to mocking an object or to specifying the well-scoped set of functionality a function requires to interact with an object.</p>
<p>Yep! That’s how they work: you have an entire object that does a lot of cool things, but when you pass it to a function, only a subset of it gets used. That’s when you can replace the structure itself with an interface that only requires what the function needs.</p>
<p>In this way you will have a smaller piece of code to mock in your tests and to deal with (this is a good way to hide functions you don’t want other people, or yourself in a rush, to use).</p>
<p>Even more so when you remember to keep interfaces small via composition.</p>
<p>For example, let’s suppose you have to build an interface that describes a generic resource you can Create, Update, and Delete. This is useful to standardize something that can be persisted in a database. I am setting it up like so.</p>
<p>You should not use <code>interface{}</code> because it is too generic. I used it for simplicity, but Kubernetes, for example, uses an object called <a href="https://godoc.org/k8s.io/apimachinery/pkg/runtime"><code>runtime.Object</code></a>, and it is way better. Go 2 will have generics that will make this situation even easier, or you can use code generation as well. But the idea of using a serializable object the way Kubernetes does is good.</p>
<pre><code class="language-golang">type Resource interface {
Create(ctx context.Context) error
Update(ctx context.Context, updated interface{}) error
Delete(ctx context.Context) error
}
</code></pre>
<p>This is a reasonably small interface and it is easy to satisfy, but I do not like the name. It does not tell me its purpose. It represents a resource, but I prefer to name interfaces after actions or adjectives. In this case, the structure that implements this interface can be stored in a database, so I think a better name for it is <a href="https://en.wiktionary.org/wiki/persistable">“Persistable”</a>, because it makes its purpose clear.</p>
<p>A strategy to make an interface smaller in this case is to break it into actions:</p>
<pre><code class="language-golang">type Creatable interface {
Create(ctx context.Context) error
}
type Updatable interface {
Update(ctx context.Context, updated interface{}) error
}
type Deletable interface {
Delete(ctx context.Context) error
}
</code></pre>
<p>And you can use composition to create an interface that requires all three actions, if you need it:</p>
<pre><code class="language-golang">type Persistable interface {
Deletable
Updatable
Creatable
}
</code></pre>
<p>This is useful when a function uses more than one of those actions. If you have an interface that also contains <code>Get</code> or <code>View</code>, you can think about a different split: <code>ReadOnly</code> contains <code>Get</code> and <code>View</code>, while <code>Modifiable</code> requires only the functions <code>Update</code>, <code>Create</code>, and <code>Delete</code>.</p>
<p>Imagine you are writing a set of http handlers to expose a CRUD API around your
resources:</p>
<pre><code>Create
Update
Delete
List
GetByID
</code></pre>
<p>Usually it looks like this: you create an interface for every function, all your resources implement the functions, and you can write a single “Create” handler for all the resources:</p>
<pre><code class="language-golang">func CreateHandle(c Creatable) func(w http.ResponseWriter, r *http.Request) {
return func(w http.ResponseWriter, r *http.Request) {
if err := c.Create(r.Context()); err != nil {
w.WriteHeader(http.StatusInternalServerError)
return
}
w.WriteHeader(http.StatusCreated)
}
}
</code></pre>
<p>If you have to write a test for the handler, it does not matter how complicated the resource is; you just have to mock the <code>Creatable</code> interface, one single function. This is a very basic example; if you need to add validation, the <code>Creatable</code> interface can require a <code>func Valid() error</code> that you can add incrementally to all your resources.</p>
<pre><code class="language-golang">func CreateHandle(c Creatable) func(w http.ResponseWriter, r *http.Request) {
return func(w http.ResponseWriter, r *http.Request) {
if err := c.Valid(); err != nil {
w.WriteHeader(http.StatusBadRequest)
return
}
if err := c.Create(r.Context()); err != nil {
w.WriteHeader(http.StatusInternalServerError)
return
}
w.WriteHeader(http.StatusCreated)
}
}
</code></pre>
<h1><a href="https://gianarb.it/blog/e2e-test-tinkerbell-vagrant-setup-with-go">E2E testing Tinkerbell Setup tutorial in Go</a></h1>
<p class="small">2020-08-03 · My takeaway from having to write an end to end test for the Tinkerbell Vagrant setup tutorial: how I wrote it and why, lessons learned, and tips.</p>
<p><a href="https://tinkerbell.org">Tinkerbell</a> is a tool open sourced recently by <a href="https://packet.com">Packet, an Equinix company</a>, the company I work for.</p>
<p>It is a provisioner for bare metal. You can switch servers on and off via API, execute workflows, and install operating systems on a server that does not have one!</p>
<p>Tinkerbell is in its early days as an open source project, but the concept is battle tested by 6 years of production use internally at Packet.</p>
<p>I am excited to learn a lot about the cool technologies that make datacenters work, but I am not here to write about that<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>One of my recent tasks<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> was about end to end testing the Vagrant Setup
tutorial<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> we wrote.</p>
<p>I like the idea! The Setup tutorial is important for our community because it is
the entry point for a lot of people and having a consistent way to test its
accuracy is crucial.</p>
<p>It is also a quick way to get a valuable end to end test running that covers the
entire project, at a high level.</p>
<p>Tinkerbell is under development, and it is easy to make mistakes and break things at this point; we have to know when that happens. Tinkerbell requires virtualisation capabilities, and we do not have an end to end testing framework for that yet.</p>
<h2 id="tell-me-more-about-the-test-itself">Tell me more about the test itself</h2>
<p>It is a long task, but let’s summarize it (have a look at the tutorial; it helps to read it before moving forward with this article):</p>
<ol>
<li>The script has to start a vagrant machine called provisioner</li>
<li>When the provisioner is up, it has to exec via ssh a docker-compose command that starts a bunch of containers, one of them being the Tink gRPC server</li>
<li>When Tinkerbell is up and running we have to do a bunch of things like:
a. Register a new hardware
b. Create a template
c. Create the workflow that will get executed in the worker from a template</li>
<li>Start the worker</li>
<li>Wait and check if the workflow executes as expected.</li>
</ol>
<p>NOTE: the test should clean up after itself. Vagrant is not ideal for parallelizing VMs, and we do not support that; as things stand today, a dirty environment will break future tests.</p>
<h2 id="how-to-write-this-test">How to write this test</h2>
<p>There are a million ways to write end to end tests; the ones I evaluated are bash and Go.</p>
<p>The project is in Go, and Tinkerbell serves a gRPC server and a client. I thought it was a good idea to write everything in Go to try the client itself, and because it is easier to coordinate long running actions with channels and context compared with bash, for example. Or at least that’s what I think.</p>
<p>I can also keep the code inside the <code>testing</code> framework that Go provides, keeping the test closer to the code and to the developers that contribute to the project, compared with a random <code>scripts.sh</code>.</p>
<p>I am not sure if this will be useful in the future, but one of my goals was to serve a clean API and a small framework that can be used to write other tests that start from the Vagrant setup. This is the API I designed:</p>
<pre><code class="language-go">type Vagrant struct {}
func Up(ctx context.Context, opts ...VagrantOpt) (*Vagrant, error) {}
func (v *Vagrant) Destroy(ctx context.Context) error {}
func (v *Vagrant) Exec(ctx context.Context, args ...string) ([]byte, error) {}
</code></pre>
<p>Consistency is important: developers who know Vagrant, or who will have to fix tests coming from the tutorial, will recognize <code>Up</code>, <code>Destroy</code>, and <code>Exec</code>, because those verbs are used by Vagrant and in its documentation.</p>
<p>Even for Go developers, <code>Exec</code> is not a new function; <code>os/exec</code><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> exists and does a similar job. The one I wrote works over ssh.</p>
<p>This library now has its own repository:
<a href="https://github.com/gianarb/vagrant-go">gianarb/vagrant-go</a>.</p>
<h2 id="go-challenges-and-tips-and-tricks">Go challenges and tips and tricks</h2>
<p>I would like to share some of the challenges I faced when writing the Vagrant
framework and some tips useful for this task.</p>
<h2 id="opt-are-great">Opts are great!</h2>
<p>I have to say, options are great! It is a well known pattern in Go, and it translates to:</p>
<pre><code class="language-go">ctx := context.Background()
machine, err := vagrant.Up(ctx,
vagrant.WithLogger(t.Logf),
vagrant.WithMachineName("provisioner"),
vagrant.WithWorkdir("../../deploy/vagrant"),
)
if err != nil {
t.Fatal(err)
}
</code></pre>
<p>It allowed me to add new options and to tune the Vagrant struct with strong defaults. If you have never used this pattern, do it! It is pretty easy; you need a function type like this:</p>
<pre><code class="language-go">type VagrantOpt func(*Vagrant)
</code></pre>
<p>In this way you can write as many <code>With</code> functions as you need:</p>
<pre><code class="language-go">func WithStderr(s io.ReadWriter) VagrantOpt {
return func(v *Vagrant) {
v.Stderr = s
}
}
func RunAsync() VagrantOpt {
return func(v *Vagrant) {
v.async = true
}
}
</code></pre>
<p>I execute the opts as part of the <code>Up</code> function:</p>
<pre><code class="language-go">func Up(ctx context.Context, opts ...VagrantOpt) (*Vagrant, error) {
const (
defaultVagrantBin = "vagrant"
defaultName = "vagrant"
defaultWorkdir = "."
)
v := &Vagrant{
VagrantBinPath: defaultVagrantBin,
Name: defaultName,
Workdir: defaultWorkdir,
log: func(format string, args ...interface{}) {
fmt.Println(fmt.Sprintf(format, args...))
},
}
for _, opt := range opts {
opt(v)
}
// ...
}
</code></pre>
<h3 id="test-segmentation-with-packages">test segmentation with packages</h3>
<p>I don’t want to run the vagrant end to end tests as part of the default test suite, because they take too much time and they require Vagrant installed. They do not even run in CI the same way unit tests do, but I will get to that later.</p>
<p>I learned that packages in directories that start with <code>_</code> do not get executed when using a pattern like <code>./...</code>.</p>
<p>I wrote the framework and tests as part of the package:</p>
<pre><code class="language-console">./test/_vagrant/
./vagrant.go
./vagrant_test.go
</code></pre>
<p>In this way to run the tests you have to explicitly call the package out:</p>
<pre><code class="language-console">$ go test ./test/_vagrant
</code></pre>
<h3 id="observability-or-what-is-going-on">Observability or “what is going on?”</h3>
<p>Go has its own way to print logs during the execution of the tests:</p>
<pre><code class="language-console">$ go test -v ./...
</code></pre>
<p>It works because <code>testing</code> has functions called <code>t.Log</code> and <code>t.Logf</code>. Those functions honor the <code>-v</code> flag. To be compliant with that and to keep the <code>Vagrant</code> struct agnostic, I wrote a <code>WithLogger</code> option:</p>
<pre><code class="language-go">func WithLogger(log func(string, ...interface{})) VagrantOpt {
return func(v *Vagrant) {
v.log = log
}
}
</code></pre>
<p>The function it accepts as an argument is <code>t.Logf</code>.</p>
<p>Continuous Integration runs with verbosity enabled for this task because it is long and complicated; the logging prints all the output from the <code>vagrant up</code> and <code>destroy</code> commands, and the stdout from the <code>exec</code> over ssh. It gives a very good overview of what is going on.</p>
<h3 id="stdout-and-stdin-buffer-and-loggers">Stdout and Stdin, buffer and loggers</h3>
<p>I don’t have a lot to say about this other than: “it was very hard to do!!”.
The code that fixed my problems can be summarized in this way:</p>
<pre><code class="language-go">stderrPipe, err := cmd.StderrPipe()
if err != nil {
return nil, fmt.Errorf("exec error: %v", err)
}
stdoutPipe, err := cmd.StdoutPipe()
if err != nil {
return nil, fmt.Errorf("exec error: %v", err)
}
go v.pipeOutput(ctx, fmt.Sprintf("%s stderr", cmd.String()), bufio.NewScanner(stderrPipe))
go v.pipeOutput(ctx, fmt.Sprintf("%s stdout", cmd.String()), bufio.NewScanner(stdoutPipe))
err = cmd.Start()
</code></pre>
<pre><code class="language-go">func (v *Vagrant) pipeOutput(ctx context.Context, name string, scanner *bufio.Scanner) {
for scanner.Scan() {
select {
case <-ctx.Done():
return
default:
v.log("[pipeOutput %s] %s", name, scanner.Text())
}
}
}
</code></pre>
<h3 id="kill-process-and-subprocess">Kill process and subprocess</h3>
<p>There are a lot of processes going on when creating or destroying a VM with Vagrant; there is VirtualBox, for example. We also have an edge case for the worker machine, because the <code>up</code> command technically never ends: it stays pending until you <code>destroy</code> the machine. But you can’t run multiple commands against the same machine, because <code>up</code> holds a lock that blocks <code>destroy</code> from executing. <code>os/exec</code> helps here, but you have to tune it a little bit:</p>
<pre><code class="language-go">cmd := exec.CommandContext(ctx, v.VagrantBinPath, args...)
cmd.Dir = v.Workdir
cmd.Stdout = v.Stdout
cmd.Stderr = v.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
</code></pre>
<p>Now when killing <code>cmd</code> the subprocess terminates as well.</p>
<h2 id="continuous-integration">Continuous Integration</h2>
<p>We decided to go with GitHub Actions and a self-hosted runner; in this way we can use Packet bare metal that supports virtualisation.</p>
<p>As I told you, I don’t want this test to run for every commit or every pull request, because it is time and resource consuming. It is also risky, so I want maintainers to decide when to trigger it.</p>
<p>That’s why it gets triggered with a GitHub label:</p>
<pre><code class="language-yaml">name: Setup with Vagrant on Packet
on:
push:
pull_request:
types: [labeled]
jobs:
vagrant-setup:
if: contains(github.event.pull_request.labels.*.name, 'ci-check/vagrant-setup')
runs-on: self-hosted
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Vagrant Test
run: |
export VAGRANT_DEFAULT_PROVIDER="virtualbox"
go test -v ./test/_vagrant
</code></pre>
<p>This is what it takes to make the process work!! And I am still surprised it is so easy! When a contributor labels a PR with <code>ci-check/vagrant-setup</code>, the process starts. My idea was to remove the label straight away, but I am <a href="https://github.community/t/actions-ecosystem-action-remove-labels-fails-resource-not-accessible-by-integration/124188">blocked</a>.</p>
<p>An alternative that we are evaluating is to run it as a cronjob<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> as well.</p>
<h2 id="testing-is-the-real-power">Testing is the real power</h2>
<p>E2E tests are fun to write because they bring a lot of challenges in terms of coordination and stability; you have to write good code in order to make them stable. I hope you learned something from my experience, and if you have any questions let me know <a href="https://twitter.com/gianarb">here</a>. I am happy to go deeper on some of those topics based on your suggestions.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>If you are curious ask me any question on Twitter @gianarb <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>https://github.com/tinkerbell/sandbox/pull/7 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>https://tinkerbell.org/setup/local-with-vagrant/ <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>https://golang.org/pkg/os/exec/#pkg-examples <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#onschedule <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1><a href="https://gianarb.it/blog/terraform-what-with-rawkode">Show Me Your Code with David McKay (rawkode): Terraform what?</a></h1>
<p class="small">2020-07-29 · I would like to lay out a GitHub repository using Terraform that helps me manage a GitHub organization via code in a collaborative way. I am not that good with Terraform, even less with Terraform 0.12 and all the sweet things it does. My friend David McKay aka @rawkode is way better than me, and he will teach me a bunch of things live on Twitch.</p>
<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">🆘Tomorrow is
"Show me your code" time! ⌛️Live on Twitch!<br /><br />Who is the guest?
The unique <a href="https://twitter.com/rawkode?ref_src=twsrc%5Etfw">@rawkode</a> !<br />What is
all about? David will teach <a href="https://twitter.com/hashtag/Terraform?src=hash&ref_src=twsrc%5Etfw">#Terraform</a>
to a newbie (me!) and I hope to learn how to manage a GitHub organization as
code! <a href="https://t.co/jNbq7uoAD3">https://t.co/jNbq7uoAD3</a><br />See you
there! 🖥️ <a href="https://t.co/b2P4bj8Tki">pic.twitter.com/b2P4bj8Tki</a></p>— gianarb
(@GianArb) <a href="https://twitter.com/GianArb/status/1288511212351901697?ref_src=twsrc%5Etfw">July
29, 2020</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Let’s suppose I don’t know a lot about Terraform, because infrastructure as code is not something that makes me happy. I know it is useful; there are a lot of cool things in Terraform itself, and I should know it better.</p>
<p>I happen to know and work with David, who is always on top of this topic! I had an idea, and he will help me validate it live tomorrow in the Central Europe afternoon.</p>
<p>We will put together a “GitHub as Code” repository that we can use to manage a GitHub organisation collaboratively:</p>
<ol>
<li>Adding new members</li>
<li>Creating new teams</li>
<li>New repositories</li>
<li>Importing already created resources from GitHub to Terraform</li>
<li>If we can have a look at CI/CD with GitHub action and/or Terraform Cloud.</li>
</ol>
<p>I will try <a href="https://visualstudio.microsoft.com/services/live-share/">Visual Studio Code with Live Share</a> for the first time, and I think David will teach me a lot of things about Terraform and the 0.12 syntax.</p>
<p>This is something I hope we can use with
<a href="https://github.com/tinkerbell">Tinkerbell</a>.</p>
<h2 id="david-mckey-rawkode">David McKay (rawkode)</h2>
<p>David McKay is a technologist from Glasgow, Scotland. Currently, working at
Packet as Senior Tech Evangelist. Well known on Twitter as
<a href="https://twitter.com/rawkode">@rawkode</a> and he writes on
<a href="https://rawkode.com/articles">rawkode.com</a>.</p>
<h2 id="links">Links</h2>
<ul>
<li>We will start from a nonsense prototype I did yesterday
<a href="https://github.com/gianarb/terraformy-github-org">github.com/gianarb/terraformy-github-org</a></li>
<li>We will use Terraform 0.12 with the <a href="https://www.terraform.io/docs/providers/github/index.html">GitHub provider</a></li>
<li>Have a look at what we are doing at Packet with
<a href="https://tinkerbell.org">Tinkerbell</a></li>
</ul>
<h1><a href="https://gianarb.it/blog/first-journeys-with-netboot-ipxe">First journeys with netboot and iPXE installing Ubuntu</a></h1>
<p class="small">2020-06-10 · I have started to experiment with netboot, iPXE, and OS automation recently to better understand how bare metal provisioning works. I got to a point where I can install Ubuntu automatically via iPXE and preseed. This is an article about how, and a bit of why.</p>
<p>I recently joined Packet, a company acquired by Equinix; finally, after 7 years of working with cloud computing, I can see what clouds look like from the other side!</p>
<blockquote class="blockquote text-center">
<p>Spoiler alert: clouds are made of servers</p>
</blockquote>
<p>As a first task I had to revamp the kubernetes cluster-api implementation,
moving it from v1alpha1 to v1alpha3. Kind of cool and in a domain I know very
well. We tagged the first release and I can’t wait to see how it will be used.
I have some meetups and webinars planned about it, so stay in touch on
<a href="https://twitter.com/gianarb">twitter</a> to know more about it.</p>
<p>Anyway, one of the topics I am curious about is hardware automation. The idea of
getting a piece of inanimate metal, well known as a rack, switch, or server, up and
running in a repeatable and autonomous way is something I have never touched, and I
would like to know more! Obviously, this is only one of the articles I will write
about the topic, mainly because there is so much to learn.</p>
<p>As you can imagine, when we buy a server it does not do much: it is great at keeping
your door open, or as a table. It has to be configured, and in our case customers
can do it via API, which means there is some code involved! I want to know more!!</p>
<p>For sure there are a couple of things that you have to do manually: assemble the server,
power it on, plug the ethernet cable in, pick the right location, and things like
that.</p>
<p>But as you can imagine it comes without an operating system, which is even more
complicated to install because customers can select the one they like most, or even
push their own. This is for sure something that has to be done magically; I doubt
we have people running around a datacenter with USB sticks installing operating
systems.</p>
<p class="blockquote text-center"><img src="https://i0.wp.com/www.anonimacinefili.it/wp-content/uploads/2019/07/forrest-gump-25-anni.jpg?fit=1200%2C600" alt="Forrest Gump picture" class="img-fluid" /></p>
<p>One of the things that happens when booting a laptop or server is the bootloader,
the one that requires a master skill to get into because there is a timeout, and I
never know what to press! I have to be honest, I thought bootloaders were pretty
static and not that fun.</p>
<p>BUT, there are smart bootloaders! We know what smart means these days: internet! In
practice there is a bootloader capable of booting not from USB, not from disk,
but from the internet.</p>
<p>Usually a private network, but that is not mandatory, and we have spent the last couple
of years doing <code>curl something.com | bash</code> anyway. The bootloader downloads a
<code>kernel</code> and an <code>initrd</code>, and it will <code>boot</code> an operating system. It is like having a
USB stick that starts the live installation of Ubuntu; there is a lot more after
that, because we have to persist the installation on disk, format the disks, and so on.</p>
<p>In this article I will show you first how to get to something that looks like
the installation wizard for Ubuntu, and then how to automate the
installation via preseed.</p>
<p>The bootloader is called PXE, and the new generation I used is called iPXE.</p>
<p>The cool thing about PXE/iPXE is that you can chain scripts from one to
another. <a href="https://twitter.com/grhmc">Graham Christensen</a> told me that the main
goal when you work with PXE is to escape from it and get to iPXE, which is way
cooler. Time for a recap:</p>
<ol>
<li>The machine starts and enters PXE</li>
<li>PXE downloads and chains iPXE; this way we are in the iPXE bootloader</li>
<li>From iPXE you can download kernel, initrd and boot the OS in RAM.</li>
</ol>
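<p>In practice, step 2 is often implemented by the DHCP/TFTP server itself: plain PXE clients are handed the iPXE binary, while clients that already run iPXE are handed a script. As a sketch under my own assumptions (the filenames and URL are placeholders I made up), a dnsmasq configuration for this could look like:</p>
<pre><code># Serve the iPXE binary to plain PXE clients, and a script URL to iPXE clients.
# iPXE sends DHCP option 175, which lets us tell the two apart.
enable-tftp
tftp-root=/var/lib/tftpboot
dhcp-match=set:ipxe,175
dhcp-boot=tag:!ipxe,undionly.kpxe
dhcp-boot=tag:ipxe,http://boot.example.com/install.ipxe
</code></pre>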
<p>iPXE supports different ways to download what it needs from the internet; the
ones I have used so far are TFTP and HTTP.</p>
<h2 id="what-is-this-pxeipxe">What is this PXE/iPXE?</h2>
<p>Such a nice question, I had the same one a few days ago. A couple of links:</p>
<ol>
<li><a href="https://en.wikipedia.org/wiki/Preboot_Execution_Environment">Wikipedia: Preboot Execution Environment</a></li>
<li><a href="https://ipxe.org/">iPXE: open source boot firmware</a></li>
</ol>
<p>Roughly you can think about iPXE as a shell that has a bunch of commands like:</p>
<ol>
<li>dhcp: requests an IP from a DHCP server and configures the network
interface</li>
<li>route: figures out if the network interface is configured (if it has an IP
already)</li>
<li>set: sets variables (<code>set name value</code>)</li>
<li>chain: gets an argument (a URL) and executes its content; it is a good way
to pass scripts around</li>
<li>kernel: downloads the kernel from a source and loads it</li>
<li>initrd: downloads the init ramdisk</li>
<li>boot: triggers the boot</li>
</ol>
<p>And <a href="https://ipxe.org/cmd">many more</a> that I have not used yet, but the
docs list them.</p>
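<p>Putting a few of those commands together, a small script (the URL here is a placeholder, not a real endpoint) could configure the network and hand control to another script:</p>
<pre><code>#!ipxe
dhcp net0
route
set next-url http://boot.example.com/menu.ipxe
chain ${next-url}
</code></pre>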
<p>It also has support for building a menu like this one:</p>
<p class="blockquote text-center"><img src="https://netboot.xyz/images/netboot.xyz.gif" alt="netboot menu" class="img-fluid" /></p>
<p>The image comes from <a href="https://netboot.xyz/">netboot.xyz</a>; as you can see from
their website, it is a project that simplifies the process of installing a lot of
different operating systems via PXE. I started with it at first for my
experiments. Obviously menus and automation do not play nicely together, but in
the process of learning I took this extra step.</p>
<h2 id="hello-world">Hello world</h2>
<p>To give you some context, this is the script I used to start the installation
wizard for Ubuntu:</p>
<pre><code>#!ipxe
dhcp net0
set base-url http://archive.ubuntu.com/ubuntu/dists/focal/main/installer-amd64/current/legacy-images/netboot/ubuntu-installer/amd64/
kernel ${base-url}/linux console=ttyS1,115200n8
initrd ${base-url}/initrd.gz
boot
</code></pre>
<p>I hope it looks familiar, some code at least. In order to
reach the internet you need an IP, and the way to get one if you are lazy like me is to
use DHCP (the alternative is to set one statically). The first command does that:
it asks the DHCP server for an IP for the network interface <code>net0</code>.</p>
<p>When the IP is set, iPXE reaches <code>ubuntu.com</code> to get the kernel and the initrd:
everything I need to boot an OS in RAM.</p>
<h2 id="lets-try">Let’s try</h2>
<p>I am using <a href="https://packet.com">Packet</a> for my tests because it serves the low
level capabilities I need: it supports server creation without an OS and with
iPXE. You can register and do it yourself; <code>gophernetes</code> is a coupon that will
give you $30 of credit.</p>
<p>When you request a device (a server) on Packet you can select the operating
system. We don’t need one, so you can select <code>Custom iPXE</code> because we are
going to install it ourselves.</p>
<p class="blockquote text-center"><img src="/img/packet-create-device.png" alt="A screenshot from packet.com about how to create an on demand device with Custom iPXE" class="img-fluid w-75" /></p>
<p>There are two ways we can inject our script into iPXE in order to teach the server
what to do when it boots: the first is giving a URL (I use a gist raw link), the
second is via user data. The script I used is the one pasted above. You can create a gist
and paste the link in “iPXE Script URL” or you can use the user data, as I am
doing right now.</p>
<p class="blockquote text-center"><img src="/img/packet-user-data.png" alt="A screenshot from packet.com about how to pass user data to a server" class="img-fluid w-75" /></p>
<p>As soon as the machine starts you can click on its name to get its
details, and you can SSH into the “Out-of-Band Console”:</p>
<p class="blockquote text-center"><img src="/img/packet-out-of-band.png" alt="A screenshot from packet.com that shows where to locate the out-of-band console" class="img-fluid w-75" /></p>
<p class="blockquote text-center"><img src="/img/packet-out-of-band-ssh.png" alt="A screenshot from packet.com that shows how to get the ssh command to use the out-of-band console" class="img-fluid w-75" /></p>
<p>ALERT: if you are doing this activity, remember to enable OpenSSH when you
follow the installation wizard, otherwise you won’t be able to SSH into the server
at the end!</p>
<p>When deploying the server, the code you passed gets chained from the Packet iPXE,
and you should see the Ubuntu wizard ready for you:</p>
<p class="blockquote text-center"><img src="/img/packet-ubuntu-install-wizard.png" alt="A screenshot from my terminal that shows the first wizard for ubuntu" class="img-fluid" /></p>
<p>At the end of the wizard you will get a persisted operating system on the server
itself; it will survive a reboot and it will be just like any other server you
used in the past, but better, because you know how you installed the OS! Get its
IP and SSH in!</p>
<h2 id="preseed">Preseed</h2>
<p>Debian-like operating systems, such as Ubuntu, support a technology called
<a href="https://help.ubuntu.com/lts/installation-guide/s390x/apb.html">preseed</a>; in
practice it is a text file that contains the answers to all the questions the
Ubuntu wizard asks.</p>
<p>In this way no pointing and clicking is required. I put together a file and
uploaded it as a gist:</p>
<pre><code>#### Contents of the preconfiguration file (for stretch)
### Localization
# Preseeding only locale sets language, country and locale.
d-i debian-installer/locale string en_US.UTF-8
d-i localechooser/supported-locales multiselect en_US.UTF-8
d-i console-setup/ask_detect boolean false
d-i keyboard-configuration/xkb-keymap select GB
# Keyboard selection.
# Disable automatic (interactive) keymap detection.
d-i console-setup/ask_detect boolean false
d-i keyboard-configuration/xkb-keymap select us
# netcfg will choose an interface that has link if possible. This makes it
# skip displaying a list if there is more than one interface.
d-i netcfg/choose_interface select auto
# Any hostname and domain names assigned from dhcp take precedence over
# values set here. However, setting the values still prevents the questions
# from being shown, even if values come from dhcp.
d-i netcfg/get_hostname string unassigned-hostname
d-i netcfg/get_domain string unassigned-domain
# Disable that annoying WEP key dialog.
d-i netcfg/wireless_wep string
### Mirror settings
d-i mirror/country string manual
d-i mirror/http/hostname string archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
d-i mirror/http/proxy string
# Root password, either in clear text
d-i passwd/root-password password rootroot
#d-i passwd/root-password-again password rootroot
# or encrypted using a crypt(3) hash.
#d-i passwd/root-password-crypted password [crypt(3) hash]
# To create a normal user account.
d-i passwd/user-fullname string yay
d-i passwd/username string yay
# Normal user's password, either in clear text
d-i passwd/user-password password norootnoroot
d-i passwd/user-password-again password norootnoroot
# Set to true if you want to encrypt the first user's home directory.
d-i user-setup/encrypt-home boolean false
### Clock and time zone setup
# Controls whether or not the hardware clock is set to UTC.
d-i clock-setup/utc boolean true
# You may set this to any valid setting for $TZ; see the contents of
# /usr/share/zoneinfo/ for valid values.
d-i time/zone string US/Eastern
# Controls whether to use NTP to set the clock during the install
d-i clock-setup/ntp boolean true
# LG provided NTP, should be replaced!
d-i clock-setup/ntp-server string ntp.ubuntu.com
### Partitioning
d-i preseed/early_command string umount /media || true
d-i partman-auto/method string lvm
d-i partman-auto-lvm/guided_size string max
d-i partman-lvm/device_remove_lvm boolean true
d-i partman-lvm/confirm boolean true
d-i partman-lvm/confirm_nooverwrite boolean true
d-i partman-auto-lvm/new_vg_name string main
d-i partman-md/device_remove_md boolean true
d-i partman-md/confirm boolean true
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i partman-basicmethods/method_only boolean false
### Partitioning
d-i partman-auto/method string lvm
d-i partman-lvm/device_remove_lvm boolean true
d-i partman-lvm/confirm boolean true
d-i partman-lvm/confirm_nooverwrite boolean true
### Package selection
tasksel tasksel/first multiselect ubuntu-desktop
# Individual additional packages to install
d-i pkgsel/include string openssh-server build-essential
# Whether to upgrade packages after debootstrap.
# Allowed values: none, safe-upgrade, full-upgrade
d-i pkgsel/upgrade select full-upgrade
d-i pkgsel/update-policy select none
# Individual additional packages to install
d-i pkgsel/include string openssh-server \
vim \
git \
tmux \
build-essential \
telnet \
wget \
curl
# This is fairly safe to set, it makes grub install automatically to the MBR
# if no other operating system is detected on the machine.
d-i grub-installer/only_debian boolean true
# This one makes grub-installer install to the MBR if it also finds some other
# OS, which is less safe as it might not be able to boot that other OS.
d-i grub-installer/with_other_os boolean true
# Avoid that last message about the install being complete.
d-i finish-install/reboot_in_progress note
</code></pre>
<p>It is a bit weird, but if you are familiar with
the Ubuntu installation process I am sure you can spot some similarities.</p>
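<p>To make the structure concrete: each non-comment line has the shape “owner question type value”. As a quick sanity check before booting, you can load the answers into a dictionary. This parser is my own sketch, not part of the installer, and it ignores backslash line continuations for brevity:</p>

```python
def parse_preseed(text):
    """Return {question: (type, value)} for every non-comment preseed line."""
    answers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 3)  # owner, question, type, optional value
        if len(parts) < 3:
            continue
        value = parts[3] if len(parts) == 4 else ""
        answers[parts[1]] = (parts[2], value)
    return answers

sample = """\
# Root password, in clear text
d-i passwd/root-password password rootroot
d-i netcfg/choose_interface select auto
d-i pkgsel/include string openssh-server build-essential
"""

print(parse_preseed(sample)["passwd/root-password"])  # ('password', 'rootroot')
```

<p>A check like this would have saved me a couple of failed installs caused by a mistyped question name.</p>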
<p>At this point we have to pass some <code>cmdline</code> arguments to the kernel in order to
have it download the preseed file from a raw gist, and to tell the kernel that
the installation is automatic:</p>
<pre><code>#!ipxe
dhcp net0
set base-url http://archive.ubuntu.com/ubuntu/dists/focal/main/installer-amd64/current/legacy-images/netboot/ubuntu-installer/amd64/
set preseed-url https://gist.githubusercontent.com/gianarb/acea1ca5b73a318fd74cbb002cae21f3/raw/76e5d036ee28c485cc7cf42a317c99e678f08a6c/ubuntu.preseed
kernel ${base-url}/linux console=ttyS1,115200n8 auto=true fb=false priority=critical preseed/locale=en_GB url=${preseed-url} DEBCONF_DEBUG=5
initrd ${base-url}/initrd.gz
boot
</code></pre>
<p>The mechanism is the same as before: you can create a gist and link it during
the server creation, or you can paste this as user data.</p>
<p>At this point you can connect to the <code>Out of Band</code> console via ssh and the
installation wizard will run by itself, like a movie! When the process is over the server
reboots and you will be able to SSH in using username <code>yay</code> and password
<code>norootnoroot</code>. If you are looking for the root password, have a look at the
preseed file: the answer is there!</p>
<p>Preseed is probably not what you want in the end, but it is an easy enough way
to get to a persisted OS. It does a lot at runtime, and as a consequence it is time
consuming and can be flaky when reaching the network.
<a href="https://twitter.com/thebsdbox">Dan</a> pointed me to other ways to do it using
<code>raw</code> images, which I will probably experiment with moving forward.</p>
<h2 id="conclusion">Conclusion</h2>
<p>That’s it! This is a layer I am not familiar with. Packet has an open
source project called <a href="https://tinkerbell.org/">Tinkerbell</a> that does bare metal
provisioning.</p>
<p>I want to know what it does under the hood! In practice it is an open source
version of the provisioner used internally to set up servers. We are moving
towards it as well!</p>
<p>A lot of the underlying technologies, like preseed and PXE, are 20 years old, and as I
like to say: “I have a lot of new things to learn from the 80s”.</p>
<p>Don’t know where this will bring me but I think the next articles will look
like:</p>
<ol>
<li>How to get an iPXE server to serve my own kernel and initrd</li>
<li>How to get a set of RPIs provisioned</li>
</ol>
<p>Point me in the right direction, or reach out if you are curious to know more about
this topic.</p>
Show Me Your Code with Ivan Pedrazas: Application Lifecycle and GitOpsWe are gonna have a chat with Ivan Pedrazas; we will go over a simple API: how to build it, how to deploy it, the issues around it, and how GitOps helps.https://gianarb.it/img/show-me-your-code-logo.png2020-05-05T09:00:27+00:002020-05-05T09:00:27+00:00https://gianarb.it/blog/show-me-your-code-application-lifecycle-gitops-with-ivan<p>Everyone speaks about GitOps these days! Everyone in their own way. Some
people think that GitOps means pulling git from inside a kubernetes cluster,
others think it has to be done in CI.</p>
<p>I don’t know how it works or how it should work, but my friend Ivan Pedrazas
(@ipedrazas) knows, so we are gonna learn from him.</p>
<p>The idea is to take a simple application and try to figure out how to apply
GitOps to it. Let’s see how far we go.</p>
<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">My plan is to show a
simple app (frontend in vue.js and a flask API), how to build the different
components, how to deploy them using GitOps. Finally, we will modify different
parts of the app and build and deploy and rinse and repeat :)</p>— Ivan
Pedrazas (@ipedrazas) <a href="https://twitter.com/ipedrazas/status/1266013907057065985?ref_src=twsrc%5Etfw">May
28, 2020</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://ismenta.slack.com/join/shared_invite/zt-ex04og6p-i9GsneKysUCHc3g6su7y1Q#/">Menta - Slack Channel</a></li>
<li><a href="https://gitops-community.github.io/kit/#gitops-days-2020-youtube-playlist">https://gitops-community.github.io/kit/#gitops-days-2020-youtube-playlist</a></li>
</ul>
Show Me Your Code with Enrique Paredes: Kubernetes Permission ManagerEnrique will share tips and code around kubernetes permission-manager, a project that brings sanity to Kubernetes RBAC and Users management, Web UI FTWhttps://gianarb.it/img/show-me-your-code-logo.png2020-05-05T09:00:27+00:002020-05-05T09:00:27+00:00https://gianarb.it/blog/show-me-your-code-kubernetes-permission-manager<p>When it comes to authentication and authorization, Kubernetes is extremely
complicated.</p>
<p>I think its philosophy is well described in the documentation:</p>
<blockquote>
<p>Normal users are assumed to be managed by an outside, independent service. An
admin distributing private keys, a user store like Keystone or Google
Accounts, even a file with a list of usernames and passwords. In this regard,
Kubernetes does not have objects which represent normal user accounts. Normal
users cannot be added to a cluster through an API call.</p>
</blockquote>
<p>Kubernetes has users, but they have to come from outside; it is not its
business to care about them. For authorization it uses RBAC, and you have a very
long list of possible combinations between actions like LIST, WATCH,
CREATE, DELETE and resources: pods, deployments, ingresses, services…</p>
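<p>To give an idea of what those combinations look like, here is a minimal RBAC Role (standard Kubernetes YAML, not taken from permission-manager itself) that only allows reading pods in one namespace:</p>
<pre><code># Grants get/list/watch on pods in the default namespace, nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
</code></pre>
<p>Multiply this by every verb, resource, namespace and user, and it is easy to see why a UI that brings sanity to the process is welcome.</p>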
<p>Sighup is a company based in Italy and well known for their contributions to the
Cloud Native ecosystem. One of their latest projects is called
<a href="https://github.com/sighupio/permission-manager">permission-manager</a>; it is open
source and can be described as follows: “it is a project that brings sanity to
Kubernetes RBAC and Users management, Web UI FTW”.</p>
<p>I will host Enrique (<a href="https://twitter.com/iknite">@iknite</a>) to talk about the
challenges he had when writing such a crucial project, hoping to see some code!</p>
<p>Links:</p>
<ul>
<li><a href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/">Kubernetes
Authentication</a></li>
<li><a href="https://sighup.io/">Sighup website</a></li>
<li><a href="https://github.com/sighupio/permission-manager">github.com/sighupio/permission-manager</a></li>
</ul>
Show Me Your Code with Philippe and Giacomo: Vault plugin for WireguardShow me your code has two special guests: Giacomo Tirabassi from InfluxData and Philippe Scorsolini from Sighup. We will code a new HashiCorp Vault plugin to generate Wireguard configurations. Vault is a popular secret storage developed in open source. Wireguard is a networking module and VPN that is part of the Linux kernel. Let's have some funhttps://gianarb.it/img/show-me-your-code-logo.png2020-04-30T09:00:27+00:002020-04-30T09:00:27+00:00https://gianarb.it/blog/show-me-your-code-vault-wireguard<p>When: Friday 1st May 10am GMT+2 (4am EDT)</p>
<h2 id="lets-write-a-vault-plugin-for-wireguard">Let’s write a Vault plugin for Wireguard</h2>
<p>1st of May. We are on vacation, the perfect day to start a side project.</p>
<p>Giacomo and Philippe started coding an integration between <a href="https://www.vaultproject.io/">HashiCorp
Vault</a> and
<a href="https://www.wireguard.com/">Wireguard</a>. This is great by itself. Vault is cool,
Wireguard is awesome, what do you need more?</p>
<p>The integration is a Vault plugin, and we decided to stream the session on
<a href="https://twitch.tv/gianarb">Twitch</a> because that’s what cool kids do these days.</p>
<p>We actually made it! As with every side project it is not ready at all, but we got
the boilerplate code required by a Vault plugin to work, and the project is
available on GitHub:
<a href="https://github.com/gitirabassi/vault-plugin-secrets-wireguard">gitirabassi/vault-plugin-secrets-wireguard</a>.</p>
<p>After more than an hour of fun we had to stop, but they promised we will have a
follow-up meeting as soon as they have an E2E workflow to show me!</p>
<h2 id="about-giacomo">About Giacomo</h2>
<p><a href="https://twitter.com/gitirabassi">Giacomo</a> works as a Site Reliability Engineer at
InfluxData. He is an expert on Kubernetes (💼 AWS DA, CKA, CKAD), containers,
Terraform and everything coming from HashiCorp! A traveler and cook, on a
quest for flavors 🤖</p>
<h2 id="about-philippe">About Philippe</h2>
<p><a href="https://twitter.com/Phisc0">Philippe</a> works as a DevOps engineer at Sighup.
He is a Computer Science and Engineering M.Sc. student at Politecnico di Milano, a Linux
user, an open source lover and an explorer of new technologies.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://github.com/gitirabassi/vault-plugin-secrets-wireguard">GitHub repository for the project</a></li>
<li><a href="https://www.vaultproject.io/">HashiCorp Vault</a></li>
<li><a href="https://www.wireguard.com/">Wireguard</a></li>
<li><a href="https://www.vaultproject.io/docs/internals/plugins">Vault: Plugin System</a></li>
</ul>
How to write documentation efficientlyA bunch of experiences and considerations about how to write documentation efficiently, without wasting too much time or, even more important, without getting too bored or stressed out.https://gianarb.it/img/me.jpg2020-04-18T09:00:27+00:002020-04-18T09:00:27+00:00https://gianarb.it/blog/how-to-write-documentation-efficiently<p>You have to remember two things to read this article effectively:</p>
<ol>
<li>I have a blog where I publish content with a good frequency, and I do it
for fun, so I like to write.</li>
<li>I work remotely; the HQ for InfluxData is in San Francisco, which means I am +9
hours from a lot of my colleagues. Writing is a solid communication channel I use
every day at work because I think it is great, and because I do not have
other alternatives.</li>
</ol>
<p class="text-center"><img src="/img/child-writes.jpg" alt="Child writing on paper" class="img-fluid" /></p>
<h2 id="develop-a-workflow">Develop a workflow</h2>
<p>If you do not like cleaning your apartment, one strategy is to keep
it as clean and in order as possible day by day; this way you won’t have to
spend a full weekend cleaning every corner of it. Spread a boring task out in
a way that won’t make you too tired.
An effective way is to write along the way, side by side with the code you are
developing.</p>
<p>I can highlight a few steps in the process of writing code: analysis, design,
validation, PoC, rollout. Those phases are not unique; they repeat continuously over
many iterations. I write during all of those steps, many times. Iterations do not
help only your code, they make documentation solid: you can check for typos and
so on.</p>
<p>If you make writing an ongoing process, you will find at the end that
the only thing left is to organize and move what you wrote into a shape that readers
will find familiar.</p>
<h2 id="find-the-right-place">Find the right place</h2>
<p>There are many types of documentation, because there are a lot of stakeholders
and many phases to document (some of them were listed previously).</p>
<p>If I have to think about my stakeholders they are:</p>
<ol>
<li>project managers</li>
<li>documentation team if you are lucky, otherwise let’s say customers or end users.</li>
<li>VP or tech leads.</li>
<li>your teammates or reviewers</li>
</ol>
<p>All those people will enjoy reading a specific point of view, or phase of work.</p>
<p>I think teammates or reviewers are kind of happy to read about the process you followed
to design and implement what you wrote, and they will really appreciate reading
inline documentation for your code, doc blocks and so on.</p>
<p>Project Managers will enjoy reading considerations on issues and things like
that; they are super valuable and I end up copy-pasting a lot from those
discussions.</p>
<p>End users obviously need functional documentation that they can follow, and also a
bit about internal design and monitoring, mainly to get them on board with the work
you developed. It really depends on your audience. We are lucky and we have a
team that is capable of reading code and figuring out what we did, but it is a
nice exercise to help them by explaining your work in a good way.</p>
<p>VPs and tech leads are usually focused on the design: why you did something one
way rather than another, the trade-offs you accepted, the ones you avoided, why
and how. I like the idea of writing this kind of documentation in the code itself.</p>
<p>I am fascinated when I open C codebases where the first thousand lines
are documentation. In Go, packages can have a file called <code>./doc.go</code> that <code>godoc</code>
will render as a package introduction. If you work with the kind of tech lead or
VP that is not used to reading code anymore, you can always copy-paste it to a
Google doc.</p>
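<p>A <code>./doc.go</code> file is nothing more than a package comment living in its own file; a hypothetical sketch (the package name and its claims are made up for illustration) looks like this:</p>
<pre><code>// Package provisioner installs operating systems on bare metal servers.
//
// The design favors idempotent steps: every phase (netboot, partitioning,
// OS install) can be re-run safely. See the README for the trade-offs we
// accepted and the alternatives we discarded.
package provisioner
</code></pre>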
<h2 id="write-a-lot">Write a lot</h2>
<p>This point explains itself. The more you write during all the phases of your
work, the less you will have to do all together at the end of the code iteration,
which is where I usually end up tired of the code I wrote, even more when it
takes weeks and is not easy to work on.</p>
<h2 id="pair-on-documentation">Pair on documentation</h2>
<p>I am not a fan of pair programming, but recently I changed my mind a little bit,
probably because of all this social isolation. Before jumping straight into writing
code, my teammate and I spent two solid hours over two iterations writing the <code>./doc.go</code>
file together. The outcome made me happy; I hope it will work the same for you.</p>
<p class="text-center"><img src="/img/toomany-files.jpg" alt="Too many files" class="img-fluid" /></p>
<h2 id="conclusion">Conclusion</h2>
<p>This is my experience when writing documentation, but as I said, I love doing
it! Do you have anything to share about it? I am particularly curious about how
and whether you READ somebody else’s documentation, internally written by your teammates:
how do you evaluate it, and do you have any suggestions to make it more friendly?
Because it is good to write, but people have to be able to read it and get what
they need out of it without wasting too much time.</p>
Show Me Your Code with Dan and Walter: How to contribute to OpenTelemetry JSShow me your code has two special guests: Walter, CTO of Corley SRL, the company behind CloudConf in Turin, and Dan, Engineer at Dynatrace and maintainer of OpenTelemetry JS. During this show we will talk about OpenTelemetry and NodeJS. Walter wrote a plugin for instrumenting mongoose with opentelemetry. We are gonna see how he did it, considerations from Dan and so onhttps://gianarb.it/img/show-me-your-code-logo.png2020-04-11T09:00:27+00:002020-04-11T09:00:27+00:00https://gianarb.it/blog/show-me-your-code-otel-nodejs<p>When: Thursday 16th 6-7pm GMT+2 (9am PDT)</p>
<h2 id="opentelemetry-for-js-and-how-to-contribute">OpenTelemetry for JS and how to contribute</h2>
<p>OpenTelemetry is a specification and set of instrumentation libraries developed
in open source by multiple companies such as Google, HoneyComb.io, Dynatrace,
LightStep and many more!</p>
<p>OpenTracing and OpenCensus joined forces and started a common project
called OpenTelemetry, which I hope will become the way to go in terms of code
instrumentation, because I really think it is something we need.</p>
<p>Walter and his team develop in JavaScript, frontend and backend, and back in the
day we experimented with OpenTracing, but we had some issues and it was not easy to
pick up at that time. When I tried OpenTelemetry I realized that it was for him.</p>
<p>He tried it out and he wrote his first OpenTelemetry instrumentation plugin for
mongoose, a popular library he uses that was not instrumented yet.</p>
<p>Dan will help us figure out how they designed the opentelemetry-js implementation
as it is today: the good, the bad and the ugly about this experience. I hope to
get some feedback about the roadmap and future development as well, now that the
library has reached its first beta release.</p>
<h2 id="about-dan">About Dan</h2>
<p>When I was working on my observability workshop, Dan gave me huge help,
drastically improving my very limited experience with NodeJS. Thank you for that.</p>
<p>Dan works as Engineer at Dynatrace, and he maintains the OpenTelemetry JS
library. You can find him on twitter as <a href="https://twitter.com/dyladan">@dyladan</a>
and in <a href="https://gitter.im/open-telemetry/opentelemetry-node">Gitter</a> discussing
opentelemetry.</p>
<h2 id="about-walter">About Walter</h2>
<p>Walter Dal Mut works as a Solutions Architect <a href="https://corley.it/">@Corley SRL</a>.
He is an electronic engineer who moved to Software Engineering and Cloud
Computing Computing infrastructures. He is passionate about technology in general
and a lover of the open source movement.</p>
<p>You can follow him on <a href="https://twitter.com/walterdalmut">Twitter</a>
and <a href="https://github.com/wdalmut">GitHub</a>.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://opentelemetry.io/">opentelemetry.io</a></li>
<li><a href="https://www.dynatrace.com">dynatrace.com</a></li>
<li><a href="https://gianarb.it/blog/how-to-start-with-opentelemetry-in-nodejs">How to start tracing with OpenTelemetry in NodeJS?</a></li>
<li><a href="https://github.com/open-telemetry/opentelemetry-js">github.com/open-telemetry/opentelemetry-js</a></li>
<li><a href="https://github.com/wdalmut/opentelemetry-plugin-mongoose">wdalmut/opentelemetry-plugin-mongoose</a></li>
</ul>
Show Me Your Code with Carlos and Tibor: Chat about GoReleaser and multiarch supportInformal chat with Carlos, maintainer of GoReleaser, myself and Tibor from Docker about building docker images with buildx and BuildKit to add support for multi-architecture images in GoReleaser.https://gianarb.it/img/show-me-your-code-logo.png2020-04-08T09:00:27+00:002020-04-08T09:00:27+00:00https://gianarb.it/blog/show-me-your-code-goreleaser-buildkit-multi-arch<p>When: Thursday 23rd 6-7pm GMT+2 (9am PDT)</p>
<h2 id="goreleaser-and-buildkit">GoReleaser and BuildKit</h2>
<p>The main reason why I started “Show me your code” is to chat with a couple
of friends from the open source space about what they are doing and what I am doing,
and ideally have a drink (beers, water or coffee, depending on the timezone) while discussing the same topic.</p>
<p>Carlos, Tibor and I will meet on Skype for an informal chat about two open
source projects I love: GoReleaser and BuildKit. The conversation will be
streamed on Twitch.</p>
<p>You can follow the event live, or watch the recording that will be available here!
Watching it live will give you the unique opportunity to share your love for those
projects and your feedback about how to support multi-arch docker builds in
GoReleaser.</p>
<p><strong>IMPORTANT:</strong> The outcome of this conversation will not force Carlos
to do anything in any way! I hope the experience you as attendees and users of GoReleaser
can share, and the experience Tibor has with BuildKit, will lead to a possible
integration, because I would love to have multi-arch support for my releases!</p>
<p>I had this idea because there is a long-standing issue about this,
<a href="https://github.com/goreleaser/goreleaser/issues/530">“Support multi-platform docker
images #530”</a>, and I am sure
that a discussion all together will be constructive and nice!</p>
<h2 id="about-carlos-and-goreleaser">About Carlos and GoReleaser</h2>
<p>I am very excited to have <a href="https://twitter.com/caarlos0">Carlos</a> with me. I
rely so much on <a href="https://goreleaser.com/">GoReleaser</a> and its integration with
GitHub Actions to make my development life cycle reliable, repeatable and fast,
and I am happy to have a chat about his project and what he will do next!</p>
<h2 id="about-tibor-from-docker-and-buildkit">About Tibor from Docker and BuildKit</h2>
<p><a href="https://twitter.com/tiborvass">Tibor @tiborvass</a> is a well-known contributor
and maintainer for Docker since the early days. Active in various open source
communities, he is now involved with BuildKit as a maintainer.</p>
<p>We have known each other virtually, through DockerCon and other events, since I
joined the Docker Captain program. I am happy to have him around showing
BuildKit, buildx and the multi-arch feature.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://github.com/goreleaser/goreleaser">github.com/goreleaser/goreleaser</a></li>
<li><a href="https://github.com/moby/buildkit">github.com/moby/buildkit</a></li>
<li><a href="https://www.youtube.com/watch?v=5KgaisTEzC8">BuildKit: A Modern Builder Toolkit on Top of containerd, Tonis Tiigi & Akihiro Suda</a></li>
<li><a href="https://www.infoq.com/br/presentations/goreleaser-lessons-learned-so-far/">(INFOQ) GoReleaser: lessons learned so far</a></li>
</ul>
How to start tracing with OpenTelemetry in NodeJS?I developed an eight-hour workshop about application monitoring and code instrumentation two years ago. This year I updated it to use OpenTelemetry, and this is what I learned instrumenting a NodeJS application.https://gianarb.it/img/logo/otel-black-stacked.svg2020-04-07T09:08:27+00:002020-04-07T09:08:27+00:00https://gianarb.it/blog/how-to-start-with-opentelemetry-in-nodejs<p>This post is to celebrate the first beta release of OpenTelemetry for NodeJS
<i class="fas fa-glass-cheers"></i></p>
<p>Recently I developed a workshop about code instrumentation and application
monitoring. It is an 8-hour full immersion on logs, metrics, tracing and so on.
I developed it last year and I have given it twice. Let me know if you are looking
for something like that.</p>
<p>Almost all of it is open source, but I haven't figured out a good way to make it
usable without my brain yet. This year I updated it to use OpenTelemetry and
InfluxDB v2.</p>
<p>Anyway the application is called
<a href="https://github.com/gianarb/shopmany">ShopMany</a>. This application does
not return any useful information about its state. It is an e-commerce made of a
bunch of services in various languages. Obviously one of them is in NodeJS and
that’s the one I am gonna show you today.</p>
<p><strong>Disclaimer</strong>: I can not define myself as a NodeJS developer. I wrote a bunch of
AngularJS single page applications back in the day, and I wrote some Cordova mobile
applications ages ago. I have not written any production JS code since 2015, more
or less.</p>
<h2 id="first-approach">First approach</h2>
<p>I finished instrumenting the application the very day the maintainers
tagged the first beta release. Overnight I had to update libraries and test
code. Lucky me.</p>
<p>Learning how to properly instrument
<a href="https://github.com/gianarb/shopmany/tree/master/discount">discount</a> required a
lot of digging into the actual
<a href="https://github.com/open-telemetry/opentelemetry-js">opentelemetry-js</a> repository, but
luckily for us it has a lot of examples, and the library is designed to load a
bunch of useful modules that are able to instrument the application by themselves.
The community is very helpful and you can chat via
<a href="https://gitter.im/open-telemetry/opentelemetry-js">Gitter</a>.</p>
<h2 id="getting-started">Getting Started</h2>
<p>I am using ExpressJS, and OpenTelemetry has a plugin for it that you can load
to instrument the app by itself; the same goes for MongoDB, which is the package I am
using.</p>
<p>These are the dependencies I installed in my application; all of them are
provided by the repository I linked above:</p>
<pre><code>"@opentelemetry/api": "^0.5.0",
"@opentelemetry/exporter-jaeger": "^0.5.0",
"@opentelemetry/node": "^0.5.0",
"@opentelemetry/plugin-http": "^0.5.0",
"@opentelemetry/plugin-mongodb": "^0.5.0",
"@opentelemetry/tracing": "^0.5.0",
"@opentelemetry/plugin-express": "^0.5.0",
</code></pre>
<p>I created a <code>./tracer.js</code> file that initializes the tracer. I have added inline
documentation to explain the crucial parts of it:</p>
<pre><code class="language-js">'use strict';
const opentelemetry = require('@opentelemetry/api');
const { NodeTracerProvider } = require('@opentelemetry/node');
const { SimpleSpanProcessor } = require('@opentelemetry/tracing');
// I am using Jaeger as exporter
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
// This is not mandatory; by default HTTP trace context propagation is used,
// but it is not well supported by the PHP ecosystem and I have
// a PHP service to instrument. I discovered B3 is supported
// by all the languages I was instrumenting
const { B3Propagator } = require('@opentelemetry/core');

module.exports = (serviceName, jaegerHost, logger) => {
  // A lot of these plugins are loaded automatically when you install them,
  // so you do not usually have to enable them manually. But the Express
  // plugin is not auto-enabled, so I had to configure them all
  // explicitly here
  const provider = new NodeTracerProvider({
    plugins: {
      mongodb: {
        enabled: true,
        path: '@opentelemetry/plugin-mongodb',
      },
      http: {
        enabled: true,
        path: '@opentelemetry/plugin-http',
        // I didn't do it in my example, but it is a good idea to ignore the
        // health endpoint or others if you do not need to trace them.
        ignoreIncomingPaths: [
          '/',
          '/health'
        ]
      },
      express: {
        enabled: true,
        path: '@opentelemetry/plugin-express',
      },
    }
  });

  // Here is where I configure the exporter, setting the service name
  // and the Jaeger host. The logger is helpful to track errors from the
  // exporter itself
  let exporter = new JaegerExporter({
    logger: logger,
    serviceName: serviceName,
    host: jaegerHost
  });
  provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
  provider.register({
    propagator: new B3Propagator(),
  });

  // Return a tracer so you can retrieve it from everywhere else in the
  // app
  return opentelemetry.trace.getTracer(serviceName);
};
</code></pre>
<p>You may be thinking: that's too easy! You are right, the nature of NodeJS
makes tracing very code agnostic. With this configuration you get a lot “for
free”.</p>
<p>You get a bunch of spans for every http request that ExpressJS serves, plus a
span for every MongoDB query. All of them with useful information like the
status code, path, user agents, query statements and so on.</p>
<p>We have to include it in <code>./server.js</code>, the entrypoint of our NodeJS
application:</p>
<pre><code class="language-js">'use strict';
const url = process.env.DISCOUNT_MONGODB_URL || 'mongodb://discountdb:27017';
const jaegerHost = process.env.JAEGER_HOST || 'jaeger';
const logger = require('pino')()
// Import and initialize the tracer
const tracer = require('./tracer')('discount', jaegerHost, logger);
var express = require("express");
var app = express();
const MongoClient = require('mongodb').MongoClient;
const dbName = 'shopmany';
const client = new MongoClient(url, { useNewUrlParser: true });
const expressPino = require('express-pino-logger')({
  logger: logger.child({"service": "httpd"})
})
</code></pre>
<p>As I told you, that's it! With this code you have enough to make your NodeJS
application show up in your traces.</p>
<p>The instrumented version of the application is available here
<a href="https://github.com/gianarb/shopmany/tree/discount/opentelemetry/discount">github.com/gianarb/shopmany/tree/discount/opentelemetry</a></p>
<h2 id="understand-the-project">Understand the project</h2>
<p>I tend to check out projects when in the process of learning how they work.
Documentation is useful but always incomplete for such fast-moving projects.</p>
<p>I have to say that the scaffolding is clear even for a not-so-fluent NodeJS
developer like me.</p>
<pre><code>$ tree -L 1
.
├── benchmark
├── CHANGELOG.md
├── codecov.yml
├── CONTRIBUTING.md
├── doc
├── examples
├── getting-started
├── karma.base.js
├── karma.webpack.js
├── lerna.json
├── LICENSE
├── package.json
├── packages
├── README.md
├── RELEASING.md
├── scripts
├── tslint.base.js
└── webpack.node-polyfills.js
</code></pre>
<p>I would define it as a monorepo, and it uses
<a href="https://github.com/lerna/lerna">lerna</a> to deliver multiple packages from the
same repository.</p>
<p><code>examples</code> contains workable examples of how to use the different <code>packages</code>.</p>
<pre><code>$ tree -L 1 ./examples/
./examples/
├── basic-tracer-node
├── dns
├── express
├── grpc
├── grpc_dynamic_codegen
├── http
├── https
├── ioredis
├── metrics
├── mysql
├── opentracing-shim
├── postgres
├── prometheus
├── redis
└── tracer-web
$ tree -L 1 ./packages/
./packages/
├── opentelemetry-api
├── opentelemetry-base
├── opentelemetry-context-async-hooks
├── opentelemetry-context-base
├── opentelemetry-context-zone
├── opentelemetry-context-zone-peer-dep
├── opentelemetry-core
├── opentelemetry-exporter-collector
├── opentelemetry-exporter-jaeger
├── opentelemetry-exporter-prometheus
├── opentelemetry-exporter-zipkin
├── opentelemetry-metrics
├── opentelemetry-node
├── opentelemetry-plugin-dns
├── opentelemetry-plugin-document-load
├── opentelemetry-plugin-express
├── opentelemetry-plugin-grpc
├── opentelemetry-plugin-http
├── opentelemetry-plugin-https
├── opentelemetry-plugin-ioredis
├── opentelemetry-plugin-mongodb
├── opentelemetry-plugin-mysql
├── opentelemetry-plugin-postgres
├── opentelemetry-plugin-redis
├── opentelemetry-plugin-user-interaction
├── opentelemetry-plugin-xml-http-request
├── opentelemetry-propagator-jaeger
├── opentelemetry-resources
├── opentelemetry-shim-opentracing
├── opentelemetry-test-utils
├── opentelemetry-tracing
├── opentelemetry-web
└── tsconfig.base.json
</code></pre>
<p>The suffix of the package name helps you figure out what it is:</p>
<ul>
<li><code>opentelemetry-plugin-*</code> usually contains the code that instruments a specific
library; you can see here <code>express</code>, <code>http</code>, <code>https</code>, <code>dns</code>. Some plugins are
loaded by the <code>NodeTracerProvider</code> by default; others have to be specified. You
can rely on the code or read the documentation to figure it out. For
example <code>http</code> is loaded by default, but if you need <code>express</code> you have to load
it up yourself, figuring out the right dependencies. At least for now.</li>
<li><code>opentelemetry-exporter-*</code> contains the various exporters, for now Jaeger,
Prometheus, Zipkin and the otel-collector.</li>
</ul>
<p>Anyway, what I am trying to say is that it is very intuitive and looking here it
is clear what you can get from this project.</p>
<h2 id="plugin">Plugin</h2>
<p>NodeJS sounds very easy to instrument, and on the right path to get automatic
instrumentation right, because you can listen to function calls from the
outside. You do not need to change your code specifically where you make a request or
where you receive one; you can add tracing in a centralized location. That's how the
provided plugins work.</p>
<p><a href="https://github.com/othiym23/shimmer">Shimmer</a> is the library that simplifies the
trick. I recently had a chat with <a href="https://twitter.com/walterdalmut">Walter</a>
because I know he works in NodeJS, and during his experiments otel was easy
enough to fit his use case. He is currently trying it out, and he discovered that
<a href="https://github.com/Automattic/mongoose">mongoose</a>, the ORM library he uses, does
not use the officially provided <a href="https://mongodb.github.io/node-mongodb-native/">mongodb
driver</a>, so the
otel-plugin-mongodb was not magically tracing his requests to MongoDB, sadly.
But he is currently writing a <a href="https://github.com/wdalmut/opentelemetry-plugin-mongoose">plugin for
that</a>, so it won't be
a problem for much longer.</p>
Checklist for a new projectA personal checklist I developed over the years that I try to apply across projects I start or contribute tohttps://gianarb.it/img/myselfie.jpg-large2020-04-02T09:08:27+00:002020-04-02T09:08:27+00:00https://gianarb.it/blog/new-project-checklist<p>Back in the day I used to start a lot of projects from zero on GitHub. Some of
them are still there, probably unused.</p>
<p>Recently I started to take part in other people's projects like
<a href="https://github.com/testcontainers">testcontainers</a> or
<a href="https://github.com/profefe">profefe</a>. I wrote about why I do it in the
<a href="/blog/year-in-review">“2019 year in review”</a> post.</p>
<p>In both cases, whether joining an existing project or starting a new
one, I try to follow a checklist.</p>
<p>I developed this checklist over the years, moving parts around and extending the
number of checks. Its main goal is to validate that the project has
good answers for a couple of questions, related not to what it does, but to how
it does it.</p>
<ol>
<li>is it easy to onboard as a user?</li>
<li>as a new contributor is the project easy to understand?</li>
<li>as a maintainer, do I have everything under control in order to waste as
little time as possible?</li>
</ol>
<p>I follow the checklist when working on open source but also on closed source
projects. What I like about it is that you can propose a change by yourself;
you can try to apply this feedback as a solo developer, hoping to make
contributors, maintainers and colleagues buy in, spreading joy.</p>
<p>But let's get to the list now.</p>
<h3 id="have-a-place-where-you-can-write">Have a place where you can write</h3>
<p>When I start a new project, but also when onboarding an existing one into my
toolchain, I look for a written record of it.</p>
<p>I look for a readme, an installation process, a getting started guide, a
contribution document. It does not need to be a pretty one; a copy/paste of the few
bash scripts the maintainer runs to set things up is enough.</p>
<p>Having a place, during the early days of a project, where I can write what I
think and how I would like to get things done is important in order to design
something usable and to spot misleading assumptions sooner.</p>
<p>If you build the place for all this information, it will take you one
second to save it forever; it is just a matter of copy/pasting the commands you
run in your terminal to spin up dependencies, build the project and so on.</p>
<p>I like to use the README.md, CONTRIBUTING.md and a <code>./docs</code> folder to save
everything I am thinking about, and everything I do that I hope will make my life
easier in a month, when I will be back on that piece of code without even knowing
it was there. The feeling you get then is the same a new person has when they look at
your project for the first time.</p>
<p>There is no way you can get it right from the beginning, because there is no
definition of right. On day one everything you write is mainly for yourself; in a
month, with some editing, it will become the first version of the documentation for
your project.</p>
<h3 id="logging-and-instrumentation-library">Logging and instrumentation library</h3>
<p>As I said at the beginning of the article, none of these checks depend on the
business logic of your application or library. Every application has to speak with the
outside world, sharing its internal state in a way that is reusable,
comprehensive and configurable.</p>
<p>There are a lot of people who speak about observability, logging, tracing and
monitoring. Everybody has their own opinion, but from a technical point of view,
what you write has to be easy to troubleshoot.</p>
<p>You do it using the right telemetry libraries. For logging I do not have any
doubt. In Go I use <a href="https://github.com/uber-go/zap">zap</a>.</p>
<p>For a workshop about observability I built, where I had to instrument four
applications in different languages, I selected:</p>
<ul>
<li><a href="https://github.com/pinojs/pino">pino</a> for NodeJS</li>
<li><a href="https://github.com/Seldaek/monolog">monolog</a> for PHP</li>
<li><a href="https://logging.apache.org/log4j/2.x/">log4j</a> for Java</li>
</ul>
<p>In general I look for libraries that allow me to do structured logging, that
is, ones that let me attach key/value pairs to a log line. I also look for
logging libraries that have the concept of exporters and formats. Nothing unusual.</p>
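<p>To make the “key/value pairs attached to a log line” idea concrete, here is a tiny sketch with zap; the field names (<code>order_id</code>, <code>items</code>) are invented for illustration:</p>
<pre><code class="language-go">package main

import "go.uber.org/zap"

func main() {
	logger, _ := zap.NewProduction()
	defer logger.Sync()

	// Every piece of context is a typed key/value field,
	// not a string interpolated into the message.
	logger.Info("order processed",
		zap.String("order_id", "A-1234"),
		zap.Int("items", 3),
	)
}
</code></pre>
<p>With the production config this comes out as a single JSON line, which is exactly what makes it consumable by machines as well as humans.</p>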
<p>For tracing and events I do not have a favourite one, but I would like to see
<a href="https://opentelemetry.io">OpenTelemetry</a> become the way to go.</p>
<h3 id="continuous-integration">Continuous integration</h3>
<p>A project without CI can hardly be called a project. Nowadays there are a lot of
free services that you can use, so there is no excuse. When I am on GitHub I go for
Actions now, because they are free and embedded in the VCS platform itself.</p>
<p>If you haven't written any tests, at least get the process up and running. Just run
the tests; they usually do not fail if the suite is empty. And there are static
checkers, linters and tools like that for every language: set them up!</p>
<h3 id="continuous-delivery">Continuous delivery</h3>
<p>You made it through the CI part; you are halfway done. Releasing is important and we have
the tools to get it right from day one. Doing a release by hand is a pain; there are
a lot of potentially manual steps to get right:</p>
<ol>
<li>Bump version</li>
<li>Changelog</li>
<li>Compile and push binaries if it is an application</li>
<li>…</li>
</ol>
<p>There are tools that help you automate all of that. For my apps I use
<a href="https://github.com/goreleaser/goreleaser">goreleaser</a>, for libraries I use
<a href="https://github.com/marketplace/actions/release-drafter">Release Drafter</a>.</p>
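<p>To give an idea of how little it takes to start, this is roughly what a minimal <code>.goreleaser.yml</code> looks like; treat it as a sketch and check the GoReleaser documentation for the fields supported by your version:</p>
<pre><code class="language-yaml">builds:
  - env:
      - CGO_ENABLED=0
    goos:
      - linux
      - darwin
    goarch:
      - amd64
archives:
  - format: tar.gz
</code></pre>
<p>With a file like this in the repository root, a CI job running <code>goreleaser release</code> on a new tag takes care of building, archiving and publishing.</p>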
<h3 id="testing-framework">Testing framework</h3>
<p>Write tests, and when you see repeated code extract it in a testing package.
<code>zap</code> has <code>zaptest</code>, your project should have <code>yourprojecttest</code> as well.</p>
<p>It is useful for yourself, because it makes writing more tests
effortless, and if you document your testing package well, contributors will be
able to use it when opening a PR, because you will have made writing tests easier for
everybody. As a bonus, whoever uses your libraries can use the testing package
to write their own application tests.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This is the list I use, and I will keep it up to date now that I have written it
down, adding to it or editing it, so be sure to stay around!</p>
<p>I hope this checklist is general enough to be reusable for you, at least in
some of its parts.</p>
<p>What I like about it is that I do not need to be a CTO, a maintainer or
anything like that to drive the adoption of these points that I think are
crucial; I drove the adoption of some of them even as a solo contributor.</p>
Why code instrumentation?I decided to finally create a category about code instrumentation. Because I am a developer. And I think it matters. It is important to write better code and more reliable applications that we can learn from.https://gianarb.it/img/got-your-back.jpg2020-03-29T09:08:27+00:002020-03-29T09:08:27+00:00https://gianarb.it/blog/why-code-instrumentation<p>I am writing this blog post as a common introduction for a new category I would
like to write about consistently on my blog. If this is the first time you land
here: this is my blog, and I write about everything that catches my attention.
Sooner or later I realize I can group my posts into categories,
and that's what I am doing now.</p>
<p>Some of them are: <a href="/planet/assemble-kubernetes.html">Assemble Kubernetes</a>,
<a href="/planet/docker.html">Docker</a>, <a href="/planet/mockmania.html">MockMania</a>. This one
will be called <code>Code Instrumentation</code>.</p>
<p>There are a lot of people writing about observability and monitoring, and I did it
for the last 3 years as well. I learned a lot along the way, but what I think is
crucial is that developers have to write code that is
understandable and easy to debug where it is most valuable: in production. And
if an application or a system is hard to figure out, we as developers play a
major role in that.</p>
<p>That’s why Site Reliability Engineering (SRE) is not related to ops, servers,
Kubernetes but it is something that plays its match in your code.</p>
<p>That's why I think SRE and DevOps are different, not at all connected.</p>
<p>The technologies that are leading the landscape are:</p>
<ol>
<li>Prometheus, but not the time series database: its client libraries and the
exposition format, now branded by the community and the Cloud Native
Computing Foundation (CNCF) as OpenMetrics</li>
<li>OpenTracing, OpenCensus and OpenTelemetry. They share a single bullet
point because I think of them as consequences of one another, ending with what
I hope is “THE LAST ONE”: OpenTelemetry. They are instrumentation libraries
and specifications to increase interoperability and to avoid vendor lock-in
for what concerns distributed tracing and metrics. I hope logs will jump
on board at some point</li>
</ol>
<h2 id="prometheus-and-openmetrics">Prometheus and OpenMetrics</h2>
<p>I wrote about this topic previously, so have a look there if you do not know
what I am speaking about.</p>
<p>I think they are worth mentioning here because that's how I learned the effect
of good or bad code instrumentation, and the fact that it has to happen in your
code, when you develop it.</p>
<p>It has the same weight as writing a good data structure, writing solid unit
tests, or picking the right design pattern.</p>
<h2 id="opentelemetry-otel">OpenTelemetry (otel)</h2>
<p>As I said, I will refer to otel when I can, not because I think OpenTracing or
OpenCensus are bad, but because I do not see this as a religion; for me it is
a technical problem, it is widespread, and it has to find a good answer.</p>
<p>Those communities decided to merge into otel in the way they are doing it. Good or
bad? We can get a beer at some point and I will tell you; it is out of scope here.</p>
<h2 id="what-i-am-gonna-talk-about">What I am gonna talk about</h2>
<p>This is probably a long introduction for a new category, but that's it. Over
the last two years I tried to share what I experienced around this topic with a
workshop called “Application Monitoring”. A lot of the articles I will
write come from there, as an attempt to share what I think worked or
failed.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="/planet/code-instrumentation.html">All about Code Instrumentation</a> from my blog</li>
<li><a href="https://github.com/gianarb/shopmany">ShopMany</a> is the application I developed for the workshop</li>
<li><a href="https://github.com/gianarb/workshop-observability">Workshop notes</a> contains notes, exercises and solutions for the lessons I
proposed in the workshop itself</li>
<li><a href="https://www.honeycomb.io/blog/">honeycomb</a> because when you speak about o11y you have to quote them!</li>
<li><a href="/tinyletter.html">My newsletter</a> is probably the best way to stay in touch with the content I
create</li>
<li><a href="https://twitter.com/gianarb">Twitter</a> is the best way to stay in touch with me</li>
</ul>
CNCF Webinar: Continuous Profiling Go Application Running in KubernetesSlides, videos and links from a webinar I gave with the CNCF about Kubernetes, profefe, Golang and pprof.https://gianarb.it/img/cncf-logo.png2020-03-27T09:08:27+00:002020-03-27T09:08:27+00:00https://gianarb.it/blog/cncf-webinar-kubernetes-pprof-profefe<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/SzhQZQ6VGoY" allowfullscreen=""></iframe>
</div>
<p>Microservices and Kubernetes help our architecture to scale and to be
independent, at the price of running many more applications. Golang provides a
powerful profiling tool called pprof; it is useful to collect information from a
running binary for future investigation. The problem is that you are not always
there to take a profile when needed; sometimes you do not even know when you
need one. That's where a continuous profiling strategy helps. Profefe is an
open-source project that collects and organizes profiles. Gianluca wrote a
project called kube-profefe to integrate Kubernetes with Profefe. Kube-profefe
contains a kubectl plugin to capture profiles from running pods in Kubernetes,
locally or on profefe. It also provides an operator to discover and continuously
profile applications running inside Pods.</p>
<p>A bunch of links for you:</p>
<ul>
<li><a href="/blog/go-continuous-profiling-profefe">My article: Continuous profiling in Go with Profefe</a></li>
<li><a href="/blog/continuous-profiling-go-apps-in-kubernetes">My article: Continuous Profiling Go applications running in Kubernetes</a></li>
<li><a href="https://research.google/pubs/pub36575/">Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers</a></li>
<li><a href="https://github.com/profefe/profefe">Profefe on Github</a></li>
<li><a href="https://github.com/profefe/kube-profefe">Kube Profefe on Github</a></li>
<li><a href="https://github.com/google/pprof">google/pprof</a> library on GitHub</li>
<li><a href="https://kubernetes.profefe.dev">Work in progress documentation! help me out!</a></li>
</ul>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="//speakerdeck.com/player/ff55e041659945bca5d31013bd999c28" allowfullscreen=""></iframe>
</div>
How to do testing with zap a popular logging library for GolangHow you can use logging to build assertions when testing. What the popular Golang logging library provided by Uber gives you around unit tests.https://gianarb.it/img/golang-mockmania.png2020-03-24T09:08:27+00:002020-03-24T09:08:27+00:00https://gianarb.it/blog/golang-mockmania-zap-logger<div class="alert alert-dark" role="alert">
<div class="row">
<div class="col-md-2 align-self-center">
<a href="https://link.testproject.io/0ak" target="_blank">
<img class="img-fluid" src="/img/testproject-logo-small.png" />
</a>
</div>
<div class="col-md-8 text-center">
<a href="https://link.testproject.io/0ak" class="alert-link" target="_blank">TestProject</a> is a community all
about testing and you know how much I love communities! Join us.
</div>
</div>
</div>
<p>If you follow me on <a href="https://twitter.com/gianarb">twitter</a> you know that
I am passionate about o11y, monitoring and code instrumentation.</p>
<p>I see logs not as random print statements that you use only when something is
wrong; they have value. Logs are the communication channel our applications
use. As developers it is our job to make them speak in a comprehensible way.</p>
<p>Logs should be structured and in some way consistent across functions, HTTP
handlers, applications, even languages, to simplify their use by both algorithms
and human operators.</p>
<p>In Go, <a href="https://github.com/uber-go/zap">zap</a> is a popular logging library provided by Uber; I
use it almost by default for all my applications.</p>
<pre><code class="language-go">package main

import "go.uber.org/zap"

func main() {
	logger, _ := zap.NewProduction()
	do(logger)
}

func do(logger *zap.Logger) {
	logger.Error("Start doing things")
}
</code></pre>
<p>So logging and testing? In the same article? I must be really drunk!</p>
<p class="text-center"><img src="/img/kermit-frog-drunk.jpg" alt="" class="img-fluid" /></p>
<p>When I discovered that <code>zap</code> comes with a testing utility package called
<code>zaptest</code> I fell in love with this library even more:</p>
<pre><code class="language-go">package main

import (
	"testing"

	"go.uber.org/zap/zaptest"
)

func Test_do(t *testing.T) {
	logger := zaptest.NewLogger(t)
	do(logger)
}
</code></pre>
<p>The <code>go test</code> command supports the <code>-v</code> flag to increase the verbosity of the test
execution. In practice, that's how you forward logs and print statements to <code>stdout</code>
during a test run. <code>zaptest</code> works with that as well.</p>
<p>Very cool, and useful if you write smoke tests, pipeline tests, or however you
call them, where the logs can be spammy but helpful to figure out the actual
issue.</p>
<pre><code class="language-go">package main

import (
	"testing"

	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
	"go.uber.org/zap/zaptest"
)

func Test_do(t *testing.T) {
	logger := zaptest.NewLogger(t, zaptest.WrapOptions(zap.Hooks(func(e zapcore.Entry) error {
		if e.Level == zap.ErrorLevel {
			t.Fatal("Error should never happen!")
		}
		return nil
	})))
	do(logger)
}
</code></pre>
<p>You can use <code>hooks</code> to check for expected or unexpected logs.</p>
<p>Hooks are executed for every log line:</p>
<pre><code class="language-go">func(e zapcore.Entry) error {
	if e.Level == zap.ErrorLevel {
		t.Fatal("Error should never happen!")
	}
	return nil
})
</code></pre>
<p>If you do not expect any error-level log line for your execution, because you are
testing the happy path, you can do something like this.</p>
<p><strong>DISCLAIMER:</strong> This is just another way to write assertions. You may use them to enforce
other checks, or to validate the workflow from a different point of view that
may be easier as a first attempt. As I usually say: “an easy and
partial test is better than no test”.</p>
<p>Do not test only logs, it won’t age well! Keep writing good tests!</p>
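<p>Besides hooks, <code>zap</code> also ships an <code>observer</code> package under <code>zaptest/observer</code> that records every entry in memory, so you can assert on messages, levels and fields after the fact. A minimal sketch, reusing the <code>do</code> function from the first example:</p>
<pre><code class="language-go">package main

import (
	"testing"

	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
	"go.uber.org/zap/zaptest/observer"
)

func Test_do_logs(t *testing.T) {
	// observer.New returns a core that buffers entries in memory
	// and a handle to inspect them.
	core, logs := observer.New(zapcore.InfoLevel)
	logger := zap.New(core)

	do(logger)

	if logs.Len() != 1 {
		t.Fatalf("expected 1 log entry, got %d", logs.Len())
	}
	entry := logs.All()[0]
	if entry.Message != "Start doing things" {
		t.Errorf("unexpected message: %q", entry.Message)
	}
	if entry.Level != zap.ErrorLevel {
		t.Errorf("unexpected level: %v", entry.Level)
	}
}
</code></pre>
<p>Hooks are great to fail fast on unexpected lines; the observer is handy when you want to assert on exactly what was logged.</p>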
Show Me Your Code with Walter Dal Mut: Extend Kubernetes in NodeJSLet's try to go virtual! This is the first attempt by the CNCF Meetup in Turin to do something online! The series is called Show me your code. Walter Dal Mut from Corley will be the guinea pig to test this new format. Live show on YouTube about Kubernetes and how to use shared informers to extend its capabilities in Node.js.https://gianarb.it/img/show-me-your-code/ep1-thump.jpg2020-03-13T09:08:27+00:002020-03-13T09:08:27+00:00https://gianarb.it/blog/show-me-your-code-walter-dal-mut-kubernetes-nodejs-informers<h2 id="about-walter-dal-mut">About Walter Dal Mut</h2>
<p>Walter Dal Mut works as a Solutions Architect <a href="https://corley.it/">@Corley SRL</a>.
He is an electronic engineer who moved to Software Engineering and Cloud
Computing Infrastructures, passionate about technology in general and a lover of
the open source movement.</p>
<p>If you want, you can follow him on <a href="https://twitter.com/walterdalmut">Twitter</a>
and <a href="https://github.com/wdalmut">GitHub</a>.</p>
<h2 id="kubernetes-extendibility-and-nodejs">Kubernetes extendibility and NodeJS</h2>
<p>Almost everybody who is currently working on Kubernetes, developing extensions,
controllers or operators, is doing it in Go. That's reasonable, because Kubernetes is
written in Go and there is a lot of code you can reuse in that language.</p>
<p>What if you are not a Go developer?</p>
<p>Walter coded a shared informer in NodeJS that watches and takes actions on Pod
events.</p>
<h2 id="links">Links</h2>
<ul>
<li>The code you saw in the video lives here
<a href="https://github.com/wdalmut/k8s-informer-ytlive">wdalmut/k8s-informer-ytlive</a></li>
<li><a href="https://get.oreilly.com/ind_extending-kubernetes.html">Extend Kubernetes O’Reilly report</a></li>
<li><a href="https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html">A deep dive into Kubernetes
controllers</a></li>
</ul>
How to test CLI commands made with Go and CobraCLI commands are common in Go. Testing them is an effective way to run a big amount of code that is actually very close to the end user. I use Cobra, pflags and Viper and that's what I do when I write unit test for Cobra commandshttps://gianarb.it/img/golang-mockmania.png2020-03-09T09:08:27+00:002020-03-09T09:08:27+00:00https://gianarb.it/blog/golang-mockmania-cli-command-with-cobra<p>Almost everything is a CLI application when writing Go. At least for me. Even
when I write an HTTP daemon I still have to design a UX for configuration
injection, environment variables, flags and things like that.</p>
<p>The set of libraries I use is very standard, I use
<a href="https://github.com/spf13/cobra">Cobra</a>,
<a href="https://github.com/spf13/pflag">pflags</a> and occasionally
<a href="https://github.com/spf13/viper">Viper</a>. I can say, without a doubt, that <a href="https://twitter.com/spf13">Steve
Francia</a> is awesome!</p>
<p>This is what a command looks like, straight from the Cobra documentation:</p>
<pre><code>var rootCmd = &cobra.Command{
	Use:   "hugo",
	Short: "Hugo is a very fast static site generator",
	Long: `A Fast and Flexible Static Site Generator built with
love by spf13 and friends in Go.
Complete documentation is available at http://hugo.spf13.com`,
	Run: func(cmd *cobra.Command, args []string) {
		// Do Stuff Here
	},
}
</code></pre>
<p>I like to write a constructor function that returns a command; in this case it
will be something like:</p>
<pre><code>func NewRootCmd() *cobra.Command {
	return &cobra.Command{
		Use:   "hugo",
		Short: "Hugo is a very fast static site generator",
		Long: `A Fast and Flexible Static Site Generator built with
love by spf13 and friends in Go.
Complete documentation is available at http://hugo.spf13.com`,
		Run: func(cmd *cobra.Command, args []string) {
			// Do Stuff Here
		},
	}
}
</code></pre>
<p>The reason I like to have this function is that it helps me clearly
see the dependencies my command requires. In this case, none. I also prefer
the RunE function over Run: it works the same way but it returns
an error:</p>
<pre><code>func NewRootCmd(in string) *cobra.Command {
	return &cobra.Command{
		Use:   "hugo",
		Short: "Hugo is a very fast static site generator",
		Long: `A Fast and Flexible Static Site Generator built with
love by spf13 and friends in Go.
Complete documentation is available at http://hugo.spf13.com`,
		RunE: func(cmd *cobra.Command, args []string) error {
			fmt.Fprint(cmd.OutOrStdout(), in)
			return nil
		},
	}
}
</code></pre>
<p>In order to execute the command, I use <code>cmd.Execute()</code>.</p>
<p>Let’s write a test function:</p>
<pre><code>func Test_ExecuteCommand(t *testing.T) {
	cmd := NewRootCmd("hi")
	cmd.Execute()
}
</code></pre>
<p>The output with <code>go test -v</code> contains “hi” because by default Cobra prints to
stdout, but we can replace it to assert the output automatically:</p>
<pre><code>=== RUN Test_ExecuteCommand
hi--- PASS: Test_ExecuteCommand (0.00s)
PASS
ok ciao 0.006s
</code></pre>
<p>The trick here is to replace stdout with something that we can read
programmatically, like a <code>bytes.Buffer</code>:</p>
<pre><code class="language-go">func Test_ExecuteCommand(t *testing.T) {
	cmd := NewRootCmd("hi")
	b := bytes.NewBufferString("")
	cmd.SetOut(b)
	cmd.Execute()
	out, err := ioutil.ReadAll(b)
	if err != nil {
		t.Fatal(err)
	}
	if string(out) != "hi" {
		t.Fatalf("expected \"%s\" got \"%s\"", "hi", string(out))
	}
}
</code></pre>
<p>Personally, I do not think there is much more to know in order to effectively
test CLI commands. They can be very complex, but if you can mock their
dependencies and check what the execution prints out, you are very flexible!</p>
<p>Another thing you have to control when running a command is its arguments and
its flags, because based on them you get different behaviors that you have to
cover to make sure your command works with all of them.</p>
<p>The logic is the same for both, but arguments are very easy: you just have to
set them on the command with
<code>cmd.SetArgs([]string{"hello-by-args"})</code>.</p>
<pre><code class="language-go">package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"testing"

	"github.com/spf13/cobra"
)

func NewRootCmd() *cobra.Command {
	return &cobra.Command{
		Use:   "hugo",
		Short: "Hugo is a very fast static site generator",
		RunE: func(cmd *cobra.Command, args []string) error {
			fmt.Fprint(cmd.OutOrStdout(), args[0])
			return nil
		},
	}
}

func Test_ExecuteCommand(t *testing.T) {
	cmd := NewRootCmd()
	b := bytes.NewBufferString("")
	cmd.SetOut(b)
	cmd.SetArgs([]string{"hi-via-args"})
	cmd.Execute()
	out, err := ioutil.ReadAll(b)
	if err != nil {
		t.Fatal(err)
	}
	if string(out) != "hi-via-args" {
		t.Fatalf("expected \"%s\" got \"%s\"", "hi-via-args", string(out))
	}
}
</code></pre>
<p>Flags work the same way:</p>
<pre><code class="language-go">package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"testing"

	"github.com/spf13/cobra"
)

var in string

func NewRootCmd() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "hugo",
		Short: "Hugo is a very fast static site generator",
		RunE: func(cmd *cobra.Command, args []string) error {
			fmt.Fprint(cmd.OutOrStdout(), in)
			return nil
		},
	}
	cmd.Flags().StringVar(&in, "in", "", "This is a very important input.")
	return cmd
}

func Test_ExecuteCommand(t *testing.T) {
	cmd := NewRootCmd()
	b := bytes.NewBufferString("")
	cmd.SetOut(b)
	cmd.SetArgs([]string{"--in", "testisawesome"})
	cmd.Execute()
	out, err := ioutil.ReadAll(b)
	if err != nil {
		t.Fatal(err)
	}
	if string(out) != "testisawesome" {
		t.Fatalf("expected \"%s\" got \"%s\"", "testisawesome", string(out))
	}
}
</code></pre>
<p>This is it! I really like writing unit tests for CLI commands because in real
life they are way more complex than the ones I used here. They run
a lot more functions, but the command is well scoped in terms of dependencies (if
you write a constructor function) and in terms of input and output. So it is
easy to write assertions and to write table tests with different inputs.</p>
Smart working does not need to be remoteThere is a difference between remote work and smart work. You can have both, or just one. It is on you. I prefer both at the moment!https://gianarb.it/img/gianarb.png2020-03-04T09:08:27+00:002020-03-04T09:08:27+00:00https://gianarb.it/blog/smart-working-does-not-need-to-be-remote<p>I have been working remotely for almost 3 years now and I am happy with it. First of all,
because the company I work for, InfluxData, is not based where I live. I
am currently in Turin and it is in San Francisco, so the only way for me to do
the kind of work I am doing today is to be remote. I moved to Dublin for 2 years
because my English was not what I hoped it to be and I was looking for a quick way
to make it right. Beers and good friends helped me succeed! I still have to
use Grammarly to write blog posts but hey, at least I can write them.</p>
<p>Anyway, I like to work with people from all over, and that’s why I am not even
considering going back to work for a small company in Italy
for now. Right now it almost feels like getting paid to work in the
same environment I used to work in for free when contributing to open source
communities. People from all over the world share my conditions in terms of time commitment.
I just have to work hard and do what it takes to make my team and the
company improve. This is, in essence, the difference between remote work
and smart working. You can work remotely exactly as you would in an office: you have
to be at your desk, at a set time, for 8 hours, even when it is raining, even
when you feel unproductive.</p>
<p>This rambling is to remind myself that I do not like remote working, I like
smart working. I like being able to organize my time because my boss trusts my
ability to judge the situation. I can work day and night and take an
entire day off if I feel like I have to. Obviously it is a huge responsibility and a
risk for both sides: as a company you do not have the common framework that we
built over the years to figure out how productive an employee is, and as a worker,
you have to develop the right skills to read the situation you are in. But this
ability helped me to be a conscious developer and not just a code generator that
solves self-built challenges.</p>
<p class="text-center"><img src="/img/cruna-ago.jpg" alt="Middle East, Needle, Threads, Sewing Thread" class="img-fluid" /></p>
<p class="small text-center">Hero image via <a href="https://pixabay.com/photos/middle-east-needle-threads-4854847/">Pixabay</a></p>
<p>So I would like to rephrase the title: “Smart working does not need to be
remote, but it helps”.</p>
<p>I realized the difference because I had to get smarter: my company is 9 hours
behind my current time and I work alone a lot. I have to read in advance what the
product team or my product manager will ask me to work on, because I have to build a
buffer that keeps me busy when the task I am resolving is blocked
or I can’t get around it without reaching out to a person who will probably be
offline. Maybe I can, but it would take so much effort that it is smarter for
me to just wait. It is the equivalent of procrastinating until you can shake the chair
of the coworker who wrote the freaking recursive function that you have to
debug. The problem is that you never know if the coworker will even show up.</p>
<p>Companies, you have to set up your workload to assist smart workers, not just remote
workers. In tech at least, where this is a reachable goal.</p>
<p>I think it is a temporary condition. Right now I feel like I need both: remote,
to be able to work where I think I will learn or perform better, or where it is
more fun; and smart, because I like developing organizational skills and feeling
like the master of my own clock.</p>
<p>Who knows how it will evolve.
Thank you for your time.</p>
The awesomeness of the httptest package in GoOne of the reasons why testing in Go is friendly is driven by the fact that the core team already provides useful testing package as part of the stdlib that you can use, as they do to test packages that depend on them. This article explains how to use the httptest package to mock HTTP servers and to test sdks that use the http.Client.https://gianarb.it/img/golang-mockmania.png2020-02-25T09:08:27+00:002020-02-25T09:08:27+00:00https://gianarb.it/blog/golang-mockmania-httptest<p>Go has a nice http package. I am able to say that because I am not aware of any
other implementation of it in Go other than the one provided by the standard
library. This is for me a good sign.</p>
<pre><code class="language-go">resp, err := http.Get("http://example.com/")
if err != nil {
	// handle error
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
</code></pre>
<p>This example comes from the <a href="https://golang.org/pkg/net/http/">documentation</a>
itself.</p>
<p>We are here to read about testing, so who cares about the http package itself!
What matters is the <a href="https://golang.org/pkg/net/http/httptest/">httptest</a>
package! Way cooler.</p>
<p>This article is not the first one in the MockMania series: I already wrote one titled
<a href="https://gianarb.it/blog/golang-mockmania-influxdb-v2-client">“InfluxDB Client
v2”</a> that uses the
httptest package! But hey, httptest deserves its own blog post.</p>
<h2 id="server-side">Server Side</h2>
<p>The http package provides a client and a server. The server is made of handlers.
The handler takes a request and based on that it returns a response. This is its
interface:</p>
<pre><code class="language-go">type Handler interface {
	ServeHTTP(ResponseWriter, *Request)
}
</code></pre>
<p>As you can see it gets a ResponseWriter to compose a response based on the
Request it receives. This process can be as complicated as you like; it can reach
databases and third-party services, but in the end it writes a response.</p>
<p>It means that, after mocking all the dependencies to set up the right scenario, we can use
the ResponseWriter to figure out if the handler did what we want.</p>
<p>The httptest package provides a replacement for the ResponseWriter called
ResponseRecorder. We can pass it to the handler and check what it looks like
after the execution:</p>
<pre><code class="language-go">handler := func(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "ping")
}

req := httptest.NewRequest("GET", "http://example.com/foo", nil)
w := httptest.NewRecorder()
handler(w, req)

resp := w.Result()
body, _ := ioutil.ReadAll(resp.Body)
fmt.Println(resp.StatusCode)
fmt.Println(string(body))
</code></pre>
<p>This handler is very simple: it just manipulates the response body. If your
handler is more complicated and has dependencies, you have to be sure to
replace them as well, injecting the appropriate ones.</p>
<h2 id="client-side">Client-Side</h2>
<p>Testing handlers only helps when you are the one writing them. The Go http package also provides an http
client that you can use to interact with an http server. An http client
by itself is useless, but it is the entry point for all the manipulation and
transformation you do on the information you get via HTTP. With the
proliferation of microservices, this is a very common situation.</p>
<p>The workflow is well understood: you have an HTTP backend to interact with, you
fetch data from it and you manipulate the data with your business logic. When
testing, what you can do is mock the http backend so it returns what you
want, verifying that your business logic does what it is supposed to do based on
the input it gets from the HTTP server.</p>
<p>In the first example the handler was the subject of our testing. That is
not the case anymore: we are testing the consumer this time, so we have to mimic
a handler that returns what we expect:</p>
<pre><code class="language-go">ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "I am a super server")
}))
defer ts.Close()
</code></pre>
<p>As you can see we are creating a new HTTP server via httptest. It accepts a
handler, whose goal is to return what we would like to test our code
against. In theory, it should just use the ResponseWriter to compose the response we
expect.</p>
<p>The server has a bunch of methods; the one you are looking for is URL,
because we can point an http.Client at it, the one we will use as a mock in our
function:</p>
<pre><code class="language-go">res, err := http.Get(ts.URL)
if err != nil {
	log.Fatal(err)
}
bb, err := ioutil.ReadAll(res.Body)
res.Body.Close()
</code></pre>
<p>That’s it, as you can see <code>ts.URL</code> points the http.Client to the mock server we
created.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I use the httptest package a lot, even when writing SDKs for services that do not
have a Go integration, because I can follow their documentation, mock their
server, and avoid reaching the real one until I am confident with the code I
wrote.</p>
<p>My suggestion is to test your client code for edge cases as well, because
httptest.Server gives you the flexibility to write any response you can think
of. You can mimic an unauthorized response to see how your code handles
it, or an empty body, or a rate limit. The only limit is our laziness.</p>
Golang MockMania InfluxDB Client v2https://gianarb.it/img/golang-mockmania.png2020-02-09T09:08:27+00:002020-02-09T09:08:27+00:00https://gianarb.it/blog/golang-mockmania-influxdb-v2-client<p>Recently I had to develop an integration with the <a href="https://github.com/influxdata/influxdb-client-go">InfluxDB Client v2 Golang
SDK</a>.</p>
<p>This SDK is useful to interact with InfluxDB v2, create organizations and users,
write new points, and submit queries; it accepts the Golang http.Client.</p>
<pre><code class="language-golang">influx, err := influxdb.New(myHTTPInfluxAddress, myToken, influxdb.WithHTTPClient(myHTTPClient))
if err != nil {
	panic(err)
}
</code></pre>
<p>Having the ability to pass the HTTP client from the outside with
<code>influxdb.WithHTTPClient(myHTTPClient)</code> builds on the familiarity Go
developers have with the standard library; they know how to configure Transports or how
to inject logging, tracing, debugging. For what concerns <code>Golang MockMania</code>, it
gives us the possibility to pass the
<a href="https://golang.org/pkg/net/http/httptest/#example_Server">httptest</a> client.</p>
<pre><code class="language-golang">influxDBServer := httptest.NewServer(http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
}))
influxClient, err := influxdb.New(myHTTPInfluxAddress, myToken, influxdb.WithHTTPClient(influxDBServer.Client()))
</code></pre>
<p>At this point you can write the response you expect from the influxdb server
using the <code>http.ResponseWriter</code>.</p>
<p>Either way, whether you have to check what influxdb receives from the SDK or
you have to return a specific answer from InfluxDB to validate what your
business logic will do, nothing stops you from inspecting the
http.Request or using the http.ResponseWriter to get what you expect.</p>
Continuous Profiling Go applications running in KubernetesKube-Profefe is an open source project that acts like a bridge between Kubernetes and Profefe. It helps you to implement continuous profiling for Go applications running in Kubernetes.https://gianarb.it/img/profefe.png2020-02-04T09:08:27+00:002020-02-04T09:08:27+00:00https://gianarb.it/blog/continuous-profiling-go-apps-in-kubernetes<p>Recently I wrote <a href="https://gianarb.it/blog/go-continuous-profiling-profefe">“Continuous profiling in Go with
Profefe”</a>, an article
about the new shiny open source project I am contributing to.</p>
<p><strong>TLDR:</strong> Profefe is a registry for pprof profiles. You can push them by embedding
an SDK in your application or you can write a collector (cronjob) that gets
profiles and pushes the tar via the Profefe API. Side by side with the profile you
have to send other information like:</p>
<ul>
<li>Type: represents the profile type such as mutex, goroutines, CPU and so on</li>
<li>Service: identifies the source for this profile, for example, the binary name</li>
<li>InstanceID: identifies where it comes from, for example, pod name or server
hostname</li>
<li>Labels: are optional key/value pairs that you can use at query time to filter
profiles. If you are building the same service with two different Go versions
to check for performance degradation you can label the profiles with
<code>go=1.13.4</code> for example.</li>
</ul>
<p>The article has way more content but that’s enough. You can keep reading with
only this information.</p>
<h2 id="kubernetes">Kubernetes</h2>
<p>As you know, at InfluxData we use Kubernetes. Our services already expose the
<a href="https://golang.org/pkg/net/http/pprof/">pprof HTTP handler</a> and we cannot
instrument all of them with the Profefe SDK; for those reasons we had to take a
different approach and write our own collector, capable of getting pprof
profiles via the Kubernetes API and pushing them into Profefe. I wrote a project called
<a href="https://github.com/profefe/kube-profefe">kube-profefe</a>. It acts as a bridge
between the Profefe API and Kubernetes. The repository provides two different
binaries:</p>
<ul>
<li>A kubectl plugin that you can install (even via krew) that serves useful
utilities to interact with the profefe API (profefe at the moment does not
have a CLI) and to capture profiles from running pods.</li>
<li>A collector that can run as a cronjob; it goes pod by pod looking for profiles
to collect and pushes them to Profefe.</li>
</ul>
<h2 id="architecture">Architecture</h2>
<p>In order to configure the collector or to capture profiles from a running
container, kube-profefe leverages pod annotations. Only pods with the annotation
<code>pprof.com/enable=true</code> will be taken into consideration by kube-profefe.
The other annotations are optional or have default values; this is the only one
that has to be set to make kube-profefe aware of your pod.</p>
<p>The example below shows a Pod spec that enables profefe capabilities:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: influxdb-v2
  annotations:
    "profefe.com/enable": "true"
    "profefe.com/port": "9999"
spec:
  containers:
  - name: influxdb
    image: quay.io/influxdb/influxdb:2.0.0-alpha
    ports:
    - containerPort: 9999
</code></pre>
<p>As you can see there are other annotations, such as <code>profefe.com/port</code>, which by default
is 6060. In this case it is set to 9999 because that’s where the pprof HTTP
handler runs in InfluxDB v2. A full list of annotations is maintained in the
project’s README.md.</p>
<p>There is not a lot more to know about the underlying mechanism that empowers
kube-profefe, so let’s deep dive into both components: the kubectl plugin and
the collector.</p>
<h2 id="kubectl-profefe-the-kubectl-plugin">Kubectl-profefe: the kubectl plugin</h2>
<p>A kubectl plugin is nothing more than a binary located in your $PATH with the
name prefix “kubectl-”. In my case the binary is released with the name
kubectl-profefe; when it is in your $PATH you will be able to run a command
like:</p>
<pre><code class="language-bash">$ kubectl profefe --help
It is a kubectl plugin that you can use to retrieve and manage profiles in Go.
Usage:
kubectl-profefe [flags]
kubectl-profefe [command]
Available Commands:
capture Capture gathers profiles for a pod or a set of them. If can filter by namespace and via label selector.
get Display one or many resources
help Help about any command
load Load a profile you have locally to profefe
Flags:
-A, --all-namespaces If present, list the requested object(s) across all namespaces. Namespace in current context is ignored even if specified with --namespace.
--as string Username to impersonate for the operation
--as-group stringArray Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--cache-dir string Default HTTP cache directory (default "/home/gianarb/.kube/http-cache")
--certificate-authority string Path to a cert file for the certificate authority
--client-certificate string Path to a client certificate file for TLS
--client-key string Path to a client key file for TLS
--cluster string The name of the kubeconfig cluster to use
--context string The name of the kubeconfig context to use
-f, --filename strings identifying the resource.
-h, --help help for kubectl-profefe
--insecure-skip-tls-verify If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string Path to the kubeconfig file to use for CLI requests.
-n, --namespace string If present, the namespace scope for this CLI request
-R, --recursive Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory. (default true)
--request-timeout string The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
-l, --selector string Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2)
-s, --server string The address and port of the Kubernetes API server
--token string Bearer token for authentication to the API server
--user string The name of the kubeconfig user to use
Use "kubectl-profefe [command] --help" for more information about a command.
</code></pre>
<p>This output should look very familiar to you: there are a lot of options usable
with any other native kubectl command, mainly around authentication (--user,
--server, --kubeconfig, --client-certificate…) or around pod selection (-l,
--selector, -n, --namespace, --all-namespaces). If you are curious about how to
write a friendly kubectl plugin I wrote <a href="https://gianarb.it/blog/kubectl-flags-in-your-plugin">“kubectl flags in your
plugin”</a>, check it out.</p>
<p>This plugin, even if it is not native, uses the same authentication mechanism as
kubectl, so wherever kubectl works, this plugin should work
as well.</p>
<p>The pod selectors -l, -n, for example, are useful when running the command:</p>
<pre><code>$ kubectl profefe capture
</code></pre>
<p>Capture, as the name suggests, goes straight to one or more pods and
downloads various profiles or pushes them to profefe. It is very flexible: you can
capture pprof profiles from a specific pod (or multiple pods) by ID:</p>
<pre><code>$ kubectl profefe capture <pod-id>,<pod-id>...
</code></pre>
<p><em>NB: just remember to use the namespace where the pods are running with the flag
-n or --namespace.</em></p>
<p>You can use the pod selectors to collect multiple profiles:</p>
<pre><code>$ kubectl profefe capture -n web
</code></pre>
<p>This captures profiles from all the pods with the pprof.com/enable=true annotation
running in the web namespace and stores them under the <code>/tmp</code> directory.
You can change the output directory with <code>--output-dir</code>. If you do not want to
store them locally you can push them to profefe, specifying its location via
<code>--profefe-hostport</code>.</p>
<p>There are other combinations for the capture command, and you can also get profiles back from
profefe; I will leave the rest to you!</p>
<p class="text-center"><img src="/img/stopwatch.jpg" alt="" class="img-fluid" /></p>
<p class="small text-center">Hero image via <a href="https://pixabay.com/illustrations/time-time-management-stopwatch-3216244/">Pixabay</a></p>
<h2 id="kprofefe-the-collector">Kprofefe: the collector</h2>
<p>The main responsibility of the collector is to make the continuous profiling
magic happen! It uses the same mechanism we already saw for the capture
command, but it is a single binary and it can run as a cronjob.</p>
<pre><code>apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kprofefe-allnamespaces
  namespace: profefe
spec:
  concurrencyPolicy: Replace
  jobTemplate:
    metadata:
    spec:
      template:
        spec:
          containers:
          - args:
            - --all-namespaces
            - --profefe-hostport
            - http://profefe-collector:10100
            image: profefe/kprofefe:v0.0.8
            imagePullPolicy: IfNotPresent
            name: kprofefe
          restartPolicy: Never
          serviceAccount: kprofefe-all-namespaces
          serviceAccountName: kprofefe-all-namespaces
  schedule: '*/10 * * * *'
  successfulJobsHistoryLimit: 3
</code></pre>
<p>You can run a single cronjob that iterates over all the pods across all the
namespaces, or you can deploy multiple cronjobs; playing with the label selector
(-l) and the namespace selector (-n) you can configure the ownership for every
running cronjob. The reasons to split into multiple cronjobs can be:</p>
<ul>
<li>Scalability: one cronjob is not enough, so you can have one per namespace
for example</li>
<li>Time segmentation: with a single cronjob all the pod
profiles get captured at the same frequency, but you may want
high-frequency profiles for a specific subset of applications and a lower
density for others.</li>
</ul>
<p>Documentation about <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/">“Label and
Selector”</a>
for your reference.</p>
<p><em>Note: serviceAccount is required only if you have RBAC enabled (you should)
because the collector needs access to Kubernetes API to list/view pods across
all namespaces in this case.</em></p>
<h2 id="conclusion">Conclusion</h2>
<p>There is a lot to do in both the collector and the kubectl plugin. I would like to
add logs and monitoring to the collector, for example. The kubectl plugin’s <code>get
profiles</code> command needs some love, ideally using the same format that <code>kubectl
get</code> has via
<a href="https://github.com/kubernetes/cli-runtime/tree/master/pkg/printers">kubernetes/cli-runtime/pkg/printers</a>.
Try it, contribute and <a href="https://twitter.com/gianarb">let me know</a>!</p>
Make boring tasks enjoyable with go and collyRecently I had the idea to update the conference page on my website with the end goal to make it a bit more structured. Where structured means a bit more reusable compared with the static HTML table I used to have. I mixed a bit of hacky Go, colly for scraping and that's who I did ithttps://gianarb.it/img/go.png2020-01-23T09:08:27+00:002020-01-23T09:08:27+00:00https://gianarb.it/blog/make-boring-task-enjotable-with-go-colly<p>Recently I had the idea to update the <a href="/conferences.html">conference</a> page on my website with the end
goal to make it a bit more structured. Where structured means a bit more
reusable compared with the static HTML table I used to have.</p>
<p>In the beginning, I decided to do an HTML table for every year, listing all the
conferences as single rows. It worked, but I think at this point I can do
something even cooler: a single page for every conference talk, with YouTube
and slides embedded, the abstract, and a few links to deep dive into the topic.</p>
<p>Jekyll has a cool feature called
<a href="https://jekyllrb.com/docs/collections/">collections</a>: “Collections are a great
way to group related content like members of a team or talks at a conference.” I
decided to do a “my_talks” collection.</p>
<p>I first added the right configuration in the <code>_config.yaml</code> and I added my first
conference in 2020, DevOps Pro in Vilnius (see you there!!).</p>
<pre><code>collections:
  my_talks:
    output: true
</code></pre>
<p>I have created my first talk as a markdown file, just as I do for my posts:</p>
<pre><code>---
title: Continuous Profiling Go Application Running in Kubernetes
date: 2020-03-24
slide:
embedSlide:
video:
embedVideo:
eventName: DevOps Pro Europe
eventLink: https://devopspro.lt/
city: Vilnius, Lithuania
---
Microservices and Kubernetes help our architecture to scale and to be
independent at the price of running many more applications. Golang provides a
powerful profiling tool called pprof, it is useful to collect information from a
running binary for future investigation. The problem is that you are not always
there to take a profile when needed, sometimes you do not even know when you
need to one, that's how a continuous profiling strategy helps. Profefe is an
open-source project that collects and organizes profiles. Gianluca wrote a
project called kube-profefe to integrate Kubernetes with Profefe. Kube-profefe
contains a kubectl plugin to capture locally or on profefe profiles from running
pods in Kubernetes. It also provides an operator to discover and continuously
profile applications running inside Pods.
</code></pre>
<p>As you can see I decided to set a bunch of variables that I hope to re-use
when I build the “single page” for each talk.</p>
<p>That’s it. All done: 2020 looks awesome and I added a for loop in the conference
page to print out the row as before:</p>
<pre><code><div class="row">
  <h3></h3>
  <div class="col-md-12">
    <table class="table table-hover" id="">
      <thead>
        <tr>
          <th>Date</th>
          <th>Event</th>
          <th>Talk</th>
          <th>Slide</th>
        </tr>
      </thead>
      <tbody>
      </tbody>
    </table>
  </div>
</div>
</code></pre>
<p>In order to make everything a bit more reusable and organized, this piece of code
is what Jekyll calls an <a href="https://jekyllrb.com/docs/includes/">include</a>. The way I
use it inside the conference page looks like:</p>
<pre><code>{% assign talks2020 = site.my_talks | where:'date', "2020" %}
{% include talks_per_year.html year="2020" talks=talks2020 %}
</code></pre>
<p>Everything is working fine and I am pretty happy, but I have over 6 years of
talks to convert to this new format: over 50 conferences to migrate one
by one into files and YAML front matter.</p>
<p class="text-center"><img src="https://media.giphy.com/media/KFz5cubdh5eskezQ6d/giphy.gif" alt="https://media.giphy.com/media/KFz5cubdh5eskezQ6d/giphy.gif" class="img-fluid" /></p>
<h2 id="scraping-is-my-superpower">Scraping is my superpower</h2>
<p>I am not a fan of scraping things, and I had never done it before, but hey!
This solution looked less boring than doing it manually. I took a deep dive looking
for scraping libraries in new languages (yes, you always have to learn a new
language when doing a new side project), but in the end I discovered
<a href="https://github.com/gocolly/colly">colly</a>: “Elegant Scraper and Crawler
Framework for Golang”. I decided to be elegant and effective.</p>
<h2 id="a-bit-about-colly">A bit about Colly</h2>
<p>I have to say that it took me less than 2 hours to hack a script in Go using
Colly that converted all my tables year by year from HTML to files with the
format you saw above. I also added some sweet sugar like:</p>
<ul>
<li>Converting YouTube links, when detected, to their embeddable version</li>
<li>Converting and standardizing the start/end date for the talks, because the
format changed year by year (I am lazy and inconsistent! Don’t tell anybody)</li>
</ul>
<p>It was so easy that I didn’t write any tests… yep, that’s it. The file names are a
bit weird, but in the end it works, so who cares!</p>
<pre><code>$ tree ./_my_talks/
./_my_talks/
├── 2013-09-12-what-is-vagrant.markdown
├── 2014-02-c'è-un-modulo-zf2-per-tutto!---there-is-a-module-for-all.markdown
├── 2014-03-zend-queue.markdown
├── 2014-05-getting-start-chromecast-developer.markdown
├── 2014-05-vagrant,-riutilizzo-dell'infrastruttura---vagrant,-reuse-architecture.markdown
├── 2014-10-sviluppo-di-api-rest-con-zf2-&-mongodb.markdown
├── 2014-10-time-series-database,php-&-influx-db.markdown
├── 2015-01-angularjs-advanced-startup.markdown
├── 2015-06-delorean-made-in-home---reaspberry,-gobot-and-mqtt.markdown
├── 2015-07-joomla-and-scalability-with-aws-beanstalk.markdown
├── 2015-09-penny-php-middleware-framework.markdown
├── 2015-10-angularjs-in-cloud.markdown
├── 2015-10-doctrine-orm-cache-layer---it-is-not-a-boomerang.markdown
├── 2015-11-wordpress-and-scalability-with-docker.markdown
├── 2016-02-slimmer---poc-born-after-a-revolt-instant-vs-jenkins.markdown
├── 2016-03-a-zf-story:-parallel-made-easy.markdown
├── 2016-04-listen-your-infrastructure-and-please-sleep.markdown
├── 2016-05-continuous-delivery-with-jenkins-in-the-real-world.markdown
├── 2016-06-aws-under-the-hood.markdown
├── 2016-06-listen-your-infrastructure-and-please-sleep.markdown
├── 2016-06-parallel-made-easy.markdown
├── 2016-07-docker-1.12-and-orchestration-built-in.markdown
</code></pre>
<p class="text-center"><img src="https://i.kym-cdn.com/photos/images/newsfeed/000/345/534/4a2.jpg" alt="https://i.kym-cdn.com/photos/images/newsfeed/000/345/534/4a2.jpg" class="img-fluid" /></p>
<p>Anyway, let’s get to some snippets!</p>
<pre><code>type Talk struct {
Title string `yaml:"title"`
Date time.Time `yaml:"date"`
Slide string `yaml:"slide"`
EmbedSlide string `yaml:"embedSlide"`
Video string `yaml:"video"`
EmbedVideo string `yaml:"embedVideo"`
EventName string `yaml:"eventName"`
EventLink string `yaml:"eventLink"`
City string `yaml:"city"`
Links map[string]string `yaml:"links"`
}
var dateLayout = "_2 Jan 2006"
var year = "2020"
var outputDir = "/tmp"
var errorsToCheck = map[string]string{}
</code></pre>
<p>Those are the variables and the struct I set. <code>Talk</code> represents every single talk,
and <code>dateLayout</code> is the layout used to parse the start/end date into a <code>time.Time</code>
object. <code>year</code> is a parameter that tells which table to scrape, and <code>outputDir</code>
tells where to place the generated files. These three variables can be changed with CLI flags:</p>
<pre><code>flag.StringVar(&year, "year", "2020", "The year used to identify the table to parse")
flag.StringVar(&dateLayout, "date-layout", "_2 Jan 2006", "The golang format layout to parse the event date column")
flag.StringVar(&outputDir, "output-dir", "/tmp", "Where to place the generated files")
flag.Parse()
</code></pre>
<p><code>errorsToCheck</code> is an easy way to collect all the errors from every run. I
printed them to a file: if an error was easy to fix with a code change, I did
that; if it was easier to fix by modifying the current conference page, I did
that instead.</p>
<pre><code>// Instantiate default collector
c := colly.NewCollector(
// Visit only domains: coursera.org, www.coursera.org
colly.AllowedDomains("gianarb.it", "www.gianarb.it"),
// Cache responses to prevent multiple download of pages
// even if the collector is restarted
colly.CacheDir("./gianarb_cache"),
)
talks := []Talk{}
c.OnHTML("table[id=\""+year+"\"] tbody", func(e *colly.HTMLElement) {
e.ForEach("tr", func(_ int, row *colly.HTMLElement) {
talk := Talk{}
// for each line "tr" do amazing things
talks = append(talks, talk)
})
})
// Before making a request print "Visiting ..."
c.OnRequest(func(r *colly.Request) {
log.Println("visiting", r.URL.String())
})
err := c.Visit("https://gianarb.it/conferences.html")
if err != nil {
println(err)
}
</code></pre>
<p>This is how easy colly is to run. You configure the collector, and with
the function <code>OnHTML</code> you can look for whatever you need to scrape. In this case
I was looking for the table whose <code>id</code> equals the year passed from
the CLI. For each <code>tr</code> element I created a new talk and appended it to a slice. The
<code>talk</code> has to be populated with the actual values scraped cell by cell. It
means that for each row we need to look at each <code>td</code> (cell in HTML), and based on
its index we can identify the content. In my case it looks like this:</p>
<pre><code>c.OnHTML("table[id=\""+year+"\"] tbody", func(e *colly.HTMLElement) {
e.ForEach("tr", func(_ int, row *colly.HTMLElement) {
talk := Talk{}
row.ForEach("td", func(_ int, el *colly.HTMLElement) {
switch el.Index {
case 0:
// Date
case 1:
// Event Name and conference URL (task.EventLink)
case 3:
// Video and slides link
}
})
talks = append(talks, talk)
})
})
</code></pre>
<p>Let me show you how I coded case 3, the one that looks for the video or slides,
takes the link, and in the case of a YouTube video also converts the link into an
embeddable one:</p>
<pre><code>links := map[string]string{}
el.ForEach("a", func(_ int, el *colly.HTMLElement) {
switch el.Text {
case "Video":
talk.Video = el.Attr("href")
if strings.Contains(talk.Video, "youtube.com") {
u, err := url.Parse(talk.Video)
if err == nil {
talk.EmbedVideo = "https://www.youtube.com/embed/" + u.Query().Get("v")
} else {
errorsToCheck[row.Text+"/youtube_video_without_id"] = el.Text
}
} else {
errorsToCheck[row.Text+"/no_youtube_video"] = el.Attr("href")
}
case "Slides":
talk.Slide = el.Attr("href")
default:
links[el.Text] = el.Attr("href")
}
talk.Links = links
})
</code></pre>
<p>This is how I made a boring task enjoyable! And now I have all the talks converted
(minus two that didn’t get converted, but I will add those manually) and ready to be
rendered as posts.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This post should not start a useless war between static site generators,
WordPress, or whatever. If you follow me on
<a href="https://twitter.com/gianarb">Twitter</a> you know that I tweeted recently about
replacing Jekyll with something else, mainly because I was thinking about how to make
better use of the content I create. Digging deeper into Jekyll, I discovered
that for now I don’t need more than it offers, and changing tools would end up being a
useless and probably not-that-fun exercise. I am sure all the other tools like
WordPress, Hugo, and Gatsby have something similar.</p>
My experience with Krew to manage kubectl pluginsKubectl plugins are extremely useful to provide a set of friendly utilities to interact with kubernetes in your environment. Krew is a project that helps you manage the plugin lifecycle. I had to add profefe to it and this is what I learned.https://gianarb.it/img/kubernetes.png2020-01-16T09:08:27+00:002020-01-16T09:08:27+00:00https://gianarb.it/blog/my-experiene-with-krew-to-manage-kubectl-plugins<p>I have written a good number of kubectl plugins so far, but there is a lot more I can
do with them. Every time I write a new one I discover something new, and that is
why I am always excited to see what will happen with the next one.</p>
<p><a href="https://gianarb.it/blog/unit-testing-kubernetes-client-in-go">“Unit Testing Kubernetes Client in
Go”</a>
and <a href="https://gianarb.it/blog/kubectl-flags-in-your-plugin">“Kubectl flags in your kubectl
plugin”</a>
are two of the lessons learned along the way.</p>
<p>With
<a href="https://github.com/profefe/kube-profefe">kubectl-profefe</a>
I decided to have a look at
<a href="https://github.com/kubernetes-sigs/krew">krew</a>.
It is a package manager for kubectl plugins. It is a plugin itself, with the end
goal of helping you install and manage the lifecycle of your plugins.</p>
<pre><code>$ kubectl krew install profefe
</code></pre>
<p>It gives you the ability, with a single command, to install, update, or delete the
kubectl-profefe CLI.</p>
<p>Twitter got pretty excited recently about
<a href="https://github.com/ahmetb/kubectl-tree">kubectl-tree</a>,
a plugin from
<a href="https://twitter.com/ahmetb">@ahmetb</a>, an old
friend of mine, an active Kubernetes contributor, and a maintainer of krew as
well. It helps you visualize Kubernetes resources as a tree to simplify
comprehension of the hierarchy and the connections between resources.</p>
<p>Two other examples that I would like to mention are from @ahmetb too. Kubectl
plugins don’t need to be extremely complicated, but you always have to keep in
mind the mantra “usability first.” It doesn’t matter how many lines of code you
write: the end goal should be to develop something usable and well-integrated
with kubernetes! <code>kubectl ctx</code> and <code>kubectl ns</code> are fabulous examples of
something easy but helpful. We switch between context and namespace more than
once a day between production clusters, local development, and so on. It is not
a very complicated thing to do natively: for example, changing context with
kubectl is just a matter of typing:</p>
<pre><code>$ kubectl config use new-context
</code></pre>
<p>Worst case, for the namespace, you have to type the <code>-n</code> flag every
time you run a kubectl command against a namespace that is not the default one
for the context you are using.</p>
<p>But <code>kubectl ctx</code> and <code>kubectl ns</code> simplify this process even more. You only
have to type:</p>
<pre><code>$ kubectl ctx new-context
</code></pre>
<p>Or</p>
<pre><code>$ kubectl ns new-namespace
</code></pre>
<p>If you are developing an open-source kubectl plugin and you need a friendly and
easy way to distribute it, you should have a look at krew. The publication
process is straightforward: <a href="https://github.com/kubernetes-sigs/krew-index/pull/415">this is the
PR</a>
I had to submit for profefe; you have to type some YAML, as usual.</p>
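<p>For context, a krew manifest is a small YAML file describing the plugin and where to download it. The sketch below only illustrates the general shape; the version, URI, and checksum are placeholders, and the real manifest for profefe lives in the PR linked above:</p>

```yaml
apiVersion: krew.googlecontainertools.github.com/v1alpha2
kind: Plugin
metadata:
  name: profefe
spec:
  version: "v0.0.0"  # placeholder
  shortDescription: Capture pprof profiles from pods and push them to profefe
  homepage: https://github.com/profefe/kube-profefe
  platforms:
    - selector:
        matchLabels:
          os: linux
          arch: amd64
      uri: https://example.com/kubectl-profefe_linux_amd64.tar.gz  # placeholder
      sha256: "<sha256 of the archive>"                            # placeholder
      bin: kubectl-profefe
```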
Unit test kubernetes client in GoA flexible and easy to use testing framework makes all the difference. Kubernetes provides a fake client in Go that works like a charm.https://gianarb.it/img/kubernetes.png2020-01-10T09:08:27+00:002020-01-10T09:08:27+00:00https://gianarb.it/blog/unit-testing-kubernetes-client-in-go<p>I write a lot of operations and integrations with Kubernetes these days. You can
follow my journey in its dedicated section on this blog: <a href="/planet/assemble-kubernetes.html">“Building
Kubernetes”</a>.</p>
<p>I had to write a function recently capable of filtering pods based on assigned
annotations.</p>
<pre><code class="language-go">
const (
ProfefeEnabledAnnotation = "profefe.com/enable"
)
// GetSelectedPods returns all the pods with the profefe annotation enabled
// filtered by the selected labels
func GetSelectedPods(clientset kubernetes.Interface,
namespace string,
listOpt metav1.ListOptions) ([]v1.Pod, error) {
target := []v1.Pod{}
pods, err := clientset.CoreV1().Pods(namespace).List(listOpt)
if err != nil {
return target, err
}
for _, pod := range pods.Items {
enabled, ok := pod.Annotations[ProfefeEnabledAnnotation]
if ok && enabled == "true" && pod.Status.Phase == v1.PodRunning {
target = append(target, pod)
}
}
return target, nil
}
</code></pre>
<p>This function is pretty simple, but it has a good number of assertions that we can
check. Even more so: when we have such a well-scoped function, writing tests should be
almost mandatory.</p>
<ul>
<li>The returned list of pods should contain only pods with the
<code>ProfefeEnabledAnnotation</code> set</li>
<li>The returned list of pods should contain only pods from the specified
<code>namespace</code></li>
<li>The returned list of pods should observe the filtering and label selection
criteria specified by <code>metav1.ListOptions</code></li>
</ul>
<p>Covering those use cases will give us a solid foundation to avoid regressions
when this function gets more complicated (that is usually the evolution of a
successful piece of code).</p>
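<p>To make those assertions concrete, here is the shape of a table-driven test for the annotation filter. To keep the sketch self-contained it uses a simplified stand-in <code>pod</code> type and a <code>filterEnabled</code> helper instead of the real client-go types; it only illustrates the selection logic of <code>GetSelectedPods</code>:</p>

```go
package main

import "fmt"

// pod is a simplified stand-in for v1.Pod, enough to exercise
// the annotation-filtering logic in isolation.
type pod struct {
	Name        string
	Annotations map[string]string
	Running     bool
}

const profefeEnabledAnnotation = "profefe.com/enable"

// filterEnabled mirrors the selection criteria of GetSelectedPods:
// keep only running pods with the enable annotation set to "true".
func filterEnabled(pods []pod) []pod {
	out := []pod{}
	for _, p := range pods {
		if p.Annotations[profefeEnabledAnnotation] == "true" && p.Running {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	cases := []struct {
		name string
		in   []pod
		want int
	}{
		{"no annotation", []pod{{Name: "a", Running: true}}, 0},
		{"annotation false", []pod{{Name: "b", Annotations: map[string]string{profefeEnabledAnnotation: "false"}, Running: true}}, 0},
		{"enabled but not running", []pod{{Name: "c", Annotations: map[string]string{profefeEnabledAnnotation: "true"}}}, 0},
		{"enabled and running", []pod{{Name: "d", Annotations: map[string]string{profefeEnabledAnnotation: "true"}, Running: true}}, 1},
	}
	for _, c := range cases {
		if got := len(filterEnabled(c.in)); got != c.want {
			fmt.Printf("%s: got %d, want %d\n", c.name, got, c.want)
		} else {
			fmt.Printf("%s: ok\n", c.name)
		}
	}
}
```

<p>The real tests wire the same cases through <code>fake.NewSimpleClientset</code> and <code>GetSelectedPods</code>, as shown in the next section.</p>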
<h2 id="kubernetes-client-mock">Kubernetes Client Mock</h2>
<p>Kubernetes offers a simple and powerful <code>fake</code> client that has a very efficient
mechanism to simulate the desired output for a specific request, in our case
<code>clientset.CoreV1().Pods(namespace).List(listOpt)</code>. You pass the slice
of <code>runtime.Object</code> values you desire when you create a new fake client. Awesome and
easy.</p>
<pre><code class="language-go">clientset: fake.NewSimpleClientset(&v1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "influxdb-v2",
Namespace: "default",
Annotations: map[string]string{},
},
}, &v1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "chronograf",
Namespace: "default",
Annotations: map[string]string{},
},
}),
</code></pre>
<p>For example, this <code>clientset</code> will return two pods, one called <code>influxdb-v2</code> and
one called <code>chronograf</code>, but you can return whatever you need: Services,
Deployments, Ingresses, Custom Resource Definitions, or even a mix of everything.</p>
<h2 id="in-practice">In practice</h2>
<p>I wrote a bunch of tests for
<a href="https://github.com/profefe/kube-profefe/blob/master/pkg/kubeutil/kube_test.go">kube-profefe</a>
that are using a fake client. You can get inspiration over there.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The <code>fake</code> client is easy to use, so easy that since I added it to my toolchain for
functions like the one I described here, I can efficiently do <code>TDD</code>: it
makes iterating over my code way faster.</p>
Continuous profiling in Go with ProfefeTaking a snapshot at the right time is nearly impossible. A very easy way to fix this issue is to have a continuous profiling infrastructure that gives you enough confidence of having a profile at the time you need it.https://gianarb.it/img/profefe.png2020-01-03T09:08:27+00:002020-01-03T09:08:27+00:00https://gianarb.it/blog/go-continuous-profiling-profefe<p>There are a lot of articles about profiling in Go. Julia Evans, for example,
wrote <a href="https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/">“Profiling Go programs with
pprof”</a>, and I rely on
it when I do not remember how to properly use pprof.</p>
<p>Rakyll wrote <a href="https://rakyll.org/custom-profiles/">“Custom pprof profiles”</a>.</p>
<p><code>pprof</code> is a powerful tool provided by Go that helps any developer figure out
what is going on in the Go runtime. When you see a memory spike in your running
container, the next question is: who is using all that memory? Profiles give you
the answer.</p>
<p>But they need to be grabbed at the right time. The only way to have a profile when
you need it is to take them continuously. Depending on your application, you should
be able to decide how often to gather a profile.</p>
<p>This requires a proper infrastructure, something we can call a “continuous profiling
infrastructure”. It is made of collectors and repositories, and you need an API to
store, retrieve, and query those profiles.</p>
<p>When we had to set this up at InfluxData, we started to craft our own until I
saw <a href="https://github.com/profefe/profefe"><code>profefe</code></a> on GitHub. What I love about
the project is its clear scope. It is a repository for profiles. You can push
them into Profefe, and it provides an API to get them out. It serves the profiles in a
way that makes them easy to visualize directly with <code>go tool pprof</code>, you can even
merge them together, and so on. It also has a clear interface that helps you
implement your own storage.</p>
<p>The project
<a href="https://github.com/profefe/profefe/blob/master/README.md">README.md</a> explains
well how it works, but I am going to summarize the most important actions in
this article.</p>
<h2 id="getting-started">Getting Started</h2>
<p>There is a docker image that you can run with the command:</p>
<pre><code>docker run -d -p 10100:10100 profefe/profefe
</code></pre>
<p>You can push a profile in profefe:</p>
<pre><code>$ curl -X POST \
"http://localhost:10100/api/0/profiles?service=apid&type=cpu" \
--data-binary @pprof.profefe.samples.cpu.001.pb.gz
{"code":200,"body":{"id":"bo51acqs8snb9srq3p10","type":"cpu","service":"apid","created_at":"2019-12-30T15:18:11.361815452Z"}}
</code></pre>
<p>You can retrieve it directly via its ID:</p>
<pre><code>$ go tool pprof http://localhost:10100/api/0/profiles/bo51acqs8snb9srq3p10
Fetching profile over HTTP from http://localhost:10100/api/0/profiles/bo51acqs8snb9srq3p10
Saved profile in /home/gianarb/pprof/pprof.profefe.samples.cpu.002.pb.gz
File: profefe
Type: cpu
Time: Dec 23, 2019 at 4:06pm (CET)
Duration: 30s, Total samples = 0
</code></pre>
<p>There is a lot more you can do: when pushing a profile you can set key-value
pairs called <code>labels</code>, and they can be used to query a subset of the profiles.</p>
<p>You can use <code>env=prod|test|dev</code> or <code>region=us|eu</code> and so on.</p>
<p>Retrieving a profile by its ID is not the only way to visualize it.
Profefe can also merge profiles of the same type over a specific time range:</p>
<pre><code>GET /api/0/profiles/merge?service=<service>&type=<type>&from=<created_from>&to=<created_to>&labels=<key=value,key=value>
</code></pre>
<p>It returns the raw compressed binary, compatible with <code>go tool pprof</code>, just
like the single profile fetched by ID.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I didn’t develop profefe; <a href="https://github.com/narqo">Vladimir (@narqo)</a> is the
maintainer. I like it and how it is coded, and I think it solves a very common
issue. He wrote a detailed post about his project:
<a href="https://medium.com/@tvii/continuous-profiling-and-go-6c0ab4d2504b">“Continuous Profiling and Go”</a></p>
<blockquote>
<p>Wouldn’t it be great if we could go back in time to the point when the issue
happened in production and collect all runtime profiles. Unfortunately, to my
knowledge, we can’t do that.</p>
</blockquote>
<p>One of my colleagues, Chris Goller, wrote a self-contained AWS S3 implementation
that has now been submitted as a PR. We have been running it for a couple of weeks. It
is hard to onboard developers to a new tool, even more so during Christmas, but the
API layer makes it very comfortable and friendly to use. The next article will be
about what we did to get it running in Kubernetes, continuously profiling our
containers.</p>
Year in reviewSummary about 2019, a year in review. There is not much more to say other than that! Happy new year!https://gianarb.it/img/me.jpg2019-12-30T06:08:27+00:002019-12-30T06:08:27+00:00https://gianarb.it/blog/year-in-review<p>I can’t say 2019 was great from a working point of view. I struggled a lot and I
probably had a small, dirty burnout. I learned about myself along the
way. A few folks who joined me at Influx left, I didn’t really enjoy that
time, and the massive growth the company went through didn’t help me. I had some
difficulty finding my place, and since then I haven’t really found the right
motivation to be as productive as my passion for this job usually pushes me to be.</p>
<p>Luckily I am surrounded by friends and colleagues with open minds; I just had
to ask and to speak up about my feelings, and I always got useful
conversations back. I am sure every situation is different, but other people have
experienced similar situations, and it is great to have them around. I think it is
getting better: I work on a different team now, the amount of YAML I have
to write is decreasing, and I am back to writing bugs and fixing them.</p>
<h2 id="open-source-is-about-collaboration">Open Source is about collaboration</h2>
<p>Open source is part of my daily job, and the reasons why are well explained in
my first podcast ever! You can find it on <a href="https://www.stitcher.com/podcast/the-new-stack-makers/e/60409328?autoplay=true">The New
Stack</a>.</p>
<p>TLDR: I learned how to write code by pinging people on IRC since day one. Different
communities helped me improve in my daily job as nobody else ever did. That’s why
open source is part of me and I can’t live without it. Even more now that I
have something to give back.</p>
<p>This year I stopped doing small projects on my GitHub profile all alone. It was
a natural decision that I didn’t take on purpose. Even more so when I discovered
that my shitty useless code is going to destroy the Arctic because <a href="https://www.youtube.com/watch?v=fzI9FNjXQ0o">GitHub spams it
there</a>.</p>
<p>I had the opportunity to discover a community called
<a href="https://github.com/testcontainers">testcontainers</a>. They do cool things, but you
may know about them because I wrote <a href="/blog/testcontainers-go">“testcontainer library to programmatically
provision integration tests in Go with containers”</a>, I
tweet a lot about it, and I spoke at DockerCon on the same topic: <a href="https://www.youtube.com/watch?v=RoKlADdiLmU">“Write
Maintainable Integration Tests with
Docker”</a>.</p>
<p>Recently at Influx we were looking for a way to set up a continuous profiling
infrastructure. Some work is still ongoing, but Vladimir wrote a nice open source
project called <a href="https://github.com/profefe/profefe">profefe</a>, we deployed it, and
I wrote a Kubernetes integration called
<a href="https://github.com/profefe/kube-profefe">kube-profefe</a>. It is now part of the
profefe organization and I am planning to write a series of posts about it, so
stay tuned!</p>
<p>Joining ongoing communities and projects that you LIKE and USE is way better than
writing something alone that probably looks similar to something that already
exists. It is not easy: you have to read more code, and you have to reach out to other
people who may be busy, but I will keep doing it!</p>
<h2 id="meetups--conferences">Meetups & Conferences</h2>
<p>I run the CNCF Meetup in Turin; let me know if you would like to speak! I do it
because I work from home for a company based in San Francisco. They are far away
and I spend a lot of my working hours by myself. My local community helps me
develop great connections with people close to me! Ordering pizza, finding
locations, speakers, and sponsors are unusual tasks that I enjoy. All the videos are
available on <a href="https://www.youtube.com/channel/UCke-1vle73H9Dy4ojdfLw5A">YouTube</a>;
some of them are in English, others are not.</p>
<p>This year the plan is to run a nomadic meetup: we will move from office to office in
order to meet more people and visit cool companies or startups.</p>
<p>Would you like to sponsor, speak, host us?! Reach out
<a href="mailto:ciao@gianarb.it">ciao@gianarb.it</a> (Turin only locations).</p>
<p>I gave way too many talks during the first part of the year (11 in total), and the
difficulties I had at work convinced me to take a break. I haven’t taken any
flight since June (almost). I feel recharged now, but I will keep the
number of events low this year. I would like to write more and to do more podcasts.
Do you host one? Let me know!</p>
<h2 id="write">Write</h2>
<p>I wrote 26 articles. I am impressed by the number now that I see it. My articles
come from what I build, so I need to keep doing fun projects in order to have
something useful to write about. I will probably stay focused on extending Kubernetes
because I like how dynamic the code is. I would like to keep experimenting with
<a href="/blog/reactive-planning-and-reconciliation-in-go">reconciliation loops and reactive
planning</a> and to study Control
Theory because <a href="/blog/control-theory-is-dope">“it is dope”</a>.</p>
<p><a href="https://www.cherryservers.com/?utm_source=garb&utm_medium=ftr&utm_campaign=drs">CherryServers</a>
is a cloud provider that I met at ContainerDays in Hamburg, and since then we
have loved each other! I can ping them and have fun on their platform as much as I
like, and this is great. They do not have a Kubernetes story yet; let’s see if we
can do something about it! If you need someone to write an operator or a CSI plugin
(persistent storage), or who knows, even a Cluster API implementation, let me
know!</p>
<p>My first collaboration with a publisher was good but not excellent. I wrote a
report for O’Reilly called <a href="https://get.oreilly.com/ind_extending-kubernetes.html">“Extending Kubernetes”</a>. I didn’t get any
information from them about how it is going, but that is normal practice for “a
report”, from what they told me.
I call it “not excellent” because it does not feel like a collaboration; it
is a one-shot effort. I am happy to see it live because I like to write, but
it is not my best skill. This collaboration helped me raise the bar.</p>
<p>In 2020, as I did for open source, I would like to collaborate with other people,
maybe to write another book. Something is moving, but let me know if you
have any ideas.</p>
<h2 id="2020">2020!</h2>
<p>If I have to pick a word to describe 2019, I will use <code>join</code>. I <code>joined</code> a lot of
great people and teams, embracing what they care about or what they were working on. I loved
that. I hope to keep doing it with the help of the communities I am part of, like
Docker, observability, Kubernetes, CNCF, and testcontainers. I hope to join more
people who share my passions in order to improve and build something together.</p>
<p>In order to do that I need you all around! Reach out
<a href="https://twitter.com/gianarb">@gianarb</a>.</p>
<h2 id="home-sweet-home">Home sweet home!</h2>
<p>We have a new project! The most important one! We bought a house and there is a
lot of work to do! Look at how this wall is going down! Made in YAML.</p>
<div class="row">
<div class="col-md-6 offset-md-3">
<video class="embed-responsive" controls="">
<source class="embed-responsive-item" src="/img/destroy-home.mp4" type="video/mp4" />
</video>
</div>
</div>
Free PDFs about Docker from a CaptainThis article contains the ebooks listed in scaledocker.com. That link won't be available forever and I decided to move it here. If you are a beginner and you are happy to read about docker I got you covered. Security? There are 22 pages available for you as well. Enjoy.https://gianarb.it/img/the-fundamentals.jpg2019-12-18T06:08:27+00:002019-12-18T06:08:27+00:00https://gianarb.it/blog/scaledocker<p><img src="/img/mainbg_scaledocker.jpg" alt="" class="img-fluid" /></p>
<p>In the process of making some order in all my repositories, side projects, and so
on, my terminal crossed paths with a project I made in 2017 called scaledocker. The
idea was a full book. It turned out to be two PDFs: one describing Docker for a
total beginner, the second one about security.</p>
<p>Looking back, I see why I am so happy to build and share what I do; I am still
like that today!</p>
<p>I was discussing with Jenny what to do with this project, and we thought about
the possibility of refreshing this content as part of a series on docker.com.</p>
<p>I had a look around at this scenario: the content is still good, but I can’t
find the raw versions of the PDFs, which makes that work very hard. That’s why I
decided to archive them here.</p>
<p class="lead">scaledocker.com will stay up and running, redirecting everybody to this article,
where you can find both PDFs. The domain is for sale; just <a href="mailto:ciao@gianarb.it">make an
offer</a> and you can have it starting from September 2020.</p>
<p>This is not the end at all! There is a whole section of my blog dedicated to
<a href="/planet/docker.html">docker</a>, and who knows what I will work on in 2020! I just
do not like the idea of this project keeping a domain busy; I bet there are
people who can make better use of it than I managed to a few years ago!</p>
<p>My website is a good way to stay in touch with me, and I ramble a lot on
<a href="https://twitter.com/gianarb">twitter</a> as well!</p>
<h2 id="docker-the-fundamental">Docker the Fundamental</h2>
<p class="small">25 pages</p>
<p><img src="/img/the-fundamental-cover.jpg" alt="" class="img-fluid w-25 p-3 float-right" /></p>
<p>A getting-started guide to Docker: it covers basic concepts about what a container
is, and it’s a starting point for understanding containers, Docker, and the whole
ecosystem.</p>
<ol>
<li>Introduction</li>
<li>Install Docker on Ubuntu 16.04</li>
<li>Install Docker on Mac</li>
<li>Install Docker on Windows</li>
<li>Run your first HTTP application</li>
<li>Docker engine architect</li>
<li>Image and Registry</li>
<li>Docker Command Line Tool</li>
<li>Volumes and File Systems</li>
<li>Network and Links</li>
<li>Conclusion</li>
</ol>
<p><a href="/downloads/the-fundamental.pdf" target="_blank">Open the pdf</a></p>
<h2 id="docker-security---play-safe">Docker Security - Play Safe</h2>
<p class="small">55 pages</p>
<p><img src="/img/container-security.png" alt="" class="img-fluid w-25 p-3 float-right" /></p>
<p>When you think about production, everything stops being a joke. Do containers, cloud
computing, and scalability fit well with security? I have my own answer to this
question, and in this paper I am going to show what I mean by security and how
containers, Docker, and Linux can make it real.</p>
<ol>
<li>Introduction</li>
<li>Mutual TLS and Security by default</li>
<li>Content Trust</li>
<li>Overlay Network</li>
<li>Docker Bench Security</li>
<li>Process Restriction and Capabilities</li>
<li>Open Source</li>
<li>Linux Kernel Security</li>
<li>Cilium</li>
<li>About your images</li>
<li>Secret Manager</li>
<li>Immutability</li>
</ol>
<p><a href="/downloads/play-safe.pdf" target="_blank">Open the pdf</a></p>
<h2 id="lets-move-on">Let’s move on!</h2>
<p>I hope this project will keep helping new folks get on board with Docker or
figure out what they can do to improve security.</p>
<p>From what I can tell, more than 2200 people requested those PDFs! I am happy and
impressed. I can’t wait to see what we will do next!</p>
<p class="font-weight-bold">A big thanks to
<a href="https://www.cherryservers.com/?utm_source=garb&utm_medium=ftr&utm_campaign=drs">CherryServer</a>
for hosting the reverse proxy from scaledocker.com.</p>
Programmatically Kubernetes port forward in GoDepending on your networking configuration, port forwarding may be the only way for you to reach pods or services running in Kubernetes. When you develop a CLI integration that has to interact with pods running inside the cluster you can programmatically do a port forwarding in golang.https://gianarb.it/img/gianarb.png2019-12-05T06:08:27+00:002019-12-05T06:08:27+00:00https://gianarb.it/blog/programmatically-kube-port-forward-in-go<p>Along the way I have seen at least two different ways to manage Kubernetes clusters
from a networking perspective. Some companies configure a VPN into the
Kubernetes cluster; this way, a developer connected to the VPN can reach pods
and services.</p>
<p>It is not mandatory, but good network segmentation is recommended so that you can
manage what a person connected to the VPN can touch and see.
Achieving this level of control is not easy in Kubernetes: many of the open
source CNI plugins do not have this feature at all, and I understand why, in
operations, this is not evaluated as a safe approach. It is very convenient, if you
close an eye, because pods and services are just IPs that you can reach from your
laptop, and if you configure the VPN to push the Kubernetes DNS you can also
resolve them via DNS lookups.</p>
<p>The alternative I saw is to lock everybody out, leaving the command <code>kubectl
port-forward</code> as the only way to interact with a service or a pod. This way the
authentication and authorization mechanisms in Kubernetes allow you to decide
who can do port forwarding on what, based on namespace for example. Or at least
you can use Kubernetes audit logs to figure out who did port forwarding if
something bad happens.</p>
<p>We tried both ways. I was the one pushing for the first, but we never achieved
good segmentation and at some point I got locked out, as sad as it sounds.
Anyway, I like to automate things and I had to figure out a way to make my
scripts work with this new approach.</p>
<p><img src="/img/sub.jpg" alt="" class="img-fluid" /></p>
<p>I started to dig into the <code>kubectl</code> code because we all know that it is capable
of doing the port forwarding. I had some trouble figuring out the right
parameters and making them work, but in the end I did it! So here we are! If I
can do it you can do it as well!</p>
<p>The main repository with the code and an example is in
<a href="https://github.com/gianarb/kube-port-forward">github.com/gianarb/kube-port-forward</a>,
you can run it there. I am gonna explain it a bit here.</p>
<p>It is a simple CLI that mocks what <code>kubectl port-forward</code> already does, but I
extracted the code needed to do and control a port forwarding. I will write
about the reason why I did that here as soon as it is open source; I am telling
you right now: STAY TUNED! It will be great!</p>
<p>First of all I used the <code>k8s.io/cli-runtime/pkg/genericclioptions</code> library to
configure a stream; we already used it in the <a href="/blog/kubectl-flags-in-your-plugin">blog post about writing a CLI that
uses the same flags as kubectl</a>. A
stream is a <code>struct</code> used by different <code>kubernetes</code> services when they need to
read or print information from a stream. In this case I am using <code>os.Stdout</code>,
<code>os.Stdin</code>, <code>os.Stderr</code> for simplicity, but where I do not need to print the
output I use a <code>bytes.Buffer</code> like this:</p>
<pre><code class="language-go">var berr, bout bytes.Buffer
buffErr := bufio.NewWriter(&berr)
buffOut := bufio.NewWriter(&bout)
</code></pre>
<p>In order to make this code easy to read I created a struct to request the port
forwarding for a pod:</p>
<pre><code class="language-go">type PortForwardAPodRequest struct {
// RestConfig is the kubernetes config
RestConfig *rest.Config
// Pod is the selected pod for this port forwarding
Pod v1.Pod
// LocalPort is the local port that will be selected to expose the PodPort
LocalPort int
// PodPort is the target port for the pod
PodPort int
// Streams configures where to write output or read input from
Streams genericclioptions.IOStreams
// StopCh is the channel used to manage the port forward lifecycle
StopCh <-chan struct{}
// ReadyCh communicates when the tunnel is ready to receive traffic
ReadyCh chan struct{}
}
</code></pre>
<p>And I wrote the function that actually does the port forward:</p>
<pre><code class="language-go">func PortForwardAPod(req PortForwardAPodRequest) error {
path := fmt.Sprintf("/api/v1/namespaces/%s/pods/%s/portforward",
req.Pod.Namespace, req.Pod.Name)
hostIP := strings.TrimLeft(req.RestConfig.Host, "htps:/")
transport, upgrader, err := spdy.RoundTripperFor(req.RestConfig)
if err != nil {
return err
}
dialer := spdy.NewDialer(upgrader, &http.Client{Transport: transport}, http.MethodPost, &url.URL{Scheme: "https", Path: path, Host: hostIP})
fw, err := portforward.New(dialer, []string{fmt.Sprintf("%d:%d", req.LocalPort, req.PodPort)}, req.StopCh, req.ReadyCh, req.Streams.Out, req.Streams.ErrOut)
if err != nil {
return err
}
return fw.ForwardPorts()
}
</code></pre>
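<p>That <code>hostIP</code> line deserves a note: <code>strings.TrimLeft</code> takes a <em>cutset</em>, not a prefix, so <code>"htps:/"</code> removes any leading run of those six characters, which conveniently strips both <code>http://</code> and <code>https://</code> (and would also eat a host name that happens to start with one of those letters). A tiny sketch to convince yourself (the helper name is mine):</p>

```go
package main

import (
	"fmt"
	"strings"
)

// hostWithoutScheme mirrors the hostIP line above. TrimLeft removes any
// leading run of the characters 'h', 't', 'p', 's', ':', '/', which happens
// to strip both http:// and https:// prefixes.
func hostWithoutScheme(host string) string {
	return strings.TrimLeft(host, "htps:/")
}

func main() {
	fmt.Println(hostWithoutScheme("https://10.0.0.1:6443")) // 10.0.0.1:6443
	fmt.Println(hostWithoutScheme("http://localhost:8080")) // localhost:8080
	// Beware the cutset quirk: the leading 't' of the host is eaten too.
	fmt.Println(hostWithoutScheme("https://test.example")) // est.example
}
```

<p>A stricter alternative would be parsing the host with <code>net/url</code> instead of trimming characters.</p>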
<p>An exercise that I can leave for you is to add Service support to this function,
you can open a PR if you like on
<a href="https://github.com/gianarb/kube-port-forward">github.com/gianarb/kube-port-forward</a>.</p>
<p>The <code>Stop</code> and <code>Ready</code> channels are crucial to manage the port forward because,
as you see in the example, it is a blocking operation, which means it will
likely always run inside a goroutine. Those two channels give you what you
need: <code>ReadyCh</code> tells you when the port forward is ready to get traffic, and
<code>StopCh</code> gives you the capability to stop it.</p>
<p>My example is basic, I am closing the port forwarding when the <code>SIGTERM</code> signal
gets notified:</p>
<pre><code class="language-go">sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
go func() {
<-sigs
fmt.Println("Bye...")
close(stopCh)
wg.Done()
}()
</code></pre>
<p>I just wait until the <code>readyCh</code> tells me that the connection is
up and running:</p>
<pre><code class="language-go">select {
case <-readyCh:
break
}
println("Port forwarding is ready to get traffic. have fun!")
</code></pre>
<p>As soon as I coded this feature I saw that it was gonna be an easy but useful
post. I wrote a <a href="/blog/extending-kubernetes-oreilly">report with O’Reilly</a> about
how to extend Kubernetes, you can find more about Go and Kube there. It is a
free PDF.</p>
<p>I hope you enjoyed it and <a href="https://twitter.com/gianarb">let me know</a> what cool
things you are gonna do port-forwarding the universe!</p>
<h1>OpenTelemetry the instrumentation library, I hope</h1>
<p class="small">2019-11-20 · <a href="https://gianarb.it/blog/opentelemetry-the-instrumentation-library-i-hope">https://gianarb.it/blog/opentelemetry-the-instrumentation-library-i-hope</a></p>
<p><em>OpenTelemetry, OpenCensus, OpenTracing, open your heart.</em></p>
<p>Hello! If you follow my rambling here or on
<a href="https://twitter.com/gianarb">twitter</a> you know that I like to speak about
observability and tracing.</p>
<p>If you don’t know what I am speaking about this is an
<a href="/blog/faq-distributed-tracing">FAQ</a> about distributed tracing and something
about <a href="/blog/what-is-distributed-tracing-opentracing-opencensus">OpenTracing and OpenCensus</a>.</p>
<p>Observability is the ability to figure out what is going on in your application
from the outside. To do that you need to instrument your applications
so they expose the right information.</p>
<p>The instrumentation is not easy: there are too many developers, too many
opinions, too many languages, but in order to observe a system that crosses all
the applications and services, everything needs to come together in the same
way. Otherwise the aggregation becomes a very complicated job.</p>
<p>When you instrument an application there is a lot of code to write and inject,
and you cannot redo or change it based on the vendor or service you are using to
store your telemetry: Zipkin, InfluxDB, NewRelic, Honeycomb and so on.</p>
<p>That’s why over the last couple of years big foundations and companies such as
LightStep, Google, CNCF, and Uber tried to get their hands on the democratization
of code instrumentation. First with OpenTracing, after that with OpenCensus, and
now with OpenTelemetry, which is the merger of OpenCensus and OpenTracing.</p>
<p>At the beginning, when this project came out, I was very tired and stressed about
the topic. I ran a workshop last year about code instrumentation at <a href="https://cloudconf.it">the
CloudConf</a> and I wish it had been easier to prepare and
develop. In the end the attendees were satisfied btw, so I am happy enough.</p>
<p>Since the beginning I had a very bad feeling about OpenTracing and OpenCensus:
a project like this is necessary, but the fact that we had two of them because
they didn’t want to agree on only one was, for me, unbelievable.</p>
<p>Anyway, now that I have pushed that feeling back I will give it another try. I
will take my <a href="https://github.com/gianarb/workshop-observability">observability
workshop</a> and refresh
it to use OpenTelemetry because, as I said, we need a way to instrument
applications, cross vendor and cross language.</p>
<p>Here are some links about it:</p>
<ul>
<li><a href="https://opentelemetry.io/">opentelemetry.io</a></li>
<li><a href="https://github.com/open-telemetry">github.com/open-telemetry</a></li>
<li><a href="https://lists.cncf.io/g/cncf-opentelemetry-community">Mailing List</a></li>
</ul>
<p>At KubeCon 2019 <a href="https://twitter.com/lizthegrey">lizthegrey</a> gave a demo about
OpenTelemetry and I am confident that my experience will be a bit better.</p>
<p>It is not easy to democratize something, even less so when you need to change
the habits of developers across programming languages. But that’s the goal of
OpenTelemetry and I think we need to get there and make it a commodity. It is
not a joke!</p>
<p>If you would like to help me, let me know!</p>
<h1>o11y.guru introduction and first set of iterations</h1>
<p class="small">2019-11-09 · <a href="https://gianarb.it/blog/o11yguru-introduction">https://gianarb.it/blog/o11yguru-introduction</a></p>
<p><em>Part of the o11y.guru series, this post is an introduction to this side project and it describes the first architecture designed for the website.</em></p>
<div class="jumbotron">
<h2 id="introduction">Introduction</h2>
<p>This article is part of a series I am writing about a
side project I made called <a href="https://o11y.guru">o11y.guru</a>. Who knows what this
series or the project itself will become. The reason why I started it was to
have my <strong>wonderland</strong>: a place where I am able to make my mistakes without any
intermediary.</p>
<p>This series is about my journey there. I will keep this <code>Introduction</code>
common to all the articles in this series and I will keep the <code>Table of
Content</code> up to date. The best way to follow my journey here is to subscribe to
the RSS feed or to follow me on <a href="https://twitter.com/gianarb">@twitter</a>.</p>
<h3 id="what-is-this-project">what is this project?</h3>
<p>I had this idea to create a mechanism to enable other people that use twitter
to follow a group of leaders in a particular space. I decided to start from
observability (#o11y) because it is the area I am in at the moment.</p>
<p><a href="https://o11y.guru">o11y.guru</a> is pretty simple, a list of faces and a
button, they you press it and if you will authorize the twitter application you
will get to follow them.</p>
<h3 id="table-of-content">Table of Content</h3>
<ol>
<li><a href="/blog/o11yguru-introduction">First day and first set of iterations</a></li>
<li>Build process and automation driven by simplicity</li>
<li>Monitoring and instrumentation with Honeycomb</li>
<li><a href="/blog/o11yguru-history-first-bug">The history of the first bug</a></li>
<li>OpenTelemetry it is time to embrace a unicorn standard</li>
<li>The magic of structured logging</li>
<li>Infrastructure monitoring with InfluxCloud</li>
<li>Infrastructure as code with Terraform and CherryServer. First deploy.</li>
<li>FAQ</li>
</ol>
</div>
<p><img src="/img/o11y-guru-series/index.png" alt="" class="img-fluid" /></p>
<p>o11y.guru is a website that I wrote in Go. It lists a group of people active on
twitter that I like to follow around monitoring, reliability, observability. It
allows you to follow them all at once as well.</p>
<p>A few years ago, when I was developing almost exclusively in PHP, somebody from
the community (I don’t remember who, I am getting old) made a similar website
and I thought it was a great idea.</p>
<p>Since then, the project was in the back of my mind. I am a lazy person when it
comes to writing code. I think there is enough useless code around, and I don’t
want to incentivize that practice. That’s why I tend to write as little code as
I can.</p>
<p>There are a bunch of reasons why I changed my mind, and I started to do it:</p>
<p>I was in a better mood, and I had wanted to try Honeycomb.io but never had the
right opportunity; I didn’t want to try it with a demo running on my laptop. I
have a couple of new friends from CherryServer who support my crazy ideas, and
I was looking for a reason to glue together a bunch of reliable infrastructure
as code that I actually like, even if it is usually a task I hate, mainly
because there is no code involved.</p>
<p>As a Docker Captain and CNCF Ambassador, I have the feeling that a project like
this can be re-used.</p>
<p>I made the mistake that everyone does; I started to think about cool technologies
and not the problem I was going to solve or the project I was going to write.</p>
<p>I made a react application, just the foldering, and I quickly realized that I do
not know how to React, and I was wasting my time. But for this project, I got
lucky enough to keep going. I started to think about the problem again, and I
decided to make it as simple as possible. In practice almost everything gets
generated from an html template, piloted by a list of names in a <code>txt</code> file.
Very easy!</p>
<pre><code>.
├── cmd
│ ├── generate
│ └── www
├── Dockerfile
├── go.mod
├── go.sum
├── index.tmpl
├── Makefile
├── people
│ └── people.go
├── people.txt
├── style
│ ├── css
│ ├── fonts
│ ├── img
│ ├── index.html
│ ├── js
│ ├── node_modules
│ ├── package.json
│ ├── package-lock.json
│ └── scss
├── vendor
└── www
</code></pre>
<p>You can see the shape of the project; it has a minimal amount of technologies
involved: Go, HTML, Bootstrap 4, and a bit of Javascript. I started from the
<code>style</code> directory. I use it for prototyping the HTML and CSS part. I am far from
being good with colors and CSS, and I am cool with that, we do not like each
other. So I do all my tests there, and when I am ready, I port the <code>style/index.html</code>
into <code>index.tmpl</code>. I started from a Bootstrap 4 layout already done, as you can
see. It is in their documentation.</p>
<p><img src="/img/o11y-guru-series/sheldon.jpeg" alt="" /></p>
<p><code>index.tmpl</code> is the template I use to render the actual homepage. <code>www</code> is the
target destination for all the static files and the generated index page. I use
Make to copy files from <code>style</code> into <code>www</code>, and I wrote a CLI that generates the
HTML and populates it with Twitter information. It is inside <code>./cmd/generate</code>.</p>
<p><code>./people.txt</code> is the list of twitter gurus. It is just a list:</p>
<pre><code>gianarb
rakyll
</code></pre>
<p>The <code>cmd/generate</code> command reads that file and gets the information it needs from
the Twitter API, like user bio and avatar, and it renders <code>./index.tmpl</code>
into the actual index inside the <code>www</code> folder.</p>
<p><code>./cmd/www</code> is an HTTP server written in Go that serves the content of the <code>www</code>
directory. Plus it uses:</p>
<pre><code>github.com/dghubble/go-twitter
github.com/dghubble/gologin/v2
</code></pre>
<p>To manage the Twitter authentication flow.</p>
<p>I am sure you are wondering, “is he gonna open-source that!?”. I am. Not now.
The project needs some refactoring, and some code needs to get stronger around
instrumentation and logging. As you saw in the introduction, I am using this
experience as a use case to write down a bunch of practices I like or that I
would like to investigate.
So stay tuned! It will be available very soon.</p>
<h2 id="tldr-lesson-learned">tldr lesson learned</h2>
<p>Some of the lessons I learned come from how I am, but hey, this is my blog, I can
do whatever I like!
It is refreshing to start a project, but <strong>it is way cooler to have something to
show</strong>. So be careful when you start it: get it right so you won’t get tired.</p>
<p><strong>Set clear goals</strong>, and see point 1, they need to be very easy to
achieve, at least at the beginning.</p>
<p><strong>Do not type on your terminal, but write bash scripts.</strong> I started to do this
months ago at work. Bash scripts are way better than random commands in a
terminal because you can move them around, composing way more powerful workflows,
and you won’t lose them. That’s how I built my Makefile, just from the terminal
history or from the shortcuts I made along the way.
Often a well-done <strong>dotenv file is enough to manage everything you need</strong>.</p>
<h2 id="lets-get-to-some-code">let’s get to some code</h2>
<p>I told you about bash scripts and Makefiles. I will write a post about automation
for a small project, but this is part of my Makefile:</p>
<pre><code>style/build:
cd ./style && npm install
cp -r ./style/node_modules/jquery/dist/jquery.js ./style/js/jquery.js
cp ./style/node_modules/@fortawesome/fontawesome-free/js/all.js ./style/js/
cp -r ./style/node_modules/@fortawesome/fontawesome-free/webfonts ./style/fonts
cd ./style && npm run scss
style/start: style/build
cd ./style && npm start
style/compile: style/build
rm -rf ./www
mkdir ./www
cp -r ./style/img ./www
cp -r ./style/fonts ./www
cp -r ./style/css ./www
cp -r ./style/js ./www
</code></pre>
<p>People can do the same with npm and a hundred node packages; I like to keep
things simple at this point to avoid unnecessary blockers that will get me
tired. This is how I manage the <code>style</code> directory and how I build the <code>www</code> one.</p>
<blockquote>
<p>With blockers I mean: googling around for things that should be easy.</p>
</blockquote>
<pre><code class="language-go">flag.StringVar(&flags.consumerKey, "consumer-key", "", "Twitter Consumer Key")
flag.StringVar(&flags.consumerSecret, "consumer-secret", "", "Twitter Consumer Secret")
flag.StringVar(&flags.accessToken, "access-token", "", "Twitter access token")
flag.StringVar(&flags.accessSecret, "access-secret", "", "Twitter access secret")
flag.StringVar(&flags.guruFile, "guru-file", "", "File that contains the guru's name")
flag.StringVar(&flags.indexTemplate, "index-template", "", "File that contains the index template")
flag.Parse()
flagutil.SetFlagsFromEnv(flag.CommandLine, "TWITTER")
config := oauth1.NewConfig(flags.consumerKey, flags.consumerSecret)
token := oauth1.NewToken(flags.accessToken, flags.accessSecret)
httpClient := config.Client(oauth1.NoContext, token)
// Twitter client
client := twitter.NewClient(httpClient)
// Verify Credentials
verifyParams := &twitter.AccountVerifyParams{
SkipStatus: twitter.Bool(true),
IncludeEmail: twitter.Bool(true),
}
_, _, err := client.Accounts.VerifyCredentials(verifyParams)
if err != nil {
println(err.Error())
os.Exit(1)
}
gurus := []*twitter.User{}
lines, err := people.ReadLineByLine(flags.guruFile)
if err != nil {
println(err.Error())
os.Exit(1)
}
for _, eachline := range lines {
user, _, err := client.Users.Show(&twitter.UserShowParams{
ScreenName: eachline,
})
if err != nil {
println(err.Error())
os.Exit(1)
}
gurus = append(gurus, user)
}
</code></pre>
<p>The generate command is straightforward: I go over the <code>people.txt</code> file line
by line, and for every record I get information about the user. When I have the
slice of gurus populated I render the template:</p>
<pre><code class="language-go">t, err := template.ParseFiles(flags.indexTemplate)
if err != nil {
panic(err)
}
err = t.Execute(os.Stdout, Render{
Gurus: gurus,
})
</code></pre>
<p>I decided to print the HTML to stdout because it is way easier to use <code>></code>
rather than accepting another parameter to specify the target output.</p>
<p>LDD: laziness driven development.</p>
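<p>The render step, in a self-contained form. The template text and the <code>Guru</code> struct here are made up for illustration; the real code parses the file passed via <code>-index-template</code> with <code>template.ParseFiles</code> and receives <code>twitter.User</code> values:</p>

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Guru stands in for the fields the real template reads from twitter.User.
type Guru struct {
	ScreenName string
}

// Render is the struct handed to the template, as in the post.
type Render struct {
	Gurus []Guru
}

// indexTmpl is an illustrative template; the real one lives in index.tmpl.
const indexTmpl = `<ul>{{range .Gurus}}<li>@{{.ScreenName}}</li>{{end}}</ul>`

// renderIndex executes the template into a string. The real CLI writes to
// os.Stdout instead, so the output can be redirected with >.
func renderIndex(gurus []Guru) (string, error) {
	t, err := template.New("index").Parse(indexTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, Render{Gurus: gurus}); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderIndex([]Guru{{"gianarb"}, {"rakyll"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // <ul><li>@gianarb</li><li>@rakyll</li></ul>
}
```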
<p>The <code>cmd/www</code> command uses the same <code>people.txt</code> file to know who to follow when
the user presses the <code>Follow</code> button and authorizes the twitter application:</p>
<pre><code class="language-go">for _, eachline := range lines {
if strings.EqualFold(eachline, me.ScreenName) {
continue
}
time.Sleep(5 * time.Second)
err = newFriendship(ctx, twitterClient, eachline)
if err != nil {
logger.Warn(err.Error(), zap.String("follower_screenname", eachline), zap.Error(err))
}
}
</code></pre>
<h2 id="the-project-in-the-project">The project in the project</h2>
<p>This series of posts I am writing is a side project in the side project. As I
wrote earlier I like to share what and why I do things. I hope to keep having
practical experiences to write down.</p>
<p>The high-level expectations I set are:</p>
<ol>
<li>Have fun</li>
<li>Create a good network of followers on twitter who like to speak about
observability</li>
<li>Learning how Honeycomb works and why everybody says that it sounds like magic</li>
<li>Writing down something about code instrumentation, infrastructure as code and
automation</li>
<li>Exercise my experience as a decision maker driven by simplicity and
efficiency.</li>
<li>I hope to work with a couple of friends from Docker, HashiCorp,
CherryServer, InfluxData, Honeycomb to help me out with secret management,
monitoring, terraform and automation in order to build the coolest project
ever. You will get an email from me (or reach out if you have suggestions).</li>
</ol>
<h2 id="thats-it">That’s it</h2>
<p>I am sure the friction between the excitement of having an idea and the effort
to make it real comes out clearly from this article, even when the idea is as
simple as a single html page. I struggle with
that all the time, and the laziness usually wins. Will this time be different?!
Well, I have a domain that is not a blank page. I think it is a good starting
point.</p>
<p>Time matters, and have fun!</p>
<p><img src="/img/o11y-guru-series/sleep.jpg" alt="" class="img-fluid" /></p>
<h1>o11y.guru the history of the first bug</h1>
<p class="small">2019-11-07 · <a href="https://gianarb.it/blog/o11yguru-history-first-bug">https://gianarb.it/blog/o11yguru-history-first-bug</a></p>
<p><em>Part of the o11y.guru series, this post is about the first bug I discovered with the help of Honeycomb and how I had to fix it twice to make it work.</em></p>
<div class="jumbotron">
<h2 id="introduction">Introduction</h2>
<p>This article is part of a series I am writing about a
side project I made called <a href="https://o11y.guru">o11y.guru</a>. Who knows what this
series or the project itself will become. The reason why I started it was to
have my <strong>wonderland</strong>: a place where I am able to make my mistakes without any
intermediary.</p>
<p>This series is about my journey there. I will keep this <code>Introduction</code>
common to all the articles in this series and I will keep the <code>Table of
Content</code> up to date. The best way to follow my journey here is to subscribe to
the RSS feed or to follow me on <a href="https://twitter.com/gianarb">@twitter</a>.</p>
<h3 id="what-is-this-project">what is this project?</h3>
<p>I had this idea to create a mechanism to enable other people that use twitter
to follow a group of leaders in a particular space. I decided to start from
observability (#o11y) because it is the area I am in at the moment.</p>
<p><a href="https://o11y.guru">o11y.guru</a> is pretty simple, a list of faces and a
button, they you press it and if you will authorize the twitter application you
will get to follow them.</p>
<h3 id="table-of-content">Table of Content</h3>
<ol>
<li><a href="/blog/o11yguru-introduction">First day and first set of iterations</a></li>
<li>Build process and automation driven by simplicity</li>
<li>Monitoring and instrumentation with Honeycomb</li>
<li><a href="/blog/o11yguru-history-first-bug">The history of the first bug</a></li>
<li>OpenTelemetry it is time to embrace a unicorn standard</li>
<li>The magic of structured logging</li>
<li>Infrastructure monitoring with InfluxCloud</li>
<li>Infrastructure as code with Terraform and CherryServer. First deploy.</li>
<li>FAQ</li>
</ol>
</div>
<h2 id="the-history-of-the-first-bug">The history of the first bug</h2>
<p>After the first deploy I used my twitter accounts
<a href="https://twitter.com/gianarb">@gianarb</a> and
<a href="https://twitter.com/dev_campy">@devcampy</a> to try the application. I have also
asked a friend to try it out.</p>
<p>So far so good. The way I coded the follow workflow is very basic, and it will
probably reach its scalability limit quickly. It is a loop with a
<code>time.Sleep(5 * time.Second)</code> break between each account to avoid the Twitter
rate limit.</p>
<pre><code class="language-go">
for _, guru := range gurus {
time.Sleep(5 * time.Second)
err = newFriendship(ctx, twitterClient, guru)
if err != nil {
logger.Warn(err.Error(), zap.Error(err))
}
}
</code></pre>
<p>No retry or things like that for now. Very simple. I hope to iterate on it in
the future when it starts to not work well enough anymore.</p>
<p>It does not report any error if the Twitter API request to follow a person
fails, it just goes to the next one. All three tests went well as far as I could
tell: all three accounts followed the gurus.</p>
<p>One of the first benefits of using Honeycomb is that out of the box it is able
to detect errors by looking at the events you send, and it builds the graphs for
you. Just clicking around their UI I ended up with weirdness like this graph:</p>
<p><img src="/img/o11y-guru-series/first-bug-http-status.png" alt="Requests break down by HTTP Status" class="img-fluid" /></p>
<p>I noticed some <code>500</code> error pages, and I do not like that. As you can see
there is an <code>Error</code> tab, built by Honeycomb again, and this is what it showed me:</p>
<p><img src="/img/o11y-guru-series/first-bug-span-with-error.png" alt="Span with an error" class="img-fluid" /></p>
<p>At this point it is clear to me where the problem is: “You can’t follow
yourself”. It sounds reasonable.</p>
<p>I changed the code and I added a simple <code>if</code> statement to skip the guru if it is
the person actually following all the other people.</p>
<pre><code class="language-go">// me comes from above when I validate that the token behaves to a user.
if guru == me.ScreenName {
continue
}
</code></pre>
<p>It could not sound more trivial, but when I tried it the fix didn’t work.</p>
<p><img src="/img/o11y-guru-series/rambo.jpg" alt="" class="img-fluid" /></p>
<p>I decided to face the problem differently. <em>Spoiler alert: I didn’t
write any unit test yet. Feel free to leave now.</em></p>
<p>Looking at the trace, I knew that for every <strong>follow request</strong> I had set the name
of the guru to follow, and that the root span carried who requested to follow the
gurus. In practice, the root span had <code>required_by=me.ScreenName</code>, and every
guru had its own span with their name. The next image shows those two
spans side by side:</p>
<ul>
<li>On the left, the span <code>newFriendship</code> describes a single follow action (a
twitter create friendship api request). As you can see it has the <code>error="you
can't follow yourself"</code> and the <code>follower_screenname=gianarb</code> fields.</li>
<li>The one on the right is the root span; it has the <code>required_by=GianArb</code> field,
which is the <code>me.ScreenName</code> variable.</li>
<p><img src="/img/o11y-guru-series/first-bug-compare-spans.png" alt="" class="img-fluid" /></p>
<p>Looking at these spans the situation is clear: I was comparing <code>GianArb</code>, the
<code>required_by</code> variable that you see on the right, with <code>gianarb</code>, the
<code>follower_screenname</code> you see in the left span.</p>
<p>At the end of the story the check needs to be case-insensitive. And that’s how
it is now:</p>
<pre><code>if strings.EqualFold(guru, me.ScreenName) {
continue
}
</code></pre>
<p>This is the history of the first bug I randomly discovered and had to fix
twice for <code>o11y.guru</code>.</p>
<h1>O'Reilly Report Extending Kubernetes</h1>
<p class="small">2019-10-07 · <a href="https://gianarb.it/blog/extending-kubernetes-oreilly">https://gianarb.it/blog/extending-kubernetes-oreilly</a></p>
<p>A few months ago I wrote a report with O’Reilly called <a href="https://get.oreilly.com/ind_extending-kubernetes.html">Extending
Kubernetes</a>. I realized I
didn’t share this major achievement here with all of you!</p>
<p>It comes from my experience not from ops, but as a developer who works with
Kubernetes.</p>
<p>K8S requires a big effort in terms of maintenance and setup; whoever says
otherwise is lying. Complexity is not too bad, and sometimes it is a requirement;
a good way to justify it is to use kubernetes as much as you can.</p>
<p>Operations and teams need a good UX. Kubernetes is extremely extensible with
custom resource definitions, kubectl plugins, controllers, shared informers,
audit logs and operators.</p>
<p>This report is about that. I mainly use Go for the examples, but a lot of them can
be re-written using any SDK provided by the Kubernetes community.</p>
<p>It is a practical report, with code examples and ideas about what you can do to
integrate your day to day operations with Kubernetes in order to share the pain
with the developers.</p>
<p><img src="/img/white-polar.jpeg" alt="Sleepy" /></p>
<p>Now that I recovered from the effort of writing it, stay tuned because I am
looking for something else to do!</p>
<p>Read it and let me know if you like it via
<a href="https://twitter.com/gianarb">@gianarb</a>.</p>
<p class="small">Hero image via <a href="https://pixabay.com/en/fractal-complexity-geometry-1758543/">Pixabay</a></p>
<h1>kubectl flags in your plugin</h1>
<p class="small">2019-10-07 · <a href="https://gianarb.it/blog/kubectl-flags-in-your-plugin">https://gianarb.it/blog/kubectl-flags-in-your-plugin</a></p>
<p><em>Develop custom kubectl plugins with friendly flags borrowed from kubectl.</em></p>
<p>This is not at all a new topic, no hacking involved, but it is something
everybody needs to know when we design kubectl plugins.</p>
<p>I was recently working on one and I had to make the user experience as friendly
as possible compared with <code>kubectl</code>, because that’s what a good developer does:
trick other developers to make their life comfortable. If you are used to doing:</p>
<pre><code class="language-bash">$ kubectl get pod -n your-namespace -L app=http
</code></pre>
<p>to get pods from a particular namespace <code>your-namespace</code> filtered by the label
<code>app=http</code>, and your plugin does something similar or would benefit from an
interaction that resembles the classic <code>get</code>, you should re-use those flags.</p>
<p>Example: design a <code>kubectl-plugin</code> capable of running a <code>pprof</code> profile against a
set of containers.</p>
<p>My expectation will be to do something like:</p>
<pre><code class="language-bash">$ kubectl pprof -n your-namespace -n pod-name-go-app
</code></pre>
<p>The Kubernetes community writes a lot of their code in Go, it means that there
are a lot of libraries that you can re-use.</p>
<p><a href="https://github.com/kubernetes/cli-runtime/tree/master/pkg/genericclioptions">kubernetes/cli-runtime</a>
is a library that provides utilities to create kubectl plugins. One of their
packages is called <code>genericclioptions</code> and as you can get from its name the goal
is obvious.</p>
<pre><code class="language-go">
// import "github.com/spf13/cobra"
// import "github.com/spf13/pflag"
// import "k8s.io/cli-runtime/pkg/genericclioptions"
// Create the set of flags for your kubectl-plugin
flags := pflag.NewFlagSet("kubectl-plugin", pflag.ExitOnError)
pflag.CommandLine = flags
// Configure the genericclioptions
streams := genericclioptions.IOStreams{
In: os.Stdin,
Out: os.Stdout,
ErrOut: os.Stderr,
}
// This set of flags is the one used for the kubectl configuration such as:
// cache-dir, cluster-name, namespace, kube-config, insecure, timeout, impersonate,
// ca-file and so on
kubeConfigFlags := genericclioptions.NewConfigFlags(false)
// This is a set of flags related to a specific resource, such as: label
// selector (-l), --all-namespaces, --schema and so on.
kubeResourceBuilderFlags := genericclioptions.NewResourceBuilderFlags()
var rootCmd = &cobra.Command{
Use: "kubectl-plugin",
Short: "My root command",
Run: func(cmd *cobra.Command, args []string) {
cmd.SetOutput(streams.ErrOut)
},
}
// You can join all these flags to your root command
flags.AddFlagSet(rootCmd.PersistentFlags())
kubeConfigFlags.AddFlags(flags)
kubeResourceBuilderFlags.AddFlags(flags)
</code></pre>
<p>This is the output:</p>
<pre><code class="language-bash">$ kubectl-plugin --help
My root command
Usage:
kubectl-plugin [flags]
Flags:
--as string Username to impersonate for the operation
--as-group stringArray Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--cache-dir string Default HTTP cache directory (default "/home/gianarb/.kube/http-cache")
--certificate-authority string Path to a cert file for the certificate authority
--client-certificate string Path to a client certificate file for TLS
--client-key string Path to a client key file for TLS
--cluster string The name of the kubeconfig cluster to use
--context string The name of the kubeconfig context to use
-f, --filename strings identifying the resource.
-h, --help help for kubectl-pprof
--insecure-skip-tls-verify If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string Path to the kubeconfig file to use for CLI requests.
-n, --namespace string If present, the namespace scope for this CLI request
-R, --recursive Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory. (default true)
--request-timeout string The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
-l, --selector string Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2)
-s, --server string The address and port of the Kubernetes API server
--token string Bearer token for authentication to the API server
--user string The name of the kubeconfig user to use
</code></pre>
Kubernetes is not for operationsKubernetes is not for operations. It democratizes resources and workloads. It can be the solution to bring developers closer to ops. But YAML is not the answer.https://gianarb.it/img/gianarb.png2019-09-18T08:08:27+00:002019-09-18T08:08:27+00:00https://gianarb.it/blog/as-a-developer-i-dont-care-about-kubernetes<p>I have worked in tech for 8 years. It is not a lot but it is something.
I started as a PHP developer doing CMSes with MySQL and things like that.</p>
<p>When I saw what I was capable of doing with a set of API requests to AWS I enjoyed
it and I moved to what people would probably call DevOps.</p>
<p>I like communities and people, so with Docker everywhere I became a Docker
Captain out of my passion for delivery, development workflows and containers, but
always with developers in mind. That’s what I like to do. Write code.</p>
<blockquote>
<p>The complexity not hidden behind Kubernetes, or not solved by who runs
Kubernetes in your company creates that friction.</p>
</blockquote>
<p>Everyone that was/is in the containers space has more or less touched Kubernetes.
I did, and I enjoyed looking at the patterns it uses, like control theory,
reconciliation loops and so on.</p>
<p>In the last couple of years I saw a lot of companies moving to Kubernetes,
and I worked on that path at InfluxData as well. Yes, we use Kubernetes obviously!</p>
<p>I have always seen friction from developers forced to onboard Kubernetes (no
developer will do it otherwise). First because everybody uses YAML, and I
think YAML is just the wrong answer to your problem, nothing personal with it.</p>
<p>What developers are happy to do is to <strong>write code</strong> that runs in production and
that gives them good challenges to debug and fix. <strong>Write code</strong> is in bold
because that’s what we like most. At least the majority of us.</p>
<p>The complexity not hidden behind Kubernetes, or not solved by who runs
Kubernetes in your company creates that friction.</p>
<p>Running Kubernetes is not hard: we have tutorials, companies, contractors and cloud
providers that can help us out. It is a set of binaries and a database. We have run
things like that for ages! There is a good amount of them, and they need to be configured
and connected, and there are also a lot of different combinations, but that’s fine.
We are used to playing with mobile apps, WordPress plugins and so on.</p>
<p>When I think about myself as a developer I understand why there is this
friction; if I had not been passionate about containers at the right time to try out
Kubernetes, I would probably have felt that friction myself.</p>
<p>It does not help me write better code or do something different compared
with updating systemd services one by one via <code>ssh</code>. I bet developers working with
Kubernetes in a system under real load will likely get back to <code>ssh</code>-ing into the
servers one by one, deploying the new version of the application to have all
the control and visibility they can. That’s what a lot of developers
try to achieve when I look at them using Kubernetes.</p>
<p>What Kubernetes does very well is democratize ops, it provides a common set of
concepts that we can use to run applications and very good API that abstract the
concrete implementation of containers, VMs, workload, ingress, dns and so on.</p>
<p>We should not waste our time trying to run it; we should spend time making it
usable in our company, because that’s what we can get from k8s.</p>
<h2 id="my-recipe">my recipe</h2>
<p>I do not have a recipe, a product or something ready to go. But I think there
are two directions I would like to see and to try with a team.</p>
<h3 id="leave-yaml">leave yaml</h3>
<p>YAML is the wrong answer; it is good for making an impact and for writing a
document that everyone can read, but your company is not “everybody”, you are
pretty unique. You should use the K8s API. I didn’t have time to make a public
prototype yet, but I will, I promise. You should use the language you know
best! I have a lot of experience with Go, so my suggestion is to replace YAML
with real code, real functions and so on. From Kubernetes 1.16 <code>kubectl diff</code>
runs server side. Sweet!</p>
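<p>As a sketch of what “replace YAML with real code” could mean, the snippet below builds a deployment manifest as a plain Go value and serializes it. The types here are a hand-written, simplified subset invented for this example; against a real cluster you would use the typed structs from <code>k8s.io/api</code> and a client-go clientset instead of marshalling JSON by hand:</p>
<pre><code class="language-go">package main

import (
	"encoding/json"
	"fmt"
)

// A simplified subset of a Deployment manifest, enough to show the idea.
// With the real API you would use appsv1.Deployment from k8s.io/api instead.
type Deployment struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Metadata   Metadata `json:"metadata"`
	Spec       Spec     `json:"spec"`
}

type Metadata struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels,omitempty"`
}

type Spec struct {
	Replicas int `json:"replicas"`
}

// NewDeployment builds the manifest in code, so it can be unit tested,
// composed and refactored like any other function.
func NewDeployment(name string, replicas int) Deployment {
	return Deployment{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Metadata:   Metadata{Name: name, Labels: map[string]string{"app": name}},
		Spec:       Spec{Replicas: replicas},
	}
}

func main() {
	d := NewDeployment("http", 3)
	out, _ := json.Marshal(d)
	fmt.Println(string(out))
}
</code></pre>
<p>The payoff is that <code>NewDeployment</code> is a real function: you get type checking, tests and refactoring tools instead of a long text file.</p>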
<h3 id="split-spec-file-by-team">split spec file by team</h3>
<p>It is very easy to end up with a single Kubernetes YAML file that is crazy long.
That file contains everything you run, across teams, responsibilities and
people. Do not do it. Split it into different files or repositories by team or
application owner.</p>
<p>DevOps, SRE, sysadmin, reliability, penguins or whatever you call the team that
owns the underlying architecture will have the YAML related to the foundation of
the infrastructure. Its content is not important for other teams; they
will only write and see what matters to them.</p>
<p>This approach will decrease complexity for developers, probably making them
less worried about screwing up parts of the infrastructure that are not related to their
work.</p>
<h2 id="conclusion">Conclusion</h2>
<p>If you are a developer please develop good code! If you own Kubernetes in your
company make it work for your users.</p>
Reactive planning and reconciliation in Gohttps://gianarb.it/img/gianarb.png2019-09-13T08:08:27+00:002019-09-13T08:08:27+00:00https://gianarb.it/blog/reactive-planning-and-reconciliation-in-go<p>Reading my recent posts you can spot my attempt to share what I have learned, and
am still studying, around distributed systems, control theory and provisioning.</p>
<p>I wrote a quick introduction about why I think <a href="/blog/reactive-planning-is-a-cloud-native-pattern">reactive planning is a cloud
native pattern</a> and I
published an article about <a href="/blog/control-theory-is-dope">control theory</a>, but I
have just scratched the surface of this topic, obviously. I have 470 pages to
read from the book <a href="https://www.amazon.it/Designing-Distributed-Control-Systems-Veli-Pekka/dp/B01FIX9LMG">Designing Distributed Control Systems: A Pattern Language
Approach</a>.
It will take me forever.</p>
<h2 id="introduction">Introduction</h2>
<p>It is easier to explain how powerful reactive planning is by looking at an
example. I wrote one in Go, and in this article I am explaining its most
important parts.</p>
<p>Just to summarize: I think resiliency in modern applications is crucial and very
hard to achieve in practice, mainly because we need to implement and learn a set
of patterns and rules. When I think about a solid application inside a
microservices environment, or in a highly distributed ecosystem, my mind drives me
to a different industry. I think about tractors, boilers, and whatever else
does not depend on a static state stored inside a database but on a dynamic
source of truth.</p>
<p>When I think about an orchestrator it is clear to me that there is no way to
trust a cache layer in order to understand how many resources (VMs, containers,
pods) are running. We need to check them live, because you never know what is
happening to your workload. Those kinds of applications are sensitive to latency,
and they require a fast feedback loop.</p>
<p>That’s one of the reasons why, when you read about Kubernetes internals, you
read about reconciliation loops and informers.</p>
<h2 id="our-use-case">Our use case</h2>
<p>I wrote a small PoC, an application that I called
<a href="https://github.com/gianarb/cucumber">cucumber</a>; it is available on GitHub and
you can run it if you like.</p>
<p>It is a CLI tool that provisions a set of resources on AWS. The provisioned
architecture is very simple. You can define a number of EC2 instances and they will be
created and assigned to a Route53 record; when the record does not exist the
application will create it.</p>
<p>I have learned how to think about problems like this. At the beginning of my
career the approach was simple: “I know what to do, I need to write a program
that reads the request and does what needs to be done”. So you start configuring
the AWS client, parsing the request and making a few API requests.</p>
<p>Everything runs perfectly and you succeed at creating 100 clusters.
Things start to get more complicated: you have more resources to provision,
like load balancers, subnets, security groups, and more business logic related to
who can do what. Requests start to be more than 5 per execution, and the logic
sometimes does not work as linearly as it did before. At this point you
have a lot of conditions, and figuring out where the procedure failed and how to
fix the issue becomes very hard.</p>
<p>This is why my current approach is different: when I recognize this kind of
pattern I always start from the current state of the system.</p>
<p>You can question the fact that at the first execution it is obvious that nothing
is there, you can just create whatever needs to be created. And I agree with
that, but assuming that you do not know your starting point drives you to implement
the workflow in a way that is idempotent. When you achieve this goal you can
re-run the same workflow over and over again; if there is nothing to do the
program won’t do anything, otherwise it is smart enough to realize what needs to
be done. In this way you can create something called a <strong>reconciliation loop</strong>.</p>
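<p>The idempotent shape can be sketched with a toy in-memory provider. Everything here is invented for the example: the <code>fakeCloud</code> map stands in for a real API like the AWS SDK, and the check is a naive count comparison:</p>
<pre><code class="language-go">package main

import "fmt"

// fakeCloud stands in for a real provider API; in cucumber this would be
// calls to the AWS SDK.
type fakeCloud struct {
	records map[string][]string // DNS name -> IPs
}

// ensureRecord is idempotent: running it once or ten times leaves the
// system in the same state, so it is safe inside a reconciliation loop.
func (c *fakeCloud) ensureRecord(name string, ips []string) (changed bool) {
	current, ok := c.records[name]
	if ok && len(current) == len(ips) {
		return false // already as desired, nothing to do
	}
	c.records[name] = ips
	return true
}

func main() {
	cloud := &fakeCloud{records: map[string][]string{}}
	fmt.Println(cloud.ensureRecord("yeppie.pluto.net", []string{"10.0.0.1"})) // true: first run creates
	fmt.Println(cloud.ensureRecord("yeppie.pluto.net", []string{"10.0.0.1"})) // false: second run is a no-op
}
</code></pre>
<p>Because <code>ensureRecord</code> checks the current state before acting, re-running it is harmless, which is exactly what a reconciliation loop needs.</p>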
<h2 id="reconciliation-loop">Reconciliation loop</h2>
<p>The idea of re-running the procedure over and over, assuming you do not know where
you left it, is very powerful. Following our example, if the creation flow does
not finish because AWS returned a 500 you won’t be stuck in a situation where you
do not know how to end the procedure; you will just wait for the next
re-execution of the flow and it will be able to figure out what is already created.
In my example this pattern works great when provisioning the Route53 DNS record,
because DNS propagation can take a lot of time, and in order to realize if
the DNS record already exists, or if the right number of IPs is attached
to it, I use
<a href="https://jameshfisher.com/2017/08/03/golang-dns-lookup/"><code>net.LookupIP</code></a>; it
is the perfect example of a procedure that can take an unknown amount of time to be
addressed.</p>
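<p>The comparison against the resolved IPs can be isolated in a small helper. <code>missingIPs</code> is a hypothetical function written for this post; in the real flow the <code>resolved</code> slice would come from <code>net.LookupIP</code>:</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"net"
)

// missingIPs returns the desired IPs that do not appear yet in the set
// resolved for the record. In the real loop resolved comes from
// net.LookupIP("yeppie.pluto.net").
func missingIPs(desired []string, resolved []net.IP) []string {
	seen := map[string]bool{}
	for _, ip := range resolved {
		seen[ip.String()] = true
	}
	missing := []string{}
	for _, d := range desired {
		if !seen[d] {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	resolved := []net.IP{net.ParseIP("10.0.0.1")}
	fmt.Println(missingIPs([]string{"10.0.0.1", "10.0.0.2"}, resolved)) // [10.0.0.2]
}
</code></pre>
<p>Until the lookup returns the full desired set, the helper keeps reporting a gap and the loop keeps reconciling.</p>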
<h2 id="reactive-planning">Reactive planning</h2>
<p>At the very least a reconciliation loop can be explained as a simple <code>loop</code> that
executes a procedure forever, but how do you write a workflow that is able to
understand the state of the system and autonomously make a plan to fix the gap
between the current and desired state? This is what reactive planning does, and
that’s why control theory is dope!</p>
<pre><code class="language-go">// Procedure describes every single step to be executed. It is the smallest
// unit of work in a plan.
type Procedure interface {
	// Name identifies a specific procedure.
	Name() string
	// Do executes the business logic for a specific procedure.
	Do(ctx context.Context) ([]Procedure, error)
}

// Plan describes a set of procedures and the way to calculate them.
type Plan interface {
	// Create returns the list of procedures that need to be executed.
	Create(ctx context.Context) ([]Procedure, error)
	// Name identifies a specific plan.
	Name() string
}
</code></pre>
<p>Let’s start with a bit of Go. <code>Procedure</code> and <code>Plan</code> are the fundamental
interfaces to get familiar with:</p>
<ul>
<li>A <code>Plan</code> is a collection of <code>Procedures</code>. The <code>Create</code> function is able to
figure out the state of the system, adding procedures dynamically</li>
<li>A <code>Procedure</code> is the unit of work; they need to be as small as possible. The
cool part about them is that they can return other procedures (and those can
return other procedures as well), building resilience this way. If a procedure
returns an error the <code>Plan</code> is marked as failed.</li>
</ul>
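<p>To make the contract concrete, here is a toy pair of procedures where the first one returns a follow-up step. The record names and the tiny recursive runner are invented for the example, but the interfaces are the ones above:</p>
<pre><code class="language-go">package main

import (
	"context"
	"fmt"
)

type Procedure interface {
	Name() string
	Do(ctx context.Context) ([]Procedure, error)
}

// createRecord pretends to create a DNS record; once done it returns a
// follow-up procedure to verify it, showing how procedures compose.
type createRecord struct{ record string }

func (c *createRecord) Name() string { return "create-record" }
func (c *createRecord) Do(ctx context.Context) ([]Procedure, error) {
	fmt.Println("creating", c.record)
	return []Procedure{&verifyRecord{record: c.record}}, nil
}

type verifyRecord struct{ record string }

func (v *verifyRecord) Name() string { return "verify-record" }
func (v *verifyRecord) Do(ctx context.Context) ([]Procedure, error) {
	fmt.Println("verifying", v.record)
	return nil, nil // nothing left to do
}

func main() {
	// A tiny recursive runner: execute each step, then whatever it returned.
	var run func(ctx context.Context, steps []Procedure) error
	run = func(ctx context.Context, steps []Procedure) error {
		for _, s := range steps {
			next, err := s.Do(ctx)
			if err != nil {
				return err
			}
			if err := run(ctx, next); err != nil {
				return err
			}
		}
		return nil
	}
	_ = run(context.Background(), []Procedure{&createRecord{record: "yeppie.pluto.net"}})
}
</code></pre>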
<pre><code class="language-go">// Scheduler takes a plan and executes it.
type Scheduler struct {
	// stepCounter keeps track of the number of steps executed by the scheduler.
	// It is used for debugging and is logged out at the end of every execution.
	stepCounter int
	// logger is an instance of the zap.Logger
	logger *zap.Logger
}
</code></pre>
<p><code>Plan</code> and <code>Procedure</code> are crucial, but we need a way to execute a plan: it is
called a scheduler. The <code>Scheduler</code> has an <code>Execute</code> function that accepts a
<code>Plan</code> and executes it <strong>until there is nothing left to do</strong>. Procedures can
return other procedures, which means that the scheduler needs to recursively
execute all the procedures.</p>
<p>The way the scheduler understands whether the plan is done is via the number
of steps returned by the <code>Plan.Create</code> function. The scheduler executes every
plan at least twice; if the second time there are no steps left, it means that
the first execution succeeded.</p>
<pre><code class="language-go">// Execute accepts a plan as input and executes it until there are no more
// procedures to do.
func (s *Scheduler) Execute(ctx context.Context, p Plan) error {
	uuidGenerator := uuid.New()
	logger := s.logger.With(zap.String("execution_id", uuidGenerator.String()))
	start := time.Now()
	if loggableP, ok := p.(Loggable); ok {
		loggableP.WithLogger(logger)
	}
	logger.Info("Started execution plan " + p.Name())
	s.stepCounter = 0
	for {
		steps, err := p.Create(ctx)
		if err != nil {
			logger.Error(err.Error())
			return err
		}
		if len(steps) == 0 {
			break
		}
		err = s.react(ctx, steps, logger)
		if err != nil {
			logger.Error(err.Error(), zap.String("execution_time", time.Since(start).String()), zap.Int("step_executed", s.stepCounter))
			return err
		}
	}
	logger.Info("Plan executed without errors.", zap.String("execution_time", time.Since(start).String()), zap.Int("step_executed", s.stepCounter))
	return nil
}
</code></pre>
<p>The <code>react</code> function implements the recursion and as you can see is the place
where the procedures get executed <code>step.Do</code>.</p>
<pre><code class="language-go">// react is a recursive function that goes over all the steps, and the ones
// returned by previous steps, until the plan does not return any more steps.
func (s *Scheduler) react(ctx context.Context, steps []Procedure, logger *zap.Logger) error {
	for _, step := range steps {
		s.stepCounter = s.stepCounter + 1
		if loggableS, ok := step.(Loggable); ok {
			loggableS.WithLogger(logger)
		}
		innerSteps, err := step.Do(ctx)
		if err != nil {
			logger.Error("Step failed.", zap.String("step", step.Name()), zap.Error(err))
			return err
		}
		if len(innerSteps) > 0 {
			if err := s.react(ctx, innerSteps, logger); err != nil {
				return err
			}
		}
	}
	return nil
}
</code></pre>
<p>All the primitives described in this section are in their own Go module,
<a href="https://github.com/gianarb/planner">github.com/gianarb/planner</a>, that you can
use. Beyond what is shown here, the scheduler supports context cancellation and
deadlines. In this way you can set a timeout for every execution.</p>
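<p>The timeout side is plain <code>context</code> plumbing. The <code>execute</code> function below is a stand-in written for this example, not the real <code>Scheduler.Execute</code>, but it shows the shape: the loop checks <code>ctx.Done()</code> between steps and stops when the deadline fires:</p>
<pre><code class="language-go">package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// execute stands in for Scheduler.Execute: it keeps looping like a plan
// with work left, but bails out when the context expires.
func execute(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err() // deadline or cancellation stops the plan
		default:
			time.Sleep(10 * time.Millisecond) // pretend to run a step
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	err := execute(ctx)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
}
</code></pre>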
<p>One of the next big features I will develop is a reusable reconciliation loop for
plans. In cucumber it is very raw: just a goroutine and a WaitGroup to keep the main
process up:</p>
<pre><code>go func() {
	logger := logger.With(zap.String("from", "reconciliation"))
	scheduler.WithLogger(logger)
	for {
		logger.Info("reconciliation loop started")
		if err := scheduler.Execute(ctx, &p); err != nil {
			logger.With(zap.Error(err)).Warn("cucumber reconciliation failed.")
		}
		time.Sleep(10 * time.Second)
		logger.Info("reconciliation loop ended")
	}
}()
</code></pre>
<p>But this is too simple, and it does not work in a distributed environment where
only one replica should run the reconciliation, not all of them.</p>
<p>I wrote this code to help myself internalize and explain what reactive
planning means, and also because I think the Go community has a lot of great tools
that make use of this concept, like Terraform and Kubernetes, but there are no low
level or simple-to-understand pieces of code. The next chapter describes how to
write your own control plane using reactive planning.</p>
<h2 id="theory-applied-to-cucumber">Theory applied to cucumber…</h2>
<p>Let’s start looking at the <code>main</code> function:</p>
<pre><code class="language-go">p := plan.CreatePlan{
	ClusterName:  req.Name,
	NodesNumber:  req.NodesNumber,
	DNSRecord:    req.DNSName,
	HostedZoneID: hostedZoneID,
	Tags: map[string]string{
		"app":          "cucumber",
		"cluster-name": req.Name,
	},
}
scheduler := planner.NewScheduler()
scheduler.WithLogger(logger)
if err := scheduler.Execute(ctx, &p); err != nil {
	logger.With(zap.Error(err)).Fatal("cucumber ended with an error")
}
</code></pre>
<p>In cucumber there is only one Plan, the <code>CreatePlan</code>. We create it based on the
YAML file that contains the requested cluster. For example:</p>
<pre><code class="language-yaml">name: yuppie
nodes_num: 3
dns_name: yeppie.pluto.net
</code></pre>
<p>And it gets executed by the scheduler. As you can see, if the scheduler returns an
error we do not exit, we do not kill the process. We do not panic! Because we
know that things can break, and our code is designed to break just a little
and in a way that can be recovered.</p>
<p>After the first execution the process spins up a goroutine that is the one I
copied above to explain a raw and simple control loop.</p>
<p>The process stays in the loop until we kill the process.</p>
<p>In order to test the reconciliation you can try to remove one or more EC2 instances, or the
DNS record; watching the logs you will see how, inside the loop, the scheduler
executes the plan and reconciles the state of the system in AWS with the one you
described in the YAML.</p>
<pre><code class="language-bash">CUCUMBER_MODE=reconcile AWS_HOSTED_ZONE=<hosted-zone-id> AWS_PROFILE=credentials CUCUMBER_REQUEST=./test.yaml go run cmd/main.go
</code></pre>
<p>This is the command I use to start the process.</p>
<p>The steps I wrote in cucumber are not many and you can find them inside
<code>./cucumber/step</code>:</p>
<ol>
<li>create_dns_record</li>
<li>reconcile_nodes</li>
<li>run_instance</li>
<li>update_dns_record</li>
</ol>
<p><code>run_instance</code>, for example, is very small: it interacts with AWS via the go-sdk
and it creates an EC2 instance:</p>
<pre><code class="language-go">package step

import (
	"context"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
	"github.com/gianarb/planner"
	"go.uber.org/zap"
)

type RunInstance struct {
	EC2svc   *ec2.EC2
	Tags     map[string]string
	VpcID    *string
	SubnetID *string
	logger   *zap.Logger
}

func (s *RunInstance) Name() string {
	return "run-instance"
}

func (s *RunInstance) Do(ctx context.Context) ([]planner.Procedure, error) {
	tags := []*ec2.Tag{}
	for k, v := range s.Tags {
		if k == "cluster-name" {
			tags = append(tags, &ec2.Tag{
				Key:   aws.String("Name"),
				Value: aws.String(v),
			})
		}
		tags = append(tags, &ec2.Tag{
			Key:   aws.String(k),
			Value: aws.String(v),
		})
	}
	steps := []planner.Procedure{}
	instanceInput := &ec2.RunInstancesInput{
		ImageId:      aws.String("ami-0378588b4ae11ec24"),
		InstanceType: aws.String("t2.micro"),
		//UserData:     &userData,
		MinCount: aws.Int64(1),
		MaxCount: aws.Int64(1),
		SubnetId: s.SubnetID,
		TagSpecifications: []*ec2.TagSpecification{
			{
				ResourceType: aws.String("instance"),
				Tags:         tags,
			},
		},
	}
	_, err := s.EC2svc.RunInstances(instanceInput)
	if err != nil {
		return steps, err
	}
	return steps, nil
}
</code></pre>
<p>As you can see, the only situation where I return an error is if
<code>ec2.RunInstances</code> fails, but only because this is a simple implementation.
Moving forward you can replace the return of that error with other steps; for
example you can terminate the cluster and clean up, so that you won’t leave
broken clusters around, or you can try other steps to recover from that error,
leaving it to the next executions (from the reconciliation loop) to end the
workflow.</p>
<p>From my experience reactive planning makes refactoring and development very
modular. As you can see, you do not need to make the whole flow rock solid from
day one, because that is very time-consuming, but you always have a clear
entrypoint for future work. Everywhere you return or log an error can be
replaced at some point with steps, making your flow rock solid from the
observations you make on previous runs.</p>
<p><code>reconcile_nodes</code> is another interesting step, because <code>run_instance</code> only
calls AWS and creates one node, but as you can imagine we need to create or
terminate a variable number of them depending on the current state of the system:</p>
<ol>
<li>if you required 3 EC2 instances but there are zero of them you need to run 3 new nodes</li>
<li>if there are 2 nodes but you required 3 we need 1 more</li>
<li>if there are 56 nodes but you required 3 of them we need to terminate 53 EC2 instances</li>
</ol>
<p>The <code>reconcile_nodes</code> procedure makes that calculation and returns the right
steps:</p>
<pre><code class="language-go">package step

import (
	"context"

	"github.com/aws/aws-sdk-go/service/ec2"
	"github.com/gianarb/planner"
	"go.uber.org/zap"
)

type ReconcileNodes struct {
	EC2svc        *ec2.EC2
	Tags          map[string]string
	VpcID         *string
	SubnetID      *string
	CurrentNumber int
	DesiredNumber int
	logger        *zap.Logger
}

func (s *ReconcileNodes) Name() string {
	return "reconcile-node"
}

func (s *ReconcileNodes) Do(ctx context.Context) ([]planner.Procedure, error) {
	s.logger.Info("need to reconcile number of running nodes", zap.Int("current", s.CurrentNumber), zap.Int("desired", s.DesiredNumber))
	steps := []planner.Procedure{}
	if s.CurrentNumber > s.DesiredNumber {
		for ii := s.DesiredNumber; ii < s.CurrentNumber; ii++ {
			// TODO: remove instances if they are too many
		}
	} else {
		// Return one RunInstance step for every missing node.
		for i := s.CurrentNumber; i < s.DesiredNumber; i++ {
			steps = append(steps, &RunInstance{
				EC2svc:   s.EC2svc,
				Tags:     s.Tags,
				VpcID:    s.VpcID,
				SubnetID: s.SubnetID,
			})
		}
	}
	return steps, nil
}
</code></pre>
<p>As you can see I have only implemented the <code>RunInstance</code> step, and there is a
<code>TODO</code> left in the code, which means that scale down does not work for now.
It returns the right steps required to match the desired state: if there are 2
nodes, but we required 3 of them, this step will return only one <code>RunInstance</code>
that will be executed by the scheduler.</p>
<p>The last interesting part of the code is the <code>CreatePlan.Create</code> function. This is
where the magic happens. As we saw, the scheduler calls the <code>Create</code> function
at least twice for every execution, and its responsibility is to measure the
current state and, according to it, calculate the steps required to achieve what
we desire. It is a long function that you have in the repo, but this is the idea:</p>
<pre><code class="language-go">resp, err := ec2Svc.DescribeInstances(&ec2.DescribeInstancesInput{
	Filters: []*ec2.Filter{
		{
			Name:   aws.String("instance-state-name"),
			Values: []*string{aws.String("pending"), aws.String("running")},
		},
		{
			Name:   aws.String("tag:cluster-name"),
			Values: []*string{aws.String(p.ClusterName)},
		},
		{
			Name:   aws.String("tag:app"),
			Values: []*string{aws.String("cucumber")},
		},
	},
})
if err != nil {
	return nil, err
}
currentInstances := countInstancesByResp(resp)
if len(currentInstances) != p.NodesNumber {
	steps = append(steps, &step.ReconcileNodes{
		EC2svc:        ec2Svc,
		Tags:          p.Tags,
		VpcID:         vpcID,
		SubnetID:      subnetID,
		CurrentNumber: len(currentInstances),
		DesiredNumber: p.NodesNumber,
	})
}
</code></pre>
<p>The code checks if the number of running instances is equal to the desired
one; if they are different it appends the <code>ReconcileNodes</code> procedure.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This is it! It is a long article but there is code and a repository you can run!
I am enthusiastic about this pattern and the work exposed here, because I think it
makes the idea clear, and I tried to keep the context as small as possible to stay
focused on the workflow and the design.</p>
<p>Let me know if you end up using it! Or, if you already do, let me know how it is going:
<a href="https://twitter.com/gianarb">@gianarb</a>.</p>
Control Theory is dopeThis is an introductory article about control theory applied to microservices and cloud computing. It is a very high level overview of control theory, driven by what I loved most about it.https://gianarb.it/img/gianarb.png2019-09-04T08:08:27+00:002019-09-04T08:08:27+00:00https://gianarb.it/blog/control-theory-is-dope<p>For the last two years at InfluxData I have worked on the custom orchestrator that
empowers InfluxCloud v1. I have given some talks about it at InfluxDays, but they
are not recorded, so sadly I can’t really post them here.</p>
<p>If you are thinking “why would you write your own orchestrator?”, I have a few
answers for you.</p>
<ol>
<li>Back in the day Kubernetes was not so popular; 4 years ago, when InfluxCloud
started, it was not at least.</li>
<li>We had to manage data and state since the beginning. People still say that
Kubernetes is not for them today; imagine how it was 4 years ago.</li>
</ol>
<p>Btw now InfluxCloud v2 leverages Kubernetes.</p>
<p>Writing a good orchestrator is super fun! When I started, and still today, big
parts of it were frustrating and not so good, but the parts we wrote following
reactive planning and control theory are lovely! This article is an introduction
to Control Theory. <a href="https://twitter.com/goller">Chris Goller</a>, Solution
Architect at InfluxData, was the first person who told me about how Control
Theory works in theory, and he pushed me to try reactive planning for our
orchestrator.</p>
<p>As a Kubernetes contributor I recognized some of those patterns when looking at
shared informers, controllers and so on. So I understood from the beginning that
those patterns were everywhere around me!</p>
<p><a href="https://twitter.com/colmmacc">Colm MacCárthaigh</a> from Amazon Web Services, with
his talks (like the one posted here), helped me find resources to read, more
patterns and use cases for it.</p>
<div class="embed-responsive embed-responsive-16by9 col-xs-12 text-center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/O8xLxNje30M" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen=""></iframe>
</div>
<h2 id="why-it-works">Why it works</h2>
<p>When I started to work as a web developer, designing APIs or websites, I had
different challenges to face. To write a solid CRUD you put in all your effort
when a request comes to your API: you validate it, apply transformations to
sanitize the input and, if it is valid, you save it
in your database. You need to build good UX, complex validation systems and so
on. But what lands in the database is right and rock solid.</p>
<p>There are other systems where you do not have a database that tells you what is
right or not. You need to <strong>measure</strong> the current state, <strong>calculate</strong> what
needs to get back to your desired state and you need to <strong>apply</strong> what you
calculated.</p>
<p>Those systems are everywhere:</p>
<ul>
<li>The boiler you have at home to keep the water warm needs to constantly check if the
desired temperature you set is the current one. What is stored in its
memory is what you desire, not the truth.</li>
<li>The example Colm MacCárthaigh used is the autoscaler. It keeps checking the
state of your system based on the scalability rules you set, for example: if
CPU is over 70%, spin up 3 nodes. The autoscaler measures the current state of
your CPUs and, when it is over the threshold, it calculates what needs to be done and
executes the scale up or down.</li>
<li>When you read the Kubernetes documentation you will see references to Controllers,
reconciliation loops, desired state and so on. All of those concepts come from
Control Theory.</li>
</ul>
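<p>The boiler example maps one-to-one onto measure, calculate, apply. Below is a toy bang-bang controller; all the numbers and the simulated physics are made up for the example:</p>
<pre><code class="language-go">package main

import "fmt"

// boiler is the system under control; in a real controller the measurement
// would come from a sensor, not a struct field.
type boiler struct {
	temp    float64
	heating bool
}

func (b *boiler) measure() float64 { return b.temp }

// apply turns the heater on or off; the water warms or cools a little
// on every tick to simulate the physical system.
func (b *boiler) apply(heat bool) {
	b.heating = heat
	if heat {
		b.temp += 1.5
	} else {
		b.temp -= 0.5
	}
}

// controlLoop is the controller: measure, compare with desired, act.
func controlLoop(b *boiler, desired float64, ticks int) {
	for i := 0; i < ticks; i++ {
		current := b.measure()     // measure
		b.apply(current < desired) // calculate + apply
	}
}

func main() {
	b := &boiler{temp: 15}
	controlLoop(b, 40, 100)
	fmt.Printf("%.1f\n", b.temp) // hovers around the 40 degree setpoint
}
</code></pre>
<p>However rough the model, the controller converges on the setpoint because every tick starts from a fresh measurement instead of a stored assumption.</p>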
<p>Orchestrators, but more in general big microservices environments, do not have
the concept of data locality as we used to have in the past. The data you need can
change continuously, and it needs to be collected from different sources and
combined in order to calculate what needs to be done.</p>
<p>I think this is the main reason why patterns coming from Control Theory
work well.</p>
<p>If you need to write a program that provisions 3 virtual machines and attaches them
to a random DNS record you can approach this problem in 2 ways. You can write a
procedure that:</p>
<ol>
<li>Creates 3 instances.</li>
<li>Takes the public IPs.</li>
<li>Creates the DNS record with the IPs as A record.</li>
</ol>
<p>Another way to fix this issue is to start from checking what you have,
making a plan to fix whatever is not as you desire. It will look like
this:</p>
<ol>
<li>Check how many instances there are and mark what you need to do: if there are
2 of them you need one more, if there are 5 you need to delete 2, if there are 0
of them you need to create all of them.</li>
<li>Check if the DNS record is already there and how many IPs are assigned to it.</li>
<li>If it does exist you do not need to create it, but you need to check if the
IPs assigned to it are the same as the instances’. If they are not, you need to
reconcile the DNS record, fixing the IPs.</li>
<li>The record does not exist? You can create it.</li>
</ol>
<p>If you are wondering how all those checks make the system more reliable: it is because
you never know what you already created or what is already there. Let’s
assume you are on AWS. API requests can fail in the middle of your process and
you need to know where you are. AWS itself can stop or terminate instances, and
other procedures, or a manual mistake, can do the same.</p>
<p>Approaching the problem in this way allows you to repeat the flow over and over,
because it is idempotent, and at every retry the process will be able to reconcile
any divergence between what you asked for (3 VMs and one DNS record) and what is
actually running. This process is called a reconciliation loop.</p>
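<p>The instance part of that loop can be condensed into a few lines. The <code>state</code> struct and instance IDs here are invented stand-ins for the cloud provider, but the measure, calculate, apply shape is the same:</p>
<pre><code class="language-go">package main

import "fmt"

// state stands in for the cloud provider: a slice of instance IDs.
type state struct{ instances []string }

// reconcile measures the current state, computes the gap with the desired
// count, and applies only the missing part. Running it again is a no-op.
func reconcile(s *state, desired int) (created, deleted int) {
	gap := desired - len(s.instances) // measure + calculate
	for i := 0; i < gap; i++ {        // apply: create what is missing
		s.instances = append(s.instances, fmt.Sprintf("i-%02d", len(s.instances)))
		created++
	}
	for i := 0; i > gap; i-- { // apply: terminate the extras
		s.instances = s.instances[:len(s.instances)-1]
		deleted++
	}
	return created, deleted
}

func main() {
	s := &state{}
	fmt.Println(reconcile(s, 3)) // 3 0: the first pass creates everything
	fmt.Println(reconcile(s, 3)) // 0 0: the second pass finds nothing to do
	s.instances = append(s.instances, "i-manual")
	fmt.Println(reconcile(s, 3)) // 0 1: a manually added instance is removed
}
</code></pre>
<p>The second pass finds nothing to do, and an instance added by mistake is cleaned up on the next iteration, which is exactly the self-healing behaviour described above.</p>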
<h2 id="101-architecture">101 architecture</h2>
<p>Colm MacCárthaigh highlights three major areas of what a successful Control
Theory implementation looks like:</p>
<ol>
<li>Measurement process</li>
<li>Controller</li>
<li>Actuator</li>
</ol>
<h2 id="measurement-process">Measurement process</h2>
<p>The way you retrieve the current state of the system is crucial, and it has to
happen with low latency. Measurements are the input for calculating what needs to
be done, because the current state drives the decisions your program makes.</p>
<h2 id="controller">Controller</h2>
<p>This is the area I have the most experience with. The desired state is usually
stored and clear: you know where to go. You get the measurements, and with this
information you need to write a procedure capable of making a plan starting from
your current state to get to the desired one.</p>
<p>A few weeks ago I wrote an introduction about <a href="https://gianarb.it/blog/reactive-planning-is-a-cloud-native-pattern">reactive
planning</a>;
it is the way I calculate a plan.</p>
<p>I am also preparing a PoC in Golang, with actual code you can run and test, to
show in practice what reactive planning means.</p>
<h2 id="actuator">Actuator</h2>
<p>The actuator is the part that takes a calculated plan and executes it. I have
worked a lot with schedulers that are able to take a set of steps and execute them
one by one or in parallel, based on need.</p>
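<p>Putting the three parts together, one pass of the loop can be sketched like this.
It is an illustrative skeleton only: <code>measure</code>, <code>controller</code> and
<code>actuator</code> are placeholder names for whatever implementations your system provides.</p>
<pre><code class="language-python">def tick(measure, controller, actuator):
    """One pass of the control loop: observe, plan, act."""
    state = measure()            # measurement process: retrieve the current state
    plan = controller(state)     # controller: compute a plan toward the desired state
    actuator(plan)               # actuator: execute the calculated steps
    return plan
</code></pre>
<p>A real system runs <code>tick</code> on a timer or in response to events, so every
iteration gets a chance to correct any drift.</p>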
<h2 id="conclusion">Conclusion</h2>
<p>Take one of the problems you have and try to think about it in a more reactive way,
starting from checking where you are rather than from doing things. The reliability
and stability of your code will improve drastically.</p>
Hack your Google Calendar with gcalcliEverybody uses google calendar in a way or another and if you are a Linux with a light desktop manager such as i3 you lack on some commodities like reminders and notifications for your events. I find gcalcli a very good solution for my pain.https://gianarb.it/img/gianarb.png2019-08-26T08:08:27+00:002019-08-26T08:08:27+00:00https://gianarb.it/blog/hack-your-google-calendar-gcalcli<p>I am pretty bad with meetings. I forget about them for a lot of different
reasons; sometimes I do not show up even if a few minutes earlier my mind briefly
remembered it.</p>
<p>Meetings are not my daily job, and I do not have them with a lot of different
people: IPM with my team, one-to-ones with my manager, various stand-ups. I can
remember the recurrent ones pretty well, but it is still an annoying and useless
exercise.</p>
<p>When they are not recurrent they are usually outside my small circle of friends,
and it gets even worse because I do not like to be late or to miss them! I swear I
am not like that in real life! I am on time and I prefer to be there earlier.</p>
<p>Anyway! Ryan Betts, VP of Engineering at InfluxData, shared a very nice CLI tool called
<a href="https://github.com/insanum/gcalcli">gcalcli</a>. I love CLI tools as much as I
love APIs! Probably a bit more, because they are the perfect glue between the server
side and the best UX ever (also known as <strong>my terminal</strong>).</p>
<p><img src="/img/gintonic.jpg" alt="A good gin tonic is great as close as my terminal" /></p>
<p><strong>gcalcli</strong> is a lovely CLI tool that uses the Google Calendar API to help you
to manage your Google Calendar.</p>
<p>You can do a lot of things: list, search, edit, and add events, and even more.
The <a href="https://github.com/insanum/gcalcli#login-information">authentication is well
documented</a>: you need to
create a project on the Google Development Platform with Calendar API access. After
that you get your credentials and you follow the link I just posted! Super easy.</p>
<p>Once you are logged in, you can use this systemd unit and timer I wrote to check
every 10 minutes if there are upcoming events:</p>
<pre><code>[Service]
SyslogIdentifier=gcalcli-notification
ExecStart=/usr/bin/gcalcli remind
[Install]
WantedBy=multi-user.target
</code></pre>
<pre><code>[Unit]
Description="Send notification for every meetings set for xxxxx@gmail.com"
[Timer]
OnBootSec=0min
OnCalendar=*:0/10
[Install]
WantedBy=timers.target
</code></pre>
<p>The timer runs the command <code>/usr/bin/gcalcli remind</code> every 10 minutes.
<code>remind</code> uses <code>notify-send</code> to show a lovely notification.</p>
<p>I set it up for my work calendar and let me tell you, it works great!
For that reason I was looking for a way to support multiple Google accounts,
because I would like to use it for my personal Google Calendar as well.</p>
<p>There is a global flag for <code>gcalcli</code> called <code>--config-folder</code>; by default
it is not set, and <code>gcalcli</code> creates a config file with credentials and preferences
in your home directory. If you run <code>gcalcli</code> with that parameter set to a different
location:</p>
<pre><code class="language-bash">$ gcalcli --config-folder ~/.gcalclirc-anotheraccount list
</code></pre>
<p>The CLI won’t find the configuration file, so it will proceed with a brand-new
authentication and create a new file in the specified location. Sweet! I
used that trick to configure my second Google Account: I
created a new unit and timer with the right flags, and now I get notifications
from everywhere! So far so good!</p>
<p>Ryan allowed me to share a script he hacked called <code>next</code>; I have it in my
<code>bashrc</code>:</p>
<pre><code class="language-bash">next() {
  datetime=$(date "+%Y-%m-%dT%H:%M")
  whatwhere=$(gcalcli --calendar name-your-calendar agenda --tsv --details location $datetime 8pm | head -n 1 | awk 'BEGIN {FS = "\t+"} ; {print $5 " " $6}')
  re="([[:digit:]]+)"
  if [[ $whatwhere =~ $re ]]; then
    room="zoommtg://zoom.us/join?confno=${BASH_REMATCH[1]}"
  fi
  echo "What: '$whatwhere'"
  echo "xdg-open $room"
  echo "xdg-open $room" | clipc
}
</code></pre>
<p>I use Linux, he uses MacOS, so I changed the script a bit.</p>
<p>I use <code>xdg-open</code> to make it work with <code>X</code>. <code>next</code> gets the next upcoming meeting you have in one
particular calendar (<code>name-your-calendar</code> in my case) and stores on my
clipboard (via <code>clipc</code>) the command to join a Zoom channel. It is super useful when you
are in a hurry: you will join Zoom meetings in a second.</p>
<p>If you use <code>gcalcli</code> and you have other tricks let me know via twitter
<a href="https://twitter.com/gianarb">@gianarb</a> because I would like to try them as well!</p>
I am in love with language serversLanguage Servers are a nice way to reuse common features required by editors such as auto complete, formatting, go to definition. This article is a an open letter to share my love for this project with everybodyhttps://gianarb.it/img/gianarb.png2019-07-30T08:08:27+00:002019-07-30T08:08:27+00:00https://gianarb.it/blog/i-am-in-love-with-language-servers<p>Hello everybody! I am writing this article because I had a chat with a friend of
mine <a href="https://twitter.com/walterdalmut">@wdalmut</a>. He is a busy businessman and
vimmer like me.</p>
<p>This article is a quick and practical way to understand why language servers are
fantastic! Because they are!</p>
<p>When I started to use vim, I was developing almost all the time with PHP. PHP is
a tricky language, and back in the day, YouCompleteMe was the way to go to have
some autocomplete. However, as I said, PHP was not an excellent language for
that because the number of files is enormous, and to load all of them to suggest
functions and methods is tricky. Probably it is still like that.</p>
<p>Compared with a couple of years ago, we have more IDEs and editors: Atom, VSCode,
Sublime, and many more. To be successful, all of them require the same features:</p>
<ul>
<li>Syntax highlighting</li>
<li>Autocomplete</li>
<li>Formatting</li>
</ul>
<p>You can see the language server as a protocol to abstract and reuse those
features, and many more such as go to definition, find all references, and show
documentation. Vim is almost like WordPress; there is a plugin for everything.
For example, there is an excellent vim-go plugin that makes vim work smartly with
Golang. The problem is that it works only for vim, and as I said, almost all editors
need the set of shared features just listed to be usable on a daily basis.</p>
<p>The community that builds a language has a lexer and a parser, and it can traverse
the AST of the language it develops. It has the knowledge and all the
building blocks to provide a tool usable by different clients. The way for them
to build something reusable is a language server; the clients are the different
editors.</p>
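<p>Under the hood the protocol is JSON-RPC with an HTTP-like framing. Here is a minimal
sketch, in Python, of what a client sends for a go-to-definition request; the file
path and position are made up for the example.</p>
<pre><code class="language-python">import json

def frame(payload):
    """Frame a JSON-RPC message the way the Language Server Protocol expects:
    a Content-Length header, a blank line, then the JSON body."""
    body = json.dumps(payload)
    return f"Content-Length: {len(body)}\r\n\r\n{body}"

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///tmp/main.go"},
        "position": {"line": 10, "character": 4},
    },
}
message = frame(request)
</code></pre>
<p>Because every editor speaks this same wire format, one server like gopls can serve
them all.</p>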
<p>This story is real: the Golang community develops
<a href="https://github.com/golang/go/wiki/gopls">gopls</a> (it stands for “go please”), the
Golang language server. I use it with vim, and as a client I use
<a href="https://github.com/neoclide/coc.nvim">vim-coc</a>.</p>
<p>vim-go >1.20 works with gopls as well; you need to set it explicitly:</p>
<pre><code>let g:go_def_mode='gopls'
let g:go_info_mode='gopls'
</code></pre>
<p>This article expresses my love for language servers, not for Go or vim-go or vim!
Even if I love all of them!</p>
<p>We spend a good amount of time on achieving developer happiness and boosting our
productivity.</p>
<p>There are ever more tools and developers around. The killer feature of LSP is
its ability to create communities and give us a way to share reusable
code.</p>
<p>Other than the gopls I also use
<a href="https://github.com/sourcegraph/javascript-typescript-langserver">sourcegraph/javascript-typescript-langserver</a>
for JavaScript and TypeScript and <a href="https://github.com/rust-lang/rls-vscode">rust-lang/rls-vscode</a> for rust.</p>
<p>As you can see, rls-vscode looks from its name like a VSCode project, but that is only
because VSCode also supports the Language Server Protocol!</p>
<p>Thanks Sourcegraph, Microsoft, and everybody behind the LSP effort!</p>
When do you need a Site Reliability Engineer?I read everyday more job description looking for SRE. In the meantime I hear and live the frustration about who does not understand what SRE means and hires somebody that won't fit.https://gianarb.it/img/gianarb.png2019-07-05T08:08:27+00:002019-07-05T08:08:27+00:00https://gianarb.it/blog/when-do-you-need-a-site-reliability-engineer<p>I started working as a Site Reliability Engineer more than two years ago, as the
first hired SRE at InfluxData. I survived and learned through all the eras that
every company that onboards a new position lives through:</p>
<ol>
<li>Lack of knowledge about what the job role means</li>
<li>Adjustment</li>
<li>Growth</li>
<li>Re-adjustment</li>
<li>Repeat</li>
</ol>
<p>You are an SRE not because you care about reliability (everybody cares about
reliability) but because the system is too complex to be driven by a person who
also does other things.</p>
<p>There are no differences with any other “first hired” in a company. Even the
first project manager gets hired when the person who was doing that job can’t
make it anymore because the company needs somebody 100% focused on the product.</p>
<p>The Site Reliability Engineer as a role should improve <strong>service</strong> reliability.
Visibility, observability, logging, scalability, and instrumentation are all areas
where they should step in to provide better tooling to troubleshoot and identify issues.
As we all know, even not-that-complex distributed systems are difficult
to debug; this complexity is caused by what is called partial failure: the
idea that a distributed system will never fail drastically all together, but
is continuously in a condition of failure mitigated by retry policies and/or
redundancy.</p>
<p>The ability to acknowledge a problem before it gets reported by a customer
improves reliability.</p>
<p>It is not the Site Reliability Engineer’s responsibility to fix the actual
bug in the service, though they can. For all those reasons the SRE knows how to code,
should be able to modify the application, and needs to be close to the team that
builds the service, just as every heterogeneous group has somebody who takes care of
design, UI, deployment, and management.</p>
<h2 id="are-they-the-unique-people-on-call">Are they the only people on-call?</h2>
<p>Obviously no. It’s hard to reach a scale where you can manage a sustainable
rotation only with SREs, and every developer is responsible for the code they
ship. If you manage to have a rotation for every service with different
people, all of the teammates should be on-call.</p>
<p>The SRE, other than being part of the rotation, is the person responsible for the
MTTR (mean time to repair) and the number of false positives. The Site
Reliability Engineer needs to make the MTTR as short as possible and
the number of false positives as low as it can be. They should improve how the
service is monitored and instrumented, and how easy it is to debug.</p>
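<p>MTTR itself is just an average, which is what makes it easy to track over time.
A sketch with made-up incident durations:</p>
<pre><code class="language-python">def mttr(repair_minutes):
    """Mean time to repair: total time spent restoring service
    divided by the number of incidents."""
    return sum(repair_minutes) / len(repair_minutes)

# Three hypothetical incidents that took 30, 45 and 15 minutes to resolve.
assert mttr([30, 45, 15]) == 30.0
</code></pre>
<p>Driving this number down, alongside the false-positive rate, is a concrete way to
measure the SRE’s impact.</p>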
<h2 id="do-i-need-an-sre-in-every-service-team">Do I need an SRE in every service team?</h2>
<p>It is hard to quantify a number, but the SREs need a structure that
gives them time to hang out together and to see each other as a single team,
to share knowledge and to avoid the use of too many technologies across the
company; even more so if the company is not at a gigantic scale in terms of the
number of people. The number of SREs per team depends on how crucial and
complex reliability for the service is, and how big the service team is. You can
share SREs across organizations and services if they are not too big or too
complicated, or if the unit itself has excellent reliability skills embedded in it.</p>
<h2 id="what-sre-is-not">What SRE is not</h2>
<p>An SRE does not replace your ops team; it is not just a person with DevOps skills who
knows containers and Kubernetes. They know cloud, containers, and Kubernetes
because it is a pretty new “unicorn” role.</p>
<p>It is a side effect of being a coder who loves to see their code running smoothly
under real load.</p>
Test in production behind slogansHow fast we are capable of instrumenting an application decrease the out of time requires to understand and fix a bug.https://gianarb.it/img/gianarb.png2019-05-27T08:08:27+00:002019-05-27T08:08:27+00:00https://gianarb.it/blog/test-in-production-behind-slogans<blockquote class="tw-align-center twitter-tweet"><p lang="en" dir="ltr">What do we test before
prod? We do our known unknowns -- does it work? (unit tests). does it fail in
ways I can predict?<br /><br />We need to test our unknown unknowns in production
with ✨observability✨. and experiment upon them with chaos engineering! <a href="https://twitter.com/hashtag/VelocityConf?src=hash&ref_src=twsrc%5Etfw">#VelocityConf</a></p>—
Liz Fong-Jones (方禮真) (@lizthegrey) <a href="https://twitter.com/lizthegrey/status/1139273082412027904?ref_src=twsrc%5Etfw">June
13, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>I got inspired by Liz’s tweet recently, and I am writing this post as a reminder
for everybody. “Test in prod” is a slogan, a trademark. It doesn’t explain all
the concepts behind it, just as a sentence like “things go better with Coke” hides the
why and the how. Slogans are great as a quick reminder for more articulated ideas. They
are useful because with one sentence you can recall deeper content stored
inside your brain.</p>
<p><img class="img-fluid" src="/img/coke-slogan.jpg" /></p>
<p>“You” do not test unknown unknowns in production, mainly because you do not know
your unknowns. In production, you as a developer <strong>validate</strong> three kinds of
things:</p>
<ul>
<li>Complicated parts of your system that are not well covered by tests are
working.</li>
<li>Something you are working on and you would like to be sure that it is working
fine, even if it has a unit test, integration tests and so on.</li>
<li>Crucial parts of the system that need to work or your boss will kick your ass,
so you are afraid enough that you test them even if you just changed a line of CSS.</li>
</ul>
<p>What “test in prod” means is that somebody, a random customer, human or
not, will trigger an unknown action that causes an issue. It doesn’t
even have to be triggered by a person; it can be an environmental issue. For example, what
Twitch calls <a href="https://blog.twitch.tv/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2">“the refresh
storm”</a>
is an excellent example of an environmental issue. When a broadcaster has a
connectivity issue, all the watchers start to refresh the page multiple times
thinking it will solve the problem. As a side effect, the Twitch infrastructure can
suffer from a high number of requests. This is a non-Twitch problem that becomes
a Twitch problem.</p>
<p>We need to learn and onboard tools and a mindset that will help us improve how
fast we can track, record, fix, and learn from an issue. All the questions that
matter happen in production, and by consequence we need to stay focused on
it. I think a lot of people test in prod in some way.</p>
<p>When your laptop starts but then restarts by itself at some point, you have a
problem. You look around, and you notice that your fan doesn’t run anymore. It
is a pretty simple issue to detect and solve: you hear that the fan doesn’t make
any noise, so you replace it.</p>
<p>I am sorry! Everybody got distracted by distributed systems, containers, and cloud.
If you know how to design a fault-tolerant application, 90% of your failures are
partial failures! They are a disaster to figure out, understand, and fix! Only
a subset of your system may break, for a subset of customers, while the same part
works correctly for another subgroup, and you need to figure out why! You should
also be able to message that subgroup of customers to say “I am sorry! Shit
happens, we are working on it”, proactively!</p>
<h2 id="conclusion">Conclusion</h2>
<p>“Test in prod” means all the things I wrote and probably way more! It is
reasonable to say that nobody can do anything to stop “test in prod” from happening,
so have fun!</p>
Instrumentation code is a first citizen in a codebaseHow fast we are capable of instrumenting an application decrease the out of time requires to understand and fix a bug.https://gianarb.it/img/gianarb.png2019-05-27T08:08:27+00:002019-05-27T08:08:27+00:00https://gianarb.it/blog/instrumentation-code-is-a-first-citizen-in-a-codebase<p>A few years ago a log was very similar to a printf statement, with a message that
was in some way trying to communicate to the outside world the current situation of
a specific procedure. The format and composition of the message were not
crucial; the main purpose was to make it easy enough to read. A full-text
search engine capable of tokenizing and indexing every message for easy
lookup and aggregation was enough to bridge the gap between a human-readable
log and something that a program can parse and visualize. Cloud
Computing and containers changed the way we architect, visualize and deploy
software:</p>
<ol>
<li><strong>The distribution of our applications</strong>. Compared with a more traditional
approach, our applications run in small but highly replicated units
(containers, pods, EC2 instances, and so on).</li>
<li><strong>The size</strong> of our applications (microservices) and, by consequence, the
interactions between them over an imperfect communication layer (the
network).</li>
<li>Applications come and go much more frequently because we have automation that
takes care of the number of replicas running inside a system. They are more
<strong>dynamic</strong>, and we no longer have stable identifiers as before:
hostnames and IPs change more often.</li>
<p>These points increase the importance of getting application metrics out of
our code, because that’s the language our applications speak. We rely on them in
order to understand what is going on. We need to realize that logs and metrics
have different purposes:</p>
<ul>
<li>To understand what is going on right now</li>
<li>To verify what happened in the past (even from a legal perspective)</li>
<li>To compare</li>
</ul>
<p>They are not random printfs. All these purposes require methodologies and tools.
This article will stay focused on the first point, “What is going on?”, because
it is a question I ask even myself when I look at the systems I wrote or
manage, and the answer is a real pain to retrieve. To troubleshoot a system we
need a very dense amount of information “almost in real time”, because a system
is broken “now”, plus a picture or a sample of older data in order
to compare the current situation with something that we can define as “working”.
We can not really use old data because our codebase changes frequently (because
somebody told us that we can break things and develop fast). So there is not a lot of
value in looking at high-density data coming from two weeks ago, when the
codebase was different. That’s why time series databases such as InfluxDB have data
retention features built in to keep themselves clean.
<a href="https://github.com/influxdata/influxdb">InfluxDB</a> removes the data after a
certain amount of time, but with
<a href="https://github.com/influxdata/kapacitor">Kapacitor</a> you can aggregate or sample
the data into an older retention policy in order to keep what you need in the
database. Back in the day, I wrote this article about <a href="https://gianarb.it/blog/what-is-distributed-tracing-opentracing-opencensus">OpenTracing and
OpenCensus</a>.
This is a follow-up after another year of working around code instrumentation,
observability, and monitoring.</p>
<p>First of all, both of them are vendor-neutral projects that help you instrument
your applications without locking you in with a specific provider. It doesn’t even
need to be a bad, evil vendor. If you use the Prometheus client directly in your code,
everywhere, you will be locked to it forever, or until you find the right
time to migrate your whole codebase. But it sounds like “change your logger”:
something you would like to do magically, in one shot, without wasting your time.</p>
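<p>To make the idea concrete, here is a minimal sketch of vendor-neutral instrumentation.
It does not use any real tracing library; <code>NoopTracer</code> and the function names are
hypothetical. The point is that the business code talks only to a tracer interface, so
swapping Jaeger for Zipkin means changing the object you inject, not the instrumentation
itself.</p>
<pre><code class="language-python">from contextlib import contextmanager

class NoopTracer:
    """Hypothetical stand-in for a concrete tracer (Jaeger, Zipkin, ...)."""
    @contextmanager
    def start_span(self, name):
        span = {"name": name, "tags": {}}
        yield span  # a real tracer would report the span when it closes

def fetch_discount(tracer):
    # The instrumentation depends only on the tracer interface,
    # never on a vendor-specific client.
    with tracer.start_span("fetch_discount") as span:
        span["tags"]["component"] = "discount"
        return 0.1
</code></pre>
<p>Switching vendors then only touches the bootstrap code that builds the tracer, which
is exactly what OpenTracing and OpenCensus promise.</p>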
<p>OpenTracing is 100% for tracing, the problem it solves is about how to
instrument your application to send traces. OpenCensus does the same, plus it
also takes care of metrics.</p>
<p>These two projects have a major issue: they are TWO different projects. They
were not smart enough to agree on the same format, and it split the dev community without
any reason; shame on you! Good for us, they will be
<a href="https://medium.com/opentracing/merging-opentracing-and-opencensus-f0fe9c7ca6f0">merged</a>
at some point into something called OpenTelemetry. Finally!</p>
<p>Another misunderstanding is around how tracers such as Zipkin, Jaeger, and X-Ray
advertise themselves as “OpenTracing compatible”. When I think about “compatible” I
think of something like a REST API that follows some rules, so that SystemA is
compatible with SystemB and you can swap them transparently.</p>
<p>This is not what happens with tracing infrastructure, because you need to
remember that OpenTracing and OpenCensus play at the codebase level; it is not
REST or anything like that.</p>
<p>Compatibility, in this case, means that the
tracers (Zipkin, Jaeger, AWS X-Ray, NewRelic) ship an OpenTracing compatible
library across many languages that you can change in your codebase in order to
point your application to a different tracer without changing the
instrumentation code you wrote.</p>
<p>NB: OpenCensus has the same goal for metrics as well</p>
<pre><code class="language-javascript">// Assumed imports: jaeger-client ships initTracer, opentracing provides the
// vendor-neutral API, and `logger` is the application logger defined elsewhere.
const initJaegerTracer = require("jaeger-client").initTracer;
const opentracing = require("opentracing");

function initTracer(serviceName) {
var config = {
serviceName: serviceName,
sampler: {
type: "const",
param: 1,
},
reporter: {
agentHost: "jaeger-workshop",
logSpans: true,
},
};
var options = {
logger: {
info: function logInfo(msg) {
logger.info(msg, {
"service": "tracer"
})
},
error: function logError(msg) {
logger.error(msg, {
"service": "tracer"
})
},
},
};
return initJaegerTracer(config, options);
}
const tracer = initTracer("discount");
opentracing.initGlobalTracer(tracer);
</code></pre>
<p>This example comes from
<a href="https://github.com/gianarb/shopmany/blob/end/discount/server.js">shopmany</a>, a
test e-commerce site I wrote. In this case the <code>tracer</code> is Jaeger, but if you need
to change to Zipkin you can probably use
<a href="https://github.com/DanielMSchmidt/zipkin-javascript-opentracing">zipkin-javascript-opentracing</a>.</p>
<p>It is important to evaluate an instrumentation library like OpenCensus,
OpenTracing, or OpenTelemetry, because there is a community that writes and supports
libraries across many languages and tracers. It means that you do not really
need to write your own library, which sounds like a bit too much! I was very
frustrated by the fact that these two libraries were TWO! I can’t wait to see
what the result will look like. How easy it is to instrument an application is a
key value for a company like Honeycomb.io, and this sounds like a good reason for
them to have their own instrumentation libraries
(<a href="https://github.com/honeycombio/beeline-go">Go</a>,
<a href="https://github.com/honeycombio/beeline-nodejs">JS</a>,
<a href="https://github.com/honeycombio/beeline-ruby">Ruby</a>); when they started, the
ecosystem was different (it is still a mess today as you read this), but I hope that
OpenTelemetry will push everybody to just work together, because understanding
what is going on in production right now is a hard, messy and amazing challenge.</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">it is so nice to see
how two great open source community such as <a href="https://twitter.com/InfluxDB?ref_src=twsrc%5Etfw">@InfluxDB</a> and <a href="https://twitter.com/ntop_org?ref_src=twsrc%5Etfw">@ntop_org</a> can do
togheter. That's how we can solve observability/monitoring challanges all
togheter <a href="https://twitter.com/Chris_Churilo?ref_src=twsrc%5Etfw">@Chris_Churilo</a></p>—
gianarb (@GianArb) <a href="https://twitter.com/GianArb/status/1126107355895214082?ref_src=twsrc%5Etfw">May
8, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<h2 id="keep-instrumentation">Keep instrumentation</h2>
<p><img src="/img/infinite-loop.png" alt="The infinity symbol" /></p>
<p>The ability to instrument an application quickly and precisely increases your
troubleshooting capabilities. The faster you iterate on your instrumentation code,
the faster you will understand what is going on. It is not a one-shot exercise;
it is something you improve every day based on what you learn. But your
ability to learn depends on how well you can read the language that your
applications expose (let me tell you a secret: it depends on how well you
instrument your code).</p>
<p>More to read:</p>
<ul>
<li><a href="https://medium.com/jaegertracing/jaeger-and-opentelemetry-1846f701d9f2">Jaeger and
OpenTelemetry</a></li>
<li><a href="https://www.honeycomb.io/blog/how-are-structured-logs-different-from-events/">Structured
logs</a></li>
<li><a href="https://gianarb.it/blog/logs-metrics-traces-aggregation">Logs Metrics Traces are equally
useless</a></li>
</ul>
After two years at InfluxDataTwo years at InfluxData. Feeling, sensations, pain point, what I have learned.https://gianarb.it/img/gianarb.png2019-05-23T08:08:27+00:002019-05-23T08:08:27+00:00https://gianarb.it/blog/two-years-at-influxdata<p>Hello dudes! I am gonna write down some rambling today.</p>
<p>I am writing this article during my flight back to KubeCon 2019 in Barcelona. I
had a great time at our booth speaking with community enthusiasts, customers,
and developers. I had to leave a day before for unknown circumstances (I am
crazy).</p>
<blockquote class="tw-align-center twitter-tweet"><p lang="en" dir="ltr">I figure out just in
time that I am suppose to take my flight now and not Friday as my brain
memorized. I am running to the airport! Bye <a href="https://twitter.com/hashtag/KubeCon?src=hash&ref_src=twsrc%5Etfw">#KubeCon</a>
and sorry for everybody I won't say 👋 to as planned! 😢 See y
soon!</p>— gianarb (@GianArb) <a href="https://twitter.com/GianArb/status/1131468802707939329?ref_src=twsrc%5Etfw">May
23, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>A few days ago I realized I have been working at InfluxData for two years; I
think it is time for me to share something with you about my experience over
there.</p>
<h2 id="community-matters">Community matters</h2>
<p>I moved to Dublin 4 years ago and stayed for 1.5 years. I was not able to speak
English, and I knew it was an important skill to learn. So I got a job and I moved there.</p>
<p>I knew it was not gonna be forever because I love the people I left in my hometown and
I like Italy. Luckily for me, I love open source and Dublin has a huge set of
meetups to attend. The Docker Meetup was my favorite one and also the one where
I met <a href="https://twitter.com/tomwillfixit">@tomwillfixit</a> and
<a href="https://twitter.com/jpetazzo">@jpetazzo</a>; the latter pushed me to join the amazing
Captain program at Docker that was just about to start at that time.</p>
<p>Anyway, I used and shared my InfluxDB love with the world even before moving to
Dublin when I worked at <a href="https://twitter.com/corleycloud">@corleycloud</a> with
Walter and we wrote the <a href="https://github.com/corley/influxdb-php-sdk">PHP
influxdb-sdk</a>.</p>
<p>At some point, as you can hear from this podcast from <a href="https://www.stitcher.com/podcast/the-new-stack-makers/e/60409328?autoplay=true">The New
Stack</a>
I obsessed <a href="https://twitter.com/Chris_Churilo">@Chris_Churilo</a> from Influx so
much that she referred me there as an SRE. I was so excited; my practical
interview took three hours of troubleshooting a Go application running in
Docker. I still remember the test, it was fun and friendly.</p>
<p>Anyway, that’s how I got here. Open Source, community, new friends and some luck!</p>
<h2 id="timezone-">Timezone ???</h2>
<p>InfluxData is a remote-friendly company and I was ready to get back to Italy.
Everybody warned me about the complexity hidden behind remote working, but nobody
really told me anything about the fact that all my colleagues would start to work
almost when I am ready to leave! InfluxData is very respectful, and I can say
that in two years I can count on one (maybe two) hands the number of times I
had to open my laptop at some weird hour.</p>
<p><em>I love my laptop, I have it open by myself at a weird time as well!</em></p>
<p>But after two years I need to admit that it requires a great effort from both
sides to work with folks +9 hours away for so long. You need to be good at reaching
out to them, and they need to remember that it is late on your side if they see you
frustrated or tired. But it is fun and you learn a lot about yourself and from
your teammates all around the globe; I think the effort pays back. Everyone
should have the chance to open their mind to cultures that different.</p>
<p>When I joined, there were not that many people in Europe, probably 3-4. Now we are
more; my team is not made up of just myself anymore, there are 6 other people, and
<a href="https://twitter.com/gitirabassi">@gitirabassi</a> is in my timezone. It makes
everything way easier.</p>
<h2 id="unicorn-sf-start-up">Unicorn SF start-up</h2>
<p>This is also my first experience in a “unicorn” startup in US/SF. I do not know
if I can define InfluxData as a unicorn startup, but at every conference I go to, even
in South Africa, there are people that use or know InfluxDB. So I bet we
are <strong>pretty unicorn</strong>. Since I joined I think we have grown at least 4x in
people and we are still growing <a href="https://grnh.se/97725b851">(we are hiring)</a>. It is exciting and stressful.</p>
<p>There are a lot of roles and teams that I had never heard of before in my career,
because I worked a lot with small companies, and I am very happy to hang out and
chat with them when we are face to face, to understand how salespeople follow
customers or how the outbound sales team can make thousands of calls a day to
figure out the right person who should hear about what we do.</p>
<p>Almost all the time, people are the bottleneck, because it is hard to collaborate
well when your work environment keeps changing under your feet. But
that’s how this business works, and there is a lot to learn about how it works
and how to survive it.</p>
<h2 id="i-am-whatever-i-want-and-thats-awesome">I am whatever I want! And that’s awesome</h2>
<p>I started as a web developer 7 years ago. I moved to automation and devops
because I liked making people comfortable and confident in deploying their code
to production; as a developer, I knew the pain, and I was happy to solve it.</p>
<p>As an SRE I helped develop a custom orchestrator for our SaaS with stateful
workloads and databases. I also enjoyed all the tracing and instrumentation
revolution that <strong>observability</strong> pushed.</p>
<p>I love people, and working from home sometimes brings some loneliness to the
table; that’s why we have tech communities. That’s why I organize the CNCF Meetup
in Turin, do open source, and go to conferences. There are millions of ways
to feel less alone, online and obviously offline too: meetups, co-working, friends,
beers, and BBQ.</p>
<p>I think I am a bit tired of working so close to infra, ops, and automation. My
“developer side” is pushing me back to where I started: the code (no, not PHP
at all, I am sorry).</p>
<p>At InfluxData there are a lot of Golang rock stars, so I will look around to
figure out what I am happy to hack on!</p>
<h2 id="conclusion">Conclusion</h2>
<p>I have no idea where I am trying to go with these ramblings. Re-reading what I
wrote, it looks like my way to thank all the people that over these two years helped
me grow and get better. I like to think that MAYBE somebody in the same
situation, having a hard time, will stumble upon this article and
realize that <em>everything will be all right!</em></p>
<p>Keep rocking!</p>
<h1><a href="https://gianarb.it/blog/workshop-design">Workshop Design</a></h1>
<p><em>I recently developed a workshop about application instrumentation and ran it at the CloudConf in Turin. I built it in the open, and I thought it was a nice idea to share why and how I did it. (2019-04-19)</em></p>
<p>Hello sweet internet! At this point, you should know that I am far from being a
solitary coder. I like to share what I learn and to have a chat about what
you are doing. That’s just how it is! Feel free to follow my ramblings on
<a href="https://twitter.com/gianarb">twitter/gianarb</a>.</p>
<p>If you don’t know what I am going to talk about, I can tell you this is another
way to enjoy coding!</p>
<p>Recently a friend of mine who organizes the <a href="https://cloudconf.it">CloudConf</a>
in Turin, Italy asked me if I could deliver a workshop. Let me say THE
workshop: 8 hours of chatting, with exercises and questions.
I did something like that years ago about AngularJS, but hey, this sounded like a
challenge, and I love challenges! So I took it.
<img src="/img/got-your-back.jpg" alt="" />
If you read my recent posts you know I have a passion nowadays:</p>
<ul>
<li><a href="/blog/go-observability-is-for-troubleshooting">Observability is for troubleshooting</a></li>
<li><a href="/blog/high-cardinality-database">You need an high cardinality database</a></li>
<li><a href="/blog/logs-metrics-traces-aggregation">Logs, metrics and traces are equally useless</a></li>
</ul>
<p>The topic was clear; I called it “Application instrumentation”. Lovely!</p>
<p>I am driven by passion and purpose: my passion for troubleshooting and the
purpose of figuring out what the f happens in production. I was ready to work on
it!</p>
<p><img src="/img/passion-fruit.jpg" alt="" /></p>
<h2 id="workshop">Workshop?</h2>
<p>This article is about how I prepared the workshop, and I hope it can help
somebody avoid the same mistakes and also reuse some of the material I
developed.</p>
<p>I made everything open source. There are two new repositories on my GitHub. One contains a fake
e-commerce application I made using four different programming languages:</p>
<ul>
<li>Golang as a frontend proxy with a UI in HTML/jQuery.</li>
<li>Java to do the most secure part of the e-commerce obviously the payment
service.</li>
<li>NodeJS to get discounts for the items.</li>
<li>PHP to get the list of items currently available.</li>
</ul>
<p>You can find the code on
<a href="https://github.com/gianarb/shopmany">github.com/gianarb/shopmany</a>.</p>
<p>I decided to develop a minimal version of the application in order to keep it
reusable for other purposes. It can be used to build a use case for a Kubernetes
deployment, for example, or a CI lesson.</p>
<p>The branch <code>master</code> contains the minimum set of features needed to have an
application that makes some sense. But, for example, the services ship without logs,
metrics, and tracing, because those will be added as exercises by the attendees.</p>
<p>If you check out the workshop you will see in the history a commit for
every exercise and application.</p>
<p>The lessons are available on
<a href="https://github.com/gianarb/workshop-observability">github.com/gianarb/workshop-observability</a>,
every directory is a lesson. The README contains some information about
where we are, why we should care, and one or more exercises to do in practice
in order to get familiar with the concepts.</p>
<p>The lessons I developed for the purpose of the CloudConf workshop are:</p>
<ol>
<li>lesson1 is about designing a health check endpoint. Adding a single endpoint is a good
way to get familiar with a new application, and there is so much to learn about
how to design a good health check endpoint!</li>
<li>lesson2 is about logging and <a href="https://charity.wtf/2019/02/05/logs-vs-structured-events/">structured
logging</a>. I tried
to pick the most popular logging libraries for the languages, logging in
JSON format to open the door for future serialization as events.</li>
<li>lesson3 is about InfluxDB v1 and the TICK stack. The goal was to set up a
monitoring stack that can work with different structures such as events and
traces.</li>
<li>lesson4 is about tracing. Using Jaeger, we instrumented the application and
built a trace.</li>
</ol>
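<p>To give an idea of what lesson1 is after (the handler shape and field names here are mine, not the workshop’s exact exercise), a health check endpoint can run one check per dependency and pick the status code accordingly:</p>
<pre><code class="language-go">package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/http/httptest"
)

// checkResult is the status of a single dependency.
type checkResult struct {
    Name   string `json:"name"`
    Status string `json:"status"`
}

// healthHandler runs every registered check and answers 200 only when all
// dependencies are healthy; otherwise it answers 503 with the details.
func healthHandler(checks map[string]func() error) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        results := []checkResult{}
        code := http.StatusOK
        for name, check := range checks {
            status := "up"
            if err := check(); err != nil {
                status = "down"
                code = http.StatusServiceUnavailable
            }
            results = append(results, checkResult{Name: name, Status: status})
        }
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(code)
        json.NewEncoder(w).Encode(results)
    }
}

func main() {
    // A real service would ping its database or downstream APIs here.
    h := healthHandler(map[string]func() error{
        "db": func() error { return nil },
    })
    rec := httptest.NewRecorder()
    h(rec, httptest.NewRequest("GET", "/healthz", nil))
    fmt.Println(rec.Code, rec.Body.String())
}
</code></pre>
<p>Reporting per-dependency status makes the endpoint useful for humans too, not only for load balancers.</p>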
<p>I have also reported an idea of a possible timeline (the one I used at the
CloudConf):</p>
<p>09.00 Registration and presentation<br />
09.30 - 13.00 Theory</p>
<ul>
<li>Observability vs monitoring</li>
<li>Logs, events, and traces</li>
<li>How a monitoring infrastructure looks like: InfluxDB, Prometheus, Jaeger,
Zipkin, Kapacitor, Telegraf…</li>
<li>Deep dive on InfluxDB and the TICK Stack</li>
<li>Deep dive on Distributed Tracing</li>
</ul>
<p>13.00 - 14.00 Lunch<br />
14.00 - 17.00 Let’s make our hands dirty<br />
17.30 - 18.00 Recap, questions, and so on</p>
<h2 id="learning-during-the-development">Learning during the development</h2>
<p>I like preparing slides, posts, and workshops because I learn a lot along the
way, about concepts that I usually develop during a long and frustrating set of
attempts, or by reading a lot of blog posts, books, and code. Writing about it helps me
put together what I learned by developing easy-to-understand materials.</p>
<p>This workshop was not a special case. It became clear to me that even with all
that is going on around OpenCensus, OpenTracing, and other instrumentation
libraries, there is still room for improvement.</p>
<p>Instrumenting an application is not just a matter of adding <code>printf</code>
around the code anymore. It is the way we write an
application capable of being debugged, one that speaks to the outside world in an
understandable way.</p>
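<p>As a sketch of where lesson2 points (the event fields here are illustrative, not the workshop’s exact exercise), a structured log line is just a serialized event rather than an interpolated message string:</p>
<pre><code class="language-go">package main

import (
    "encoding/json"
    "os"
)

// event is one structured log line: every field stays machine-parsable,
// which opens the door to later serialization and aggregation as events.
type event struct {
    Level   string `json:"level"`
    Msg     string `json:"msg"`
    TraceID string `json:"trace_id,omitempty"`
    OrderID int    `json:"order_id,omitempty"`
}

// logEvent writes the event as one JSON line on stdout, the shape every
// logging library in the exercises converges to.
func logEvent(e event) {
    json.NewEncoder(os.Stdout).Encode(e)
}

func main() {
    logEvent(event{Level: "info", Msg: "payment accepted", TraceID: "abc123", OrderID: 42})
}
</code></pre>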
<p>The course has two different sections: theory and practice.</p>
<p>The theory went well. I do not have a lot to say about it; it is where
I am most comfortable, because it looks like a long talk.</p>
<p>The practical part was for sure a bit too long, and I didn’t have time to walk
everyone through it. But the fact that all the solutions and their
purpose are written down helped people feel less lost, and everyone can follow the
solution even if they cannot do the exercise in practice.</p>
<p>This usually happens because of different skill sets or trouble configuring
the environment.</p>
<p><code>git</code> helped me a lot: every commit has a diff that I used to explain the
solution of each lesson. People who were not confident writing the solution in a
particular language could just <code>cherry-pick</code> the commit for the language they
didn’t know.</p>
<h2 id="collaboration">Collaboration</h2>
<p>The practical part was designed to be a collaboration between people. IMHO it
helps attendees feel less “at school” and more like a team, which is something we
should feel more comfortable with at work.</p>
<p>I think it worked, but not that well. People were supporting and helping each
other, but I probably need to cut the lessons in a different way. I think I will
remove the <code>influxdb</code> lesson, injecting only what
matters for the course into the other lessons. Next time I will develop a
new lesson about how to parse the logs and push them to InfluxDB, for example.
(Let me know if you would like to help me!)</p>
<h2 id="feedback">Feedback</h2>
<p>I asked the attendees to fill in a survey before the end of the course to help me
gauge their feelings. There is a lot to do, and some of their feedback is part
of this article. But in general I am happy, because I have all the material in
order, and this was just a first iteration. I hope to make it better, to
gather more feedback from the open source community, and to run it again! So let me know if
you would like to have me on board!</p>
<h2 id="next">Next</h2>
<p>As I said, instrumentation is hard, and I am still hoping for an easier solution
across languages. I tried OpenCensus, but I didn’t manage to get it running; I
was in a rush, so I used Jaeger.</p>
<p>I will develop something about structured logging as I said for sure.</p>
<p>I hope to add a lesson from an as-a-service provider like Honeycomb, for
example.</p>
<h2 id="fun-fact">Fun Fact</h2>
<p>The youngest person in the room was a student in high school! Wow!</p>
<h1><a href="https://gianarb.it/blog/go-observability-is-for-troubleshooting">Observability is for troubleshooting</a></h1>
<p><em>The difference between monitoring and observability is that observability is for troubleshooting. And you troubleshoot in any environment, not only in production. This article shows how I do observability in one of my Go applications. (2019-02-28)</em></p>
<p>Monitoring notifies you when something does not work. You get an alert, a slap in
the face based on the priority of the issue. Observability is about
troubleshooting, debugging, “looking around.” You don’t use observability
techniques only when something doesn’t work.</p>
<p>Mainly because you don’t know where it happens, it can be anytime.
You observe during development, locally or in production, anytime.</p>
<p>The ability to use the same observability tools and techniques, such as tracing,
log analysis, and metrics, in every environment is a tremendous value. You get used to them day by day
and not only under pressure during an outage.
A practical tip that I can give you when you are instrumenting an application
is about interconnection: you need a way to connect logs with traces and with
metrics.</p>
<p>There is nothing too complicated to understand. Every HTTP request has its own
generated ID.</p>
<p>This ID will become the trace ID, and it will be attached to all the logs
generated by that request.
One of the applications I instrumented uses
<a href="https://github.com/opentracing/opentracing-go">opentracing/opentracing-go</a> and
<a href="https://github.com/uber-go/zap">uber-go/zap</a> as the logger. I use a middleware
similar to the one provided by
<a href="https://github.com/opentracing-contrib/go-stdlib/blob/master/nethttp/server.go">opentracing-contrib/go-stdlib</a>.</p>
<p>Inside an HTTP handler, I configure the logger to add the <code>trace_id</code> for every
log:</p>
<pre><code class="language-go">// GetLogger returns the application-wide *zap.Logger.
logger := GetLogger().With(zap.String("api.handler", "ping"))
if intTraceId := req.Context().Value("internal_trace_id"); intTraceId != nil {
    logger = logger.With(zap.String("trace_id", intTraceId.(string)))
}
</code></pre>
<p>In this way, from this point on, the <code>logger</code> will add the trace_id to every line of log.</p>
<p>With this code, <code>req.Context().Value("internal_trace_id")</code>, I am retrieving the
“trace_id” from the context. In Go every HTTP request has a context attached, and
this works because inside the middleware I set the trace_id in the context of the
request and also as an HTTP header:</p>
<pre><code class="language-go">// This is a temporary fix until this issue will be addressed
// https://github.com/opentracing/opentracing-go/issues/188
// This works only with Zipkin.
zipkinSpan, ok := sp.Context().(zipkin.SpanContext)
if ok == true && zipkinSpan.TraceID.Empty() == false {
    w.Header().Add("X-Trace-ID", zipkinSpan.TraceID.ToHex())
    r = r.WithContext(context.WithValue(r.Context(), "internal_trace_id", zipkinSpan.TraceID.ToHex()))
}
</code></pre>
</code></pre>
<p>Having the <code>trace_id</code> exposed as a header is nice because I can ask and train
everyone to just grab that parameter when they have issues. Or we can write the
API consumer so that it takes care of this value when something doesn’t go
as expected.</p>
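<p>A consumer coded that way could look like this sketch (the fake server and the error message are mine; only the <code>X-Trace-ID</code> header comes from the middleware above):</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
)

// callAPI performs the request and, when the server answers with an error,
// returns the trace ID it exposed so the caller can quote it in a report.
func callAPI(client *http.Client, url string) error {
    resp, err := client.Get(url)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 500 {
        // The header set by the tracing middleware on the server side.
        return fmt.Errorf("request failed, quote trace_id=%s when reporting this", resp.Header.Get("X-Trace-ID"))
    }
    return nil
}

func main() {
    // A fake server standing in for the instrumented API.
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("X-Trace-ID", "4bf92f3577b34da6")
        w.WriteHeader(http.StatusInternalServerError)
    }))
    defer srv.Close()

    if err := callAPI(srv.Client(), srv.URL); err != nil {
        fmt.Println(err)
    }
}
</code></pre>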
<p>All these connections are useful to build a context from different sources. This
is the secret for happiness and Welcome to my Wonderland!</p>
<p><img src="/img/alice-observability.jpg" alt="" /></p>
<h1><a href="https://gianarb.it/blog/go-parallelization-trick">From sequential to parallel with Go</a></h1>
<p><em>From a sequence of actions to parallelization in Go, using channels and wait groups from the sync package. (2019-02-21)</em></p>
<p>Everything starts as a sequence of events. You have a bunch of things to do and
you are not sure how long they will take or how hard they will be to manage.</p>
<p>As a pragmatic developer, you go over the list of things, and you make them one
by one. The script runs, it works, and everyone is happy.</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "log"
    "time"
)

func main() {
    list := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "l"}
    for _, v := range list {
        v, err := do(v)
        if err != nil {
            log.Printf("nop")
        }
        fmt.Println(v)
    }
}

func do(s string) (string, error) {
    time.Sleep(1 * time.Second)
    return fmt.Sprintf("%s-%d", s, time.Now().UnixNano()), nil
}
</code></pre>
<p>Let’s execute it:</p>
<pre><code>$ time go run c.go
a-1550742371537033061
b-1550742372537419148
c-1550742373537846015
d-1550742374538086031
e-1550742375538488129
f-1550742376538746707
g-1550742377539047837
h-1550742378539540979
i-1550742379539938404
l-1550742380540339887
real 0m10.174s
user 0m0.149s
sys 0m0.074s
</code></pre>
<p>Until something changes from the outside, the outside world is a terrible place.</p>
<p><img src="https://media.giphy.com/media/124pc9nFq7ZScU/giphy.gif" alt="" /></p>
<p>The list of things to do grows too much, and your program runs too slow to be
competitive, so you start to think about parallelization.</p>
<p>Luckily for you, every action doesn’t depend on anything else, so you don’t need
to stop if one of them fails or, even worse, do anything weird:
you skip it, and you log the failure.</p>
<p>There is an easy way to migrate the code above to something that safely runs
in parallel using just some built-in Go features: channels and
WaitGroups.</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "log"
    "sync"
    "time"
)

func main() {
    fmt.Println("Start")
    parallelization := 2
    list := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "l"}
    c := make(chan string)
    var wg sync.WaitGroup
    wg.Add(parallelization)
    for ii := 0; ii < parallelization; ii++ {
        go func(c chan string) {
            for {
                v, more := <-c
                if more == false {
                    wg.Done()
                    return
                }
                v, err := do(v)
                if err != nil {
                    log.Printf("nop")
                }
                fmt.Println(v)
            }
        }(c)
    }
    for _, a := range list {
        c <- a
    }
    close(c)
    wg.Wait()
    fmt.Println("End")
}

func do(s string) (string, error) {
    time.Sleep(1 * time.Second)
    return fmt.Sprintf("%s-%d", s, time.Now().UnixNano()), nil
}
</code></pre>
</code></pre>
<p><code>parallelization</code> should be an external parameter that you can change to
parallelize more or less. With a parallelization factor of 2 the benchmark looks
like:</p>
<pre><code class="language-bash">$ time go run c.go
Start
a-1550742531701829912
b-1550742531701820924
d-1550742532702088077
c-1550742532702180981
e-1550742533702473002
f-1550742533703389899
g-1550742534702714251
h-1550742534703981070
i-1550742535702992582
l-1550742535704308486
End
real 0m5.269s
user 0m0.249s
sys 0m0.078s
</code></pre>
<p>Almost half of the time. Let’s try with 5.</p>
<pre><code class="language-bash">$ time go run c.go
Start
e-1550742633337320607
b-1550742633337280491
c-1550742633337474112
d-1550742633337280481
a-1550742633337298154
h-1550742634338002235
i-1550742634338073772
f-1550742634338033897
g-1550742634338019639
l-1550742634338231670
End
real 0m2.145s
user 0m0.144s
sys 0m0.058s
</code></pre>
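<p>To make the factor external, as suggested above, a command-line flag is enough. This is my sketch; the flag name is not from the original program:</p>
<pre><code class="language-go">package main

import (
    "flag"
    "fmt"
)

func main() {
    // Expose the factor as a flag so it can be tuned per run without
    // recompiling; 2 is just a default, pick what your workload tolerates.
    parallelization := flag.Int("parallelization", 2, "number of concurrent workers")
    flag.Parse()
    fmt.Println("running with", *parallelization, "workers")
}
</code></pre>
<p>Run it as <code>go run c.go -parallelization=5</code> to reproduce the second benchmark.</p>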
<p>I wrote this article because I like how easy it was to make this use case run in
parallel. Depending on how complicated your <code>do</code> function is, you may need to be more
careful.</p>
<p>If your <code>do</code> function calls an external service, it can fail, or the service can rate
limit you because you are parallelizing too much. But these are all problems that
you can solve by increasing the number of safeguards in your code.</p>
<p>Something I learned using this pattern while calling AWS intensively to take snapshots is
that EC2 snapshots happen in the background on AWS, so if you have
thousands of nodes and you call AWS, it will rate limit you, or you won’t get an
accurate picture of what actually happens on the AWS side.</p>
<p>A basic trick is to add a <code>batch delay</code> parameter that sleeps before every
execution:</p>
<pre><code class="language-go">v, more := <-c
if more == false {
    wg.Done()
    return
}
// batchDelay is the external knob, e.g. 100*time.Millisecond.
time.Sleep(batchDelay)
v, err := do(v)
if err != nil {
    log.Printf("nop")
}
fmt.Println(v)
</code></pre>
<p>This is a very crafty fix, but if you catch this problem like I did, when everything
is failing, it is a safe bet you should try.</p>
<p>Parallelization is fun, but in reality it increases complexity. Go serves you
primitives that are solid foundations, but it is up to you to instrument your code
well enough to be confident about how it works.</p>
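<p>A minimal form of that instrumentation, assuming nothing beyond the standard library, is a pair of atomic counters that summarize how the parallel run actually went:</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var ok, failed int64
    var wg sync.WaitGroup
    jobs := []int{1, 2, 3, 4, 5}

    for _, j := range jobs {
        wg.Add(1)
        go func(j int) {
            defer wg.Done()
            // Stand-in for the real work: even jobs "fail" here.
            if j%2 == 0 {
                atomic.AddInt64(&failed, 1)
                return
            }
            atomic.AddInt64(&ok, 1)
        }(j)
    }
    wg.Wait()
    // This summary is the minimum you want before trusting a parallel run.
    fmt.Printf("ok=%d failed=%d\n", ok, failed)
}
</code></pre>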
<p>I will write the next chapter about this where I will use opencensus or
opentracing to trace what is going on here!</p>
<h1><a href="https://gianarb.it/blog/infra-as-code-short-long-ttl-resource">Short TTL vs Long TTL infrastructure resources</a></h1>
<p><em>I call this framework “short vs long TTL”. GitOps and infrastructure as code are hot topics today; the infrastructure is more dynamic, and YAML doesn’t look like a great solution anymore. In this article I explain a framework I use to decide when a resource is still a good fit for the old way. (2019-02-14)</em></p>
<blockquote class="tw-align-center twitter-tweet"><p lang="en" dir="ltr">I am excited to listen
to a lot of ideas and pains about infra as code and yaml. Everyone is more or
less walking in the same direction. This is what I have in my mind atm. More
will come. Short TTL vs Long TTL resources <a href="https://t.co/XRCOgbB3Rg">https://t.co/XRCOgbB3Rg</a></p>—
pilesOfAbstractions (@GianArb) <a href="https://twitter.com/GianArb/status/1095960644195680257?ref_src=twsrc%5Etfw">February
14, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Recently I gave a talk at the ConfigManagementCamp about “Infrastructure as
code”
<a href="https://speakerdeck.com/gianarb/cfgmgmtcamp-infrastructure-as-code-should-contain-code">(slides)</a>
and I wrote an article about <a href="/blog/infrastructure-as-real-code">infrastructure as {real}
code</a>.</p>
<p>This post is a follow-up focused on how I identify YAML-friendly resources vs.
something else.</p>
<p>I don’t hate YAML; I think it is a functional specification language, well
supported by a lot of different languages. It works, and I use it when I need to
write parsable, human-friendly files.</p>
<p><img src="https://media.giphy.com/media/1Mng0gXC5Tpcs/giphy.gif" alt="" /></p>
<p>In infrastructure as code, a resource can mean almost anything: a subnet, an EC2
instance, a virtual machine, a DNS record, or a pod.</p>
<p>I refer to a single unit you can describe as a <strong>resource</strong>. The name probably
comes from too much CloudFormation specification that I wrote over these years.</p>
<p><strong>Short TTL vs. Long TTL</strong> are two different categories that I use to identify
them. The resources during the evolution of your infrastructure can move between
groups.</p>
<p><strong>Long TTL</strong> resources are the ones that don’t change much. For example, an AWS
VPC currently doesn’t change: it gets deleted or replaced, but you cannot
change its CIDR. A Route53 hosted zone doesn’t change that often. I am more
confident about using specification languages and traditional tools like
Terraform, CloudFormation, or kubectl and YAML for these resources.</p>
<p><strong>Short TTL</strong> resources change often: Kubernetes Deployments and StatefulSets,
Route53 DNS records in my case, or Auto Scaling Groups. Managing the lifecycle of
these kinds of resources via YAML requires a lot of automation and file
manipulation that I don’t think is safe to do. For them, I much prefer to interact
with the API of my provider, e.g. AWS or Kubernetes. To avoid programs
that parse and modify YAML or JSON to deploy a slightly different version of a
template, I prefer to manipulate actual code. It is what I do every day: I have
testing frameworks, libraries, and a lot more patterns to use.</p>
<p><img src="/img/shortlongttl.png" alt="" class="img-fluid" /></p>
<p>The location of a resource is dynamic; it can jump from one category to another
based on architectural decisions. One example I have is with AWS Auto Scaling
Groups. I like to use them to manage Kubernetes nodes (workers). At the
beginning, when you need a k8s cluster to play with, I usually create one
autoscaling group with n replicas of the node. The node, as its last command,
joins the cluster via kubeadm. Easy as it sounds. In this case there is a single
autoscaling group, and it doesn’t change that often.</p>
<p>When your use case becomes more realistic, you need more complicated topologies. You need pods to
go on different nodes with more RAM or more CPU, or at least you need to add labels
or taints to your cluster to keep pods far from or close to others. This means
that you end up having more autoscaling groups with different configurations, and
usually they go away and get replaced very often, with varying versions of
Kubernetes and so on. This dynamism brought, as a side effect, the request for a
friendlier UX for ops, in our case integrated with kubectl for example.
That’s when we promoted autoscaling groups from a long TTL to a short TTL
resource: we developed a K8s CRD to create autoscaling groups and so on.</p>
<p>The missing part is the <strong>reconciliation</strong> between long TTL and short TTL. As
you can see, you end up having YAML or JSON in a repository for the long TTL resources
and API requests for the short TTL ones. It means that you cannot tell what the
situation of your short TTL resources is by looking at your repository. You can see
what you run via the Kubernetes API, but that’s not what I am looking for. I
think GitOps can fix the issue, but I will write more after more tests.</p>
<p>I tried to make these concepts as clear as possible but let me know what you
think via twitter <a href="https://twitter.com/gianarb">@gianarb</a></p>
<h1><a href="https://gianarb.it/blog/kubernetes-shared-informer">Extend Kubernetes via a Shared Informer</a></h1>
<p><em>Kubernetes is designed to be extended; Custom Resource Definitions are one way to do it. Kubernetes has an event-based architecture, and you can use a primitive called a shared informer to listen to the events triggered by k8s itself. (2019-02-07)</em></p>
<p>Kubernetes runs a set of controllers to keep matching the current state of a
resource with its desired state. It can be a Pod, a Service, or whatever else
can be controlled via Kubernetes.
K8s has <em>extensibility</em> as a core value, empowering operators and applications to
expand its set of capabilities. It is an event-based architecture where everything
that matters gets converted into an event that can trigger custom code.</p>
<p>When I think about a problem that requires taking action when Kubernetes
does something, my first target is one of the events it triggers, for example:</p>
<ul>
<li>New Pod Created</li>
<li>New Node Joined</li>
<li>Service Removed</li>
</ul>
<p>…and many, many more.</p>
<p>To stay informed about when these events get triggered, you can use a primitive
exposed by Kubernetes through
<a href="https://github.com/kubernetes/client-go">client-go</a>, called SharedInformer,
which lives in the cache package. Let’s see how it works in practice.</p>
<p>First of all, like every application that interacts with Kubernetes, you need to
build a client:</p>
<pre><code class="language-go">// import "os"
// import corev1 "k8s.io/api/core/v1"
// import "k8s.io/client-go/kubernetes"
// import "k8s.io/client-go/tools/clientcmd"

// Set the kubernetes config file path as environment variable
kubeconfig := os.Getenv("KUBECONFIG")

// Create the client configuration
config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
if err != nil {
    logger.Panic(err.Error())
    os.Exit(1)
}

// Create the client
clientset, err := kubernetes.NewForConfig(config)
if err != nil {
    logger.Panic(err.Error())
    os.Exit(1)
}
</code></pre>
<p>As you can see, I am commenting the code almost line by line to give you a good
understanding of what is going on. Now that you have the client, we can create
the SharedInformerFactory. A shared informer listens to a specific resource; the
factory helps you create the one you need. For this example, we look up the Pod
SharedInformer:</p>
<pre><code class="language-go">// import v1 "k8s.io/api/core/v1"
// import "k8s.io/client-go/informers"
// import "k8s.io/client-go/tools/cache"
// import "k8s.io/apimachinery/pkg/util/runtime"

// Create the shared informer factory and use the client to connect to
// Kubernetes
factory := informers.NewSharedInformerFactory(clientset, 0)

// Get the informer for the right resource, in this case a Pod
informer := factory.Core().V1().Pods().Informer()

// Create a channel to stop the shared informer gracefully
stopper := make(chan struct{})
defer close(stopper)

// Kubernetes provides a utility to handle API crashes
defer runtime.HandleCrash()

// This is the part where your custom code gets triggered based on the
// event that the shared informer catches
informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    // When a new pod gets created
    AddFunc: func(obj interface{}) { panic("not implemented") },
    // When a pod gets updated
    UpdateFunc: func(interface{}, interface{}) { panic("not implemented") },
    // When a pod gets deleted
    DeleteFunc: func(interface{}) { panic("not implemented") },
})

// You need to start the informer; in my case, it runs in the background
go informer.Run(stopper)
</code></pre>
<p>Knowing about Shared Informers gives you the ability to extend Kubernetes
quickly. As you can see it is not a significant amount of code, the interfaces
are pretty clear.</p>
<h2 id="use-cases">Use cases</h2>
<p>I used them a lot to write dirty hacks but also to close automation gaps in a system, for example:</p>
<ol>
<li>We used to have a very annoying error during the creation of a Pod with a
persistent volume. It was not a high-rate error; a restart made everything
work as expected. The dirty hack is pretty clear: I automated the manual
process of restarting the pod with that error using a shared informer just
like the one I showed you.</li>
<li>I am using AWS, and I would like to push some EC2 tags down as kubelet
labels. I use a shared informer, but this time to watch when a new node joins
the cluster. From the new node I can get its AWS instance ID (it is a label
itself), and with the AWS API I can retrieve its tags to decide how to
edit the node itself via the Kubernetes API. Everything is part of the <code>AddFunc</code>
in the shared informer itself.</li>
</ol>
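<p>The heart of the second use case is a pure tag-to-label mapping, which you can sketch (and test) without any AWS or Kubernetes client. The prefixes below are my own convention, not a Kubernetes or AWS standard:</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "sort"
    "strings"
)

// tagsToLabels converts EC2 tags into kubelet label key/values, keeping only
// the tags under an agreed prefix and rewriting them into a label namespace.
func tagsToLabels(tags map[string]string) map[string]string {
    labels := map[string]string{}
    for k, v := range tags {
        if !strings.HasPrefix(k, "k8s.io/") {
            continue
        }
        labels["node.example.com/"+strings.TrimPrefix(k, "k8s.io/")] = v
    }
    return labels
}

func main() {
    labels := tagsToLabels(map[string]string{
        "k8s.io/team": "platform",
        "billing":     "infra", // dropped: not under the agreed prefix
    })
    // Sort the keys so the output is deterministic.
    keys := make([]string, 0, len(labels))
    for k := range labels {
        keys = append(keys, k)
    }
    sort.Strings(keys)
    for _, k := range keys {
        fmt.Printf("%s=%s\n", k, labels[k])
    }
}
</code></pre>
<p>Inside the real <code>AddFunc</code>, the output of this function would be applied to the node object via the Kubernetes API.</p>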
<h2 id="complete-example">Complete Example</h2>
<p>This example is a functioning Go program that logs when a new node carrying a
particular label joins the cluster:</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "log"
    "os"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/util/runtime"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
)

const (
    // K8S_LABEL_AWS_REGION is the key name to retrieve the region from a
    // Node that runs on AWS.
    K8S_LABEL_AWS_REGION = "failure-domain.beta.kubernetes.io/region"
)

func main() {
    log.Print("Shared Informer app started")
    kubeconfig := os.Getenv("KUBECONFIG")
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        log.Panic(err.Error())
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Panic(err.Error())
    }
    factory := informers.NewSharedInformerFactory(clientset, 0)
    informer := factory.Core().V1().Nodes().Informer()
    stopper := make(chan struct{})
    defer close(stopper)
    defer runtime.HandleCrash()
    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: onAdd,
    })
    go informer.Run(stopper)
    if !cache.WaitForCacheSync(stopper, informer.HasSynced) {
        runtime.HandleError(fmt.Errorf("Timed out waiting for caches to sync"))
        return
    }
    <-stopper
}

// onAdd is the function executed when the kubernetes informer notifies the
// presence of a new kubernetes node in the cluster
func onAdd(obj interface{}) {
    // Cast the obj as a node
    node := obj.(*corev1.Node)
    _, ok := node.GetLabels()[K8S_LABEL_AWS_REGION]
    if ok {
        fmt.Printf("It has the label!")
    }
}
</code></pre>
<h1><a href="https://gianarb.it/blog/serverless-means-extendibility">Serverless means extendibility</a></h1>
<p><em>Looking at the GitHub Actions design and connecting the dots, I think I got why serverless is useful: it is a great mechanism to extend platforms and SaaS. (2019-01-22)</em></p>
<p>I wrote an article about a <a href="/blog/kubernetes-GitHub-action">GitHub Action</a> I
recently created to deploy my code to kubernetes. Very nice. Writing the action
and the post, I realized what serverless is all about. I wrote it in the
incipit of the article, but I think this topic deserves its dedicated post.
Serverless is not yet for web applications. I know some of you will probably
disagree but this is my blog, and that’s why I have one, to write whatever I
like!</p>
<p><img src="/img/brave_dad.png" alt="" /></p>
<p>I used Lambda and API Gateway to distribute two PDFs I wrote about <a href="https://scaledocker.com">“how to scale
Docker”</a>, and it looked to me way more complicated than a Go
daemon. I wrote them that way because I got the free tier and because I like to try
new things. There are excellent applications written that way, for example
<a href="https://acloud.guru/">acloud.guru</a>, but I am probably not ready for that! My bad.</p>
<p>Anyway, I know what I am ready for: we should use serverless to offer
extensibility for our as-a-service platforms.</p>
<p>Good for us, distributed systems and hipster applications are all based on
events, Kafka, and so on. Plus now we have
<a href="https://github.com/opencontainers/runc">runC</a>,
<a href="https://github.com/moby/buildkit">buildkit</a>, and a lot of the building blocks
useful to implement a solid serverless offering.</p>
<p>It is not easy; at scale this is a complicated problem. But we are in a better
situation now, and it is a massive improvement from a product perspective:</p>
<ol>
<li>Using containers, we can offer total isolation, and we can take a very
carefully and self defensive approach.</li>
<li>An API already provides extendibility but, you still need to have your server
and to run your application by yourself to enjoy them. With a serverless
approach, it will be much easy for the customer to implement their workflow.</li>
<li>You can ask your customer to share their implementation creating a vibrant
and virtuous ecosystem.</li>
</ol>
<p>You can use a subset of the events that you write to Kafka as triggers for the
functions, Vault to store the secrets that will be injected into the service, and
so on.</p>
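<p>To make the idea concrete, here is a minimal sketch in Go of that extension mechanism: a dispatcher that routes a subset of events to customer-supplied functions. The <code>Event</code>, <code>Handler</code>, and <code>Dispatcher</code> types are invented for this illustration; a real platform would consume the events from Kafka and run the handlers inside isolated containers.</p>
<pre><code class="language-golang">package main

import "fmt"

// Event is a hypothetical stand-in for a record consumed from Kafka.
type Event struct {
	Topic string
	Type  string
	Body  string
}

// Handler is a customer-supplied function: the "serverless" unit.
type Handler func(Event) string

// Dispatcher routes only the subscribed subset of events to customer functions.
type Dispatcher struct {
	routes map[string][]Handler
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{routes: map[string][]Handler{}}
}

// On subscribes a handler to a given event type.
func (d *Dispatcher) On(eventType string, h Handler) {
	d.routes[eventType] = append(d.routes[eventType], h)
}

// Dispatch invokes every handler registered for the event's type and
// collects their results.
func (d *Dispatcher) Dispatch(e Event) []string {
	var out []string
	for _, h := range d.routes[e.Type] {
		out = append(out, h(e))
	}
	return out
}

func main() {
	d := NewDispatcher()
	// A customer function triggered only by "instance.created" events.
	d.On("instance.created", func(e Event) string {
		return "provisioning hook for " + e.Body
	})
	got := d.Dispatch(Event{Topic: "platform", Type: "instance.created", Body: "vm-42"})
	fmt.Println(got[0])
}
</code></pre>
<p>The platform owns the loop that feeds <code>Dispatch</code>; the customer only writes handlers, which is exactly the extensibility contract.</p>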
<p><img src="/img/heart.jpg" alt="" /></p>
<p>There is a lot more, but I am excited! Is somebody doing something like that? If
so, let me know <a href="https://twitter.com/gianarb">@gianarb</a>, I would like to chat!</p>
GitHub actions to deliver on kubernetesGitHub recently released a new feature called GitHub Actions. They are a serverless approach to allow developers to run their own code based on what happens to a particular repository. They are amazing for continuous integration and delivery. I used them to deploy and validate kubernetes code.https://gianarb.it/img/gianarb.png2019-01-22T08:08:27+00:002019-01-22T08:08:27+00:00https://gianarb.it/blog/kubernetes-github-action<p>Recently GitHub released a new feature called Actions. To me, it looks like the
best implementation of serverless I can think of. I used AWS Lambda and API
Gateway for some basic APIs, and I wrote a prototype of an application capable of
running functions using containers called
<a href="https://github.com/gianarb/gourmet">gourmet</a>, but I don’t buy that serverless will
make my code easier to manage. At least not for writing APIs or web applications.</p>
<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">I used the <a href="https://twitter.com/hashtag/GitHubActions?src=hash&ref_src=twsrc%5Etfw">#GitHubActions</a>
to verify and deploy code to a <a href="https://twitter.com/hashtag/kubernetes?src=hash&ref_src=twsrc%5Etfw">#kubernetes</a>
cluster <a href="https://t.co/nfkjmYKPKs">https://t.co/nfkjmYKPKs</a> I am
impressed about how wonderful this feature is designed and implemented! <a href="https://twitter.com/github?ref_src=twsrc%5Etfw">@Github</a> you
🤘!</p>— :w !sudo tee % (@GianArb) <a href="https://twitter.com/GianArb/status/1087640589838008321?ref_src=twsrc%5Etfw">January
22, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>That’s why I like what GitHub did: they used serverless for what I think
it is designed for, extensibility.</p>
<p>GitHub Actions, just like Lambda functions on AWS, are a powerful, managed way
to extend a product straightforwardly.</p>
<p>With AWS Lambda you can hook your code to almost any event that happens: EC2
creation and termination, Route53 DNS record changes, and a lot more. You don’t need
to run a server: you load your code, and it just works.</p>
<p>Jess Frazelle wrote a blog post about <a href="https://blog.jessfraz.com/post/the-life-of-a-github-action/">“The Life of a GitHub
Action”</a>, and I
decided to try something I had had in mind for a couple of weeks but that required a
CI server, which was already too much for me.</p>
<p>From time to time I like the idea of having a kubernetes cluster that I can use for
testing purposes, so I created a private repository. It is not ready to be
open sourced because it is a mess, with secrets inside and so on.</p>
<p><img src="/img/sorry.jpg" alt="" /></p>
<p>In any case, to give you an idea, this is the project’s folder:</p>
<pre><code>├── .github
│ ├── actions
│ │ ├── deploy
│ │ │ ├── deploy
│ │ │ └── Dockerfile
│ │ └── dryrun
│ │ ├── Dockerfile
│ │ └── dryrun
│ └── main.workflow
└── kubernetes
├── digitalocean.yaml
├── external-dns.yaml
├── micro.yaml
├── namespaces.yaml
├── nginx.yaml
└── openvpn.yaml
</code></pre>
<p>The <code>kubernetes</code> directory contains all the things I would like to install in my
cluster. For every new push to this repository, I would like to check whether it can
be applied to the kubernetes cluster with the command <code>kubectl apply -f
./kubernetes --dry-run</code>, and when the PR is merged the changes should get applied.</p>
<p>So I created my workflow in <code>.github/main.workflow</code> (I left some comments to
make it understandable):</p>
<pre><code>## Workflow defines what we want to call a set of actions.
## For every new push, check if the changes can be applied to kubernetes
## using the action called: kubectl dryrun
workflow "after a push check if they apply to kubernetes" {
  on = "push"
  resolves = ["kubectl dryrun"]
}

## When a PR is merged trigger the action: kubectl deploy, to apply the new code to master.
workflow "on merge to master deploy on kubernetes" {
  on = "pull_request"
  resolves = ["kubectl deploy"]
}

## This is the action that checks if the push can be applied to kubernetes
action "kubectl dryrun" {
  uses = "./.github/actions/dryrun"
  secrets = ["KUBECONFIG"]
}

## This is the action that applies the change to kubernetes
action "kubectl deploy" {
  uses = "./.github/actions/deploy"
  secrets = ["KUBECONFIG"]
}
</code></pre>
<p>The <code>secrets</code> are an array of environment variables whose values you can set
from the outside. If your account has GitHub Actions enabled there is a
new tab called “Secrets” inside the Settings of every repository.</p>
<p>You can set key-value pairs usable as you see in my workflow. For this example,
I set the <code>KUBECONFIG</code> as the base64 of a kubeconfig file that allows the GitHub
Action to authorize itself to my Kubernetes cluster.</p>
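<p>To show what happens on both sides of that secret, here is a small Go sketch of the encode/decode roundtrip; the kubeconfig content below is a made-up placeholder. You would produce the secret locally with something like <code>base64 kubeconfig.yaml</code>, and the action reverses it with <code>base64 -d</code>:</p>
<pre><code class="language-golang">package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

func main() {
	// What you would paste into the "Secrets" tab: the base64 of the kubeconfig.
	kubeconfig := "apiVersion: v1\nkind: Config\n"
	secret := base64.StdEncoding.EncodeToString([]byte(kubeconfig))

	// What the action does on the other side: decode the secret, the Go
	// equivalent of `echo "$KUBECONFIG" | base64 -d > ./kubeconfig.yaml`.
	decoded, err := base64.StdEncoding.DecodeString(secret)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid secret:", err)
		os.Exit(1)
	}
	fmt.Print(string(decoded))
}
</code></pre>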
<p>Both actions are similar; the first one is in the directory
<code>.github/actions/dryrun</code>.</p>
<pre><code>├── .github
├── actions
└── dryrun
├── Dockerfile
└── dryrun
</code></pre>
<p>It contains a Dockerfile</p>
<pre><code>FROM alpine:latest
## The action name displayed by GitHub
LABEL "com.github.actions.name"="kubectl dryrun"
## The description for the action
LABEL "com.github.actions.description"="Check the kubernetes change to apply."
## https://developer.github.com/actions/creating-github-actions/creating-a-docker-container/#supported-feather-icons
LABEL "com.github.actions.icon"="check"
## The color of the action icon
LABEL "com.github.actions.color"="blue"
RUN apk add --no-cache \
bash \
ca-certificates \
curl \
git \
jq
RUN curl -L -o /usr/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.13.0/bin/linux/amd64/kubectl && \
chmod +x /usr/bin/kubectl && \
kubectl version --client
COPY dryrun /usr/bin/dryrun
CMD ["dryrun"]
</code></pre>
<p>As you can see to describe an action, you need just a Dockerfile, and it works
the same as in docker. The CMD <code>dryrun</code> is the bash script I copied here:</p>
<pre><code class="language-bash">#!/bin/bash

main() {
  echo ">>>> Action started"
  # Decode the secret passed by the action and paste the config in a file.
  echo "$KUBECONFIG" | base64 -d > ./kubeconfig.yaml
  echo ">>>> kubeconfig created"
  # Check if the kubernetes directory has changed
  git diff --exit-code --quiet HEAD~1 HEAD ./kubernetes
  if [ $? -eq 1 ]; then
    echo ">>>> Detected a change inside the kubernetes directory"
    # Apply the changes with --dry-run just to validate them
    kubectl apply --kubeconfig ./kubeconfig.yaml --dry-run -f ./kubernetes
  else
    echo ">>>> No change detected inside the ./kubernetes folder. Nothing to do."
  fi
}

main "$@"
</code></pre>
<p>The second action is almost the same as this one; the Dockerfile is the same, so
I am not posting it here, but the CMD looks like this:</p>
<pre><code class="language-bash">#!/bin/bash

main() {
  # Decode the secret passed by the action and paste the config in a file.
  echo "$KUBECONFIG" | base64 -d > ./kubeconfig.yaml
  # Check whether the event says the PR was merged
  merged=$(jq --raw-output .pull_request.merged "$GITHUB_EVENT_PATH")
  # Retrieve the base branch for the PR because I would like to apply only PRs merged to master
  baseRef=$(jq --raw-output .pull_request.base.ref "$GITHUB_EVENT_PATH")
  if [[ "$merged" == "true" ]] && [[ "$baseRef" == "master" ]]; then
    echo ">>>> PR merged into master. Shipping to k8s!"
    kubectl apply --kubeconfig ./kubeconfig.yaml -f ./kubernetes
  else
    echo ">>>> Nothing to do here!"
  fi
}

main "$@"
</code></pre>
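<p>The two <code>jq</code> lookups can also be expressed in Go with the standard library, which is handy if you prefer writing actions as small programs instead of bash. The <code>prEvent</code> struct below is my own sketch modelling only the two fields the script reads, not an official GitHub type:</p>
<pre><code class="language-golang">package main

import (
	"encoding/json"
	"fmt"
)

// prEvent models just the two fields the script reads from $GITHUB_EVENT_PATH.
type prEvent struct {
	PullRequest struct {
		Merged bool `json:"merged"`
		Base   struct {
			Ref string `json:"ref"`
		} `json:"base"`
	} `json:"pull_request"`
}

// shouldDeploy mirrors the bash condition: the PR is merged AND its base
// branch is master.
func shouldDeploy(payload []byte) (bool, error) {
	var e prEvent
	if err := json.Unmarshal(payload, &e); err != nil {
		return false, err
	}
	return e.PullRequest.Merged && e.PullRequest.Base.Ref == "master", nil
}

func main() {
	payload := []byte(`{"pull_request":{"merged":true,"base":{"ref":"master"}}}`)
	ok, err := shouldDeploy(payload)
	if err != nil {
		panic(err)
	}
	// A merged PR into master triggers the deploy.
	fmt.Println(ok)
}
</code></pre>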
<p>That’s everything, and I am thrilled!</p>
<p><img src="/img/party.jpg" alt="" /></p>
<p>There is nothing more to say other than “GitHub Actions are amazing!”. They look
well designed since day one! The workflow file has a visual generator that, even if
I didn’t use it (I don’t like colors), seems amazing. The secrets allow us to do
integrations with third-party services out of the box, and you can use bash to do
whatever you like! Let me know what you use them for on
<a href="https://twitter.com/gianarb">Twitter</a>.</p>
Why I speak at conferencesI am over 50 talks! To celebrate this small, personal achievement I decided to write a post about why I speak at conferences even if I am not an evangelist or a proper DevRel.https://gianarb.it/img/gianarb.png2019-01-15T08:08:27+00:002019-01-15T08:08:27+00:00https://gianarb.it/blog/why-I-speak-at-conferences<p>I have recently counted the number of conferences listed <a href="/conferences.html">here on my
blog</a>, and I realized that I am over 50 talks! As a celebration, I decided to
write this post about why I started and why I speak at conferences.</p>
<h2 id="community">Community</h2>
<p>Everything started when I learned how to develop. When I left university (it was
not my best accomplishment) I began to work in a lab, building a CMS in
PHP to manage their instruments. It was my first job ever, and the worst possible
scenario: I was a one-man team in that company. It was kind of a dangerous first job,
and that’s why I call my experience a community-driven success. I wrote the
application three times:</p>
<ol>
<li>Spaghetti code, until I reached the limits of the codebase.</li>
<li>Partially rewrote it with classes.</li>
<li>I jumped into IRC, and I discovered the community behind Zend Framework.</li>
</ol>
<p>They helped me figure out how to write a proper version of it. I fell in love
with all these open source people who were there to help a newbie like me, and I
discovered the PHP Meetup in my city, Turin. Thanks to the people I met during an
event, I got my second job in a proper company, with other developers and servers
in the basement!</p>
<p>To summarize, the community gave me a lot since my first day: things to learn,
new friends and mentors, and a job. It is natural to return everything I can.</p>
<p>I gave my first talk at one of the local meetups, about Vagrant, in 2013. I had heard
about it on some IRC channel; it was not popular in Italy yet. So it was the perfect
chance to give something back to all the people that had helped me.</p>
<p>Today, after a couple of years, technologies and motivations have changed, but this is
how I started. I like to be part of a community, and that’s why open source is so
important to me. I want to share what I do and to learn from other people.</p>
<h2 id="italy-is-too-small">Italy is too small</h2>
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">we are
privileged, AND we are hustlers. both true <3</p>— Charity Majors
(@mipsytipsy) <a href="https://twitter.com/mipsytipsy/status/1082010778381635584?ref_src=twsrc%5Etfw">January
6, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>The side effect of being part of an open source community is <strong>globalization</strong>.
You have teammates from everywhere, and you discover great use cases every day.
The way I look at computer science requires new challenges and issues to solve,
and I am not ready to take a nap solving too-easy problems. This means that I
need to take risks. I changed a lot of companies, and to do that you need to
share what you are capable of; you need to put your face out there.</p>
<p>This is a paraphrase of a recent <a href="https://twitter.com/rakyll/status/1084968619505680387">tweet from
@rakyll</a>, or at least how
I interpreted it.</p>
<p>Speaking at conferences is an excellent way to discover what other teams are
doing and to meet smart people who can turn out to be great mentors.</p>
<h2 id="remote-work">Remote Work</h2>
<p>Two years ago I came back from Dublin, and I decided to try remote working. I
enjoy it, and it would be hard for me to go back at the moment. Working from home
means that I don’t have a lot of social interaction; I am alone for about 8
hours a day. You can fix that by moving to a coworking space, but conferences and
meetups are also a great way to get out! <strong>You don’t need to go far away</strong>; that’s why I run a
<a href="https://www.meetup.com/CNCF-Italy/">meetup in Turin about cloud computing</a>. Feel
free to let me know when you jump in if you would like to speak.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I am not a developer advocate, and I don’t work in the marketing or sales team;
for this reason you need to have support from your company. This is not easy: a
lot of people think that if you have social skills and you are not similar
to a robot, you are not a good coder.</p>
<p>I have always had managers that helped me keep going, and I appreciate it. I work
in a startup, and from time to time InfluxData needs technical people at the
conferences it sponsors. Luckily, I love to speak about topics that are
related to what I do at work or to what my company does, like monitoring,
observability, distributed systems, and clouds, so I am always happy to go!</p>
<p>That’s it! Let me know why you speak at conferences via
<a href="https://twitter.com/gianarb">@gianarb</a>, and I hope my experience will help more
people share their experiences; you are great! I will probably follow up with
other articles about how I approach a conference or an abstract, so let me know
if you would like to read those as well!</p>
<h2 id="not-really">not really!</h2>
<p>During the process of writing and digesting this post, I realized how important
conferences are for me as a person, and how sad it is that not everyone can enjoy them
as I do, even if they would like to. There are plenty of reasons, and I am
not speaking about laziness here. I am speaking about underrepresented people,
or people who cannot afford to pay, or who are not supported by their company.</p>
<p>Luckily there are a lot of groups that we can support to mitigate this problem,
to figure out new ways to bring more people in, and to build a comfortable
and friendly environment for everyone. This is a win for everyone!
<a href="https://twitter.com/ProjAlloy">ProjAlloy</a> and
<a href="https://www.womenwhocode.com/about">Women Who Code</a> accept donations, but even
if you cannot give money, you can look around when you attend a conference
for ways to be nice and to make everyone around you feel good too!</p>
<p><img src="/img/share.jpg" width="20%" style="display:initial;" /></p>
<p>Best Regards,
Gianluca</p>
<p>[1] If you don’t know where to start, you can pick a meetup close to your place!
They are always looking for speakers, and a smaller community can help you
give your first talk! I usually try my new talks at a meetup too!</p>
<p>[2] Be open during interviews: if you like to speak at conferences, you need to
convince the new company that it is a valuable skill for them too!</p>
testcontainer library to programmatically provision integration tests in Go with containersProvisioning the environment for integration tests is not easy. You need a flexible strategy to build an isolated environment per test and to inject the data you need to verify your assertions. I have ported a popular library from Java to Golang called testcontainers. It wraps the Docker API in order to provide a simple test friendly library that you can use to run containers in test cases.https://gianarb.it/img/gianarb.png2019-01-08T08:08:27+00:002019-01-08T08:08:27+00:00https://gianarb.it/blog/testcontainers-go<p>There is a lot of information in the title, I know, but I am not good enough to
make it simpler.</p>
<p>Back in the day, I tried to make some contributions to <a href="https://github.com/openzipkin/zipkin">OpenZipkin</a>, an open
source tracing infrastructure in Java. I had never really worked in that language and, apparently, I failed, but it wasn’t all a waste of time.</p>
<p>OpenZipkin has an excellent integration test suite, and I liked the approach it took to write
integration tests for all the backends it supports: MySQL, Elasticsearch,
Cassandra.</p>
<p>Provisioning the integration test environment is complicated even when you do it
wrong:</p>
<ol>
<li>Without per-test isolation.</li>
<li>Without a cleanup process.</li>
<li>Without putting in the right effort to have isolated tests.</li>
</ol>
<p>If you try to write integration tests the right way, you will have a very hard
time, but Zipkin uses a project called
<a href="https://github.com/testcontainers/testcontainers-java">testcontainers-java</a>. It is a library
that wraps the Docker SDK to offer a friendly API for writing integration
tests using containers.</p>
<h2 id="why-containers">Why containers</h2>
<p>In 2019 everyone knows the answer: containers are great for integration testing
because they are a lightweight and flexible technology. Docker provides the
architecture that simplifies how you turn them on and off.</p>
<p>You can spin up a bunch of containers for every integration test; they will be
brand new, and you can terminate them at the end of the test. This increases
isolation a lot, and it makes your tests more stable and easier to reproduce.</p>
<h2 id="golang">Golang</h2>
<p>I develop in Go every day. I loved the approach, so I decided to port the
library to Golang, and it eventually got moved to the
<a href="https://github.com/testcontainers">testcontainers</a> GitHub
organization under the repository <a href="https://github.com/testcontainers/testcontainers-go">testcontainers/testcontainers-go</a>.</p>
<p>There is a lot to do, but I think at this point the API is stable and we have
everything we need to use it. All the rest will be driven by you asking for
new features or by contributors porting more things from the Java
project.</p>
<p>This is our “Hello World.”</p>
<pre><code class="language-golang">package main

import (
	"context"
	"fmt"
	"net/http"
	"testing"

	testcontainers "github.com/testcontainers/testcontainers-go"
)

// TestNginxLatestReturn verifies that a request to root returns 200 as status
// code
func TestNginxLatestReturn(t *testing.T) {
	ctx := context.Background()
	// Request an nginx container that exposes port 80
	req := testcontainers.ContainerRequest{
		Image:        "nginx",
		ExposedPorts: []string{"80/tcp"},
	}
	nginxC, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
	})
	if err != nil {
		t.Fatal(err)
	}
	// At the end of the test remove the container
	defer nginxC.Terminate(ctx)
	// Retrieve the container IP
	ip, err := nginxC.Host(ctx)
	if err != nil {
		t.Fatal(err)
	}
	// Retrieve the port mapped to port 80
	port, err := nginxC.MappedPort(ctx, "80")
	if err != nil {
		t.Fatal(err)
	}
	resp, err := http.Get(fmt.Sprintf("http://%s:%s", ip, port.Port()))
	if err != nil {
		t.Fatal(err)
	}
	if resp.StatusCode != http.StatusOK {
		t.Errorf("Expected status code %d. Got %d.", http.StatusOK, resp.StatusCode)
	}
}
</code></pre>
<p>This is a straightforward test, but you can imagine a lot of other use cases. Let’s say that you need to test how your <code>application A</code> interacts with an <code>application B</code> that
depends on Redis. You can programmatically build the environment you need in the tests:</p>
<pre><code>// Spin up the Redis container
redisReq := testcontainers.ContainerRequest{
	Image:        "redis",
	ExposedPorts: []string{"6379/tcp"},
}
redisC, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
	ContainerRequest: redisReq,
	Started:          true,
})
if err != nil {
	t.Fatal(err)
}
// At the end of the test remove the Redis container
defer redisC.Terminate(ctx)
ip, err := redisC.Host(ctx)
if err != nil {
	t.Fatal(err)
}
redisPort, err := redisC.MappedPort(ctx, "6379/tcp")
if err != nil {
	t.Fatal(err)
}

// Spin up application B, pointing it at the Redis container
appBReq := testcontainers.ContainerRequest{
	Image:        "application-b",
	ExposedPorts: []string{"8081/tcp"},
	Env: map[string]string{
		"REDIS_HOST": fmt.Sprintf("%s:%s", ip, redisPort.Port()),
	},
}
appB, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
	ContainerRequest: appBReq,
	Started:          true,
})
if err != nil {
	t.Fatal(err)
}
defer appB.Terminate(ctx)
ipB, err := appB.Host(ctx)
if err != nil {
	t.Fatal(err)
}
portB, err := appB.MappedPort(ctx, "8081/tcp")
if err != nil {
	t.Fatal(err)
}

// Now you can use the Go client from your application A that interacts with
// application B
bclient := appA.NewServiceBClient(ipB, portB)
content, err := bclient.GetKey("my-key")
// Check what you need to check
</code></pre>
<h2 id="programmable-environment-is-the-key">Programmable environment is the key</h2>
<p>I wrote about my relationship with <a href="/blog/infrastructure-as-real-code">infrastructure as
code</a> in a previous article, but once again
the fact that you can programmatically build your infrastructure
using real code is the key to all this flexibility.</p>
<p>As a plus, for integration tests you can build the environment you need from inside the test case itself; this ability provides significant control over it.</p>
<p>If you need to warm up etcd with some data, you spin up the etcd container and
you push your data using the traditional Go <a href="https://github.com/etcd-io/etcd/tree/master/client">etcd client</a>:</p>
<pre><code>// Spin up etcd
req := testcontainers.ContainerRequest{
	Image:        "quay.io/coreos/etcd:latest",
	ExposedPorts: []string{"2379/tcp"},
}
etcdC, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
	ContainerRequest: req,
	Started:          true,
})
if err != nil {
	t.Fatal(err)
}
defer etcdC.Terminate(ctx)
ip, err := etcdC.Host(ctx)
if err != nil {
	t.Fatal(err)
}
etcdPort, err := etcdC.MappedPort(ctx, "2379/tcp")
if err != nil {
	t.Fatal(err)
}

// Configure the etcd client
cfg := client.Config{
	Endpoints: []string{"http://" + ip + ":" + etcdPort.Port()},
	Transport: client.DefaultTransport,
	// set timeout per request to fail fast when the target endpoint is unavailable
	HeaderTimeoutPerRequest: time.Second,
}
c, err := client.New(cfg)
if err != nil {
	t.Fatal(err)
}
kapi := client.NewKeysAPI(c)
// Set the key foo
resp, err := kapi.Set(context.Background(), "/foo", "bar", nil)
</code></pre>
<p>I wrote this article because, after a few weeks of coding and revisions, I have
finally tagged
<a href="https://github.com/testcontainers/testcontainers-go/releases/tag/0.0.1"><code>v0.0.1</code></a>
and the library is ready to be tried. We need feedback and feature requests to
prioritize the work to do, so feel free to try it and to open GitHub
<a href="https://github.com/testcontainers/testcontainer-go/issues">issues</a>.</p>
Infrastructure as (real) codeInfrastructure as code today is wrong. Tools like Chef, Helm, Salt, and Ansible use a template engine to make YAML or JSON way smarter, but comparing this solution with a proper coding language you always miss something. GitOps forces you to stick your infrastructure code in a git repository, and this is good. But infrastructure as code is way more.https://gianarb.it/img/gianarb.png2018-12-31T08:08:27+00:002018-12-31T08:08:27+00:00https://gianarb.it/blog/infrastructure-as-real-code<p>I got different signals from the internet around the topic of infrastructure as
code. I have worked with a lot of configuration management tools: Chef, Ansible,
Salt. All of them are good and bad in almost the same way; for me it is mainly a
boring syntax switch between them. That’s one of the reasons I have a repulsion
for this kind of tool. This year at InfluxData we moved to Kubernetes, and I
had the chance to see how a migration like that works, and the unique
privilege of working with my colleagues to design how the end result looks,
even if it is a never-ending work in progress based on the feedback
that we get from ourselves and other teams. So I think at this point I can
try to explain why I believe infrastructure as code today doesn’t work.</p>
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">I’m
starting to think the industry didn’t get the point of “infrastructure as code”.
That people believe codified infrastructure is checking YAMLs into a git repo is
troubling.</p>— Dan Woods (@danveloper) <a href="https://twitter.com/danveloper/status/1078870433246662656?ref_src=twsrc%5Etfw">December
29, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Configuration management tools are not entirely useless, but using one is like <a href="https://sizovs.net/2018/12/17/stop-learning-frameworks/">learning a
new framework</a>: there
is always something good to learn, but it is just a framework. If you pick the
cooler JavaScript one, you will probably get a well-paid job in a startup with
candies and a flexible workplace, but I am always more interested in learning the
underlying architecture and patterns. The reconciliation loop that ReactJS built
to interact with the DOM is pretty nice, as is the one that Kubernetes has to
manage all its resources. Architecture and design patterns are far more useful
than the syntactic sugar you get from the framework itself, even more so when
the “sugar” looks like this:</p>
<pre><code class="language-yaml">- name: "(Install: All OSs) Install NGINX Open Source Perl Module"
  package:
    name: nginx-module-perl
    state: present
  when: nginx_type == "opensource"

- name: "(Install: All OSs) Install NGINX Plus Perl Module"
  package:
    name: nginx-plus-module-perl
    state: present
  when: nginx_type == "plus"

- name: "(Setup: All NGINX) Load NGINX Perl Module"
  lineinfile:
    path: /etc/nginx/nginx.conf
    insertbefore: BOF
    line: load_module modules/ngx_http_perl.so;
  notify: "(Handler: All OSs) Reload NGINX"
</code></pre>
<p>The above code is an Ansible task file that I took at random from the <a href="https://github.com/nginxinc/ansible-role-nginx/blob/master/tasks/modules/install-perl.yml">nginx
role</a>.</p>
<pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ template "drone.fullname" . }}-agent
  labels:
    app: {{ template "drone.name" . }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    release: "{{ .Release.Name }}"
    heritage: "{{ .Release.Service }}"
    component: agent
spec:
  replicas: {{ .Values.agent.replicas }}
  template:
    metadata:
      annotations:
        checksum/secrets: {{ include (print $.Template.BasePath "/secrets.yaml") . | sha256sum }}
{{- if .Values.agent.annotations }}
{{ toYaml .Values.agent.annotations | indent 8 }}
{{- end }}
      labels:
        app: {{ template "drone.name" . }}
        release: "{{ .Release.Name }}"
        component: agent
</code></pre>
<p>This is a Helm chart template I took from the <a href="https://github.com/helm/charts/blob/master/stable/drone/templates/deployment-agent.yaml">official GitHub
repository</a>.</p>
<p>To be clear, when I imagine a sweet dessert full of sugar it is way different
compared with what I have pasted above.</p>
<p>Both of them work with a template engine that renders a template that
looks like YAML. I will never buy that infrastructure as code should mean
not real code but a serialization language.</p>
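<p>A quick illustration of why this worries me: a template engine treats the manifest as a string, so a missing value is not even an error at render time. The toy chart-like template below is invented for this example; with Go's <code>text/template</code> (the engine Helm builds on), forgetting a value silently renders "no value" into the manifest, and only the API server rejects it much later.</p>
<pre><code class="language-golang">package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes a tiny chart-like template. Nothing enforces that .Name
// is actually provided, and nothing validates the resulting YAML.
func render(values map[string]interface{}) string {
	tpl := template.Must(template.New("deploy").Parse(
		"kind: Deployment\nmetadata:\n  name: {{.Name}}\n"))
	var buf bytes.Buffer
	if err := tpl.Execute(&buf, values); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	// Forgetting the value is not an error: the engine happily emits a
	// placeholder instead of failing the build.
	fmt.Print(render(map[string]interface{}{}))
}
</code></pre>
<p>With a typed struct, the same omission would be a zero value you can test for, or a compile error if the field doesn't exist, which is exactly the argument for real code.</p>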
<p>If you wonder why YAML, JSON, or HCL, this is the set of reasons you
will hear:</p>
<ul>
<li>The learning curve of YAML, JSON, or HCL is way more friendly than that of a proper language
like Go, JavaScript, PHP, or whatever.</li>
<li>You don’t have all the utilities that a language provides, only what the
template engine exposes. This should help you and your team avoid terrible
mistakes.</li>
</ul>
<p>These concerns were reasonable at the beginning, when the DevOps culture started,
but now everyone has a good sense of how to code. We do code review, and we
have a lot more experience with patterns and APIs for handling infrastructure
provisioning.</p>
<ol>
<li>If you know Kubernetes, it has powerful APIs that you can leverage to write
automation code; the same goes for cloud providers like AWS, GCP, or OpenStack.</li>
<li>Reconciliation loops, informers, workqueues, controllers, and CRDs are concepts
from Kubernetes that you can reuse.</li>
<li>I wrote about <a href="https://gianarb.it/blog/reactive-planning-is-a-cloud-native-pattern">reactive
planning</a>
and its application in the cloud.</li>
</ol>
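<p>The reconciliation loop mentioned above can be sketched in a few lines of Go. This is only an illustration of the pattern, with a made-up <code>cluster</code> struct standing in for the real observed and desired state a controller would watch:</p>
<pre><code class="language-golang">package main

import "fmt"

// cluster holds the two states every reconciliation loop compares:
// what we want and what is actually running.
type cluster struct {
	desired int // replicas we want
	actual  int // replicas running
}

// reconcile performs one convergence step toward the desired state and
// reports what it did; a controller runs this on every event or tick.
func (c *cluster) reconcile() string {
	switch {
	case c.actual < c.desired:
		c.actual++
		return "scale up"
	case c.actual > c.desired:
		c.actual--
		return "scale down"
	default:
		return "in sync"
	}
}

func main() {
	c := &cluster{desired: 3, actual: 1}
	for i := 0; i < 4; i++ {
		fmt.Println(c.reconcile()) // scale up, scale up, in sync, in sync
	}
}
</code></pre>
<p>The point is that the loop is level-based: it keeps converging on the observed state, no matter how it drifted, which is the property YAML alone cannot give you.</p>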
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">if people refuse to learn things, fire them.<br />if your management won't fire people for not pulling their weight, quit.<br /><br />ENGINEERS: we live in a golden age of opportunity. please use it while it lasts. <a href="https://t.co/OdB24UNl9X">https://t.co/OdB24UNl9X</a></p>— Charity Majors (@mipsytipsy) <a href="https://twitter.com/mipsytipsy/status/1078799382009470979?ref_src=twsrc%5Etfw">December 28, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>All the concerns I raised in favor of <code>YAML, JSON vs. code</code> come down to the
risk of writing bad code, but I think there is no way to “remove bad code.” Even
code that looks good today will look bad tomorrow. Finding a way to mitigate the
risk is admirable, but I don’t think YAML is the right solution; a sound code
architecture, the right patterns, testing, documentation, and code review are the
way to go.</p>
<p>Today there are people with the right skills to write good code even around
infrastructure, and if you use real code you will have:</p>
<ul>
<li>A richer set of libraries and tools, based on the language that you pick.</li>
<li>Unit and integration test frameworks.</li>
<li>Compiling or interpreting an actual language will highlight more syntax errors
than any template engine.</li>
<li>Code is way more fun!</li>
<li>You can import your code, and you don’t need tricky hacks to join
Kubernetes templates together.</li>
<li>You can instantiate new objects and apply transformations to them from the same
structure, reusing the code that describes your resources (AWS autoscaling
group, Kubernetes ingress, or whatever).</li>
</ul>
<p>This discussion applies to a real-world situation with Kubernetes, used not via
YAML but with the Go structs provided by
<a href="https://github.com/kubernetes/client-go/tree/master/kubernetes/typed/core/v1">kubernetes/client-go</a>.</p>
<pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: micro
  namespace: micro
  labels:
    app: micro
    component: micro
spec:
  replicas: 12
  selector:
    matchLabels:
      app: micro
  template:
    metadata:
      labels:
        app: micro
    spec:
      containers:
        - name: microapp
          image: gianarb/micro
          ports:
            - containerPort: 8080
          env:
            - name: SLACK_TOKEN
              valueFrom:
                secretKeyRef:
                  name: slack
                  key: token
            - name: SLACK_USERNAME
              value: "myuser"
          resources:
            limits:
              memory: 128Mi
            requests:
              memory: 100Mi
</code></pre>
<p>This YAML translated to Golang:</p>
<pre><code class="language-golang">func newMicroDeployment() *appsv1.Deployment {
	return &appsv1.Deployment{
		TypeMeta: metav1.TypeMeta{
			Kind:       "Deployment",
			APIVersion: "apps/v1",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name:      "micro",
			Namespace: "micro",
			Labels: map[string]string{
				"app":       "micro",
				"component": "micro",
			},
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: pointer.Int32Ptr(12),
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app": "micro",
				},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"app": "micro",
					},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{
							Name:  "microapp",
							Image: "gianarb/micro",
							Ports: []corev1.ContainerPort{
								{
									ContainerPort: 8080,
								},
							},
							Env: []corev1.EnvVar{
								{
									Name: "SLACK_TOKEN",
									ValueFrom: &corev1.EnvVarSource{
										SecretKeyRef: &corev1.SecretKeySelector{
											LocalObjectReference: corev1.LocalObjectReference{
												Name: "slack",
											},
											Key: "token",
										},
									},
								},
								{
									Name:  "SLACK_USERNAME",
									Value: "myuser",
								},
							},
						},
					},
				},
			},
		},
	}
}
</code></pre>
<p>You can make the function more flexible by passing variables, like the number of
replicas for example, or you can write transformation functions that look like
<code>WithDifferentMemoryLimit</code> to apply transformations to your <code>runtime.Object</code>.</p>
<pre><code class="language-golang">deployment := newMicroDeployment()
// You can transform it with utils like:
WithDifferentMemoryLimit("200Mi", deployment)
</code></pre>
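<p>The post doesn’t show what such a transformation function could look like. Here is a minimal sketch of the idea, using simplified stand-in types instead of the real client-go structs (the type names and fields below are mine, for illustration only; the real version would mutate the <code>appsv1.Deployment</code> struct the same way):</p>
<pre><code class="language-golang">package main

import "fmt"

// Simplified stand-ins for the client-go structs used above.
type Resources struct {
	Limits map[string]string
}

type Deployment struct {
	Name      string
	Resources Resources
}

// WithDifferentMemoryLimit mutates the deployment in place,
// overriding only the memory limit and leaving everything else untouched.
func WithDifferentMemoryLimit(memory string, d *Deployment) {
	if d.Resources.Limits == nil {
		d.Resources.Limits = map[string]string{}
	}
	d.Resources.Limits["memory"] = memory
}

func main() {
	d := &Deployment{
		Name:      "micro",
		Resources: Resources{Limits: map[string]string{"memory": "128Mi"}},
	}
	WithDifferentMemoryLimit("200Mi", d)
	fmt.Println(d.Resources.Limits["memory"])
}
</code></pre>
<p>The same shape works for any <code>With*</code> transformation: parameters first, the object to mutate last, so calls read naturally when stacked.</p>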
<p>If you play well with Go packages, and if you structure your code, you can have
something like:</p>
<pre><code class="language-golang">apps := []runtime.Object{}
service := micro.NewKubernetesService()
deployment := micro.NewDeployment()
apps = append(apps, service)
apps = append(apps, deployment)
// Deploy via the Kubernetes API
</code></pre>
<p>I mean, you have the code now! So you can make all the mistakes you usually do
during your daily job!</p>
<p class="small">Hero image via <a href="https://pixabay.com/en/fractal-complexity-geometry-1758543/">Pixabay</a></p>
You need a high cardinality databaseMonitoring and observability in a dynamic environment on Cloud or Kubernetes is a new challenge we are facing, and I think the tool that plays a big role is a high cardinality database.https://gianarb.it/img/gianarb.png2018-11-28T08:08:27+00:002018-11-28T08:08:27+00:00https://gianarb.it/blog/high-cardinality-database<p>In order to understand how an application performs you need data. Logs, events,
metrics, traces.</p>
<p>Observability and monitoring are expensive because you need to retrieve all this
data across your system. An architecture these days is not a static rock where
nothing happens and everything stays the same. You don’t have your 10 VPCs with
always the same hostnames that you can filter for.</p>
<p>Today you are on the cloud, your instances go up and down based on your load,
and it is easier for you to replace an EC2 instance than to troubleshoot a failure.</p>
<p>Containers wrap your application and make it easy to deploy; as a side
effect you release more often, which means more data.</p>
<p>But the data is useless if you can not get anything good out of it, so it
can be your silver bullet or a big pain; the difference is all made by your
ability to use it to answer your questions, and by your team’s ability to
aggregate it in order to build automation on top of it.</p>
<p>To do all of this you need to manage high cardinality. This is a term that sales
teams in tech are scared of, because nobody will ever sell an infinitely high
cardinality database; everything has a limit, and the real solution is not a
product itself but more like a mindset developers should have.</p>
<ul>
<li>You need to store the raw data for just the right amount of time; forever is not an
option.</li>
<li>You need to give access to this data across the company in order to build
better aggregations. Build engineers will probably need data not just from the
CI pipeline but also from your VCS. SREs, to understand how a code change behaves
in prod, need metrics from the servers but also from the CI. Spread the
knowledge.</li>
</ul>
<p>The technologies that give you the ability to interact with a big set of
unstructured data should support a high write throughput and smart indexes that
allow your query engine to look up what you need fast enough!</p>
<p>So that’s what I have in my mind when I think about a database that can support
monitoring data.</p>
<p>I am not selling anything, mainly because I think a final solution doesn’t exist
yet. I can not really tell you what to buy, but you should look around at other
companies at your same scale, because everyone has this problem:</p>
<ul>
<li>Facebook has Scuba</li>
<li>A lot of people use Cassandra, and they look happy, at least with its write
capabilities.</li>
<li>New time series databases are released on a daily basis</li>
<li>At InfluxData we obviously use InfluxDB for this purpose</li>
</ul>
<p>The general idea here is that the goal should be to group
data that today lives in different sources (NewRelic, InfluxDB, ElasticSearch,
Papertrail) in the same place, because it is rare to get
the answer to your question just by looking at logs or metrics; you need an
aggregation or a sample of different data.</p>
<p>This will bring the debugging and troubleshooting capabilities of your team to
the next level, and listen to me, if you are working with a microservices
architecture or with a highly distributed environment you need help from
everything!</p>
Reactive planning is a cloud native patternI discovered how reactive planning works recently during a major refactoring of a custom orchestrator that we write at InfluxData to serve our SaaS offering. In this article I will explain why I think reactive planning is perfect for building cloud native applications like container orchestrators and provisioning tools.https://gianarb.it/img/gianarb.png2018-11-28T08:08:27+00:002018-11-28T08:08:27+00:00https://gianarb.it/blog/reactive-planning-is-a-cloud-native-pattern<p>This title can probably sound a bit weird to anyone who already knows what
reactive planning is and how far it can seem from the cloud-native and
distributed-systems hipster movement, but recently one of my colleagues <a href="https://twitter.com/goller">Chris
Goller</a> pushed this pattern into one of the projects
that we have at <a href="https://influxdata.com">InfluxData</a>, and I find it glorious!</p>
<p>“In artificial intelligence, reactive planning denotes a group of techniques for
action selection by autonomous agents. These techniques differ from classical
planning in two aspects. First, they operate in a timely fashion and hence can
cope with highly dynamic and unpredictable environments. Second, they compute
just one next action in every instant, based on the current context.”
(<a href="https://en.wikipedia.org/wiki/Reactive_planning">Wikipedia</a>)</p>
<p>The Wikipedia definition of reactive planning as you can see is perfect to
handle a system where the current status can change very frequently based on
external and unpredictable events.</p>
<p>This is a perfect approach for provisioning/orchestration tools like Mesos, CloudFormation,
Kubernetes, Swarm, Terraform. Some of them work like this
already.</p>
<p>The general idea is that before any action you need a plan, because for these
tools an action means cloud interaction, spinning up resources that cost money.
You need to be proactive and avoid useless executions.</p>
<p>A plan is made of a series of steps, and every step can return other steps if it
needs to. The plan is complete when there are no steps left. The plan gets
executed at least twice; the second time it should return zero steps, because the
first attempt built everything you need, and this is the signal that determines its
conclusion. If it keeps returning steps, it means that there is something to do,
and it tries again.</p>
<p>Let’s start with an example. Think about what CloudFormation does. You can
declare a set of resources, and before taking action it needs to understand what
to do. It makes a plan by checking the current state of the system. This first
part makes the flow idempotent and solid, because you always start from the
current state of the system. It doesn’t matter if it changed over time because
somebody removed one of the resources: if something doesn’t exist, it
creates or modifies it. Very solid.</p>
<p>Every single step is very small. Let’s take another example like creating a pod
in Kubernetes. When you create a pod there are a lot of actions to do:</p>
<ul>
<li>Validation</li>
<li>Generate the pod id and the pod name</li>
<li>Register the pod in the DNS</li>
<li>Store it in etcd</li>
<li>Reach out to CNI to configure the network</li>
<li>Reach out to Docker, containerd or whatever runtime you use to get the container</li>
<li>Maybe reach out to AWS to create a persistent volume</li>
<li>Attach the PV</li>
</ul>
<p>If you try to design all these interactions in a single “controller” you will end up
with a lot of if/else, error handling and so on, mainly because, as you can see,
almost every step interacts over the network with something: database, DNS, CNI,
Docker and so on. So it can fail, and it needs circuit breakers, retry policies and
much more complexity.</p>
<p>It is a lot better to design the code so that every point is a small step: if the
step that reaches Docker fails, it can return itself as a “retry”, or it can return
other steps to abort everything and clean up. You will end up with small,
reusable (or not that reusable) steps.</p>
<p>All the steps are combined within a plan, the “PodCreation” plan. There is a
scheduler that takes and executes every step in the plan recursively.</p>
<blockquote>
<p>This freedom allows you to use an incremental approach</p>
</blockquote>
<p>The scheduler first calls a create method for the plan; the create method
checks what to do based on the current state of the system, and it is the
responsibility of this function to return no steps when there is nothing to do.</p>
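<p>A minimal sketch of this machinery, with a <code>Step</code> interface of my own invention (the real project’s types surely differ), could look like this:</p>
<pre><code class="language-golang">package main

import "fmt"

// Step is a single unit of work. It can return follow-up steps
// (including itself, to signal a retry) or nothing when it is done.
type Step interface {
	Run() ([]Step, error)
}

// Execute drains the queue of steps; the plan is complete
// when no step returns further steps.
func Execute(steps []Step) error {
	for len(steps) > 0 {
		next := []Step{}
		for _, s := range steps {
			more, err := s.Run()
			if err != nil {
				return err
			}
			next = append(next, more...)
		}
		steps = next
	}
	return nil
}

// countdown is a toy step that re-schedules itself n times,
// converging to zero steps like a real plan should.
type countdown struct{ n int }

func (c *countdown) Run() ([]Step, error) {
	fmt.Println("running, remaining:", c.n)
	if c.n == 0 {
		return nil, nil // nothing left to do: the plan converged
	}
	return []Step{&countdown{n: c.n - 1}}, nil
}

func main() {
	if err := Execute([]Step{&countdown{n: 2}}); err != nil {
		fmt.Println("plan failed:", err)
	}
}
</code></pre>
<p>A real step, of course, would check the state of the system (is the container running? is the volume attached?) before deciding which steps to return.</p>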
<p>I think Reactive Planning is one of the best ways to organize code in a
cloud-native ecosystem, both for its reactive nature, as I said, and because it
forces you to check the state of the system; if you don’t do that, the plan
will keep executing forever. Obviously, you can use a high-level check to skip
a lot of steps. This requires balance: if the plan you are executing is
critical and frequently used, you should add checks for every step; if that
requires an effort that won’t pay back, you can leave the deeper and more
precise checks for later. You can check the PodStatus: if it is running, we are
good, nothing to do. Or you can check whether Docker has a container running
and whether it has the right network configuration. If it is running but with
no network, you can return the step that interacts with CNI to set up the right
interface. This freedom allows you to use an incremental approach: you start
with an easy creation method that checks only critical, high-level signals,
deferring a more solid and sophisticated set of checks until later, when you
will have better knowledge about where the system fails.</p>
<p class="small">Hero image via
<a href="https://pixabay.com/en/time-time-management-stopwatch-3222267/">Pixabay</a></p>
You will pay the price of a poor designI read the book A Philosophy of Software Design by John Ousterhout. It opened my eyes, giving me more confidence about how to explain and apply solid design concepts in software.https://gianarb.it/img/gianarb.png2018-11-10T08:08:27+00:002018-11-10T08:08:27+00:00https://gianarb.it/blog/price-of-poor-software-design<p>The only way to avoid paying the price of a poorly designed project is to replace it in
time. But we all know that replacing software is not a great idea.</p>
<p>Software should be improved via multiple iterations, and that’s why you or your
teammates will read a lot more code than you write, end of the story.</p>
<p>In the rush of writing a new project this sentence will sound wrong, but that is
just one of its phases. After that, it will be all about reading and
replacing lines of code one by one, and if you design the software poorly,
somebody will pay the price for it. You or somebody else; probably somebody else,
looking at how quickly developers change jobs.</p>
<blockquote>
<p>it is in your hands as a developer to think about design, at
least to save yourself from the darkness</p>
</blockquote>
<p>In a fancy unicorn startup you will hear that there is no time to think about
design; if you are in tech, that’s probably not true, but there is always
a project that nobody cares about yet needs to be fixed.</p>
<p>Some other companies don’t have time to think about design because they are
always running against something.</p>
<p><img src="/img/hero/poverty.jpg" alt="" class="img-fluid" /></p>
<p>So, as you can see, it is in your hands as a developer to think about design, at
least to save yourself from the darkness.</p>
<p>Recently I read <a href="https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201/">“A Philosophy of Software
Design”</a>
by John Ousterhout. This book turned out to be <strong>not</strong> a breath of fresh air <strong>for
me</strong>; it was more like a “breath of consolidation”. Professor John Ousterhout fixed on
paper, in an excellent way, what I try to do every day but was not always able to express.</p>
<p>Why comments are essential, deep API vs shallow classes, information hiding. You
should read it!</p>
<p><strong>Design it twice</strong>. It looks expensive, but it is a takeaway from the book that
I think is the practical key to unlock the door that makes our work fun and
healthy.</p>
<p>The first solution can’t be the best one. Even if you are smart enough to design
something that won’t crash, we should make an effort to think about other
solutions and to ask for a review, just as we do when writing code.</p>
<p>In theory, we will find a great design somewhere in between all the other
attempts.</p>
Chaos EngineeringI took part in a panel called Chaos Engineering at the Jazoon conference in Switzerland, where I had a chance to learn about techniques and practices around a topic that, even if I knew about it, I never had the chance to dig into. In this article I summarize my ideas, mainly around the definition of Chaos Engineering.https://gianarb.it/img/gianarb.png2018-08-23T08:08:27+00:002018-08-23T08:08:27+00:00https://gianarb.it/blog/chaos-engineer<p>At the <a href="https://jazoon.com/">Jazoon</a> conference in Switzerland, I had the chance
to speak at the Chaos Engineering panel with <a href="https://twitter.com/russmiles">Russ
Miles</a> from <a href="https://chaosiq.io">ChaosIQ</a> and
<a href="https://twitter.com/aaronblohowiak">Aaron P Blohowiak</a> from Netflix.</p>
<p>The organizers put me in the panel probably because “chaos” was part of the title
for the talk I just gave in the morning. I was too curious to mention that I
never did it before, at least on purpose!</p>
<p>So I was really out of my comfort zone dealing with these two folks that know
their shit so well!</p>
<p>I am sure that as engineers we are part of the chaos: we create entropy inside the
system during every deploy, and even if we have all the tests in the world, the
first time it is tough to make it work. But I indeed never associated the
word engineering with chaos. And that’s the real challenge.</p>
<p>So, let’s define Chaos and Engineering altogether.</p>
<p>Let’s start with <code>Chaos</code> because it is the easy one. As I said, we as developers create
chaos, distribution creates chaos, and customers create chaos. If somebody tells you
that his production environment is excellent, you should not listen to him:
production is a nightmare, a complicated and painful place. At least if somebody uses it.</p>
<p>And if it is just a bit more complicated than a static site, it never works 100%;
the chaos governs it, and that’s where the word Engineering becomes
essential.</p>
<p><code>Engineering</code>, at least for what I can understand, means being driven by data and
not by feelings. So, associating these two concepts together, you have a powerful way
to measure the chaos.</p>
<p>I think you can’t avoid chaos, so the best way to handle it is to learn from
what it generates in your system to anticipate unpredictable situations.</p>
<p>As developers, ops or devops we are pessimistic about our system, and we know
that it will fail: servers crash, CoreOS auto-updates itself, third-party
services stop working. Usually the answer is to wait for it to happen, typically on a
Friday night.</p>
<p>Chaos Engineering is an exercise, a practice to leverage “unusual but possible”
situations as a teaching vector for our system.</p>
<p>It is another tool to achieve resiliency and to test scalability.</p>
<p>Chaos Engineering doesn’t bring down all your production system in an
unrecoverable way. It designs exercises that you and your team will use to
increase your operational experience and confidence.</p>
<p>Observability is a sort of requirement to understand how a chaotic event changes
the “normal” state of your system. But from another point of view, a chaotic event
sheds some light on a particular part of your system, showing up a lack of
monitoring and instrumentation.</p>
<p>There are open source frameworks like <a href="https://github.com/chaostoolkit">Chaos Toolkit</a>
and famous tools like <a href="https://github.com/Netflix/chaosmonkey">Chaos Monkey</a>.</p>
<p>I will try to start with a very simple example, without writing too much
code. I will get these metrics out of my system:</p>
<ol>
<li>Number of requests (probably from ingress/nginx)</li>
<li>The number of requests with status code > 499</li>
<li>HTTP request latency</li>
</ol>
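<p>Assuming the standard ingress-nginx metric names (check what your ingress actually exports, the names below are an assumption on my side), the three signals above could be expressed as Prometheus queries like:</p>
<pre><code># request rate over the last 5 minutes
sum(rate(nginx_ingress_controller_requests[5m]))

# rate of responses with status code > 499
sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m]))

# 95th percentile request latency
histogram_quantile(0.95,
  sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (le))
</code></pre>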
<p>After that, I will try to simulate an outage by removing or scaling down particular
pods (the ones that get all the traffic), and I will look at how the metrics
change and how long the system takes to recover.</p>
OpenMetrics and the future of the prometheus exposition formatThis post explains my point of view on the prometheus exposition format and summarises the next steps for OpenMetrics, backed by the CNCF and other big companies.https://gianarb.it/img/gianarb.png2018-08-23T08:08:27+00:002018-08-23T08:08:27+00:00https://gianarb.it/blog/prometheus-openmetrics-expositon-format<p>Who am I to tell you the future of the prometheus exposition format? Nobody!</p>
<p>I was at PromCon in Munich in August 2018 and I found the conference great!
A lot of use cases about metrics, monitoring and prometheus itself. I work
at InfluxData and we were there as a sponsor, but I followed a lot of talks and I
had the chance to attend the developer summit the next day with a lot of
prometheus maintainers. Really good conversations!</p>
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">I just
realized how lucky I was these days having chance to be so welcomed by the <a href="https://twitter.com/hashtag/prometheus?src=hash&ref_src=twsrc%5Etfw">#prometheus</a>
community. I love my work. Thanks <a href="https://twitter.com/juliusvolz?ref_src=twsrc%5Etfw">@juliusvolz</a> <a href="https://twitter.com/TwitchiH?ref_src=twsrc%5Etfw">@TwitchiH</a> <a href="https://twitter.com/tom_wilkie?ref_src=twsrc%5Etfw">@tom_wilkie</a> and
everyone.. I feel regenerated</p>— :w !sudo tee % (@GianArb) <a href="https://twitter.com/GianArb/status/1028414240535793664?ref_src=twsrc%5Etfw">August
11, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>To be honest, my scope a few years ago was very different. I was working in PHP
writing web applications that, yes, I was deploying, but I wasn’t digging too much
around them, and I was not smart enough to understand that the whole pull vs push
debate was just garbage. Smoke in the eyes that luckily I left behind pretty soon,
because I had the chance to meet smart people who led me out.</p>
<p>Providing a comfortable way for me to expose and store metrics is a vital
requirement, and the library needs to expose the RIGHT data; it doesn’t matter if it
is pushed or pulled.</p>
<p>RIGHT means the best I can get to have more observability from an ops point of
view, but also from a business intelligence perspective, probably just
manipulating the same data again.</p>
<p>It is safe to say that a pull-based exposition format is easy to pack together
because it works even if the server that should scrape the exposed endpoint is
unavailable, or even if nothing will scrape it. A push-based service will always
create some network noise even if nobody has an interest in getting the metrics.</p>
<p>Back in the day we had SNMP, but other than being an Internet standard its
adoption is not comparable with the prometheus one, and if we add how old it is and
how fast prometheus grew, the situation gets even worse.</p>
<pre><code>.1.0.0.0.1.1.0 octet_str "foo"
.1.0.0.0.1.1.1 octet_str "bar"
.1.0.0.0.1.102 octet_str "bad"
.1.0.0.0.1.2.0 integer 1
.1.0.0.0.1.2.1 integer 2
.1.0.0.0.1.3.0 octet_str "0.123"
.1.0.0.0.1.3.1 octet_str "0.456"
.1.0.0.0.1.3.2 octet_str "9.999"
.1.0.0.1.1 octet_str "baz"
.1.0.0.1.2 uinteger 54321
.1.0.0.1.3 uinteger 234
</code></pre>
<p>It also started as a network-oriented exposition format, so it doesn’t express
other kinds of metrics really well.</p>
<p>The <a href="https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md">prometheus exposition
format</a>
is extremely valuable, and I recently instrumented a legacy application using the
prometheus SDK: my code looks a lot cleaner and more readable.</p>
<p>At the beginning I was using logs as the transport layer for my metrics and time
series, but I ended up having a lot of spam in the logs themselves because I was also
streaming a lot of “not logs but metrics” garbage.</p>
<p>The link to the prometheus doc above is the best place to start; here I am just
copy-pasting something from there:</p>
<pre><code># HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
http_requests_total{method="post",code="400"} 3 1395066363000
# Escaping in label values:
msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9
# Minimalistic line:
metric_without_timestamp_and_labels 12.47
# A weird metric from before the epoch:
something_weird{problem="division by zero"} +Inf -3982045
# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320
# Finally a summary, which has a complex representation, too:
# HELP rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE rpc_duration_seconds summary
rpc_duration_seconds{quantile="0.01"} 3102
rpc_duration_seconds{quantile="0.05"} 3272
rpc_duration_seconds{quantile="0.5"} 4773
rpc_duration_seconds{quantile="0.9"} 9001
rpc_duration_seconds{quantile="0.99"} 76656
rpc_duration_seconds_sum 1.7560473e+07
rpc_duration_seconds_count 2693
</code></pre>
<p>Think about that not as the prometheus way to grab metrics, but as the language
that your application uses to teach the outside world how it feels.</p>
<p>It is just a plain-text endpoint over HTTP that everyone can parse and re-use.</p>
<p>For example
<a href="https://www.influxdata.com/time-series-platform/kapacitor/">kapacitor</a> or
<a href="https://www.influxdata.com/time-series-platform/telegraf/">telegraf</a> have
specific ways to parse and extract metrics from that URL.</p>
<p>If you don’t have time to write a parser for that you can use
<a href="https://github.com/prometheus/prom2json">prom2json</a> to get a JSON version of
that.</p>
<p>In Go you can dig a bit more inside that code and reuse some of its functions, for
example:</p>
<pre><code class="language-go">// FetchMetricFamilies retrieves metrics from the provided URL, decodes them
// into MetricFamily proto messages, and sends them to the provided channel. It
// returns after all MetricFamilies have been sent.
func FetchMetricFamilies(
	url string, ch chan<- *dto.MetricFamily,
	certificate string, key string,
	skipServerCertCheck bool,
) error {
	defer close(ch)
	var transport *http.Transport
	if certificate != "" && key != "" {
		cert, err := tls.LoadX509KeyPair(certificate, key)
		if err != nil {
			return err
		}
		tlsConfig := &tls.Config{
			Certificates:       []tls.Certificate{cert},
			InsecureSkipVerify: skipServerCertCheck,
		}
		tlsConfig.BuildNameToCertificate()
		transport = &http.Transport{TLSClientConfig: tlsConfig}
	} else {
		transport = &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: skipServerCertCheck},
		}
	}
	client := &http.Client{Transport: transport}
	return decodeContent(client, url, ch)
}
</code></pre>
<p><a href="https://github.com/prometheus/prom2json/blob/master/prom2json.go#L123">FetchMetricFamilies</a> can be used to get a channel with all the fetched
metrics. When you have the channel you can do whatever you desire:</p>
<pre><code class="language-go">mfChan := make(chan *dto.MetricFamily, 1024)
go func() {
	err := prom2json.FetchMetricFamilies(flag.Args()[0], mfChan, *cert, *key, *skipServerCertCheck)
	if err != nil {
		log.Fatal(err)
	}
}()

result := []*prom2json.Family{}
for mf := range mfChan {
	result = append(result, prom2json.NewFamily(mf))
}
</code></pre>
</code></pre>
<p>As you can see
<a href="https://github.com/prometheus/prom2json/blob/master/cmd/prom2json/main.go#L42"><code>prom2json</code></a>
converts the result to JSON.</p>
<p>It is pretty flexible! And it is a common API to read application status. A
common API we all know means automation! Dope automation!</p>
<h2 id="future">Future</h2>
<p>The prometheus exposition format grew in adoption across the board, and a
couple of people led by <a href="https://twitter.com/TwitchiH">Richard</a> are now pushing
to have this format become a new Internet Standard!</p>
<p>The project is called <a href="https://openmetrics.io/">OpenMetrics</a> and it is a Sandbox
project under CNCF.</p>
<p>If you want to follow the project, here is the official repository on
<a href="https://github.com/OpenObservability/OpenMetric">GitHub</a>.</p>
<p>Probably it looks like just a political step with no value at all from a tech point of
view, but I bet that when it becomes a standard, and not just “the prometheus
exposition format”, we will start to see routers exposing stats over
<code>http://192.168.1.1/metrics</code>, and it will be a lot of fun!</p>
<p>It will be obvious that it is not an only-prometheus feature, and this new group
has people from different companies and backgrounds. So the exposition format
will probably not be just for operational metrics but more generic.</p>
Apps I used during my nomad workingWhen I travel for conferences, or now that I work remotely and am a bit more of a nomad, I have discovered and learned some good apps that help me plan and better combine work and travel. Some of them are WorkFrom, oBike, Yelp, Adobe Scan. Let me know yours.https://gianarb.it/img/dna.jpg2018-08-11T10:38:27+00:002018-08-11T10:38:27+00:00https://gianarb.it/blog/nomad-working-apps<p>It is a couple of years now since my first conference, and now that I am working
remotely it is even harder to combine traveling and work.</p>
<p>Mainly because now that I don’t need to go to an office I travel more often, and
usually conferences are a bit farther away too. It means that I spend more
time away from my usual workplace.</p>
<p>It is a very challenging and exciting opportunity, and I am glad to live it.
This is the first important thing, I think: if you are not happy about what you
do, even if it is challenging, you are going to give up.</p>
<p>But other things help me a lot. In general, they come back to <code>planning.</code> I
feel better when I have the time to look around and to be prepared for the city
I am going to visit. There are a couple of apps that help me with that:</p>
<h2 id="workfrom"><a href="https://workfrom.co/">WorkFrom</a></h2>
<p>It is a community of digital nomads and remote workers. It keeps up to date a
database of pubs, libraries, restaurants and bars where you can work from. It is
very nice, and it has some nice features like:</p>
<ol>
<li>A map, so you can see what is around you</li>
<li>Net speed measurement inside the app. Other than a detailed description of
the place, it also shares (if the person who reviewed the place measured it) how
fast the internet connection is, and sometimes even the WiFi password; this
way you don’t need to ask.</li>
<li>As I mentioned, it is a community, so there is also a Slack channel that you
can use to speak with other remote workers.</li>
</ol>
<p>I used it in Berlin, Munich, Copenhagen and Amsterdam, and it worked pretty well.</p>
<h2 id="bike-and-other-local-transport-applications-i-am-using">oBike and other local transport applications I am using</h2>
<p>I mention <a href="https://www.o.bike/it/">oBike</a> just as an example, because I used it
recently in Munich, but I think you should always have a look at what the city
uses for bike sharing, because at least for me, even if I love to
walk around, taking a ride from time to time is faster and helpful.</p>
<p>Bonus point: a lot of these apps have a free tier, which means that you can use
them for free the first time you visit that city!</p>
<h2 id="yelp">Yelp</h2>
<p>In general, I find <a href="https://www.yelp.com/">Yelp</a> better than TripAdvisor for
what concerns restaurants and places to eat. So when I am not able to spot anything
good by myself during my walks around the city, or when I am looking for a
specific kind of food, I use Yelp.</p>
<h2 id="adobe-scan">Adobe Scan</h2>
<p>For this <a href="https://acrobat.adobe.com/us/en/mobile/scanner-app.html">app</a> I need to give
credit to <a href="https://twitter.com/fntlnz">Lorenzo</a>, because he was the one who showed it to me.
I use it a lot after a trip when I need to submit my expenses.
It is always very annoying work to do, but at least with this app I can take a
set of pictures, and it will generate a single PDF ready to be submitted!</p>
<h2 id="revolut">Revolut</h2>
<p>A few years ago, when I was working at CurrencyFair, I started to test
and play with online banks, and <a href="https://revolut.com/r/gianlu1b2">Revolut</a> is
very good when you are traveling around and you need to manage different
currencies. First of all, I like the idea of using a different card when I buy on
Amazon or when I travel, because in case of any trouble I will have a limited
amount of money there. For example, in Cuba I had my card cloned, but I only had
10 Euro on it, so it was not a big loss (the bank gave me the money back in any
case, btw).</p>
<p>Plus, Revolut has some excellent features to track where and what you spent. You
can label your transfers to easily look them up when you need to expense them or
calculate how much you spent. The exchange commission is low compared with more
traditional banks, so this is a free win!</p>
<p>Let me know on <a href="https://twitter.com/gianarb">Twitter</a> if you have any other
applications to suggest I will be happy to try them next time and maybe to add
them here!</p>
<p><small><a href="https://www.newhdwallpapers.in/natural-hd-wallpapers/himalayas-mountain-series-tibet/" target="_blank">hero img credits</a> </small></p>
FAQ: Distributed tracingTracing is a well known concept in programming, but distributed tracing is a revisitation that adapts the concept to distributed systems. This article is an FAQ where I answer common questions I have received or seen around the net about monitoring and distributed tracing.https://gianarb.it/img/dna.jpg2018-07-06T10:38:27+00:002018-07-06T10:38:27+00:00https://gianarb.it/blog/faq-distributed-tracing<p>This article is a write-up of a talk that I will give at the
<a href="https://osmc.de">osmc</a> in Germany in November about distributed tracing.
It is a sequence of questions I got about distributed systems and distributed
monitoring.</p>
<h2 id="why-do-i-need-distributed-tracing">Why do I need distributed tracing?</h2>
<p>It always depends. I find distributed tracing useful in a microservices
environment, or more in general when there is a request that flows through a system
crossing different applications, queues or processes.
If you have a problem understanding where a request fails, you need to
<em>follow it</em> in some way, and tracing does just that.</p>
<h2 id="how-do-you-follow-a-request">How do you follow a request?</h2>
<p>First of all, we should probably change the name <em>request</em>; it looks too HTTP
oriented, and it is not really what we are looking for now. In modern applications, you
are interested in <em>events</em>. You need to monitor an event:</p>
<ul>
<li>user registration</li>
<li>payment</li>
<li>a bank transaction</li>
<li>send an email</li>
<li>generate an invoice</li>
</ul>
<p>These are all events, and in your system they are probably distributed not via
HTTP but through a queue, or broadcast using Kafka or Redis.
Distributed tracing is all about tracking events. The way to go is to create an
id. Usually it is called <code>request_id</code> or <code>trace_id</code>, and you need a way to
propagate it through your system.</p>
<p>For example, in a queue, you can put the <code>trace_id</code> as part of the payload. Via
HTTP or gRPC you can use Headers.</p>
<p>Your application can take that id and create spans to trace
particular sections.</p>
<h2 id="how-a-trace-looks-like">What does a trace look like?</h2>
<p><img src="/img/trace.jpg" alt="How I image a trace for a distributed tracing app" class="img-fluid" /></p>
<p>In my mind, this is the picture of a single trace. Every segment is a span.
Every span carries the trace id, and every span has its own <code>span_id</code>.
You can attach information to every span as a key-value store. Let’s suppose a
span represents a query in MySQL: you can put the query as metadata in the span
itself. In this way you have a bit more context.</p>
<h2 id="do-we-need-a-standard-for-tracing">Do we need a standard for tracing?</h2>
<p>I can’t convince you that interoperability is essential if you have already analyzed
the problem and answered “No” to yourself.
To build a trace you need to agree on something across languages and
applications.
That’s why I think a standard is something you cannot avoid; in the end you
will end up having one, even if just for your company.</p>
<h2 id="how-a-tracing-infrastructure-looks">How does a tracing infrastructure look?</h2>
<p><img src="/img/tracing_infra.png" style="width:70%" alt="Sketch of tracing infrastructure." class="img-fluid" /></p>
<p>Which applications write the traces is not important: traces are cross
platform and cross language. Usually, you
point an app to a tracer. It can be Zipkin, Jaeger, or others.</p>
<p>The tracer takes all the traces and stores them in a storage backend. The databases are
usually ElasticSearch, Cassandra, or InfluxDB; it depends on which tracer you are
using. They support different databases.</p>
<p>In general, traces are high-cardinality data, and you can write a lot
of them in a short amount of time. So it is a write-intensive
workload.</p>
<p>There are a couple of other pieces that you can add in your tracing
infrastructure:</p>
<ul>
<li>You can add a <em>downsampler</em> to select what to store. If an API request generates
too many traces, you are probably interested in storing only a percentage of them to
decrease pressure on your database. You can use a simple deterministic hash
of the trace_id to decide what to save. A <code>mod</code> on the
<code>trace_id</code> is enough, for example.</li>
<li>You can add a <em>collector</em> in front of the tracer. Zipkin supports Kafka, for
example; at InfluxDB we use Telegraf. A collector is usually a stateless
application: it gets all the traces from the applications, batches them, and
sends them to the tracer. A collector decreases the pressure on the tracer
itself, because tracers usually work better with bulks of data. Second, if the
tracer goes down or you need to update it, the collector is a layer that can
keep the traces for a little while, giving you time to restore the tracer.</li>
</ul>
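<p>A sketch of such a mod-based downsampler follows, assuming string trace ids and using an FNV hash so the keep/drop decision is deterministic per trace (every service sampling the same id makes the same choice):</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"hash/fnv"
)

// keep decides whether to store a trace: hash the trace_id and keep
// roughly one trace out of every n. The same id always gets the same
// answer, so all spans of a trace are either all kept or all dropped.
func keep(traceID string, n uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return h.Sum32()%n == 0
}

func main() {
	kept := 0
	for i := 0; i < 10000; i++ {
		if keep(fmt.Sprintf("trace-%d", i), 10) {
			kept++
		}
	}
	// With n=10 we expect to keep roughly 10% of the traces.
	fmt.Println("kept roughly 10%:", kept > 800 && kept < 1200)
}
</code></pre>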
<h2 id="why-did-i-pick-opentracing">Why did I pick opentracing?</h2>
<p>I am an interoperability-oriented developer; I think it is essential to avoid
vendor lock-in, and by embracing a big community like the opentracing one you get
a lot of tools and services already instrumented with this protocol. It makes
my life easy.</p>
<h2 id="can-i-have-a-tracing-infrastructure-on-prem">Can I have a tracing infrastructure on-prem?</h2>
<p>You can; there are a couple of open source tracers.</p>
<ul>
<li><a href="https://zipkin.io/">Zipkin</a> is an open source
project in Java started by Twitter.</li>
<li><a href="https://github.com/jaegertracing/jaeger">Jaeger</a> looks a lot like a port of
it in Go, and it is made by Uber.</li>
</ul>
<p>Both of them are open source, and they support different backends like
ElasticSearch, Cassandra, and so on.</p>
<h2 id="there-are-as-a-service-tracing-infrastructure">Are there as-a-service tracing infrastructures?</h2>
<p>There are: NewRelic has an opentracing-compatible API, and so does
<a href="https://lightstep.com/">Lightstep</a>, for example. A lot of cloud providers offer
a tracing service, such as AWS X-Ray or Google Stackdriver.</p>
<h2 id="can-i-store-traces-everywhere">Can I store traces anywhere?</h2>
<p>You can, but they are high-cardinality data. The <code>trace_id</code> is usually the
lookup parameter for your queries. It means that it should be indexed, but it
changes for every request. The consequence is a big index.
You need to keep that in mind.</p>
<h2 id="once-you-do-the-tracethen-what">Once you do the trace…then what?</h2>
<p>I left this question as the last one because I read it in the opentracing
mailing list and I think it is a hilarious question.</p>
<p>First of all, you don’t buy a pen and only after the fact start asking yourself
why you have it.</p>
<p>Probably you need to write something, and for that reason you
buy a pen.</p>
<p>Anyway, I trace my applications because it helps me understand my environment
despite the “distribution complexity.” I can detect what is taking too long, and a
trace helps me understand what to optimize.</p>
<p>Opentracing has a set of standard annotations that are very useful for detecting network
latency between services. You can mark a span as “client send,” for
example, and when the server gets the request, you can mark another span as
“server receive.” These two pieces of information tell you how much time your
request spends going from the client to the server, and you can usually optimize that
time by working on the proximity between the two applications.</p>
<p>More in general, you can parse a trace to get whatever you need, as with normal logs
or events; the powerful things are downsampling and analysis.
If you are tracing a queue system, you can get the average time for a worker to
process a message.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Let me know if you have more questions on Twitter
<a href="https://twitter.com/gianarb">@gianarb</a>. I am happy to answer them here.</p>
Logs, metrics and traces are equally uselessThe key monitoring a distributed system is not logs, metrics or traces but how you are able to aggregate them. You can not observe and monitor a complex system looking at single signals.https://gianarb.it/img/dna.jpg2018-06-18T10:38:27+00:002018-06-18T10:38:27+00:00https://gianarb.it/blog/logs-metrics-traces-aggregation<p>Every signal from applications or infrastructure is useless on its own; in the
distributed-system era, aggregation matters.</p>
<p>The ability to combine logs, metrics and, traces together is the key takeaway
here.</p>
<p>Kubernetes spins up too many containers to allow us to stream or tail a log
file.</p>
<p>Even cloud providers offer too many virtual machines to enable us to tail
logs.</p>
<p>A centralized place to store all of them is a great start, but you
need to learn from experience how to combine the signals you are ingesting to
increase the visibility over your system.</p>
<p>If you instrument your code with
opentracing, for example, you can get the <code>trace_id</code> and attach it to your log
to associate it with the trace itself. It can also work as the lookup key for
troubleshooting.</p>
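<p>As a minimal sketch of what that looks like (the trace id here is hard-coded; in a real service you would extract it from the active span, as the tracer-specific snippets below show):</p>
<pre><code class="language-go">package main

import (
	"log"
	"os"
)

func main() {
	// Hypothetical id: in a real service this comes from the active span.
	traceID := "4bf92f3577b34da6"

	logger := log.New(os.Stdout, "", 0)
	// Putting the trace_id in every log line makes the id the lookup key
	// across both the log store and the tracing backend.
	logger.Printf("trace_id=%s level=error msg=%q", traceID, "payment failed")
}
</code></pre>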
<p>If you get some weird logs, you will know where they come from.
With opentracing, this is still a bit of a mess: the specification recently
<a href="https://github.com/opentracing/specification/blob/master/rfc/trace_identifiers.md">added explicit support to extract TraceId and SpanId from the
SpanContext</a>.
It is currently not implemented in a lot of implementations. I recently started a
conversation in the
<a href="https://github.com/opentracing/opentracing-go/issues/188">opentracing-go</a>
project to figure out how to apply it, because currently it depends on which
tracer you are using, and that is a real shortcoming for the specification
itself, which should hide this by design.</p>
<p>Using Jaeger this is the way to do it:</p>
<pre><code>if sc, ok := span.Context().(jaeger.SpanContext); ok {
	sc.TraceID()
}
</code></pre>
<p>Using Zipkin:</p>
<pre><code>zipkinSpan, ok := sp.Context().(zipkin.SpanContext)
if ok && !zipkinSpan.TraceID.Empty() {
	w.Header().Add("X-Trace-ID", zipkinSpan.TraceID.ToHex())
}
</code></pre>
<p>To get back on track, I wrote this article because I saw this problem and this
inclination speaking with friends, colleagues, and other devs: we are now good
(or just better) at storing high-cardinality values, but saving them inside a database
doesn’t give us any value by itself; it is all about how we use them.</p>
<p>Correlation brings your alerts to a different level. You probably have an alarm
to measure how much disk space you still have.</p>
<p>An alert on CPU usage alone can be very frustrating,
even more so if it fires too often and most of the time you restart a container or a
node to make it work, because at 2 am you can’t fix the cause. You can
investigate what matters later and file an issue on GitHub.</p>
<p>Automation tools can do that work for you, leaving you free to sleep. They can
probably even file the issue.</p>
<p>Combining the CPU usage with the time the system takes to recover from a node restart
can make your alert smart enough to wake you up only when the system is not able to fix
itself, leaving you rested for the more acute and harder problems.</p>
<h2 id="conclusion">Conclusion</h2>
<p>It is a pretty straightforward concept, but yes, everything is useless if you
store data without getting value out of it, no matter whether they are logs,
metrics, or traces. The real value is not in any single one of them; it is in how
you aggregate them together, because a complex system doesn’t explain itself
through one signal.</p>
Cloud Native Intranet with Kubernetes, CoreDNS and OpenVPNDesigning an architecture the network should be a top priority because it is very hard to change moving forward. Even in a cloud environment running on Kubernetes the situation doesn't change. Security and networking are hard pattern hard to inject in old projects. In this talk I will share a practical idea about how to start in the best way with OpenVPN and private DNS in a Kubernetes cluster in order to build your own intranet.https://gianarb.it/img/kubernetes.png2018-05-29T10:38:27+00:002018-05-29T10:38:27+00:00https://gianarb.it/blog/cloud-native-intranet<p>This article has a marketing and buzzword oriented title. I know.</p>
<p>Let me introduce what I am going to talk about here with better
words: VPN, private DNS, kubernetes, security.</p>
<p>I hope we all agree that a VPN should be a must-have when you set up an
infrastructure. It doesn’t matter what you are doing or how many people are
working with you.</p>
<p>When you design a new system, usually you only need to expose some services to the
public over HTTP and HTTPS; all the rest (Jenkins, monitoring tools,
dashboards, log management) should be locked down and accessible only from a
private network. An intranet.</p>
<blockquote>
<p>An intranet is a private network accessible only to an organization’s staff.
Often, a wide range of information and services are available on an
organization’s internal intranet that is unavailable to the public, unlike the
Internet.</p>
</blockquote>
<p>All these concepts apply to “Cloud Native” ecosystem as well.</p>
<p>Kubernetes has a powerful dashboard and a CLI that you can use to interact with
the API. That API doesn’t need to be publicly exposed, and to use the CLI from
your laptop, you should set up a VPN.</p>
<h2 id="openvpn">OpenVPN</h2>
<p>Usually, I configure OpenVPN using the image
<a href="https://hub.docker.com/r/kylemanna/openvpn/">kylemanna/openvpn</a> available on
Docker Hub. It is straightforward to deploy, and it offers a set of utilities
for user creation and certificate management.</p>
<pre><code class="language-yml">apiVersion: v1
kind: Service
metadata:
  name: openvpn
  namespace: openvpn
  labels:
    app: openvpn
spec:
  ports:
  - name: openvpn
    nodePort: 1194
    port: 1194
    protocol: UDP
    targetPort: 1194
  selector:
    app: openvpn
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openvpn
  namespace: "openvpn"
  labels:
    app: openvpn
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: openvpn
  template:
    metadata:
      labels:
        app: openvpn
    spec:
      nodeSelector:
        role: vpn
      containers:
      - name: openvpn
        image: docker.io/kylemanna/openvpn
        command: ["/etc/openvpn/setup/configure.sh"]
        env:
        - name: VPN_HOSTNAME
          valueFrom:
            configMapKeyRef:
              name: vpn-hostname
              key: hostname
        - name: VPN_DNS
          valueFrom:
            configMapKeyRef:
              name: vpn-hostname
              key: dns
        ports:
        - containerPort: 1194
          name: openvpn
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
        volumeMounts:
        - mountPath: /etc/openvpn/setup
          name: openvpn
          readOnly: false
        - mountPath: /etc/openvpn/certs
          name: certs
          readOnly: false
      volumes:
      - name: openvpn
        configMap:
          name: openvpn
          defaultMode: 0755
      - name: certs
        persistentVolumeClaim:
          claimName: openvpncerts
</code></pre>
<p>I put in the persistentVolumeClaim to remind you to store the certificates used and
generated by the VPN (<code>/etc/openvpn/certs</code>) in a persistent and
safe place (and you should back them up too).</p>
<p>I won’t write more about this topic; we are all excellent yaml developers!</p>
<p>How to create users, configuration, and so on is a well-known topic that you can
easily <a href="https://openvpn.net/index.php/open-source/documentation.html">find in the OpenVPN
documentation</a>.</p>
<p>I don’t know if you realized that, but this VPN runs inside a Kubernetes
cluster, so, well configured, it allows us to reach pods via a private network and,
as a bonus point, also via kubedns, to resolve services, pods, and all the other
resources registered to it.</p>
<p>To do that, the OpenVPN server can be configured to push kubedns to the client:</p>
<pre><code>dhcp-option DNS <kube-dns-ip>
</code></pre>
<p>Something we learned is that if you are using Linux, the
NetworkManager-OpenVPN plugin pushes the DNS correctly, but the OpenVPN CLI
tool doesn’t; if you are using the latter, you need to set it up another
way.</p>
<p>Tip: you can get the <code><kube-dns-ip></code> by running <code>cat /etc/resolv.conf</code> from inside a pod.</p>
<h2 id="dns">DNS</h2>
<p>Pushing KubeDNS, the DNS used by Kubernetes, is not enough to have a complete
intranet. You should also be able to set up a custom domain to have friendly,
short URLs.</p>
<p>You can take two different directions. KubeDNS can have static
records configured, but some people are not happy to touch or customize
KubeDNS too much, because Kubernetes itself uses it, and if you mess it up,
everything can become a problem.</p>
<p>A possible solution is to deploy another DNS, like CoreDNS, and
configure it to fall back to KubeDNS. In this way, you will be free
to register custom TLDs and records. Kubernetes keeps using KubeDNS as
usual, and if you mess up CoreDNS, only a fraction of your system will blow
up.</p>
<p>Naturally, to resolve your custom domains from the VPN, you need to push
the CoreDNS IP and not the one used by Kubernetes.</p>
<p>If two DNSs are too much, take option one; or, from Kubernetes 1.10, you can
use CoreDNS as the Kubernetes DNS, which is a bit more flexible, so you can run only
that one if you are brave enough.</p>
<p>I suggested CoreDNS because it supports record configuration via
<a href="https://github.com/coredns/coredns/tree/master/plugin/etcd">etcd</a>. Here is an
example Corefile:</p>
<pre><code>. {
    errors
    etcd *.myinternal {
        stubzones
        path /skydns
        endpoint http://etcd-1:2379,http://etcd-2:2379,http://etcd-3:2379
        upstream /etc/resolv.conf
    }
    proxy . /etc/resolv.conf
}
</code></pre>
<p>Running this configuration inside a pod automatically falls back to kubedns (which
automatically falls back to the one configured to reach the internet), because
<code>upstream</code> points to <code>resolv.conf</code>, which inside a pod contains kubedns.</p>
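<p>For illustration, here is a sketch of how such a record is laid out: the CoreDNS etcd plugin reads SkyDNS-style entries, where the domain labels are reversed into the etcd key path under the configured <code>path</code>, and the value is a small JSON document. The domain and IP below are invented:</p>
<pre><code class="language-go">package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// skydnsKey converts a domain into the reversed etcd key layout that the
// CoreDNS etcd plugin (SkyDNS-style) reads under its configured path.
func skydnsKey(path, domain string) string {
	labels := strings.Split(domain, ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return path + "/" + strings.Join(labels, "/")
}

func main() {
	// Hypothetical record: point jenkins.myinternal at a private IP.
	record, _ := json.Marshal(map[string]string{"host": "10.32.0.15"})
	fmt.Println(skydnsKey("/skydns", "jenkins.myinternal"), string(record))
}
</code></pre>
<p>You would put that key/value pair into etcd (with etcdctl or a client library), and CoreDNS serves it as an A record.</p>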
<h2 id="benefits">Benefits</h2>
<p>Resolving Kubernetes DNS records from your local environment is very convenient
for building a shared or dynamic development environment for you and your
colleagues.</p>
<p>You can set up per-developer namespaces that they can use to
deploy services reachable from the program that they are writing. Or you can
deploy your application, and another person connected to the VPN will be able
to use it.</p>
Server time vs Response timeHow do you dimension an infrastructure? How can you calculate container limits or how many nodes your application requires to support a specific load? Response time and server time are two key measurements to monitor saturation.https://gianarb.it/img/pastrami-sf.jpg2018-05-18T10:38:27+00:002018-05-18T10:38:27+00:00https://gianarb.it/blog/server-time-vs-response-time<p>If you find yourself in San Francisco walking nearby Market Street, you should
consider stopping at the Jewish Museum. There is a charming Pastrami place just
next to it. It is a sandwich place with good lemonade. It only takes 3-4 minutes
to get your meal, and from there it takes no more than 15 minutes walking to be
in front of the Ocean. Very nice!
Now, let’s consider this other scenario.
It is lunchtime, and you are starving. You rush outside your office, and you run
to the Pastrami place close to the Jewish Museum. After 35 minutes of wait, you
get your sandwich and start eating it, asking yourself: why did it take so long this
time? Should I have walked to the next place to get a faster meal?</p>
<p>Something similar can happen to your services as well! And that’s precisely the
phenomenon in computer science we try to capture using the concepts of server
time and response time.
Server time aims to measure how long a server takes to run a specific action.
Consider, as an example operation, the generation of a monthly report: it
usually takes 2ms, but what happens if a lot of customers require the same kind of
report at the same time and your system saturates? This situation might very quickly
end up with a subset of them getting the report in more than 1 minute, or
actually hitting the timeout of the operation. The time it takes for a customer to
get their report is what is typically called response time.</p>
<h2 id="how-can-we-measure-these-metrics">How can we measure these metrics?</h2>
<p>The answer to this question is not easy: it depends on your architecture and
system. The starting point is instrumenting your application to determine how
much time it takes to produce the report. Stress testing is the other important
aspect: generating some load on your application and sampling the average
response time will let you estimate the application’s server time. Notice that
the app should NOT saturate during this test!</p>
<p>If you control all the chain (from the HTTP app that sends the request to the
server), you can trace the request and simulate the same behavior of your
customers. If you can’t do this, you can consider using the frontend edge,
probably a load balancer.</p>
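<p>To make the two measurements concrete, here is a toy sketch (the 10ms sleep stands in for generating the report; this is an illustration, not production instrumentation): the server time is recorded inside the handler, while the response time is measured on the client side and also includes the network round trip.</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

func main() {
	serverTime := make(chan time.Duration, 1)
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		time.Sleep(10 * time.Millisecond) // stands in for generating the report
		fmt.Fprintln(w, "report")
		serverTime <- time.Since(start) // server time: work done on the server
	})

	ts := httptest.NewServer(handler)
	defer ts.Close()

	// Response time: measured from the client side, around the whole call.
	start := time.Now()
	res, err := http.Get(ts.URL)
	if err != nil {
		panic(err)
	}
	io.Copy(io.Discard, res.Body) // read the full body before stopping the clock
	res.Body.Close()
	responseTime := time.Since(start)

	st := <-serverTime
	fmt.Println("server time >= 10ms:", st >= 10*time.Millisecond)
	fmt.Println("response time >= server time:", responseTime >= st)
}
</code></pre>
<p>Under load, the gap between the two is the queueing and transport overhead: the quantity you watch to spot saturation.</p>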
<blockquote>
<p>I would rather have questions that can’t be answered than answers that can’t
be questioned. Richard Feynman</p>
</blockquote>
<h2 id="why-does-it-matter">Why does it matter</h2>
<p>How many nodes do I need to deploy to accommodate x requests per
second? When should I consider scaling out my application? How does scaling out
affect the customer experience?
This is precisely why server time and response time matter! Having an average
response time close to the defined server time is a signal of proper
utilization and health of an application, because it indicates that the response
latency is under control and the system is far from saturation. Brought to their
limits, these two signals are also key metrics to estimate the correct sizing
of the application’s instances and infrastructure.</p>
<p><img alt="Market Street San Francisco, Pastrami Resturant Jewish Meseum" src="/img/pastrami-sf.jpg" class="img-fluid" /></p>
<p>Btw the Pastrami place exists! You should try it! I will be in SF in 2 weeks. So
let me know about other places <a href="https://twitter.com/gianarb">@gianarb</a>.
Picture from GMaps. I will take a better one!</p>
Go how to cleanup HTTP request terminated.Cleaning up HTTP request, the most expensive one can be a huge performance improvement for your application. This short article shows how to handle HTTP request termination in Go.https://gianarb.it/img/gopher.png2018-04-25T10:38:27+00:002018-04-25T10:38:27+00:00https://gianarb.it/blog/go-http-cleanup-http-connection-terminated<p>Expensive HTTP handlers are everywhere, no matter how good you are as a
developer. Business logic is what matters in our applications, and it can be
pretty complicated: it can create large files, provision resources on AWS, or start
thousands of containers on Kubernetes.</p>
<p>These kinds of procedures have in common that they can be very slow, and they
produce a lot of garbage if whoever required them stops prematurely, by
mistake or not.</p>
<p>If your API request creates AWS resources and the client terminates the call,
you should clean up what you created.</p>
<p>If you are generating a report and the customer changes their mind and refreshes,
you should stop the procedure.</p>
<p>You bet! Queues and background processes probably fit better, but coming back to
the previous example: if you are computing something and whoever is waiting for the
result changed their mind, stopping and releasing resources can be a massive
optimization.</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
	"time"
)

func main() {
	http.HandleFunc("/a", func(w http.ResponseWriter, r *http.Request) {
		err := ioutil.WriteFile(os.TempDir()+"/txt", []byte("hello"), 0644)
		if err != nil {
			panic(err)
		}
		println("new file " + os.TempDir() + "/txt")
		notify := w.(http.CloseNotifier).CloseNotify()
		go func() {
			<-notify
			println("The client closed the connection prematurely. Cleaning up.")
			os.Remove(os.TempDir() + "/txt")
		}()
		time.Sleep(4 * time.Second)
		fmt.Fprintln(w, "File persisted.")
	})
	http.ListenAndServe(":8080", nil)
}
</code></pre>
<p>When you are building an HTTP server in Go, you can use a channel provided by
the <code>http.ResponseWriter</code> to wait for the connection to be closed. And if that
happens, you can take action. The prototype above is very simple: every
request stores a file, but I would like to remove the file if the client closes
the connection.</p>
<pre><code class="language-bash">$ go run main.go
</code></pre>
<p>You can start the server, and from another terminal you can start a <code>curl</code>; you
will see that after about 4 seconds your request succeeds and the file
is persisted on disk. Check it!</p>
<pre><code>$ time curl http://localhost:8080/a
File persisted.
real 0m4.018s
user 0m0.008s
sys 0m0.006s
$ cat /tmp/txt
</code></pre>
<p>Now let’s suppose that the client terminates the connection because it is too
slow, or the person who made the request doesn’t care anymore.
Are you going to leave that request running? Even if nobody cares and it is just
consuming resources?</p>
<p>As you can see I am using the Notifier to remove the file if the client
terminates the connection:</p>
<pre><code class="language-go">notify := w.(http.CloseNotifier).CloseNotify()
go func() {
	<-notify
	println("The client closed the connection prematurely. Cleaning up.")
	os.Remove(os.TempDir() + "/txt")
}()
</code></pre>
<p>You can check it by stopping a <code>curl</code> just after starting it:</p>
<pre><code>$ time curl http://localhost:8080/a
^C
real 0m1.016s
user 0m0.008s
sys 0m0.005s
</code></pre>
<p>And the server reports</p>
<pre><code>$ go run main.go
new file /tmp/txt
The client closed the connection prematurely. Cleaning up.
</code></pre>
<p>That’s it! Build and clean after yourself!</p>
Go testing tricksThis post contains some feedback about how to write tests in Go.https://gianarb.it/img/gopher.png2018-04-17T10:38:27+00:002018-04-17T10:38:27+00:00https://gianarb.it/blog/go-testing-tricks<p>I recently wrote a blog post with my <a href="/blog/testing-shit">point of view about
testing</a>. I used Go as the language to concretize it. I had
good feedback about that article, and this is all about how I write tests in
Go.</p>
<h2 id="fixtures">Fixtures</h2>
<p>I wrote that I don’t like them, but I think they are useful. You can use them
to verify the same function, checking the same assertion with different inputs.
So let’s say you are testing a function that returns the multiplication of two
numbers if the first is even, and returns the division if not.</p>
<p>I will write two tests, one for even numbers and one for the other
case, and I will set up one fixture for each test. I won’t write just one
test with elaborate fixtures, because they are hard to read, and the name of the
test function helps a lot to understand the assertion. It is a small example, good
for blogging purposes, but I hope you get the idea.</p>
<pre><code class="language-go">package test

import "testing"

func MagicFunction(f int, s int) int {
	if f%2 == 0 {
		return f * s
	}
	return f / s
}

func TestEvenInputsShouldReturnMultiplication(t *testing.T) {
	table := []struct {
		first  int
		second int
		result int
	}{
		{2, 1, 2},
		{4, 10, 40},
	}
	for _, s := range table {
		if r := MagicFunction(s.first, s.second); r != s.result {
			t.Errorf("Got %d, expected %d. They should be the same.", r, s.result)
		}
	}
}

func TestOddInputsShouldReturnDivision(t *testing.T) {
	table := []struct {
		first  int
		second int
		result int
	}{
		{15, 3, 5},
		{21, 7, 3},
	}
	for _, s := range table {
		if r := MagicFunction(s.first, s.second); r != s.result {
			t.Errorf("Got %d, expected %d. They should be the same.", r, s.result)
		}
	}
}
</code></pre>
<h2 id="sub-test">sub-test</h2>
<p>To make the fixtures a bit better, I use the <code>t.Run</code> function a lot. It is a
feature introduced in Go 1.7 as part of the <code>testing</code> package.</p>
<pre><code class="language-go">package test

import (
	"fmt"
	"testing"
)

func MagicFunction(f int, s int) int {
	if f%2 == 0 {
		return f * s
	}
	return f / s
}

func TestEvenInputsShouldReturnMultiplication(t *testing.T) {
	table := []struct {
		first  int
		second int
		result int
	}{
		{2, 1, 2},
		{4, 10, 40},
	}
	for _, s := range table {
		t.Run(fmt.Sprintf("%d * %d", s.first, s.second), func(t *testing.T) {
			if r := MagicFunction(s.first, s.second); r != s.result {
				t.Errorf("Got %d, expected %d. They should be the same.", r, s.result)
			}
		})
	}
}
</code></pre>
<p><code>vim-go</code> has an option, <code>let g:go_test_show_name=1</code>, to show the name of the
test as part of the output of :GoTest or :GoTestFunc. This helps a lot to
enjoy this feature.</p>
<h2 id="golden-files">Golden files</h2>
<p>Golden files are something used in different packages in the Go standard
library, and Mitchell Hashimoto spoke about them during his brilliant talk about
testing at <a href="https://www.youtube.com/watch?v=8hQG7QlcLBk">GopherCon 2017</a>.
In case of complex output, you can verify the result of the tests against the
content of a file. It improves order and readability. When you declare a
global flag in your test file, it becomes available through <code>go test</code>, so if you
run the tests with the update flag they will all pass, but you will update all the
golden files. This is very useful if you need to compare a lot of bytes.</p>
<pre><code class="language-go">update := flag.Bool("update-golden-files", false, "Update golden files.")
</code></pre>
<pre><code class="language-sh">go test -update-golden-files
</code></pre>
<p>I was using this trick a lot when I was writing PHP code, and I was testing HTTP responses.</p>
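<p>The pattern can be sketched outside of <code>go test</code> like this (the <code>render</code> function and file names are invented for illustration; in a real test you would use <code>t.Fatalf</code> instead of <code>panic</code>):</p>
<pre><code class="language-go">package main

import (
	"bytes"
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// -update-golden-files regenerates the golden files instead of only
// comparing against them; go test forwards such flags to the test binary.
var update = flag.Bool("update-golden-files", false, "Update golden files.")

// render stands in for the code under test that produces a complex output.
func render() []byte { return []byte("report for 2018-04\ntotal: 42\n") }

func fileExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	flag.Parse()
	dir, err := os.MkdirTemp("", "golden")
	if err != nil {
		panic(err)
	}
	golden := filepath.Join(dir, "report.golden")

	// First run (or -update-golden-files): record the blessed output.
	if *update || !fileExists(golden) {
		if err := os.WriteFile(golden, render(), 0644); err != nil {
			panic(err)
		}
	}

	// Normal run: compare the current output against the golden file.
	want, err := os.ReadFile(golden)
	if err != nil {
		panic(err)
	}
	fmt.Println("matches golden:", bytes.Equal(render(), want))
}
</code></pre>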
<h2 id="test-helper-and-return-function">Test helper and return function</h2>
<p>When you have repeated code across tests, you can create a helper function
and use it in your tests. There are two general rules about this
approach:</p>
<ol>
<li>The helper function should have access to the <code>t *testing.T</code> variable.</li>
<li>Your helper never returns an error; it marks the test as failed. That’s why
it needs access to <code>t *testing.T</code>.</li>
<p>Another good trick is to return a function from the helper to clean up what you
did in the helper. So let’s say that your helper starts an HTTP server. You can
return the HTTP Close function as a callback.</p>
<pre><code class="language-go">func testHelperStartHTTPServer(t *testing.T) func() {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// long and complex mock, maybe with a golden file and so on
	}))
	return func() { ts.Close() }
}

func TestYourTest(t *testing.T) {
	hclose := testHelperStartHTTPServer(t)
	defer hclose()
	// All your logic and checks
}
</code></pre>
<p>I used the same practice when I was writing integration tests using bash and
<a href="https://github.com/sstephenson/bats">bats</a>. It is a very clean and
easy-to-read approach.</p>
<h2 id="parallel">parallel</h2>
<p>You can use the function <code>t.Parallel()</code> to notify the test runner that your
case can run in parallel with other tests marked as parallel. When you write
unit tests, you can almost always run them in parallel because they should be
completely isolated.</p>
<h2 id="short-and-verbose">Short and verbose</h2>
<p><code>-short</code> and <code>-v</code> are two flags available when you run <code>go test</code>. You can use
them in your tests:</p>
<pre><code>import "testing"

func TestVeryLongAndExpensiveCapability(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping: TestVeryLongAndExpensiveCapability is too expensive")
	}
	// ... other code
}
</code></pre>
<p><code>-short</code> describes itself pretty well, you can skip tests that are too long and expensive.</p>
<p><code>-v</code> allows you to print more:</p>
<pre><code>import "testing"

func TestVeryLongAndExpensiveCapability(t *testing.T) {
	if testing.Verbose() {
		// print extra debugging information here
	}
	// ... other code
}
</code></pre>
<h2 id="testingquick">testing/quick</h2>
<p><a href="https://golang.org/pkg/testing/quick/">testing/quick</a> is a nice package that
offers a set of utilities to write tests quickly. Go does not have an assertion library
in the stdlib, but this can help if you are like me and happy not to
vendor assertion libraries, because <code>if { }</code> with some sugar is all I need.</p>
<p>So that’s it, have fun and write tests!</p>
The Go awesomenessAfter 1 year writing go every day at work this is why I like to work with it.https://gianarb.it/img/fight-club.jpg2018-04-09T10:38:27+00:002018-04-09T10:38:27+00:00https://gianarb.it/blog/go-awesomeness<p>It’s one year since I started to use Go every day at work. I was using it before
but for fun or OSS projects. I was looking for my next challenge, and I was
mainly working with PHP and JavaScript, and I knew that a compiled,
statically typed language was my next step.</p>
<p>At my previous job at CurrencyFair, the environment was pretty standard for a
financial tech company so backend in Java, frontend in PHP. But my experience
with all the interfaces and abstract classes that I created in Java at that time
made me hate that language. So I was looking for something different.</p>
<p>I was, as I am now, involved in automation, cloud, and
operations besides development, so tools like Docker, InfluxDB,
Kubernetes, Consul, and Vault were written in Go, and for me, as an OSS addict, it
became the natural choice.
Now, after all this time, I am ready to write why I think Go is the right choice
for me.</p>
<h2 id="1-abstraction-and-maintainability">1. abstraction and maintainability</h2>
<p>I wrote a lot about <a href="/blog/the-abstract-manifesto">this topic</a>, so I am not going to repeat myself. But I
think maintainability is tied to abstraction. Previously, when I was
working with PHP, we always had services, injection, and so on. In that
environment it was fine, but all that abstraction, as in Java, doesn’t make your
code more flexible. It makes it hard to understand in the long run, and code
needs to be written with history in mind because deleting code is very hard.
Go, with its implicit interface implementation and the way it forces you to structure a project,
helps the codebase grow in a better way.</p>
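<p>A minimal sketch of the implicit interfaces I am talking about (all names invented for the example): the consumer declares the small interface it needs, and any type with the right method set satisfies it without an <code>implements</code> declaration:</p>
<pre><code class="language-go">package main

import "fmt"

// Notifier is declared by the consumer; nothing has to opt in to it.
type Notifier interface {
	Notify(msg string) string
}

// SlackClient satisfies Notifier just by having the right method set.
type SlackClient struct{ Channel string }

func (s SlackClient) Notify(msg string) string {
	return "slack[" + s.Channel + "]: " + msg
}

// alert depends on the small interface, not the concrete client, so the
// abstraction stays exactly as big as it needs to be.
func alert(n Notifier, msg string) string {
	return n.Notify(msg)
}

func main() {
	fmt.Println(alert(SlackClient{Channel: "ops"}, "disk full"))
}
</code></pre>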
<h2 id="2-stdlib">2. Stdlib</h2>
<p>Communities across languages have wasted time arguing about the right way to indent code.
Go comes with that decision made. Same for testing: how to write automated
tests and benchmarks is part of the language. No libraries; it is all there.
More generally, os, net, net/http, image, and so on: a lot of functionality is provided by
the language itself. It is great because you don’t need anything to start,
other than Go. Compared with other languages, you can do a lot more out of the box.
Having all these features inside Go guarantees compatibility over time; they won’t
break compatibility in the coming years, and the code is developed and reviewed
by a large number of people.</p>
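<p>Benchmarks are a good example of this: everything needed is in the <code>testing</code> package. The sketch below uses <code>testing.Benchmark</code> so it runs as a plain program; in a <code>_test.go</code> file the same loop would live in <code>func BenchmarkSum(b *testing.B)</code> and run with <code>go test -bench=.</code> (the <code>sum</code> function is invented for the example):</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"testing"
)

// sum is an invented function, just something to measure.
func sum(xs []int) int {
	t := 0
	for _, x := range xs {
		t += x
	}
	return t
}

func main() {
	xs := make([]int, 1000)
	for i := range xs {
		xs[i] = i
	}
	// testing.Benchmark runs the loop until the timing is stable,
	// exactly like `go test -bench` would.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sum(xs)
		}
	})
	fmt.Println("measured iterations:", res.N > 0)
}
</code></pre>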
<h2 id="3-pprof">3. pprof</h2>
<p>pprof is a profiler, and it is shipped as part of Go. You can use it via the
CLI, and it also has an excellent HTTP package under
<a href="https://golang.org/pkg/net/http/pprof/">net/http/pprof</a>.
Just to show how powerful it can be, InfluxDB extends it to export a zip
archive with all the information we need to troubleshoot the database’s behavior:</p>
<pre><code class="language-go">func (h *Handler) handleProfiles(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/debug/pprof/cmdline":
httppprof.Cmdline(w, r)
case "/debug/pprof/profile":
httppprof.Profile(w, r)
case "/debug/pprof/symbol":
httppprof.Symbol(w, r)
case "/debug/pprof/all":
h.archiveProfilesAndQueries(w, r)
default:
httppprof.Index(w, r)
}
}
</code></pre>
<p>All the code is here:
<a href="https://github.com/influxdata/influxdb/blob/442581d299b7d642e073bbe42112fa9b58fb071a/services/httpd/pprof.go#L21">influxdata/influxdb</a>.
This is super useful because we can ask customers or developers in the OSS
community to export and upload the archive so we can see what is going on.
Having a standard way to troubleshoot and export a profile allows us to build
visualizations or static analysis on top of it for common calculations.</p>
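<p>If you just want the same endpoints in your own service, the usual way is a blank import of <code>net/http/pprof</code>, which registers the <code>/debug/pprof/*</code> handlers on the default mux. In this sketch an <code>httptest</code> server stands in for the usual <code>http.ListenAndServe("localhost:6060", nil)</code> so the snippet can exercise itself:</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// In a real service you would expose http.DefaultServeMux on a port.
	ts := httptest.NewServer(http.DefaultServeMux)
	defer ts.Close()

	resp, err := http.Get(ts.URL + "/debug/pprof/cmdline")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("pprof endpoint status:", resp.StatusCode)
}
</code></pre>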
<h2 id="4-delve">4. delve</h2>
<p>A good debugging session is the best way to approach a new application or to go
deeper when learning a language or a piece of software.
<a href="https://github.com/derekparker/delve">delve</a> is easy to set up and to use. Even
if you are not a gdb/debugger superhero, and I am not, you will be able to make your
first steps with delve. So it is a nice starting point too.</p>
<h2 id="5-godoc">5. godoc</h2>
<p>Other than being an excellent way to generate documentation from source code, I
use it a lot even when I am not designing libraries, just to double-check that my
package exposes a comprehensible set of public methods. I always think about what I am
exposing to the outside when I write code. APIs are not just a JSON or HTTP thing:
every object exposes its API, and you need to be aware of how you are building
the interaction between the internal state and the outside. Avoiding misuse of your
structs is your responsibility as a developer, and godoc helps me identify poor
decisions.</p>
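<p>The convention godoc relies on is nothing more than a comment placed directly above each exported identifier; reviewing its output shows exactly what you expose. A toy example (the ring-buffer idea and all names are invented):</p>
<pre><code class="language-go">// A comment above the package clause becomes the godoc package overview.
package main

import "fmt"

// Push appends v to buf and drops the oldest element once len(buf)
// would exceed max. Exported identifiers such as Push are what godoc
// lists, so its output is a map of your public API.
func Push(buf []int, v int, max int) []int {
	buf = append(buf, v)
	if len(buf) > max {
		buf = buf[1:]
	}
	return buf
}

func main() {
	var b []int
	for _, v := range []int{1, 2, 3, 4} {
		b = Push(b, v, 3)
	}
	fmt.Println(b)
}
</code></pre>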
<h2 id="6-vim-go">6. vim-go</h2>
<p>I would like to stay in my terminal all day, and vim-go allows me to write good
code in my comfort zone. In the past I wrote a lot of vim scripts and plugins,
following how fatih and all the other maintainers are developing
<a href="https://github.com/fatih/vim-go">vim-go</a> is great.
Bonus point: they recently added support for delve, so you can now debug Go
applications in vim!</p>
<h2 id="7-dep">7. dep</h2>
<p>Dependency management is probably the worst thing about Go. The good news
is that now we have <a href="https://github.com/golang/dep">dep</a>, and it should become the standard way to manage
dependencies. Right now the situation looks a lot like this:</p>
<p><img class="img-fluid" src="/img/fight-club.jpg" /></p>
<p>Govendor, go get, glide: currently there are a lot of different ways to manage
dependencies, and it generates a lot of confusion, but I hope in the end we will
converge on just one. Probably dep.</p>
<h2 id="conclusion">Conclusion</h2>
<p>More in general, with Go I am learning that the language is only one aspect of
becoming a good developer. A good developer needs to know the language, but the
best way to go deeper into it is writing tests and benchmarks, profiling applications,
and using the debugger. All these tools make my life as a developer easy. An easy
life for me means that I can go deeper into solving problems, and indirectly it will
make me a better developer.</p>
<p>Go is fun!</p>
Observability according to mePrometheus, InfluxDB, TimescaleDB, Cassandra and all the time series databases that show up every week is a clear sign that now we need more than just a way to store metrics. I think is now clear that collecting more metrics is the point. More data is not directly related to a deeper understanding of our system.https://gianarb.it/img/mountain-garbage.jpg2018-04-04T10:38:27+00:002018-04-04T10:38:27+00:00https://gianarb.it/blog/observability<p>I started to read about observability almost one year ago, <a href="https://twitter.com/mipsytipsy">Charity
Majors</a>
comes to my mind when I think about this topic; she is the person
pushing hardest on it.</p>
<p>This is probably the natural evolution of how we approach monitoring.</p>
<p>Distributed systems require a different way to approach the three monitoring
piles: collect, storage and analytics.</p>
<p>Understanding a microservices environment brings a new layer of complexity, and the
most obvious consequence is the amount of data that we are storing; I
think it is way more than before.</p>
<p>Prometheus, InfluxDB, TimescaleDB, Cassandra, and all the time series databases
that show up every week are a clear sign that we now need more than just a way to
store metrics. I think it is now clear that just collecting more metrics is not the point:
more data is not directly related to a deeper understanding of our system.</p>
<p>Observability for a lot of companies looks like a new way to sell analytics
platforms, but according to me it’s a scream to bring us back to the problem: “How
can we understand what is happening?” or even better “How should we use the data
we have to understand what’s going on?”.</p>
<p>All the data should be organized, reliable and usable. Logs, metrics, traces
are part of the resolution, the brain to analyze and get value out from them is
what Observability means to me.</p>
<p>Visualization is one aspect; proactive monitoring, correlation, and hierarchy
are other steps. Looking at our old graphs, all of them are driven by the
hostname, for example. But now we have containers, we have virtual machines, and
immutable infrastructure makes rebuilding less costly and more secure than an
incremental update. The name of the server should not be the keyword for our
queries; the focus should move to the role of services.</p>
<p>Think about your Kubernetes cluster: you label servers based on what they will
run, and if something unusual happens the first thing to do is to move the node out
of the production pool; the autoscaler will replace it and you will
troubleshoot it later.</p>
<p>Before, we were looking at processes; we were keeping them alive like the Olympic
flame, but containers are making them volatile. We spin them up and down for
every request in some serverless environments. What we care about and what we should
monitor are the events that float across our services; that’s the new gold. We
can lose 1000 containers, but we can’t miss the purchase order made by a
customer. All our effort should be moved to that side.</p>
<p>I love this point of view because it brings us to what really matters, our
applications.</p>
<p><img src="/img/mountain-garbage.jpg" class="img-fluid" /></p>
<p>According to me, the mountain of waste shown in the picture explains really well
our current situation: we collected what ended up being a lot of garbage, and now
we need to climb it looking for a better point of view. I think the data in our
time series databases is not garbage but gold; it’s just not as simple to use as it
should be.</p>
<p>That’s why it is great that companies are building tools to fill the gap:
<a href="https://github.com/influxdata/ifql">IFQL</a> is an example. The idea behind the
project is to build a language to query and manipulate data in an easy way. Same
for companies like Honeycomb or open source projects like Grafana and Chronograf
that are trying to make these data easy to use.</p>
<p>We spoke about tools, but there is another big aspect and it’s entirely cultural:
distributed teams need different tools to collaborate and troubleshoot problems,
different UIs and ways to interact with graphs and metrics.</p>
I don't give a shit about testingThis is all about how I approach testing in development. TDD, DDD, unit test, integration test. It should make my development faster and my code easy to maintain. We have a lot of different techniques because we need to be good on picking the right one.https://gianarb.it/img/laziness.jpg2018-03-29T10:38:27+00:002018-03-29T10:38:27+00:00https://gianarb.it/blog/testing-shit<p>That’s what I learned during my experience as a developer. Doesn’t matter
which language you end up working with: if you are making HTTP APIs or things like
that, you don’t have an excuse. Writing tests makes your development faster, and it will
drastically improve the maintainability of your project.</p>
<p>In this post, I would like to tell you how I approach testing particularly in
Go obviously.</p>
<p>First of all, when you create a new file, you should write its <code>_test.go</code> child.
It’s hard to tell you who should be the child of whom. Sometimes I start
by writing everything inside a test function, just because running the actual test is faster
than compiling, running the binary, triggering the right entry point, and so on.
When I am satisfied, I move the
code to a function, and I leave the assertions I wrote as a new test. <strong>Pretty good</strong>.</p>
<blockquote>
<p>I don’t give a shit about automated testing. I write tests.</p>
</blockquote>
<p>I use <a href="https://github.com/fatih/vim-go"><code>vim-go</code></a> and <code>:GoTestFunc</code> is probably
the most used shortcut during my day to day job.</p>
<p>When I can choose, I don’t use assertion libraries; the <code>testing</code> package is
enough for me, and dependency management in Go is a pain, so the fewer things I vendor,
the better I feel about myself.</p>
<p><img src="/img/laziness.jpg" class="img-fluid" /></p>
<p>I use fixtures, but I don’t like them. I prefer to write more small tests than complicated fixtures.</p>
<p>A single test for me is more descriptive, and I don’t mind writing redundant code; I can always refactor it later or move it to some helper function. A complicated fixture will be hard to maintain.
The name of the function is an excellent way to describe what you are covering in your test, and the function itself
creates a beautiful block that improves readability.</p>
<pre><code class="language-go">func TestCarComposition(t *testing.T) {
	// Hard to tell at a glance what each field means.
	fixtures := []car.Composition{
		{"blue", "europe", 1, false, "2011-12-05", "ford"},
		{"", "", 35, true, "", "ford"},
		{"red", "usa", 0, true, "", "fiat"},
		{"white", "", 35, true, "", "kia"},
		{"orange", "", 1, true, "2010-05-12", ""},
		{"", "", 0, true, "", "ford"},
	}
	_ = fixtures
}
</code></pre>
<p>Bonus point: as you can see, fixtures are sad to read!</p>
<p>Even unit vs. integration vs. functional is a very annoying discussion. Don’t tell me
about TDD, BDD, CCC, DDD things. I don’t care; they are all amazing as
long as they make my development simple.</p>
<p>So, CDD is probably my best test methodology: <strong>Comfort driven development</strong>.</p>
<p>Usually, when I am writing a computing function that manipulates maps, strings, or
files without using too many external resources, I start from a unit test, because
it makes iteration faster, as I said before. And it won’t require too many mocks.
I don’t like mocks.</p>
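<p>This is the kind of pure function I mean; <code>slugify</code> is invented for the sketch, and the table-driven checks are the stdlib-only style described above:</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"strings"
)

// slugify is a hypothetical pure function: no I/O, no external
// resources, so it needs no mocks at all.
func slugify(s string) string {
	return strings.ReplaceAll(strings.ToLower(strings.TrimSpace(s)), " ", "-")
}

func main() {
	// In a _test.go file each failing case would call t.Errorf instead.
	cases := map[string]string{
		"Hello World":  "hello-world",
		"  Go Rocks  ": "go-rocks",
	}
	for in, want := range cases {
		if got := slugify(in); got != want {
			panic(fmt.Sprintf("slugify(%q) = %q, want %q", in, got, want))
		}
	}
	fmt.Println("ok")
}
</code></pre>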
<h2 id="lets-discuss-mocks">Let’s discuss mocks</h2>
<p>Mocks are a pain; you end up bored when you write them, they won’t fail when it would be useful for you to see an error, and they will fail when you don’t care.
So comfort looks very far from mocks!</p>
<p>When mocks become too complicated and I can write another kind of test, I go with that solution. Maybe integration, or I will try to write the simplest mock
possible; sometimes even an entire web server can be a valuable solution:</p>
<pre><code class="language-go">func TestInfluxDBSdkGetTheRightValues(ti *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
data := influxdb.Response{
Results: []influxdb.Result{
{
Series: []influxdbModels.Row{},
},
},
}
w.Header().Add("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_ = json.NewEncoder(w).Encode(data)
}))
defer ts.Close()
config := influxdb.HTTPConfig{Addr: ts.URL}
client, _ := influxdb.NewHTTPClient(config)
defer client.Close()
// Whatever you need to check
}
</code></pre>
<p>You need to play carefully; these tests are slower and more expensive in
resources.
But I like the idea of taking the faster solution when I am developing; you can
come back to your tests later when the feature is more stable and better
designed. Writing tests should not slow me down too much; I am looking for a way
to write the implementation and the tests fast, to iterate on both of them, rather than wasting time making everything perfect. Nothing lasts forever; nothing is ever complete in programming, so design your environment to be easy to change.</p>
<h2 id="integration-tests">Integration tests</h2>
<p>I am a CLI kind of person, so I often send HTTP requests via cURL.
Docker makes it very easy from day one to start and stop your application,
clean databases, and so on.</p>
<p><a href="https://github.com/sstephenson/bats"><code>bats</code></a> combines these two habits. It is a test automation framework for
bash. It is very simple to set up, and it allows me to copy-paste some cURL;
with jq, you can make the checks you need over your JSON response.</p>
<p>An integration test suite made with bats looks like that:</p>
<ol>
<li>An “init” file in bash where you can run setup and teardown functions before and after every test. Usually, you use those functions to spin up and tear down the
containers that you need for your tests; this is the one that I wrote for this
example</li>
</ol>
<pre><code class="language-bash">#!/bin/bash
function setup() {
teardownCallback=$(init)
}
function teardown() {
eval $teardownCallback
}
function getHost() {
echo "http://localhost"
}
function init {
executionID=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 7 | head -n 1)
containerLabels="exec=${executionID}"
$(docker run -d -l $containerLabels -p 80:80 nginx)
echo "docker ps -aq -f 'label=${containerLabels}' | xargs docker rm -f"
}
</code></pre>
<ol>
<li>You have a set of <code>.bats</code> files with the various scenarios; I wrote one to
check that the status code is 200 for the nginx home page</li>
</ol>
<pre><code class="language-bash">#!/usr/bin/env bats
load utils
@test "Nginx home return 200" {
statusCode=$(curl -I -X GET "$(getHost)" 2>/dev/null | head -n 1 | cut -d$' ' -f2)
[ $statusCode -eq 200 ]
}
</code></pre>
<p>What you are running is a <code>bats</code> test to check that <code>nginx:latest</code> is serving the right page.
Your use case will be ten times more complicated.</p>
<p>Another reason to take this approach is bash itself. If you are not a bash
expert, you will probably end up writing straightforward tests: cURL, grep,
regex, and some pipes. Nothing more.</p>
<p>And you won’t use any code that runs your application. It’s important to avoid
weird buggy tests.</p>
<h2 id="developer-happiness">developer happiness</h2>
<p>Tests are a methodology to decrease the cost of maintenance and to improve your
ability to write code.</p>
<p>It should not be a fashionable way to show how good you are as a developer. You will
be a good developer as a side effect.</p>
<p>I look at all the different ways to test my code as a toolset. AI is becoming
very smart, so we need to be less “server” and more human being. 100% coverage
for unit tests looks a lot like something that a server can do. Pick the right
method based on your feeling.</p>
<script>
$(document).ready(function() {
$('body').css("background", "#F5F3E6");
});
</script>
Review book Database Reliability EngineerReview Book Database Reliability Engineer. Author Laine Campbell and Chairty Major. Published by O'Reillyhttps://gianarb.it/img/dbre-book.jpg2018-03-27T09:08:27+00:002018-03-27T09:08:27+00:00https://gianarb.it/blog/database-reliability-engineer-review<p>The authors of <a href="https://amzn.to/2FE5z4V">Database Reliability Engineer</a> are Laine Campbell and Charity Majors.
It is published by O’Reilly. You can buy it on Amazon.</p>
<p>You probably know the book <a href="https://gianarb.it/blog/site-reliability-engineering-review">Site Reliability
Engineer</a>, if you don’t I reviewed
it a few days ago.</p>
<p><img src="/img/dbre-book.jpg" class="img-fluid" /></p>
<p>This book walks through the same experience but focused on databases. I work for
InfluxData as an SRE and I deal every day with databases running on our Cloud
product, so I am into this topic, not because I am good as a DBA but just as a
developer with a background in distributed systems and cloud computing.</p>
<p>It doesn’t really matter if you manage databases as a DBA or if you use them as a
developer; this book contains useful content for both categories.</p>
<p>It is a dense book; I started reading it a few months ago and at some point I
stopped, just because it contains so many notions and experiences that it
requires some time to metabolize them.</p>
<p>I particularly enjoyed Chapter 7 about <strong>Backup and Recovery</strong>. But Chapter
3 about <strong>Risk Management</strong> is also great, because it goes deeper into how metrics
should drive the way you look at risks and outages.</p>
<p>My daily job is all around database orchestration, containers, and so on. So I
found this book very useful for my day-to-day job, and it reinforced the expectations
I had about the application that I am building.</p>
<p>I will try to go deeper into backup management to make recovery part of a more
structured pipeline, to be sure that it is always usable, for example.</p>
<p>It is a very practical book, and you can feel all the enthusiasm and the
experience of the two authors, Charity and Laine. If you are happy to
learn from people who have their hands dirty, this book is your book. It drives you through
stories and lessons learned.</p>
<p>It covers how to manage migrations and how to close the gap between developers and DBAs,
because they both share the same goal: the success of the company and the project.</p>
<p>If we can keep both in the same loop, avoiding backstabbing, we will increase our
chances of success. If you are a manager and you see some friction in your teams,
this book can give you some good feedback.</p>
How to use a Forwarding Proxy with golangCloud, Docker, Kubernetes make your environment extremely dynamic, it has a lot of advantages but it adds another layer of complexity. This article is about forward proxy and golang. How to configure your http Client to use an http, https forward proxy for your golang application to increase security, scalability and to have a set of public ips for outbound traffic.https://gianarb.it/img/gopher.png2018-03-21T09:38:27+00:002018-03-21T09:38:27+00:00https://gianarb.it/blog/golang-forwarding-proxy<p>A forwarding proxy is a proxy configuration that handles requests from a set of
internal clients that are trying to create connections to the outside.</p>
<p>In practice, it is a man in the middle between your application and the server you are
trying to connect to. It works over the HTTP(S) protocol and is implemented at the
edge of your infrastructure.</p>
<p>Usually, you can find it in large organizations or universities, where it is used as an
additional control mechanism for authorization and security.</p>
<p>I find it useful when you work with containers or in a dynamic cloud environment
because you will have a set of servers for all the outbound network
communication.</p>
<p>If you work in a dynamic environment such as AWS, Azure, and so on, you will end up
having a variable number of servers and also a dynamic number of public IPs.
Same if your application runs on a Kubernetes cluster: your container can be
anywhere.</p>
<p>Now let’s suppose that a customer asks you to provide a range of public IPs
because he needs to set up a firewall… How can
you provide this feature? In some environments it can be very simple, in others
very complicated.</p>
<p>On 1st December 2015 a user asked this question on the <a href="https://discuss.circleci.com/t/circleci-source-ip/1202">CircleCI
forum</a>; the request is
still open. This is just an example, CircleCI is great. I am not complaining
about them.</p>
<p>One possible way to solve this problem is the forwarding proxy. You can
spin up a set of nodes with static IPs and offer the list to the
customer.</p>
<p>Almost all cloud providers have a way to do that, floating ip
on DigitalOcean or Elastic IP on AWS.</p>
<p>You can configure your applications to forward requests to that pool, and
the end services will see the IP of the forward proxy nodes and not the
internal one.</p>
<p>This can be another security layer for your infrastructure, because you will be
able to control and scan the packets leaving your network in a
really simple way and in a centralized place.</p>
<p>This is not a single point of failure, because you can spin up more than one
forward proxy, and they scale really well.</p>
<p>Under the hood, a forward proxy relies on the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/CONNECT">HTTP method
<code>CONNECT</code></a>.</p>
<blockquote>
<p>The CONNECT method converts the request connection to a transparent TCP/IP
tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an
unencrypted HTTP proxy.</p>
</blockquote>
<p>A lot of HTTP clients across languages already support this in a very
transparent way. I built a very small example using Go and
<a href="https://www.privoxy.org/">privoxy</a> to show you how simple it is.</p>
<p>First of all, let’s build an application called <code>whoyare</code>. It is an HTTP server
that returns your remote address:</p>
<pre><code class="language-go">package main
import (
"encoding/json"
"net/http"
)
func main() {
http.HandleFunc("/whoyare", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
body, _ := json.Marshal(map[string]string{
"addr": r.RemoteAddr,
})
w.Write(body)
})
http.ListenAndServe(":8080", nil)
}
</code></pre>
<p>You can <code>GET</code> the route <code>/whoyare</code> and you will receive a JSON like
<code>{"addr": "34.35.23.54"}</code>, where <code>34.35.23.54</code> is your public address. Running
<code>whoyare</code> from your laptop, if you make a request from your terminal you should get
<code>localhost</code> as the remote address. You can use curl to try it:</p>
<pre><code class="language-bash">18:36 $ curl -v http://localhost:8080/whoyare
* TCP_NODELAY set
> GET /whoyare HTTP/1.1
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Sun, 18 Mar 2018 17:36:40 GMT
< Content-Length: 31
<
* Connection #0 to host localhost left intact
{"addr":"localhost:38606"}
</code></pre>
<p>I wrote another application, it uses <code>http.Client</code> to print the response on
stdout. If you have the server running you can run it:</p>
<pre><code class="language-go">package main
import (
"io/ioutil"
"log"
"net/http"
"os"
)
type whoiam struct {
Addr string
}
func main() {
url := "http://localhost:8080"
if "" != os.Getenv("URL") {
url = os.Getenv("URL")
}
log.Printf("Target %s.", url)
resp, err := http.Get(url + "/whoyare")
if err != nil {
log.Fatal(err.Error())
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
log.Fatal(err.Error())
}
println("You are " + string(body))
}
</code></pre>
<p>So, this is a very simple example, but you can apply it to more
complex environments.</p>
<p>To make this example a bit clearer I created two virtual machines on
DigitalOcean; one will run privoxy, the other one will run <code>whoyare</code>.</p>
<ul>
<li><strong>whoyare</strong>: public ip 188.166.17.88</li>
<li><strong>privoxy</strong>: public ip 167.99.41.79</li>
</ul>
<p>Privoxy is a very simple forward proxy to set up; nginx and haproxy don’t fit very
well for this use case because they do not support the CONNECT method.</p>
<p>I built a docker image,
<a href="https://hub.docker.com/r/gianarb/privoxy/"><code>gianarb/privoxy</code></a>; it’s on Docker
Hub. You can run it, and by default it listens on port 8118.</p>
<pre><code class="language-bash">core@coreos-s-1vcpu-1gb-ams3-01 ~ $ docker run -it --rm -p 8118:8118
gianarb/privoxy:latest
2018-03-18 17:28:05.589 7fbbf41dab88 Info: Privoxy version 3.0.26
2018-03-18 17:28:05.589 7fbbf41dab88 Info: Program name: privoxy
2018-03-18 17:28:05.591 7fbbf41dab88 Info: Loading filter file:
/etc/privoxy/default.filter
2018-03-18 17:28:05.599 7fbbf41dab88 Info: Loading filter file:
/etc/privoxy/user.filter
2018-03-18 17:28:05.599 7fbbf41dab88 Info: Loading actions file:
/etc/privoxy/match-all.action
2018-03-18 17:28:05.600 7fbbf41dab88 Info: Loading actions file:
/etc/privoxy/default.action
2018-03-18 17:28:05.607 7fbbf41dab88 Info: Loading actions file:
/etc/privoxy/user.action
2018-03-18 17:28:05.611 7fbbf41dab88 Info: Listening on port 8118 on IP address
0.0.0.0
</code></pre>
<p>The second step is to build <code>whoyare</code> and scp it to your server. You can
build it using the command:</p>
<pre><code>$ CGO_ENABLED=0 GOOS=linux go build -o bin/server_linux -a ./whoyare
</code></pre>
<p>Now that we have the application up and running, we can use cURL to query it
directly and via privoxy.</p>
<p>Let’s try it directly, as we did previously:</p>
<pre><code>$ curl -v http://your-ip:8080/whoyare
</code></pre>
<p><code>cURL</code> uses an environment variable <code>http_proxy</code> to forward the requests through
the proxy:</p>
<pre><code>$ http_proxy=http://167.99.41.79:8118 curl -v http://188.166.17.88:8080/whoyare
* Trying 167.99.41.79...
* TCP_NODELAY set
* Connected to 167.99.41.79 (167.99.41.79) port 8118 (#0)
> GET http://188.166.17.88:8080/whoyare HTTP/1.1
> Host: 188.166.17.88:8080
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Sun, 18 Mar 2018 17:37:02 GMT
< Content-Length: 29
< Proxy-Connection: keep-alive
<
* Connection #0 to host 167.99.41.79 left intact
{"addr":"167.99.41.79:58920"}
</code></pre>
<p>As you can see, I have set <code>http_proxy=http://167.99.41.79:8118</code> and the response
doesn’t contain my public IP but the proxy’s.</p>
<p><img src="/img/frankenstain-jr.jpg" alt="" /></p>
<p>These are the logs that you should expect from privoxy for the requests crossing it:</p>
<pre><code>2018-03-18 17:28:22.886 7fbbf41d5ae8 Request: 188.166.17.88:8080/whoyare
2018-03-18 17:32:29.495 7fbbf41d5ae8 Request: 188.166.17.88:8080/whoyare
</code></pre>
<p>The client that you ran previously connects by default to <code>localhost:8080</code>,
but you can override the target URL via the env var <code>URL=http://188.166.17.88:8080</code>.
Running the following command, I reached <code>whoyare</code> directly.</p>
<pre><code>$ URL=http://188.166.17.88:8080 ./bin/client_linux
2018/03/18 18:37:59 Target http://188.166.17.88:8080.
You are {"addr":"95.248.202.252:38620"}
</code></pre>
<p>The Go <code>http.Client</code> supports a set of environment
variables to configure the proxy; this makes everything very flexible, because
passing
these variables to any service already running will just work.</p>
<pre><code>export HTTP_PROXY=http://http_proxy:port/
export HTTPS_PROXY=http://https_proxy:port/
export NO_PROXY=127.0.0.1,localhost
</code></pre>
<p>The first two are very simple: one is the proxy for HTTP requests, the
second for HTTPS. <code>NO_PROXY</code> excludes a set of hostnames; the hostnames listed
there won’t cross the proxy. In my case, localhost and 127.0.0.1.</p>
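<p>In code, the default transport consults these variables through <code>http.ProxyFromEnvironment</code>; you can also pin a proxy on a specific client with <code>http.ProxyURL</code>, which is handy when you only want some clients to cross the proxy. A sketch reusing the privoxy address from the example:</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	proxyURL, err := url.Parse("http://167.99.41.79:8118")
	if err != nil {
		panic(err)
	}
	// Every request made through this client is forwarded via privoxy,
	// regardless of the environment variables.
	tr := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
	client := &http.Client{Transport: tr}
	_ = client // client.Get("http://188.166.17.88:8080/whoyare") would cross the proxy

	// Proxy is just a function from request to proxy URL, so it is easy
	// to inspect which proxy a given request would use.
	req, _ := http.NewRequest("GET", "http://188.166.17.88:8080/whoyare", nil)
	u, _ := tr.Proxy(req)
	fmt.Println("proxy selected:", u.Host)
}
</code></pre>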
<pre><code>HTT_PROXY=http://forwardproxy:8118
+--------------+ +----------------+ +----------------+
| | | | | |
| client +----------^+ forward proxy +--------^+ whoyare |
| | | | | |
+--------------+ +----------------+ +----^-----------+
|
|
+---------------+ |
| | |
| client +-------------------------------------------+
| |
+---------------+
HTTP_PROXY not configured
</code></pre>
<p>The client with the environment variables configured will cross the forward
proxy. Other clients will reach it directly.</p>
<p>This granularity is very important. It’s very flexible because, other than
per-process configuration, you can also select which requests to forward and which to exclude.</p>
<pre><code>$ HTTP_PROXY=http://167.99.41.79:8118 URL=http://188.166.17.88:8080
./bin/client_linux
2018/03/18 18:39:18 Target http://188.166.17.88:8080.
You are {"addr":"167.99.41.79:58922"}
</code></pre>
<p>As you can see, we just reached <code>whoyare</code> via the proxy, and the <code>addr</code> in the response is
not ours but the proxy’s.</p>
<p>The last command is a bit weird, but it is just to show how <code>NO_PROXY</code> works.
We are configuring the proxy but excluding the <code>whoyare</code> URL, and as expected the request doesn’t
cross the proxy:</p>
<pre><code>$ HTTP_PROXY=http://167.99.41.79:8118 URL=http://188.166.17.88:8080 NO_PROXY=188.166.17.88 ./bin/client_linux
2018/03/18 18:42:03 Target http://188.166.17.88:8080.
You are {"addr":"95.248.202.252:38712"}
</code></pre>
<p>Read this article as a practical introduction to Go and forward proxies. You can
subscribe to my <a href="/atom.xml">rss feed</a> or you can follow me on
<a href="https://twitter.com/gianarb">@twitter</a>. Probably I will write about how to
replace <code>privoxy</code> with Go and about how to set up and deploy this solution on
Kubernetes. So let me know what to write first!</p>
The abstract manifestoOften looking at the code I spot a lot of places where it looks too complicated. Disappointment is the feeling that I get reading classes with weird names or chain of abstractions or interfaces used only one time. Abstraction is often the reason for all my sadness.https://gianarb.it/img/shit-pretty.png2018-03-17T10:08:27+00:002018-03-17T10:08:27+00:00https://gianarb.it/blog/the-abstract-manifesto<p>This is a personal outburst. Stop abstracting by default.</p>
<p>I have worked on too many applications abstracted by default. And by abstracted I mean
complicated.</p>
<p>Abstraction is easy to understand when you need it. If you need to think too
much about why that crappy code should have an interface, or why that method should
implement an interface rather than a concrete object, you are off track.</p>
<p>Abstraction is not the answer; code architecture is. Unit testing helps, and
integration tests are the key to the modern microservices environment.</p>
<p>Don’t waste your time creating interfaces that nothing will reuse. If you don’t
know what to do, run.</p>
<p>There are languages and design patterns that probably set your brain to look for
abstraction everywhere. I worked with Java developers who weren’t able to write a
class without an interface, or without its abstract counterpart. My question was: “Why are
we doing that?”. Compliance.</p>
<blockquote>
<p>Dude, your world is a very boring one, and you are the root cause.</p>
</blockquote>
<p>If you are working in a service-oriented environment, with services small enough
to be rewritten easily, abstraction is even more useless.</p>
<p>We are developers, and we usually don’t build rockets. That’s life. There are a
good number of companies that do build rockets: apply there. Otherwise you will
leave your company paying technical debt for you, hiring smart contractors to
figure out what you did now that you are gone, because after probably just one
year you locked yourself into that boring project full of complicated
concepts.</p>
<p>By the way, I don’t think the software that controls rockets has a lot of
abstractions either. Sorry.</p>
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">Y'all are all about passionate programmers, but honestly I'd
rather programmers than care _just enough_. I could do with less pedantic
arguments about code.</p>— 。 𝕷𝖎𝖓𝖉𝖘𝖊𝖞 𝕭𝖎𝖊𝖉𝖆 。 (@lindseybieda) <a href="https://twitter.com/lindseybieda/status/969296749985779712?ref_src=twsrc%5Etfw">March
1, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>I saw this tweet on my timeline yesterday and I think it really describes my
current mood. Code changes over time, and I should spend more time making it
flexible enough to support that continuous growth. Abstraction is not the right
way.</p>
<p>So, do passionate code engineers always abstract? That’s not the takeaway. Do Java
engineers always abstract? Maybe.</p>
Review book Site Reliability EngineeringA review of Site Reliability Engineering, a book published by O'Reilly about Google and its massive scale, from the point of view of the engineers that made that scale possible. Distributed systems, microservices and data-driven development.https://gianarb.it/img/sre-book.jpeg2018-03-15T10:08:27+00:002018-03-15T10:08:27+00:00https://gianarb.it/blog/site-reliability-engineering-review<p>I bought <a href="https://amzn.to/2pfeHBU">Site Reliability Engineering</a> many months
ago. I read the ebook first, but I am the kind of person who also buys paper
books when they are good. If you are working on a distributed and scalable
environment, it’s something that you should read.</p>
<p>Published by O’Reilly and edited by Betsy Beyer, Chris Jones, Jennifer Petoff
and Niall Richard Murphy, it is written by many Google engineers and it’s about
the experience they gained scaling services like Google Maps, Calendar, YouTube
and all the other products.</p>
<p>I spoke with different people about this book, and a lot of them told me that
there is nothing new in it; it’s just cool because Google made it cool.</p>
<p>I have a different opinion. It’s a nice book because it is a complete source of
information about design and processes in a highly scalable environment.
Some of the topics are probably well known, but it’s hard to find all this
information in a single place.</p>
<p><img alt="Site Reliability Engineering book" src="/img/sre-book.jpeg" class="img-fluid" /></p>
<p>To be fair, it has 524 pages, so it’s not a fast read. It took me a few
months, but I keep it around for when I need to explain concepts like how to
dimension and measure load in a services environment. SLAs, SLOs and how to use
them properly to manage and measure risk are well explained, along with circuit
breaking and, in general, a lot of good practices about resiliency, teamwork
and delivery.</p>
<p>There is a nice chapter about how to use metrics to set up a functional and
smart alerting system that keeps on-call engineers in a safe and comfortable
environment.</p>
<p>Another one is about how Google designs resilient applications and how they
dimension services. How much, and how deeply, they know their services impressed me
a lot.</p>
<p><strong>Site Reliability Engineering</strong> is a good mix of concepts that you can apply
in your day-to-day not-at-Google job and all the Google-scale “freaky fun”.</p>
<p>So, in the end, I would define it as the bible for any engineer who wishes to work
in a highly scalable environment. It doesn’t matter if you are not there yet or if
you will never serve millions of requests per second. It’s good to read and to keep
around.</p>
<p>The <a href="https://landing.google.com/sre/book/index.html">HTML version</a> of this book
is now available online for free.</p>
What is distributed tracing. Zoom on opencensus and opentracingDistributed tracing is a fast-growing concept. We increased the distribution of our applications, and the consequence is a new kind of complexity in monitoring and understanding what is going on across regions and applications (microservices). With this article I share something about what tracing is and my experience with opentracing and opencensus.https://gianarb.it/img/gianarb.png2018-02-18T10:08:27+00:002018-02-18T10:08:27+00:00https://gianarb.it/blog/what-is-distributed-tracing-opentracing-opencensus<p>A few months ago I started to actively study, support and use opentracing and,
more in general, the distributed tracing topic.</p>
<p>In this article, I will share some of what I have learned, starting from
the basics. I hope to get your thoughts and questions via Twitter
<a href="https://twitter.com/gianarb">@gianarb</a>.</p>
<p>We all know the trend of the last couple of years: spread applications across
containers and cloud providers, and split them into small units called services,
microservices, pets…</p>
<p>This procedure brings a lot of advantages:</p>
<ul>
<li>you can manage people in a better way and spread them across these small units.</li>
<li>small units are easy to understand for new people, or after a couple of months.
In a job like ours, where there is high turnover, being able to rewrite a
service in a couple of days when nobody knows it anymore is great.</li>
<li>You can monitor these units in a better way, and if you detect scalability
problems or a bottleneck you can stay focused on the specific problem without
other functions in the way. It enforces single responsibility in some
way.</li>
</ul>
<p>There are other advantages for sure, but the last one is very important and I
think it helps explain why tracing is so important now.</p>
<p>We discovered that monitoring these pets is very hard, and it’s different from
the previous situation. A lot of teams discovered this complexity moving forward
with services making noise in production.</p>
<p>Our focus is not on the virtual machine, the hostname or even the container. I
don’t care about the status of the server. I care about the status of the
service, and even deeper, I care about the status of a single event in my
system. This is also one of the reasons why tools like Kafka are so powerful and
popular. Replaying a section of your history and collecting events like a user
registration, a new invoice, a new attendee registered at your event, a new
flight booked or a new article published: those are the most interesting parts here.</p>
<p>Servers and containers should be replaceable things, and they shouldn’t be a
problem. The core here is the event, and you need to be 100% sure it is stored
somewhere.</p>
<p>The same goes for monitoring: if servers and containers are not important but
events are, you should monitor the event and not the server.</p>
<p>Oh, and don’t forget about distribution. It makes everything worse and more
complicated, my dear. Events move faster than everything else. They travel across
services, containers and data centers.</p>
<p>Where is the event? Where did it fail? How does a spike of a particular event
behave on your system? If you get too many new registrations, are you still able
to serve your applications?</p>
<p>In a big distributed environment, what is a particular service calling? Is it
even used? Maybe nobody is using it anymore. These questions need an answer.</p>
<p>Distributed tracing is one of the ways. It doesn’t solve all the problems but it
provides a new point of view.</p>
<p>In practice, speaking in HTTP terms, tracing means following a specific
request from its start (mobile app, web app, cronjob, another app) to its end:
registering how many applications it crosses, and for how long. Labeling these
metrics, you can even understand the latency between services.</p>
<p><img src="https://www.hawkular.org/img/blog/2017/2017-04-19-jaeger-trace.png" class="img-fluid" />
<small>from https://www.hawkular.org/</small></p>
<p>Speaking the right language, this image describes a trace. It’s an HTTP request
to the <code>frontend</code> service: a GET request on the <code>/dispatch</code> route. You can see how
deep you can go. A trace is a collection of spans.</p>
<p>Every span has its own id, and an optional parent id to create the hierarchy.
Spans support what are called Span Tags: a key-value store where the key is
always a string and some keys are “reserved” to describe specific behaviors.
You can look at them <a href="https://github.com/opentracing/specification/blob/master/semantic_conventions.md#standard-span-tags-and-log-fields">inside the specification
itself</a>.
Usually, UIs use these standard tags to build a nice visualization. For
example, if a span contains the tag <code>error</code>, a lot of tracers color it red.</p>
<p>I suggest you read the standard tags because they will give you an idea of
how descriptive a span can be.</p>
<p>The architecture looks like this:</p>
<pre><code> +-------------+ +---------+ +----------+ +------------+
| Application | | Library | | OSS | | RPC/IPC |
| Code | | Code | | Services | | Frameworks |
+-------------+ +---------+ +----------+ +------------+
| | | |
| | | |
v v v v
+-----------------------------------------------------+
| · · · · · · · · · · OpenTracing · · · · · · · · · · |
+-----------------------------------------------------+
| | | |
| | | |
v v v v
+-----------+ +-------------+ +-------------+ +-----------+
| Tracing | | Logging | | Metrics | | Tracing |
| System A | | Framework B | | Framework C | | System D |
+-----------+ +-------------+ +-------------+ +-----------+
</code></pre>
<p><small>from <a href="https://opentracing.io/documentation/pages/instrumentation/common-use-cases.html" target="_blank">opentracing.org</a><small></small></small></p>
<p>There are instrumentation libraries for multiple languages, and you need to
embed one of them in your application. It usually provides a global variable
you can add spans to. From time to time they are flushed to the tracer that you
select. If you are using Zipkin as a tracer, you can choose different backends
like Elasticsearch and Cassandra.
Tracers provide an API and a UI to store and visualize traces.</p>
<p>As you can see from the graph above, OpenTracing “is able” to push to tracers,
logging systems, metrics and so on. From my experience with opentracing, I don’t
know how this can be done.</p>
<p>I always used it with a tracer like Zipkin or Jaeger to store spans. Logs are
covered by the spec because you can attach one or more <code>Span
Logs</code> to every span.</p>
<blockquote>
<p>each of which is itself a key:value map paired with a timestamp. The keys must
be strings, though the values may be of any type. Not all OpenTracing
implementations must support every value type.</p>
</blockquote>
<p><small>from <a href="https://github.com/opentracing/specification/blob/master/specification.md" target="_blank">opentracing.org</a><small></small></small></p>
<p>The idea behind this feature is clear. There are too many buzzwords: metrics,
logs, events, time series and now traces.</p>
<p>It’s easy to end up with more
instrumentation libraries than business code. That’s probably why opentracing
covers this use case. Logs and traces are time series; that’s probably why
metrics are there too.</p>
<p>Using the go-sdk it looks like this:</p>
<pre><code class="language-go">span, ctx := opentracing.StartSpanFromContext(ctx, "operation_name")
defer span.Finish()
span.LogFields(
	log.String("event", "soft error"),
	log.String("type", "cache timeout"),
	log.Int("waited.millis", 1500),
)
</code></pre>
<p>But I was not able to find a way to say: “Forward all these logs to ….elastic
and these traces to Zipkin”. And I don’t know if the expectation is for tracers
to be smart enough to do that. From my experience trying to extend Zipkin,
this looks like a hard idea, first of all because the tracers are out of
OpenTracing’s scope.</p>
<p>If the goal is to wrap everything together: logs have had a precise use case for
ages. They work pretty well and you can’t change the expectations. They can be a
real-time stream on stdout, stderr and/or a thousand other exporters. I can’t
find that kind of work there. So, looking at the code, it’s not clear who is in
charge of what. But the graph is pretty.</p>
<p>I like the idea, and I started looking at <a href="https://opencensus.io/">OpenCensus</a>, a
library open sourced by Google from its experience with StackDriver at
Google scale. It has its own
<a href="https://github.com/census-instrumentation/opencensus-specs">specification</a> and
it provides a set of <a href="https://github.com/census-instrumentation/">libraries</a>
that you can add to your application to get what they call stats and traces out
of your app. Stats stands for metrics and events. It’s probably another buzzword!</p>
<p>The concept looks similar to OpenTracing, obviously, the specs are different.</p>
<p>Looking at the code, the Go SDK looks a lot clearer. I can clearly see the stats
and tracing objects; they both accept exporters, which can be Prometheus,
Zipkin, Jaeger, StackDriver and so on. I like that the exporters are part
of the project: you don’t need a tracing application like Zipkin, you can write
your own exporter to store data in your custom database and you are ready to go.</p>
<pre><code>.
├── appveyor.yml
├── exporter
│ ├── jaeger
│ ├── prometheus
│ ├── stackdriver
│ └── zipkin
├── internal
├── plugin
├── README.md
├── stats
│ ├── internal
│ ├── ...
│ └── view
├── tag
├── trace
</code></pre>
<p>You can probably do the same with OpenTracing by writing a tracer that stores
things in your custom database, skipping Zipkin and Jaeger, but it looks a bit more
complicated if you compare the interfaces:</p>
<pre><code class="language-go">// opencensus-go/trace/export.go
// Exporter is a type for functions that receive sampled trace spans.
//
// The ExportSpan method should be safe for concurrent use and should return
// quickly; if an Exporter takes a significant amount of time to process a
// SpanData, that work should be done on another goroutine.
//
// The SpanData should not be modified, but a pointer to it can be kept.
type Exporter interface {
ExportSpan(s *SpanData)
}
</code></pre>
<pre><code>// opentracing tracer
type Tracer interface {
// Create, start, and return a new Span with the given `operationName` and
// incorporate the given StartSpanOption `opts`. (Note that `opts` borrows
// from the "functional options" pattern, per
// https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis)
//
// A Span with no SpanReference options (e.g., opentracing.ChildOf() or
// opentracing.FollowsFrom()) becomes the root of its own trace.
//
StartSpan(operationName string, opts ...StartSpanOption) Span
// Inject() takes the `sm` SpanContext instance and injects it for
// propagation within `carrier`. The actual type of `carrier` depends on
// the value of `format`.
//
// OpenTracing defines a common set of `format` values (see BuiltinFormat),
// and each has an expected carrier type.
//
// Other packages may declare their own `format` values, much like the keys
// used by `context.Context` (see
// https://godoc.org/golang.org/x/net/context#WithValue).
//
Inject(sm SpanContext, format interface{}, carrier interface{}) error
// Extract() returns a SpanContext instance given `format` and `carrier`.
//
// OpenTracing defines a common set of `format` values (see BuiltinFormat),
// and each has an expected carrier type.
//
// Other packages may declare their own `format` values, much like the keys
// used by `context.Context` (see
// https://godoc.org/golang.org/x/net/context#WithValue).
//
Extract(format interface{}, carrier interface{}) (SpanContext, error)
}
</code></pre>
<p>OpenTracing doesn’t care about exporters and tracers; something else handles that
complexity (the user, me… bored) and the standard only offers interfaces. I don’t
know if this is good. It really looks more like a common interface
between tracers. I like the idea, but I need a lot more.</p>
<p>Now, writing this article, I understood that I have a lot more to figure out
about these projects. Sadly, I realized that in practice they are even more
similar than I felt before writing all this down.</p>
<p>Tracing, metrics and instrumentation libraries remain crucial from my point of
view. You can write everything you want, but if you are not able to understand
what’s happening you are not doing a good job. You look like a monkey.</p>
<p>Personally, I would like to find a common and good library to wrap together all
the buzzwords (stats, spans, traces, metrics, time series, logs) because they are
all the same concept, just from different points of view.</p>
<p>Everything is a point in time: grouped, ordered or with a specific hierarchy.
You can use them as aggregates, to compare, to alert and so on. A powerful
implementation should be able to combine both needs: easy ingestion with a
clear output.</p>
<p>I think that OpenTracing has a lot to do on both sides, in and out. OpenCensus
looks good from an ingestion point of view. There is nothing about logs in OpenCensus,
maybe because they are good enough as they are, but we need to be able to
cross-reference logs, traces, metrics from infrastructure, and application events
from dashboards and automatic tools.</p>
<p>It looks like, with both setups, you still need a platform capable of serving
and using this data. A lot of people will answer that it’s out of scope for
these projects, but I am pretty sure we have all learned that just storing events is
not enough.</p>
The right balanceMy daily job as a developer is to find the right balance in everything. I would like to share what I think about this topic because the software you write is the result of the decisions you take, so you should care about them.https://gianarb.it/img/k8s-up-and-running.jpg2018-02-09T10:08:27+00:002018-02-09T10:08:27+00:00https://gianarb.it/blog/the-development-balance<p>My daily job as a developer, other than programming, is about finding the right
balance between different things:</p>
<ol>
<li>Buzzword driven development vs rock solid boring things.</li>
<li>New technologies vs something already adopted inside the company.</li>
<li>State-of-the-art implementation vs something that can work in an hour.</li>
</ol>
<p>And many more. I am sure you have your own list. What I noticed during
these years, working for different companies, with different teams and in
different countries, is that there are a lot of kinds of developers, and there
is no shortage of companies and projects to develop.</p>
<p>You should really look for the right place for you, but you need to know what to
look for.</p>
<p>What I am trying to say is that people, colleagues and companies can help you find
your balance, so be proactive in this research. Speak with your manager or with your
colleagues about what you are happy to do or not; you will find interesting
answers if you are working with people with your same mindset. You will probably
end up selling and buying tasks and issues from members of the team, because
they like what you are doing more, and vice versa.</p>
<p>Sometimes you will even have the sensibility to grab a boring task just to have it
done and leave your colleagues free to do something more fun.</p>
<p>This is the kind of work that I like. I am almost sure about that now. A place
where shit needs to be done and you have an active word on how. It doesn’t need
to be your decision; sometimes I don’t know what’s better for the company or for the
project, but it can’t always be a black box that comes from above and needs to
be done. I am not a checklist implementer, and dealing with my colleagues and
other people is an active and very good part of my day.</p>
<p>There are times to have fun because things are going well, and times when you are
the only different voice coming out of a meeting. Find the right place where you
feel free to state your opinion, and be wise and mature enough to know that your
opinion, good or bad, can’t always be the one to follow.</p>
<p>The right balance between all of these things, across teams and companies, gives
me the feeling of being in the right place.</p>
Kubernetes up and runningKubernetes up and running review.https://gianarb.it/img/k8s-up-and-running.jpg2017-12-19T10:08:27+00:002017-12-19T10:08:27+00:00https://gianarb.it/blog/kubernetes-up-and-running<p>I read <a href="https://amzn.to/2zflChj">“Kubernetes up and running”</a> an O’Reilly book
written by Kelsey Hightower, Brendan Burns and Joe Beda.</p>
<p>It looks like the instruction manual you get when you buy something: based on
your knowledge of the product, you read the manual or you don’t.</p>
<p>I have a good knowledge of containers, orchestrators and cloud computing, but I
never worked with Kubernetes until two weeks ago, when I started a co-managed k8s
cluster on Scaleway with <a href="https://twitter.com/fntlnz">Lorenzo</a>.</p>
<p>The book is well written; I read it in less than one week. The chapters are well
split: I was able to skip “Building a Raspberry Pi Kubernetes cluster”, which I
am not really interested in, without any pain.</p>
<p>Chapters like “Deploying real world applications” and “Service Discovery” are good,
and the book covers all the basic concepts that you need to know about
Kubernetes. You can feel all the experience that the three authors have on the
topic. There is golden feedback about what they learned using and building what
is now the orchestration standard.</p>
<p>Just to summarize: if you are using Kubernetes and you like paper, this book is
a good way to have documentation on paper. If you are new to Kubernetes, it is the
best way to start.</p>
<p>Thanks to all the authors! If you have any questions let me know
<a href="https://twitter.com/gianarb">@gianarb</a></p>
Desk setupIt is now a couple of months since I left CurrencyFair to start working at InfluxData. A lot of new things, and working from home for a US-based company is very hard: dealing with such a big timezone difference requires a big effort. But I am very excited about how working from home feels. That's why I decided to share my current office setup. Desktop, Zenbook and a lot of Ikea things!https://gianarb.it/img/docker.png2017-12-17T10:08:27+00:002017-12-17T10:08:27+00:00https://gianarb.it/blog/desk-setup<p>As you probably know, in April 2017 I moved back home after a year and a half in Dublin
and started to work from home as an SRE at InfluxData.</p>
<p>I am ready to write a small post about my current setup.</p>
<h2 id="ikea-markus">Ikea Markus</h2>
<p>First thing, I bought an <a href="https://www.ikea.com/gb/en/products/chairs-stools-benches/office-chairs/markus-swivel-chair-glose-black-art-20103101/">Ikea
Markus</a>.
It is comfortable and it has a competitive price. It’s flexible and well designed.</p>
<p>I don’t have a lot to say about it. If you are not passionate about expensive
and weird chairs you can go for this one. It will work!</p>
<h2 id="stand-mount">Stand Mount</h2>
<p>My setup counts two boring Asus monitors, one horizontal and one vertical;
Lorenzo suggested this <a href="https://amzn.to/2yMe59C">stand mount for two monitors</a>.</p>
<p>Day by day I discover how bad I am at using more than one monitor. Changing focus so
often is not for me, but I like the vertical monitor when I am debugging some
weird application.</p>
<h2 id="asus-zenbook-3">Asus Zenbook 3</h2>
<p>I have an <a href="https://amzn.to/2AHAy9N">Asus Zenbook 3</a>; the single USB-C port is kind of a
pain. I am a traveler and a speaker, and I don’t get to enjoy its low weight (900g) that
often because I always need some adapter.</p>
<p>For traveling, the <a href="https://amzn.to/2CKIMPG">Asus Universal Dock</a> adapter is good. It
has an embedded charger, which means that you need a power supply to use
it. I wrote an article about it, and I was very disappointed with the product.
But now that I am using it only for traveling, it’s not too bad.</p>
<p>If you are a multi-desktop user, remember that it doesn’t have an
external video card; it has VGA and HDMI but you can use only one of them at a
time.</p>
<p>I used Ubuntu 17.04 and 17.10. Now I am using ArchLinux, and with both the laptop
and the Universal Dock you need to install some drivers and work a bit on audio
configuration and so on. But it’s a good challenge, and almost everything works
out of the box.</p>
<h2 id="logitech-c922">Logitech C922</h2>
<p>The embedded Asus webcam is not great. If you are looking for high definition or
an acceptable quality, you need an external webcam.</p>
<p>I work from home, and when I have a meeting with colleagues and friends I would like
to offer them a good experience.</p>
<p>The <a href="https://amzn.to/2kEnJ9o">Logitech C922</a> is not powerful enough to make me
beautiful, but it does an amazing job and it’s very good.</p>
<p>If you record tutorials or workshops, you should think about getting one of these.
It comes with a small tripod to set it up wherever you like.</p>
<h2 id="usb-c-adapter">USB-C adapter</h2>
<p>As I told you, the world is not ready for USB-C, but I am!
<a href="https://amzn.to/2zhPbSQ">Plugable</a> makes my life very simple.</p>
<p>The webcam, the two monitors and the Ethernet cable are always attached to it, and I
just need to plug my laptop in via USB-C and everything happens.</p>
<p>It’s an expensive toy, but I am using it on Linux and it’s working. The company
doesn’t officially support Linux, but there is an open source
<a href="https://github.com/displaylink/evdi">DisplayLink</a> driver on GitHub
that you can use.</p>
<h2 id="desk">Desk</h2>
<p>Last but not least I have a standing desk.</p>
<p>I think a good chair, the gym and swimming are better solutions for staying
healthy, but changing my point of view and my position helps me stay focused.</p>
<p>Every time I have a boring or complex task, at some point just toggling my
position from down to up, or vice versa, gives me some fresh power to
finish it well.</p>
<p>I watched the <a href="https://www.ikea.com/gb/en/products/desks/office-desks/bekant-desk-sit-stand-oak-veneer-black-spr-29061187/">Ikea
Bekant</a>
for a lot of months, but I was not sure about investing money in a standing desk.</p>
<p>I looked at them for so long that Ikea started a very good discount campaign, and
I just bought it. I took only the mechanical legs, because I like the feeling of
real wood, and I bought the table top separately.</p>
<p>That’s it! Bye!</p>
From Docker to MobyDocker announced during DockerCon a new project called Moby. Moby will be the new home for Docker and all the other open source projects like containerd, linuxkit, vpnkit and so on. Moby is the glue for all that open source code. It will look like an entire platform to build, ship and run containers at scale.https://gianarb.it/img/docker.png2017-10-20T10:08:27+00:002017-10-20T10:08:27+00:00https://gianarb.it/blog/from-docker-to-moby<p>At DockerCon 2017 in Austin,
<a href="https://blog.docker.com/2017/04/introducing-the-moby-project/">Moby</a> was the
big announcement.</p>
<p>It created confusion, and some communities are still trying to understand what is
going on. I think it’s time to step back and see what we have seven months after
the announcement.</p>
<ol>
<li><code>containerd</code> is living a new life; the first stable release will happen soon.
It has been donated to the CNCF.</li>
<li><code>notary</code> is the project behind <code>docker trust</code>. I wrote a full e-book about
<a href="https://scaledocker.com">Docker Security</a> if you need to know more. This
has also been donated to the CNCF.</li>
<li>github.com/docker/docker doesn’t exist anymore; there is a new repository
called github.com/moby/moby.</li>
<li>The <a href="https://github.com/docker/cli">CLI</a> has a separate home.</li>
<li>docker-ce is the first example of Moby assembly. It is made by Docker Inc.</li>
</ol>
<p>Containers are not first-class citizens in Linux.</p>
<p><img class="img-fluid" src="/img/container-is-not-real.jpeg" /></p>
<p>They are a combination of cgroups, namespaces and other kernel features, and they
have been there for a lot of years. LXC is one of the first projects that mentioned
containers, but the API wasn’t really friendly and only a few people used it.</p>
<p>Docker created a clean and usable API that human beings are happy to use. It
created an ecosystem with an amazing and complete UX: Distribution, the Dockerfile,
<code>docker run</code>, <code>docker image</code> and so on.</p>
<p>That’s what Docker is, in my opinion. Other than a great community and a fast
growing company.</p>
<p>What Docker is doing with Moby is giving competitors, startups and new
projects the ability to join the ecosystem that we have all built over these 4 years.</p>
<p>Moby, on the other hand, gives Docker the ability to take ownership of the
clean and usable experience. The <code>Docker CLI</code> that we know and use every day
will stay open source, but it will not be part of the Moby project; it will be owned by
Docker. As I wrote above, the code has already been moved out.</p>
<p>Moby allows other companies and organisations to build their own
user interface based on what they need, or to build their product on top of an
open source project designed to be modular.</p>
<p>Cloud and containers move fast. Amazon with ECS, RedHat with OpenShift,
Pivotal with Cloud Foundry, Mesosphere with Mesos, Microsoft with Azure
Container Service, Docker with Docker: they are all pushing hard to build
projects around containers and sell them to big and small corporations to make
legacy projects less boring.</p>
<blockquote>
<p>Legacy is the new buzzword</p>
</blockquote>
<p>Docker will continue to assemble and ship docker as we know it. The project is
called <code>docker-ce</code>:</p>
<pre><code>apt-get install docker-ce
docker run -p 80:80 nginx:latest
</code></pre>
<p>Everything happens down the street, in the open source ecosystem. Moby won’t
contain the CLI that we know.</p>
<p>Moby won’t have the swarmkit integration as we know it. It was something that
Docker as a company wanted to have, mainly to inject an orchestrator into
millions of laptops. Other companies and projects that are not using Swarm don’t
need it, and they will be able to remove it in some way.</p>
<p>Companies like Pivotal and AWS are working on
<code>containerd</code> because, other than being the runtime behind Docker, it’s what
matters for a lot of projects that just want to run containers without all the
friendliness layers on top. ECS and Cloud Foundry are the actual layers on top
of “what runs a container”.</p>
<p>Container orchestrators don’t really care about how or who spins up a container;
they just need to know that there is something able to do that.</p>
<p>That is what Kubernetes does with the CRI. It doesn’t care about Docker, CRI-O or
containerd; that’s out of scope. It just needs a common interface, in this case
a gRPC interface that every runtime should implement. Here is a list of them:</p>
<ul>
<li><a href="https://github.com/kubernetes-incubator/cri-o">cri-o</a></li>
<li><a href="https://github.com/kubernetes-incubator/cri-containerd">cri-containerd</a></li>
<li><a href="https://github.com/kubernetes-incubator/rktlet">rktlet</a></li>
</ul>
<p>That’s a subset of the reasons why all of this is happening:</p>
<ul>
<li>Docker Inc. will be free to iterate on their business services and projects
without breaking every application in the world, and they will have more
flexibility in what they can do as a company.</li>
<li>The transition from Docker to Moby is the perfect chance to split the
project into different repositories; we already spoke about docker-cli, containerd
and so on.</li>
<li>Separation of concerns is a popular design principle. Splitting
projects into smaller libraries lets us focus on one specific scope of the
project at a time.
<a href="https://github.com/moby/buildkit">buildkit</a> is the perfect example. It’s the
evolution of the <code>docker build</code> command. We had a demo at the Moby Summit and
it looks amazing!</li>
</ul>
<p>That’s almost it. Let’s summarise:</p>
<p><strong>Are you a company in the container movement?</strong>
You are competing with Docker building container things and you were complaining
about them breaking compatibility or things like that; now you should blame the
Moby community.</p>
<p><strong>Are you using docker run?</strong>
You are fine! You will be able to do what you were doing before.</p>
<p><strong>Are you an open source guru?</strong>
Maybe you will be a bit disappointed if you worked hard on docker-cli and now
Docker moves your code out, but you signed a CLA and the CLI will stay open
source. Blame yourself.</p>
<p>That’s it! Or at least that’s what I understood.</p>
Git git git, but betterIt doesn't matter for how long you have been using Git or any version control system, you always have something to learn about them. Not about the actual interface but about the right mindset.https://gianarb.it/img/git.png2017-10-10T10:08:27+00:002017-10-10T10:08:27+00:00https://gianarb.it/blog/git-git-git-but-better<p>I can’t say that Git is a new topic. Finding somebody unable to explain how a
version control system works used to be very hard. Now it’s almost impossible.</p>
<p>I used SVN and Git for many years and I also put together some unusual use
cases, for example <a href="https://devzone.zend.com/6134/splitting-zend-framework-using-the-cloud/">“Splitting Zend Framework Using the Cloud”</a>,
a project that I made with Corley SRL, my previous company, and the Zend
Framework team.</p>
<p>It helped me to get my hands on the Git file system and I discovered a lot
of features and capabilities beyond the usual: commit, checkout, reset,
branch, cherry-pick, rebase and so on.</p>
<p>But during my experience building cloud at InfluxData I have to say that I
can see a change in my mindset, and I am sharing this because I am kind of proud
of it. It’s probably not super impressive considering the time required to
achieve these goals, but who cares!</p>
<blockquote>
<p>Sometimes it’s the journey that teaches you a lot about your destination.
(Drake)</p>
</blockquote>
<p>I don’t know this Drake, and I am not even sure he is the right author of the
quote, but that’s not the point.</p>
<p>At InfluxData, just to give you more context, I am working on a sort of
scheduler that provisions and orchestrates servers and containers on our <a href="https://cloud.influxdata.com/">cloud
product</a>. A lot of CoreOS instances, Go, Docker and
AWS API calls.</p>
<p>It’s a modest codebase in terms of size but it keeps up a huge number of
servers. I am actively working on the codebase almost by myself and I am kind
of enjoying this. Nate, Goller and all the teams are supporting my approach and
are using it, but I am not using Git because hundreds of developers need to
collaborate on the same line of code. I had some experience in that environment
working as a contributor in many open source projects. But this time it is
different.</p>
<p>I am mainly alone on a codebase that I didn’t start and don’t know very well,
and this project is running in production on a good amount of EC2 instances.</p>
<p>I really love the idea of having a clean and readable Git history. I am not
saying that because it’s cool. I am saying that because every time I commit my
code I think about which files to add and commit; <code>-a</code> is not really
an option that I use much anymore. I think about the title and the message.</p>
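<p>As a small sketch of this habit (file and commit names are invented for the
example): instead of <code>git commit -a</code>, stage only the files that
belong to the change and give the commit a descriptive title.</p>

```shell
cd "$(mktemp -d)" && git init -q .
git config user.email "you@example.com" && git config user.name "You"

# two unrelated changes sitting in the working tree
echo "retry logic" > scheduler.go
echo "new docs"    > README.md

# stage and commit only the file that belongs to this change
git add scheduler.go
git commit -qm "scheduler: retry provisioning on transient AWS errors"

git status --short   # README.md is still untracked, waiting for its own commit
```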
<p>I try to avoid the <code>WIP</code> message and I use it only if I am sure
about a future squash or rebase, or if I need to push my code to ask for ideas
and opinions (as I said, I am writing code almost alone, but I am always looking
for support from my great co-workers).</p>
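<p>One hedged way to keep a <code>WIP</code> honest about its future squash is
<code>git commit --fixup</code> plus an autosquash rebase. This is only a sketch
with invented commit messages; <code>GIT_SEQUENCE_EDITOR=true</code> simply
accepts the todo list that Git generates.</p>

```shell
cd "$(mktemp -d)" && git init -q .
git config user.email "you@example.com" && git config user.name "You"

echo base > main.go    && git add . && git commit -qm "initial import"
echo v1   > feature.go && git add . && git commit -qm "feature: first cut"

# a WIP-style follow-up, recorded as a fixup of the feature commit
echo v2 > feature.go && git add . && git commit -q --fixup HEAD

# autosquash folds the fixup into the commit it targets,
# leaving a clean two-commit history with the original message
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~2
```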
<p>This also has a very big value as a remote worker. This is my first
experience in this environment, and for a non-native English speaker a good,
self-explanatory title can be the hardest part of the work, but it will help
other people understand what I am doing.</p>
<p>When you are working on a new codebase and you have tasks that require
refactoring to be achieved in a clean and professional way, you will find
yourself moving code around without really being able to figure out when and how
it will become useful to close your task and open the PR that your team lead is
waiting for. If you start writing code and committing your changes at the end of
the day, as I was doing at the beginning, after a couple of days you will figure
out that your PR is too big and you are scared to merge it. And probably it’s
just the PR that prepares the codebase for the initial request. I hated the
situation, but if you think about what I wrote you will find that it’s totally
wrong.</p>
<p>A VCS is not there as a saving point, you are not playing Crash Bandicoot
anymore, and you don’t need to use Git as your personal “ooga booga”. The right
commit contains an atomic piece of information about a feature, a bug fix or
whatever.</p>
<p><img src="/img/crash_bandcioot.jpg" alt="" /></p>
<p>These are the questions that I am asking myself now before making a commit:</p>
<ul>
<li>Am I confident cherry-picking this commit to <code>master</code>? This is a
good way to keep your commits small and easy to merge. If one of your PRs is
becoming too big and you have “cherry-pickable” commits, you can select some of
them and merge them as a single PR.</li>
<li>Are deploy and rollback easy actions? This is similar to the previous one,
but I am the one who deploys and monitors the service in production. I need to
ask myself this question before every merged PR.</li>
<li>Looking at the name of the branch (which in my case is the task in my
viewfinder), is the commit that I am creating about it, or can I create a new PR
just for this piece of code? This helps me a lot to split my PRs and keep them
small. A small PR is easier to review, it has a better scope and it makes me
less scared to deploy it.</li>
</ul>
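<p>The cherry-pick question can be answered mechanically; here is a sketch with
invented branch and file names:</p>

```shell
cd "$(mktemp -d)" && git init -q .
git config user.email "you@example.com" && git config user.name "You"
echo base > app.go && git add . && git commit -qm "initial import"

# a feature branch with one self-contained fix and one half-done commit
git checkout -qb big-feature
echo fix > timeout.go && git add . && git commit -qm "client: bump request timeout"
FIX=$(git rev-parse HEAD)
echo wip > feature.go && git add . && git commit -qm "feature: half-done work"

# the timeout fix stands alone, so it can ship as its own small PR
git checkout -q -            # back to the default branch
git cherry-pick "$FIX"       # applies cleanly, without the half-done work
```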
<p>Git is more than a couple of commands that you can execute. You need to
be in the right mindset to enjoy all its power.</p>
Orbiter the Docker Swarm autoscaler on the road to BETA-1Orbiter is a project written in go. It is an autoscaler for Docker containers. In particular it works with Docker Swarm. It provides autoscaling capabilities for your services.https://gianarb.it/img/docker.png2017-08-09T10:08:27+00:002017-08-09T10:08:27+00:00https://gianarb.it/blog/orbiter-the-swarm-autoscaler-moves<p>Orbiter is an open source project written in Go and hosted on
<a href="https://github.com/gianarb/orbiter">GitHub</a>. It provides autoscaling
capabilities for your Docker Swarm cluster.</p>
<p>As you probably know, at the moment autoscaling is not a feature supported
natively by Docker Swarm, but this is not a problem at all.</p>
<p>Docker Swarm provides a useful API that helps you improve its capabilities.</p>
<p>I created Orbiter months ago as a use case with InfluxDB and to allow
services to scale automatically based on an <code>up</code> or <code>down</code>
signal. You can follow the webinar that I made with InfluxData
<a href="https://www.influxdata.com/resources/influxdata-helps-docker-auto-scale-monitoring/?ao_campid=70137000000Jgw7">here</a>.</p>
<p>This article is not about “How it works”. You can <a href="https://gianarb.it/blog/orbiter-docker-swarm-autoscaler">read more here about how it
works</a> and you can
watch the embedded video that I made in the Docker HQ in San Francisco.</p>
<p>Yesterday we made some very good improvements and we are moving forward to tag
the first beta release. I need to say a big thanks to <a href="https://github.com/mbovo">Manuel
Bovo</a>. He coded pretty much all the features listed
here.</p>
<ol>
<li>
<p><a href="https://github.com/gianarb/orbiter/pull/26">PR #26</a> e2e working example. <a href="https://github.com/gianarb/orbiter/tree/master/contrib/swarm">Please try
it</a>.</p>
</li>
<li>
<p><a href="https://github.com/gianarb/orbiter/pull/27">PR #27</a> Now Orbiter has
background job that listen on the Docker Swarm event API and register and
de-register new services <a href="https://github.com/gianarb/orbiter#autodetect">deployed with right
labels</a>. You don’t need to
restart orbiter anymore. It detect new services automatically.</p>
</li>
<li>
<p><a href="https://github.com/gianarb/orbiter/pull/29">PR #29</a> Fixed the up/down range.
Now we can not scale under 1 tasks but we can scale up services with 0 tasks.</p>
</li>
<li>
<p><a href="https://github.com/gianarb/orbiter/pull/31">PR #31</a> We have a cooldown
period configurable via label <code>orbiter.cooldown</code>. This fix avoid multiple
scaling in a short amount of time.</p>
</li>
<li>
<p><a href="https://github.com/gianarb/orbiter/pull/32">PR #32</a> We are migrating our API
base root. Now all the API are <code>/v1/orbiter/.....</code>. At the moment we are
supporting old and new routes. <strong>In October I will remove the old one. Please
migrate to <code>/v1/orbiter/....</code> now!</strong>.</p>
</li>
</ol>
<h2 id="now">Now?</h2>
<p>That’s a good question, and I have part of the answer. The plan is to release
a BETA in October and finally the first stable version, but what do we need to
do to get there?</p>
<ul>
<li>Offer a proper auth method. Manuel started this
<a href="https://github.com/gianarb/orbiter/pull/33">PR</a>. I have some concerns but
we are on the right path.</li>
<li>Make Orbiter “Swarm-only”. The project started with the vision of becoming a
general purpose autoscaler, but this is not in line with the idea of single
responsibility. We designed a very clean API for Docker Swarm; making it
usable in other contexts is not going to work. We tried it with DigitalOcean
but the API and the project looked too complex, and I love simplicity.</li>
<li>Get other feedback from the community to merge valuable features before the
stable release.</li>
</ul>
<p>That’s it! Share it and give it a try! For any question I am available on
Twitter (@gianarb), or open an issue.</p>
Asus universal dock station driverEvery developer loves to speak about its setup. I am here to share my trouble with my new laptop. Asus Zenbook 3.https://gianarb.it/img/gianarb.png2017-08-03T10:08:27+00:002017-08-03T10:08:27+00:00https://gianarb.it/blog/asus-universal-dock-driver<p>Every developer loves to share things about their setup. They also love to make
it better and to spend time on it.</p>
<p>Lorenzo <a href="https://twitter.com/fntlnz">(fntlnz)</a> is super into it! I am
not; plus I bought a Zenbook 3. Super slim, less than 1 kg, I could probably use
it to cut ham, but the single USB-C port is driving me crazy.</p>
<p>Probably more than the actual 40 degrees that I have in my home office now!
It is probably why I am writing this post btw.</p>
<p>When I bought this laptop 7 months ago the Universal Dock Station was not
available and I wasn’t even able to install Linux on it.</p>
<p>Now I have an <a href="https://www.asus.com/Laptops-Accessory/Universal-Dock/">Asus Universal Dock
Station</a>. I am feeling a little bit better, but to work it replaces a normal
charger, which means that without a socket nearby I cannot use a USB port…
Amazing experience.</p>
<p>I tried other adapters but I didn’t find one good enough. Every one of them
had some input or output port unusable for some reason, most of them because the
BIOS has a different wattage limit and they cannot charge the laptop. I never
received a response from ASUS about it. That’s great.</p>
<p>Anyway, I am writing this article just as a note to myself about the driver
that Lorenzo discovered to get the Asus Universal Dock Station’s ethernet port
running.</p>
<p><a href="https://www.realtek.com/DOWNLOADS/downloadsView.aspx?Langid=1&PNid=13&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false">Realtek ethernet
driver</a>.
It’s super easy to install. Just compile it and it will work.</p>
CNCF Italy, first event about opentracingCNCF is a branch of The Linux Foundation focused on Cloud Computing and modern scalable architectures. it's supporting tools like Kubernetes, Prometheus, containerd and so on. If you are using one of them or you are looking to know more about them, this is your meetup. Join us! hashtag CNCFItaly on twitter.https://gianarb.it/img/cncf.jpeg2017-06-05T10:08:27+00:002017-06-05T10:08:27+00:00https://gianarb.it/blog/cncf-italy-first-event<p>CNCF is a branch of The Linux Foundation focused on Cloud Computing and modern
scalable architectures. It supports tools like Kubernetes, Prometheus,
containerd and so on. If you are using one of them or you are looking to learn
more about them, this is your meetup. Join us! Hashtag #CNCFItaly on Twitter.</p>
<p>The event will be on 13th July at 19.00 at the Toolbox office in Turin.
Reserve your seat on <a href="https://www.meetup.com/CNCF-Italy/events/241118593/">Meetup.com</a>.</p>
<iframe src="https://www.google.com/maps/embed?pb=!1m18!1m12!1m3!1d2818.7526155267037!2d7.667091951510242!3d45.05024176888683!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13.1!3m3!1m2!1s0x47886d37dd5ababd%3A0x2adc0b0e358ddb6c!2sToolbox+Coworking!5e0!3m2!1sit!2sit!4v1499676857774" width="600" height="450" frameborder="0" style="border:0" allowfullscreen=""></iframe>
<p>This will be a full evening about OpenTracing. OpenTracing <a href="https://www.cncf.io/blog/2016/10/20/opentracing-turning-the-lights-on-for-microservices/">turns the lights on
for
microservices</a>.
It is a specification to store and manage traces. How can we follow what’s going
on from the beginning to the end of our requests? What happens when they cross
different services? Where is the bottleneck? Tracing helps you understand
what’s going on. It’s not just for microservices but also for caching, queue
systems and so on. Have a <a href="https://trends.google.it/trends/explore?q=opentracing">look at the
trends</a>: we need to know more about it!</p>
<p>Beer and pizza are offered by CNCF after the two sessions!</p>
<p>Other links:</p>
<ul>
<li><a href="https://www.cncf.io/">CNCF.io</a></li>
<li><a href="https://opentracing.io/">Opentracing</a></li>
<li><a href="https://github.com/openzipkin">OpenZipkin by twitter</a></li>
<li><a href="https://www.youtube.com/watch?v=n8mUiLIXkto">Keynote: OpenTracing and Containers: Depth, Breadth, and the Future of
Tracing - Ben Sigelman</a></li>
</ul>
<h2 id="all-done">All done!</h2>
<p>Amazing event! Here are some pictures, and the video is coming soon!</p>
<div class="slide w3-display-container">
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-sponsor-1.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-1.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-5.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-8.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-9.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-10.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-12.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-13.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-14.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-15.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-16.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-17.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-20.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-21.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-22.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-23.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-24.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-25.jpg" />
<img class="mySlides img-fluid" src="/img/cncf-first/Conf-27.jpg" />
<button class="w3-button w3-display-left" onclick="plusDivs(-1)">❮</button>
<button class="w3-button w3-display-right" onclick="plusDivs(+1)">❯</button>
</div>
<style>
.w3-display-left{position:absolute;top:50%;left:0%;transform:translate(0%,-50%);-ms-transform:translate(-0%,-50%)}
.w3-display-right{position:absolute;top:50%;right:0%;transform:translate(0%,-50%);-ms-transform:translate(0%,-50%)}
.w3-tooltip,.w3-display-container{position:relative}.w3-tooltip .w3-text{display:none}.w3-tooltip:hover .w3-text{display:inline-block}
.w3-btn,.w3-button{border:none;display:inline-block;outline:0;padding:8px 16px;vertical-align:middle;overflow:hidden;text-decoration:none;color:inherit;background-color:inherit;text-align:center;cursor:pointer;white-space:nowrap}
.w3-btn:hover{box-shadow:0 8px 16px 0 rgba(0,0,0,0.2),0 6px 20px 0 rgba(0,0,0,0.19)}
.w3-btn,.w3-button{-webkit-touch-callout:none;-webkit-user-select:none;-khtml-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none}
.w3-disabled,.w3-btn:disabled,.w3-button:disabled{cursor:not-allowed;opacity:0.3}.w3-disabled *,:disabled *{pointer-events:none}
.w3-btn.w3-disabled:hover,.w3-btn:disabled:hover{box-shadow:none}
.w3-badge,.w3-tag{background-color:#000;color:#fff;display:inline-block;padding-left:8px;padding-right:8px;text-align:center}.w3-badge{border-radius:50%}
.w3-ul{list-style-type:none;padding:0;margin:0}.w3-ul li{padding:8px 16px;border-bottom:1px solid #ddd}.w3-ul li:last-child{border-bottom:none}
.w3-tooltip,.w3-display-container{position:relative}.w3-tooltip .w3-text{display:none}.w3-tooltip:hover .w3-text{display:inline-block}
.w3-ripple:active{opacity:0.5}.w3-ripple{transition:opacity 0s}
.w3-input{padding:8px;display:block;border:none;border-bottom:1px solid #ccc;width:100%}
.w3-select{padding:9px 0;width:100%;border:none;border-bottom:1px solid #ccc}
.w3-dropdown-click,.w3-dropdown-hover{position:relative;display:inline-block;cursor:pointer}
.w3-dropdown-hover:hover .w3-dropdown-content{display:block;z-index:1}
.w3-dropdown-hover:first-child,.w3-dropdown-click:hover{background-color:#ccc;color:#000}
.w3-dropdown-hover:hover > .w3-button:first-child,.w3-dropdown-click:hover > .w3-button:first-child{background-color:#ccc;color:#000}
.w3-dropdown-content{cursor:auto;color:#000;background-color:#fff;display:none;position:absolute;min-width:160px;margin:0;padding:0}
.w3-check,.w3-radio{width:24px;height:24px;position:relative;top:6px}
.w3-sidebar{height:100%;width:200px;background-color:#fff;position:fixed!important;z-index:1;overflow:auto}
.w3-bar-block .w3-dropdown-hover,.w3-bar-block .w3-dropdown-click{width:100%}
.w3-bar-block .w3-dropdown-hover .w3-dropdown-content,.w3-bar-block .w3-dropdown-click .w3-dropdown-content{min-width:100%}
.w3-bar-block .w3-dropdown-hover .w3-button,.w3-bar-block .w3-dropdown-click .w3-button{width:100%;text-align:left;padding:8px 16px}
</style>
<script>
var slideIndex = 1;
showDivs(slideIndex);
function plusDivs(n) {
showDivs(slideIndex += n);
}
function showDivs(n) {
var i;
var x = document.getElementsByClassName("mySlides");
if (n > x.length) {slideIndex = 1}
if (n < 1) {slideIndex = x.length} ;
for (i = 0; i < x.length; i++) {
x[i].style.display = "none";
}
x[slideIndex-1].style.display = "block";
}
</script>
Container security and immutabilityDocker, container and immutability. Have an immutable system has advantages not only from deploy, release and scalability point of view but also from security side. Deploy and build a new release quickly and high frequency improve the way you trust your provisioning system. Have the old environment still running and ready to be rolled back is another good point.https://gianarb.it/img/container-security.png2017-06-05T10:08:27+00:002017-06-05T10:08:27+00:00https://gianarb.it/blog/container-security-immutability<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">55 pages
about how to improve container security. <a href="https://twitter.com/ciliumproject">@ciliumproject</a> <a href="https://twitter.com/hashtag/BPF?src=hash">#BPF</a>, best practices, <a href="https://twitter.com/coreos">@coreos</a> clair, <a href="https://twitter.com/hashtag/apparmor?src=hash">#apparmor</a> <a href="https://t.co/ABiuldYA9b">https://t.co/ABiuldYA9b</a> <a href="https://t.co/61jzWxzb1Y">pic.twitter.com/61jzWxzb1Y</a></p>— :w
!sudo tee % (@GianArb) <a href="https://twitter.com/GianArb/status/871808740080615424">June 5,
2017</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Security is a fascinating topic.
It’s part of every aspect of a system.
From your email server to the HTTP body validation of your API system.</p>
<p>It’s also a very human-centric topic. You can use the strongest security
approaches, but if your rules are too hard to follow or too complicated to
implement, the end users or your colleagues will become the perfect breach for
bad actors to exploit.</p>
<p>In distributed systems there are interesting challenges like:</p>
<ul>
<li>How can we trust the instances that are part of the system itself? I mean,
how can we trust a new application after a pool scale?</li>
<li>All the traffic generated by the system needs to be locked down. The network
topology grows with the number of services that we add, but that’s not a good
excuse to slack on responsibility for how we manage our network.</li>
</ul>
<p>When you design a system you need to think about security from different points
of view:</p>
<ul>
<li>Security needs to be efficient. This seems obvious but it’s always something
to keep in mind.</li>
<li>It needs to be easy to use in development mode. As we said before, if
security makes things slower, somebody will turn it off.</li>
<li>If you are good enough to make it easy, it will be easier to enforce secure behavior.</li>
</ul>
<p>All these concepts are well applied in the different projects built by the
Docker community. Notary and swarmkit are just a few examples, but if you think
about The Update Framework (TUF) and the whole set of things happening behind
every <code>docker push</code> and <code>pull</code> command, you suddenly see a
great example of how to make complicated things really easy to use.</p>
<p>I published an ebook that you can download for free <a href="/blog/scaledocker">here</a>.
It contains ~55 pages about Container and Docker Security. In this article I
will share one of the concept expressed in that book, <strong>Immutability</strong>.</p>
<p>Docker containers are in fact immutable. This means that a running container
never changes because in case you need to update it, the best practice is to
create a new container with the updated version of your application and delete
the old one.</p>
<p>This aspect is important from different points of view.</p>
<p>Immutability applied to deploys is a big challenge because it opens the door
to a very different set of release strategies like blue/green deployments or
canary releases. Immutability also lowers rollback times because you can keep
the old version running for a little longer and switch traffic back in case of
problems.</p>
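<p>The mechanics behind a blue/green switch can be reduced to a symlink flip:
both versions stay on disk, the live one is whatever the link points at, and
rollback is just re-pointing the link. A minimal filesystem-only sketch
(directory names are invented, no containers involved):</p>

```shell
cd "$(mktemp -d)"
mkdir -p releases/blue releases/green
echo "1.0.0" > releases/blue/version    # the version currently serving traffic
echo "2.0.0" > releases/green/version   # the freshly built, immutable release

ln -sfn releases/blue  current          # blue is live
ln -sfn releases/green current          # deploy: atomically flip to green
cat current/version                     # -> 2.0.0

ln -sfn releases/blue  current          # rollback: blue never went away
cat current/version                     # -> 1.0.0
```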
<p>It’s also a plus from a scalability and stability point of view. For each
deploy you are in fact using provisioning scripts and build tools to package and
release a new version of your application. You are creating new nodes to replace
the old ones, which means that you are focused on provisioning and configuration
management. You are justifying all the effort spent to implement infrastructure
as code.</p>
<p>It matters also for security because you will have a fresh container after each
update and in the case of a vulnerability or injection they will be cleaned
during the update.</p>
<p>You also have an instrument to analyse the attacked container: the command
<code>docker diff &lt;container_id&gt;</code> shows the differences in the file
system.</p>
<p>It supports 3 events:</p>
<ul>
<li>A - Add</li>
<li>D - Delete</li>
<li>C - Change</li>
</ul>
<p>In case of attack, you can commit the attacked container to analyse it later and
replace it with the original image.</p>
<p>This flow is interesting, but if you know that your application does not need
to modify the file system you can use the <code>--read-only</code> parameter to
make the fs read only, or you can share a volume with the <code>ro</code>
suffix: <code>-v $PWD:/data:ro</code>.</p>
<p>Docker can’t fix the security issues for you; if your application can be
attacked by a code injection then you need to fix your app. But Docker offers a
few utilities to make life hard for a hacker and to allow you to have more
control over your environment.</p>
<p>In this chapter we covered some practices and tools that you can follow or
use to build a safe environment.</p>
<p>In general, you need to close your application in an environment that provides
only what you need and what you know.</p>
<p>If your distribution or your container has something that you don’t have
under your control, or it is unused, then it is a good idea to remove these dark
spots.</p>
<p>That’s all. Immutability is not free: it requires keeping all the tools and
processes involved in deploying and packaging up to speed, because your whole
production environment depends on them. But it’s an important piece of the
puzzle. To read more about tools like Cilium, CoreOS Clair, and best practices
for registries and images, you can download the pdf <a href="/blog/scaledocker">Play Safe with Docker and
Container Security</a>.</p>
Orbiter an OSS Docker Swarm AutoscalerOrbiter is an open source project design to become a cross provider autoscaler. At the moment it works like Zero Configuration Autoscaler for Docker Swarm. It also has a basic implementation to autoscale Digitalocean. This project is designed with InfluxData a company that provides OSS solution like InfluxDB, Kapacitor and Telegraf. We are going to use all this tools to create an autoscaling policy for your Docker Swarm services.https://gianarb.it/img/swarm.gif2017-04-22T08:08:27+00:002017-04-22T08:08:27+00:00https://gianarb.it/blog/orbiter-docker-swarm-autoscaler<iframe width="560" height="315" src="https://www.youtube.com/embed/Q1xfmfML8ok" frameborder="0" allowfullscreen=""></iframe>
<p>My presentation at the Docker HQ in San Francisco.</p>
<h2 id="autoscaling">Autoscaling</h2>
<p>One of the cloud’s dreams is a nice world where everything magically happens.
You have unlimited resources, you use only what you need and you pay for what
you use. For this, AWS provides a service called an autoscaling group, for
example. You can specify some limits and some expectations about a group of
servers and AWS matches your expectations for you. If you are able to provision
a node automatically, you can use CloudWatch to set some alerts. When AWS
triggers these alerts the autoscaling group creates or removes one or more
instances.</p>
<h3 id="lets-try-with-an-example">let’s try with an example</h3>
<p>You have a web service and you know that for 2 hours every day 4 EC2
instances are not enough because you have a lot of traffic; you need 10 of them.
You can create an autoscaling group and set some alerts:</p>
<ol>
<li>When the memory usage is more than 65% for 3 minutes start 3 new servers.</li>
<li>When the memory usage is less than 30% for 5 minutes stop 2 servers.</li>
</ol>
<p>Just to give you an idea: in this way AWS knows what you need and you don’t
have to sit in front of your laptop waiting for something to happen. You can go
do something fun instead.</p>
<p>It’s something useful. If you think about a daily magazine, it usually has a
lot of traffic at the beginning of the day, when most people are reading the
news. And that’s an easy scenario.</p>
<p>But it can also happen that a new post shared on Reddit or Hacker News gets a
lot of traffic, and the last thing you want is to go down right during that
spike!</p>
<h3 id="actors">Actors</h3>
<p>There are different actors in this comedy. First of all, our cluster needs to
be manageable from the outside via an API. In this example I am going to use
Docker Swarm; Orbiter supports a basic implementation for DigitalOcean but it
still requires some tuning.</p>
<p>You need some time series database or analytics platform that can fire a
webhook to trigger Orbiter based on some metrics.</p>
<p>We ran a demo with the TICK Stack (InfluxDB, Telegraf, and Kapacitor) a few
days ago. It’s available <a href="https://www.influxdata.com/resources/influxdata-helps-docker-auto-scale-monitoring/?ao_campid=70137000000Jgw7">at this
link</a>.</p>
<p>In the end you need to deploy <a href="https://github.com/gianarb/orbiter">orbiter</a>.</p>
<h3 id="orbiter-design-and-arch">Orbiter, design and arch</h3>
<p>Orbiter is an open source tool designed to be a cross platform autoscaler. It
is written in Go and it provides a REST API to handle scale requests.</p>
<p>It provides one entrypoint:</p>
<pre><code class="language-sh">curl -v -d '{"direction": true}' \
http://localhost:8000/handle/infra_scale/docker
</code></pre>
<ul>
<li><code>direction</code> represents how to scale your service: true means up,
false means down.</li>
<li><code>/handle/infra_scale/docker</code> identifies the autoscaling group.
<code>infra_scale</code> is the autoscaler name, <code>docker</code> is the policy name.</li>
</ul>
<p><code>infra_scale</code>, for example, contains information about the cluster
manager: where it is and what it is, Docker, DigitalOcean or whatever.</p>
<p>The policy describes how an application scales. If you know Docker Swarm a
bit, <code>docker</code> is the name of the service.</p>
<p>Orbiter supports two different boot methods. One is via configuration:</p>
<pre><code class="language-yaml">autoscalers:
infra_scale:
provider: swarm
parameters:
policies:
docker:
up: 4
down: 3
</code></pre>
<p>The second one is currently only supported by Docker Swarm and it’s called
autodetection. In practice, when you start Orbiter it looks for a Docker Swarm
up and running. If it finds Swarm, it lists all the deployed services and
manages all the services labeled with <code>orbiter=true</code>.</p>
<p>By default up and down are set to 1, but you can override them with the
labels <code>orbiter.up=3</code> and <code>orbiter.down=2</code>.</p>
<p>Let’s suppose to have a Docker Swarm cluster with 3 nodes.</p>
<pre><code class="language-bash">$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
11btq767ecqhelidu8ah1osfp * node1 Ready Active Leader
ptre8d4bjccqi6ml6z445u0mz node2 Ready Active
q5rwi3cej9gc1vqyscwfau640 node3 Ready Active
</code></pre>
<p>I deployed a service called <a href="https://github.com/gianarb/micro">gianarb/micro</a>.
It is an open source demo application. There are different versions, I deployed
the version 1.0.0. It only shows the current IP of the container/server.</p>
<pre><code class="language-bash">docker service create --label orbiter=true \
--name micro --replicas 3 \
-p 8080:8000 gianarb/micro:1.0.0
</code></pre>
<p>You can check the number of tasks running with the command:</p>
<pre><code class="language-bash">$ docker service ps micro
ID NAME IMAGE NODE
DESIRED STATE CURRENT STATE ERROR
PORTS
onsqgriv3nel micro.1 gianarb/micro:1.0.0 node3
Running Running 51 seconds ago
yxtxyder7bs3 micro.2 gianarb/micro:1.0.0 node1
Running Running 51 seconds ago
lyzxxdc00052 micro.3 gianarb/micro:1.0.0 node2
Running Running 52 seconds ago
</code></pre>
<p>At this point you can visit port <code>8080</code> of your cluster to have a
look at the service, but for this demo it doesn’t really matter. We are going to
start Orbiter and trigger a scaling policy to simulate a request made by our
monitoring tool.</p>
<pre><code class="language-bash">docker service create --name orbiter \
--mount type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock \
-p 8000:8000 --constraint node.role==manager \
-e DOCKER_HOST=unix:///var/run/docker.sock \
gianarb/orbiter daemon --debug
</code></pre>
<p>I am using Docker to deploy orbiter as a service. It communicates with Docker
Swarm over the Unix socket, and it is constrained to a <code>manager</code> node
because it needs write permission to start and stop tasks, which only managers
have. You can also configure orbiter with the <code>DOCKER_HOST</code> variable
to use the REST API; that way you don’t have this constraint. The socket-based
configuration is simply easier to show in a demo like this one.</p>
<pre><code class="language-bash">$ docker service logs orbiter
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:24:56Z" level=info
msg="orbiter started"
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:24:56Z" level=debug
msg="Daemon started in debug mode"
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:24:56Z" level=info
msg="Starting in auto-detection mode."
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:24:56Z" level=info
msg="Successfully connected to a Docker daemon"
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:24:56Z" level=debug
msg="autodetect_swarm/micro added to orbiter. UP 1, DOWN 1"
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:24:56Z" level=info
msg="API Server run on port :8000"
</code></pre>
<p>As you can see in the logs, the API is running on port 8000 and orbiter has
already detected a service called <code>micro</code>, the one that we deployed before,
auto-creating an autoscaling group called <code>autodetect_swarm/micro</code>.
This is the unique name that we use when we trigger our scale request.</p>
<pre><code class="language-bash">$ curl -d '{"direction": true}' -v
http://10.0.57.3:8000/handle/autodetect_swarm/micro
* Trying 10.0.57.3...
* TCP_NODELAY set
* Connected to 10.0.57.3 (10.0.57.3) port 8000 (#0)
> POST /handle/autodetect_swarm/micro HTTP/1.1
> Host: 10.0.57.3:8000
> User-Agent: curl/7.52.1
> Accept: */*
> Content-Length: 19
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 19 out of 19 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Tue, 18 Apr 2017 09:30:35 GMT
< Content-Length: 0
<
* Curl_http_done: called premature == 0
* Connection #0 to host 10.0.57.3 left intact
</code></pre>
<p>With that cURL I simulated a scale request and, as you can see in the logs
below, orbiter detected the request and scaled our service <code>micro</code>
up by one task.</p>
<pre><code class="language-bash">$ docker service logs orbiter
orbiter.1.zop1qkwa1qxy@node1 | POST /handle/autodetect_swarm/micro HTTP/1.1
orbiter.1.zop1qkwa1qxy@node1 | Host: 10.0.57.3:8000
orbiter.1.zop1qkwa1qxy@node1 | Accept: */*
orbiter.1.zop1qkwa1qxy@node1 | Content-Length: 19
orbiter.1.zop1qkwa1qxy@node1 | Content-Type:
application/x-www-form-urlencoded
orbiter.1.zop1qkwa1qxy@node1 | User-Agent: curl/7.52.1
orbiter.1.zop1qkwa1qxy@node1 |
orbiter.1.zop1qkwa1qxy@node1 | {"direction": true}
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:30:35Z" level=info
msg="Received a new request to scale up micro with 1 task." direc
tion=true service=micro
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:30:35Z" level=debug
msg="Service micro scaled from 3 to 4" provider=swarm
orbiter.1.zop1qkwa1qxy@node1 | time="2017-04-18T09:30:35Z" level=info
msg="Service micro scaled up." direction=true service=micro
</code></pre>
<p>We can verify the current number of tasks that are running for <code>micro</code> and we
can see that it’s not 3 as before but 4.</p>
<pre><code class="language-bash">$ docker service ls
ID NAME MODE REPLICAS
IMAGE
azi8zyeor5eb micro replicated 4/4
gianarb/micro:1.0.0
ezklgb6uak8b orbiter replicated 1/1
gianarb/orbiter:latest
</code></pre>
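<p>If you want to check the replica count from a script rather than by eye, it can
be parsed out of the <code>ls</code> output. A small sketch, run here against a
captured copy of the output above; in a live cluster you could pipe
<code>docker service ls</code> straight into the same <code>awk</code> program:</p>
<pre><code class="language-bash"># Captured `docker service ls` output from above (one row per line).
out='ID            NAME     MODE        REPLICAS  IMAGE
azi8zyeor5eb  micro    replicated  4/4       gianarb/micro:1.0.0
ezklgb6uak8b  orbiter  replicated  1/1       gianarb/orbiter:latest'

# Pick the micro row and print the running side of REPLICAS (4/4 -> 4).
echo "$out" | awk '$2 == "micro" { split($4, r, "/"); print r[1] }'
</code></pre>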
<p>This project is open source at
<a href="https://github.com/gianarb/orbiter">github.com/gianarb/orbiter</a>: have a look,
try it, and leave some feedback or a request if you need something different.</p>
<p>PRs are also welcome if you are working with a different cluster manager or a
different provider; adding a new one is very easy, it’s just a new interface to
implement.</p>
LinuxKit operating system built for containerLinuxKit is a new tool presented during the DockerCon 2017 built by Docker to manage cross architecture and cross kernel testing. LinuxKit is a secure, portable and lean operating system built for containers. It supports different hypervisor as MacOS hyper or QEMU to run testsuite on different architectures. In this article I am showing you some basic concept above this tool. How it works and why it can be useful.https://gianarb.it/img/builder.gif2017-04-18T10:08:27+00:002017-04-18T10:08:27+00:00https://gianarb.it/blog/linuxkit-operating-system-build-for-containers<p>Linuxkit is a new project presented by Docker during the DockerCon 2017. If we
look at the description of the project on
<a href="https://github.com/linuxkit/linuxkit">GitHub</a>:</p>
<blockquote>
<p>A secure, portable and lean operating system built for containers</p>
</blockquote>
<p>I am already feeling excited. I was an observer of the project when <a href="https://twitter.com/justincormack">Justin
Cormack</a> and the other
<a href="https://github.com/linuxkit/linuxkit/graphs/contributors">contributors</a> were
working on a private repository. I was invited as part of the ci-wg group in the
CNCF and I loved this project from the first day.</p>
<p>You can think about linuxkit as a builder for Linux operating systems where
everything is based on containers.</p>
<p>It’s a project that can sit behind your continuous integration system to let
you test on different kernel versions and distributions. You can build light kernels
with all the services that you need, and you can create different outputs
runnable on cloud providers such as Google Cloud Platform, with Docker or with QEMU.</p>
<h2 id="continuous-delivery-new-model">Continuous delivery, new model</h2>
<p>I am not really confident with Google Cloud Platform pricing, so to move on I am
going to do some math with AWS as the provider.
Let’s suppose that you have the most common continuous integration setup: one big
box, always up and running, configured to support all your projects; or, if you are
already doing well, you run containers to have separate and isolated
environments.</p>
<p>Let’s suppose that your Jenkins runs all the time on an m3.xlarge:</p>
<p><code>m3.xlarge</code> used 100% of every month costs $194.72.</p>
<p>Let’s have a dream. You have a very small server with just a frontend
application for your CI, and every job runs in a separate instance, as tiny as
a t2.small.</p>
<p><code>t2.small</code> used for only 1 hour costs $0.72.</p>
<p>I calculated 1 hour because it’s the minimum that you can pay for, and I hope
that your CI jobs run for less than 1 hour.
It’s easy math to calculate the number of builds you need to run to spend what
you were spending before.</p>
<p>194.72 / 0.72 ~ 270 builds every month.</p>
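<p>The arithmetic is easy to double check (prices as quoted above; <code>awk</code>
just performs the division and truncates to a whole number of builds):</p>
<pre><code class="language-bash"># Monthly cost of the always-on m3.xlarge divided by the price of a
# 1-hour t2.small build gives the break-even number of builds per month.
awk 'BEGIN { printf "%d\n", 194.72 / 0.72 }'
</code></pre>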
<p>If you are running less than 270 builds a month you can save some money
too. But you have other benefits:</p>
<ol>
<li>More jobs, more instances. Very easy to scale, and easier than Jenkins
master/slave and so on.</li>
<li>How many times during the holidays is your Jenkins still up and running with
nothing to do? With this setup, during those days you are just paying for the
frontend app.</li>
</ol>
<p>And these are just some of the benefits of a different setup for your continuous
delivery.</p>
<h2 id="linuxkit-ci-implementation">LinuxKit CI implementation</h2>
<p>There is a directory called
<a href="https://github.com/linuxkit/linuxkit/tree/master/test">./test</a> that contains
some linuxkit use cases, but I am going to explain in practice how linuxkit
itself is tested. Because it uses itself, awesome!</p>
<p>First you need to download and compile linuxkit:</p>
<pre><code class="language-shell">git clone https://github.com/linuxkit/linuxkit.git $GOPATH/src/github.com/linuxkit/linuxkit
cd $GOPATH/src/github.com/linuxkit/linuxkit
make
./bin/moby
</code></pre>
<p>You can move it into your <code>$PATH</code> with <code>make install</code>.</p>
<pre><code>$ moby
Please specify a command.
USAGE: moby [options] COMMAND
Commands:
build Build a Moby image from a YAML file
run Run a Moby image on a local hypervisor or remote cloud
version Print version information
help Print this message
Run 'moby COMMAND --help' for more information on the command
Options:
-q Quiet execution
-v Verbose execution
</code></pre>
<p>At the moment the CLI is very simple; the most important commands are <code>build</code> and
<code>run</code>. linuxkit is based on a YAML file that describes your kernel,
with all the applications and all the services that you need. Let’s start with the
<a href="https://github.com/linuxkit/linuxkit/blob/master/test/test.yml">linuxkit/test/test.yml</a>.</p>
<pre><code class="language-yaml">kernel:
image: "mobylinux/kernel:4.9.x"
cmdline: "console=ttyS0"
init:
- mobylinux/init:8375addb923b8b88b2209740309c92aa5f2a4f9d
- mobylinux/runc:b0fb122e10dbb7e4e45115177a61a3f8d68c19a9
- mobylinux/containerd:18eaf72f3f4f9a9f29ca1951f66df701f873060b
- mobylinux/ca-certificates:eabc5a6e59f05aa91529d80e9a595b85b046f935
onboot:
- name: dhcpcd
image: "mobylinux/dhcpcd:0d4012269cb142972fed8542fbdc3ff5a7b695cd"
binds:
- /var:/var
- /tmp:/etc
capabilities:
- CAP_NET_ADMIN
- CAP_NET_BIND_SERVICE
- CAP_NET_RAW
net: host
command: ["/sbin/dhcpcd", "--nobackground", "-f", "/dhcpcd.conf", "-1"]
- name: check
image: "mobylinux/check:c9e41ab96b3ea6a3ced97634751e20d12a5bf52f"
pid: host
capabilities:
- CAP_SYS_BOOT
readonly: true
outputs:
- format: kernel+initrd
- format: iso-bios
- format: iso-efi
- format: gcp-img
</code></pre>
<p>Linuxkit builds everything inside containers, which means that you don’t need a
lot of dependencies and it’s very easy to use. It generates different <code>outputs</code>, in
this case <code>kernel+initrd</code>, <code>iso-bios</code>, <code>iso-efi</code> and <code>gcp-img</code>, depending on the
platform that you want to use to run your kernel.</p>
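<p>If you only care about one target you can trim the <code>outputs</code> section
down; for example, to build just the kernel and initramfs pair that qemu and
hyperkit boot, a sketch following the schema of <code>test.yml</code> above:</p>
<pre><code class="language-yaml">outputs:
  # kernel + initramfs, bootable directly with `moby run` (qemu/hyperkit)
  - format: kernel+initrd
</code></pre>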
<p>I am trying to explain a bit how this YAML works. You can see that there are
different primary sections: <code>kernel</code>, <code>init</code>, <code>onboot</code>, <code>services</code> and so on.</p>
<p>Pretty much all of them contain the keyword <code>image</code> because, as I said before,
everything is based on containers; in this example they are stored in
<a href="https://hub.docker.com/u/mobylinux/">hub.docker.com/u/mobylinux/</a>.</p>
<p>The base kernel is <code>mobylinux/kernel:4.9.x</code>; I am just reporting what the
<a href="https://github.com/linuxkit/linuxkit#yaml-specification">README.md</a> says:</p>
<ul>
<li><code>kernel</code> specifies a kernel Docker image, containing a kernel and a
filesystem tarball, eg containing modules. The example kernels are built from
<code>kernel/</code></li>
<li><code>init</code> is the base <code>init</code> process Docker image, which is unpacked as the base
system, containing <code>init</code>, <code>containerd</code>, <code>runc</code> and a few tools. Built from
<code>pkg/init/</code></li>
<li><code>onboot</code> are the system containers, executed sequentially in order. They
should terminate quickly when done.</li>
<li><code>services</code> is the system services, which normally run for the whole time the
system is up</li>
<li><code>files</code> are additional files to add to the image</li>
<li><code>outputs</code> are descriptions of what to build, such as ISOs.</li>
</ul>
<p>At this point we can try it. If you are on MacOS, as I was, you don’t need to
install anything: one of the runners supported by <code>linuxkit</code> is <code>hyperkit</code>, which
means that everything is already available in your system.</p>
<p><code>./test</code> contains different test suites, but for now we will stay focused on the
<code>./test/check</code> directory. It contains a set of checks that validate the
kernel built by LinuxKit. They are the smoke tests that run on each
new pull request created on the repository, for example.</p>
<p>As I said, everything runs inside a container: if you look into the check
directory there is a Makefile that builds the <code>mobylinux/check</code> image, and that
image is run by LinuxKit, as described in the <code>test.yml</code> file:</p>
<pre><code class="language-yaml">onboot:
- name: check
image: "mobylinux/check:c9e41ab96b3ea6a3ced97634751e20d12a5bf52f"
pid: host
capabilities:
- CAP_SYS_BOOT
readonly: true
</code></pre>
<p>You can use the
<a href="https://github.com/linuxkit/linuxkit/blob/master/test/check/Makefile">Makefile</a>
inside the check directory to build a new version of check; you can just use
the command <code>make</code>.</p>
<p>When you have the right version of your tests, you can build the image used by moby:</p>
<pre><code>cd $GOPATH/src/github.com/linuxkit/linuxkit
moby build test/test.yml
</code></pre>
<p>Part of the output is:</p>
<pre><code class="language-shell">Create outputs:
test-bzImage test-initrd.img test-cmdline
test.iso
test-efi.iso
test.img.tar.gz
</code></pre>
<p>And if you look into the directory you can see all these files
in the root. These files can be run with qemu, Google Cloud Platform,
hyperkit and so on.</p>
<pre><code class="language-shell">moby run test
</code></pre>
<p>On MacOS this command makes LinuxKit use hyperkit to start a VM. I cannot copy and
paste all the output, but you can see the hypervisor logs:</p>
<pre><code>virtio-net-vpnkit: initialising, opts="path=/Users/gianlucaarbezzano/Library/Containers/com.docker.docker/Data/s50"
virtio-net-vpnkit: magic=VMN3T version=1 commit=0123456789012345678901234567890123456789
Connection established with MAC=02:50:00:00:00:04 and MTU 1500
early console in extract_kernel
input_data: 0x0000000001f2c3b4
input_len: 0x000000000067b1e5
output: 0x0000000001000000
output_len: 0x0000000001595280
kernel_total_size: 0x000000000118a000
booted via startup_32()
Physical KASLR using RDRAND RDTSC...
Virtual KASLR using RDRAND RDTSC...
Decompressing Linux... Parsing ELF... Performing relocations... done.
Booting the kernel.
[ 0.000000] Linux version 4.9.21-moby (root@84baa8e89c00) (gcc version 6.2.1 20160822 (Alpine 6.2.1) ) #1 SMP Sun Apr 9 22:21:32 UTC 2017
[ 0.000000] Command line: earlyprintk=serial console=ttyS0
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] x86/fpu: Using 'eager' FPU context switches.
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000003fffffff] usable
</code></pre>
<p>When the VM is ready, LinuxKit starts all the <code>init</code> and <code>onboot</code> entries; the logs are
easy to understand, as the <code>test.yml</code> starts <code>containerd</code> and <code>runc</code>:</p>
<pre><code>init:
- mobylinux/init:8375addb923b8b88b2209740309c92aa5f2a4f9d
- mobylinux/runc:b0fb122e10dbb7e4e45115177a61a3f8d68c19a9
- mobylinux/containerd:18eaf72f3f4f9a9f29ca1951f66df701f873060b
- mobylinux/ca-certificates:eabc5a6e59f05aa91529d80e9a595b85b046f935
onboot:
- name: dhcpcd
image: "mobylinux/dhcpcd:0d4012269cb142972fed8542fbdc3ff5a7b695cd"
binds:
- /var:/var
- /tmp:/etc
capabilities:
- CAP_NET_ADMIN
- CAP_NET_BIND_SERVICE
- CAP_NET_RAW
net: host
command: ["/sbin/dhcpcd", "--nobackground", "-f", "/dhcpcd.conf", "-1"]
- name: check
image: "mobylinux/check:c9e41ab96b3ea6a3ced97634751e20d12a5bf52f"
pid: host
capabilities:
- CAP_SYS_BOOT
readonly: true
</code></pre>
<pre><code>Welcome to LinuxKit
## .
## ## ## ==
## ## ## ## ## ===
/"""""""""""""""""\___/ ===
~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ / ===- ~~~
\______ o __/
\ \ __/
\____\_______/
/ # INFO[0000] starting containerd boot... module=containerd
INFO[0000] starting debug API... debug="/run/containerd/debug.sock" module=containerd
INFO[0000] loading monitor plugin "cgroups"... module=containerd
INFO[0000] loading runtime plugin "linux"... module=containerd
INFO[0000] loading snapshot plugin "snapshot-overlay"... module=containerd
INFO[0000] loading grpc service plugin "healthcheck-grpc"... module=containerd
INFO[0000] loading grpc service plugin "images-grpc"... module=containerd
INFO[0000] loading grpc service plugin "metrics-grpc"... module=containerd
</code></pre>
<p>The last step is the <code>check</code> that runs the real test suite:</p>
<pre><code>kernel config test succeeded!
info: reading kernel config from /proc/config.gz ...
Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
........
.......
Moby test suite PASSED
## .
## ## ## ==
## ## ## ## ## ===
/"""""""""""""""""\___/ ===
~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ / ===- ~~~
\______ o __/
\ \ __/
\____\_______/
[ 3.578681] ACPI: Preparing to enter system sleep state S5
[ 3.579063] reboot: Power down
</code></pre>
<p>The last log is the output of the
<a href="https://github.com/linuxkit/linuxkit/blob/master/test/check/check-kernel-config.sh">check-kernel-config.sh</a>
file.</p>
<p>If you are on Linux you can run the same command, but by default you are going
to use <a href="https://www.qemu-project.org/">qemu</a>, an open source machine emulator.</p>
<pre><code class="language-bash">sudo apt-get install qemu
</code></pre>
<p>I did some tests on my Asus Zenbook with Ubuntu; when you run <code>moby run</code>, this is
the command executed with qemu:</p>
<pre><code>/usr/bin/qemu-system-x86_64 -device virtio-rng-pci -smp 1 -m 1024 -enable-kvm
-machine q35,accel=kvm:tcg -kernel test-bzImage -initrd test-initrd.img -append
console=ttyS0 -nographic
</code></pre>
<p>By default it tests on <code>x86_64</code>, but qemu supports a lot of other archs and
devices: you can simulate an ARM board or a Raspberry Pi, for example. At the
moment LinuxKit is not ready to emulate other architectures, but this is in the main
scope of the project. It’s just a matter of time; it will be able to soon!</p>
<p>Detecting whether the build succeeded or failed is not as easy as you might
expect: the exit status inside the VM is not the one that you get on your laptop. At
the moment, to understand if the code in your PR is good or bad, we parse the output:</p>
<pre><code>define check_test_log
@cat $1 |grep -q 'Moby test suite PASSED'
endef
</code></pre>
<p><a href="https://github.com/linuxkit/linuxkit/blob/master/Makefile">./linuxkit/Makefile</a></p>
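<p>Outside of make, the check is nothing more than a <code>grep</code> over the
captured serial console output. A sketch against a fake log file, since the real
one is produced by the VM run:</p>
<pre><code class="language-bash"># Simulate a captured boot log, then apply the same pass/fail check
# the linuxkit Makefile performs on the real test output.
log=$(mktemp)
printf 'Booting the kernel.\nMoby test suite PASSED\n' > "$log"

if grep -q 'Moby test suite PASSED' "$log"; then
  echo "build OK"
else
  echo "build FAILED"
fi
</code></pre>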
<p>Explaining how linuxkit tests itself is, at the moment, the best way to understand how it
works. It is just one piece of the puzzle: if you have a look, <a href="https://github.com/linuxkit/linuxkit/pulls">every
PR</a> has a GitHub Status that points to
a website containing the logs for that particular build. That part is not
managed by linuxkit, because linuxkit is only the builder used to create the
environment. All the rest is managed by
<a href="https://github.com/docker/datakit">datakit</a>. I will probably cover it in
another blogpost.</p>
<h2 id="conclusion">Conclusion</h2>
<p>runc, docker, containerd, rkt, but also Prometheus, InfluxDB, Telegraf: a lot of
projects support different architectures, and they need to run on different
kernels with different configurations and capabilities. They need to run on your
laptop, on your IBM server and on a Raspberry Pi.</p>
<p>This project is in an early state, but I understand why Docker needs something
like it and, as I said, other projects are probably going to get some
benefits from a solution like this one. Having it open source is very good, and
I am honored to be part of the amazing group that put this together. I just did
some final tests and tried to understand how it’s designed and how it works.
This post is the result of those tests. I hope it can help you start with the right
mindset.</p>
<p>My plan is to create a configuration to test InfluxDB and play a bit with <code>qemu</code>
to test it on different architectures and devices. Stay around, a blogpost will
come!</p>
<p>Some Links:</p>
<ul>
<li><a href="https://blog.docker.com/2017/04/introducing-the-moby-project/">INTRODUCING MOBY PROJECT: A NEW OPEN-SOURCE PROJECT TO ADVANCE THE SOFTWARE
CONTAINERIZATION MOVEMENT</a></li>
<li><a href="https://github.com/linuxkit">github.com/linuxkit</a></li>
<li><a href="https://github.com/moby">github.com/moby</a></li>
</ul>
<p class="text-muted">
Reviewers: <a href="https://twitter.com/justincormack">Justin Cormack</a>
</p>
<div class="post row">
<div class="col-md-12">
<div class="bs-callout bs-callout-info row">
<div class="row">
<div class="col-md-12">
<h2><a href="//gianarb.it/blog/docker-the-fundamentals" target="_blank">get "Docker the Fundamentals"</a> <small>by. Drive your boat as a Captain</small></h2>
</div>
</div>
<div class="row">
<div class="col-md-3">
<a href="//gianarb.it/blog/docker-the-fundamentals" target="_blank"><img src="/img/the-fundamentals.jpg" class="img-fluid" /></a>
</div>
<div class="col-md-9">
<p>
You can get the Chapter 2 of the book <a href="/blog/scaledocker" target="_blank">"Drive your boat as a Captain"</a> just leave click on the
cover and leave your email to receive a free copy.</p>
<p>This chapter is getting started with Docker Engine and the basic
concept around registry, pull, push and so on. It's a good way to start from
zero with Docker.</p>
</div>
</div>
</div>
</div>
</div>
Containers why we are hereCloud computing, containers, devops, everything is moving so fast that sometime for big companies or CTO is very hard keep track of everything. What it's just a new trend and what I really need. This post contains my options and a bit of history about docker, cloud computing, aws and containers.https://gianarb.it/img/container-security.png2017-03-12T08:08:27+00:002017-03-12T08:08:27+00:00https://gianarb.it/blog/containers-why-we-are-here<blockquote>
<p>“It is change, continuing change, inevitable change, that is the dominant
factor in society today. No sensible decision can be made any longer without
taking into account not only the world as it is, but the world as it will be…
This, in turn, means that our statesmen, our businessmen, our everyman must take
on a science fictional way of thinking” Asimov, 1981</p>
</blockquote>
<h1 id="isolation-and-virtualization">Isolation and Virtualization</h1>
<p>I can clearly see two kinds of inventions: the ones that allow people to do
something they couldn’t do before, and the ones that let them do something
better. Fire, for example, gave people the chance to cook food, push away wild
beasts and warm themselves up during cold nights. Many years later, electricity
let people warm their houses just by pushing a button. After the discovery of the
wheel people began to travel and to trade goods, but it was only with the invention
of the car that they could do it faster and more efficiently. Similarly, the web
created a huge network able to connect people all over the world, and web
applications gave people tools to use and customise such a complex system. From this
perspective, the container is one of the main revolutions of the last years, a unique
tool that helps with app management and development. Let’s discover something more
about the real story of containers.</p>
<p>We do not have a lot of documentation about why Bill Joy added chroot to BSD on
18th March 1982, probably to test programs and solutions in an isolated
root. That was amazing, but not enough: a few years later, in 1991, Bill Cheswick
extended chroot with security features and experimented with what became FreeBSD’s
“jails”, and in 2000 the proper jails command that we know today was introduced. Now
our chroots cannot see anything, anywhere, outside of themselves. When you start a
process in a chroot its PID is 1 and it sees only itself, but from
outside you can see all the processes that are running in the chroot. Our
applications cannot stay in a jail! They need to communicate with the outside,
exchange information and so on. To solve this problem, in 2002, in kernel
version 2.4.19, a group of developers including Eric W. Biederman and Pavel
Emelyanov introduced the namespace feature to manage system resources like the
network, processes and the file system.</p>
<p>This is just a bit of history about how the ecosystem spun up. At the end of
this chapter we will try to understand how and why Docker arrived on the scene, but
the main goal of this book is on another layer and another level of complexity: we
are here to understand how to manage all these things in the cloud and how to design
a distributed system. But, you know, the past is important to build a solid future.</p>
<p>All these great features are now popular under the name of containers. Nothing is
really new, and this is one of the reasons why all these things are amazing!
They have been under the hood for a while: solid and tested features put together
and made usable.</p>
<p>Nothing needs to be said about the importance for a system of being isolated:
isolation helps us manage resources, security and monitoring in the best way, and it
confines problems to the specific application that created them, often one not
even related to our app.</p>
<p>The most common solution is virtualization: you can use a hypervisor to create
virtual servers on a single machine. There are different kinds of virtualization:</p>
<ul>
<li>Full virtualization</li>
<li>Para virtualization like Virtual Machine, Xen, VMware</li>
<li>Operating System virtualization like Containers</li>
<li>Application virtualization like JVM.</li>
</ul>
<p><img class="img-fluid" src="/img/virtualization.png" />
<a href="https://fntlnz.wtf/post/why-containers/" target="_blank"><small>img from fntlnz’s blog. Thanks</small></a></p>
<p>The main difference between them is which layers they abstract (application,
processing, network, storage) and how the upper level interacts with the
underlying one. For example, in full virtualization the hardware is
virtualized, while in para virtualization it is not.</p>
<p>A container is operating-system-level virtualization. The main difference
between a container and a virtual machine is the layer: the first works on the
operating system, the second on the hardware layer.</p>
<p>When we speak about containers we are focused on application virtualization
and on specific features provided by the kernel, such as Linux Containers (LXC):
what we do when we build containers is create new isolated Linux systems on
the same host. It means that we cannot change the operating system, for example,
because our virtualization layer doesn’t allow us to run Linux containers outside
of Linux.</p>
<h1 id="the-reasons">The reasons</h1>
<p>Revolutions are not related to a single, specific event but come from
multiple movements and changes: containers are just a piece of the story.</p>
<p>Cloud computing allowed us to think about our infrastructure as a variable
number of servers that can scale up and down, in a reasonably short amount of
time, with less money and without the investment required to manage a big
infrastructure made of more than one datacenter across the world.</p>
<p>As a consequence, applications that had been in a cellar are now on Amazon Web
Services, with a load balancer and maybe different availability zones. This
allowed little teams and medium companies, without datacenters and
infrastructure, to think about concepts like distribution, high availability and
redundancy. Evolution never stops.</p>
<p>Once our applications are running in a few virtual machines, our business
grows, so we start to scale these servers up and down to serve all our users.
We experienced a few benefits but also a lot of issues related, for example, to
the time required to manage this dynamism; moreover, big applications are
usually more expensive to scale.</p>
<p>Our application can only grow, but deployment can be really expensive. We
discovered that the behavior of an application is not the same across all of our
services and entrypoints, because a few of them receive more traffic than others.
So we started to split our big applications in order to make them easier to scale
and monitor. The problem was that, in order to maintain our standards, we needed to
find a way to keep them isolated, safe and able to communicate with each other.</p>
<p>The microservices architecture arrived, and companies like Netflix, Amazon,
Google and others count hundreds and hundreds of little, specific
services that together work to serve big and profitable products. Netflix is one
of the first companies that started sharing the way they build Netflix.com: with
more than 400 microservices, they manage features like registration, streaming,
rankings and everything the application provides. At the moment, containers are
the best solution for managing a dense and dynamic environment with good
control and security, and for moving your applications between servers.</p>
<p class="text-muted">
Reviewers: Arianna Scarcella, <a href="https://twitter.com/TheBurce">Jenny Burcio</a>
</p>
About your images, security tipsEverything unnecessary in your system could be a very stupid vulnerability. We already spoke about this idea in the capability chapter and the same rule exists when we build an image. Having tiny images with only what our application needs to run is not just a goal in terms of distribution but also in terms of cost of maintenance and security.https://gianarb.it/img/container-security.png2016-12-28T08:08:27+00:002016-12-28T08:08:27+00:00https://gianarb.it/blog/about-your-images-security-tips<p>Everything unnecessary in your system could be a very stupid vulnerability. We
already spoke about this idea in the capability chapter and the same rule
exists when we build an image. Having tiny images with only what our
application needs to run is not just a goal in terms of distribution but also
in terms of cost of maintenance and security. If you have some small
experience with docker already, you probably know the
<a href="https://hub.docker.com/_/alpine/">alpine</a> image. It is built
from the Alpine distribution and it’s only 5MB in size; if your application can
run inside it, then this is a very good optimization that you can make. What
about your binaries? Can your application run standalone? If the answer is yes,
you can think about a very, very minimal image. <code>scratch</code> is usually used as the
base for other images like debian and ubuntu, but you can also use it to run
your Golang binary; let me show you something with our micro application.
On the <a href="https://github.com/gianarb/micro/releases/tag/1.0.0">release page</a>
there is a list of binaries already compiled and ready to be used. In this
case we can download the linux_386 binary.</p>
<p><img class="img-fluid" src="/img/security-image/micro-release.png" /></p>
<pre><code class="language-bash">curl -SsL https://github.com/gianarb/micro/releases/download/1.0.0/micro_1.0.0_linux_386 > micro
</code></pre>
<p>And we know we can include this binary in the scratch image with this Dockerfile</p>
<pre><code class="language-bash">FROM scratch
ADD ./micro /micro
EXPOSE 8000
CMD ["/micro"]
</code></pre>
<pre><code class="language-bash">docker build -t micro-scratch .
docker run -p 8000:8000 micro-scratch
</code></pre>
<p>The result is the same HTTP application on port 8000, but the main difference is
the size of the image: the old one based on alpine is 12M, the new one is 5M.</p>
<p>The scratch image is impossible to use with every application, but if you have a
standalone binary you can remove a lot of unused overhead.</p>
<p>Another way to understand the status of your image is to scan it for
security vulnerabilities and exposures. Docker Hub and Docker Cloud can do it
for private images. This is a great feature to have in your pipeline, to scan
an image after each build.</p>
<p>CoreOS provides an open source project called <a href="https://github.com/coreos/clair">clair</a> to do the same in your environment.</p>
<p>It is an application written in Go that exposes an HTTP API to
pull, push and analyse images. It downloads vulnerabilities from different
sources like the <a href="https://security-tracker.debian.org/tracker">Debian Security
Tracker</a> or <a href="https://www.redhat.com/security/data/metrics/">Red Hat Security
Data</a>. Each vulnerability is
stored in Postgres. Clair works as a static analyzer: it
doesn’t need to run our container to scan it; it runs its checks
directly against the filesystem layers of the image.</p>
<pre><code class="language-bash">docker run -it -p 5000:5000 registry
</code></pre>
<p>With this command we are running a private registry to use as a source for
the images to scan:</p>
<pre><code class="language-bash">docker pull gianarb/micro:1.0.0
docker tag gianarb/micro:1.0.0 localhost:5000/gianarb/micro:1.0.0
docker push localhost:5000/gianarb/micro:1.0.0
</code></pre>
<p>Now that we have pushed the micro image to our private registry, we can set up Clair.</p>
<pre><code class="language-bash">mkdir $HOME/clair-test/clair_config
cd $HOME/clair-test
curl -L https://raw.githubusercontent.com/coreos/clair/v1.2.2/config.example.yaml -o clair_config/config.yaml
curl -L https://raw.githubusercontent.com/coreos/clair/v1.2.2/docker-compose.yml -o docker-compose.yml
</code></pre>
<p>Modify <code>$HOME/clair-test/clair_config/config.yaml</code> and set the proper database source:
<code>postgresql://postgres:password@postgres:5432?sslmode=disable</code></p>
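<p>The relevant fragment of the example configuration looks roughly like this.
The key layout below is an assumption based on the v1.x example file, so check
the <code>config.example.yaml</code> you downloaded for the exact structure:</p>
<pre><code class="language-yaml">clair:
  database:
    # Postgres started by docker-compose; host, user and password
    # match the compose file defaults
    source: postgresql://postgres:password@postgres:5432?sslmode=disable
</code></pre>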
<p>Now you can run the following command to start postgres and clair:</p>
<pre><code class="language-bash">docker-compose up
</code></pre>
<p>To make our test easier, we will use a CLI called hyperclair, which is just
a client for this application. If you are using Mac OS, you can run the
commands below; on another OS you can find the correct URL on
the release page.</p>
<pre><code class="language-bash">curl -sSL https://github.com/wemanity-belgium/hyperclair/releases/download/0.5.2/hyperclair-darwin-386 > ~/hyperclair
chmod 755 ~/hyperclair
</code></pre>
<p>Now we have an executable at <code>~/hyperclair</code>:</p>
<pre><code class="language-bash">~/hyperclair pull localhost:5000/gianarb/micro:1.0.0
~/hyperclair push localhost:5000/gianarb/micro:1.0.0
~/hyperclair analyze localhost:5000/gianarb/micro:1.0.0
~/hyperclair report localhost:5000/gianarb/micro:1.0.0
</code></pre>
<p>The generated report looks like this:</p>
<p><img class="img-fluid" src="/img/security-image/report-clair.png" /></p>
<p>Hyperclair is just one client for Clair; you can decide to use it or build
your own integration into your pipeline.</p>
Docker registry to ship and manage your containers.Build and Run containers is important but ship them out of your laptop is the best part! A Registry is used to store and manage your images and all your layers. You can use a storage to upload and download them across your servers and to share them with your colleagues.https://gianarb.it/img/docker.png2016-12-14T08:08:27+00:002016-12-14T08:08:27+00:00https://gianarb.it/blog/docker-registry-to-ship-your-containers<p>Build and Run containers is important but ship them out of your laptop is the
best part! The Registry is a very important tool that deserves a bit more
attention. A registry is used to store and manage your images and all their
layers. Backed by a storage system, it lets you upload and download them across
your servers and share them with your colleagues.</p>
<p>The most popular registry is hub.docker.com.
It contains different kinds of images: public, official and private. You can
create an account and push your images, or build them, for example, from a GitHub
or Bitbucket repository. The integration with GitHub and Bitbucket is called
“Automated Builds”. It allows you to create a continuous integration
environment for your images: when you select “Create” and “Automated Build”
you can specify a repository and the path of your Dockerfile. You can specify
more than one path from the same repository to build more than one image tag.
This way you can centralize your builds and rebuild your images every time a
new change is pushed to the repository. It also supports organizations, so you
can split your images into different groups and manage their visibility in the
case of private images.</p>
<p>By default any developer can push their images to the registry, and
they’ll be public and free for other developers to use. Official images are
public images selected and maintained by a specific organization or
members of the community; the idea is that they have better quality and that
whoever provides them is usually involved in the product’s development. Some
official images are Nginx, Redis, MySQL, PHP, Go and so on:
<a href="https://hub.docker.com/explore">https://hub.docker.com/explore</a>.</p>
<p>Docker Hub offers different plans to store
private images: everybody gets one for free, but if you need more you can pay
for a plan and store more.</p>
<p>The Registry is not just a tool, it’s also a specification: it
describes how to expose capabilities such as pull, push, search and so on. This
allowed the ecosystem to implement these rules in other projects while keeping
compatibility with the Docker client and with the other runtime engines
that use these capabilities. It’s for this reason that other platforms such as
Kubernetes and Cloud Foundry support pulling from Docker Hub. The specification
has two versions, v1 and v2; the most famous registries implement both standards
and fall back from v2 to v1 for features that are not supported yet. For
example, search is at the moment supported only in v1, not in v2.</p>
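<p>The v2 API is plain HTTP, so you can probe a registry directly. A small
sketch against a local registry on port 5000 (like the one started later in
this post); the fallback messages are only there so the commands degrade
gracefully when nothing is listening:</p>
<pre><code class="language-bash"># a v2 registry answers 200 on /v2/ and lists its repositories on /v2/_catalog
curl -fsS http://localhost:5000/v2/ >/dev/null 2>&1 \
  && echo "registry speaks v2" \
  || echo "no v2 registry on localhost:5000"
curl -fsS http://localhost:5000/v2/_catalog 2>/dev/null \
  || echo "catalog not available"
</code></pre>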
<p>If you are looking for an in-house solution you have different tools
available. The first one is Distribution. It is provided by Docker, it’s open
source, and it offers a very small registry that you can run on your own
server. It also supports different storage backends, like the local filesystem
and S3. This feature is very interesting because the size of the images and the
number of layers usually grow very fast, and you also need to keep them safe
with backup and redundancy policies for high availability. If your environment
is based on containers, your registry is a core part of your company. Let’s
start a Docker Distribution:</p>
<pre><code class="language-bash">$ docker run -d -p 5000:5000 --name registry registry:2
</code></pre>
<p>In Docker the default registry is hub.docker.com, which means that when we
push or pull an image we are reaching this registry:</p>
<pre><code class="language-bash">$ docker pull alpine
</code></pre>
<p>To push our images to another registry we need to tag them:</p>
<pre><code class="language-bash">$ docker tag alpine 127.0.0.1:5000/alpine
</code></pre>
<p>With this command you tagged alpine for the registry at 127.0.0.1:5000
because, as we said in previous chapters, the name of an image contains a lot
of information:</p>
<pre><code>REGISTRY/NAME:VERSION
</code></pre>
<p>The default registry is hub.docker.com. A name can be as simple as alpine or
include a username, like matt/alpine, and you can pin a specific build with a
version: you can use semver or, for example, the SHA of a commit. The default
VERSION is latest.</p>
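<p>The naming rule above can be sketched with plain shell parameter expansion.
This is a rough illustration, not the full Docker reference grammar (which also
handles defaults such as the hub registry and the latest tag):</p>
<pre><code class="language-bash">ref="127.0.0.1:5000/matt/alpine:1.2.3"

registry=${ref%%/*}    # text before the first "/"  -> 127.0.0.1:5000
rest=${ref#*/}         # matt/alpine:1.2.3
name=${rest%:*}        # text before the last ":"   -> matt/alpine
version=${rest##*:}    # text after the last ":"    -> 1.2.3

echo "registry=$registry name=$name version=$version"
</code></pre>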
<p>Now that we have a new tag we can push it to and pull it from our registry:</p>
<pre><code class="language-bash">$ docker push 127.0.0.1:5000/alpine
$ docker pull 127.0.0.1:5000/alpine
</code></pre>
<p>A very important thing to remember when you run your own registry is
that every layer of every build is stored, so it’s very easy to end up with a
big registry: you need to monitor the instance to be sure that your server has
enough disk space, and also take care of high availability. In a real
environment the registry is the core of your infrastructure: developers use it
to pull and push builds and also to put versions in production. Take care of
your registry.</p>
<p>Other than the Docker-provided registry there are a few alternatives. <a href="https://www.sonatype.com/nexus-repository-sonatype">Nexus</a> is a repository manager that
supports a lot of languages and package formats; if you are a Java developer
you know it. Nexus supports Docker Registry API v1 and v2.</p>
<p>We can use the image provided by Sonatype and start our Nexus repository:</p>
<pre><code class="language-bash">$ docker run -d -p 8082:8082 -p 8081:8081 \
-v /tmp/sonata:/sonatype-work --name nexus sonatype/nexus3
$ docker logs -f nexus
</code></pre>
<p>When the log tells us that Nexus is ready we can reach the UI from our
browser at http://localhost:8081/ (or at the IP of your Docker Machine if you
are using one). The default credentials are username
admin and password admin123.</p>
<p><img class="img-fluid" src="/img/docker-registry/nexus-image-loaded.png" /></p>
<p>First of all we need to create a new Hosted repository for Docker: press
the settings icon at the top of the page, then Repositories, then Create
Repository. I called mine mydocker. You need to specify an HTTP port for
the repository; we exposed port 8082 during the run, and for this reason I
chose 8082.</p>
<p><img class="img-fluid" src="/img/docker-registry/nexus-create-repo.png" /></p>
<p>Nexus has different kinds of repositories: Hosted means that it’s
self-hosted, but you can also create a Proxy repository to proxy, for example,
the official Docker Hub.
Now we need to log in to our Docker registry:</p>
<pre><code class="language-bash">$ docker login 127.0.0.1:8082
</code></pre>
<p>Now we can tag alpine and push the tag to the repository:</p>
<pre><code class="language-bash">$ docker tag alpine 127.0.0.1:8082/alpine
$ docker push 127.0.0.1:8082/alpine
</code></pre>
<p>You can go to Assets, click on the mydocker repository, and see that your
image is correctly stored.</p>
<p><a href="https://about.gitlab.com/">GitLab</a> also has a container registry. GitLab uses
it to manage builds, and it’s available to you from version 8.8 if you are
already using this tool.</p>
<p class="text-muted">Thanks <a href="https://twitter.com/kishoreyekkanti" target="_blank">Kishore
Yekkanti</a>, <a href="https://twitter.com/liuggio" target="_blank">Giulio De
Donato</a> for your review.</p>
<div class="post row">
<div class="col-md-12">
<div class="bs-callout bs-callout-info row">
<div class="row">
<div class="col-md-12">
<h2><a href="//gianarb.it/blog/docker-the-fundamentals" target="_blank">Get "Docker the Fundamentals"</a> <small>from "Drive your boat as a Captain"</small></h2>
</div>
</div>
<div class="row">
<div class="col-md-3">
<a href="//gianarb.it/blog/docker-the-fundamentals" target="_blank"><img src="/img/the-fundamentals.jpg" class="img-fluid" /></a>
</div>
<div class="col-md-9">
<p>
You can get Chapter 2 of the book <a href="/blog/scaledocker" target="_blank">"Drive your boat as a Captain"</a>: click on the
cover and leave your email to receive a free copy.</p>
<p>This chapter is a getting started with Docker Engine and the basic
concepts around registries, pull, push and so on. It's a good way to start from
zero with Docker.</p>
</div>
</div>
</div>
</div>
</div>
Continuous Integration and silent checks. You are looking in the wrong placeBad and good practice when you setup a continuous integration job. Silent checks are not a good practice but analyze your code is the perfect way to understand how your codebase is evolving.https://gianarb.it/img/jenkins.png2016-11-18T10:08:27+00:002016-11-18T10:08:27+00:00https://gianarb.it/blog/continuous-integration-and-silent-checks<p>Continuous Integration is a process of merging all developer working copies to
a shared mainline several times a day. In practice, it means having in place a
system that allows you to trust all the changes developers make in a
short period of time, so that the code is compliant and ready to be
pushed to production.</p>
<p>There are a lot of different ways to do CI, but I will stay focused on a
very important aspect: you need a policy made of a series of checks that you
can easily automate. Running all these steps on every change allows you to mark
the new code as <code>ready</code>.</p>
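<p>In its simplest form such a policy is just a script: an ordered list of
commands where any non-zero exit code stops the job. A minimal sketch (the
echo lines are placeholders for your real test and lint commands):</p>
<pre><code class="language-bash">#!/bin/sh
# set -e stops the job at the first failing check, so reaching the
# last line means every check passed and the change can be marked ready
set -e
echo "check 1: unit tests"        # placeholder for the real test command
echo "check 2: static analysis"   # placeholder for the real linter
echo "all checks passed: ready"
</code></pre>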
<p>Automation is an important part of keeping your integration continuous.
Usually what people add is a human review of the code: if one or more people
mark your code as compliant and the continuous integration system agrees with
them, your code can be merged. This is the only manual step.</p>
<p>But let’s talk about what I call “silent checks”: they are really one of the
worst inventions I have ever seen. Silent checks are like cigarettes: everybody
knows they are not good, but nobody cares.</p>
<p>Usually your CI system uses exit codes to understand if a check is good or
bad: your command comes back with <code>0</code> in case of success or with
another number if something fails. Sometimes you find checks in your continuous
integration that put the status code in a silent mode: the check fails, but
it’s “not important enough”.</p>
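<p>The whole trick, and the whole problem, fits in one line of shell. Appending
<code>|| true</code> replaces the failing exit code with 0, so the CI runner
never sees the failure:</p>
<pre><code class="language-bash">#!/bin/sh
# a check that fails, the way a broken lint step might
lint() { return 1; }

if lint; then
  echo "lint passed"
else
  echo "lint failed with exit code $?"   # CI would mark the job red
fi

lint || true                             # the "silent" version
echo "exit code after silencing: $?"     # prints 0: CI marks the job green
</code></pre>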
<p><img class="img-fluid" src="/img/the-wolf-ci.jpeg" alt="continuous integration party" /></p>
<p>You have a check that runs, but you are not asking people to care about the
result, probably because it’s not considered important enough. There are a few
disadvantages to this approach:</p>
<ul>
<li>That check makes your job slower.</li>
<li>If the job doesn’t fail, no one cares about the optional check, and since
it can never fail the build, no one ever will.</li>
<li>When a job fails you have to scroll past all the logs generated
by the optional check. They produce very long logs, because they usually
fail. There is more: your coworkers forget about this check and they
ping you about its errors.</li>
</ul>
<p>Analysing your code is very important, and there are other strategies that
you can use to avoid this inconvenience. Usually silent checks are put in place
during a period of migration, where they are useful to monitor how it is going;
they are just in the wrong position.
You can move them to a separate job, collect their results, analyse what you
need to analyse, and monitor trends in how your team works.</p>
<p>I saw a TEDx talk by Adam Tornhill where he talked about analyzing software
with forensic psychology. The topic is great! You can get a lot of information
about your application from whoever is writing its code.</p>
<div style="text-align:center">
<iframe width="640" height="360" src="https://www.youtube.com/embed/qJ_hplxTYJw" frameborder="0" allowfullscreen=""></iframe>
</div>
<p>Trends and monitoring are useful not just to understand how your application
works: they are fundamental to understanding how your team is working, how they
feel, and also to catching how your codebase is moving. They are really
important, and if you are disciplined enough to have a good monitoring system
for these metrics you are really in a good position! You just need to
understand that inserting them into the continuous integration flow is not a
good idea.</p>
Docker Bench SecurityContainer security is a hot topic because today containers are everywhere also in production. It means that we need to trust this technology and start to think about best practices and tools to make our container environment safe.https://gianarb.it/img/docker.png2016-11-15T10:08:27+00:002016-11-15T10:08:27+00:00https://gianarb.it/blog/Docker-Security-Benchmark<p>Frequently, best practices help you to have a safe environment,
<a href="https://github.com/docker/docker-bench-security">docker-bench-security</a> is an
open source project that runs in a container and scans your environment to
report a set of common mistakes like:</p>
<ul>
<li>Your kernel is too old</li>
<li>Your docker is not up to date</li>
<li>Some Docker daemon configurations are not good enough to run a production environment</li>
<li>Your container runs 2 processes</li>
<li>and others</li>
</ul>
<p>It’s a great idea to run it at some stage on each host to get an idea of
the status of your environment. To do that you can just run it as a container
with this command:</p>
<pre><code class="language-bash">$ docker run -it --net host --pid host --cap-add audit_control \
-v /var/lib:/var/lib \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /usr/lib/systemd:/usr/lib/systemd \
-v /etc:/etc --label docker_bench_security \
docker/docker-bench-security
</code></pre>
<p>A good way to start is to run it in your local environment: run the command
and check what you can do to make your local environment safer. This tool is
open source on GitHub, and it’s also a great example of collaboration, of how a
community can share experience to help other members improve an
environment. This is a partial output:</p>
<pre><code class="language-bash">Initializing Thu Nov 24 21:35:24 GMT 2016
[INFO] 1 - Host Configuration
[WARN] 1.1 - Create a separate partition for containers
[PASS] 1.2 - Use an updated Linux Kernel
[PASS] 1.4 - Remove all non-essential services from the host - Network
[PASS] 1.5 - Keep Docker up to date
[INFO] * Using 1.13.01 which is current as of 2016-10-26
[INFO] * Check with your operating system vendor for support and security maintenance for docker
[INFO] 1.6 - Only allow trusted users to control Docker daemon
[INFO] * docker:x:999:gianarb
[WARN] 1.7 - Failed to inspect: auditctl command not found.
[WARN] 1.8 - Failed to inspect: auditctl command not found.
[WARN] 1.9 - Failed to inspect: auditctl command not found.
[INFO] 1.10 - Audit Docker files and directories - docker.service
[INFO] * File not found
[INFO] 1.11 - Audit Docker files and directories - docker.socket
[INFO] * File not found
</code></pre>
<p>Sometimes, to get a good result, you just need to run a single command.</p>
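<p>One cheap way to make the report actionable is to track the number of
warnings over time. A sketch that counts the <code>[WARN]</code> lines in a
saved run; the log file name and the sample lines here are made up for the
example:</p>
<pre><code class="language-bash"># save a report once per run, e.g.:
#   docker run ... docker/docker-bench-security | tee bench.log
printf '%s\n' \
  '[WARN] 1.1  - Create a separate partition for containers' \
  '[PASS] 1.2  - Use an updated Linux Kernel' \
  '[WARN] 1.7  - Failed to inspect: auditctl command not found.' \
  > bench.log

grep -c '^\[WARN\]' bench.log    # prints 2
</code></pre>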
<p>This article is part of “Drive your boat like a Captain”, a book about
Docker in production: how to manage a cluster of Docker Engines with Swarm and
what it means to manage a production environment today.</p>
<p>Keep in touch to receive news about the book at
<a href="/blog/scaledocker">scaledocker.com</a>. If you are looking for a
Docker getting started, you can also look at the first chapter that I released,
<a href="/blog/docker-the-fundamentals">Docker The Fundamentals</a>.</p>
Chef Server startup notesThis tutorial explain how to setup a chef server on digitalocean from zero. It also shows how to use it to make a provisioning of one Chef Client. Chef is one of the most used provisioning tool. DevOps tool to apply infrastructure as code.https://gianarb.it/img/chef.png2016-11-10T10:08:27+00:002016-11-10T10:08:27+00:00https://gianarb.it/blog/chef-server-startup-notes<p>I worked with different provisioning tools and configuration managers in the
last couple of years: Chef, SaltStack, Puppet, shell, Python, Terraform;
everything that allowed me to automate and describe my
infrastructure as code.</p>
<p>I really think that this is the right path, and every company needs to stop
persisting random commands on a server:</p>
<ul>
<li>The code used to describe your infrastructure is reusable.</li>
<li>The code is a good backup: you can put it in your repository to study how
it changed and to manage rollbacks.</li>
<li>Your servers become collaborative and your team can review what you do.</li>
</ul>
<p>Chef was my first configuration manager. I started to use it with Vagrant a
few years ago, but I never had the chance to deep dive into it and into a full
chef-server configuration from scratch.</p>
<p>I had this chance a few days ago and I am here to share some notes. I used
DigitalOcean to start one Chef Server and two nodes. In this post I am not
focused on the recipe and cookbook syntax, but I will share some commands and
notes that I took during my test to start and configure a Chef Server.</p>
<p>First of all, <code>doctl</code> is the command line application provided by
DigitalOcean to manage droplets and more; I used it to start my droplets.</p>
<p>The Chef Server doesn’t run in a little box: we need 2GB of RAM. I tried
with a smaller size, but nothing worked; the installation process went out of
memory very soon. Thanks, Ruby.</p>
<pre><code class="language-sh">$ doctl compute droplet create chef-server \
--region ams2 --size 2gb --image 20385558 \
--access-token $DO --ssh-keys $DO_SSH
$ doctl compute droplet create n1 \
--region ams2 --size 512mb --image 20385558 \
--access-token $DO --ssh-keys $DO_SSH
</code></pre>
<p><code>$DO</code> contains my DigitalOcean access token and <code>$DO_SSH</code> the ID of the SSH key
to log into the servers. You can leave the latter empty and you will receive
an email with the password.</p>
<p>When the process is done you will be able to copy the IP of the chef-server and SSH into it.</p>
<pre><code class="language-bash">$ doctl compute droplet ls
ID Name Public IPv4 Public IPv6 Memory VCPUs Disk Region Image Status Tags
30cw4230 chef-server 2gb 1 20 ams2 Debian 8.6 x64 new
q0514230 n1 512 1 20 ams2 Debian 8.6 x64 new
</code></pre>
<p>This provisioning script installs Chef Server from the official deb package.
Later we will also install chef-manage, which provides a nice web interface to
manage users, cookbooks and everything stored on the server.</p>
<pre><code class="language-bash">cd /tmp
sudo apt-get update
sudo apt-get install -y unzip curl
curl -LS https://packages.chef.io/stable/ubuntu/16.04/chef-server-core_12.9.1-1_amd64.deb -o chef-server-core_12.9.1-1_amd64.deb
sudo dpkg -i chef-server-core_12.9.1-1_amd64.deb
sudo chef-server-ctl reconfigure
</code></pre>
<p>Our server is up and running, reachable over HTTPS (port 443). This
configuration is just for testing purposes: it’s not good practice to leave a
Chef Server public as we are doing. It’s a better idea to put it behind a VPN,
for example.</p>
<p>Chef supports an authentication and authorization layer based on users and
organizations. We are creating a new user called <code>Gianluca Arbezzano</code> with username
<code>gianarb</code>, email <code>ga@thumpflow.com</code> and password <code>hellociaobye</code>.
We are also creating an organization and associating the user with it.</p>
<pre><code class="language-bash">chef-server-ctl user-create gianarb Gianluca Arbezzano ga@thumpflow.com 'hellociaobye' --filename /root/gianarb_test.pem
chef-server-ctl org-create tf 'ThumpFlow' --association_user gianarb --filename /root/tf-validator.pem
chef-server-ctl org-user-add tf gianarb
</code></pre>
<p>At this point we can configure a nice UI for our Chef Server with these
simple commands:</p>
<pre><code class="language-bash">sudo chef-server-ctl install chef-manage
sudo chef-server-ctl reconfigure
sudo chef-manage-ctl reconfigure --accept-license
</code></pre>
<p>Chef Server works with the concepts of organizations and users. An
organization is a group of users that share cookbooks, roles and so on. Users
can update cookbooks, and there is also a set of permissions to manage access
to particular resources, like:</p>
<ul>
<li>Adding a new node</li>
<li>Synchronizing cookbooks with the server</li>
<li>Adding new users</li>
</ul>
<p>At this point we have one user with their own key and credentials. You can
go back to the UI and use the username (gianarb) and password (hellociaobye) to
log in. The key (–filename) is used to configure knife and to encrypt
communication between client and server. There are three main actors at this
point that we need to know:</p>
<ul>
<li>Chef Server contains all our recipes and cookbooks; it’s the brain of the cluster.</li>
<li>Nodes are all the servers configured by Chef.</li>
<li>Workstations are machines enabled to synchronize and update cookbooks. For
example, Jenkins or your continuous integration system can push
every change to the server after each new commit.</li>
</ul>
<p>Chef Server has an HTTP API, and <code>knife</code> is a CLI that provides an
easy integration for your nodes and workstations. With this command we install
knife. You can do it in your local environment, to make it a workstation, and
on the server. (It’s usually good practice to create a user; we are doing
everything as root right now, but that’s BAD! Don’t be bad!)</p>
<p>We have two certificates: <code>gianarb_test.pem</code> identifies a specific
user, and we need to generate one for every workstation/member of the team; the
<code>validation_client</code> one represents the organization, and it can be the
same across multiple users.</p>
<pre><code class="language-bash">curl -O -L http://www.opscode.com/chef/install.sh
bash ./install.sh
</code></pre>
<p>You can copy the two keys to your local machine and run this command, which
will walk you through creating a <code>~/.chef/knife.rb</code> file that the
CLI uses to communicate with the Chef Server.</p>
<pre><code class="language-bash">knife configure
</code></pre>
<p>This is an example of the knife configuration file generated on my
server. I lost time understanding <code>chef_server_url</code>: it contains the
hostname of the server but also the <code>/organizations/&lt;organization_short_name&gt;</code> path.
Be careful with this, or knife will come back with an HTML response in your terminal.</p>
<pre><code class="language-ruby">log_level :info
log_location STDOUT
node_name 'gianarb'
client_key '/root/gianarb_test.pem'
validation_client_name 'tf-validator'
validation_key '/root/tf-validator.pem'
chef_server_url 'https://chef-server:443/organizations/tf'
syntax_check_cache_path '/root/.chef/syntax_check_cache'
cookbook_path ["/home/gianarb/git/chef-pluto/cookbooks"]
ssl_verify_mode :verify_none
</code></pre>
<p>The last two commands download and validate the SSL certificate: in the
default configuration the CA is unofficial, and we need to force our client to
trust the cert.</p>
<pre><code class="language-bash">knife ssl fetch
knife ssl check
</code></pre>
<p>Now that we have done that on our server and in our local environment, we
can clone <a href="https://github.com/gianarb/chef-pluto">chef-pluto</a>, a repository that contains the recipes, roles and
cookbooks to configure our node, and synchronize it to the server.</p>
<pre><code class="language-bash">git clone git@github.com:gianarb/chef-pluto.git /home/gianarb/git/chef-pluto
cd /home/gianarb/git/chef-pluto
knife upload /
</code></pre>
<p>The last command uploads our whole repository to the Chef Server. You can
log in to the web UI and see the <code>micro</code> cookbook and the <code>power</code> role.</p>
<p><a href="https://github.com/gianarb/micro">micro</a> is an application that I wrote in Go, and it just exposes the IP of
the machine. It’s a binary, and the cookbook downloads and starts it; pretty
straightforward.</p>
<p>At this point we need to provision our first node. It is usually the
Chef Server that installs and starts the Chef Client on the node, so what we
can do is store a private key on the server to allow Chef to connect to the
node. I copied the DigitalOcean private key to the server (~/do); from a
security point of view you may want to create a dedicated one. You can also use
the -P option to pass a password if you are not using an SSH key to run this
example.</p>
<pre><code class="language-bash">knife bootstrap &lt;ip-node&gt; -N node1 --ssh-user root -r 'role[power]' -i ~/do
</code></pre>
<p>If everything goes well you can reach the application on port <code>8000</code> in
your browser. The log looks something like this:</p>
<pre><code class="language-bash">$ knife bootstrap 95.85.52.211 -N testNode --ssh-user root -r 'role[power]' -i ~/do
Doing old-style registration with the validation key at /root/tf-validator.pem...
Delete your validation key in order to use your user credentials instead
Connecting to 95.85.52.211
95.85.52.211 -----> Existing Chef installation detected
95.85.52.211 Starting the first Chef Client run...
95.85.52.211 Starting Chef Client, version 12.15.19
95.85.52.211 resolving cookbooks for run list: ["micro"]
95.85.52.211 Synchronizing Cookbooks:
95.85.52.211 - micro (0.1.0)
95.85.52.211 Installing Cookbook Gems:
95.85.52.211 Compiling Cookbooks...
95.85.52.211 Converging 2 resources
95.85.52.211 Recipe: micro::default
95.85.52.211 * remote_file[Download micro] action create_if_missing (up to date)
95.85.52.211 * service[Start micro] action start
95.85.52.211 - start service service[Start micro]
95.85.52.211
95.85.52.211 Running handlers:
95.85.52.211 Running handlers complete
95.85.52.211 Chef Client finished, 1/2 resources updated in 02 seconds
</code></pre>
<p>knife started the client, synchronized the cookbooks, assigned the <code>power</code> role
to the node and ran the correct recipes. Your server is ready, and you can
create and delete nodes to make your infrastructure as complex as you like.</p>
<p>Chef is quite old and it’s written in Ruby (the first could be a plus, the
second not really), but it continues to be a good way to provision
your infrastructure. Lots of people moved to Ansible, but the agent they
reject offers a very good orchestration feature, which is something I
usually look for.</p>
<p>I have worked with SaltStack and it’s very nice: the syntax is easy
and it seems less expensive in terms of configuration, resources and setup, but
I am not really sure about the YAML specification. I am not a Ruby developer
and I don’t love the Ruby syntax, but in the end it is a programming language
and I am doing infrastructure as code.</p>
Docker The fundamentalsDocker The fundamental is the second chapter of my book Scale Docker. Drive your boat like a captain. I decided to share free the second chapter of the book. It covers getting started with Docker. It's a good tutorial for people that are no idea about how container means and how docker works.https://gianarb.it/img/docker.png2016-08-25T12:08:27+00:002016-08-25T12:08:27+00:00https://gianarb.it/blog/docker-the-fundamentals<p>I am writing a book about Docker SwarmKit and how to manage a production
environment for your containers.</p>
<p>The second chapter of the book is a getting started with Docker: it covers
basic concepts about what a container is, and it’s a starting point for
understanding the concepts expressed in the book.</p>
<h2>Drive your boat like a Captain.
<small>Docker in production</small></h2>
<p>The book is a work in progress, but you can find more information on the site
<a href="/blog/scaledocker">scaledocker.com</a>.</p>
<p>To receive the first chapter for free, leave your email and, if you like, your Twitter account:</p>
<div class="row">
<div class="col-md-6">
<img src="/img/the-fundamentals.jpg" class="img-fluid" />
</div>
<div class="col-md-4">
<form id="get-chapter">
<div class="form-group">
<label for="exampleInputEmail1">Email address *</label>
<input type="email" class="form-control" required="required" id="email" placeholder="Email" />
</div>
<div class="form-group">
<label for="exampleInputPassword1">Twitter</label>
<input type="title" class="form-control" id="twitter" placeholder="@gianarb" pattern="^@.*" />
<p class="help-block">The first letter needs to be a @</p>
</div>
<p class="text-success get-chapter-thanks">Check your email! Thanks!</p>
<p class="text-warning get-chapter-sorry"><span class="err-text"></span>.
Please notify the error with a comment or with an email</p>
<button class="btn btn-default">Get your free copy</button>
</form>
</div>
</div>
<h2>Contents</h2>
<ol>
<li>Introduction</li>
<li>Install Docker on Ubuntu 16.04</li>
<li>Install Docker on Mac</li>
<li>Install Docker on Windows</li>
<li>Run your first HTTP application</li>
<li>Docker engine architect</li>
<li>Image and Registry</li>
<li>Docker Command Line Tool</li>
<li>Volumes and File Systems 20</li>
<li>Network and Links</li>
<li>Conclusion</li>
</ol>
<p>Enjoy your reading and leave me a feedback about the chapter!</p>
<script>
(function() {
$(".get-chapter-thanks").hide();
$(".get-chapter-sorry").hide();
var api = "https://1lkdtyxdx4.execute-api.eu-west-1.amazonaws.com/prod";
$("#get-chapter button").click(function(eve) {
eve.preventDefault()
$(".get-chapter-thanks").hide();
$(".get-chapter-sorry").hide();
var requestChapter = $.ajax({
"url": api+"/the-fundamentals",
"type": 'post',
// The declared application/json content type requires a JSON-encoded
// body; a plain object would be sent form-encoded by jQuery.
"data": JSON.stringify({
email: $("#email").val(),
twitter: $("#twitter").val()
}),
"dataType": 'json',
"contentType": "application/json"
});
requestChapter.done(function() {
$(".get-chapter-thanks").show();
});
requestChapter.fail(function(data) {
$('.err-text').html("["+data.responseJSON.code+"]"+ data.responseJSON.text);
$(".get-chapter-sorry").show();
});
});
})();
</script>
Be smart like your healthcheckIn a distributed system, having a simple way to know the status of a server helps you understand whether it's ready to go to production. Healthchecks are simple and common, but designing a good one helps you avoid strange behaviors. Docker 1.12 supports healthchecks, and in this post I share an example implementation.https://gianarb.it/img/docker.png2016-08-25T12:08:27+00:002016-08-25T12:08:27+00:00https://gianarb.it/blog/be-start-like-your-healthcheck<p>I am not a doctor, I am a software engineer and this is a tech post! You can
continue reading!</p>
<p>To monitor a monolith, what we usually do is install a tool
like <a href="https://www.nagios.org/">Nagios</a> to centralize all our metrics and
stay in touch with our infrastructure and our application. In a distributed
system, with more than one service, each with its own metrics, the situation is
totally different, mostly because it is far more dynamic than a monolith:
containers and VMs scale up and down and move around the network. Is Nagios a
good solution to check whether our new service, after a deploy, is safe and
ready to be attached to the production pool? I love a talk given by <a href="https://github.com/kelseyhightower">Kelsey
Hightower</a> at Monitorama: he
speaks about healthchecks, so watch it for a <a href="https://vimeo.com/173610242">great demo</a>!</p>
<p>A healthcheck is an API that your service exposes to share its status; if you
make it really smart, it is a good tool for understanding the state of your
service with a single call. A service can be ready or not, and it is in the best
position to communicate its own status. It’s like a patient: you need to ask
them everything you need in order to make the best diagnosis and take a decision.</p>
<p>Let’s stay focused on a REST service that exposes an API under the route
/health. The response can have two different status codes:</p>
<ul>
<li>200 if everything is good and your service is ready</li>
<li>500 if there is something wrong and your service is not ready</li>
</ul>
<p>To make a smart healthcheck, what do we need to check?</p>
<p>This is a real implementation:</p>
<pre><code class="language-php"><?php
echo 1;
</code></pre>
<p>It’s better than nothing, but we are looking for something smart! We need to
check all the dependencies that our service has, and that’s why the service
itself is the best actor for the job: it knows what it needs to be ready.
I wrote a demo service called <a href="https://github.com/gianarb/micro/blob/master/handle/health.go">micro</a>; it’s written in Go, and
version 2 uses
MySQL.</p>
<pre><code class="language-go">func Health(username string, password string, addr string) func(http.ResponseWriter, *http.Request) {
	return func(w http.ResponseWriter, r *http.Request) {
		res := healtResponse{Status: true}
		httpStatus := 200
		dsn := fmt.Sprintf("%s:%s@tcp(%s:3306)/micro", username, password, addr)
		db, err := sql.Open("mysql", dsn)
		if err != nil {
			// A malformed DSN is a bug in the service, not an unhealthy
			// dependency: report it instead of killing the process.
			http.Error(w, err.Error(), 500)
			return
		}
		defer db.Close()
		// Ping opens a real connection, so it verifies both the network
		// and that the database named in the DSN exists.
		if err := db.Ping(); err != nil {
			res.Status = false
			res.Info = map[string]string{"database": err.Error()}
		}
		c, _ := json.Marshal(res)
		if !res.Status {
			httpStatus = 500
		}
		log.Printf("%s called /health", r.Host)
		// Headers must be set before WriteHeader, otherwise they are ignored.
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(httpStatus)
		w.Write(c)
	}
}
</code></pre>
<p>It doesn’t matter how many dependencies your service has, you need to check all
of them: databases, and any other services it uses. In my case I decided to add
a key-value field, which I called <code>info</code>; it contains some information about
whether MySQL is working or not, in order to make debugging easy. If the
service that you are checking has a healthcheck of its own, you are lucky! You
can use that endpoint to know whether your dependency is fine. If you are not
so lucky, you can create a wrapper or just check whether you can reach the
service; in my case I simply tried to connect to MySQL to verify that the
network supports me. I also use the correct database name, in order to avoid
edge cases like “MySQL is up but the database doesn’t exist”.</p>
<p>The ecosystem supports healthchecks! Nginx uses them to know whether a server
is reachable; if the healthcheck fails for a while, it takes the server out of
the pool for some time. The same goes for Kubernetes, Swarm and Docker. Docker
provides a Go <a href="https://github.com/docker/go-healthcheck">healthcheck
framework</a> that you can use in your
applications; it is also used in Docker 1.12.</p>
<p>You can describe a healthcheck in your Dockerfile:</p>
<pre><code>HEALTHCHECK CMD ./cli health
</code></pre>
<p>If the exit code is 0, Docker marks your container as healthy; if it is anything else, as unhealthy.
Very easy and flexible: you can check your REST healthcheck this way</p>
<pre><code>HEALTHCHECK --interval=30s --timeout=30s --retries=3 \
CMD curl -si localhost:8000/health | grep 'HTTP/1.1 200 OK' > /dev/null
</code></pre>
<p><code>--interval</code> is the time between two healthchecks, <code>--timeout</code> marks as
unhealthy a check that doesn’t come back in time (30s in this case), and
<code>--retries</code> is the number of attempts to make before marking a container unhealthy.</p>
<p>A healthcheck doesn’t replace a traditional monitoring system, but with a lot
of instances and services, having a single point to check and understand the
situation after a deploy makes your life easy and your products stable.</p>
Build opinions to become wiseAs a software engineer you need to build your own opinions about different topics: Linux or Windows? Editor or IDE? Containers or VMs? There are different developers just for this reason. Making, sharing and changing your opinions is the best way to grow, not only as a developer but also as a human.https://gianarb.it/img/myselfie.jpg-large2016-08-25T12:08:27+00:002016-08-25T12:08:27+00:00https://gianarb.it/blog/build-opinions-to-become-wise<p>I am Gianluca, I am 24 years old and I work as a software engineer.</p>
<p>I like my work because there are really a lot of different kinds of software
engineers, mainly because you can work in pretty much every field:
technology, food, cooking, sport, fashion.</p>
<p>Also because there is a lot of work to do: if you like working on a product,
you build features that make other people happy to buy a car or to read news on
an online newspaper.</p>
<p>If you like playing with cables, racks and switches, you can work in a big or
small server farm and design fancy datacenters.</p>
<p>But I mostly like this work because a lot of people have opinions. There are
opinions built on top of deep study or long experience, and everyone is happy to
share them and use them to build a new product.</p>
<p>I am a good PHP developer because I worked with this language for a while and
I tried to catch and verify good opinions from a lot of developers who come
from different parts of the world and from different experiences.</p>
<p>I am happy to share some of them with you:</p>
<pre><code class="language-php"><?php
namespace Opinion\One;
class Good {
private $important;
public function __construct($somethingThatIReallyNeed)
{
$this->important = $somethingThatIReallyNeed;
}
}
</code></pre>
<p>Usually I force myself (and, when I can, other people :P ) to inject objects
through the constructor when our object (Good) cannot work without them.</p>
<p>This is a good way to be sure that your object will be complete, because you cannot forget anything!</p>
<p>I used Zend Framework for a long time and I remember one interface,
<code>ServiceLocatorAwareInterface</code>.</p>
<p>I have an opinion: I hate this class.</p>
<p>If you implement this interface, your service has a service locator! It’s
powerful today, when you need to finish your ticket and walk away with a new
feature.
But after a few months, a lot of people improve this service and start to use a
lot of other services without thinking about the reason you built your service
in the first place, just because it’s really simple to get random services from
a service locator.</p>
<p>Be wise, don’t let people write bad code: use your constructor to inject your dependencies!!</p>
<p>I also have architecture opinions, like:</p>
<p>When you think “How can I solve this problem?”, you should start from
design patterns! They are documented and tested by a lot of developers across
many use cases.</p>
<p>Imagine a colleague who comes to you around 5pm to ask how an entire
library is designed; you can just reply “It’s just a SOAP client, see you
tomorrow!” and go play basketball.</p>
<p>Or, if you are building an API, OAuth2 could be a good choice for your
authentication service. It’s tested, and there are a lot of clients and
documentation.
Your clients will be glad to know that you are just using
OAuth2 and nothing strange.</p>
<p>Well, I am not here to share all my opinions in a single post; all of them are
just examples of what I mean by opinion.</p>
<p>An opinion is really important! But it’s just an opinion!</p>
<p>As a software engineer you need to have opinions, because every day people
will try to have opinions for you: microservices, containers, one big
repository, Go or Rust.</p>
<p>Building an opinion takes experience and study, and that’s a big effort, but
to be really wise you also need to stay ready to change your
opinion.</p>
<p>In my opinion this is the main difference between smart and wise people. I
prefer the second one!</p>
<div class="alert alert-success" role="alert">
Thanks for your review <a href="https://twitter.com/fntlnz" target="_blank">Lorenzo</a>!</div>
Watch demo about Docker 1.12 made during Docker MeetupDocker 1.12 contains a lot of news about orchestration and production. During the August Docker meetup in Dublin I presented, with a demo, a set of new features in this release.https://gianarb.it/img/docker.png2016-08-24T12:08:27+00:002016-08-24T12:08:27+00:00https://gianarb.it/blog/docker-1-12-meetup-dublin<p>In August, during the Docker meetup, I presented a demo of some new
features provided by Docker 1.12.</p>
<p>It’s an important release because it improves your experience with Docker
in production, with an orchestration framework included in the product.</p>
<p>Docker provides a new set of commands to create a cluster of Docker
daemons and manage a production environment.</p>
<p>It’s something like Kubernetes, Mesos or Swarm, but it is included and
built into Docker.</p>
<p>I wrote an article about it a few months ago: <a href="/blog/docker-1-12-orchestration-built-in">“Docker 1.12 orchestration
built-in”</a>.</p>
<p>In this demo I give an introduction to some of the new features:</p>
<div style=" text-align: center;">
<iframe width="420" height="315" src="https://www.youtube.com/embed/h7a7vhzjElo" frameborder="0" allowfullscreen=""></iframe>
</div>
<ul>
<li>How to create a swarm mode Docker cluster</li>
<li>What is a service? What are tasks?</li>
<li>How does Docker SwarmKit handle a node going down?</li>
<li>I tried to show the healthcheck feature :)</li>
<li>How Docker SwarmKit manages container updates</li>
<li>Service discovery</li>
</ul>
“Microservices and common parts"When you think about microservices and distributed systems there are a lot of parts that all your services usually require: logging, monitoring, testing, distribution. Managing them in the best way is one of the reasons for the success of your distributed system. In this article I share a few of these parts, with some feedback on designing them well.https://gianarb.it/img/distributed_system_planet.png2016-08-14T12:08:27+00:002016-08-14T12:08:27+00:00https://gianarb.it/blog/services-and-common-parts<p>Changing my glossary and replacing the concept of application with
service could be a buzzword, but it has led me to build a
new approach to my work.</p>
<p>Nowadays many products require more services than before in order to
work: they could be modules or libraries directly integrated into one or
more services, or applications that communicate and provide a feature; it
doesn’t matter, because in any case the product will have some dependencies.</p>
<p>If you start to follow this path, a lot of recurring concerns will show up in your
product:</p>
<ul>
<li>Monitoring</li>
<li>Logging</li>
<li>Authentication</li>
<li>Scaling</li>
<li>High availability</li>
<li>Distribution</li>
<li>Testing (unit, functional, integration..)</li>
<li>And others</li>
</ul>
<p>Some of them, like monitoring or logs, require architecture and tool
selection: you can use some B2B tools or host something in house. It’s not
only a problem of tooling; the other face of this coin is how your
services can communicate logs and metrics to the outside in a clean and reusable
way.</p>
<p>In this post I will try to share the common parts of a
microservices ecosystem and some possible approaches to solving these issues.</p>
<h2 id="logging">Logging</h2>
<p>All applications require a good, strong logging system. There are a few
libraries able to help you manage this part, but the minimum requirements, in my opinion, include:</p>
<ul>
<li>Support for multiple streams: usually I use stdout or a file and move
the logs into a database with a separate pipeline, but a lot of good libraries allow you to send your logs to different collectors.</li>
<li>Different levels, like INFO, DEBUG, WARNING, FATAL.</li>
<li>A way to change this level at runtime, for example with an RPC call.</li>
</ul>
<p>The third point is really important: if your application starts to receive
heavy traffic, the amount of logs you must manage will be significant; changing
this level at runtime lets you control the amount of logs you store, enabling DEBUG
information only when you need to do some specific debugging in production. This strategy saves storage and money.</p>
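<p>A minimal sketch of that third point in Go, a log level that can be flipped at runtime over an HTTP endpoint (the endpoint name and level set are illustrative, not taken from any specific library):</p>

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"sync/atomic"
)

// Log levels, ordered by severity.
const (
	DEBUG int32 = iota
	INFO
	WARNING
	FATAL
)

// current holds the minimum level that gets written; it is atomic so the
// admin endpoint can change it while requests are being served.
var current atomic.Int32

// parseLevel maps a level name to its numeric value.
func parseLevel(name string) (int32, bool) {
	switch strings.ToUpper(name) {
	case "DEBUG":
		return DEBUG, true
	case "INFO":
		return INFO, true
	case "WARNING":
		return WARNING, true
	case "FATAL":
		return FATAL, true
	}
	return 0, false
}

// logf writes the message only if its level is at or above the current one.
func logf(level int32, format string, args ...interface{}) {
	if level >= current.Load() {
		fmt.Printf(format+"\n", args...)
	}
}

func main() {
	current.Store(INFO)
	// e.g. curl 'localhost:8080/loglevel?level=DEBUG' during an incident,
	// then back to INFO once the debugging session is over.
	http.HandleFunc("/loglevel", func(w http.ResponseWriter, r *http.Request) {
		lvl, ok := parseLevel(r.URL.Query().Get("level"))
		if !ok {
			http.Error(w, "unknown level", http.StatusBadRequest)
			return
		}
		current.Store(lvl)
		w.WriteHeader(http.StatusNoContent)
	})
	logf(DEBUG, "dropped while the level is INFO")
	logf(INFO, "this line is written")
	// http.ListenAndServe(":8080", nil) // uncomment to expose the endpoint
}
```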
<p>There are a lot of services and open source tools able to manage and store this data. The real issue is deciding which road to follow.</p>
<p>Are you interested in managing your logs yourself, or is it too big an effort for your company? In the latter case you can move everything to a hosted service like Logentries and forget about Elasticsearch, Kibana
and the like. Think about your environment and pick the best solution, and remember that it could be just a temporary one: when you start a business you have different priorities, so start slim and easy.</p>
<h2 id="monitoring">Monitoring</h2>
<p>Several services require a lot of time and energy to be monitored and kept
alive.</p>
<p>The best way to do that is with a time series
database like Prometheus or InfluxDB, or an as-a-service solution like New Relic
or AppDynamics.</p>
<p>The real problem is how your application can provide
metrics that are readable and usable by external systems. You can find a very good solution to this problem in Docker: it provides different streams and events to grab this kind of information.</p>
<p>If you take a look at how it manages this part,
you can implement a good system in your application. A stream of events is
also a good API to let other services consume the features provided by your
service.</p>
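<p>As a sketch of the same idea, assuming nothing more than the Go standard library: the service keeps its own counters and exposes them over HTTP so an external collector can scrape them. All names here are illustrative:</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// metrics is a tiny in-process counter registry; a real service would
// likely use a Prometheus or InfluxDB client, but the shape is the same:
// the service itself publishes readable metrics.
type metrics struct {
	mu       sync.Mutex
	counters map[string]int64
}

func (m *metrics) Inc(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counters[name]++
}

// Snapshot returns a copy that is safe to serialize while counters move.
func (m *metrics) Snapshot() map[string]int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make(map[string]int64, len(m.counters))
	for k, v := range m.counters {
		out[k] = v
	}
	return out
}

func main() {
	m := &metrics{counters: map[string]int64{}}
	m.Inc("http_requests")
	m.Inc("http_requests")
	// GET /metrics returns the counters as JSON for an external system.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(m.Snapshot())
	})
	b, _ := json.Marshal(m.Snapshot())
	fmt.Println(string(b)) // {"http_requests":2}
}
```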
<h2 id="heahtcheck">Healthcheck</h2>
<p>Understanding with a single request whether your application has everything it
needs to work is really important.</p>
<p>The microservices ecosystem contains a lot of small applications that change and have dependencies. How can you understand whether your whole system is up and running without spending a lot of time?</p>
<p>You can give each service a <code>/health</code> endpoint that returns 200 if everything is fine and 500 if something is not working properly.</p>
<p>During a release you can use this endpoint to understand whether your
service is ready to be attached to the production pool.</p>
<p>In practice, if you have one service called Users that depends on MySQL and
on another service like Emailer, the health endpoint of the Users service
will check whether it can connect to MySQL, and it can also call <code>/health</code> on
Emailer to check whether that service is up.</p>
<p>Your orchestration and deploy framework can check after each deploy whether the healthcheck is up and running, and manage your release accordingly: it can revert, or simply not include your new release in the production pool.</p>
<h2 id="authentication">Authentication</h2>
<p>Your microservice is not public; sometimes you have a set of firewall rules or
strong network settings to manage the security of your environment, but for other
services the authentication layer is a requirement, and usually there are a few
services that need to know the identity of the user performing
an action.</p>
<p>Think about a To Do service: it needs to know the identity of the user in order
to fetch the correct items.</p>
<p>For this reason this layer could be shared between your services, and it is also a
critical section of your architecture, because the security of
your application and users usually depends on it.</p>
<p>OAuth2 is a framework to manage authentication, and I recommend it
because its documentation is already written; it’s a standard.
You don’t reinvent anything, and there are a lot of libraries and use cases around it that make it solid, flexible and reusable.</p>
<h2 id="automation-and-deploy">Automation and Deploy</h2>
<p>A good layer of automation is important in every ecosystem, to make your work
less boring but also to decrease the chance of a human making a mistake during a
repetitive task.</p>
<p>If you are thinking about a microservices ecosystem, all these problems are
multiplied by a big number of applications.</p>
<p>Without a good layer of automation and a good deploy flow, you will spend all
your day pushing lines of code to production, without time to stay focused on
new features or other business requests.</p>
<h2 id="documentation">Documentation</h2>
<ul>
<li>Describe the topology of your ecosystem: how many microservices do you have? Where are they, and how are they distributed across your datacenters?</li>
<li>Make the documentation extensible and easy to read and update.</li>
<li>Describe how each single service works: which APIs does it expose, and how can another service communicate with it?</li>
<li>The dependencies of each microservice are also important to know.</li>
</ul>
<p>All the common parts (logs, auth, metrics) help you to have
common documentation that is easy to maintain, read and implement, but for each
service you must provide specific documentation, because everything is clear
today, but in a few months, after you have worked on ten other services, the
situation could be really different.</p>
<p>One of the goals of microservices is the possibility to add
and integrate them easily. Documentation is one of the things that makes this
possible and efficient.</p>
<h2 id="communication-layer">Communication Layer</h2>
<p>A lot of companies have a single communication layer in their
environment: JSON and REST. It’s a good choice, easy to implement, and there are
also a lot of tools to test it, document it and create client libraries.</p>
<p>But HTTP/REST is not the only way to expose features outside of your service; this is
really important to know.</p>
<p>There are other efficient and less expensive solutions; binary protocols are
one of them.</p>
<p>We could talk about all these topics for years, which is why I have
other posts planned to analyze some of these points in more depth.</p>
<p>Please let me know if, in your experience, there are other common parts between
your services.</p>
What Distributed System meansI will talk about service discovery, microservices, containers, virtual machines, schedulers, cloud, scalability and latency. I hope to have, at the end of this experience, a good number of posts to share what I know and how I approach this kind of challenge.https://gianarb.it/img/distributed_system_planet.png2016-07-12T16:08:27+00:002016-07-12T16:08:27+00:00https://gianarb.it/blog/distributed-system-means<p>I chose to put my experience with distributed systems into a series of blog
posts in which I’ll cover different topics.</p>
<p>I will talk about service discovery, microservices, containers, virtual
machines, schedulers, cloud, scalability and latency. I hope to have, at the end
of this experience, a good number of posts to share what I know and how
I work on and approach this kind of challenge.</p>
<p>First of all, I will not say anything new; in fact, a distributed system means:</p>
<blockquote>A distributed system consists of a collection of autonomous computers,
connected through a network and distribution middleware, which enables
computers to coordinate their activities and to share the resources of the
system, so that users perceive the system as a single, integrated computing
facility.
<p><a href="https://www0.cs.ucl.ac.uk/staff/ucacwxe/lectures/ds98-99/dsee3.pdf" target="_blank">Wolfgang Emmerich, 1997</a></p>
</blockquote>
<p>The Internet is a distributed system, and your infrastructure is usually a
distributed system too, if you follow the minimum requirements for making your
services highly available.</p>
<p>First of all, I love what service means: your application is a service.
Microservices are just a way to remind people that a little application is
easy to maintain, deploy and control, but the idea, in my opinion, is just to make
something autonomous and useful for your customers. Sometimes your customer is
a human; in other cases it could be another service provided by yourself or by a
third party. It is not really important; what is important is that your service
must be ready to communicate with the outside.</p>
<p>Distributing your system is important to make it available: if you close your
service inside a single datacenter in a single part of the world, you take the
risk of making it unavailable in case of a problem in that particular area; if you
distribute your service across different locations, you increase your chances of
staying up.</p>
<p>You are also mitigating the latency around your system, because you are bringing
your application closer to your customers, and if you have worldwide traffic
that parameter is really important.</p>
<p><img alt="Internet Global Submarine map" src="/img/global-submarine-cable.jpg" class="img-fluid" /></p>
<p>This is the map of the submarine cables (2014), and we all know that the
Internet is not in the air: serving different points in the world requires
different amounts of time to get a response, and it is not just a matter of
distance; traffic and the quality of the network carry their own weight. Akamai
is an expert on this topic: it provides a content delivery service (CDN) and
also a monitoring system for the status of the network. They publish different
data, one of which describes the <a href="https://www.akamai.com/us/en/solutions/intelligent-platform/visualizing-akamai/real-time-web-monitor.jsp">high-level status of the Internet</a>.</p>
<p>Virtualization, containers, cloud computing and, in general, the low price of
designing an infrastructure, together with the growth of Internet users, allow a
little company with a little budget to create something stable, secure and
available in different parts of the world. I think this is why microservices
and distributed systems are starting to have a big impact on the industry.</p>
<p>A good exercise to understand the current situation could be to design a little
cross-provider infrastructure, in multiple datacenters, to support a normal blog
with a database and an application. With a couple of servers on different cloud
providers you can create a highly available, distributed system across
multiple datacenters and avoid a lot of points of failure, like geographic
disasters or provider errors.</p>
<p>Docker, OpenStack, AWS, Consul, Prometheus, Elasticsearch and MongoDB are just
a few of the products that help us create something really stable and useful.
Continuous delivery, high availability, disaster recovery, monitoring, continuous
integration and reliability are a subset of the topics you must address when you
think about distributed systems, because you cannot assume anything about where
the instances of your applications are around the world, and the network is not a
paradise of stability. Microservices help you create better, more stable
applications; they allow your company to create more room for more developers and to
replace single pieces and features, but they create other kinds of problems:
architectural complexity, the need for good knowledge of different layers (a DevOps
point of view), networking, and chains of failures. All the topics that we already
know, such as monitoring, logging and deployment, must be adapted to this new
architecture.</p>
Symfony and InfluxDB to monitor PHP applicationHow to monitor your Symfony and PHP applications with InfluxDB.https://gianarb.it/img/influx.jpg2016-07-02T10:08:27+00:002016-07-02T10:08:27+00:00https://gianarb.it/blog/symfony-and-influxdb-to-monitor-php-applications<p>Symfony is one of the most famous PHP frameworks in use right now; today we are
going to use it to understand how important it is to know how one of our
features performs. We don’t monitor CPU usage, disk I/O or the number of
server errors here: we monitor the final feature from the business point
of view.
This approach is very important because understanding the
impact of a new release on a specific, critical feature is the best way to
understand how customers use our service.</p>
<p>In this article we implement
monitoring for one of the most common business requirements:
authentication.</p>
<p>The goal is to understand how many people try to log in, track
how many of them fail authentication, and use these metrics to
understand how the system evolves.
Sometimes, right after a deploy,
the number of failed logins grows faster than usual; this can be a sign that
the feature doesn’t work as expected. We begin from the standard Symfony
application:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ composer create-project symfony/framework-standard-edition influxdb_app
$ cd influxdb_app/
$ php bin/console server:run</code></pre></figure>
<p>We create a route under authorization (/admin), which is private and visible
only to admin users, and one public homepage (/).</p>
<p>You can follow the official tutorial, or take this step from the sample Symfony
application directly on GitHub. We have an admin panel and a public site; the
idea is to use the InfluxDB PHP SDK to understand how this feature works. We use
the Dependency Injection Container (DiC) provided by Symfony to create our
influxdb_client.</p>
<p>Go into the project’s root and use Composer to install the library: <code>composer
require influxdb/influxdb-php</code>. The first thing to do is add some parameters,
the host and port of our InfluxDB: open app/config/parameters.yml and
add these fields:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">influxdb_host: 127.0.0.1
influxdb_port: 8086
influxdb_db: symfony_influx</code></pre></figure>
<p>We use the REST API to send metrics to InfluxDB; if your connection
params are different, please change them.</p>
<p>The second step is to configure Symfony’s
DiC so we can get our client anywhere in the application; open
app/config/services.yml and add these lines.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">services:
# ...
influxdb_client:
class: InfluxDB\Client
arguments: ['%influxdb_host%', '%influxdb_port%']
influxdb_database:
class: InfluxDB\Database
arguments: ['%influxdb_db%', '@influxdb_client']</code></pre></figure>
<p>With this specification we are asking the DiC to provide an influxdb_client;
it’s an InfluxDB\Client object with two constructor parameters: influxdb_host
and influxdb_port.</p>
<p>InfluxDB can host different databases; influxdb_database is a
service that uses the influxdb_client to work with a single database, influxdb_db.
Now we have an influxdb_database ready to be used! Just to check that everything
works fine, open DefaultController and try to send a page view metric:</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php">/**
* @Route("/", name="homepage")
*/
public function indexAction(Request $request)
{
$result = $this->get("influxdb_database")->writePoints([new Point(
'page_view', // name of the measurement
1 // the measurement value
)]);
// replace this example code with whatever you need
return $this->render('default/index.html.twig', [
'base_dir' => realpath($this->getParameter('kernel.root_dir').'/..'),
]);
}</code></pre></figure>
<p><img class="img-fluid" alt="InfluxDB admin panel" src="/img/influxdb_admin.png" /></p>
<p>Go to the homepage and in the meantime run a query like SELECT * FROM
“symfony_influx”.”“.page_view in the InfluxDB admin panel: you are sending
a new point on each visit! Very good, but we have another target! If you
have some problem and you are using my repository, see the difference between
this and the last step on GitHub.</p>
<p>Sending a point this way is not good
practice, because our controller has two responsibilities: rendering the page
and sending a point. In this example the situation is not dangerous, because the
application is very simple and has very low traffic, but Symfony provides a
strong event system, perfect for splitting the logic into different classes and
simplifying our code. We follow this approach for our last step: we create
a listener that sends a point when a user fails a login. First we create
the listener in src/AppBundle/Listener/MonitorAuthenticationListener.php.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
namespace AppBundle\Listener;
use Symfony\Component\Security\Core\Event\AuthenticationFailureEvent;
use InfluxDB\Point;
class MonitorAuthenticationListener
{
private $database;
public function __construct($database)
{
$this->database = $database;
}
public function onFailure(AuthenticationFailureEvent $event)
{
$this->database->writePoints([new Point(
'login',
1,
['status' => 'error']
)]);
}
}</code></pre></figure>
<p>We use the DiC to attach this listener to the security.authentication.failure
event, which is fired after each failed login. To do that, open
app/config/services.yml and add this configuration.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">services:
# ....
security.authentication.monitoring:
class: AppBundle\Listener\MonitorAuthenticationListener
arguments: ['@influxdb_database']
tags:
- { name: kernel.event_listener, event: security.authentication.failure, method: onFailure }</code></pre></figure>
<p>We inject our InfluxDB database into the constructor and use
it to send points, just like in the earlier controller example. This is the last
practical section of this tutorial; if you have lost something, check
the diff from the last step on GitHub. Try some wrong logins and
check the situation in the admin panel with a query like</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">SELECT * FROM "symfony_influx"."".login.</code></pre></figure>
<p><img class="img-fluid" alt="InfluxDB admin panel" src="/img/chronograf.png" /></p>
<p>The admin panel is not the best way to check our metrics. InfluxData provides a
great dashboard called Chronograf; try to use this metric to create a
graph that shows how your feature works. This post is only a getting-started
guide to sending metrics without connecting your
business logic directly to the monitoring system; with real traffic this
naive approach is totally inefficient.</p>
<p>Sending points one by one increases the traffic on your network, and the
latency creates performance problems. Telegraf is a collector you can use to
mitigate this: instead of sending your points directly to InfluxDB, you install
this agent on your server and it collects your data and sends it in bulk for
you.</p>
Docker 1.12 orchestration built-inDocker 1.12 adds different new features around orchestration, scaling and deployment, in this article I am happy to share some tests I did with this versionhttps://gianarb.it/img/docker.png2016-06-20T10:08:27+00:002016-06-20T10:08:27+00:00https://gianarb.it/blog/docker-1-12-orchestration-built-in<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">Some
tests with Docker 1.12! <a href="https://t.co/budUOtMuBB">https://t.co/budUOtMuBB</a> <a href="https://twitter.com/hashtag/docker?src=hash">#docker</a> <a href="https://twitter.com/hashtag/DockerCon?src=hash">#DockerCon</a>
orchestration, swarm and services.</p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/744977855277309953">June 20,
2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>During DockerCon 2016, Docker announced the Docker 1.12 release.
One of the big news items in this version is the orchestration system built
directly into the engine: this feature lets us use Swarm without installing it
separately, since it is now provided by Docker itself.</p>
<p>Now we have a new set of commands that allow us to orchestrate containers
across a cluster.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker swarm
docker node
docker service</code></pre></figure>
<p>All these commands are focused on increasing our ability to orchestrate our
containers and to group them into services.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">#!/bin/bash
# create swarm manager
docker-machine create -d virtualbox sw1
echo "sudo /etc/init.d/docker stop && \
curl https://test.docker.com/builds/Linux/x86_64/docker-1.12.0-rc2.tgz | \
tar xzf - && sudo mv docker/* /usr/local/bin && \
rm -rf docker/ && sudo /etc/init.d/docker start" | \
docker-machine ssh sw1 sh -
docker-machine ssh sw1 docker swarm init
# create another swarm node
docker-machine create -d virtualbox sw2
echo "sudo /etc/init.d/docker stop && \
curl https://test.docker.com/builds/Linux/x86_64/docker-1.12.0-rc2.tgz | \
tar xzf - && sudo mv docker/* /usr/local/bin && \
rm -rf docker/ && sudo /etc/init.d/docker start" | \
docker-machine ssh sw2 sh -
docker-machine ssh sw2 docker swarm join $(docker-machine ip sw1):2377</code></pre></figure>
<p>Another Docker Captain wrote this script, which I updated to work with the
public Docker 1.12-rc2. We can use it to create a VirtualBox cluster ready to
be used. Once it has run you can see the number of workers and managers, in
this case one of each.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ docker node ls</code></pre></figure>
<p>To summarize, Docker 1.12 has a built-in set of primitives to orchestrate
your containers. The main commands you must run to create a cluster are</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">## On the master to start your cluster
$ docker swarm init --listen-addr <master-IP(this ip)>:2377
## on each node to add it into the cluster
$ docker swarm join <master-ip>:2377</code></pre></figure>
<p><img class="img-fluid" alt="Docker Swarm architecture" src="/img/posts/swarm_arch.png" /></p>
<p>If you are not familiar with Docker Swarm, this is the architecture; the
diagram is provided by Docker Inc. and explains the design of the project
really well. The principal actors are managers and workers: managers are the
brains of the system, they schedule work and keep track of services and
containers, while workers execute the commands they are given.</p>
<p>The cluster is secure because each node has a proper TLS identity and all
communication is encrypted end to end by default, with automatic key rotation
to increase the security of the keys used in the cluster.</p>
<p><a href="https://raft.github.io/">Raft</a> is the consensus protocol used to
distribute messages around the cluster and keep track of its nodes. It is a
complex but really interesting algorithm; I plan another article about it, and
the official site contains a lot of details.</p>
<p>We already saw the concept of services in docker-compose: a single container
or a group of containers that describes part of your ecosystem, which you can
scale or orchestrate across your cluster. It’s the same here; there is no
specification file like Compose’s at the moment, but you can run a few commands
to create your service.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ docker service create --name helloworld --replicas 1 alpine ping docker.com</code></pre></figure>
<p>With this example we spin up a new service called helloworld. It runs one
container from the alpine image, pinging the docker.com site.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker service ls</code></pre></figure>
<p>This lists all our services. We can also inspect a single service:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker service inspect <service_id></code></pre></figure>
<p>There is a new concept here: when you run a service you also create tasks. A
task represents the container (or containers) under your service; in this case
we have just one task.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker service tasks helloworld</code></pre></figure>
<p>When you scale your service you are creating new tasks</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker service scale helloworld=10</code></pre></figure>
<p>Now you can see 10 running tasks. You can inspect one of them to find its
container ID and, for example, follow its logs.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">22:17 $ docker inspect 6fhfse4it8lwzlsk1t5sd5jbk
[
    {
        "ID": "6fhfse4it8lwzlsk1t5sd5jbk",
        "Version": {
            "Index": 67
        },
        "CreatedAt": "2016-06-18T21:06:36.707664178Z",
        "UpdatedAt": "2016-06-18T21:06:39.241942781Z",
        "Spec": {
            "ContainerSpec": {
                "Image": "alpine",
                "Args": [
                    "ping",
                    "docker.com"
                ]
            },
            "Resources": {
                "Limits": {},
                "Reservations": {}
            },
            "RestartPolicy": {
                "Condition": "any",
                "MaxAttempts": 0
            },
            "Placement": {}
        },
        "ServiceID": "24e0pojscuj2irvlxvx2baiid",
        "Slot": 2,
        "NodeID": "55v4jjzf56mcwnhbwvn4cq1rs",
        "Status": {
            "Timestamp": "2016-06-18T21:06:36.7110425Z",
            "State": "running",
            "Message": "started",
            "ContainerStatus": {
                "ContainerID": "4ec69142e3e886098915140663737f4176c6de5afe9f2fad1f5b2439d8fc336d",
                "PID": 3627
            }
        },
        "DesiredState": "running"
    }
]
22:17 $ docker logs -f 6fhfse4it8lwzlsk1t5sd5jbk</code></pre></figure>
<p>At this point it is a normal container running on your cluster.
I have tried to explain the main concepts of this big feature provided by
Docker 1.12; the last example covers the DNS topic.</p>
<p>I created an application that serves an HTTP server and prints the current
IP. Each service has an internal load balancer that dispatches traffic in
round robin between the different tasks. It is totally transparent: you just
resolve your service with a normal URL and Docker does the rest for you.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ docker service create --name micro --replicas 10 --publish 8000/tcp gianarb/micro</code></pre></figure>
<p><a href="https://github.com/gianarb/micro">Micro</a> is an application that
exposes an HTTP server on port 8000 and prints the current IP; now we have 10
tasks in this service. To find the current entry point for our service we can
inspect it and look for this information:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ docker service inspect <id-service>
...
"Endpoint": {
    "Spec": {},
    "Ports": [
        {
            "Protocol": "tcp",
            "TargetPort": 8000,
            "PublishedPort": 30000
        }
    ],
    "VirtualIPs": [
        {
            "NetworkID": "890fivvc6od3pa4rxd281lobb",
            "Addr": "10.255.0.5/16"
        }
    ]
}
...</code></pre></figure>
<p>In this case our published port is 30000, so we can call &lt;ip&gt;:30000 to
reach our service. If you make multiple requests you can see the IP change,
because the internal DNS load balancer is hitting different containers.</p>
<p>This is just an overview of the features, but there is other powerful news
like DABs, stacks, and easy rolling updates of your containers; that could be
the topic of my next article. Please stay in touch and follow me on
<a href="https://twitter.com/GianArb">Twitter</a> to chat and get news about the next articles.</p>
<blockquote>
<p>Thanks <a href="https://twitter.com/gpelly">@gpelly</a> for your review!</p>
</blockquote>
A little bit of refactoringStrategy to refactoring your code. Tricks about performance from PHPKonf Istanbul Conference.https://gianarb.it/img/refactoring.jpg2016-04-24T10:08:27+00:002016-04-24T10:08:27+00:00https://gianarb.it/blog/a-little-bit-of-refactoring<p>I wrote this note during the <a href="https://phpkonf.org/">PHPKonf</a>, I spoke about
<a href="/jenkins-real-world/#/">Jenkins and Continuous Delivery</a>;
during the conference I followed a few interesting talks, and this is my list of notes.</p>
<p>Is code reusable? For some people the answer is yes, for others it is no. I
agree with <a href="https://twitter.com/ocramius">Ocramius</a> that the answer
is no. An abstraction is reusable, an interface is reusable, but it’s very hard
to reuse a final implementation, first of all because by the time you finish
writing it your code is already old: your function is already legacy, and you
start maintaining it, hunting bugs and edge cases.</p>
<p>One way to reduce the time spent on refactoring is to prevent and defend
your code from bad integrations. In OOP you usually have visibility modifiers
(private, public, protected) and other ways to defend your code; it is
debatable, but Ocramius’s talk <a href="https://ocramius.github.io/extremely-defensive-php/#/">Extremely defensive
PHP</a> is worth seeing.</p>
<p>Refactoring is a methodology to make your code better. There are different
axes of improvement, like readability, performance, and solidity.</p>
<ul>
<li>Making your code readable for the next generation is one of the best things
you can do to show your love for your team and your company.</li>
<li>If your site takes longer to load, you usually lose clients.
“Less performance is a bug”, as <a href="https://twitter.com/fabpot">Fabien Potencier</a> puts it.</li>
<li>When you run your code everything is fine: you are a good developer and your
feature works. After the deploy to production the code is the same, but
there is a category of people, your clients, who will use it in very strange
ways, which usually means bugs and edge cases. Each bug fix makes your code
more solid.</li>
</ul>
<p>Test your code before starting to change it; you know automation is good,
but if you love behaving like a machine, do it manually. Set up a continuous
integration system: it can start with a single step, like running the tests,
but remember to extend it with all the steps you usually run to check the
compliance of your code, like style, standards, and static analysis, precisely
so that you don’t have to be the machine. Creating a good environment and an
automatic lifecycle for your application lets you stay focused on the code
instead of losing time on menial tasks; remember that when a routine is well
defined, a machine usually fails less often than a human.</p>
<p>Refactoring is one of the best things you can do for other people and to
make your feature ready for the real world. It is usually hard for a
non-technical company to understand this, because most of the time they don’t
see any visible change. Creating a good environment that saves time you can
then spend on refactoring is a good strategy, and automation is the only method
I know to get there. There are different layers of automation; to start, my two
cents is to just put a Makefile in your codebase, and when you find yourself
doing something for the second time, stop typing it in your console and write a
new make task to share with your team. After that, install Jenkins and let it
run these tasks for you before anything lands on the master branch (for git
users; trunk for svn users).</p>
<p>Making your development environment comfortable, and increasing the
perceived safety of the lifecycle, is the best way to refactor without the fear
of breaking everything. If you are afraid, you usually end up doing nothing.</p>
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">Great
talk as always Gianluca :) <a href="https://twitter.com/GianArb">@GianArb</a>
<a href="https://twitter.com/hashtag/phpkonf?src=hash">#phpkonf</a> <a href="https://t.co/ZW2G1UsXm7">pic.twitter.com/ZW2G1UsXm7</a></p>—
Fontana Lorenzo (@fntlnz) <a href="https://twitter.com/fntlnz/status/733986655334486016">May 21,
2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Add PHPKonf to your list! See you next year!</p>
Docker inside docker and overview about Jenkins 2A little overview about Jenkins 2 but the main topic of the article is about how run docker inside docker to start a continuous integration system inside a containerhttps://gianarb.it/img/docker.png2016-04-01T10:08:27+00:002016-04-01T10:08:27+00:00https://gianarb.it/blog/docker-inside-docker-and-jenkins-2<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/docker?src=hash">#docker</a> inside docker
and an overview about Jenkins 2 <a href="https://t.co/qa5ddjfhrs">https://t.co/qa5ddjfhrs</a> <a href="https://twitter.com/docker">@docker</a> <a href="https://twitter.com/jenkinsci">@jenkinsci</a> <a href="https://twitter.com/hashtag/container?src=hash">#container</a></p>—
Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/727876226875068416">May 4,
2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Jenkins is one of the most famous
continuous integration and deployment tools. Written in Java, it helps you
manage your pipelines and all the tasks needed to put your code in production
or manage your builds.</p>
<p>The announcement of Jenkins version 2 a few days ago is, in my opinion, one
of the best releases of this year.</p>
<p>The previous version is very stable, but it is many years old and the
ecosystem has totally changed. I am happy to see a strong refurbishment that
gets the best out of this powerful tool, with a series of new features like:</p>
<ul>
<li>Nice installation wizard</li>
<li>A refactored design, addressing one of the most criticized
aspects of the previous version</li>
<li>Good and modern set of plugins like <a href="https://jenkins.io/solutions/pipeline/">Jenkins
Pipeline</a> to manage your build</li>
</ul>
<p>Jenkins is truly a wonder, but the tool of the moment is Docker, the engine
that makes it easy to work with containers.</p>
<p>These two tools together are perfect for creating an isolated environment to
test and deploy your applications.</p>
<p>A first setup could be to install Jenkins on your server and use a plugin to
manage the integration and trigger your tests inside an isolated environment:
the container.</p>
<p>That works, but in my opinion reproducibility is one of the critical points
when you rely on plugins: if you cannot easily run your build in your local
environment, you have a problem. Secondly, if containers are a good solution
for deploying and maintaining a solid, isolated application, why shouldn’t your
Jenkins have the privilege of running inside a container too? From this
perspective, how can we run a container inside a container?</p>
<p>OK, now it’s time to figure out how to solve these problems.</p>
<p>We can use the official Jenkins image to put Jenkins inside a container, but
I worked on my personal Alpine-based installation, light and easy; <a href="https://github.com/gianarb/dockerfile/blob/master/jenkins/2.0/Dockerfile">here is the
Dockerfile</a>,
and we can pull it:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker pull gianarb/jenkins:2.0</code></pre></figure>
<p>If you are interested, the main article on how to run Docker inside Docker
was written by
<a href="https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/">jpetazzo</a>.
The idea is to run our Jenkins container with <code>--privileged</code> enabled and to share
the docker binary and the <code>/var/run/docker.sock</code> socket to manage the
communication.</p>
<ul>
<li><code>/var/run/docker.sock</code> is the entry point of the Docker daemon</li>
<li><code>docker</code> is the client binary that sends commands to the socket</li>
<li><code>--privileged</code> gives extended privileges to our container</li>
</ul>
<p>Translated in a docker command:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker run -v /var/run/docker.sock:/var/run/docker.sock \
-v $(which docker):/usr/local/bin/docker \
-p 5000:5000 -p 8080:8080 \
-v /data/jenkins:/var/jenkins \
--privileged \
--restart always \
gianarb/jenkins:2.0</code></pre></figure>
<p>We connect on <code>http://docker-ip:8080</code> and start the new awesome wizard!</p>
<p><img class="img-fluid" alt="First Jenkins 2 page, grab from the log your key and start" src="/img/docker-in-docker/jenkins2-start.png" /></p>
<p><img class="img-fluid" alt="Jenkins's plugins wizard" src="/img/docker-in-docker/jenkins2-plugin.png" /></p>
<p>To verify that everything works, we can create a new job that only runs
<code>docker ps -a</code>; we expect the same list of containers that we see
outside of Jenkins.</p>
<p><img class="img-fluid" alt="Result of the first build" src="/img/docker-in-docker/jenkins2-result.png" /></p>
<p>Now we can run commands from Jenkins to manage our builds with Docker
without any plugins, but you are still free to use the <a href="https://wiki.jenkins-ci.org/display/JENKINS/Docker+Plugin">Docker
Plugin</a> to start
your builds.</p>
<p>I used Jenkins as an example of running Docker inside another container, but
you can use the same strategy with your own applications if they need tight
integration with Docker.</p>
Happy docker's birthday and thanksJust a post to say thanks docker for your awesome community and happy birthday!https://gianarb.it/img/dockerbday.png2016-03-25T10:08:27+00:002016-03-25T10:08:27+00:00https://gianarb.it/blog/happy-docker-bday-and-thanks<p>Just a post to say thanks docker for your awesome community and happy birthday!</p>
<p>This week is Docker’s birthday week, and it is already amazing: a whole week
of birthday! A lot of meetup groups ran a tutorial meetup this week to help
people get started with Docker, and Dublin did it very well!</p>
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">Fantastic turnout for the <a href="https://twitter.com/hashtag/dockerbday?src=hash">#dockerbday</a> <a href="https://twitter.com/Workday">@workday</a>. Thanks to everyone who
attended and completed the voting app!! <a href="https://t.co/zWQssvyHSd">pic.twitter.com/zWQssvyHSd</a></p>—
TomWillFixIT (@tomwillfixit) <a href="https://twitter.com/tomwillfixit/status/712749765151297537">March 23,
2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Fifty people came to learn how Docker and its ecosystem work, and to eat a slice of cake (thanks Workday)!</p>
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">All
it's ready.. We can go! <a href="https://twitter.com/hashtag/dockerbday?src=hash">#dockerbday</a> Dublin..
Some problem? Don't worry to ask! <a href="https://t.co/9qz3V9mW9y">pic.twitter.com/9qz3V9mW9y</a></p>—
Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/712705450786099200">March 23,
2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>There are different kinds of developer communities; I am happy to work with
and follow the ones that provide good tools and increase the quality of my
work. I spend a lot of time across different teams like Doctrine and InfluxDB,
and I am very happy to see how big an effort Docker makes to involve and
empower its community.</p>
<p>I was a member of the beautiful mentor team (we will share a “retro pic” next month
because we forgot to take one) and I am happy to see that we did a good job.</p>
<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/GianArb">@GianArb</a> tnks 4 the help 2nite
Gianluca!! Appreciate it <a href="https://twitter.com/hashtag/DublinDocker?src=hash">#DublinDocker</a> <a href="https://twitter.com/hashtag/dockerbday?src=hash">#dockerbday</a></p>—
Delpedro (@Delpedro47) <a href="https://twitter.com/Delpedro47/status/712745923848351744">March 23,
2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Today we saw that the community appreciates it too!
Happy birthday, and <a href="https://www.meetup.com/Docker-Dublin/">see you next month</a>!</p>
Some days of work vs Jenkins CII love Jenkins CI is a beautiful and stable project to run job and manage continuous integration and deploy pipeline, few days ago I worked to improve the delivery pipeline in CurrencyFair and I started to do some thought about this topic, here my internal battle vs Jenkins CIhttps://gianarb.it/img/jenkins.png2016-02-21T10:08:27+00:002016-02-21T10:08:27+00:00https://gianarb.it/blog/some-days-of-work-vs-jenkins-ci<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr"><a href="https://t.co/02HbnkzRsS">https://t.co/02HbnkzRsS</a> "Some Days of work vs <a href="https://twitter.com/hashtag/JenkinsCI?src=hash">#JenkinsCI</a>" Little things about continuous integration <a href="https://twitter.com/hashtag/ci?src=hash">#ci</a> <a href="https://twitter.com/hashtag/dev?src=hash">#dev</a></p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/709466156453732352">March 14, 2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Guys, please put your hands down: I love Jenkins CI! I am not here to write
a bad post about it!
I am here to share a few days of reasoning about continuous integration,
Jenkins CI, and this whole big topic.</p>
<h2 id="reproducible">Reproducible</h2>
<p>There are a lot of tools you can use to run tasks:
ant, make, grunt. Use them to run part or all of your build in your local
environment; this approach increases the value of your tasks because you use
and test them more often.
Having a reproducible build helps you keep your flow decoupled from your
runner. Maybe Jenkins is perfect, but there are other tools and services,
Travis-CI, CircleCI, Drone, so don’t create a big dependency on your environment.</p>
<h2 id="speedy">Speedy</h2>
<p>A slow test suite is a bad idea. For one minute I can stay
focused on the execution, but five minutes is a lot: you get a coffee or
start thinking about another task, and returning to the old task requires an
effort. All this focus switching is not good, and at the same time losing five
minutes per build, for every engineer, adds up after a week to a lot
of money.</p>
<h2 id="versionable">Versionable</h2>
<p>I spent the most time on this point, and I am not sure whether it is strictly
required, but TravisCI for example uses a YAML specification file. This file
doesn’t only describe your build: if you include it in your VCS, it becomes
part of the story of your application. Could that be valuable for your pipeline?</p>
<h2 id="maintainable">Maintainable</h2>
<p>There are a lot of tools you can choose from to create the perfect pipeline,
and it’s very easy to lose focus and start using too many of them. You should
try them all, but it’s your job to pick the perfect subset. Point 1
(Reproducible) increases their value: use tools you can reuse during your
team’s daily work to improve development and make the flow better.
Every tool you add seems perfect until it becomes a problem.</p>
<h2 id="scalable">Scalable</h2>
<p>An easy way to decrease the time of your job is to split it into small jobs
and run them in parallel; you can check the code style and run your test suite
at the same time, for example.
Another good reason to create a scalable environment for your jobs is that
your company will grow, and the continuous integration system is there to help
it grow, not to slow it down.</p>
<h2 id="unique">Unique</h2>
<p>Jenkins, vagrant, ant, make, drone, docker are just some of the amazing
tools for creating the perfect pipeline to deploy and test your code, but they
are only a means; the goal is the best pipeline for your code and for your
team. Observe how your team works, what the requirements and critical points
are, and design the best pipeline for your use case.</p>
<h2 id="communication-layer">Communication layer</h2>
<p>One goal for your team is to understand the status of the build without
logging in to any application, because entering the Jenkins site (first of all
because it is not beautiful :P) is yet another step on top of everything else
you do: create a feature branch, submit a pull request, write code, and so on.
Use the pull request itself as the connection to your job: your continuous
integration system can submit a new comment, or if you are working with GitHub
you can use the status checks. In this way you help your colleagues during
their work and remove a hop.</p>
<p>With Jenkins CI you can do all of this, but what if you spend most of your
time building your ideal pipeline? Maybe you don’t know the tool well enough,
or maybe it is not the best tool for your use case. Jenkins is flexible, but is
flexibility only the number of plugins you can install?</p>
<p>I don’t know; I use it, but I am happy to experiment, and there are a lot of
new technologies and tools that may help us do good work, with or without
Jenkins CI.</p>
<h2 id="as-a-microservices">As a microservices</h2>
<p><img src="/img/pipeline.svg" alt="Continuous Integration and Deploy pipeline" /></p>
<p>This is a summary of a pipeline; every pipeline follows these steps, and
from this point of view it seems very easy!
Jenkins and Drone are very strong solutions, but they are all-in-one. If you
follow this image, it’s clear that you could build your own pipeline for your
projects, playing with the LEGO bricks to assemble the best steps for your team
and your project.</p>
<p>I am happy to share some projects to implement this approach.</p>
<h2 id="slimmer-proof-of-concept">Slimmer, proof of concept</h2>
<p>I tried to create a runner for my test suite, <a href="https://github.com/gianarb/slimmer">slimmer</a>,
to implement this idea with Docker and Go.
Go offers a lot of libraries and tools to build something in little time, and
Docker is perfect because it creates isolated environments and is very easy to
scale with Swarm.
In practice, at the moment this console app executes <code>build.slimmer</code>, an
executable, flexible, and versionable bash script.
<a href="https://travis-ci.org">TravisCI</a> is powerful, but is a YAML file a
good way to describe a build? Is it flexible? Maybe yes, but I am curious to
try a “lower level” approach, because in the end everything becomes a series of
commands.
I also created a couple of agents to trigger notifications quickly:
<a href="https://github.com/gianarb/ircer">ircer</a> and
<a href="https://github.com/gianarb/slacker">slacker</a>. You can use them to notify the
result of your build.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">composer install
vendor/bin/phpunit
RESULT=$?
curl -LSs https://github.com/gianarb/ircer/releases/download/0.1.0/ircer_0.1.0_linux_386 > ircer
chmod 755 ircer
if [ $RESULT -eq 0 ]; then
    ./ircer -j tech-team -m "You are a great developer. Your build works"
else
    ./ircer -j tech-team -m "Too bad, your build doesn't work"
fi</code></pre></figure>
<p>This is an example of <code>build.slimmer</code> with an IRC notification. It is a PoC,
and I prepared a little <a href="/slimmer-poc-slide/#/">presentation</a> to
collect feedback, which I gave at a Dublin Go meetup.</p>
<div class="row">
<div class="col-md-12 text-center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/CWCHT3GClMM" frameborder="0" allowfullscreen=""></iframe>
</div>
</div>
<p>I look forward to your feedback if you are interested in continuous
integration and continuous delivery.</p>
ChatOps create your IRC bot in GoChatOps is a strong topic it is growing day by day because now with the IaaS are allowed a new way to manage your infrastacture provide for you an API layer. You can implement it to create your automation layer. A pretty bot is a good assistencehttps://gianarb.it/img/go.png2016-02-21T10:08:27+00:002016-02-21T10:08:27+00:00https://gianarb.it/blog/chatops-create-your-own-irc-bot-in-go<p>The infrastructure as a service (IaaS) opened new ways to manage your
infrastructure.
Using an API to create, destroy, and update your virtual machines
is one of the biggest revolutions in our sector.</p>
<p>A lot of companies and a lot of DevOps engineers started to create their own
assistants to increase automation or check the status of their infrastructure.
Above all, GitHub provided a series of awesome blog posts and tools describing
this approach, which has a name: ChatOps.</p>
<ul>
<li><a href="https://hubot.github.com/">HuBot</a> is a beautiful tools written in node.js to provide smart bot.</li>
<li><a href="https://www.pagerduty.com/blog/what-is-chatops/">So, What is ChatOps? And How do I Get Started?</a> by PagerDuty</li>
<li><a href="https://github.com/blog/968-say-hello-to-hubot">Say Hello to Hubot</a> by GitHub</li>
</ul>
<div class="row">
<div class="col-md-12 text-center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/IhzxnY7FIvg" frameborder="0" allowfullscreen=""></iframe>
</div>
</div>
<p>IRC is an application-layer protocol that facilitates communication. One of
the most famous open IRC servers is freenode; most of the important open source
projects use it to chat.</p>
<p>This concept is already applied: many projects have their own bot,
for example Zend uses Zend\Bot, a good assistant written by DASPRiD.</p>
<p>ChatOps is an assistant oriented towards decreasing the distance between
your infrastructure and your communication channels.</p>
<p>I wrote a low-level library to communicate over the IRC protocol; we can try
to use it to write a dummy bot.</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">package main

import (
    "bufio"
    "fmt"
    "log"
    "net/textproto"
    "regexp"

    "github.com/gianarb/go-irc"
)

func main() {
    secretary := NewBot(
        "irc.freenode.net",
        "6667",
        "SybilBot",
        "SybilBot",
        "#channel-name",
        "",
    )
    conn, _ := secretary.Connect()
    defer conn.Close()

    reader := bufio.NewReader(conn)
    tp := textproto.NewReader(reader)
    for {
        line, err := tp.ReadLine()
        if err != nil {
            log.Fatal("unable to connect to IRC server ", err)
        }
        // Reply to the server's PING with a PONG to keep the connection alive.
        isPing, _ := regexp.MatchString("PING", line)
        if isPing {
            secretary.Send("PONG")
        }
        fmt.Printf("%s\n", line)
    }
}</code></pre></figure>
<p>With this code you have a bot; in this case her name is SybilBot, and at the
moment she supports only the PING/PONG flow. Without this health check the
server disconnects your bot after a while.</p>
<p>You can use the same loop to add other actions:</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">yourAction, _ := regexp.MatchString("CheckSomething", line)
if yourAction {
    // Do something
}</code></pre></figure>
<p><a href="https://github.com/gianarb/go-irc">go-irc</a> allows you to communicate over the IRC protocol. Our bot is very
dumb, but I like the idea! If you are working on this topic, in Go or in another
language, please ping me! I would be very happy to see your bot!</p>
InfluxDB PHP 1.3.0 is ready to goInfluxDB is a time series database; it helps us to manage metrics and points, and offers a stack of tools to collect and view this type of data. I am a maintainer of the InfluxDB PHP integration. In this post I describe the news provided by the new release 1.3.0https://gianarb.it/img/influx.jpg2016-02-18T10:08:27+00:002016-02-18T10:08:27+00:00https://gianarb.it/blog/influxdb-php-release-1-3-0<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">Shout out to <a href="https://twitter.com/GianArb">@GianArb</a> for shipping a new release of the InfluxDB-PHP library! Here's what's new: <a href="https://t.co/tJQIu9OCbL">https://t.co/tJQIu9OCbL</a></p>— InfluxData (@InfluxDB) <a href="https://twitter.com/InfluxDB/status/704403294592970752">February 29, 2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>We are happy to announce a new minor release, <a href="https://github.com/influxdata/influxdb-php">influxdb-php library</a> 1.3.0.</p>
<p>This is a list of PRs merged in 1.3.0 since 1.2.2:</p>
<ul>
<li><a href="https://github.com/influxdata/influxdb-php/pull/36">#36</a> Added quoting of dbname in queries</li>
<li><a href="https://github.com/influxdata/influxdb-php/pull/35">#35</a> Added orderBy to query builder</li>
<li><a href="https://github.com/influxdata/influxdb-php/pull/37">#37</a> Fixed wrong orderby tests</li>
<li><a href="https://github.com/influxdata/influxdb-php/pull/38">#38</a> Travis container-infra and php 7</li>
</ul>
<p>The <code>QueryBuilder</code> now supports the orderBy function to order our data; InfluxDB supports it from version 0.9.4.</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">select * from cpu order by value desc</code></pre></figure>
<p>Now you can do it in PHP</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php">$this->database->getQueryBuilder()
->from('cpu')
->orderBy('value', 'DESC')->getQuery();</code></pre></figure>
<p>We extended our Continuous Integration setup to also check our code with PHP 7, it’s perfect!</p>
<p>We now escape queries to support reserved keywords like <code>database</code> and <code>servers</code>; personally I prefer to avoid this type of word, but you are free to use them.</p>
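<p>For illustration, escaping an identifier mostly means wrapping it in double
quotes and escaping any embedded quote. A sketch of the idea (in Go rather than
PHP for brevity; this is not the library’s actual code):</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in double quotes and escapes any
// embedded quote, so reserved words like "database" stay valid in a query.
func quoteIdent(name string) string {
	return `"` + strings.Replace(name, `"`, `\"`, -1) + `"`
}

func main() {
	fmt.Printf("SELECT * FROM %s\n", quoteIdent("database")) // SELECT * FROM "database"
}</code></pre></figure>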
<p>We are very happy to learn how the PHP community uses this library and InfluxDB; please share your experience and your problems in the repository, on IRC (join influxdb on Freenode), and we wait for you on <a href="https://twitter.com/influxdata">Twitter</a>.</p>
<p>Remember to update your <code>composer.json</code>!</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json">{
"require": {
"influxdb/influxdb-php": "~1.3"
}
}</code></pre></figure>
<p>A big thanks to all our contributors!</p>
Swarm scales docker for freeDocker is an awesome tool to manage your container. Swarm helps you to scale your containers on more servers.https://gianarb.it/img/docker.png2015-12-14T10:08:27+00:002015-12-14T10:08:27+00:00https://gianarb.it/blog/swarm-scales-your-containter-for-free<blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">An ocean of containers! With docker and swarm.. <a href="https://t.co/1dXoZYS3ZA">https://t.co/1dXoZYS3ZA</a> <a href="https://twitter.com/hashtag/docker?src=hash">#docker</a></p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/696620821931036672">February 8, 2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p><a href="https://github.com/gianarb/gourmet">Gourmet</a> is a work-in-progress application
that allows you to execute little applications in an isolated environment: it
downloads your manifest and runs it in a container.
I started this application to improve my Go knowledge and to work with the Docker API.
I am happy to share my idea and my tests with Swarm, an easy way to scale this type of application.</p>
<p>Gourmet exposes an HTTP API available at the <code>/project</code> endpoint that accepts a JSON request body like:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json">{
"img": "gourmet/php",
"source": "https://ramdom-your-source.net/gourmet.zip",
"env": [
"AWS_KEY=EXAMPLE",
"AWS_SECRET=",
"AWS_QUEUE=https://sqs.eu-west-1.amazonaws.com/test"
]
}</code></pre></figure>
<ul>
<li><code>img</code> is the starting-point Docker image</li>
<li><code>source</code> is your script</li>
<li><code>env</code> is a list of environment variables that you can use on your script</li>
</ul>
<p>During my tests I used this <a href="https://github.com/gianarb/gourmet-php-example">PHP script</a> that sends a message to SQS.</p>
<p>Your script has a console entrypoint, an executable at the path <code>/bin/console</code>, and
gourmet uses it to run your program.</p>
<p>To integrate it with Docker I used <code>fsouza/go-dockerclient</code>, an open source
library written in Go.</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">container, err := dr.Docker.CreateContainer(docker.CreateContainerOptions{
	Name: "",
	Config: &docker.Config{
		Image:        img,
		Cmd:          []string{"sleep", "1000"},
		WorkingDir:   "/tmp",
		AttachStdout: false,
		AttachStderr: false,
		Env:          envVars,
	},
	HostConfig: nil,
})</code></pre></figure>
<p>This is a snippet that can be used to create a new container.
With the container started I use the exec feature to
extract your source and to run it.</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">exec, err := dr.Docker.CreateExec(docker.CreateExecOptions{
Container: containerId,
AttachStdin: true,
AttachStdout: true,
AttachStderr: true,
Tty: false,
Cmd: command,
})
if err != nil {
return err
}
err = dr.Docker.StartExec(exec.ID, docker.StartExecOptions{
Detach: false,
Tty: false,
RawTerminal: true,
OutputStream: dr.Stream,
ErrorStream: dr.Stream,
})</code></pre></figure>
<p>After each build Gourmet cleans everything up and destroys the environment.</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">if err := dr.Docker.KillContainer(docker.KillContainerOptions{ID: containerId}); err != nil {
	return err
}
if err := dr.Docker.RemoveContainer(docker.RemoveContainerOptions{ID: containerId, RemoveVolumes: true}); err != nil {
	return err
}
return nil</code></pre></figure>
<p>That is gourmet at the moment. There are different hypothetical use cases:</p>
<ul>
<li>highly isolated tasks</li>
<li>run a testsuite</li>
<li>dispatch specific functions</li>
</ul>
<p>A microservice to work with Docker containers easily.</p>
<p>I thought about an easy way to scale this application and I found
<a href="https://docs.docker.com/swarm/">Swarm</a>: it is native clustering for Docker, and
it seems awesome first of all because it is compatible with the Docker API.</p>
<h2 id="swarm">Swarm</h2>
<p>A Docker Swarm cluster is very easy to set up. I worked on the project
<a href="https://github.com/gianarb/vagrant-swarm">vagrant-swarm</a> to create a local
environment, but <a href="https://docs.docker.com/swarm/install-manual/">the official
documentation</a> is easy to follow.</p>
<p>A Swarm cluster has two actors:</p>
<ul>
<li>A master, the entrypoint of your requests; it provides an HTTP
API compatible with Docker.</li>
<li>A series of nodes that communicate with the master.</li>
</ul>
<p>In this example we will work with 1 master and 2 nodes.
Building these machines with VirtualBox, with another tool, or in the cloud is not a
problem; then <a href="https://docs.docker.com/engine/installation/">install Docker</a> on each of them.</p>
<p>On the master, pull swarm and create a cluster identifier.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker pull swarm
docker run --rm swarm create
docker run --name swarm_master -d -p <manager_port>:2375 swarm manage token://<cluster_id></code></pre></figure>
<p><code>swarm create</code> returns a <code>cluster_id</code>; use it to start the manager, and
<code>manager_port</code> is the port where your master server will listen.</p>
<p>Now go into each node, because we must do a few things.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
docker run -d swarm join --addr=<node_ip:2375> token://<cluster_id></code></pre></figure>
<p>Here <code>cluster_id</code> is the id created in the previous step and <code>node_ip</code> is the IP
of the current node.
Go back into the master and restart your manager container</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker restart swarm_master</code></pre></figure>
<p>Now we are ready to test that everything is up.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker -H tcp://0.0.0.0:2375 info</code></pre></figure>
<p>Replace <code>0.0.0.0</code> with your master's IP if you are not on the same server.
You should get this type of response</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$. sudo docker -H tcp://192.168.13.1:2375 info
Containers: 1
Images: 1
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
vagrant-ubuntu-vivid-64: 192.168.13.101:2375
└ Status: Healthy
└ Containers: 1
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 513.5 MiB
└ Labels: executiondriver=native-0.2, kernelversion=3.19.0-43-generic, operatingsystem=Ubuntu 15.04, storagedriver=aufs
vagrant-ubuntu-vivid-64: 192.168.13.102:2375
└ Status: Healthy
└ Containers: 0
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 513.5 MiB
└ Labels: executiondriver=native-0.2, kernelversion=3.19.0-43-generic, operatingsystem=Ubuntu 15.04, storagedriver=aufs
CPUs: 1
Total Memory: 513.5 MiB
Name: f5e23167339e</code></pre></figure>
<p>Gourmet reads a set of environment variables to create a connection with the Docker
API, in particular via the function
<a href="https://godoc.org/github.com/fsouza/go-dockerclient#NewClientFromEnv">NewClientFromEnv</a>
and the <code>DOCKER_HOST</code> parameter.</p>
<p>Docker Swarm exposes the same Docker API, so in this way gourmet can use more nodes.</p>
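<p>The env-based lookup boils down to: read <code>DOCKER_HOST</code>, fall back to the
local Unix socket. A sketch of that resolution (the real <code>NewClientFromEnv</code>
also honors TLS-related variables; <code>dockerHost</code> here is my own name):</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"os"
)

// dockerHost resolves the Docker endpoint the way an env-based client
// does: read DOCKER_HOST, fall back to the local Unix socket.
func dockerHost() string {
	if host := os.Getenv("DOCKER_HOST"); host != "" {
		return host
	}
	return "unix:///var/run/docker.sock"
}

func main() {
	os.Setenv("DOCKER_HOST", "tcp://192.168.13.1:2375")
	fmt.Println(dockerHost()) // tcp://192.168.13.1:2375
}</code></pre></figure>
<p>Pointing <code>DOCKER_HOST</code> at the Swarm manager is therefore enough to make
the same client code schedule containers across the whole cluster.</p>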
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ DOCKER_HOST="tcp://192.168.13.1:2333" ./gourmet api</code></pre></figure>
Docker and wordpress for a better worldDocker and WordPress to guarantee scalability, flexibility and isolation. A lot of web agencies install all WordPress sites on the same server, but how can they manage a disaster? AWS with Elastic Container Service could be a more professional solution.https://gianarb.it/img/docker.png2015-12-14T10:08:27+00:002015-12-14T10:08:27+00:00https://gianarb.it/blog/wordpress-docker<blockquote class="twitter-tweet tw-align-center" lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/docker?src=hash">#docker</a> and <a href="https://twitter.com/hashtag/wordpress?src=hash">#wordpress</a> for a better world.. <a href="https://t.co/o9c6YXvsl3">https://t.co/o9c6YXvsl3</a> Blogpost after my talk <a href="https://twitter.com/CodemotionIT">@CodemotionIT</a> How and Why? <a href="https://twitter.com/awscloud">@awscloud</a></p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/679241680797700096">December 22, 2015</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Let me represent a typical WordPress infrastructure</p>
<p><img src="/img/posts/2015-12-16/wp-infra.png" alt="Wordpress typical infrastructure" /></p>
<p><strong>Isolation</strong>: every single WordPress installation shares everything with the others: filesystem,
memory, database.</p>
<p>This lack of isolation causes different problems:</p>
<ul>
<li>The monitoring of each installation is harder.</li>
<li>We share security problems</li>
<li>We don’t have the freedom to work without the fear of blocking 100 customers</li>
</ul>
<p>We are overwhelmed by the problems</p>
<p><img src="/img/posts/2015-12-16/problem.png" alt="Problem" /></p>
<h2 id="lxc-container">LXC Container</h2>
<blockquote>
<p>it is an operating-system-level virtualization environment for running multiple
isolated Linux systems (containers) on a single Linux control host.</p>
<p>by wikipedia</p>
</blockquote>
<p>Wikipedia helps me to resolve one problem (theory), container is <strong>isolated
Linux System</strong></p>
<h2 id="docker">Docker</h2>
<p>Docker was born as a wrapper around LXC containers, but now it uses its own implementation,
<a href="https://github.com/opencontainers/runc">runc</a>, to serve your application ready
to go in an isolated environment, with its own filesystem and dependencies.</p>
<p>WordPress in this implementation has two containers: one to provide Apache and
PHP, and one for the MySQL database. This is an example of a Dockerfile; it describes
how a Docker container is built and it is very simple to understand. In this example
there are different keywords:</p>
<ul>
<li><code>FROM</code> describes the image that we use as start point.</li>
<li><code>RUN</code> runs a command.</li>
<li><code>EXPOSE</code> describes ports to open during a link, in this case MySql runs on
the default port 3306.</li>
<li><code>CMD</code> is the default command used during the run console command.</li>
</ul>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">FROM ubuntu
RUN dpkg-divert --local --rename --add /sbin/initctl
RUN ln -s /bin/true /sbin/initctl
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get -y install mysql-server
EXPOSE 3306
CMD ["/usr/bin/mysqld_safe"]</code></pre></figure>
<p>Very easy to read, it is a list of commands!
We have only written a container definition; now we can build it!</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker build -t gianarb/mysql .</code></pre></figure>
<p>In order to increase the value of this article and to use stable images I will
use the official <a href="https://hub.docker.com/_/mysql/">mysql</a> and
<a href="https://hub.docker.com/_/wordpress/">wordpress</a> images.</p>
<p>Download these images</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker pull wordpress
docker pull mysql</code></pre></figure>
<p>We are ready to run everything! A Dockerfile is only a way to describe a single
container, and the pull command downloads containers ready to work; it is
a good way to reuse your own or other people's containers.</p>
<p>We downloaded mysql and wordpress; with the run command we start them and
define our connections</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker run \
--name mysql \
-p 3306:3306 \
-e MYSQL_ROOT_PASSWORD=passwd mysql
docker run -e WORDPRESS_DB_HOST=wp1.database.prod \
-e WORDPRESS_DB_USER=root \
-e WORDPRESS_DB_PASSWORD=help_me \
-p 8080:80 \
-d --name wp1 \
--link mysql:wp1.database.prod wordpress</code></pre></figure>
<p>Let me explain these commands; they run two containers:</p>
<ul>
<li>The first container is named mysql and uses the <code>mysql</code> image; we
use the -p flag to expose the MySQL port, so you can use phpmyadmin or another client
to fetch the data, but remember that this is not a good practice.</li>
<li>The second container, called wp1, uses the <code>wordpress</code> image and forwards
the container port 80 (Apache) to host port 8080, which in this case is the way
to see the site. The <code>--link</code> flag is the correct way to consume mysql outside the
main container; in this particular case we can use wp1.database.prod as the hostname
to connect to MySQL from our WordPress container, awesome!</li>
<li>Docker images support environment variables (<code>ENV</code>); we can use them
to configure our services, in this case to set the root password in MySQL and to
configure WordPress's database connection</li>
</ul>
<p>We are ready! Now you have a WordPress ready to go on port 8080.</p>
<h2 id="docker-compose">Docker Compose</h2>
<p>To save time and to increase reusability we can use the
<a href="https://docs.docker.com/compose/">docker-compose</a> tool,
which helps us to manage multi-container infrastructures, in this case one container for
mysql and one for wordpress.
In practice we can describe all the work done above in a <code>docker-compose.yml</code> file:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">wp:
image: wordpress
ports:
- 8081:80
environment:
WORDPRESS_DB_HOST: wp1.database.prod
WORDPRESS_DB_USER: root
WORDPRESS_DB_PASSWORD: help_me
links:
- mysql:wp1.database.prod
mysql:
image: mysql:5.7
environment:
MYSQL_ROOT_PASSWORD: help_me</code></pre></figure>
<p>Now we can run</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker-compose build
docker-compose up</code></pre></figure>
<p>To prepare and start our infrastructure. Now we have one WordPress with its own
MySQL running on port 8081. We can change the WordPress port to start a new isolated
WordPress installation.</p>
<p class="text-center">
<iframe src="//giphy.com/embed/l41lYCDgxP6OFBruE" width="480" height="268" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe><p><a href="https://giphy.com/gifs/foxtv-win-ricky-gervais-emmys-2015-l41lYCDgxP6OFBruE">via
GIPHY</a></p>
</p>
<h2 id="in-cloud-with-aws-ecs">In Cloud with AWS ECS</h2>
<p>We won a battle but the war is long: we cannot use our PC as a server. In
this article I propose <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html">AWS Elastic Container
Service</a>,
a new AWS service that helps us to manage containers. Why this service? Because
it is Docker and Docker Compose like, and it is managed by AWS. Maybe there are
more flexible solutions, such as Swarm or Kubernetes, but it is a good starting point.</p>
<p><img src="/img/posts/2015-12-16/ecs.png" alt="AWS Elastic Container Service" /></p>
<p>A series of keywords to understand how it works:</p>
<ul>
<li><strong>Container instance</strong>: An Amazon EC2 instance that is running the Amazon ECS agent and has been registered into the cluster.</li>
<li><strong>Cluster</strong>: It is a pool of Container instances</li>
<li><strong>Task definition</strong>: A description of an application that contains one or more container definitions</li>
<li>Each Task definition running is a <strong>Task</strong></li>
</ul>
<h3 id="in-practice">In practice</h3>
<ol>
<li>Create a cluster</li>
</ol>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">ecs-cli configure \
--region eu-west-1 \
--cluster wps \
--access-key apikey \
--secret-key secreyKey</code></pre></figure>
<ol start="2">
<li>Up nodes (one in this case)</li>
</ol>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">ecs-cli up --keypair key-ecs \
--capability-iam \
--size 1 \
--instance-type t2.medium</code></pre></figure>
<ol start="3">
<li>Push your first task!</li>
</ol>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">ecs-cli compose --file docker-compose.yml \
--project-name wp1 up</code></pre></figure>
<ol start="4">
<li>Follow the status of your tasks</li>
</ol>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">ecs-cli ps</code></pre></figure>
<p>You can use another docker-compose.yml with a different WordPress port to build
another task with another WordPress!</p>
<h2 id="now-is-only-a-problem-of-url">Now is only a problem of URL</h2>
<p>We have different isolated WordPress installations online, but they are reachable
only via an IP and different ports; our customers would rather use a domain name.
I don't know if this solution is ready to run in production with more and more
WordPress installations, but a good service to route and proxy requests is
HAProxy. This is an example configuration for our use case:</p>
<p>wp1.gianarb.it and wp2.gianarb.it are two of our customers, and 54.229.190.73:8080,
54.229.190.73:8081 are our WordPress containers.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">...
frontend wp_mananger
bind :80
acl host_wp1 hdr(host) -i wp1.gianarb.it
acl host_wp2 hdr(host) -i wp2.gianarb.it
use_backend backend_wp1 if host_wp1
use_backend backend_wp2 if host_wp2
backend backend_wp1
server server1 54.229.190.73:8080 check
backend backend_wp2
server server2 54.229.190.73:8081 check</code></pre></figure>
<p>Note: This configuration increases the scalability of our system, because we can
add other servers in order to support more traffic.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">backend backend_wp1
server server1 54.229.190.73:8080 check
server server2 54.229.190.12:8085 check
server server3 54.229.190.15:80 check</code></pre></figure>
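<p>The acl/use_backend pairs are essentially a map from the request's Host header
to a pool of servers. A Go sketch of the same routing decision
(<code>backendFor</code> is an illustrative name, not HAProxy code):</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">package main

import "fmt"

// backendFor picks the server pool for a request Host header, mirroring
// the acl/use_backend pairs in the HAProxy configuration above.
// A nil result means no backend is configured for that host.
func backendFor(host string, pools map[string][]string) []string {
	return pools[host]
}

func main() {
	pools := map[string][]string{
		"wp1.gianarb.it": {"54.229.190.73:8080"},
		"wp2.gianarb.it": {"54.229.190.73:8081", "54.229.190.12:8085"},
	}
	fmt.Println(backendFor("wp2.gianarb.it", pools))
}</code></pre></figure>
<p>Scaling a customer then means appending one more server to its pool, which is
exactly what adding a <code>server</code> line to the backend does.</p>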
<h3 id="there-are-other-solutions">There are other solutions</h3>
<ul>
<li>Nginx</li>
<li>Consul to increase the stability and the scalability of our endpoint</li>
</ul>
<div class="alert alert-info" role="alert">
This article is based on my presentation at <a href="https://gianarb.it/codemotion-2015/" target="_blank">Codemotion 2015</a>
</div>
<div class="alert alert-success" role="alert">
Thanks for the review <a href="https://twitter.com/fntlnz" target="_blank">Lorenzo</a>! I have been in Ireland for 3 weeks but I am not ready to
write an article without your English review!
</div>
FastEventManager, only an event managerFastEventManager is a PHP library designed to be a smart and light event manager. You can use it in your applications or as a base component for your framework. It adds capabilities around events, such as attaching and triggering them.https://gianarb.it/img/github.png2015-11-01T00:00:00+00:002015-11-01T00:00:00+00:00https://gianarb.it/blog/fast-event-manager-only-an-event-manager<blockquote>
<p>The Event-Driven Messaging is a design pattern, applied within the
service-orientation design paradigm in order to enable the service consumers,
which are interested in events that occur within the periphery of a service
provider, to get notifications about these events as and when they occur
without resorting to the traditional inefficient polling based mechanism.
by. <a href="https://en.wikipedia.org/wiki/Event-Driven_Messaging">wiki</a></p>
</blockquote>
<p>In PHP there are different implementations of this pattern, but <a href="https://github.com/gianarb/fast-event-manager">I tried to write
my own idea</a>:
an event manager based on regexes, easy to understand and to extend.</p>
<p>Why regexes? Because they are a good way to match strings, flexible and powerful.
The library is smart and little, so it can be used as the basis for a custom implementation:
it resolves a regex and triggers events, and it supports a priority to order the
triggered listeners.</p>
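<p>To make the regex-plus-priority idea concrete, here is a small Go sketch of the
same mechanism (FastEventManager itself is PHP; the types below are my own
illustration, not the library's API):</p>
<figure class="highlight"><pre><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"regexp"
	"sort"
)

type listener struct {
	name     string
	priority int
	fn       func()
}

// EventManager keeps listeners attached under plain names and triggers
// them with a regex, running higher priorities first.
type EventManager struct {
	listeners []listener
}

func (em *EventManager) Attach(name string, fn func(), priority int) {
	em.listeners = append(em.listeners, listener{name, priority, fn})
}

func (em *EventManager) Trigger(pattern string) {
	re := regexp.MustCompile(pattern)
	var matched []listener
	for _, l := range em.listeners {
		if re.MatchString(l.name) {
			matched = append(matched, l)
		}
	}
	// Higher priority runs first; ties keep attach order.
	sort.SliceStable(matched, func(i, j int) bool {
		return matched[i].priority > matched[j].priority
	})
	for _, l := range matched {
		l.fn()
	}
}

func main() {
	em := &EventManager{}
	em.Attach("welcome", func() { fmt.Print(" dev!") }, 100)
	em.Attach("welcome", func() { fmt.Print("Hello") }, 345)
	em.Trigger("welcome") // prints "Hello dev!"
}</code></pre></figure>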
<h2 id="install">Install</h2>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">composer require gianarb/fast-event-manager</code></pre></figure>
<h2 id="getting-started">Getting Started</h2>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
require __DIR__."/vendor/autoload.php";
use GianArb\FastEventManager;
$eventManager = new FastEventManager();
$eventManager->attach("user_saved", function($event) {
});
$user = new Entity\User();
$eventManager->trigger("/user_saved/", $user);</code></pre></figure>
<p>Each listener has a priority (default = 0) that describes the order of execution</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
$eventManager->attach("welcome", function() {
echo " dev!";
}, 100);
$eventManager->attach("welcome", function() {
echo "Hello";
}, 345);
$eventManager->trigger("/welcome/");
//output "Hello dev!"</code></pre></figure>
<p>I wrote this library because there are a lot of solutions that implement this
pattern but they are verbose. This is only an event manager: if you need other
features you can extend it or use a different implementation.
On top of this library you can write your own library to build an event manager ready
to use with your team in your applications.</p>
<p>This is a good solution because it is simple: ~31 lines of code to trigger events,
without the fear of inheriting many lines of code and unused features to maintain.</p>
Penny PHP framework made of componentsPenny a PHP framework made of components, write your microframework made of symfony, zend framework and other components.https://gianarb.it/img/penny.jpg2015-10-27T23:08:27+00:002015-10-27T23:08:27+00:00https://gianarb.it/blog/penny-framework-made-of-components<blockquote class="twitter-tweet tw-align-center" lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/pennyphp?src=hash">#pennyphp</a> <a href="https://t.co/tsA2nE09GM">https://t.co/tsA2nE09GM</a> Why and what?! o.O <a href="https://twitter.com/hashtag/php?src=hash">#php</a> <a href="https://twitter.com/hashtag/framework?src=hash">#framework</a> to build <a href="https://twitter.com/hashtag/microservices?src=hash">#microservices</a> and application "consciously"</p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/659762064446083073">October 29, 2015</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p class="text-center">
<iframe src="https://ghbtns.com/github-btn.html?user=pennyphp&repo=penny&type=star&count=true&size=large" frameborder="0" scrolling="0" width="160px" height="30px"></iframe>
</p>
<p>The PHP ecosystem is mature; there are a lot of libraries that help you to write
good, custom applications. Too many libraries require strong knowledge to
avoid maintainability problems, and they also open a world of specific
implementations for specific use cases.</p>
<p>A big framework sometimes adds a big overhead under your business logic, and some
of its unused features can cause maintainability problems and chaos.</p>
<p>Spending too much time reading the docs can be a problem; do you feel like you are
a system integrator rather than a developer?! These are different jobs!</p>
<p>We are writing <a href="https://github.com/pennyphp/penny">penny</a> to share this idea.
This is a middleware, event driven framework to build the perfect
implementation for your specific project. The starting point we chose is made of:</p>
<ul>
<li><a href="https://github.com/zendframework/zend-diactoros">Zend\Diactoros</a> PSR-7 HTTP
library</li>
<li><a href="https://github.com/zendframework/zend-eventmanager">Zend\EventManager</a> to
design the application flow</li>
<li><a href="https://php-di.org">PHP-DI</a> DiC library</li>
<li><a href="https://github.com/nikic/FastRoute">FastRoute</a> because it is fast and easy to
use</li>
</ul>
<p>but we are working to make every part of penny replaceable with the libraries
perfect for your use case.</p>
<p>Are you curious to try this idea? We are writing extensive documentation around penny.
<a href="https://docs.pennyphp.org/en/latest/">docs.pennyphp.org/en/latest</a></p>
<p>And we have a set of use cases:</p>
<ul>
<li><a href="https://github.com/pennyphp/penny-classic-app">pennyphp/penny-classic-app</a>
builds with plates</li>
<li><a href="https://github.com/pennyphp/bookshelf">pennyphp/bookshelf</a> builds with
doctrine, twig</li>
<li><a href="https://github.com/gianarb/twitter-uservice">gianarb/twitter-uservice</a> gets
the last tweet from <code>#AngularConf15</code> hashtag</li>
</ul>
<p><a href="https://github.com/pennyphp/penny/issues?utf8=%E2%9C%93&q=is%3Aissue">Share your experience!</a></p>
vim composer 0.3.0 is readyvim-composer is a plugin to manage integration between composer and vimhttps://gianarb.it/img/vim.png2015-09-15T23:08:27+00:002015-09-15T23:08:27+00:00https://gianarb.it/blog/php-and-vim-composer-release-0-3-0<blockquote align="center" class="twitter-tweet" data-cards="hidden" lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/vimForPHP?src=hash">#vimForPHP</a> <a href="https://t.co/EdczdpCrRc">https://t.co/EdczdpCrRc</a> <a href="https://twitter.com/hashtag/php?src=hash">#php</a> <a href="https://twitter.com/hashtag/vim?src=hash">#vim</a> Release 0.3.0 <a href="https://twitter.com/hashtag/composer?src=hash">#composer</a> plugin is ready! Thanks <a href="https://twitter.com/sensorario">@sensorario</a> for your work!</p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/641674841192574976">September 9, 2015</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>I’m very happy to announce release 0.3.0 of <a href="https://github.com/vim-php/vim-composer">vim-composer</a>.
This plugin builds a good integration between VIM and <a href="https://getcomposer.org">composer</a> the strong dependency manager for PHP.</p>
<h2 id="changelog">Changelog</h2>
<ul>
<li><a href="https://github.com/vim-php/vim-composer/pull/18">#18</a> Added missing ComposerUpdate function</li>
<li><a href="https://github.com/vim-php/vim-composer/pull/21">#21</a> Added missing CONTRIBUTING.md file</li>
<li><a href="https://github.com/vim-php/vim-composer/pull/20">#20</a> Require and/or init commands</li>
</ul>
<p>This plugin now provides a new function to require a specific package. Update it and map the new function <code>:ComposerRequireFunc</code>.</p>
Staging environment on demand with AWS CloudformationEnvironments are ephemeral. They come and go really quickly based on needs. AWS delivers a service called CloudFormation that allows you to easily describe, via a JSON or YAML specification, a lot of AWS resources like EC2, Route53 hosted zones and domains, RDS, VPCs, subnets, and almost everything you normally do via the console. This is infrastructure as code applied to AWS resources: it allows you to version and push to git an entire AWS environment. You can replicate it over and over.https://gianarb.it/img/amazon-aws-logo.jpg2015-07-08T09:08:27+00:002015-07-08T09:08:27+00:00https://gianarb.it/blog/stagin-environment-on-demand-with-aws-cloudformation<blockquote class="twitter-tweet tw-align-center" lang="en"><p lang="en" dir="ltr">Staging environment on demand. To Work on <a href="https://twitter.com/hashtag/AWS?src=hash">#AWS</a> low level with <a href="https://twitter.com/hashtag/cloudformation?src=hash">#cloudformation</a> <a href="https://t.co/VWBR129637">https://t.co/VWBR129637</a> <a href="https://twitter.com/hashtag/cloud?src=hash">#cloud</a> <a href="https://twitter.com/hashtag/devops?src=hash">#devops</a></p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/621691855810494464">July 16, 2015</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<h2 id="staging-environment">Staging Environment</h2>
<p>There are a few environments in my development workflow; today I chose a little example:</p>
<ul>
<li>The production environment always exists; it runs the stable application and you cannot use it for your tests.</li>
<li>The <strong>staging</strong> environment is a “pre-production” state.</li>
<li>The develop environment is unstable and runs new features and fixes; it holds the work of the whole team but it is not ready to go to production.</li>
</ul>
<p><img src="/img/cloudformation-staging/staging.jpg" alt="Staging graph" /></p>
<p>The staging environment, in my opinion, can be a “volatile” version: we use it when our product is ready to go to production, and the rest of the time it sits unused. Maybe this statement doesn't hold in your work, but if you think of a little team of consultants
that works on different projects, these words make sense.</p>
<h2 id="aws-cloudformation">AWS Cloudformation</h2>
<p>CloudFormation is an AWS service that helps you to orchestrate all AWS services: you can write a template in JSON and use it to create an infrastructure with one click.
This solution helps me to build and destroy this environment, so we pay for it only when necessary; if you use <code>staging env == production env</code> it can be very, very expensive.
This solution can help you to cut costs.</p>
<h2 id="current-infrastructure">Current infrastructure</h2>
<p><img src="/img/cloudformation-staging/infra.jpg" alt="RDS and EC2 infrastructure" /></p>
<p>This is my template to build a simple application: a frontend plus MySQL (RDS).
This implementation builds the network configuration and creates one RDS instance and one EC2 instance (my frontend).
The <code>Parameters</code> key is the list of external parameters I can use to configure the template, for example the database settings, the EC2 key pair, and the root password.
The <code>Resources</code> key contains the description of all the actors of this infrastructure.</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json">{
"Parameters" : {
"VPCName" : {
"Type" : "String",
"Default" : "staging",
"Description" : "VPC name"
},
"ProjectName" : {
"Type" : "String",
"Default" : "app",
"Description" : "Project name"
},
"WebKey" : {
"Type" : "String",
"Default" : "web-key",
"Description" : "Ssh key to log into the web instances"
},
"WebInstanceType" : {
"Type" : "String",
"Default" : "m3.medium",
"Description" : "Web instance type"
},
"WebInstanceImage" : {
"Type" : "String",
"Default" : "ami-47a23a30",
"Description" : "Web instance image"
},
"DatabaseInstanceType" : {
"Type" : "String",
"Default" : "db.m3.medium",
"Description" : "Database instance type"
},
"DatabaseName" : {
"Type" : "String",
"Default" : "mydb",
"Description" : "Database instance's name"
},
"DatabaseMasterUsername" : {
"Type" : "String",
"Default" : "gianarb",
"Description" : "Name of master user"
},
"DatabaseEngineVersion" : {
"Type" : "String",
"Default" : "5.6",
"Description" : "MySQL version"
},
"DatabaseUserPassword" : {
"Type" : "String",
"Default" : "test1234",
"Description" : "User password"
},
"DatabasePublicAccess" : {
"Type" : "String",
"Default" : true
},
"DatabaseMultiAZ" : {
"Type" : "String",
"Default" : false
}
},
"Resources" : {
"Staging": {
"Type" : "AWS::EC2::VPC",
"Properties" : {
"CidrBlock" : "10.15.0.0/16",
"EnableDnsSupport" : true,
"EnableDnsHostnames" : true,
"InstanceTenancy" : "default",
"Tags" : [{"Key": "Name", "Value": {"Ref": "VPCName"}}]
}
},
"DatabaseSubnet1": {
"Type" : "AWS::EC2::Subnet",
"Properties" : {
"AvailabilityZone" : "eu-west-1a",
"CidrBlock" : "10.15.1.0/28",
"MapPublicIpOnLaunch" : true,
"VpcId": {
"Ref" : "Staging"
},
"Tags": [{"Key": "Name", "Value": "db-1a"}]
}
},
"DatabaseSubnet2": {
"Type" : "AWS::EC2::Subnet",
"Properties" : {
"AvailabilityZone" : "eu-west-1b",
"CidrBlock" : "10.15.1.16/28",
"MapPublicIpOnLaunch" : true,
"VpcId": {
"Ref" : "Staging"
},
"Tags" : [{"Key": "Name", "Value": "db-1b"}]
}
},
"WebSubnet1": {
"Type" : "AWS::EC2::Subnet",
"Properties" : {
"AvailabilityZone" : "eu-west-1a",
"CidrBlock" : "10.15.0.8/28",
"MapPublicIpOnLaunch" : true,
"VpcId": {
"Ref" : "Staging"
},
"Tags" : [{"Key": "Name", "Value": "web-1a"}]
}
},
"RDSSubnet": {
"Type" : "AWS::RDS::DBSubnetGroup",
"Properties" : {
"DBSubnetGroupDescription": "db-prod-subnet-group",
"SubnetIds" : [
{ "Ref": "DatabaseSubnet1" },
{ "Ref": "DatabaseSubnet2" }
]
}
},
"Database": {
"Type" : "AWS::RDS::DBInstance",
"Properties" : {
"AllocatedStorage": "5",
"AllowMajorVersionUpgrade" : false,
"DBInstanceClass": {"Ref":"DatabaseInstanceType"},
"DBName" : {"Ref":"DatabaseName"},
"DBInstanceIdentifier": {"Ref":"DatabaseName"},
"Engine" : "MySQL",
"EngineVersion" : {"Ref":"DatabaseEngineVersion"},
"DBSubnetGroupName": {
"Ref": "RDSSubnet"
},
"MasterUsername" : {"Ref": "DatabaseMasterUsername"},
"MasterUserPassword" : {"Ref": "DatabaseUserPassword"},
"MultiAZ" : {"Ref": "DatabaseMultiAZ"},
"VPCSecurityGroups": [
{
"Ref": "DatabaseSG"
}
],
"PubliclyAccessible" : {"Ref": "DatabasePublicAccess"},
"Tags" : [{"Key": "Name", "Value": {"Fn::Join":[".", ["db", {"Ref": "ProjectName"}, {"Ref":"VPCName"}]]} }]
}
},
"WebInstance" : {
"Type" : "AWS::EC2::Instance",
"Properties" : {
"ImageId" : {"Ref": "WebInstanceImage"},
"InstanceType" : {"Ref": "WebInstanceType"},
"KeyName" : {"Ref": "WebKey"},
"BlockDeviceMappings" : [
{
"DeviceName" : "/dev/sdm",
"Ebs" : {
"VolumeType" : "io1",
"Iops" : "200",
"DeleteOnTermination" : "false",
"VolumeSize" : "20"
}
},
{
"DeviceName" : "/dev/sdk",
"NoDevice" : {}
}
],
"SubnetId": { "Ref" : "WebSubnet1" },
"SecurityGroupIds": [
{"Ref": "WebSG"}
]
}
},
"StagingZone": {
"Type" : "AWS::Route53::HostedZone",
"Properties" : {
"Name" : {"Fn::Join":[".", [{"Ref": "ProjectName"}, {"Ref":"VPCName"}]]},
"VPCs" : [{"VPCId": {"Ref": "Staging"}, "VPCRegion": "eu-west-1"}]
}
},
"StagingInternetGateway" : {
"Type" : "AWS::EC2::InternetGateway",
"Properties" : {
"Tags" : [ {"Key" : "Name", "Value" : {"Fn::Join":["-", [{"Ref":"VPCName"}, "igw"]]}}]
}
},
"StagingIgwAttach": {
"Type" : "AWS::EC2::VPCGatewayAttachment",
"Properties" : {
"InternetGatewayId" : {"Ref": "StagingInternetGateway"},
"VpcId" : {"Ref": "Staging"}
}
},
"StagingRouteTable": {
"Type" : "AWS::EC2::RouteTable",
"Properties" : {
"VpcId" : {"Ref": "Staging"}
}
},
"LocalRoute": {
"Type" : "AWS::EC2::Route",
"Properties" : {
"DestinationCidrBlock" : "0.0.0.0/0",
"GatewayId" : {"Ref": "StagingInternetGateway"},
"RouteTableId" : {"Ref": "StagingRouteTable"}
}
},
"Web1LocalRoute": {
"Type" : "AWS::EC2::SubnetRouteTableAssociation",
"Properties" : {
"RouteTableId" : {"Ref": "StagingRouteTable"},
"SubnetId" : {"Ref": "WebSubnet1"}
}
},
"Db1LocalRoute": {
"Type" : "AWS::EC2::SubnetRouteTableAssociation",
"Properties" : {
"RouteTableId" : {"Ref": "StagingRouteTable"},
"SubnetId" : {"Ref": "DatabaseSubnet1"}
}
},
"Db2LocalRoute": {
"Type" : "AWS::EC2::SubnetRouteTableAssociation",
"Properties" : {
"RouteTableId" : {"Ref": "StagingRouteTable"},
"SubnetId" : {"Ref": "DatabaseSubnet2"}
}
},
"DatabaseSG": {
"Type" : "AWS::EC2::SecurityGroup",
"Properties" : {
"GroupDescription" : "Database security groups",
"SecurityGroupIngress" : [
{
"IpProtocol" : "tcp",
"FromPort": 3306,
"ToPort" : "3306",
"SourceSecurityGroupId": {"Ref" : "WebSG"}
}
],
"Tags" : [{"Key": "Name", "Value": "db-sg"}],
"VpcId" : {"Ref": "Staging"}
}
},
"WebSG": {
"Type" : "AWS::EC2::SecurityGroup",
"Properties" : {
"GroupDescription" : "Web security groups",
"SecurityGroupIngress" : [
{
"IpProtocol" : "tcp",
"ToPort" : 80,
"FromPort": 80,
"CidrIp" : "0.0.0.0/0"
},
{
"IpProtocol" : "tcp",
"ToPort" : 22,
"FromPort": 22,
"CidrIp" : "0.0.0.0/0"
}
],
"Tags" : [{"Key": "Name", "Value": "web-sg"}],
"VpcId" : {"Ref": "Staging"}
}
},
"DatabaseRecordSet" : {
"Type" : "AWS::Route53::RecordSet",
"Properties" : {
"HostedZoneId" : {
"Ref": "StagingZone"
},
"Comment" : "DNS name for database",
"Name" : {"Fn::Join":[".", ["db", {"Ref": "ProjectName"}, {"Ref":"VPCName"}]]},
"Type" : "CNAME",
"TTL" : "300",
"ResourceRecords" : [
{ "Fn::GetAtt" : [ "Database", "Endpoint.Address"]}
]
}
}
}
}</code></pre></figure>
<h2 id="conclusion">Conclusion</h2>
<p>You can load this template into your account and, after the environment is created, you are ready to work with one EC2 instance and one RDS instance running MySQL 5.6.
You can log into the web instance with the key pair chosen during the creation flow (the default key pair is defined by the <code>WebKey</code> parameter), and I set these MySQL credentials as defaults:</p>
<ul>
<li>user gianarb</li>
<li>password test1234</li>
</ul>
<p>But you can change them before running the template, because they are <code>Parameters</code>.
In my opinion this approach is very powerful: you can version your infrastructure and delete and restore it quickly, because deleting the CloudFormation stack removes all of its resources. It is very easy!</p>
<h2 id="trick">Trick</h2>
<p>The <code>Parameters</code> node creates a form in the AWS CloudFormation console where you can choose many variable values, for example instance names or the key pair to log into your EC2 instances.</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json">{
"Parameters" : {
"VPCName" : {
"Type" : "String",
"Default" : "staging",
"Description" : "VPC name"
},
"ProjectName" : {
"Type" : "String",
"Default" : "app",
"Description" : "Project name"
},
"WebKey" : {
"Type" : "String",
"Default" : "web-key",
"Description" : "Ssh key to log into the web instances"
}
}</code></pre></figure>
<hr class="style-two" />
<p>The <code>Resources</code> node contains all the elements of your infrastructure: EC2, RDS, VPC, and so on. You can use the parameters with a simple <code>Ref</code>,
e.g. <code>[{"Key": "Name", "Value": {"Ref": "ProjectName"}}]</code> reads the project name entered in the parameter form.</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json">{
"Resources" : {
"Staging": {
"Type" : "AWS::EC2::VPC",
"Properties" : {
"CidrBlock" : "10.15.0.0/16",
"EnableDnsSupport" : true,
"EnableDnsHostnames" : true,
"InstanceTenancy" : "default",
"Tags" : [{"Key": "Name", "Value": {"Ref": "VPCName"}}]
}
},
"DatabaseSubnet1": {
"Type" : "AWS::EC2::Subnet",
"Properties" : {
"AvailabilityZone" : "eu-west-1a",
"CidrBlock" : "10.15.1.0/28",
"MapPublicIpOnLaunch" : true,
"VpcId": {
"Ref" : "Staging"
},
"Tags": [{"Key": "Name", "Value": "db-1a"}]
}
}
}</code></pre></figure>
<hr class="style-two" />
<p>In your template you can describe a VPC and create its subnets. You can also reference one resource from another to build on it:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json">"WebSubnet1": {
"Type" : "AWS::EC2::Subnet",
"Properties" : {
"AvailabilityZone" : "eu-west-1a",
"CidrBlock" : "10.15.0.8/28",
"MapPublicIpOnLaunch" : true,
"VpcId": {
"Ref" : "Staging"
},
"Tags" : [{"Key": "Name", "Value": "web-1a"}]
}
},</code></pre></figure>
<p>In this example I referenced the <code>Staging</code> VPC to build its subnet.</p>
<hr class="style-two" />
<p>This part is interesting: it creates a RecordSet that maps a CNAME DNS record inside your VPC, so your web instances can resolve the MySQL host as <code>db.app.staging</code>.</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json">"DatabaseRecordSet" : {
"Type" : "AWS::Route53::RecordSet",
"Properties" : {
"HostedZoneId" : {
"Ref": "StagingZone"
},
"Comment" : "DNS name for database",
"Name" : {"Fn::Join":[".", ["db", {"Ref": "ProjectName"}, {"Ref":"VPCName"}]]},
"Type" : "CNAME",
"TTL" : "300",
"ResourceRecords" : [
{ "Fn::GetAtt" : [ "Database", "Endpoint.Address"]}
]
}
}</code></pre></figure>
<p><br />
<br />
<br /></p>
<div class="well"><a target="_blank" href="https://twitter.com/EmanueleMinotto">@EmanueleMinotto</a> thanks for trying to fix my bad English</div>
Build your Zend Framework Console ApplicationZF Console is a component written by the zf-campus and Apigility organization that helps you build console applications using different Zend Framework componentshttps://gianarb.it/img/zf.jpg2015-05-21T23:08:27+00:002015-05-21T23:08:27+00:00https://gianarb.it/blog/zendframework-console-app<blockquote class="twitter-tweet tw-align-center" lang="en"><p lang="en" dir="ltr">Blogpost about console-skeleton-app for your console application <a href="https://t.co/WuVq0GZlxE">https://t.co/WuVq0GZlxE</a> <a href="https://twitter.com/hashtag/PHP?src=hash">#PHP</a> <a href="https://twitter.com/hashtag/ZF?src=hash">#ZF</a> <a href="https://twitter.com/hashtag/console?src=hash">#console</a> <a href="https://twitter.com/hashtag/develop?src=hash">#develop</a></p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/613292048708468736">June 23, 2015</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<div class="alert alert-success" role="alert"><strong>Github: </strong>Article written about <a target="_blank" href="https://github.com/gianarb/console-skeleton-app">console-skeleton-app</a> 1.0.0</div>
<p>I’m writing a skeleton app to build console/bash applications in PHP.
This project is very simple and it depends on ZF\Console, a zfcampus project, and Zend\Console, built by the ZF community.
I have a todo list for the future, but for the time being this is just a blog post about these two modules.</p>
<ul>
<li>Integration with container system to manage dependency injection</li>
<li>Docs to test your command</li>
<li>Use cases and different implementations</li>
</ul>
<h2 id="zfconsole-and-other-components">ZF\Console and other components</h2>
<ul>
<li><a href="https://github.com/zfcampus/zf-console">ZF\Console</a> is maintained by zfcampus and it is used by Apigility</li>
<li><a href="https://github.com/zendframework/zend-console">zendframework\zend-console</a> is maintained by zendframework; all the info is in the <a href="https://framework.zend.com/manual/current/en/modules/zend.console.introduction.html">documentation</a></li>
</ul>
<h2 id="tree">Tree</h2>
<p>This is my proposed folder structure. There are three entry points in the <code>bin</code> directory: one for bash, one for PHP, and a .bat file for Windows.
I use Composer to manage my dependencies, and I included the .lock file because this project is an APPLICATION, not a library.
The <code>/config</code> directory contains only routing definitions, but in the future we can add services and other configuration.
<code>src/Command/</code> contains my commands.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">├── bin
│ └── console.php
├── composer.json
├── composer.lock
├── config
│ └── routes.php
├── src
│ └── Command
│ ├── Conf.php
│ ├── Database.php
│ └── Download.php
└── vendor
└── ...</code></pre></figure>
<h2 id="bootstrap">Bootstrap</h2>
<p>The application’s entry points are just examples and they require a few changes.
First we have to change the version in the parameters.php configuration file and also change the application name <code>'app'</code> to whatever fits.
To load configurations from different sources I will use the well-known <code>Zend\Config</code> component.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
require __DIR__.'/../vendor/autoload.php';

use Zend\Console\Console;
use ZF\Console\Application;
use ZF\Console\Dispatcher;

$version = '0.0.1';

$application = new Application(
    'app',
    $version,
    include __DIR__ . '/../config/routes.php',
    Console::getInstance(),
    new Dispatcher()
);

$exit = $application->run();
exit($exit);</code></pre></figure>
<h2 id="routes">Routes</h2>
<p><code>config/routes.php</code> contains router configurations. This is just an example but you can see all options <a href="https://github.com/zfcampus/zf-console#defining-console-routes">here</a>.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
return [
    [
        'name' => 'hello',
        'route' => "--name=",
        'short_description' => "Good morning!! This is a beautiful day",
        "handler" => ['App\Command\Hello', 'run'],
    ],
];</code></pre></figure>
<h2 id="command">Command</h2>
<p>A basic command to wish you a good day!
I decided that a command does not extend any class because, in my opinion, this is a good way to preserve readability and simplicity.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
namespace App\Command;

use ZF\Console\Route;
use Zend\Console\Adapter\AdapterInterface;

class Hello
{
    public static function run(Route $route, AdapterInterface $console)
    {
        $name = $route->getMatchedParam("name", "@gianarb");
        $console->writeLine("Hi {$name}, you have called me. Now this is an awesome day!");
    }
}</code></pre></figure>
<h2 id="troubleshooting-and-tricks">Troubleshooting and tricks</h2>
<ul>
<li>OS X returns an error because zf-console uses a function disabled in the Mac OS PHP installation. Have a look at PR <a href="https://github.com/zfcampus/zf-console/pull/22">#22</a></li>
<li>See <a href="https://www.sitepoint.com/packaging-your-apps-with-phar/">this</a> article to package your application as a phar archive.</li>
</ul>
<p><br />
<br />
<br /></p>
<div class="well"><a target="_blank" href="https://twitter.com/__debo">@__debo</a> thanks for trying to fix my bad English</div>
Test your Symfony Controller and your service with PhpUnitTest your Symfony Controller with PhpUnit. You expect that if one parameter is true your action gets a service via Dependency Injection and uses it!https://gianarb.it/img/symfony.png2015-05-21T23:08:27+00:002015-05-21T23:08:27+00:00https://gianarb.it/blog/symfony-unit-test-controller-with-phpunit<blockquote align="center" class="twitter-tweet" lang="en"><p lang="en" dir="ltr">Unit <a href="https://twitter.com/hashtag/test?src=hash">#test</a> for your <a href="https://twitter.com/hashtag/Controller?src=hash">#Controller</a> with <a href="https://twitter.com/hashtag/PhpUnit?src=hash">#PhpUnit</a> and <a href="https://twitter.com/hashtag/Symfony?src=hash">#Symfony</a>.. With a little use case of <a href="https://twitter.com/hashtag/DepedenceInjaction?src=hash">#DepedenceInjaction</a> test <a href="https://t.co/JNb39EyRly">https://t.co/JNb39EyRly</a> <a href="https://twitter.com/hashtag/php?src=hash">#php</a></p>— Gianluca Arbezzano (@GianArb) <a href="https://twitter.com/GianArb/status/601526550438215680">May 21, 2015</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>In this article I would like to share with you a little experience with:</p>
<ul>
<li>Symfony MVC</li>
<li>PhpUnit</li>
<li>Symfony Dependency Injection</li>
</ul>
<p>This is an example of a very simple controller.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
namespace AppBundle\Controller;

use FOS\RestBundle\Controller\Annotations as Rest;
use FOS\RestBundle\Controller\FOSRestController;
use Symfony\Component\HttpFoundation\Request;

class SomeStuffController extends FOSRestController
{
    /**
     * @Rest\Post("/go")
     * @return array
     */
    public function goAction(Request $request)
    {
        if ($this->container->getParameter("do_stuff")) {
            $body = $this->container->get("stuff.service")->splash($request->getContent());
        }

        return [];
    }
}</code></pre></figure>
<p><code>$this->container->getParameter("do_stuff")</code> is a boolean parameter that enables or disables a feature. How can I test this snippet?
I could try to write a functional test, but in my opinion it is easier to write a series of unit tests with PHPUnit to validate my expectations.</p>
<h2 id="expectations">Expectations</h2>
<ul>
<li>If the <code>do_stuff</code> parameter is false, the <code>get</code> method of my container will be called zero times</li>
<li>If the <code>do_stuff</code> parameter is true, the <code>get</code> method of my container will be called exactly once</li>
</ul>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
namespace AppBundle\Tests\Controller;

use Liip\FunctionalTestBundle\Test\WebTestCase;
use AppBundle\Controller\SomeStuffController;

class SomeStuffControllerTest extends WebTestCase
{
    public function testDoStuffIsTrue()
    {
        $request = $this->getMock("Symfony\Component\HttpFoundation\Request");
        $container = $this->getMock("Symfony\Component\DependencyInjection\ContainerInterface");
        $service = $this->getMockBuilder("Some\Stuff")->disableOriginalConstructor()->getMock();

        $container->expects($this->once())
            ->method("getParameter")
            ->with($this->equalTo('do_stuff'))
            ->will($this->returnValue(true));
        $container->expects($this->once())
            ->method("get")
            ->with($this->equalTo('stuff.service'))
            ->will($this->returnValue($service));

        $controller = new SomeStuffController();
        $controller->setContainer($container);
        $controller->goAction($request);
    }
}</code></pre></figure>
<p>This is my first expectation: “if <code>do_stuff</code> is true, I call <code>stuff.service</code>”.
In this controller I use a few objects: the Http\Request, the container, and <code>stuff.service</code>, which in this example is a <code>Some\Stuff</code> class.
In the first step I created one mock for each object.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
$request = $this->getMock("Symfony\Component\HttpFoundation\Request");
$container = $this->getMock("Symfony\Component\DependencyInjection\ContainerInterface");
$service = $this->getMockBuilder("Some\Stuff")->disableOriginalConstructor()->getMock();</code></pre></figure>
<p>In the second step I wrote my first expectation: “call the <code>getParameter</code> method of <code>$container</code> exactly once, with the argument <code>do_stuff</code>, and return true”.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
$container->expects($this->once())
->method("getParameter")
->with($this->equalTo('do_stuff'))
->will($this->returnValue(true));</code></pre></figure>
<p>Thanks to this definition I know there will be another effect: my action will call <code>$container->get("stuff.service")</code> exactly once, and it will return a Some\Stuff object.</p>
<p>The second test we can write is: “if <code>do_stuff</code> is false, <code>$container->get("stuff.service")</code> will never be called”.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
public function testDoStuffIsFalse()
{
    $request = $this->getMock("Symfony\Component\HttpFoundation\Request");
    $container = $this->getMock("Symfony\Component\DependencyInjection\ContainerInterface");
    $service = $this->getMockBuilder("Some\Stuff")->disableOriginalConstructor()->getMock();

    $container->expects($this->once())
        ->method("getParameter")
        ->with($this->equalTo('do_stuff'))
        ->will($this->returnValue(false));
    $container->expects($this->never())
        ->method("get")
        ->with($this->equalTo('stuff.service'))
        ->will($this->returnValue($service));

    $controller = new SomeStuffController();
    $controller->setContainer($container);
    $controller->goAction($request);
}</code></pre></figure>
The price of modularityModularity is not only a beautiful word: it has rules, and it is a methodology that helps you keep your project easy to understand. Modularity is a key principle for easy onboarding of new developers, because the application will look separated and simpler to approach. As a developer you should think about how your scaffolding and your code look from the outside, because in reality you read much more code than you write.https://gianarb.it/img/gianarb.png2015-02-21T23:08:27+00:002015-02-21T23:08:27+00:00https://gianarb.it/blog/the-price-of-modularity<p>Today all frameworks are <strong>modular</strong>, but it isn’t just a beautiful word; behind it there are a lot of concepts and ideas:</p>
<ul>
<li>Modularity helps you reuse parts of code in different projects</li>
<li>Every component is independent, so you can work on a single part of the code</li>
<li>Every <strong>component</strong> solves a specific problem… a beautiful concept that helps you with maintenance!</li>
<li>and more…</li>
</ul>
<p>As you can imagine there is a drawback: all this requires a big effort.
Ideally every component needs its own release cycle, repository, commits, pull requests, Travis configuration, documentation, etc.</p>
<p>Anyway, several shortcuts are available. For instance, <em>git subtree</em> could help you in this war, but the key is this: you need a shared agreement to win.</p>
<p>The Zend Framework community chose another road; at the moment <code>Zend\Mvc</code> requires:</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json">{
"name": "zendframework/zend-mvc",
"...": "...",
"target-dir": "Zend/Mvc",
"require": {
"php": ">=5.3.23",
"zendframework/zend-eventmanager": "self.version",
"zendframework/zend-servicemanager": "self.version",
"zendframework/zend-form": "self.version",
"zendframework/zend-stdlib": "self.version"
},
"require-dev": {
"zendframework/zend-authentication": "self.version",
"zendframework/zend-console": "self.version",
"zendframework/zend-di": "self.version",
"zendframework/zend-filter": "self.version",
"zendframework/zend-http": "self.version",
"zendframework/zend-i18n": "self.version",
"zendframework/zend-inputfilter": "self.version",
"zendframework/zend-json": "self.version",
"zendframework/zend-log": "self.version",
"zendframework/zend-modulemanager": "self.version",
"zendframework/zend-session": "self.version",
"zendframework/zend-serializer": "self.version",
"zendframework/zend-text": "self.version",
"zendframework/zend-uri": "self.version",
"zendframework/zend-validator": "self.version",
"zendframework/zend-version": "self.version",
"zendframework/zend-view": "self.version"
},
"suggest": {
"zendframework/zend-authentication": "Zend\\Authentication component for Identity plugin",
"zendframework/zend-config": "Zend\\Config component",
"zendframework/zend-console": "Zend\\Console component",
"zendframework/zend-di": "Zend\\Di component",
"zendframework/zend-filter": "Zend\\Filter component",
"...": "..."
},
"...": "..."
}</code></pre></figure>
<p>A few <code>require-dev</code> dependencies are used inside the component to run some features. Why? This forces me to ask: <em>“are the dependencies of this feature included or not?”</em>
Composer was born to solve this! In my opinion the cost of that question is higher than the cost of downloading a few unused classes.
But are there a lot of unused classes? Maybe too many?</p>
<p>Even if the right answer doesn’t exist, I think some indicators may help you understand when it is time to split a component:</p>
<ul>
<li>List of dependencies</li>
<li>Complexity of component</li>
<li>Features</li>
<li>..</li>
</ul>
<p>No shortcuts.</p>
Zend Framework release 2.3.4Zend Framework release 2.3.4https://gianarb.it/img/zf.jpg2015-01-14T00:00:00+00:002015-01-14T00:00:00+00:00https://gianarb.it/blog/zf2-release-234<p>Zend Framework 2.3.4 is ready! After 4 months, the new patch version of ZF2 has been
published.</p>
<p>As with every patch release there are no big new features, but the list of <a href="https://github.com/zendframework/zf2/pulls?q=is%3Aclosed+is%3Apr+milestone%3A2.3.4+">pull
requests</a>
is very long.</p>
<ul>
<li><a href="https://github.com/zendframework/zf2/pull/7112">#7112</a> You can find the official ZF logo in the /resources directory</li>
<li><a href="https://github.com/zendframework/zf2/pull/7087">#7087</a> Happy new year by ZF!</li>
<li><a href="https://github.com/zendframework/zf2/issues/6673">#6673</a> <code>Zend\Http\Header</code> now supports the DateTime format for cookie expiration</li>
</ul>
<p>Zend Framework follows <a href="https://semver.org/">semver</a> directives; <code>2.3.4</code> is a patch
release, and this version contains a long list of <a href="https://github.com/zendframework/zf2/pulls?q=is%3Aclosed+is%3Apr+milestone%3A2.3.4+label%3Abug">bug
fixes</a>.</p>
<p>Enjoy downloading <a href="https://github.com/zendframework/zf2/releases/tag/release-2.3.4">Zend Framework
2.3.4</a>; this is
the
<a href="https://github.com/zendframework/zf2/blob/18534b6f2c14f52898bb208932fedacd5324be63/CHANGELOG.md">changelog</a>.</p>
Influx DB and PHP implementationInfluxDB is a popular open source time series database capable of storing millions of points while keeping lookups fast. It supports a SQL-like query language and exposes an HTTP API to interact with it. At Corley we wrote a PHP SDK and released it as open source to integrate InfluxDB into your PHP application.https://gianarb.it/img/influxdb.png2015-01-06T00:00:00+00:002015-01-06T00:00:00+00:00https://gianarb.it/blog/InfluxDB-and-PHP<p>InfluxDB is a <a href="http://en.wikipedia.org/wiki/Time_series_database">time series
database</a> written in Go.</p>
<p>It supports SQL-like queries and it has different entry points: a REST API (TCP
protocol) and UDP.</p>
<div class="row">
<div class="col-md-4 col-md-offset-3"><img class="img-fluid" src="/img/influxdb.png" /></div>
</div>
<p>We wrote an <a href="https://github.com/corley/influxdb-php-sdk">SDK</a> to manage the
integration between InfluxDB and PHP.</p>
<p>It supports a Guzzle adapter, but if you use Zend\Client you can write your own
implementation.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
$guzzle = new \GuzzleHttp\Client();
$options = new Options();
$adapter = new GuzzleAdapter($guzzle, $options);
$client = new Client();
$client->setAdapter($adapter);</code></pre></figure>
<p>In this case we are using a Guzzle client and we communicate with InfluxDB over TCP, but we can also speak to it over UDP:</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
$options = new Options();
$adapter = new UdpAdapter($options);
$client = new Client();
$client->setAdapter($adapter);</code></pre></figure>
<p>Both of them have the same usage:</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
$client->mark("app.search", $points, "s");</code></pre></figure>
<p>The first difference between TCP and UDP is well known: TCP expects a response after a
request, while UDP does not expect anything, so there is no delivery
guarantee. If you can accept this trade-off, this is the benchmark:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">Corley\Benchmarks\InfluxDB\AdapterEvent
Method Name Iterations Average Time Ops/second
------------------------ ------------ -------------- -------------
sendDataUsingHttpAdapter: [1,000 ] [0.0026700308323] [374.52751]
sendDataUsingUdpAdapter : [1,000 ] [0.0000436344147] [22,917.69026]</code></pre></figure>
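<p>A quick way to see why UDP is so much faster: the send call returns as soon as the datagram is handed to the kernel, with no acknowledgement, even when nobody is listening. The sketch below (plain Python inside a shell heredoc; the host, port, and payload are made up and are not the SDK's real wire format) illustrates the fire-and-forget behaviour behind the UDP adapter numbers.</p>

```shell
# Fire-and-forget: sendto() returns immediately; no response is read,
# which is also why there is no delivery guarantee.
python3 - <<'EOF'
import socket, time

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
start = time.time()
# Illustrative payload and endpoint -- not the SDK's exact format.
s.sendto(b'{"name":"app.search","points":[[1]]}', ("127.0.0.1", 4444))
print("sent, no ack expected (%.6fs)" % (time.time() - start))
EOF
```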
Zf2 Event, base useIntegrating an event system in your application is a good way to decouple and extend it while keeping it clean and clear. An event manager allows you to trigger and catch events based on what your application does. You can, for example, send different kinds of notifications (email, Slack messages, and so on) from the same event, like 'user registration'. Zend Framework, a popular open source PHP framework, has a component called EventManager that helps you integrate such a flow.https://gianarb.it/img/zf.jpg2013-11-21T12:38:27+00:002013-11-21T12:38:27+00:00https://gianarb.it/blog/Zf2-Event-base-use<p>Hi! Some months ago I wrote a gist to help me remember the basic use of
Events and the EventManager in Zend Framework; in this article I reproduce that
small tutorial.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
require_once __DIR__."/vendor/autoload.php";

class Foo
{
    /** @var \Zend\EventManager\EventManagerInterface */
    protected $eventManager;

    public function getEventManager()
    {
        if (!$this->eventManager instanceof \Zend\EventManager\EventManagerInterface) {
            $this->eventManager = new \Zend\EventManager\EventManager();
        }
        return $this->eventManager;
    }

    public function echoHello()
    {
        $this->getEventManager()->trigger(__FUNCTION__."_pre", $this);
        echo "Hello";
        $this->getEventManager()->trigger(__FUNCTION__."_post", $this);
    }
}

$foo = new Foo();
$foo->getEventManager()->attach('echoHello_pre', function($e){
    echo "Wow! ";
});
$foo->getEventManager()->attach('echoHello_post', function($e){
    echo ". This example is very good! \n";
});
// Priority -10: this listener runs after the default-priority one above.
$foo->getEventManager()->attach('echoHello_post', function($e){
    echo "\nby gianarb92@gmail.com \n";
}, -10);

$foo->echoHello();</code></pre></figure>
<p>The result:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gianarb@GianArb-2 eventTest :) $ php try.php
Wow! Hello. This example is very good!
by gianarb92@gmail.com</code></pre></figure>
<p><a href="https://framework.zend.com/manual/2.0/en/modules/zend.event-manager.event-manager.html">@see Zend Event Manager Ref</a></p>
Git global gitignoreGit is the most popular code version control system. It helps you manage and share your code, writing a history of its evolution over time. It also allows teams to work together, managing conflicts and large codebases. Depending on language, project, or operating system, there are files you should never commit, such as .DS_Store on Mac. You can set up a user-level gitignore file to keep them out.https://gianarb.it/img/git.png2013-11-21T12:38:27+00:002013-11-21T12:38:27+00:00https://gianarb.it/blog/Git-globa-gitignore<p><code>.gitignore</code> helps me manage my commits by declaring which files or
directories must not end up in my repository. Here are two good practices if you work, for
example, on an open source project:</p>
<ul>
<li>Don’t commit your IDE configuration files</li>
<li>Don’t use the repository’s .gitignore file to exclude IDE configuration, because this is a
personal concern: there are many different IDEs, and if every developer excludes their files
at the repository level, the list gets very long.</li>
</ul>
<p>I follow these practices in all my projects. If you are a Mac user you have
.DS_Store files scattered around, and there is a way to exclude them by default.</p>
<p><code>~/.gitconfig</code> is your configuration file; every user has one. If you execute this command:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">$ git config --global core.excludesfile ~/.gitignore_global</code></pre></figure>
<p>Git writes these lines into that file:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">[core]
excludesfile = /Users/gianarb/.gitignore_global</code></pre></figure>
<p><code>/Users/gianarb/.gitignore_global</code> is my global gitignore file!</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># IDE #
#######
.idea
# COMPOSER #
############
composer.phar
# OS generated files #
######################
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db</code></pre></figure>
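<p>To confirm that a global excludes file really takes effect, you can ask Git which rule ignores a given path. Below is a minimal sketch using a throwaway repository (the temporary paths are illustrative only); the <code>-c</code> flag scopes <code>core.excludesfile</code> to a single command, so your real configuration is left untouched:</p>

```shell
# Build a throwaway repo and a standalone excludes file
tmp=$(mktemp -d)
printf '.DS_Store\n.idea\n' > "$tmp/gitignore_global"
git init -q "$tmp/demo"
cd "$tmp/demo"
touch .DS_Store

# check-ignore -v prints "source:line:pattern<TAB>path" for the rule that matched
out=$(git -c core.excludesfile="$tmp/gitignore_global" check-ignore -v .DS_Store)
echo "$out"
```

<p>If everything is wired up, the output should point at <code>gitignore_global:1:.DS_Store</code>, i.e. the first pattern of the excludes file matched the path.</p>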
Vagrant Up, slide and first talkThis is my first public talk, delivered at the PHP User Group Milan. It is about how to set up a local environment using Vagrant as an automation tool. A well set up local environment is a must-have to develop your application quickly. With Vagrant you write infrastructure as code to provision your environment. You can push that code to a version control system such as Git to share it with your colleagues.https://gianarb.it/img/vagrant-logo.png2013-09-14T12:08:27+00:002013-09-14T12:08:27+00:00https://gianarb.it/blog/vagrant-up-talk-milano<h3>Vagrant Up</h3>
<p>On Friday 12 September 2013 I gave a talk about Vagrant, a tool for managing VMs; these are my slides.
I thank <a href="https://milano.grusp.org/" target="_blank">PugMi</a> for the opportunity. If you have questions, I'm here! :grin: </p>
<iframe style="display:block; margin: 0 auto;" src="https://www.slideshare.net/slideshow/embed_code/26159972" width="597" height="486" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen webkitallowfullscreen mozallowfullscreen> </iframe> <div style="margin-bottom:5px;text-align:center;"> <strong> <a href="https://www.slideshare.net/GianlucaArbezzano/presentazione-def-26159972" title="Vagrant - PugMI" target="_blank">Vagrant - PugMI</a> </strong> from <strong><a href="https://www.slideshare.net/GianlucaArbezzano" target="_blank">Gianluca Arbezzano</a></strong> </div>
<br/><br/>
<blockquote class="twitter-tweet" align="center"><p>quasi 30 persone a sentire <a href="https://twitter.com/GianArb">
@GianArb</a> parlare di <a href="https://twitter.com/search?q=%23vagrant&src=hash">#vagrant</a> <a href="https://twitter.com/search?q=%23php&src=hash">#php</a> <a href="https://twitter.com/search?q=%23pugMi&src=hash">#pugMi</a> <a href="https://twitter.com/search?q=%23milano&src=hash">#milano</a> <a href="https://t.co/75MOJiZmDZ">pic.twitter.com/75MOJiZmDZ</a></p>— Milano PHP (@MilanoPHP) <a href="https://twitter.com/MilanoPHP/statuses/378213865072656385">September 12, 2013</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
Zend Framework 2 - Console usage a speed helpCLI tools are an easy way to interact with an application because you can guide users, or even other developers, in a well-known direction. It is a very good way to reduce possible mistakes. Zend Framework 2, a PHP open source framework, has a Console package that helps you address common issues like argument management and command parsing, and lets you format nice colored output.https://gianarb.it/img/zf.jpg2013-08-22T08:08:27+00:002013-08-22T08:08:27+00:00https://gianarb.it/blog/zf2-console-usage-speed-help<p>With Zend Framework 2 it is very easy to write a command line tool to manage
different things. But what if there are many commands? How do you remember them
all?</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
namespace ModuleTest;

use Zend\Console\Adapter\AdapterInterface;

class Module
{
    public function getConsoleUsage(AdapterInterface $console)
    {
        return array(
            array('test <params1> <params2> [--params=]', 'Description of test command'),
            array('run <action>', 'Start an action'),
        );
    }
}</code></pre></figure>
<p>You can add this method to your module's Module.php file to get a basic usage
helper that shows up when you type a wrong command.</p>
<p>English by Rali :smile: Thanks!!!! :smile:</p>
Generate Jekyll sitemap without pluginEvery site should have a sitemap to tell search engines like Google about news and updates from your site. With a static site generator such as Jekyll you need to generate the sitemap statically too. This article explains how to write a template that generates a sitemap.https://gianarb.it/img/jekyll.png2013-08-09T09:38:27+00:002013-08-09T09:38:27+00:00https://gianarb.it/blog/Generate-jekyll-sitemap-without-plugin<p>This is a static blog hosted on GitHub Pages, and GitHub Pages sites are generally
deployed using Jekyll.</p>
<h3 id="how-can-you-generate-a-sitemap-without-jekyll-plugin">How can you generate a sitemap without Jekyll plugin?</h3>
<p>This <a href="https://gist.github.com/GianArb/6172377">gist</a> answers your question.</p>
<p>I use a few post variables: changefreq, date, and priority. If you don’t set
specific values for them, defaults are used: 0.8 for priority and monthly for
frequency.</p>
<p>In a single post you add these parameters to the front matter to set the correct values:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">---
layout: post
title: "Why this blog?"
date: 2013-07-22 23:08:27
categories: me
tags: me, developer, presentation, gianarb
summary: Gianluca Arbezzano, developer, Italian, why open this blog?
changefreq: monthly
---</code></pre></figure>
<p>If you want to know more about the Sitemap Protocol read
<a href="https://www.sitemaps.org/protocol.html">this</a>.</p>
<p><a href="https://github.com/MarcoDeBortoli">Marco</a> thanks for English! :)</p>
Zend Framework 2 - How do you implement log service?Logging is a requirement for every application, in PHP and in every other language. It is the way your application tells you what it's doing. This article is about how to implement a logger in a Zend Framework 2 application in PHP. This solution achieves simplicity and usability.https://gianarb.it/img/zf.jpg2013-07-26T23:08:27+00:002013-07-26T23:08:27+00:00https://gianarb.it/blog/how-do-you-implement-log-service<p>A log system is an essential element of any application. It is a way to check
the status and usage of the application. For a basic implementation you can refer
to the fig-standards organization's
<a href="https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-3-logger-interface.md">PSR-3</a>
document, which describes the logger interface.</p>
<p>Zend Framework 2 implements a <a href="https://github.com/zendframework/zf2/tree/master/library/Zend/Log">Logger
Component</a>;
the following is an example of how to use it with the Service Manager.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
return array(
    'service_manager' => array(
        'abstract_factories' => array(
            'Zend\Log\LoggerAbstractServiceFactory',
        ),
    ),
    'log' => array(
        'Log\App' => array(
            'writers' => array(
                array(
                    'name' => 'stream',
                    'priority' => 1000,
                    'options' => array(
                        'stream' => 'data/app.log',
                    ),
                ),
            ),
        ),
    ),
);</code></pre></figure>
<p><a href="https://github.com/zendframework/zf2/blob/master/library/Zend/Log/LoggerServiceFactory.php">LoggerAbstractServiceFactory</a>
is an abstract service factory: it registers Logger instances with the Service
Manager so they can be used throughout the whole application. Log\App is the name
of a single logger, and the writer is the adapter that decides how entries are
written; in this case everything goes to a file, but you could use a DB adapter
and write your log to a database.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
namespace GianArb\Controller;

use Zend\Mvc\Controller\AbstractActionController;

class GeneralController extends AbstractActionController
{
    public function testAction()
    {
        $logger = $this->getServiceLocator()->get('Log\App');
        $logger->log(\Zend\Log\Logger::INFO, "This is a little log!");
    }
}</code></pre></figure>
<p>With this configuration, Log\App writes the string to the data/app.log file
with the INFO priority. These are the priorities available by default:</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
protected $priorities = array(
    self::EMERG  => 'EMERG',
    self::ALERT  => 'ALERT',
    self::CRIT   => 'CRIT',
    self::ERR    => 'ERR',
    self::WARN   => 'WARN',
    self::NOTICE => 'NOTICE',
    self::INFO   => 'INFO',
    self::DEBUG  => 'DEBUG',
);</code></pre></figure>
<p>Using the different priorities is good practice because it makes it very easy
to write filters or log categories.</p>
<p>Another good practice, valid for all services in general, is to create your
own class extending the service:</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
use Zend\Log\Logger;

class MyLogger extends Logger
{
}</code></pre></figure>
<p>This choice makes it easier to manage future customizations of the service
and adds another layer of protection against unexpected updates.</p>
<p>Rali, thanks for your help with my robotic english! :P</p>
Why this blog?This is my first article on Jekyll, a new static site generator that I have selected as the engine for my site to replace WordPress.https://gianarb.it/img/myselfie.jpg-large2013-07-22T23:08:27+00:002013-07-22T23:08:27+00:00https://gianarb.it/blog/why-this-blog<p>Hi! I’m Gianluca, aka <a href="https://twitter.com/gianarb">GianArb</a>. I’m a web developer
working with PHP, SQL, and NoSQL databases, and at the moment I’m crazy about DevOps,
Vagrant, and Chef, so to manage these tools I’m learning Ruby.</p>
<h3 id="why-this-blog">Why this blog?</h3>
<p>I’m opening this blog because my English is terrible! I have an Italian
<a href="/">blog</a> on WordPress, but I’d like to use Jekyll and this is a
good opportunity to share my experience and my job, and to grow and improve my
skills! Can you help me with my English? :P</p>
<h3 id="skills-and-interests">Skills and interests</h3>
<p>This is a list of my skills and interests, which I am sure will all be topics
for my posts; I hope you will enjoy reading them! PHP, tech, HTML, CSS, JS, Open
Source, ZF2, Doctrine, Symfony, NoSQL (Mongo, Couch…), SQL databases, Redis,
DevOps, Chef, Vagrant, Composer, TDD…</p>
<h3 id="open-source-world">Open Source world!</h3>
<p>Community and Open Source are my passion! A community is a great way to
challenge myself as a person and as a coder. You can find me on
<a href="https://github.com/gianarb">GitHub</a>!</p>
<h3 id="this-is-my-face">This is my face!</h3>
<div style="text-align:center;">
<img src="/img/posts/2013-07-19-why-this-blog.png" width="90%" />
</div>