Reproducible Builds

GDG Berlin Golang

20 April 2015

Dave Cheney

Warning: this presentation contains nuts

I feel a certain degree of trepidation on stage today, not just because of the size of the audience I am addressing, but because of the subject I will be discussing.

Dependency management in Go is the single most common question I have been asked, consistently, for several years now.

Dependency management is the equivalent of Python's GIL: a problem that everyone has, but one that has not been solved.

What I see are many people actively choosing to stay "standard library only", and I worry that more are silently sitting on the fence until a solution is found.

Agenda

Problem statement

Repeatable builds

I have a requirement that at any time I can fetch the entire graph of source that went into a program, feed that to a compiler and produce a program that is identical to one created in the past.

This is the requirement I have, and this is the motivation for this talk. If you don't have this requirement, that's fine.

The plethora of tools that exist in this space shows that Go programmers have multiple, sometimes overlapping, requirements. Again, that is fine. This is my solution for my requirements; I hope I can convince you of its utility to you, but if you don't share my requirements, I may not be successful in my arguments.

Out of scope

Why I don't have reliable builds today

OK, so now that I've told you what I want, I need to explain why I don't feel that I have it today.

import "github.com/pkg/sftp" // yes, but which revision?

The most obvious reason is that the import statement inside a Go package does not give go get enough information to choose, from the set of revisions available in a remote code repository, the specific revision to fetch.

That information simply isn't there.
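To make this concrete, the sketch below uses the standard go/parser package to read a file's import declarations. An ast.ImportSpec holds only an optional local name, the quoted path, comments and position information, so the path string is all a tool like go get has to work with. The function name importPaths is hypothetical.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importPaths parses Go source and returns the import paths it declares.
// Nothing in an ast.ImportSpec could carry a revision, tag or hash.
func importPaths(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "main.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range f.Imports {
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	src := `package main

import "github.com/pkg/sftp"
`
	paths, err := importPaths(src)
	if err != nil {
		panic(err)
	}
	// The only input a fetching tool gets: a path, no revision.
	fmt.Println(paths)
}
```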

Naming things (part 1)

There are two rules for successful dependency management in Go.

Rule 1: Things that are different must have different import paths.

Who here has written a log or logger package? They might all be called "log", but they are not the same package.

This is why we have namespaces: github.com/you/log, github.com/me/log.
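Rule 1 is visible in the standard library itself, used here as a stand-in for github.com/you/log and github.com/me/log: crypto/rand and math/rand both declare package rand, yet they are different packages because their import paths differ. A minimal sketch; the helper names are illustrative.

```go
package main

import (
	crand "crypto/rand"
	"fmt"
	mrand "math/rand"
)

// Both imports declare "package rand"; the local aliases crand and mrand
// only avoid a name clash in this file. The compiler distinguishes the
// packages by their import paths.

// secureBytes fills n bytes from the operating system's CSPRNG.
func secureBytes(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := crand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

// dieRoll returns a pseudo-random number in [1, 6]; not suitable for secrets.
func dieRoll() int {
	return mrand.Intn(6) + 1
}

func main() {
	b, err := secureBytes(8)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(b), dieRoll())
}
```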

Naming things (part 2)

Rule 2: Things that are the same must have the same import path.

Are these two packages the same, or are they different?

github.com/lib/pq
github.com/davecheney/foo/internal/github.com/lib/pq

They are the same; this is the same code. That is obvious to a human, but not to a computer.

To a compiler these are different packages.

The import statement cannot be changed

We cannot add anything to the import syntax for two reasons

import "github.com/pkg/term" "{hash,tag,version}"
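One practical obstacle is easy to demonstrate: the Go specification defines an import spec as an optional package name followed by the import path, so every existing parser rejects anything extra. A sketch using go/parser; the version string is the hypothetical syntax from the slide, and parseErr is an illustrative helper.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// parseErr returns the error, if any, from parsing src's package clause and imports.
func parseErr(src string) error {
	_, err := parser.ParseFile(token.NewFileSet(), "p.go", src, parser.ImportsOnly)
	return err
}

func main() {
	valid := "package p\n\nimport \"github.com/pkg/term\"\n"
	// Hypothetical syntax: a version string after the import path.
	extended := "package p\n\nimport \"github.com/pkg/term\" \"v1.0.0\"\n"

	fmt.Println(parseErr(valid))           // <nil>
	fmt.Println(parseErr(extended) != nil) // true
}
```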

Versions in the URL (part 1)

We cannot embed anything in the import syntax

import "github.com/project/v7/library"

Versions in the URL (part 2)

This leads to nightmarish scenarios where equality and type assertions are broken.

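import (
	"github.com/project/dialer"
	"github.com/project/v7/lib"   // the version my code uses
	_ "github.com/project/v9/lib" // imported elsewhere; registers itself as a dialer
)

err := dialer.Dial("someurl")      // the error comes from v9/lib
fmt.Println(err == lib.ErrTimeout) // false: lib here is v7/lib
fmt.Printf("%T\n", err)            // prints a "lib." type, hiding which version
// v7/lib.ErrTimeout == v9/lib.ErrTimeout is false: same code, different import paths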

Competitive Analysis

Dude, be a good Gopher, don't break users

So the first, and longest-standing, solution to this problem is to always keep a stable API.

If that worked, we wouldn't be having this conversation today.

I live in the real world

If my time in system administration taught me anything, it's the unexpected failures that get you. You can plan for the big disasters, but it turns out that the little disasters can be just as disruptive.

These are all little disasters: you can usually find the code again, or maybe it's just a quick sed rewrite and you're back in business.

But the outcome of these little disasters is indistinguishable from that of the big ones: code which built one day won't build the next.

Don't be this person

The moral of the story is, if you are responsible for delivering a product written in Go, you need to be responsible for all the source that goes into that product.

Tools which manage $GOPATH

Tools which fixup $GOPATH after go get

Problems

Tools which vendor packages

Copying, vendoring, and rewriting the source is the new position of the Go team.

Problems

Tools which give you one $GOPATH per project

Virtual env all the things!

Problems

Proposal

Stop working around go get

Every one of the existing solutions is hamstrung by the fact that it works around the limitations of the go tool.

Stop using go get. Don't use the go tool at all.

Requirements

So, we're talking about writing a new build tool for Go, not a wrapper around an existing tool.

Project based

A new build tool should be project based.

No configuration files

This one I find hard to accept, but Go developers do not want to have any sort of configuration file to build their code.

I find this hard to rationalize because most repos that I look at have dozens of turds in them: Gruntfiles, Dockerfiles, Wercker configs, etc.

Respect the canonical import path

package pdf // import "rsc.io/pdf"
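Go 1.4 introduced these import comments: in GOPATH mode the go command refuses to build a package carrying one when it is imported by any other path, and go/build can report the comment in Package.ImportComment. A build tool that respects the canonical import path needs a similar check. This self-contained sketch parses the comment with go/parser instead; canonicalPath and checkImport are hypothetical helpers.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// canonicalPath extracts the path from an import comment on the package
// clause, e.g. `package pdf // import "rsc.io/pdf"`. It returns "" if none.
func canonicalPath(src string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	line := fset.Position(f.Name.Pos()).Line
	for _, cg := range f.Comments {
		for _, c := range cg.List {
			if fset.Position(c.Slash).Line != line {
				continue // only a comment on the package line counts
			}
			text := strings.TrimSpace(strings.TrimPrefix(c.Text, "//"))
			if strings.HasPrefix(text, "import ") {
				return strconv.Unquote(strings.TrimSpace(strings.TrimPrefix(text, "import ")))
			}
		}
	}
	return "", nil
}

// checkImport fails if a package is being built under a path other than
// the canonical one it declares.
func checkImport(src, importedAs string) error {
	want, err := canonicalPath(src)
	if err != nil {
		return err
	}
	if want != "" && want != importedAs {
		return fmt.Errorf("code in %q expects import %q", importedAs, want)
	}
	return nil
}

func main() {
	src := "package pdf // import \"rsc.io/pdf\"\n"
	fmt.Println(checkImport(src, "rsc.io/pdf"))
	fmt.Println(checkImport(src, "github.com/rsc/pdf"))
}
```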

Leaves source untouched

Annoying things

If we're going to the extreme of divorcing ourselves from the go tool, then maybe we can fix a few other annoyances along the way.

Introducing gb

% /usr/bin/gb

Demo time

Takeaways

The problem is go get, not the import statement.

The go tool doesn't define the language, we can build a replacement.

Try it out

go get github.com/constabulary/gb/...

Thank you
