Your Images are Out of Date (probably) - The Silent Rebuilds problem
Welcome to DevOps and Docker Talk.
And in this episode, I'm taking a clip from the live stream I did with my co-host Nirmal Mehta of AWS, and I bring him up to speed on something I've spent weeks working on with Eric Smalling of Chainguard, the zero-CVE image security company for Docker images. Eric and I have been working for a couple of weeks on trying to create a piece of training.
I wouldn't call it new, 'cause this isn't a new issue, but it's something I'm calling silent rebuilds, or, more accurately, silent upstream base image rebuilds. That's a long name. I like silent rebuilds better. That has a better ring to it.
But this video is gonna be about a problem that I really want to call attention to on your path to having the least amount of CVEs in production as possible, without having to do crazy amounts of work or custom home-built images that have little parts taken out of them, or something like that.
Like, a majority of us at least are using upstream base images from Docker Hub, or from GitHub's public image catalog, or AWS's, or Chainguard, or any one of the companies that are now providing not just base images, but hardened base images.
These all are upstream controlled, meaning
that we don't build them ourselves.
We typically rely on the vendor
to provide us that base image.
It's usually a base OS of some sort, whether that's Debian or Alpine, or maybe even Ubuntu, or Wolfi if you're from Chainguard, which created their own custom base image. There's all these base images, and then there's usually something on top: if you're not just doing a generic base image, then you're probably doing something like Python or Node.js or some other programming language, which has its own libraries and dependencies.
And then maybe if you're doing
something on top of that, like you
might be doing a WordPress site or a
Ghost blog or Drupal, and so there's
another application layer on top.
Or maybe you're building your own custom
images regardless of what you're doing.
That base image is the cause of years and years of us talking about how to reduce image size, but more importantly, CVE count. And CVEs in production are really all that matters at the end of the day for us DevOps people, because something in test or staging is obviously important too, but what really matters is making sure that production has the least number of vulnerabilities possible. And we typically have vulnerabilities in production even when we're thinking about them and trying to reduce the number.
We're only able to get our production total, like all the servers, or the entire Kubernetes cluster, or however you consider production, we're only able to get that number down to so little. Like, it's really, really hard and challenging to get zero CVEs everywhere all the time.
Like, I actually have never seen it. At all the government agencies I've worked with, all the big companies, the small startups, everyone I've worked with, there's always something.
So you have to do a lot of analysis
and there's all this work involved.
Well, regardless of all that, the base upstream images that you're using today, even if they didn't come with CVEs on day one when you first started using them, the longer they sit somewhere, on any server running, the more likely they're gonna have more CVEs.
And that is the topic
of this conversation.
And I wanted to call attention to a particular style of CVEs sneaking into your images after you've deployed them, because CVEs are discovered all the time for existing code.
That's how CVEs happen.
That's how we find the vulnerabilities,
and then eventually patch them
and eventually roll out the patch.
Right?
So there's a strategy to all this.
There's multiple ways to
update your images, which are
really the production artifact.
And in this conversation, I'm essentially explaining the work we've been doing on how to provide you an open source guide to the tooling, and basically, like, a prescription for using GitHub Actions, 'cause that's my favorite.
Hey, did you know I'm making a course on GitHub Actions? You can sign up below in the notes. There's a link somewhere in this video, and you can go check that out.
But as a part of that, we were making this piece of content around Chainguard images specifically, and I started to imagine a bigger solution to this problem. And we are creating that for you.
I don't really have a name for the
solution other than a series of GitHub
workflows and automations that are going
to ensure from multiple vectors that
your container images actually have the
least CVEs that you can currently get.
And that might be in the form of Dependabot or Renovate, and it might be in a newer tool from Chainguard called Digestabot. It's not a very popular or well-known project, so I want to call attention to it, and we talk about what it is, the problem, and the solution. So let's reduce that down.
Let's create a strategy that people can implement. So I hope you enjoy this episode with Nirmal and me talking about silent rebuilds.
Hi.
Hey man.
Hey.
I am excited.
You and I have been talking about Docker builds and Docker images and security and slim versus Alpine and CVE scanning and all that. We have been talking about that for a decade.
Correct.
And it still feels like I find new things, and I still learn things that I realize are way more important than I thought, and that I've not been considering, and that I should be solving for my clients, my students, and my courses.
We all know that Docker Hub has official images, over 200 images at this point, I believe, that are all very popular open source projects, languages, frameworks. And these things, even the Alpine or the slim images, come with an OS underneath them; it's either Debian or Alpine.
And then we've had other attempts at, like, base OS container standards. Over the years we've tried to create, like, container OSes. Chainguard created Wolfi, which is essentially their container OS, and we've had them on the show multiple times to talk about what Wolfi is and how that helps them create zero-CVE images. We don't yet have that for Debian, for Ubuntu, for Alpine.
I'm pretty sure that if I pulled down any image running Debian underneath, there's gonna be some vulnerability, probably. Like, it's gonna be pretty rare that I'm actually at zero. And that's just kind of the nature of Linux right now. We can't update and guarantee and fix, and then update and then roll out updates of everything all the time.
So everyone's always managing somewhere between one and infinity CVEs in production,
At any given moment
any given moment.
And the goal is to get that
thing down as tight as possible.
on a rolling basis,
And the reality is, it's all a moment in time. Like, the minute we scan something... you have to trust a scanner, because they're all different. Some scanners are better than others at finding things. At that moment in time, you have that security stance of how many CVEs you have. A minute later, you're now out of date.
So you do your best. You take snapshots; your security team maybe scans once a day, or once a week, or once a month, or not at all, but maybe someday they'll scan. Or you scan only in CI, and you kind of ignore it once it goes into production.
So there's all these variations
and everybody's trying all
these different things.
But I think when we all talk about
containers, we all tend to agree that
you build a container, you want the
smallest image you can get away with.
If you can go distroless, great.
If you can do something like Chainguard or a paid hardened image, where you have support and guarantees, great.
But that image is only great on the day that you last looked at it, when you last certified it or last scanned it and approved it. Every day since then, it's gotten worse, right? And the only way you're gonna know how much worse is to re-scan it.
So.
We now have these production tools. You can use Trivy, an open source tool, for scanning for vulnerabilities in images, amongst other things. And it can now scan a Kubernetes cluster, so you can get a posture of what my images are throughout the cluster and what the CVE counts in production are in real time, 'cause that's probably different than what your CI is telling you, or your local machine.
So again, I think these are all known things. I think most people that are running Kubernetes today understand that, at least to some degree.
Yeah.
And they know that if I pin, which everybody tells you to do, don't pin to latest, pin your images to Python 3.13.3, there's some determinism. Like, you're trying to achieve some determinism with the versioning, so you can kind of control what's inside your container that way.
Yeah.
If you came to me, as a boss might come to their DevOps engineer, and said: I want you to give me strategies so that we can implement projects that will reduce the CVE count across our production infrastructure by 90%. Like, my stretch goal is 90%. My goal is 50%. Please give me a strategy.
And so you're gonna end up with these strategy plans. Obviously there's the host OS, and that's a whole other thing; I'm really just gonna talk about container images. But that's not enough, because minification reduces the blast radius, but you still have to handle things like the application-layer dependencies.
Like, if your developers aren't updating their dependencies in their app, their gems, their jars, if they're not updating all these things, then I, as the infrastructure manager, can only do so much.
So Dependabot and Renovate largely are the automation tools we all use to solve that problem for application developers. For us infrastructure people, it's all about minimal base images. Like, that's the best we can add to that workload.
So like where a lot of people move to Alpine, or they buy Chainguard.
Yeah, or at least they go to slim, and they learn about slim versus, like, the normal Python or normal Ruby images or whatever.
That, again, only gets you so far, because the minute you move to that slim or that minimal image, it's really only about how fast you update it when a new one comes out.
that becomes the next big problem.
If you update today and you're on this minimal image, and you're super lean on your slim images, great. But tomorrow your CVE count probably went up in production, or the day after; it'll go up one of these days. Pretty soon you will have more.
And so your job then is,
okay, how do I update faster?
Like, if that Alpine base image, 3.12 or whatever, is in all my images as what we call the base image, then maybe my job is now to make sure that's always fresh, that's always the latest version.
Yep.
Dependabot and Renovate can also help with that. They can now manage Helm chart updates automatically, Kubernetes manifests, Compose files, Dockerfiles, Kustomize, basically any way you wanna deliver a container in infrastructure code. Those two tools will tell you about a new version.
Right
So when you go from Python 3.13.0 to Python 3.13.1, presumably that .1 was a CVE fix or a bug fix. Maybe not a security bug, but a bug fix.
Something.
Something is fixed.
Hopefully nothing will break,
but something is fixed.
And so generally, a lot of teams have the stance of: we're gonna pin to Python 3.13, and we're gonna have Dependabot or Renovate just kind of run every day. They tend to run only once a day.
So the best I can get is, the day that Alpine releases a new base image, or Ubuntu, or Debian, or the latest Postgres, whatever that base image gets updated to a new version, one of those two tools (they kind of do the same thing) will tell me. It will give me a PR, and then it's all about how fast I can click the button and then how fast I can deploy.
So you're shortening the window between
CVE detected and CVE fixed in production?
Right.
That is like half of my talk, to say: this is where we were. Now the line has moved. It was always moving. I just didn't always know it. It wasn't until...
Wait a minute.
Okay.
Because what you just described, even for organizations today, would be great, would be sophisticated.
Yeah, but it's not the pinnacle. It's not enough. There are things happening, and I'm calling them silent rebuilds.
This has always happened.
At some point in every container engineer's career, they realize this is happening, because it's not shouted from the rooftops. I don't think Claude and ChatGPT, neither one of them, could find significant discussions about this problem anywhere on the internet, except for one tool from one company. Chainguard made Digestabot. Digestabot is like Dependabot and Renovate, but it only does one thing.
okay.
Python 3.13.1
Okay.
gets rebuilt
By the person that owns that.
Yeah, by Docker Hub. That image doesn't stay the same. If you were to go and download Python 3.13.1 today,
Mm-hmm.
And you then downloaded that image in a month, would it be the identical image?
With just that tag, not the SHA? The most specific tag you can get?
Okay.
I'm not trying to put pressure on you.
No, no, no. I know, I understand. Let me... I want to make sure that we're explaining this properly, 'cause there's a gigantic amount of nuance about what you're about to talk about. So this is what happens when we don't pre-plan our show.
Okay.
So.
I think I'm not gonna spoil the ending here. I think I understand where this is headed, but just to reiterate to our audience: you're talking about, like, if you go to Docker Hub, right?
And Python's a perfect example, because there's so much machine learning stuff going on.
machine learning stuff going on
I am picking on it.
Those workloads are insanely sensitive to minor versions of a bunch of libraries.
Right.
So I've definitely been in this position where I need a very specific version of Python in a container. So I go and I say FROM, you know, Python, what did you say? 3.13-point-something?
Yeah.
3.13.3.
And so, you know, that's what I put as my base image. And then I do QEMU, CUDA, and all this other stuff that I need to put into, like, the notebook or whatever container image.
So what you're saying is, I go ahead and build my image from that base image. I am assuming if I change something in my application code, but I'm not changing that FROM base image, and I don't have it in my local registry, I don't have it cached, I do a no-cache build. A week later I rebuild. In theory, it should be the exact same thing, except for the application code changes.
Actually, let's just assume I don't even touch anything in the application code, right? Like, all of that's the same. You are saying that if I, the next day, or two days later, or a week later, redo that, it's non-deterministic. That build is not the same, even though I've done all the best practices to ensure that I'm making this as deterministic as possible. That's what you're saying.
right.
I will find out that there are differences for some reason.
Yeah, if you were to manually hash the image, it would be a different hash. There's a reason.
Okay.
And that is because of a, I think, very underrated, little-known fact. We know that tags are mutable. As far as I know, all official Docker Hub images are mutable. Docker Hub, as well as Harbor, both have the option for you, as an individual, in your images, to make the tags immutable, which means the minute I make that tag for any image, I can never reuse that tag for any other image build.
That's a new option in Docker Hub. You can go in and turn it on for your organization and make your tags immutable, but official images are mutable.
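A toy model can make the mutable-versus-immutable distinction concrete. The class and names here are hypothetical, just illustrating the policy that real registries enforce server-side:

```python
class Registry:
    """Toy tag store illustrating mutable vs. immutable tag policies."""
    def __init__(self, immutable_tags: bool):
        self.immutable_tags = immutable_tags
        self.tags = {}  # tag -> digest

    def push(self, tag: str, digest: str) -> bool:
        """Return True if the push is accepted."""
        if self.immutable_tags and tag in self.tags:
            return False  # tag already taken: rejected, no silent rebuild possible
        self.tags[tag] = digest  # mutable: the tag silently moves to new content
        return True

hub = Registry(immutable_tags=False)        # how official images behave
hub.push("3.13.1", "sha256:aaa")
hub.push("3.13.1", "sha256:bbb")            # accepted: digest changed under the tag

locked = Registry(immutable_tags=True)      # opt-in immutable tags
locked.push("3.13.1", "sha256:aaa")
print(locked.push("3.13.1", "sha256:bbb"))  # False: reusing the tag is refused
```

With mutable tags, the second push succeeds and `3.13.1` now points at different content, which is exactly the silent-rebuild behavior being described.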
I think that's also in ECR.
It's also in ECR. So anyways, keep going.
Yeah.
It's probably gotta be, I mean, because Python runs on something in that image, whether it's Debian or Wolfi or Alpine, and Python changes at a different speed than the underlying OS.
right?
They don't tag that in most cases. There are some that do: some of the official images in Docker Hub are tagged where you can see the version of the app that's running on it, like Postgres, and then the version of Alpine underneath it, at least to the minor version, not the patch version.
Okay.
So you could then pin to that. And when we have Debian, a lot of times what you'll see is, we've got bookworm in the tag; that's the version of Debian underneath. And then there's a new version of Debian. And for some reason we don't use the SemVer tag versions of Debian.
We always use, it seems like at Docker Hub, what do we call those? The friendly names, the fun names, the code names. I don't know what you wanna call 'em.
The handle.
Product names, I dunno.
So we've had bookworm, yeah, we've had a lot of other ones. And those two move at different speeds. So you could then try to pin to a tag that is both the version of Postgres that you might need to use and the version of Alpine that you wanna stick to.
Because you want to be as deterministic as possible. But even that's not enough, because underneath each one of those tags, even though they're pinned to both, it's still mutable, meaning that Docker rebuilds the image when the underlying OS part changes, and they don't tell you about it.
This isn't a conspiracy. I'm not saying they... they just don't have a good way.
There's no UX for it.
Right, there the UX is staring at digest lists. The UX is looking at SHA hashes and timestamps, and knowing when you last pulled it versus what the one today is,
Mm.
again.
Okay.
And just before the show, I'm gonna shout out Eric Smalling at Chainguard. He let me know that their particular tool, chainctl, keeps an archive tracking every digest for every tag. There's a limit to what they retain, but they track this.
Okay.
So I will just prove this theory with their chainctl tool, which can pull their history of every digest. I don't know exactly how they get this data, but my guess is they pull regularly. I actually spent the week building a tool, using Claude Code, that I'm calling Tag Tracker, which will do this for me. So it's going to run every day, and it's gonna download all my favorite images in all the different variations of their tags, whether it's python:3, python:3.13, python:3.13.8, it's gonna get 'em all. And then it's going to track the digest, and if the digest changes, it makes a log entry. Okay?
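A minimal sketch of that Tag Tracker logic, with hypothetical names; the real tool would first fetch each digest from the registry (for example via `docker manifest inspect`) before recording it:

```python
def record_digest(history: dict, image_ref: str, digest: str, seen_on: str) -> bool:
    """Log a digest sighting for an image:tag reference; return True when the
    digest changed since the last check -- i.e., a silent rebuild happened."""
    entries = history.setdefault(image_ref, [])
    if entries and entries[-1]["digest"] == digest:
        return False  # same digest as the last check: nothing happened
    entries.append({"digest": digest, "seen_on": seen_on})
    return len(entries) > 1  # the first sighting isn't a rebuild

# Simulated daily checks of python:3.13.8 (digests are made up)
history = {}
record_digest(history, "python:3.13.8", "sha256:aaa", "10/07")
rebuilt = record_digest(history, "python:3.13.8", "sha256:bbb", "10/08")
print(rebuilt)  # True: same tag, new digest -- a silent rebuild to act on
```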
So they have this tool, and they already do this. They've been doing this all along.
And so the python:3.13.8 image, the most specific pinned image you can have, has had four builds in the time that it's existed. It was built 10/7, 10/8, 10/9, and 10/13. So which one are you running,
No.
when you decided to pull python:3.13.8 in production?
You have to look at the digest and compare.
Yeah. Now, good news is, Kubernetes and Swarm, the two orchestrators that I care about, are smart enough at least to resolve that tag to the digest. And the digest is, for those not aware, a content guarantee: if you use the digest, it's in theory nearly impossible to have a collision of that name. And so it is a unique, content-addressable string.
That was the whole premise and
basis for how we build Docker
images and how we store them in
registries and how we run them.
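That content-addressable property is easy to demonstrate: a registry digest is, roughly speaking, the SHA-256 of the image manifest's bytes, so any change in content produces a different name. The manifests below are simplified stand-ins:

```python
import hashlib

def digest_of(manifest_bytes: bytes) -> str:
    # Registries name manifests by the SHA-256 of their exact bytes
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

original = b'{"layers": ["base-os", "python-3.13.1"]}'
rebuilt  = b'{"layers": ["base-os-patched", "python-3.13.1"]}'

print(digest_of(original))
print(digest_of(original) == digest_of(rebuilt))  # False: new content, new digest
```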
So the cool thing is Kubernetes
and Swarm both will, if you say,
give me this version of Python,
they will at least ensure that the
exact same digest is on every node.
Right.
So that like when you
run it, you're getting
I didn't even think about that, because you could have had, like, if you were spinning...
By the way...
If you were spinning up a new node, like you had a node with this Python app from the 10/08 build, and then in your EKS cluster or whatever, you spun up a new node the next day,
Yeah.
it could have...
And this used to be able to happen. There was a day where the orchestration was not always resolving the digest before it issued commands to each node to download an image, right? Like, it might have been resolving individually, but at some point you have to resolve this inside of containerd and dockerd, and the low-level tooling always does this, because it has to pull down these tarballs and it has to get 'em from a source.
But somewhere there's a human that types in a number, and then the computer converts it into the digest at some point, in everything we use.
So then what happens is, when someone learns this, they go, oh, well, I was told to pin the digests. So they can use tools to go into their Dockerfiles and pin to the digest, and I actually made a blog post about the way you can do this.
The challenge there is, we want human-readable version tags, right? Like, we don't communicate... I don't say, oh, you know, the prerequisite for this is Python digest blah, blah, blah, F13 or something. Like, that's not how we communicate about versions or what we need, right?
yes.
And the way you get around that problem, at least how I do it: some people don't realize that in the FROM line you can specify the tag and the digest together. Now, when you do it this way, the tag becomes useless to the machine. The machine does not care about the tag anymore. You could put whatever tag you want in there, but the tag is like documentation. It's like a comment, for the humans to know that this is the tag that I intended, and the digest was resolved by the computer.
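As a sketch, that combined form looks like `FROM python:3.13.1@sha256:<digest>` (the digest below is made up). A few lines of parsing show the split: when a digest is present, it alone identifies the image, and the tag rides along as documentation:

```python
def parse_from_line(line: str) -> dict:
    """Split a Dockerfile FROM reference into image, tag, and digest."""
    ref = line.removeprefix("FROM").strip()
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    image, _, tag = ref.partition(":")
    return {"image": image, "tag": tag or None, "digest": digest}

# Hypothetical digest: the engine pulls by this exact content address
parsed = parse_from_line("FROM python:3.13.1@sha256:deadbeef")
print(parsed["digest"])  # what the machine actually uses
print(parsed["tag"])     # 3.13.1 -- a comment for the humans
```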
Now, there's tooling that can give you this SHA hash, and then when the Python version updates, it will make sure that the SHA hash is the correct one. But none of these tools we've mentioned so far will catch a silent rebuild. Like, on 10/7 you have this digest, and if I'm running Arm on my servers, let's say I'm really cool and I'm running the latest Arm image, so when I created my cluster and I deployed, I was on this one. But then a week later, I realize, oh, there's this new version. You need to be sure that you're on the latest build of every image, even though the tag hasn't changed. You need to be aware of this. You need to have PRs that are automated for this, and you need to be redeploying these.
Okay, so if you don't know about High Fivers, this is a little ad. High Fivers is a group of DevOps professionals, and we all meet once a month in Discord, and we have a video call, and we complain about our jobs, talk about DevOps and the problems we're having and the solutions. And we talk about Kubernetes and containers. So if you're interested in High Fivers, you can become a member in this channel and pick High Fivers. You can join us once a month. It costs about as much as a cup of coffee, and you get to join and learn. And that's the whole point. I have that group; I'm thinking of renaming it to the Agentic DevOps Guild, to make it sound like we're some sort of superheroes.
But that's a thing we do.
So yesterday we did that for this month, and I was listening to one of our regulars, and Brandon was talking about having this exact same problem at work, but not realizing the core problem of it or the actual best solution for it.
And so what they had was, they were going microservices. And when they go microservices, the effect of that sometimes is... they were going so specific with their microservices that they were making a microservice per verb, not just per endpoint, but per verb.
Wow.
Yeah, I mean, Carpe diem, so.
That is not a wrong decision in the right situations, in the right state. But what happens is, that means that you're writing code that doesn't need to change very often.
So you're pinning to whatever, let's say it's a Golang thing, right? You're pinning Golang, or you're pinning Java. And so you have that image, but the code's kind of done. Like, we've got a thousand lines of code or whatever; we're kind of done.
And so it sits there, it
becomes stale on the servers.
For lots of us that are deploying cloud
native apps, the developers are constantly
developing, so the images get refreshed
pretty quickly all the time in production.
So they don't age.
But what about your StatefulSets, your DaemonSets, your Postgres?
Yeah.
Like everything, there's all this other stuff.
There's this other stuff that just sits there. Every day it sits there running, it's getting more CVEs over time.
There's no way to avoid it. I don't care what you're running; it's going to have CVEs eventually.
So we get back to this core problem: this thing is being rebuilt for the good of the internet, and no one knows that it's being rebuilt. That's why I call it silent rebuilds.
Interesting.
I'm clearly on a soapbox moment, right? Because I have known this for five years and I haven't cared enough. And it wasn't until Eric and I started hanging out and talking about what we're gonna do for this piece of content that we're gonna create together that I started to care. Because I realized that I could buy all the Chainguard images and spend however much money I wanna spend, unlimited amounts of money, to buy every image I could possibly imagine.
And then I could deploy those into
production on day one with zero CVEs.
And on day two, they now have CVEs.
What do I do then?
What's my plan?
Yeah.
And Chainguard, their answer to that is they literally rebuild every day. Like, according to Eric, I'll put him on the spot, they rebuild every day. They are just constantly rebuilding.
So, if this doesn't seem interesting to you and you're operating container environments, this is super important, 'cause this should be part of that strategy that you started this conversation with, right? Like, how do you answer that question if you're being goaled on being in a better security posture? Part of that would be removing as many CVEs as possible in whatever window of time you're tracking it in.
yeah.
Yeah.
And if you wanna learn more info... because I think you just barely scratched the surface on this topic with me today, so I appreciate that. To me, it feels like a very useful tool; it solves a very specific niche problem. It is very much a Unix-style, you know, solve one problem, do it great kind of thing.
But I feel like in the quest for us moving to zero CVE, like the eternal quest that shall probably never be fully realized...
The North Star.
Yeah.
The North Star, yeah.
CVE publication is random. CVE fixes are random. Upstream happens at random times. So at any minute of any day, there are many things in the works, and eventually it arrives at the base image that you are not building, but that you are using from some base image provider.
And that moment starts a race. When I imagine this, it's like... I watched the F1 movie this summer. It was a great Brad Pitt movie. Loved it. Of course, you've gotta be a Brad Pitt fan, but I think it was the movie of the summer. It was fantastic.
And so I'm thinking of a Formula One scenario: the minute my upstream image is rebuilt to remove that CVE, a clock starts. It's now my clock of how fast I can detect it, PR it, and push it to production.
Right?
And that, for some teams, is never; it never happens. But as the maturity of your team improves, and you get more automated and more sophisticated, and you become aware that some of these mechanisms even exist, you start shortening that window, to the point where ideally we're gonna get someday to where things aren't polling anymore; everything's webhooked. Everything is chained.
And the way I understand it, there's a great video from Chainguard about how they build out their entire internal infrastructure. It's largely based on GitHub Actions, a little bit of Argo, but it's largely GitHub Actions, and they build everything from source. They build up everything deterministically. They can truly do reproducible builds, unlike a lot of what I see out there. And it's all very sophisticated, top-tier, expert-level stuff.
The rest of us aren't that.
Right.
The rest of us aren't like that.
And they have other competitors; like, obviously, Docker has hardened images. So there's a lot of smart people, but you and I are not necessarily the smart people. We don't have to be that smart.
My point is that they're doing that hard work, but because they're all providing us these updated digests, because they're rebuilding, it's now on us. Now that I've told you, it's on us, because we now have the knowledge. It is our job now to detect that difference and then do something about it.
Do something about it.
This is like the day the tag is invented, say it's 3.18.9. The day that it's invented is the epoch. And then there are events that happen downstream: there's a day where that new digest happens, and then there's a day where you actually deploy it.
And I don't think this happens in a lot of organizations at all, because there's never an opportunity for them to update the digest of the same tag, because we've been yelling from the rooftops: pin to the SHA hash, pin to the digest.
But when you do that, you've now actually created a new problem, because you're forcing yourself to a moment in time with a set of binaries that, every day after, gets worse in terms of security. And so they're not aging like fine wine. They're aging like spoiled milk.
I think there was good intent behind that.
Oh, yeah.
Yeah.
You have to still do it.
Removing non-determinism is a good intent.
Right.
I get what you mean, but pinning is like half the problem. That's what you're kind of saying, or half of the solution.
It's like you unintentionally create a side effect when you pin to a digest: there's no opportunity for your existing tools to deploy something newer.
So like if you went to, like, 3.18... 'cause a lot of teams, I see it especially with Node teams, especially teams that are just using Node to build a front-end system, they're not actually running Node on a server. They just use it in CI. They pin to the minor version, or they might just pin to the major version of Node, because they only need it to do JavaScript compilation and CSS minification and stuff like that, right? So they don't really care as much, as long as the original open source team obeys SemVer rules and all that stuff, right?
they don't break anything.
The problem is when you pin to that thing, and then... it looks like Docker and Chainguard and GitHub and, you know, AWS's public images, I bet you they're all doing the same thing that Docker Hub originally did, which is: well, we've got a new version of the underlying OS, so let's go ahead and reduce the CVE count, let's do the right thing and update the tags with a better CVE count.
that's the right thing.
But there is very little indication
in the UI that that ever happened,
right?
It's not communicated with
the right language about the
deterministic nature of that build
right?
or the reproducible nature of that build
Because you can see, when you look at all the tags, you see dates. You can see all these: built yesterday, built yesterday. But there's nothing that indicates this is the third variant or the third rebuild of this image. Imagine if every time it rebuilt, it added something to the tag, like build one, build two, build three.
Yeah.
Yeah,
which they just don't do.
And maybe they don't need to.
Maybe that's not something we want
to get into because again, that
doesn't really solve the problem.
It just makes you more aware.
Like, when OS package managers rebuild an app, as far as I know, like in apt, when curl is changed in some fundamental way, it gets a new version. 'Cause if you look at apt package versions, it's the curl version and then something after, which is kind of like the apt variant.
At least, that's how I understand it. So I actually think that this is a pretty unique problem to container images, because they include app code and dependencies and OS code dependencies, or OS package managers. I think it's because of the two combined.
You could be the most secure person in the world, and if you don't do this particular step, you're still gonna have more CVEs in production than you should. And it's actually very little work to fix this. It's just knowing about it; that's the actual problem, I think, for everyone.
Well, thank you so much, Bret. Now I know what silent rebuilds are.
Silent rebuilds.
And to everyone that's...
We're gonna see if that sticks.
okay.
That was the stream we did, but since then I've had some updates that I wanted to tell you about real quick. First, I've released a video walkthrough, a repo, and a long blog post breaking down the problem, and a detailed series of solutions using either Renovate, Dependabot, or Digestabot to ensure you're getting PRs for all the silent rebuilds.
Second, I did more research and confirmed again that the official Docker Hub images are rebuilt to a different digest for the same image tag on a random schedule. Remember, official images of open source software usually have volunteers in open source repos making these changes, and each of these official images has its own repo dedicated to the Dockerfile that builds and pushes the image to Docker Hub, and those images are silently rebuilt.
Typically, that happens like any GitHub Actions workflow: whenever there's a commit to the release branches. Now, those commits might be something that could reduce CVEs, or it could just be an update to a README file, or any other reason you might make a commit to a repo.
The real point here is that we don't know why these images are silently rebuilt, and there are no tools currently to help with that, which means we have to assume that every rebuild of an image tag with a new digest is for a good reason, and that we should deploy it with the same mindset that we deploy updates that do change the SemVer tag. We just don't know why, unlike with the SemVer changes. Again, the solution isn't hard.
Just implement Renovate or Dependabot with the right settings, which I show off in the repo, and they will both ensure that your base images are using digest pinning, and that they're checking every time you run 'em (daily is recommended) for silent rebuilds of that tag to a different digest.
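For Renovate, for example, a minimal config along those lines might look something like this. This is a sketch, not the repo's exact settings; confirm the preset names and schedule syntax against Renovate's docs:

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": [
    "config:recommended",
    "docker:pinDigests"
  ],
  "schedule": ["before 6am"]
}
```

With `docker:pinDigests`, Renovate rewrites `FROM` lines into the `tag@digest` form, and then opens a PR whenever the digest behind that tag changes, which is exactly the silent-rebuild case.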
I like these options because they bring your container dependency updates into the same tool and process that you use to check for application-level dependency updates. In my example repo for this solution, I have the settings you need to add to each repo, and I have a bunch of sample PRs that show off the various ways these tools check for image updates, including the silent rebuilds. All the resources are in the show notes, and if you have questions, feel free to start a discussion in the repo or in my Discord server. Thanks for watching. I'll see you in the next episode.