Backup S3, Google Drive, iCloud, Notion with Plakar


What does it actually take to
build a backup system from scratch?

Not just slap together some rsync scripts and call it a day, but engineering

something that handles deduplication,
encryption, compression, indexing,

and restoring in one cohesive tool.

Well, today I'm joined by Julian and
Gilles, the CEO and CTO of Plakar,

a new company, but built on nearly
a decade of R&D in building open

source backup systems from scratch.

I am excited about this one because
Plakar goes beyond just backing

up your typical workloads like
databases, file systems, and S3.

It also can back up connectors like Google
Drive, iCloud Drive, OneDrive, and even

things like Notion, Dropbox, and IMAP.

Yeah.

You remember IMAP?

Well, they've recently joined
the CNCF, so we talk about their

upcoming Kubernetes integration, Kubernetes obviously being a big part of this channel.

And on the show I suggested that they build the ability to back up to Docker images; basically, OCI registry storage is really what I was asking about.

And then like within a couple
of weeks they went off and built

that, uh, as a new integration.

That's pretty dope.

This is the one backup tool that I've seen, really maybe ever, but

definitely recently that not only looks
useful for work and server workloads and

clusters and clouds, but is something
that I think I want to try to back

up my own personal iCloud and Google Drive and Notion and all those things.

Which I've found historically is actually very problematic. It's challenging to find a free tool that isn't just cobbled together from a bunch of different things, and something that's reliably able to be restored in a reasonable amount of time.

So I'm excited about this tool, because I feel like it affects multiple parts of my job and life, and it's open source.

So let's get into it.

We're going to talk about backups.

We've talked about that before, but
there's so much more to backups.

We're going to get into it.

I'm excited.

You all started a company.

How long ago?

We can give two answers on this one. The first one is that the company was incorporated in 2024.

Yeah.

It's quite a new company to support this project, but the project is quite old, in the sense that Gilles did a lot of R&D on it over the past 10 years.

Yeah.

What I was noticing as I was digging
through the project was there are a lot of

foundational things when you're creating
a backup product that you have to define

that a lot of us don't think about.

The only thing I can equate it to, not being a developer of a backup product, is creating a new database product, because you had to create a file format.

You had to create a streaming backup format.

You had to go, I would imagine, much more
low level than the typical application

developer has to go because you had
all these underlying fundamental

concepts of, you know, things like the backup file, the caching of the backups, all that stuff.

Yesterday I was at a meetup and I was presenting the product.

Someone asked me, like, why did you go into backup? That seems like a very boring area.

Yeah, and if you take it from a developer perspective, it goes from very low level to very high level. It touches, like, many fields of computer science that you might be interested in, if you have, like, a high appetite for tech.

So you have to know how a file system works. You have to know how to manage your memory, how to manage high concurrency, how to manage, like, file formats.

In our case, we kind of developed a database in a sense, because you have a B-tree, and you have to work out how to map that B-tree onto storage.

So yeah, it's very complete as a project for diving into technical topics.

I was like, oh, this is going to be a small project, a small side project. And then you realize that, oh, you end up doing cryptography, you end up doing compression and stuff like that.

And you, like any area you look
at, you're going to find ways to

improve it and go further into tech.

Yeah, I can only imagine how much time is spent on, like, the engineering

fundamentals of a giant file that
you need to do various things with,

because most of us don't deal with
terabyte sized files on a daily basis.

To me, the biggest files I have
to deal with are model files,

like open source model downloading
and uploading, like that's the

biggest thing I have to deal with.

maybe if I was in an enterprise, I'd
have big backups and stuff like that.

I used to manage backups at a government enterprise, about 7,000 users.

That was 15 years ago.

And I had two dedicated
staff that worked for me.

All they did was manage
the storage and backups.

Their entire job was ArcServe; I think we were either using ArcServe or NetBackup. But we had, you know, Windows machines, Macs, Linux machines, mainframes, and it had to handle all of that.

And this was pre-cloud, so we didn't even have to worry about how to back up cloud storage. We didn't even have S3 at the time; that wasn't much of a thing in the early 2000s. But it was so time-consuming and such a nerve-wracking effort to deal with recovery,

which most people don't talk about.

Like we don't spend a lot of time.

when you're talking about
backups, everyone's concerned

about the backup part.

And I always focus more
on the recovery part.

And I get more excited about the recovery.

Like how easy is it?

How fast is it?

how fast can I discover the
thing that I need to recover?

Because often that's the trick.

If you're backing up hourly and daily and weekly and monthly, and you've got all these incrementals, all the traditional backup terminology, sometimes you're like, well, that person needs to recover that file.

But it needs to be the one that's not from today, because that one was corrupted or whatever.

So then you end up, like, sleuthing through a giant caching system trying to find

the one file or the one directory on
one server somewhere amongst a thousand

servers that you had to back up that day.

And how do you do all of that reliably, in a way that two people can handle the data?

And now I don't know any customers of mine, or anyone, who has two people managing backups.

It's like a part time job for one person.

So, something has changed.

When you tackle that issue of how do you
find the proper thing to restore in a

fast way, you end up realizing that you
have to develop some kind of database.

It's not just a backup, it's not just
like gluing files together into some kind

of archive that's going to be efficient.

You have to actually have indexes, find
things in an efficient way, be able

to generate diffs between versions of
files and do it in a way that can scale.

Because it's not really
just a volume sync.

It's more, how many files am
I going to have to look into?

And, like, a large volume of small files is as problematic to back up as huge files, I think.
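The "you end up developing a database" point above is easy to make concrete. Below is a toy content-addressed store, a sketch only and not Plakar's actual format: unique chunks are stored once under their hash (deduplication), a snapshot is just an ordered list of chunk hashes (the index), and comparing two snapshots' hash lists gives a cheap diff. Real tools typically use content-defined chunking rather than the fixed-size chunks used here.

```python
import hashlib

CHUNK = 4  # tiny fixed chunk size, for illustration only


def chunks(data: bytes, size: int = CHUNK):
    """Split data into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Store:
    """Toy content-addressed store: blobs keyed by hash, snapshots are hash lists."""

    def __init__(self):
        self.blobs = {}      # hash -> chunk bytes, each unique chunk stored once
        self.snapshots = {}  # snapshot name -> ordered list of chunk hashes

    def backup(self, name: str, data: bytes):
        hashes = []
        for c in chunks(data):
            h = hashlib.sha256(c).hexdigest()
            self.blobs.setdefault(h, c)  # deduplication: skip chunks we already hold
            hashes.append(h)
        self.snapshots[name] = hashes

    def restore(self, name: str) -> bytes:
        """Reassemble a snapshot by looking its chunk hashes up in the index."""
        return b"".join(self.blobs[h] for h in self.snapshots[name])

    def diff(self, old: str, new: str):
        """Chunk hashes present in `new` but not `old`: a cheap version diff."""
        return set(self.snapshots[new]) - set(self.snapshots[old])


store = Store()
store.backup("monday", b"hello world!")
store.backup("tuesday", b"hello brave world!")  # shares a chunk with monday
assert store.restore("monday") == b"hello world!"
```

Restoring is then just hash lookups against the index, which is why finding the right version can stay fast even across many snapshots.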

Also the performance: lots of small files isn't exactly performant on a lot of systems. But I mean, since I stopped managing backups, we now have SSDs.

So like I lived in a world where we had
spinning disks and things were super

slow and, you know, if you had gigabit
networking, you were actually doing

great. But times have changed.

So when I look at this, like, what's the elevator pitch? When I looked at the website, the things I took away were: one, it's open source, I can run it on my own hardware, on-prem, wherever I want to run it; and it has this idea of integrations, which is not new, like most backups.

You have to have compatibility, like it has to be compatible with this database file or this type of storage or this NAS or this iSCSI thing or whatever.

but in your case, it looks like the
integrations are more cloud focused.

So they're dealing with HTTP,
but specifically different APIs.

Like I saw Notion in the list,
which I'm a huge Notion fan.

I never thought about backing up my Notion like that.

But now that I know it exists, I'm, like, obsessed: maybe I should be backing up my Notion.

Like, how did this happen?

How did the integration list come to be the way it is?

Maybe I'll start and let you complete, Gilles, but, um, you realize that most of the SaaS providers right now are on a shared responsibility model.

So it means that you are
in charge of the backup.

In the case of Notion, for example, they
are not providing any kind of backup,

and you have to do it by yourself.

And when you look at all the SaaS that you are using, for personal use or even, you know, enterprise usage, you see that you have a lot of holes in your resilience, or in the protection of your data. And I think it was important to have software that is able to manage, I would say, the legacy tasks, like, you know, backing up files, et cetera, but also able to back up any kind of data, including, of course, everything coming from SaaS.

So I think at some point in the product, we decided, okay, it's not a backup solution that is supposed to back up only files; it should be able to back up any kind of data. And maybe you can tell a bit more about, you know, how you did that?

Just to mention, about the open source part, the main driver initially was to avoid vendor lock-in.

Because, you don't have that many
solutions that can back up many

sources and that are not closed today.

you have hacks, you have scripts that
bundle a bunch of solutions, but you don't

have one solution that you can trust.

And it so happened that a friend of mine, who has a degree in computer science, so he's fairly educated on the topic, managed to lose all his data because he used a set of scripts that did not behave correctly.

And he did not realize, because everything seemed to be okay until the day his server crashed.

And he had to rely on the restore part that everyone overlooks.

And the thing is, if he had had a solution that was not a glue of multiple scripts and rsync and blah, blah, blah, this would not have happened.

And now you end up having to look at what solutions allow you to back up multiple sources.

And you generally end up having to go towards commercial solutions that will provide support for multiple sources without hacks.

And they will usually have some kind of closed format. So you have to trust that they will not go away, that they will not bump their prices, and that you can trust them in the long run.

And what I wanted initially was a well-documented format that we are going to keep fully open, with a license that prevents closing the code.

If we decided to go wrong, someone would just fork the code and it would go on that way. So that's a safeguard against ourselves going wrong.

Yeah.

And then you have: what do you do with that? How do you manage multiple sources?

And you realize that most of the open source solutions are either fairly targeted at doing synchronization, like rsync, and get twisted into doing backups through hard-link tricks, or they are highly built around the concept of a file system.

So you can actually do a backup of an S3 bucket, for example, but that's using a trick to map the S3 bucket onto your file system.

So they have limitations, and they do not work well when you hit those limitations. If you create a bucket, put 2 million objects in it, and try to mount it on the file system, that's not going to work very well for you.

There was, like, a disruption in how you model this.

How do you model this issue? You want to import various sources, you don't know these sources yet, you want to be extensible and have a plugin system, so you don't even know what plugins will be written a year from now, and you want it to fit in a model that will scale even if you have flattened data at the root.

Designing this model, we came up with
something abstract enough that you can

kind of prove that anything can go in.

And ourselves, we work with that abstraction, so that they all benefit from the same deduplication, same encryption, same features. When you write a plugin yourself, you would not have to think about all the details. You would just have to think about how to get the data from this point to this point. It will do the work once it's there, through a very simple API.

So most of our work was done on that:

finding that abstraction that allows us to work efficiently while assuming a wide variety of sources.

And of the integrations that we have, some are tagged stable and some are tagged beta, because we are a bit hard on ourselves. Beta does not mean it does not work; it means that we want to show it works. And depending on how interested people are in that backend, we might drive that one further in terms of production readiness. But they all work, to some extent.

I can imagine, like, little edge cases in a lot of this stuff, especially when you're pulling and pushing from an API that isn't exactly...

I have interesting questions.

It's like, okay, with Notion, how exactly does that recovery work? And what if there's duplicate data? Conceptually, most of us have dealt with file-based backups, right? Same single system, same host. Easy, easy day, right?

you're not even dealing
with remote storage.

And then, like, people tend to evolve into: okay, now I'm doing, like, SMB mounts or something to put some files elsewhere, or I'm doing low-tech rsync or something.

And then there's this giant chasm, I feel like: there are all those little utilities that are very niche and very composable, but, like you're saying, you're building your own scripts, your own orchestration; essentially you're designing the orchestration yourself.

And then from there to a complete
cohesive strategy that uses one

or two products maximum, you
suddenly jump into like enterprise.

There's a lot of enterprise backup garbage out there, I feel like, a ton of stuff, especially when it comes to cloud APIs. I do this every year.

Every year, I'm a small business of, you know, three to five people, depending on what year we're talking about. So I have some business needs, but mostly the backup needs I have are like what an individual would have.

I have iCloud, I have, you know, Google Drive, I have Dropbox, I probably have SFTP or FTP somewhere.

I have Notion, I might have some
S3 buckets, and I have Macs, and I

need to, manage all these things.

I have an Ubuntu server in the closet. There are things in all these places. I would honestly love my GitHub Git repos to be backed up automatically, just in case GitHub goes down and I need to move to, you know, GitLab or something.

And when I look at just Google Drive or iCloud or OneDrive, any of the sort of top three cloud file drives or whatever you want to call them, I couldn't find a single product on the internet that I could buy for one person.

It seemed like all the products out there were like, yeah, we'll back up your company's Google Drive. Because, you know, I have the company version of Google Drive and the company version of OneDrive, and those don't always work with all the consumer stuff, or if you're using Cyberduck or some other little utility.

I was looking into backing up three people's Google Drives, and I was looking at possibly having to spend $500 a month on an enterprise piece of software, because their minimum license purchase was like five users or ten users or something like that.

And I gave up.

I eventually just gave up.

I couldn't figure out a scenario that didn't require a bunch of weird scripts running cron jobs that would probably never notify me of a failure.

And it just was a mess.

So you guys show up, and suddenly I'm like, I could do this in an afternoon and it would cost me nothing. Like, it would cost me pennies with Plakar.

Yeah.

And the nice thing also about our, like, open source dimension, 'cause we are an open-source-first company.

Clearly whatever we do is open source.

Unless it's strategically
not good to do it on purpose,

but that's the default thing.

It's to provide enough libraries and examples to empower users to actually extend the integrations.

Our goal right now would not be to handle all the integrations ourselves. It would be more like: an integration that's fairly critical to companies, we would do it

to provide some level of quality. Then, if users want to implement a specific integration, we would help them, get them moving forward. Because if you want to use one tool to back up everything, you have to have the manpower to do everything, which is not going to happen.

And, like, one of the tasks we worked on today with my team was: how do we simplify the API even further, so people are less likely to shoot themselves in the foot while trying to do something simple? Because that lowers the bar: instead of spending your time writing a script that's not going to be very good, you write an integration, because it's as simple as that script.

And it's going to be reviewed, and you're going to get help from others. It's going to fit into one thing that actually tackles the difficult parts.

And that's where I would like to
reach in terms of open source.

How does this work?

In terms of the development, I mean, all the integrations are open source, but how many of those integrations are created by the community versus the core team? I'm assuming this is led by feedback.

Like, people are asking for things, so then you're motivated to make an integration for them. In terms of how many were done by the community, I'm just curious about the ratio.

Currently, all of the integrations were done by ourselves. A few months ago, we pushed the SDK, and we are trying to provide examples and, you know, simplify even further.

But it's the community that's driving the decision about which ones we do currently. For example, people have been asking for IMAP and gcloud and stuff like that.

We're going to go spend
more time doing that.

But yeah, the idea is to start growing the developer community, not the user community but the developer community, into building their own integrations.

Yeah.

We reached the right level right now, the right level of easiness, or difficulty, depending on how you see it, of writing an integration. Because it boils down now to writing one function that scans and allows you to enumerate your data, and provides you an accessor to actually read the data. Once you have that, it can plug into what we have, and you get all the benefits behind it.
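That two-part plugin contract (enumerate, then read) can be sketched as follows. The names here are hypothetical, invented for illustration; they are not the actual Plakar SDK API:

```python
import io
from typing import Iterator

# Hypothetical importer shape: a plugin only needs two capabilities,
#   scan()      -- enumerate the resources the source exposes
#   open(path)  -- return a reader for one resource's bytes
# Everything else (chunking, dedup, compression, encryption) would live
# in the engine, not in the plugin.


class DictImporter:
    """Toy importer over an in-memory dict standing in for a remote source."""

    def __init__(self, source: dict):
        self.source = source

    def scan(self) -> Iterator[str]:
        # Enumerate the paths this source offers, in a stable order.
        yield from sorted(self.source)

    def open(self, path: str) -> io.BytesIO:
        # Accessor: hand the engine a readable stream for one resource.
        return io.BytesIO(self.source[path])


def backup(importer) -> dict:
    """Stand-in for the engine: pull every enumerated resource's bytes."""
    return {path: importer.open(path).read() for path in importer.scan()}


snapshot = backup(DictImporter({"notes/a.txt": b"alpha", "notes/b.txt": b"beta"}))
```

The point of the design is that the plugin only moves bytes from the source to the engine; a Notion importer and a file system importer would differ only in `scan` and `open`.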

Which means that some of the integrations, like the Google Cloud integration, were done in half an hour, unplanned.

So one of the developers was like, oh, well, I have half an hour, I'll do that. And that's okay.

He has the knowledge, and you can assume that someone who does not have the knowledge will take more time, but they're not going to go from 30 minutes to a month on that task.

You're not reinventing the wheel every time you want to back up a different product. 'Cause I'm here for the Docker backups. I'm here for image registries to be both a from and a to.
And for me, so, years ago I actually created a small script; Docker Volume Backup is, like, the name.

And it kind of took off a little bit
and then Docker ended up adding it

as an extension into Docker Desktop.

And then eventually they just made it
a default feature in Docker Desktop.

And so now like in the Docker
community, volumes were never really

meant to be moved around as images,
but they're just files, right?

So, I get more requests for fixing
that shell script, if that's all it

is, and working on that than just about
every one of my other examples, and

there's clearly a need for developers to move or back up volumes on their local Docker system. And whether it's Docker or containerd or CRI-O or Podman, whatever, it doesn't really matter. The developer sometimes wants to move, you know, the database files that are on that Docker volume somewhere else.

And there's not really a move option.

Right.

And there's no easy way; you kind of have to learn all these different commands for extracting it out.

Do you put it in a tarball?

Do you put it in a container image?

Like all that stuff.

So I'm here for that integration.

So sign me up.

I'm going to make you laugh.

Two days ago, I was having one of my sleepless nights, wondering what I was going to do.

I was looking into Docker, because we had a discussion a long time ago about how we could benefit from our deduplication to lower the size of storage for images: instead of layering layers, each of the layers could be deduplicated.

And so I looked into it because
I had never looked into how the

backup of this stuff worked.

And they use tar as a format,

yep.

And we have a tar importer, which actually can extract a tar and back up what's inside it. Which means that you could back up all your images and have deduplication across them.

And I looked into how it happens with containers, and we can back up containers the same way, actually. Tar and metadata, that's all it is.

We have an integration that's not ready yet, because it's a small experiment, but something we could push forward. Which is: okay, we already have a tar integration, so

we can create a Docker integration, which is actually an integration that talks to the Docker API to get a stream of the tar, which then gets passed into the tar integration.

And then boom, you have a new thing
that's packaged and not using a

script on the side, which is the goal.
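Since a `docker save`-style export is just a tar stream, the standard library is enough to sketch the idea: walk the tar members and hash each one, so identical layers shared between images collapse to a single stored blob. This is a toy illustration, not the actual Plakar tar integration:

```python
import hashlib
import io
import tarfile


def make_export(files: dict) -> bytes:
    """Build an in-memory tar standing in for the stream `docker save` produces."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def import_tar(stream: bytes) -> dict:
    """Walk the tar and key each member by content hash.

    Identical layers (same bytes, any name) collapse onto one entry,
    which is where the cross-image deduplication comes from.
    """
    blobs = {}
    with tarfile.open(fileobj=io.BytesIO(stream)) as tar:
        for member in tar.getmembers():
            data = tar.extractfile(member).read()
            blobs[hashlib.sha256(data).hexdigest()] = member.name
    return blobs


export = make_export({"layer1/layer.tar": b"base layer", "layer2/layer.tar": b"app layer"})
blobs = import_tar(export)
```

An integration like the one described would simply feed the Docker API's tar stream into the same walk instead of an in-memory buffer.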

So really, that's the point of Plakar initially: to allow doing this in cases where, oh, that stuff is not backed up; how can I actually back it up, you know, in a clean way without too much effort?

Obviously, there's some dev
here, but once it's done, it's

no longer dev for other people.

So there's that. But the idea is that then, if you trust the tool, you trust that your Docker backup is working the same way as your SQL backup or your file system backup.

Yeah, there's even a scenario where you could use... because, you know, a container registry is nothing but an object store, really.

And in the cloud native community, there's sort of a consensus around the container registry being the store of all things, like the storage of all artifacts.

And so now we have all these different types. Container images are just one type for an OCI registry, but there are now all these other file types. We can store Helm chart data in there. We can store Compose files in there.

Each one is its own type. It's not a container image, it's just a registry artifact. And there are even utilities now that we can use, if we have a new type of object that we want to store in the registry, to create the metadata for all that.

I'm not sure that a registry is necessarily a great backup storage location.

Like maybe an S3 storage
system would be better.

cause all the clouds already
have that, but they all also

already have image registries.

And so a lot of times when I'm working
with teams, like if we're going to

implement some sort of new backup or
some sort of new replication system,

or if we need storage for something, it's a lot easier for me to use what they already have.

As for my ultimate back-end storage, I mean, file storage and S3 storage probably make the most sense as the two types of storage for backups, but there are probably other scenarios, like container registries. And I love that when I'm on the site, the integrations list kind of clues me into which ones are inputs, which are outputs, and which are both. And just staring at the options you had made me think, like, well, you know, can I store Google Drive backups in OneDrive?

And then, also store
OneDrive backups in Notion.

Like, I started to wonder what my map is. I have a document that's the path of all the things I need and where they all go for backups, right?

If we're all talking about 3-2-1 storage for backups, we often, as backup engineers, even of your own software, your own stuff at home, forget a year later what you did with it all, how often it backs up, and where it's going.

I tend to forget.

And I know I'm using Backblaze
in some places and I'm using a

different cloud in other places.

And I have to document all of that for my own sanity, because every year I think I should check my backups and see if they're still working. And then I forget; I don't know where my backups are or how they work. And I have to go and redo all that research.

So the idea that I could maybe get closer to having this in one product is something that I might have to do and make a video on.

So, let's get some Docker stuff in there.


I think the only managed service that we are providing to the community right now is to, you know, send some emails if you have an issue with your backup, and a summary of, you know, what you are backing up. Maybe we can cover this one, because it will always be in your mailbox: you know, what kind of backups you have and where they're stored.

So that could be nice.

Yeah.

And at this point, you know, once,
once we have the image registry stuff,

then you start talking about Kubernetes
and can we run this on Kubernetes?

Let's talk about the storage for a little bit and get into the weeds of it, because I'm going to bring this up, and hopefully I'm not trolling your issues.

I was looking to deploy this before the show, so that I could come to the show with feedback or experience to talk about and say, yeah, I got it to work last night.

The first thing I go for, being a Docker guy, is I want to deploy the Docker image. There's an issue open asking for the Docker image, and I think someone replied and said, oh, we're not ready yet.

We've got caching, we've got other things
we've got to worry about and we need to

come up with a more cohesive strategy.

So I'm here for that cohesive strategy.

Let's talk about that.

what are the challenges
that you're seeing?

And do you have a plan?

Because we mentioned, we talked a
little bit before the show, but it

sounds like there's stuff coming.

So, not to spoil anything,
but let's talk about it.

It's just that the feature request came very early. We had, like, our first server release a few months before this.

We had the first user feedback, and we were trying to prioritize the things to tackle.

And this came up and required some thinking from the team about what it means to have a Docker image for this, because you have to actually mount your volumes within the Docker container.

You will not run this as an agent in most cases, or is that what users want? That's an open question, not an answer.

You have to think about: how are they going to use it? Because the Docker image you're going to ship as an official one, you're going to have to support it in some way. You can't just say, okay, we released the Docker image, and it's not doing anything useful.

And we had users saying, oh, well, this is going to be run from CI, so it loses its state every time, so you need to rebuild state.

Okay.

Well, that's going to be an issue, because we need to have some persistence outside of the Docker container for this.

Some people are saying, oh, I'm going to use this from my machine.

Yeah.

But that means you could have installed Plakar on your machine rather than in Docker, because it's going to be, like, a lot of work for us to support a use case that's not that useful.

Whereas you could launch Docker as an agent with Plakar in it to go query the other things, because Plakar is flexible enough that you can run it as "I'm doing my job from the Plakar instance on my machine," but also as something that controls the backups from other machines and transfers data here and there.

I mean, depending on what
direction you take, you would not

build that image the same way.

And I don't know which one you would advertise the most.

Yeah.

Without having user feedback on this, it needs discussion, and we need users from the community telling us what they need in the image. And that's what is going to drive the development.

Um, that's most of the issue there.

Is there a concept of backing up the system itself, like the configuration and the plugin list? Has it got an internal backup command that allows me to essentially save the state of the whole system, outside of the individual integration backups?

Well, there are two things. The first is that none of the state is mandatory. When you run an instance and you wipe your cache folder, it is going to rebuild the state.

Yeah.

So you are going to lose the
plugins that were installed.

But you can just click and reinstall.

So the idea is: if you did not have a backup for this, you are not in the worst case, because you could go from a blank machine, point it at the repository, it will synchronize again, and you will get a working state for your backups.

If you need a backup, because you want to avoid having to resynchronize or to reinstall, well, you can just back up the cache directory and you get a Plakar snapshot with the configuration of your Plakar.

There are no particular things you would have to do to make this possible. It's just the standard way of using Plakar, basically.

Yeah.

on the infrastructure side of the storage.

You mentioned encryption. So, do you support encryption of backups?

Out of the box, the snapshots, the backups (we talk in terms of snapshots; a snapshot is a view of whatever you imported as data), all these snapshots are compressed and encrypted by default.

You have to actually say: I don't want the encryption, I want to work with plaintext. You have to turn it off.

Oh, okay.

That's nice.

Secure by default.

I love it.

It's end-to-end encrypted. So you don't have a server, for example.

you're going to run it from your machine,
and you're going to say, my import bucket

is on S3, and my storage is on gcloud.

yeah.

Yeah.

you don't have a server running at AWS and
you don't have a server running at gcloud.

So all of this setup is
stored in the configuration

of the store that you create.

It is standalone.

And we don't want to trust
AWS or gcloud with keys.

And we don't have a third party that would hold the keys to encrypt and decrypt.

So we treat them as what we call dumpsters: they don't do anything besides taking the packets that they see and storing them. They're just that kind of storage layer.

Yeah.

So at the lowest level, what if I do the simplest thing, the simplest deployment? Because I'm primarily an implementer, you know, an operator.

So I often think about, okay.

is it going to look like when
I have this thing set up?

what were the pieces of the puzzle
sitting and what do I have to run

long term and what are the ports I
need and how do these things connect?

So I'm guessing that there's a daemon, or I don't know what you're calling it, the server part, but like a daemon that runs somewhere all the time, and it has an API so I can use a local CLI to control it.

There are two parts to it. There's the client part, which is you running Plakar to import data from a source and push it somewhere. And there's whatever storage you have, which may be a local disk, or an S3 bucket, or actually anything that can act as a key-value object store.

So that could be AWS, and you don't have a server in between; you have Plakar operating as a client to your AWS bucket, for example.

So everything, from the deduplication to the compression to the encryption, is done on the client side, on the machine running Plakar. So when the traffic leaves the machine, you know that it cannot be tampered with without being detected, and it cannot be decrypted without the keys, which have never left your machine.

yeah.

That's the simplest case. That's what you would do for your home setup: you would install Plakar on your desktop and run it from there, saying the storage is over there, it's on my S3.

Then you have a different mode, which is, through the integrations, you could have a server. You could have that Ubuntu machine you said you had in your closet running Plakar, taking care of connecting through SFTP to all the machines on your network and doing the backups.

Then we have the non-open-source version, which we're working on, Plakar Enterprise, which provides a server that extends Plakar. So Plakar becomes the open source client to an enterprise product, but it's the same tool as you would use at home, and the enterprise version provides a server with additional features, like maintaining the privacy of the credentials for all of your storages, for example.

So, your clients: at home they would connect directly, while your client on the workstation at work would connect to the Plakar server of your enterprise. And that server would hold the credentials to the actual storages, so they don't leak through the company, for example.

So you have all these different ways of working that allow very flexible setups. They go from "I have a single machine and it's going to connect directly to my store" to "I have segregated traffic and isolated machines that have different privileges and cannot access that S3 bucket, but they can access this one, and I can't fully trust them, so I need some layer of validation," as you would at an enterprise level.

Yeah.

Yeah.

Is that allowing for, like, multihop backups? I'm trying to think of some of my more challenging enterprise setups. We had so many backups happening that one server couldn't do them all, for bandwidth purposes. So we ended up, in one scenario, with a main orchestrator server, and it had multiple backup boxes, we just called them agent boxes. Their purpose was to back up the data and create the snapshots, but they weren't necessarily backing up themselves; they were backing up other machines that also had agents. They were a middle tier of fan-out.

We needed to back up a certain number of terabytes every 24 hours. This was 20 years ago, and at the time we were limited to one-gigabit network connections. So we were literally creating new servers in the middle tier because we were saturating pipes and couldn't get backups from all the different systems done in a 24-hour period.

So we had to add more middle tier, but there needed to be a central orchestrator that managed the jobs it was distributing to the individual middle-tier boxes.

But there, you know, there's a lot
of small shops that I deal with where

one person is saddled with DevOps,
and ops, and backups, and recovery,

and like monitoring, and logging, and
storage, and cloud infrastructure.

Like they're just having to do it all.

I actually call them solo DevOps.

that's the label I give to these
unfortunate individuals that are

given way too much, work to do.

Maybe AI will help them.

maybe we can, rely a little bit more on
AI to help us with the advice on that.

But it's just, it ends
up being a whole lot.

Right.

And I saw that you had a demo on the
website, but yeah, I'm just curious

about how big does this get today?

And like, where's your vision
for where the enterprise product

that you're building is going?

Now, ransomware attacks target the backup system of, you know, any company first. So for a different kind of reason, encryption is required, and you need to be sure that your storage and the backup server don't have your credentials or the encryption key. Otherwise, if your backup system falls at some point, the attackers have all your data in one place. So end-to-end encryption is clearly becoming a prerequisite for securing your backups right now.

If we step back a bit, you know, the issues you mentioned about the size of the backups, we solved that in the past with deduplication, mainly on the filers. So basically it was the storage that was optimizing the space, deduplicating the data. But that works only with unencrypted data, because with encrypted data you of course lose the deduplication.

So today we are in a situation where companies want end-to-end encryption on their data, but in that case they cannot use the deduplication of the filers, and so storing the backups has a crazy cost.

A lot of vendors basically created alternatives with proprietary formats, where they are still optimizing the space, but in the end they still hold the encryption key.

And what we are trying to do with Plakar is to solve this issue. Because we are doing the encryption, the compression, and the deduplication at the source, it means that all along the path the backups take, they are already super optimized in terms of storage and space.

We have almost 15,000 cycles of backups, snapshots that we did, on this machine. The logical size is 24 terabytes. So we have a huge amount of data here, but the space we are actually using is only 159 gigabytes. And even though everything is fully encrypted, the storage has no knowledge of the encryption key.

And I think that's the game-changing thing about this technology: it allows you to move your backups anywhere you want, you know, to any cloud provider, on premise, wherever, even if you don't fully trust that provider. And you can do that with an optimized network cost, because sometimes, if you want to synchronize data between cloud providers, you have to pay egress costs, which is super expensive with huge amounts of data.

And with Plakar, you only pay that for the first backups you do. All the snapshots that follow will only transfer to your server the few blocks that were not backed up before. So the storage is optimized, and it's fully end-to-end encrypted. Today, I don't know of many options that make that happen.

Yeah.

So it's doing incremental backups after the first, or incremental snapshots, I guess?

I will let Gilles answer, but: are we differential or incremental?

Yeah.

When you have an incremental backup, you actually create a chain of dependency between all your snapshots. The thing is, the longer your chain goes without going through another day-zero full sync, the more you increase the likelihood that at some point a corruption will break the whole thing.

So you have a higher risk, and you have to test everything very often because you want to limit that risk. You don't want to go through the hassle of doing the incremental backups, only to not test them, then test them in a week and realize that you have one week's worth of deltas that are trashed, basically.

Yeah.

And the idea is that you can also take an approach that is index-reference based, where basically you are not building a delta against what happened right before; you're building a delta against what's in the store as a global storage repository.

So your backup actually benefits from any of the previous ones, and you don't have a chain of dependency, in the sense that you can delete the one that happened yesterday and it's not going to break any dependency with the one you have today. As long as your store is reliable, you can do any kind of removal you want, with the granularity you want. And we can consider the snapshots autonomous, in the sense that each snapshot is autonomous by itself; it does not require any other one.
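The snapshot-independence idea can be sketched with a toy content-addressed repository (the names and structure here are mine, not Plakar's): each snapshot is a standalone list of chunk references into a shared store, so removing one snapshot never invalidates another, and garbage collection only drops chunks that no surviving snapshot references:

```python
import hashlib

class Repository:
    """Toy content-addressed repository: snapshots are independent chunk-hash lists."""
    def __init__(self):
        self.chunks = {}     # chunk hash -> chunk bytes
        self.snapshots = {}  # snapshot id -> ordered list of chunk hashes

    def backup(self, snap_id: str, data: bytes, chunk_size: int = 4) -> None:
        refs = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)  # dedup against the whole store, not the previous snapshot
            refs.append(h)
        self.snapshots[snap_id] = refs

    def restore(self, snap_id: str) -> bytes:
        return b"".join(self.chunks[h] for h in self.snapshots[snap_id])

    def delete(self, snap_id: str) -> None:
        """Remove a snapshot, then garbage-collect chunks nothing references anymore."""
        del self.snapshots[snap_id]
        live = {h for refs in self.snapshots.values() for h in refs}
        self.chunks = {h: c for h, c in self.chunks.items() if h in live}

repo = Repository()
repo.backup("mon", b"hello world")
repo.backup("tue", b"hello brave world")
monday = repo.restore("mon")
repo.delete("mon")               # no chain: Tuesday is unaffected by Monday disappearing
tuesday = repo.restore("tue")
```

Deleting "mon" drops only the chunks unique to it; "tue" still restores byte-for-byte, which is the property that lets you prune snapshots at any granularity.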

The thing is, you have to trust the storage anyway; you're going to store your data there. If you don't trust it, well, you have to do something called three-two-one backup, to ensure you don't have just one copy of your backup, so you can restore a broken backup from another backup. That's the idea.

We have a cool way to manage it. And so this allows you to have all the benefits of incremental backups without the risks of incremental backups.

Yeah.

That's nice.

With the sync command right now, you can actually super easily synchronize a Kloset store to several locations. So basically you are pushing one backup into a Kloset store, and you can have two or three Kloset stores replicating that data in different locations. Because, yeah, having all your backups in one Kloset store would be too risky.

Of course, you need to have a backup strategy on top of it. And we provide a cool way to do that by, you know, synchronizing this Kloset store to several locations, again with a low cost in storage and bandwidth. So you push to one place, and you are able to have two or three copies, even in cold storage, to be sure that your data remains safe if the first storage has some issue.

And with different granularities. Because you might say, oh, since it's encrypted, you need to have the exact same copy, but that's not the case, because the snapshots are individual. So you can actually say: I have one store, like on my NAS near my machine. I will back up on my machine to the local disk, just to recover from user error, because if I delete something, I have it immediately available. But I might synchronize one snapshot per hour to the NAS, and have that one fan out again, one copy into AWS and one copy into Google Cloud, for example. And the synchronization is not doing another backup; it's really pushing a copy of the snapshots to the different destinations.

They can possibly have different encryption keys as well. It does a transformation between one and the other, so that in the end each one has its own encrypted copy of the same data.

And this saves you from cases where, for example, you have your machine and you want to back it up to two places.

Yeah.

Natively, you would do: oh, I'm going to do a backup to AWS, and I'm going to do a backup to Google Cloud. But in between, something may have changed; you're not backing up the same thing. When you're doing the sync, you're getting the info from one of the stores and transferring it to the other. So at the end, they have the same data, which has its benefits.

Like if you lose something, it has benefits. And we repair the stores as well: if we have a corruption, if you break something, you can actually repair it from the other one.
yeah.

And you can also, of course, run a check on the store to verify, you know, that the data is still what you expect, in two different ways. And we have R&D projects about error-correcting codes for auto-repair, maintenance, and things like that.

Just to be clear about the crypto: we did not do it ourselves. We are a team of people who have worked in security a lot; we have faced a lot of crypto specs, in banking and such, so we had a hunch about what should be done and where. And we had an external, independent audit by a famous cryptographer, whose book I have behind me, who was willing to audit this with no bias, because he has no interest in validating something that would be broken. So that was just to have a third party. We managed to put cryptography in every layer as a validation concept.

You have HMACs everywhere, so if you flip one bit somewhere, it's going to completely break, in the nice sense: it's going to tell you there's a corruption there, in that specific file, and that it cascades, because that file and that file also shared that data, so they are all corrupted. So we already have the detection part in a very granular way, in the sense that it can pinpoint very specific chunks and objects.
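A minimal sketch of per-chunk MAC validation with Python's standard library (the key handling and chunk layout here are invented for illustration, not Plakar's actual scheme): because every chunk carries its own HMAC, corruption is not just detected, it is localized to a specific chunk:

```python
import hmac
import hashlib

KEY = b"demo-key"  # illustrative only; a real tool derives and protects keys properly

def seal(chunks: list) -> list:
    """Compute one HMAC per chunk so corruption can be localized later."""
    return [hmac.new(KEY, c, hashlib.sha256).digest() for c in chunks]

def find_corrupt(chunks: list, macs: list) -> list:
    """Return the indices of chunks whose MAC no longer verifies."""
    return [i for i, (c, m) in enumerate(zip(chunks, macs))
            if not hmac.compare_digest(hmac.new(KEY, c, hashlib.sha256).digest(), m)]

chunks = [b"alpha", b"bravo", b"charlie"]
macs = seal(chunks)
intact = find_corrupt(chunks, macs)   # no corruption yet
chunks[1] = b"brAvo"                  # corrupt a single byte in the middle chunk
damaged = find_corrupt(chunks, macs)  # pinpoints exactly which chunk broke
```

With a whole-archive checksum you would only learn "something is wrong"; per-chunk MACs are what make pinpointed repair possible.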

And having that, plus the ability to synchronize, we can build tools on top that allow a very pinpointed repair. Without having to repair everything, because that's costly too, we're going to be able to say: oh, I have one chunk that's broken and I can fetch it from there. I'm going to fetch just that amount of data.

And as I said, we have the error-correcting codes: since we can detect everything that is broken, we can have error-correcting codes on top that could auto-repair. You know, repair in a buffer, verify that it's correct, check with another repository that it's correct, and then do the repair for real and apply it.

So we have all these paths of possibilities we can implement, and they are not that far away, because we have branches that are working. They're not production branches right now, but they're working enough that you can say it's not just an idea. If that were the focus, next month it would be there, because there are enough bricks to prove it.

And we have a ton of these ideas about what the tools would be to make it more reliable. It's already reliable in the sense that you would detect that something is corrupted, but how can you make it so reliable that people will not be stressed if that happens? That's the goal, in the end.

As an architect in the past, I've been in so many incident rooms over Slack with people who lose their minds when they have an incident and the backups are hard to manage. Because it's like: we know we have backups; now we have to go into the backups, and we never do that, so now we have to figure out how. And if one of them is corrupted, then, yeah, it's stress plus plus.

You're going to a high level of stress.

And we want to be in the situation where they don't face that. Okay, there's a corruption, even in your backup? Well, there are ways to get out of this, and most of them are automated.

As you manage backups, there are these three phases for the DevOps or operations engineer managing them. There's the implementation, which is obviously very time consuming: you're learning the product, and you're testing backup and restore so you can believe they'll work. Then, once the project is implemented and you feel like everything's going to work in a recovery, you tend to leave it alone, right? You're checking that things are running, and as new infrastructure shows up, you're adding or removing jobs or whatever. So you're in maintenance mode. But then there is that incident day.

Where they call the backup person and say: okay, we need to bring you into the incident room, or into the Slack channel or whatever, because we now need a recovery. And typically, in most of the teams I've worked in, not everyone can restore, right? There are only one or two people who can run the restore tool.
And in that moment, I can viscerally remember being the manager of the people managing the backups, worried, starting to doubt everything, right? They're about to test the restore and I'm doubting: when was the last time we verified this particular integration? We've had three major version upgrades, and we've never tested since we did the initial deployment.

So we don't even know if this restore will work. We recently had to replace three of the drives. So is there a potential for some sort of disk corruption we didn't know about, because the files just sit there, never get touched, and die slowly over time?

There are so many moments in that where I'm worried someone's going to get in a lot of trouble, or fired. And then the recovery happens, and it works. Or maybe it doesn't.

There was one time where we ended up with corrupted files, and we had to go to offsite tape from like a month before. We had this process where we would go to tape once a month, and the tapes went offsite to a different data center in a different part of the state. It was like 300 miles; the goal was that if a storm took out the data center, that's the third copy of the three-two-one, right? It's a state away, it's been driven there by one of our staff, we know it's physically there.

And we had to go pick those tapes up, and they actually worked. But it took like a week. It was after a hurricane, and we had a flood, and we had servers underwater, so we had to go to the offsite storage. And that whole week I was just so nervous that these things weren't going to get restored; we were basically going to start from six-month-old data at best. And luckily, luckily, it worked.

But we don't talk about those kinds of horror stories enough.

you know,

One of the reasons I think people are stressed is that most companies don't have a backup team. The big ones have a backup team; the others don't. The people who are given the task of doing backups, it falls on them as part of a long list of other things to do, they have to get rid of it fairly fast, and it's not a topic they are interested in.

It's just: yeah, you have to do backups before Friday. Okay, what do I have? There's a list of ten tools, none of them seems appealing, so I'm going to take one that's popular, because no one's going to get fired over a popular tool. That's going to be the decision driver.

But then, if they don't have to use these backups, and if it was just a task pushed onto them, they're not going to look at the backups once they're done. They will check that the backup happens on a regular basis, because it's supposed to run every day, but they will not inspect the data every day, because they have other things to do.

The other thing is that in most tools, the backups are kind of dead data, in the sense that they are meant to be stored away and have no other use than being backups.

When we design stuff, we're more interested in how you actually use the data. Because if you have no use for that data, you're not going to look into it. If the data you backed up is actually usable in a very convenient way, and you actually use it every day, then you have fair confidence it's not corrupted, because you've been using it for the last few days.

The demo website, that's just the open source version, okay? So that's not the company use you would have of it, but we have previews of files: you can preview photos, you can preview videos, you can preview audio.

If you actually use that snapshot, which is a backup, like the one shown on the screen, for example, you use it the way you would use your Google Drive: every day, looking into the things you actually manage. Oh, I want to look at the content of a file, but I'm going to use the snapshot, not a copy I have on my machine. Well, then you know that it works, because you actually viewed it recently.

yeah.

And it becomes immutable data, in the sense that you can't alter it. It's read-only data, but read-only data that you actually use, which makes the data a bit less dead and a bit more lively.

I think if you have a use case like that, and you enter an incident and have to restore something you've actually been using every day, through a web interface or by mounting the snapshot on your system as a local directory, well, you're not as stressed, because you know it actually works. You've removed the painful part of the question of checking your restores.

Yeah.

Is the check similar to, like, a mock restore?

Yeah.

It's an in-memory restore that discards the data after doing the cryptographic checks. It's as if you restored into RAM and validated all the checksums, but we do it in a streaming way, so you don't have to hold the whole snapshot in memory.

For the whole, yeah.

yeah.

Okay.

Yeah.

'Cause it has to de-dupe.

Yeah, it has to read the data.
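A toy version of that streaming check (the function and chunk source here are illustrative, not Plakar's API): each chunk is pulled, hashed against the snapshot's recorded digest, and discarded, so memory stays constant no matter how large the snapshot is:

```python
import hashlib

def verify_stream(chunks, digests) -> bool:
    """Check each chunk against its recorded digest, then drop it: constant memory use."""
    for chunk, digest in zip(chunks, digests):
        if hashlib.sha256(chunk).hexdigest() != digest:
            return False
        # the chunk goes out of scope here; nothing accumulates in RAM
    return True

data = [b"chunk-one", b"chunk-two", b"chunk-three"]
recorded = [hashlib.sha256(c).hexdigest() for c in data]
ok = verify_stream(iter(data), recorded)
tampered = recorded[:2] + [hashlib.sha256(b"flipped").hexdigest()]
bad = verify_stream(iter(data), tampered)
```

Passing an iterator rather than a list is the point: the verifier never needs the whole snapshot materialized at once.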

We have a couple of questions. I don't even know if this is a thing: is there a plugin for cyber-attack detection? And I'll ask: are you talking about, like, detecting ransomware encrypting everything? Is there something like that? Is that a thing? What do you think about ransomware, do you do anything for it?

We do something, but the posture people should have is: assume the data is toast. You have to have a copy elsewhere, and you have to have a copy that's not reachable by ransomware. So you have to have data that's offsite and not on the network. That's the only way you're sure, well, relatively sure, that the ransomware is not going to affect you.

Or at another provider?

And then, once we have tackled this and told people "don't trust anything other than this," there are all the solutions that are best effort.

Like, for example, we compute the entropy of files and directories, and we store this as part of the metadata of each snapshot. So you could actually use a diff, a proper diff, to compare whether the entropy drastically changed between two snapshots: for example, this directory that had low entropy before has a very high entropy now.
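The entropy heuristic is easy to sketch; this is a hedged illustration (the thresholds and helper names are mine, not Plakar's): Shannon entropy of ordinary text sits around 4-5 bits per byte, while encrypted data approaches 8, so a directory whose entropy jumps between snapshots is suspicious:

```python
import math
import hashlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: roughly 4-5 for English text, approaching 8 for encrypted/random data."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

text = b"the quick brown fox jumps over the lazy dog " * 200

# deterministic stand-in for ransomware-encrypted content
scrambled, block = b"", b"seed"
while len(scrambled) < len(text):
    block = hashlib.sha256(block).digest()
    scrambled += block

low = shannon_entropy(text)
high = shannon_entropy(scrambled[:len(text)])
# a snapshot diff could alert when a directory's entropy jumps from low toward high
```

Stored per snapshot, as described above, this single number is cheap to diff across backups.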

The thing is, the stores that snapshots are pushed to are write-only in that sense; you're never editing something in the store. So you can actually have WORM, write once, read many, enforced at your provider level.

If you have entropy checking, plus the offsite and offline copies, you have a fairly good situation. Because for things that don't completely trash your store, you can still say: oh, I had a machine with the ransomware; it pushed a backup with ransomware in it, but the other snapshots are not affected, and I can actually remove the broken one.

Because they're immutable.

Then, if that did not work, you can go back to: oh, I have an offline copy, I have an offsite copy. So you have to manage this yourself; you can't just trust a software solution to take it over for you.

Yeah, I like the entropy idea, though. You're basically talking about: if the change rate on this particular backup is normally 10 percent a day, having something that notifies you when it's, you know, double that, 20 percent changed today or whatever, some sort of alert.

Yeah.

And you will probably have an alert on the size as well, because of course, even with everything encrypted, you can usually do something like 10,000 cycles with Plakar without increasing the size of the storage. You can, yeah, increase the frequency, because we are just storing a little metadata and only the changes between two snapshots. So you can take your backups from, you know, every day to every hour, or every minute, depending on the size of what you are backing up. If you have ransomware, you will have an alert on the size, because at some point it will double the size of your storage, and that's something that should never happen.

Yeah.

You have that, and you have the idea that, as I said very early in the interview, we have built some kind of database, in some sense. So we have multiple indexes; we can look up images or videos because we also index MIME types and things like that. And the MIME types should be aligned, in some way, with the entropy of the data: if you have a text/plain file and it has a high entropy, you're going to raise alerts. That's not great.

These are the few that come to mind right now, but there are many other heuristics you can use to detect some kind of fishy scenario that is gradually taking place. Because if the ransomware has already finished, you should know, since you're being asked for money. But if you're in the middle of the attack and you have a backup happening with half the data corrupted, half the assets corrupted, you're going to detect it through entropy, through metrics like this.

I feel like if I had to make something myself, it would end up being something stupidly simple. Like, I'd create a monitoring solution that watches a plain text file, a dont-encrypt-me.txt or something, that I put on every single file share, every single server. And if any single one of them ever changes, I get an alert; I have some sort of agent that somehow watches all of them. It's the first level; it doesn't even wait for backups to happen. It's just: oh, this file just changed. Because the way I've seen these ransomwares roll out, it starts small and then just spirals.

So there are early indicators in the early hours, because if you've got terabytes of file storage, that doesn't all happen at once. And not everybody has permissions to everything, so it typically starts in little places. So I'd probably seed all these little files everywhere.
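The canary-file idea sketches out to very little code. This is a hypothetical illustration (the file name, content, and layout are invented here): seed a known file, record its hash, and alert the moment any copy's content changes or disappears:

```python
import hashlib
import tempfile
from pathlib import Path

CANARY = b"If this file ever changes, something is encrypting data it should not touch.\n"

def seed_canary(path: Path) -> str:
    """Write the canary file and return the SHA-256 we expect it to keep."""
    path.write_bytes(CANARY)
    return hashlib.sha256(CANARY).hexdigest()

def check_canaries(expected: dict) -> list:
    """Return every canary path whose content changed or disappeared."""
    alerts = []
    for path, digest in expected.items():
        try:
            current = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        except FileNotFoundError:
            current = None  # a deleted canary is just as alarming
        if current != digest:
            alerts.append(path)
    return alerts

tmp = Path(tempfile.mkdtemp())
canary = tmp / "dont-encrypt-me.txt"
fleet = {canary: seed_canary(canary)}
quiet = check_canaries(fleet)                       # nothing has touched the file yet
canary.write_bytes(b"\x9c\x02 encrypted garbage")   # simulate ransomware rewriting it
noisy = check_canaries(fleet)                       # that path now raises an alert
```

A cron job or monitoring agent could poll `check_canaries` across every share and page on any non-empty result, without waiting for a backup cycle.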

Besides the smartness of the detection, only the offline backup is going to save you. There's no other way; it takes only one miss. With ransomware, if you misdetect and let it happen, it's already done. You don't have the luxury of trying to see if it works.

Yeah.

So you have to have your own offline backup. That's the only solution. Everything else is a nice-to-have. But it should not be a blocker for the most annoying part, which is the offline backup, the one that takes the most effort to produce.

Yeah.

Well, I like those read-only S3 buckets. Those are something I like to use for ensuring that files can't be deleted and my backups can't be changed.

You know, did you hear about the UniSuper incident, where Google just deleted UniSuper's complete AZ and region? They lost everything because, basically, Google dropped the client's billing account and it cascaded everywhere. So, yeah, I would not rely on S3 alone.

Oh, for sure. I just mean that, unlike a normal file server or any drive storage, I can more easily ensure that things written to buckets don't get changed later. Whereas everything on a file server is, you know, up for debate as to what can access it. But yeah, good advice.

One other question. Gartner recently introduced the cloud native infrastructure recovery category in their latest hype cycle for backup and data protection technologies. Where would you position Plakar in that? CNIR, I guess, is the acronym for Cloud Native Infrastructure Recovery.

Let's say that we announced, last week or this week, that we are joining, you know, sponsoring the Linux Foundation and the Cloud Native Computing Foundation. So basically we are joining those two.

Sandbox maybe?

Are you going to go for Sandbox?

When you donate, are you donating, or are you just becoming a member?

We are donating, basically, to be part of the foundation. Why? It's the first step, you know, to understand who to talk to there and how this ecosystem works right now. Because we have to admit that Gilles comes from the BSD world, and I don't have so much experience with this one, so we have to figure out how we can be integrated. That was the first step, but yeah, we are currently working on support for Kubernetes.

We hope, for Cloud Native Paris on the 3rd of February, to have something that will be usable.

And yeah, we really think that a layer is missing in Cloud Native about, you know, resiliency and backup. And with Plakar we will try to contribute, try to bring up this layer, and make sure that, whatever data you have to back up, you will be able to rely on this layer.

In my previous job, you know, I was managing a quite large team at a big e-commerce company in Europe, and I was fighting every quarter to be sure that all the teams made their backups at some point. But the thing I never achieved is to be sure that every team had three-two-one, a clean three-two-one, with everything encrypted.

I think what is game changing with Plakar right now is that we decoupled the storage from the technology used to store your backups, and you can store your backups anywhere, without trusting your provider.

So what does that change? You could imagine, and it's what we are releasing right now, a protocol where you can push your backup to a provider, and the provider manages the resilience of your data without any knowledge of your encryption key, while we maintain low network cost and low storage cost.

And I think it's the kind of layer that is missing right now: being able to back up all your objects, whatever they are, just pushing them to a third party, which could be, you know, your own company; it could be a team in your company that manages the backups. But today the issue is that pushing all your data to one team in your company, with the encryption key, et cetera, to optimize the storage, is a big bottleneck in terms of security.

So, yeah, what we are enabling with this
new protocol is being able to ask every
team: make your backup, push this backup
to a third party, internal or external,
and this third party will manage resilience
without any knowledge of your data.

So you can make two copies in two
different cloud providers, for example,
one offline, etc., and do it in a clean way.
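The core idea being described here, sketched below: the client encrypts locally, so the provider only ever stores opaque ciphertext plus a digest it can use for integrity and replication, never the key or the plaintext. This is a conceptual illustration, not Plakar's actual protocol, and the keystream cipher is a deliberately toy stand-in for a real one like AES-GCM or ChaCha20-Poly1305:

```python
# Sketch of zero-knowledge backup: encrypt client-side, so the provider
# manages resilience of opaque bytes without ever seeing key or plaintext.
# NOT Plakar's real protocol -- the cipher here is a toy for illustration.
import hashlib
import secrets

def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy CTR-style keystream from SHA-256. Do NOT use in production;
    # real tools use AES-GCM or ChaCha20-Poly1305.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    return nonce + _keystream_xor(key, nonce, plaintext)

def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    return _keystream_xor(key, blob[:16], blob[16:])

key = secrets.token_bytes(32)  # stays with the team that owns the data
blob = toy_encrypt(key, b"quarterly database dump")

# All the provider stores and replicates: opaque bytes plus a digest
# it can use to verify integrity without decrypting anything.
stored = {"digest": hashlib.sha256(blob).hexdigest(), "data": blob}
```

Restoring is then `toy_decrypt(key, stored["data"])` on the client side; the provider never needed the key to keep the copies resilient.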

And I think it's the kind of contribution
that we can bring to the ecosystem.

But, yeah, of course, we want to do something
to solve this whole resilience issue.

The way you're describing that, I can't
help but think that an OCI registry would
be a good option for that, because it's
content addressable, it's got SHA-hash
guaranteed unique identifiers, it's
read-only so you can be assured there is
integrity, and it's got all the metadata
to it. So I'm going to put my vote in
for that, but it sounds like you're
building something custom.
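Bret's point about content addressability: OCI registries name every blob by its SHA-256 digest, so a blob that no longer hashes to its own name is detectably corrupt. A minimal illustration of that property (the `sha256:<hex>` digest format is from the OCI image spec; the registry push/pull calls are omitted):

```python
# OCI registries address every blob by its SHA-256 digest, so integrity
# verification on restore is just re-hashing the bytes you got back.
import hashlib

def oci_digest(blob: bytes) -> str:
    # "sha256:<hex>" is the digest format used by the OCI image spec.
    return "sha256:" + hashlib.sha256(blob).hexdigest()

def verify(blob: bytes, expected: str) -> bool:
    return oci_digest(blob) == expected

snapshot = b"pretend this is a packed backup snapshot"
digest = oci_digest(snapshot)

print(verify(snapshot, digest))                 # intact copy passes
print(verify(snapshot + b"tampered", digest))   # any change fails
```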

So I was going to ask, as we wrap this
up, 'cause we're running a little long:

what was next?

It sounds like what's next is
Kubernetes initial Kubernetes support.

When you say Kubernetes support, are you
talking about running it on Kubernetes?

Or are you talking about backing up
like Kubernetes volumes or is it both?

We had a discussion about what was the
proper way to integrate into Kubernetes,
because it's one of our developers
that's working on it.

He was like, should I do one integration
that works, that covers everything?

And I said, no, we have to decouple the
control plane and the data plane.

You have to be sure that I can back up
all the YAMLs from my configuration,
and that I can selectively back
up some of my volumes.

I don't want to have no option but to
back up everything or nothing.
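Conceptually, that decoupling might look like the sketch below: manifests (the control plane) are always cheap to capture, while volumes (the data plane) are selected by label instead of all-or-nothing. This is a hypothetical illustration, not the actual Plakar integration; the resource shape and selector are made up for the example:

```python
# Conceptual sketch (not the actual Plakar Kubernetes integration) of
# decoupling control plane from data plane: back up all the manifests,
# but pick volumes selectively by label instead of everything-or-nothing.

def plan_backup(resources, volume_selector):
    """resources: list of dicts with 'kind', 'name', 'labels'."""
    manifests = list(resources)  # control plane: every object's YAML
    volumes = [
        r for r in resources
        if r["kind"] == "PersistentVolumeClaim"
        and all(r["labels"].get(k) == v for k, v in volume_selector.items())
    ]                            # data plane: only claims matching the selector
    return manifests, volumes

cluster = [
    {"kind": "Deployment", "name": "web", "labels": {"app": "web"}},
    {"kind": "PersistentVolumeClaim", "name": "db-data", "labels": {"backup": "yes"}},
    {"kind": "PersistentVolumeClaim", "name": "cache", "labels": {"backup": "no"}},
]
manifests, volumes = plan_backup(cluster, {"backup": "yes"})
print([v["name"] for v in volumes])  # ['db-data']
```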

So these are either two separate
integrations, or one integration
operating in two different modes.

But the idea is to tackle all of them, and
we were looking into Velero integration.

It's always tempting to go your own way,
let's do our own integration, but there's
also a pragmatic way, which is: if we
can adapt to be run by Velero, through
Velero, then you get all the possibilities.

You get, first, our simple integration to
back up the kube configuration, which can
also be used through Velero to fit into
people's existing setups.

They can just swap between different
solutions. They can test us while
retaining their old solution, whatever
they're using with Velero.

Then we can have a third way of doing it,
which is our own, but that would
come last, basically.

But yeah, just to say, we're planning on
not being just a solution that runs within
kube, but more a solution that also
manages to back up your kube.

Yeah.

That's one of the challenges I've been
seeing in the industry. You've got the
traditional backup vendors that have the
plugins or integrations or whatever they
want to call them, and, you know, you're
paying lots of money, and they make you
pay for certain things, like maybe the
Oracle integration costs extra,
that kind of thing.

And they're all closed source.

And then you have these open source
things like Velero.

But the challenge with it is it's just
Kubernetes. It's great at Kubernetes,
but it's just Kubernetes.

And typically, I don't really work with
any teams that are only Kubernetes.

I mean, even if they're Kubernetes
first and they're container first,

they're going to have other things.

And so then they have to have a completely
different set of tools for that stuff.
And these two things don't meet, right?

So Velero's backing up to whatever storage
you want to put in the plugin on the back
end, and then this other system is
completely separate.

And, I mean, at this point it feels like
all my clients have multiple CIs, multiple
backups, multiple clouds; there is
no just one thing.

They're doing everything multiple times:
multiple types of databases, multiple
database providers.

So the challenge, I always feel like,
isn't to get to one universal backup
system. It's to get to as few as possible,
so that you can maintain them.

Yeah, that's the goal.

But imagine you have 10 separate tools,
because that's what I saw at some
previous companies.

They have many teams, and none of them
came to a consensus about what the proper
solution was. Each one came up with its
own, and you end up with 10.

Even if we just reduced that to three,
that's a net win over having to manage
10 different solutions.

And in our sense, you can do something
stupidly easy. I mean, you can say, oh,
I want an integration that actually backs
up another backup system.

So you end up having everything falling
into Plakar through the system of
integrations, being able to ingest
data from whatever solution.

So there's also this. I'm saying it's
a possibility through the integration
system, but that means you also have a
way to progressively unplug older
solutions as integrations get written,
while still having everything in
Plakar from day one.

You know, you want to back up some
solution. We don't have the integration
for that, but we have the integration
for your backup system.

Well, you can back up through the other
tool, and we back up the result
of your backup.

Progressively, as we get the integrations
to manage your tool natively, you get
some tools out of the way.

So the idea is to allow people to do that.

Obviously we're not enough people to
write the hundreds of integrations
that we would need.

But having simple SDKs, providing good
examples, and starting with the most
popular ones will lead us there
ultimately. That's the idea.

yeah.

And that's how tools like Cyberduck,
CyberDrive, that whole project ecosystem,
have dozens of different storage
integrations.

I like using those tools because they've
got GUIs, they're user friendly. They're
really great for personal backups and
personal file management.

And the magic of that kind of tool is
that it works with, like, everything.
Every cloud storage scenario you could
think of, it's got a plugin for that.

And so I feel like the integration or
plugin ecosystem is, in a lot of ways,
the magic of what makes a backup product,
or a backup project, really interesting:
all the different things that I can
back up, just in case.

I didn't see a Git repo option for GitHub.

I guess if you do it with Git, then you
could do any of the Git providers. But
it might be better to actually do it
through the GitHub API.

It depends what you want, because on
GitHub, what you would want is
probably not just the code.

It would be all the issues and all the,

right.

There are, like, two levels, right?

Yeah, it's like, I need the code, but I
could also really use all this other stuff.
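Those two levels could be sketched like this: the code via a `git clone --mirror` (which captures every ref, not just the default branch), and the metadata via GitHub's documented REST API. The sketch only builds the commands and URLs; actually fetching is left out, and the `PlakarKorp/plakar` repo is just an example path:

```python
# Sketch of backing up a GitHub repo at both levels: the code itself
# (a git mirror) plus the metadata (issues etc. via GitHub's REST API).
# Only builds the command/URL strings; no network calls are made here.

def clone_command(owner: str, repo: str) -> list[str]:
    # `git clone --mirror` captures all refs, not just the default branch.
    return ["git", "clone", "--mirror",
            f"https://github.com/{owner}/{repo}.git"]

def issues_url(owner: str, repo: str, page: int = 1) -> str:
    # GitHub REST API: GET /repos/{owner}/{repo}/issues is paginated;
    # state=all includes closed issues, and PRs appear in this list too.
    return (f"https://api.github.com/repos/{owner}/{repo}/issues"
            f"?state=all&per_page=100&page={page}")

print(clone_command("PlakarKorp", "plakar"))
print(issues_url("PlakarKorp", "plakar"))
```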

Yeah.

Well, this is great. We could talk
forever, and I really appreciate it.
You've both been very generous
with your time.

We covered lots of topics in this hour,
and I'm excited to start playing
with it and get started.

I am excited to hear about, what
you're going to do, what you're going

to announce on the Kubernetes side.

That's where I live.

So the Docker and Kubernetes stuff, I'm
going to subscribe to any issues that

have those words in them, so that I
can keep track of what the status is.

how do people find you?

So we've got the website, we're on
Discord, and then you've got the GitHub
repo, you've got socials. Looks like you
have a Discord server, so everybody
that wants to get involved can.

We work on Discord.

We are, as I said, all remote. We're
all working remotely, and we work
transparently on Discord.

So you can just come to our Discord.
You can actually attend all of
our meetings.

You will be muted, but you can look into
any technical discussion that happens,
in the open. Except the daily, where
you can come and talk with us.
yeah.

all right.

So we know what's next.

We know you're going to be at KubeCon.

People can follow you individually.

I guess you guys are on
socials, on LinkedIn.

I think in the YouTube description,
all the links are below for how to

follow these two fine gentlemen.

Well, thank you for having us.
This was pretty great.

Thank you very much.

Thank you.

All right.

Well, thank you both for being here.

And we will see you next time
here on, DevOps and Docker talk.

Ciao everybody!

Cheers.

Creators and Guests

Bret Fisher, Host
Cloud native DevOps Dude. Course creator, YouTuber, Podcaster. Docker Captain and CNCF Ambassador. People person who spends too much time in front of a computer.

Beth Fisher, Producer
Producer of the DevOps and Docker Talk and Agentic DevOps podcasts. Assistant producer on Bret Fisher Live show on YouTube. Business and proposal writer by trade.

Cristi Cotovan, Editor
Video editor and educational content producer. Descript and Camtasia coach.