Backup S3, Google Drive, iCloud, Notion with Plakar
What does it actually take to
build a backup system from scratch?
Not just slap together some rsync
scripts and call it a day, but engineering
something that handles deduplication,
encryption, compression, indexing,
and restoring in one cohesive tool.
Well, today I'm joined by Julian and
Gilles, the CEO and CTO of Plakar,
a new company, but built on nearly
a decade of R&D in building open
source backup systems from scratch.
I am excited about this one because
Plakar goes beyond just backing
up your typical workloads like
databases, file systems, and S3.
It also can back up connectors like Google
Drive, iCloud Drive, OneDrive, and even
things like Notion, Dropbox, and IMAP.
Yeah.
You remember IMAP?
Well, they've recently joined
the CNCF, so we talk about their
upcoming Kubernetes integration,
obviously a topic for this channel.
And on the show I suggested that they
build the ability to back up to Docker
images, basically to OCI registry storage,
which is really what I was asking about.
And then like within a couple
of weeks they went off and built
that, uh, as a new integration.
That's pretty dope.
This is the one backup tool that
I've seen in really, maybe ever, but
definitely recently that not only looks
useful for work and server workloads and
clusters and clouds, but is something
that I think I want to try to back
up my own personal iCloud and Google
Drive and Notion and all those things.
Which I've found historically to be
very problematic: it's challenging to
find a free tool that isn't just cobbled
together with a bunch of different things
and something that's reliably able to be
restored in a reasonable amount of time.
So I'm, I'm excited about this tool 'cause
I feel like it, it affects multiple parts
of my job and life and it's open source.
So let's get into it.
We're going to talk about backups.
We've talked about that before, but
there's so much more to backups.
We're going to get into it.
I'm excited.
You all started a company.
How long ago?
We can give two answers on this one.
The first one is that the company
was incorporated in 2024.
Yeah.
It's quite a new company to support this
project, but the project is quite old,
in the sense that Gilles did a lot of
R&D on it over the past 10 years.
Yeah.
What I was noticing as I was digging
through the project was there are a lot of
foundational things when you're creating
a backup product that you have to define
that a lot of us don't think about.
Not being a developer of a backup
product, the only thing I can equate
it to is creating a new database product,
because you had to create a file format.
You had to create a
streaming, backup format.
You had to go, I would imagine, much more
low level than the typical application
developer has to go because you had
all these underlying fundamental
concepts of, you know, things like
the backup file, the caching of the
backups, all that stuff, yeah.
yesterday I was at a meetup and
I was presenting the product.
Someone asked me,
why did you go into backup?
That seems like a very boring area.
Yeah, and if you take it from
a developer perspective, it goes from
very low level to very high level.
It touches many fields of computer
science that you might be interested in,
if you have a high appetite for tech.
So you have to know about
how a file system works.
You have to know how to manage your
memory, how to manage high concurrency,
how to manage file formats.
In our case, we kind of developed
a database in a sense, because you have
a B-tree, and you have to manage how
to map that B-tree to something.
So yeah, it's very, complete as a project
to, to dive into technical topics.
I was like, Oh, this is going to be
a small project, small side project.
And then you realize that, Oh, you
end up doing cryptography, you end up
doing compression and stuff like that.
And you, like any area you look
at, you're going to find ways to
improve it and go further into tech.
Yeah, I can only imagine how much
time is spent, on like the engineering
fundamentals of a giant file that
you need to do various things with,
because most of us don't deal with
terabyte sized files on a daily basis.
to me, the biggest files I have
to deal with are model files,
like open source model downloading
and uploading, like that's the
biggest thing I have to deal with.
maybe if I was in an enterprise, I'd
have big backups and stuff like that.
I used to manage backups at a government
enterprise with about 7,000 users.
That was 15 years ago.
And I had two dedicated
staff that worked for me.
All they did was manage
the storage and backups.
Their entire job was ArcServe, I
think we were either using ArcServe
or NetBackup, but we had, you know,
Windows machines, Macs, we had, Linux
machines, we had mainframes, and
it had to handle all of that stuff.
And this was pre-cloud, so we didn't
even have to worry about how to back
up cloud storage; we didn't even
have S3 at the time.
That wasn't a thing that much in the
early 2000s, but it was so time-consuming
and such a nerve-wracking effort
to deal with recovery,
which most people don't talk about.
Like we don't spend a lot of time.
when you're talking about
backups, everyone's concerned
about the backup part.
And I always focus more
on the recovery part.
And I get more excited about the recovery.
Like how easy is it?
How fast is it?
how fast can I discover the
thing that I need to recover?
Because often that's the, trick.
If you're backing up hourly and daily
and monthly and weekly, and you've
got all these incrementals and all the
traditional backup terminology, like
sometimes you're like, well, we, that
person needs to recover that file.
But it needs to be the one
that's not today, because that
one was corrupted or whatever.
So then you end up sleuthing through
a giant caching system trying to find
the one file or the one directory on
one server somewhere amongst a thousand
servers that you had to back up that day.
And how do you do all of that
reliably, and in a way that
two people can handle the data?
And now I don't know any customers of
mine, or anyone, who has two people
managing backups.
It's like a part-time job for one person.
So something has changed.
When you tackle that issue of how do you
find the proper thing to restore in a
fast way, you end up realizing that you
have to develop some kind of database.
It's not just a backup, it's not just
like gluing files together into some kind
of archive that's going to be efficient.
You have to actually have indexes, find
things in an efficient way, be able
to generate diffs between versions of
files and do it in a way that can scale.
Because it's not really
just a volume sync.
It's more, how many files am
I going to have to look into?
And a large volume of small files is as
problematic to back up as huge files,
I think.
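The index-and-diff idea Gilles describes can be sketched in a few lines. This is a hypothetical illustration, not Plakar's actual format: each snapshot is just a map of path to content hash, and finding "what changed" is a comparison of two maps rather than a crawl through archives.

```python
import hashlib

def index_snapshot(files):
    """Build a tiny snapshot index: path -> content hash.

    `files` is a dict of path -> bytes; a real tool would walk a source
    and chunk large files, but the indexing idea is the same.
    """
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in files.items()}

def diff_snapshots(old, new):
    """Compare two indexes and report what changed between snapshots."""
    return {
        "added": [p for p in new if p not in old],
        "removed": [p for p in old if p not in new],
        "modified": [p for p in new if p in old and new[p] != old[p]],
    }

snap1 = index_snapshot({"a.txt": b"hello", "b.txt": b"world"})
snap2 = index_snapshot({"a.txt": b"hello!", "c.txt": b"new"})
print(diff_snapshots(snap1, snap2))
# {'added': ['c.txt'], 'removed': ['b.txt'], 'modified': ['a.txt']}
```

With an index like this, restoring "the version of that file from before it was corrupted" becomes a lookup instead of sleuthing through archives.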
Also the performance: lots of small files
isn't exactly performant on a lot of
systems. But I mean, since I stopped
managing backups, we now have SSDs.
So like I lived in a world where we had
spinning disks and things were super
slow and, you know, if you had gigabit
networking, you were actually doing
great. But times have changed.
So when I look at this, where's the
elevator pitch? When I looked at the
website, here's what I took away from it.
One: it's open source, it runs on my own
hardware, on-prem, wherever I want to run
it. And it has this idea of integrations,
which is not new; most backup tools
have to be compatible with this database
file or this type of storage or this
NAS or this iSCSI thing or whatever.
But in your case, it looks like the
integrations are more cloud focused.
So they're dealing with HTTP,
but specifically different APIs.
Like I saw Notion in the list,
which I'm a huge Notion fan.
I never thought about backing
up my Notion like that.
But now that I know that it exists,
I'm obsessed: maybe
I should be backing up my Notion.
Like, how did this happen?
how did the integration
list happen the way it does?
Maybe I'll start and let you complete,
Gilles. You realize that most of
the SaaS providers right now are
on a shared responsibility model.
So it means that you are
in charge of the backup.
In the case of Notion, for example, they
are not providing any kind of backup,
and you have to do it by yourself.
And when you look at all the SaaS that
you are using, for personal use or even
for enterprise usage, you see that you
have a lot of holes in your resilience,
in the protection of your data. And I
think it was important to have software
that is able to manage, I would say,
the legacy tasks, like backing up
files, et cetera, but also able to
back up any kind of data, including,
of course, everything coming from SaaS.
So I think at some point in the product,
we decided, okay, it's not a backup
solution that is supposed to back up
only files; it should be able to back
up any kind of data. And maybe you can
tell a bit more about how you did that.
Just to mention the open source part:
the main driver initially was
to avoid vendor lock-in.
Because, you don't have that many
solutions that can back up many
sources and that are not closed today.
you have hacks, you have scripts that
bundle a bunch of solutions, but you don't
have one solution that you can trust.
And it so happened that a friend of
mine, who has a degree in computer
science, so he's fairly educated on
the topic, managed to lose all his data
because he used a set of scripts
that did not behave correctly.
And, he did not realize because,
everything seemed to be okay
until the day his server crashed.
And he had to rely on the restore
part that everyone overlooks now.
And the thing is, if he had a solution
that was not a glue of multiple scripts
and rsync and blah blah blah,
this would not have happened.
And now you end up having to
look at what solutions allow you
to back up multiple sources.
And you generally end up having to
go towards commercial solutions
that will provide support for
multiple sources without hacks.
And they will usually have,
some kind of closed format.
So you have to trust that they will
not go away, that they will not bump
their prices, and that you can rely
on them in the long run.
And what I wanted initially was a
well-documented format that we are going
to be fully open about, with a
license that prevents closing the code.
If we decided to go wrong, someone
would just fork the code and it would
go on that way. So that's a safeguard
against ourselves going wrong.
Yeah.
And then, what do you do with that?
How do you manage multiple sources?
And you realize that most of the open
source solutions are either fairly
targeted at doing synchronization, like
rsync, and they are twisted into doing
backups through hard links, like tricks,
or they are highly built around
the concept of a file system.
So you can actually do a backup
of an S3 bucket, for example, but
that's using a trick to map the
S3 bucket on your file system.
So they have limitations, and they do
not work well when you break those
limitations. If you create a bucket, put
2 million objects in it, and try to
mount it on the file system, that's not
going to work very well for you.
So there was a real question of how do
you model this? How do you model this
issue where you want to import various
sources, you don't know these sources
yet, you want to be extensible and have
a plugin system, so you don't even know
what plugins will be written a year from
now, and you want it to fit in a model
that will scale even if you have
flattened data at the root?
Designing this model, we came up with
something abstract enough that you can
kind of prove that anything can go in.
And we ourselves work with that
abstraction, so that all sources benefit
from the same deduplication, same
encryption, same features. When you
write a plugin yourself, you do not
have to think about all the details.
You would just have to think
about how do I get the data
from this point to this point.
Plakar will do the work once the data
is there, through a very simple API.
So most of our work went into that:
finding the abstraction that allows us
to work efficiently while accommodating
a wide variety of sources.
Of the integrations that we have, some
are tagged stable and some are tagged
beta, because we are a bit hard on
ourselves; beta does not mean it does
not work. It means that we still want
to prove it works. And depending on how
interested people are in that backend,
we might drive that one further in
terms of production readiness.
But they all work, to some extent.
I can imagine little edge cases in a
lot of this stuff, especially when
you're pulling and pushing from an API.
I have interesting questions.
It's like, okay, with Notion, how
exactly does that recovery work? And
what if there's duplicate data? You
know, conceptually, most of us have
dealt with file-based backups, right?
Easy, easy day, right?
you're not even dealing
with remote storage.
And then people tend to evolve into,
okay, now I'm doing SMB mounts or
something to put some files elsewhere,
or I'm doing low-tech rsync or something.
and then there's, this giant chasm,
I feel like, which is, there's all
those little utilities that are very
niche and very composable, but you're,
like, you're saying, you're building
your own scripts, you're building your
own, orchestration, essentially you're
designing the orchestration yourself.
And then from there to a complete
cohesive strategy that uses one
or two products maximum, you
suddenly jump into like enterprise.
There's a lot of enterprise backup
garbage out there, I feel like,
especially when it comes to cloud APIs.
I do this every year.
I'm a small business of three to five
people, depending on what year we're
talking about, so I have some business
needs. But mostly the backup needs
I have are like what an
individual would have.
I have iCloud, I have, you know,
Google Drive, I have Dropbox, I
probably have SFTP or FTP somewhere.
I have Notion, I might have some
S3 buckets, and I have Macs, and I
need to, manage all these things.
I have an Ubuntu server in the closet.
There are things and places, I would
honestly love my GitHub Git repos to
be backed up automatically just in
case GitHub goes down and I need to
move to, you know, GitLab or something.
And when I look at Google Drive or
iCloud or OneDrive, any of the top
three cloud file drives or whatever you
want to call them, I couldn't find a
single product on the internet that
I could buy for one person.
It seemed like all the products out
there were like, yeah, we'll back up
your company's Google Drive. Because,
you know, I have the company version of
Google Drive and the company version of
OneDrive, and those don't always work
with all the consumer stuff, or if
you're using Cyberduck or some
other little utility.
I was looking into backing up three
people's Google Drive, and I was
looking at possibly having to spend
$500 a month on an enterprise piece
of software, because their minimum
license purchase was like five users
or 10 users or something like that.
And I gave up.
I eventually just gave up.
I couldn't figure out a scenario that
didn't require a bunch of weird scripts,
with cron jobs running certain things
that would probably never notify
me on a failure.
And it just was a mess.
So you guys show up and suddenly I'm
like, I could do this in an afternoon
and it would cost me nothing.
Like, it would cost me
pennies with Plakar.
Yeah.
And the nice thing also about our open
source dimension, 'cause we are an
open-source-first company: clearly,
whatever we do is open source, unless
there's a deliberate strategic reason
not to, but that's the default thing.
It's to provide enough libraries
and examples to empower users to
actually extend the integrations.
Our goal right now is not to handle all
the integrations ourselves. It's more
like: an integration that's fairly
critical to companies, we would do
ourselves, to provide some level of
quality. Then if users want to implement
a specific integration, we would help
them and get them moving forward.
Because if you want to use one tool to
back up everything, you have to have the
manpower to do everything, which is
not going to happen.
One of the tasks we did today with my
team was: how do we simplify
the API even further?
So people are less likely to shoot
themselves in the foot while trying to
do something simple, because that lowers
the bar: instead of spending your time
writing a script that's not going to be
very good, you write an integration,
because it's as simple as that script.
And it's going to be reviewed, and
you're going to get help from others.
It's going to fit into one thing that
actually tackles the difficult part.
And that's where I would like to
reach in terms of open source.
How does this work in terms of the
development? I mean, all the
integrations are open source, but how
many of those integrations are created
by the community versus the core team?
I'm assuming this is led by feedback.
Like people are asking for things.
So then you're motivated to
make a, integration for them.
In terms of how many were done
by the community,
I'm just curious about the ratio.
Currently, all of the integrations were
done by ourselves. We pushed the SDK a
few months ago, and we are trying to
provide examples and, you know,
simplify even further.
But it's the community that's driving
the decision about which ones
we do currently, for example.
Like people have been asking for
IMAP and gcloud and stuff like that.
We're going to go spend
more time doing that.
But yeah, the idea is to start growing
the developer community, not just the
user community, but the developer
community, into writing their
own integrations.
Yeah.
We've reached the right level now of
easiness, or difficulty, depending on
how you see it, of writing an
integration, because it now boils down
to writing one function that scans and
allows you to enumerate your data,
and provides you an accessor to
actually read the data.
Once you have that, it can plug into
what we have and you get all
the benefits behind it.
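The "one scan function plus an accessor" shape described here can be sketched like this. This is a hypothetical illustration of the idea, not Plakar's actual plugin API: any source that can enumerate its resources and open them for reading plugs into a generic engine, which is where shared features like dedup, compression, and encryption would be applied once for everyone.

```python
import hashlib
import io

class DictImporter:
    """Toy importer backed by an in-memory dict, standing in for a real
    source like Notion, S3, or IMAP."""

    def __init__(self, objects):
        self.objects = objects  # path -> bytes

    def scan(self):
        # Enumerate every resource the source exposes.
        yield from sorted(self.objects)

    def open(self, path):
        # Return a file-like accessor for one resource.
        return io.BytesIO(self.objects[path])

def backup(importer):
    """Generic engine: works with any importer; shared features would
    live here, so plugin authors never reimplement them."""
    snapshot = {}
    for path in importer.scan():
        with importer.open(path) as f:
            snapshot[path] = hashlib.sha256(f.read()).hexdigest()
    return snapshot

snap = backup(DictImporter({"notes/a.md": b"# hi", "notes/b.md": b"# yo"}))
print(sorted(snap))  # ['notes/a.md', 'notes/b.md']
```

Writing a new integration then really is "as simple as that script": implement `scan` and `open` for the new source, and the engine does the rest.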
Which means that some of the
integrations, like the Google Cloud
integration, were done in half
an hour, unplanned.
So one of the developers was like,
Oh, well, I have a half an hour.
I'll do that.
And that's okay. He has the knowledge,
and you can assume that someone who does
not have the knowledge will take more
time, but they're not going to go from
30 minutes to a month on that task.
You're not reinventing the wheel every
time you want to back up a different
product. And I'm here for the Docker
backups. I'm here for image registries
as both a source and a destination.
And for me, years ago, I actually
created a small script; Docker
Volume Backup is the name.
And it kind of took off a little bit
and then Docker ended up adding it
as an extension into Docker Desktop.
And then eventually they just made it
a default feature in Docker Desktop.
And so now like in the Docker
community, volumes were never really
meant to be moved around as images,
but they're just files, right?
So I get more requests for fixing that
shell script, if that's all it is, and
working on that, than just about every
one of my other examples. There's
clearly a need for developers to move or
back up volumes on their local Docker
system, whether it's Docker or
containerd or CRI-O or Podman, whatever,
it doesn't really matter. The developer
sometimes wants to move, you know, the
database files that are on that
Docker volume somewhere else.
Right.
And there's no easy way; you kind of
have to learn all these different
commands for extracting it out.
Do you put it in a tarball?
Do you put it in a container image?
Like all that stuff.
So I'm here for that integration.
So sign me up.
I'm going to make you laugh.
Two days ago, I was having my sleepless
night, wondering what I was going to do.
I was looking into Docker because we had
a discussion a long time ago about how
we could benefit from our deduplication
to lower the size of storage for images:
instead of layering layers, each of
the layers could be deduplicated.
And so I looked into it because
I had never looked into how the
backup of this stuff worked.
And they use tar as a format,
yep.
And we have a tar importer, which can
actually extract a tar and back up
what's inside it, which means that you
could back up all your images and get
deduplication across them.
And I looked into how it works with
containers, and we can back up
containers the same way, actually. Tar
and metadata, that's all it is.
We have an integration that's not ready
yet, because it's a small experiment,
but it's something we could push
forward: okay, we already have a tar
integration, so we can create a Docker
integration, which is actually an
integration that talks to the Docker API
to get a stream of the tar, which gets
passed into the tar integration. And
then boom, you have a new thing that's
packaged and not a script on the
side, which is the goal.
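The pipeline being described, a tar stream feeding a content-addressed store, can be sketched with the standard library. This is an assumed illustration, not Plakar's actual code: the "Docker side" only has to produce a tar stream (here faked in memory, standing in for `docker save`-style output), and the tar side enumerates members and stores each by content hash, so identical files across images deduplicate to a single stored blob.

```python
import hashlib
import io
import tarfile

def make_image_tar(files):
    """Build an in-memory tar, standing in for a Docker API tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

def ingest_tar(stream, store):
    """Walk a tar stream and store each member by content hash."""
    with tarfile.open(fileobj=stream, mode="r") as tar:
        for member in tar:
            data = tar.extractfile(member).read()
            store[hashlib.sha256(data).hexdigest()] = data

store = {}
# Two "images" sharing a base layer file: the shared bytes land once.
ingest_tar(make_image_tar({"base/libc.so": b"LIBC", "app1/bin": b"A"}), store)
ingest_tar(make_image_tar({"base/libc.so": b"LIBC", "app2/bin": b"B"}), store)
print(len(store))  # 3 unique blobs, not 4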
So really, that's the point of Plakar
initially: to allow doing this when you
realize, oh, that stuff is not backed
up, how can I actually back it up in a
clean way without too much effort?
Obviously, there's some dev
here, but once it's done, it's
no longer dev for other people.
So there's that. But the idea is
that then, if you trust the tool,
you trust that your Docker backup
is working the same way as your SQL
backup or your file system backup.
Yeah, there's even a scenario where you
could use it, 'cause you know, a
container registry is really
nothing but an object store.
And in the cloud native community,
there's sort of a consensus around the
container registry being the store of
all things, like the storage
of all artifacts.
And so now we have all these different
types; container images are just one
type for an OCI registry, but there
are all these other file types now.
We can store Helm chart data in there.
We can store Compose files in there.
Each one is its own type. It's not a
container image, it's just a registry
artifact. And there are even utilities
now that we can use, if we have a new
type of object we want to store in the
registry, to create the metadata
for all that.
I'm not sure that necessarily a registry
is a great backup storage location.
Like maybe an S3 storage
system would be better.
cause all the clouds already
have that, but they all also
already have image registries.
And so a lot of times when I'm working
with teams, like if we're going to
implement some sort of new backup or
some sort of new replication system,
or like we, if we need storage for
something, it's a lot easier for
me to use what they already have.
For my ultimate back-end storage, I
mean, file storage and S3 storage
probably make the most sense as the two
types of storage for backups, but there
are probably other scenarios, like
container registries. And I love that
when I'm on the site, the integrations
list clues me into which ones are
inputs and outputs and which ones are
both. Just staring at the options you
had made me think, well, you know, can
I store Google Drive backups in OneDrive?
And then, also store
OneDrive backups in Notion.
Like, I started to wonder, what is my...
I have a document that maps out all the
things I need backed up and where
they all go, right?
We all talk about 3-2-1 storage for
backups, but as backup engineers, even
with your own software, your own stuff
at home, you often forget a year later
what you did with it all, how often it
backs up, and where it's going.
I tend to forget.
And I know I'm using Backblaze
in some places and I'm using a
different cloud in other places.
And I have to document all of that
for my own sanity because every year
I think I should check my backups
and see if they're still working.
And then I forget, I don't know
where my backups are, how they work.
And I have to go and
redo all that research.
So the idea that I could maybe get
closer to having this in one product
is something that I might have
to do and make a video on.
So, let's get some Docker stuff in there.
I think the only managed service that we
are providing to the community right now
is to, you know, send some email if you
have an issue with your backup, and a
summary of what you are backing up.
Maybe that can cover this one, because
it will always be in your mailbox:
what kind of backups you have
and where they're stored.
So that could be nice.
Yeah.
And at this point, you know, once
we have the image registry stuff,
then you start talking about Kubernetes
and can we run this on Kubernetes?
Let's talk about the storage for a
little bit and get into the weeds of
it, because I'm going to bring this up,
and hopefully I'm not trolling your
issues.
I was looking to deploy this before the
show so that I could come to the show
with feedback or experience to talk
about, and say, yeah, I got
it to work last night.
The first thing I go for, being a
Docker guy, is I want to deploy
the Docker image.
There's an issue open that's asking for
the Docker image and, I think someone
replied and said, Oh, we're not ready yet.
We've got caching, we've got other things
we've got to worry about and we need to
come up with a more cohesive strategy.
So I'm here for that cohesive strategy.
Let's talk about that.
what are the challenges
that you're seeing?
And do you have a plan?
Because we mentioned, we talked a
little bit before the show, but it
sounds like there's stuff coming.
So, not to spoil anything,
but let's talk about it.
It's just that the feature
requests came very early.
Our first server release happened a few
months before this; we had the first
user feedback and we were trying to
prioritize the things to tackle.
And this came up, and it requires some
thinking from the team about what it
means to have a Docker image for this,
because you have to actually mount your
volumes within the Docker container.
You will not run this as an agent in
most cases, or is that what users want?
That's an open question.
There's no answer yet.
You have to think about how they are
going to use it, because the Docker
image you ship as an official one,
you're going to have to support it in
some way. You can't just say, okay, we
released the Docker image, and have
it not doing anything useful.
And we had users saying, oh, well,
this is going to be run from a CI,
so it loses its state every time,
so you need to rebuild state.
Okay.
Well, that's going to be an issue
because we need to have some persistence
out of the Docker image for this.
Some people are saying, oh, I'm
going to use this from a machine.
Yeah.
But that means you could have installed
Plakar on your machine rather than in
Docker, because it's going to be a lot
of work for us to support a use case
that's not that useful. Whereas you
could launch Docker as an agent with
Plakar in it to go query the other
things, because Plakar is flexible
enough that you can run it as "I'm
doing my job from the Plakar instance
on my machine," but also as something
controlling the backups from other
machines and transferring data
here and there.
I mean, depending on what
direction you take, you would not
build that image the same way.
And I don't know which one
you would advertise the most.
Yeah.
Without user feedback on this, it needs
discussion, and we need users from the
community telling us, that's what we
need in the image. That's what is going
to drive the development. That's
most of the issue there.
Is there a concept of backing up the
system itself, like the configuration
and the plugin list? Does it have an
internal backup command that allows me
to essentially save the state of the
whole system, outside of the individual
integration backups?
Well, there are two things.
The first thing is that none of the
state is mandatory. When you run an
instance and you wipe your cache
folder, it is going to rebuild the state.
Yeah.
So you are going to lose the
plugins that were installed.
But you can just click and reinstall.
So the idea is, if you did not have a
backup of this, you are not in the worst
case, because you could go from a blank
machine, point it to the repository, it
will synchronize again, and you will get
a working state for your backups.
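The "cache is disposable" idea can be sketched like this. The repository layout below is hypothetical, not Plakar's actual on-disk format; the point is that everything needed to rebuild local state lives in the repository, so the cache is just a derived index you can regenerate on a blank machine.

```python
import hashlib

# Hypothetical repository layout: a content-addressed blob store
# plus a list of snapshot manifests.
repo = {"blobs": {}, "snapshots": []}

def put(data):
    """Store a blob by its content hash and return the hash."""
    h = hashlib.sha256(data).hexdigest()
    repo["blobs"][h] = data
    return h

repo["snapshots"].append({"name": "snap-1", "root": put(b"file contents")})

def rebuild_cache(repository):
    """Derive the local cache purely from the repository, as you would
    after wiping the cache folder on a fresh machine."""
    return {snap["name"]: snap["root"] for snap in repository["snapshots"]}

cache = rebuild_cache(repo)
print(cache["snap-1"] in repo["blobs"])  # True
```

Because the cache is derivable, losing it costs you a re-synchronization, never your data.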
If you need a backup because you want to
avoid having to re-synchronize, or you
want to avoid having to reinstall, well,
you can just back up the cache
directory, and you get a Plakar snapshot
with the configuration of your Plakar.
There's nothing particular that you
would have to do to make this possible;
it's just the standard way of
using Plakar, basically.
Yeah.
On the infrastructure side of the
storage: you mentioned encryption, so
do you support encryption of backups?
Out of the box, yes. We talk in terms
of snapshots; a snapshot is a view of
whatever you imported as data. All
these snapshots are compressed and
encrypted by default.
You actually have to say, I don't want
the encryption, I want to work with
plaintext; you have to turn it off.
Oh, okay.
That's nice.
Secure by default.
I love it.
It's end-to-end encrypted, so you don't
have a server, for example. You're
going to run it from your machine, and
you're going to say, my import bucket
is on S3, and my storage is on gcloud.
yeah.
Yeah.
you don't have a server running at AWS and
you don't have a server running at gcloud.
So all of this setup is
stored in the configuration
of the store that you create.
It is standalone.
And we don't want to trust AWS or
gcloud with keys, and we don't have a
third party that would hold the keys to
encrypt or decrypt. So we treat them as
what we call dumb stores: they don't do
anything besides receiving the packets
they see and storing them; they're
just that kind of storage layer.
Yeah.
So at the lowest level, what if I do
the simplest thing, the simplest
deployment? Because I'm primarily an
implementer, you know, an operator.
So I often think about, okay,
what is it going to look like
when I have this thing set up?
Where are the pieces of the puzzle
sitting? What do I have to run
long-term? What ports do I need, and
how do these things connect?
So I'm guessing that there's a daemon
or I don't know what you're calling
it, the server part, but like a daemon
that runs somewhere all the time and
it has an API, so I can use
a local CLI to control it.
There are two parts to it.
There's the client part, which is you running Plakar to import data from a source and push it somewhere.
And there's whatever storage you have, which may be a local disk, or an S3 bucket, or actually anything that can act as a key-value object store.
So that would be AWS and you
don't have a server in between.
You have Plakar operating as a client
to, to your AWS bucket, for example.
So everything — deduplication, compression, encryption — is done on the client side, on the machine running Plakar.
So when the traffic leaves the machine, you know that it can't be tampered with without being detected, and it can't be decrypted without the keys, which have never left your machine.
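That seal-on-the-client property — the storage only ever handles opaque, tamper-evident blobs — can be sketched in a few lines. This is an illustrative toy, not Plakar's actual format: the XOR keystream stands in for a real cipher such as AES-GCM, and the function names are invented.

```python
import hashlib
import hmac
import os
import zlib

def seal(chunk: bytes, enc_key: bytes, mac_key: bytes) -> bytes:
    """Compress, 'encrypt', then MAC a chunk, entirely on the client.
    The keystream below is a stand-in for a real cipher (e.g. AES-GCM);
    the point is the ordering: storage only ever sees sealed bytes."""
    compressed = zlib.compress(chunk)
    nonce = os.urandom(16)
    # Toy keystream derived from key + nonce -- illustration only, NOT secure.
    stream = hashlib.shake_256(enc_key + nonce).digest(len(compressed))
    ciphertext = bytes(a ^ b for a, b in zip(compressed, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def open_sealed(blob: bytes, enc_key: bytes, mac_key: bytes) -> bytes:
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    # Verify the MAC first: any bit flipped in transit or at rest is detected.
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("corruption or tampering detected")
    stream = hashlib.shake_256(enc_key + nonce).digest(len(ciphertext))
    return zlib.decompress(bytes(a ^ b for a, b in zip(ciphertext, stream)))
```

Since the MAC key never leaves the client, a bucket (or anyone between you and it) can corrupt a blob but cannot do so undetected, and cannot read it.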
yeah.
that's the most simple case.
That's what you would
do, on your home setup.
You would install Plakar on your
desktop and you would run it from the
desktop saying the storage is there.
It's on my S3.
Then you have a different mode, through the integrations: you could have a server — the Ubuntu machine you said you had in your closet — running Plakar and taking care of connecting through SFTP to all the machines on your network and doing the backups.
Then we have the non-open-source version, which we're working on, Plakar Enterprise, which provides a server that extends Plakar.
So Plakar becomes the open source client to an enterprise product — the same tool as you would use at home — and the enterprise version provides a server with additional features, like maintaining the privacy of the credentials for all of your storages, for example.
So your client on the workstation at work would connect to the Plakar server of your enterprise, and that server would hold the credentials to the actual storages, so they don't leak through the company, for example.
So you have all these different ways of working, which allows very flexible setups.
They go from "I have a single machine and it's going to connect directly to my store" to "I have segregated traffic and isolated machines with different privileges — they cannot access that S3 bucket, but they can access this one, and I can't trust them beyond that, so I need some layer of validation" — at the enterprise level.
Yeah.
Yeah.
Is that allowing for, like, multi-hop backups?
I'm trying to think of some of my more challenging enterprise scenarios — we had so many backups happening that one server couldn't do them all, for bandwidth purposes.
So in one scenario we ended up with a main orchestrator server, and it had multiple backup machines — we just called them agent boxes.
Their purpose was to back up the data and create the snapshots, but they weren't necessarily backing up themselves; they were backing up other machines that also had agents — a middle tier of fan-out.
We needed to back up a certain number of terabytes every 24 hours.
At the time — this was 20 years ago — we were limited to one-gigabit network connections.
So we were literally creating new servers in the middle tier because we were saturating pipes, and we couldn't get enough backups from all the different systems in a 24-hour period.
So we had to add more middle tier, but there needed to be a central orchestrator that managed the jobs it was distributing to the individual middle-tier boxes.
But there, you know, there's a lot
of small shops that I deal with where
one person is saddled with DevOps,
and ops, and backups, and recovery,
and like monitoring, and logging, and
storage, and cloud infrastructure.
Like they're just having to do it all.
I actually call them solo DevOps — that's the label I give to these unfortunate individuals who are given way too much work to do.
Maybe AI will help them; maybe we can rely a little more on AI for advice there.
But it just ends up being a whole lot.
Right.
And I saw that you had a demo on the
website, but yeah, I'm just curious
about how big does this get today?
And like, where's your vision
for where the enterprise product
that you're building is going?
Right now, ransomware attacks first target the backup system of, you know, any company.
So for different kinds of reasons, encryption is required.
And you need to be sure that your storage and the backup server don't have your credentials or the encryption key.
Otherwise, if your backup system falls, at some point the attackers have all your data in one place.
So end-to-end encryption is clearly becoming a kind of prerequisite for securing your backups right now.
If we step back a bit — you know, the issue you mentioned about the size of the backups, we solved that in the past with deduplication, mainly on the filers.
So basically it was the storage that was optimizing the space, deduplicating the data.
But that works only with unencrypted data, because with encrypted data you of course lose the deduplication.
So today we're in a situation where companies want end-to-end encryption on their data, but then they cannot use the deduplication of the filers, and storing the backups has a crazy cost.
A lot of vendors basically created alternatives with proprietary formats, where they are still optimizing the space — but in the end they still hold the encryption key.
And what we are trying to do with Plakar is to solve this issue.
Basically, because we are doing the encryption, the compression, and the deduplication at the source, it means that all along the path the backups travel, they are already super optimized in terms of storage and space.
We have almost 15,000 backup cycles — snapshots — that we did on this machine.
The logical size is 24 terabytes.
So I have a huge amount of data here, but the space we are using is only 159 gigabytes.
And everything is fully encrypted — the storage has no knowledge of the encryption key.
And I think that's the game-changing thing about this technology, because it allows you to move your backups anywhere you want — any cloud provider, on premise, wherever — even if you don't fully trust that provider.
And you can do that at an optimized network cost, because, you know, if you want to synchronize data between cloud providers, you have to pay egress costs, which are super expensive with huge amounts of data.
With Plakar, you will just pay that for the first backup you do, and all the snapshots that follow will only transfer to your storage the few blocks that were not backed up before.
So the storage is optimized, and it's fully end-to-end encrypted.
Today, I don't know of many options that make that happen.
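That pay-once-then-deltas behavior falls out of content-addressed deduplication: a chunk already in the store is never uploaded again. A toy sketch (fixed-size chunks for brevity — a real tool would typically use content-defined chunking, and these class and function names are invented):

```python
import hashlib

class Store:
    """Toy content-addressed store: chunks are keyed by their hash,
    so a chunk already present is never uploaded (or paid for) again."""
    def __init__(self):
        self.chunks = {}   # hash -> chunk bytes
        self.uploads = 0   # counts actual transfers to the store
    def put(self, chunk: bytes) -> str:
        key = hashlib.sha256(chunk).hexdigest()
        if key not in self.chunks:      # only new content crosses the wire
            self.chunks[key] = chunk
            self.uploads += 1
        return key

def backup(store: Store, data: bytes, chunk_size: int = 4) -> list:
    # A snapshot is just the ordered list of chunk references.
    return [store.put(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

store = Store()
snap1 = backup(store, b"AAAABBBBCCCC")   # first backup: 3 chunks uploaded
snap2 = backup(store, b"AAAABBBBDDDD")   # second: only the changed chunk moves
```

The second snapshot is a full, independent view of the data, yet it only transferred the one chunk the store had never seen — which is why the logical-vs-physical gap (24 TB vs 159 GB in the example above) gets so large.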
Yeah.
So it's doing incremental backups after the first — or incremental snapshots, I guess.
I'll let Gilles answer — are we differential or incremental?
Yeah.
When you have an incremental backup, you actually create a chain of dependency between all your snapshots.
The thing is, the longer your chain goes without going through another day-zero full sync, the more you increase the likelihood that at some point you'll have a corruption that breaks the chain.
So you have a higher risk, and you have to test everything very often, because you want to limit that risk.
You don't want to go through the hassle of doing incremental backups just to not test them — and then test in a week and realize that, oh, you have one week's worth of deltas that are trashed, basically.
Yeah.
And the idea is that you can instead take an approach that is index-reference based, where what you're building is not a delta against what happened right before, but a delta against what's in the store as a global storage repository.
So your backup actually benefits from everything any of the previous ones did, and you don't have a chain of dependency — in the sense that you can delete the snapshot that happened yesterday, and it's not going to break any dependency of the one you have today.
As long as your store is reliable, you can do any kind of removal you want, with the granularity you want.
And we can consider them autonomous, in the sense that each snapshot is autonomous by itself; it does not require any other one.
The thing is, you have to trust the storage anyway — you're going to store your data there.
If you don't trust it, well, you have to do something that's called a 3-2-1 backup, to ensure you don't have just one copy of your backup, and so you can restore a broken backup from another backup.
That's the idea.
we have a cool way to manage it.
And so this allows you to have all
the benefits of incremental backups
without the risk of incremental backup.
Yeah.
That's nice.
With the sync command right now, you can actually super easily synchronize a Kloset store to several locations.
So basically you are pushing one backup into a Kloset store, and you can have two or three Kloset stores replicating that data in different locations.
Because, yeah, having all your backups in one Kloset store would be too risky; of course, you need a backup strategy on top of it.
And we provide a cool way to do that: this way of synchronizing the Kloset store to several locations, again with low cost in storage and bandwidth.
So you push to one place, and you're able to have two or three copies — even in cold storage — to be sure your data remains safe if the first storage has some issue.
But yes, with different granularities.
Because you might say: oh, since it's encrypted, you need to have the exact same copy — but that's not the case, because the snapshots are individual.
So you can actually say: I have one store on the local disk of my machine, just for user-error recovery — if I delete something, I have it immediately available.
But I might synchronize one snapshot per hour to the NAS, and have that one fan out again — one copy into AWS and one copy into Google Cloud, for example.
And the synchronization is not doing another backup; it's really pushing a copy of the snapshots to the different destinations, possibly with different encryption keys as well.
It does a transformation from one to the other, so that in the end each one has its own encrypted copy of the same data.
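A rough sketch of that re-encrypting sync, assuming the syncing client holds both stores' keys. The XOR "cipher" and the function names are toy illustrations only — a real implementation would use an authenticated cipher with fresh nonces.

```python
import hashlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Stand-in keystream cipher for illustration only -- a real tool
    # would use something like AES-GCM with per-chunk nonces.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

toy_decrypt = toy_encrypt  # an XOR keystream is its own inverse

def sync(src: dict, src_key: bytes, dst: dict, dst_key: bytes):
    """Copy snapshot chunks between stores: decrypt with the source key,
    re-encrypt with the destination key. No new backup is taken; the same
    snapshot data ends up in both stores, each under its own key."""
    for ref, blob in src.items():
        if ref not in dst:   # only transfer what the destination is missing
            dst[ref] = toy_encrypt(toy_decrypt(blob, src_key), dst_key)

# Example: the NAS store already holds a chunk; sync it to an AWS store.
nas = {"c1": toy_encrypt(b"photo", b"nas-key")}
aws = {}
sync(nas, b"nas-key", aws, b"aws-key")
```

Because the copy comes from the store rather than from a fresh scan of the source, both destinations are guaranteed to hold the *same* snapshot, unlike two independent backups taken minutes apart.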
And this saves you from cases where, for example, you have your machine and you want to back it up to two places.
Natively you would say: I'm going to do a backup to AWS, and I'm going to do a backup to Google Cloud.
But in between, something may have changed — you're not backing up the same thing.
When you're doing the sync, what you're doing is getting the data from one of the stores and transferring it to the other.
So in the end they have the same data, which has its benefits.
Like, if you lose something, it has benefits.
And it repairs the store, too: if you have a corruption, if you break something, you can actually repair it from the other one.
yeah.
And you can also, of course, run checks on the store to verify, you know, that the data is still what you expect — in two different ways, right?
And we have R&D projects about error-correcting codes for auto-repair, maintenance, and things like that.
Just to be clear: the crypto, we did not do it ourselves.
We are a team of people who have worked in security a lot — we've faced a lot of crypto specs in banking and such — so we had a hunch about what should be done where.
And we had an external, independent audit by a famous cryptographer, whose book I have behind me, to audit this with no bias, because he has no interest in validating something that would be broken.
So that was just to have a third party.
We managed to put cryptography in every layer as a validation concept.
You have HMACs everywhere, so if you flip one bit somewhere, it's going to break in the nice sense: it's going to tell you there's a corruption there, in that specific file — and this cascades, because this file and that file also shared that data, so they are all flagged as corrupted.
So we already have the detection part in a very granular way, in the sense that it can pinpoint very specific chunks and objects.
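That pinpointing works because the repository can be read in reverse: given the set of chunks whose check fails, report exactly which files reference them. A toy sketch with invented names (a plain SHA-256 digest stands in here for the keyed HMAC):

```python
import hashlib

# Toy forward index: each file in the snapshot lists the chunk hashes it uses.
files = {
    "a.txt": ["h1", "h2"],
    "b.txt": ["h2", "h3"],
    "c.txt": ["h3"],
}

def verify(chunks: dict, expected: dict) -> dict:
    """Check every chunk's digest and report exactly which files are hit
    by each corrupt chunk, instead of a vague 'backup is broken'."""
    corrupt = {h for h, data in chunks.items()
               if hashlib.sha256(data).hexdigest() != expected[h]}
    return {f: sorted(corrupt & set(refs))
            for f, refs in files.items() if corrupt & set(refs)}

chunks = {"h1": b"aaaa", "h2": b"bbbb", "h3": b"cccc"}
expected = {h: hashlib.sha256(d).hexdigest() for h, d in chunks.items()}
chunks["h2"] = b"bbbX"                 # one chunk flipped at rest
report = verify(chunks, expected)      # both files sharing h2 are flagged
```

Because deduplicated chunks are shared, one bad chunk maps to every file that references it — which is exactly the cascade described above, and also why a targeted repair (re-fetch just that chunk from a replica) fixes all of them at once.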
And having that, plus the ability to synchronize, we can build tools that allow a very pinpointed repair.
Without having to repair everything — because that's costly too — we're going to be able to say: oh, I have one chunk that's broken, and I can fetch it from over there; I'm going to fetch just that amount of data.
And, as I said, the error-correcting codes: since we can detect everything that's broken, we can have error-correcting codes on top that could auto-repair.
In the same way — repair in a buffer, verify that it's correct, check with another repository: oh, it's correct, so apply the repair for real.
So we have all these paths of possibilities we can implement, and they're not that far away, because we have branches that are working.
They're not production branches right now, but they're working enough that you can say it's not just an idea — if that were the focus, next month it would be there, because there are enough bricks to prove it.
And we have a ton of these ideas about what the tools would be to make it more reliable — reliable in the sense that you would detect that something is corrupted, but also: how can you make it so reliable that people will not be stressed when that happens?
That's the goal, the idea.
As an architect in the past, I've been in so many incident rooms over Slack with people who lose their minds when they have an incident and the backups are hard to manage.
Because: we know we have backups — but now we have to go into the backups, and we never do that, so now we have to figure out how.
And if one of them is corrupted, then, yeah, it's stress-plus-plus.
You're going to a high level of stress.
And we want to be in the situation where they don't face that.
Okay, there's a corruption, even in your backup — well, there are ways to get out of this, and most of them are automated.
As you manage backups, there are these three phases for the DevOps or operations engineer who's managing them.
There's the implementation, which is obviously very time consuming: you're learning the product, and you're testing backup and restore so you can believe they'll work.
And then once you get there — the project's implemented, and you feel like everything's going to work in a recovery —
You tend to leave it alone, right?
Like you're checking to make sure
things are going like, as new
infrastructure shows up, you're
adding or removing jobs or whatever.
And so you're kind of in maintenance
mode, but then there is that incident day.
Where they call the backup person
and they're like, okay, we need to
bring you into the incident room
or into the Slack team or whatever,
because we now need a recovery.
And typically, in most of the teams I work in, not everyone can restore, right?
There are only one or two people who can run the restore tooling.
And in that moment — I can viscerally remember being the manager of the people managing the backups, worried, starting to doubt everything, right?
They're about to test the restore, and I'm doubting: when was the last time we verified this particular integration?
We've had three major version upgrades, and we've never tested since we did the initial deployment.
So we don't even know if
this restore will work.
We recently had to replace
three of the drives in that.
So, is there a potential for some
sort of disk corruption that we
didn't know about because the files
just sit there and they never get
touched and they die slowly over time?
There are so many moments in that where I'm worried that someone's going to get in a lot of trouble, or fired.
And then the recovery happens — and it works.
Or maybe it doesn't.
There was one time where we ended up having corrupted files, and we had to go to offsite tape from, like, a month earlier.
We had this process where we would go to tape once a month, and it would go offsite — to a different data center in a different part of the state.
It was like 300 miles away; the goal was that no storm taking out the data center could reach it.
Of the three-two-one, that's the third copy, right?
It's a state away, it's been driven there by one of our staff, we know it's physically there.
And we had to go pick those tapes up — and they actually worked.
But it took like a week.
It was after a hurricane: we had a flood, and we had servers underwater.
So we had to go to the offsite storage, and that whole week I was just so nervous that these things weren't going to get restored.
We were basically going to start from six-month-old data at best.
And luckily, luckily, it worked.
But those kinds of things — we don't talk about those horror stories enough.
You know, one of the reasons I think people are stressed is that most companies don't have a backup team.
The big ones have a backup team; the others don't.
For the people who are given the task of doing backups, it falls on them as part of a long list of other things to do, and they have to get rid of it fairly fast — and it's not a topic they're interested in.
It's just: yeah, you have to do backups before Friday.
Okay, what do I have?
There's a list of ten tools, and none of them seems appealing.
I'm going to take one that's popular, because no one's going to get fired over a popular tool.
That's going to be what drives the decision.
But then, if they don't have to use these backups, and if it was just a task to check off, they're not going to look at the backups once they're done.
They will check that the backup happens on a regular basis, because it's supposed to run every day — but they will not inspect the data every day, because they have other things to do.
The other thing is that in most tools, the backups are kind of dead data, in the sense that they are meant to be backups and have no use other than being backups.
When we design stuff, we're more interested in how you actually use the data — because if you have no use for that data, you're not going to look into it.
If the data you backed up is actually usable, in a very usable way, and you actually use it every day, then you have fair confidence it's not corrupted, because you've been using it the last few days.
The demo website — that's just the open source version, okay?
So that's not the company use you would have of it, but we have previews of files.
Within these files, you can preview the photos, you can preview the videos, you can preview audio.
If you actually use that snapshot — which is a backup, the backup you see on the screen — you use it the way you would use your Google Drive: every day, looking into things that you actually manage.
Oh, I want to look at the content of a file — but I'm going to use the snapshot, not a copy I have on my machine.
Well, then you know that it works, because you actually viewed it recently.
yeah.
And it becomes immutable data, in the sense that you can't alter it.
It's like read-only data — but read-only data that you actually use, and that makes the data a bit less dead and a bit more alive.
I think if you have a use case like that, and you enter an incident where you have to restore something you've actually been using every day — the snapshot, through a web interface or mounted on your system as a local directory — well, you're not as stressed, because you know it actually works.
You've removed the painful part of the question of checking your restores.
Yeah.
Is the check similar to, like, a mock restore?
Yeah.
It's an in-memory restore that discards the data after doing the cryptographic checks.
So it's as if you restored into RAM and validated all the checksums — but we do it in a streaming way, so you don't have to hold the whole snapshot in memory.
Okay.
Yeah.
Because it has to reassemble the deduplicated data — it has to read the data.
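A minimal sketch of such a streaming check — hypothetical names, not Plakar's code. Each chunk is hashed and then dropped, so memory use stays at one chunk regardless of snapshot size.

```python
import hashlib

def check_snapshot(chunk_stream, manifest: list) -> bool:
    """Mock restore: walk the snapshot chunk by chunk, verify each digest
    against the manifest, and discard the bytes immediately -- so a
    multi-terabyte snapshot is validated holding one chunk at a time.
    (A full check would also confirm the stream and manifest lengths match.)"""
    for expected, chunk in zip(manifest, chunk_stream):
        if hashlib.sha256(chunk).hexdigest() != expected:
            return False   # corruption pinpointed at this chunk
    return True
```

In practice `chunk_stream` would be a generator reading (and decrypting) chunks from the store one at a time, which is what makes the restore "in memory" without needing the snapshot's worth of RAM.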
We have a couple of questions.
I don't even know if this is a thing: is there a plugin for cyber-attack detection?
And I'd ask — are you talking about, like, ransomware, detecting ransomware encrypting everything?
Yes — is there something like that?
Is that a thing?
And what do you think about ransomware — do you do anything for ransomware, or do you just...
We do something, but the posture people should have is: assume the data is toast.
You have to have a copy elsewhere, and you have to have a copy that's not reachable by ransomware.
You have to have data that's offsite and not on the network.
That's the only way you're sure — well, relatively sure — that the ransomware is not going to affect you.
And then, once we have tackled this and we have told people "don't trust anything other than this," there are all the solutions that are best-effort.
For example, we compute the entropy of files and directories, and we store this as part of the metadata of each snapshot.
So you could use a diff-like approach to compare whether the entropy drastically changed between two snapshots — for example, this directory that had low entropy before has a very high entropy now.
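The entropy heuristic is easy to sketch: plain text sits at a few bits per byte, while encrypted output is near the 8-bit maximum, so a sharp jump between two snapshots' stored metadata is suspicious. The threshold below is an arbitrary illustration, not a tuned value.

```python
import math
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: near 0 for repetitive data,
    approaching 8.0 for encrypted or compressed (or ransomed) data."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def entropy_jump(prev: float, curr: float, threshold: float = 2.0) -> bool:
    # Flag a file/directory whose stored entropy rose sharply
    # between two snapshots -- a possible sign of mass encryption.
    return curr - prev > threshold

plain = b"hello hello hello " * 100          # low entropy: few symbols repeat
random_looking = bytes(range(256)) * 8       # maximal entropy: uniform bytes
```

Storing one float per file as snapshot metadata is cheap, and the diff between consecutive snapshots is then a simple per-path comparison.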
The thing is, the stores are append-only — you're never editing something in the store — so they can actually be WORM-enforced at your provider's level.
If you have entropy checking, plus the offsite and offline copies, you're in a fairly good situation.
Because for things that would not completely trash your store, you can still say: oh, that machine had the ransomware, and it pushed a backup with the ransomware in it — but the other snapshots are not affected, and I can actually remove the broken one, because they're immutable.
Then, if that did not work, you can go back to: I have an offline copy, I have an offsite copy.
So you have to manage this yourself; you just can't trust a software solution to take it over for you.
Yeah, I like the entropy idea, though.
You're basically talking about: if the change rate on this particular backup is normally 10 percent a day, having something that notifies you when it's double that — 20 percent changed today, or whatever — some sort of anomaly alert.
And you will probably have an alert on the size as well.
Because, of course, if everything gets encrypted — usually, you know, with Plakar you can do something like 10,000 cycles without increasing the size of the storage much.
You can increase the frequency, because we are just storing a little metadata and the changes between two snapshots — so you can take your backups from every day to every hour, or every minute, depending on the size of what you're backing up.
If you have ransomware, you will get an alert on the size, because at some point it will double the size of your storage, and that's something that should never happen.
Yeah.
You have that, and you have the fact that, as I said very early in the interview, we have built some kind of database, in some sense.
So we have multiple indexes: we can look up images or videos, because we also index MIME types and things like that.
And the MIME types should be aligned, in some way, with the entropy of the data — if you have a text/plain file and it has a high entropy, you're going to raise an alert, because that's not great.
These are the few that come to my head right now, but there are many other heuristics you can use to detect a fishy scenario that would gradually take place.
Because if the ransomware is already done, you'll know — you're being asked for money.
But if you're in the middle of the attack and you have a backup happening where half the data is corrupted, half the assets are corrupted, you're going to detect it through entropy and metrics like this.
I feel like if I had to make something myself, it would end up being something stupidly simple.
Like, I'd create a monitoring solution that watches a plain text file — a "don't encrypt me.txt" or something — that I put on every single file share, every single server.
And if any single one of them ever changes, I get an alert.
I'd have some sort of agent that somehow watches all of them, as the first level.
It doesn't even wait for backups to happen; it's just: oh, this file just changed.
Because the way I've seen these ransomware attacks roll out, they start small and then spiral.
So there are early indicators in the early hours — if you've got terabytes of file storage, that doesn't all happen at once, and not everybody has permissions to everything, so it typically starts in little places.
So I'd probably seed all these little files everywhere.
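That canary idea really is stupidly simple to sketch. This is just the host's thought experiment in code — the filename and functions are hypothetical, and a real deployment would poll from a separate agent and page on the first hit.

```python
import hashlib
from pathlib import Path

CANARY_NAME = "dont-encrypt-me.txt"   # hypothetical name, per the idea above

def seed_canaries(shares: list) -> dict:
    """Drop a small known file on every share and remember its digest."""
    baseline = {}
    for share in shares:
        canary = Path(share) / CANARY_NAME
        canary.write_bytes(b"canary: this file must never change\n")
        baseline[canary] = hashlib.sha256(canary.read_bytes()).hexdigest()
    return baseline

def changed_canaries(baseline: dict) -> list:
    """Return every canary that was altered or deleted -- an early,
    pre-backup signal that something is rewriting files in bulk."""
    return [path for path, digest in baseline.items()
            if not path.exists()
            or hashlib.sha256(path.read_bytes()).hexdigest() != digest]
```

Ransomware that encrypts a share in place will rewrite the canary along with everything else, so a periodic `changed_canaries` sweep fires hours before a size or entropy alert on the next backup would.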
Beyond the smartness of the detection, only the offline backup is going to save you.
There's no other way — it takes only one miss: if you misdetect the ransomware and let it happen, it's already done.
You don't have the luxury of trying to see if it works.
So you have to have your own offline backup; that's the only solution.
Everything else is nice to have on top.
But it should not be a blocker to having the most annoying part, which is the offline backup — the one that takes the most effort to produce.
Yeah.
Well, I like those read-only S3 buckets; those are something I like to use to ensure files can't be deleted and my backups can't be changed.
You know, did you hear about the UniSuper incident, where Google just deleted the complete AZ and region of UniSuper?
They lost everything because, basically, Google dropped the client's billing account and it cascaded everywhere.
So, yeah, I would not rely on S3 alone.
Oh, for sure.
I just mean that, unlike a normal file server or any drive storage, I can more easily ensure that things written to buckets don't get changed later.
Whereas everything on a file server is, you know, up for debate in terms of what can access it.
But yeah, good advice.
One other question: Gartner recently introduced the cloud-native infrastructure recovery category in their latest hype cycle for backup and data protection technologies.
Where would you position Plakar in that — CNIR, I guess that's the acronym for cloud-native infrastructure recovery?
Let's say that we announced, last week or this week, that we are joining the Linux Foundation and the Cloud Native Computing Foundation — so basically we are joining those two.
Sandbox, maybe?
Are you going to go for Sandbox?
When you donate — are you donating, or are you just becoming a member?
We are donating, basically, to be part of the foundation.
Why? It's the first step, you know, to understand how this ecosystem works right now — because we have to admit that Gilles comes from the BSD world, and I don't have so much experience with this one either.
So we have to figure out how we can be integrated, and this was the first step.
But yeah, we are currently working on support for Kubernetes.
We hope that for Cloud Native Paris, on the 3rd of February, we'll have something usable.
And yeah, we really think that at some point a layer is missing in cloud native around, you know, resiliency and backup.
And with Plakar we will try to contribute — try to bring up this layer and be sure that whatever data you have to back up, you will be able to rely on this layer.
In my previous job, you know, I was managing a quite large team at a big e-commerce company in Europe.
And I was fighting every quarter to be sure that all the teams made their backups at some point.
But the thing I never achieved was to be sure that every team had 3-2-1 — you know, with everything encrypted.
I think what is game-changing with Plakar right now is that we decoupled the storage from the technology used to store your backup.
You can store your backup anywhere, without trusting your provider.
So what does that change?
You could imagine — and it's what we are releasing right now — a protocol where you can push your backup to a provider, and the provider manages the resilience of your data without any knowledge of your encryption key, while maintaining low network cost and low storage cost.
And I think it's the kind of layer that is missing right now: being able to back up all your objects, whatever they are, by just pushing them to a third party — which could be your own company, a team in your company that manages backups, or an external party.
Today the issue is that pushing all your data to one team in your company, with the encryption key and so on, to optimize the storage — that's a big bottleneck in terms of security.
So what we are enabling with this new protocol is being able to ask every team: make your backup, push it to a third party, internal or external, and that third party will manage resilience without any knowledge of your data.
So you can make two copies in two different cloud providers, for example, one offline, and so on — and do it in a clean way.
I think that's the kind of contribution we can bring to the ecosystem.
But yeah, of course, we want to do something to solve this whole resilience issue.
The way you're describing that, I can't help but think that an OCI registry would be a good option for that: it's content addressable, it's SHA-hash guaranteed, unique identifiers, it's read-only so you can be assured there's integrity, and it's got all the metadata.
So I'm going to put my vote in for that, but it sounds like you're building something custom.
So I was going to ask, as we wrap this up — because we're running a little long — what's next?
It sounds like what's next is initial Kubernetes support.
When you say Kubernetes support, are you
talking about running it on Kubernetes?
Or are you talking about backing up
like Kubernetes volumes or is it both?
We had a discussion about the proper way to integrate into Kubernetes, because it's one of our developers that's working on it. He was like, should I do one integration that covers everything? And I said, no, we have to decouple the control plane and the data plane.
You have to be sure that I can back up all the YAMLs from my configuration, and that I can selectively back up some of my volumes. I don't want to have no option but to back up everything or nothing.
So these are either two separate integrations, or one integration operating in two different modes.
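The decoupling being described here, manifests always backed up as YAML, volumes backed up selectively, can be sketched as a simple partition. All resource names and the selector below are hypothetical, made up for illustration.

```python
# Hypothetical resource list, shaped like what you might get back from
# the Kubernetes API; names and kinds are made up for illustration.
resources = [
    {"kind": "Deployment", "name": "api"},
    {"kind": "ConfigMap", "name": "api-env"},
    {"kind": "PersistentVolumeClaim", "name": "api-data"},
    {"kind": "Service", "name": "api"},
    {"kind": "PersistentVolumeClaim", "name": "scratch-cache"},
]

DATA_PLANE_KINDS = {"PersistentVolumeClaim"}

def plan_backup(resources, volume_selector=lambda r: True):
    """Split into a control-plane set (always backed up as YAML) and a
    data-plane set (volumes, backed up selectively)."""
    control = [r for r in resources if r["kind"] not in DATA_PLANE_KINDS]
    data = [r for r in resources
            if r["kind"] in DATA_PLANE_KINDS and volume_selector(r)]
    return control, data

# Back up every manifest, but only volumes whose name isn't a cache.
control, data = plan_backup(
    resources, volume_selector=lambda r: "cache" not in r["name"])
```

The point of the split is exactly what Gilles argues above: the cheap, small control-plane backup always happens, while the expensive data-plane backup is a policy decision per volume, never an all-or-nothing choice.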
But the idea is to tackle all of them, and we were looking into a Velero integration. It's always tempting to go your own way, let's do our own integration, but there's also a pragmatic way, which is: no, if we can adapt to be run by Velero, through Velero.
Then you get all the possibilities.
You can get, first, our simple integration to back up the kube configuration, which can also be used through Velero to fit into people's existing setups. They can just swap between different solutions. They can test us while retaining their old solution, whatever they're using Velero for. Then we can have a third way of doing it, which is our own, but that would come last, basically. Yeah.
Just to say, we're planning on not being just a solution that runs within kube, but more a solution that also manages to back up your kube.
Yeah.
That's one of the challenges I've been seeing in the industry. You've got the traditional backup vendors that have the plugins or integrations or whatever they want to call them, and, you know, you're paying lots of money, and they make you pay for certain things, like maybe the Oracle integration costs extra, that kind of thing.
And they're all closed source.
And then you have these open source things like Velero.
But the challenge with it
is it's just Kubernetes.
it's great at Kubernetes,
but it's just Kubernetes.
And typically, I don't really work with
any teams that are only Kubernetes.
I mean, even if they're Kubernetes
first and they're container first,
they're going to have other things.
And so then they have to have a completely
different set of tools for that stuff.
And then these two things don't meet, right?
So Velero's backing up to whatever storage you plug in on the back end, and then this other system is completely separate. And, I mean, at this point it feels like all my clients have multiple CIs, multiple backups, multiple clouds; there is no just one thing.
They're doing everything multiple times: multiple types of databases, multiple different database providers. So the challenge, I always feel, isn't to get to one universal backup system; it's to get to just as few as possible, so that you can maintain them.
yeah, that, that, that's the goal.
But imagine you have 10 separate tools, because that's what I saw at some previous companies. They have many teams, and none of them came to a consensus about what the proper solution was. Each one came up with its own, so you end up with 10. Even if we just reduced that to three, that's a net win over having to manage 10 different solutions.
And in a sense, you can do something stupidly easy.
I mean, you can say, oh, I want an integration that actually backs up another system, another backup system. So you end up having everything falling into Plakar through the system of integrations, being able to ingest data from whatever solution.
So there's also this; I'm saying it's a possibility through the integration system, but it means you also have a way to progressively unplug older solutions as you get integrations written, while still being able to have everything in Plakar from day one.
You know, you want to back up some solution. We don't have the integration for it, but we have the integration for your backup system. Well, you can back up through the other tool, and we back up the result of your backup. Progressively, as we get the integrations to manage your tool natively, some tools get out of the way.
So the idea is to allow people to do that. Obviously, we're not enough people to write the hundreds of integrations that we would need, but having simple SDKs, providing good examples, and starting with the most popular ones will lead us there, ultimately. That's the idea.
yeah.
And that's how tools like Cyberduck, CyberDrive, like that whole project ecosystem, you know, dozens of different storage integrations.
I like using those tools because
they've got GUIs, they're user friendly.
They're really great for like personal
backups, personal file management.
and that tool, the magic of that tool
is that it works with like everything.
it, like every cloud storage
scenario you could think of,
it's got a plugin for that.
And so I feel like the integration or plugin ecosystem is, in a lot of ways, the magic of what makes a backup product or project really interesting: all the different things that I can back up, just in case.
I didn't see a Git repo option for GitHub. I guess if you do it with Git, then you could do any of the Git providers, but it might be better to actually do it through the GitHub API.
It depends what you want, because on GitHub, what you would want is probably not just the code. It would be all the issues and all the...
Right.
That's where there are like two levels, right? Yeah, it's like, I need the code, but I could also really use all this other stuff.
Yeah.
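Those "two levels" of a GitHub backup can be made concrete. This is a sketch only: the REST endpoint paths below are GitHub's real public API routes, but the owner and repo names are placeholders, and no network calls are made here.

```python
# What "two levels" of GitHub backup could look like. The REST endpoints
# below are GitHub's documented public API paths; OWNER and REPO are
# placeholders, and this sketch only builds the URLs.
OWNER, REPO = "example-org", "example-repo"

# Level 1: the code itself, e.g. the target for `git clone --mirror`.
git_level = f"https://github.com/{OWNER}/{REPO}.git"

# Level 2: everything around the code, fetched from the REST API.
api_level = [
    f"https://api.github.com/repos/{OWNER}/{REPO}/issues?state=all",
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls?state=all",
    f"https://api.github.com/repos/{OWNER}/{REPO}/releases",
]
```

A git-level backup is portable across any Git provider, which is the trade-off raised above: the API level captures issues, pull requests, and releases, but it ties the integration to GitHub specifically.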
well, this is great.
We could talk forever and I
really appreciate your time.
You've both been very
generous with your time.
We covered lots of topics in this hour, and I am excited to get started playing with it.
I am excited to hear about, what
you're going to do, what you're going
to announce on the Kubernetes side.
That's where I live.
So the Docker and Kubernetes stuff, I'm
going to subscribe to any issues that
have those words in them, so that I
can keep track of what the status is.
How do people find you? So we've got the website, you are on Discord, and then you've got the GitHub repo. You've got socials; looks like you have a Discord server.
We work on Discord.
We are, as I said, all remote. We're all working remotely, and we work transparently on Discord.
So you can just come to our discord.
you can actually attend
all of our meetings.
You will be muted, but you can actually look into any discussion, any technical discussion, that happens in the open. Except the daily, where you can come and talk with us during the daily.
yeah.
all right.
So we know what's next.
We know you're going to be at KubeCon.
People can follow you individually.
I guess you guys are on
socials, on LinkedIn.
I think in the YouTube description,
all the links are below for how to
follow these two fine gentlemen.
Well, thank you for having us. This was pretty great.
Thank you very much.
Thank you.
All right.
Well, thank you both for being here.
And we will see you next time
here on, DevOps and Docker talk.
Ciao everybody!
Cheers.