bitcasa and convergent encryption

People have speculated about how the new startup bitcasa can both encrypt files client side and dedupe them server side. The CEO says they use a method called “convergent encryption.” This method derives the encryption key from the contents of the file being encrypted. That way, only people who have the unencrypted file can generate the key, yet identical unencrypted files produce identical encrypted files (and can therefore be deduped). The security community generally believes it works as advertised, with two possible vulnerabilities:

1) “confirmation-of-a-file attack” – someone who can see your encrypted files and already has a copy of a particular file can confirm whether you have it. For example, someone could verify that you have a certain movie or music file, or a leaked document.

2) “learn-partial-information attack” – in certain cases (from what I’ve read, those cases haven’t been strictly defined) an attacker who already knows most of a file’s contents could learn the rest by encrypting each candidate version of the file and checking for a match. An example might be a government form where most of the text is known but some sensitive text (e.g. your Social Security number) isn’t.
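
The mechanism and both attacks can be sketched in a few lines of Python. This is a hypothetical toy, not bitcasa’s implementation: it assumes the key is simply the SHA-256 hash of the file, it substitutes a SHA-256 counter-mode keystream for a real cipher such as AES, and the function names (`convergent_encrypt`, `upload`, `confirm_file`, `recover_last4`) are made up for illustration. The partial-information demo further assumes the attacker can see the victim’s storage id and already knows everything except the last four digits of an SSN.

```python
import hashlib
from typing import Optional


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256(key || counter) blocks.

    Hypothetical stand-in for a real cipher such as AES-CTR; not for production use.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(d ^ s for d, s in zip(data, stream))


def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes, str]:
    """Return (key, ciphertext, storage_id).

    The key is the SHA-256 of the plaintext itself, so identical files always
    produce identical ciphertexts, which is what lets the server de-dupe them.
    """
    key = hashlib.sha256(plaintext).digest()
    ciphertext = xor(plaintext, keystream(key, len(plaintext)))
    storage_id = hashlib.sha256(ciphertext).hexdigest()
    return key, ciphertext, storage_id


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return xor(ciphertext, keystream(key, len(ciphertext)))


# Server side: ciphertexts indexed by their hash; the server never sees keys.
server_store: dict[str, bytes] = {}


def upload(plaintext: bytes) -> tuple[bytes, str]:
    key, ciphertext, storage_id = convergent_encrypt(plaintext)
    server_store.setdefault(storage_id, ciphertext)  # a duplicate adds no new ciphertext
    return key, storage_id  # the client keeps the key


def confirm_file(candidate: bytes, store: dict[str, bytes]) -> bool:
    """Confirmation-of-a-file: anyone holding the candidate can compute its id."""
    return convergent_encrypt(candidate)[2] in store


def recover_last4(prefix: bytes, suffix: bytes, target_id: str) -> Optional[str]:
    """Learn-partial-information: try every value of an unknown 4-digit field."""
    for n in range(10_000):
        digits = f"{n:04d}"
        if convergent_encrypt(prefix + digits.encode() + suffix)[2] == target_id:
            return digits
    return None
```

For example, two users who each call `upload(b"same song bytes")` get the same storage id and the store holds a single ciphertext. If a file consisting of `b"SSN: 123-45-6789"` is uploaded, `recover_last4(b"SSN: 123-45-", b"", storage_id)` finds `"6789"` after at most 10,000 trial encryptions. The same determinism that makes de-dupe possible is what makes both attacks work.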

I’m a fan of client-side encryption, and even with these “vulnerabilities” it seems to me that what bitcasa is doing is a good idea and should be adopted, at least as an option, by other storage companies.

One big limitation that comes with encryption is the inability to do operations like searching text on the server side. This can potentially be addressed through a method called “homomorphic encryption.”


11 thoughts on “bitcasa and convergent encryption”

  1. ctb says:

    The ‘confirmation-of-a-file attack’ is also known as non-repudiation when you’re talking about individuals signing a document or an email with their private key. It seems risky to use the file itself as the key, since it can’t be 100% confirmed (unless the file is original, comes from a single source, and can somehow be validated as such) that it’s not already in the wild somewhere. One of the strengths of PKI is that you generate a unique public/private key pair for your own use (seeded by a random or pseudo-random generator). It makes me wonder what patterns could be observed across different file types, and whether certain files would prove to be stronger ‘keys’ than others (e.g. larger files vs. smaller files, JPGs vs. PDFs).

  2. chris dixon says:

    I was thinking one approach that might make sense is to do some kind of client-side detection to decide whether to use convergent or “traditional” encryption. For example, use convergent encryption on mp3, avi, and similar files, which are likely to be dedupable and less sensitive. And then maybe let the user override these defaults.

  3. Reg Braithwaite says:

    The trouble with de-duping .mp3 and .avi files is that these are exactly the ones where a plausible confirmation-of-a-file vulnerability exists. We have the spectre of the RIAA and MPAA providing bitcasa with a long list of keys and demanding that they refuse to host any file matching those keys. Or worse, the RIAA and MPAA could deliberately wait for usage to grow and then demand a list of users who are storing any of the files matching their keys.

  4. chris dixon says:

    With storage costs dropping so rapidly, I also wonder whether deduping is really necessary. Maybe just encrypt everything the “traditional” way.

  5. Reg Braithwaite says:

    The issue with storage space is that file copies follow a highly skewed distribution. Most files aren’t copied at all, but a few are copied a tremendous number of times. Consider what happens when a starlet’s nipple slip appears online. Within minutes, it is being tweeted, facebooked, and emailed. Users around the globe start saving it locally. It won’t take long for them to start saving it to the cloud.

    I wonder if we can use a different protocol. Some files need to be private in that nobody can read them except the owner. For others, the only privacy we need is over whether we uploaded the file. It seems to me that for the latter type of file, we can guarantee user privacy and de-dupe by giving up the ability to protect the file from being read. So the MPAA can discover that *somebody* has a copy of “The Phantom Menace,” but not which user or users it is.

    The file is stored in a global index with a hash. Users create a hash for files to be stored and check for existence. If the file exists, they upload a privately encrypted hash of the file. Nobody can determine which hashes they have, although everybody can read the files that have been placed in the “de-dupe” index. The company can provide two levels of security and charge appropriately.

  6. chris dixon says:

    Reg, so instead of encrypting the popular files (e.g. a Kanye West song), you encrypt who has a copy of them. Kind of a fascinating idea.

  7. oroup says:

    They also claim to let you share a file with friends through a simple URL obtained by right-clicking the file. (The TechCrunch article makes that claim, anyway.) Presuming that sharing means sharing the cleartext, it becomes more far-fetched that the server doesn’t have a copy of the decryption key. I can imagine a few exotic scenarios in which all these claims might be true (for example, a browser extension that inserts some secret into the sharing URL client side), but it seems more likely that these claims are overblown and not all true.

  8. zubinwadia says:

    Homomorphic encryption is not ready for prime-time, at least not with complex data.

  9. robertovalerio says:

    Cloud storage is a crowded place. That’s no problem, since the market will be huge and there will be no single big player. But I do not like it when new companies are too bold and ignore market facts:

    - Mozy tried to offer unlimited storage and failed! One BitCasa founder even worked for Mozy and should definitely know that no “free unlimited” storage offer has survived.
    - There is no “intelligent caching mechanism” that works for all use cases. In most cases it does not. And since storage is cheap but bandwidth is not (especially mobile bandwidth), I do not get their approach of streaming all user data the moment it is accessed.
    - BitCasa’s “instant streaming” is nothing special. There are efficient standard video and audio protocols for media streaming. They even admit they did not invent anything new; they are just combining existing technology. Smart move, but as they said, nothing new.
    - Encryption based on convergent encryption is “second class” encryption. I agree with all the findings mentioned in this blog: there are weaknesses. The “confirmation-of-a-file” case is especially important.
    - “Homomorphic encryption” is far from being usable. Basically we are talking about enormous computing power for very simple tasks. It is not usable for e.g. file search or data sorting. We had a look at it, and it’s nothing more than a theory for a very small subset of operations.

    So what do I see? I see a new storage startup that tries to take the best from all the other storage startups (hybrid cloud [egnyte], encryption [cloudsafe], cheap storage [backblaze], functionality [] and easy use [dropbox]) and put it into one product. If it were that easy, every other cloud storage company would have done it already. Do I think they will succeed? No, they will have to sacrifice more than secure encryption. I guess it will be the price tag, the moment they run out of VC money.

    [Disclosure: I founded a cloud storage startup offering encrypted storage; my opinion is based on my own experiences.]

  10. herf says:

    $10/month pays for more than 100GB of quota per user, even at Amazon S3 prices. It’s easy to consume that much with a weekend and a current digital camera, but that is months of uploading for the average user, so they won’t use it. Dedupe is a red herring: personal media (movies, photos) makes up 90%+ of total bits.

  11. rells says:

    (oroup) I’d like to learn more about Bitcasa sharing files “through a simple URL.” It would appear that they are encoding the secret key to the chunk list in the URL. If that is the case, knowing the URL string is enough for anyone to decrypt the file, which isn’t effectively much different from Dropbox’s (and most everyone else’s) scheme of sharing via a URL string.

    (chris) Since you are “a fan of client-side encryption,” have you checked out Lockbox? They seem to have cracked the full client-side encryption problem (including strong credentials and user-controlled key management) to guarantee security and privacy of user data. I note they don’t share via URLs but instead use public/private keys.
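
The two-tier scheme Reg Braithwaite proposes in comment 5, with chris dixon’s per-file-type default from comment 2 deciding which tier a file goes to, can be sketched as follows. Everything here is hypothetical: the class and method names, the `SHARED_EXTENSIONS` list, the manifest entry format, and the SHA-256 keystream standing in for a real cipher. Shared files are stored once, in the clear, under their content hash; private files get a random per-file key; and each user’s list of what they hold is encrypted with a secret that never leaves the client.

```python
import hashlib
import os

# Hypothetical policy in the spirit of comment 2: these types go to the shared tier.
SHARED_EXTENSIONS = {".mp3", ".avi"}


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy SHA-256 counter-mode keystream standing in for a real cipher (e.g. AES-CTR).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, data: bytes) -> bytes:
    # Randomized encryption: a fresh nonce means equal inputs encrypt differently.
    nonce = os.urandom(16)
    return nonce + bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    return bytes(b ^ k for b, k in zip(body, keystream(key, nonce, len(body))))


class Server:
    def __init__(self) -> None:
        self.shared_index: dict[str, bytes] = {}     # content hash -> file, readable by all
        self.private_blobs: dict[str, bytes] = {}    # random id -> encrypted file
        self.manifests: dict[str, list[bytes]] = {}  # user -> sealed manifest entries


class Client:
    def __init__(self, user: str, server: Server) -> None:
        self.user = user
        self.server = server
        self.secret = os.urandom(32)  # manifest key; never leaves the client

    def store(self, name: str, data: bytes) -> None:
        if os.path.splitext(name)[1].lower() in SHARED_EXTENSIONS:
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.server.shared_index:  # existence check
                self.server.shared_index[digest] = data  # uploaded once, in the clear
            entry = f"shared:{digest}"
        else:
            file_key = os.urandom(32)  # "traditional" random per-file key
            blob_id = os.urandom(16).hex()
            self.server.private_blobs[blob_id] = seal(file_key, data)
            entry = f"private:{blob_id}:{file_key.hex()}"
        # The entry is stored sealed, so the stored manifest doesn't reveal which hash it names.
        sealed = seal(self.secret, entry.encode())
        self.server.manifests.setdefault(self.user, []).append(sealed)

    def holdings(self) -> list[str]:
        return [unseal(self.secret, s).decode() for s in self.server.manifests.get(self.user, [])]

    def read(self, entry: str) -> bytes:
        kind, rest = entry.split(":", 1)
        if kind == "shared":
            return self.server.shared_index[rest]
        blob_id, key_hex = rest.split(":")
        return unseal(bytes.fromhex(key_hex), self.server.private_blobs[blob_id])
```

If two users store the same `song.mp3`, the server keeps one copy, and each user’s manifest entry for it looks like unrelated random bytes. One caveat of this sketch: the server still sees the digest at the moment a client checks the shared index, so the privacy covers only what is stored afterwards. Reg’s comment doesn’t say how the lookup itself would be hidden.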
