blog.dbrgn.ch

Some thoughts on the ETH's Threema Analysis

written on Saturday, January 14, 2023 by

Let's start with a disclaimer: I work at Threema. I'm a software engineer there, working on various systems. I'm not a founder or part of the management, but joined the company a few years ago. In this blog, I'm only speaking for myself, not for my employer. I'm obviously not impartial, but this blog post contains purely my own views, it was not commissioned, written or altered by my employer.

Intro

On Tuesday, a new research paper by a team from ETH Zürich was published. It contains an analysis of various protocols used by Threema, resulting in 7 attacks (6 of them new). The official statement by Threema can be found here.

In the last few days, there have been quite a few heated discussions and strongly worded comments on the internet (from both sides), with phrases like "Threema was broken", "riddled with vulnerabilities" and "deadly flaws" on one side, versus "analysis of outdated protocol" or "hopelessly oversold findings" on the other. I don't want to write about that, but would instead welcome a more technical discussion.

The goal is to give context to some design decisions that might seem a bit odd nowadays, and to take a look at the preconditions and impact of the findings. The blog post has gotten a bit long; I apologize for that.

Where's Threema Coming From?

The first version of Threema was written by Manuel in 2012, at a time when WhatsApp (not yet owned by Facebook) would blast your messages through the internet café without any encryption whatsoever. By opening Wireshark in a Starbucks, you could read all the conversations that people around you were having.

In the instant messaging world, especially on mobile, there were barely any secure options. WhatsApp (released in 2009) was quickly becoming widely used. People were still using Windows Live Messenger and chatting on Facebook (without TLS). Some nerds used XMPP with OTR (I was one of them, running my own ejabberd instance for a few years), but in practice you would either not verify the sessions at all – because it's a hassle – or the sessions would regularly break (especially across different clients). Sending files had a success rate of maybe 20%.

Peter Sunde (of Pirate Bay fame) announced Hemlis the following year in 2013, an E2EE messenger that collected 150k USD in a crowdfunding campaign, but was discontinued before it was out of beta. The Swiss company Qnective published myENIGMA in 2013 as well, but it failed to gain traction. And then there was TextSecure, made by Whisper Systems (acquired by Twitter in 2011), which – at that time – was a replacement for your SMS app and which would opportunistically encrypt SMS messages towards recipients that also had TextSecure installed. Open Whisper Systems was founded in 2013, and until 2014 TextSecure had no support for group chats. (Today, the codebases of TextSecure and RedPhone have evolved into Signal.) All of this happened after the first version of Threema was released.

In summary: Secure messaging wasn't pervasive at all, and the existing options were either overly technical, had a bad user experience or never made it out of beta. That's why Manuel wrote the first version of Threema for himself and his friends, and released it for iOS in December 2012.

Besides having end-to-end encryption of all messages, you could also scan the public key of another contact in the form of a QR code, which would mark the contact as verified in the UI. It was implemented in a way that non-technical people could understand and use. As far as I know, Threema was the first mobile messenger to offer a simple way of authenticating contacts with a single scan, and it remained the only one for several years. Other messengers sometimes provided end-to-end encryption, but often without authentication, opening up the possibility of MITM attacks. (Signal introduced the marking of conversations as verified in 2017.)

Protocol Choices

On the protocol side: In 2012, TLS was in a bit of a bad state. Mobile operating systems commonly offered no modern ciphersuites at all and were sometimes plagued with bad random number generators. (For example: Android 4.0.4 didn't even support TLS 1.1 yet.)

Of course, for asynchronous E2EE messaging, TLS isn't really an option anyway. OTR was around at that time, but it required both parties to be online at the same time in order to establish a session. This means that you cannot send a message to a recipient before that person has accepted the conversation request. The Signal protocol didn't exist yet and the Axolotl Ratchet by Trevor Perrin and Moxie Marlinspike was only developed the following year in 2013 (and published in 2014).

With these options, the protocol choices by Threema at launch were:

  • A client-server protocol modelled after CurveCP, which is a protocol developed by Daniel "djb" Bernstein and announced at the CCC congress in 2010. (This protocol is also vulnerable to the ephemeral key compromise weakness described by the authors of the paper.)
  • An end-to-end encryption protocol based on the NaCl library (with NaCl also developed and published by Bernstein). Threema E2E messages use a classical binary type-value format and random padding to prevent the server from guessing the message contents based on the length of the encrypted message. Contacts use a long-lived public/private keypair, just like PGP.
  • HTTPS for the contact discovery API.
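The random padding mentioned above can be sketched roughly as follows. This is a simplified illustration of PKCS#7-style padding with a random length (the exact scheme, constants and message layout are assumptions for illustration, not Threema's actual wire format):

```python
import os

def pad(message: bytes) -> bytes:
    """Append 1..255 padding bytes; each padding byte holds the padding length
    (PKCS#7-style, but with a *random* length to hide the true message size
    from anyone who only sees the ciphertext length)."""
    n = os.urandom(1)[0] or 1  # random padding length in 1..255
    return message + bytes([n]) * n

def unpad(padded: bytes) -> bytes:
    """Strip the padding after decryption, validating its structure."""
    n = padded[-1]
    if n == 0 or n > len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]

msg = b"\x01Hello!"  # hypothetical type byte followed by the payload
assert unpad(pad(msg)) == msg
```

Because the padding length is random, two messages with identical plaintext lengths produce ciphertexts of varying lengths, which makes length-based guessing by the server harder.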

Besides simplicity of implementation (which generally is an important aspect in secure software), the choice of NaCl long-lived keys also enabled that out-of-band key verification feature mentioned above. Identities are irrevocably tied to a public key, meaning that once you've verified and stored an identity, you can be sure that messages from that identity are encrypted by the corresponding private key. In contrast to protocols where each device has its own device keys, and where users get used to regular "the key of your conversation partner has changed" warnings, this means that once you've fetched the public key of a user, the directory server isn't able to replace that public key with a different one. For the case where an identity is compromised, there is a key revocation feature built-in.
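The "verify once, trust the stored key" model described above can be sketched like this (a simplified, hypothetical illustration of key pinning with out-of-band QR verification; names and storage are made up):

```python
import hmac

class ContactStore:
    """Pin a contact's public key on first fetch; a later QR scan of the
    key confirms (or refutes) that the pinned key is authentic."""

    def __init__(self):
        self.pinned = {}  # identity -> public key bytes

    def store_key(self, identity: str, public_key: bytes):
        # Once pinned, the directory server can't silently swap the key.
        if identity in self.pinned and not hmac.compare_digest(
            self.pinned[identity], public_key
        ):
            raise ValueError("server returned a different key for a pinned identity")
        self.pinned[identity] = public_key

    def verify_scan(self, identity: str, scanned_key: bytes) -> bool:
        # Out-of-band verification: the scanned key must match the pinned one.
        return hmac.compare_digest(self.pinned.get(identity, b""), scanned_key)
```

The point of the sketch: with long-lived keys, a single successful verification binds the identity permanently, instead of requiring re-verification whenever a device key rotates.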

I'd argue that – given the historical context and the "don't roll your own crypto" approach – these choices were sensible and appropriate. Both the client-server protocol and the end-to-end encryption format use concepts and libraries from Daniel Bernstein, who is quite the authority in cryptography. I especially find the following quote from the paper a bit misplaced:

To their credit, Threema’s developers did (largely) avoid “rolling their own crypto” but this advice should be extended to “don’t roll your own cryptographic protocol”. Of course this advice is only useful if good alternatives are available. In the particular case of the C2S protocol, Threema could have adopted the Noise IK protocol or just used TLS.

I've discussed TLS (and its state on mobile platforms) above. And the Noise Protocol Framework (which incidentally is also inspired by NaCl and CurveCP) simply didn't exist yet (AFAIK it was first published in 2013). Basing a protocol on CurveCP is precisely trying to avoid inventing cryptographic protocols from scratch. (It's unfortunate that both CurveCP and our protocol contained a flaw, and that we didn't notice this before.)

Expectations Towards Messengers

Nowadays, the world looks a bit different, especially with regards to the expectations towards a messenger. Two of the "recurring themes" are perfect forward secrecy and post-compromise security. There are various efforts underway to improve our protocols in this regard. One of them was the introduction of perfect forward secrecy – with the marketing name "Ibex" – as an additional layer, so that it can be rolled out in an interoperable way.

The development of Ibex began more than a year ago in cooperation with a cryptographer, and was not in reaction to the ETH research group (something that was claimed in interviews, even though the researchers knew from the meeting with Threema that we had already been working on PFS for a while). We chose to develop our own protocol due to compatibility reasons: Using a different, incompatible protocol would mean splitting the entire userbase, and becoming incompatible with existing apps, third party clients and server-side integrations. The decision to cooperate with an external cryptographer during the design phase was precisely to minimize the risks of "rolling your own cryptographic protocol".

There were also a lot of other changes over the years to improve the privacy and security properties of our protocols and systems. Often, design choices that may seem odd at first glance are guided by external constraints. For example, early versions of iOS did not allow waking up an application in response to a push notification. This meant that when an end-to-end encrypted message arrived, the app could not determine who it was from, and could not display any name on the lock screen. Without UX features like that, the user experience is clearly worse than with a messenger that does not value privacy. As a workaround, the first version of Threema had the concept of the "public nickname", chosen by the user (optional and explicitly indicated to be public), which was included in the non-E2EE header of every message.

Nowadays, iOS has notification extensions, which can decrypt incoming messages, so the public nickname became unnecessary metadata. In 2021 (after we dropped support for iOS versions that didn't yet support notification extensions), we implemented support for processing E2EE metadata (including the nickname). Additionally, we started encrypting the push payload between the chat server and Apple's APNs server. We also used this occasion for more modern design choices like BLAKE2b as KDF and Protobuf as serialization format. (We've been using BLAKE2b in newer protocols for a while now, to achieve better key separation and to avoid payload confusion issues.)
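The key-separation idea behind using BLAKE2b as a KDF can be sketched with Python's hashlib. The personalization and salt strings below are made up for illustration; they are not Threema's actual protocol constants:

```python
import hashlib

def derive_key(master_key: bytes, salt: bytes, personal: bytes) -> bytes:
    """Derive a purpose-bound subkey via keyed BLAKE2b: different `personal`
    and `salt` values yield unrelated keys, so material derived for one
    context can't be confused with material from another (key separation)."""
    return hashlib.blake2b(
        key=master_key, salt=salt, person=personal, digest_size=32
    ).digest()

master = bytes(32)  # placeholder master key
k_push = derive_key(master, b"nonce-salt-1", b"example-push")
k_meta = derive_key(master, b"nonce-salt-1", b"example-meta")
assert k_push != k_meta  # distinct contexts give distinct keys
```

Because every protocol context uses its own personalization string, a ciphertext or MAC produced in one context can never verify in another, which is exactly the payload-confusion problem that plain hash-based constructions can run into.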

In summary, expectations towards mobile messengers are changing and requirements are increasing. Not just on the cryptographic side. A mobile messenger, which was essentially a glorified SMS app in the past, must now be able to send files, view and edit videos and photos, offer end-to-end encrypted 1:1 and group calls, offer multi-device capabilities and ease of use from the desktop, and much more, all while remaining secure and shipping regular updates and features.

We are certainly aware of these expectations and are constantly trying to review and improve our code and protocols. In some cases, we seem to do well (we have passed multiple external audits of our apps successfully without any major issues so far), and in other cases, we did not and should have been more proactive (e.g. trying to apply formal verification methods to the protocols). I reject the implication from the paper though that Threema was only reactive, and at the same time acknowledge that we must (and will) improve certain processes. Some changes are already in progress, and more are to come.

The Findings

Now on to the main findings! One thing that bugged me about the marketing website of the research team was the fact that it summarized the broad idea and the worst-case impact for each attack scenario, but mostly did not mention the requirements, assumptions and prerequisites. A user who isn't technical enough to read and understand the paper, but reads through the website, must assume that it is highly unsafe to keep using Threema. In the worst case, they'll go back to using something like WhatsApp or Telegram (since "messengers are all unsafe, so it doesn't matter which one to pick").

So let's look at the attacks, the impact and the requirements for each of them at the time of initial disclosure. (Note: I'm not trying to imply that these findings aren't valid, or that they aren't important, I only want to show what they mean in practice.)


⚔️ Attack 1: C2S Ephemeral Key Compromise

What was required to pull off the attack?

  1. A value that is used in the client-server handshake (an ephemeral key) must be extracted from the mobile app. For this, an attacker must have access to the value in the app's memory, something that requires breaking Android's core security concepts, for example by infecting the user's device with targeted malware that breaks app isolation and gives full access to the memory. Additionally, even with full memory access it is non-trivial to actually identify the bytes belonging to that ephemeral key.
    In their PoC, the researchers most likely modified the open source app's code to dump these keys. Unfortunately I don't know, because in contrast to the claim that "all the attacks are accompanied by proof-of-concept implementations that demonstrate their feasibility in practice", these PoCs and their prerequisites don't seem to be documented anywhere.

  2. A network connection handshake between the client and the server must be recorded for the user that is being attacked. For this, the attacker needs access to the network of the user, of the server, or anywhere in between.

  3. To avoid detection, the user's app must not be connected when the attacker connects to the server. If the user's app is connected at the same time (something that usually happens at least whenever a new message arrives due to push notifications), they will get a warning that another device with the same identity has connected to the server. While this warning trigger system wasn't always perfect in practice (due to a lot of false positives from users that didn't uninstall Threema when migrating to a new phone), there is always the risk of detection.

What could be gained by pulling off this attack?

  • An attacker can read some message metadata (e.g. the Threema ID of the sender, the Threema ID of the recipient or the timestamp of the message) of the otherwise encrypted messages, but not the E2EE message contents.
  • Based on this information, an attacker can delete messages from the server's message queue by acknowledging the receipt towards the server. When the real user reconnects, they will not get that message anymore. However, without knowing the content of the messages, it's hard to decide which messages to drop and which messages to keep, in order to make a semantic difference in the conversation, without risking being discovered.

How was this fixed?

  • The vouch box in the C2S handshake protocol was adjusted, in order to avoid this vulnerability.
  • When using PFS ("Ibex") in a 1:1 chat, reordered and missing messages are detected and the user is warned.

Other notes

  • If an attacker manages to access the memory of arbitrary apps on a smartphone, then that entire phone (and connected accounts) must be considered completely compromised, and losing an ephemeral handshake key is probably the least of your worries.


⚔️ Attack 2: Vouch Box Forgery

What was required to pull off the attack?

  1. Have access to approximately 8100 CPU cores for 24 hours. Run a program to compute a certain value X.

  2. Convince a user to use the server's public key like a contact public key. There are two variants suggested in the paper:

    • a) Request a Threema Gateway ID with a certain public key that is equivalent to the server's public key and pay for that ID (64 CHF). Or...

    • b) Get read and write access to an Android data backup of the target user. Use a vulnerability in the open source Zip4j Java library (used by the Threema app) to modify the encrypted backup (without knowing the contents) in a way that manages to precisely overwrite their own public key inside the backup file with the server's public key. (Details: Know where in the encrypted and compressed contact list backup inside the ZIP file their public key is stored. It could be anywhere in a range of typically several thousand bytes; the adversary needs to guess blindly, must hit the exact byte offset and typically only gets one chance. Overwrite the location thus found/guessed, XORing in the difference between their own (compressed) public key and the server's public key (as the ZIP encryption uses AES-CTR). The fact that the backup data is compressed further complicates the exploit to the point where the adversary would essentially need to know the entire exact backup contents in advance.) Now the user must restore this manipulated backup on their device.

  3. After this has happened, convince the target user to send the exact value X (which could look like this: u9j6ߓ'jjखԻ^߃1כW:-́;ܡRA) to the Gateway ID from 2a or to the ID behind the public key from 2b. Once the user has sent this text (only that precise text, no additional text) to the attacker, there is a 1-in-254 chance that the attack has succeeded. For a successful attack, the text needs to be sent to the attacker by the user approximately 200 times on average.
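The CTR-mode malleability exploited in variant 2b can be illustrated with a toy XOR keystream. AES-CTR behaves identically with respect to bit flips; this is a simplified demonstration of the principle, not the actual Zip4j exploit (offsets and values are made up):

```python
import os

def xor_keystream(data: bytes, keystream: bytes) -> bytes:
    """Stream-cipher encryption/decryption: output = input XOR keystream."""
    return bytes(a ^ b for a, b in zip(data, keystream))

keystream = os.urandom(64)
plaintext = b"contact-key:" + b"A" * 32 + b";..."
ciphertext = xor_keystream(plaintext, keystream)

# The attacker knows the old value at offset 12 and wants to overwrite it
# with a chosen value -- without knowing the keystream or the rest of the file.
old, new = b"A" * 32, b"S" * 32
delta = bytes(a ^ b for a, b in zip(old, new))
forged = bytearray(ciphertext)
for i, d in enumerate(delta):
    forged[12 + i] ^= d  # XOR in the difference at the guessed offset

# After decryption, the chosen value appears in place of the old one:
assert xor_keystream(bytes(forged), keystream) == b"contact-key:" + b"S" * 32 + b";..."
```

This is why the attack only needs to know the *plaintext difference* and the exact byte offset, not the key, and why compression (which scrambles offsets and bytes) makes it so much harder in practice.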

What could be gained by pulling off this attack?

The same as for attack 1:

  • An attacker can read some message metadata (e.g. the Threema ID of the sender, the Threema ID of the recipient or the timestamp of the message) of the otherwise encrypted messages, but not the E2EE message contents.
  • Based on this information, an attacker can delete messages from the server's message queue by acknowledging the receipt towards the server. When the real user reconnects, they will not get that message anymore. However, without knowing the content of the messages, it's hard to decide which messages to drop and which messages to keep, in order to make a semantic difference in the conversation, without risking being discovered.

How was this fixed?

  • Threema Gateway does not allow creating accounts with certain affected public keys anymore. The account created by the researchers was suspended. We checked for other Gateway IDs with such a key, but found none.
  • The vouch box in the C2S handshake protocol was adjusted, in order to avoid this vulnerability.
  • When using PFS ("Ibex") in a 1:1 chat, reordered and missing messages are detected and the user is warned.

Other notes

The marketing website of the research team describes this attack as follows:

This attack means that, under some circumstances, a user might compromise his or her own account by simply sending a message to another user.

I'm leaving it up to the reader to decide whether this is a fair summary of the attack.


⚔️ Attack 3: Message Reordering and Deletion

What was required to pull off the attack?

  1. The attacker needs full access to Threema's chat server
  2. When deleting messages: The sender of a message must not pay attention to the delivery receipts. (The sender can see if some messages aren't delivered, because the delivery receipt isn't being received for this message. It's suspicious if the message before and after are marked as delivered, but the one in between still seems undelivered.)

What could be gained by pulling off this attack?

Very similar to attacks 1 and 2:

  • An attacker can read some message metadata (e.g. the Threema ID of the sender, the Threema ID of the recipient or the timestamp of the message) of the otherwise encrypted messages, but not the E2EE message contents.
  • Based on this information, an attacker can delete or reorder messages. However, without knowing the content of the messages, if an attacker would like to change semantics of a conversation, it would be guesswork which messages to reorder, and how to reorder them.

How was this fixed?

  • When using PFS ("Ibex") in a 1:1 chat, reordered and missing messages are detected and the user is warned.


⚔️ Attack 4: Message Replay and Reflection

What was required to pull off the attack?

  1. The attacker needs full access to Threema's chat server over a longer time period
  2. The attacker must store messages sent to a certain user
  3. The attacked user must use the Android version of the app
  4. The attacker must know when a user restores a backup

What could be gained by pulling off this attack?

  • When a user reinstalls the app and restores a backup on Android, or when the user switches the device (something that probably happens every few years for typical users), then the attacker could re-send old messages (from before the app was (re)installed) to that user. To the recipient, these messages could look like they were sent only recently.

How was this fixed?

  • We're adding the createdAt timestamp of a message to the encrypted metadata part of the message, so that the user is warned if the non-encrypted timestamp deviates significantly from the encrypted timestamp. (Note that it is deliberate that the server can overwrite the message timestamps, since this can be used to fix client-generated timestamps for users that have set their phones to the wrong timezone and thus appear to be sending messages from the future).
  • When using PFS ("Ibex") in a 1:1 chat, replayed messages are detected and the user is warned.
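The timestamp cross-check described above can be sketched like this (the threshold and field names are assumptions for illustration, not the actual protocol values):

```python
# Sketch: warn when the server-visible timestamp deviates too much from the
# createdAt value protected inside the encrypted metadata.
MAX_DEVIATION_SECONDS = 24 * 60 * 60  # hypothetical threshold

def is_suspicious_replay(server_timestamp: int, encrypted_created_at: int) -> bool:
    """True if the unencrypted timestamp is far away from the authenticated
    one, hinting that an old message was replayed with a fresh timestamp."""
    return abs(server_timestamp - encrypted_created_at) > MAX_DEVIATION_SECONDS

now = 1_700_000_000
# A message created a week ago but delivered "now" triggers a warning:
assert is_suspicious_replay(now, now - 7 * 24 * 60 * 60)
# Normal clock skew of a few minutes does not:
assert not is_suspicious_replay(now, now - 300)
```

Since the createdAt value lives inside the encrypted (and authenticated) metadata, the server can still legitimately adjust the outer timestamp for clock-skewed clients, but it can no longer pass off a week-old message as fresh without tripping the check.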


⚔️ (Attack 5: Kompromat)

(This attack was already discovered by another researcher in 2021 and patched on all platforms soon after. While the ETH researchers, strictly speaking, did not uncover this issue – in contrast to some statements in the paper and on the website – I'm including it here for completeness.)

What was required to pull off the attack?

  1. The attacked user A must use an app version older than December 2021 (as this attack was already discovered by another researcher in 2021 and patched on all platforms)
  2. The attacker needs full access to Threema's directory server
  3. When the attacked user A authenticates against the directory server, it must receive a modified authentication challenge where an ephemeral public key is replaced with the long-term public key of another Threema user B.

What could be gained by pulling off this attack?

  • With every modified authentication challenge sent by the attacked user A, one message will be encrypted by user A so that it can be sent to A or B and it appears to be encrypted and signed by the other party.

How was this fixed?

  • The immediate vulnerability was fixed in the apps by ensuring that the authentication challenge sent by the server does not start with a prefix that could also be the start of a valid end-to-end message
  • To also tackle the root cause of this issue (payload confusion due to missing key separation), the directory server authentication was updated to a new format that is not vulnerable to payload confusion
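The prefix check from the first fix can be sketched as follows. The specific byte values are hypothetical; the point is only that authentication challenges and end-to-end messages must live in disjoint byte ranges:

```python
# Hypothetical set of first bytes that valid end-to-end messages may start
# with (e.g. message type identifiers). The values are made up for illustration.
E2E_MESSAGE_TYPE_BYTES = {0x01, 0x02, 0x10, 0x13}

def accept_auth_challenge(challenge: bytes) -> bytes:
    """Reject any server challenge that could also parse as the start of a
    valid E2E message -- otherwise a malicious server could trick the client
    into producing a ciphertext that is meaningful in the E2E context
    (payload confusion)."""
    if not challenge or challenge[0] in E2E_MESSAGE_TYPE_BYTES:
        raise ValueError("challenge overlaps with E2E message space, refusing")
    return challenge

assert accept_auth_challenge(b"\xffrand") == b"\xffrand"
```

The second fix (proper key separation in the new directory server authentication) removes the overlap at the root, so such a prefix check becomes a defense-in-depth measure rather than the only barrier.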


⚔️ Attack 6: Cloning via Threema ID Export

What was required to pull off the attack?

  1. Full access to an unlocked phone running the Threema app
  2. The Threema app must not be protected with access protection such as a passphrase or biometric lock (or the user must be compelled to provide this information)
  3. A backup must be exported without the user noticing
  4. Once the account has been restored on a different device by the attacker, the user and the attacker must not connect to the server at the same time, otherwise a server warning will be generated on the user's device, stating "another connection for the same identity has been established."

What could be gained by pulling off this attack?

  • Full (but not undetectable) account compromise
  • If this happens, the user must revoke their identity and generate a new one

How was this fixed?

  • The "rogue device warning" detection mechanism was updated, so that a user will be warned if an attacker connects to the server with the same identity, even if the user was not connected at that time. (This feature is already implemented, but is being phased-in incrementally on the server side, to ensure that there aren't unwanted false positive warning messages shown to the user.)
  • We're thinking about additional/enforced access protection when exporting a backup, even if the user hasn't set up access protection for the app.

Other notes

The paper notes:

This is in contrast to Threema, where a Threema ID export is undetectable on the victim device after it happened and, due to the lack of forward secrecy and post-compromise security, irreversibly forfeits all security.

While it is correct that the creation of a backup is not detectable by the user, its usage had a high chance of being detected, because the server generates warnings on a device if another device repeatedly connects with the same ID. (With the updated rogue device warning mechanism, it should not be possible anymore to connect to the server without triggering a warning on the attacked user's device, even if that device is offline.)


⚔️ Attack 7: Compression Side-Channel

What was required to pull off the attack?

  1. User must have enabled Threema Safe
  2. User must have the attacker as a contact in the app, or they must not have enabled the "block unknown" privacy setting
  3. Have a way to measure the size of a user's uploaded Threema Safe backup. This can be done with one of two methods:
    • a) Have access to the Threema Safe server where backups are stored. Additionally, have a way to identify (on the Threema Safe backup server) which backup belongs to the user. By design, it is not possible to determine who a backup belongs to by looking at the encrypted backup file. One option would be knowing the IP address of the uploader, and being able to log/correlate the IP address of the backup upload. In today's mobile networks, the wide use of CGNAT – where hundreds or thousands of devices share the same public IP – may make this harder.
    • b) Have access to the target device's network, or to the unlocked physical device. Record the network traffic between the device and Threema's backup server.
  4. Have a way to restart the user's app tens of thousands of times. In the paper, they required a median of 19.4k backup attempts and a mean of 23.4k attempts. The way this was done by the researchers was to use Android debugging tools to stop and restart the Threema app in an automated way. This requires both physical access to the unlocked phone over a long period (from a few hours to multiple days) and enabling developer mode on said Android device.
  5. After every restart, the attacker must send a message to the attacked user, in order to change the nickname in the user's address book.
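The underlying compression side channel can be demonstrated with zlib in a few lines. This is a generic CRIME-style illustration under made-up data, not the actual Threema Safe format:

```python
import zlib

SECRET = b"private_key=c4f3e8d9"  # secret material inside the "backup"

def backup_size(attacker_controlled: bytes) -> int:
    """Size of a compressed 'backup' containing both the secret and an
    attacker-controlled field (e.g. a nickname set by sending a message)."""
    return len(zlib.compress(SECRET + b";nickname=" + attacker_controlled))

# A guess that matches part of the secret compresses better (the compressor
# emits a back-reference instead of literals), so the observable backup
# *size* leaks information about the secret, byte by byte.
matching = backup_size(b"private_key=c4f3e8d9")
wrong = backup_size(b"q7w8e9r0t1y2u3i4p5o6")
assert matching < wrong
```

Disabling compression for Safe backups removes exactly this correlation between attacker-controlled content and ciphertext length, which is why it fixes the attack.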

What could be gained by pulling off this attack?

  • Full (but not undetectable) account compromise
  • If this happens, the user must revoke their identity and generate a new one
  • In the experiment described in the paper, once all requirements above were fulfilled, they had a success rate of roughly 47%

How was this fixed?

  • Compression for Safe backups was disabled


Outro

The findings were disclosed to Threema on 2022-10-03 and acknowledged on the same day. The first server-side mitigations were deployed that same day. The first batch of client-side mitigations was released on 2022-10-27; more mitigations in the apps were published on 2022-11-29. Further mitigations and protocol improvements are being worked on. Development of Ibex (our protocol for PFS on the end-to-end layer) already began over a year ago, and it was initially released on both platforms in December. An independent formal security analysis is already under way.

I hope that researchers will keep poking at our open source apps and our protocols, in order to make them better and more secure. We welcome coordinated disclosure and also run a bug bounty program with bounties up to 10'000 CHF.

We will keep improving our apps, adding new features in a way that's as privacy-preserving as possible, and we will keep auditing and improving both our protocols and implementations.

Happy 2023!

This entry was tagged security and threema