
In January 2023, the ToIP Foundation announced the formation of the Trust Spanning Protocol Task Force (TSPTF) within our Technology Stack Working Group (TSWG). See the introductory blog post for an overview of this task force. In short, the goal is to maximize interoperability of decentralized digital trust infrastructure by defining the core protocol at the center of the ToIP hourglass model. This protocol needs to meet all the Layer 2 requirements defined in the ToIP Technology Architecture Specification.

To do this, the TSPTF spent three months in the proposal stage, considering four overall design proposals from Sam Smith, Daniel Hardman, Wenjing Chu, and Michael Herman. The task force then entered the consolidation stage: the hard work of finding enough common ground among these proposals to reach rough consensus on the overall framework of the design so that it could prepare the first Working Draft of the specification.

The good news is that we believe we have achieved that goal. Through a series of in-depth virtual workshops, additional proposals, and small group discussions, we have arrived at seven “pillars” of the design. These seven pillars apply not just to the trust spanning protocol, but to a core family of protocols that together form the foundation for all the trust task protocols at Layer 3 of the ToIP technology stack.

In this blog post we will summarize these seven pillars:

Verifiable Identifiers
End-to-End Authenticity and Confidentiality
Direct Connections (Inner and Outer Channels)
Routing Via Intermediaries (Routing Channels)
Relationship Context Channels
Text and Binary Encoding
Trust Task Protocol Framework

This blog post does not attempt to provide the full technical details of each of these topics; rather, it summarizes them in language accessible to anyone interested in decentralized digital trust infrastructure.

#1: Verifiable Identifiers

The ToIP Foundation as a whole, and the TSWG in particular, has long compared the ToIP stack to the Internet’s TCP/IP stack (see our introductory white paper for a full explanation). Specifically, we have pointed to how the Internet Protocol (IP) illustrates the critical role of a spanning protocol in a protocol stack (for details, see Principle #3, The Hourglass Model, in Design Principles for the ToIP Stack).

The heart of the Internet Protocol defined a new type of address—the IP address—that enabled communications between existing networks that otherwise had no automated way to pass data between them. This new addressing layer enabled the Internet Protocol to “span” existing local area networks and create the ubiquitous global Internet we enjoy today.

Since the goal of the ToIP stack is to enable trust to span existing trust domains, the ToIP trust spanning protocol needs the same solution: a new type of address—one designed explicitly to enable cryptographically verifiable communications. In section 6.4 of the ToIP Technology Architecture Specification (TAS), we call this new type of address a verifiable identifier (VID). The three essential characteristics of a VID are:

It can be resolved securely to obtain the current public key(s) needed to verify that the VID owner controls the VID.
It can be resolved securely to obtain the current network endpoint(s) for establishing a ToIP connection with the entity identified by the VID.
It does not need to change when the controller’s key(s) are rotated or network endpoint(s) are updated.
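To make the three characteristics concrete, here is a minimal Python sketch of the data that resolving a VID might return. The `ToyVidRegistry` class, its method names, and the `did:example` identifiers are hypothetical illustrations; a real VID type (a DID method, a KERI key event log, and so on) would make every resolution step cryptographically verifiable, which this toy does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvedVid:
    """What resolving a VID yields: its current keys and endpoints."""
    vid: str
    public_keys: list[str]
    endpoints: list[str]


@dataclass
class ToyVidRegistry:
    """Hypothetical in-memory stand-in for a VID's verifiable data source.

    No cryptographic verification is performed; this only shows the data flow.
    """
    records: dict[str, ResolvedVid] = field(default_factory=dict)

    def register(self, vid: str, key: str, endpoint: str) -> None:
        self.records[vid] = ResolvedVid(vid, [key], [endpoint])

    def rotate_key(self, vid: str, new_key: str) -> None:
        # Characteristic 3: rotating keys changes the record, not the VID.
        self.records[vid].public_keys = [new_key]

    def update_endpoint(self, vid: str, new_endpoint: str) -> None:
        # Characteristic 3 again: endpoints can move while the VID stays fixed.
        self.records[vid].endpoints = [new_endpoint]

    def resolve(self, vid: str) -> ResolvedVid:
        # Characteristics 1 and 2: return a copy of the *current* keys and endpoints.
        r = self.records[vid]
        return ResolvedVid(r.vid, list(r.public_keys), list(r.endpoints))


registry = ToyVidRegistry()
registry.register("did:example:alice", "key-1", "https://alice.example/tsp")
registry.rotate_key("did:example:alice", "key-2")
print(registry.resolve("did:example:alice").public_keys)  # ['key-2']
```

A verifier that had cached `key-1` simply re-resolves the same VID after rotation; its counterparty never has to hand out a new identifier.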

Section 6.4 of the TAS explains that there are multiple types of VIDs, including many different types of Decentralized Identifiers (DIDs—standardized by the W3C), and several types of Autonomous Identifiers (AIDs), such as those being standardized in the Key Event Receipt Infrastructure (KERI) specifications from the ToIP Authentic Chained Data Container (ACDC) Task Force. Since a verifier’s trust in a VID may depend on both the type of VID and the characteristics of its supporting infrastructure (e.g., cryptographic algorithms, storage, key management, resolution, etc.), the trust spanning protocol specification must also include an appraisability framework: a standard way for a ToIP endpoint to describe the type and properties of its particular VID.

In addition, VIDs share the same challenge as any other new type of identifier: how do parties discover and share the VIDs they want to use for each other? How do they bootstrap setting up their first ToIP communications channel? How can they do this safely while avoiding phishing or man-in-the-middle (MITM) attacks?

All VID-based protocols, including DIDComm, KERI, and Decentralized Web Nodes, rely on some form of out-of-band introduction (OOBI), such as a QR code, to accomplish this critical first step. Regardless of the specific OOBI used, the ultimate result is that all parties now have the VIDs needed to begin using the ToIP trust spanning protocol to connect and interact.

#2: End-to-End Authenticity & Confidentiality

Almost all modern security protocols based on public/private key cryptography use some combination of message signing (for authenticity) and message encryption (for confidentiality). A longstanding question has been: precisely what combination of these two properties produces the strongest security?

Our second pillar is a firm answer to that question: the signing and encryption pattern that provides the strongest protection against both key compromise impersonation (KCI) and sender impersonation of the ciphertext is called ESSR (for Encrypt Sender’s key then Sign Receiver’s key). ESSR was first defined in a 2001 paper by Jee Hea An and is well explained in these three Neil Madden blog posts about public key authenticated encryption: PKAE1, PKAE2, PKAE3.

The bottom line: by binding the sender’s public key inside the encrypted ciphertext and binding the receiver’s public key in the enclosing signed plaintext, ESSR prevents an adversary from forging messages that compromise either authenticity or confidentiality. The trust spanning protocol can therefore achieve both strong authenticity and strong confidentiality by applying ESSR to all messages that require both properties.
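The ordering is easier to see in code. The following minimal Python sketch shows only the ESSR binding: the sender’s key goes inside the ciphertext, the receiver’s key goes inside the signed envelope, and the receiver checks that the signer matches the sender named in the ciphertext. The JSON envelope, the function names, and the injected crypto hooks are assumptions for illustration, not the trust spanning protocol’s wire format. The `toy_*` functions are deliberately insecure placeholders where a real implementation would use public key authenticated encryption and digital signatures.

```python
import hashlib
import json
from typing import Callable

# Crypto hooks are injected so the sketch shows only the ESSR *binding*.
# Keys are string labels standing in for public keys; a real signer would
# use the private key that matches the label.
Encrypt = Callable[[str, bytes], bytes]       # (receiver_key, plaintext) -> ciphertext
Decrypt = Callable[[str, bytes], bytes]       # (receiver_key, ciphertext) -> plaintext
Sign = Callable[[str, bytes], bytes]          # (sender_key, data) -> signature
Verify = Callable[[str, bytes, bytes], bool]  # (sender_key, data, signature) -> ok


def essr_seal(sender_key: str, receiver_key: str, payload: bytes,
              encrypt: Encrypt, sign: Sign) -> dict:
    # Encrypt Sender: the sender's key travels INSIDE the ciphertext.
    inner = json.dumps({"sender": sender_key, "payload": payload.hex()}).encode()
    ciphertext = encrypt(receiver_key, inner)
    # Sign Receiver: the receiver's key is part of the signed plaintext envelope.
    outer = json.dumps({"receiver": receiver_key,
                        "ciphertext": ciphertext.hex()}).encode()
    return {"signed": outer, "signature": sign(sender_key, outer)}


def essr_open(message: dict, my_key: str,
              decrypt: Decrypt, verify: Verify) -> tuple[str, bytes]:
    outer = json.loads(message["signed"])
    if outer["receiver"] != my_key:
        raise ValueError("envelope was signed for a different receiver")
    inner = json.loads(decrypt(my_key, bytes.fromhex(outer["ciphertext"])))
    sender = inner["sender"]
    # The signer must be the same party that is named inside the ciphertext.
    if not verify(sender, message["signed"], message["signature"]):
        raise ValueError("signature does not match the sender bound in the ciphertext")
    return sender, bytes.fromhex(inner["payload"])


# INSECURE toy stand-ins for demonstration only: no real cryptography here.
def toy_encrypt(key: str, data: bytes) -> bytes:
    return data[::-1]


def toy_decrypt(key: str, data: bytes) -> bytes:
    return data[::-1]


def toy_sign(key: str, data: bytes) -> bytes:
    return hashlib.sha256(key.encode() + data).digest()


def toy_verify(key: str, data: bytes, sig: bytes) -> bool:
    return toy_sign(key, data) == sig
```

If an attacker strips the signature and re-signs the envelope with its own key, `essr_open` rejects the message because the signer no longer matches the sender bound inside the ciphertext. Because the signature also covers the receiver’s key, the envelope cannot be redirected to a different receiver.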

#3: Direct Connections (Inner and Outer Channels)

With VIDs supplying verifiable public key(s) and network endpoint(s), and ESSR supplying the message signing and encryption pattern, all the essential ingredients are in place for two ToIP endpoints to establish a connection and communicate securely and confidentially. The only ToIP Layer 2 requirement not yet addressed is correlation privacy: preventing public observation and correlation of the VIDs that parties are using to communicate.

The trust spanning protocol is designed to be transport-independent and could be used with any transport protocol for which a binding is defined (e.g., Bluetooth, AMQP, MQTT). Even so, if ToIP endpoints A and B have a direct connection with each other over a public TCP/IP network like the Internet, full correlation privacy is impossible: anyone who can see the Internet traffic can track the flow of encrypted messages between A’s VID and B’s VID. Thus the only way A and B can achieve full correlation privacy is by employing intermediaries (see Pillar #4).

However, A and B can still achieve partial correlation privacy over a direct connection by tunneling one ESSR protocol inside another. This is shown in Figure 1 below.


Figure 1: A direct connection showing one ToIP channel tunneled inside another for partial correlation privacy.

In Figure 1, the VIDs A₀ and B₀ are publicly-observable VIDs used to form the outer channel (shown in red). They are outside the two endpoint boxes because they are publicly observable on the Internet. The VIDs A₁ and B₁ are private interaction VIDs that form the inner channel (shown in orange). They are inside the two endpoint boxes because they are completely private to A and B. Each ToIP message between A₁ and B₁ is carried in the encrypted payload of ToIP messages between A₀ and B₀, so A₁ and B₁ are never exposed publicly on the Internet.

This two-layer, two-channel model enables A and B to communicate privately between A₁ and B₁ over the inner channel. As explained in Pillar #5, one potential use of this private channel is for A and B to negotiate replacing their current publicly-observable VIDs A₀ and B₀ with new publicly-observable VIDs Aₙ and Bₙ if/when needed. They can do this while still keeping A₁ and B₁ hidden. This enables some degree of correlation privacy even without the use of intermediaries.
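Mechanically, tunneling means the whole inner-channel message becomes the (encrypted) payload of an outer-channel message. In this minimal Python sketch, `wrap` and `unwrap` are hypothetical helpers, VIDs are plain strings (`A0`, `A1`, and so on), and `seal`/`unseal` stand in for ESSR encryption and decryption; the toy byte reversal used in the example provides no confidentiality at all.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    sender: str     # sender VID
    receiver: str   # receiver VID
    payload: bytes  # ESSR-protected in a real ToIP message; opaque here


def wrap(inner: Message, outer_sender: str, outer_receiver: str,
         seal: Callable[[bytes], bytes]) -> Message:
    """Carry an inner-channel message as the sealed payload of an outer one."""
    serialized = json.dumps({"sender": inner.sender, "receiver": inner.receiver,
                             "payload": inner.payload.hex()}).encode()
    return Message(outer_sender, outer_receiver, seal(serialized))


def unwrap(outer: Message, unseal: Callable[[bytes], bytes]) -> Message:
    """Recover the inner-channel message at the receiving endpoint."""
    data = json.loads(unseal(outer.payload))
    return Message(data["sender"], data["receiver"], bytes.fromhex(data["payload"]))


def toy_seal(data: bytes) -> bytes:
    return data[::-1]  # NOT encryption: a reversible placeholder only


toy_unseal = toy_seal  # reversing twice restores the original bytes

inner = Message("A1", "B1", b"private hello")
outer = wrap(inner, "A0", "B0", toy_seal)
print(outer.sender, outer.receiver)  # A0 B0: the outer VIDs, visible on the network
assert unwrap(outer, toy_unseal) == inner
```

Replacing A₀ and B₀ with new publicly observable VIDs would change only the `outer_sender` and `outer_receiver` arguments; the inner message, and therefore A₁ and B₁, stays the same.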

#4: Routing Via Intermediaries (Routing Channels)

To provide stronger correlation privacy—i.e., to prevent public observation that A and B are communicating at all—it is necessary to tunnel one layer deeper using intermediaries. This triple layering is shown in Figure 2.


Figure 2: Adding a routing layer tunneled via intermediaries for stronger correlation privacy

In this configuration, A and B each first establish an outer channel over a direct connection with their respective intermediaries α and β (shown in red, between VIDs A₀ and α₀ and between B₀ and β₀). Then A and B tunnel another ESSR protocol (shown in blue) inside the outer channel to send ToIP routing messages. This routing channel gives the intermediaries α and β just enough information to pass the messages to the next hop on the path. The full path is never revealed to any single intermediary; all α and β know is the next destination.

The inner channel (orange) is layered within the routing channel (blue). As with direct connections, this is the fully private interaction channel between A₁ and B₁. Neither α nor β knows about A₁ and B₁, and neither can see any of the communications on this inner channel. To use a physics analogy, this inner channel has two layers of “insulation” that prevent the heat (private information) from leaking out into the environment (third parties observing the outer channel).

This routing architecture is not limited to two intermediaries; it can scale to more as needed. However, more than two intermediaries are not required for this level of correlation privacy; adding more simply increases the number of intermediaries that would need to be compromised to learn the full path.
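One way to give each intermediary only its next hop is onion-style layering, sketched below in Python. This is an assumption about how the property could be realized, not the trust spanning protocol’s actual routing format: the sender seals one layer per hop, each intermediary opens only its own layer, and `toy_seal`/`toy_unseal` are insecure addressed envelopes standing in for ESSR encryption to each hop’s VID.

```python
import json
from typing import Callable

Seal = Callable[[str, bytes], bytes]    # (hop VID, data) -> data only that hop can open
Unseal = Callable[[str, bytes], bytes]  # (hop VID, data) -> opened data


def build_route(path: list[str], payload: bytes, seal: Seal) -> bytes:
    """Wrap payload in one sealed layer per hop; send the result to path[0].

    path lists every hop after the sender, ending with the final receiver,
    e.g. ["alpha", "beta", "B"].
    """
    blob = seal(path[-1], payload)  # innermost layer: only the receiver opens it
    # Work backwards so each intermediary's layer names only the next hop.
    for hop, next_hop in zip(reversed(path[:-1]), reversed(path[1:])):
        layer = json.dumps({"next": next_hop, "blob": blob.hex()}).encode()
        blob = seal(hop, layer)
    return blob


def forward(hop: str, blob: bytes, unseal: Unseal) -> tuple[str, bytes]:
    """One intermediary's job: open its own layer and learn just the next hop."""
    layer = json.loads(unseal(hop, blob))
    return layer["next"], bytes.fromhex(layer["blob"])


# INSECURE toy envelopes: they check the addressee but hide nothing.
def toy_seal(hop: str, data: bytes) -> bytes:
    return json.dumps({"to": hop, "data": data.hex()}).encode()


def toy_unseal(hop: str, blob: bytes) -> bytes:
    envelope = json.loads(blob)
    if envelope["to"] != hop:
        raise ValueError(f"layer is not addressed to {hop}")
    return bytes.fromhex(envelope["data"])


blob = build_route(["alpha", "beta", "B"], b"inner-channel message", toy_seal)
next_hop, blob = forward("alpha", blob, toy_unseal)  # alpha learns only "beta"
next_hop, blob = forward("beta", blob, toy_unseal)   # beta learns only "B"
print(next_hop, toy_unseal("B", blob))  # B b'inner-channel message'
```

Adding a third intermediary only lengthens `path`; each hop still sees a single `next` value, so more intermediaries raise the number that must be compromised to reconstruct the full path without changing what any one of them learns.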

#5: Relationship Context Channels

The double or triple layering of ToIP channels described in Pillars #3 and #4 satisfies the ToIP architecture requirements for authenticity, confidentiality, and correlation privacy. Whether it operates over a direct connection or via intermediaries, the inner channel instantiates a relationship root context connecting the two endpoints.

Now, what if two endpoints A and B need to communicate over other inner channels representing different relationship contexts? For example, what if they need to:

Move to a higher level of identity assurance, e.g., for high-value transactions instead of low-value transactions?
Step up to perfect forward secrecy or some other cryptographic algorithm?
Stream data instead of sending individual messages?
Instantiate a multi-party channel incorporating other parties (such as using the IETF Message Layer Security protocol)?
Instantiate new publicly-observable VIDs to increase correlation privacy?
Instantiate new private interaction VIDs to prevent privacy leakage over time?
Invoke specific higher-level (ToIP Layer 3) trust task protocols (see Pillar #7)?

In these cases, A and B need to be able to use the default inner channel as a “control channel” to establish other inner channels. Each new inner channel needs: a) a new pair of VIDs to establish a new relationship context, and b) a new set of parameters specific to that relationship context.

This can be accomplished using a simple control channel protocol that enables A and B to establish and manage as many relationship context channels as they need over the lifetime of their relationship. This is illustrated in Figure 3 (shown over a direct connection for simplicity). The control channel is shown in orange and each new relationship context channel is shown in black.


Figure 3: Using the default inner channel as a control channel to establish new independent relationship context channels between A and B
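The control channel exchange can be sketched as a two-message handshake carried over the default inner channel: one side proposes a new relationship context with its own fresh VID and the parameters, and the other answers with a fresh VID of its own. In this minimal Python sketch, the message types (`open-context`, `context-accepted`), the `Endpoint` class, and `new_vid` are hypothetical; a real control channel protocol would also create keys for each new VID and protect every message with ESSR.

```python
import secrets
from dataclasses import dataclass, field


def new_vid() -> str:
    # Hypothetical: a fresh, unlinkable VID (a real system would also create keys).
    return f"did:example:{secrets.token_hex(8)}"


@dataclass
class Endpoint:
    root_vid: str  # this endpoint's VID on the default inner (control) channel
    contexts: dict[str, dict] = field(default_factory=dict)  # local VID -> context

    def request_context(self, params: dict) -> dict:
        """Build an 'open' request to send over the control channel."""
        vid = new_vid()  # (a) a new VID for the new relationship context
        self.contexts[vid] = {"params": params, "remote_vid": None}
        return {"type": "open-context", "requester_vid": vid, "params": params}

    def accept_context(self, request: dict) -> dict:
        """Answer an 'open' request with a fresh VID of our own."""
        vid = new_vid()
        # (b) both sides record the same context-specific parameters.
        self.contexts[vid] = {"params": request["params"],
                              "remote_vid": request["requester_vid"]}
        return {"type": "context-accepted",
                "requester_vid": request["requester_vid"], "responder_vid": vid}

    def complete_context(self, reply: dict) -> None:
        self.contexts[reply["requester_vid"]]["remote_vid"] = reply["responder_vid"]


alice, bob = Endpoint("did:example:a1"), Endpoint("did:example:b1")
request = alice.request_context({"assurance": "high", "mode": "stream"})
reply = bob.accept_context(request)  # sent back over the control channel
alice.complete_context(reply)
```

Each call to `request_context` yields an independent channel, so A and B can open as many relationship contexts as they need without ever changing their root VIDs.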

Note that these new inner channels can provide the same features as the sessions used in other protocols. However, because they are based on VIDs, they have the following advantages:

They can operate independently from any session management defined by underlying transport protocols.
They can persist as long as both parties need them (not just during one communications session).
They can establish fully “cryptographic sessions” that include all the security and privacy protections of the ToIP trust spanning layer.

#6: Text and Binary Encoding

Every wire protocol that moves data from one endpoint to another must define its encoding(s). Encoding choices are even more critical (and often contentious) with security protocols because: a) they are replete with heavyweight cryptographic data structures, and b) they impose very strict rules governing message signing, encryption, decryption, and signature verification.

To avoid as much contention as possible, we propose to use Composable Event Streaming Representation (CESR) as the encoding format for all ToIP messages. The CESR specification is one of the deliverables of the ToIP ACDC Task Force. To quote from the introduction:

The Composable Event Streaming Representation (CESR) is a dual text-binary encoding format that has the unique property of text-binary concatenation composability. This composability property enables the round trip conversion en-masse of concatenated primitives between the text domain and binary domain while maintaining the separability of individual primitives. This enables convenient usability in the text domain and compact transmission in the binary domain.
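The composability property rests on 24-bit alignment: URL-safe Base64 maps every 3 bytes to exactly 4 characters, so primitives whose binary length is a multiple of 3 bytes convert independently and can be concatenated in either domain. The Python sketch below demonstrates only that alignment idea using the standard `base64` module; real CESR primitives also carry derivation codes and pad bytes, which are omitted here, so this is not a CESR implementation.

```python
import base64


def to_text(primitive: bytes) -> str:
    """Text-domain form of one primitive; requires 24-bit (3-byte) alignment."""
    if len(primitive) % 3:
        raise ValueError("primitive must be a multiple of 3 bytes")
    return base64.urlsafe_b64encode(primitive).decode()  # aligned, so no '=' padding


def to_binary(text: str) -> bytes:
    return base64.urlsafe_b64decode(text)


p1 = bytes([0x01, 0x02, 0x03])  # 3 bytes -> 4 characters
p2 = bytes(range(10, 16))       # 6 bytes -> 8 characters

# Converting the concatenation en masse equals concatenating the conversions,
# so individual primitives stay separable in both domains.
assert to_text(p1 + p2) == to_text(p1) + to_text(p2)
assert to_binary(to_text(p1) + to_text(p2)) == p1 + p2

# Without alignment, Base64 padding breaks concatenation composability.
assert (base64.urlsafe_b64encode(b"\x01\x02") + base64.urlsafe_b64encode(b"\x03")
        != base64.urlsafe_b64encode(b"\x01\x02\x03"))
```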

Other key benefits of CESR include:

Format-agnostic. CESR streams support interleaved JSON, CBOR, and MGPK serializations.
Optimized for cryptography. Popular cryptographic material suites have compact encodings for efficiency while less compact encodings provide sufficient extensibility to support all foreseeable types.
Self-framing. CESR supports self-framing group codes that enable stream processing and pipelining in both the text and binary domains.

For a complete overview of the benefits of CESR, see the IIW presentation CESR for First Year Wizards.

Note: CESR V1.1 will include a few minor pipelining improvements based on production experience with CESR V1.0.

#7: Trust Task Protocol Framework

The final pillar is the capability for architects and developers to design, define, and build higher-level ToIP Layer 3 trust task protocols that operate over the Layer 2 relationship context channels described in Pillar #5. This trust task protocol framework should be modeled using the same protocol definition patterns established by other VID-based protocols such as DIDComm, KERI, and Decentralized Web Nodes. These definitions typically follow a two-level model:

Protocol definitions specify the protocol ID (or VID), human-readable name, version, and other identifying metadata for the protocol. They also define its overall relationship context channel parameters and a catalog of its specified packets/messages.
Packet/message definitions specify the required and optional contents of each packet or message defined in the protocol.
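The two-level model maps naturally onto two data structures. The Python sketch below is illustrative only: the class and field names, the sample protocol ID URL, and the `validate` rule (flag missing required fields and fields the definition does not mention) are assumptions, not definitions from the ToIP, DIDComm, KERI, or Decentralized Web Nodes specifications.

```python
from dataclasses import dataclass, field


@dataclass
class MessageDefinition:
    """Level 2: the required and optional contents of one message."""
    name: str
    required: set[str]
    optional: set[str] = field(default_factory=set)

    def validate(self, message: dict) -> list[str]:
        """Return a list of problems; an empty list means the message conforms."""
        present = set(message)
        problems = [f"missing field: {f}" for f in sorted(self.required - present)]
        # One possible policy: treat fields the definition does not mention as errors.
        unknown = present - self.required - self.optional
        return problems + [f"unknown field: {f}" for f in sorted(unknown)]


@dataclass
class ProtocolDefinition:
    """Level 1: identifying metadata, channel parameters, and a message catalog."""
    protocol_id: str  # protocol ID (or VID)
    name: str         # human-readable name
    version: str
    channel_params: dict[str, str]
    messages: dict[str, MessageDefinition]


ping_protocol = ProtocolDefinition(
    protocol_id="https://example.org/protocols/trust-ping",  # hypothetical ID
    name="Trust Ping",
    version="1.0",
    channel_params={"confidentiality": "required"},
    messages={
        "ping": MessageDefinition("ping", required={"id"}, optional={"comment"}),
        "pong": MessageDefinition("pong", required={"id", "thread_id"}),
    },
)
print(ping_protocol.messages["ping"].validate({"comment": "hi", "extra": 1}))
# ['missing field: id', 'unknown field: extra']
```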

A complete trust task protocol specification based on this framework will also include roles, states, request/response business logic, co-protocols, error messages, etc. See section 9 of the DIDComm V2 specification for a detailed example.

The ToIP trust task protocol framework should also define a small set of standard utility trust tasks such as:

Discovery. Rather than broadly defining how two parties discover each other’s VIDs, this trust task will narrowly define how two ToIP endpoints that have established a connection can discover which trust tasks each endpoint supports (see examples below).
Error handling. Because trust tasks are richly extensible and may be combined into complex workflows, it is very useful to have a well-known trust task for handling error messages.
Trust ping. This simple, universal trust task for testing a ToIP connection is the ToIP equivalent of the ping network utility for testing an IP connection.
Logging/auditing. This is a standard trust task for logging ToIP interactions for debugging or auditing purposes.
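Because every endpoint would support the same small set of utility trust tasks, they fit naturally into a dispatcher that any Layer 3 protocol can plug into. The Python sketch below covers discovery, error handling, trust ping, and logging in one class; the class name, the message type strings, and the reply shapes are all hypothetical and are not taken from any ToIP specification.

```python
from typing import Callable

Handler = Callable[[dict], dict]


class TrustTaskRouter:
    """Hypothetical dispatcher for trust task messages on a ToIP channel."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}
        self.log: list[tuple[str, str]] = []  # logging/auditing trail
        self.register("discover", self._discover)
        self.register("ping", self._ping)

    def register(self, task: str, handler: Handler) -> None:
        self._handlers[task] = handler

    def _discover(self, msg: dict) -> dict:
        # Discovery: tell the peer which trust tasks this endpoint supports.
        return {"type": "disclose", "tasks": sorted(self._handlers)}

    def _ping(self, msg: dict) -> dict:
        # Trust ping: reply so the peer knows the connection works end to end.
        return {"type": "pong", "thread_id": msg.get("id")}

    def handle(self, msg: dict) -> dict:
        task = msg.get("type", "")
        handler = self._handlers.get(task)
        if handler is None:
            # Error handling: a well-known error reply instead of silence.
            reply = {"type": "error", "code": "unsupported-task",
                     "detail": f"no handler for {task!r}"}
        else:
            reply = handler(msg)
        self.log.append((task, reply["type"]))  # logging/auditing
        return reply


router = TrustTaskRouter()
router.register("present-credential", lambda msg: {"type": "ack"})
print(router.handle({"type": "discover"})["tasks"])
# ['discover', 'ping', 'present-credential']
```

A new Layer 3 trust task only needs a `register` call; discovery then advertises it automatically, and unknown requests get a uniform error reply.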

The whole reason for the hourglass design of the ToIP technology stack is for the Layer 2 trust spanning protocol to support any number of trust task protocols at Layer 3. With authenticity, confidentiality, and correlation privacy all handled by the trust spanning layer, it becomes far easier for infrastructure architects and developers to design reusable trust tasks. This in turn makes it much easier for application designers and developers to build trusted applications by calling a series of trust tasks to accomplish their workflows.

The current market emphasis on digital identity wallets (such as the EU eIDAS 2.0 initiative) can give the impression that the only important trust tasks are the issuance, presentation, and revocation of verifiable digital credentials. These are in fact only a small subset of the universe of trust tasks that can be enabled by the ToIP stack. Some other common examples discussed by the TSPTF include:

Stepping up authentication by using a ToIP connection to request local biometric authentication and/or liveness detection of a user.
Signing digital documents with strong, permanently verifiable digital signatures that do not require centralized service providers (a required feature of eIDAS 2.0).
Querying a trust registry to determine the authorized participants in a digital trust ecosystem (a trust task protocol under development by the ToIP Trust Registry Task Force).
Exchanging electronic business cards for peer-to-peer communications and relationship management.
Secure messaging using synchronous chat or asynchronous mail.
Transmitting digital payments via any mutually-agreed means: credit card, debit card, direct-to-bank, fiat currency, cryptocurrency, loyalty points, etc.
Issuing e-receipts for any type of digital payment or value exchange.
Buying a digital ticket for an event and presenting it for admission.
Bidding in a digital auction, including reconciliation and settlement of a winning bid.
Issuing a digital purchase order, sending an invoice, and remitting payment.
Orchestrating data supply chains using “product passports” that can be the digital twin of physical or digital products.

Next Steps

The TSPTF is now moving into the Working Drafts stage. After a break during August, we are resuming our two weekly meetings every Wednesday (8AM PT for NA/EU time zones and 6PM PT for APAC time zones) starting on 06 September (the full schedule of all ToIP meetings is published on the ToIP Calendar). Our goal is to work through enough open issues to be ready to present a complete “IIW Preview Draft” at the next Internet Identity Workshop, 10-12 October in Mountain View, California. If you are interested in learning more or engaging in this work, please feel free to contact ToIP Executive Director Judith Fleenor.

Mid-Year Progress Report on the ToIP Trust Spanning Protocol