13/WAKU2-STORE
- Status: draft
- Editor: Simon-Pierre Vivier <simvivier@status.im>
- Contributors:
  - Dean Eigenmann <dean@status.im>
  - Oskar Thorén <oskarth@titanproxy.com>
  - Aaryamann Challani <p1ge0nh8er@proton.me>
  - Sanaz Taheri <sanaz@status.im>
  - Hanno Cornelius <hanno@status.im>
Abstract
This specification explains the 13/WAKU2-STORE protocol,
which enables querying of messages received through the relay protocol and
stored by other nodes.
It also supports pagination for more efficient querying of historical messages.
Protocol identifier: /vac/waku/store/2.0.0-beta4
Terminology
The term PII (Personally Identifiable Information) refers to any piece of data that can be used to uniquely identify a user. For example, a signature verification key and the hash of one's static IP address are unique to each user and hence count as PII.
Design Requirements
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC2119.
Nodes willing to provide the storage service using the 13/WAKU2-STORE protocol
SHOULD provide a complete and full view of message history.
As such, they are required to be highly available and,
specifically, to have a high uptime so that they consistently receive and store network messages.
The high uptime requirement ensures that no message is missed, so that
a complete and intact view of the message history
is delivered to the querying nodes.
Nevertheless, in case storage provider nodes cannot afford high availability,
the querying nodes may retrieve the historical messages from multiple sources
to achieve a full and intact view of the past.
The concept of ephemeral messages introduced in 14/WAKU2-MESSAGE
affects 13/WAKU2-STORE as well.
Nodes running 13/WAKU2-STORE SHOULD support ephemeral messages
as specified in 14/WAKU2-MESSAGE.
Nodes running 13/WAKU2-STORE SHOULD NOT store messages
with the ephemeral flag set to true.
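The ephemeral rule above can be sketched as a guard in the path that persists relayed messages. This is a minimal illustration only: the WakuMessage dataclass is a simplified stand-in for the 14/WAKU2-MESSAGE type, and MessageStore and on_relay_message are hypothetical names that do not appear in this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakuMessage:
    # Simplified stand-in for the 14/WAKU2-MESSAGE type (assumption).
    payload: bytes
    content_topic: str
    timestamp: int = 0
    ephemeral: bool = False


class MessageStore:
    """Hypothetical in-memory store; not part of the specification."""

    def __init__(self):
        self.messages = []

    def on_relay_message(self, msg: WakuMessage) -> bool:
        """Persist a relayed message unless it is ephemeral.

        Returns True if the message was stored, False if it was skipped.
        """
        if msg.ephemeral:
            # Store nodes SHOULD NOT persist ephemeral messages.
            return False
        self.messages.append(msg)
        return True
```

With this guard, an ephemeral message is still relayed by the node as usual but never becomes part of the history served to querying nodes.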
Adversarial Model
Any peer running the 13/WAKU2-STORE protocol,
i.e. both the querying node and the queried node, is considered an adversary.
Furthermore,
we currently consider the adversary as a passive entity
that attempts to collect information from other peers to conduct an attack but
it does so without violating protocol definitions and instructions.
As we evolve the protocol,
further adversarial models will be considered.
For example, under the passive adversarial model,
no malicious node hides or
lies about the history of messages
as it is against the description of the 13/WAKU2-STORE
protocol.
The following are not considered as part of the adversarial model:
- An adversary with a global view of all the peers and their connections.
- An adversary that can eavesdrop on communication links between arbitrary pairs of peers (unless the adversary is one end of the communication). Specifically, the communication channels are assumed to be secure.
Wire Specification
Peers communicate with each other using a request / response API. The messages sent are Protobuf RPC messages which are implemented using protocol buffers v3. The following are the specifications of the Protobuf messages.
Payloads
syntax = "proto3";

message Index {
  bytes digest = 1;
  sint64 receiverTime = 2;
  sint64 senderTime = 3;
  string pubsubTopic = 4;
}

message PagingInfo {
  uint64 pageSize = 1;
  Index cursor = 2;
  enum Direction {
    BACKWARD = 0;
    FORWARD = 1;
  }
  Direction direction = 3;
}

message ContentFilter {
  string contentTopic = 1;
}

message HistoryQuery {
  // the first field is reserved for future use
  string pubsubtopic = 2;
  repeated ContentFilter contentFilters = 3;
  PagingInfo pagingInfo = 4;
}

message HistoryResponse {
  // the first field is reserved for future use
  repeated WakuMessage messages = 2;
  PagingInfo pagingInfo = 3;
  enum Error {
    NONE = 0;
    INVALID_CURSOR = 1;
  }
  Error error = 4;
}

message HistoryRPC {
  string request_id = 1;
  HistoryQuery query = 2;
  HistoryResponse response = 3;
}
Index
To perform pagination, each WakuMessage stored at a node running the 13/WAKU2-STORE protocol
is associated with a unique Index that encapsulates the following parts.
- digest: a sequence of bytes representing the SHA256 hash of a WakuMessage. The hash is computed over the concatenation of the contentTopic and payload fields of a WakuMessage (see 14/WAKU2-MESSAGE).
- receiverTime: the UNIX time in nanoseconds at which the WakuMessage is received by the receiving node.
- senderTime: the UNIX time in nanoseconds at which the WakuMessage is generated by its sender.
- pubsubTopic: the pubsub topic on which the WakuMessage is received.
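As a rough illustration of how an Index might be built, the sketch below hashes contentTopic followed by payload with SHA256. The UTF-8 encoding of contentTopic is an assumption, since this section does not state how the string is serialized before hashing. The names compute_digest and make_index are hypothetical, and the topic strings in the usage note are example values only.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    # Mirrors the fields of the Index protobuf message.
    digest: bytes
    receiver_time: int  # UNIX time in nanoseconds, set by the store node
    sender_time: int    # UNIX time in nanoseconds, set by the sender
    pubsub_topic: str


def compute_digest(content_topic: str, payload: bytes) -> bytes:
    """SHA256 over contentTopic || payload (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(content_topic.encode("utf-8") + payload).digest()


def make_index(content_topic: str, payload: bytes,
               sender_time: int, pubsub_topic: str) -> Index:
    """Build the Index for a message at the moment the node receives it."""
    return Index(
        digest=compute_digest(content_topic, payload),
        receiver_time=time.time_ns(),
        sender_time=sender_time,
        pubsub_topic=pubsub_topic,
    )
```

For example, make_index("/app/1/chat/proto", b"hello", 1609459200000000000, "/waku/2/default-waku/proto") yields an Index with a 32-byte digest. The digest depends only on the content topic and the payload, so two nodes that receive the same message compute the same digest even though their receiverTime values differ.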
PagingInfo
PagingInfo holds the information required for pagination.
It consists of the following components.
- pageSize: a positive integer indicating the number of queried WakuMessages in a HistoryQuery (or retrieved WakuMessages in a HistoryResponse).
- cursor: holds the Index of a WakuMessage.
- direction: indicates the direction of paging, which can be either FORWARD or BACKWARD.
ContentFilter
ContentFilter carries the information required for filtering historical messages.
- contentTopic: represents the content topic of the queried historical WakuMessage. This field maps to the contentTopic field of the 14/WAKU2-MESSAGE.
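A queried node could apply the topic filters of a query with a predicate like the one below. It follows the HistoryQuery rule that an empty pubsub topic or an empty list of content filters means "no filter". Treating several content filters as alternatives (a message matches if its content topic equals any of them) is an assumption, and StoredMessage and matches are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StoredMessage:
    # Hypothetical record kept by a store node for each WakuMessage.
    pubsub_topic: str
    content_topic: str
    payload: bytes


def matches(msg: StoredMessage, pubsub_topic: str,
            content_topics: Sequence[str]) -> bool:
    """Return True if msg passes the pubsub and content topic filters."""
    # An empty pubsub topic means no filtering on the pubsub topic.
    if pubsub_topic and msg.pubsub_topic != pubsub_topic:
        return False
    # An empty filter list means no filtering on the content topic.
    # Multiple filters are OR-ed together (assumption).
    if content_topics and msg.content_topic not in content_topics:
        return False
    return True
```

A node would typically apply this predicate to its stored messages before sorting and paginating the result.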
HistoryQuery
RPC call to query historical messages.
- The pubsubTopic field MUST indicate the pubsub topic of the historical messages to be retrieved. This field denotes the pubsub topic on which WakuMessages are published. This field maps to the topicIDs field of Message in 11/WAKU2-RELAY. Leaving this field empty means no filter on the pubsub topic of message history is requested. This field SHOULD be left empty in order to retrieve the historical WakuMessages regardless of the pubsub topics on which they are published.
- The contentFilters field MUST indicate the list of content filters based on which the historical messages are to be retrieved. Leaving this field empty means no filter on the content topic of message history is required. This field SHOULD be left empty in order to retrieve historical WakuMessages regardless of their content topics.
- PagingInfo holds the information required for pagination. Its pageSize field indicates the number of WakuMessages to be included in the corresponding HistoryResponse. It is RECOMMENDED that the queried node defines a maximum page size internally. If the querying node leaves the pageSize unspecified, or if the pageSize exceeds the maximum page size, the queried node SHOULD auto-paginate the HistoryResponse to no more than the configured maximum page size. This allows mitigation of long response time for HistoryQuery. In the forward pagination request, the messages field of the HistoryResponse SHALL contain, at maximum, pageSize WakuMessages whose Index values are larger than the given cursor (and vice versa for the backward pagination). Note that the cursor of a HistoryQuery MAY be empty (e.g., for the initial query); in that case, depending on whether the direction is BACKWARD or FORWARD, the last or the first pageSize WakuMessages SHALL be returned, respectively.
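The page selection rules above can be sketched as follows. Each stored message is represented only by its sort key, modelled here as a (senderTime, digest) tuple, and the list is assumed to be already sorted in ascending order. The name select_page, the default maximum page size of 100, and the treatment of a pageSize of 0 as "unspecified" (the proto3 default for an unset uint64) are assumptions of this sketch. Detection of an INVALID_CURSOR is left out.

```python
from enum import Enum
from typing import List, Optional, Tuple

Key = Tuple[int, bytes]  # (senderTime, digest), assumed sort key


class Direction(Enum):
    BACKWARD = 0
    FORWARD = 1


def select_page(sorted_keys: List[Key], page_size: int,
                cursor: Optional[Key], direction: Direction,
                max_page_size: int = 100) -> List[Key]:
    """Pick one page from an ascending list of message keys."""
    # Unspecified (0) or oversized page sizes fall back to the node maximum.
    if page_size == 0 or page_size > max_page_size:
        page_size = max_page_size

    if direction is Direction.FORWARD:
        # Messages strictly after the cursor, oldest first.
        after = sorted_keys if cursor is None else [k for k in sorted_keys if k > cursor]
        return after[:page_size]

    # BACKWARD: messages strictly before the cursor, keeping the newest
    # page_size of them; the page itself stays in ascending order.
    before = sorted_keys if cursor is None else [k for k in sorted_keys if k < cursor]
    return before[-page_size:]
```

With five stored messages and a page size of 2, an empty cursor returns the first two messages in the FORWARD direction and the last two in the BACKWARD direction, matching the rule for the initial query.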
Sorting Messages
The queried node MUST sort the WakuMessages based on their Index,
where the senderTime constitutes the most significant part and
the digest comes next, and
then perform pagination on the sorted result.
As such, the retrieved page contains an ordered list of WakuMessages
from the oldest message to the most recent one.
Alternatively, the receiverTime (instead of senderTime)
MAY be used to sort messages during the paging process.
However, the use of senderTime is RECOMMENDED for sorting
as it is invariant and consistent across all the nodes.
This has the benefit of cursor reusability, i.e.,
a cursor obtained from one node can be consistently used
to query from another node.
However, this cursor reusability does not hold when the receiverTime is utilized,
as the receiver time is affected by the network delay and
nodes' clock asynchrony.
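The ordering described in this section amounts to a two-part sort key. The sketch below is illustrative: the Index dataclass mirrors the protobuf message, while sort_key and sort_indices are hypothetical helper names. Comparing digests as raw byte strings is an assumption, since the text only says the digest "comes next".

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Index:
    digest: bytes
    receiver_time: int
    sender_time: int
    pubsub_topic: str


def sort_key(index: Index) -> Tuple[int, bytes]:
    """senderTime is the most significant part, the digest breaks ties."""
    return (index.sender_time, index.digest)


def sort_indices(indices: Iterable[Index]) -> List[Index]:
    """Return the indices from oldest to most recent."""
    return sorted(indices, key=sort_key)
```

Because neither senderTime nor digest depends on the node that stored the message, two nodes holding the same messages produce the same order with this key, which is what makes a cursor meaningful across nodes. A key based on receiver_time would not have this property.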
HistoryResponse
RPC call to respond to a HistoryQuery call.
- The messages field MUST contain the messages found; these are 14/WAKU2-MESSAGE types.
- PagingInfo holds the paging information based on which the querying node can resume its further history queries. The pageSize indicates the number of returned Waku messages (i.e., the number of messages included in the messages field of HistoryResponse). The direction is the same direction as in the corresponding HistoryQuery. In the forward pagination, the cursor holds the Index of the last message in the HistoryResponse messages (and the first message in the backward paging). Regardless of the paging direction, the retrieved messages are always sorted in ascending order based on their timestamp as explained in the Sorting Messages section, that is, from the oldest to the most recent. The requester SHALL embed the returned cursor inside its next HistoryQuery to retrieve the next page of the 14/WAKU2-MESSAGE. The cursor obtained from one node SHOULD NOT be used in a request to another node because the result may be different.
- The error field contains information about any error that has occurred while processing the corresponding HistoryQuery. NONE stands for no error. This is also the default value. INVALID_CURSOR means that the cursor field of HistoryQuery does not match the Index of any of the WakuMessages persisted by the queried node.
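From the querying node's side, retrieving a full history in the forward direction could look like the loop below, which feeds each returned cursor into the next query. The transport is abstracted as a send_query callable, and make_fake_node is a hypothetical in-memory stand-in for a store node used only to exercise the loop. Stopping when an empty page is returned is an assumption, as this section does not define an explicit end-of-history signal.

```python
from typing import Callable, List, Optional, Tuple

Cursor = Tuple[int, bytes]                       # simplified Index
Page = Tuple[List[str], Optional[Cursor], str]   # (messages, cursor, error)


def fetch_history(send_query: Callable[[Optional[Cursor], int], Page],
                  page_size: int) -> List[str]:
    """Page forward through a node's history, oldest message first."""
    history: List[str] = []
    cursor: Optional[Cursor] = None  # empty cursor for the initial query
    while True:
        messages, next_cursor, error = send_query(cursor, page_size)
        if error != "NONE":
            raise RuntimeError(f"store query failed: {error}")
        if not messages:
            break  # assumed end-of-history condition
        history.extend(messages)
        cursor = next_cursor  # embed the returned cursor in the next query
    return history


def make_fake_node(stored: List[Tuple[Cursor, str]]):
    """Hypothetical store node holding (index, message) pairs in sorted order."""
    def send_query(cursor: Optional[Cursor], page_size: int) -> Page:
        keys = [k for k, _ in stored]
        if cursor is not None and cursor not in keys:
            return [], None, "INVALID_CURSOR"
        start = 0 if cursor is None else keys.index(cursor) + 1
        page = stored[start:start + page_size]
        # Forward paging: the cursor is the Index of the last returned message.
        next_cursor = page[-1][0] if page else cursor
        return [m for _, m in page], next_cursor, "NONE"
    return send_query
```

Because a cursor from one node SHOULD NOT be reused against another node, the loop keeps all queries of one traversal on the same send_query target.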
Security Consideration
The main security consideration to take into account while using this protocol is that a querying node has to reveal its content filters of interest to the queried node, hence potentially compromising its privacy.
Future Work
- Anonymous query: This feature guarantees that nodes can anonymously query historical messages from other nodes, i.e., without disclosing the exact topics of 14/WAKU2-MESSAGE they are interested in. As such, no adversary in the 13/WAKU2-STORE protocol would be able to learn which peer is interested in which content filters, i.e., content topics of 14/WAKU2-MESSAGE. The current version of the 13/WAKU2-STORE protocol does not provide anonymity for historical queries, as the querying node needs to directly connect to another node in the 13/WAKU2-STORE protocol and explicitly disclose the content filters of its interest to retrieve the corresponding messages. However, one can consider preserving anonymity through one of the following ways:
  - By hiding the source of the request, i.e., anonymous communication. That is, the querying node shall hide all its PII in its history request, e.g., its IP address. This can happen by the utilization of a proxy server or by using Tor. Note that the current structure of historical requests does not embody any piece of PII; otherwise, such data fields must be treated carefully to achieve query anonymity.
  - By deploying secure 2-party computations in which the querying node obtains the historical messages of a certain topic while the queried node learns nothing about the query. Examples of such 2PC protocols are secure one-way Private Set Intersections (PSI).
- Robust and verifiable timestamps: A message timestamp is a way to show that the message existed prior to some point in time. However, the lack of timestamp verifiability can create room for a range of attacks, including injecting messages with invalid timestamps pointing to the far future.
  To better understand the attack, consider a store node whose current clock shows 2021-01-01 00:00:30 (and assume all the other nodes have synchronized clocks +-20 seconds). The store node already has a list of messages, (m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10,2021-01-01 00:00:20), that are sorted based on their timestamp.
  An attacker sends a message with an arbitrarily large timestamp, e.g., 10 hours ahead of the correct clock: (m',2021-01-01 10:00:30). The store node places m' at the end of the list,
  (m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10,2021-01-01 00:00:20), (m',2021-01-01 10:00:30).
  Now another message arrives with a valid timestamp, e.g., (m11,2021-01-01 00:00:45).
  However, since its timestamp precedes that of the malicious message m', it gets placed before m' in the list, i.e.,
  (m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10,2021-01-01 00:00:20), (m11,2021-01-01 00:00:45), (m',2021-01-01 10:00:30).
  In fact, for the next 10 hours, m' will always be considered as the most recent message and served as the last message to the querying nodes irrespective of how many other messages arrive afterward.
  A robust and verifiable timestamp allows the receiver of a message to verify that a message has been generated prior to the claimed timestamp. One solution is the use of open timestamps, e.g., block height in Blockchain-based timestamps. That is, messages contain the most recent block height perceived by their senders at the time of message generation. This proves accuracy within a range of minutes (e.g., in Bitcoin blockchain) or seconds (e.g., in Ethereum 2.0) from the time of origination.
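The far-future timestamp attack described above can be reproduced with a few lines of code. The sketch keeps a list of (timestamp, name) pairs sorted by the claimed timestamp, as a store node sorting on senderTime would. The insert helper is a hypothetical name, and only the messages m1, m2, m10, m' and m11 from the example are modelled.

```python
from datetime import datetime
from typing import List, Tuple

Entry = Tuple[datetime, str]  # (claimed sender timestamp, message name)


def insert(store: List[Entry], name: str, ts: datetime) -> None:
    """Add a message and keep the store ordered by claimed timestamp."""
    store.append((ts, name))
    store.sort()


store: List[Entry] = []
insert(store, "m1", datetime(2021, 1, 1, 0, 0, 0))
insert(store, "m2", datetime(2021, 1, 1, 0, 0, 1))
insert(store, "m10", datetime(2021, 1, 1, 0, 0, 20))

# The attacker claims a timestamp 10 hours ahead of the store node's clock.
insert(store, "m'", datetime(2021, 1, 1, 10, 0, 30))

# An honest message that arrives later still sorts before m'.
insert(store, "m11", datetime(2021, 1, 1, 0, 0, 45))

order = [name for _, name in store]
latest = order[-1]
```

After these insertions, order is m1, m2, m10, m11, m' and latest is m', so a backward query with an empty cursor keeps returning the attacker's message as the most recent one until honest timestamps pass the claimed time.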
Copyright
Copyright and related rights waived via CC0.