Yes, with one small clarification: the 1TB per dump refers to the head+middle partition of the dataset and includes both the text documents and the quality signals. On top of that, there is another ~700GB for the minhash signatures and 1-1.5TB for the documents in the tail split.
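
If it helps with planning disk space, here is a rough sketch that adds up those figures. It assumes that the minhash and tail numbers are also per dump, like the 1TB figure, and that 1TB = 1000GB. The function names and the `num_dumps` parameter are made up for illustration; plug in however many dumps you actually plan to download.

```python
# Rough storage estimate from the figures quoted above.
# Assumption: all three figures are per dump, and 1 TB = 1000 GB.

HEAD_MIDDLE_TB = 1.0        # head+middle: text documents + quality signals
MINHASH_TB = 0.7            # minhash signatures (~700GB)
TAIL_DOCS_TB = (1.0, 1.5)   # tail-split documents, (low, high) estimate


def per_dump_total_tb(include_minhash=True, include_tail=True):
    """Return a (low, high) size estimate in TB for a single dump."""
    low = high = HEAD_MIDDLE_TB
    if include_minhash:
        low += MINHASH_TB
        high += MINHASH_TB
    if include_tail:
        low += TAIL_DOCS_TB[0]
        high += TAIL_DOCS_TB[1]
    return round(low, 2), round(high, 2)


def total_tb(num_dumps, **kwargs):
    """Scale the per-dump estimate to a hypothetical number of dumps."""
    low, high = per_dump_total_tb(**kwargs)
    return round(low * num_dumps, 2), round(high * num_dumps, 2)


if __name__ == "__main__":
    print(per_dump_total_tb())                       # (2.7, 3.2)
    print(per_dump_total_tb(include_minhash=False,
                            include_tail=False))     # (1.0, 1.0)
```

So under these assumptions a full dump (head+middle, minhash signatures, and tail documents) comes to roughly 2.7-3.2TB, versus about 1TB if you only need head+middle.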