Is replay actually a feature in itself or just a technique implemented via a Source with different handling?
Starting in version 4.3, Cribl Stream supports replaying data that has been exported as Parquet, using either the S3 Collector or the Filesystem Collector. Meanwhile, the Azure Blob Storage and Google Cloud Storage Collectors support ingesting data in Parquet format, but do not support replay.
I am glad to see that Parquet is now supported for S3 replay. Is replay actually a feature in itself or just a technique implemented via a Source with different handling?
How can Azure Blob and Google Cloud Storage collectors support ingesting but not support replay?
Answers
-
If I understand the question correctly, Parquet replay is implemented by the Collector Source.
The Collector just handles the data. GCS and Azure Blob should follow in the next release, normally due this month.
-
Oh, are Azure and GCS only available via notifications? I didn't realize GCS was just Pub/Sub. That makes more sense now; it is just confusing to read.
-
Does this mean I can now take Databricks "CIM" data out of their Gold storage tier and replay it elsewhere?
-
I have to admit that I don't know what format the Databricks "CIM" data is in.
I cannot find an answer online. What kind of files are they?
-
It's just Parquet files.
-
OK, so yes, they should be replayable, no matter what their schemas are.
There are two cases, though, in which Parquet files are not readable (not yet):
if the Parquet file is encrypted: https://github.com/apache/parquet-format/blob/master/Encryption.md
if the Parquet file links to external column data, as defined in the Parquet Thrift file: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L789 (this is a very rarely used feature)
The best thing is to give it a go, and please let us know how it works out.
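
If you want a quick pre-check for the first case before trying a replay, here is a minimal sketch in Python. It relies on the Parquet encryption spec linked above: a regular file starts and ends with the 4-byte magic `PAR1`, while a file written in encrypted-footer mode uses `PARE` instead. The function name `classify_parquet_magic` is just something made up for this example, and the check is only partial: a file encrypted in plaintext-footer mode still carries `PAR1`, and spotting external column data would require reading the footer metadata with a real Parquet library.

```python
import os

PLAINTEXT_MAGIC = b"PAR1"         # regular (or plaintext-footer) Parquet file
ENCRYPTED_FOOTER_MAGIC = b"PARE"  # Parquet file written in encrypted-footer mode


def classify_parquet_magic(path):
    """Classify a file by its leading and trailing 4-byte magic.

    Returns 'plaintext-footer', 'encrypted-footer', or 'not-parquet'.
    Hypothetical helper for illustration; it does not parse the footer.
    """
    # A file shorter than 8 bytes cannot hold both magic markers.
    if os.path.getsize(path) < 8:
        return "not-parquet"

    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)

    if head == PLAINTEXT_MAGIC and tail == PLAINTEXT_MAGIC:
        return "plaintext-footer"
    if head == ENCRYPTED_FOOTER_MAGIC and tail == ENCRYPTED_FOOTER_MAGIC:
        return "encrypted-footer"
    return "not-parquet"
```

A result of `encrypted-footer` means the file falls into the first unreadable case. A result of `plaintext-footer` only tells you the footer itself is readable; individual columns could still be encrypted.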