Author: 望尽天涯 | Source: Internet | 2023-09-06 15:16
My Spark Streaming job is consuming data from Kafka:
KafkaUtils.createStream(jssc, prop.getProperty(Config.ZOOKEEPER_QUORUM),
prop.getProperty(Config.KAFKA_CONSUMER_GROUP), topicMap);
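For reference, `KafkaUtils.createStream` also has an overload that takes a full `kafkaParams` map instead of just the ZooKeeper quorum and group id, which is where consumer settings such as `auto.offset.reset` go. A minimal sketch of building that map, assuming the same values the question reads from `prop` (the class and method names below are illustrative, not from the original code):

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaParamsSketch {
    // Builds the consumer properties for the createStream overload that
    // accepts a Map<String, String> of Kafka 0.8 consumer settings.
    public static Map<String, String> buildKafkaParams(String zkQuorum, String groupId) {
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("zookeeper.connect", zkQuorum); // same value as Config.ZOOKEEPER_QUORUM
        kafkaParams.put("group.id", groupId);           // same value as Config.KAFKA_CONSUMER_GROUP
        // Only applies when the group has NO committed offset; if the group
        // already has one stored in ZooKeeper, the stored offset wins, which
        // is exactly the restart behaviour described in the question.
        kafkaParams.put("auto.offset.reset", "largest");
        return kafkaParams;
    }

    public static void main(String[] args) {
        System.out.println(buildKafkaParams("zk1:2181", "my-consumer-group"));
    }
}
```

The key subtlety is that `auto.offset.reset` decides the starting position only when the consumer group has no committed offset; it does not override an existing one.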
Whenever I restart my job, it starts consuming from the last stored offset (I am assuming this because it takes a long time to send the processed data, and if I change the consumer group it works instantly with new messages).
I am on Kafka 8.1.1, where auto.offset.reset defaults to largest, which means that whenever I restart, Kafka will send data from where I left off.
My use case requires me to ignore this backlog and process only newly arriving data. How can I achieve this? Any suggestions?
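The question already observes that switching the consumer group "works instantly with new messages": a fresh group has no committed offsets, so with auto.offset.reset=largest the job begins at the latest messages. One pragmatic sketch of that idea is deriving a unique group id per run; the helper name and timestamp suffix are illustrative, not from the original code:

```java
public class FreshGroupId {
    // Appends a per-run suffix so the group has no stored offsets in
    // ZooKeeper; combined with auto.offset.reset=largest, the job then
    // reads only data arriving after startup.
    // Trade-off: abandoning the old group's offsets also forfeits
    // at-least-once recovery of unprocessed data after a crash.
    public static String freshGroupId(String baseGroup) {
        return baseGroup + "-" + System.currentTimeMillis();
    }

    public static void main(String[] args) {
        System.out.println(freshGroupId("my-consumer-group"));
    }
}
```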
1 Solution