Unable to specify group ID #35
Comments
Hmm, I've run into that in the past, but thought it was resolved. At some point, Spark jobs used their own unique consumer groups to consume messages, but I haven't been able to find much documentation on whether that's still the case. Could you try running without specifying the consumer group at all and let us know what happens?
It results in an error:
What's the right version of the Maven coordinate to use? I can confirm that it works on Databricks but not locally.
I'm encountering the same error.
The Spark Kafka connector was overhauled in Spark v2.4.0, so any version from 2.4.0 onward should (theoretically) work. If it's working on Databricks, it's worth checking the Databricks Runtime you're using and trying the version of Spark it includes. I also found this configuration detail under kafka.group.id that I hadn't seen before. Maybe that'll help.
Closing for inactivity.
It looks like specifying the group id is intentionally disabled; the idea is for each query to use its own group id so that it doesn't interfere with other queries. The official Spark docs make this explicit: group.id cannot be set, and that restriction is intentional. Running without a group.id leads the consumer to generate its own group id for the job, which looks something like "spark-kafka-relation-25b94d8e-8ac8-4a1f-98cd-6356fa733983-driver-0" and is promptly rejected by the bootstrap servers as being an invalid groupId. On Spark's side this seems like reasonable behavior. However, it means the Kafka consumer shipped with Spark (even 2.4+) pretty much doesn't work with Event Hubs for Kafka, since you can't start the consumer either with or without a group.id.
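To make the constraint concrete, here is a minimal sketch of the Kafka options a Spark 2.4 Structured Streaming job would pass when reading from an Event Hubs Kafka endpoint. The namespace, topic, and connection string are placeholders (not from this thread); the key point is that the option map deliberately omits group.id, since Spark 2.4 rejects it and generates its own.

```scala
// Sketch only: namespace, topic, and connection string below are hypothetical.
// Spark 2.4 rejects a user-supplied "group.id" and auto-generates one, so the
// option map deliberately omits it.
val connectionString = "Endpoint=sb://mynamespace.servicebus.windows.net/;..." // placeholder

val kafkaOptions: Map[String, String] = Map(
  "kafka.bootstrap.servers" -> "mynamespace.servicebus.windows.net:9093",
  "kafka.security.protocol" -> "SASL_SSL",
  "kafka.sasl.mechanism"    -> "PLAIN",
  "kafka.sasl.jaas.config"  ->
    ("org.apache.kafka.common.security.plain.PlainLoginModule required " +
     "username=\"$ConnectionString\" password=\"" + connectionString + "\";"),
  "subscribe"               -> "my-topic"
)

// In a real job the map would be applied as:
//   spark.readStream.format("kafka").options(kafkaOptions).load()
println(kafkaOptions.contains("group.id")) // false
```

Event Hubs for Kafka authenticates with SASL PLAIN using the literal username `$ConnectionString` and the namespace connection string as the password.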
This is not the case for Event Hubs for Kafka, @ManasGeorge: it handles random/non-existent consumer groups the same way vanilla Kafka does (by automatically creating the consumer group). The Databricks documentation I posted earlier touches on this as well.
Ah, it's good to know Event Hubs for Kafka handles this gracefully.
I'm going to have to disagree with this, because the problem is still happening. Using this build configuration:

```scala
"org.apache.spark" %% "spark-sql" % "2.4.0",
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.0",
"org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0",
```

Specifying the group id gets me this error:
Running without gets me this error:
Is there a different Kafka client version I have to pin? This is just running locally on my machine.
Hi @slyons, the issue is that there is a default length limit (64 bytes) set for the group.id in Event Hubs for Kafka. Can you provide the Event Hubs namespace name?
@sjkwak How can there be a length limit when we can't set the group ID? That's something Spark generates itself in 2.4. Here's more of the output, including the rejection from the server:
@slyons The limit exists on the service side. We've updated the limit for the sclyondelta namespace. Can you try again and let us know if you still see the failure?
Looks like it's still happening:
Looks like the ID it's trying to use is something along the lines of:
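For reference, the id quoted earlier in the thread is long enough to trip the old service-side limit. A quick check in plain Scala, using the generated group id from the log above (the limits are the 64-byte default and the 256-char value mentioned later in this thread):

```scala
// The group id Spark 2.4 auto-generated in this thread's logs.
val generated = "spark-kafka-relation-25b94d8e-8ac8-4a1f-98cd-6356fa733983-driver-0"

println(s"length = ${generated.length}")                          // 66 characters
println(s"within old 64-char limit?  ${generated.length <= 64}")  // false
println(s"within new 256-char limit? ${generated.length <= 256}") // true
```

So the "spark-kafka-relation-&lt;uuid&gt;-driver-0" pattern overshoots a 64-character limit by two characters, which is consistent with the rejection seen here.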
@slyons can you try it again?
@sjkwak Looks like that did the trick! Thanks for working it through. For other namespaces/projects, can we open a support ticket to request an increase of the group id size, or will this be fixed on the platform eventually?
We're going to update production clusters with the change. In the meantime, if you run into the same issue in another namespace, yes, please open a support ticket.
Thanks, @sjkwak!
Update for anyone reviewing this thread: the group id name length limit has been increased to 256 chars. If you still see the length error in your namespace, please open a support ticket.
I am still looking for a solution to this group.id issue. It seems I have to switch back to normal streaming instead of Structured Streaming.
Description
Using

```scala
"org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0"
```

the library prevents you from specifying the group ID:

```
Exception in thread "main" java.lang.IllegalArgumentException: Kafka option 'group.id' is not supported as user-specified consumer groups are not used to track offsets.
	at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateGeneralOptions(KafkaSourceProvider.scala:361)
	at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:416)
	at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:66)
	at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:209)
```
How to reproduce
Use the spark consuming sample locally.
Has it worked previously?
No
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
Spark consumer
?
Sample default
Exception in thread "main" java.lang.IllegalArgumentException: Kafka option 'group.id' is not supported as user-specified consumer groups are not used to track offsets.
$Default
<REPLACE with e.g., Nov 7 2018 - 17:15:01 UTC>
<REPLACE with e.g., clientID=kafka-client>
<REPLACE with e.g., Willing/able to send scenario to repro issue>
<REPLACE with e.g., Ubuntu 16.04.5 (x64) LTS>