Fix the number of worker threads #1485
Conversation
A large number of input blocks causes too many worker threads to be created, which can load all of the data at once. The number of worker threads should therefore be bounded.
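For context, here is a minimal sketch of the kind of change being proposed, assuming the task launcher builds its pool with `java.util.concurrent.Executors` (the names and sizing below are illustrative, not Spark's actual code):

```scala
import java.util.concurrent.{ExecutorService, Executors}

// Unbounded: a cached pool spawns a new thread whenever none is free,
// so a burst of input blocks can create an arbitrary number of threads.
val unbounded: ExecutorService = Executors.newCachedThreadPool()

// Bounded: a fixed pool caps concurrency at numWorkers; extra tasks
// wait in the pool's queue instead of spawning new threads.
val numWorkers = Runtime.getRuntime.availableProcessors()
val bounded: ExecutorService = Executors.newFixedThreadPool(numWorkers)
```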
Can one of the admins verify this patch?
Slightly bigger point: both the 'fixed' and 'cached' executors from `Executors` have drawbacks. It's perfectly possible to create a `ThreadPoolExecutor` directly, with a minimum size, a maximum size, and a keep-alive time.
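As a hedged illustration of that suggestion (the sizes are arbitrary and this is not the patch itself), a hand-built `ThreadPoolExecutor` can keep a small core of threads, grow on demand up to a cap, and reclaim idle threads:

```scala
import java.util.concurrent.{SynchronousQueue, ThreadPoolExecutor, TimeUnit}

// A capped variant of newCachedThreadPool: with a SynchronousQueue, each
// submitted task is handed straight to a thread, spawning new ones up to
// maxThreads; idle threads above the core size die after 60 seconds.
val coreThreads = 4   // kept alive even when idle (illustrative value)
val maxThreads  = 32  // hard upper bound on concurrent threads (illustrative value)
val pool = new ThreadPoolExecutor(
  coreThreads, maxThreads,
  60L, TimeUnit.SECONDS,
  new SynchronousQueue[Runnable]())
// Note: once maxThreads are all busy, further submits are rejected, so a
// real implementation would also pick a suitable RejectedExecutionHandler.
```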
A fixed pool always has threads ready to run tasks: when the system is idle those threads consume no CPU, and when a task arrives no new thread has to be created.
The tasks launched on an Executor are controlled by the DAGScheduler, and should not exceed the number of cores that executor is advertising. In what situation have you seen this happening?
Does your patch fix this problem, or do your Executors just hang after you reach enough cores? This behavior should not be happening, even with an unlimited-capacity cached thread pool.
My program is Spark Streaming on Hadoop YARN; it processes user click streams.
@fireflyc again just on my tangent -- the drawback is you leave N threads allocated, taking up non-trivial stack memory and so on. In most of the cases I see, the overhead of starting new threads on demand isn't significant. If what you describe is happening, then fixed is certainly an improvement over cached.
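One possible middle ground, sketched here as an assumption rather than anything Spark-specific (it relies only on standard `java.util.concurrent` behavior): a fixed-size pool whose core threads are allowed to time out, so the idle-threads-holding-stack-memory cost goes away while the thread count stays bounded:

```scala
import java.util.concurrent.{Executors, ThreadPoolExecutor, TimeUnit}

// A fixed pool that still shrinks when idle: newFixedThreadPool is backed
// by a ThreadPoolExecutor, so core-thread timeout can be enabled on it.
val pool = Executors.newFixedThreadPool(16).asInstanceOf[ThreadPoolExecutor]
pool.setKeepAliveTime(60L, TimeUnit.SECONDS) // must be > 0 before enabling timeout
pool.allowCoreThreadTimeOut(true)            // idle threads are reclaimed after 60s
```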
@fireflyc Spark should not be scheduling more than N concurrent tasks on an Executor. It appears that the tasks may be returning "success" but then don't actually return the thread to the thread pool. This is itself a bug -- could you run "jstack" on your Executor process to see where the threads are stuck? Perhaps new tasks are just starting before the old threads finish cleaning up, and thus this solution is the right one, but I'd like to find out exactly why.
Hey there - as Aaron said, the executors should never have more than N tasks active if there are N cores. I think there might be a bug causing this. So I'd recommend we close this issue and open a JIRA to figure out what is going on.