
Fixed the number of worker threads #1485

Closed
fireflyc wants to merge 1 commit

Conversation

fireflyc
Contributor

A large number of input blocks causes too many worker threads to be created, which ends up loading all of the data at once. The number of worker threads should therefore be bounded.
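
The diff itself is not shown in this thread. As a rough, illustrative sketch of the kind of change being described (replacing an unbounded cached pool with a bounded one), where `numWorkerThreads` is a hypothetical configuration value:

```scala
import java.util.concurrent.{ExecutorService, Executors}

// Illustrative only: an unbounded cached pool creates a new thread whenever
// all existing threads are busy, so a burst of input blocks can spawn
// thousands of threads at once.
// val workerPool: ExecutorService = Executors.newCachedThreadPool()

// Bounding the pool keeps the thread count (and the number of blocks being
// processed concurrently) under control.
val numWorkerThreads = 16 // hypothetical configuration value
val workerPool: ExecutorService = Executors.newFixedThreadPool(numWorkerThreads)
```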
@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Jul 18, 2014

Slightly bigger point: both the 'fixed' and 'cached' executors from Executors have some drawbacks:

  • 'fixed' always keeps the given number of threads active even if they're not doing anything
  • 'cached' may create an unlimited number of threads

It's perfectly possible to create a ThreadPoolExecutor with a core size of 0 and a fixed maximum size. I wonder whether that isn't the best choice here, and in other usages throughout Spark as well, since a similar issue comes up in about 10 places.
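
As a minimal, illustrative sketch (not code from Spark): with an unbounded work queue, a core size of 0 would keep the pool at a single thread, so one way to get a bounded pool whose idle threads expire is to set the core size equal to the maximum and enable `allowCoreThreadTimeOut`. The helper name `newBoundedCachedPool` and the 60-second timeout are assumptions:

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

// Hypothetical helper: never more than `maxThreads` threads, but idle threads
// are reclaimed instead of staying allocated forever.
def newBoundedCachedPool(maxThreads: Int): ThreadPoolExecutor = {
  val pool = new ThreadPoolExecutor(
    maxThreads,                       // corePoolSize
    maxThreads,                       // maximumPoolSize
    60L, TimeUnit.SECONDS,            // keepAliveTime for idle threads
    new LinkedBlockingQueue[Runnable]())
  // Allow even the "core" threads to time out, so an idle pool shrinks to zero.
  pool.allowCoreThreadTimeOut(true)
  pool
}
```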

@fireflyc
Contributor Author

There will always be tasks to run; when the system is idle the threads do not consume any CPU, and when a task needs to run no new thread has to be created.

A fixed pool is a good way to do this.

@aarondav
Contributor

The tasks launched on an Executor are controlled by the DAGScheduler, and should not exceed the number of cores that executor is advertising. In what situation have you seen this happening?

@fireflyc
Contributor Author

My application has 1000+ worker threads.
[attached screenshot]

@aarondav
Contributor

Does your patch fix this problem, or do your Executors just hang after you reach enough cores? This behavior should not be happening, even with an unlimited-capacity cached thread pool.

@fireflyc
Contributor Author

My program is Spark Streaming running on Hadoop YARN; it processes user click streams.
From reading the code, doesn't the number of worker threads follow the number of input blocks?
@aarondav

@srowen
Member

srowen commented Jul 19, 2014

@fireflyc again just on my tangent -- the drawback is you leave N threads allocated taking up non-trivial stack memory and so on. In most of the cases I see the overhead of starting new threads on demand isn't significant. If what you describe is happening, then fixed is certainly an improvement over cached.

@aarondav
Contributor

@fireflyc Spark should not be scheduling more than N concurrent tasks on an Executor. It appears that the tasks may be returning "success" but then don't actually return the thread to the thread pool.

This is itself a bug -- could you run "jstack" on your Executor process to see where the threads are stuck?

Perhaps new tasks are just starting before the old threads finish cleaning up, and thus this solution is the right one, but I'd like to find out exactly why.

@pwendell
Contributor

Hey there - as Aaron said, the executors should never have more than N tasks active if there are N cores. I think there might be a bug causing this. So I'd recommend we close this issue and open a JIRA to figure out what is going on.

asfgit closed this in 2c35666 Jul 30, 2014