@@ -373,6 +373,15 @@ class JavaSparkContext(val sc: SparkContext)
    * other necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable,
    * etc).
    *
+   * @param conf JobConf for setting up the dataset. Note: This will be put into a Broadcast.
+   *             Therefore if you plan to reuse this conf to create multiple RDDs, you need to make
+   *             sure you won't modify the conf. A safe approach is always creating a new conf for
+   *             a new RDD.
+   * @param inputFormatClass Class of the InputFormat
+   * @param keyClass Class of the keys
+   * @param valueClass Class of the values
+   * @param minPartitions Minimum number of Hadoop Splits to generate.
+   *
    * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
@@ -395,6 +404,14 @@ class JavaSparkContext(val sc: SparkContext)
    * Get an RDD for a Hadoop-readable dataset from a Hadoop JobConf giving its InputFormat and any
    * other necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable,
    *
+   * @param conf JobConf for setting up the dataset. Note: This will be put into a Broadcast.
+   *             Therefore if you plan to reuse this conf to create multiple RDDs, you need to make
+   *             sure you won't modify the conf. A safe approach is always creating a new conf for
+   *             a new RDD.
+   * @param inputFormatClass Class of the InputFormat
+   * @param keyClass Class of the keys
+   * @param valueClass Class of the values
+   *
    * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
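To illustrate the guidance in the new @param docs for both hadoopRDD overloads, here is a minimal sketch (not part of the patch; master, app name, and paths are hypothetical): each RDD gets its own JobConf, so the copy captured in each broadcast is never mutated afterwards.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
import org.apache.spark.api.java.JavaSparkContext

val jsc = new JavaSparkContext("local[*]", "hadoopRDDConfExample")  // hypothetical master/app name

// Build a brand-new JobConf for every RDD instead of mutating a shared one,
// because hadoopRDD puts the conf it receives into a Broadcast.
def textRDD(path: String) = {
  val conf = new JobConf(jsc.hadoopConfiguration())
  FileInputFormat.setInputPaths(conf, path)
  jsc.hadoopRDD(conf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
}

val partA = textRDD("hdfs:///data/a")  // hypothetical paths; each RDD has its own conf
val partB = textRDD("hdfs:///data/b")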
@@ -476,6 +493,14 @@ class JavaSparkContext(val sc: SparkContext)
    * Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
    * and extra configuration options to pass to the input format.
    *
+   * @param conf Configuration for setting up the dataset. Note: This will be put into a Broadcast.
+   *             Therefore if you plan to reuse this conf to create multiple RDDs, you need to make
+   *             sure you won't modify the conf. A safe approach is always creating a new conf for
+   *             a new RDD.
+   * @param fClass Class of the InputFormat
+   * @param kClass Class of the keys
+   * @param vClass Class of the values
+   *
    * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
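The same pattern applies to the new-API method; a short sketch under the same assumptions (paths are hypothetical, and the input-path key shown is the Hadoop 2 name):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.api.java.JavaSparkContext

val jsc = new JavaSparkContext("local[*]", "newAPIHadoopRDDExample")  // hypothetical master/app name

// Fresh Configuration per dataset; it will be broadcast, so don't modify it afterwards.
val conf = new Configuration(jsc.hadoopConfiguration())
conf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///data/events")  // hypothetical path
val events = jsc.newAPIHadoopRDD(conf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])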
@@ -675,6 +700,9 @@ class JavaSparkContext(val sc: SparkContext)
 
   /**
    * Returns the Hadoop configuration used for the Hadoop code (e.g. file systems) we reuse.
+   *
+   * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
+   * plan to set some global configurations for all Hadoop RDDs.
    */
   def hadoopConfiguration (): Configuration = {
     sc.hadoopConfiguration
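A usage sketch of that note (not part of the patch; it reuses the `jsc` from the earlier sketches, and the property value is illustrative), contrasting a one-time global setting with the per-RDD confs recommended above:

// Global tweak, applied once before any Hadoop RDD is created: it affects every
// Hadoop RDD that falls back to this shared configuration.
jsc.hadoopConfiguration().set("io.file.buffer.size", "131072")  // illustrative value

// Anything dataset-specific belongs on a fresh JobConf/Configuration passed to
// hadoopRDD/newAPIHadoopRDD instead of being set on the shared configuration.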