Is it a bad idea to use two firewalls on the same PC?

It depends, mainly, on the third-party firewall. I have seen cases where installing two software firewalls caused the two to fight each other, or caused one to uninstall the other. It also adds administration overhead and complexity, which itself can introduce security risks. So I would have to say yes, depending on the specific firewalls and how they interact.

In general, I use a host firewall and a network firewall instead of two software firewalls on the same PC.

apache spark – What is `ExistingRDD` and is it bad for the query plan?

From what I can see, rdd.toDF() introduces a PythonRDD, which ends up as ExistingRDD in the query plan.

df1 = spark.range(100, numPartitions=5)
df2 = df1.rdd.toDF()

print(df1.rdd.toDebugString())
# (5) MapPartitionsRDD[2097] at javaToPython at :0 []
# | MapPartitionsRDD[2096] at javaToPython at :0 []
# | MapPartitionsRDD[2095] at javaToPython at :0 []
# | MapPartitionsRDD[2094] at javaToPython at :0 []
# | ParallelCollectionRDD[2093] at javaToPython at :0 []
print(df2.rdd.toDebugString())
# (5) MapPartitionsRDD[2132] at javaToPython at :0 []
# | MapPartitionsRDD[2131] at javaToPython at :0 []
# | MapPartitionsRDD[2130] at javaToPython at :0 []
# | MapPartitionsRDD[2129] at applySchemaToPythonRDD at :0 []
# | MapPartitionsRDD[2128] at map at SerDeUtil.scala:137 []
# | MapPartitionsRDD[2127] at mapPartitions at SerDeUtil.scala:184 []
# | PythonRDD[2126] at RDD at PythonRDD.scala:53 []
# | MapPartitionsRDD[2097] at javaToPython at :0 []
# | MapPartitionsRDD[2096] at javaToPython at :0 []
# | MapPartitionsRDD[2095] at javaToPython at :0 []
# | MapPartitionsRDD[2094] at javaToPython at :0 []
# | ParallelCollectionRDD[2093] at javaToPython at :0 []
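
For reference, the physical plans of the two DataFrames already differ at this point. A plain explain() on each shows the ExistingRDD scan directly (plan IDs are from my session and the exact formatting may vary with the Spark version):

df1.explain()
# == Physical Plan ==
# *(1) Range (0, 100, step=1, splits=5)

df2.explain()
# == Physical Plan ==
# Scan ExistingRDD[id#2573L]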

If I cache the DataFrame with df1.cache(), Spark SQL is smart enough to use the cached data for an equivalent execution plan:

spark.range(100, numPartitions=5).groupby().count().explain()
# == Physical Plan ==
# *(2) HashAggregate(keys=[], functions=[count(1)])
# +- Exchange SinglePartition
#    +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
#       +- *(1) InMemoryTableScan
#             +- InMemoryRelation [id#2525L], StorageLevel(disk, memory, deserialized, 1 replicas)
#                   +- *(1) Range (0, 100, step=1, splits=5)
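
Side note: cache() is lazy, so the InMemoryRelation above only holds data once an action has run. A minimal sketch of how I verified that, assuming the default MEMORY_AND_DISK storage level of cache():

df1.cache()              # lazy: only registers the plan with the cache manager
df1.count()              # an action materializes the cached data
print(df1.storageLevel)  # Disk Memory Deserialized 1x Replicated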

Nevertheless, the ExistingRDD variant does not benefit from it:

df2.groupby().count().explain()
# == Physical Plan ==
# *(2) HashAggregate(keys=[], functions=[count(1)])
# +- Exchange SinglePartition
#    +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
#       +- *(1) Project
#          +- Scan ExistingRDD[id#2573L]

It seems that the Spark SQL optimizer cannot see through an ExistingRDD to recognize the cached plan. Is that correct?

Would I still benefit from caching if I used df1.rdd.cache() instead?
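
A minimal sketch of how I would try to check that, assuming PySpark memoizes the .rdd property (so df2 was built from the same underlying RDD) and that toDebugString() marks cached partitions:

rdd1 = df1.rdd           # memoized, so this should be the RDD df2 was built from
rdd1.cache()
rdd1.count()             # materialize the RDD-level cache

# if the cache is picked up, df2's lineage should now mention CachedPartitions
print(df2.rdd.toDebugString())

# crude timing check of the aggregation with the RDD cached
import time
start = time.time()
df2.groupby().count().collect()
print("with df1.rdd cached: %.3fs" % (time.time() - start))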
