1. What version of Spark are you using? Check the Spark version you are using before going to the interview. As of early 2020, the latest release line of Spark was 2.4.x (Spark 3.0 was released in June 2020).
2. What is the difference between RDD, DataFrame, and Dataset? RDD – RDD stands for Resilient Distributed Dataset. It is the fundamental data structure of Spark: an immutable collection of records partitioned across the nodes of a cluster. It allows us to perfo...

PySpark core components include: Spark Core – all other functionality is built on top of Spark Core; it contains classes such as SparkContext and RDD. Spark SQL – provides an API for structured data processing; it contains important classes such as SparkSession, DataFrame, and Dataset (the typed Dataset API is available only in Scala and Java, not in PySpark). Spark Streaming – provides streaming data processing using a micro-batching technique. Contains classes lik...

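Given an array, find the average of all contiguous subarrays of size ‘n’ in it. Array: [1, 3, 2, 6, -1, 4, 1, 8, 2], n = 5. Output: [2.2, 2.8, 2.4, 3.6, 2.8]. Solution: the sliding window algorithm solves this. Keep a running sum of the current window; each time the window slides one position, add the element entering it and subtract the element leaving it, instead of re-summing all n elements. Time complexity: O(N), where N is the length of the array. Space complexity: O(1) extra space, not counting the output list.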
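A minimal Python sketch of the sliding window approach, assuming the input is a plain list of numbers; the function name find_averages_of_subarrays is hypothetical. Every element is added to the running sum exactly once when it enters the window and subtracted exactly once when it leaves, which is where the O(N) time bound comes from.

```python
from typing import List


def find_averages_of_subarrays(n: int, arr: List[float]) -> List[float]:
    """Return the average of every contiguous subarray of size n (hypothetical helper)."""
    result: List[float] = []
    window_sum = 0.0
    window_start = 0
    for window_end, value in enumerate(arr):
        window_sum += value  # add the element entering the window
        # Once the window has reached size n, record its average and slide it forward
        if window_end >= n - 1:
            result.append(window_sum / n)
            window_sum -= arr[window_start]  # drop the element leaving the window
            window_start += 1
    return result


print(find_averages_of_subarrays(5, [1, 3, 2, 6, -1, 4, 1, 8, 2]))
# → [2.2, 2.8, 2.4, 3.6, 2.8]
```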

Given an array of sorted numbers and a target sum, find a pair in the array whose sum is equal to the given target. Write a function to return the indices of the two numbers (i.e., the pair) such that they add up to the given target. Example 1: Input: [1, 2, 3, 4, 6], target = 6. Output: [1, 3]. Explanation: the numbers at indices 1 and 3 add up to 6: 2 + 4 = 6. We can use the Two Pointers approach to solve t...
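A minimal Python sketch of the Two Pointers approach for a sorted array; the function name pair_with_target_sum and the [-1, -1] "no pair" return value are assumptions for illustration, not part of the original problem statement. Because the array is sorted, a sum that is too small can only grow by moving the left pointer right, and a sum that is too large can only shrink by moving the right pointer left, so one pass in O(N) time and O(1) space suffices.

```python
from typing import List


def pair_with_target_sum(arr: List[int], target: int) -> List[int]:
    """Two-pointer search for a pair summing to target in a sorted array (hypothetical helper)."""
    left, right = 0, len(arr) - 1
    while left < right:
        current_sum = arr[left] + arr[right]
        if current_sum == target:
            return [left, right]
        if current_sum < target:
            left += 1  # sum too small: move the left pointer to a larger value
        else:
            right -= 1  # sum too large: move the right pointer to a smaller value
    return [-1, -1]  # assumed convention when no such pair exists


print(pair_with_target_sum([1, 2, 3, 4, 6], 6))  # → [1, 3]
```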

Given an array of integers, return the indices of the two numbers such that they add up to a specific target. You may assume that each input has exactly one solution, and you may not use the same element twice. Given nums = [1, 4, 8, 13] and target = 12, because nums[1] + nums[2] = 4 + 8 = 12, return [1, 2]. Solution:
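Since this array is not guaranteed to be sorted, one common technique (a sketch, not necessarily the solution the original post had in mind) is a single pass with a hash map from each value seen so far to its index. For each number we check whether its complement (target minus the number) was already seen; checking before inserting the current number ensures the same element is never used twice. The function name two_sum and the ValueError for the no-solution case are assumptions.

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    """One-pass hash map lookup for two indices whose values sum to target (hypothetical helper)."""
    seen: Dict[int, int] = {}  # value -> index where it was seen
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i  # record after checking, so an element is never paired with itself
    raise ValueError("no two numbers add up to target")  # assumed; the problem guarantees a solution


print(two_sum([1, 4, 8, 13], 12))  # → [1, 2]
```

This runs in O(N) time with O(N) extra space for the dictionary, trading memory for not having to sort.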

What is a linked list? A linked list is a linear data structure where each element is a separate object. Each element of the list (we will call it a node) comprises two items: the data and a reference to the next node. The last node has a reference to null. The entry point into a linked list is called the head of the list. It should be noted that head is not a separate node, but the refe...
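A minimal Python sketch of a singly linked list matching the description: each Node holds data and a reference to the next node, None stands in for null at the end, and head is just a reference to the first node rather than a node of its own. The class and method names (Node, LinkedList, push_front, append) are hypothetical.

```python
from typing import Any, Iterator, Optional


class Node:
    def __init__(self, data: Any, next_node: Optional["Node"] = None) -> None:
        self.data = data
        self.next = next_node  # reference to the next node; None plays the role of null


class LinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None  # reference to the first node, not a node itself

    def push_front(self, data: Any) -> None:
        # O(1): the new node points at the old head and becomes the new head
        self.head = Node(data, self.head)

    def append(self, data: Any) -> None:
        # O(N): walk to the last node (the one whose next is None) and link the new node
        new_node = Node(data)
        if self.head is None:
            self.head = new_node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = new_node

    def __iter__(self) -> Iterator[Any]:
        # Follow next references from head until reaching None
        current = self.head
        while current is not None:
            yield current.data
            current = current.next


ll = LinkedList()
ll.append(2)
ll.append(3)
ll.push_front(1)
print(list(ll))  # → [1, 2, 3]
```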

What is an array? An array is a collection of items stored at contiguous memory locations. The idea is to store multiple items of the same type (homogeneous) together. This makes it easy to calculate the position of each element by simply adding an offset (the index multiplied by the size of one element) to a base value, i.e., the memory location of the first element of the array (generally denoted by the name of the array). What is Contiguous Memory?...
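Python lists store references to objects rather than the values themselves, so this sketch uses the standard ctypes module to build a C-style array of ints and check the base-plus-offset rule: address of element i = base address + i × element size. The helper name address_of is hypothetical, and the element size (commonly 4 bytes for c_int) depends on the platform.

```python
import ctypes

# A C-style array of 5 ints, laid out contiguously in memory
arr = (ctypes.c_int * 5)(10, 20, 30, 40, 50)

base_address = ctypes.addressof(arr)  # memory location of the first element
element_size = ctypes.sizeof(ctypes.c_int)  # bytes per element (platform dependent)


def address_of(index: int) -> int:
    """Address of arr[index], computed as base + index * element_size (hypothetical helper)."""
    return base_address + index * element_size


for i in range(len(arr)):
    # Read the int stored at the computed address and compare with normal indexing
    value = ctypes.c_int.from_address(address_of(i)).value
    print(i, hex(address_of(i)), value, value == arr[i])
```

Every row should print True in the last column, since reading at the computed address returns the same value as indexing the array directly.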

Definition: put simply, a data structure is a way of organizing data in memory. It is a systematic way to organize data so that it can be used efficiently. There are different ways to organize data in a structure; one example is the array. An array is a collection of elements, i.e., a collection of memory locations, and it is in these memory locations that we store the values. In an array, the structure of the data is sequential...

Memory Management in Spark 1.6: Executors run as Java processes, so the available memory is equal to the heap size. Internally, the available memory is split into several regions with specific functions. Execution Memory: storage for data needed during task execution, such as shuffle-related data. Storage Memory: storage of cached RDDs and broadcast variables; it is possible to borrow from execution memory (spill otherwi...
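As a worked example, the Python sketch below computes the region sizes under Spark 1.6's unified memory manager, assuming its default settings: 300 MB of reserved memory, spark.memory.fraction = 0.75 (Spark 2.0 lowered this default to 0.6), and spark.memory.storageFraction = 0.5. The function unified_memory_regions is a hypothetical helper, not a Spark API, and the storage/execution split it reports is only the starting boundary, since the two regions can borrow from each other at runtime.

```python
RESERVED_MB = 300  # fixed reserved memory in Spark 1.6's unified memory manager


def unified_memory_regions(heap_mb: float,
                           memory_fraction: float = 0.75,
                           storage_fraction: float = 0.5) -> dict:
    """Approximate executor memory regions in MB (hypothetical helper, Spark 1.6 defaults)."""
    if heap_mb <= RESERVED_MB:
        # Simplified guard for this sketch; Spark enforces its own minimum heap size
        raise ValueError("heap must be larger than the reserved memory")
    usable = heap_mb - RESERVED_MB
    spark_memory = usable * memory_fraction  # shared pool for execution and storage
    storage = spark_memory * storage_fraction  # cached blocks here are immune to eviction by execution
    execution = spark_memory - storage  # initial execution share; the boundary can move
    user = usable - spark_memory  # user data structures and Spark internal metadata
    return {"reserved": RESERVED_MB, "user": user, "storage": storage, "execution": execution}


print(unified_memory_regions(4096))
# → {'reserved': 300, 'user': 949.0, 'storage': 1423.5, 'execution': 1423.5}
```

With a 4 GB heap, 3796 MB is usable after the reservation; 75% of that (2847 MB) forms the shared Spark pool, split evenly between storage and execution at startup, and the remaining 949 MB is left as user memory.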