April 2020 - 24 Tutorials

Find the average of all contiguous subarrays of a fixed size in an array

24 Tutorials April 28, 2020

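Given an array, find the average of all contiguous subarrays of size ‘n’ in it.

Array: [1, 3, 2, 6, -1, 4, 1, 8, 2], n=5
Output: [2.2, 2.8, 2.4, 3.6, 2.8]

Solution: The Sliding Window algorithm solves this in a single pass. Keep a running sum of the current window; each time the window slides one position, add the element entering it and subtract the element leaving it, instead of re-summing all n elements.

Time Complexity: O(N), where N is the length of the array
Space Complexity: O(1) extra space, not counting the output list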
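A minimal Python sketch of the sliding window described above. The function name `find_averages_of_subarrays` is illustrative, and the window size ‘n’ from the problem is passed as the parameter `k` so it does not clash with the array length N.

```python
from typing import List


def find_averages_of_subarrays(k: int, arr: List[int]) -> List[float]:
    """Return the average of every contiguous subarray of size k (sliding window)."""
    result: List[float] = []
    window_sum = 0
    window_start = 0
    for window_end, value in enumerate(arr):
        window_sum += value  # add the element entering the window
        # Once the window holds k elements, record its average and slide forward
        if window_end >= k - 1:
            result.append(window_sum / k)
            window_sum -= arr[window_start]  # drop the element leaving the window
            window_start += 1
    return result


print(find_averages_of_subarrays(5, [1, 3, 2, 6, -1, 4, 1, 8, 2]))
# [2.2, 2.8, 2.4, 3.6, 2.8]
```

Each element is added once and subtracted at most once, which is where the O(N) time bound comes from.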

Programs

Find a pair in the array whose sum is equal to the given target

24 Tutorials April 28, 2020

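Given an array of sorted numbers and a target sum, find a pair in the array whose sum is equal to the given target. Write a function to return the indices of the two numbers (i.e. the pair) such that they add up to the given target.

Example:
Input: [1, 2, 3, 4, 6], target=6
Output: [1, 3]
Explanation: The numbers at index 1 and 3 add up to 6: 2+4=6

Solution: We can use the Two Pointers approach to solve this. Because the array is sorted, start one pointer at the first element and one at the last. If the pair sums to less than the target, move the left pointer right to increase the sum; if it sums to more, move the right pointer left to decrease it; stop when the sum equals the target.

Time Complexity: O(n)
Space Complexity: O(1)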
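A minimal Python sketch of the Two Pointers approach, assuming the input is sorted in ascending order. The function name `pair_with_target_sum` is illustrative, and returning `[-1, -1]` when no pair exists is an assumption, since the problem statement does not say what to return in that case.

```python
from typing import List


def pair_with_target_sum(arr: List[int], target: int) -> List[int]:
    """Return indices of two numbers in a sorted array that add up to target."""
    left, right = 0, len(arr) - 1
    while left < right:
        current_sum = arr[left] + arr[right]
        if current_sum == target:
            return [left, right]
        if current_sum < target:
            left += 1   # need a bigger sum: move to a larger left value
        else:
            right -= 1  # need a smaller sum: move to a smaller right value
    return [-1, -1]  # assumed sentinel when no pair exists


print(pair_with_target_sum([1, 2, 3, 4, 6], 6))  # [1, 3]
```

Each step moves one pointer inward, so the loop runs at most n - 1 times and uses only two index variables.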

hadoop

Hadoop Setup Documents

Veeraravi April 25, 2020

Click here to download the document to set up Hadoop 2.X.
Click here to download the document for Eclipse setup.
Click here to download the document for Ubuntu OS.

Python / Spark

How to Retrieve Password from JCEKS file in Spark

Sai Kumar April 25, 2020

In the data ingestion stage into Hadoop from RDBMS sources, a password is often required to query the source tables in the RDBMS databases. Passing a hard-coded password directly is highly unsafe and a bad practice in real-world applications. Instead, the password can be stored in an encrypted JCEKS file. JCEKS is a keystore file saved in the Java Cryptography Extension KeyStore (JCEKS) format, which is used as an alternative to the Java KeyStore (JKS) format on the Java platform and stores encoded keys. When a Spark application deals with RDBMS sources, the password must be read back from the JCEKS file before the source tables can be queried. Below is a handy function to retrieve the password from a JCEKS file:

Using PySpark

Using Scala
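A minimal PySpark sketch, assuming the JCEKS file was created with Hadoop's `hadoop credential create <alias> -provider jceks://...` command using the default keystore password. It points Hadoop's `hadoop.security.credential.provider.path` setting at the file and calls Hadoop's `Configuration.getPassword`, which returns a Java `char[]`. Reaching the Hadoop configuration through `spark.sparkContext._jsc` relies on a private PySpark attribute, and the function names, alias, and path shown are hypothetical.

```python
def java_chars_to_str(chars) -> str:
    """Join a Java char[] returned through Py4J into a Python string.

    Py4J exposes Java arrays as objects that support len() and indexing.
    """
    return "".join(str(chars[i]) for i in range(len(chars)))


def get_password_from_jceks(spark, jceks_path: str, alias: str) -> str:
    """Read the password stored under `alias` in a JCEKS credential store.

    `spark` is an active SparkSession; `jceks_path` is a provider URI such as
    "jceks://hdfs/user/etl/db.password.jceks" (hypothetical path).
    """
    # Hadoop Configuration behind the SparkContext (private _jsc handle)
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    # Tell Hadoop's credential provider API where the JCEKS file lives
    hadoop_conf.set("hadoop.security.credential.provider.path", jceks_path)
    # getPassword returns a char[], or None if the alias cannot be resolved
    chars = hadoop_conf.getPassword(alias)
    if chars is None:
        raise KeyError(f"alias {alias!r} not found in {jceks_path}")
    return java_chars_to_str(chars)


# Hypothetical usage:
# password = get_password_from_jceks(
#     spark, "jceks://hdfs/user/etl/db.password.jceks", "db.password")
```

The returned string can then be passed to the JDBC reader's `password` option instead of a hard-coded value.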

Spark

Joins in Spark SQL: Shuffle Hash, Sort Merge, Broadcast

Sai Kumar April 22, 2020

The Apache Spark SQL component comes with the Catalyst optimizer, which optimizes jobs by re-arranging the order of transformations and by choosing special join strategies suited to the datasets involved. Spark selects these joins internally, or you can force it to use a particular one. It is worth knowing this topic, because it comes to the rescue when optimizing jobs for your use case.

Shuffle Hash Join

Shuffle hash join shuffles the data based on the join keys and then performs the join. It ensures that each partition contains the same keys on both sides by partitioning the second dataset with the same default partitioner as the first, so that keys with the same hash value from both datasets end up in the same partition. It follows the classic map-reduce pattern: First ...
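To make the partitioning argument concrete, here is a toy, single-process Python model of a shuffle hash join. It is not Spark's implementation: the function names are hypothetical, rows are plain `(key, value)` tuples, and the hash table is built on the right side, which is assumed to be the smaller dataset. It shows why using the same partitioner on both sides lets each partition be joined independently.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, object]  # (join_key, payload)


def hash_partition(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Shuffle step: send each row to partition hash(key) % num_partitions."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def shuffle_hash_join(left: List[Row], right: List[Row],
                      num_partitions: int = 4) -> List[Tuple[Hashable, object, object]]:
    """Toy inner join: both sides share one partitioner, so equal keys
    always land in the same partition index and never need to meet elsewhere."""
    left_parts = hash_partition(left, num_partitions)
    right_parts = hash_partition(right, num_partitions)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        # Build phase: hash table over this partition of the (assumed smaller) right side
        table: Dict[Hashable, List[object]] = defaultdict(list)
        for key, value in right_part:
            table[key].append(value)
        # Probe phase: look up each left row's key in the local table
        for key, left_value in left_part:
            for right_value in table.get(key, []):
                result.append((key, left_value, right_value))
    return result


orders = [(1, "order-a"), (2, "order-b"), (1, "order-c")]
customers = [(1, "alice"), (2, "bob"), (3, "carol")]
print(sorted(shuffle_hash_join(orders, customers)))
# [(1, 'order-a', 'alice'), (1, 'order-c', 'alice'), (2, 'order-b', 'bob')]
```

The result is the same for any `num_partitions`, because the join never compares rows across partitions.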

Programs

Two Sum

24 Tutorials April 11, 2020

Given an array of integers, return the indices of the two numbers such that they add up to a specific target. You may assume that each input has exactly one solution, and you may not use the same element twice.

Example: Given nums = [1, 4, 8, 13] and target = 12, because nums[1] + nums[2] = 4 + 8 = 12, return [1, 2].

Solution:
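A common approach, which may differ from the author's solution, is a one-pass hash map: for each number, check whether its complement (target minus the number) has already been seen, and otherwise remember the number's index. Unlike the Two Pointers method, this does not require the array to be sorted. The function name `two_sum` is illustrative, and raising `ValueError` when no pair exists is an assumption, since the problem guarantees exactly one solution.

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    """Return indices of the two numbers that add up to target (one-pass hash map)."""
    seen: Dict[int, int] = {}  # value -> index of an earlier element
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            # The complement appeared earlier, so the two indices are distinct
            return [seen[complement], i]
        seen[num] = i
    raise ValueError("no two numbers add up to the target")


print(two_sum([1, 4, 8, 13], 12))  # [1, 2]
```

This runs in O(n) time but uses O(n) extra space for the map, trading memory for not having to sort.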
