Author: Sai Kumar

Binary Search in Scala (Iterative,Recursion,Tail Recursion Approaches)

Binary Search is a fast & efficient algorithm to search an element in sorted list of elements. It works on the technique of divide and conquer. Data structure: Array Time Complexity: Worst case: O(log n) Average case: O(log n) Best case: O(1) Space complexity: O(1) Let’s see how it can implemented in Scala with different approaches – Iterative Approach – Recursion Approach &#...

Handy Methods in SparkContext Object while writing Spark Applications

SparkContext is Main entry point for Spark functionality. Its basically a class in Spark framework, when initialized, gets access to Spark Libraries. A SparkContext is responsible for connecting to Spark cluster, and can be used to create RDD(Resilient Distributed Dataset), to broadcast variables on that cluster and has much more useful methods. To create or initialize Spark Context, SparkConf nee...

How to Retrieve Password from JCEKS file in Spark

In the data ingestion stage into Hadoop from RBDMS sources, it often requires password to hit source tables in RDBMS databases. Passing hard password directly is highly unsafe and bad practice in real time applications. So, password can be encrypted by creating JCEKS file. JCEKS is basically a keystore file saved in the Java Cryptography Extension KeyStore (JCEKS) format; used as an alternative ke...

Joins in Spark SQL- Shuffle Hash, Sort Merge, BroadCast

Apache Spark SQL component comes with catalyst optimizer which smartly optimizes the jobs by re-arranging the order of transformations and by implementing some special joins according to datasets. Spark performs these joins internally or you can force it to perform them. It’s worthwhile to know this topic, so that it comes to rescue when optimizing the jobs according to your use case. Shuffl...

Understanding Tail recursion in Scala

Tail recursion is little tricky concept in Scala and takes time to master it completely. Before we get into Tail recursion, lets try to look into recursion. A Recursive function is the function which calls itself. If some action is repetitive, we can call the same piece of code again. Recursion could be applied to problems where you use regular loops to solve it. Factorial program with regular loo...

Memory Management in Spark and its tuning

Spark has two kinds of memory- 1.Execution Memory which is used to store temporary data of shuffles, joins, sorts, and aggregations 2. Storage Memory     which is used to cache RDDs and data frames Executor has some amount of total memory, which is divided into two parts, the execution block and the storage block.This is governed by two configuration options. 1. spark.executor.memory > It is th...

How to create Spark Dataframe on HBase table[Code Snippets]

There is no direct library to create Dataframe on HBase table like how we read Hive table with Spark sql. This post gives the way to create dataframe on top of Hbase table. You need to add hbase-client dependency to achieve this. Below is the link to get the dependency. https://mvnrepository.com/artifact/org.apache.hbase/hbase-client/2.1.0 Lets say the hbase table is ’emp’ with rowKey ...

How to Add Serial Number to Spark Dataframe

You may required to add Serial number to Spark Dataframe sometimes. It can be done with the spark function called monotonically_increasing_id(). It generates a new column with unique 64-bit monotonic index for each row. But it isn’t significant, as the sequence changes based on the partition. In short,  random numbers will be assigned which are out of sequence. If the goal is add serial numb...

Python Lists

#The list class provides a mutable sequence of elements d empty_list = list() print( ’empty_list ->’ , empty_list) list_str = list(‘hello’) print(‘list_str ->’, list_str) list_tup = list((1, 2, (3, 5, 7))) print(‘list_tup ->’, list_tup) empty_list=[] print(’empty_list ->’, empty_list) list_syn = [3, 4, ‘e’, ‘...

Working with Python Strings – Operations,Functions,Formatting

While working on a real-time project you often need to play around with Strings in your logic, so it’s better to know all the functions and operations you can do with Strings. Python string can be created using Single or Double quotes. Check out this tutorial on the variables for more info. ex: temp_var = “MyString” String Concatenation- Strings can be concatenated using “+...

Python If-else statements

If-else is basic control statement in any Programming language. Python if-else statement checks the expression inside “if” parenthesis and executes only when specified condition is true. Syntax: if(condition): <set of statements to be executed> elif: <set of statements> else: <set of statements> Note:  Else-if needs to be given as elif in Python and Indentation needs ...

Python Variables and DataTypes

Python is pure object-oriented, everything variable is an object. Unlike Java, you no need to declare a variable and specify its datatype. It is intelligent enough to infer the datatype automatically. Below is the syntax to declare a variable in Python. Just specify name and use (=) operator to assign a value. [code lang=”python”] a = 6 b = 7 print(a) print(b)[/code] Output: 6 7 To che...

Lost Password

Register

Do NOT follow this link or you will be banned from the site!