During data ingestion from RDBMS sources into Hadoop, a password is often required to access the source tables. Passing a hard-coded password directly is unsafe and bad practice in production applications, so the password can instead be stored encrypted in a JCEKS file. JCEKS is a keystore file in the Java Cryptography Extension KeyStore format, an alternative to the Java KeyStore (JKS) format on the Java platform, and it stores encoded keys. A Spark application that reads from RDBMS sources needs to decrypt the password from the JCEKS file before it can query the source tables. Below is a handy function to retrieve the password from a JCEKS file.
Using PySpark
Using Scala
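For the PySpark case, a minimal sketch of such a helper could look like the following. It assumes the JCEKS file was created with the Hadoop credential provider tooling and relies on Hadoop's `Configuration.getPassword`; the function name `get_jceks_password`, the keystore path, and the alias are hypothetical and are not taken from the original post.

```python
def get_jceks_password(hadoop_conf, provider_path, alias):
    """Return the password stored under `alias` in a JCEKS credential store.

    `hadoop_conf` is expected to behave like Hadoop's Configuration object,
    for example spark.sparkContext._jsc.hadoopConfiguration() in PySpark.
    """
    # Point Hadoop's credential provider API at the keystore file.
    hadoop_conf.set("hadoop.security.credential.provider.path", provider_path)
    # getPassword returns a Java char[] (None here if the alias is unknown).
    chars = hadoop_conf.getPassword(alias)
    if chars is None:
        raise KeyError(f"alias {alias!r} not found in {provider_path}")
    # Join the individual characters into a Python string.
    return "".join(str(chars[i]) for i in range(len(chars)))


# Usage inside a Spark application (path and alias are hypothetical):
# conf = spark.sparkContext._jsc.hadoopConfiguration()
# password = get_jceks_password(
#     conf, "jceks://hdfs/user/etl/db.password.jceks", "db.password.alias")
```

Taking the configuration object as a parameter keeps the helper independent of how the Spark session is created, and the returned string can then be passed as the `password` option of a JDBC read.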
You may sometimes need to add a serial number to a Spark DataFrame. One option is the Spark function monotonically_increasing_id(), which generates a new column with a unique, monotonically increasing 64-bit index for each row. The values are not consecutive, though, because they depend on the partition each row belongs to. In short, the IDs are unique but contain gaps, so they do not form a 1, 2, 3, ... sequence. If the goal is to add a true serial number to the DataFrame, you can use the zipWithIndex method available on RDDs. Below is how you can achieve the same on a DataFrame.
[code lang="python"]
from pyspark.sql.types import LongType, StructField, StructType

def dfZipWithIndex(df, offset=1, colName="rowId"):
    '''
    Enumerates dataframe rows in native order, like rdd.ZipWithIndex(), but on a dataframe and preserves a ...
# The list class provides a mutable sequence of elements
empty_list = list()
print('empty_list ->', empty_list)
list_str = list('hello')
print('list_str ->', list_str)
list_tup = list((1, 2, (3, 5, 7)))
print('list_tup ->', list_tup)
empty_list = []
print('empty_list ->', empty_list)
list_syn = [3, 4, 'e', 'b']
print('list_syn ->', list_syn)
print("'a' in list_syn ->", 'a' in list_syn)
print("1 not in list_syn ->", 1 not in list_syn)
empty_list.append(5)
print('empty_list ->', empty_list)
empty_list.append([6, 7])
print('empty_list ->', empty_list)
last_elem = empty_list.pop()
print('last...
While working on a real-world project you often need to manipulate strings in your logic, so it is useful to know the functions and operations available on them. A Python string can be created using single or double quotes. Check out this tutorial on the variables for more info. ex: temp_var = "MyString"
String Concatenation: Strings can be concatenated using the "+" operator, just as in Scala.
[code lang="python"]
first_name = "Mahendra"
last_name = "Dhoni"
print(first_name + last_name)
[/code]
Output: MahendraDhoni
Note: As already discussed in this tutorial, values of other types must be converted to a string before being concatenated with one. ex: print(24 + "tutorials") The above example throws an exception, like TypeError: cannot...
If-else is a basic control statement in any programming language. The Python if statement evaluates the condition after "if" and executes its block only when that condition is true. Syntax:
if(condition):
    <set of statements to be executed>
elif(condition):
    <set of statements>
else:
    <set of statements>
Note: Else-if is written as elif in Python, and indentation must be handled carefully because Python uses indentation to delimit blocks. Example:
[code lang="python"]
team = "csk"
if(team == "csk"):
    print("Captain is MS Dhoni")
elif(team == "rcb"):
    print("Captain is V Kohli")
else:
    print("please give valid team name")
[/code]
Output: Captain is MS Dhoni
Python is object-oriented: every variable refers to an object. Unlike Java, you do not need to declare a variable or specify its datatype; Python infers the type automatically from the assigned value. Below is the syntax to declare a variable in Python. Just specify a name and use the (=) operator to assign a value.
[code lang="python"]
a = 6
b = 7
print(a)
print(b)
[/code]
Output:
6
7
To check the datatype of a variable, use the type() function.
[code lang="python"]type(a)[/code]
Output: <class 'int'>
You can also explicitly convert a value to a specific datatype with int(value), float(value), or str(value).
[code lang="python" highlight="float(5)" light="false"]
float_var = float(5)
print(float_var)
print(type(float_var))
[/code]
Outp...
In this article, you'll learn how to use the Python for loop on a range, a string, and collections.
Using Python For Loop on range collection:
Using Python For Loop in String:
Using Python For Loop on Collections:
For any queries or doubts, ask questions in the 24Tutorials Forum.
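As a brief illustration of the three cases named in this excerpt, the sketch below loops over a range, a string, and two collections. The variable names and sample values are illustrative assumptions, not the original article's code.

```python
# Loop over a range: range(1, 4) yields 1, 2, 3 (the stop value is excluded).
squares = []
for i in range(1, 4):
    squares.append(i * i)
print(squares)

# Loop over a string: each iteration yields one character.
letters = []
for ch in "csk":
    letters.append(ch.upper())
print(letters)

# Loop over a list.
teams = ["csk", "rcb"]
for team in teams:
    print(team)

# Loop over a dictionary's key/value pairs with items().
captains = {"csk": "MS Dhoni", "rcb": "V Kohli"}
lines = []
for team, captain in captains.items():
    lines.append(team + " -> " + captain)
print(lines)
```

In every case the for statement simply asks the object for its items one at a time, which is why the same syntax works for ranges, strings, lists, and dictionaries.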
Data loading is the first step in Big Data analytics: you push all the data to Hadoop first, and only then can you start working on analytics. When loading data into a Hadoop environment, you will in some cases receive the data as flat files. Once the data is loaded, viewing or querying it requires a Hive table on top of that data, which in turn requires a DDL statement. In practice, you have to inspect the file, get the column names, and then write the DDL manually. This tutorial helps you avoid that manual work by creating DDLs dynamically with Python. Let's say we have the incoming data file as shown below:
Name|ID|ContactInfo|Date_emp
Michael|100|547-968-091|2014...
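The idea described in this excerpt can be sketched roughly as follows: read the header row of the delimited file and turn each column name into a Hive column definition. This is only an illustration under stated assumptions. The function name `build_hive_ddl`, the table name, and the HDFS location are hypothetical, and every column is typed as STRING because the excerpt does not say how types are chosen.

```python
def build_hive_ddl(header_line, table_name, location, delimiter="|"):
    """Build a CREATE EXTERNAL TABLE statement from a delimited header row."""
    # Split the header into column names, dropping surrounding whitespace.
    columns = [col.strip() for col in header_line.strip().split(delimiter)]
    # Assumption: every column is declared as STRING.
    col_defs = ",\n".join(f"  `{col}` STRING" for col in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table_name} (\n"
        f"{col_defs}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'\n"
        # Skip the header row so it is not returned as data.
        f"TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Hypothetical table name and location, using the header shown above.
ddl = build_hive_ddl("Name|ID|ContactInfo|Date_emp", "employee", "/data/employee")
print(ddl)
```

For a real file, the header line would typically come from reading the first line of the file, and the generated statement would then be submitted to Hive.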
Python classes are all types.
Class Definitions
Class Initialization
Class Methods
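A short sketch covering the three topics listed here (definition, initialization, and methods) might look like this. The `Player` class and its fields are hypothetical and chosen only for illustration.

```python
class Player:
    """Class definition: a hypothetical class used only for illustration."""

    def __init__(self, name, team):
        # Class initialization: __init__ runs when an instance is created.
        self.name = name
        self.team = team

    def describe(self):
        # Instance method: receives the instance as `self`.
        return f"{self.name} plays for {self.team}"

    @classmethod
    def from_string(cls, text):
        # Class method: receives the class as `cls`; here it serves as an
        # alternative constructor that parses "name, team".
        name, team = text.split(",")
        return cls(name.strip(), team.strip())


player = Player("MS Dhoni", "csk")
print(player.describe())

other = Player.from_string("V Kohli, rcb")
print(other.team)

# A class is itself an object whose type is `type`.
print(type(Player) is type)
```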