Archives

Python Variables and DataTypes

Python is purely object-oriented: every variable is an object. Unlike Java, you don't need to declare a variable or specify its datatype; Python is intelligent enough to infer the datatype automatically. Below is the syntax to declare a variable in Python: just specify a name and use the (=) operator to assign a value.

[code lang="python"]
a = 6
b = 7
print(a)
print(b)
[/code]

Output:
6
7

To check the datatype of a variable, use the type() function.

[code lang="python"]
type(a)
[/code]

Output:
<class 'int'>

You can also explicitly convert a value to a specific datatype, like int(value), float(value), str(value).

[code lang="python" highlight="float(5)" light="false"]
float_var = float(5)
print(float_var)
print(type(float_var))
[/code]

Output:
5.0
<class 'float'>

How to Use Python For Loop?

In this article, you'll learn how to use the Python for loop on a range, a string, and other collections:

- Using the Python for loop on a range collection
- Using the Python for loop on a string
- Using the Python for loop on collections

A combined sketch of all three cases is shown below. For any queries or doubts, ask questions in the 24Tutorials Forum.
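The article's own snippets are truncated out of this excerpt; here is a minimal sketch covering all three cases (the sample values are assumptions, not the article's):

[code lang="python"]
# For loop over a range of numbers
for i in range(3):
    print(i)        # 0, 1, 2

# For loop over a string (iterates character by character)
for ch in "abc":
    print(ch)       # a, b, c

# For loop over a collection (here, a list)
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
[/code]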

How to generate DDL (create statement) with columns using Python [code snippets]

Data loading is the initial step in the Big Data Analytics world: you first push all the data to Hadoop, and then you can start working on analytics. When loading data into a Hadoop environment, in some cases you will receive data in the form of flat files. Once the data is loaded, if you want to view or query it, you need to create a HIVE table on top of that data, so creating a DDL is unavoidable if you want a Hive table. In practice, you would have to inspect the file, get the column names, and then write the DDL manually. This tutorial helps you get rid of that manual work so you can generate DDLs dynamically in a single step with Python. Let's say we have the incoming data file as shown below -

Name|ID|ContactInfo|Date_emp
Michael|100|547-968-091|2014...
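The article's full script is cut off in this excerpt; here is a minimal sketch of the idea, assuming the first line of the pipe-delimited file is the header and defaulting every column to STRING (the generate_ddl helper, table name, and file name are hypothetical):

[code lang="python"]
# Build a Hive CREATE TABLE statement from a flat file's header line
def generate_ddl(file_path, table_name, delimiter="|"):
    with open(file_path) as f:
        header = f.readline().strip()
    # Default every column to STRING; refine types manually if needed
    cols = ",\n    ".join("{} STRING".format(c) for c in header.split(delimiter))
    return ("CREATE TABLE {} (\n    {}\n)\n"
            "ROW FORMAT DELIMITED FIELDS TERMINATED BY '{}';"
            ).format(table_name, cols, delimiter)

print(generate_ddl("employees.txt", "employees"))
[/code]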

All about Python Classes – Demo with examples

A demo with examples covering the essentials of Python classes:
- Class definitions
- Class initialization
- Class methods

A minimal sketch touching all three is shown below.
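The post's examples are not reproduced in this excerpt; here is a minimal sketch of all three topics (the Employee class and its fields are assumptions):

[code lang="python"]
# Class definition
class Employee:
    # Class initialization via the __init__ constructor
    def __init__(self, name, emp_id):
        self.name = name
        self.emp_id = emp_id

    # Class method (an ordinary instance method here)
    def describe(self):
        return "{} ({})".format(self.name, self.emp_id)

emp = Employee("Michael", 100)
print(emp.describe())   # Michael (100)
[/code]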

Deep dive into Partitioning in Spark – Hash Partitioning and Range Partitioning

Partitions - The data within an RDD is split into several partitions.

Properties of partitions:
- Partitions never span multiple machines, i.e., tuples in the same partition are guaranteed to be on the same machine.
- Each machine in the cluster contains one or more partitions.
- The number of partitions to use is configurable. By default, it equals the total number of cores on all executor nodes.

Two kinds of partitioning are available in Spark:
- Hash partitioning
- Range partitioning

Customizing a partitioning is only possible on pair RDDs.

Hash partitioning - Given a pair RDD that should be grouped:

val purchasesPerCust = purchasesRdd.map(p => (p.customerId, p.price)) // Pair RDD
                                   .groupByKey()

groupByKey first computes per tuple (k, v) its partition p: p = k....
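The excerpt's code is Scala; as a side illustration in PySpark (names and values are assumptions), pair-RDD partitioning can be customized with partitionBy, which hash-partitions by key by default:

[code lang="python"]
from pyspark import SparkContext

sc = SparkContext("local[2]", "partitioning-demo")

# A pair RDD of (customerId, price) tuples
purchases = sc.parallelize([(100, 9.99), (200, 4.50), (100, 12.00)])

# Hash partitioning: each key goes to partition hash(key) % numPartitions
by_customer = purchases.partitionBy(4)
print(by_customer.getNumPartitions())  # 4

sc.stop()
[/code]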

Spark runtime Architecture – How Spark Jobs are executed

How Spark Jobs are Executed - A Spark application is a set of processes running on a cluster, all coordinated by the driver program.

The driver is:
- the process where the main() method of your program runs.
- the process running the code that creates a SparkContext, creates RDDs, and stages up or sends off transformations and actions.

The processes that run computations and store data for your application are executors. Executors:
- Run the tasks that represent the application.
- Return computed results to the driver.
- Provide in-memory storage for cached RDDs.

Execution of a Spark program:
1. The driver program runs the Spark application, which creates a SparkContext upon start-up.
2. The SparkContext connects to a cluster manager (e.g., Mesos/YARN) which allocates resour...
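As a minimal sketch of this flow in PySpark (the app name and data are assumptions): the driver creates the SparkContext, transformations are staged lazily, and an action ships tasks to the executors:

[code lang="python"]
from pyspark import SparkContext

# The driver creates the SparkContext, which connects to the cluster manager
sc = SparkContext("local[*]", "runtime-demo")

# Transformations are recorded lazily by the driver...
squares = sc.parallelize(range(10)).map(lambda x: x * x)

# ...and run as tasks on the executors when an action is triggered;
# the computed result comes back to the driver
print(squares.sum())

sc.stop()
[/code]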

CUT command in Unix/Linux with examples

Cut Command:
- cut is used to process data in a file.
- Works only on files with column-formatted data.

Command 1: Display the character at a particular position
cut -c3 file.txt

Command 2: Range of characters
cut -c3-8 file.txt
cut -c3- file.txt
cut -c-10 file.txt

Command 3: Display columns after separation
cut -d "|" -f2 file.txt
cut -d "|" -f2-3 file.txt
cut -d "|" -f2- file.txt

Command 4: Display all columns other than the given ones [--complement]
cut -d "|" -f2 file.txt
cut -d "|" --complement -f2 file.txt
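For readers coming from the site's Python tutorials, here is a rough Python equivalent of Command 3 (cut -d "|" -f2 file.txt), assuming file.txt is pipe-delimited:

[code lang="python"]
# Roughly what `cut -d "|" -f2 file.txt` does: print the 2nd pipe-delimited field
with open("file.txt") as f:
    for line in f:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2:
            print(fields[1])
[/code]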

GREP command in Unix/Linux with examples

grep - Global Regular Expression Print (g/re/p). It is used to search for data in one or more files.

Command 1: Search for a pattern in one or more files
grep hello file.txt
grep sai file.txt file2.txt

Command 2: Search for a pattern in all .txt files in the current folder
grep 1000 *.txt

Command 3: Search for data in all files in the current folder
grep 1000 *.*

Command 4: Search ignoring case [-i]
grep "Sai" file.txt (case-sensitive by default)
grep -i "Sai" file.txt

Command 5: Display line numbers [-n]
grep -n "124" result.txt

Command 6: Get only the filenames in which the data exists [-l]
grep -l "100" *.*

Command 7: Search for an exact word [-w]
grep -w Sai file.txt

Command 8: Search for lines that do not contain the data (inverse of search) [-v]
grep -v "1000" file.txt

Command 9: Get one record before the match
grep -B 1 "Msd" fi...
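Similarly, a rough Python counterpart of Command 5 (grep -n "124" result.txt), assuming result.txt exists:

[code lang="python"]
# Roughly what `grep -n "124" result.txt` does: matching lines with line numbers
with open("result.txt") as f:
    for num, line in enumerate(f, start=1):
        if "124" in line:
            print("{}:{}".format(num, line.rstrip("\n")))
[/code]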

SED command in Unix/Linux with examples

SED - Stream Editor. Used to display and edit data; the editing operations are insertion, updation, and deletion.

Two types of addressing:
- Line addressing
- Context addressing

Line Addressing -

Command 1: Display a line an extra time
sed '2p' file.txt
sed -n '3p' file.txt (only the specific line => -n)
sed -n '5p' file.txt

Command 2: Display the last line [$]
sed '$p' file.txt (prints the last line again along with the original output)
sed -n '$p' file.txt (only the last line)

Command 3: Range of lines
sed -n '2,4p' file.txt

Command 4: Do not display specific lines [!]
sed -n '2!p' file.txt
sed -n '2,4!p' file.txt - do not display a specific range of lines

Context Addressing:

Command 1: Display lines having a specific word
sed -n '/Amit/p' file.txt
sed -n '/[Aa]mi...
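A rough Python counterpart of Command 3 (sed -n '2,4p' file.txt), assuming file.txt exists:

[code lang="python"]
# Roughly what `sed -n '2,4p' file.txt` does: print only lines 2 through 4
with open("file.txt") as f:
    for num, line in enumerate(f, start=1):
        if 2 <= num <= 4:
            print(line, end="")
[/code]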

All about AWK command in Unix – Part 1

AWK - select column data
- Search data in a file and print it on the console
- Find data in specific columns
- Format output data
- Used on files with bulk data for searching, conditional execution, updating, and filtering

Command 1: Print specific columns
awk '{print $1}' file.txt (whitespace/TAB is the default separator)
awk '{print $1 "--" $2}' file.txt

Command 2: Select all data from the file
awk '{print $0}' tabfile.txt

Command 3: Select columns from a CSV
1. Separating data using -F:
awk -F "," '{print $1}' commafile.txt
2. Using the FS variable:
awk '{print $2}' FS="," commafile.txt

Command 4: Display content without displaying the header line of the file
awk 'NR!=1{print $1 " " $2...
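A rough Python counterpart of Command 3 (awk -F "," '{print $1}' commafile.txt), assuming commafile.txt exists:

[code lang="python"]
# Roughly what `awk -F "," '{print $1}' commafile.txt` does: first CSV column
with open("commafile.txt") as f:
    for line in f:
        print(line.rstrip("\n").split(",")[0])
[/code]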
