Archives

Important Pillars of Stats: Covariance and Correlation

Introduction: Covariance and correlation are two mathematical concepts commonly used in statistics. When comparing data samples from different populations, both determine the relationship and measure the dependency between two random variables. Covariance and correlation show that variables can have a positive relationship, a negative relationship, or no relationship at all. A sample is a randomly chosen selection of elements from an underlying population. We calculate covariance and correlation on samples rather than the complete population; measured on samples, they are known as the sample covariance and sample correlation. Sample Covariance: Covariance measures the extent to which the relationship between two variables is linear. The sign of ...
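To make the excerpt concrete, here is a minimal sketch using NumPy (the x and y arrays are made-up illustrative samples, not data from the original post):

[code lang="python"]
import numpy as np

# Made-up sample data for illustration.
x = np.array([2.1, 2.5, 3.6, 4.0, 5.2])
y = np.array([8.0, 10.5, 12.0, 14.8, 16.3])

# Sample covariance (NumPy normalises by n-1 by default, i.e. the sample estimate).
sample_cov = np.cov(x, y)[0, 1]

# Pearson correlation: the covariance scaled by both standard deviations.
sample_corr = np.corrcoef(x, y)[0, 1]

print("sample covariance ->", sample_cov)
print("sample correlation ->", sample_corr)
[/code]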

Python Lists

#The list class provides a mutable sequence of elements
empty_list = list()
print('empty_list ->', empty_list)
list_str = list('hello')
print('list_str ->', list_str)
list_tup = list((1, 2, (3, 5, 7)))
print('list_tup ->', list_tup)
empty_list = []
print('empty_list ->', empty_list)
list_syn = [3, 4, 'e', 'b']
print('list_syn ->', list_syn)
print("'a' in list_syn ->", 'a' in list_syn)
print("1 not in list_syn ->", 1 not in list_syn)
empty_list.append(5)
print('empty_list ->', empty_list)
empty_list.append([6, 7])
print('empty_list ->', empty_list)
last_elem = empty_list.pop()
print('last...

Working with Python Strings – Operations, Functions, Formatting

While working on a real-time project you often need to play around with strings in your logic, so it's better to know all the functions and operations you can perform on them. A Python string can be created using single or double quotes. Check out this tutorial on variables for more info. ex: temp_var = "MyString" String Concatenation – Strings can be concatenated using the "+" operator, just as in Scala. [code lang="python"]first_name = "Mahendra" last_name = "Dhoni" print(first_name + last_name)[/code] Output: MahendraDhoni Note – As already discussed in this tutorial, a value has to be typecast to a string before it is concatenated with a string. ex – print(24 + "tutorials") The above example throws an exception, like TypeError: cannot...
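As a quick illustration of the typecasting note, a minimal sketch – casting the number with str() before concatenating avoids the TypeError:

[code lang="python"]
# Concatenating a number with a string raises TypeError,
# so cast the number to str first.
print(str(24) + "tutorials")  # -> 24tutorials
[/code]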

Python If-else statements

If-else is a basic control statement in any programming language. A Python if-else statement evaluates the expression inside the "if" parentheses and executes its block only when the condition is true. Syntax: if(condition): <set of statements to be executed> elif(condition): <set of statements> else: <set of statements> Note: else-if is written as elif in Python, and indentation needs to be handled carefully since Python relies on indentation. Example: [code lang="python"] team="csk" if(team == "csk"): print("Captain is MS Dhoni") elif(team == "rcb"): print("Captain is V Kohli") else: print("please give valid team name")[/code] Output: Captain is MS Dhoni

Python Variables and DataTypes

Python is purely object-oriented: every variable is an object. Unlike Java, you don't need to declare a variable and specify its datatype; Python is intelligent enough to infer the datatype automatically. Below is the syntax to declare a variable in Python: just specify a name and use the (=) operator to assign a value. [code lang="python"] a = 6 b = 7 print(a) print(b)[/code] Output: 6 7 To check the datatype of a variable, use the type() function. [code lang="python"]type(a)[/code] Output: <class 'int'> You can also explicitly convert a variable to a specific datatype, like int(value), float(value), str(value). [code lang="python" highlight="float(5)" light="false"]float_var = float(5) print(float_var) print(type(float_var))[/code] Outp...

How to Use Python For Loop ?

In this article, you'll learn how to use the Python for loop over a range collection, a string, and other collections – see the combined sketch below. Using the Python for loop on a range collection. Using the Python for loop on a string. Using the Python for loop on collections. For any queries or doubts, ask questions in the 24Tutorials Forum.
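Since the excerpt lists the three loop targets without code, here is a minimal combined sketch (the example values are made-up):

[code lang="python"]
# For loop over a range collection
for i in range(3):
    print(i)

# For loop over a string (iterates character by character)
for ch in "csk":
    print(ch)

# For loop over a collection such as a list
teams = ["csk", "rcb", "mi"]
for team in teams:
    print(team)
[/code]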

How to generate DDL (create statement) with columns using Python [code snippets]

Data loading is the initial step in the Big Data Analytics world: you are supposed to push all the data to Hadoop first, and then you can start working on analytics. When loading data into a Hadoop environment, in some cases you will receive it in the form of flat files. Once the data is loaded, if you want to view or query it you need to create a HIVE table on top of it, so a DDL is obviously required to create the Hive table. In real time, you would have to check the file, get the column names, and then create the DDL manually. This tutorial helps you get rid of that manual work: with Python you can create DDLs dynamically in a single click. Let's say we have the incoming data file shown below – Name|ID|ContactInfo|Date_emp Michael|100|547-968-091|2014...
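A minimal sketch of the idea, assuming a pipe-delimited header row, a made-up file name (employee.dat), a made-up table name, and every column defaulting to STRING – not necessarily the post's exact script:

[code lang="python"]
# Read the header line of a pipe-delimited file and emit a Hive DDL.
with open("employee.dat") as f:
    header = f.readline().strip()

columns = header.split("|")
cols_ddl = ",\n  ".join("{} STRING".format(c) for c in columns)

ddl = (
    "CREATE TABLE employee (\n  " + cols_ddl + "\n)\n"
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'\n"
    "STORED AS TEXTFILE;"
)
print(ddl)
[/code]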

All about Python Classes – Demo with examples

Python classes, covered end to end – class definitions, class initialization, and class methods (a minimal sketch follows below).
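A minimal sketch touching all three topics (the Player class and its fields are made-up for illustration):

[code lang="python"]
# A minimal class showing definition, initialization and methods.
class Player:
    # __init__ runs when the object is created (class initialization)
    def __init__(self, name, team):
        self.name = name
        self.team = team

    # An instance method operating on the object's own data
    def describe(self):
        return "{} plays for {}".format(self.name, self.team)

p = Player("MS Dhoni", "csk")
print(p.describe())
[/code]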

Deep dive into Partitioning in Spark – Hash Partitioning and Range Partitioning

Partitions – The data within an RDD is split into several partitions. Properties of partitions:
– Partitions never span multiple machines, i.e., tuples in the same partition are guaranteed to be on the same machine.
– Each machine in the cluster contains one or more partitions.
– The number of partitions to use is configurable. By default, it equals the total number of cores on all executor nodes.
Two kinds of partitioning are available in Spark:
– Hash partitioning
– Range partitioning
Customizing a partitioning is only possible on Pair RDDs. Hash partitioning – Given a Pair RDD that should be grouped:
val purchasesPerCust = purchasesRdd.map(p => (p.customerId, p.price)) // Pair RDD
                                   .groupByKey()
groupByKey first computes per tuple (k, v) its partition p: p = k....
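The excerpt's example is Scala; as an illustrative counterpart, here is a minimal PySpark sketch of hash partitioning with made-up (customerId, price) pairs:

[code lang="python"]
from pyspark import SparkContext

sc = SparkContext("local[*]", "partitioning-demo")

# Made-up (customerId, price) pairs for illustration.
purchases = sc.parallelize([(100, 19.99), (200, 5.49), (100, 3.25), (300, 12.00)])

# Hash partitioning: each key is assigned to a partition by
# hashing the key modulo the number of partitions.
partitioned = purchases.partitionBy(4)

print("num partitions ->", partitioned.getNumPartitions())
print("contents per partition ->", partitioned.glom().collect())

sc.stop()
[/code]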

Spark runtime Architecture – How Spark Jobs are executed

How Spark Jobs are Executed – A Spark application is a set of processes running on a cluster, all coordinated by the driver program. The driver is:
– the process where the main() method of your program runs.
– the process running the code that creates a SparkContext, creates RDDs, and stages up or sends off transformations and actions.
The processes that run computations and store data for your application are executors. Executors:
– run the tasks that represent the application.
– return computed results to the driver.
– provide in-memory storage for cached RDDs.
Execution of a Spark program:
1. The driver program runs the Spark application, which creates a SparkContext upon start-up.
2. The SparkContext connects to a cluster manager (e.g., Mesos/YARN) which allocates resour...
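To make the driver/executor split concrete, a minimal PySpark driver sketch (run locally here; on a real cluster the master and resources come from the cluster manager via spark-submit):

[code lang="python"]
from pyspark import SparkContext

# The driver: creating the SparkContext connects to the cluster manager.
sc = SparkContext("local[*]", "runtime-demo")

# Transformations are staged by the driver; executors run the actual tasks.
rdd = sc.parallelize(range(1, 101))
total = rdd.map(lambda x: x * 2).reduce(lambda a, b: a + b)

# The computed result is returned from the executors to the driver.
print("total ->", total)
sc.stop()
[/code]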
