Author: Sai Kumar

Steps for creating DataFrames, SchemaRDD and performing operations using SparkSQL

Spark SQL: Spark SQL is a Spark module for structured data processing. One use of Spark SQL is to execute SQL queries written in basic SQL syntax. There are several ways to interact with Spark SQL, including SQL, the DataFrame API, and the Dataset API. The backbone of all these operations is the DataFrame and the SchemaRDD.

DataFrames: A DataFrame is a distributed collection of data organised into named columns. It is ...

Word count program in Spark

WordCount in Spark: The WordCount program is the basic "hello world" program of the big data world. Below is a program that performs a word count in Spark in very few lines of code.

[code lang="scala"]
val inputlines = sc.textFile("/users/guest/read.txt")
val words = inputlines.flatMap(line => line.split(" "))
val wMap = words.map(word => (word, 1))
val wOutput = wM...
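The excerpt above is cut off, so the following is only a minimal sketch of the same flatMap, map, and sum-per-key pipeline, written against plain Scala collections so that it runs without a Spark installation. The names `WordCountSketch` and `wordCount` are hypothetical and do not come from the original post; in a Spark job the grouping and summing step would typically be done on an RDD of pairs rather than with `groupBy`.

```scala
object WordCountSketch {
  // Hypothetical helper: counts words in an in-memory sequence of lines.
  // Mirrors the shape of the Spark job: split lines, pair each word with 1, sum per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))      // split each line into words
      .filter(_.nonEmpty)                     // drop empty tokens caused by repeated spaces
      .map(word => (word, 1))                 // pair each word with a count of 1
      .groupBy(_._1)                          // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s per word
}
```

Calling `WordCountSketch.wordCount(Seq("a b a", "b c"))` yields a map in which `a` and `b` each have count 2 and `c` has count 1.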

Reversal of string in Scala using recursive function

Reversal of a string in Scala using a recursive function:

    object reverseString extends App {
      val s = "24Tutorials"
      print(revs(s))

      // Reverse the tail recursively, then append the first character.
      def revs(s: String): String =
        if (s.length <= 1) s // base case: empty or single-character string
        else revs(s.tail) + s.head
        // Equivalent: revs(s.substring(1)) + s.charAt(0)
    }

Output: slairotuT42
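The recursive call above is not in tail position, so very long strings can exhaust the stack. As a hedged alternative, not taken from the original post, the sketch below carries the result in an accumulator so that the compiler can turn the recursion into a loop. The names `ReverseStringTailRec`, `reverse`, and `loop` are hypothetical.

```scala
import scala.annotation.tailrec

object ReverseStringTailRec {
  // Hypothetical variant: reverses a string with an accumulator.
  def reverse(s: String): String = {
    @tailrec // compilation fails if the recursive call is not in tail position
    def loop(remaining: String, acc: String): String =
      if (remaining.isEmpty) acc
      else loop(remaining.tail, remaining.head.toString + acc) // prepend the next character
    loop(s, "")
  }
}
```

`ReverseStringTailRec.reverse("24Tutorials")` returns `slairotuT42`, the same output as the version above, and an empty string is returned unchanged.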

Scala Important topics-Interview questions

Q1) Case classes: A case class is a class that can be used with the match/case statement.
- Case classes can be pattern matched.
- Case classes automatically define hashCode and equals.
- Case classes automatically define getter methods for the constructor arguments.
- Case classes can be seen as plain, immutable data-holding objects that should depend exclusively on their constructor arguments.
- Case cla...
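The properties listed so far can be seen in a small sketch. The `Point` class and the `describe` function are hypothetical names chosen for illustration and are not part of the original post.

```scala
object CaseClassSketch {
  // Hypothetical case class: x and y become public, immutable fields with accessors.
  final case class Point(x: Int, y: Int)

  // Pattern matching works directly on the constructor arguments.
  def describe(p: Point): String = p match {
    case Point(0, 0) => "origin"
    case Point(x, 0) => s"on the x-axis at $x"
    case Point(_, _) => "elsewhere"
  }
}
```

Two instances built from the same arguments compare equal and share a hash code, for example `Point(1, 2) == Point(1, 2)`, and `Point(1, 2).x` reads the first constructor argument without any hand-written getter.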

Hadoop MapReduce Interview Questions

Hadoop MapReduce Interview Questions and Answers

Explain the usage of the Context object.

The Context object helps the mapper interact with the rest of the Hadoop system. It can be used to update counters, report progress, and provide application-level status updates. The Context object holds the configuration details for the job, as well as interfaces that help it generate the o...

Hadoop HDFS Interview questions

What is a block and a block scanner in HDFS?

Block: The minimum amount of data that can be read or written is generally referred to as a "block" in HDFS. The default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later.

Block Scanner: The Block Scanner tracks the list of blocks present on a DataNode and verifies them to detect checksum errors. Block Scanners use a throttling mechanism to reserve disk...
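To make the block size concrete, the sketch below works out how many blocks a file of a given size is split into. It is plain arithmetic rather than a call into any HDFS API, and the names `HdfsBlockSketch`, `blockCount`, and `lastBlockBytes` are hypothetical. The block size is passed in as a parameter because the actual value depends on the cluster configuration.

```scala
object HdfsBlockSketch {
  val MB: Long = 1024L * 1024L

  // Number of blocks needed for a file: ceiling of fileSize / blockSize.
  def blockCount(fileSizeBytes: Long, blockSizeBytes: Long): Long = {
    require(fileSizeBytes >= 0 && blockSizeBytes > 0, "sizes must be non-negative and block size positive")
    (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes
  }

  // Bytes held by the final block; it is only as large as the data that remains.
  def lastBlockBytes(fileSizeBytes: Long, blockSizeBytes: Long): Long = {
    require(fileSizeBytes >= 0 && blockSizeBytes > 0, "sizes must be non-negative and block size positive")
    val remainder = fileSizeBytes % blockSizeBytes
    if (fileSizeBytes == 0) 0L
    else if (remainder == 0) blockSizeBytes
    else remainder
  }
}
```

For a 200 MB file and a 64 MB block size, `blockCount` gives 4 blocks and `lastBlockBytes` gives 8 MB: three full blocks plus one partial block.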

Pig Quick notes

PIG QUICK NOTES:

Pig Latin is the language used to analyze data in Hadoop using Apache Pig. A relation is the outermost structure of the Pig Latin data model, and it is a bag, where:
- A bag is a collection of tuples
- A tuple is an ordered set of fields
- A field is a piece of data

Pig Latin Statements

While processing data using Pig Latin, statements are the basic constructs.
1. These statements work w...

Introduction to Social Media Marketing

Dynamically brand synergistic schemas via cross functional networks. Quickly visualize web-enabled strategic theme areas for cross functional e-business. Enthusiastically productize client-centered web-readiness without cost effective outsourcing. Uniquely target integrated content whereas backend deliverables. Appropriately simplify viral bandwidth via premier users. Continually formulate virtual...

Introduction to Search Engine Marketing Online Tutorials

Dynamically brand synergistic schemas via cross functional networks. Quickly visualize web-enabled strategic theme areas for cross functional e-business. Enthusiastically productize client-centered web-readiness without cost effective outsourcing. Uniquely target integrated content whereas backend deliverables. Appropriately simplify viral bandwidth via premier users. Continually formulate virtual...

Introduction to Affiliate Marketing Online Tutorials

Dynamically brand synergistic schemas via cross functional networks. Quickly visualize web-enabled strategic theme areas for cross functional e-business. Enthusiastically productize client-centered web-readiness without cost effective outsourcing. Uniquely target integrated content whereas backend deliverables. Appropriately simplify viral bandwidth via premier users. Continually formulate virtual...

Introduction to Google AdSense Online Tutorials

Completely extend intuitive potentialities before an expanded array of web services. Appropriately communicate front-end process improvements through interactive imperatives. Conveniently grow excellent results rather than integrated supply chains. Rapidiously recaptiualize. Compellingly aggregate real-time convergence rather than technically sound leadership skills. Rapidiously mesh backend netwo...

Introduction to Email Marketing Online Tutorials

Interactively disseminate extensive ROI via scalable vortals. Completely streamline team building imperatives before reliable technology. Appropriately generate next-generation alignments through real-time initiatives. Distinctively innovate e-business growth strategies through parallel platforms. Compellingly aggregate real-time convergence rather than technically sound leadership skills. Rapidiou...


24 Tutorials