Archives

Polynomial Logistic Regression [Case Study]

Understand Power of Polynomials with Polynomial Regression. Polynomial regression is a special case of linear regression; the main idea lies in how you select your features. Consider a multivariate regression with 2 variables, x1 and x2. Linear regression will look like this: y = a1 * x1 + a2 * x2. Now suppose you want a polynomial regression (let’s make it a 2nd-degree polynomial). We create a few additional features: x1*x2, x1^2 and x2^2. So we get this ‘linear regression’: y = a1 * x1 + a2 * x2 + a3 * x1*x2 + a4 * x1^2 + a5 * x2^2. A polynomial term, that is a quadratic (squared) or cubic (cubed) term, turns a linear regression model into a curve. But because it is the data X that is squared or cubed, not the Beta coefficient, it still qualifies as a linear model. Thi...
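The feature expansion described in this excerpt can be sketched in a few lines of Python. This is only a minimal illustration, not the post's own code: the function names and the coefficient values a1..a5 are hypothetical, chosen to show that the prediction remains a plain weighted sum of the expanded features.

```python
def polynomial_features(x1, x2):
    """Expand (x1, x2) into the degree-2 features [x1, x2, x1*x2, x1^2, x2^2]."""
    return [x1, x2, x1 * x2, x1 ** 2, x2 ** 2]


def predict(coeffs, x1, x2):
    """Weighted sum of the expanded features: linear in the coefficients."""
    features = polynomial_features(x1, x2)
    return sum(a * f for a, f in zip(coeffs, features))


# Hypothetical coefficients a1..a5, used only for illustration.
coeffs = [1.0, 2.0, 0.5, 0.25, -1.0]

print(polynomial_features(2, 3))  # [2, 3, 6, 4, 9]
print(predict(coeffs, 2, 3))      # 3.0
```

The curve comes entirely from the squared and product features; the model itself is still fitted exactly like an ordinary linear regression on five inputs.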

PCA for Fast ML

Speeding Up and Benchmarking Logistic Regression With PCA. Introduction: When data has too many dimensions, pattern learning becomes a problem. Too much information hurts two things: compute and execution time, and the quality of the model fit. When the dimension of the data is too high, we need to find a way to reduce it. But that reduction has to be done in such a way that we maintain the original pattern of the data. The algorithm that we are going to discuss in this article does exactly this job. It is quite famous and widely used in a variety of tasks. Its name is Principal Component Analysis, aka PCA. The main purpose of principal component analysis is to analyze data to identify patterns, and to use those patterns to reduce the dimensions of the data...

Naive Bayes Algorithm [Case Study]

Simple Progression Towards Simple Linear Regression. Introduction: Naive Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a piece of clothing may be considered to be a shirt if it is red, printed, and has full sleeves. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this piece of clothing is a shirt, and that is why it is known as ‘Naive’. Classification in Machine Learning is a technique of learning where a particular instance is mapped against one of the many labels. The labels are pre...
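The independence assumption in this excerpt can be made concrete with a small sketch. Every probability below is hypothetical and invented purely for illustration; the point is only that each feature contributes its own factor to the score of a class, and the factors are simply multiplied.

```python
def naive_bayes_posteriors(priors, likelihoods, features):
    """Score each label as prior * product of per-feature likelihoods, then normalize."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in features:
            # Naive assumption: each feature is treated as independent given the label.
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Hypothetical prior and conditional probabilities (not taken from any dataset).
priors = {"shirt": 0.5, "other": 0.5}
likelihoods = {
    "shirt": {"red": 0.8, "printed": 0.5, "full_sleeve": 0.5},
    "other": {"red": 0.2, "printed": 0.5, "full_sleeve": 0.5},
}

posteriors = naive_bayes_posteriors(priors, likelihoods, ["red", "printed", "full_sleeve"])
print(posteriors)
```

With these made-up numbers the "shirt" label ends up with the larger posterior, because the only feature that distinguishes the two classes is "red".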

Multivariate MultiLabel Classification with Logistic Regression [Case Study]

Multivariate multilabel classification with Logistic Regression. Introduction: The goal of the blog post is to show you how logistic regression can be applied to multi-class classification. We will mainly focus on learning to build a multivariate logistic regression model for multi-class classification. The data cleaning and preprocessing parts will be covered in detail in an upcoming post. Logistic regression is one of the most fundamental and widely used Machine Learning algorithms. It is usually among the first few topics people pick up while learning predictive modeling. Logistic regression is not a regression algorithm but actually a probabilistic classification model. Classification in Machine Learning is a technique of learning where a particular instance...

Multivariate Linear Regression [Case Study]

Learn To Make Prediction By Using Multiple Variables. Introduction: The goal of the blogpost is to equip beginners with the basics of the Linear Regression algorithm with multiple features and quickly help them build their first model. This is also known as multivariable Linear Regression. We will mainly focus on the modeling side of it. The data cleaning and preprocessing parts will be covered in detail in an upcoming post. A multivariable model can be thought of as a model in which multiple variables are found on the right side of the model equation. This type of statistical model can be used to assess the relationship between a number of variables. A simple linear regression model has a continuous outcome and one predictor, whereas a multiple or multivariable linear regression...

k-Nearest Neighbors Classification algorithm [Case Study]

Predicting car quality with the help of Neighbors. Introduction: The goal of the blogpost is to get beginners started with the fundamental concepts of the K Nearest Neighbour classification algorithm, popularly known as the KNN classifier. We will mainly focus on learning to build your first KNN model. The data cleaning and preprocessing parts will be covered in detail in an upcoming post. Classification in Machine Learning is a technique of learning where a particular instance is mapped against one of the many labels. The labels are prespecified to train your model. The machine learns the pattern from the data in such a way that the learned representation successfully maps the original dimension to the suggested label/class without any more intervention from a human expert. How does...

K-Means model for Predicting Car quality [Case Study]

Problem Statement: To build a simple K-Means model for clustering the car data into different groups.
Data details:
1. Title: Car Evaluation Database. The dataset is available at “http://archive.ics.uci.edu/ml/datasets/Car+Evaluation”
2. Sources: (a) Creator: Marko Bohanec (b) Donors: Marko Bohanec (marko.bohanec@ijs.si), Blaz Zupan (blaz.zupan@ijs.si) (c) Date: June, 1997
3. Past Usage: The hierarchical decision model, from which this dataset is derived, was first presented in M. Bohanec and V. Rajkovic: Knowledge acquisition and explanation for multi-attribute decision making. In 8th Intl Workshop on Expert Systems and their Applications, Avignon, France. pages 59-78, 1988. Within machine-learning, ...

K-Means Clustering Algorithm

Introduction: The goal of the blogpost is to get beginners started with the fundamental concepts of the K Means clustering algorithm. We will mainly focus on learning to build your first K Means clustering model. The data cleaning and preprocessing parts will be covered in detail in an upcoming post. Clustering: Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects which are “similar” to each other and “dissimilar” to the objects belonging to other clusters. We can show this w...

Using Decision Trees for Regression Problems [Case Study]

Introduction: The goal of the blogpost is to equip beginners with the basics of the Decision Tree Regressor algorithm and quickly help them build their first model. We will mainly focus on the modelling side of it. The data cleaning and preprocessing parts will be covered in detail in an upcoming post. In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and what is estimated. The MSE is a measure of the quality of an estimator: it is always non-negative, and values closer to zero are better. The Mean Squared Error is given by: MSE = (1/n) * sum over i of (y_i - ŷ_i)^2, where y_i is the observed value and ŷ_i is the estimated value. Enough theory, let’s start with the implementation. P...
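The definition of the mean squared error in this excerpt, the average squared difference between estimated and actual values, can be sketched directly in plain Python. The function name and the sample numbers are assumptions made for illustration, not values from the post.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical observed and estimated values.
actual = [3, 5, 7]
predicted = [2, 5, 9]

# Squared errors are 1, 0 and 4, so the mean is 5/3.
print(mean_squared_error(actual, predicted))
print(mean_squared_error(actual, actual))  # 0.0 for a perfect fit
```

Because every error is squared, the result can never be negative, and a value of zero means the estimates match the observations exactly.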

Decision Tree model for prediction of Car quality [Case Study]

Problem Statement: To build a Decision Tree model for prediction of car quality given other attributes about the car.
Data details:
1. Title: Car Evaluation Database. The dataset is available at “http://archive.ics.uci.edu/ml/datasets/Car+Evaluation”
2. Sources: (a) Creator: Marko Bohanec (b) Donors: Marko Bohanec (marko.bohanec@ijs.si), Blaz Zupan (blaz.zupan@ijs.si) (c) Date: June, 1997
3. Past Usage: The hierarchical decision model, from which this dataset is derived, was first presented in M. Bohanec and V. Rajkovic: Knowledge acquisition and explanation for multi-attribute decision making. In 8th Intl Workshop on Expert Systems and their Applications, Avignon, France. pages 59-78, 1988. With...

Making Intelligent decisions with Decision Trees

Introduction: In this blog we will discuss a Machine Learning algorithm called the Decision Tree. The goal of the blogpost is to get beginners started with the fundamental concepts of a Decision Tree and help them quickly develop their first tree model. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences. It is one way to display an algorithm that only contains conditional control statements. A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification ru...
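The flowchart-like structure described in this excerpt can be sketched as a nested dictionary, where each internal node tests one attribute, each branch is one outcome of that test, and each leaf is a class label. The attribute names, outcomes and labels below are hypothetical, and the tree is written by hand rather than learned from data.

```python
# A hand-written tree: dicts are internal nodes, plain strings are leaves.
tree = {
    "attribute": "safety",
    "branches": {
        "low": "unacceptable",
        "high": {
            "attribute": "price",
            "branches": {"low": "good", "high": "acceptable"},
        },
    },
}


def classify(node, instance):
    """Follow one root-to-leaf path by applying each node's attribute test."""
    while isinstance(node, dict):
        outcome = instance[node["attribute"]]
        node = node["branches"][outcome]
    return node


print(classify(tree, {"safety": "high", "price": "low"}))  # good
print(classify(tree, {"safety": "low"}))                   # unacceptable
```

Each call traces a single path from the root to a leaf, which is the same thing as evaluating a chain of conditional statements.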

Important Pillars of Stats: Covariance and Correlation

Introduction: Covariance and Correlation are two mathematical concepts which are quite commonly used in statistics. When comparing data samples from different populations, both of them describe the relationship and measure the dependency between two random variables. Covariance and correlation show that variables can have a positive relationship, a negative relationship, or no relationship at all. A sample is a randomly chosen selection of elements from an underlying population. We calculate covariance and correlation on samples rather than on the complete population. Covariance and Correlation measured on samples are known as sample covariance and sample correlation. Sample Covariance: Covariance measures the extent to which the relationship between two variables is linear. The sign of ...
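Sample covariance and sample correlation, as introduced in this excerpt, can be sketched in plain Python. This is a minimal illustration that assumes the usual n - 1 denominator for samples and the Pearson form of correlation; the function names and data values are hypothetical.

```python
import math


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    variance_x = sample_covariance(xs, xs)
    variance_y = sample_covariance(ys, ys)
    return sample_covariance(xs, ys) / math.sqrt(variance_x * variance_y)


# Hypothetical samples: ys rises with xs, reversed_ys falls with xs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
reversed_ys = list(reversed(ys))

print(sample_covariance(xs, ys), sample_correlation(xs, ys))
print(sample_covariance(xs, reversed_ys), sample_correlation(xs, reversed_ys))
```

A positive value indicates that the two variables move together, and a negative value indicates that they move in opposite directions. Correlation removes the units, which makes values comparable across different pairs of variables.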


24 Tutorials