Estimate the unit sales of Walmart retail goods

Table of contents:

  1. Business Problem
  2. Source of Data
  3. Why ML approach, and Why not normal Statistical Method
  4. Performance metrics
  5. Existing approaches to the problem
  6. Exploratory Data Analysis
  7. First Cut Solution
  8. Model Explanation
  9. Comparison of models
  10. Future Work
  11. References

Business Problem

In the era of online stores, it has become essential to know the near-future demand for any product. Knowing it helps stores stock the products whose demand is going to be high, which directly improves the company’s revenue.

i.e. if we have an event next week, definitely the…

Hypothesis testing is a statistical method for checking whether the result of an experiment is meaningful.

According to Wikipedia:

“A statistical hypothesis is a hypothesis that is testable on the basis of observed data modeled as the realized values taken by a collection of random variables.”

Table of Contents

  • Hypothesis Testing Methodology
  • Steps to perform Hypothesis Testing
  • Case Study 1: Is there any difference in height between the students of classroom 1 and classroom 2?
  • Case Study 2: A company manufacturing RAM chips claims that the defective rate of the population is 5%. Let p denote the true defective probability.
  • Properties…
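As a rough illustration of Case Study 2, here is a minimal sketch of a one-proportion z-test against the claimed 5% defective rate. The sample (10 defective chips out of 100), the two-sided alternative, the 0.05 significance level and the function name `one_proportion_z_test` are all assumptions made for this example, not values from the case study.

```python
import math

def one_proportion_z_test(defects, n, p0):
    """Two-sided z-test of H0: p == p0 (normal approximation)."""
    p_hat = defects / n                      # observed defective rate
    std_err = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / std_err
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical sample: 10 defective chips out of 100 inspected
z, p_value = one_proportion_z_test(defects=10, n=100, p0=0.05)
alpha = 0.05  # assumed significance level

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The normal approximation is only reasonable for fairly large samples; for small samples an exact binomial test would be the safer choice.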

Recently, I was looking for a way to run Jupyter Notebook on my Android phone, but there was no straightforward way to install it. So I decided to write about it.

We will divide it into two parts: the first covers installing the applications, and the second covers the Android setup.

1. Installation

There are two things you need to install from the Google Play Store to get started.

1. Install Pydroid 3 - IDE for Python 3 from the Google Play Store

2. Install the Pydroid repository plugin from the Google Play Store

Automating any activity in a web browser is called web automation. Selenium is one of the most widely used open-source frameworks for automation.

I have divided this blog into two parts. The first part covers an introduction to Selenium and its installation for Python, and the second part covers automation using Python.

What is Selenium?

Selenium is a portable framework for testing web applications. It provides a single interface that lets you write test scripts in programming languages like Ruby, Java, NodeJS, PHP, Perl, Python, and C#, among others. …

This is my second blog on TensorFlow 2.0, and in it I’ll explain image classification on the CIFAR-10 dataset. CIFAR stands for Canadian Institute For Advanced Research, and 10 refers to its 10 classes. The dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

Let’s start by importing the libraries and the dataset.

  • Lines 2–5 import TensorFlow, Keras, NumPy and Matplotlib
  • Line 8 loads the CIFAR-10 dataset into the cifar variable
  • Line 9 stores the data in two sets, since calling cifar.load_data() returns the dataset as a training set and a test set
  • line…

The final release of TensorFlow 2.0 was announced on September 30, 2019. TensorFlow is Google’s open-source AI framework for machine learning and high-performance numerical computation. It supports many classification and regression algorithms and, more generally, deep learning and neural networks.

What’s new in TensorFlow 2.0

  • Powerful experimentation
  • Robust model deployment in production
  • No need to create a session
  • Low-level API support
  • Up to 3 times faster training performance

Let’s start with the Fashion-MNIST dataset

Fashion-MNIST is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated…

Personalized Medicine: Redefining Cancer Treatment

Machine learning is tough, and you can only gain confidence through hands-on practice. In this blog, I use the Personalized Medicine: Redefining Cancer Treatment dataset from Kaggle to give a glimpse of machine learning. It comes from a competition organized in 2017 by NIPS.

Overview of the dataset: The dataset has 9 different classes that we have to predict. The training and test datasets are each provided in two different files. One (training/test_variants) provides information about the genetic mutations, whereas the other (training/test_text) provides the clinical evidence (text) that human experts used to classify the genetic mutations.

File Descriptions:

  • training_variants

A confusion matrix is used to measure the performance of a classification model. Judging model performance by accuracy alone is sometimes misleading when the data is imbalanced. You can read more about accuracy here.

So what is a confusion matrix?

It is a performance metric for classification models whose output is binary or multiclass. For binary classification, it is a table of the 4 different combinations of predicted and actual values.

There are two things to notice in the above image:

  • Predicted values: values that are predicted by the model.
  • Actual values: values that are actually in the dataset.

Here, we use binary classification to understand the model. Positive…
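The four combinations of predicted and actual values can be counted with a few lines of Python. This is a minimal sketch: the labels below are made up for illustration, class 1 is assumed to be the positive class, and the helper name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the 4 cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Hypothetical labels, for illustration only
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
```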

In machine learning, one of the main things we want to know is how well our model is performing. There are many techniques to measure the performance of a model. Today we will discuss accuracy.

Accuracy is defined as the number of correctly classified points divided by the total number of points in the test set.

Accuracy = (number of correctly classified points) / (total number of points in the test set)

Suppose we have 1000 data points, of which 600 are positive and 400 are negative. …
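The accuracy formula can be sketched in Python as follows. The 600 positive and 400 negative points match the example above, but the model’s predictions (540 positives and 320 negatives classified correctly) are assumed purely for illustration, and the function name `accuracy` is hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of points whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# 600 positive points (label 1) followed by 400 negative points (label 0)
y_true = [1] * 600 + [0] * 400

# Assumed predictions: 540 of the positives and 320 of the negatives are correct
y_pred = [1] * 540 + [0] * 60 + [0] * 320 + [1] * 80

acc = accuracy(y_true, y_pred)
print(f"Accuracy = {acc:.2f}")
```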

In machine learning, we deal with imbalanced datasets most of the time. Before learning how to deal with one, let’s see what it is.

There are two types of dataset: balanced and imbalanced. Suppose we have a 2-class classification problem, where n1 is the number of positive points and n2 is the number of negative points.


n = n1 + n2
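One quick way to see how balanced a dataset is, is to compute each class’s share of the n points. This is a minimal sketch: the 900 positive and 100 negative labels are assumed for illustration, and the helper name `class_proportions` is hypothetical.

```python
from collections import Counter

def class_proportions(labels):
    """Return each class's share of the total number of points n."""
    counts = Counter(labels)
    n = sum(counts.values())  # n = n1 + n2 in the 2-class case
    return {label: count / n for label, count in counts.items()}

# Hypothetical labels: n1 = 900 positive points, n2 = 100 negative points
labels = [1] * 900 + [0] * 100

props = class_proportions(labels)
print(props)
```

The closer the proportions are to each other, the more balanced the dataset is.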

