SmartInternz

Objectives of the FDP

  1. To understand the need for, and the complexities of, analyzing large volumes of data.
  2. To provide a basic understanding of the concepts and methods for implementing machine learning.
  3. To provide an in-depth understanding of various machine learning algorithms.
  4. To understand the concepts of Big Data Analytics.
  5. To gain hands-on experience in machine learning using PySpark.
  6. To develop predictive models for real-world business scenarios using PySpark.

Learning Outcomes

Upon completion of the faculty development programme, a participant will be able to:

  1. Understand the importance of machine learning in data analysis and modeling.
  2. Get exposure to a variety of machine learning algorithms.
  3. Understand the process of automation with machine learning using Spark.
  4. Gain experience in independent study and research on real-world big data problems using scalable machine learning practices.

Speaker

Mr. P. Mohan

Sr. Data Scientist @ Tech Mahindra, Hyderabad.

Rich industry experience in applied Machine Learning, Artificial Intelligence and Big Data Analytics.
Project domains include Healthcare, Banking, Manufacturing, Retail, etc.

Modules to be covered

The six-day faculty development programme covers concepts ranging from the fundamentals of machine learning and big data analytics to the development and deployment of predictive data models using Spark. The sessions briefly cover:

Day Topics
Day-1

Introduction to Machine Learning

  1. Types of Machine Learning
  2. Understanding Math for Machine Learning

Understanding Big Data Analytics

  1. A quick overview of the Hadoop environment

Getting started with Spark

  1. Understanding the Spark programming model
  2. Spark clusters and DataFrames
  3. Machine learning algorithms provided by Spark
  4. Spark vs Hadoop

Designing a Machine Learning System

  1. Business use cases on Customer Segmentation, Personalization
  2. Data cleansing

Preprocessing and Preparing Data with Spark

  1. Exploring and visualizing data
  2. Data processing and transformation
  3. Feature Extraction

Day-2

Building Classification Models with Spark

  1. Introduction to types of Classification Models
  2. Logistic Regression
  3. Decision Trees
  4. Random Forest Classifier
  5. Evaluating performance of models
  6. Improving model performance and hyperparameter tuning
    1. Understanding Accuracy and Prediction
    2. Precision and Recall
    3. Working with ROC curves and AUC

Day-3

Developing Regression Models with Spark

  1. Linear Regression
  2. Multiple Linear Regression
  3. Feature Engineering
  4. Model evaluation
    1. Understanding Mean Squared Error and Root Mean Squared Error
    2. Mean Absolute Error and R-squared coefficient
  5. Improving model performance and hyperparameter tuning

Day-4

Developing a Clustering Model with Spark

  1. Introduction to types of clustering
  2. K-Means clustering
  3. Feature Engineering
  4. Hierarchical clustering
  5. Evaluating model performance
    1. Internal and External evaluation metrics
  6. Improving model performance

Day-5

Building a Recommendation Engine with Spark

  1. Content based filtering
  2. Collaborative filtering
  3. Training and using a recommendation model
  4. Evaluating performance of recommendation models
    1. Working with Mean Squared Error
    2. Mean average precision at K

Day-6

Natural Language Processing

  1. Introduction
  2. NLP framework: Tokenization, Stemming, Count Vectorization, TF-IDF
  3. Sentiment Analysis using NLP

Registrations