Monday, 31 March 2014

R training course outline

This program is designed for professionals who want to solve real-world business problems using complex structured and unstructured data. It emphasizes asking more meaningful research and business questions and communicating findings effectively.

We offer a blend of face-to-face weekend classes and online courses.

Audience:

Recent college graduates
Business Intelligence professionals
Professionals aspiring to take up Data Scientist certification
Big Data Practitioners & Analytics Engineers
Business and Data Analysts
Analytics Project Managers

Prerequisites: No statistics or Java experience is required. Basic statistics knowledge is preferable but not mandatory.

For queries about duration, training or fees, please contact 9840014739.

Module 1

Big Data & Data Science Introduction
    Introduction to Big Data & Analytics
    Hadoop ecosystem
    Big Data use cases
    Data Science Overview
    Role of Data Scientist
    Hadoop Architecture

Module 2

Data Acquisition
(i) HDFS shell commands
(ii) Install and configure the Hadoop tools Sqoop, Flume and Pig
(iii) Export and import data using Sqoop. Extract data from web logs using Flume.
(iv) Extract, transform and load data from web logs and databases into a Hadoop cluster using Sqoop, Pig and Flume.

Module 3

Data Evaluation & Transformation
(i) Working with various file formats including binary files, JSON, XML and CSV
(ii) Evaluating data using various tools
(iii) Understanding data set sampling and filtering
(iv) Writing map-only Hadoop jobs
(v) Joining data sets
(vi) Writing records in new formats such as SequenceFileOutputFormat and AvroOutputFormat

Module 4

Statistics Level 1
Introduction to Statistics
Samples & Populations 
Statistics Basics
Sampling Concepts
Sample Selection methods
Presenting categorical variables in charts using R
Presenting numerical variables in charts using R

Descriptive Statistics
Measuring the Central Tendency using R
Measuring Spread – quartiles and 5-number summary
Measures of Position
Measures of Variation
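
To give a flavour of the descriptive statistics topics above, here is a minimal sketch in base R. The `sales` vector is hypothetical data invented for illustration, and the variable names are assumptions rather than course material.

```r
# Hypothetical sample: monthly sales figures invented for this illustration
sales <- c(12, 15, 11, 19, 15, 22, 30, 15, 18)

# Central tendency
sales_mean   <- mean(sales)
sales_median <- median(sales)

# Base R has no built-in statistical mode, so tabulate the values
# and take the most frequent one
sales_mode <- as.numeric(names(which.max(table(sales))))

# Position and spread
sales_quartiles <- quantile(sales, probs = c(0.25, 0.5, 0.75))
sales_fivenum   <- fivenum(sales)  # min, lower hinge, median, upper hinge, max
sales_iqr       <- IQR(sales)      # interquartile range
sales_var       <- var(sales)      # sample variance (n - 1 denominator)
sales_sd        <- sd(sales)       # sample standard deviation

print(summary(sales))
```

Note that `fivenum()` reports Tukey's hinges, which can differ slightly from the quartiles returned by `quantile()` depending on the sample size.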

Statistics Level 2
Inferential Statistics
Probability
Rules of Probabilities, Assigning Probabilities
Probability Distributions
Binomial and Poisson Probability Distributions
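
As a rough illustration of the binomial and Poisson topics above, the sketch below uses base R's distribution functions. The scenarios (10 trials with success probability 0.3, and an average rate of 4 events per interval) are assumptions chosen only for the example.

```r
# Assumed scenario: 10 independent trials, each with success probability 0.3
n <- 10
p <- 0.3

p_exactly_3 <- dbinom(3, size = n, prob = p)  # P(X = 3)
p_at_most_3 <- pbinom(3, size = n, prob = p)  # P(X <= 3)

# Assumed scenario: events arriving at an average rate of 4 per interval
lambda <- 4

p_exactly_2   <- dpois(2, lambda = lambda)                      # P(Y = 2)
p_more_than_6 <- ppois(6, lambda = lambda, lower.tail = FALSE)  # P(Y > 6)

print(c(p_exactly_3, p_at_most_3, p_exactly_2, p_more_than_6))
```

The `d` prefix gives the probability of an exact count, while the `p` prefix gives a cumulative probability; `lower.tail = FALSE` switches to the upper tail.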

Hypothesis Testing
          Session 1
          Session 2 & 3

ANOVA & Chi-Square Testing
          Session 1
          Session 2
          Session 3 & 4

Module 5

Introduction to R
R History
Integrating R with Hadoop
Basic Data Types
Vectors
Factors
Matrix
List
Data Frame
Creating and transforming data sets using R MapReduce programs
Basic data manipulation using R

Module 6

Data Science & Machine Learning – Level 1
Data Science fundamentals
Data Science use cases
Machine Learning fundamentals
Types of Machine Learning algorithms
Identify the algorithms appropriate to each model
Building machine learning models using Apache Mahout & R
Supervised machine learning
       Fundamentals of Regression
       Steps for training a model on known data in order to identify new data
       Linear Regression – Forecasting
       Logistic Regression – Forecasting
       Support Vector Machines
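
The idea of training on known data and then forecasting for new data can be sketched with base R's `lm()` and `predict()`. The advertising spend and sales figures below are simulated purely for illustration; the column names and the underlying relationship are assumptions, not course data.

```r
# Hypothetical training data: advertising spend vs. sales, simulated for this example
set.seed(42)
spend <- seq(10, 100, by = 10)
sales <- 5 + 2 * spend + rnorm(length(spend), sd = 3)
train <- data.frame(spend = spend, sales = sales)

# Train a linear regression model on the known data
model <- lm(sales ~ spend, data = train)

# Forecast sales for new, unseen spend values
new_data <- data.frame(spend = c(110, 120))
forecast <- predict(model, newdata = new_data)

print(coef(model))
print(forecast)
```

The same train-then-predict pattern carries over to logistic regression, where `glm()` with `family = binomial` would take the place of `lm()`.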

Machine Learning – Level 2
Unsupervised Machine Learning
        Market Basket Analysis using Association Rules
        Clustering fundamentals and its use cases
        Cluster Analysis
        Decision Trees
        Time Series Analysis
        K-means clustering
Recommendation Algorithms
        Item based
        User based

Text Mining

Module 7

Model Optimization
Bagging
Boosting
Random Forests

Case Study
Insurance, Retail, Telecom and BFS domains