INSTRUCTOR LED ONLINE LIVE SESSIONS

Big Data Specialization

Course Brief

How big is BIG? Become a big data expert through an intensive training program, customised across levels and designed specifically for you, in which participants solve real-time problems with huge datasets. Through this intensive program we aim to prepare participants to appear for the international certifications listed below:

  • Hortonworks Data Platform Certified Developer: Java (HDPCD:Java)
  • Hortonworks Data Platform Certified Developer: Spark (HDPCD:Spark)
  • Hortonworks Data Platform Certified Developer (HDPCD)
  • CCA Spark from Cloudera

Course Structure:

There are four modules in the Big Data Specialization.

  • Data Science with R
  • Hadoop with MapReduce using Java
  • Hadoop Ecosystem (Sqoop, Flume, Pig, Hive)
  • Spark with Scala

Introduction - Definition - Data Science in various fields - Examples - Impact of Data Science - Major Activities - Toolkit - Data Scientist - Comparison with other roles - Data Science Team

    Learning Outcomes:

    • Understand Data Science and related fields
    • Be able to identify the major data science activities for a given problem
    • Understand the role of a Data Scientist and how it differs from a data engineer and a data analyst
    • Be able to choose a deployment model for an organization
    • Understand how to create a data science team

Introduction to R: What is R - Data Science with other languages - Features of R - Environment - R at a glance.

Basics of R (Series & Control Statements): Assignment - Modes - Operators - Special numbers - Logical values - Basic Functions - Generating data sets - Control Structures.

Vectors: Definition - Declaration - Generating - Indexing - Naming - Adding & Removing elements - Operations on Vectors - Recycling - Special Operators - Functions for vectors - Missing values - NULL values - Filtering & subsetting - Exercises.

    Learning Outcomes:

    • Understand how R differs from other languages
    • Be able to write R scripts for a given problem
    • Be able to generate series
    • Be able to handle data in vectors and get the required results from given data sets

Descriptive Statistics: Introduction - Descriptive Statistics - Central Tendency - Variability - Mean - Median - Range - Variance - Summary - Exercises.
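
The course implements these measures in R; as a language-neutral illustration, the same statistics can be computed with Python's standard library (the sample data here is invented):

```python
# Central tendency and variability for a small sample.
import statistics

data = [12, 15, 11, 19, 15, 22, 15]

mean = statistics.mean(data)          # arithmetic mean
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
rng = max(data) - min(data)           # range
variance = statistics.variance(data)  # sample variance (n - 1 denominator)

print(mean, median, mode, rng, variance)
```

The R equivalents are mean(), median(), range(), and var() on a numeric vector.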

Graphics: Introduction - Types - Packages - Basic graph - Histograms - Stem Leaf Graph - Box Plots - Scatter Plots - Bar Plots.

    Learning Outcomes:

    • Understand the importance of statistics, its types, and statistics in the real world
    • Be able to find the central tendency and summary of given data sets
    • Understand the importance of graphical output and the various graph types
    • Be able to plot various graphs for a given data set
    • Implement descriptive statistics in R

Arrays: Creating Arrays - Dimensions & Naming - Indexing & Naming - Functions on Arrays.

Matrices: Creating Matrices - Adding rows/columns - Removing rows/columns - Reshaping - Operations - Special functions.

Lists: Creating - Naming - Accessing elements - Adding - Removing - Special Functions - Recursive Lists.

Data frames: Creating - Naming - Accessing - Adding - Removing - Special functions - Merging - Exercises.

Functions: Creating - Functions on Function Object - Scope of Variables - Accessing Global Environment - Closures - Recursion - Creating New Binary Operator.
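
Closures and recursion are language-neutral ideas; the course covers the R constructs (function objects, variable scope, closures). A sketch in Python, with a hypothetical `make_counter` helper invented for illustration:

```python
# A closure captures and updates state from its enclosing function.
def make_counter(start=0):
    """Return a closure that remembers `count` between calls."""
    count = start
    def increment():
        nonlocal count  # update the enclosing scope, like <<- in R
        count += 1
        return count
    return increment

# A classic recursive function.
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

counter = make_counter()
counter(); counter()
print(counter())     # 3 -- state persisted across calls
print(factorial(5))  # 120
```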

    Learning Outcomes:

    • Understand various data structures in R
    • Be able to choose suitable data structure for the given data set
    • Be able to retrieve the required result from the given data set
    • Be able to solve the problems by creating functions
    • Be able to merge and split the data sets
    • Be able to apply statistics on various data structures

Linear Regression: Inferential Statistics - Types of Learning - Linear Regression - Simple Linear Regression - Coefficients - Confidence Interval - RSE - R² - Implementation in R - lm - Functions on lm - predict - Plotting - Fitting a regression line - Exercises.

Multiple Linear Regression: Introduction - Comparison with simple linear regression - Correlation Matrix - F Statistic - Response vs Predictors - Deciding important variables - Model fit - Predictions.

Generating a model - Interaction terms - Non-linear Transformations - ANOVA - lm with polynomials - Exercises.

Classification & Logistic Regression: Classification - Examples - Logistic Regression Definition - Estimating coefficients - Predictions - Multiple Logistic Regression - More than 2 response classes - Implementation in R - glm - predict - Exercises.

    Learning Outcomes:

    • Understand Inferential Statistics, types and regression concepts
    • Understand the population and sample for the given data set
    • Be able to understand how to fit a model for the given data set
    • Be able to find the relation between response and predictors
    • Be able to predict the values for given data set based on sample data set
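
The course fits these models in R with lm(); as a language-neutral sketch, the ordinary-least-squares formulas for simple linear regression can be computed directly. The `fit_line` helper and sample points below are invented for illustration:

```python
# Simple linear regression by ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

In R, `lm(y ~ x)` computes the same coefficients, plus the confidence intervals, RSE, and R² covered in the module.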

Mr. P.V.N.Balarama Murthy
Data Science with R

Mr. P.V.N.Balarama Murthy holds an M.Tech (CSE) and has over 10 years of teaching and technical training experience. He is a specialist in Data Science and Big Data, and has experience in deploying Hadoop clusters. As a technical trainer, he has trained a number of people in C, C++, Java, Oracle, Hadoop (administration, and development with MR, Pig, Hive, Flume, Sqoop) and Data Science with R. He has guided 15+ students to Hortonworks certifications for Hadoop.

A dedicated, resourceful and result-oriented instructor, he helps shape the careers of his students.

Ms. Jyothi SanjeevaMani
Data Science with R

Ms. Jyothi SanjeevaMani has over 15 years of teaching and technical training experience. She is a research scholar in Big Data Analytics at a reputed university. As a technical trainer she has trained many students in industry-oriented subjects such as C, C++, Java, MySQL, Oracle (SQL, PL/SQL), Python, Linux, OpenStack, Big Data - Hadoop (MapReduce, Pig, Hive, Sqoop, Flume), and Data Science with both Python and R.

She is an Assistant Professor with the Department of IT at Keshav Memorial Institute of Technology (KMIT).

She is a dedicated, resourceful and result-oriented instructor who strives to help students turn marginal grades into good grades.

    Introduction - Data, Storage, Big Data, Distributed environment - Hadoop introduction: History, Environment, Benefits - Subprojects: HDFS, MapReduce, Pig, HBase, Hive, ZooKeeper, Sqoop, Mahout, MongoDB, HadoopDB.

    Learning Outcomes:

    • Understand big data, its challenges, and distributed environments
    • Know Hadoop and its subprojects

    Hadoop Architecture: Overall Architecture - NameNode - DataNode - Fault Tolerance - Read & Write operations - Interfaces (Command line interface, JSP, API) - HDFS Shell - FS Shell Commands - Java API Programs.

    Learning Outcomes:

    • Acquire knowledge of the HDFS components: NameNode and DataNode
    • Acquire knowledge of storing and maintaining data in a cluster, and of reading and writing data to/from the cluster
    • Be able to maintain files in HDFS
    • Be able to access data in HDFS from a Java program

    Map-Reduce Introduction - Map-Reduce Architecture - YARN Architecture - Basic M-R Programs - Detailed description of M-R methods - Exercises.

    Learning Outcomes:

    • Understand the Map-Reduce paradigm and YARN architecture
    • Analyze a given problem in the map-reduce pattern
    • Be able to write basic Map-Reduce programs
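
The map → shuffle → reduce flow just outlined can be sketched with plain Python functions using the canonical word-count example (in the course it is written as Java Mapper/Reducer classes running on Hadoop; the input lines here are invented):

```python
# Word count, simulating the three MapReduce phases in-process.
from collections import defaultdict

def mapper(line):
    # Map phase: emit one (word, 1) pair per word.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort phase: group values by key, as the framework does
    # between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce phase: sum the counts for one key.
    return (key, sum(values))

lines = ["big data is big", "data is everywhere"]
pairs = [kv for line in lines for kv in mapper(line)]
result = dict(reducer(k, vs) for k, vs in shuffle(pairs).items())
print(result)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

On Hadoop the shuffle is done by the framework; the developer writes only the mapper and reducer.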

    Key/value pairs - Different types of values from a mapper - GenericWritable - Custom values from a mapper - Writable - Custom keys from a mapper - WritableComparable - Exercises.

    Learning Outcomes:

    • Understand the key-value pairs passed from map to reduce
    • Be able to design applications with custom value types
    • Be able to design applications with custom key types
    • Be able to build applications with GenericWritable

    Input format - FileInputFormat - Steps for input - RecordReader - Custom FileInputFormat - Custom RecordReader - Exercises. Output format - FileOutputFormat - RecordWriter - Custom FileOutputFormat - Custom RecordWriter.

    Learning Outcomes:

    • Understand the input and output formats of a map-reduce application
    • Be able to read different file formats into a map-reduce application
    • Be able to produce different file formats from a map-reduce application

    Joins - Various types - Reduce-side joins - Distributed Cache - Map-side joins - Exercises.

    Learning Outcomes:

    • Be able to take data from multiple data sets and join them
    • Be able to implement various joins in Map-Reduce
    • Be able to design applications with map-side joins
    • Be able to design applications with reduce-side joins
    • Be able to use the distributed cache
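
A reduce-side join can be sketched the same way: each data set is mapped to (join_key, tagged_record) pairs, and the reducer pairs up the two sides per key. In the course this is a Hadoop MapReduce job in Java; the customer/order tables below are invented for illustration:

```python
# Reduce-side join of two small data sets on customer id.
from collections import defaultdict

customers = [(1, "Alice"), (2, "Bob")]           # (id, name)
orders = [(1, "book"), (1, "pen"), (2, "lamp")]  # (customer_id, item)

# Map phase: tag each record with its source data set.
pairs = [(cid, ("C", name)) for cid, name in customers]
pairs += [(cid, ("O", item)) for cid, item in orders]

# Shuffle: group records by the join key.
groups = defaultdict(list)
for key, record in pairs:
    groups[key].append(record)

# Reduce phase: cross the customer side with the order side per key.
joined = []
for key, records in groups.items():
    names = [v for tag, v in records if tag == "C"]
    items = [v for tag, v in records if tag == "O"]
    joined += [(name, item) for name in names for item in items]

print(sorted(joined))  # [('Alice', 'book'), ('Alice', 'pen'), ('Bob', 'lamp')]
```

A map-side join instead ships the small data set to every mapper (via the distributed cache) so the join happens before the shuffle.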

Mr. P.V.N.Balarama Murthy
Hadoop Map Reduce and Hadoop Ecosystem


Ms. Jyothi SanjeevaMani
Hadoop Map Reduce


    Introduction - Types of Data Ingestion - Ingesting Batch Data - Ingesting Streaming Data - Examples

    Learning Outcomes:

    • Understanding Data Ingestion.

    Introduction - Sqoop Architecture - Connecting to a MySQL database - Sqoop Import - Export - Eval - Joins - Exercises.

    Learning Outcomes:

    • Understand the Sqoop architecture and its uses
    • Be able to load real-time data from an RDBMS table or query onto HDFS
    • Be able to write Sqoop scripts for exporting data from HDFS onto RDBMS tables

    Introduction - Flume Architecture - Flume Master - Flume Agents - Flume Collectors - Creating Flume configuration files - Examples - Exercises

    Learning Outcomes:

    • Understand the Flume architecture and its uses
    • Be able to create Flume configuration files to stream and ingest data onto HDFS

    Introduction - Pig Data Flow Engine - MapReduce vs. Pig - Data Types - Basic Pig Programming - Modes of execution in Pig - Miscellaneous Commands (Group, Filter, Join, Order, Flatten, Cogroup, Illustrate, Explain) - Parameter substitution - Creating simple UDFs in Pig - Examples - Exercises.

    Learning Outcomes:

    • Understand Apache Pig and the Pig Data Flow Engine
    • Understand data types, the data model, and modes of execution
    • Be able to store the data from a Pig relation onto HDFS
    • Be able to load data into a Pig relation with or without a schema
    • Be able to split, join, filter, and transform data using Pig operators
    • Be able to write Pig scripts and work with UDFs

    Descriptive Statistics: Introduction - Descriptive Statistics - Central Tendency - Variability - Mean - Median - Range - Variance - Summary - Exercises
    Graphics: Introduction - Types - Packages - Basic graph - Histograms - Stem Leaf Graph - Box Plots - Scatter Plots - Bar Plots.

    Learning Outcomes:

    • Understand the importance of Hive and the Hive architecture
    • Be able to find the central tendency and summary of given data sets
    • Understand the importance of graphical output and the various graph types
    • Be able to plot various graphs for a given data set
    • Implement descriptive statistics in R

Mr. P.V.N.Balarama Murthy
Hadoop Map Reduce and Hadoop Ecosystem


Ms. Jyothi SanjeevaMani
Hadoop Ecosystem


    Introduction - Data Types - Variables - Control Structures - Strings - Classes - Methods - Objects

    Learning Outcomes:

    • Understand Scala basics and be able to write simple scripts

    Traits - Mixins - Packages - Lists - Sets - Maps - Tuples

    Learning Outcomes:

    • Understand packaging, traits, collections, and functional programming with Scala
    • Be able to write Scala scripts for a given problem

    Introduction - Motivation - Importance - Architecture - Interfaces - Basic Programs

    Learning Outcomes:

    • Understand the overall concept of Spark and be able to write basic Spark commands and small programs

    Introduction - Concept - Creating RDD - Loading Data from LFS & HDFS - Operations - Transformations - Actions - Persistence

    Learning Outcomes:

    • Be able to handle various types of data sets and large files
    • Be able to implement scripts using the various RDD operations
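
The RDD operations listed above (transformations such as filter and map, and actions such as reduce) can be mimicked with plain Python collections; in the course they run on Spark with Scala. The data set here is invented for illustration:

```python
# The RDD model sketched with ordinary Python collections:
# transformations build new data sets, an action produces a value.
from functools import reduce

data = [1, 2, 3, 4, 5, 6]                          # sc.parallelize(data) in Spark

evens = list(filter(lambda x: x % 2 == 0, data))   # rdd.filter(...)   -- transformation
squared = list(map(lambda x: x * x, evens))        # .map(...)         -- transformation
total = reduce(lambda a, b: a + b, squared)        # .reduce(...)      -- action

print(evens, squared, total)  # [2, 4, 6] [4, 16, 36] 56
```

One key difference: Spark transformations are lazy and nothing executes until an action is called, whereas these Python calls run eagerly.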

    Introduction & Concept - Dataframe - Operations on Dataframes - SQL Application - Hive from Spark SQL - Reading & Writing from Hive Tables.

    Learning Outcomes:

    • Achieve expert-level handling of data sets with SQL commands
    • Be able to implement scripts for complex data-handling operations

Mr. P.V.N.Balarama Murthy
Spark with Scala


Ms. Jyothi SanjeevaMani
Spark with Scala


  • Are there any prerequisites for learning Hadoop?

    Basic knowledge of Java and MySQL will help.

  • How long does it take to complete this Specialization?

    Most learners are able to complete the Specialization in about four and a half months.

  • Do I need to take the courses in a specific order?

    We recommend taking the courses in the order presented, as each subsequent course will build on material from previous courses.

  • What will I be able to do upon completing this Specialization?

    Upon completing this Big Data Specialization, you will be able to help a company with the following:

    • Hadoop Certification

      This specialization will unlock great career opportunities as a Hadoop developer. Become a Hadoop expert by learning concepts like Pig, Hive, Flume and Sqoop. Get industry-ready with some of the best Big Data projects and real-life use-cases.

    • Cost reduction

      Big data technologies like Hadoop and cloud-based analytics provide substantial cost advantages.

    • Faster, better decision making

      Analytics has always involved attempts to improve decision making, and big data doesn't change that. Large organizations are seeking both faster and better decisions with big data, and they're finding them. Driven by the speed of Hadoop and in-memory analytics, several companies are now focused on speeding up existing decisions.

    • New products and services

      Perhaps the most interesting use of big data analytics is to create new products and services for customers. Online companies have done this for a decade or so, but now predominantly offline firms are doing it too.

  • What are the payment options?

    You can pay by Credit Card, Debit Card or Net Banking from all the leading banks. We use a Payment Gateway.

  • Are there any prerequisites for R?

    No prior programming experience is needed to learn R; basic statistical knowledge is enough.

  • Why should I learn R first for data science?

    • R is becoming the lingua franca for data science. That's not to say that it's the only language, or that it's the best tool for every job. It is, however, the most widely used and it is rising in popularity.
    • Beyond tech giants like Google, Facebook, and Microsoft, R is widely in use at a wide range of companies including Bank of America, Ford, TechCrunch, Uber, and Trulia.
    • R is popular in academia: R isn't just a tool for industry. It is also very popular among academic scientists and researchers.
    • Learning the "skills of data science" is easiest in R. To do this, you'll need to master the 3 core skill areas of data science: data manipulation, data visualization, and machine learning. Mastering these skill areas will be easier in R than almost any other language.


  • Are there any prerequisites for the Hadoop Ecosystem?

    Basic knowledge of Java will help.

  • What will I be able to do upon completing Hadoop?

    This specialization will unlock great career opportunities as a Hadoop developer. Become a Hadoop expert by learning concepts like Pig, Hive, Flume and Sqoop. Get industry-ready with some of the best Big Data projects and real-life use-cases.

  • Do you provide placement assistance?

    Teleuniv is associated with Keshav Memorial Institute of Technology, one of the top performing colleges in Hyderabad, and many recruitment firms contact us for our students' profiles from time to time. Since there is a big demand for this skill, we help our certified students connect with prospective employers. That said, please understand that we don't guarantee placements; however, if you go through the course diligently and complete the assignments and exercises, you will have a very good chance of getting a job.
