Hadoop For Administrators Training Course

Apache Hadoop is one of the most widely used frameworks for processing Big Data on clusters of servers. In this three-day course (with an optional fourth day), participants learn about the business benefits and use cases of Hadoop and its ecosystem, how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot, and optimize Hadoop. They also practice bulk data loading, become familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course ends with a discussion of securing clusters with Kerberos.

“The materials were meticulously prepared and covered comprehensively. The lab sessions were highly beneficial and well-structured.”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising

Target Audience

Hadoop systems administrators

Delivery Format

A blend of theoretical lectures and practical hands-on labs, with an approximate distribution of 60% lectures and 40% lab work.

This course is available as onsite live training in South Africa or online live training.

Course Outline

Introduction
- Hadoop history and core concepts
- The Hadoop ecosystem
- Distributions
- High-level architecture
- Common Hadoop myths
- Hadoop challenges (hardware and software)
- Labs: Discussing Big Data projects and associated problems
Planning and installation
- Selecting software and Hadoop distributions
- Cluster sizing and planning for future growth
- Selecting hardware and network infrastructure
- Rack topology
- Installation procedures
- Multi-tenancy
- Directory structures and log management
- Benchmarking
- Labs: Cluster installation and running performance benchmarks
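
As a rough illustration of the cluster sizing topic above, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not part of the course materials: the function name, the 48 TB of disk per node, and the 25% of disk reserved for non-HDFS use are hypothetical figures chosen for the example. The replication factor of 3 matches the HDFS default.

```python
import math


def nodes_needed(data_tb, replication=3, disk_per_node_tb=48.0,
                 reserved_fraction=0.25, annual_growth=0.0, years=0):
    """Estimate how many DataNodes are needed to store a data set.

    All defaults except replication=3 are illustrative assumptions.
    """
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")
    if disk_per_node_tb <= 0:
        raise ValueError("disk_per_node_tb must be positive")

    # Project the data volume forward with compound annual growth.
    projected_tb = data_tb * (1 + annual_growth) ** years
    # Every block is stored `replication` times across the cluster.
    raw_tb = projected_tb * replication
    # Only part of each node's disk is available to HDFS.
    usable_per_node_tb = disk_per_node_tb * (1 - reserved_fraction)
    # Round up: a fraction of a node still means one more server.
    return math.ceil(raw_tb / usable_per_node_tb)


if __name__ == "__main__":
    print(nodes_needed(100))                               # 9
    print(nodes_needed(100, annual_growth=0.5, years=2))   # 19
```

With these assumed figures, 100 TB of data needs 9 nodes today and 19 nodes after two years of 50% annual growth. A real estimate would also account for compute, memory, and network requirements.
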
HDFS operations
- Core concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring
- Administration via command-line and browser interfaces
- Expanding storage and replacing faulty drives
- Labs: Getting familiar with the HDFS command line
Data ingestion
- Using Flume for log and other data ingestion into HDFS
- Utilizing Sqoop for importing data from SQL databases to HDFS, and exporting back to SQL
- Hadoop data warehousing with Hive
- Transferring data between clusters (distcp)
- Leveraging S3 as a complement to HDFS
- Best practices and architectures for data ingestion
- Labs: Setting up and using Flume and Sqoop
MapReduce operations and administration
- Parallel computing prior to MapReduce: comparing HPC with Hadoop administration
- MapReduce cluster load management
- Nodes and daemons (JobTracker, TaskTracker)
- Walk-through of the MapReduce UI
- MapReduce configuration
- Job configuration
- Optimizing MapReduce performance
- Ensuring robustness in MapReduce: guidance for developers
- Labs: Executing MapReduce examples
YARN: a new architecture and enhanced capabilities
- YARN design objectives and implementation architecture
- New actors: ResourceManager, NodeManager, Application Master
- Installing YARN
- Job scheduling under YARN
- Labs: Investigating job scheduling mechanisms
Advanced topics
- Hardware monitoring
- Cluster-wide monitoring
- Adding and removing servers, upgrading Hadoop
- Backup, recovery, and business continuity planning
- Oozie job workflows
- Hadoop high availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: Setting up monitoring systems
Optional tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks; installation and usage. In this track, all exercises and labs are conducted within the Cloudera distribution environment (CDH5).
- Ambari for cluster administration, monitoring, and routine tasks; installation and usage. In this track, all exercises and labs are conducted within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0).

Requirements

Proficiency in basic Linux system administration
Fundamental scripting capabilities

Prior knowledge of Hadoop or distributed computing is not required, as these topics are introduced and explained during the course.

Lab Environment

Zero Installation Required: Students are not required to install Hadoop software on their personal devices. A fully operational Hadoop cluster will be provided for use.

Participants will need to have the following tools available:

An SSH client (Linux and Mac systems include this by default; for Windows users, PuTTY is recommended)
A web browser to access the cluster. We recommend using Firefox with the FoxyProxy extension installed.

21 Hours

Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793

Testimonials (1)

Hands-on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already.

James - BHG Financial

Course - Apache NiFi for Administrators

Related Courses

Informatica with Big Data (BDM)

Apache NiFi for Administrators

Apache NiFi for Developers

Python, Spark, and Hadoop for Big Data
