Big Data Training Courses in Egypt

Online or onsite, instructor-led live Big Data training courses start with an introduction to elemental concepts of Big Data, then progress into the programming languages and methodologies used to perform Data Analysis. Tools and infrastructure for enabling Big Data storage, Distributed Processing, and Scalability are discussed, compared and implemented in demo practice sessions.

Big Data training is available as "online live training" or "onsite live training". Online live training (aka "remote live training") is carried out by way of an interactive, remote desktop. Onsite live Big Data trainings in Egypt can be carried out locally on customer premises or in NobleProg corporate training centers.

NobleProg -- Your Local Training Provider

Big Data Course Outlines in Egypt

Course Name
Duration
Overview
35 hours
Overview
Advances in technologies and the increasing amount of information are transforming how business is conducted in many industries, including government. Government data generation and digital archiving rates are on the rise due to the rapid growth of mobile devices and applications, smart sensors and devices, cloud computing solutions, and citizen-facing portals. As digital information expands and becomes more complex, information management, processing, storage, security, and disposition become more complex as well. New capture, search, discovery, and analysis tools are helping organizations gain insights from their unstructured data. The government market is at a tipping point, realizing that information is a strategic asset, and government needs to protect, leverage, and analyze both structured and unstructured information to better serve and meet mission requirements. As government leaders strive to evolve data-driven organizations to successfully accomplish mission, they are laying the groundwork to correlate dependencies across events, people, processes, and information.

High-value government solutions will be created from a mashup of the most disruptive technologies:

- Mobile devices and applications
- Cloud services
- Social business technologies and networking
- Big Data and analytics

IDC predicts that by 2020, the IT industry will reach $5 trillion, approximately $1.7 trillion larger than today, and that 80% of the industry's growth will be driven by these 3rd Platform technologies. In the long term, these technologies will be key tools for dealing with the complexity of increased digital information. Big Data is one of the intelligent industry solutions and allows government to make better decisions by taking action based on patterns revealed by analyzing large volumes of data — related and unrelated, structured and unstructured.

But accomplishing these feats takes far more than simply accumulating massive quantities of data. “Making sense of these volumes of Big Data requires cutting-edge tools and technologies that can analyze and extract useful knowledge from vast and diverse streams of information,” Tom Kalil and Fen Zhao of the White House Office of Science and Technology Policy wrote in a post on the OSTP Blog.

The White House took a step toward helping agencies find these technologies when it established the National Big Data Research and Development Initiative in 2012. The initiative included more than $200 million to make the most of the explosion of Big Data and the tools needed to analyze it.

The challenges that Big Data poses are nearly as daunting as its promise is encouraging. Storing data efficiently is one of these challenges. As always, budgets are tight, so agencies must minimize the per-megabyte price of storage and keep the data within easy access so that users can get it when they want it and how they need it. Backing up massive quantities of data heightens the challenge.

Analyzing the data effectively is another major challenge. Many agencies employ commercial tools that enable them to sift through the mountains of data, spotting trends that can help them operate more efficiently. (A recent study by MeriTalk found that federal IT executives think Big Data could help agencies save more than $500 billion while also fulfilling mission objectives.)

Custom-developed Big Data tools also are allowing agencies to address the need to analyze their data. For example, the Oak Ridge National Laboratory’s Computational Data Analytics Group has made its Piranha data analytics system available to other agencies. The system has helped medical researchers find a link that can alert doctors to aortic aneurysms before they strike. It’s also used for more mundane tasks, such as sifting through résumés to connect job candidates with hiring managers.
21 hours
Overview
Apache Spark has a steep learning curve at the beginning, and it takes considerable effort to see the first results. This course aims to get participants past that difficult first stage. After taking this course, participants will understand the basics of Apache Spark, clearly differentiate RDDs from DataFrames, learn the Python and Scala APIs, and understand executors and tasks. Following best practices, the course also focuses strongly on cloud deployment with Databricks and AWS. Students will also understand the differences between AWS EMR and AWS Glue, one of the latest Spark services on AWS.

AUDIENCE:

Data Engineer, DevOps, Data Scientist
14 hours
Overview
Apache SolrCloud is a distributed data processing engine that facilitates the searching and indexing of files on a distributed network.

In this instructor-led, live training, participants will learn how to set up a SolrCloud instance on Amazon AWS.

By the end of this training, participants will be able to:

- Understand SolrCloud's features and how they compare to those of conventional master-slave clusters
- Configure a SolrCloud centralized cluster
- Automate processes such as communicating with shards, adding documents to the shards, etc.
- Use ZooKeeper in conjunction with SolrCloud to further automate processes
- Use the interface to manage error reporting
- Load balance a SolrCloud installation
- Configure SolrCloud for continuous processing and fail-over

Audience

- Solr Developers
- Project Managers
- System Administrators
- Search Analysts

Format of the course

- Part lecture, part discussion, exercises and heavy hands-on practice
14 hours
Overview
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
21 hours
Overview
In this instructor-led, live training in Egypt, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.

By the end of this training, participants will be able to:

- Use Spark with Python to analyze Big Data.
- Work on exercises that mimic real world cases.
- Use different tools and techniques for big data analysis using PySpark.
28 hours
Overview
In this instructor-led, live training in Egypt, participants will learn about the technology offerings and implementation approaches for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using a Graph Computing (also known as Graph Analytics) approach. We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.

By the end of this training, participants will be able to:

- Understand how graph data is persisted and traversed.
- Select the best framework for a given task (from graph databases to batch processing frameworks).
- Implement Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel.
- View real-world big data problems in terms of graphs, processes and traversals.
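
As a small illustration of how graph data can be stored and traversed, the sketch below keeps a graph as an adjacency list and walks it breadth-first. It is plain Python on a single machine with made-up vertex names; it does not use the APIs of Hadoop, Spark, GraphX or Pregel, and the names `GRAPH` and `bfs_order` are hypothetical.

```python
from collections import deque

# Hypothetical graph: each key is a vertex, each value lists its neighbours.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def bfs_order(graph, start):
    """Return the vertices reachable from start in breadth-first order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in graph.get(vertex, []):
            # Visit each vertex only once, even if several edges lead to it.
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


print(bfs_order(GRAPH, "alice"))  # ['alice', 'bob', 'carol', 'dave']
```

Distributed frameworks split the vertices across many machines, but the underlying idea of following edges from visited vertices to unvisited ones stays the same.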
21 hours
Overview
This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and NLP.
35 hours
Overview
MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.

It divides into two packages:

- spark.mllib contains the original API built on top of RDDs.
- spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.

Audience

This course is directed at engineers and developers seeking to use the built-in machine learning library for Apache Spark.
21 hours
Overview
This instructor-led, live training in Egypt (online or onsite) is aimed at developers who wish to carry out big data analysis using Apache Spark in their .NET applications.

By the end of this training, participants will be able to:

- Install and configure Apache Spark.
- Understand how .NET implements Spark APIs so that they can be accessed from a .NET application.
- Develop data processing applications using C# or F#, capable of handling data sets whose size is measured in terabytes and petabytes.
- Develop machine learning features for a .NET application using Apache Spark capabilities.
- Carry out exploratory analysis using SQL queries on big data sets.
21 hours
Overview
This instructor-led, live training in Egypt (online or onsite) is aimed at engineers who wish to set up and deploy an Apache Spark system for processing very large amounts of data.

By the end of this training, participants will be able to:

- Install and configure Apache Spark.
- Quickly process and analyze very large data sets.
- Understand the difference between Apache Spark and Hadoop MapReduce and when to use which.
- Integrate Apache Spark with other machine learning tools.
14 hours
Overview
This instructor-led, live training in Egypt (online or onsite) is aimed at data scientists who wish to use the SMACK stack to build data processing platforms for big data solutions.

By the end of this training, participants will be able to:

- Implement a data pipeline architecture for processing big data.
- Develop a cluster infrastructure with Apache Mesos and Docker.
- Analyze data with Spark and Scala.
- Manage unstructured data with Apache Cassandra.
21 hours
Overview
This instructor-led, live training in Egypt (online or onsite) is aimed at software engineers who wish to stream big data with Spark Streaming and Scala.

By the end of this training, participants will be able to:

- Create Spark applications with the Scala programming language.
- Use Spark Streaming to process continuous streams of data.
- Process streams of real-time data with Spark Streaming.
21 hours
Overview
In this instructor-led, live training in Egypt (onsite or remote), participants will learn how to set up and integrate different Stream Processing frameworks with existing big data storage systems and related software applications and microservices.

By the end of this training, participants will be able to:

- Install and configure different Stream Processing frameworks, such as Spark Streaming and Kafka Streams.
- Understand and select the most appropriate framework for the job.
- Process data continuously, concurrently, and in a record-by-record fashion.
- Integrate Stream Processing solutions with existing databases, data warehouses, data lakes, etc.
- Integrate the most appropriate stream processing library with enterprise applications and microservices.
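
To make "record-by-record" processing concrete, here is a minimal sketch in plain Python, assuming a hypothetical sensor feed. It does not use Spark Streaming or Kafka; the generator `readings` stands in for a stream source, and `running_max` emits a result after every record instead of waiting for the whole data set.

```python
def readings():
    """Hypothetical source: yields one sensor record at a time."""
    for value in [3, 8, 5, 12, 7]:
        yield {"sensor": "s1", "value": value}


def running_max(records):
    """Consume records one by one, emitting the maximum seen so far."""
    current = None
    for record in records:
        if current is None or record["value"] > current:
            current = record["value"]
        # A result is available immediately after each incoming record.
        yield current


print(list(running_max(readings())))  # [3, 8, 8, 12, 12]
```

A real stream never ends, so the consumer would act on each emitted value as it arrives; collecting everything into a list is done here only to show the output.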
21 hours
Overview
Teradata is a popular relational database management system (RDBMS). It is mainly suited to building large-scale data warehousing applications, which it achieves through parallelism.

This course introduces the delegates to Teradata.
7 hours
Overview
Spark SQL is Apache Spark's module for working with structured data. Its interfaces give Spark information about the structure of both the data and the computation being performed, and Spark uses this information to perform optimizations. Two common uses for Spark SQL are:
- to execute SQL queries.
- to read data from an existing Hive installation.

In this instructor-led, live training (onsite or remote), participants will learn how to analyze various types of data sets using Spark SQL.

By the end of this training, participants will be able to:

- Install and configure Spark SQL.
- Perform data analysis using Spark SQL.
- Query data sets in different formats.
- Visualize data and query results.

Format of the Course

- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.

Course Customization Options

- To request a customized training for this course, please contact us to arrange.
14 hours
Overview
Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics.

This instructor-led, live training introduces the concepts and approaches for implementing geospatial analytics and walks participants through the creation of a predictive analysis application using Magellan on Spark.

By the end of this training, participants will be able to:

- Efficiently query, parse and join geospatial datasets at scale
- Implement geospatial data in business intelligence and predictive analytics applications
- Use spatial context to extend the capabilities of mobile devices, sensors, logs, and wearables

Format of the Course

- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.

Course Customization Options

- To request a customized training for this course, please contact us to arrange.
21 hours
Overview
OBJECTIVE:

This course will introduce Apache Spark. The students will learn how Spark fits into the Big Data ecosystem, and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, Spark APIs, Spark SQL, Spark Streaming, machine learning, and GraphX.

AUDIENCE :

Developers / Data Analysts
21 hours
Overview
Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing.

In this instructor-led, live course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes.

Audience

- Data analysts or anyone interested in learning how to interpret data to solve problems

Format of the Course

- After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations.
28 hours
Overview
Pentaho Open Source BI Suite Community Edition (CE) is a business intelligence package that provides data integration, reporting, dashboards, and load capabilities.

In this instructor-led, live training, participants will learn how to maximize the features of Pentaho Open Source BI Suite Community Edition (CE).

By the end of this training, participants will be able to:

- Install and configure Pentaho Open Source BI Suite Community Edition (CE)
- Understand the fundamentals of Pentaho CE tools and their features
- Build reports using Pentaho CE
- Integrate third party data into Pentaho CE
- Work with big data and analytics in Pentaho CE

Audience

- Programmers
- BI Developers

Format of the course

- Part lecture, part discussion, exercises and heavy hands-on practice

Note

- To request a customized training for this course, please contact us to arrange.
21 hours
Overview
Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations.

In this instructor-led, live training, participants will learn how to use Pentaho Data Integration's powerful ETL capabilities and rich GUI to manage an entire big data lifecycle and maximize the value of data within their organization.

By the end of this training, participants will be able to:

- Create, preview, and run basic data transformations containing steps and hops
- Configure and secure the Pentaho Enterprise Repository
- Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format.
- Provide results to third-party applications for further processing

Audience

- Data Analyst
- ETL developers

Format of the course

- Part lecture, part discussion, exercises and heavy hands-on practice
14 hours
Overview
This instructor-led, live training in Egypt (online or onsite) is aimed at data scientists who wish to use Excel for data mining.

By the end of this training, participants will be able to:

- Explore data with Excel to perform data mining and analysis.
- Use Microsoft algorithms for data mining.
- Understand concepts in Excel data mining.
14 hours
Overview
This instructor-led, live training in Egypt (online or onsite) is aimed at data analysts who wish to program with R in SAS for cluster analysis.

By the end of this training, participants will be able to:

- Use cluster analysis for data mining
- Master R syntax for clustering solutions.
- Implement hierarchical and non-hierarchical clustering.
- Make data-driven decisions to help improve business operations.
35 hours
Overview
KNIME is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept. A graphical user interface and the use of JDBC allow the assembly of nodes that blend different data sources, including preprocessing (ETL: Extraction, Transformation, Loading), for modeling, data analysis and visualization with little or no programming. As an advanced analytics tool, KNIME can to some extent be considered an alternative to SAS.

Since 2006, KNIME has been used in pharmaceutical research. It is also used in other areas such as CRM customer data analysis, business intelligence and financial data analysis.
14 hours
Overview
This instructor-led, live training (online or onsite) is aimed at data analysts and data scientists who wish to implement more advanced data analytics techniques for data mining using Python.

By the end of this training, participants will be able to:

- Understand important areas of data mining, including association rule mining, text sentiment analysis, automatic text summarization, and data anomaly detection.
- Compare and implement various strategies for solving real-world data mining problems.
- Understand and interpret the results.

Format of the Course

- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.

Course Customization Options

- To request a customized training for this course, please contact us to arrange.
14 hours
Overview
Apache ActiveMQ is an open source message broker written in Java.
14 hours
Overview
This instructor-led, live training in Egypt (online or onsite) is aimed at application developers and engineers who wish to master more sophisticated usages of the Teradata database.

By the end of this training, participants will be able to:

- Manage Teradata space.
- Protect and distribute data in Teradata.
- Read Explain Plan.
- Improve SQL proficiency.
- Use main utilities of Teradata.
7 hours
Overview
The objective of the course is to enable participants to gain a mastery of the fundamentals of R and how to work with data.
28 hours
Overview
MemSQL is an in-memory, distributed, SQL database management system for cloud and on-premises. It's a real-time data warehouse that immediately delivers insights from live and historical data.

In this instructor-led, live training, participants will learn the essentials of MemSQL for development and administration.

By the end of this training, participants will be able to:

- Understand the key concepts and characteristics of MemSQL
- Install, design, maintain, and operate MemSQL
- Optimize schemas in MemSQL
- Improve queries in MemSQL
- Benchmark performance in MemSQL
- Build real-time data applications using MemSQL

Audience

- Developers
- Administrators
- Operation Engineers

Format of the course

- Part lecture, part discussion, exercises and heavy hands-on practice
14 hours
Overview
Apache Arrow is an open-source in-memory data processing framework. It is often used together with other data science tools for accessing disparate data stores for analysis. It integrates well with other technologies such as GPU databases, machine learning libraries and tools, execution engines, and data visualization frameworks.

In this onsite instructor-led, live training, participants will learn how to integrate Apache Arrow with various Data Science frameworks to access data from disparate data sources.

By the end of this training, participants will be able to:

- Install and configure Apache Arrow in a distributed clustered environment
- Use Apache Arrow to access data from disparate data sources
- Use Apache Arrow to bypass the need for constructing and maintaining complex ETL pipelines
- Analyze data across disparate data sources without having to consolidate it into a centralized repository

Audience

- Data scientists
- Data engineers

Format of the Course

- Part lecture, part discussion, exercises and heavy hands-on practice

Note

- To request a customized training for this course, please contact us to arrange.
14 hours
Overview
Apache Hama is a framework based on the Bulk Synchronous Parallel (BSP) computing model and is primarily used for Big Data analytics.
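
The BSP model proceeds in supersteps: peers exchange messages, wait at a barrier until every message is delivered, then compute on what they received. The sketch below simulates this on one machine in plain Python; it is not the Apache Hama API, and the function name `bsp_global_max` and the ring arrangement of peers are assumptions made for illustration.

```python
def bsp_global_max(local_values):
    """Simulate BSP peers in a ring agreeing on the maximum of their values.

    Returns the final value held by each peer and the number of supersteps.
    """
    n = len(local_values)
    state = list(local_values)
    supersteps = 0
    changed = True
    while changed:
        # Communication: every peer sends its value to its right neighbour,
        # so peer i receives the value of peer i - 1.
        inbox = [state[(i - 1) % n] for i in range(n)]
        # Barrier: the inbox is complete before any peer reads from it.
        # Computation: each peer keeps the larger of its own and received value.
        new_state = [max(state[i], inbox[i]) for i in range(n)]
        changed = new_state != state
        state = new_state
        supersteps += 1
    return state, supersteps


values, steps = bsp_global_max([4, 9, 2, 7])
print(values, steps)
```

Building the whole inbox before any peer updates its state plays the role of the barrier: no peer sees a value produced in the current superstep until the next one.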

In this instructor-led, live training, participants will learn the fundamentals of Apache Hama as they step through the creation of a BSP-based application and a vertex-centric program using the Apache Hama frameworks.

By the end of this training, participants will be able to:

- Install and configure Apache Hama
- Understand the fundamentals of Apache Hama and the Bulk Synchronous Parallel (BSP) programming model
- Build a BSP-based program using Apache Hama BSP framework
- Build a vertex-centric program using Apache Hama Graph Framework
- Build, test, and debug their own Apache Hama applications

Audience

- Developers

Format of the course

- Part lecture, part discussion, exercises and heavy hands-on practice

Note

- To request a customized training for this course, please contact us to arrange.