Assignment 6 -- Hadoop NoSQL (Part 4 of 6)

In Assignments 6 and 7, your team is evaluating NoSQL technologies in order to integrate the data collected by the Social Media platforms (which is stored in NoSQL) into your analytics database (which is in MySQL). In real life you do not get to do this; you have to adapt to the data. Part of our objective here, though, is for you to understand the options well enough to know what you will be dealing with -- perhaps even identifying a favorite.

Each member of your team should research at least two components of the Apache Hadoop Ecosystem (HDFS, HBase, MapReduce, YARN, Flume, Drill, Hive, Pig, etc. -- there are several more). Do enough research to become the team expert on that tool. At a minimum, you should know basically:

**NOTE** It is in your best interest to do a quick, shallow dive on all of the tools at the beginning so you have a high-level idea of what each does. Then have each team member choose tools that they find interesting.


Hybrid Class Assignment - Hadoop Presentation

Your team is going to create a presentation in LibreOffice Impress (*1), much in the same way you would develop code. As a team, select a collection of Apache Hadoop tools to collect Social Media data (as if you were running the Social Media company) and provide a path from the NoSQL side to the data delivered to your Data Analytics database in the sample data feed from Lab 4.

You might want to look ahead to Lab 7 at this point -- there are some advantages to creating them in tandem.

The idea here is that each team member will be responsible for one or more of the tools in the presentation. I am thinking you need at least 4 tools, but you can use more; however, split the work evenly. Each team member is responsible for creating their slides for the presentation ... a minimum of 4:

  1. What does it do?
  2. Why am I using it in my solution?
  3. Where does it fit in the flow of information?
  4. Possible concerns

You can have more, but NOT more than 8 per component.

I expect a coordinated result that shows your team is communicating, so in advance select a format for your slides. The order should follow the flow of the data, starting with where the data resides in Hadoop and ending with how you are going to provide a data feed like the one in Lab 4.
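As a way to picture the last step of that flow, here is a minimal sketch of turning nested, NoSQL-style documents into flat rows that a relational database such as MySQL could load. It is an illustration only: the Lab 4 feed format is not reproduced here, so the CSV output, the sample documents, and every field and function name (`RAW_DOCS`, `flatten`, `to_csv`, `COLUMNS`) are assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical NoSQL-style documents; every field name here is an assumption.
RAW_DOCS = [
    '{"user": {"id": 1, "name": "ana"}, "post": {"id": 10, "likes": 3}}',
    '{"user": {"id": 2, "name": "bo"}, "post": {"id": 11}}',
]


def flatten(doc, prefix=""):
    """Turn nested dicts into one level, joining key names with underscores."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def to_csv(raw_docs, columns):
    """Write flattened documents as CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",              # fill fields absent from a document
        extrasaction="ignore",   # drop fields the feed does not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    for raw in raw_docs:
        writer.writerow(flatten(json.loads(raw)))
    return buffer.getvalue()


# Assumed column layout for the feed; adjust to match the real Lab 4 sample.
COLUMNS = ["user_id", "user_name", "post_id", "post_likes"]
FEED = to_csv(RAW_DOCS, COLUMNS)
print(FEED)
```

The point of the sketch is the shape of the problem: documents on the NoSQL side can be nested and can omit fields, while the feed on the MySQL side needs a fixed set of columns. Whichever Hadoop tools your team picks, one of them has to take responsibility for that conversion.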

Remember: I have provided you with a Virtual Reality environment where your team can meet at any time -- use it.


Assignment Submission Criteria

To turn in this assignment you will go to the NSC Canvas shell for this class. Submit the following:

  1. A single presentation (a copy of which is turned in by EACH member of the team) that outlines how you can use Hadoop in your diabolical plan to program humans by manipulating their Social Media feeds. Sure, you can wordsmith this and try to put a positive spin on it.
  2. You should be prepared to present this in class – 2 teams will present.
  3. A participation scoring sheet -- list the members of your team and indicate their participation using the following scale:

You should only list 1 person as leader (and you must list at least 1 person as leader). The first 3 categories are respectable contribution categories. This item is to be turned in by each team member and is intended to be private (for my eyes only). I DO NOT want you to share your evaluation with your team members.

*1 – LibreOffice Impress is an Open Source (free) presentation tool (like PowerPoint) available on all platforms (Linux, Mac, Windows).