Question 5 : You have been assigned to run a logistic regression model for each of 100 countries, and all the data is currently stored in a PostgreSQL database. Which tool/library would you use to produce these models with the least effort?

1. RStudio

2. MADlib

3. RStudio

4. HBase

Correct Answer : 2

Exp : MADlib is an open-source library for scalable in-database analytics. It offers data-parallel implementations of mathematical, statistical, and machine learning methods for structured and unstructured data. Because MADlib is designed for massively parallel processing, it is well suited to Big Data in-database analytics. MADlib supports the open-source PostgreSQL database as well as the Pivotal Greenplum Database and Pivotal HAWQ, a SQL query engine for data stored in the Hadoop Distributed File System (HDFS). Its Generalized Linear Models module includes linear regression, logistic regression, and multinomial logistic regression. Because the data already lives in PostgreSQL, MADlib fits the models where the data sits. Its logistic regression training function also accepts a grouping column, so one model per country can come from a single SQL call instead of exporting the data and looping over 100 subsets.
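To make the "least effort" point concrete, here is a minimal sketch that builds the single MADlib call that would train all 100 per-country models. It assumes MADlib is installed in a schema named `madlib` and uses MADlib's `logregr_train(source_table, out_table, dependent_varname, independent_varname, grouping_cols, max_iter, optimizer)` argument order. The table and column names (`customer_churn`, `churn_models`, `churned`, `tenure`, `monthly_spend`, `country`) and the helper `build_logregr_sql` are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def build_logregr_sql(
    source_table: str,
    out_table: str,
    dependent_var: str,
    features: Sequence[str],
    grouping_col: str,
    max_iter: int = 20,
    optimizer: str = "irls",
    schema: str = "madlib",
) -> str:
    """Return one SQL statement that trains a logistic regression per group.

    Hypothetical helper. Identifiers are interpolated without quoting or
    escaping, so this sketch is only suitable for trusted, hard-coded names.
    """
    # A leading 1 adds an intercept term, as in MADlib's documented examples.
    feature_array = "ARRAY[1, " + ", ".join(features) + "]"
    args = [
        f"'{source_table}'",   # table holding the rows for all countries
        f"'{out_table}'",      # output table: one row of coefficients per group
        f"'{dependent_var}'",  # binary (boolean) response column or expression
        f"'{feature_array}'",  # independent variables as an array expression
        f"'{grouping_col}'",   # a separate model is fit for each distinct value
        str(max_iter),         # maximum number of optimizer iterations
        f"'{optimizer}'",      # 'irls' is MADlib's default optimizer
    ]
    return f"SELECT {schema}.logregr_train({', '.join(args)});"


if __name__ == "__main__":
    print(build_logregr_sql(
        "customer_churn", "churn_models", "churned",
        ["tenure", "monthly_spend"], "country",
    ))
```

Running the printed statement through any PostgreSQL client (for example, `cursor.execute(sql)` with psycopg2) on a database with MADlib installed would leave one row per country in the output table, which could then be read with something like `SELECT country, coef FROM churn_models;`. The per-country loop happens inside the database rather than in client code, which is why MADlib needs less effort here than RStudio or HBase.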