2016. 10. 15. 19:20 - Phil lee

Zeppelin + Python Spark SQL + Hive + Hadoop

 

Well, after joining Kiwi Plus, I have a bunch of tasks to handle, as everyone expects.

In fact, I really want to work through these tasks to improve my skill set.

Who knows? I feel like I should have thrown myself into these distributed frameworks and data analysis a long time ago.

 

Pseudo Code in Zeppelin with PySpark

import datetime

from pyspark.sql import HiveContext
import mysql.connector

yesterday = datetime.date.today() - datetime.timedelta(days=1)

# 'from' and 'to' below are placeholder date bounds, as in the original sketch;
# count(id) per day needs a group by agg_date to be valid HiveQL
sql = ("select agg_date, count(id) from location_log "
       "where agg_date between from and to group by agg_date")

sqlContext = HiveContext(sc)
df = sqlContext.sql(sql).toPandas()
df = df.set_index('agg_date')  # set_index returns a new DataFrame

cnx = mysql.connector.connect(…)  # connection details omitted
for agg_date in df.index:
    …
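The pandas half of the pseudocode (everything after toPandas()) can be rehearsed locally without a cluster; the agg_date values and counts below are made-up sample data standing in for the Hive query result:

```python
import datetime

import pandas as pd

# Made-up sample of what sqlContext.sql(sql).toPandas() would return.
df = pd.DataFrame({
    "agg_date": [datetime.date(2016, 10, 13), datetime.date(2016, 10, 14)],
    "cnt": [120, 95],
})

# set_index returns a new DataFrame; reassign it (the pseudocode drops the result).
df = df.set_index("agg_date")

# Iterate the dates, as the final for-loop does before writing rows to MySQL.
for agg_date in df.index:
    print(agg_date, df.loc[agg_date, "cnt"])
```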

 

Zeppelin

  • Provides spark and jdbc interpreters to access Spark execution and the Hive warehouse
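For example, a note can mix both interpreters in separate paragraphs (interpreter names as in a default Zeppelin install; the table name is the one from the pseudocode above):

```
%pyspark
df = sqlContext.sql("select * from location_log limit 10")

%jdbc(hive)
select count(*) from location_log
```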

Spark-SQL

  • Targets a specific Hive version for HiveQL compatibility.
  • Supports HiveContext, with useful functions for working with tables
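The aggregation query from the pseudocode needs a group by to be valid HiveQL; its shape can be checked with sqlite3 as a stand-in for Hive (table and column names from the pseudocode, data made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table location_log (id integer, agg_date text)")
conn.executemany(
    "insert into location_log values (?, ?)",
    [(1, "2016-10-14"), (2, "2016-10-14"), (3, "2016-10-15")],
)

# Same shape as the HiveQL in the pseudocode: a date-range filter plus a
# per-day count, which requires grouping by agg_date.
rows = conn.execute(
    "select agg_date, count(id) from location_log "
    "where agg_date between '2016-10-14' and '2016-10-15' "
    "group by agg_date order by agg_date"
).fetchall()
print(rows)  # → [('2016-10-14', 2), ('2016-10-15', 1)]
```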

Hive

  • Provides HiveContext with efficient configuration and functions
  • After setting the warehouse location in 'hive-site.xml', HiveContext in Spark can access Hive tables
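A minimal sketch of the relevant 'hive-site.xml' entries; the path and host are placeholders, and the file goes into Spark's conf/ directory so HiveContext picks it up:

```xml
<configuration>
  <!-- Where Hive stores table data on HDFS (placeholder path). -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Remote metastore service to contact (placeholder host). -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```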

Reference - http://tomining.tistory.com/89