2016.10.15 19:20 - Phil lee

Zeppelin + Python Spark SQL + Hive + Hadoop


Well, after joining Kiwi Plus, I have a bunch of tasks to handle, as everyone expected.

In fact, I really want to work through these tasks to improve my skill set.

Who knows? I feel like I should have dived into these distributed frameworks and data analysis a long time ago.


Pseudo code in Zeppelin with PySpark

import datetime

from pyspark.sql import HiveContext

import mysql.connector


yesterday = datetime.date.today() - datetime.timedelta(days=1)

sql = ("select agg_date, count(id) as cnt from location_log "
       "where agg_date = '%s' group by agg_date" % yesterday)


sqlContext = HiveContext(sc)  # sc is injected by the Zeppelin Spark interpreter

df = sqlContext.sql(sql).toPandas()


for agg_date in df.index:
    pass  # e.g. write the day's count to MySQL with mysql.connector
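
The loop body over df.index is left open above. A hedged sketch of what it might do — shape each day's count into parameters and push them to MySQL. The table name, column names, and connection settings below are assumptions for illustration:

```python
import datetime


def insert_params(index, counts):
    """Shape (date, count) pairs for cursor.executemany()."""
    return [(day.isoformat(), int(c)) for day, c in zip(index, counts)]


# Stand-in values shaped like the pandas result of toPandas()
pairs = insert_params(
    [datetime.date(2016, 10, 14), datetime.date(2016, 10, 15)],
    [120, 95],
)
# pairs is now a list of ("YYYY-MM-DD", count) tuples

# Then, with mysql.connector (credentials and table are assumptions):
# cnx = mysql.connector.connect(user="stats", password="...", database="stats")
# cursor = cnx.cursor()
# cursor.executemany(
#     "insert into daily_location_count (agg_date, cnt) values (%s, %s)", pairs)
# cnx.commit()
```

Batching with executemany() keeps the MySQL round-trips down compared with one insert per row of the DataFrame.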




  • Provides Spark and JDBC interpreters to access Spark execution and the warehouse


  • Provides a Hive version setting for HiveQL compatibility.
  • Supports HiveContext with useful functions for handling tables


  • Provides HiveContext with efficient configuration and functions
  • After configuring 'hive-site.xml' with the warehouse location, HiveContext in Spark can access Hive tables
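
As a concrete illustration of the last point, a minimal hive-site.xml sketch placed in Spark's conf directory might look like this — the metastore URI and warehouse path are assumptions for a typical local setup:

```xml
<configuration>
  <!-- Where the Hive metastore service is listening -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <!-- HDFS location of the Hive warehouse -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```

With this in place, HiveContext(sc) in a Zeppelin PySpark paragraph can query the same tables Hive manages.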

Reference - http://tomining.tistory.com/89





