Well, after joining Kiwi Plus, I have a bunch of tasks to handle, as everyone expects.
In fact, I really want to work through these tasks to improve my skill set.
Who knows? I feel I should have put this effort into distributed frameworks and data analysis a long time ago.
Pseudocode in Zeppelin with PySpark
import datetime
from pyspark.sql import HiveContext

# Yesterday's date bounds the query; the original pseudocode left the
# between-clause column and bounds as placeholders
yesterday = datetime.date.today() - datetime.timedelta(days=1)
sql = ("select agg_date, count(id) as cnt from location_log "
       "where agg_date between '%s' and '%s'" % (yesterday, yesterday))
sqlContext = HiveContext(sc)  # sc is provided by Zeppelin's spark interpreter
df = sqlContext.sql(sql).toPandas()
for agg_date in df.index:
    ...
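Once the query result has been pulled down with toPandas(), iterating over it is plain pandas; a minimal sketch with synthetic data standing in for the query result (the column names agg_date and cnt are assumptions, not from the actual table):

```python
import pandas as pd

# Synthetic stand-in for the toPandas() result of the query above;
# the columns are assumed, not taken from the real location_log table
df = pd.DataFrame(
    {"agg_date": ["2016-10-14", "2016-10-15"], "cnt": [120, 98]}
).set_index("agg_date")

# Iterate per day, as the loop in the pseudocode does
daily_counts = {agg_date: int(df.loc[agg_date, "cnt"]) for agg_date in df.index}
```

The same per-day loop body then works unchanged whether the frame came from Hive or from a local test fixture.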
- Provides the spark and jdbc interpreters to access Spark execution and the warehouse
- Provides a matching Hive version for HiveQL compatibility
- Supports HiveContext, with efficient configuration and useful functions for handling tables
- After configuring the warehouse location in 'hive-site.xml', HiveContext in Spark can access Hive tables
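For the last point above, a minimal hive-site.xml sketch placed in Spark's conf directory might look like this (the path and metastore URI are example values, not from this setup):

```xml
<configuration>
  <!-- Location of the Hive warehouse; the path is an example -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Metastore URI so Spark's HiveContext can see existing Hive tables -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
```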
Reference - http://tomining.tistory.com/89