Well, after joining Kiwi Plus, I have a bunch of tasks to handle, as everyone expects. In fact, I really want to work through these tasks to improve my skill set. Who knows? I feel I should have thrown myself into these distributed frameworks and data analysis a long time ago.
Pseudo Code in Zeppelin with PySpark
import datetime
from pyspark.sql import HiveContext
import mysql.connector

# yesterday's date: the aggregation window for this daily batch
yesterday = datetime.date.today() - datetime.timedelta(days=1)

# the original left the range as 'between from and to'; a single-day
# window over yesterday is assumed here
sql = ("SELECT agg_date, COUNT(id) AS cnt FROM location_log "
       "WHERE agg_date BETWEEN '{0}' AND '{0}' "
       "GROUP BY agg_date").format(yesterday)

sqlContext = HiveContext(sc)           # sc is pre-bound by Zeppelin's Spark interpreter
df = sqlContext.sql(sql).toPandas()    # pull the small aggregate down to the driver
df = df.set_index('agg_date')          # set_index returns a new DataFrame

conn = mysql.connector.connect()       # connection parameters elided in the original
for agg_date in df.index:
    ...                                # loop body elided in the original; see the sketch below
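A minimal sketch of what that elided loop body might do, continuing from the conn and df above and assuming a MySQL table daily_location_count(agg_date, cnt) — the table and column names are hypothetical:

cursor = conn.cursor()
for agg_date in df.index:
    # 'daily_location_count' is a hypothetical target table
    cursor.execute(
        "REPLACE INTO daily_location_count (agg_date, cnt) VALUES (%s, %s)",
        (str(agg_date), int(df.loc[agg_date, 'cnt'])))
conn.commit()   # make the day's counts visible in MySQL
cursor.close()
conn.close()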
Zeppelin
- Provides the Spark and JDBC interpreters, giving access to Spark execution and the warehouse.
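As a rough example, each paragraph in a Zeppelin note picks an interpreter with a % binding (the names below are common defaults and may differ per installation):

%pyspark
# Spark interpreter paragraph: sc and sqlContext are pre-bound by Zeppelin
sqlContext.sql("SELECT COUNT(id) FROM location_log").show()

%jdbc
-- JDBC interpreter paragraph: queries the warehouse directly over JDBC
SELECT agg_date, COUNT(id) FROM location_log GROUP BY agg_date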
Spark-SQL
- Is built against a specific Hive version for HiveQL compatibility.
- Supports HiveContext, with useful functions for handling tables.
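For instance, a few of those HiveContext conveniences (Spark 1.x API), sketched here against the location_log table used above:

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)            # sc comes from the Spark interpreter
sqlContext.tableNames()                 # list tables in the metastore's default database
df = sqlContext.table('location_log')   # load a Hive table as a DataFrame
df.printSchema()                        # inspect the table's columns and types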
Hive
- Supplies the warehouse configuration and functions that HiveContext builds on.
- After pointing the 'hive-site.xml' configuration at the warehouse location, HiveContext in Spark can access Hive tables.
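A minimal hive-site.xml along those lines might contain just the warehouse location and metastore address (the path and URI below are placeholders):

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value><!-- placeholder warehouse path -->
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value><!-- placeholder metastore URI -->
  </property>
</configuration>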
Reference - http://tomining.tistory.com/89