Package versions
python: 3.6 / 3.7 / 3.8
pyhive: 0.6.2
thrift: 0.13.0
thrift_sasl: 0.4.2
sasl: depends on your Python version
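1. Download the sasl wheel that matches your Python version from https://www.lfd.uci.edu/~gohlke/pythonlibs/#sasl (cp37 in the file name means Python 3.7).
Then install the downloaded wheel locally with pip.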
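To pick the right file, it can help to print the CPython tag of the interpreter you will install into. The helper below is a minimal sketch: `wheel_python_tag` is a hypothetical name, and it only covers the `cpXY` part of the wheel file name, not the platform suffix.

```python
import sys


def wheel_python_tag(version_info=None):
    """Return the CPython wheel tag (for example 'cp37') for a version tuple."""
    if version_info is None:
        # Default to the interpreter that is running this script.
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return "cp{}{}".format(major, minor)


if __name__ == "__main__":
    # Prints the tag to look for in the sasl wheel file name.
    print(wheel_python_tag())
```

Run it with the same Python that you use for `pip install`, since the tag has to match that interpreter.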
2. Install pyhive 0.6.2, thrift 0.13.0, and thrift_sasl 0.4.2:
pip install thrift==0.13.0
pip install thrift_sasl==0.4.2
pip install pyhive==0.6.2
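After installing, one way to confirm that the pinned versions are the ones actually installed is sketched below. It assumes `importlib.metadata` (Python 3.8+) or, on 3.6 / 3.7, the separately installed `importlib_metadata` backport; `EXPECTED` and `find_mismatches` are names made up for this example, and the distribution names are assumed to match the pip names.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.6 / 3.7: assumes the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version

# Versions pinned in this post.
EXPECTED = {"thrift": "0.13.0", "thrift_sasl": "0.4.2", "PyHive": "0.6.2"}


def _installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(expected=EXPECTED, lookup=_installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches()
    for name, (wanted, found) in problems.items():
        print("{}: expected {}, found {}".format(name, wanted, found))
    if not problems:
        print("All pinned packages match.")
```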
3. Configure Hadoop and Hive
core-site.xml in the Hadoop configuration directory:
In the property names hadoop.proxyuser.lylg.hosts and hadoop.proxyuser.lylg.groups, replace lylg with your own login user name.
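<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://lylg102:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.lylg.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.lylg.groups</name>
        <value>*</value>
    </property>
</configuration>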
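A quick way to check that the proxy-user entries were renamed for your own account is to parse the file and look for both keys. This is a minimal sketch using only the standard library; `read_properties`, `missing_proxyuser_keys`, and the embedded `SAMPLE_CORE_SITE` text are illustrative, and in practice you would read your real core-site.xml instead.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample; replace with the contents of your own core-site.xml.
SAMPLE_CORE_SITE = """
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://lylg102:9000</value></property>
  <property><name>hadoop.proxyuser.lylg.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.lylg.groups</name><value>*</value></property>
</configuration>
"""


def read_properties(xml_text):
    """Parse a Hadoop-style configuration document into a {name: value} dict."""
    root = ET.fromstring(xml_text.strip())
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.iter("property")}


def missing_proxyuser_keys(properties, user):
    """Return the proxyuser keys for `user` that are absent from the config."""
    wanted = ["hadoop.proxyuser.{}.{}".format(user, suffix)
              for suffix in ("hosts", "groups")]
    return [key for key in wanted if key not in properties]


if __name__ == "__main__":
    props = read_properties(SAMPLE_CORE_SITE)
    # An empty list means both proxyuser entries exist for this user.
    print(missing_proxyuser_keys(props, "lylg"))
```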
hive-site.xml in the Hive configuration directory:
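<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://lylg102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>lylg102</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.metastore.sasl.enabled</name>
        <value>false</value>
        <description>If true, the metastore Thrift interface will be secured with SASL. Clients must authenticate with Kerberos.</description>
    </property>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.server2.authentication</name>
        <value>NONE</value>
    </property>
</configuration>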
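Before testing with PyHive, it may be worth confirming that HiveServer2 is actually listening on the host and port configured in hive.server2.thrift.bind.host and hive.server2.thrift.port. The check below is a minimal sketch with a hypothetical helper name, `port_is_open`; it only verifies that a TCP connection can be opened, not that Hive accepts queries.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host name lookup failures.
        return False


# Example usage with the values from hive-site.xml:
# print(port_is_open("lylg102", 10000))
```

If this returns False, check that HiveServer2 has been started and that no firewall blocks the port before debugging the Python side.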
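4. Test the Hive connection
For details, see the official GitHub repository: https://github.com/dropbox/PyHive/tree/v0.6.2
Test 1 (this may raise an error; if it does, use Test 2. See the PyHive GitHub repository for details):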
from pyhive import hive

# Connect to HiveServer2; auth="NONE" matches hive.server2.authentication in hive-site.xml.
conn = hive.Connection(host='lylg102',
                       port=10000,
                       auth="NONE",
                       database='default',
                       username='lylg')
cursor = conn.cursor()
cursor.execute('SELECT * FROM students')
for result in cursor.fetchall():
    print(result)
cursor.close()
conn.close()
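If the query in Test 1 raises, the cursor and connection are never closed. One possible variation, sketched here with the standard library's `contextlib.closing`, closes both even on error. `fetch_rows` and its `connection_factory` parameter are hypothetical names for this example; the factory is any callable that returns an object with `cursor()` and `close()`, such as a lambda wrapping `hive.Connection(...)`.

```python
from contextlib import closing


def fetch_rows(connection_factory, query):
    """Run `query` and return all rows, closing cursor and connection afterwards."""
    with closing(connection_factory()) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(query)
            return cursor.fetchall()


# Example usage with PyHive (assumes the same host and user as Test 1):
# from pyhive import hive
# rows = fetch_rows(
#     lambda: hive.Connection(host='lylg102', port=10000, auth='NONE',
#                             database='default', username='lylg'),
#     'SELECT * FROM students')
```

Passing a factory rather than an open connection keeps the helper responsible for the whole connection lifetime.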
Test 2:
from pyhive import hive
from TCLIService.ttypes import TOperationState


def execute_sql(query):
    # Open one connection; session settings are passed through `configuration`.
    conn = hive.Connection(host='lylg102',
                           port=10000,
                           auth='NONE',
                           database='default',
                           username='lylg',
                           configuration={'hive.exec.reducers.max': '123'})
    cursor = conn.cursor()
    # Submit the query asynchronously, then poll until it is no longer running.
    cursor.execute(query, async_=True)
    status = cursor.poll().operationState
    while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
        status = cursor.poll().operationState
    res = cursor.fetchall()
    cursor.close()
    conn.close()
    return res


if __name__ == '__main__':
    # A multi-line SQL string needs triple quotes.
    sql = '''
    select avg(views),
           avg(score),
           sum(views) from cartoon_info'''
    res = execute_sql(sql)
    print(res)
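The polling loop in Test 2 calls `poll()` as fast as it can and never gives up. A gentler variant, sketched below, sleeps between polls and supports an optional timeout. `wait_until_done` is a hypothetical helper; it takes the "still running" states as a parameter so that it does not depend on the Thrift types directly.

```python
import time


def wait_until_done(cursor, running_states, poll_interval=0.5, timeout=None):
    """Poll `cursor` until its state leaves `running_states`; return the final state."""
    deadline = None if timeout is None else time.monotonic() + timeout
    status = cursor.poll().operationState
    while status in running_states:
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError("query still running after {} seconds".format(timeout))
        # Pause between polls instead of hammering HiveServer2.
        time.sleep(poll_interval)
        status = cursor.poll().operationState
    return status


# Example usage with PyHive (after cursor.execute(query, async_=True)):
# from TCLIService.ttypes import TOperationState
# final = wait_until_done(
#     cursor,
#     (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE))
# if final != TOperationState.FINISHED_STATE:
#     raise RuntimeError("query did not finish, state = {}".format(final))
```

Checking the final state before calling `fetchall()` makes a failed query show up as a clear error rather than an empty or confusing result.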
References: https://its401.com/article/weixin_34232617/93727029
https://github.com/dropbox/PyHive/issues/240



