HDFS read and write flow
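The HDFS client calls FileSystem.open(filePath).
This communicates with the NN over RPC; the NN returns the file's block list, and the client gets back an FSDataInputStream.
The HDFS client calls the read method of the FSDataInputStream.
It reads the first block from the nearest DN that holds it; when the read finishes, it checks whether the data is OK.
If it is OK, the connection to that DN is closed.
If it is not OK, the client reads the block from the second DN, and so on down the list.
When every block in the block list has been read, the HDFS client calls the close method of the FSDataInputStream to close the data stream.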
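The read steps above can be sketched as a toy model in Python. This is not the real HDFS client API: `read_block`, `read_file`, `block_locations`, and `storage` are hypothetical names made up for illustration, and a missing entry in `storage` stands in for a failed or corrupt read.

```python
# Toy model of the read flow above; not the real HDFS client API.

def read_block(block_id, datanodes, storage):
    """Try each DataNode in order (nearest first) and return the first good copy."""
    for dn in datanodes:
        data = storage.get((dn, block_id))  # None models a failed or corrupt read
        if data is not None:
            return dn, data
    raise IOError(f"no healthy replica for block {block_id}")


def read_file(block_locations, storage):
    """Read every block in order and concatenate the contents."""
    parts, served_by = [], []
    for block_id, datanodes in block_locations:
        dn, data = read_block(block_id, datanodes, storage)
        parts.append(data)
        served_by.append(dn)
    return "".join(parts), served_by


# Block list as the NN might return it: (block id, DNs sorted nearest first).
block_locations = [
    ("blk_1", ["dn1", "dn2", "dn3"]),
    ("blk_2", ["dn2", "dn3", "dn1"]),
]
# dn2 has no usable copy of blk_2, so the client falls back to dn3.
storage = {
    ("dn1", "blk_1"): "hello ",
    ("dn2", "blk_1"): "hello ",
    ("dn3", "blk_1"): "hello ",
    ("dn3", "blk_2"): "hdfs",
    ("dn1", "blk_2"): "hdfs",
}

content, served_by = read_file(block_locations, storage)
print(content, served_by)
```

In this sample, blk_1 is served by dn1 and blk_2 by dn3, because the nearest DN for blk_2 has no readable copy.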
HDFS write flow
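The HDFS client calls FileSystem.create(filePath), which communicates with the NN over RPC.
The NN checks whether the file already exists and whether the client has permission to create it.
If both checks pass, the NN creates the file. At this point the file holds no data and is not associated with any block.
Based on the file size, block size, and replication factor, the NN works out how many blocks need to be uploaded and which DNs will hold them, and this information reaches the client through the FSDataOutputStream object.
The HDFS client calls the write method of the FSDataOutputStream and, following the information returned by the NN, writes the first replica of the first block to DN1; the data is then replicated to DN2, and from there to DN3.
Once all three replicas are written, DN3 returns an ack (acknowledgement) to DN2.
DN2 receives the ack and sends an ack to DN1.
DN1 receives the ack and returns an ack to the FSDataOutputStream, telling it that all three replicas of the first block have been written.
The remaining blocks are written the same way.
When all blocks have been written,
the HDFS client calls the close method of the FSDataOutputStream
to close the data stream,
and then the complete method is called (an RPC to the NN) to tell the NN that the file was written successfully.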
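The replication pipeline and the ack chain described above can be sketched as a small simulation. This is a toy model, not the HDFS implementation: `write_block`, `write_file`, and the tiny `block_size` are hypothetical, chosen only to show the order of events.

```python
# Toy model of the write pipeline above; not the real HDFS client API.

def write_block(block_id, data, pipeline, storage):
    """Write one block through the DN pipeline and return the ack path."""
    # Forward pass: client -> DN1 -> DN2 -> DN3, each DN stores one replica.
    for dn in pipeline:
        storage[(dn, block_id)] = data
    # Backward pass: the last DN acks its upstream neighbour, and so on
    # until the ack reaches the client.
    hops = ["client"] + pipeline
    return [(hops[i], hops[i - 1]) for i in range(len(hops) - 1, 0, -1)]


def write_file(content, block_size, pipeline):
    """Split content into blocks and push each block through the pipeline."""
    storage, ack_log = {}, []
    blocks = [content[i:i + block_size] for i in range(0, len(content), block_size)]
    for n, data in enumerate(blocks, start=1):
        ack_log.append(write_block(f"blk_{n}", data, pipeline, storage))
    return storage, ack_log


storage, ack_log = write_file("abcdefghij", block_size=4, pipeline=["dn1", "dn2", "dn3"])
print(ack_log[0])  # (sender, receiver) pairs for the first block
```

Each entry in `ack_log` lists the acks for one block as (sender, receiver) pairs, starting at the last DN and ending at the client.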
SQL
1. Get the value four rows ahead
select
lead(sal,4) over(partition by deptno order by hiredate)
from emp
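A quick way to check what lead(sal, 4) returns is to run it against a few made-up rows. The sketch below uses Python's sqlite3 module and assumes the bundled SQLite is 3.25 or newer (needed for window functions); the emp rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (ename text, deptno int, hiredate text, sal int)")
# Six employees in one department, hired on consecutive days (sample data).
rows = [(f"e{i}", 10, f"2020-01-0{i}", 1000 * i) for i in range(1, 7)]
conn.executemany("insert into emp values (?, ?, ?, ?)", rows)

lead_rows = conn.execute("""
    select ename, sal,
           lead(sal, 4) over (partition by deptno order by hiredate) as sal_4_ahead
    from emp
    order by hiredate
""").fetchall()

for r in lead_rows:
    print(r)
```

Only the first two rows have a row four positions ahead of them in the partition, so the remaining rows get NULL (None in Python).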
2. Sum over the current row and the four rows after it (five rows in total)
select
sum(sal) over (partition by deptno order by hiredate rows between current row and 4 following)
from emp
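To see how the frame "rows between current row and 4 following" shrinks near the end of a partition, here is a small sqlite3 sketch. It assumes SQLite 3.25 or newer, and the seven emp rows are invented sample data with sal values 1 through 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (deptno int, hiredate text, sal int)")
# One department, seven rows, sal = 1..7 in hire order (sample data).
conn.executemany(
    "insert into emp values (?, ?, ?)",
    [(10, f"2020-01-0{i}", i) for i in range(1, 8)],
)

frame_rows = conn.execute("""
    select hiredate, sal,
           sum(sal) over (partition by deptno order by hiredate
                          rows between current row and 4 following) as next5_sum
    from emp
    order by hiredate
""").fetchall()

sums = [r[2] for r in frame_rows]
print(sums)
```

The first three rows sum a full five-row window; after that the frame runs out of following rows and simply covers whatever is left.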
3. For each spu, sums over the current row and the rows around it
select
spu_id,
pt,
click_pv,
sum(click_pv) over(partition by spu_id order by pt ) as x, -- current row and all preceding rows
sum(click_pv) over(partition by spu_id order by pt rows between unbounded preceding and 2 following) as y -- all preceding rows through 2 rows after the current row
from click
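The difference between the two frames is easier to see on concrete numbers. The sketch below runs the same two window sums in sqlite3 (SQLite 3.25 or newer assumed); the click rows are invented sample data for a single spu.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table click (spu_id text, pt text, click_pv int)")
# One spu, five days, click_pv = 1..5 (sample data).
conn.executemany(
    "insert into click values (?, ?, ?)",
    [("a", f"2024-01-0{i}", i) for i in range(1, 6)],
)

click_rows = conn.execute("""
    select spu_id, pt, click_pv,
           sum(click_pv) over (partition by spu_id order by pt) as x,
           sum(click_pv) over (partition by spu_id order by pt
                               rows between unbounded preceding and 2 following) as y
    from click
    order by pt
""").fetchall()

x = [r[3] for r in click_rows]  # running total up to the current row
y = [r[4] for r in click_rows]  # running total extended 2 rows ahead
print(x, y)
```

With an order by and no explicit frame, x is a running total up to the current row, while y looks two rows further ahead and so reaches the partition total earlier.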
4. Consecutive logins
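-- b is the gap in days to the next login; b=1 means the next login is on the following day
-- (date subtraction yielding days assumes Oracle-style date arithmetic)
-- note: sum(b)+1 adds up every consecutive-day pair per id, so separate streaks are merged
select id,max(c) from(select id,sum(b)+1 as c from (select
id,
login_date,
lead(login_date,1) over(partition by id order by login_date)-login_date as b
from login) t1
where b=1
group by id
) t2
group by id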
Using the day of the year (counted from the first day of each year)
select id,date_sub,count(1) from
(select
id,
login_date,
row_number() over(partition by id order by login_date) as rn,
dayofyear(login_date)-row_number() over(partition by id order by login_date) as date_sub
from login
) as a
group by id,date_sub
Using date subtraction
select id,date_sub,count(1) from
(select
id,
login_date,
row_number() over(partition by id order by login_date) as rn,
date_sub(login_date,interval row_number() over(partition by id order by login_date) day) as date_sub
from login
) as a
group by id,date_sub
Start and end date of each streak
min(login_date), max(login_date) group by id, date_sub
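The row_number subtraction trick above can be tried end to end in sqlite3. This is a sketch under assumptions: SQLite 3.25 or newer, at most one login row per id per day, and invented sample rows; the SQLite date() modifier replaces the date_sub call used above, and the CTE name `ranked` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table login (id int, login_date text)")
# id 1 logs in on Jan 1-3 and Jan 5-6; id 2 logs in once (sample data).
conn.executemany("insert into login values (?, ?)", [
    (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-03"),
    (1, "2024-01-05"), (1, "2024-01-06"),
    (2, "2024-01-10"),
])

streaks = conn.execute("""
    with ranked as (
        select id, login_date,
               row_number() over (partition by id order by login_date) as rn
        from login
    )
    select id,
           min(login_date) as start_date,
           max(login_date) as end_date,
           count(*)        as days
    from ranked
    -- login_date minus rn days is constant within one run of consecutive days
    group by id, date(login_date, '-' || rn || ' days')
    order by id, start_date
""").fetchall()

for s in streaks:
    print(s)
```

Each output row is one streak with its start date, end date, and length; taking max(days) per id on top of this would give the longest streak.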
5. Recursive (hierarchical) query on the emp table
select t.empno, t.ename, t.mgr, level
from emp t
start with t.empno = '7839'
connect by prior t.empno = t.MGR
order by level
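start with ... connect by is Oracle syntax. In engines without it, the same top-down walk can be written as a recursive CTE. The sketch below does this in sqlite3 with a handful of sample rows; the CTE name `org` and the column `lvl` (standing in for Oracle's level pseudocolumn) are names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (empno int, ename text, mgr int)")
# A small management chain rooted at empno 7839 (sample rows).
conn.executemany("insert into emp values (?, ?, ?)", [
    (7839, "KING", None),
    (7566, "JONES", 7839),
    (7698, "BLAKE", 7839),
    (7788, "SCOTT", 7566),
    (7876, "ADAMS", 7788),
])

tree = conn.execute("""
    with recursive org(empno, ename, mgr, lvl) as (
        -- anchor: the root, like "start with t.empno = '7839'"
        select empno, ename, mgr, 1 from emp where empno = 7839
        union all
        -- step: children of rows already found, like "connect by prior t.empno = t.mgr"
        select e.empno, e.ename, e.mgr, o.lvl + 1
        from emp e join org o on e.mgr = o.empno
    )
    select empno, ename, mgr, lvl from org order by lvl, empno
""").fetchall()

for row in tree:
    print(row)
```

The anchor query plays the role of start with, and the join in the recursive step plays the role of connect by prior.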



