数仓sql场景:求第一次出现和最后一次出现及最后一次的相关信息

news/2024/9/27 14:14:07

1.需求

根据身份证号分组聚合求第一次(开始时间最小)出现的基站及对应时间和最后一次(结束时间最大)出现的基站,ip4,ip6及对应时间

2.应用场景

安全行业,上网流量日志监控行业,如追击罪犯,要查看第一次出现的位置和最后一次出现的位置及最后一次联系人等相关信息,从而可以和最后一次联系人及相关信息从而分析罪犯去向等信息

3.示例数据

with a as 
(select 1 as identitycode,'st_001' as stationid,'ip4_001' as ip4,'' as ip6, 1 as isipv4,20240615150101 as starttime,20240615150601 as endtime
union all 
select 1 as identitycode,'st_002' as stationid,'' as ip4,'ip6_001' as ip6, 0 as isipv4,20240615150601 as starttime,20240615150901 as endtime
union all 
select 1 as identitycode,'st_003' as stationid,'ip4_02' as ip4,'' as ip6, 1 as isipv4,20240615151301 as starttime,20240615151701 as endtime
union all
select 2 as identitycode,'st_004' as stationid,'' as ip4,'ip6_004' as ip6, 0 as isipv4,20240618150101 as starttime,20240618150601 as endtime
union all 
select 2 as identitycode,'st_005' as stationid,'' as ip4,'ip6_001' as ip6, 0 as isipv4,20240618150601 as starttime,20240618150901 as endtime
union all 
select 2 as identitycode,'st_006' as stationid,'ip4_07' as ip4,'' as ip6, 1 as isipv4,20240618151301 as starttime,20240618151701 as endtime)

4.sql实现

hivesql实现,在仓内实现结果直接应用

with a as 
(select 1 as identitycode,'st_001' as stationid,'ip4_001' as ip4,'' as ip6, 1 as isipv4,20240615150101 as starttime,20240615150601 as endtime
union all 
select 1 as identitycode,'st_002' as stationid,'' as ip4,'ip6_001' as ip6, 0 as isipv4,20240615150601 as starttime,20240615150901 as endtime
union all 
select 1 as identitycode,'st_003' as stationid,'ip4_02' as ip4,'' as ip6, 1 as isipv4,20240615151301 as starttime,20240615151701 as endtime
union all
select 2 as identitycode,'st_004' as stationid,'' as ip4,'ip6_004' as ip6, 0 as isipv4,20240618150101 as starttime,20240618150601 as endtime
union all 
select 2 as identitycode,'st_005' as stationid,'' as ip4,'ip6_001' as ip6, 0 as isipv4,20240618150601 as starttime,20240618150901 as endtime
union all 
select 2 as identitycode,'st_006' as stationid,'ip4_07' as ip4,'' as ip6, 1 as isipv4,20240618151301 as starttime,20240618151701 as endtime)select b.identitycode,b.starttime as starttime,b.stationid as first_stationid,c.endtime as lasttime,c.stationid as last_stationid,c.ip4,c.ip6,c.isipv4 from 
(select identitycode,stationid,starttime,row_number() over(PARTITION by identitycode order by starttime) srn from a) b 
left join 
(select identitycode,stationid,ip4,ip6,isipv4,endtime,row_number() over(PARTITION by identitycode order by endtime desc) lrn from a) c
on b.identitycode=c.identitycode where b.srn=1 and c.lrn=1
;

输出结果

olap引擎如clickhouse实现

select 
cur.identitycode AS identitycode,
toUInt64(cur.first.2) AS firsttime,
toUInt64(cur.last.5) AS lasttime,
cur.first.1 AS firststationid,
cur.last.1 AS laststationid,
toUInt64(cur.last.2) AS ip4,
cur.last.3 AS ip6,
toUInt8(cur.last.4) AS isipv4
from
(SELECT 
identitycode,
any(account) as account,
any(accounttype) as accounttype,
argMin(tuple(stationid ,starttime),starttime) AS first,
argMax(tuple(stationid,ip4,ip6,isipv4,endtime),endtime) AS last
FROM dwd_identity_base_cur_hour cur 

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.ryyt.cn/news/45776.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈,一经查实,立即删除!

相关文章

SVG pattern 标签的用法和应用场景

通过使用 通过使用 <pattern> 标签,可以在 SVG 图像内部定义可重复使用的任意图案。这些图案可以通过 fill 属性或 stroke 属性进行引用。 使用场景 例如我们要在 <svg> 中绘制大量的圆点点,可以通过重复使用 <circle> 标签来实现。<svg width="10…

软件工具推荐-1数据库客户端-dbeaver

官网 https://dbeaver.io/ 免费社区版本 免费版本,功能齐全,支持各种数据库 其他包含TDengine这类时序数据, 功能非常强大,增删改查无需写sql,谁用谁便利 but,也许,可能,会使你失去写sql的能力,另一方面也反应了他的可视化功能非常强大 常用基本操作基本不能写sql

4.13 拔掉网线后, 原本的 TCP 连接还存在吗? (转载)

4.13 拔掉网线后, 原本的 TCP 连接还存在吗? 大家好,我是小林。 今天,聊一个有趣的问题:拔掉网线几秒,再插回去,原本的 TCP 连接还存在吗? 可能有的同学会说,网线都被拔掉了,那说明物理层被断开了,那在上层的传输层理应也会断开,所以原本的 TCP 连接就不会存在的了…

Angular cli 父传子,子传父,服务Service, @input,@output,@ViewChild 装饰器的简单使用,看这一篇就够了

直接code 1:Angular cli 创建组件componentng g component components\rightng g c wave 简写需要定位到根路径下即可创建组件Could not find an NgModule. Use the skip-import option to skip importing in NgModule. PS C:\myAngulrDemos\20240428demo\mydemo01\src> c…

业务数据脱敏

业务数据脱敏 一、什么是数据脱敏 先来看看什么是数据脱敏?数据脱敏也叫数据的去隐私化,在我们给定脱敏规则和策略的情况下,对敏感数据比如 手机号、银行卡号 等信息,进行转换或者修改的一种技术手段,防止敏感数据直接在不可靠的环境下使用。 像政府、医疗行业、金融机构、…

面试官问:Kafka 会不会丢消息?怎么处理的?

作者:Java3y链接:https://www.zhihu.com/question/628325953/answer/3281764326来源:知乎著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。面试官:今天我想问下,你觉得Kafka会丢数据吗? 候选者:嗯,使用Kafka时,有可能会有以下场景会丢消息 候选…

kafka 如何保证不重复消费又不丢失数据?

作者:Java3y链接:https://www.zhihu.com/question/483747691/answer/2392949203来源:知乎著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。面试官:今天我想问下,你觉得Kafka会丢数据吗? 候选者:嗯,使用Kafka时,有可能会有以下场景会丢消息 候选…