Using R Social Network Analysis Packages to Visualize Infectious Disease Transmission Chains

Motivation

While handling this outbreak, I learned that transmission chains are still often drawn by hand in software such as PowerPoint. The advantage of this approach is that the designer can put as much information into the figure as needed, for example the approximate location of each person, or the route and intensity of contact. The drawback is that it only works well while the chain has few nodes (infected persons). Once the number of nodes reaches a certain level, the complexity of the relationships grows geometrically (one person infecting several others, one person having contact with several infected persons, and so on), and sorting this out purely by hand costs a great deal of mental effort.

The most serious drawback shows up when the field epidemiological investigation is updated and the chain has to be revised: a change to a single node or link cascades through the whole chain. The more nodes there are, the wider and more complicated the impact, much like untangling a knot of thread. Once the number of infected persons is large enough, manual maintenance becomes practically impossible.

Being an R enthusiast (and, frankly, lazy), I explored whether the transmission chain could be drawn automatically with software. The results with igraph, ggraph and networkD3 are shown in the figures below. Personally I think the flashy interactive output of networkD3 works best.

Procedure

See the documentation of the three packages I used. Detailed steps are still to be added.

Data

Node data

The node data only needs to contain basic information on all infected persons, such as ID, name and category.

Edge data

At a minimum, the edge data must give the relationships between the infected persons listed in the node data. Put simply, it is like two columns in Excel: the first column is from, the second column is to, and each row describes one relationship between two infected persons, namely who transmitted to whom. These data, of course, have to be supplied by the epidemiologists working hard in the field.

Visualization

igraph

First use graph_from_data_frame(d = line, vertices = node, directed = TRUE) to turn the node and edge tables into an igraph object. The first figure can then be drawn directly with plot, and the plotting parameters can be adjusted as needed.

networkD3

This is my favourite result. Use igraph_to_networkD3 to convert the igraph data, then draw interactive network diagrams with simpleNetwork, forceNetwork or sankeyNetwork. I tried it in a mobile browser and the interaction works there too, including dragging nodes, zooming and panning. A great experience.

[Interactive forceNetwork diagram of the transmission chain, 57 nodes labelled 0号 to 56号]

[Interactive sankeyNetwork diagram of the same transmission chain]

ggraph

ggraph largely follows the ggplot2 way of plotting, and the figures it produces also look better. First use the tidygraph package to convert the igraph data into a structure better suited to ggraph. After that you can happily draw the figure the ggplot2 way.
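The workflow described above (node table, edge table, igraph, networkD3, ggraph) might look roughly like the sketch below. It is only an illustration under stated assumptions: the five cases, the `category` column and every plotting parameter are made up, and the post's real data and settings are not shown in the text.

```r
library(igraph)
library(networkD3)
library(tidygraph)
library(ggraph)

# Made-up node table: one row per infected person; the first column is the ID.
node <- data.frame(
  id       = c("case0", "case1", "case2", "case3", "case4"),
  category = c("index", "household", "household", "workplace", "workplace")
)

# Made-up edge table: each row reads "from transmitted to to".
line <- data.frame(
  from = c("case0", "case0", "case1", "case1"),
  to   = c("case1", "case2", "case3", "case4")
)

# 1. igraph: build a directed graph and draw a static plot.
g <- graph_from_data_frame(d = line, vertices = node, directed = TRUE)
plot(g, vertex.size = 25, edge.arrow.size = 0.5)

# 2. networkD3: convert the igraph object, then build an interactive widget.
d3 <- igraph_to_networkD3(g, group = V(g)$category)
widget <- forceNetwork(
  Links = d3$links, Nodes = d3$nodes,
  Source = "source", Target = "target",
  NodeID = "name", Group = "group",
  zoom = TRUE, arrows = TRUE, opacity = 1
)
# Print `widget` in an interactive session to view it.

# 3. ggraph: convert to a tidygraph object and plot with ggplot2 grammar.
tg <- as_tbl_graph(g)
p <- ggraph(tg, layout = "fr") +
  geom_edge_link(arrow = arrow(length = unit(2, "mm")),
                 end_cap = circle(3, "mm")) +
  geom_node_point(aes(colour = category), size = 4) +
  geom_node_text(aes(label = name), vjust = -1) +
  theme_void()
print(p)
```

Because all three outputs are derived from the same two tables, a correction from the field only requires editing a row in `node` or `line` and rerunning the script, which is the point of automating the chain in the first place.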

March 25, 2022 · Luo Fei

Scraping Shanghai COVID-19 Data with R: A Simple Visual Analysis (Updated April 7)

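Today our local outbreak finally recorded no new cases, so I found some time to look at the situation in other regions. Reading the officially reported figures, I became curious about the outbreak in the city that used to be the model for epidemic control, and decided to take a quick look.

1 Data acquisition

1.1 Data source

For accurate data, the official website is the obvious place to go. On the website of the Shanghai Municipal Health Commission (https://wsjkw.sh.gov.cn/xwfb/index.html), the outbreak bulletins are all in the "新闻发布" (News Releases) column, and on closer inspection the bulletin titles already contain all the figures for new and confirmed cases. Very convenient. The news pages are also numbered consecutively, with _[num] as the page number, which makes them very easy to scrape.

1.2 R packages

First load the required packages. The rvest package fetches the static page content, the SelectorGadget extension for Chrome identifies the nodes that hold the information, tidyverse tidies the data and draws the plots, and lubridate handles the date variables.

    library(rvest)
    library(tidyverse)
    library(lubridate)
    library(readxl)
    library(openxlsx)
    library(ggforce)
    library(mgcv)

1.3 Fetching the first page

Use SelectorGadget to locate the two variables needed on the page, "time" and "title". The corresponding nodes are .time and .list-date a. After scraping them with rvest, convert them to text and store them in a tibble.

    ## Set the URL
    url <- "https://wsjkw.sh.gov.cn/xwfb/index.html"
    ## Fetch the first page
    content <- read_html(url)
    reportdate <- content %>% html_nodes(".time") %>% html_text() %>% as.Date()
    title <- content %>% html_nodes(".list-date a") %>% html_text()
    basedata <- tibble(reportdate = reportdate, title = title)

1.4 Scraping all the data

The Shanghai outbreak mainly took off in March, but to see the earlier trend, and whether growing pressure from imported cases led to this outbreak, I scraped pages 2 through 200. For this step I switched to the map_df function from the purrr package.

    ## Scrape the first page plus pages 2 to 200
    url_1 <- c("https://wsjkw.sh.gov.cn/xwfb/index.html",
               paste0("https://wsjkw.sh.gov.cn/xwfb/index_", c(2:200), ".html"))
    webfun_0 <- function(url){
      webpage <- read_html(url)
      tibble(
        "reportdate" = webpage %>% html_nodes("....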
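The listing-page pattern from sections 1.3 and 1.4 can be tried offline before pointing it at the real site. The sketch below is only an illustration: the two HTML snippets, their markup and their titles are invented stand-ins for the listing pages, and `parse_listing` is a hypothetical helper, not the post's `webfun_0` (whose body is cut off in the text). Only the selectors `.time` and `.list-date a` are taken from the post.

```r
library(rvest)
library(purrr)
library(tibble)

# Two made-up listing pages standing in for index.html and index_2.html.
# The markup is an assumption; only the CSS selectors come from the post.
pages <- list(
  '<html><body><ul class="list-date">
     <li><a href="#">Made-up bulletin A</a><span class="time">2022-04-06</span></li>
     <li><a href="#">Made-up bulletin B</a><span class="time">2022-04-05</span></li>
   </ul></body></html>',
  '<html><body><ul class="list-date">
     <li><a href="#">Made-up bulletin C</a><span class="time">2022-04-04</span></li>
   </ul></body></html>'
)

# Hypothetical helper: parse one listing page into a tibble of dates and titles.
parse_listing <- function(html) {
  page <- read_html(html)
  tibble(
    reportdate = page %>% html_nodes(".time") %>% html_text() %>% as.Date(),
    title      = page %>% html_nodes(".list-date a") %>% html_text()
  )
}

# map_df applies the helper to every page and row-binds the results.
all_pages <- map_df(pages, parse_listing)
print(all_pages)
```

Replacing `pages` with a vector of real URLs such as `url_1` would give the same shape of result, since `read_html` accepts either a URL or a literal HTML string.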

March 24, 2022 · Luo Fei

Site-Building Notes (Lessons from the Pitfalls)  [draft]

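Alongside a busy job, I have been teaching myself R in fits and starts, stumbling along, digging through materials and reading English documentation. For a middle-aged guy this has been genuinely hard going, and it deserves to be written down. My first real contact with R was in 2015, while I was working in Shanghai. Back then I did not study it in depth and only picked up the basic functions I might need in everyday work. I started learning in earnest at the beginning of the COVID-19 outbreak in 2019, when I had to process large amounts of data and do analysis, plotting and modelling, so I set aside a solid stretch of time for it. Along the way I relied on textbooks, books and reference material from many experts, and I am especially grateful to Yihui. Starting from rmarkdown, bookdown, blogdown and knitr, and all for the sake of literate programming, I ended up also learning LaTeX, HTML, CSS, pandoc and more. Truly a tale of bitter tears 😭. I still plan to give that journey a post of its own later. This post mainly records how I built the site with blogdown + Hugo + Netlify, and the pitfalls I hit along the way.

About blogdown

I will not go over what blogdown does here. If you want to know, please refer to Yihui's blogdown documentation. I am only recording my own lessons and pitfalls.

Pitfalls

Custom code highlighting has no effect (unresolved)

Following the hugo-paper tutorial, I found that the setting below does take effect: once it is set, code is no longer highlighted.

    params:
      assets:
        disableHLJS: true

The following setting has no effect. Nothing is displayed after setting it, and it does not work after deploying to Netlify either.

    markup:
      highlight:
        # anchorLineNos: true
        codeFences: true
        guessSyntax: true
        lineNos: true
        # noClasses: false
        style: monokai

After downloading the CSS styles from the highlight.js website, rename the style you like to an-old-hope.min and place it in /assets/css under the site root. This does change the highlighting style, although I recommend picking a dark-mode style. I have not yet found a way to make the highlighting style follow the site's day/night mode switch.

After changing the style, the white font colour appears to be redefined by the hugo-paper theme and cannot be changed through the style file. Unresolved.

baseURL

If you follow the tutorial and set baseURL to the domain provided by Netlify, then after pointing your own domain at the site, second-level links still go to the original Netlify domain.

Solution: set the following in config.yaml

    baseURL: /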

January 12, 2022 · Luo Fei