
Using PROC SQL Statements in Data Cleaning


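This article shows how PROC SQL statements can be used for common data-cleaning checks in SAS: selecting observations by condition, and finding invalid character values, out-of-range numeric values, values more than two standard deviations from the mean, missing values, out-of-range dates, duplicate patient numbers, patients without the expected number of visits, and ID numbers that do not appear in both of two files.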

libname clean "c:/books/clean";   /* Define a permanent library */

* Create a sample data set ONE;
data one;
  input X Y Z;
datalines;
1 2 3
101 202 303
44 55 66
444 555 666
;
title "Values of X from data set ONE where X is greater than 100";
/* Select observations that meet a condition */

proc sql;
  select X
  from one
  where X gt 100; 
quit;
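A minimal sketch, assuming you want to keep the selected rows for later steps rather than only print them: CREATE TABLE saves the query result as a data set, and SELECT ... INTO stores a row count in a macro variable. The names BIG_X and N_BIG are hypothetical.

```sas
/* Rebuild the sample data set ONE from above so this block runs alone */
data one;
  input X Y Z;
datalines;
1 2 3
101 202 303
44 55 66
444 555 666
;

proc sql noprint;
  /* Save the rows with X > 100 instead of printing them */
  create table big_x as
    select X
    from one
    where X gt 100;
  /* Store the number of selected rows in the hypothetical macro variable N_BIG */
  select count(*) into :n_big
    from big_x;
quit;

%let n_big = &n_big;   /* %LET strips the leading blanks INTO adds to numbers */
%put NOTE: &n_big observation(s) have X greater than 100.;
```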

*Program 8-2;
*** Check for invalid character values;
title "Checking for Invalid Character Data";
proc sql;
  select Patno,
  Gender,
  DX,
  AE
  from clean.patients
  where Gender not in ('M','F',' ') or
  notdigit(trim(DX)) and not missing(DX) or
  AE not in ('0','1',' ');
quit;
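The TRIM call is needed because NOTDIGIT also treats trailing blanks as non-digits, and SAS pads character values with blanks; TRIM of an all-blank value still leaves one blank, which is why the query adds NOT MISSING(DX). The sketch below runs the same conditions against a small made-up data set (CHAR_DEMO, BAD_CHAR and the macro variable N_BAD_CHAR are hypothetical), so the result is easy to check by eye: patients 002, 003 and 004 each break one rule, while 005, whose Gender is missing, is accepted because the IN lists include a blank.

```sas
/* Hypothetical demo data; a single period in list input reads as a blank */
data char_demo;
  input Patno $ Gender $ DX $ AE $;
datalines;
001 M 1 0
002 F X 1
003 Q 3 0
004 M 5 2
005 . 4 1
;

proc sql;
  /* Same conditions as the query above, saved to a table */
  create table bad_char as
    select Patno, Gender, DX, AE
    from char_demo
    where Gender not in ('M','F',' ') or
          notdigit(trim(DX)) and not missing(DX) or
          AE not in ('0','1',' ');
  select * from bad_char;
  /* Suppress printing, then store the number of flagged rows */
  reset noprint;
  select count(*) into :n_bad_char from bad_char;
quit;
```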

* Check for out-of-range numeric values;
title "Checking for out-of-range numeric values";
proc sql;
  select Patno,
  HR,
  SBP,
  DBP
  from clean.patients
  where HR  not between 40 and 100 and not missing(HR)  or
  SBP not between 80 and 200 and not missing(SBP)  or
  DBP not between 60 and 120 and not missing(DBP);
quit;
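The query above lists every offending row but does not say which rule it broke. One possible extension, sketched here with hypothetical names (VITALS_DEMO, VITALS_FLAGS, N_OUT_OF_RANGE) and made-up values, adds a CASE expression that names the first variable out of range and then filters on that column with the CALCULATED keyword. Patient 004 has a missing HR and is not flagged, matching the NOT MISSING conditions.

```sas
/* Hypothetical demo data; values chosen so each range rule fires once */
data vitals_demo;
  input Patno $ HR SBP DBP;
datalines;
001 88 140 80
002 210 120 78
003 68 300 90
004 . 110 70
005 72 120 130
;

proc sql;
  create table vitals_flags as
    select Patno, HR, SBP, DBP,
           /* Name the first variable that fails its range check */
           case
             when HR  not between 40 and 100 and not missing(HR)  then 'HR'
             when SBP not between 80 and 200 and not missing(SBP) then 'SBP'
             when DBP not between 60 and 120 and not missing(DBP) then 'DBP'
             else ' '
           end as Reason length=3
    from vitals_demo
    /* CALCULATED lets WHERE refer to the column defined in SELECT */
    where calculated Reason ne ' ';
  select * from vitals_flags;
  reset noprint;
  select count(*) into :n_out_of_range from vitals_flags;
quit;
```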

* Simple check of numeric values based on the standard deviation;
title "Data values beyond two standard deviations";
proc sql;
  select Patno,
  SBP
  from clean.patients
  having SBP not between mean(SBP) - 2 * std(SBP) and
  mean(SBP) + 2 * std(SBP)  and
  SBP is not missing;
quit;
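Because MEAN and STD are summary functions placed next to a detail column, PROC SQL remerges the statistics with every row and writes a NOTE about remerging to the log. An alternative sketch, not the original author's code, computes the statistics once in an inline view and joins them to each row. STD uses the n-1 divisor. The data are made up (six readings of 120, one of 200, one missing): the mean is about 131.4 and the SD about 30.2, so the upper cutoff is about 191.9 and only the reading of 200 should be flagged. SBP_DEMO, SBP_OUTLIERS and N_OUTLIERS are hypothetical names.

```sas
/* Hypothetical demo data: six identical readings, one outlier, one missing */
data sbp_demo;
  input Patno $ SBP;
datalines;
001 120
002 120
003 120
004 120
005 120
006 120
007 200
008 .
;

proc sql;
  /* Inline view ST holds one row with the mean and SD of the non-missing SBP;
     joining it to every row avoids the remerge */
  create table sbp_outliers as
    select p.Patno, p.SBP
    from sbp_demo as p,
         (select mean(SBP) as m, std(SBP) as s from sbp_demo) as st
    where p.SBP is not missing and
          p.SBP not between st.m - 2 * st.s and st.m + 2 * st.s;
  select * from sbp_outliers;
  reset noprint;
  select count(*) into :n_outliers from sbp_outliers;
quit;
```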

* Check for missing values;
options linesize=84;
title "Observations with missing values";
proc sql;
  select *
  from clean.patients
  where Patno  is missing or
  Gender  is missing or
  Visit  is missing or
  HR  is missing or
  SBP  is missing or
  DBP  is missing or
  DX  is missing or
  AE  is missing;
quit;
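Listing every variable in an OR chain gets long. Assuming SAS 9.2 or later, the CMISS function counts missing arguments of both character and numeric type, so one condition can replace the chain. A minimal sketch on a made-up data set (MISS_DEMO, MISS_ROWS and N_MISS_ROWS are hypothetical), where patient 002 lacks Gender and 003 lacks HR:

```sas
/* Hypothetical demo data with one missing character and one missing numeric value */
data miss_demo;
  input Patno $ Gender $ HR;
datalines;
001 M 70
002 . 80
003 F .
004 F 66
;

proc sql;
  /* CMISS returns how many of its arguments are missing, whatever their type */
  create table miss_rows as
    select *
    from miss_demo
    where cmiss(Patno, Gender, HR) gt 0;
  select * from miss_rows;
  reset noprint;
  select count(*) into :n_miss_rows from miss_rows;
quit;
```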

* Check dates;
title "Dates before June 1, 1998 or after October 15, 1999";
proc sql;
  select Patno,
  Visit
  from clean.patients
  where Visit not between '01jun1998'd and '15oct1999'd and
  Visit is not missing;
quit;
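The date check only works if Visit holds SAS date values and the bounds are written as date literals, with quotes and a trailing d. A small sketch on made-up dates (VISIT_DEMO, BAD_DATES and N_BAD_DATES are hypothetical) also shows that BETWEEN includes its end points, so a visit on 15OCT1999 is not flagged.

```sas
/* Hypothetical visit dates read with the MMDDYY10. informat */
data visit_demo;
  input Patno $ Visit :mmddyy10.;
  format Visit date9.;
datalines;
001 06/15/1998
002 05/30/1998
003 10/16/1999
004 10/15/1999
;

proc sql;
  /* '01jun1998'd is a date literal; BETWEEN is inclusive at both ends */
  create table bad_dates as
    select Patno, Visit
    from visit_demo
    where Visit not between '01jun1998'd and '15oct1999'd and
          Visit is not missing;
  select * from bad_dates;
  reset noprint;
  select count(*) into :n_bad_dates from bad_dates;
quit;
```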

* Check for duplicate values;
title "Duplicate Patient Numbers";
proc sql;
  select Patno,
  Visit
  from clean.patients
  group by Patno
  having count(Patno) gt 1;
quit;
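The duplicate query above prints every record of a duplicated patient. When a compact summary is enough, the sketch below (hypothetical data set DUP_DEMO, table DUP_IDS and macro variable N_DUP_IDS, with made-up values) returns one row per duplicated Patno together with its number of records. Changing GT 1 to NE 2 turns it into the two-visit check that follows.

```sas
/* Hypothetical demo data: patient 002 appears twice, 003 three times */
data dup_demo;
  input Patno $ Visit :mmddyy10.;
  format Visit date9.;
datalines;
001 01/05/1999
002 01/06/1999
002 02/06/1999
003 01/07/1999
003 02/07/1999
003 03/07/1999
;

proc sql;
  /* One row per duplicated ID, with how many records it has */
  create table dup_ids as
    select Patno, count(*) as N_obs
    from dup_demo
    group by Patno
    having count(*) gt 1;
  select * from dup_ids;
  reset noprint;
  select count(*) into :n_dup_ids from dup_ids;
quit;
```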

* Identify patients who do not have the expected number of observations;
title "Listing of patients who do not have two visits";
proc sql;
  select Patno,
  Visit
  from clean.patients2
  group by Patno
  having count(Patno) ne 2;
quit;

* Check that the ID numbers in two files match;
data one;
  input Patno X Y;
datalines;
1 69 79
2 56 .
3 66 99
5 98 87
12 13 14
;
data two;
  input Patno Z;
datalines;
1 56
3 67
4 88
5 98
13 99
;

* IDs that are not in both files;
title "Patient numbers not in both files";
proc sql;
  select One.patno as ID_one,
  Two.patno as ID_two
  from one full join two
  on One.patno eq Two.patno
  where One.patno is missing or Two.patno is missing;
quit;
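In the full-join listing, each unmatched ID appears in only one of the two columns. A variation, sketched below with hypothetical names (UNMATCHED, Source, N_UNMATCHED), uses COALESCE to fold the IDs into a single column and a CASE expression to record which file the ID came from. With the two data sets above it should report IDs 2 and 12 as ONE only and IDs 4 and 13 as TWO only.

```sas
/* Same two data sets as above, rebuilt so this block runs alone */
data one;
  input Patno X Y;
datalines;
1 69 79
2 56 .
3 66 99
5 98 87
12 13 14
;
data two;
  input Patno Z;
datalines;
1 56
3 67
4 88
5 98
13 99
;

proc sql;
  /* COALESCE returns the first non-missing ID; Source names the file that has it */
  create table unmatched as
    select coalesce(One.Patno, Two.Patno) as Patno,
           case when Two.Patno is missing then 'ONE only'
                else 'TWO only'
           end as Source length=8
    from one full join two
      on One.Patno eq Two.Patno
    where One.Patno is missing or Two.Patno is missing;
  select * from unmatched;
  reset noprint;
  select count(*) into :n_unmatched from unmatched;
quit;
```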


Copyright notice: original article on this site, published by 丸趣 on 2023-07-28 (2028 characters in total).