数据库应用
如果您拥有数据库后不知道干点什么,不妨参考作者的另一个开源项目:Vonng/isd
您可以直接复用监控系统Grafana,以交互式的方式查阅近30000个地面气象站过去120年间的亚小时级气象数据。
ISD —— Intergrated Surface Data
这里包含了下载、解析、处理、可视化NOAA ISD数据集所需的所有工具。 能让您查阅近30000个地面气象站过去120年间的亚小时级气象数据。并充分体验PostgreSQL带来的强大的数据分析与处理能力!
SYNOPSIS
Download, Parse, Visualize Intergrated Suface Dataset.
Including 30000 meteorology station, sub-hourly observation records, from 1900-2020.
Quick Started
-
Clone repo
git clone https://github.com/Vonng/isd && cd isd
-
Prepare a postgres database
Connect via something like
isd
orpostgres://user:pass@host/dbname
)# skip this if you already have a viable database PGURL=postgres psql ${PGURL} -c 'CREATE DATABASE isd;' # database connection string, something like `isd` or `postgres://user:pass@host/dbname` PGURL='isd' psql ${PGURL} -AXtwc 'CREATE EXTENSION postgis;' # create tables, partitions, functions psql ${PGURL} -AXtwf 'sql/schema.sql'
-
Download data
- ISD Station: Station metadata, id, name, location, country, etc…
- ISD History: Station observation records: observation count per month
- ISD Hourly: Yearly archived station (sub-)hourly observation records
- ISD Daily: Yearly archvied station daily aggregated summary
git clone https://github.com/Vonng/isd && cd isd bin/get-isd-station.sh # download isd station from noaa (proxy makes it faster) bin/get-isd-history.sh # download isd history observation from noaa bin/get-isd-hourly.sh <year> # download isd hourly data (yearly tarball 1901-2020) bin/get-isd-daily.sh <year> # download isd daily data (yearly tarball 1929-2020)
-
Build Parser
There are two ISD dataset parsers written in Golang :
isdh
for isd hourly dataset andisdd
for isd daily dataset.make isdh
andmake isdd
will build it and copy to bin. These parsers are required for loading data into database.You can download pre-compiled binary to bin/ dir to skip this phase.
-
Load data
Metadata includes
world_fences
,china_fences
,isd_elements
,isd_mwcode
,isd_station
,isd_history
. These are gzipped csv file lies indata/meta/
.world_fences
,china_fences
,isd_elements
,isd_mwcode
are constant dict table. Butisd_station
andisd_history
are frequently updated. You’ll have to download it from noaa before loading it.# load metadata: fences, dicts, station, history,... bin/load-meta.sh # load a year's daily data to database bin/load-isd-daily <year> # load a year's hourly data to database bin/laod-isd-hourly <year>
Note that the original
isd_daily
dataset has some un-cleansed data, refer caveat for detail.
Data
Dataset
数据集 | 样本 | 文档 | 备注 |
---|---|---|---|
ISD Hourly | isd-hourly-sample.csv | isd-hourly-document.pdf | (Sub-) Hour oberservation records |
ISD Daily | isd-daily-sample.csv | isd-daily-format.txt | Daily summary |
ISD Monthly | N/A | isd-gsom-document.pdf | Not used, gen from daily |
ISD Yearly | N/A | isd-gsoy-document.pdf | Not used, gen from monthly |
Hourly Data: Oringinal tarball size 105GB, Table size 1TB (+600GB Indexes).
Daily Data: Oringinal tarball size 3.2GB, table size 24 GB
It is recommended to have 2TB storage for a full installation, and at least 40GB for daily data only installation.
Schema
Data schema definition
Station
CREATE TABLE public.isd_station
(
station VARCHAR(12) PRIMARY KEY,
usaf VARCHAR(6) GENERATED ALWAYS AS (substring(station, 1, 6)) STORED,
wban VARCHAR(5) GENERATED ALWAYS AS (substring(station, 7, 5)) STORED,
name VARCHAR(32),
country VARCHAR(2),
province VARCHAR(2),
icao VARCHAR(4),
location GEOMETRY(POINT),
longitude NUMERIC GENERATED ALWAYS AS (Round(ST_X(location)::NUMERIC, 6)) STORED,
latitude NUMERIC GENERATED ALWAYS AS (Round(ST_Y(location)::NUMERIC, 6)) STORED,
elevation NUMERIC,
period daterange,
begin_date DATE GENERATED ALWAYS AS (lower(period)) STORED,
end_date DATE GENERATED ALWAYS AS (upper(period)) STORED
);
Hourly Data
CREATE TABLE public.isd_hourly
(
station VARCHAR(11) NOT NULL,
ts TIMESTAMP NOT NULL,
temp NUMERIC(3, 1),
dewp NUMERIC(3, 1),
slp NUMERIC(5, 1),
stp NUMERIC(5, 1),
vis NUMERIC(6),
wd_angle NUMERIC(3),
wd_speed NUMERIC(4, 1),
wd_gust NUMERIC(4, 1),
wd_code VARCHAR(1),
cld_height NUMERIC(5),
cld_code VARCHAR(2),
sndp NUMERIC(5, 1),
prcp NUMERIC(5, 1),
prcp_hour NUMERIC(2),
prcp_code VARCHAR(1),
mw_code VARCHAR(2),
aw_code VARCHAR(2),
pw_code VARCHAR(1),
pw_hour NUMERIC(2),
data JSONB
) PARTITION BY RANGE (ts);
Daily Data
CREATE TABLE public.isd_daily
(
station VARCHAR(12) NOT NULL,
ts DATE NOT NULL,
temp_mean NUMERIC(3, 1),
temp_min NUMERIC(3, 1),
temp_max NUMERIC(3, 1),
dewp_mean NUMERIC(3, 1),
slp_mean NUMERIC(5, 1),
stp_mean NUMERIC(5, 1),
vis_mean NUMERIC(6),
wdsp_mean NUMERIC(4, 1),
wdsp_max NUMERIC(4, 1),
gust NUMERIC(4, 1),
prcp_mean NUMERIC(5, 1),
prcp NUMERIC(5, 1),
sndp NuMERIC(5, 1),
is_foggy BOOLEAN,
is_rainy BOOLEAN,
is_snowy BOOLEAN,
is_hail BOOLEAN,
is_thunder BOOLEAN,
is_tornado BOOLEAN,
temp_count SMALLINT,
dewp_count SMALLINT,
slp_count SMALLINT,
stp_count SMALLINT,
wdsp_count SMALLINT,
visib_count SMALLINT,
temp_min_f BOOLEAN,
temp_max_f BOOLEAN,
prcp_flag CHAR,
PRIMARY KEY (ts, station)
) PARTITION BY RANGE (ts);
Update
ISD Daily and ISD hourly dataset will rolling update each day. Run following scripts to load latest data into database.
# download, clean, reload latest hourly dataset
bin/get-isd-daily.sh
bin/load-isd-daily.sh
# download, clean, reload latest daily dataset
bin/get-isd-daily.sh
bin/load-isd-daily.sh
# recalculate latest partition of monthly and yearly
bin/refresh-latest.sh
Parser
There are two parser: isdd
and isdh
, which takes noaa original yearly tarball as input, generate CSV as output (which could be directly consume by PostgreSQL Copy command).
NAME
isdh -- Intergrated Surface Dataset Hourly Parser
SYNOPSIS
isdh [-i <input|stdin>] [-o <output|st>] -p -d -c -v
DESCRIPTION
The isdh program takes isd hourly (yearly tarball file) as input.
And generate csv format as output
OPTIONS
-i <input> input file, stdin by default
-o <output> output file, stdout by default
-p <profpath> pprof file path (disable by default)
-v verbose progress report
-d de-duplicate rows (raw, ts-first, hour-first)
-c add comma separated extra columns