Skip to main content

yaml语法学习

首先我们要学Yaml语法: 具体yaml语法可以参考yaml语法详情

yaml语法的核心我觉得也是字典语言

缩减空格:

YAML对此非常严格:仅缩进2个空格

结构化数据-言外之意就是字典对象

两个主要结构:

  1. 对象 类似javascript的对象 或者html <dl>
  2. 数组 类似javascript的数组 类似html<ul>

读取yaml文件

config_file  = './test.yaml'
with open(config_file, 'r') as stream:
config = yaml.load(stream.read(),Loader=yaml.FullLoader)

对象

YAML中的对象以单词/术语开头,后跟冒号和空格

对象包含对象

dimensions:
width: 3 metres
height: 8 metres
length: 12 metres
weight: 4 tonnes

我们可以dimension['width']获取wdith数值

for key,values in config['dimensions']:
print(key,values)

数组

likes_to_eat:
- Other dinosaurs
- Meat
- More meat
- Not plants
for key,values in enumerate(config['likes_to_eat']):
print(key,values)

数据包含对象

Person:
- name: T. rex
period: Late Cretaceous Period
- name: Stegosaurus
period: Late Jurassic Period
- name: Velociraptor
period: Cretaceous Period

"-"可以看出列表[],然后如果里面有对象,就是列表嵌套对象的意思

以上就是yaml最基本的用法

文字区块

如果您的YAML中有超大文本块,则可以使用竖线(|)或大于符号(>)的方式将文本分开,并允许更多格式。 大于号使您可以在多行上编写文本,因为当您解析YAML时,这些行将折叠为一行。

desc: >
Tyrannosaurus (/tɨˌrænəˈsɔrəs/ or /taɪˌrænəˈsɔrəs/ ("tyrant lizard", from the Ancient Greek tyrannos (τύραννος), "tyrant", and sauros (σαῦρος), "lizard"[1])) is a genus of coelurosaurian theropod dinosaur.
The species Tyrannosaurus rex (rex meaning "king" in Latin), commonly abbreviated to T. rex, is one of the most well-represented of the large theropods.

验证

我们可以去yaml语法验证网站验证(http://www.yamllint.com/)

我的一个实例:

jobs: 
-
name: "read_file and conbine as one dataframe5"
sources:
-
Fillna:
fields: NUM_OF_MTHS_PD_30
num: 0
JobDescription: "read Curr_RD file"
object: "gs://zz_michael/yaml/Curr_RD.avro"
view: Curr_RD
-
Fillna:
fields: NUM_OF_MTHS_PD_30
num: 0
JobDescription: "read Prev_RD file"
object: "gs://zz_michael/yaml/Prev_RD.avro"
view: Prev_RD
-
JobDescription: "left join"
Key: ARNG_ID
how: left_outer
join: left_outer
left: Curr_RD
right: Prev_RD
table: Curr_RD
view: df5
-
name: "filter Df5 df"
sources:
-
JobDescription: "filter df5"
filters: "NUM_OF_MTHS_PD_30 >=1"
table: df5
view: df6
transforms:
-
sql: "CREATE OR REPLACE TEMPORARY VIEW df7 as select * from df5 where NUM_OF_MTHS_PD_30 is null or NUM_OF_MTHS_PD_30 <1"
-
name: fillna
sources:
-
Fillna:
fields: NUM_OF_MTHS_PD_30
num: 0
JobDescription: fillna
table: df7
view: df8
-
name: union
sources:
-
JobDescription: union
table: df6
union: df8
view: df9
-
name: write to avro file
sources:
-
JobDescription: "left join"
Key: ARNG_ID
how: left_outer
join: left_outer
left: Curr_RD
right: df9
table: Curr_RD
view: df10
targets:
final_object: df10
format: csv
mode: overwrite
target_location: "gs://zz_michael/yaml/output"
transforms:
-
sql: "CREATE OR REPLACE TEMPORARY VIEW df10 as select * from df10"
name: haha

解析上面YAML文件

  1. 我们可以看到Jobs是对象 jobs(object)
  2. 从'-'可以看出,Jobs对象嵌套列表 类似 [i for i in config["jobs"]],所以我们这里可以遍历jobs
  3. 列表里面嵌套对象: [{"name":"","sources":""},{"name":"","sources":"","transforms":""},{"name":"","sources":""},{"name":"","sources":""},{"name":"","sources":"","transforms":"",'targets':""}]
  4. 我们看到source也是有嵌套的,也是嵌套对象[{},{}]
  5. Fillna对象里面也是有嵌套对象