Tough one! How can Python parse a pipe-like string and turn it into objects with well-defined types?

I want to build something that works both as a CLI and as a library. Roughly:

python main.py data.csv --transform "parser | get='name' | len==10 | original | parser | get='age'"

It iterates over all rows, parses the data, finds the rows whose "name" has length 10, and returns "age".

The data passed between stages:
class Item:
    original: Any
    value: Any

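I've written the classes Parser, Len, GetField, and Original. My initial plan is to register the classes in a dict ahead of time, parse the string into stages, and use each stage's operator and value to initialize the corresponding class: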

pipe_units = [
    Parser(),
    Get("=", "name"),
    Len("==", "10"),
    Original(),
    Parser(),
    Get("=", "age"),
]

Then:

pipe = CompiledPipe(pipe_units)

wrapped_records = CsvReader(f)  # also a pipe unit

pipe.set_upstream(wrapped_records)  # or: wrapped_records >> pipe

for out_record in pipe:
    print(out_record)

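Does this approach have any obvious flaws? Is there a better way to parse the pipe string? Right now I just use split and similar methods, which feels crude. Is there an established industry term for this kind of parsing? Thanks, everyone.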
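A minimal sketch of the design described above, with each unit written as a callable that lazily transforms a stream of Item objects. The class names come from the post; everything else is an assumption made for illustration: Parser is assumed to decode a JSON string (the post does not say what it parses), `run_pipe` is a hypothetical stand-in for CompiledPipe, and the sample rows are invented. One thing it tries to show is converting the operator and the number once, at construction time, so that a value like "10" does not stay a string.

```python
import json
import operator
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List


@dataclass
class Item:
    original: Any
    value: Any


class Parser:
    """Assumed behavior: decode the current value as a JSON string."""

    def __call__(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            yield Item(item.original, json.loads(item.value))


class Get:
    def __init__(self, op: str, field: str) -> None:
        self.field = field  # op is accepted for symmetry but unused here

    def __call__(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            yield Item(item.original, item.value[self.field])


class Len:
    OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
           "<=": operator.le, ">": operator.gt, ">=": operator.ge}

    def __init__(self, op: str, n: str) -> None:
        self.compare = self.OPS[op]  # unknown operators fail early (KeyError)
        self.n = int(n)              # convert the string once, up front

    def __call__(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            if self.compare(len(item.value), self.n):
                yield item           # a filter: passes the item through unchanged


class Original:
    def __call__(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            yield Item(item.original, item.original)


def run_pipe(units: List[Any], source: Iterable[Item]) -> Iterator[Item]:
    """Hypothetical stand-in for CompiledPipe: chain the units lazily."""
    stream: Iterable[Item] = source
    for unit in units:
        stream = unit(stream)
    return iter(stream)


# Invented sample rows, one JSON document per record.
rows = ['{"name": "abcdefghij", "age": 31}', '{"name": "bob", "age": 25}']
source = (Item(original=r, value=r) for r in rows)
units = [Parser(), Get("=", "name"), Len("==", "10"),
         Original(), Parser(), Get("=", "age")]
ages = [item.value for item in run_pipe(units, source)]
print(ages)  # [31]
```

Because every unit is a generator, nothing is read from the source until the final loop pulls records, so a large file never has to be held in memory at once.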

fgwmlhdkkkw

124 days ago

Try this:

https://github.com/lark-parser/lark

ipwx

124 days ago

For writing a DSL, you can use pyparsing.

liberize

124 days ago

If the name after get contains '|', a plain split will break.
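To make that failure concrete, here is a small sketch that compares a naive split with a splitter that tracks quotes. `split_pipeline` is a hypothetical helper, not part of any library, and it assumes quotes are never escaped inside a value.

```python
from typing import List, Optional


def split_pipeline(text: str) -> List[str]:
    """Split on '|' while ignoring any '|' inside single or double quotes."""
    stages: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None  # the quote character we are inside, if any
    for ch in text:
        if quote is not None:
            current.append(ch)
            if ch == quote:      # closing quote ends the quoted region
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "|":          # an unquoted pipe ends the current stage
            stages.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote is not None:
        raise ValueError("unterminated quote in pipeline string")
    stages.append("".join(current).strip())
    return stages


text = "parser | get='first|last' | len==10"
naive = [s.strip() for s in text.split("|")]
safe = split_pipeline(text)
print(naive)  # ['parser', "get='first", "last'", 'len==10']
print(safe)   # ['parser', "get='first|last'", 'len==10']
```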

GeekGao

124 days ago

```
from pyparsing import Word, alphanums, Suppress, Group, OneOrMore, Optional

def parse_pipeline(pipeline_string):
    # Basic elements
    command = Word(alphanums + "_")
    argument = Word(alphanums + "_='")
    pipe = Suppress("|")

    # A command followed by optional arguments
    command_structure = Group(command + Optional(Group(OneOrMore(argument))))

    # The whole pipeline: commands separated by pipes
    pipeline = OneOrMore(command_structure + Optional(pipe))

    # Parse the string
    parsed = pipeline.parseString(pipeline_string)

    result = []
    for item in parsed:
        if len(item) == 1:
            result.append({"command": item[0], "args": []})
        else:
            result.append({"command": item[0], "args": item[1].asList()})

    return result

# Usage
pipeline_str = "parser | get='name' | len==10 | original | parser | get='age'"
parsed_pipeline = parse_pipeline(pipeline_str)
print(parsed_pipeline)

```

Output:
```
[{'command': 'parser', 'args': []}, {'command': 'get', 'args': ["='name'"]}, {'command': 'len', 'args': ['==10']}, {'command': 'original', 'args': []}, {'command': 'parser', 'args': []}, {'command': 'get', 'args': ["='age'"]}]
```

Just a rough starting point; hopefully it draws out better ideas.
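One possible next step, sketched under assumptions: turning the list of dicts printed above into typed unit objects through a registry dict, as the original post proposes. The stub classes, `REGISTRY`, `ARG_RE`, and `build_units` are all hypothetical names; the real units would carry behavior rather than just fields.

```python
import re
from dataclasses import dataclass


# Stub units: dataclasses so that instances can be compared and printed.
@dataclass
class Parser:
    pass


@dataclass
class Original:
    pass


@dataclass
class Get:
    op: str
    field: str


@dataclass
class Len:
    op: str
    n: int


# Hypothetical registry: command name -> (class, converter for the raw value).
# A converter of None means the command takes no argument.
REGISTRY = {
    "parser": (Parser, None),
    "original": (Original, None),
    "get": (Get, lambda raw: raw.strip("'")),
    "len": (Len, int),
}

# Longer operators are listed first so "==" is not read as "=" plus "=10".
ARG_RE = re.compile(r"(==|!=|<=|>=|=|<|>)(.+)")


def build_units(parsed):
    units = []
    for stage in parsed:
        name, args = stage["command"], stage["args"]
        if name not in REGISTRY:
            raise ValueError(f"unknown command: {name!r}")
        cls, convert = REGISTRY[name]
        if convert is None:
            if args:
                raise ValueError(f"{name!r} takes no argument")
            units.append(cls())
            continue
        match = ARG_RE.fullmatch(args[0]) if args else None
        if match is None:
            raise ValueError(f"{name!r} needs an argument like =value")
        op, raw = match.groups()
        units.append(cls(op, convert(raw)))  # the value is typed from here on
    return units


parsed = [
    {"command": "parser", "args": []},
    {"command": "get", "args": ["='name'"]},
    {"command": "len", "args": ["==10"]},
    {"command": "original", "args": []},
    {"command": "parser", "args": []},
    {"command": "get", "args": ["='age'"]},
]
units = build_units(parsed)
print(units)
```

Doing the conversion inside the registry keeps unknown commands and malformed arguments failing at build time, before any row is processed.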

nowheremanx

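124 days ago

@GeekGao and everyone above: this was very helpful. It's much cleaner than the pile of regular expressions I have now.

Was this written by ChatGPT? (It looks so tidy!)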

GeekGao

124 days ago

@nowheremanx Uh, I wouldn't exactly call this tidy...

rming

124 days ago

awk can do this, though the learning curve is a bit steep.

june4

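124 days ago

For a format this simple and regular, there's really no need to hand-write a parser that goes character by character. split plus regex is the way to go, unless what you really want is to learn something new.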
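For comparison, a sketch of the split-plus-regex route using only the standard library. `Stage`, `STAGE_RE`, and this `parse_pipeline` are hypothetical names, and the operator list is an assumption about what the DSL should support. It keeps the limitation raised earlier in the thread: a '|' inside a quoted value would still be split.

```python
import re
from typing import List, NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    op: Optional[str]
    value: Optional[str]


STAGE_RE = re.compile(
    r"""
    (?P<name>[A-Za-z_]\w*)                # command name
    (?:
        \s*(?P<op>==|!=|<=|>=|=|<|>)\s*   # operator, longest alternatives first
        (?P<value>'[^']*'|\S+)            # quoted string or bare token
    )?                                    # the operator/value part is optional
    """,
    re.VERBOSE,
)


def parse_pipeline(text: str) -> List[Stage]:
    stages = []
    for chunk in text.split("|"):
        chunk = chunk.strip()
        match = STAGE_RE.fullmatch(chunk)
        if match is None:
            raise ValueError(f"cannot parse stage: {chunk!r}")
        value = match["value"]
        if value is not None and value.startswith("'"):
            value = value[1:-1]           # drop the surrounding quotes
        stages.append(Stage(match["name"], match["op"], value))
    return stages


stages = parse_pipeline(
    "parser | get='name' | len==10 | original | parser | get='age'"
)
for stage in stages:
    print(stage)
```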

https://www.v2ex.com/t/1065193