Finding the items that follow a tag in Beautiful Soup

❓ I want to parse an HTML file with Beautiful Soup and Python, for example:

<h1>Title 1</h1>
<div class="item"><p>content 1</p></div>
<div class="item"><p>content 2</p></div>
...
<h1>Title 2</h1>
<div class="item"><p>content 3</p></div>
<div class="item"><p>content 4</p></div>
<div class="item"><p>content 5</p></div>
...

How can I parse it into a dictionary like this?

{
    "Title 1": [
        {
            "content": "content 1"
        },
        {
            "content": "content 2"
        }
    ],
    "Title 2": [
        {
            "content": "content 3"
        },
        {
            "content": "content 4"
        },
        {
            "content": "content 5"
        }
    ]
}

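✔️ Here is one way to do it: walk the top-level nodes in document order, start a new list at each <h1>, and append every following <div> to that list until the next <h1>: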

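import bs4

# `html` holds the document from the question
soup = bs4.BeautifulSoup(html, "html.parser")
data = {}
row = []
title = ""
# Walk the top-level nodes in document order
for tag in soup.children:
    if tag.name == 'h1':
        # A new heading: save the items collected under the previous one
        if title:
            data[title] = row
        row = []
        title = tag.get_text(strip=True)

    elif tag.name == 'div':
        # Wrap the text in a dict to match the desired output format
        row.append({"content": tag.get_text(strip=True)})

# Save the items collected under the last heading
if title:
    data[title] = row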
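The loop above only works when the <h1> and <div> elements are direct children of the parsed document. If the file wraps them in <html><body>, `soup.children` yields just the <html> element. A sketch of an alternative that matches the "next items" idea in the title: find each <h1>, then scan its following siblings with `find_next_siblings()` until the next <h1>. The function name `group_by_heading` is hypothetical, and the check for `class="item"` assumes every item div carries that class, as in the example.

```python
import bs4


def group_by_heading(html: str) -> dict:
    """Map each <h1> text to the list of div.item contents that follow it (hypothetical helper)."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    data = {}
    for h1 in soup.find_all("h1"):
        items = []
        # Scan the tags after this heading at the same nesting level, in document order
        for sib in h1.find_next_siblings():
            if sib.name == "h1":
                break  # the next section starts here
            # Assumption: item divs are marked with class="item"
            if sib.name == "div" and "item" in sib.get("class", []):
                items.append({"content": sib.get_text(strip=True)})
        data[h1.get_text(strip=True)] = items
    return data


if __name__ == "__main__":
    sample = (
        "<h1>Title 1</h1>"
        '<div class="item"><p>content 1</p></div>'
        '<div class="item"><p>content 2</p></div>'
        "<h1>Title 2</h1>"
        '<div class="item"><p>content 3</p></div>'
    )
    print(group_by_heading(sample))
    # {'Title 1': [{'content': 'content 1'}, {'content': 'content 2'}], 'Title 2': [{'content': 'content 3'}]}
```

Because it relies on siblings rather than top-level position, the same function also works when the content sits inside <body>. Rescanning siblings for every heading is quadratic in the worst case, which is usually fine for pages of this size.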