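I'm a Node.js beginner, and for testing purposes I want to create a simple application that builds an array of objects from a given HTML string.
Let me explain. I have an HTML string that contains multiple div elements like this: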
<div class="user_container">
<div class="user">
<div class="thumb">
<!-- thumbnail block-->
</div>
<div class="web_presence_locations"></div>
<div class="user_data">
<span class="name">Jaroslaw Chujczynski</span>
<p class="location_with_flag">
<!-- img with url here-->
Leeds,
United Kingdom
</p>
<div class="user_details">
<div class="amount currency">
£28,000.00
<span class="overbooked">(in overfunding)</span>
</div>
</div>
</div>
</div>
<div class="profile_container">
<div class="extra_profile_data" style="">
<div class="investments last">
<h3 class="h5">Recent Investments</h3>
<ul>
<li class="first">
<div class="campaign-logo-frame">
<a class="campaign_link" href="/test1">test1</a>
<span class="currency">£28,000.00</span>
</div>
</li>
<li class="">
<div class="campaign-logo-frame">
<a class="campaign_link" href="/test2">test2</a>
<span class="currency">£28,000.00</span>
</div>
</li>
<li class="">
<div class="campaign-logo-frame">
<a class="campaign_link" href="/test3">test3</a>
<span class="currency">£28,000.00</span>
</div>
</li>
<li class="">
<div class="campaign-logo-frame">
<a class="campaign_link" href="/test4">test4</a>
<span class="currency">£28,000.00</span>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
What I want to do is create an object from the data in the div above. For example, it would look like this:
{
name: 'Jaroslaw Chujczynski',
location: 'Leeds, United Kingdom',
ammountCurrency: '£28,000.00 (in overfunding)',
lastInvestments: [
{
name: 'test1',
currency: '£28,000.00'
}, {
name: 'test2',
currency: '£28,000.00'
}, {
name: 'test3',
currency: '£28,000.00'
}, {
name: 'test4',
currency: '£28,000.00'
}]
}
Of course, there will be many such divs in my HTML, so I will end up with an array of these objects.
OK, here is what I have so far:
const fs = require('fs');
const cheerio = require('cheerio');
const getAllData = (fileName) => {
try {
return fs.readFileSync(fileName, 'utf8');
} catch(e) {
console.log('Error:', e.stack);
}
}
const data = getAllData('test.html');
const $ = cheerio.load(data);
const filterData = () => {
console.log($('div[class="user_container"]'));
}
filterData();
What it returns is this, which is not what I need (or does it have to be like this?):
namespace: 'http://www.w3.org/1999/xhtml',
attribs: [Object: null prototype] {
class: 'user_container'
},
'x-attribsNamespace': [Object: null prototype] {
class: undefined
},
'x-attribsPrefix': [Object: null prototype] {
class: undefined
},
children: [ [Node], [Node], [Node], [Node], [Node], [Node] ],
parent: Node {
type: 'tag',
name: 'section',
namespace: 'http://www.w3.org/1999/xhtml',
attribs: [Object: null prototype],
'x-attribsNamespace': [Object: null prototype],
'x-attribsPrefix': [Object: null prototype],
children: [Array],
parent: [Node],
prev: [Node],
next: [Node]
},
etc....
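I'm not sure, but I think I first have to get an array of the div blocks whose class is user_container, and then iterate over that array and create an object for each of them.
Can someone help me?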
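HTML is closely related to XML, so you should look at XML tools: let one of them parse the HTML, and then you can run XML queries against the result. That will let you extract the XML you need, which you can then convert to JSON.
A quick Google search returns the following XML tools for Node.js, but there are more:
https://www.npmjs.com/package/fast-xml-parser - says it can also export to JSON
http://www.curtismlarson.com/blog/2018/10/03/edit-xml-node-js/ - has a detailed walkthrough.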