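My company needed to mask sensitive information such as phone numbers, home addresses, and profanity in user reviews. We use two approaches: manual deletion and masking in the admin console, and automatic masking by an AI model running as a server-side batch job. Manual work involves more than masking: the sensitive content also has to be extracted from the text and labeled so that it can serve as training data for our in-house AI model. So I built manual-data-masking, a lightweight JavaScript library that makes manual masking easier and also produces both the labeled dataset and the text with the sensitive information masked out.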
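For example, take this user product review:
WTF, the phone I bought just yesterday has already stopped working. I want to return it right now. Call me to apologize immediately. Phone number: 0808080808.
With manual-data-masking, you can mask content by selecting text, like this: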
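The dataset of masked sensitive content (dataMasked) and the review with the sensitive information removed (textAfterDataMasking) are generated at the same time:
Data masked can be used to build training datasets for data-masking AI models (the entries below refer to the English sample text used in the usage examples further down):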
[
  {
    content: "Damn it",
    category: "Swear Word",
    start: 0, // start offset of the sensitive content in the full text
    end: 7, // end offset of the sensitive content (exclusive)
  },
  {
    content: "080808080",
    category: "Phone Number",
    start: 120,
    end: 129,
  },
];
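A quick sanity check before feeding such entries into a training pipeline is to confirm that each `[start, end)` range really selects `content` from the source text. The sketch below is only an illustration: `findMismatchedEntries` is a hypothetical helper, not part of manual-data-masking. It assumes the offsets are UTF-16 code-unit indices with an exclusive `end`, which is what `String.prototype.slice` uses and which is consistent with the sample data (the emoji counts as two units, putting the phone number at offset 120).

```javascript
// Hypothetical helper (not part of manual-data-masking): returns the entries
// whose [start, end) range does not select `content` from `text`.
function findMismatchedEntries(text, dataMasked) {
  return dataMasked.filter(
    ({ content, start, end }) => text.slice(start, end) !== content
  );
}

// Sample text taken from the usage examples in this post.
const text =
  "Damn it, The phone i just bought last week has been broken 😠, \n I need refund right now, Call me on this phone number: 080808080.";

const dataMasked = [
  { content: "Damn it", category: "Swear Word", start: 0, end: 7 },
  { content: "080808080", category: "Phone Number", start: 120, end: 129 },
];

// An empty array means every entry lines up with the text.
console.log(findMismatchedEntries(text, dataMasked));
```

If the offsets were counted in code points instead, the emoji would shift the second entry by one and it would show up in the returned array.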
Text after data masking:
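***, the phone I bought just yesterday has already stopped working. I want to return it right now. Call me to apologize immediately. Phone number: **********.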
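To make the relationship between the two outputs concrete, here is a minimal sketch of how masked text could be derived from the original text and a list of masked spans. It is not the library's implementation: `applyMasks` is a hypothetical function, and it assumes that spans do not overlap, that `end` is exclusive, and that each masked character is replaced by one replacement character.

```javascript
// Hypothetical sketch, not the library's own code.
// Replaces every [start, end) span with `replaceChar`, one per character.
// Assumes the spans do not overlap.
function applyMasks(text, dataMasked, replaceChar = "*") {
  // Walk the spans from left to right, copying the unmasked text between them.
  const spans = [...dataMasked].sort((a, b) => a.start - b.start);
  let result = "";
  let cursor = 0;
  for (const { start, end } of spans) {
    result += text.slice(cursor, start) + replaceChar.repeat(end - start);
    cursor = end;
  }
  // Append whatever follows the last masked span.
  return result + text.slice(cursor);
}

const masked = applyMasks("Call me: 0808, bye", [
  { content: "0808", category: "Phone Number", start: 9, end: 13 },
]);
console.log(masked); // prints "Call me: ****, bye"
```

Because the masked text keeps the same length as the original, the offsets in the dataset remain valid for both strings.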
npm install manual-data-masking
import { create as createManualDataMasking } from "manual-data-masking";

// Spans that are already masked when the component first renders.
const dataMasked = [
  {
    content: "Damn it",
    category: "Swear Word",
    start: 0,
    end: 7,
  },
];

// Categories offered when labeling a selection, each with its highlight color.
const categories = [
  {
    value: "Person Name",
    color: "#b6656c",
  },
  {
    value: "Swear Word",
    color: "#577eba",
  },
  {
    value: "Phone Number",
    color: "#3e6146",
  },
];

const text =
  "Damn it, The phone i just bought last week has been broken 😠, \n I need refund right now, Call me on this phone number: 080808080.";

const $manualDataMasking = createManualDataMasking({
  container: document.getElementById("demo"),
  text,
  dataMasked,
  categories,
  replaceCharactor: "*",
  dataMaskingCharactor: "X",
  maxHeight: 100,
});

// Receives the updated dataset and the masked text.
$manualDataMasking.on("afterDataMasking", (dataMasked, textAfterDataMasking) => {
  console.log(JSON.stringify(dataMasked));
  console.log(textAfterDataMasking);
});
<!-- Container element that the script below looks up by id. -->
<div id="demo"></div>
<script src="https://unpkg.com/manual-data-masking"></script>
<script>
  // Spans that are already masked when the component first renders.
  const dataMasked = [
    {
      content: "Damn it",
      category: "Swear Word",
      start: 0,
      end: 7,
    },
  ];
  // Categories offered when labeling a selection, each with its highlight color.
  const categories = [
    {
      value: "Person Name",
      color: "#b6656c",
    },
    {
      value: "Swear Word",
      color: "#577eba",
    },
    {
      value: "Phone Number",
      color: "#3e6146",
    },
  ];
  const text =
    "Damn it, The phone i just bought last week has been broken 😠, \n I need refund right now, Call me on this phone number: 080808080.";
  const $manualDataMasking = manualDataMasking.create({
    container: document.getElementById("demo"),
    text,
    dataMasked,
    categories,
    replaceCharactor: "*",
    dataMaskingCharactor: "X",
    maxHeight: 100,
  });
  // Same event name as in the import-based example above.
  $manualDataMasking.on(
    "afterDataMasking",
    (dataMasked, textAfterDataMasking) => {
      console.log(JSON.stringify(dataMasked));
      console.log(textAfterDataMasking);
    }
  );
</script>
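Instead of logging the callback arguments, you would normally persist them for training. As a rough sketch, each review and its labels could be serialized as one JSON Lines record. The record layout (`text` plus `[start, end, category]` triples) and the `toJsonLine` helper are assumptions made for illustration only; neither the library nor this post prescribes a storage format.

```javascript
// Hypothetical helper: turns one review and its masked spans into a single
// JSON Lines record. The { text, labels } layout is an assumed format.
function toJsonLine(sourceText, dataMasked) {
  const labels = dataMasked.map(({ start, end, category }) => [
    start,
    end,
    category,
  ]);
  return JSON.stringify({ text: sourceText, labels });
}

const line = toJsonLine("Call me: 0808", [
  { content: "0808", category: "Phone Number", start: 9, end: 13 },
]);
console.log(line);
// prints {"text":"Call me: 0808","labels":[[9,13,"Phone Number"]]}
```

Calling such a helper from inside the callback and appending the result to a file or sending it to a backend gives one training record per reviewed comment.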
For more details on the options and methods, see the documentation: https://github.com/HC200ok/manual-data-masking
There is also a Vue 2 version, see: https://github.com/HC200ok/vue2-text-annotation
If you find this library interesting or useful, please support me with a GitHub ⭐. Thanks!
1
reorx 2022-06-17 10:07:14 +08:00 via iPhone
Nice project, you have my support.
3
liushuigs 2022-06-27 17:27:13 +08:00
The [example](https://runjs.qingting.work/#/projects/409921fd79434e97) for the script version doesn't seem to run.