
How to check whether text is transparent using PDFBox

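As it turns out, the "transparent" text is not actually transparent at all but merely covered by an image: in 201103 Key Smoking Statistics for SA 2010 FINAL.pdf, the text "Key Smoking Statistics for SA — 2004" has been covered by an image showing the TC logo.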

Below is a proof of concept of a text stripper class that ignores text covered by images.

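// Written against the PDFBox 1.x API (PDFOperator, PDXObjectImage and
// OperatorProcessor.process(PDFOperator, List<COSBase>) no longer exist in 2.x).
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.PDFOperator;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;
import org.apache.pdfbox.util.operator.OperatorProcessor;

public class VisibleTextStripper extends PDFTextStripper
{
    public VisibleTextStripper() throws IOException
    {
        super();
        // Replace the default "Do" (paint XObject) handler with one that
        // removes already collected text lying underneath an image.
        registerOperatorProcessor("Do", new Invoke());
    }

    //
    // Hiding operations
    //
    void hide(String name)
    {
        // An image fills the unit square mapped through the current CTM.
        Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
        float x = ctm.getXPosition();
        float y = ctm.getYPosition();
        float scaledWidth = ctm.getXScale();
        float scaledHeight = ctm.getYScale();

        for (List<TextPosition> characters : charactersByArticle)
        {
            Collection<TextPosition> toRemove = new ArrayList<TextPosition>();
            for (TextPosition character : characters)
            {
                Matrix matrix = character.getTextPos();
                float cx = matrix.getXPosition();
                float cy = matrix.getYPosition();
                float cw = character.getWidth();
                float ch = character.getHeight();
                // Axis-aligned test: horizontal extent uses the width,
                // vertical extent uses the height of the character.
                if (overlaps(x, scaledWidth, cx, cw) && overlaps(y, scaledHeight, cy, ch))
                {
                    System.out.printf("Hidden by '%s': X: %f; Y: %f; Width: %f; Height: %f; Char: '%s'%n",
                            name, cx, cy, cw, ch, character.getCharacter());
                    toRemove.add(character);
                }
            }
            characters.removeAll(toRemove);
        }
    }

    // Do the intervals [start1, start1 + width1] and [start2, start2 + width2]
    // intersect? Negative widths (e.g. mirrored images) are normalized first.
    private boolean overlaps(float start1, float width1, float start2, float width2)
    {
        if (width1 < 0)
        {
            start1 += width1;
            width1 = -width1;
        }
        if (width2 < 0)
        {
            start2 += width2;
            width2 = -width2;
        }
        if (start1 < start2)
        {
            return start1 + width1 >= start2;
        }
        else
        {
            return start2 + width2 >= start1;
        }
    }

    //
    // operator processors
    //
    public static class Invoke extends OperatorProcessor
    {
        private static final Log LOG = LogFactory.getLog(Invoke.class);

        public void process(PDFOperator operator, List<COSBase> arguments) throws IOException
        {
            VisibleTextStripper drawer = (VisibleTextStripper) context;
            COSName objectName = (COSName) arguments.get(0);
            Map<String, PDXObject> xobjects = drawer.getResources().getXObjects();
            PDXObject xobject = xobjects.get(objectName.getName());
            if (xobject == null)
            {
                LOG.warn("Can't find the XObject for '" + objectName.getName() + "'");
            }
            else if (xobject instanceof PDXObjectImage)
            {
                // An image is painted: drop the text it covers.
                drawer.hide(objectName.getName());
            }
            else if (xobject instanceof PDXObjectForm)
            {
                PDXObjectForm form = (PDXObjectForm) xobject;
                COSStream formContentstream = form.getCOSStream();
                // if there is an optional form matrix, we have to map the form space to the user space
                Matrix matrix = form.getMatrix();
                if (matrix != null)
                {
                    Matrix xobjectCTM = matrix.multiply(context.getGraphicsState().getCurrentTransformationMatrix());
                    context.getGraphicsState().setCurrentTransformationMatrix(xobjectCTM);
                }
                // find some optional resources, instead of using the current resources
                PDResources pdResources = form.getResources();
                // Recurse into the form so that images inside it are handled too.
                context.processSubStream(context.getCurrentPage(), pdResources, formContentstream);
            }
        }
    }
}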

It works for your sample document.

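The overlap test in hide(), i.e. the pair of overlaps(...) calls on the x and y extents, unfortunately assumes that no rotation is involved, neither of the text nor of the image (taking all transformations together).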

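For a generic solution, you have to replace this test with a check of whether the 1x1 unit square transformed by

Matrix ctm = getGraphicsState().getCurrentTransformationMatrix()

overlaps the character box at

Matrix matrix = character.getTextPos()

with width cw = character.getWidth() and height ch = character.getHeight(). A simple overlap may not be enough, though; you may want to require that the character box be sufficiently covered.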
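A minimal sketch of such a generic check, assuming Java's own java.awt.geom classes rather than anything in PDFBox: the image's unit square is pushed through the CTM (given as its six PDF components [a b c d e f]) and the resulting parallelogram is intersected with the character box. The names ImageCoverage, imageArea, charBox, overlaps and covers are hypothetical, and for simplicity the character box is an axis-aligned rectangle at (cx, cy) of size cw by ch; rotated text would need its box transformed in the same way.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox: a rotation-aware replacement for
// the axis-aligned overlap test, working on plain numbers.
final class ImageCoverage {
    private ImageCoverage() {}

    // The image occupies the unit square mapped through the CTM [a b c d e f],
    // i.e. (x, y) -> (a*x + c*y + e, b*x + d*y + f). AffineTransform expects
    // its arguments in the same order (m00, m10, m01, m11, m02, m12).
    static Area imageArea(double a, double b, double c, double d, double e, double f) {
        AffineTransform ctm = new AffineTransform(a, b, c, d, e, f);
        return new Area(ctm.createTransformedShape(new Rectangle2D.Double(0, 0, 1, 1)));
    }

    // Character box with negative width or height normalized.
    static Rectangle2D charBox(double x, double y, double w, double h) {
        return new Rectangle2D.Double(Math.min(x, x + w), Math.min(y, y + h),
                Math.abs(w), Math.abs(h));
    }

    // True if image and character box share some area. A degenerate box
    // (zero width or height) falls back to testing its centre point.
    static boolean overlaps(Area image, Rectangle2D box) {
        if (box.isEmpty()) {
            return image.contains(box.getCenterX(), box.getCenterY());
        }
        Area common = new Area(box);
        common.intersect(image);
        return !common.isEmpty();
    }

    // True if the image covers the whole character box.
    static boolean covers(Area image, Rectangle2D box) {
        if (box.isEmpty()) {
            return image.contains(box.getCenterX(), box.getCenterY());
        }
        Area uncovered = new Area(box);
        uncovered.subtract(image);
        return uncovered.isEmpty();
    }
}
```

Here covers() is the strictest form of "sufficiently covered" (the whole box must lie under the image); a fractional threshold would need the covered area, which java.awt.geom.Area does not compute directly.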

Furthermore, this test ignores image masks, i.e. transparency within the image.
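One hedged way to take transparency into account, assuming the image has already been decoded into a java.awt.image.BufferedImage with its soft mask or stencil mask applied as an alpha channel (obtaining that from the PDF is not shown): map a point of the character box, such as its centre, back into the image's unit square with the inverse CTM and read the alpha of the pixel found there. ImageOpacity, isOpaqueAt and the minAlpha threshold are hypothetical names, and sampling a single point is a simplification.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox: is the (already masked) image
// opaque at a given user-space point?
final class ImageOpacity {
    private ImageOpacity() {}

    // (a..f) is the CTM in force when the image was painted; minAlpha is the
    // alpha value (0-255) from which a pixel counts as hiding the text.
    static boolean isOpaqueAt(BufferedImage image,
                              double a, double b, double c, double d, double e, double f,
                              double userX, double userY, int minAlpha) {
        AffineTransform ctm = new AffineTransform(a, b, c, d, e, f);
        Point2D unit;
        try {
            // Map the user-space point back into the image's unit square.
            unit = ctm.inverseTransform(new Point2D.Double(userX, userY), null);
        } catch (NoninvertibleTransformException ex) {
            return false; // degenerate CTM: the image covers no area
        }
        double u = unit.getX();
        double v = unit.getY();
        if (u < 0 || u > 1 || v < 0 || v > 1) {
            return false; // the point lies outside the image
        }
        // PDF maps the first image row to the top of the unit square (v = 1),
        // so rows are counted downwards from there.
        int col = Math.min(image.getWidth() - 1, (int) (u * image.getWidth()));
        int row = Math.min(image.getHeight() - 1, (int) ((1 - v) * image.getHeight()));
        int alpha = (image.getRGB(col, row) >>> 24) & 0xFF;
        return alpha >= minAlpha;
    }
}
```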
