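Contents (Java): creating files and directories (core code reminder); PDF to image; image to PDF; Word to PDF; Word to HTML; HTML to Word; writing HTML content to an .html file; reading, modifying, renaming, and rewriting an .html file; HTML to PDF (including converting HTML that contains images to PDF with iText); PDF to HTML; [Converting XML to Excel in Java](https://www.e-iceblue.cn/spirexlsjavaconversion/convert-xml-to-excel-using-java.html); Excel to XML; Excel to PDF; Excel to image; Excel to Word; Word to image.
See also: exporting files as a zip archive in Java; handling Excel import and export (multiple sheets, style settings, and so on).
Online preview: converting Word to HTML for online preview.
Side note: a separate post covers what to do if you run into NoClassDefFoundError.
Creating files and directories: core code reminder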
// Output path for the JPG file
File file = new File(outputFilePath + (i + 1) + ".jpg");
if (!file.getParentFile().exists()) {
    // If the parent directory does not exist, create it and then the file
    file.getParentFile().mkdirs();
    file.createNewFile();
}
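The same check can be written with java.nio.file, which creates all missing parent directories in one call and does nothing when they already exist. This is a minimal sketch rather than the author's code: the class `OutputFiles` and the method `prepare` are hypothetical names, and the `<n>.jpg` naming only mirrors the snippet above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class OutputFiles {
    // Hypothetical helper: returns the path "<dir>/<pageIndex + 1>.jpg"
    // after making sure its parent directory exists.
    static Path prepare(String dir, int pageIndex) throws IOException {
        Path target = Paths.get(dir, (pageIndex + 1) + ".jpg");
        Path parent = target.getParent();
        if (parent != null) {
            // Creates every missing directory; no error if they already exist
            Files.createDirectories(parent);
        }
        return target;
    }
}
```

The file itself does not need to be created up front: opening an output stream on the returned path creates it.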
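PDF to image
Method 1: converting PDF pages to images with PDFBox
1. Maven dependencies 2. Conversion code
(The generated images are sharp; throughput is roughly 3 pages per second.)
Maven dependencies (groupId : artifactId : version):
org.apache.pdfbox : fontbox : 2.0.1
org.apache.pdfbox : pdfbox : 2.0.1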
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
public class PdfToImgOne {
    static void pdfToImageFile(String inputFilePath, String outputFilePath) throws Exception {
        long currentTimeMillisStart = System.currentTimeMillis();
        // Open the PDF and parse it; try-with-resources closes the stream and the document
        try (InputStream stream = new FileInputStream(inputFilePath);
             PDDocument doc = PDDocument.load(stream)) {
            PDFRenderer pdfRenderer = new PDFRenderer(doc);
            int pageCount = doc.getNumberOfPages();
            for (int i = 0; i < pageCount; i++) {
                // Render page i at 200 DPI
                BufferedImage bim = pdfRenderer.renderImageWithDPI(i, 200);
                // Output path for the JPG file
                File file = new File(outputFilePath + (i + 1) + ".jpg");
                File parent = file.getParentFile();
                if (parent != null && !parent.exists()) {
                    // Create the missing parent directories
                    parent.mkdirs();
                }
                // Write the page image straight to the file
                ImageIO.write(bim, "jpg", file);
            }
        }
        long currentTimeMillisEnd = System.currentTimeMillis();
        // For the author's 18-page test PDF this printed 6 (seconds)
        System.out.println("PDF conversion took (s): "
                + (currentTimeMillisEnd - currentTimeMillisStart) / 1000);
    }
}
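Method 2: converting a PDF to images page by page with ICEpdf
1. Maven dependencies 2. Conversion code
(The generated images are less sharp and throughput is roughly 2 pages per second, but this approach can rotate the generated images, and images rendered at a higher zoom stay sharp.)
Maven dependencies (groupId : artifactId : version):
org.icepdf.os : icepdf-core : 6.1.2
javax.media : jai-core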
import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
public class PdfToImgTwo {
    public static void tranfer(String filepath, String imagepath, float rotation, float zoom)
            throws PDFException, PDFSecurityException, IOException {
        long currentTimeMillisStart = System.currentTimeMillis();
        Document document = new Document();
        document.setFile(filepath);
        try {
            int maxPages = document.getPageTree().getNumberOfPages();
            for (int i = 0; i < maxPages; i++) {
                // Render page i with the requested rotation and zoom
                BufferedImage img = (BufferedImage) document.getPageImage(i,
                        GraphicsRenderingHints.SCREEN, Page.BOUNDARY_CROPBOX, rotation, zoom);
                // Write the page as <n>.jpg; ImageIO opens and closes the file itself
                ImageIO.write(img, "jpg", new File(imagepath + (i + 1) + ".jpg"));
                img.flush();
            }
        } finally {
            // Release the resources held by the ICEpdf document
            document.dispose();
        }
        long currentTimeMillisEnd = System.currentTimeMillis();
        // For the author's 18-page test PDF this printed 11 (seconds)
        System.out.println("PDF conversion took (s): "
                + (currentTimeMillisEnd - currentTimeMillisStart) / 1000);
    }
    public static void main(String[] args) throws PDFException, PDFSecurityException, IOException {
        // Backslashes in Windows paths must be escaped inside Java string literals
        tranfer("E:\\pdf\\1.pdf", "E:\\pdf\\img\\", 0.0f, 2);
    }
}
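Image to PDF
This part requires a few jar dependencies; the imports below come from iText 7 (its io, kernel, and layout packages).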
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.color.DeviceGray;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.xobject.PdfFormXObject;
import com.itextpdf.layout.Canvas;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Image;
import com.itextpdf.layout.property.TextAlignment;
import java.io.File;
import java.io.FileNotFoundException;
import java.net.MalformedURLException;
public class ImgToPdfOne {
    public static void imgToPdf(String imgFilePath, String outputFilePath) {
        try {
            long currentTimeMillisStart = System.currentTimeMillis();
            File createPdf = new File(outputFilePath);
            File parent = createPdf.getParentFile();
            if (parent != null && !parent.exists()) {
                // Create the missing parent directories; PdfWriter creates the file itself
                parent.mkdirs();
            }
            PdfDocument pdfDoc = new PdfDocument(new PdfWriter(outputFilePath));
            Document doc = new Document(pdfDoc);
            Image image = new Image(ImageDataFactory.create(imgFilePath));
            image.scaleToFit(400, 700);
            PdfFormXObject template = new PdfFormXObject(
                    new Rectangle(image.getImageScaledWidth(), image.getImageScaledHeight()));
            Canvas canvas = new Canvas(template, pdfDoc).add(image);
            // String watermark = "Welcome to yiibai.com"; // this text is printed on the image; Chinese is not supported and the length is limited
            String watermark = "";
            // The text colour and position below can be adjusted
            canvas.setFontColor(DeviceGray.RED).showTextAligned(watermark, 100, 160, TextAlignment.CENTER);
            // Add the template to the document
            Image image1 = new Image(template);
            doc.add(image1);
            doc.close();
            long currentTimeMillisEnd = System.currentTimeMillis();
            // In the author's test this printed 0 (seconds)
            System.out.println("Image to PDF conversion took (s): "
                    + (currentTimeMillisEnd - currentTimeMillisStart) / 1000);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        // Backslashes in Windows paths must be escaped inside Java string literals
        imgToPdf("E:\\pdf\\test\\1.jpg", "E:\\pdf\\test\\1.pdf");
    }
}
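The next image-to-PDF approach is more capable: it can write all the images in a folder into a single PDF, and you can set the size and position of each image. Its one apparent drawback is that the output always seems to end with an extra blank page.
It requires one jar: the iText jar that provides the com.lowagie.text packages.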
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Image;
import com.lowagie.text.pdf.PdfWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
public class ImgToPdfTwo {
    public void imgToPdfTwo() {
        // Create a document object
        Document doc = new Document();
        try {
            // Output location of the PDF; the target directory must already exist
            PdfWriter.getInstance(doc, new FileOutputStream("E:\\pdf\\test\\1.pdf"));
            // Open the document
            doc.open();
            // Set a font in order to support Chinese text
            // BaseFont bfChinese = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
            // Font FontChinese = new Font(bfChinese, 12, Font.NORMAL);
            // Folder that contains the images to add
            String path = "E:\\pdf\\test\\";
            // List the files in that folder and pick the ones to process
            File[] files = new File(path).listFiles();
            // For a single image, hard-code its path instead:
            // Image jpg1 = Image.getInstance("E:\\pdf\\test\\1.jpg");
            for (int i = 0; files != null && i < files.length; i++) {
                File file1 = files[i];
                // Decide by file extension whether this is an image
                if (file1.getName().toLowerCase().endsWith(".jpg")) {
                    System.out.println("---" + file1.getName());
                    // Load the image from its original path
                    Image jpg1 = Image.getInstance(file1.getAbsolutePath());
                    // Read the image height and width
                    float heigth = jpg1.height();
                    float width = jpg1.width();
                    System.out.println("Original image height " + i + "----" + heigth);
                    System.out.println("Original image width " + i + "-----" + width);
                    // Option 1: scale by height when h > w, otherwise by width
                    // int percent = getPercent(heigth, width); // for small images this roughly places the original image into the PDF
                    // Option 2: always scale by width
                    int percent = getPercent2(heigth, width);
                    // Centre the image
                    jpg1.setAlignment(Image.MIDDLE);
                    // Option 3: set the image size directly to fixed dimensions
                    // jpg1.scaleAbsolute(210.0f, 297.0f);
                    // Scale the image to the given percentage of its original size
                    jpg1.scalePercent(percent);
                    // TODO: width and height can also be scaled by different percentages
                    // jpg1.scalePercent(50, 100);
                    doc.add(jpg1);
                }
            }
            // Close the document and release its resources
            doc.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
public int getPercent(float h, float w) {
int p = 0;
float p2 = 0.0f;
if (h > w) {
p2 = 297 / h * 100;
} else {
p2 = 210 / w * 100;
}
p = Math.round(p2);
return p;
}
public int getPercent2(float h, float w) {
int p = 0;
float p2 = 0.0f;
p2 = 530 / w * 100;
p = Math.round(p2);
return p;
}
public static void main(String[] args) {
ImgToPdfTwo pt = new ImgToPdfTwo();
pt.imgToPdfTwo();
}
}
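The two percentage helpers above each scale by a single dimension. As a rough sketch of an alternative, the helper below picks the largest percentage at which an image fits inside a given box while keeping its aspect ratio, and it never enlarges a small image. `ImageScale`, `fitPercent`, and the 530 x 770 box in the usage note are assumptions made for illustration; they are not part of iText.

```java
class ImageScale {
    // Hypothetical helper: percentage (0-100) by which an image of size
    // imgW x imgH must be scaled to fit inside a maxW x maxH box.
    static int fitPercent(float imgW, float imgH, float maxW, float maxH) {
        // The tighter of the two ratios decides, so both dimensions fit
        float ratio = Math.min(maxW / imgW, maxH / imgH);
        // Cap at 1.0 so images smaller than the box are not enlarged
        return Math.round(Math.min(ratio, 1.0f) * 100);
    }
}
```

For example, `ImageScale.fitPercent(1060, 500, 530, 770)` gives 50, a value that could then be passed to `scalePercent`.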
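Word to PDF
Method 1. Limitations: .doc to PDF currently supports at most 21 pages, and the output appears to lose its last page, so check for that yourself (note that removeWatermark below explicitly removes the final page).
Remote repository for the project
aspose-words can only be downloaded after a separate repository address is configured. If you do not know how to configure it, download the jar from the official site and add it to the project directly.
Repository: id AsposeJavaAPI, name Aspose Java API, url https://repository.aspose.com/repo/
Maven dependencies (groupId : artifactId : version):
org.apache.pdfbox : pdfbox : 3.0.0-RC1
com.github.jai-imageio : jai-imageio-jpeg2000 : 1.3.0
com.aspose : aspose-words : 21.9 (pom)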
import com.aspose.words.Document;
import com.aspose.words.SaveFormat;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
// Converts a .doc file to PDF (currently at most 21 pages)
public static void doc2pdf(String wordPath, String outPutPdfPath, String newName) {
long old = System.currentTimeMillis();
try {
// Path of the new PDF file
String pdfPath = outPutPdfPath + newName + ".pdf";
File file = new File(pdfPath);
FileOutputStream os = new FileOutputStream(file);
// wordPath is the Word document to convert
Document doc = new Document(wordPath);
// Aspose.Words converts between DOC, DOCX, OOXML, RTF, HTML, OpenDocument, PDF, EPUB, XPS, and SWF
doc.save(os, SaveFormat.PDF);
os.close();
// Remove the watermark
removeWatermark(new File(pdfPath));
// Elapsed time
long now = System.currentTimeMillis();
System.out.println("Word to PDF took " + ((now - old) / 1000.0) + " s");
} catch (Exception e) {
System.out.println("Word to PDF conversion failed...");
e.printStackTrace();
}
}
// Replaces text in the content stream of a PDF page
public static void replaceText(PDPage page, String searchString, String replacement) throws IOException {
PDFStreamParser parser = new PDFStreamParser(page);
List<Object> tokens = parser.parse();
for (int j = 0; j < tokens.size(); j++) {
Object next = tokens.get(j);
if (next instanceof Operator) {
Operator op = (Operator) next;
String pstring = "";
int prej = 0;
if (op.getName().equals("Tj")) {
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
string = string.replaceFirst(searchString, replacement);
previous.setValue(string.getBytes());
} else if (op.getName().equals("TJ")) {
COSArray previous = (COSArray) tokens.get(j - 1);
for (int k = 0; k < previous.size(); k++) {
Object arrElement = previous.getObject(k);
if (arrElement instanceof COSString) {
COSString cosString = (COSString) arrElement;
String string = cosString.getString();
if (j == prej) {
pstring += string;
} else {
prej = j;
pstring = string;
}
}
}
if (searchString.equals(pstring.trim())) {
COSString cosString2 = (COSString) previous.getObject(0);
cosString2.setValue(replacement.getBytes());
int total = previous.size() - 1;
for (int k = total; k > 0; k--) {
previous.remove(k);
}
}
}
}
}
List<PDStream> contents = new ArrayList<>();
Iterator<PDStream> streams = page.getContentStreams();
while (streams.hasNext()) {
PDStream updatedStream = streams.next();
OutputStream out = updatedStream.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens(tokens);
contents.add(updatedStream);
out.close();
}
page.setContents(contents);
}
// Removes an image watermark
public static void removeImage(PDPage page, String cosName) {
PDResources resources = page.getResources();
COSDictionary dict1 = resources.getCOSObject();
resources.getXObjectNames().forEach(e -> {
if (resources.isImageXObject(e)) {
COSDictionary dict2 = dict1.getCOSDictionary(COSName.XOBJECT);
if (e.getName().equals(cosName)) {
dict2.removeItem(e);
}
}
page.setResources(new PDResources(dict1));
});
}
// Removes text watermarks
public static boolean removeWatermark(File file) {
try {
// Load the document from the file
PDDocument document = Loader.loadPDF(file);
PDPageTree pages = document.getPages();
Iterator<PDPage> iter = pages.iterator();
while (iter.hasNext()) {
PDPage page = iter.next();
// Remove the text watermarks
replaceText(page, "evaluation Only. Created with Aspose.Words. Copyright 2003-2021 Aspose Pty Ltd.", "");
replaceText(page, "Created with an evaluation copy of Aspose.Words. To discover the full versions of our APIs please", "");
replaceText(page, "visit: https://products.aspose.com/words/", "");
// replaceText(page, "Created with an evaluation copy of Aspose.Words. To discover the full", "");
// replaceText(page, "versions of our APIs please visit: https://products.aspose.com/words/", "");
// replaceText(page, "This document was truncated here because it was created in the evaluation", "");
// Remove the image watermark
removeImage(page, "X1");
}
document.removePage(document.getNumberOfPages() - 1);
file.delete();
document.save(file);
document.close();
return true;
} catch (IOException ex) {
ex.printStackTrace();
return false;
}
}
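Method 2: tested and works well
This method needs one jar and two DLL files.
Adding the jar to the project is routine and is not covered here.
Put the DLL files in the bin directory of the JDK installation.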
import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import java.io.File;
public static void wordToPdfSecond(String wordFile, String pdfFile) {
    // Start time
    long start = System.currentTimeMillis();
    ActiveXComponent app = null;
    try {
        // Start Word
        app = new ActiveXComponent("Word.Application");
        // Many blog posts set Visible to false here. That is unnecessary: Word is invisible by default,
        // and making it visible would only open a Word window, which is pointless for a PDF conversion.
        // app.setProperty("Visible", false);
        // Get the collection of documents open in Word
        Dispatch documents = app.getProperty("Documents").toDispatch();
        System.out.println("Opening file: " + wordFile);
        // Open the document
        Dispatch document = Dispatch.call(documents, "Open", wordFile, false, true).toDispatch();
        // Saving raises an error instead of overwriting an existing file, so delete the target first
        File target = new File(pdfFile);
        if (target.exists()) {
            target.delete();
        }
        System.out.println("Saving as: " + pdfFile);
        // Save the document as PDF; 17 is the value of Word's PDF save-format constant
        Dispatch.call(document, "SaveAs", pdfFile, 17);
        // Close the document
        Dispatch.call(document, "Close", false);
        // End time
        long end = System.currentTimeMillis();
        System.out.println("Conversion succeeded in " + (end - start) / 1000.0 + " s");
    } catch (Exception e) {
        System.out.println("Conversion failed: " + e.getMessage());
    } finally {
        // Quit Word, but only if it was actually started
        if (app != null) {
            app.invoke("Quit", 0);
        }
    }
}
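I have not tried the following myself; take a look if you are interested:
Java Word to PDF: implementations for various versions (tested and working)
Java: converting Word to PDF/HTML/image (based on Spire.Cloud.SDK for Java)
Word to HTML
Java online preview (Word to HTML), strongly recommended
I tried the post above and found two major drawbacks. First, it supports only .doc and does not accept .docx, not even a .docx file renamed to .doc. Second, it writes the HTML file and its resources (images, text content, and so on) into the same directory, so the output cannot be moved freely; otherwise the HTML file opens with content missing.
The Word-to-HTML code here is split into two methods: one for .doc and one for .docx.
Maven dependencies (groupId : artifactId : version):
org.apache.poi : poi : 3.14
org.apache.poi : poi-scratchpad : 3.14
org.apache.poi : poi-ooxml : 3.14
fr.opensagres.xdocreport : xdocreport : 1.0.6
org.apache.poi : poi-ooxml-schemas : 3.14
org.apache.poi : ooxml-schemas : 1.3
org.apache.directory.studio : org.apache.commons.io : 2.4
The following supports converting .doc files to HTML only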
import org.apache.commons.io.FileUtils;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.w3c.dom.Document;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;
import java.util.List;
public class Word2007ToHtml {
    public static void wordToHtmlFile(String wordFilePath, String outHtmlFilePath, String htmlName) throws Throwable {
        System.out.println("Starting HTML conversion");
        long currentTimeMillisStart = System.currentTimeMillis();
        HWPFDocument wordDocument;
        try (InputStream input = new FileInputStream(wordFilePath)) {
            wordDocument = new HWPFDocument(input);
        }
        // Create the WordToHtmlConverter and prepare it for images and other resources
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            @Override
            public String savePicture(byte[] content, PictureType pictureType,
                                      String suggestedName, float widthInches, float heightInches) {
                // The HTML refers to each picture by its suggested file name
                return suggestedName;
            }
        });
        wordToHtmlConverter.processDocument(wordDocument);
        // Save the pictures into the same directory as the HTML file
        List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (Picture pic : pics) {
                try (OutputStream picOut = new FileOutputStream(outHtmlFilePath + pic.suggestFullFileName())) {
                    pic.writeImageContent(picOut);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(outStream);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        outStream.close();
        // Decode with the same charset the serializer was told to use
        String content = new String(outStream.toByteArray(), "utf-8");
        FileUtils.writeStringToFile(new File(outHtmlFilePath, htmlName + ".html"), content, "utf-8");
        long currentTimeMillisEnd = System.currentTimeMillis();
        System.out.println("Conversion took (s): " + (currentTimeMillisEnd - currentTimeMillisStart) / 1000.0);
    }
    public static void main(String[] args) throws Throwable {
        // Backslashes in Windows paths must be escaped inside Java string literals
        wordToHtmlFile("C:\\Users\\雷神\\Desktop\\设计文档模板\\JAVA开发工程师新111.doc", "E:\\pdf\\html\\22\\", "abc");
    }
}
Same requirements as above; the difference is that images are embedded as Base64.
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.w3c.dom.Document;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;
import java.util.Base64;
public static void wordToHTMLDoc(String docxFilePath, String outputFilePath, String outputFileName) throws IOException, ParserConfigurationException, TransformerException {
    // .doc to HTML
    // 1) Load the file into an HWPFDocument
    InputStream stream = new FileInputStream(docxFilePath);
    HWPFDocument wordDocument = new HWPFDocument(stream);
    stream.close();
    WordToHtmlConverter converter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    // 2) Convert the HWPFDocument, embedding each picture as a Base64 data URI
    converter.setPicturesManager(new PicturesManager() {
        @Override
        public String savePicture(byte[] content, PictureType pictureType, String suggestedName,
                                  float widthInches, float heightInches) {
            String type = pictureType.name();
            final Base64.Encoder encoder = Base64.getEncoder();
            return "data:image/" + type + ";base64," + encoder.encodeToString(content);
        }
    });
    converter.processDocument(wordDocument);
    Document htmlDocument = converter.getDocument();
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(outStream);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8"); // output encoding
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");     // indent with whitespace
    serializer.setOutputProperty(OutputKeys.METHOD, "html");    // output type
    serializer.transform(domSource, streamResult);
    outStream.close();
    // Decode with the same charset the serializer was told to use
    String templateContent = new String(outStream.toByteArray(), "utf-8");
    System.out.println(templateContent);
    // Write the HTML content to the output file
    FileOutputStream fileoutputstream = new FileOutputStream(outputFilePath + outputFileName + ".html");
    byte tag_bytes[] = templateContent.getBytes("utf-8");
    fileoutputstream.write(tag_bytes);
    fileoutputstream.close();
}
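The Base64 embedding step depends only on the JDK and can be tried in isolation. The sketch below builds a data URI from raw image bytes; `DataUri` and `of` are hypothetical names, and lower-casing the subtype is an assumption made here so that a type name written in upper case still yields a conventional MIME type.

```java
import java.util.Base64;
import java.util.Locale;

class DataUri {
    // Hypothetical helper: builds "data:image/<subtype>;base64,<payload>"
    static String of(String subtype, byte[] content) {
        String mimeSubtype = subtype.toLowerCase(Locale.ROOT);
        // The standard (not URL-safe) Base64 alphabet is the one data URIs use
        String payload = Base64.getEncoder().encodeToString(content);
        return "data:image/" + mimeSubtype + ";base64," + payload;
    }
}
```

The returned string can be used directly as the `src` attribute of an `img` element, so the HTML file needs no image files next to it.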
The following supports converting .docx to HTML only
import org.apache.poi.xwpf.converter.core.BasicURIResolver;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
public static void word2007ToHtml(String filepath, String targetFilePath, String targetFileName) throws Exception {
String imagePathStr = targetFilePath + "/image/";
OutputStreamWriter outputStreamWriter = null;
try {
XWPFDocument document = new XWPFDocument(new FileInputStream(filepath));
XHTMLOptions options = XHTMLOptions.create();
// Folder in which the extracted images are stored
options.setExtractor(new FileImageExtractor(new File(imagePathStr)));
// Image path used inside the HTML
options.URIResolver(new BasicURIResolver("image"));
outputStreamWriter = new OutputStreamWriter(new FileOutputStream(targetFilePath+targetFileName+".html"), "utf-8");
XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
xhtmlConverter.convert(document, outputStreamWriter, options);
} finally {
if (outputStreamWriter != null) {
outputStreamWriter.close();
}
}
}
This version embeds images as Base64, so nothing has to be written to local disk as in the version above. It is easier to use and is the recommended one.
On top of the pom above, add one more dependency:
org.jsoup : jsoup : 1.7.3
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.*;
import java.util.Base64;
import java.util.List;
public static void wordToHTML(String docxFilePath, String outputFilePath, String outputFileName) throws IOException {
    // .docx to HTML
    // 1) Load the file into an XWPFDocument
    InputStream stream = new FileInputStream(docxFilePath);
    XWPFDocument document = new XWPFDocument(stream);
    List<XWPFPictureData> list = document.getAllPictures();
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    XHTMLConverter.getInstance().convert(document, outputStream, null);
    String s = new String(outputStream.toByteArray(), "utf-8");
    // 2) Parse the generated HTML and replace each image src with a Base64 data URI
    org.jsoup.nodes.Document doc = Jsoup.parse(s);
    Elements elements = doc.getElementsByTag("img");
    if (elements != null && elements.size() > 0 && list != null) {
        for (Element element : elements) {
            String src = element.attr("src");
            for (XWPFPictureData data : list) {
                if (src.contains(data.getFileName())) {
                    String type = src.substring(src.lastIndexOf(".") + 1);
                    final Base64.Encoder encoder = Base64.getEncoder();
                    String dataUri = "data:image/" + type + ";base64," + encoder.encodeToString(data.getData());
                    element.attr("src", dataUri);
                    break;
                }
            }
        }
    }
    document.close();
    stream.close();
    String templateContent = doc.toString();
    // Write the HTML content to the output file
    FileOutputStream fileoutputstream = new FileOutputStream(outputFilePath + outputFileName + ".html");
    byte tag_bytes[] = templateContent.getBytes("utf-8");
    fileoutputstream.write(tag_bytes);
    fileoutputstream.close();
}
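The following also supports .docx only, but the styling of the result is not very friendly (it does not wrap lines sensibly), so it is not recommended.
It also needs one more jar.
import com.aspose.words.Document;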
import com.aspose.words.License;
import com.aspose.words.SaveFormat;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
public static void word2HTML(String docPath, String savePath) {
try {
String s = "Aspose.Total for Java Aspose.Words for Java Enterprise 20991231 20991231 8bfe198c-7f0c-4ef8-8ff0-acc3237bf0d7 sNLLKGMUdF0r8O1kKilWAGdgfs2BvJb/2Xp8p5iuDVfZXmhppo+d0Ran1P9TKdjV4ABwAgKXxJ3jcQTqE/2IRfqwnPf8itN8aFZlV3TJPYeD3yWE7IT55Gz6EijUpC7aKeoohTb4w2fpox58wWoF3SNp6sK6jDfiAUGEHYJ9pjU= ";
ByteArrayInputStream is = new ByteArrayInputStream(s.getBytes());
License license = new License();
license.setLicense(is);
Document document = new Document(docPath);
document.save(new FileOutputStream(new File(savePath)), SaveFormat.HTML);
} catch (Exception e) {
e.printStackTrace();
}
}
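The two sites below seem to convert better, but I have not tried them: they require registering for a key, cost 800 per month, and may fail once the call quota is exceeded.
How to convert DOCX to HTML in Java (site 1)
How to convert DOCX to HTML in Java (site 2)
HTML to Word
Note: tested and works well. It writes .doc by default; you can change this to .docx yourself.
Download of the related jars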
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class Test {
public boolean writeWordFile(String filepath,String outPutWordPath,String newWordName) throws Exception {
boolean flag = false;
ByteArrayInputStream bais = null;
FileOutputStream fos = null;
String path = outPutWordPath;
try {
if (!"".equals(path)) {
File fileDir = new File(path);
if (fileDir.exists()) {
String content = readFile(filepath);
// content=new String(content.getBytes(), "UTF-8");
//System.out.println("content====="+content);
byte b[] = content.getBytes();
bais = new ByteArrayInputStream(b);
POIFSFileSystem poifs = new POIFSFileSystem();
DirectoryEntry directory = poifs.getRoot();
// Word expects the main stream of the OLE2 container to be named "WordDocument"
DocumentEntry documentEntry = directory.createDocument("WordDocument", bais);
//fos = new FileOutputStream(path + newWordName + ".doc");
fos = new FileOutputStream(path + newWordName + ".docx");
poifs.writeFilesystem(fos);
// Report success; both streams are closed in the finally block
flag = true;
}
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if(fos != null) fos.close();
if(bais != null) bais.close();
}
return flag;
}
public String readFile(String filename) throws Exception {
StringBuffer buffer = new StringBuffer("");
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader(filename));
buffer = new StringBuffer();
while (br.ready())
buffer.append((char) br.read());
} catch (Exception e) {
e.printStackTrace();
} finally {
if(br!=null) br.close();
}
return buffer.toString();
}
public static void main(String[] args) throws Exception {
// Backslashes in Windows paths must be escaped inside Java string literals
new Test().writeWordFile("D:\\appinstall\\weixin\\file\\JAVA开发工程师新11.html",
"D:\\appinstall\\weixin\\file\\", "lxd");
}
}
Writing HTML content to an .html file
The core idea here is to assemble the HTML content dynamically.
Writing HTML content to an .html file (no images)
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
public static void createHtml(String templateContent, String htmlFile) {
    // Write the HTML content to the file; try-with-resources closes the stream even on failure
    try (FileOutputStream fileoutputstream = new FileOutputStream(htmlFile)) {
        byte tag_bytes[] = templateContent.getBytes("utf-8");
        fileoutputstream.write(tag_bytes);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
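Writing generated HTML with an explicit charset avoids garbled Chinese text when the platform default encoding is not UTF-8. The sketch below shows one way to do that with java.nio.file; `HtmlFiles` and `write` are hypothetical names, and creating the parent directory first is an addition made for convenience.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HtmlFiles {
    // Hypothetical helper: writes html to target as UTF-8,
    // creating missing parent directories first.
    static void write(String html, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Creates the file, or truncates and overwrites an existing one
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }
}
```

If the page declares its encoding, for example with `<meta charset="utf-8">`, browsers will decode the file the same way it was written.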
Writing HTML content to an .html file (with images)
To pick up images dynamically, the source template needs a placeholder for the image path, which is then replaced with the real path; there is nothing technically difficult about it. (You can substitute the value through a parameter while building the HTML content, or add an image-path parameter to the method above and replace the placeholder there.)
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
public static void MakeHtml(String filePath, String imagePath, String disrPath, String fileName) {
    try {
        String img = " ";
        // System.out.print("Input file path:\n" + filePath);
        String templateContent = "";
        // Read the template file
        FileInputStream fileinputstream = new FileInputStream(filePath);
        int lenght = fileinputstream.available();
        byte bytes[] = new byte[lenght];
        fileinputstream.read(bytes);
        fileinputstream.close();
        templateContent = new String(bytes);
        System.out.println(templateContent);
        // Replace the placeholder literally; replace, unlike replaceAll, does not interpret regex syntax
        templateContent = templateContent.replace("###title###", img);
        System.out.println("---------------after replacement--------------");
        System.out.println(templateContent);
        // Path under which the generated HTML file is saved
        String fileame = fileName + ".html";
        fileame = disrPath + File.separator + fileame;
        // Open the output stream and write the result
        FileOutputStream fileoutputstream = new FileOutputStream(fileame);
        System.out.println("Output file path:\n" + fileame);
        byte tag_bytes[] = templateContent.getBytes();
        fileoutputstream.write(tag_bytes);
        fileoutputstream.close();
    } catch (Exception e) {
        System.out.print(e.toString());
    }
}
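Placeholder substitution is safest when done literally. `String.replaceAll` treats the placeholder as a regular expression and gives `\` and `$` in the replacement a special meaning, which matters when the value is a Windows path. The sketch below uses `String.replace` instead; `TemplateFill` and `fill` are hypothetical names, and the `###title###` placeholder follows the template convention described above.

```java
class TemplateFill {
    // Hypothetical helper: replaces every occurrence of placeholder with value.
    // String.replace works on literal text, so backslashes and dollar signs
    // in value are copied through unchanged.
    static String fill(String template, String placeholder, String value) {
        return template.replace(placeholder, value);
    }
}
```

A call such as `TemplateFill.fill(template, "###title###", "E:\\pdf\\test\\1.jpg")` keeps every backslash of the path intact in the result.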
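HTML to PDF
This blog post is decent: it is built around an HtmlUtils class that exposes methods for the variable parts and then assembles the HTML dynamically, an approach worth learning from.
Chinese font setup: the SimSum-01 and Dengb ttf files in the ttf folder provide the regular and bold faces respectively; both are required.
TTF font download address:
Maven dependency (groupId : artifactId : version):
com.itextpdf : html2pdf : 2.0.2
Converting HTML that contains images to PDF with iText in Java
Maven dependencies (groupId : artifactId : version):
com.itextpdf : itextpdf : 5.4.2
org.xhtmlrenderer : core-renderer : R8
PDF techniques (4): converting HTML to a PDF file in Java
PDF to HTML, Converting XML to Excel in Java, Excel to XML, Excel to PDF, Excel to image, Excel to Word
Word to image
This also requires aspose-words-15.8.0-jdk16.jar:
Both speed and output quality are good, but the image of the first page carries a watermark. I tried to work around that by adding a blank page to the Word document, but found no way to do it, so it is unresolved for now.
Later I stopped using the jar above; with this jar the watermark can be removed.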
import com.aspose.words.Document;
import com.aspose.words.ImageSaveOptions;
import com.aspose.words.SaveFormat;
import java.io.File;
import java.io.FileOutputStream;
public static void doc2Img(String inPath, String outDir) {
    try {
        System.out.println(inPath + " -> " + outDir);
        long old = System.currentTimeMillis();
        // Load the Word document
        Document doc = new Document(inPath);
        // Aspose.Words also supports RTF, HTML, OpenDocument, PDF, EPUB, and XPS conversion
        // Save as JPEG so that the format matches the .jpg file names used below
        ImageSaveOptions options = new ImageSaveOptions(SaveFormat.JPEG);
        int pageCount = doc.getPageCount();
        for (int i = 0; i < pageCount; i++) {
            File file = new File(outDir + "/" + (i + 1) + ".jpg");
            // try-with-resources closes the stream even if saving fails
            try (FileOutputStream os = new FileOutputStream(file)) {
                // Render only page i into this image
                options.setPageIndex(i);
                doc.save(os, options);
            }
        }
        long now = System.currentTimeMillis();
        System.out.println("convert OK! " + ((now - old) / 1000.0) + " s");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
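Reading a PDF file in Java and returning its content as a String
Implementing PDF download and export in a Java back end
Calling iText from Java to convert HTML to a PDF document
HTML-to-PDF tool wkhtmltopdf: introduction, with Windows and Linux support
Converting an HTML page to PDF in Java (tested and working)
Java HTML-to-PDF example (repost)
Converting HTML text and images to a PDF file in Java
Converting HTML to PDF on the front end and in a Java back end
Converting HTML that contains images to PDF with iText in Java
iText HTML to PDF with images: iText 2.0.8 converting HTML to PDF with full CSS, images, and automatic pagination
PDF techniques (4): converting HTML to a PDF file in Java
Converting a web page directly to PDF in Java (styles preserved)
A solution for converting HTML (with images) to PDF in Java
Java: comparison of all methods for generating an image from HTML
Generating an image from HTML in Java; generating PDF from HTML in Java with support for CSS, images, and landscape printing
Generating PDF from HTML content in Java (no watermark)
Exporting Word documents from Java via XML
Parsing PDF to XML with JPedal
Java and XML (PDF)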
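Replacing specified content in Word
[Figures: the original document and the result after replacement]
I will not go into detail on this part; take the source code and experiment with it yourself.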



