This is how your program can "visit" or "connect to" a web page.
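
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class PageDownloader {
        public static void main(String[] args) {
            InputStream is = null;
            try {
                URL url = new URL("http://stackoverflow.com/");
                is = url.openStream();  // throws an IOException
                // BufferedReader replaces the deprecated DataInputStream.readLine()
                BufferedReader br = new BufferedReader(new InputStreamReader(is));
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            } catch (MalformedURLException mue) {
                mue.printStackTrace();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            } finally {
                try {
                    // is is still null if openStream() failed
                    if (is != null) is.close();
                } catch (IOException ioe) {
                    // nothing to see here
                }
            }
        }
    }

This downloads the source of the HTML page.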
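
If you are on Java 7 or later, the same download can be written a little more compactly with try-with-resources, which closes the stream for you. The following is only a sketch: the class name `PageReader` and the helper `readAll` are made up for illustration, and UTF-8 is an assumption about the page encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PageReader {
    // Hypothetical helper: reads every line from the stream and returns the
    // text with a single '\n' appended after each line.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) automatically
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Same URL as in the example above; UTF-8 is assumed, not detected
        String html = readAll(new URL("http://stackoverflow.com/").openStream());
        System.out.println(html);
    }
}
```

Keeping the reading logic in a method that takes an `InputStream` means it can be tried out on an in-memory stream without touching the network.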
For HTML parsing, see this.
Also take a look at jSpider and jsoup.



