lxml.html examples of parsing from:
- URLs
- Files
- Strings
URLs
import lxml.html
htmltree = lxml.html.parse('http://joecodeswell.com')
htmltree.xpath("//title")[0].text
''' OUTPUT:
'JoeCodeswell.com'
'''
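A variation on the URL example: lxml.html.parse also accepts file-like objects, so the page can be fetched with the standard library's urllib.request and the response handed to the parser. This may be useful for https:// URLs, which, as far as I know, the parser's built-in URL loader does not handle. The helper name title_from_url is hypothetical and not part of lxml; treat this as a minimal sketch rather than the definitive way to do it.

```python
import urllib.request

import lxml.html


def title_from_url(url):
    """Fetch `url` with urllib and return the text of the first <title>.

    `title_from_url` is a hypothetical helper name, not part of lxml.
    """
    # urlopen returns a file-like response; lxml.html.parse accepts
    # file-like objects as well as filenames and URLs.
    with urllib.request.urlopen(url) as response:
        htmltree = lxml.html.parse(response)
    return htmltree.xpath("//title")[0].text
```

Calling title_from_url('http://joecodeswell.com') should give the same title as the example above, assuming the site is reachable.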
Files
N.B. Save ‘http://joecodeswell.com’ as a file named ‘JoeCodeswell.com.htm’.
Make sure to cd to the dir containing the file before running the following.
import lxml.html
htmltree = lxml.html.parse('JoeCodeswell.com.htm')
htmltree.xpath("//title")[0].text
''' OUTPUT:
'JoeCodeswell.com'
'''
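If changing directory first is inconvenient, one option is to pass the full path of the saved file instead of a bare filename. The sketch below wraps that in a small function; title_from_file is a hypothetical name chosen for illustration, and it uses findtext so that a page without a title gives None rather than an IndexError.

```python
from pathlib import Path

import lxml.html


def title_from_file(path):
    """Return the <title> text of the HTML file at `path`, or None.

    `title_from_file` is a hypothetical helper name, not part of lxml.
    """
    # Expand '~' and normalise the path. Pass an absolute path if the
    # result should not depend on the current working directory.
    html_path = Path(path).expanduser().resolve()
    htmltree = lxml.html.parse(str(html_path))
    # findtext returns None instead of raising when no <title> exists.
    return htmltree.findtext(".//title")
```

For example, title_from_file('~/Downloads/JoeCodeswell.com.htm') would work from any directory, assuming the file was saved there.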
Strings
N.B. Save ‘http://joecodeswell.com’ as a file named ‘JoeCodeswell.com.htm’.
Make sure to cd to the dir containing the file before running the following.
import lxml.html
# Read the whole file into a string; the with block closes the file.
with open('JoeCodeswell.com.htm', 'r') as f:
    the_string = f.read()
htmltree = lxml.html.fromstring(the_string)
htmltree.xpath("//title")[0].text
''' OUTPUT:
'JoeCodeswell.com'
'''
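One difference worth noting between the examples: lxml.html.parse returns a tree object, while lxml.html.fromstring returns the root element directly. Both support xpath, and getroot() on the tree gives the element. The sketch below uses a small inline document in place of the saved page (an assumption, so that it runs without a file or network access).

```python
import io

import lxml.html

# A small inline document standing in for the saved page (an assumption
# made so the example needs no file or network access).
the_string = (
    "<html><head><title>JoeCodeswell.com</title></head>"
    "<body><p>Hello</p></body></html>"
)

# fromstring returns the root element of the parsed document.
root = lxml.html.fromstring(the_string)
title = root.xpath("//title")[0].text

# parse returns a tree; getroot() gives the corresponding root element.
tree = lxml.html.parse(io.BytesIO(the_string.encode("utf-8")))
tree_title = tree.getroot().xpath("//title")[0].text

print(root.tag)     # html
print(title)        # JoeCodeswell.com
print(tree_title)   # JoeCodeswell.com
```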