on Sat Jul 05 08:38:00 GMT 2008 in PHP
If you run a content-driven website, even a blog, one problem you might run into is getting content. A great free source of content is RSS feeds. Instead of manually adding each item in a feed to the database, you’ll set up a script that parses the feed and adds each item automatically.
The idea here is that you run a website. It’s a good size, but you need content, right? Well, instead of writing content constantly, you can take some from RSS feeds. It might not be the best source of original content, but at least it’s content. So what you’re going to learn here is how to parse the XML in an RSS feed and add it to your database for display later.
The structure of your actual database will be very basic. All you need is a table to hold a title, description, and link. You could possibly add a date, but at the moment, it’s not needed. So just run this MySQL create table command.
CREATE TABLE `stories` (
    `id` int(11) NOT NULL auto_increment,
    `title` varchar(255) default NULL,
    `description` text,
    `link` varchar(255) NOT NULL default '',
    PRIMARY KEY (`id`)
) ENGINE=MyISAM;
As I said, very basic.
This can get a bit complicated. PHP’s built-in XML parser is built on expat, which works in the style of SAX, or the Simple API for XML. The idea is that SAX will parse elements as they come to it. The alternative to SAX is the DOM, or Document Object Model. The DOM takes the whole XML tree into memory and parses it there. It can be a bit easier to understand, but it can also be a huge load on your memory. So stick with SAX for now.
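For comparison, here is a minimal sketch of the tree-based approach using PHP’s SimpleXML extension, which loads the whole document into memory before you touch any of it. The inline feed and the variable names are made up for illustration and are not part of the script built in this tutorial.

```php
<?php
// Hypothetical two-item feed, inlined so the example is self-contained.
$feed = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

// The entire tree is built in memory here, unlike the SAX approach.
$xml = simplexml_load_string($feed);

$titles = array();
foreach ($xml->channel->item as $item) {
    $titles[] = (string) $item->title;
}

print_r($titles);
```

It is shorter than the SAX version, but for a very large feed the whole tree has to fit in memory at once.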
Here are the basics for the XML Parser.
$counter = 0;
$type = 0;
$tag = "";
$itemInfo = array();
$channelInfo = array();
$xmlParser = xml_parser_create();
xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, TRUE);
xml_parser_set_option($xmlParser, XML_OPTION_SKIP_WHITE, TRUE);
xml_set_element_handler($xmlParser, "opening_element", "closing_element");
xml_set_character_data_handler($xmlParser, "c_data");
The first five variables will come in handy later.
The idea is to create a parser in $xmlParser, then set a couple of options. XML_OPTION_CASE_FOLDING makes the parser convert every element name to uppercase before handing it to your functions, so <item>, <ITEM>, and <iTeM> all show up as ITEM. That’s contrary to the specification, which says that XML names are case-sensitive. But that’s okay. Then XML_OPTION_SKIP_WHITE just means that it’ll skip values made up entirely of whitespace.
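To see case folding in action, here is a small standalone sketch. The document being parsed and the $seen array are made up for the demo; the parser calls are the same ones used in this tutorial.

```php
<?php
// Collect element names exactly as the parser reports them.
$seen = array();

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$seen) {
        $seen[] = $name;
    },
    function ($parser, $name) {
        // Closing tags are ignored in this demo.
    }
);

// Mixed-case tags in a tiny, made-up document.
xml_parse($parser, '<rss><iTeM></iTeM><item></item></rss>', true);

print_r($seen);
```

Every name arrives in uppercase, which is why the handlers compare against "CHANNEL" and "ITEM".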
The next function is xml_set_element_handler(). What it does is set the two functions that handle elements in the document: one for the opening element and one for the closing element, which are given as the second and third arguments. "opening_element" corresponds to a function you’ll define in a second, and the same goes for "closing_element".
The last deal is xml_set_character_data_handler(). It’s along the same lines as the previous function. It assigns a function to handle the actual data in the document, and that function will be “c_data”, which you’ll get to right now.
Well, now you need to start with the functions to parse the xml document. Yes, unfortunately, while SAX does take a load off the memory, the configuration can take a bit. But it goes pretty fast.
function opening_element($xmlParser, $name, $attribute){
    global $tag, $type;
    $tag = $name;
    if($name == "CHANNEL"){
        $type = 1;
    }
    else if($name == "ITEM"){
        $type = 2;
    }
}//end opening element
The function takes three arguments: the parser, which is a given, the name of the element, and any attributes. First you declare $tag and $type as globals so you can access them inside and outside of this function. Then you set $tag equal to $name, just so the rest of the script knows which element it’s in. The next part is where you start tracking where you are in the feed. If the $name of the element is “CHANNEL” (remember to use all uppercase because of XML_OPTION_CASE_FOLDING), you’re still at the beginning of the feed, so the type will be 1. If the $name is “ITEM”, well, you’re inside an actual story in the feed, so the type will be 2.
function c_data($xmlParser, $data){
    global $tag, $type, $channelInfo, $itemInfo, $counter;
    $data = trim(htmlspecialchars($data));
    if($tag == "TITLE" || $tag == "DESCRIPTION" || $tag == "LINK"){
        if($type == 1){
            $channelInfo[strtolower($tag)] .= $data;
        }//end checking channel
        else if($type == 2){
            $itemInfo[$counter][strtolower($tag)] .= $data;
        }//end checking for item
    }//end checking tag
}//end cdata funct
Here is where you really get into the interesting parts of the feed, because this is where you’ll actually get the data from the document.
The function takes two arguments, the parser (again) and $data, which is the actual data (a bit obvious, but still need to sound like I’m explaining something).
Next you need to declare global the $tag (which is the name of the element you’re in right now, remember?), $type (which is either 1 for channel, or 2 for item), and then $channelInfo, $itemInfo, and $counter, which you’ll get to in a second.
The first thing to do is to take the data and escape any characters that could hurt the script with htmlspecialchars() and then trim any extra whitespace with trim().
Next the function checks to see if the element is the title, the description, or the link element. If so, it checks whether $type equals 1 (meaning the title/description/link belongs to the overall feed) and adds the information to the $channelInfo array. strtolower() just converts a string to lowercase, so you don’t have to remember to type the tag name in uppercase to get the data. And if the type is 2 (meaning it’s inside an item element), then the function adds the information to the $itemInfo array. What $counter does is keep track of each item; otherwise the data would be appended to the item before it, and there would be a lot of confusion. So the counter separates items. After the $counter comes the tag’s lowercase name as a second array key, and the data is appended to that.
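The reason the data is appended with .= rather than assigned with = is that the parser may hand the text of a single element to the handler in several pieces, for instance around an entity like &amp;. Here is a small standalone sketch of that; the title text and the $chunks array are made up for the demo.

```php
<?php
$chunks = array();

$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$chunks) {
        // Record every call; one element may trigger several.
        $chunks[] = $data;
    }
);

xml_parse($parser, '<title>Tom &amp; Jerry</title>', true);

// How many chunks arrive depends on the underlying XML library,
// but gluing them together recovers the full text.
$text = implode('', $chunks);
echo count($chunks) . " chunk(s): " . $text . "\n";
```

One thing to watch for: if the text does arrive as separate pieces such as "Tom ", "&" and " Jerry", trimming each piece on its own would glue the words together, so trimming once after the element closes is the safer order.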
function closing_element($xmlParser, $name){
    global $tag, $type, $counter;
    $tag = "";
    if($name == "ITEM"){
        $type = 0;
        $counter++;
    }
    else if($name == "CHANNEL"){
        $type = 0;
    }
}//end closing_element
Finally at the home stretch with the functions.
The closing_element function takes just two arguments: the parser, and the name of the element. And of course you need to declare your globals to get at them.
The first thing it does is clear the $tag variable to start again with another element. If the name of this closing element is ITEM, then make the type equal to 0, and increase the counter (for another item). If the name is actually CHANNEL, well make the type equal to 0 still.
You don’t want to move the $type = 0 outside the if block. If the closing element is actually the end of the title or something, and you make the type 0, well, the item may still have the description and link to parse, and the script will get confused, thinking it’s outside the channel. (The parser won’t actually think it’s outside the channel, but for the sake of your functions, it might as well be.)
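If the globals bother you, the same state machine can be sketched as a class that carries its own state. This is only one possible arrangement, not the script this tutorial builds: the FeedCollector class and collect_items() function are hypothetical names, and this version only collects items, not the channel information.

```php
<?php
// Hypothetical class name; it mirrors the tutorial's three handlers.
class FeedCollector
{
    public $items = array();
    private $tag = '';
    private $inItem = false;
    private $current = array();

    public function open($parser, $name, $attributes)
    {
        $this->tag = $name;
        if ($name === 'ITEM') {
            $this->inItem = true;
            $this->current = array('title' => '', 'description' => '', 'link' => '');
        }
    }

    public function text($parser, $data)
    {
        $key = strtolower($this->tag);
        // Only title, description and link exist as keys, so other tags are skipped.
        if ($this->inItem && isset($this->current[$key])) {
            $this->current[$key] .= $data;
        }
    }

    public function close($parser, $name)
    {
        $this->tag = '';
        if ($name === 'ITEM') {
            // Trim once per field, after all chunks have arrived.
            $this->items[] = array_map('trim', $this->current);
            $this->inItem = false;
        }
    }
}

function collect_items($xml)
{
    $collector = new FeedCollector();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);
    xml_set_element_handler($parser, array($collector, 'open'), array($collector, 'close'));
    xml_set_character_data_handler($parser, array($collector, 'text'));
    xml_parse($parser, $xml, true);
    return $collector->items;
}

$items = collect_items(
    '<rss><channel><title>Feed</title>'
    . '<item><title> One </title><link>http://example.com/1</link></item>'
    . '<item><title>Two</title><description>Second</description></item>'
    . '</channel></rss>'
);
print_r($items);
```

Starting a fresh array for every item also means a missing description or link simply stays an empty string.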
Well, that wasn’t too brutal. The functions are mostly pretty small. But it can take a couple reads to understand it all. But now you need to actually parse the feed.
$fp = file($_GET['rss']);
foreach($fp as $line){
    if(!xml_parse($xmlParser, $line)){
        die("Could not parse file.");
    }
}
This isn’t too difficult. $fp holds the feed as an array of lines, courtesy of the file() function, which fetches whatever address you pass in the rss GET variable. In other words, you tell the parser which feed to parse by pointing your browser to http://yourwebsite.com/add.php?rss=http://rssfeed.location
After that, it takes each line of the feed and tries to parse it. If it can’t parse the line, the parser will tell you so.
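"Could not parse file." doesn’t say much about what went wrong. As a rough sketch, the parser can be asked for the error and the line it happened on. The parse_or_explain() helper is a hypothetical name, and for simplicity it parses a whole string at once (with the third argument to xml_parse() set to true to mark the final chunk) instead of going line by line.

```php
<?php
// Hypothetical helper: parse a whole XML string and describe any failure.
function parse_or_explain($xml)
{
    $parser = xml_parser_create();
    // Passing true as the third argument marks this as the final chunk.
    if (xml_parse($parser, $xml, true)) {
        return null; // no error
    }
    return sprintf(
        'XML error on line %d: %s',
        xml_get_current_line_number($parser),
        xml_error_string(xml_get_error_code($parser))
    );
}

$ok = parse_or_explain('<rss><channel></channel></rss>');
$bad = parse_or_explain("<rss>\n<channel>\n</rss>");

var_dump($ok);
echo $bad . "\n";
```

The exact wording of the error message depends on the XML library PHP was built with.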
$counter = 0;
$type = 0;
$tag = "";
$itemInfo = array();
$channelInfo = array();
function opening_element($xmlParser, $name, $attribute){
    global $tag, $type;
    $tag = $name;
    if($name == "CHANNEL"){
        $type = 1;
    }
    else if($name == "ITEM"){
        $type = 2;
    }
}//end opening element
function closing_element($xmlParser, $name){
    global $tag, $type, $counter;
    $tag = "";
    if($name == "ITEM"){
        $type = 0;
        $counter++;
    }
    else if($name == "CHANNEL"){
        $type = 0;
    }
}//end closing_element
function c_data($xmlParser, $data){
    global $tag, $type, $channelInfo, $itemInfo, $counter;
    $data = trim(htmlspecialchars($data));
    if($tag == "TITLE" || $tag == "DESCRIPTION" || $tag == "LINK"){
        if($type == 1){
            $channelInfo[strtolower($tag)] .= $data;
        }//end checking channel
        else if($type == 2){
            $itemInfo[$counter][strtolower($tag)] .= $data;
        }//end checking for item
    }//end checking tag
}//end cdata funct
$xmlParser = xml_parser_create();
xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, TRUE);
xml_parser_set_option($xmlParser, XML_OPTION_SKIP_WHITE, TRUE);
xml_set_element_handler($xmlParser, "opening_element", "closing_element");
xml_set_character_data_handler($xmlParser, "c_data");
$fp = file($_GET['rss']);
foreach($fp as $line){
    if(!xml_parse($xmlParser, $line)){
        die("Could not parse file.");
    }
}
That’s a pretty good amount of code. You now have a complete feed parser, ready to go. The next thing to do is add the items to your database.
The first thing you’ll want to do is connect to your database. So add in your standard database connection:
$connection = mysql_connect("localhost", "username", "password");
mysql_select_db("database", $connection);
Next you need to cycle through each item in $itemInfo and add it to the database. You can do that with a pretty simple foreach() loop.
foreach($itemInfo as $items){
    $query = mysql_query("SELECT * FROM stories WHERE title = '"
        .htmlentities($items['title'], ENT_QUOTES)."'") or die(mysql_error());
    $num = mysql_num_rows($query);
    if($num > 0){
        echo $items['title']." already exists!<br />";
    }
    else {
        if (mysql_query("INSERT INTO stories VALUES('', '"
            .htmlentities($items['title'], ENT_QUOTES)."', '"
            .htmlentities($items['description'], ENT_QUOTES)."', '"
            .htmlentities($items['link'], ENT_QUOTES)."')") or die(mysql_error())){
            echo $items['title']." was added!<br />";
        }
    }
}
It looks a bit obfuscated, but it’s pretty easy.
The loop takes each individual array in $itemInfo (which was indexed by $counter) and puts it in $items. The first thing you need to do is check whether there is already a story with the same name. You can do that with a pretty easy title check. The function htmlentities(string, ENT_QUOTES) converts special characters to HTML entities, and the argument ENT_QUOTES tells it to convert both single and double quotes, which keeps quotes in the feed from breaking out of the SQL string. It isn’t a real SQL escaping function, though (a backslash, for example, passes straight through), so mysql_real_escape_string() is the safer choice. The number of rows is checked and put into $num. If $num is greater than 0 (meaning that the title already exists), then the script tells you that the story already exists. Otherwise, you need to insert the values of each item into the table. Each value is accessed with $items['tagname']. Then the script tells you the title of the item just added. And now you should be done.
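On newer PHP installs, the same check-then-insert logic can be sketched with PDO and prepared statements, which take care of the quoting for you. This is a swapped-in technique rather than the code used in this tutorial: add_story() is a hypothetical helper, and the demo runs against an in-memory SQLite database (assuming the pdo_sqlite driver is available) purely so that it works without a MySQL server.

```php
<?php
// Hypothetical helper; $db is any PDO connection with a `stories` table.
function add_story(PDO $db, array $item)
{
    $check = $db->prepare('SELECT COUNT(*) FROM stories WHERE title = ?');
    $check->execute(array($item['title']));
    if ($check->fetchColumn() > 0) {
        return false; // a story with this title is already stored
    }

    $insert = $db->prepare(
        'INSERT INTO stories (title, description, link) VALUES (?, ?, ?)'
    );
    $insert->execute(array($item['title'], $item['description'], $item['link']));
    return true;
}

// Demo against an in-memory SQLite database (an assumption made so the
// example runs anywhere; for MySQL the DSN would look something like
// 'mysql:host=localhost;dbname=database').
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE stories (
    id INTEGER PRIMARY KEY,
    title TEXT,
    description TEXT,
    link TEXT
)');

$story = array(
    'title' => "It's a story",
    'description' => 'Quotes need no manual escaping here.',
    'link' => 'http://example.com/story',
);

$first = add_story($db, $story);
$second = add_story($db, $story);
var_dump($first, $second);
```

The first call stores the story and the second one spots the duplicate title, the same two outcomes the loop above reports.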
<?php
$connection = mysql_connect("localhost", "username", "password");
mysql_select_db("database", $connection);
$counter = 0;
$type = 0;
$tag = "";
$itemInfo = array();
$channelInfo = array();
function opening_element($xmlParser, $name, $attribute){
    global $tag, $type;
    $tag = $name;
    if($name == "CHANNEL"){
        $type = 1;
    }
    else if($name == "ITEM"){
        $type = 2;
    }
}//end opening element
function closing_element($xmlParser, $name){
    global $tag, $type, $counter;
    $tag = "";
    if($name == "ITEM"){
        $type = 0;
        $counter++;
    }
    else if($name == "CHANNEL"){
        $type = 0;
    }
}//end closing_element
function c_data($xmlParser, $data){
    global $tag, $type, $channelInfo, $itemInfo, $counter;
    $data = trim(htmlspecialchars($data));
    if($tag == "TITLE" || $tag == "DESCRIPTION" || $tag == "LINK"){
        if($type == 1){
            $channelInfo[strtolower($tag)] .= $data;
        }//end checking channel
        else if($type == 2){
            $itemInfo[$counter][strtolower($tag)] .= $data;
        }//end checking for item
    }//end checking tag
}//end cdata funct
$xmlParser = xml_parser_create();
xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, TRUE);
xml_parser_set_option($xmlParser, XML_OPTION_SKIP_WHITE, TRUE);
xml_set_element_handler($xmlParser, "opening_element", "closing_element");
xml_set_character_data_handler($xmlParser, "c_data");
$fp = file($_GET['rss']);
foreach($fp as $line){
    if(!xml_parse($xmlParser, $line)){
        die("Could not parse file.");
    }
}
foreach($itemInfo as $items){
    $query = mysql_query("SELECT * FROM stories WHERE title = '"
        .htmlentities($items['title'], ENT_QUOTES)."'") or die(mysql_error());
    $num = mysql_num_rows($query);
    if($num > 0){
        echo $items['title']." already exists!<br />";
    }
    else {
        if (mysql_query("INSERT INTO stories VALUES('', '"
            .htmlentities($items['title'], ENT_QUOTES)."', '"
            .htmlentities($items['description'], ENT_QUOTES)."', '"
            .htmlentities($items['link'], ENT_QUOTES)."')") or die(mysql_error())){
            echo $items['title']." was added!<br />";
        }
    }
}
?>
That’s a pretty gigantic code base, eh? But if it’ll generate money and traffic for you, it’s a small price to pay I guess.
Just keeping the stories in the database isn’t enough, right? So all you need is a simple script to display them.
This can be pretty basic, so feel free to skip this, but if not…
<?php
$connection = mysql_connect("localhost", "username", "password");
mysql_select_db("database", $connection);
$select = mysql_query("SELECT * FROM stories ORDER BY id DESC LIMIT 10") or die(mysql_error());
while($array = mysql_fetch_array($select)){
    extract($array);
    echo "<h2>".$title."</h2>";
    echo "<p>".$description."</p>";
    echo "<a href='".$link."'>Read More...</a><br /><br />";
}
?>
First things first, connect to the database. Then select every field from stories, newest first, and only take ten stories. Then loop through each story and output the title, description, and a little link to read more.
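If you would rather store the feed text as-is and escape it only when printing, the output step could be sketched like this. The render_story() helper is a hypothetical name, and the sketch assumes the database holds raw text rather than values that were already run through htmlspecialchars() or htmlentities().

```php
<?php
// Hypothetical helper: build the HTML for one story row.
function render_story(array $row)
{
    $title = htmlspecialchars($row['title'], ENT_QUOTES, 'UTF-8');
    $description = htmlspecialchars($row['description'], ENT_QUOTES, 'UTF-8');
    $link = htmlspecialchars($row['link'], ENT_QUOTES, 'UTF-8');

    return "<h2>" . $title . "</h2>"
        . "<p>" . $description . "</p>"
        . "<a href='" . $link . "'>Read More...</a><br /><br />";
}

$html = render_story(array(
    'title' => 'Cats & dogs',
    'description' => 'A <b>bold</b> claim',
    'link' => 'http://example.com/?a=1&b=2',
));
echo $html;
```

Escaping at output time means any markup in a feed shows up as plain text instead of being interpreted by the browser.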
Here’s the complimentary example
What you’re going to want to do is take everything after the rss= and add a different rss feed to see it really work.
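Since file() will happily read local paths as well as web addresses, it is worth checking what arrives in rss= before opening it. Here is a minimal sketch of such a check; is_allowed_feed_url() is a hypothetical helper, and only allowing http and https is an assumption about what you want to permit.

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs.
function is_allowed_feed_url($url)
{
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, array('http', 'https'), true);
}

var_dump(is_allowed_feed_url('http://rssfeed.location/feed.xml'));
var_dump(is_allowed_feed_url('/etc/passwd'));
var_dump(is_allowed_feed_url('file:///etc/passwd'));
```

In the script itself, you would call this on $_GET['rss'] and stop before the file() call when it returns false.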
Well, there you have it. Your very own content building machine. Watch out though: if all your content is just from feeds, the chances that you’ll develop a big content-driven website are probably pretty slim. Develop a healthy balance with lots of unique content and just a bit of feed content.
Some ideas for expansion could include:
So as always, go test this out and prosper with your content.
Very nice, thanks for this!!
by Jet