Grabbing the first paragraph

This is a discussion on "Grabbing the first paragraph" within the PHP Forum section. This forum and the thread "Grabbing the first paragraph" are both part of the Program Your Website category.


  #1  
Old Mar 16th, 2007, 01:10
Elite Veteran
Join Date: Jan 2007
Location: You know where
Age: 31
Posts: 4,617
Thanks: 0
Thanked 0 Times in 0 Posts
Grabbing the first paragraph

Ok ... a while back a friend of mine gave me this script that grabs the text in <p>...</p> from an article and displays it. For some reason ... I just noticed that it displays the second paragraph instead of the first one like I want it to.

I have no idea how to fix this ... I suck at regular expressions and things like that in PHP. Anyways ... here's the code:

PHP: Select all

function paraSummary($subject) {
$matches = array();
preg_match("/(<p>[^<]+<\/p>)/", $subject, $matches);
return $matches[1];
}

I tried changing the return $matches[1]; to return $matches[0]; but that didn't work.

Normally I would call the function like this:
PHP: Select all

<?=paraSummary($latestblogs)?>

where $latestblogs is taken from my db.

Any help would be great!

  #2  
Old Mar 18th, 2007, 01:36
Elite Veteran

Just a little bump

I really need help with this
  #3  
Old Mar 18th, 2007, 04:57
Reputable Member
Join Date: Jul 2005
Location: Melksham, Wilts, UK
Posts: 293
Thanks: 0
Thanked 0 Times in 0 Posts

I tried

Code: Select all
<?php

$latestblogs = <<<HERE
<p>The price of cheese</p>
<p>The Prince of Cheshire</p>
<p>The Polish Child</p>
HERE;

function paraSummary($subject) {
$matches = array();
preg_match("/(<p>[^<]+<\/p>)/", $subject, $matches);
return $matches[1];
}
?>

And the Winner is ... <?=paraSummary($latestblogs)?>
And it printed out

Code: Select all
And the Winner is ... <p>The price of cheese</p>
which is what I would have expected - the first and not the second paragraph. Have you checked your data? Are you sure that your first paragraph is the same as the second one - or does it have a div= class= or id= attribute, or something else that makes it different and causes it to fail to match? Spurious spaces? Capital Ps?
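To make those suspects concrete, here is a small sketch that runs the original pattern against a few variations of the same paragraph. The sample strings are invented for illustration and are not taken from the actual blog data.

```php
<?php
// The pattern from the original paraSummary() function.
$pattern = "/(<p>[^<]+<\/p>)/";

// Invented variations of one paragraph, one per suspect listed above.
$samples = array(
    'plain'     => '<p>The price of cheese</p>',
    'attribute' => '<p class="intro">The price of cheese</p>',
    'space'     => '<p >The price of cheese</p>',
    'uppercase' => '<P>The price of cheese</P>',
);

foreach ($samples as $label => $html) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $result = preg_match($pattern, $html) ? 'match' : 'no match';
    echo $label . ': ' . $result . "\n";
}
```

Only the plain sample matches, because the pattern spells out the literal, lowercase, attribute-free `<p>`.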
  #4  
Old Mar 18th, 2007, 12:45
Elite Veteran

Thank you so much Graham for looking into this.

Here's what I have in my db ...
Code: Select all
<p>Many people tend to forget that the body \"tag\" is also an element and can be used to style your site. In <a href=\"http://www.karinne.net/archive/?bid=5\">How to center your site horizontally</a> I showed you how to take a wrap your site in that div to center it. Now ... I\'m going to use the body element to do the same. Of course, you use this option when your whole site is of a certain width.</p>

<p>Again, the code is pretty simple: margin: 0 auto; and a width: 780px (which should be changed to accommodate the width you have set-up for you site) but this time ... in the body.</p>


<pre>&lt;!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" 
\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"&gt;
&lt;html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en-CA\" xml:lang=\"EN-CA\"&gt;

&lt;head&gt;
    &lt;title&gt;Some title&lt;/title&gt;
    &lt;meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\" /&gt;
    
    &lt;style type=\"text/css\"&gt;
    html {
        background-color: #fff;
    }
    
    body {
        color: #fff;
        margin: 0 auto;
        width: 780px;
        background-color: #900;
    }
    &lt;/style&gt;    
&lt;/head&gt;

&lt;body&gt;

this site is centered

&lt;/body&gt;
&lt;/html&gt;</pre>

<p><strong>Tested in:</strong> Firefox 2.0, Opera 9, IE6</p>
Then I do a $latestblogs = stripslashes($row_blog_for_category['blog']); to get rid of the slashes and then <?=paraSummary($latestblogs)?> .

So ... there's no extra stuff in there that I can see.

The first paragraph, the one that should be coming out, is

Code: Select all
<p>Many people tend to forget that the body \"tag\" is also an element and can be used to style your site. In <a href=\"http://www.karinne.net/archive/?bid=5\">How to center your site horizontally</a> I showed you how to take a wrap your site in that div to center it. Now ... I\'m going to use the body element to do the same. Of course, you use this option when your whole site is of a certain width.</p>
but it's showing

Code: Select all
<p>Again, the code is pretty simple: margin: 0 auto; and a width: 780px (which should be changed to accommodate the width you have set-up for you site) but this time ... in the body.</p>
help
  #5  
Old Mar 18th, 2007, 17:16
Reputable Member

But there's *something* that's triggering a failure to match on the first line. Have you tried to print out $latestblogs and see what you have in there? Putting it through htmlspecialchars() will let you see all the tags.
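A minimal sketch of that check might look like the following. The array below is a hypothetical stand-in for the database row, with a made-up paragraph; only the stripslashes() and htmlspecialchars() calls mirror what is described in the thread.

```php
<?php
// Hypothetical stand-in for the row fetched from the database.
// Inside single quotes, \" stays as a literal backslash plus quote,
// which mimics the escaped text stored in the db.
$row_blog_for_category = array(
    'blog' => '<p>Intro with <a href=\"http://example.com/\">a link</a>.</p>',
);

// Same clean-up step as in the original post.
$latestblogs = stripslashes($row_blog_for_category['blog']);

// htmlspecialchars() converts <, >, & and " to entities, so the browser
// displays the tags as text instead of rendering them.
$visible = htmlspecialchars($latestblogs);
echo '<pre>' . $visible . '</pre>';
```

Viewed in a browser, this shows every tag in the string, including any nested inside a paragraph.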
  #6  
Old Mar 18th, 2007, 17:18
Reputable Member

GOT IT ....

It's the extra <a href=...> tag within the first paragraph.

You're looking for <p> then characters which are NOT < then </p> .... but in the case of the first paragraph you have an extra <a href= .... in there, so it fails to match.
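The failure is easy to reproduce with a cut-down sample. The two paragraphs below are shortened, invented stand-ins for the blog entry; the pattern is the one from the original function.

```php
<?php
// Invented two-paragraph sample: the first contains a link, the second does not.
$html = '<p>See <a href="http://example.com/">this link</a> first.</p>' . "\n"
      . '<p>Second paragraph, no tags inside.</p>';

// [^<]+ cannot cross the "<" that opens the <a> tag, so the attempt at the
// first <p> fails and the regex engine moves on to the next <p> it finds.
preg_match("/(<p>[^<]+<\/p>)/", $html, $matches);

echo $matches[1], "\n"; // prints: <p>Second paragraph, no tags inside.</p>
```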
  #7  
Old Mar 18th, 2007, 17:41
Elite Veteran

Ok ... so ... how can I make this work regardless of what's between the <p> ... </p>?
  #8  
Old Mar 18th, 2007, 19:45
Reputable Member

If you want to match *regardless* of intermediate tags, try

preg_match("/(<p>.*?<\/p>)/s", $subject, $matches);

(Any number of any character, but as few as possible: a non-greedy match.) Note the extra "s" after the second slash. It forces the "." to match newline characters too, in case the <p> and </p> are on different lines.
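Dropped into the original function, the change could look like the sketch below. Returning an empty string when nothing matches is an assumption added here, and the sample article is invented.

```php
<?php
function paraSummary($subject) {
    $matches = array();
    // .*? takes as few characters as possible, so the match stops at the
    // first </p>. The s modifier lets "." match newlines as well.
    if (preg_match("/(<p>.*?<\/p>)/s", $subject, $matches)) {
        return $matches[1];
    }
    // Assumption: return an empty string when there is no <p>...</p> at all.
    return '';
}

// Invented sample: the first paragraph contains a link and a line break.
$article = "<p>In <a href=\"http://example.com/\">an earlier post</a> I showed\n"
         . "you a trick.</p>\n"
         . "<p>Again, the code is pretty simple.</p>";

echo paraSummary($article);
```

This still expects a bare, lowercase `<p>` with no attributes, the same as the original pattern.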
  #9  
Old Mar 18th, 2007, 21:18
Elite Veteran

AH!!!! thank you thank you thank you thank you thank you!!!!!!!!!!!!!!!!!

*smooches*
  #10  
Old Mar 18th, 2007, 23:43
Junior Member
Join Date: Jan 2006
Posts: 20
Thanks: 0
Thanked 0 Times in 0 Posts

On a slight tangent: if you want a generic solution that avoids issues with low-level changes to the code and so on, you could do worse than learn to use Perl and LWP. Tokenise the HTML with something like HTML::TokeParser, or even parse it into a tree, and you can develop some very robust solutions with minimal code for all sorts of data extraction tasks.

Cheers

Dan
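For readers who would rather stay in PHP, the same parse-instead-of-regex idea can be sketched with PHP's bundled DOM extension. This is a swapped-in technique, not the Perl modules mentioned above. The function name firstParagraphDom and the sample HTML are made up for illustration, and the sketch assumes the DOM extension is enabled.

```php
<?php
// Sketch: fetch the first <p> with a real HTML parser rather than a regex.
function firstParagraphDom($html) {
    $doc = new DOMDocument();

    // Stored fragments are rarely complete documents, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrapping in a <div> gives the parser a single non-empty container.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = $doc->getElementsByTagName('p');
    if ($paragraphs->length === 0) {
        return ''; // assumption: empty string when no paragraph exists
    }

    // saveHTML() with a node argument serializes just that node,
    // including any tags nested inside it.
    return $doc->saveHTML($paragraphs->item(0));
}

echo firstParagraphDom('<p class="intro">One <a href="x">link</a>.</p><p>Two</p>');
```

Because the parser understands the markup, attributes on the `<p>`, uppercase tags, and nested links do not need special handling.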
  #11  
Old Mar 19th, 2007, 10:56
Elite Veteran

What?!?! Was that whole paragraph even written in English?! I didn't understand a word of what you just wrote in that post!

The code works ... and works just the way I want it.
  #12  
Old Mar 19th, 2007, 11:07
Junior Member

Lol

It was a bit of a tangent. If your script works fine then there's obviously no reason at all to change it. I just added this for anyone else who comes across this thread and is looking to do more intensive parsing of HTML at any stage.

Dan
  #13  
Old Mar 19th, 2007, 20:58
Ryan Fait's Avatar
Elite Veteran
Join Date: May 2006
Location: Las Vegas
Posts: 3,787
Thanks: 0
Thanked 0 Times in 0 Posts

That sounds very interesting. Any links to resources on the subject?

Tags
php, regular expressions, summary

All times are GMT.

© 2003-2008 Webforumz.com : All Rights Reserved
