Parsing text

This is a discussion on "Parsing text" within the PHP Forum section. This forum and the thread "Parsing text" are both part of the Program Your Website category.


  #1  
Old May 24th, 2006, 12:40
Reputable Member
Parsing text

I'm just learning php and need to parse some large text files. I was going to use Perl for this, but I read somewhere that "php is the new Perl"... I'm guessing that's an overstatement, but it got me to thinking that I may as well do this (if possible) in php so I can learn from it.

The text file is from a database that I need to replicate. I don't have admin access to it, so I'm using print-to-text-file output of a report. The text file looks something like this...
Code:
ID_101027   Roger
ID_101028   Michael
ID_101029   Julie
- - more records - -
- - garbage (report header/footer) lines - -
ID_101027   Smith
ID_101028   Robertson
ID_101029   Williams
- - more records - -
- - garbage (report header/footer) lines - -
ID_101027   1010 Mulberry Drive
ID_101028   772 Waverly Lane
ID_101029   3601 Hamilton Way
- - more records - -
- - garbage (report header/footer) lines - -
(you get the idea)
What I need to do is...
1. Find the (first instance of the first) identifier
2. Write it (and the associated field value) to the first two elements of a multidimensional array
3. Find the next line containing the identifier
4. Split the line into the identifier and the field value
5. Write the field value to an array element
6. When the identifier is not found, restart the loop from the next identifier
7. Use an insert query to add the values from the array to the new database.
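
The steps above can be sketched roughly as follows. This is only an illustration under assumptions drawn from the sample: identifiers look like `ID_` followed by digits, the value follows after whitespace, and each repeat of an identifier carries the next field. The function names, the `people` table and its column names are hypothetical.

```php
<?php
// Sketch only: groups report lines by identifier (steps 1-6).
// Assumes the ID_<digits> layout shown in the sample data.
function groupByIdentifier(array $lines): array
{
    $records = [];
    foreach ($lines as $line) {
        // Lines that do not start with an identifier (headers, footers) are skipped.
        if (preg_match('/^(ID_\d+)[ \t]*(.*)$/', rtrim($line, "\r\n"), $m)) {
            // Each later occurrence of the same identifier adds the next field.
            $records[$m[1]][] = trim($m[2]);
        }
    }
    return $records;
}

// Step 7: insert each record with a prepared statement.
// The table and column names are placeholders, not taken from the thread.
function insertRecords(PDO $pdo, array $records): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO people (id, first_name, last_name, address) VALUES (?, ?, ?, ?)'
    );
    foreach ($records as $id => $fields) {
        // Pad with null so records with missing fields still bind four values.
        $values = array_pad(array_slice($fields, 0, 3), 3, null);
        $stmt->execute(array_merge([$id], $values));
    }
}
```

A call such as `groupByIdentifier(file('report.txt', FILE_IGNORE_NEW_LINES))` would feed it the whole report, where `report.txt` stands in for the real file name.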

Gotchas: Some of the fields (e.g. Notes) will be multiple line fields. These may be interrupted by the aforementioned garbage lines. (Seems like the way to get around this is to get rid of the garbage lines first.) Also, some fields may contain null values.
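
One possible way to handle these gotchas is sketched below. It assumes the garbage lines can be recognised by a pattern, which the thread does not confirm, so the patterns are passed in by the caller. Any other line that does not start with an identifier is treated as a continuation of the previous field, and an empty value is stored as null. The function name is hypothetical.

```php
<?php
// Sketch only: drops garbage lines first, then joins multi-line fields.
// $garbagePatterns is a list of regexes the caller believes match header/footer lines.
function parseWithContinuations(array $lines, array $garbagePatterns): array
{
    $records = [];
    $currentId = null;
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        foreach ($garbagePatterns as $pattern) {
            if (preg_match($pattern, $line)) {
                continue 2; // skip this line entirely
            }
        }
        if (preg_match('/^(ID_\d+)[ \t]*(.*)$/', $line, $m)) {
            $currentId = $m[1];
            $value = trim($m[2]);
            // An empty value becomes null so it can be inserted as SQL NULL.
            $records[$currentId][] = ($value === '') ? null : $value;
        } elseif ($currentId !== null && trim($line) !== '') {
            // Continuation of a multi-line field: append to the most recent value.
            $last = count($records[$currentId]) - 1;
            $joined = (string) $records[$currentId][$last] . "\n" . trim($line);
            $records[$currentId][$last] = trim($joined);
        }
    }
    return $records;
}
```

If the garbage lines cannot be told apart from continuation lines by pattern, this approach will not work as written and the report layout would need a closer look.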

Questions:
1. Is php well-suited (or well enough) for doing line-by-line text parsing?
2. What functions should I be looking into in order to accomplish this?

  #2  
Old May 24th, 2006, 21:35
Most Reputable Member
Re: Parsing text

Both Perl and PHP, and others for that matter, are suited to your purpose.

You need to be looking at what are known as Regular Expressions. Make a large pot of coffee because you are going to have to do some serious reading and understanding.
  #3  
Old May 25th, 2006, 13:14
Reputable Member
Re: Parsing text

Thanks for the reply, ukgeoff!

I'm familiar with regular expressions and have used them before. The part that I find most frustrating is that they always seem to want to select much more than I intend. How greedy are they in PHP? Do they stop at line breaks in PHP? (And/or can you tell them how greedy to be and where to stop?)

Beyond regular expressions, I was wondering about what functions to use in order to split lines. Also, what would I use to (for example) select from line 77, column 12 to the end of line 80?
  #4  
Old May 25th, 2006, 14:14
Most Reputable Member
Re: Parsing text

You can specify whether they are to be greedy or not.
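
As a small illustration of that control in PHP's PCRE functions (the sample string is made up for the purpose): a quantifier is greedy by default, a trailing `?` makes it lazy, the `U` modifier flips the default for the whole pattern, and `.` only crosses line breaks when the `s` modifier is set.

```php
<?php
$text = "say \"one\" and \"two\"\nnext \"line\"";

// Greedy: .* takes as much as it can, but "." does not match "\n" by default,
// so the match ends at the last quote on the first line.
preg_match('/".*"/', $text, $greedy);      // "one" and "two"

// Lazy: the ? after the quantifier stops at the first possible closing quote.
preg_match('/".*?"/', $text, $lazy);       // "one"

// U modifier: makes quantifiers lazy by default for the whole pattern.
preg_match('/".*"/U', $text, $ungreedy);   // "one"

// s modifier: lets "." match newlines, so the greedy match spans both lines.
preg_match('/".*"/s', $text, $dotall);
```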

I recommend a regex tool both as a training tool (it comes with great documentation) and as a build and test tool.

RegexBuddy: Learn, Create, Understand, Test, Use and Save Regular Expression

With regard to the second idea, you would probably have to read the bytes into a variable, looking for the EOL markers. But I haven't really thought that one through.
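
A simpler route than scanning bytes by hand may be to let `file()` split on the EOL markers and then slice the resulting array. The sketch below assumes 1-based line and column numbers, as in the "line 77, column 12 to the end of line 80" example; the function name is hypothetical.

```php
<?php
// Sketch only: returns the text from ($startLine, $startCol) to the end of $endLine.
// Line and column numbers are 1-based.
function sliceText(array $lines, int $startLine, int $startCol, int $endLine): string
{
    // array_slice uses 0-based offsets, so shift the line number by one.
    $chunk = array_slice($lines, $startLine - 1, $endLine - $startLine + 1);
    if ($chunk === []) {
        return '';
    }
    // Drop the characters before the start column on the first line only.
    $chunk[0] = (string) substr($chunk[0], $startCol - 1);
    return implode("\n", $chunk);
}
```

With the report loaded via `file('report.txt', FILE_IGNORE_NEW_LINES)` (the file name is a placeholder), the example from the question would be `sliceText($lines, 77, 12, 80)`. Note that `file()` reads the whole file into memory, which may matter for very large reports.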
  #5  
Old May 26th, 2006, 19:55
New Member
Re: Parsing text

About splitting the lines up, I think you could use PHP's explode to split up the lines and then foreach to go through the array values (each value would be a line in this case).

I didn't look into it, but I think I'm along the right lines here.
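
A minimal sketch of that idea, assuming the whole file has been read into one string: `explode` breaks it into lines and `foreach` walks them. Because the sample separates identifier and value with a run of spaces, the sketch uses `preg_split` with a limit of 2 for the second split rather than `explode`. The function name is hypothetical.

```php
<?php
// Sketch only: turns raw file contents into [identifier, value] pairs.
function splitLines(string $contents): array
{
    $rows = [];
    // Normalise Windows and old Mac line endings before splitting.
    $contents = str_replace(["\r\n", "\r"], "\n", $contents);
    foreach (explode("\n", $contents) as $line) {
        if (trim($line) === '') {
            continue; // ignore blank lines
        }
        // Limit of 2 keeps values that contain spaces (addresses) in one piece.
        $parts = preg_split('/\s+/', trim($line), 2);
        $rows[] = array_pad($parts, 2, '');
    }
    return $rows;
}
```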
  #6  
Old May 31st, 2006, 23:03
New Member
Re: Parsing text

Is there any way to tell that the data for an ID is a first name, last name, address, etc.? (Maybe that's what the garbage does somehow?) If not, are the first names always first, then the last names, and then the addresses?

Look at:
preg_match
preg_replace
explode
stristr
str_replace
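
For orientation, here is roughly what each of those functions does when applied to one line from the sample data. The replacements themselves are arbitrary and only there to show the calls.

```php
<?php
$line = 'ID_101028   772 Waverly Lane';

// preg_match: test a pattern and capture parts of the line.
preg_match('/^ID_(\d+)/', $line, $m);            // $m[1] is '101028'

// preg_replace: pattern-based replacement, here collapsing runs of whitespace.
$collapsed = preg_replace('/\s+/', ' ', $line);  // 'ID_101028 772 Waverly Lane'

// explode: split on a fixed delimiter; the limit of 2 keeps the address intact.
$parts = explode(' ', $collapsed, 2);            // ['ID_101028', '772 Waverly Lane']

// stristr: case-insensitive search; returns the string from the match onward, or false.
$fromWaverly = stristr($line, 'waverly');        // 'Waverly Lane'

// str_replace: plain, non-regex replacement.
$renamed = str_replace('Lane', 'Ln', $line);     // 'ID_101028   772 Waverly Ln'
```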

Last edited by agent-j; May 31st, 2006 at 23:06.
  #7  
Old Jun 1st, 2006, 07:25
Most Reputable Member
Re: Parsing text

The data in the fields of the database come out in whatever order you determine by the way you construct your query.

If this is an area you are interested in, then you need to do some basic reading up on databases and SQL.
All times are GMT.

