You are here:  » Dropping records if word or phrase is not present.


Dropping records if word or phrase is not present.

Submitted by bwhelan1 on Sat, 2007-02-17 21:30 in

David,

To drop records if a word or phrase is not present in a particular field?

Will the filter shown here: http://www.pricetapestry.com/node/167 work for dropping records if a given field contains a specific word or phrase or does it look for the entire contents of that field to match exactly?

Regards,
Bill Whelan
Computer Store

Submitted by support on Sun, 2007-02-18 16:55

Hi Bill,

As it stands, that version will only compare the entire field. Try this version for substring compare:

<?php
  
/*************************************************/
  /* dropRecordIfNot                               */
  /*************************************************/
  
$filter_names["dropRecordIfNot"] = "Drop Record (If Not)";
  function 
filter_dropRecordIfNotConfigure($filter_data)
  {
    print 
"Drop record if field DOES NOT contain text:<br />";
    print 
"<input type='text' size='40' name='text' value='".widget_safe($filter_data["text"])."' />";
    
widget_errorGet("text");
  }
  function 
filter_dropRecordIfNotValidate($filter_data)
  {
  }
  function 
filter_dropRecordIfNotExec($filter_data,$text)
  {
    global 
$filter_dropRecordFlag;
    if(
$filter_dropRecordFlag)
    {
      return 
$text;
    }
    else
    {
      
$filter_dropRecordFlag = (strpos($text,$filter_data["text"])!==FALSE);
    }
    return 
$text;
  }
?>

Cheers,
David.

Submitted by bwhelan1 on Mon, 2007-02-19 01:33

Thanks for the quick reply.

I get the following error when applying the above filter to the name field:

Fatal error: Call to undefined function: str_pos() in /home/site/public_html/includes/filter.php on line 302

Bill Whelan

Computer Store

Submitted by support on Mon, 2007-02-19 05:58

Sorry, Bill - it's strpos() not str_pos(). I've corrected the code above...

Cheers,
David.

Submitted by bwhelan1 on Tue, 2007-02-20 21:15

David,

I have added the code to filter.php and it appears that was done properly as I no longer get any errors when importing. However, it did not filter any records. I have it searching for the word mustang in the name field. I was expecting to end up with nothing but products with the word mustang in the name but that did not happen. Any ideas? Am I applying the filter properly?

Bill Whelan

Computer Store

Submitted by support on Tue, 2007-02-20 21:23

Hi Bill,

I think the code is the wrong way round. To drop records if "mustang" is not in the field (where "mustang" is specified in the match box) you would need this code:

      $filter_dropRecordFlag = (strpos($text,$filter_data["text"])===FALSE);

As it stands, it should have been dropping records with Mustang - but you said it was not filtering anything so i'm not 100% sure about this - but try the above modification and see if this works as required...

Cheers,
David.

Submitted by bwhelan1 on Tue, 2007-02-20 22:18

David,

This seems to be looking for the exact value of mustang. I filters every record now.

Is there a wildcard that can be used to overcome this?

Bill Whelan

Computer Store

Submitted by support on Wed, 2007-02-21 09:06

Hi Bill,

Do you mean that it's a case issue - so you need it to permit "Mustang" but not "mustang"? Can you give a couple of examples of product names, and the filter values you are using...

Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-02-21 22:03

I am using lower case. Hopefully it is not case sensitive.

An example of a record that should be added to the db would be a line that contained the following in the name field:

Xenon Ford Mustang 2005 Complete Body Kit

I want to drop anything that does not contain the word "mustang" in either the name or description field, depending on which field I choose to use for the filter.

Bill Whelan

Computer Store

Submitted by support on Thu, 2007-02-22 12:15

Hi Bill,

As it stands, it will be case sensitive. PHP5 has a non-case sensitive version called stripos(), but if we would need to strtolower() each parameter before performing the test. The complete filter, with the corrected statement from the previous post should now look as follows:

<?php
  
/*************************************************/
  /* dropRecordIfNot                               */
  /*************************************************/
  
$filter_names["dropRecordIfNot"] = "Drop Record (If Not)";
  function 
filter_dropRecordIfNotConfigure($filter_data)
  {
    print 
"Drop record if field DOES NOT contain text:<br />";
    print 
"<input type='text' size='40' name='text' value='".widget_safe($filter_data["text"])."' />";
    
widget_errorGet("text");
  }
  function 
filter_dropRecordIfNotValidate($filter_data)
  {
  }
  function 
filter_dropRecordIfNotExec($filter_data,$text)
  {
    global 
$filter_dropRecordFlag;
    if(
$filter_dropRecordFlag)
    {
      return 
$text;
    }
    else
    {
      
$filter_dropRecordFlag = (strpos(strtolower($text),strtolower($filter_data["text"]))===FALSE);
    }
    return 
$text;
  }
?>

Cheers,
David.

Submitted by bwhelan1 on Thu, 2007-02-22 17:43

David,

That worked perfectly. Thanks for all your efforts.

Bill Whelan

Computer Store

Submitted by bwhelan1 on Wed, 2007-03-28 19:04

Firts of all, thanks for the all the great support with your script, amazing.

Now for my question: Drop If Not works great for one word or phrase but what if I wanted to drop all records that did not match two or more words or phrases? Would multiple instances of this filter work or would the application of the first filter drop everyting not matching it and leave nothing for subsequent filters to compare against?

Bill Whelan

Computer Store

Submitted by support on Thu, 2007-03-29 07:47

Hi Bill,

As it stands with the code above, you are correct - you would not be able to have multiple instances of this filter because if the first one decides to drop the record subsequent instances won't make any difference.

However, if you modify the filter so that it doesn't look to see if the drop record flag has already been set; it should then behave as you want - because a subsequent call would "unset" the flag if the word exists. Here's the slightly modified version:

<?php
  
/*************************************************/
  /* dropRecordIfNot                               */
  /*************************************************/
  
$filter_names["dropRecordIfNot"] = "Drop Record (If Not)";
  function 
filter_dropRecordIfNotConfigure($filter_data)
  {
    print 
"Drop record if field DOES NOT contain text:<br />";
    print 
"<input type='text' size='40' name='text' value='".widget_safe($filter_data["text"])."' />";
    
widget_errorGet("text");
  }
  function 
filter_dropRecordIfNotValidate($filter_data)
  {
  }
  function 
filter_dropRecordIfNotExec($filter_data,$text)
  {
    global 
$filter_dropRecordFlag;
    
$filter_dropRecordFlag = (strpos(strtolower($text),strtolower($filter_data["text"]))===FALSE);
    return 
$text;
  }
?>

Hope this helps!
Cheers,
David.

Submitted by bwhelan1 on Sun, 2007-04-08 20:45

David,

Will this non-case-sensitive as well as the previous?

Bill Whelan

Computer Store

Submitted by support on Mon, 2007-04-09 05:21

Hi Bill,

Yes - it strtolower()'s both strings before the test, so it should be non-case-sensitive.

Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-05-02 22:56

David,

This filter worked fine for a while but now is filtering everything out.

The URL is: {saved}

The URL to an exact copy of filter.php for this site is: {saved}

I have 2 "Drop If Not's", one for "nba" and the other for "basketball" applied to the "advertiser category" column.

The only thing I know has changed is that the hosting company upgraded to MySQL 5.

Bill Whelan

Computer Store

Submitted by bwhelan1 on Thu, 2007-05-03 01:14

Regarding previous post, I zipped the .php file and the url is: {saved}

Bill Whelan

Computer Store

Submitted by support on Thu, 2007-05-03 09:23

Hi Bill,

The database shouldn't make any difference - it doesn't come into until the filters have been applied. However, I've just studied the XML, and couldn't see any records with advertisercategory "Basketball", which would explain why all records are dropped. Could the merchant have changed their categorisation recently?

Cheers,
David.

Submitted by bwhelan1 on Thu, 2007-05-10 18:29

David,

I have a feed where multiple "drop if not's" are being applied but only the last one is being applied when imported.

Bill Whelan

Computer Store

Submitted by support on Fri, 2007-05-11 07:42

Hi Bill,

The last version of this filter above removed the check to see if the flag was already set - because of a different issue. If you revert to the version previous to that (which includes the modifications to make it case insensitive), it will behave as expected...

Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-05-23 20:50

David,

I have the need to do any of two things with the "Drop If Not" filter.

1) Drop the record if a given field does not contain a word or phrase without regard to case.

2) To be able to run more than one instance of the above mentioned filter on the same feed.

Which one should I be using? The environment is PHP 5.2.1 and MySQL is showing up as 4.0.27-standard. I recently switched to a dedicated server and this account was restored from a backup. I've installed this script for other domains on this server and the MySQL version on those shows as "5.0.24-standard". Do you see any reason why I would be having problems with the filters?

When I try to apply any of the filters I get the following:

# /usr/bin/php /home/sitename/public_html/scripts/import.php 10273828-Auto_Parts_Warehouse.xml
X-Powered-By: PHP/5.2.1
Content-type: text/html
importing 10273828-Auto_Parts_Warehouse.xml...[0/35737]<br />
<b>Warning</b>: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in <b>/home/sitename/public_html/includes/database.php</b> on line <b>21</b><br />
<br />
<b>Warning</b>: mysql_num_rows(): supplied argument is not a valid MySQL result resource in <b>/home/sitename/public_html/includes/database.php</b> on line <b>26</b><br />
importing 10273828-Auto_Parts_Warehouse.xml...[done]
backfilling reviews... <br />
<b>Warning</b>: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in <b>/home/sitename/public_html/includes/database.php</b> on line <b>21</b><br />
<br />
<b>Warning</b>: mysql_num_rows(): supplied argument is not a valid MySQL result resource in <b>/home/sitename/public_html/includes/database.php</b> on line <b>26</b><br />
backfilling reviews...[done]

With the exception of the first code example on this page, in which case it imports all products and filters nothing.

Bill Whelan

Computer Store

Submitted by support on Thu, 2007-05-24 13:01

Hi Bill,

This is the version you need to be able to use the filter against multiple fields:

<?php
  
/*************************************************/
  /* dropRecordIfNot                               */
  /*************************************************/
  
$filter_names["dropRecordIfNot"] = "Drop Record (If Not)";
  function 
filter_dropRecordIfNotConfigure($filter_data)
  {
    print 
"Drop record if field DOES NOT contain text:<br />";
    print 
"<input type='text' size='40' name='text' value='".widget_safe($filter_data["text"])."' />";
    
widget_errorGet("text");
  }
  function 
filter_dropRecordIfNotValidate($filter_data)
  {
  }
  function 
filter_dropRecordIfNotExec($filter_data,$text)
  {
    global 
$filter_dropRecordFlag;
    if(
$filter_dropRecordFlag)
    {
      return 
$text;
    }
    else
    {
      
$filter_dropRecordFlag = (strpos(strtolower($text),strtolower($filter_data["text"]))===FALSE);
    }
    return 
$text;
  }
?>

The key thing is the check for the flag already being set before doing the comparison. This means that if any of the tests pass, the record will be dropped.

I'm not sure why you should be getting database errors as a result of incorporating a filter. If this is still occuring, look for the following code in the database_querySelect() function of includes/database.php (line 17 in the file)...

    $result = mysql_query($sql,$link);

.. then add the following code on the next line:

    if (!$result)
    {
      print mysql_error();
      print "<br>";
      print $sql;
      exit();
    }

That should display the MySQL error message and the SQL that caused the problem so that we can see what is happening. If you could post the output that you get using this modification I can see where in the code the filter is causing a problem...

Cheers,
David.

Submitted by bwhelan1 on Fri, 2007-05-25 02:59

David,

Using the above specified filter and with the code added into database.php I get this when trying to import the feed:

MySQL server has gone away
SELECT COUNT(*) AS productCount FROM `products` WHERE merchant='Auto Parts Warehouse'

When I remove the "Drop If Not" filter, the feed imports fine.

Bill Whelan

Computer Store

Submitted by support on Sat, 2007-05-26 15:25

Hi Bill,

I've just been reading up on the reasons for "MySQL server has gone away" and none really make sense in this situation. Is this the feed with several hundred thousand products? Can you try adding a Search and Replace filter (search for something random) and see if the import works with another filter type added - we need to establish whether it is any filter, or just the Drop Record (If Not) filter that is causing the problem...

Cheers,
David.

Submitted by bwhelan1 on Sun, 2007-05-27 04:56

David,

The feed has 30K products. It works with the "Find and Replace" applied to the Buy URL field with no problems. It only presents a problem when using any variation of the "Drop If Not" filter.

Bill Whelan

Computer Store

Submitted by support on Tue, 2007-05-29 13:58

Hi Bill,

If you don't enter any text in the drop record filter text box, but still include the filter, does that allow all products to be imported (as normal)? What i'm trying to establish is if the database error occurs when the majority of records are being dropped by this filter, as it seems to be a timeout based issue...

Cheers,
David.

Submitted by bwhelan1 on Fri, 2007-06-01 01:37

David,

I get this:

(The lines below are repeated several hundred times before the last one appears)

Warning: strpos() [function.strpos]: Empty delimiter. in /home/username/public_html/includes/filter.php on line 301
Warning: strpos() [function.strpos]: Empty delimiter. in /home/username/public_html/includes/filter.php on line 301
Warning: strpos() [function.strpos]: Empty delimiter. in /home/username/public_html/includes/filter.php on line 301
Warning: strpos() [function.strpos]: Empty delimiter. in /home/username/public_html/includes/filter.php on line 301
MySQL server has gone away
SELECT COUNT(*) AS productCount FROM `products` WHERE merchant='Auto Parts Warehouse'

Bill Whelan

Basketball Equipment

Submitted by support on Fri, 2007-06-01 06:45

Hi Bill,

The warning can be suppressed by changing the following line:

      $filter_dropRecordFlag = (strpos(strtolower($text),strtolower($filter_data["text"]))===FALSE);

to...

      $filter_dropRecordFlag = (@strpos(strtolower($text),strtolower($filter_data["text"]))===FALSE);

The @ character in-front of the function call should prevent any errors or warnings from being raised by that call. It would be worth trying this one more time as the time taken in processing that error could cause the same database problem. If the database error occurs this time we'll have to try a null filter as I trust you can see that there is no obvious reason why this code should cause such an obscure MySQL error...

Cheers,
David.

Submitted by knight01 on Fri, 2007-10-26 07:21

David,
Sorry to bring back an older thread. But I'm trying to use the droprecordifnot on the product title field, specifically to drop any record that does not have the term HD in the title of the product.

I'm getting no records imported and think it could be due to the word being less than three characters... is this possibly the issue? Is there a way to drop a record based on a short term like this?

I'm also trying to use the droprecordif based on the category. Will it drop the record if any word I use appears in the category name, or must it match the category name completely?

Thanks!

Submitted by support on Fri, 2007-10-26 08:49

Hi,

I would suggest trying the much more flexible Drop Record If Not (RegExp) which enables you to match more than one term, and should work fine with HD on its own which you would specify in the filter as (HD) - but if you want more terms you can include them separated with the PIPE character, for example (HD|High Definition) etc.

The code and more info are in this thread:

http://www.pricetapestry.com/node/1551

This filter will operate fine on parts of the field you are using it again, it does not have to match the entire field.

Cheers,
David.