

Weird feed

Submitted by wdekreij on Tue, 2006-12-19 16:00

Hi,

I have a very strange feed; it looks like this (for example):

bla1;bla2;bla3; (header row)
baba1;baba2 - baba3;baba4;

The thing is, I want to separate baba2 and baba3, because these are two different things. Any ideas how to fix this problem?

Submitted by support on Tue, 2006-12-19 20:48

Hi,

I'm assuming that you've been able to register the feed successfully by selecting a "semi-colon" separated file format - as it will not be automatically detected.
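To see why "baba2" and "baba3" can't be told apart by the field separator alone, here is a minimal standalone sketch that splits the two sample rows on ";" using PHP's built-in str_getcsv(). It only illustrates the semicolon format; it is not Price Tapestry's own feed parser.

```php
<?php
// Split the two sample rows on ";" with PHP's str_getcsv(). The delimiter,
// enclosure and escape arguments are passed explicitly.
$rows = [
    "bla1;bla2;bla3;",
    "baba1;baba2 - baba3;baba4;",
];

$parsed = [];
foreach ($rows as $row) {
    $parsed[] = str_getcsv($row, ";", '"', "\\");
}

// "baba2 - baba3" arrives as a single field; the trailing ";" adds an empty last field.
foreach ($parsed as $fields) {
    print implode(" | ", $fields) . "\n";
}
```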

If "baba2" and "baba3" are then part of the same field then you may be able to separate them using a couple of new filters that are not part of the distribution but that you can add quite easily...

http://www.pricetapestry.com/node/339

(see the second message)

This mod will add 2 new filters - "Split Before" and "Split After", which you can use by specifying "-" as the split character to extract either "baba2" or "baba3" as required.
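The mod itself is in the thread linked above. As a rough illustration of what the two filters do to a combined value, here is a minimal sketch; split_before_sketch() and split_after_sketch() are hypothetical names, and trimming the surrounding spaces is an assumption about the real filters' behaviour.

```php
<?php
// Hypothetical stand-ins for the "Split Before" and "Split After" filters;
// the real mod is in the linked thread. Trimming spaces is an assumption.

// Return the part of $text before the first $char (unchanged if $char is absent).
function split_before_sketch(string $text, string $char): string
{
    $pos = strpos($text, $char);
    return ($pos === false) ? $text : trim(substr($text, 0, $pos));
}

// Return the part of $text after the first $char (unchanged if $char is absent).
function split_after_sketch(string $text, string $char): string
{
    $pos = strpos($text, $char);
    return ($pos === false) ? $text : trim(substr($text, $pos + strlen($char)));
}

$combined = "baba2 - baba3";
print split_before_sketch($combined, "-") . "\n";
print split_after_sketch($combined, "-") . "\n";
```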

If you have any problems using this mod, feel free to send me a link to this particular feed on your server (replying to your registration code email or forum registration email is the easiest way), for example...

http://www.yoursite.com/feeds/feed.txt

...and I'll download it to my test server and take a look for you...

Cheers,
David.

Submitted by wdekreij on Wed, 2006-12-20 17:18

Thanks a lot. I will look into it later this week; first I'm building the rest of the site.

By the way, is it only possible to load feeds from my server, or can I also load externally hosted feeds? Right now I have to download the feeds and upload them to my server, but the feeds change every day, so it would be nice if I could load them automatically from the affiliate company.

Submitted by support on Wed, 2006-12-20 17:30

Hi Wilco,

The automation scripts that are part of the Price Tapestry distribution exist for this purpose. See the documentation, particularly for import.php, for more information:
http://www.pricetapestry.com/node/6

Here are some comprehensive instructions for setting it all up:
http://www.pricetapestry.com/node/198

Cheers,
David.

Submitted by wdekreij on Sun, 2007-01-14 22:35

Alright, here I am again :-)

The problem is that the parts before and after the "-" (which could also be a "/", for instance) are NOT the same field. This makes it a lot more complex, I guess.

Any ideas where to start?

Submitted by support on Sun, 2007-01-14 23:31

Hi,

Are both parts of the "-" separated field items that you need to register (i.e. one of product name, description, image url, buy url or price)?

If there is a blank field in your feed, you can register this as one of the fields, and then use a Text After filter to copy in the value of the combined field using the placeholder notation; for example:

%FIELD3%

For the other required field, register the combined field.

At this point, with just the one filter described above, you would have 2 fields containing the same thing: the combined field in your awkward feed.

Now, there is another thread with code to add 2 new filters - Split Before, and Split After:

http://www.pricetapestry.com/node/339

If you make this modification, you can use "Split Before" on the first field that contains the combined value, splitting on the "-" character, and "Split After" on the other field, again splitting on the "-" character.
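Putting the steps together, here is a sketch of the two-field approach in plain PHP. It assumes a record is an array of fields numbered from 1, that %FIELDn% in a Text After filter expands to the value of field n, and that field 3 is the combined field while field 4 is blank; the helper names and field values are hypothetical, not part of Price Tapestry.

```php
<?php
// Sketch of the two-field approach. Assumptions (hypothetical, not Price
// Tapestry code): a record is an array of fields numbered from 1, and
// "%FIELDn%" in a Text After filter expands to the value of field n.

// Replace each %FIELDn% placeholder with field n of the current record.
function expand_placeholders(string $text, array $record): string
{
    return preg_replace_callback('/%FIELD(\d+)%/', function ($m) use ($record) {
        return isset($record[(int)$m[1]]) ? $record[(int)$m[1]] : '';
    }, $text);
}

function part_before(string $text, string $char): string
{
    $pos = strpos($text, $char);
    return ($pos === false) ? $text : trim(substr($text, 0, $pos));
}

function part_after(string $text, string $char): string
{
    $pos = strpos($text, $char);
    return ($pos === false) ? $text : trim(substr($text, $pos + strlen($char)));
}

// Field 3 holds the combined value; field 4 is a blank field in the feed.
$record = [1 => "baba1", 2 => "baba4", 3 => "baba2 - baba3", 4 => ""];

// Step 1: "Text After" on the blank field appends the expanded placeholder.
$record[4] .= expand_placeholders("%FIELD3%", $record);

// Step 2: "Split Before" on field 3 and "Split After" on field 4.
$record[3] = part_before($record[3], "-");
$record[4] = part_after($record[4], "-");

print $record[3] . " | " . $record[4] . "\n";
```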

These steps should enable you to bring in the separate values from the combined field. I appreciate that this is quite complex to set up, but this is an extremely unusual way of formatting explicit data and not one that any computer program can handle easily. It is certainly worth contacting the merchant in this instance and asking them to provide better formatted product data!

Hope this helps,
Cheers,
David.

Submitted by bwhelan1 on Mon, 2007-05-07 23:33

David,

Any suggestions for this feed's category field?

Example:

TEC000000FBC000276FBS000251COM043000
BUS053000COM021000COM021040
COM000000BUS096000
COM000000COM051000MAT015000

The categories are represented by the three alpha characters and the subcategories by the numeric characters (6 digits). Each additional section that the item resides in is indicated by an additional 9-character alphanumeric string. The top level category is always first. The first three alpha characters represent the category, where COM = Computers, MAT = Math, etc. They are all run together with no delimiter.
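As the sample rows show, each code is 3 uppercase letters followed by 6 digits, with no delimiter between codes, so a regular expression can pull them apart. A minimal sketch, assuming every code follows exactly that pattern; parse_category_codes() is a hypothetical helper, not a Price Tapestry function.

```php
<?php
// Split a packed category field into codes, assuming each code is
// 3 uppercase letters followed by 6 digits.
function parse_category_codes(string $field): array
{
    preg_match_all('/([A-Z]{3})(\d{6})/', $field, $matches, PREG_SET_ORDER);
    $codes = [];
    foreach ($matches as $m) {
        // $m[1] = category letters, $m[2] = subcategory digits
        $codes[] = ['category' => $m[1], 'subcategory' => $m[2]];
    }
    return $codes;
}

$codes = parse_category_codes("TEC000000FBC000276FBS000251COM043000");
foreach ($codes as $c) {
    print $c['category'] . " / " . $c['subcategory'] . "\n";
}
// The top-level category is the first code in the field.
print "Top level: " . $codes[0]['category'] . "\n";
```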

I want to keep all records whose category starts with COM and drop all other records; even if COM appears elsewhere in that field, I do not care unless it is the first three characters. Any ideas?

Bill Whelan

Computer Store

Submitted by support on Tue, 2007-05-08 09:01

Hi Bill,

The best way to handle this is with a new filter that drops records by matching a regular expression rather than just a simple string search. Here's the code - add this to includes/filter.php

<?php
  /*************************************************/
  /* dropRecordRegExp                              */
  /*************************************************/
  $filter_names["dropRecordRegExp"] = "Drop Record RegExp";

  function filter_dropRecordRegExpConfigure($filter_data)
  {
    print "Drop record if field matches regular expression:<br />";
    print "<input type='text' size='40' name='text' value='".widget_safe($filter_data["text"])."' />";
    widget_errorGet("text");
  }

  function filter_dropRecordRegExpValidate($filter_data)
  {
  }

  function filter_dropRecordRegExpExec($filter_data,$text)
  {
    global $filter_dropRecordFlag;
    // if an earlier filter has already flagged this record, pass the text through
    if ($filter_dropRecordFlag)
    {
      return $text;
    }
    else
    {
      // flag the record for dropping if the field matches the expression
      // (preg_match() needs "/" delimiters, so a literal "/" must be written as "\/")
      $filter_dropRecordFlag = (preg_match("/".$filter_data["text"]."/",$text) === 1);
    }
    return $text;
  }
?>

To use it in your example, register a new "Drop Record RegExp" filter against the category field, and in the box enter the following expression:

^COM

The ^ character anchors the match to the start of the field, so it will only drop the record if the category begins with COM.
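To check which of the sample category values ^COM matches (and therefore which records the Drop Record RegExp filter would drop), here is a quick standalone test using PHP's preg_match(), with the pattern wrapped in "/" delimiters.

```php
<?php
// Which sample category values does ^COM match? ("^" anchors the match to the
// start of the string.) Matching records are the ones the filter drops.
$categories = [
    "TEC000000FBC000276FBS000251COM043000",
    "BUS053000COM021000COM021040",
    "COM000000BUS096000",
    "COM000000COM051000MAT015000",
];

$matched = array_values(array_filter($categories, function ($category) {
    return preg_match('/^COM/', $category) === 1;
}));

foreach ($categories as $category) {
    print $category . (in_array($category, $matched, true) ? " => match\n" : " => no match\n");
}
```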

Hope this helps!
Cheers,
David.

Submitted by bwhelan1 on Thu, 2007-05-10 14:08

Is there a "Drop if Not" version of this filter? Thanks.

Bill Whelan

Computer Store

Submitted by support on Thu, 2007-05-10 14:20

Hi Bill,

The following should do the trick:

<?php
  /*************************************************/
  /* dropRecordIfNotRegExp                         */
  /*************************************************/
  $filter_names["dropRecordIfNotRegExp"] = "Drop Record If Not RegExp";

  function filter_dropRecordIfNotRegExpConfigure($filter_data)
  {
    print "Drop record if field does not match regular expression:<br />";
    print "<input type='text' size='40' name='text' value='".widget_safe($filter_data["text"])."' />";
    widget_errorGet("text");
  }

  function filter_dropRecordIfNotRegExpValidate($filter_data)
  {
  }

  function filter_dropRecordIfNotRegExpExec($filter_data,$text)
  {
    global $filter_dropRecordFlag;
    // if an earlier filter has already flagged this record, pass the text through
    if ($filter_dropRecordFlag)
    {
      return $text;
    }
    else
    {
      // flag the record for dropping if the field does NOT match the expression
      // (preg_match() needs "/" delimiters, so a literal "/" must be written as "\/")
      $filter_dropRecordFlag = (preg_match("/".$filter_data["text"]."/",$text) !== 1);
    }
    return $text;
  }
?>

Cheers,
David.

Submitted by bwhelan1 on Sat, 2007-07-07 03:40

Hi David,

I tried searching the forum for other options when using this filter (i.e. either this OR that in a given field) but only turned up this thread. Any suggestions?

Bill Whelan

Basketball Equipment

Submitted by support on Sat, 2007-07-07 06:32

Hello Bill,

With the regular expression check, you should be able to enter the following:

(this|that)

This will drop the record if the field does not contain this or that...
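A minimal standalone sketch of the keep/drop decision for (this|that), using PHP's preg_match(); drop_if_not() is a hypothetical helper that mirrors the Drop Record If Not RegExp logic, returning true when the record would be dropped.

```php
<?php
// Sketch of the "Drop Record If Not RegExp" decision using preg_match().
// drop_if_not() is a hypothetical helper: it returns true when the record
// would be dropped, i.e. when the field does not match the pattern.
function drop_if_not(string $pattern, string $field): bool
{
    // The pattern is wrapped in "/" delimiters; preg_match() returns 1 on a match.
    return preg_match('/' . $pattern . '/', $field) !== 1;
}

foreach (["this only", "that only", "neither word"] as $field) {
    print $field . " => " . (drop_if_not('(this|that)', $field) ? "dropped" : "kept") . "\n";
}
```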

Cheers,
David.

Submitted by bwhelan1 on Sat, 2007-07-07 19:11

Sorry David, my mistake...

I meant "this" AND "that" in a field. I want to have all "hockey jerseys" in a feed, where the words hockey and jersey might not appear together: they will be in the same field but may be separated by one or more words or characters.

Bill Whelan

Basketball Equipment

Submitted by support on Sat, 2007-07-07 19:49

Hi Bill,

Regular expressions can start to look complicated quite quickly! Consider how the above expression works. The pipe character ("|"), as you've probably determined, means logical "OR". Another expression you can use within a regular expression is ".*", where "." matches any character and "*" means repeated zero or more times, so it's basically a wildcard. Therefore, we can combine the wildcard with both orderings of this and that (this...that and that...this), joined with the OR operator...

(.*this.*that.*|.*that.*this.*)

That _should_ do the trick, and will drop a record if it does not contain "this" AND "that", in any order...!
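A quick standalone check of the combined expression with preg_match(): records containing both words, in either order, are kept, and records with only one of them are dropped. The sample strings are made up for illustration.

```php
<?php
// Check the "this AND that, in any order" expression with preg_match().
// The sample strings are made up; kept records are those that match.
$pattern = '/(.*this.*that.*|.*that.*this.*)/';
$fields = ["this and that", "that, then this", "only this", "only that"];

$kept = array_values(array_filter($fields, function ($field) use ($pattern) {
    return preg_match($pattern, $field) === 1;
}));

print implode("\n", $kept) . "\n";
```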

Cheers!
David.

Submitted by bwhelan1 on Sat, 2007-07-07 20:56

David,

That didn't work and I'm sure I didn't explain properly before.

I end up with the categories below when using the filter (nhl|jerseys), and the goal is to have nothing but hockey jerseys on the site:

Apparel/Footwear -> Jerseys
MLB FanShop -> Autographed Memorabilia -> Autographed Items -> Jerseys Autographed
MLB FanShop -> Jerseys
NASCAR FanShop -> Jerseys
NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Hockey Pucks Autographed
NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Hockey Sticks Autographed
NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Jerseys Autographed
NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Other Autographed Items
NHL FanShop -> Jerseys

How can I filter out everything that doesn't have both NHL and Jerseys in it? In other words, to have the above categories trimmed to only:

NHL FanShop -> Jerseys
NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Jerseys Autographed

Bill Whelan

Basketball Equipment

Submitted by support on Sun, 2007-07-08 06:48

Hi Bill,

Based on that structure, try the following - no brackets or "OR" character this time:

.*NHL.*Jerseys.*
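Running Bill's category list through .*NHL.*Jerseys.* with preg_match() should leave exactly the two categories he wants. A standalone sketch; note that preg_match() without the "i" modifier is case-sensitive, so this assumes the category text uses "NHL" and "Jerseys" with this capitalisation.

```php
<?php
// Apply .*NHL.*Jerseys.* (case-sensitive, no "i" modifier) to Bill's list;
// categories that match are the ones "Drop Record If Not RegExp" keeps.
$categories = [
    "Apparel/Footwear -> Jerseys",
    "MLB FanShop -> Autographed Memorabilia -> Autographed Items -> Jerseys Autographed",
    "MLB FanShop -> Jerseys",
    "NASCAR FanShop -> Jerseys",
    "NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Hockey Pucks Autographed",
    "NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Hockey Sticks Autographed",
    "NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Jerseys Autographed",
    "NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Other Autographed Items",
    "NHL FanShop -> Jerseys",
];

$kept = array_values(array_filter($categories, function ($category) {
    return preg_match('/.*NHL.*Jerseys.*/', $category) === 1;
}));

print implode("\n", $kept) . "\n";
```

For an unanchored match like this, the leading and trailing .* make no difference: NHL.*Jerseys selects the same categories.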

Cheers,
David.

Submitted by bwhelan1 on Sun, 2007-07-08 16:54

I tried that one and just about every other variation I could think of, and nothing imported.

Bill Whelan

Basketball Equipment

Submitted by support on Sun, 2007-07-08 18:51

Hi Bill,

Could you run the test code below on your server? This should print "Drop Record Flag: False", to indicate that the record with the category seen in the code will be imported. If this works, I would re-register the feed to double check that the correct field is registered as the category, and if it still doesn't work I would study the feed manually to verify that it does contain the categories that you are expecting. Feel free to email me a link to the feed to download if you're not able to do this on your server...

testfilter.php

<?php
  function filter_dropRecordIfNotRegExpExec($filter_data,$text)
  {
    global $filter_dropRecordFlag;
    // if an earlier filter has already flagged this record, pass the text through
    if ($filter_dropRecordFlag)
    {
      return $text;
    }
    else
    {
      // case-insensitive match (the "i" modifier); flag for dropping if there is no match
      $filter_dropRecordFlag = (preg_match("/".$filter_data["text"]."/i",$text) !== 1);
    }
    return $text;
  }

  $filter_dropRecordFlag = false;

  // this is what you would enter into the filter configuration page
  $filter_data["text"] = ".*NHL.*Jerseys.*";

  // this is what will be in the category field
  $text = "NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Jerseys Autographed";

  filter_dropRecordIfNotRegExpExec($filter_data,$text);
  print "Drop Record Flag: ".($filter_dropRecordFlag?"True":"False");
?>

Cheers,
David.

Submitted by bwhelan1 on Sun, 2007-07-08 19:03

David,

I get the following when adding the above filter:

Fatal error: Cannot redeclare filter_droprecordifnotregexpexec() (previously declared in /home/mydomain/public_html/includes/filter.php:339) in /home/mydomain/public_html/includes/filter.php on line 399

Bill Whelan

Basketball Equipment

Submitted by support on Sun, 2007-07-08 19:05

Sorry Bill - the above is a standalone test script - just save it as something like testfilter.php and run it on its own...

Cheers,
David.

Submitted by bwhelan1 on Sun, 2007-07-08 19:55

David,

I get: Drop Record Flag: False

Bill Whelan

Basketball Equipment

Submitted by support on Sun, 2007-07-08 20:09

Hi Bill,

That means that it is working as expected. I've replied to your latest email with some more thoughts.

Cheers,
David.

Submitted by bwhelan1 on Sun, 2007-07-08 22:45

David,

I noticed that your result when applying the filter .*NHL.*Jerseys.* appears as:


NHL FanShop Autographed Memorabilia Autographed Items Jerseys Autographed

NHL FanShop Jerseys

Where the desired result for me would be:


NHL FanShop -> Autographed Memorabilia -> Autographed Items -> Jerseys Autographed

NHL FanShop -> Jerseys

The reason is that I have commented out the "normalise" function for "category" to leave the character(s) that trigger the Split Before filter.

Could this be my problem and if so what would you suggest?

Bill Whelan

Basketball Equipment

Submitted by support on Mon, 2007-07-09 06:26

Hi Bill,

The normalise function won't make any difference as the characters that are being looked for by the regular expression are not affected. What might change things however is the Split Before filter. This must be applied after the "Drop Record If Not RegExp", because the filters work in series, so the input to the next filter is the output from the previous one...
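To illustrate why the order matters, here is a standalone sketch assuming the Split Before filter splits the category on "->" (the actual split character in Bill's setup isn't stated). split_before_filter() and drop_if_not_filter() are hypothetical stand-ins for the two filters.

```php
<?php
// Why filter order matters. Hypothetical helpers; the "->" split character
// is an assumption.
function split_before_filter(string $text, string $char): string
{
    $pos = strpos($text, $char);
    return ($pos === false) ? $text : trim(substr($text, 0, $pos));
}

// Returns true when "Drop Record If Not RegExp" would drop the record.
function drop_if_not_filter(string $pattern, string $text): bool
{
    return preg_match('/' . $pattern . '/', $text) !== 1;
}

$category = "NHL FanShop -> Jerseys";
$pattern = '.*NHL.*Jerseys.*';

// Order A: drop check first, then Split Before.
$dropA = drop_if_not_filter($pattern, $category);
$resultA = split_before_filter($category, "->");

// Order B: Split Before first, so the drop check only sees "NHL FanShop".
$splitB = split_before_filter($category, "->");
$dropB = drop_if_not_filter($pattern, $splitB);

print "A: " . ($dropA ? "dropped" : "kept as '$resultA'") . "\n";
print "B: " . ($dropB ? "dropped" : "kept as '$splitB'") . "\n";
```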

Cheers,
David.

Submitted by bwhelan1 on Mon, 2007-07-09 16:46

David,

I would be applying the Split Before at the very end. The problem I am having occurs with only the filter in question applied and nothing else.

Bill Whelan

Basketball Equipment

Submitted by support on Mon, 2007-07-09 21:08

Hi Bill,

I'll follow up by email with some more ideas and tests for you to try.

Cheers,
David.