@mattab opened this Issue on April 18th 2011 Member

The Tracking API is used more and more. A new use case has surfaced: when setForceVisitDateTime() is used to record a visit at a date in the past, and Piwik reports have already been processed for that date (and for the week/month/year containing it), the reports are never re-processed. This causes discrepancies.

The main use cases for this problem are:

  • Log import use case #703
  • Paypal IPN Tracking use case, recording a conversion that happened a few hours ago #2222
  • or any kind of "after the fact" tracking

We need a mechanism so that visits/pages inserted in the past 'flush' the affected past reports, which will then be re-processed at the next archiving run.

The challenge is to make this efficient.

A proposal for this:

  • in the Tracker, if the request being tracked is in the past (before today's midnight), some existing reports become out of date
    • when this is the case, we store in the Option table all the unique "days" that have been tracked in the past
  • Now, when a report is requested via the API, we would check whether any dates in the past were recently loaded
  • if there are such dates, loop over them and execute the following query on all archive tables:
```
DELETE FROM archive WHERE date1 <= '$date' AND '$date' <= date2
```
    • this should be optimized to run only on the archive tables that may contain such records (it is not good to loop over ALL archive tables for ALL dates to delete)
    • it might perform better to run only one query per archive table, deleting the records for all of these past dates at once
    • once done, reset the flag. If deleting one day at a time, delete the flag for that day once it has been processed.
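
To make the "one query per archive table" idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not Piwik code: the `archive` table layout, the `name` column and the `invalidate_archives` helper are assumptions made for illustration, and only the `date1 <= date AND date <= date2` condition comes from the query above.

```python
import sqlite3

def invalidate_archives(conn, table, dates):
    """Delete every row of `table` whose [date1, date2] period contains one of `dates`.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    if not dates:
        return 0
    # One "date1 <= ? AND ? <= date2" clause per date, OR-ed into a single DELETE.
    clause = " OR ".join(["(date1 <= ? AND ? <= date2)"] * len(dates))
    params = [value for date in dates for value in (date, date)]
    cur = conn.execute(f"DELETE FROM {table} WHERE {clause}", params)
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified archive table; dates are ISO strings so they compare correctly.
conn.execute("CREATE TABLE archive (name TEXT, date1 TEXT, date2 TEXT)")
conn.executemany("INSERT INTO archive VALUES (?, ?, ?)", [
    ("day",       "2011-04-12", "2011-04-12"),
    ("week",      "2011-04-11", "2011-04-17"),
    ("month",     "2011-04-01", "2011-04-30"),
    ("year",      "2011-01-01", "2011-12-31"),
    ("other_day", "2011-04-13", "2011-04-13"),
    ("march",     "2011-03-01", "2011-03-31"),
])

# Two past days received new visits; all periods containing them are flushed at once.
deleted = invalidate_archives(conn, "archive", ["2011-04-12", "2011-03-05"])
remaining = [row[0] for row in conn.execute("SELECT name FROM archive ORDER BY name")]
```

In this sketch, only the archive for 2011-04-13 survives, since every other period contains one of the two dates. With very long date lists the statement could hit the database's parameter limit, so a real implementation would probably need to chunk the dates.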

  • it is important that Piwik keeps working when the Apache log import script is running while archiving is also running: be careful when handling the 'dates' flag, so that information saved by the Tracker during a log import is not lost while archiving is reading/updating the same flag
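
One possible way to avoid losing days while the Tracker and the archiver touch the flag at the same time is sketched below in Python with SQLite. This is only an illustration under assumptions: it stores one row per day in a hypothetical `days_to_invalidate` table instead of a single value in the Option table, and all function names are made up. The archiver clears only the days it actually read, so days flagged in the meantime are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per past day that received new tracking data.
conn.execute("CREATE TABLE days_to_invalidate (day TEXT PRIMARY KEY)")

def flag_past_day(conn, day):
    """Tracker side: remember that `day` has new data (flagging twice is harmless)."""
    conn.execute("INSERT OR IGNORE INTO days_to_invalidate (day) VALUES (?)", (day,))
    conn.commit()

def read_flagged_days(conn):
    """Archiver side: take a snapshot of the days currently flagged."""
    rows = conn.execute("SELECT day FROM days_to_invalidate ORDER BY day")
    return [row[0] for row in rows]

def clear_flagged_days(conn, days):
    """Archiver side: remove only the days that were actually handled."""
    conn.executemany("DELETE FROM days_to_invalidate WHERE day = ?",
                     [(day,) for day in days])
    conn.commit()

flag_past_day(conn, "2011-04-10")
flag_past_day(conn, "2011-04-10")      # same day again, ignored
flag_past_day(conn, "2011-04-12")

snapshot = read_flagged_days(conn)     # archiver starts working on these days
flag_past_day(conn, "2011-04-15")      # the log import continues in the meantime
clear_flagged_days(conn, snapshot)     # archiver is done with its snapshot

left = read_flagged_days(conn)         # the day flagged during archiving is still there
```

This does not cover every race (for example, a visit recorded for a day that is in the snapshot while that day is being processed), so the archive deletion and the flag removal would probably need to happen in a well-defined order or inside a transaction.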

I think this will work while not adding much overhead?
@mattab commented on April 23rd 2011 Member

There is a potential issue with this functionality if logs have been deleted, as per #5 (see #5425)

@mattab commented on September 7th 2011 Member

Dupe of #2584

This Issue was closed on September 7th 2011