Considering pages when caching a JSON response

Social networks work by presenting a feed and allowing the user to interact with its content in (hopefully) novel and interesting ways that the user derives usefulness (joy?) from. The presentation of the feed varies greatly from social network to social network, but the underlying model is often very similar to:

Pagination Model

The Feed is the total set of information. If you are using a RESTful API, this will often take the form of an endpoint URL: https://greatsocialnetwork.com/api/v2/news

The Page is a subset of the information contained within the feed and will often be the actual response from the endpoint.

The Post is the individual piece of content that the user will actually see and is contained within the Page.

Coupled with this common data model is the concept of caching. Caching is where we locally store data to allow for quicker retrieval; we can cache the result of calculations or a duplicate of data stored elsewhere, such as on a server. The trade-off associated with caching is that we gain quicker retrieval time at the expense of accuracy (or up-to-dateness). Depending on your data set, this will determine how long you can store and present potentially out-of-date data to the user. In the example app that we are going to build, we don't care much about the accuracy of individual posts, but we do care that we present a full set of pages and that those pages are in the correct order without gaps.

Stacking the questions

I'm going to use Stack Overflow as the example of a social network, as its API follows the model described above (and you don't have to register to start using it 😃). Let's look at the questions endpoint, the URL for this is https://api.stackexchange.com/2.2/questions?order=desc&sort=creation&site=stackoverflow - the response from this endpoint contains an array of questions (described as items) such as:

{  
   "items":[  
      {  
         "tags":[  
            "ios"
         ],
         "owner":{  
            "reputation":55,
            "user_id":5024892,
            "user_type":"registered",
            "accept_rate":42,
            "profile_image":"https://www.gravatar.com/avatar/1ed5d5eae3802a3f8ce37d09233505ec?s=128&d=identicon&r=PG&f=1",
            "display_name":"TaroYuki",
            "link":"http://stackoverflow.com/users/5024892/taroyuki"
         },
         "is_answered":true,
         "view_count":28,
         "answer_count":1,
         "score":-1,
         "last_activity_date":1455221065,
         "creation_date":1455220954,
         "question_id":35348969,
         "link":"http://stackoverflow.com/questions/35348969/convert-to-double-failed-if-the-value-is-null-or-empty",
         "title":"Convert to double failed if the value is null or empty"
      }
   ],
   "has_more":true,
   "quota_max":300,
   "quota_remaining":296
}

Paginating on this endpoint is as simple as adding &page=n where n is whichever page you are wanting to retrieve.
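As a rough sketch of that, the helper below builds the URL for a given page using NSURLComponents. The function name QuestionsURLForPage is hypothetical and not part of the example project; the sketch also assumes that page numbers start at 1, which is how the Stack Exchange API documentation describes paging.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the example project): returns the URL for
// the given page of the questions endpoint by adding a `page` query item.
static NSURL *QuestionsURLForPage(NSUInteger page)
{
    NSURLComponents *components = [NSURLComponents componentsWithString:@"https://api.stackexchange.com/2.2/questions"];

    components.queryItems = @[
        [NSURLQueryItem queryItemWithName:@"order" value:@"desc"],
        [NSURLQueryItem queryItemWithName:@"sort" value:@"creation"],
        [NSURLQueryItem queryItemWithName:@"site" value:@"stackoverflow"],
        [NSURLQueryItem queryItemWithName:@"page" value:[NSString stringWithFormat:@"%lu", (unsigned long)page]]
    ];

    return components.URL;
}
```

Calling QuestionsURLForPage(2) gives the endpoint URL from above with &page=2 appended.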

Armed with this API, data model and caching priority, we will build a system that builds up a paginated data set, focused on ensuring that the user isn't presented with gaps in that data set. If a gap is spotted, the app should delete every page other than the first/top page and retrieve fresh pages in sequence - the delete should happen at the earliest possible moment. With this requirement in mind, we will parse this JSON response into a set of NSManagedObject subclasses and present these in a tableview using an NSFetchedResultsController. The app will request the first/top page of the feed when the user opens the app, and the next page of data when the user reaches the end of the cached pages by scrolling the tableview to its end.

The example below works for a chronologically ordered set of data.

Modeling

The first thing to consider is that we need a way to determine if we have any gaps in our pages. Following the data model described above, I propose that we build the following model:

  • a Feed class that will store a set of pages
  • a Page class that will hold a reference to the feed it's in and also a set of questions
  • a Question class that will hold a reference to its page

Ok, so we have a basic data model, but this doesn't allow us to identify if we have gaps in our cached pages. In order to support this we need to add more state to both the Feed and Page classes. Let's add an arePagesInSequence boolean property to the Feed that will allow us to query if the feed contains pages that are in sequence or if it has gaps in its model. The arePagesInSequence property will be a calculated value that we determine by examining the responses that we retrieve from API requests. When paginating we don't care if the data is in sequence or not: we are adding new pages of data to the end of our feed, driven by the user's downward scroll on the tableview, so we are building on what we already have and shouldn't introduce any gaps. We only need to concern ourselves with determining if the pages are in sequence when we request the first/top-most page.

In order to do this we need to examine the pages of Questions that are returned to us; each Question has its own unique ID. Using this information, we know that if we get back a page that contains a question that we already have then the pages are in sequence, and if we don't then we need to treat our feed as being out of sequence (it's possible to get back a page of unique IDs while the feed is actually still in sequence; however, for simplicity we will ignore this scenario in this example). To check this we can compare whether a page has the same number of questions after the JSON has been parsed as were present in the JSON response. Let's store this value in a fullPage boolean property on the Page; we need this property because the last page can contain fewer questions than the maximum page size. After parsing each page we can then query fullPage on the newly parsed page and set the arePagesInSequence property on the feed instance.
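To make that check concrete, here is a minimal sketch that works on plain Foundation objects rather than Core Data. The function names IsFullPage and ArePagesInSequence are hypothetical, and it assumes we already have the set of cached question IDs to hand; the example project instead does this check while parsing, as we will see later.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not taken from the example project.
// Returns YES when none of the questions in the response are already cached,
// which mirrors the meaning of the fullPage property described above.
static BOOL IsFullPage(NSArray<NSDictionary *> *items, NSSet<NSNumber *> *cachedQuestionIDs)
{
    for (NSDictionary *item in items)
    {
        NSNumber *questionID = item[@"question_id"];

        if (questionID != nil && [cachedQuestionIDs containsObject:questionID])
        {
            // We already hold this question, so the new page overlaps the cache.
            return NO;
        }
    }

    return YES;
}

// After a refresh, an overlap means the new top page joins up with the cached
// pages; a page made up entirely of new questions is treated as a gap.
static BOOL ArePagesInSequence(NSArray<NSDictionary *> *items, NSSet<NSNumber *> *cachedQuestionIDs)
{
    return !IsFullPage(items, cachedQuestionIDs);
}
```

With cached IDs {1, 2}, a response containing questions 3 and 2 overlaps the cache and so is in sequence, while a response containing only questions 9 and 8 is treated as out of sequence.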

Next we need to consider how to store the URL for the next page in the pagination sequence. If we store it in the feed, we will need to store not only the next-in-sequence URL but also the end-of-feed URL. You may be thinking 'but don't we delete out-of-sequence pages?', and while this is true, we only want to delete the out-of-sequence pages when the user is not interacting with any of those pages, i.e. when the user is on the first/top page of questions. So it's entirely possible (and very much probable) that the app will need to store the URL for both the out-of-sequence pages and the in-sequence pages. This has the potential to be a headache! But if we think about our pages as a linked list, we can see that it shouldn't be the Feed that stores the next URL but rather the Page. This way, when we request the next page of data, it's just a case of passing in the preceding page and asking it for the next URL - we don't care if that request is filling a gap or adding a page to the end/bottom of the feed. Let's add a nextHref property to our Page class.
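The linked-list idea can be sketched in a few lines. ExamplePage below is a plain stand-in for the Page model so that the snippet runs without Core Data, and URLForPageFollowing is a hypothetical helper rather than anything from the example project.

```objc
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

// Stand-in for the Page model; a plain object so the sketch runs without Core Data.
@interface ExamplePage : NSObject

@property (nonatomic, copy, nullable) NSString *nextHref;

@end

@implementation ExamplePage
@end

// Hypothetical helper: asks the preceding page for the URL of the page that
// follows it, falling back to the first-page URL when there is no page yet.
static NSURL * _Nullable URLForPageFollowing(ExamplePage * _Nullable page, NSString *firstPageURLString)
{
    NSString *urlString = page.nextHref ?: firstPageURLString;

    return [NSURL URLWithString:urlString];
}

NS_ASSUME_NONNULL_END
```

The caller doesn't need to know whether the page it passes in sits before a gap or at the bottom of the feed; each page only knows about the page that follows it.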

Enough chat, let's see some code

Feed:

NS_ASSUME_NONNULL_BEGIN

extern NSString * kPTEBaseURLString;

@interface PTEFeed : NSManagedObject

@property (nonatomic, strong, readonly) NSArray *orderedPages;

+ (PTEFeed *)questionFeed;
+ (PTEFeed *)questionFeedWithManagedObjectContext:(NSManagedObjectContext *)managedObjectContext;

@end

NS_ASSUME_NONNULL_END

#import "PTEFeed+CoreDataProperties.h"

PTEFeed+CoreDataProperties:

NS_ASSUME_NONNULL_BEGIN

@interface PTEFeed (CoreDataProperties)

@property (nullable, nonatomic, retain) NSNumber *arePagesInSequence;
@property (nullable, nonatomic, retain) NSSet *pages;

@end

@interface PTEFeed (CoreDataGeneratedAccessors)

- (void)addPagesObject:(PTEPage *)value;
- (void)removePagesObject:(PTEPage *)value;
- (void)addPages:(NSSet *)values;
- (void)removePages:(NSSet *)values;

@end

NS_ASSUME_NONNULL_END

Here we are using the newer approach of splitting NSManagedObject subclasses into one concrete class (PTEFeed) that the developer can add to and one category (PTEFeed+CoreDataProperties) that is generated for us. We have the arePagesInSequence property we spoke about above, and the class also contains a few convenience methods for retrieving one particular instance of a feed and an ordered array of pages.
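The implementation of orderedPages isn't shown in this post. One plausible way to produce it, assuming each page carries a numeric index (the Page class shown later does), is to sort the unordered pages set with a sort descriptor. OrderedPagesFromSet is a hypothetical name, and plain dictionaries stand in for pages so that the sketch runs without Core Data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sketch: sorts anything exposing an `index` key (here plain
// dictionaries stand in for pages) into ascending index order.
static NSArray *OrderedPagesFromSet(NSSet *pages)
{
    NSSortDescriptor *indexSort = [NSSortDescriptor sortDescriptorWithKey:@"index" ascending:YES];

    return [pages sortedArrayUsingDescriptors:@[indexSort]];
}
```

The actual implementation in the example repo may well differ, for instance by using a fetch request instead of sorting in memory.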

Page:

NS_ASSUME_NONNULL_BEGIN

@interface PTEPage : NSManagedObject

// Insert code here to declare functionality of your managed object subclass

@end

NS_ASSUME_NONNULL_END

#import "PTEPage+CoreDataProperties.h"

PTEPage+CoreDataProperties:

NS_ASSUME_NONNULL_BEGIN

@interface PTEPage (CoreDataProperties)

@property (nullable, nonatomic, retain) NSDate *createdDate;
@property (nullable, nonatomic, retain) NSString *nextHref;
@property (nullable, nonatomic, retain) NSNumber *index;
@property (nullable, nonatomic, retain) NSNumber *fullPage;
@property (nullable, nonatomic, retain) NSSet *questions;
@property (nullable, nonatomic, retain) PTEFeed *feed;

@end

@interface PTEPage (CoreDataGeneratedAccessors)

- (void)addQuestionsObject:(PTEQuestion *)value;
- (void)removeQuestionsObject:(PTEQuestion *)value;
- (void)addQuestions:(NSSet *)values;
- (void)removeQuestions:(NSSet *)values;

@end

NS_ASSUME_NONNULL_END

Both nextHref and fullPage are present, as discussed above.

Question:

NS_ASSUME_NONNULL_BEGIN

@interface PTEQuestion : NSManagedObject

// Insert code here to declare functionality of your managed object subclass

@end

NS_ASSUME_NONNULL_END

#import "PTEQuestion+CoreDataProperties.h"

PTEQuestion+CoreDataProperties:

NS_ASSUME_NONNULL_BEGIN

@interface PTEQuestion (CoreDataProperties)

@property (nullable, nonatomic, retain) NSString *title;
@property (nullable, nonatomic, retain) NSString *author;
@property (nullable, nonatomic, retain) NSDate *createdDate;
@property (nullable, nonatomic, retain) NSNumber *index;
@property (nullable, nonatomic, retain) NSNumber *questionID;
@property (nullable, nonatomic, retain) PTEPage *page;

@end

NS_ASSUME_NONNULL_END

Nothing much here related to our pagination approach other than that questions are associated with a page.

Ok, so that's our data model complete - let's look at how we retrieve the JSON that we cache and how we populate the properties declared in our model classes.

Retrieving

To retrieve our JSON response we will work with three groups of classes:

  • a QuestionsAPIManager class that will hide the details of which API call is being made and how that API call is made.
  • a PTEQuestionsRetrievalOperation class that will handle processing the JSON response on a background thread.
  • a PTEQuestionParser class that will actually parse the JSON response.

By abstracting the actual API call behind QuestionsAPIManager's interface, we can handle configuring the URL that will be used without the ViewController needing to care about the details; instead, the ViewController only passes in the feed and whether it wants a refresh.

+ (void)retrievalQuestionsForFeed:(PTEFeed *)feed
                          refresh:(BOOL)refresh
                       completion:(void(^)(BOOL successful))completion
{
    NSURLSession *session = [NSURLSession sharedSession];

    NSURL *url = nil;

    // A refresh always requests the first/top page; otherwise we continue
    // from the last cached page by following its nextHref.
    if (!refresh && feed.pages.count > 0)
    {
        PTEPage *page = [feed.orderedPages lastObject];
        url = [NSURL URLWithString:page.nextHref];
    }
    else
    {
        url = [NSURL URLWithString:kPTEBaseURLString];
    }

    NSManagedObjectID *feedObjectID = feed.objectID;

    NSURLSessionDataTask *task = [session dataTaskWithURL:url
                                        completionHandler:^(NSData * _Nullable data, NSURLResponse * _Nullable response, NSError * _Nullable error)
                                  {
                                      dispatch_async(dispatch_get_main_queue(), ^
                                                     {
                                                         PTEQuestionsRetrievalOperation *operation = [[PTEQuestionsRetrievalOperation alloc] initWithFeedID:feedObjectID
                                                                                                                                                       data:data
                                                                                                                                                    refresh:refresh
                                                                                                                                                 completion:completion];

                                                         [[PTEQueueManager sharedInstance].queue addOperation:operation];
                                                     });
                                  }];

    [task resume];
}

In the above method we pass in a refresh parameter to trigger either a refresh (first/top page) or pagination request. We then make the actual API call and pass its response on to the operation to be processed. An interesting aside is that we switch onto the main queue/thread before creating and scheduling the operation. This is because the operation records the queue it was created on and later executes the completion block that we pass through on that same queue, which we will see more clearly in the operation itself.

The operation handles serializing the NSData returned into an NSDictionary, triggering the parsing of that serialized NSDictionary and updating the parsed Page's properties within the context of the Feed.

@interface PTEQuestionsRetrievalOperation ()

@property (nonatomic, strong) NSManagedObjectID *feedID;
@property (nonatomic, strong) NSData *data;
@property (nonatomic, copy) void (^completion)(BOOL successful);
@property (nonatomic, assign) BOOL refresh;

@property (nonatomic, strong) NSOperationQueue *callBackQueue;

- (NSNumber *)indexOfNewPageInFeed:(PTEFeed *)feed;
- (void)reorderIndexInFeed:(PTEFeed *)feed;

@end

@implementation PTEQuestionsRetrievalOperation

#pragma mark - Init

- (instancetype)initWithFeedID:(NSManagedObjectID *)feedID
                          data:(NSData *)data
                       refresh:(BOOL)refresh
                    completion:(void(^)(BOOL successful))completion
{
    self = [super init];

    if (self)
    {
        self.feedID = feedID;
        self.data = data;
        self.completion = completion;
        self.callBackQueue = [NSOperationQueue currentQueue];
        self.refresh = refresh;
    }

    return self;
}

#pragma mark - Main

- (void)main
{
    [super main];

    NSError *serializationError = nil;

    NSDictionary *jsonResponse = [NSJSONSerialization JSONObjectWithData:self.data
                                                                 options:NSJSONReadingMutableContainers
                                                                   error:&serializationError];

    if (serializationError)
    {
        [self.callBackQueue addOperationWithBlock:^
         {
             if (self.completion)
             {
                 self.completion(NO);
             }
         }];
    }
    else
    {
        [[CDSServiceManager sharedInstance].backgroundManagedObjectContext performBlockAndWait:^
        {
            PTEQuestionParser *parser = [[PTEQuestionParser alloc] init];
            PTEPage *page = [parser parseQuestions:jsonResponse];

            PTEFeed *feed = [[CDSServiceManager sharedInstance].backgroundManagedObjectContext existingObjectWithID:self.feedID
                                                                                                              error:nil];

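            // The new page has not been added to feed.pages yet and the API's pages are 1-based,
            // so when paginating the page that follows this one is count + 2.
            page.nextHref = [NSString stringWithFormat:@"%@&page=%@", kPTEBaseURLString, @(feed.pages.count + 2)];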
            page.index = [self indexOfNewPageInFeed:feed];

            [self reorderIndexInFeed:feed];

            if (self.refresh)
            {
                feed.arePagesInSequence = @(!page.fullPage.boolValue);
            }

            [feed addPagesObject:page];

            /*----------------*/

            [[CDSServiceManager sharedInstance] saveBackgroundManagedObjectContext];
        }];

        /*----------------*/

        [self.callBackQueue addOperationWithBlock:^
         {
             if (self.completion)
             {
                 self.completion(YES);
             }
         }];
    }
}

#pragma mark - PageIndex

- (void)reorderIndexInFeed:(PTEFeed *)feed
{
    NSArray *pages = feed.orderedPages;

    for (NSUInteger index = 0; index < pages.count; index++)
    {
        PTEPage *page = pages[index];
        page.index = @(index);
    }
}

- (NSNumber *)indexOfNewPageInFeed:(PTEFeed *)feed
{
    NSNumber *indexOfNewPage = nil;

    if (self.refresh)
    {
        indexOfNewPage = @(-1);
    }
    else
    {
        indexOfNewPage = @(feed.pages.count);
    }

    return indexOfNewPage;
}

@end

In the above class we determine the next page URL by using the count of the pages that we have cached, and we determine if the feed's pages are in sequence.

The parser itself is pretty standard.

@implementation PTEQuestionParser

#pragma mark - Parse

- (PTEPage *)parseQuestions:(NSDictionary *)questionsRetrievalReponse
{
    PTEPage *page = [NSEntityDescription cds_insertNewObjectForEntityForClass:[PTEPage class]
                                                       inManagedObjectContext:[CDSServiceManager sharedInstance].backgroundManagedObjectContext];

    NSArray *questionsReponse = questionsRetrievalReponse[@"items"];

    for (NSUInteger index = 0; index < questionsReponse.count; index++)
    {
        NSDictionary *questionResponse = questionsReponse[index];

        PTEQuestion *question = [self parseQuestion:questionResponse];
        question.index = @(index);

        if (!question.page)
        {
            [page addQuestionsObject:question];
        }
        else
        {
            page.fullPage = @(NO);
        }
    }

    return page;
}

- (PTEQuestion *)parseQuestion:(NSDictionary *)questionReponse
{
    NSUInteger questionID = [questionReponse[@"question_id"] unsignedIntegerValue];
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"questionID == %@", @(questionID)];

    PTEQuestion *question = (PTEQuestion *)[[CDSServiceManager sharedInstance].backgroundManagedObjectContext cds_retrieveFirstEntryForEntityClass:[PTEQuestion class]
                                                                                                                                         predicate:predicate];

    if (!question)
    {
        question = [NSEntityDescription cds_insertNewObjectForEntityForClass:[PTEQuestion class]
                                                      inManagedObjectContext:[CDSServiceManager sharedInstance].backgroundManagedObjectContext];

        question.questionID = @(questionID);
    }

    question.title = questionReponse[@"title"];

    NSDictionary *ownerResponse = questionReponse[@"owner"];

    question.author = ownerResponse[@"display_name"];

    return question;
}

What's important to note for our pagination approach is:

if (!question.page)
{
    [page addQuestionsObject:question];
}
else
{
    page.fullPage = @(NO);
}

Here we work out if the page is a full page or not by checking if the question already exists in our cached data - fullPage defaults to YES.

Tidying up after ourselves

So we have now seen how we retrieve, parse and store data, but we still need to look at how we delete pages when the feed goes out of sequence. For this we want to trigger the delete action when the user has scrolled onto the first page of questions and can no longer see the out-of-sequence pages. Thankfully, as we are using a tableview to present our questions, we can implement an optional UITableViewDelegate method to control this.

- (void)tableView:(UITableView *)tableView didEndDisplayingCell:(UITableViewCell *)cell forRowAtIndexPath:(NSIndexPath *)indexPath
{
    if (self.fetchedResultsController.fetchedObjects.count > indexPath.row)
    {
        if(!self.feed.arePagesInSequence.boolValue)
        {
            PTEDeleteOutOfSyncQuestionPagesOperation *operation = [[PTEDeleteOutOfSyncQuestionPagesOperation alloc] init];

            [[PTEQueueManager sharedInstance].queue addOperation:operation];
        }
    }
}

In the above method we check if the feed's pages are in sequence and, if they are not, trigger the deletion of the older pages.
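The PTEDeleteOutOfSyncQuestionPagesOperation itself isn't shown in this post. As a rough illustration of the selection step only, based on the earlier requirement that everything other than the first/top page is removed, it might pick its pages along these lines. PagesToDelete is a hypothetical name and the real operation in the repo may differ.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sketch: given the feed's pages in display order, returns every
// page except the first/top one, i.e. the pages that would be deleted.
static NSArray *PagesToDelete(NSArray *orderedPages)
{
    if (orderedPages.count <= 1)
    {
        // Nothing to delete when we only hold the top page (or no pages at all).
        return @[];
    }

    return [orderedPages subarrayWithRange:NSMakeRange(1, orderedPages.count - 1)];
}
```

In the real app, each of the returned pages would then be deleted from the managed object context and the context saved.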

And that is our pagination approach completed, phew! I know I included a lot of code in this post, but there is actually a lot more in the example repo that goes with this post:

https://github.com/wibosco/PaginationCoreData-Example

I hope that you enjoyed this post and as ever if you have any questions or comments please get in touch.