Skip to content

Conversation

@samuel02
Copy link
Contributor

This is a simple implementation with support for pagination. I have added a new model (Metadata) as well as a mapper. When any of the repository loaders are used the Metadata mapper will be used to parse and instantiate an instance of the Metadata model and store it on the repository. The model has three attributes; total_resources, total_pages current_page. This will then be used to paginate through the result set.

I wasn't sure how to test the pagination itself in a proper way, so any input on that would be nice!

I also noted that dry-types and dry-structs should be updated but there are some breaking changes that I didn't have time to look at.

Any feedback is welcome, I really need this feature!

Copy link
Member

@d-Pixie d-Pixie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @samuel02, welcome!

Thanks for the PR 😃. I might be wrong about the helper methods in the Pagination module not being needed anymore after my proposed refactoring, but I didn't see them used anywhere else.

It should also be noted that this might be a VERY expensive call, depending on the number of pages and the speed at which Fortnox returns the results. It should probably print a warning message in the Rails log so that anyone using this is made aware. Might be a good idea to mention it in the README as well.

Welcome to the community. If you want to leave an email we can invite you to our Slack channels for the gem. I'll ask @ehannes to have a look at your tests.

Sorry for the delay, it's GDPR days right now so it's a bit busy 🙂

spec.required_ruby_version = '>= 2.3'
spec.add_dependency 'dry-struct', '~> 0.1'
spec.add_dependency 'dry-types', '~> 0.8'
spec.add_dependency 'dry-types', '~> 0.8', '< 0.13.0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean < here or <=? This will keep it to the latest 0.12.x, unclear if that was the intention :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually meant <.

0.13.0 introduces breaking changes and I didn't want the upgrade to be part of this PR. See https://github.com/dry-rb/dry-types/blob/master/CHANGELOG.md#v0130-2018-05-03

module API
module Model
class Metadata < Model::Base
STUB = {}.freeze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stub must contain all the required attributes. The stub is the minimal acceptable instance.

instantiate_collection_response(response_hash)
results = []

loop do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this might be a little clearer:

# In the pagination module
def pages
  (1..total_pages ).to_a # Returns an array with [1,2,...,n] where n == total_pages
end

# In this file
def all
  pages.each_with_object([]) do |page_number, results|
    response_hash = get(self.class::URI, query: { page: page_number })
    results.push( *instantiate_collection_response( response_hash ))
  end
end

That way we don't have to do bounds checking, so we can drop last_page? and get the page number for free, so we can drop next_page. And we get a cleaner method without temporary variables 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. I like that way that looks but that doesn't solve for the inital request to get the number of total pages.. I guess we could do one request and then only start paginating if there are more pages but that makes it kind of ugly.. Any suggestions?

Copy link
Member

@d-Pixie d-Pixie Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My updated suggestion:

def all( limit: -1 )
  # Cast and clamp to natural number
  requested_number_of_entities = Integer(limit)
  limit_imposed = requested_number_of_entities.positive?
  # Max 500 entities per request
  request_limit = limit_imposed ? [requested_number_of_entities,500].max : 500
  # Start at the first page
  page = 1
  results = []

  loop do
    response_hash = get(self.class::URI, query: { page: page, limit: request_limit })
    results.push( *instantiate_collection_response( response_hash ))

    total_pages = response_hash['MetaInformation']['@TotalPages']

    limit_reached = limit_imposed && results.length >= requested_number_of_entities
    no_more_pages = total_pages == page
    break if limit_reached or no_more_pages
    page += 1
  end

  results = results[0...requested_number_of_entities] unless requested_number_of_entities.negative?

  return results
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[I updated it a bit after I ran it locally and got some errors]

That code works for getting all when < 500 as well as limiting the result to the requested limit. I tested pagination by setting the request_limit to 10 and making sure the query still returned the right number of entities.

The test code I used, after starting the debug console with bin/console, was this:

reload!
Fortnox::API.configure do |config|
  config.client_secret = '[SECRET]'
  config.access_token = '[UUID]'
end
r = Fortnox::API::Repository::Invoice.new
r.all.length

Sometimes with a limit: argument to the all call.

module API
module Repository
module Pagination
def last_page?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed.

current_page == total_pages
end

def next_page
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed.

current_page + 1
end

def current_page
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed.

@samuel02
Copy link
Contributor Author

samuel02 commented May 16, 2018

@d-Pixie thanks for the welcome and thanks for good comments! My email can be found on my profile 🙂 Would really like to join the Slack channel and join the discussion about the future of the gem.

I thought about it being an expensive call and I see a few different ways forward:

  1. To parameterize the limit and set the default to 100 and only if you explicitly call it with nil you return ALL.
  2. To instead of returning the full array return an Enumerator that could be lazily evaluated and thus the pagination would only happen when needed. A call to to_a would result in the same thing still.
  3. Like you said we could add a warning in the logger but that wouldn't really do a lot since you'd only see that after the fact.

I honestly don't know what I personally think would be the best way. You could argue that it's up to the user of the gem to understand that a call to all will be expensive for a repository containing a lot of data. But then again maybe we should offer a way to at least return the total count in an easy way.

EDIT: And, btw, good luck with the GDPR stuff! 🙂

@d-Pixie
Copy link
Member

d-Pixie commented Jun 11, 2018

The problem with "it's up to the user of the gem to understand that a call to all will be expensive" is that it is due to how Fortnox forces us to implement the all call.
Normally this wouldn't be an expensive call, you would call the index endpoint and get a list of entities back and we could figure out what pages to request from Fortnox depending on our pagination settings vs theirs and that would be it.
No we have to call index to get a shallow list of items and then call the entity endpoint for each to get the full version. So that is a O(n+1) call instead of O(1) as in the other case. Given that n=100 in this case and a network latency of 50ms that is a total call time of at least 50*100ms or 5 seconds. And a 50ms network is optimistic as well as not accounting for any other delays.

At the very least we need to clearly warn about this and log a warning in development. I left the implementation out because of these issues and still think it is better to explain this in the Readme and leave it up to the developers using the gem to implement any paging/loading they need manually.
We page in page sizes of 10 for example, so for us it is a lot cheaper to just do it manually then to make a 5 second call to get all of them. I'd say chunking this on the backend, loading 10 at a time and streaming to the frontend, is basically the only reasonable solution right now if you really want to load all entities.
Also remember that the default request timeout is 2 seconds, so a 5 second call will be killed by the browser or Nginx unless you configure it to specifically allow a lot longer requests.

@d-Pixie
Copy link
Member

d-Pixie commented Jun 11, 2018

@ehannes Could you weigh in on this as well?

@ehannes
Copy link
Member

ehannes commented Jun 11, 2018

@samuel02 would you like to describe your use case? What problem does this new feature solve for you?

@samuel02
Copy link
Contributor Author

@ehannes The use case is basically backfilling of data. I'm working on a system that's going to be the new source of truth for billing. Historically most order and customer data has been kept in the system but invoicing has been done "manually" in Fortnox. Now the goal is to store invoice data in the system and then just "sync" the data to Fortnox for sendout and accounting purposes. But that also means I need way to retroactively create invoices in the system from Fortnox. And since this has been a manual process there's no easy way of just iterating over orders and searching invoices via the Fortnox API. So the idea is to basically do:

fortnox_invoices = Fortnox::API::Repository::Invoice.new
fortnox_invoices.all.each do |fortnox_invoice|
  if order = find_plausible_order_for_fortnox_invoice(fortnox_invoice)
    invoice = Invoice.create(order: order, ....)
    FortnoxInvoiceMapping.create(
      invoice: invoice,
      document_number: fortnox_invoice,
      url: fortnox_invoice.url
    )
  else
    logger.info "Could not find plausible order for fortnox invoice: #{fortnox_invoice}"
  end
end

And this code would run in a background job or a script. Also I would probably cache all data from Fortnox for like 24 hours in order to be able to more easily recover from failures etc.

@d-Pixie I agree a lot with your concerns. So what do you think about instead returning an enumerator? And load it lazily like e.g. https://www.rubydoc.info/gems/twitter/Twitter/SearchResults + https://github.com/sferik/twitter/blob/master/lib/twitter/enumerable.rb . We could also take some inspiration from https://github.com/octokit/octokit.rb#auto-pagination and have an enable/disable feature for "auto-pagination". Re: logging, absolutely but that's still giving the user the info after the fact. Re: shallow vs. full - do we automatically fetch the full version for each record on a call to #all?

Either way I think we need to come up with a better solution and I'm sure we can do it together :)! I was really surprised when I realized that #all didn't return all but just 100.

module Fortnox
module API
module Repository
module Pagination
Copy link
Member

@d-Pixie d-Pixie Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop this and create a proper class for the metadata if we want methods on it instead.

Right now the @metadata is an instance of the metadata model, that is fine. But the pagination shouldn't live on the repository singleton. The way me current example looks it keeps all that in the single call. Your implementation leaves things on the global repository object instead ....

A better place for it would be the collection we are building, results = [] in my example. By creating a collection class and pushing the logic to that we could handle pagination in that and support things like batching from the API etc in that object instead.

But for now I think it is ok to just stick it in the main method. Check this for more on why I think that: #149 (base gem work can be found at https://github.com/Accodeing/rest_easy)

@d-Pixie d-Pixie changed the title Add simple pagination Add proper #all implementation Jun 15, 2018
@d-Pixie d-Pixie changed the title Add proper #all implementation Add proper #all implementation Jun 15, 2018
@mbirman mbirman mentioned this pull request Oct 30, 2019
@ehannes ehannes force-pushed the development branch 2 times, most recently from e19d08d to bd64a28 Compare February 17, 2025 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants