Build a web-proxy into the deployment stack #27

Open
agstephens opened this issue Feb 9, 2023 · 4 comments

agstephens commented Feb 9, 2023

Requirement

Requirements:

  • rate-limit access
  • stop denial-of-service attacks
  • potentially provide quality of service (based on roles)
  • allow sign-up
  • identify user via a token (like an API_KEY)

Potential solution

We have deployed an Nginx proxy

API Management:

  • a generic proxy for everything
  • could use Nginx:
    • used in Kubernetes ingress controller
  • Nginx can call out to whatever you like:
    • can talk to a separate endpoint
    • we have built an access-management Django app
      • uses Django middleware stack
    • configured for web-based single sign-on
    • or can do token-based access:
      • like an API KEY
    • can talk to a Policy Engine:
      • OPA (Open Policy Agent; popular in Kubernetes)
    • the access token that gets passed in has:
      • user identifier
      • usage attributes
      • so you don't have to do a third-party call to get attributes

William has set up a testbed for CORDEX for this:

  • includes a Django extra (web-app) to register for access to attributes
  • need to create a fresh index

Without K8:

  • deploy with docker-ansible and include nginx and filter

Example filter rule:

opa:
  restrictedPaths:
    - name: cordex
      path: "^/thredds/(fileServer|dodsC)/esg_cordex/.*"
      group: "cordex_research"
    - name: cordex_demo
      path: "^/thredds/(fileServer|dodsC)/esg_cordex_demo/.*"
      group: "cordex_demo"
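
As a rough illustration of what a rule like this expresses, the sketch below applies the two `restrictedPaths` entries to a request path and a set of user groups. It is not how OPA evaluates the rule internally; the first-match behaviour and the "unmatched paths are open" default are assumptions, and `is_allowed` is a hypothetical helper.

```python
import re

# Rules copied from the example filter rule above.
RESTRICTED_PATHS = [
    {"name": "cordex",
     "path": r"^/thredds/(fileServer|dodsC)/esg_cordex/.*",
     "group": "cordex_research"},
    {"name": "cordex_demo",
     "path": r"^/thredds/(fileServer|dodsC)/esg_cordex_demo/.*",
     "group": "cordex_demo"},
]


def is_allowed(path, user_groups):
    """Return True if the path is unrestricted or the user holds the rule's group."""
    for rule in RESTRICTED_PATHS:
        if re.match(rule["path"], path):
            # Assumption: the first matching rule decides the outcome.
            return rule["group"] in user_groups
    # Assumption: paths that match no rule are open to everyone.
    return True
```

For example, `is_allowed("/thredds/dodsC/esg_cordex/a.nc", {"cordex_demo"})` is denied, because that path requires the `cordex_research` group.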

agstephens self-assigned this Feb 9, 2023

agstephens commented Mar 9, 2023

Discussion (09/03/2023) points:

  • Putting the API_KEY in the URL query string is not currently encouraged/supported by our tools - so maybe this is only a desirable requirement.
  • Open Policy Agent (OPA) can be configured with:
    • YAML rule, or
    • OPA policy - written in Rego (OPA's declarative policy language, inspired by Datalog):
      • can hand-craft rules in this language
  • Response is always "YES" or "NO".
  • Ideally, we would have rate-limiting that takes into account:
    • overall allocation
    • previous use
    • overuse during the recent period
    • priority use for certain users/projects
  • Where would the accounting happen?
    • Need a job-management DB:
      • tracking usage
      • tracking allocation
      • maybe in the WPS application - or connected to it
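
One way the rate-limiting factors listed above (overall allocation, previous use, recent overuse, priority) might combine is sketched here. The data model and thresholds are invented for illustration: `UsageRecord`, `RECENT_SHARE`, `PRIORITY_FACTOR` and `may_run` are hypothetical, and in practice the numbers would come from the job-management DB.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    allocation: float       # total units granted to the user/project
    used_total: float       # units consumed so far (previous use)
    used_recent: float      # units consumed during the recent period
    priority: bool = False  # True for users/projects with priority use


RECENT_SHARE = 0.25    # hypothetical: fraction of allocation usable per recent period
PRIORITY_FACTOR = 2.0  # hypothetical: priority users get twice the recent cap


def may_run(record, job_cost):
    """Decide whether a job of the given cost fits the user's budget."""
    # Overall allocation: never exceed the total grant.
    if record.used_total + job_cost > record.allocation:
        return False
    # Recent overuse: cap consumption in the recent period.
    recent_cap = record.allocation * RECENT_SHARE
    if record.priority:
        recent_cap *= PRIORITY_FACTOR
    return record.used_recent + job_cost <= recent_cap
```

With these made-up numbers, a user with an allocation of 100 who has used 20 units recently is refused a 10-unit job, while a priority user with the same record is allowed.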

Stages of managing a request before it is sent to the WPS:

  1. Request received by Proxy
  2. Proxy talks to the PEP (Policy Enforcement Point - i.e. our Django App)
  3. PEP checks with the OPA service, which is the first Policy Decision Point (PDP) - layer 1 PDP
  4. The second PDP could be our new Service XYZ (later rebadged to the WAM!) - layer 2 PDP - might not need to exist at first
  5. Service XYZ could decide on the response based on rules/logic/usage:
    • Service XYZ could be part of the WPS or a separate application that tracks and records usage (that the WPS might also talk to separately).
    • Service XYZ would need to have access to usage information about the WPS (i.e. a database of job logs/stats)
    • Service XYZ needs some business logic to decide whether to allow a job
  • Note: the above approach could be written with only the OPA service and no second layer to start with.
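
The stages above can be sketched as a small decision chain. This is only an outline under assumptions: each PDP is modelled as a plain function returning True or False, whereas in the real stack they would be network calls to OPA and to Service XYZ; `pep_decide` and the stub PDPs are hypothetical names.

```python
def pep_decide(request, layer1_pdp, layer2_pdp=None):
    """Ask each PDP in turn and deny as soon as one refuses."""
    if not layer1_pdp(request):
        return False
    # The second layer is optional, matching the note that it might not exist at first.
    if layer2_pdp is not None and not layer2_pdp(request):
        return False
    return True


def opa_stub(request):
    """Stand-in for the layer 1 PDP: checks group membership only."""
    return "cordex_research" in request.get("groups", [])


def xyz_stub(request):
    """Stand-in for the layer 2 PDP: checks remaining usage budget."""
    return request.get("remaining_budget", 0) > 0
```

Calling `pep_decide(request, opa_stub)` corresponds to the OPA-only starting point; passing `xyz_stub` as the third argument adds the second layer later without changing the PEP logic.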


agstephens commented Mar 17, 2023

What exists and what would need to be built?

Components:

  • ESGF Web Processing Service (WPS) - can be quickly deployed on existing VMs
  • Nginx Proxy Server - could be configured into WPS Nginx playbook:
    • Add a new location block in the configuration
  • PEP - is a lightweight Django App:
    • Would be run on WPS server and configured by playbook
  • PDP - is an OPA service:
    • Lightweight service (running from a binary) - running on its own port
    • Configured by policy code written in Rego (OPA's Datalog-inspired language) - installed by playbook
  • Optionally, we build the second PDP - the WPS Access Manager (WAM!):
    • Could be any type of service end-point
    • It returns a response code and a JSON document (with arbitrary content)
      • e.g. return "authorised" or "not authorised" or anything
    • The features of the WAM should be built in response to real-life use cases that need supporting (Ag!).
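
To make the "response code and a JSON document" idea concrete, here is a minimal sketch of a WAM-style endpoint written as a plain WSGI callable. Everything specific in it is an assumption: the `X-User` header, the `ALLOWED_USERS` set and the JSON field names are hypothetical, and real decisions would come from usage data rather than a fixed set.

```python
import json

ALLOWED_USERS = {"alice"}  # hypothetical; stands in for real business logic


def wam_app(environ, start_response):
    """Answer with a status code plus a JSON document describing the decision."""
    user = environ.get("HTTP_X_USER", "")  # hypothetical header set by the PEP
    authorised = user in ALLOWED_USERS
    status = "200 OK" if authorised else "403 Forbidden"
    body = json.dumps({
        "decision": "authorised" if authorised else "not authorised",
        "user": user,
    }).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because it is a standard WSGI callable, it could be served for local testing with `wsgiref.simple_server.make_server("", 8000, wam_app)`; the port is arbitrary.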

Examples

The dap.ceda.ac.uk service has many of these components:

agstephens commented

Need to investigate Slurm-like solutions for the business logic in the WAM!

agstephens commented

Regarding Keycloak, it looks like EGI-Checkin and Globus-Auth might be replacing Keycloak for LLNL.
