Running validate on HPC #1502
Replies: 5 comments
@jblalockusgs a few questions/comments:
Here's an example report with The only other DEBUG information was the various execution times that don't get routed to the report file: In that particular case, it was having problems with the I've also had a few cases where Out of 10,000 I'm running
Thanks @jblalockusgs. @al-niessner, any thoughts on the above? I am thinking this is possibly an issue with trying to download the schemas too often, but I may be wrong. @jblalockusgs, it may be worth testing with local schemas/schematrons.
I do not know about too often, but I do know that the PDS site does sputter occasionally and that it is load related. It seemed to be collisions rather than too many accesses, since I was doing dozens, not thousands, of downloads. I never tracked down the actual cause, so the cause is all speculation. I can, with certainty, confirm the sputtering. Yes, this is the perfect situation for local schemas. You may already know all of the schemas you need. If you do not, then you will need to collect them from all your labels. You will need all of them because it is an all-or-none override. Make a catalog file like: When you run validate, point it at the catalog file with
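The catalog example from this reply did not survive on the page, so the following is only a rough sketch of the "collect them from all your labels" step, not the catalog the author posted. It assumes labels reference their schemas through xsi:schemaLocation and their Schematrons through xml-model processing instructions, and that an OASIS XML catalog with one uri entry per remote URL is acceptable. The function names, the uri-entry layout, and the local directory are all hypothetical; check the validate documentation for the catalog form it actually expects.

```python
import re
from xml.sax.saxutils import quoteattr

# Matches href="..." inside an <?xml-model ...?> processing instruction.
XML_MODEL_RE = re.compile(r'<\?xml-model\b[^>]*?\bhref="([^"]+)"')
# Matches the value of an xsi:schemaLocation attribute (namespace/location pairs).
SCHEMA_LOCATION_RE = re.compile(r'\bschemaLocation\s*=\s*"([^"]+)"')


def extract_schema_urls(label_text):
    """Return the set of schema and Schematron URLs referenced by one label."""
    urls = set(XML_MODEL_RE.findall(label_text))
    for value in SCHEMA_LOCATION_RE.findall(label_text):
        tokens = value.split()
        # schemaLocation holds "namespace location" pairs; keep only the locations.
        urls.update(tokens[1::2])
    return urls


def build_catalog(urls, local_dir):
    """Render an OASIS XML catalog mapping each URL to a same-named file in local_dir.

    Assumption: the schemas were already downloaded into local_dir under
    their original file names.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">',
    ]
    for url in sorted(urls):
        local = f"{local_dir.rstrip('/')}/{url.rsplit('/', 1)[-1]}"
        lines.append(f"  <uri name={quoteattr(url)} uri={quoteattr(local)}/>")
    lines.append("</catalog>")
    return "\n".join(lines)
```

In practice you would call extract_schema_urls on the text of every label in the archive, take the union of the results, download each URL once into the local directory, and write the output of build_catalog to a file. Because the override is all or none, any URL missing from the union would be missing from the catalog too.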
Just to follow up: We were just able to do full validation on 753,384 products in ~19 hours, so this suggestion was a huge help for us...once we got everything working smoothly.
We've been experimenting with running validate on our USGS HPC (https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/hovenweep.html). This seems to have a lot of potential for handling some of our larger archives with many products/labels.

We've noticed some intermittent errors like error.label.unresolvable_resource that are happening because of timeouts. Is there some sort of rough upper limit to the number of simultaneous validations we can/should run? Or is this a situation where we should try and use local schemas and schematrons? I'm concerned that we might be overwhelming something if we are running several thousand instances of validate simultaneously. The HPC can handle that...but I'm not sure validate can.