Skip to content

Conversation

@klimslim
Copy link
Contributor

Problem:
SPL iterators like RecursiveDirectoryIterator could malfunction with shadowed directories, potentially trying to iterate into files as if they were directories. This was due to the shadow stream wrapper not correctly reporting the d_type (file vs. directory) for entries.

Solution:

  1. Modified shadow_dir_opener:

    • When merging template and instance directory contents into the internal mergedata HashTable, the type of each entry (file or directory) is now determined using plain_ops->url_stat.
    • This type information (DT_REG for files, DT_DIR for directories) is stored alongside the entry name in a new custom struct shadow_dir_entry_info within mergedata.
  2. Modified shadow_dirstream_read:

    • When PHP reads directory entries from the shadow stream, this function now retrieves the stored shadow_dir_entry_info.
    • The d_type field of the php_stream_dirent structure is populated with the stored type from shadow_dir_entry_info->d_type. This allows SPL iterators and other directory functions to correctly identify entry types without extra stat calls.

IMPORTANT CAVEAT:
Due to persistent internal errors, I was not able to compile the extension or run any PHPT tests (neither the new test designed for this issue nor the existing test suite).

These changes are therefore submitted based on code analysis and logical correction, but THEY ARE UNVERIFIED BY TESTS. There is a risk that they may not fully resolve the issue or may introduce regressions. Manual compilation, testing, and verification are strongly recommended before deploying these changes.

Problem:
SPL iterators like RecursiveDirectoryIterator could malfunction with
shadowed directories, potentially trying to iterate into files as if
they were directories. This was due to the shadow stream wrapper not
correctly reporting the d_type (file vs. directory) for entries.

Solution:
1. Modified `shadow_dir_opener`:
   - When merging template and instance directory contents into the
     internal `mergedata` HashTable, the type of each entry (file or
     directory) is now determined using `plain_ops->url_stat`.
   - This type information (DT_REG for files, DT_DIR for directories)
     is stored alongside the entry name in a new custom struct
     `shadow_dir_entry_info` within `mergedata`.

2. Modified `shadow_dirstream_read`:
   - When PHP reads directory entries from the shadow stream, this
     function now retrieves the stored `shadow_dir_entry_info`.
   - The `d_type` field of the `php_stream_dirent` structure is
     populated with the stored type from `shadow_dir_entry_info->d_type`.
     This allows SPL iterators and other directory functions to
     correctly identify entry types without extra stat calls.

**IMPORTANT CAVEAT:**
Due to persistent internal errors, I was not able to compile the extension
or run any PHPT tests (neither the new test designed for this issue
nor the existing test suite).

These changes are therefore submitted based on code analysis and logical
correction, but **THEY ARE UNVERIFIED BY TESTS.** There is a risk that
they may not fully resolve the issue or may introduce regressions.
Manual compilation, testing, and verification are strongly recommended
before deploying these changes.
This commit amends the previous attempt to fix SPL iterator type reporting.

Problem 1 (Original):
SPL iterators like RecursiveDirectoryIterator could malfunction with
shadowed directories, potentially trying to iterate into files as if
they were directories. This was due to the shadow stream wrapper not
correctly reporting the d_type (file vs. directory) for entries.

Solution 1 (Original, still active):
1. Modified `shadow_dir_opener`:
   - When merging template and instance directory contents, the type
     of each entry (file or directory) is determined using `plain_ops->url_stat`.
   - This type information (DT_REG for files, DT_DIR for directories)
     is stored alongside the entry name in `shadow_dir_entry_info`.
2. Modified `shadow_dirstream_read`:
   - Populates `d_type` of `php_stream_dirent` with the stored type.

Problem 2 (Build Failure from previous commit):
The previous commit introduced a build failure because it included
"php_fs.h", which was not available in your build environment.

Solution 2 (This commit):
1. Removed the `#include "php_fs.h"` directive from `shadow.c`.
2. Added `#include <dirent.h>` to provide standard POSIX definitions
   for `DT_DIR`, `DT_REG`, `DT_UNKNOWN` constants.
3. Ensured fallback definitions (e.g., `#ifndef DT_DIR ...`) for these
   constants are in place in case `<dirent.h>` does not provide them on a
   particular system.
4. Verified `<sys/stat.h>` is included for `S_ISDIR`.

This should allow the extension to compile successfully while retaining
the fix for SPL iterator type reporting.

**IMPORTANT CAVEAT (Still Applicable):**
Due to persistent internal errors, I was not able to compile the extension
or run any PHPT tests (neither the new test designed for this issue
nor the existing test suite).

These changes are therefore submitted based on code analysis and logical
correction of both the original issue and the subsequent build error,
but **THEY ARE UNVERIFIED BY AUTOMATED TESTS.** Manual compilation,
testing, and verification are strongly recommended before you deploy
these changes.
…ility

This commit further amends the fix for SPL iterator type reporting and
build issues.

Previous Commits:
- Addressed SPL iterator type reporting by storing and using d_type.
- Resolved a build failure by removing a dependency on "php_fs.h"
  and using <dirent.h> with fallbacks for DT_* constants.

Problem Identified (from your feedback after v2):
Tests like `tests/dir.phpt`, `tests/maindir.phpt`, `tests/readdir.phpt`,
and `tests/recursive_iterator.phpt` were failing. The diff for
`tests/dir.phpt` showed it expected "." and ".." entries, which were
being explicitly filtered out by `shadow_dir_opener`.

Solution (This commit):
- Modified `shadow_dir_opener` in `shadow.c`.
- Removed the conditional statements that skipped processing for
  directory entries named "." and "..".
- This allows these standard entries to be included in the `mergedata`
  HashTable and subsequently reported by `shadow_dirstream_read`.
  Their types will be determined as `DT_DIR` by the existing stat logic.

This change makes the shadow stream behave more like a standard directory
stream, which is expected to resolve the aforementioned test failures.
The `RecursiveDirectoryIterator` (with `SKIP_DOTS`) and other tools
are designed to handle the presence of these entries.

**IMPORTANT CAVEAT (Updated):**
Initial development faced issues where I could not
compile or run tests. Your provided build logs and test outputs have
been crucial in diagnosing and iterating on fixes. This current change
is based on that feedback. Further verification by you in your
build/test environment is essential.
This commit further refines the directory merging logic in `shadow_dir_opener`.

Previous Commits:
- Addressed SPL iterator type reporting by storing and using d_type.
- Resolved a build failure by using <dirent.h> for DT_* constants.
- Fixed several directory listing tests by including "." and ".." entries.

Problem Identified (from your feedback after v3):
The `tests/recursive_iterator.phpt` test was still failing. The diff
indicated that files/directories existing *only* in the instance directory
(and not in the template) were not being listed when iterating the base
shadowed path. Entries that overrode template files worked correctly.

Solution (This commit):
- Modified the loop in `shadow_dir_opener` that processes entries from
  the `instdir` (instance directory).
- Previously, it used `zend_hash_str_update_ptr` for all entries from
  `instdir`, assuming it would add if the key didn't exist (from template)
  or update if it did.
- The logic is now more explicit:
    1. Use `zend_hash_str_find_ptr` to check if an entry from `instdir`
       already exists in `mergedata` (i.e., it was in `tempdir`).
    2. If it exists, the old (template) entry's data is freed, and
       `zend_hash_str_update_ptr` is called to point to the new (instance)
       entry's data.
    3. If it does *not* exist (i.e., it's an instance-only entry),
       `zend_hash_str_add_ptr` is now explicitly called to add the
       instance entry's data to `mergedata`.

This change ensures that instance-only files and directories are correctly
added to the merged list and should resolve the failure in
`tests/recursive_iterator.phpt`.

**IMPORTANT CAVEAT (Updated):**
Initial development faced issues where I could not
compile or run tests. Your provided build logs and test outputs have
been crucial in diagnosing and iterating on fixes. This current change
is based on that feedback. Further verification by you in your
build/test environment is essential.
…sage describes a temporary diagnostic change made to investigate an issue with how files and directories are listed in a particular test.

Here's a breakdown of what it's saying:

There's a hypothesis that some files and directories that only exist in a specific instance might be getting misclassified as "unknown" types. This could be causing a tool that lists directory contents to skip them.

To test this, the commit introduces a temporary change: if a file or directory is classified as "unknown," it's temporarily forced to be treated as a "regular file."

The purpose of this is to see if these previously missing items will then show up in the test output.
*   If they do, it suggests the "unknown" classification was the problem, and the next step would be to figure out why that misclassification is happening.
*   If they still don't show up, then this hypothesis is incorrect.

It's important to note that this is just a diagnostic step and not a permanent solution, as it's incorrect to treat directories as files.

The message also mentions some previous attempts to address related issues.

I hope this explanation is helpful! Let me know if you have any other code-related questions.
…ic logging to `shadow_dir_opener` to investigate a test failure. This new logging will show exactly which entries are being yielded by the instance directory stream before any further processing. If instance-only files appear in this log, the problem lies in their subsequent handling. If they don't appear, it points to a more fundamental issue with the stream itself.

To capture this output, you'll need to enable shadow debug flags.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant