Regex search is extremely slow with python3 #965

Open
@ffix

Hi.

As described here:

in Python 3, a regular expression compiled from a str has the re.UNICODE flag set.

But Unicode regexes are extremely slow:

2015-04-18T23:44:29.539+0300 I QUERY    [conn2] query test.computers query: { $or: [ { number: "test" }, { hostname: /^test/u }, { macs: /^test/u }, { ipmi: /^test/u }, { dc: /^test/u } ] } planSummary: IXSCAN { hostname: 1 }, IXSCAN { number: 1 }, IXSCAN { dc: 1 }, IXSCAN { macs: 1 }, IXSCAN { ipmi: 1 } ntoreturn:100 ntoskip:0 nscanned:498422 nscannedObjects:104692 keyUpdates:0 writeConflicts:0 numYields:4391 nreturned:3 reslen:643 locks:{} 1001ms

Non-unicode version:

2015-04-18T23:42:33.177+0300 I QUERY    [conn1] query test.computers query: { $or: [ { number: "test" }, { hostname: /^test/ }, { macs: /^test/ }, { ipmi: /^test/ }, { dc: /^test/ } ] } planSummary: IXSCAN { hostname: 1 }, IXSCAN { number: 1 }, IXSCAN { dc: 1 }, IXSCAN { macs: 1 }, IXSCAN { ipmi: 1 } ntoreturn:100 ntoskip:0 nscanned:4 nscannedObjects:3 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:3 reslen:643 locks:{} 1ms
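The flag difference behind the two log lines above can be checked with the standard library alone. This is a minimal sketch, not taken from the report: it only shows that Python 3 sets re.UNICODE on patterns compiled from str, and that bytes patterns or an explicit re.ASCII flag do not carry it. The variable names are made up for illustration.

```python
import re

# Pattern compiled from str: Python 3 adds re.UNICODE implicitly.
str_regex = re.compile("^test")
# Pattern compiled from bytes: re.UNICODE is not set.
bytes_regex = re.compile(b"^test")
# Pattern compiled from str with re.ASCII: re.UNICODE is not set either.
ascii_regex = re.compile("^test", re.ASCII)

str_has_unicode = bool(str_regex.flags & re.UNICODE)
bytes_has_unicode = bool(bytes_regex.flags & re.UNICODE)
ascii_has_unicode = bool(ascii_regex.flags & re.UNICODE)

print(str_has_unicode, bytes_has_unicode, ascii_has_unicode)
```

Any layer that translates Python regex flags into MongoDB regex options would presumably pass this implicit flag along as the u option seen in the first query.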

I tried using this method in my model:

@staticmethod
def search(pattern):
    return Computer.objects.filter(
        db.Q(number=pattern) |
        db.Q(hostname__startswith=pattern) |
        db.Q(macs__startswith=pattern) |
        db.Q(ipmi__startswith=pattern) |
        db.Q(dc__startswith=pattern)
    )

But it generates a Unicode regex (the first query).
Now I'm using a different method:

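@staticmethod
def search(pattern):
    # Assumes `import bson.regex` and `from re import escape` at module level.
    # A bson Regex built from a plain string carries no flags, so no /u is sent.
    r = bson.regex.Regex('^{}'.format(escape(pattern)))
    return Computer.objects(
        db.Q(number=pattern) |
        db.Q(hostname=r) |
        db.Q(macs=r) |
        db.Q(ipmi=r) |
        db.Q(dc=r)
    )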

It works fine (the second query), but I think it is an ugly workaround.
I also tried using a byte string instead of a Unicode string, but it is handled incorrectly here.
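Another possible way to avoid the flag, sketched here under assumptions and not tested against the setup in this report, is to build the query as a plain dict using MongoDB's $regex operator with a string pattern, which carries no options. The field names come from the queries above; the helper names prefix_clause and build_search_query are hypothetical.

```python
import re

# Field names taken from the logged queries; helper names are hypothetical.
PREFIX_FIELDS = ("hostname", "macs", "ipmi", "dc")


def prefix_clause(field, pattern):
    """Build a {field: {"$regex": ...}} clause with an escaped, anchored pattern."""
    # A plain string under $regex has no flags attached, so no "u" option is sent.
    return {field: {"$regex": "^" + re.escape(pattern)}}


def build_search_query(pattern):
    """Exact match on number, prefix match on the remaining fields."""
    clauses = [{"number": pattern}]
    clauses.extend(prefix_clause(field, pattern) for field in PREFIX_FIELDS)
    return {"$or": clauses}


query = build_search_query("test")
print(query)
```

With mongoengine, such a dict could presumably be passed as Computer.objects(__raw__=query); whether the resulting plan matches the fast second query would need to be confirmed in the server log.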
