Hi.
As described here:
in Python 3, a regular expression compiled from a str has the re.UNICODE flag set.
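The flag behaviour can be checked with the standard library alone. This is only a small sketch; the variable names are made up for illustration and nothing here touches MongoDB:

```python
import re

# A pattern compiled from str implicitly carries re.UNICODE in Python 3.
str_pattern = re.compile('^test')
# A pattern compiled from bytes does not.
bytes_pattern = re.compile(b'^test')

has_unicode_str = bool(str_pattern.flags & re.UNICODE)
has_unicode_bytes = bool(bytes_pattern.flags & re.UNICODE)

print(has_unicode_str)    # True
print(has_unicode_bytes)  # False
```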
But queries with Unicode regexes are extremely slow (1001 ms, nscanned:498422):
2015-04-18T23:44:29.539+0300 I QUERY [conn2] query test.computers query: { $or: [ { number: "test" }, { hostname: /^test/u }, { macs: /^test/u }, { ipmi: /^test/u }, { dc: /^test/u } ] } planSummary: IXSCAN { hostname: 1 }, IXSCAN { number: 1 }, IXSCAN { dc: 1 }, IXSCAN { macs: 1 }, IXSCAN { ipmi: 1 } ntoreturn:100 ntoskip:0 nscanned:498422 nscannedObjects:104692 keyUpdates:0 writeConflicts:0 numYields:4391 nreturned:3 reslen:643 locks:{} 1001ms
The non-Unicode version of the same query (1 ms, nscanned:4):
2015-04-18T23:42:33.177+0300 I QUERY [conn1] query test.computers query: { $or: [ { number: "test" }, { hostname: /^test/ }, { macs: /^test/ }, { ipmi: /^test/ }, { dc: /^test/ } ] } planSummary: IXSCAN { hostname: 1 }, IXSCAN { number: 1 }, IXSCAN { dc: 1 }, IXSCAN { macs: 1 }, IXSCAN { ipmi: 1 } ntoreturn:100 ntoskip:0 nscanned:4 nscannedObjects:3 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:3 reslen:643 locks:{} 1ms
I tried using this method in my model:
    @staticmethod
    def search(pattern):
        return Computer.objects.filter(
            db.Q(number=pattern) |
            db.Q(hostname__startswith=pattern) |
            db.Q(macs__startswith=pattern) |
            db.Q(ipmi__startswith=pattern) |
            db.Q(dc__startswith=pattern)
        )
But it generates Unicode regexes (the first query above).
Now I'm using a different method:
    @staticmethod
    def search(pattern):
        r = bson.regex.Regex('^{}'.format(escape(pattern)))
        return Computer.objects(
            db.Q(number=pattern) |
            db.Q(hostname=r) |
            db.Q(macs=r) |
            db.Q(ipmi=r) |
            db.Q(dc=r)
        )
It works fine (the second query above), but I think it is an ugly workaround.
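To make the workaround a bit more concrete, here is a minimal sketch of just the pattern-building step, using only the standard library. The helper name prefix_pattern is hypothetical, and it assumes that the escape call in the method above is re.escape:

```python
import re

def prefix_pattern(text):
    """Return an anchored pattern string matching values that start with text."""
    # re.escape keeps characters such as "." from acting as regex metacharacters.
    return '^' + re.escape(text)

# In the workaround above, this string would be wrapped as
# bson.regex.Regex(prefix_pattern(...)); its flags argument defaults to 0,
# so no "u" flag is sent to the server.
print(prefix_pattern('test'))     # ^test
print(prefix_pattern('10.0.0.'))  # ^10\.0\.0\.
```

Escaping matters for fields like IP or MAC addresses, where an unescaped "." would match any character.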
I also tried using a byte string instead of a Unicode string, but it is treated as invalid here.