Use address instead of ISBN to query book price #11

rimbi · 2011-10-10T11:51:46Z

This has a few advantages over the ISBN search:

Extensions are no longer prone to errors caused by changes in how a page displays the ISBN.
No more special treatment for different web sites.
The web address is a potential means of data collection, which may be used for data extraction experiments.
By using the address we can flag pages that have not been crawled yet and run a special crawl session for those pages.
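
The last point could be sketched roughly as below. This is only an illustration, not the project's actual code: `known_prices`, `pending_crawl`, `lookup_price` and the example URLs are all made-up names, and an in-memory dict stands in for the real database.

```python
# Hypothetical in-memory stand-in for the server's URL index.
known_prices = {
    "http://example.com/book/1": 12.50,
}

# URLs that extensions asked about but the crawler has not visited yet.
pending_crawl = set()


def lookup_price(url):
    """Return the stored price for url, or None and flag the URL for a later crawl."""
    price = known_prices.get(url)
    if price is None:
        # Unknown address: remember it so a special crawl session can pick it up.
        pending_crawl.add(url)
    return price
```

A later crawl session could then work through `pending_crawl` instead of rediscovering those pages on its own.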

sardok · 2011-10-10T13:00:13Z

good idea, but i have the following questions:

you need to get title and price, so you still have to parse the web page content. yes, ISBN changes won't affect you, but the rest will.
i don't understand how the plugin is going to work with this structure. consider this:
from the plugin's point of view, you get the book information somehow (from parsing the page content or from the web url) and want to query about this book.
what are you going to use as the key to represent the book in the database (we use ISBN, as you know)? if you use the web url, you need to do parsing on the server to recognise the book, then do the matching and send the result to the web browser.

rimbi · 2011-10-10T18:45:43Z

Even now we don't parse title and price within the extensions.
No big difference: Currently extensions use ISBN, in the future they are supposed to use URL. The rest is the same.
I didn't understand what you meant in this item. Sounds like you missed something?

sardok · 2011-10-11T06:11:33Z

haaa. okay okay. i got it. but it turns out that we need to
implement a url filtering mechanism, which we should have done before.

rimbi · 2011-10-11T10:05:45Z

Sorry, I didn't understand the need for url filtering?

sardok · 2011-10-11T12:13:29Z

how are you going to differentiate a book's url from non-book urls?

sardok · 2011-10-11T12:16:20Z

also, one of the questions above that you didn't understand was this:
you want to search the kitapsever database for a book available in idefix.
if the information about that book is available only as a url in the
database, how are you going to match the same book available on other
web sites like pandora?

i am missing something, i think.

rimbi · 2011-10-11T12:33:14Z

A joined query: URL --> ISBN --> All URLs ordered by price
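
A minimal sketch of such a joined query, using Python's built-in `sqlite3` module. The `offers` table, its columns, the `cheapest_offers` helper, the placeholder `.example` URLs and the ISBN values are all assumptions made for illustration; the real schema may differ.

```python
import sqlite3


def cheapest_offers(conn, url):
    """URL -> ISBN -> all URLs carrying that ISBN, cheapest first."""
    query = """
        SELECT other.url, other.price
        FROM offers AS seen
        JOIN offers AS other ON other.isbn = seen.isbn
        WHERE seen.url = ?
        ORDER BY other.price ASC
    """
    return conn.execute(query, (url,)).fetchall()


# Hypothetical data: the crawler stored the URL together with the ISBN and price.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offers (url TEXT PRIMARY KEY, isbn TEXT, price REAL)")
conn.executemany(
    "INSERT INTO offers VALUES (?, ?, ?)",
    [
        ("http://idefix.example/book/1", "9780000000001", 20.0),
        ("http://pandora.example/kitap/7", "9780000000001", 18.5),
        ("http://idefix.example/book/2", "9780000000002", 9.0),
    ],
)

results = cheapest_offers(conn, "http://idefix.example/book/1")
unknown = cheapest_offers(conn, "http://idefix.example/about")
```

A URL that is not in the table (a non-book page, or a page not crawled yet) simply yields an empty result, so the extension needs no filtering of its own.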

rimbi · 2011-10-11T12:36:30Z

We will not differentiate them on the client side. I guess that the ratio
of non-book pages to book pages will be so small that it will be acceptable.

sardok · 2011-10-11T12:47:52Z

cemo, i really don't understand. you said that there will be NO parsing
at all, am i right? no ISBN parsing, no title parsing, nothing. just send
the web url to the database.
but here you are searching for a specific ISBN from the urls. how are
you going to do that?

rimbi · 2011-10-11T12:50:57Z

I knew you missed the point :)

We'll continue to extract the ISBN from the pages and put it in the database
together with the URL as usual. No change at this point in the crawler.

What we'll change is the way extensions make their queries. They'll use URL
instead of ISBN to query the book and that's possible since we already have
URLs in the database.
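
On the extension side, the change might amount to little more than swapping the query parameter. The sketch below is written in Python for consistency with the other examples, although a real browser extension would be JavaScript; `API_BASE`, the `isbn` and `url` parameter names, and both helper functions are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical server endpoint; the real address is not given in this thread.
API_BASE = "http://kitapsever.example/api/price"


def query_by_isbn(isbn):
    """Current approach: the extension first has to extract the ISBN from the page."""
    return API_BASE + "?" + urlencode({"isbn": isbn})


def query_by_url(page_url):
    """Proposed approach: the extension just sends the address of the current page."""
    return API_BASE + "?" + urlencode({"url": page_url})
```

With the second form the extension no longer depends on page structure at all; it only needs the address of the page it is on.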

sardok · 2011-10-11T12:55:39Z

all right, all right. i got it now.
