Friday, February 24, 2012

I need Optimised duplicate finding query

I have a query which is like this
Select field a, field b,field c, field d from table1 where field d not in
(select distinct field d from table 2)
Table 1 has 35000 records
Table 2 has 12391876 records
Field d is of type varchar.
If there is any duplicate to be found in table 2 then the insertion from
table 1 to table 2 will not happen.
I need this query to be get optimised.Hi
DECLARE @.rowcount INT
SELECT Field1,COUNT(*) FROM Table2
GROUP BY Field1
HAVING COUNT(*)>1
SET @.rowcount =@.@.ROWCOUNT
IF @.rowcount >0 --Do exist duplicate rows
.........
"Cynthia" <Cynthia@.discussions.microsoft.com> wrote in message
news:CDCF5DF6-52FA-4D80-AAED-1F3803720972@.microsoft.com...
>I have a query which is like this
> Select field a, field b,field c, field d from table1 where field d not in
> (select distinct field d from table 2)
>
> Table 1 has 35000 records
> Table 2 has 12391876 records
> Field d is of type varchar.
>
> If there is any duplicate to be found in table 2 then the insertion from
> table 1 to table 2 will not happen.
>
> I need this query to be get optimised.
>|||I understand that the below query is used for finding out a duplicate value
in table2.
Note : field d contains <FILE NAME> . if there are 20 records in a file then
20 records will have the same filename and so on. A group of files will be
imported to the temporary table (table 1). Before inserting it to the main
table (table 2) a check is done whether that file is existing in the main
table. If it is existing then the insertion process will not done in order t
o
avoid duplicate file being entered into the main table.
We use the below said query given by me for this task.
So I need a optimised query.
"Uri Dimant" wrote:

> Hi
> DECLARE @.rowcount INT
> SELECT Field1,COUNT(*) FROM Table2
> GROUP BY Field1
> HAVING COUNT(*)>1
> SET @.rowcount =@.@.ROWCOUNT
> IF @.rowcount >0 --Do exist duplicate rows
> ..........
> "Cynthia" <Cynthia@.discussions.microsoft.com> wrote in message
> news:CDCF5DF6-52FA-4D80-AAED-1F3803720972@.microsoft.com...
>
>|||> So I need a optimised query.
CREATE indexes to optimize your query
http://www.sql-server-performance.c...red_indexes.asp
http://www.sql-server-performance.c...red_indexes.asp
http://www.sql-server-performance.c...ing_indexes.asp
"Cynthia" <Cynthia@.discussions.microsoft.com> wrote in message
news:2B4DD3B0-5BFE-4984-9726-9036457E8EA1@.microsoft.com...
>I understand that the below query is used for finding out a duplicate value
> in table2.
> Note : field d contains <FILE NAME> . if there are 20 records in a file
> then
> 20 records will have the same filename and so on. A group of files will be
> imported to the temporary table (table 1). Before inserting it to the main
> table (table 2) a check is done whether that file is existing in the main
> table. If it is existing then the insertion process will not done in order
> to
> avoid duplicate file being entered into the main table.
> We use the below said query given by me for this task.
> So I need a optimised query.
> "Uri Dimant" wrote:
>|||You might try using a not exists instead of not in. Not in will select all
12,391,876 rows from table2 then order them and summarize them, which
requires a fiar amount of in memory processing. The Not exists will check
table2 for each row in table1, meaning a maximum of 35,000 lookups. Note,
this will be faster if you have an index on table2.fieldd, but may be much
slower is this index does not exist.
Select fielda, fieldb,fieldc, fieldd from table1 t1 where not exists
(select 1 from table2 t2 where t2.fieldd = t1.fieldd)
"Cynthia" <Cynthia@.discussions.microsoft.com> wrote in message
news:CDCF5DF6-52FA-4D80-AAED-1F3803720972@.microsoft.com...
> I have a query which is like this
> Select field a, field b,field c, field d from table1 where field d not in
> (select distinct field d from table 2)
>
> Table 1 has 35000 records
> Table 2 has 12391876 records
> Field d is of type varchar.
>
> If there is any duplicate to be found in table 2 then the insertion from
> table 1 to table 2 will not happen.
>
> I need this query to be get optimised.
>

No comments:

Post a Comment