A cheklist of SQL server performance consideration for architecting/designing systems
that deal with very large volumes of data. They are in no particular order. SQL performance
tuning is necessary to optimise a SQL Server database.
Careful researching and planning is required to design and manage a write-intensive
SQL Server database. Test all SQL Server configuration changes, before moving into
Production. Consider the implications of making changes.
* Run as many load processes
as you have available CPUs. If you have 8 CPUs, run 8 parallel loads. If you have
2 CPUs, run 2 parallel loads.
* If you have control over the creation of your input files, make them of a size
that is evenly divisible by the number of load threads you want to run in parallel.
* Use BULK INSERTinstead of BCP if you are running the process on the SQL Server
machine. This allows the process to be allocated to the SQL Server Service.
* Use table partitioning to gain poetentially 8-10%, but only if your input files
are insured to match your partitioning function, meaning that all records in one
file must be in the same partition. In this scenario, every input file would go
into a separate table with its own partition, when insert was completed, the partitions
would be switched into a large table. The switching is only a few milliseconds.I
will post details regarding this process.
* Use TABLOCK to avoid row at a time locking. TABLOCK reduces lock contention on
the table, therefore, in theory increasing performance.It overrirdes page /row locking.
For example, (from BOL)
BULK INSERT pubs..authors2 FROM 'c:\authors.txt'
WITH (
DATAFILETYPE = 'char',
FIELDTERMINATOR = ',',
TABLOCK
)
The TABLOCK can also be used in the
INSERT INTO myTable1 SELECT * FROM myTable2 WITH(TABLOCK) type of statement
* Use ROWS PER BATCH = 2600, or something near this if you are importing multiple
streams into one table. Use ROWS PER BATCH=2600 , primarily to avoid lock escalation
* Place database in BULK LOGGED mode.This records extent allocations and related
metadata, but not the individual page changes and not the individual record inserts.
* In the case of row-versioning, indexing, temporary objects there is a greater
dependance on TempDB. In those case RAID 10 is the way to go on TempDB. There are
cost implications, and if so use RAID 0. If there isn't a lot of write-activity
RAID 5 will suffice. SQL Server – tempdb and solid state drives
* Multiple writing to the data aspect of the database can be done concurently by
using multiple disk drives. A major consideration is striping data over multiple
disk drives.RAID 10 is the optimal. RAID 10 mirrors every drive and stripes the
data. At min: RAID 1. RAID 5 is better suited to read-only environment. I will
post some stats of results in comparison
* SQL Server writes to the transaction log file in a synchronous mode.Therefore
fully logged changes is generally limited by the log writing activity.Taking this
one step further, if the writes to the transaction log reach the disk throughput,
the data modifications will have to be limited by the log writes.One approach is
to split the the db across multiple dbs, all with separate log files, all on separate
hard drives. Logging is often the most crirical performance factor in an intense
INSERT environment. Aim to have the log files on a dedicated hard disk
* The nature of the application may have to dictate what type of INSERT process
is required. Typically there are 3 INSERT approaches. 1)BULK 2)multirow INSERT such
as INSERT SELECT 3)individual INSERT. The order is outlined is the order of efficiency,
with 1) being the most efficient. The level of logging is the key differentiator
for the above 3 methods.
* Even though BULK INSERTS are the most efficient, i.e place a number of INSERTS
within a single transaction, there isn't an objective number that is optimal. A
degree of testing is required. I will post notes, on a given hardware, indexing
architecture
as to results I've experienced. An effective approach is benchmarking, increasing
or decreasing the amount of rows per BULK INSERT.
* Dealing with very large tables and manipulating is primarily not about space,
but considering whether to delete rows or truncate / drop tables. Regardless of
whether there is a clause involved in the DELETE statement, a DELETE statement when
dealing with large data tables is expensive. Truncating/Dropping is inexpensive
in comparison .
For example, if you have 1,000,000 table rows at 2,000 bytes each , you are looking
at 2gb space.
i.e approx, 1,000,000 rows x 2,000bytes = 2,000,000,000 bytes = 2 GB
* Hardware considerations- speak to your Hardware engineers
Republished from http://www.sqlserver-dba.com.
Republished from SQL Server DBA [65 clicks].
Read the original version here [32134 clicks].