In: Computer Science
Can someone explain the difference between how Oracle and SQL Server store files? With steps and examples, please.
SQL Server
To configure FILESTREAM in SQL Server
To enable FILESTREAM on any database:
1. First, enable the FILESTREAM feature on the SQL Server instance. To do that, open SQL Server Configuration Manager, right-click the SQL Server instance, and select Properties.
A dialog box to configure server properties opens.
Switch to the FILESTREAM tab. Select Enable FILESTREAM for Transact-SQL access. Select Enable FILESTREAM for file I/O access, and then select Allow remote client access to FILESTREAM data. In the Windows share name text box, provide the name of the directory that will store the files.
Click OK and restart the SQL service.
Enable FILESTREAM Access Level on SQL Server Instance
Once the FILESTREAM feature is enabled, change the FILESTREAM access level by executing the following query:
EXEC sp_configure filestream_access_level, 2
RECONFIGURE
In the above query, the valid values for the filestream_access_level parameter are:
0 means the FILESTREAM support for SQL instance is disabled.
1 means the FILESTREAM support for T-SQL is enabled.
2 means the FILESTREAM support for T-SQL and Win32 streaming access is enabled.
You can also change the FILESTREAM access level using SQL Server Management Studio. To do that, right-click a SQL Server connection >> select Properties >> in the Server Properties dialog box, select FILESTREAM Access Level from the drop-down box and select Full Access Enabled.
Once the parameter is changed, restart SQL Server services.
Add FILESTREAM Filegroup and Data Files
Once FILESTREAM is enabled, add the FILESTREAM filegroup and FILESTREAM container.
To do that, right-click the FileStream_Demo database >> select Properties >> in the left pane of the Database Properties dialog box, select Filegroups >> in the FILESTREAM grid, click the Add Filegroup button >> name the filegroup Dummy-Documents.
Once the filegroup is created, in the Database Properties dialog box, select Files and click the Add button. The database files grid becomes enabled. In the Logical Name column, provide the name Dummy-Documents. Select FILESTREAM Data in the File Type drop-down box. Select Dummy-Documents in the Filegroup column. In the Path column, provide the directory location where the files will be stored (E:\Dummy-Documents).
Alternatively, you can add the FILESTREAM filegroup and containers by executing the following T-SQL Query:
USE [master]
GO
ALTER DATABASE [FileStream_Demo] ADD FILEGROUP [Dummy-Documents] CONTAINS FILESTREAM
GO
ALTER DATABASE [FileStream_Demo] ADD FILE
(
    NAME = N'Dummy-Documents',
    FILENAME = N'E:\Dummy-Documents'
)
TO FILEGROUP [Dummy-Documents]
GO
To verify that the FILESTREAM container has been created, open Windows Explorer and navigate to the "E:\Dummy-Documents" directory.
You will see that a $FSLOG directory and a filestream.hdr file have been created. $FSLOG is like the SQL Server transaction log, and filestream.hdr contains FILESTREAM metadata. Make sure that you do not change or edit those files.
Store Files in SQL table
In this demo, we will create a table to store various files from the computer, along with their metadata (root directory, file name, attributes, creation date, and size).
Create a SQL Table with a FILESTREAM column
Once FILESTREAM is configured, create a SQL table with a FILESTREAM column to store files in the SQL Server table. As mentioned above, FILESTREAM is not a datatype; it is an attribute added to a varbinary(max) column in a FILESTREAM-enabled table. When you create a FILESTREAM-enabled table, make sure that you include a UNIQUEIDENTIFIER column that has the ROWGUIDCOL and UNIQUE attributes.
Execute the following script to create a FILESTREAM-enabled table:
USE [FileStream_Demo]
GO
CREATE TABLE [DummyDocuments]
(
    ID UNIQUEIDENTIFIER ROWGUIDCOL UNIQUE NOT NULL,
    RootDirectory VARCHAR(MAX),
    FileName VARCHAR(MAX),
    FileAttribute VARCHAR(150),
    FileCreateDate DATETIME,
    FileSize NUMERIC(10,5),
    FileStreamCol VARBINARY(MAX) FILESTREAM
)
Insert Data in Table
I have the WorldWide_Importors.xls document stored on the computer at the "E:\Documents" location. Use OPENROWSET(BULK) to load its content from disk into a VARBINARY(MAX) variable, then insert the variable into the FileStreamCol (VARBINARY(MAX)) column of the DummyDocuments table. To do that, run the following script:
USE [FileStream_Demo]
GO

DECLARE @Document AS VARBINARY(MAX)

-- Load the file content
SELECT @Document = CAST(bulkcolumn AS VARBINARY(MAX))
FROM OPENROWSET(
    BULK 'E:\Documents\WorldWide_Importors.xls',
    SINGLE_BLOB ) AS Doc

-- Insert the data into the table
INSERT INTO [DummyDocuments]
    (ID, RootDirectory, FileName, FileAttribute, FileCreateDate, FileSize, FileStreamCol)
SELECT NEWID(), 'E:\Documents', 'WorldWide_Importors.xls', 'Raw', GETDATE(), 10, @Document
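The same load can also be driven from a client program instead of OPENROWSET. Below is a minimal Python sketch that reads a file into bytes and builds the parameter tuple for a parameterized INSERT into the DummyDocuments table; the actual database connection (e.g. via pyodbc) is omitted, and the helper name is mine:

```python
import os
import uuid
from datetime import datetime

def build_document_row(path, attribute="Raw"):
    """Read a file from disk and build the parameter tuple for a
    parameterized INSERT into DummyDocuments."""
    with open(path, "rb") as f:
        data = f.read()
    directory, name = os.path.split(path)
    return (
        str(uuid.uuid4()),   # ID (client-side NEWID() equivalent)
        directory,           # RootDirectory
        name,                # FileName
        attribute,           # FileAttribute
        datetime.now(),      # FileCreateDate
        len(data),           # FileSize (here: bytes, unlike the hard-coded 10 above)
        data,                # FileStreamCol payload
    )

INSERT_SQL = (
    "INSERT INTO DummyDocuments "
    "(ID, RootDirectory, FileName, FileAttribute, FileCreateDate, FileSize, FileStreamCol) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)
```

With a pyodbc-style driver one would then call `cursor.execute(INSERT_SQL, build_document_row(path))`.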
Access FILESTREAM Data
The FILESTREAM data can be accessed using either T-SQL or the Win32 managed API. When the FILESTREAM column is accessed using a T-SQL query, SQL Server uses its own memory to read the content of the data file and send it to the client application. When the column is accessed using the Win32 streaming API, SQL Server memory is not used; instead, the streaming capability of the NTFS file system is used, which gives performance benefits.
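On the client side, reading over the FILESTREAM share reduces to ordinary streamed file I/O. The sketch below simply streams a file path in fixed-size chunks; in a real deployment the path would be the UNC name returned by the T-SQL PathName() function of the FILESTREAM column, opened inside a transaction context, but that part is assumed here and the sketch works on any local file:

```python
import hashlib

def stream_file(path, chunk_size=64 * 1024):
    """Yield a file's content in fixed-size chunks, the way a Win32
    streaming client reads a FILESTREAM blob without buffering it all."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

def checksum(path):
    """Hash the streamed content without loading it all into memory."""
    h = hashlib.sha256()
    for chunk in stream_file(path):
        h.update(chunk)
    return h.hexdigest()
```

The point of the generator is that memory use stays bounded by chunk_size no matter how large the stored file is.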
Oracle
Here I want to consider various methods of getting "documents"
into the Oracle database.
None of these methods are completely trivial, and the choice of
which to use will depend on various factors.
What are the characteristics of a "document"? In general, I'm
referring to text which may be too long to fit into a PL/SQL
VARCHAR2 variable. These variables are limited to 32767 bytes. The
document most likely originates from a file on a file system, and
it may be in a character set which differs from the database
character set. In many cases, documents will have metadata (such as
author and creation date) that we want to load at the same
time.
We should distinguish between four types of data:
Loading a mixture of these types of data can be problematic. If we want to load metadata and full text from a single file, then we need some method of determining where the full text starts and finishes. The easiest way to do that is to enclose the full text in some sort of quotes. But then what if the full text contains those quote characters? So it's usually easier to arrange for each full text document to exist in a separate file, referenced from the main loading method.
It would be impossible to load multiple binary data records from a single file, unless the binary data was first encoded using a scheme such as BASE64, since by its nature you can never specify a delimiter string that could not appear in the binary data.
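To illustrate that point: BASE64 output is plain ASCII, so a delimiter that can never occur inside the encoded payload (such as a newline) can be chosen freely. A small Python sketch (the record format is invented for illustration):

```python
import base64

def pack_records(blobs):
    """Encode each binary record as BASE64 and join with newlines,
    which cannot appear inside the encoded payload."""
    return "\n".join(base64.b64encode(b).decode("ascii") for b in blobs)

def unpack_records(text):
    """Recover the original binary records from the packed text."""
    return [base64.b64decode(line) for line in text.split("\n") if line]
```

The round trip is lossless even when the raw records themselves contain newlines or any other byte value.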
For these examples, we're going to consider three use-cases:
1. Many JSON documents stored in a single file, one document per line.
2. Text documents held in separate files, one per database row, loaded together with their metadata.
3. Binary documents loaded into BLOB columns.
Client / Server considerations - where is your source data?
If your client and server are the same machine, then any of
these methods should work. However, if you are connecting to your
database from a remote client over SQL*Net, you might first decide
whether you want to load files from the client (the machine from
which you connect) or the server (the machine where the database
resides).
Load Method     | Data on Client | Data on Server
----------------|----------------|----------------
SQL*Plus        | Yes            | No
SQL*Loader      | Yes            | No
External Tables | No             | Yes
BFILEs          | No             | Yes
Method 1: Using SQL*Plus Scripts
Let's say we have a table with several short columns and a CLOB column:
CREATE TABLE my_table (id NUMBER, author VARCHAR2(30), created DATE, text CLOB );
If I just want to do some "quick and dirty" loading of this table, I can just create a SQL script file and run that in SQL*Plus. For example:
INSERT INTO my_table VALUES ( 1, 'Roger', SYSDATE, 'the quick brown fox jumps over the lazy dog' );
INSERT INTO my_table VALUES ( 2, 'John', SYSDATE, 'the angry alligator ate the antelope');
COMMIT;
Simple and quick, and fairly easy to generate from source files using a language such as Perl or Python. But do this with any quantity of data and you'll rapidly hit limitations. Single quotes inside the text must be doubled up:
INSERT INTO my_table VALUES ( 3, 'Roger', SYSDATE, 'my best friend''s brother said ''hi''');
And SQL*Plus has line-length limits, so longer text must be split across concatenated literals:
INSERT INTO my_table VALUES ( 4, 'Roger', SYSDATE, 'this is a long line that continues over more than one line' || ' in the source code file');
Let's take our usecase 1 - the JSON documents. We could
cut-and-paste from the file into a hand-coded script, but it's
probably better to do it using a text-processing language such as
Perl. The attached file jsonToSql.pl replaces any single quotes
with two single quotes, and breaks each line into chunks no longer
than 2400 characters (allowing some space for syntactic
overheads).
Assuming we have perl installed and available, we can call it from
the command line (Windows or Unix) using
perl jsonToSql.pl jsonsource.txt > json.sql
And we can then load the resulting file json.sql (after creating the necessary table - see above) using
sqlplus user/pass @json.sql
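The attached Perl script isn't reproduced here, but its two transformations are easy to sketch. The Python below (function names are mine) doubles single quotes and splits long values into concatenated chunks, matching the 2400-character limit described above:

```python
def sql_literal(text, chunk=2400):
    """Split a long string into chunks, then escape single quotes in
    each chunk and join them as 'aaa' || 'bbb' so no generated line
    exceeds SQL*Plus limits. Chunking before escaping avoids splitting
    a doubled quote pair across two literals."""
    pieces = [text[i:i + chunk] for i in range(0, len(text), chunk)] or [""]
    return " || ".join("'" + p.replace("'", "''") + "'" for p in pieces)

def insert_statement(doc_id, author, text):
    """Build one INSERT statement in the style used above."""
    return ("INSERT INTO my_table VALUES ( %d, '%s', SYSDATE, %s );"
            % (doc_id, author, sql_literal(text)))
```

Feeding each line of jsonsource.txt through insert_statement and writing the results to json.sql gives the same effect as the Perl script.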
SQL*Plus isn't very good for loading from separate files, so we'll leave scenarios two and three out for this method.
Multibyte Considerations
SQL*Plus isn't very good at dealing with multibyte / UTF-8 characters. To get correct (and viewable!) output you will probably want to use SQL Developer to run the scripts.
Method 2: Using SQL*Loader
SQL*Loader is a utility from the very early days of Oracle. It's
reasonably simple to use, but the vast number of options can make
it intimidating for the uninitiated.
SQL*Loader loads files from the client. It has no access
to files on the server.
It is normal to provide SQL*Loader with two inputs: A data file
containing the data to be loaded to the database, and a control
file which specifies how the data file is laid out, and which
columns the data should be written to (it's possible to include the
data inside the control file, but we'll stick with separate files
for this example).
Use case 1: JSON documents in a single file
For this use case we'll load each document into a table jsondocs, which has a single CLOB column called jsontext (these are the names the control file below refers to):
CREATE TABLE jsondocs ( jsontext CLOB );
This is fairly straightforward. We create a control file called
jsonloader.ctl which consists of
LOAD DATA
INFILE 'jsonsource.txt'
INTO TABLE jsondocs
FIELDS
(
  jsontext CHAR(32000)
)
Then we load it using the command:
sqlldr roger/roger control=jsonloader.ctl log=loader.log bad=loader.bad
The "log" and "bad" arguments aren't required, but can be most
useful for debugging.
Note that we've limited the length of the jsontext field to 32000
characters, but you can increase that size if you need to.
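SQL*Loader treats each line of jsonsource.txt as one record, so each JSON document must sit on a single line. json.dumps with default settings never emits newlines, which makes generating such a file straightforward (file and function names here are illustrative):

```python
import json

def write_json_source(docs, path):
    """Write one JSON document per line -- the layout that the
    jsonloader.ctl control file above expects."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")
```

Documents produced this way load cleanly as long as each serialized line stays under the declared field length.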
Use case 2: Loading from separate files
A quick reminder about our second scenario. We have a table which
we'll define as:
CREATE TABLE my_table (id NUMBER, author VARCHAR2(30), created DATE, text CLOB );
and into this table we want to load the files file1.txt and
file2.txt
file1.txt:
The quick brown fox jumps over the lazy dog
file2.txt:
The angry alligator ate the antelope.
These files only have one line in each, but they could just as
easily have multiple lines per file. One file is loaded into each
database record.
If we want to use SQL*Loader to load CLOB or BLOB data from
separate files, then the filenames for those files must be included
in the data file. We use the "FILLER" syntax to specify that the
filename isn't to be loaded into any column, and then the LOBFILE
syntax to load the text into a column.
So here's our control file. We can see that it's similar to the
previous control file but the "FIELDS" section is more complicated,
to allow for multiple metadata fields, and the reference to the
external file to be loaded.
loader.ctl:
LOAD DATA
INFILE 'loader.txt'
INTO TABLE my_table
FIELDS TERMINATED BY ','
(
  id      CHAR(10),
  author  CHAR(30),
  created DATE "YYYY-MM-DD",
  fname   FILLER CHAR(80),
  text    LOBFILE(fname) TERMINATED BY EOF
)
"fname" is an arbitrary label, we could have used "fred" and it
would have worked exactly the same. It just needs to be the same on
the two lines where it is used.
Note that the control file includes the name of our data file,
loader.txt. loader.txt contains the metadata associated with each
file, and the filename.
loader.txt:
1,John Smith,2015-04-29,file1.txt
2,Pete Jones,2013-01-31,file2.txt
So we're loading two files, along with the metadata for each.
Now we can run SQL*Loader from the command line as follows:
sqlldr roger/roger control=loader.ctl log=loader.log bad=loader.bad
(again, the "log" and "bad" arguments aren't required, but can
be most useful for debugging).
Then we can use SQL*Plus or SQL Developer to examine our loaded
table:
SQL> select * from my_table;

        ID AUTHOR           CREATED   TEXT
---------- ---------------- --------- ------------------------------------------------
         1 John Smith       29-APR-15 The quick brown fox jumps over the lazy dog
         2 Pete Jones       31-JAN-13 The angry alligator ate the antelope.
Common mistakes:
Do not be tempted to add spacing to format the loader.txt file nicely. If you use
1, John Smith, 2015-04-29, file1.txt
Then not only will the author be loaded with a space in front of "John", but you'll see an obscure error:
SQL*Loader-509: System error: The device does not recognize the command. SQL*Loader-502: unable to open data file ' file1.txt' for field TEXT table
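If loader.txt is generated by a script, keep the fields flush against the commas. A small Python helper (the name is mine) that builds one record per call and rejects fields containing the delimiter:

```python
def loader_line(rec_id, author, created, filename):
    """Build one loader.txt record with no padding around the commas;
    SQL*Loader would otherwise treat ' file1.txt' (with the leading
    space) as the literal filename and fail to open it."""
    fields = [str(rec_id), author, created, filename]
    for f in fields:
        if "," in f:
            raise ValueError("field contains the delimiter: %r" % f)
    return ",".join(fields)
```

Joining such lines with newlines reproduces the loader.txt shown above exactly.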
Multibyte Considerations
So what if your file contains multi-byte characters? Perhaps
some Chinese text mixed up with English?
You will simply need to specify the character set of the file in
the LOBFILE directive. For example, I have a file encoded in UTF-8,
so I modify the control file to read:
LOAD DATA
INFILE 'loader.txt'
INTO TABLE my_table
FIELDS TERMINATED BY ','
(
  id      CHAR(10),
  author  CHAR(30),
  created DATE "YYYY-MM-DD",
  fname   FILLER CHAR(80),
  text    LOBFILE(fname CHARACTERSET UTF8) TERMINATED BY EOF
)
Loading Binary Files with SQL*Loader
The above method (without the CHARACTERSET addition) works exactly the same if our column is a binary BLOB column rather than a character CLOB. There is no difference in the loading mechanism - all that changes is that the database does not attempt any kind of character set conversion on the data as it is loaded into the database.
Method 3: Using BFILES and PL/SQL
A BFILE is a non-standard datatype which provides
read-only access to data held on an external file system.
NOTE: This method will only work with files on the server.
BFILES have no access to files on the client machine.
It's possible to work directly with BFILEs, but it's perhaps
more common to use them as a kind of temporary staging area to
allow you to read data into database columns.
BFILEs require a DIRECTORY - which is an internal database
pointer to a folder/directory on the file system. To create a
directory you must have CREATE ANY DIRECTORY privilege - or you can
have a DBA create the directory and grant you access to it. This
provides a layer of protection to prevent users from reading or
writing files on the database server.
The supplied package DBMS_LOB provides a number of
utility functions, including LOADFROMFILE.
On the face of it, this might seem all we need, but in fact there's
some preparation work we need to do.
LOADFROMFILE reads from an open BFILE into a LOB locator. So we need to
obtain a LOB locator for the destination row, open the BFILE, call
LOADFROMFILE, and then close the BFILE afterwards.
The following example shows a procedure which writes to our
previously-created my_table.
Firstly it creates a BFILE from the supplied directory and
filename, then it inserts the metadata into my_table, fetching back
the lob locator for the newly created row. It then uses
LOADFROMFILE to write into that lob:
CREATE SEQUENCE loadtest_seq;

CREATE TABLE my_table
  (id NUMBER, author VARCHAR2(100), load_date DATE, text CLOB);

CREATE OR REPLACE PROCEDURE load_a_file
  ( dir_name  VARCHAR2,
    file_name VARCHAR2,
    author    VARCHAR2 ) IS
  l_bfile BFILE;
  l_clob  CLOB;
BEGIN
  l_bfile := BFILENAME(dir_name, file_name);
  IF dbms_lob.fileexists(l_bfile) = 1 THEN
    dbms_output.put_line('File Exists');
    INSERT INTO my_table
    VALUES (loadtest_seq.NEXTVAL, author, SYSDATE, EMPTY_CLOB())
    RETURNING text INTO l_clob;
    dbms_lob.fileopen(l_bfile, dbms_lob.file_readonly);
    dbms_lob.loadfromfile(l_clob, l_bfile, dbms_lob.getlength(l_bfile));
    dbms_lob.fileclose(l_bfile);
    COMMIT;
  ELSE
    dbms_output.put_line('File does not exist');
  END IF;
END;
/
To use this procedure, we might do:
CREATE OR REPLACE DIRECTORY my_directory
  AS '/scratch/raford/Projects/loadingFiles';
-- remember this needs the CREATE ANY DIRECTORY privilege
EXECUTE load_a_file( 'MY_DIRECTORY', 'file1.txt', 'Roger');
Note that in this case we're using SYSDATE for the creation date,
which varies slightly from previous examples. We could of course
include create_date as another argument to our procedure.