Question

In: Computer Science

Can someone explain the difference between how Oracle and SQL Server store files, with steps and examples?

Solutions

Expert Solution

SQL Server

To configure FileStream in SQL Server

  1. Enable the FILESTREAM feature from SQL Server Configuration Manager.
  2. Enable the FILESTREAM access level on the SQL Server instance.
  3. Create a FILESTREAM filegroup and a FileStream container to store BLOB data.

To enable FileStream on any database,

1. First, enable the FILESTREAM feature on the SQL Server instance. To do that, open SQL Server Configuration Manager, right-click the SQL Server instance, and select Properties.

A dialog box to configure server properties opens.

Switch to the FILESTREAM tab. Select Enable FILESTREAM for Transact-SQL access, select Enable FILESTREAM for file I/O access, and then select Allow remote clients access to FILESTREAM data. In the Windows share name text box, provide the name of the directory that will store the files.

Click OK and restart the SQL service.

Enable FILESTREAM Access Level on SQL Server Instance

Once the FILESTREAM feature is enabled, change the FILESTREAM access level by executing the following query:

EXEC sp_configure filestream_access_level, 2

RECONFIGURE

In the above query, the valid values for filestream_access_level are:

0 – FILESTREAM support for the SQL Server instance is disabled.

1 – FILESTREAM support for T-SQL access is enabled.

2 – FILESTREAM support for both T-SQL and Win32 streaming access is enabled.

You can also change the FILESTREAM access level using SQL Server Management Studio. To do that, right-click the SQL Server connection, select Properties, and in the Server Properties dialog box set FILESTREAM Access Level to Full access enabled.

Once the parameter is changed, restart SQL Server services.

Add FILESTREAM Filegroup and Data Files

Once FILESTREAM is enabled, add the FILESTREAM filegroup and FILESTREAM container.

To do that, right-click the FileStream_Demo database and select Properties. In the left pane of the Database Properties dialog box, select Filegroups, click the Add Filegroup button in the FILESTREAM grid, and name the filegroup Dummy-Documents.

Once the filegroup is created, select Files in the Database Properties dialog box and click the Add button. The database files grid is enabled. In the Logical Name column, enter Dummy-Documents. Select FILESTREAM Data in the File Type drop-down box, select Dummy-Documents in the Filegroup column, and in the Path column provide the directory where the files will be stored (E:\Dummy-Documents).

Alternatively, you can add the FILESTREAM filegroup and containers by executing the following T-SQL Query:

USE [master]

GO

ALTER DATABASE [FileStream_Demo] ADD FILEGROUP [Dummy-Documents] CONTAINS FILESTREAM

GO

ALTER DATABASE [FileStream_Demo] ADD FILE ( NAME = N'Dummy-Documents', FILENAME = N'E:\Dummy-Documents' ) TO FILEGROUP [Dummy-Documents]

GO

To verify that the FILESTREAM container has been created, open Windows Explorer and navigate to the E:\Dummy-Documents directory.

Inside the container you will find a $FSLOG directory and a filestream.hdr file. $FSLOG acts like the SQL Server transaction log, and filestream.hdr contains FILESTREAM metadata. Make sure that you do not change or delete these files.
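That verification step can also be scripted. Here's a rough Python sketch; a temporary directory stands in for the real E:\Dummy-Documents container, since the actual container and its contents are created by SQL Server itself:

```python
import os
import tempfile

# A stand-in for the real container directory (assumption: the actual
# container is E:\Dummy-Documents, populated by SQL Server).
container = tempfile.mkdtemp()
os.mkdir(os.path.join(container, "$FSLOG"))
open(os.path.join(container, "filestream.hdr"), "w").close()

# The verification: both entries should be present, and neither
# should ever be edited or deleted by hand.
entries = set(os.listdir(container))
assert {"$FSLOG", "filestream.hdr"} <= entries
```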

Store Files in SQL table

In this demo, we will create a table to store various files from the computer. The table has the following columns:

  1. The “RootDirectory” column to store file location.
  2. The “Filename” column to store the name of the file.
  3. The “FileAttribute” column to store the file attribute (Raw/Directory).
  4. The “FileCreateDate” column to store file creation time.
  5. The “FileSize” column to store the Size of the file.
  6. The “FileStreamCol” column to store the content of the file in the binary format.

Create a SQL Table with a FILESTREAM column

Once FILESTREAM is configured, create a SQL table with a FILESTREAM column to store various files in the SQL Server table. As mentioned above, FILESTREAM is not a datatype; it is an attribute added to a varbinary(max) column in a FILESTREAM-enabled table. When you create a FILESTREAM-enabled table, make sure that you include a UNIQUEIDENTIFIER column that has the ROWGUIDCOL and UNIQUE attributes.

Execute the following script to create a FILESTREAM-enabled table:


Use [FileStream_Demo]

go

Create Table [DummyDocuments]

(

    ID uniqueidentifier ROWGUIDCOL unique NOT NULL,

    RootDirectory varchar(max),

    FileName varchar(max),

    FileAttribute varchar(150),

    FileCreateDate datetime,

    FileSize numeric(10,5),

    FileStreamCol varbinary (max) FILESTREAM

)

Insert Data in Table

I have the WorldWide_Importors.xls document stored on the computer at E:\Documents. Use OPENROWSET(BULK) to load its content from disk into a VARBINARY(MAX) variable, then store that variable in the FileStreamCol (VARBINARY(MAX)) column of the DummyDocuments table. To do that, run the following script:


Use [FileStream_Demo]

Go

DECLARE @Document AS VARBINARY(MAX)

-- Load the document content from disk

SELECT @Document = CAST(bulkcolumn AS VARBINARY(MAX))

      FROM OPENROWSET(

            BULK

            'E:\Documents\WorldWide_Importors.xls',

            SINGLE_BLOB ) AS Doc

            

-- Insert the data to the table          

INSERT INTO [DummyDocuments] (ID, RootDirectory,FileName, FileAttribute, FileCreateDate,FileSize,FileStreamCol)

SELECT NEWID(), 'E:\Documents','WorldWide_Importors.xls','Raw',getdate(),10, @Document

Access FILESTREAM Data

FILESTREAM data can be accessed using T-SQL or the Win32 managed API. When a FILESTREAM column is accessed through a T-SQL query, SQL Server uses its own memory to read the content of the data file and send it to the client application. When the column is accessed through the Win32 streaming API, SQL Server memory is not used; instead, the streaming capability of the NTFS file system is used, which gives performance benefits.

Oracle

Here I want to consider various methods of getting "documents" into the Oracle database.

None of these methods are completely trivial, and the choice of which to use will depend on various factors.

What are the characteristics of a "document"? In general, I'm referring to text which may be too long to fit into a PL/SQL VARCHAR2 variable. These variables are limited to 32767 bytes. The document most likely originates from a file on a file system, and it may be in a character set which differs from the database character set. In many cases, documents will have metadata (such as author and creation date) that we want to load at the same time.

We should distinguish between four types of data:

  • Metadata. This is short textual, numeric or date data which describes the main document. For example: document id, author, date created.
  • Semi-structured Data. These are textual strings in a specific format, such as JSON or XML. We may be able to make assertions about such data, such as "will always start with '<'" or "does not contain any newline characters".
  • Full Text. This is text of arbitrary length. It can, and usually will, contain all sorts of characters, including newlines and quotes. It may consist of multibyte characters in UTF-8 or UTF-16 (or other) encodings.
  • Binary Data. Arbitrary length data which is not character encoded. May contain any bytes or combinations of bytes including nulls and end-of-file markers.

Loading a mixture of these types of data can be problematic. If we want to load metadata and full text from a single file, then we need some method of determining where the full text starts and finishes. The easiest way to do that is to enclose the full text in some sort of quotes. But then what if the full text contains those quote characters? So it's usually easier to arrange for each full text document to exist in a separate file, referenced from the main loading method.

It would be impossible to load multiple binary data records from a single file unless the binary data were first encoded using a scheme such as Base64, since by its nature you can never choose a delimiter string that could not appear in the binary data.
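To illustrate that workaround, here is a small Python sketch: each binary record is Base64-encoded, which guarantees the output contains only a safe ASCII alphabet, so an ordinary newline delimiter can never collide with the data.

```python
import base64

# Two binary payloads that both contain newlines and null bytes --
# unsafe to store raw in a newline-delimited file.
records = [b"\x00\x01\nBINARY|DATA\x1a", b"second\nrecord|with|pipes"]

# Base64 output uses only A-Z, a-z, 0-9, +, /, = -- no newlines,
# so joining with "\n" is always unambiguous.
encoded_file = "\n".join(base64.b64encode(r).decode("ascii") for r in records)

# A loader can split on newlines and decode each record safely.
decoded = [base64.b64decode(line) for line in encoded_file.splitlines()]
assert decoded == records
```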

For these examples, we're going to consider three use-cases:

  1. A set of JSON documents. These documents reside in a single file, one per line.
  2. A set of files, which contain one document per file. Each file/document has metadata associated with it in a "summary" file.
  3. As above, but one of the files contains multibyte (UTF-8) data.

Client / Server considerations - where is your source data?

If your client and server are the same machine, then any of these methods should work. However, if you are connecting to your database from a remote client over SQL*Net, you might first decide whether you want to load files from the client (the machine from which you connect) or the server (the machine where the database resides).

Load Method        Data on Client    Data on Server
SQL*Plus           Yes               No
SQL*Loader         Yes               No
External Tables    No                Yes
BFILEs             No                Yes

Method 1: Using SQL*Plus Scripts

Let's say we have a table with several short columns and a CLOB column:

CREATE TABLE my_table 
  (id        NUMBER,
   author    VARCHAR2(30),
   created   DATE,
   text      CLOB
);

If I just want to do some "quick and dirty" loading of this table, I can just create a SQL script file and run that in SQL*Plus. For example:

INSERT INTO my_table VALUES ( 1, 'Roger', SYSDATE, 'the quick brown fox jumps over the lazy dog' );
INSERT INTO my_table VALUES ( 2, 'John', SYSDATE, 'the angry alligator ate the antelope');
COMMIT;

Simple and quick, and fairly easy to generate from source files using a language such as Perl or Python. But do this with any quantity of data and you'll rapidly hit limitations:

  • Any single quotes in the text need to be doubled up:
    INSERT INTO my_table VALUES ( 3, 'Roger', SYSDATE, 'my best friend''s brother said ''hi''');
  • Any ampersand ("&") characters in the script will cause problems unless you've put SET DEFINE OFF somewhere in the script
  • An input line to SQL*Plus cannot be longer than 2499 characters. If you want to include more than this you'll need to concatenate strings:
    INSERT INTO my_table VALUES ( 3, 'Roger', SYSDATE, 'this is a long line that continues over more than one line'
       || ' in the source code file');
  • Even then, that concatenation method will only allow you to assemble up to 4000 characters. To continue longer than that would require you to create temporary lobs, and use DBMS_LOB.APPEND for each chunk.
  • It's slower than most other methods which will be an issue if you're loading large quantities of data.

Let's take our use case 1, the JSON documents. We could cut and paste from the file into a hand-coded script, but it's probably better to do it using a text-processing language such as Perl. The attached file jsonToSql.pl replaces any single quotes with two single quotes, and breaks each line into chunks no longer than 2400 characters (allowing some space for syntactic overhead).
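The Perl script itself is attached to the original answer and not reproduced here; as a rough illustration of the same idea, here is a Python sketch. It quotes, escapes, and chunks one document into a single INSERT for my_table (the 2400-character chunk size is the same safety margin; note that doubling quotes can lengthen a chunk, so this is a sketch rather than a watertight implementation):

```python
# Rough Python equivalent of the jsonToSql.pl approach described above.
CHUNK = 2400  # stay safely under SQL*Plus's 2499-character line limit


def to_insert(doc_id, author, text):
    """Emit an INSERT for my_table, doubling quotes and chunking long text."""
    chunks = [text[i:i + CHUNK] for i in range(0, len(text), CHUNK)] or [""]
    parts = ["'%s'" % c.replace("'", "''") for c in chunks]
    literal = "\n  || ".join(parts)  # concatenate chunks across lines
    return ("INSERT INTO my_table VALUES ( %d, '%s', SYSDATE, %s );"
            % (doc_id, author.replace("'", "''"), literal))


print(to_insert(1, "Roger", "my best friend's brother said 'hi'"))
```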

Assuming we have perl installed and available, we can call it from the command line (Windows or Unix) using

perl jsonToSql.pl jsonsource.txt > json.sql

And we can then load the resulting file json.sql (after creating the necessary table - see above) using

sqlplus user/pass @json.sql 

SQL*Plus isn't very good for loading from separate files, so we'll leave scenarios two and three out for this method.

Multibyte Considerations

SQL*Plus isn't very good at dealing with multibyte / UTF-8 characters. To get correct (and viewable!) output you will probably want to use SQL Developer to run the scripts.

Method 2: Using SQL*Loader

SQL*Loader is a utility from the very early days of Oracle. It's reasonably simple to use, but the vast number of options can make it intimidating for the uninitiated.

SQL*Loader loads files from the client. It has no access to files on the server.

It is normal to provide SQL*Loader with two inputs: A data file containing the data to be loaded to the database, and a control file which specifies how the data file is laid out, and which columns the data should be written to (it's possible to include the data inside the control file, but we'll stick with separate files for this example).

Use case 1: JSON documents in a single file

For this use case we'll load into a simple staging table, jsondocs, with a single CLOB column, jsontext, matching the control file below.

This is fairly straightforward. We create a control file called jsonloader.ctl which consists of

LOAD DATA
INFILE 'jsonsource.txt'
  INTO TABLE jsondocs
  FIELDS
   ( jsontext CHAR(32000) )

Then we load it using the command:

sqlldr roger/roger control=jsonloader.ctl log=loader.log bad=loader.bad

The "log" and "bad" arguments aren't required, but can be most useful for debugging.

Note that we've limited the length of the jsontext field to 32000 characters, but you can increase that size if you need to.

Use case 2: Loading from separate files

A quick reminder about our second scenario. We have a table which we'll define as:

CREATE TABLE my_table 
  (id        NUMBER,
   author    VARCHAR2(30),
   created   DATE,
   text      CLOB
);

and into this table we want to load the files file1.txt and file2.txt

file1.txt:

The quick brown fox jumps over the lazy dog

file2.txt:

The angry alligator ate the antelope.

These files only have one line in each, but they could just as easily have multiple lines per file. One file is loaded into each database record.

If we want to use SQL*Loader to load CLOB or BLOB data from separate files, then the filenames for those files must be included in the data file. We use the "FILLER" syntax to specify that the filename isn't to be loaded into any column, and then the LOBFILE syntax to load the text into a column.

So here's our control file. We can see that it's similar to the previous control file but the "FIELDS" section is more complicated, to allow for multiple metadata fields, and the reference to the external file to be loaded.

loader.ctl:

LOAD DATA
INFILE 'loader.txt'
  INTO TABLE my_table
  FIELDS TERMINATED BY ','
  ( id         CHAR(10),
    author     CHAR(30),
    created    DATE "YYYY-MM-DD" ":created",
    fname      FILLER CHAR(80),
    text       LOBFILE(fname) TERMINATED BY EOF
  )

"fname" is an arbitrary label, we could have used "fred" and it would have worked exactly the same. It just needs to be the same on the two lines where it is used.

Note that the control file includes the name of our data file, loader.txt. loader.txt contains the metadata associated with each file, and the filename.

loader.txt:
1,John Smith,2015-04-29,file1.txt
2,Pete Jones,2013-01-31,file2.txt

So we're loading two files, along with the metadata for each.
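If loader.txt is being generated from a larger set of files, a small script helps keep the format strict. Here is a hedged Python sketch using the same two example rows as hypothetical metadata; the important detail is that no spaces follow the commas, since SQL*Loader would otherwise look for a file named " file1.txt":

```python
import os
import tempfile
from datetime import date

# Hypothetical metadata for the files to load; in practice this might be
# scanned from a directory or read from a manifest.
docs = [
    (1, "John Smith", date(2015, 4, 29), "file1.txt"),
    (2, "Pete Jones", date(2013, 1, 31), "file2.txt"),
]

# One line per record: id,author,YYYY-MM-DD,filename -- no padding spaces.
lines = ["%d,%s,%s,%s" % (i, a, d.isoformat(), f) for i, a, d, f in docs]

# Write the data file (a temporary directory is used here for the sketch).
path = os.path.join(tempfile.mkdtemp(), "loader.txt")
with open(path, "w") as out:
    out.write("\n".join(lines) + "\n")
```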

Now we can run SQL*Loader from the command line as follows:

sqlldr roger/roger control=loader.ctl log=loader.log bad=loader.bad

(again, the "log" and "bad" arguments aren't required, but can be most useful for debugging).

Then we can use SQL*Plus or SQL Developer to examine our loaded table:

SQL> select * from my_table;
    ID AUTHOR        CREATED   TEXT
---------- ---------------- --------- ------------------------------------------------
     1 John Smith        29-APR-15 The quick brown fox jumps over the lazy dog
     2 Pete Jones        31-JAN-13 The angry alligator ate the antelope.

Common mistakes:

Do not be tempted to add spacing to format the loader.txt file nicely. If you use

1, John Smith, 2015-04-29, file1.txt

Then not only will the author be loaded with a leading space before "John", but you'll also see an obscure error:

SQL*Loader-509: System error: The device does not recognize the command.
SQL*Loader-502: unable to open data file ' file1.txt' for field TEXT table 

Multi-byte considerations

So what if your file contains multi-byte characters? Perhaps some Chinese text mixed up with English?
You will simply need to specify the character set of the file in the LOBFILE directive. For example, I have a file encoded in UTF-8, so I modify the control file to read:

LOAD DATA
INFILE 'loader.txt'
  INTO TABLE my_table
  FIELDS TERMINATED BY ','
  ( id         CHAR(10),
    author     CHAR(30),
    created    DATE "YYYY-MM-DD" ":created",
    fname      FILLER CHAR(80),
    text       LOBFILE(fname CHARACTERSET UTF8) TERMINATED BY EOF
  )

Loading Binary Files with SQL*Loader

The above method (without the CHARACTERSET addition) will work exactly the same if our column is a binary BLOB column rather than a character CLOB. There is no difference in the loading mechanism; all that changes is that the database does not attempt any kind of character set conversion on the data as it is loaded into the database.

Method 3: Using BFILES and PL/SQL

A BFILE is a non-standard datatype which provides read-only access to data held on an external file system.

NOTE: This method will only work with files on the server. BFILES have no access to files on the client machine.

It's possible to work directly with BFILEs, but it's perhaps more common to use them as a kind of temporary staging area to allow you to read data into database columns.

BFILEs require a DIRECTORY - which is an internal database pointer to a folder/directory on the file system. To create a directory you must have CREATE ANY DIRECTORY privilege - or you can have a DBA create the directory and grant you access to it. This provides a layer of protection to prevent users from reading or writing files on the database server.

There is a provided package DBMS_LOB which provides a number of utility functions including LOADFROMFILE.

On the face of it, this might seem all we need, but in fact there's some preparation work we need to do.

LOADFROMFILE needs a LOB locator to write into, and the source BFILE must be opened first. So we insert a row containing an empty CLOB, fetch back its LOB locator, open the BFILE, call LOADFROMFILE, and close the BFILE afterwards.

The following example shows a procedure which writes to our previously-created my_table.

Firstly it creates a BFILE from the supplied directory and filename, then it inserts the metadata into my_table, fetching back the lob locator for the newly created row. It then uses LOADFROMFILE to write into that lob:

CREATE SEQUENCE loadtest_seq;
CREATE TABLE my_table(id NUMBER, author VARCHAR2(100), load_date DATE, text CLOB);
CREATE OR REPLACE PROCEDURE LOAD_A_FILE
   ( dir_name   VARCHAR2,
     file_name  VARCHAR2,
     author     VARCHAR2
   ) IS
  l_bfile   BFILE;
  l_clob    CLOB;
BEGIN
   l_bfile := BFILENAME(dir_name, file_name);
   IF (dbms_lob.fileexists(l_bfile) = 1) THEN
      dbms_output.put_line('File Exists');
      INSERT INTO my_table T
        VALUES (loadtest_seq.NEXTVAL, 
                author, 
                sysdate, 
               EMPTY_CLOB()
      ) RETURN text INTO l_clob;
      dbms_lob.fileopen( l_bfile, dbms_lob.FILE_READONLY );
      dbms_lob.loadfromfile( l_clob, l_bfile, dbms_lob.getlength(l_bfile) );
      dbms_lob.fileclose( l_bfile );
      COMMIT;
   ELSE 
     dbms_output.put_line('File does not exist');
   END IF;   
END;
/

To use this procedure, we might do:

CREATE OR REPLACE DIRECTORY my_directory AS '/scratch/raford/Projects/loadingFiles';
-- remember that needs CREATE ANY DIRECTORY priv.

EXECUTE load_a_file( 'MY_DIRECTORY', 'file1.txt', 'Roger');


Note that in this case we're using SYSDATE for the creation date, which differs slightly from the previous examples. We could of course pass the creation date as another argument to the procedure.

