Apache Spark ODBC driver
The Apache Spark ODBC driver lets you work with Spark data in applications such as Perl, PHP, Excel, and Oracle).
Configuring an ODBC data source
Before the Apache Spark ODBC driver can be used to connect an application to Apache Spark, it's necessary to configure an ODBC data source. An ODBC data source stores the connection details for the target database (in this case, Apache Spark) and the ODBC driver that's required to connect to it (in this case, the Apache Spark ODBC driver).
ODBC data sources are configured in ODBC Data Source Administrator, which is included with Windows.
In ODBC Data Source Administrator:
- Choose the System DSN tab, and then choose Add.
- In the Create New Data Source dialog box, choose Easysoft ODBC-Apache Spark Driver, and then choose Finish.
- Complete the fields in the Easysoft ODBC-Apache Spark Driver DSN Setup dialog box.
- In your application, connect to your newly configured data source and run a sample query. For example:
select * from MyTable
Work with Apache Spark data in Perl
Strawberry Perl is a Perl distribution for Windows that includes the necessary middleware layers (Perl DBI and Perl DBD::ODBC) to enable the Apache Spark ODBC driver to connect your Perl applications to Apache Spark.
- Here's a Perl script that retrieves some Apache Spark data:
#!/usr/bin/perl -w use strict; use DBI; my $dbh = DBI-> connect('dbi:ODBC:MyApacheSparkDataSource'); my $sql = "SELECT MyCol FROM MyTable LIMIT 10"; # Prepare the statement. my $sth = $dbh->prepare($sql) or die "Can't prepare statement: $DBI::errstr"; # Execute the statement. $sth->execute(); my($SparkCol); # Fetch and display the result set value. while(($SparkCol) = $sth->fetchrow()){ print("$SparkCol\n"); } $dbh->disconnect if ($dbh);
Work with Apache Spark data in PHP
- Here's a PHP script that retrieves some Apache Spark data:
<?php $con = odbc_connect("MyApacheSparkDataSource", "", ""); $err = odbc_errormsg(); if (strlen($err) <> 0) { echo odbc_errormsg(); } else { $rs2 = odbc_exec($con, "select MyCol from MyTable"); odbc_result_all($rs2); odbc_close($con); } ?>
Connecting Excel to Apache Spark
Follow these steps to return data from Apache Spark to Microsoft Excel by using Microsoft Query:
- On the Data tab, choose New Query > From Other Source > From ODBC.
- Choose the Apache Spark ODBC data source when prompted.
- Choose a table from the available data sets.
- Choose the Load to return the Apache Spark data to the worksheet.
Note that for large result sets, you may have to have to filter the data using Excel before the data can be returned to the worksheet.
Connect to Apache Spark from Oracle
- Create a DG4ODBC init file on your Oracle machine. To do this, change to the
%ORACLE_HOME%\hs\admin
directory. Create a copy of the fileinitdg4odbc.ora
. Name the new fileinitspark.ora
.Note In these instructions, replace
%ORACLE_HOME%
with the location of your OracleHOME
directory. For example,C:\oraclexe\app\oracle\product\11.2.0\server
. - Ensure these parameters and values are present in your init file:
HS_FDS_CONNECT_INFO = MyApacheSparkDataSource
- Comment out the line that enables DG4ODBC tracing. For example:
#HS_FDS_TRACE_LEVEL = <trace_level>
- Add an entry to
%ORACLE_HOME%\network\admin\listener.ora
that creates aSID_NAME
for DG4ODBC. For example:SID_LIST_LISTENER = (SID_LIST = (SID_DESC= (SID_NAME=spark) (ORACLE_HOME=%ORACLE_HOME%) (PROGRAM=dg4odbc) ) )
- Add a DG4ODBC entry to
%ORACLE_HOME%\network\admin\tnsnames.ora
that specifies theSID_NAME
created in the previous step. For example:SPARK = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = oracle_host)(PORT = 1521)) (CONNECT_DATA = (SID = spark) ) (HS = OK) )
Replace
oracle_host
with the host name of your Oracle machine. - Start (or restart) the Oracle Listener:
cd %ORACLE_HOME%\bin lsnrctl stop lsnrctl start
- Connect to your Oracle database in SQL*Plus.
- In SQL*Plus, create a database link for the target Apache Spark instance. For example:
CREATE PUBLIC DATABASE LINK SPARKLINK CONNECT TO "mydummyuser" IDENTIFIED BY "mydummypassword" USING 'spark';
- Try querying your Apache Spark data. For example:
SELECT * FROM "MyTable"@SPARKLINK;
Notes
- If you have problems connecting to Apache Spark from Oracle, enable DG4ODBC tracing and check the trace files written to the
%ORACLE_HOME%\hs\trace
directory. To enable DG4ODBC tracing, add the lineHS_FDS_TRACE_LEVEL = DEBUG
toinitspark.ora
and then start or restart the Oracle listener. If thetrace
directory does not exist, create it. - If you enable ODBC Driver Manager tracing, but do not get a trace file or get an empty trace file (check for
C:\SQL.log
), change the trace file location to the WindowsTEMP
directory. For example,C:\Windows\Temp\SQL.log
.