Sqoop offers an effective way to do this. Interestingly, DataStax Enterprise bundles everything we want in the big data space into a single package: Cassandra, Hadoop, Hive, Pig, Sqoop, and Mahout, which comes in handy in this case.
Under the resources directory, you will find the configurations specific to cassandra, dse, hadoop, hive, log4j-appender, mahout, pig, solr, sqoop, and tomcat.
For example, from resources/hadoop/bin, you can format the Hadoop NameNode as usual using
./hadoop namenode -format
* Download and extract DataStax Enterprise binary archive (dse-2.1-bin.tar.gz).
* Follow the documentation, which is also available as a PDF.
* Migrating a relational database to Cassandra is covered in the documentation and in a blog post.
* Before starting DataStax, make sure that JAVA_HOME is set. It can also be set directly in conf/hadoop-env.sh.
* Place the connector for the relational database in a location reachable by Sqoop.
I put mysql-connector-java-5.1.12-bin.jar under resources/sqoop.
* Set the environment
$ bin/dse-env.sh
* Start DataStax Enterprise, as an Analytics node.
$ sudo bin/dse cassandra -t
Here, cassandra starts the Cassandra process plus CassandraFS, and the -t option starts the Hadoop JobTracker and TaskTracker processes.
If you start without the -t flag, the following error will be thrown during the operations discussed below.
No jobtracker found
Unable to run : jobtracker not found
Hence do not miss the -t flag.
* Start the Cassandra CLI to view the Cassandra keyspaces. Once you have migrated the data using Sqoop as described below, you will be able to view it in Cassandra from here.
$ bin/cassandra-cli -host localhost -port 9160
Confirm that the CLI is connected to the test cluster on port 9160 by running the following from the CLI.
[default@unknown] describe cluster;
Cluster Information:
Snitch: com.datastax.bdp.snitch.DseDelegateSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
f5a19a50-b616-11e1-0000-45b29245ddff: [127.0.1.1]
If you omit the host and port (starting the CLI with just bin/cassandra-cli) or give them incorrectly, you will get the response "Not connected to a cassandra instance."
$ bin/dse sqoop import --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --table Category --split-by categoryName --cassandra-keyspace shopping_cart_db --cassandra-column-family Category_cf --cassandra-row-key categoryName --cassandra-thrift-host localhost --cassandra-create-schema
The above command migrates the table "Category" in shopping_cart_db, which has the primary key categoryName, into the column family Category_cf of a Cassandra keyspace named shopping_cart_db, using categoryName as the Cassandra row key. You may add the MySQL-specific --direct option, which is faster. In the command above, everything runs on localhost.
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| categoryName | varchar(50) | NO   | PRI | NULL    |       |
| description  | text        | YES  |     | NULL    |       |
| image        | blob        | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
The import also generates the corresponding Java class (Category.java) in the working directory.
To import all the tables in the database instead of a single table:
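Sqoop itself warns that a password on the command line is insecure and suggests -P, which prompts for it instead. The sketch below only prints a single-table import command rewritten that way, so it can be inspected before being run. The function name and the DB_HOST and DB_NAME variables are my own illustrative choices, and I am assuming that the DSE-bundled Sqoop accepts -P alongside the Cassandra options in the same way as stock Sqoop.

```shell
#!/bin/sh
# Sketch: print (do not run) a single-table import that uses -P, so Sqoop
# prompts for the password instead of taking it from the command line.
# DB_HOST, DB_NAME and build_import_cmd are illustrative assumptions.
DB_HOST="127.0.0.1"
DB_NAME="shopping_cart_db"

build_import_cmd() {
    table="$1"      # relational table to import
    row_key="$2"    # column used both for splitting and as the row key
    # printf repeats the format for every argument, giving one long line.
    printf '%s ' bin/dse sqoop import \
        --connect "jdbc:mysql://${DB_HOST}:3306/${DB_NAME}" \
        --username root -P \
        --table "$table" --split-by "$row_key" \
        --cassandra-keyspace "$DB_NAME" \
        --cassandra-column-family "${table}_cf" \
        --cassandra-row-key "$row_key" \
        --cassandra-thrift-host localhost \
        --cassandra-create-schema
    printf '\n'
}

build_import_cmd Category categoryName
```

Printing the command first makes it easy to check the keyspace and column family names before anything is written to Cassandra.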
$ bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct
Here the "-m 1" option ensures a sequential import. If it is not specified, an error like the one below is reported when Sqoop cannot find a primary key to split on.
ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table Category. Please specify one with --split-by or perform a sequential import with '-m 1'.
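The error message names the two ways out: specify a split column or import sequentially. As a rough aid for deciding, the sketch below prints a query that lists the tables of a schema that have no primary key. The helper name is hypothetical, and the query assumes MySQL's standard information_schema views; it is not something Sqoop or DataStax provides.

```shell
#!/bin/sh
# Sketch: emit a MySQL query listing base tables that lack a primary key.
# tables_without_pk is an illustrative helper, not part of Sqoop or DSE.
tables_without_pk() {
    schema="$1"
    printf '%s\n' \
        "SELECT t.table_name" \
        "FROM information_schema.tables AS t" \
        "LEFT JOIN information_schema.table_constraints AS c" \
        "  ON c.table_schema = t.table_schema" \
        " AND c.table_name = t.table_name" \
        " AND c.constraint_type = 'PRIMARY KEY'" \
        "WHERE t.table_schema = '${schema}'" \
        "  AND t.table_type = 'BASE TABLE'" \
        "  AND c.constraint_name IS NULL;"
}

# Example (the mysql client prompts for the password):
#   tables_without_pk shopping_cart_db | mysql -u root -p -N
```

Tables returned by the query are the ones that would need either --split-by or a sequential import.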
To check whether the keyspace has been created:
[default@unknown] show keyspaces;
................
Keyspace: shopping_cart_db:
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Durable Writes: true
Options: [replication_factor:1]
Column Families:
ColumnFamily: Category_cf
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
Row cache size / save period in seconds / keys to save : 0.0/0/all
Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
Key cache size / save period in seconds: 200000.0/14400
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
Replicate on write: true
Bloom Filter FP chance: default
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
.............
[default@unknown] describe shopping_cart_db;
Keyspace: shopping_cart_db:
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Durable Writes: true
Options: [replication_factor:1]
Column Families:
ColumnFamily: Category_cf
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
Row cache size / save period in seconds / keys to save : 0.0/0/all
Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
Key cache size / save period in seconds: 200000.0/14400
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
Replicate on write: true
Bloom Filter FP chance: default
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
You may also use Hive to view the databases created in Cassandra in an SQL-like manner.
* Start Hive
$ bin/dse hive
hive> show databases;
OK
default
shopping_cart_db
When the entire database is imported as above, a separate Java class is generated for each of the tables.
$ bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct
12/06/15 15:42:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/06/15 15:42:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/06/15 15:42:11 INFO tool.CodeGenTool: Beginning code generation
12/06/15 15:42:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Category` AS t LIMIT 1
12/06/15 15:42:11 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/..
Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Category.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/06/15 15:42:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Category.jar
12/06/15 15:42:13 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
12/06/15 15:42:13 INFO mapreduce.ImportJobBase: Beginning import of Category
12/06/15 15:42:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/06/15 15:42:15 INFO mapred.JobClient: Running job: job_201206151241_0007
12/06/15 15:42:16 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 15:42:25 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 15:42:25 INFO mapred.JobClient: Job complete: job_201206151241_0007
12/06/15 15:42:25 INFO mapred.JobClient: Counters: 18
12/06/15 15:42:25 INFO mapred.JobClient: Job Counters
12/06/15 15:42:25 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6480
12/06/15 15:42:25 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 15:42:25 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 15:42:25 INFO mapred.JobClient: Launched map tasks=1
12/06/15 15:42:25 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/15 15:42:25 INFO mapred.JobClient: File Output Format Counters
12/06/15 15:42:25 INFO mapred.JobClient: Bytes Written=2848
12/06/15 15:42:25 INFO mapred.JobClient: FileSystemCounters
12/06/15 15:42:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21419
12/06/15 15:42:25 INFO mapred.JobClient: CFS_BYTES_WRITTEN=2848
12/06/15 15:42:25 INFO mapred.JobClient: CFS_BYTES_READ=87
12/06/15 15:42:25 INFO mapred.JobClient: File Input Format Counters
12/06/15 15:42:25 INFO mapred.JobClient: Bytes Read=0
12/06/15 15:42:25 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 15:42:25 INFO mapred.JobClient: Map input records=1
12/06/15 15:42:25 INFO mapred.JobClient: Physical memory (bytes) snapshot=119435264
12/06/15 15:42:25 INFO mapred.JobClient: Spilled Records=0
12/06/15 15:42:25 INFO mapred.JobClient: CPU time spent (ms)=630
12/06/15 15:42:25 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600
12/06/15 15:42:25 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2085318656
12/06/15 15:42:25 INFO mapred.JobClient: Map output records=36
12/06/15 15:42:25 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
12/06/15 15:42:25 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 11.4492 seconds (0 bytes/sec)
12/06/15 15:42:25 INFO mapreduce.ImportJobBase: Retrieved 36 records.
12/06/15 15:42:25 INFO tool.CodeGenTool: Beginning code generation
12/06/15 15:42:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Customer` AS t LIMIT 1
12/06/15 15:42:25 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/..
Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Customer.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/06/15 15:42:25 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Customer.jar
12/06/15 15:42:26 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
12/06/15 15:42:26 INFO mapreduce.ImportJobBase: Beginning import of Customer
12/06/15 15:42:26 INFO mapred.JobClient: Running job: job_201206151241_0008
12/06/15 15:42:27 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 15:42:35 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 15:42:35 INFO mapred.JobClient: Job complete: job_201206151241_0008
12/06/15 15:42:35 INFO mapred.JobClient: Counters: 17
12/06/15 15:42:35 INFO mapred.JobClient: Job Counters
12/06/15 15:42:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6009
12/06/15 15:42:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 15:42:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 15:42:35 INFO mapred.JobClient: Launched map tasks=1
12/06/15 15:42:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/15 15:42:35 INFO mapred.JobClient: File Output Format Counters
12/06/15 15:42:35 INFO mapred.JobClient: Bytes Written=0
12/06/15 15:42:35 INFO mapred.JobClient: FileSystemCounters
12/06/15 15:42:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21489
12/06/15 15:42:35 INFO mapred.JobClient: CFS_BYTES_READ=87
12/06/15 15:42:35 INFO mapred.JobClient: File Input Format Counters
12/06/15 15:42:35 INFO mapred.JobClient: Bytes Read=0
12/06/15 15:42:35 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 15:42:35 INFO mapred.JobClient: Map input records=1
12/06/15 15:42:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=164855808
12/06/15 15:42:35 INFO mapred.JobClient: Spilled Records=0
12/06/15 15:42:35 INFO mapred.JobClient: CPU time spent (ms)=510
12/06/15 15:42:35 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600
12/06/15 15:42:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2082869248
12/06/15 15:42:35 INFO mapred.JobClient: Map output records=0
12/06/15 15:42:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
12/06/15 15:42:35 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.3143 seconds (0 bytes/sec)
12/06/15 15:42:35 INFO mapreduce.ImportJobBase: Retrieved 0 records.
12/06/15 15:42:35 INFO tool.CodeGenTool: Beginning code generation
12/06/15 15:42:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `OrderEntry` AS t LIMIT 1
12/06/15 15:42:35 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/..
Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderEntry.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/06/15 15:42:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderEntry.jar
12/06/15 15:42:36 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
12/06/15 15:42:36 INFO mapreduce.ImportJobBase: Beginning import of OrderEntry
12/06/15 15:42:36 INFO mapred.JobClient: Running job: job_201206151241_0009
12/06/15 15:42:37 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 15:42:45 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 15:42:45 INFO mapred.JobClient: Job complete: job_201206151241_0009
12/06/15 15:42:45 INFO mapred.JobClient: Counters: 17
12/06/15 15:42:45 INFO mapred.JobClient: Job Counters
12/06/15 15:42:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6381
12/06/15 15:42:45 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 15:42:45 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 15:42:45 INFO mapred.JobClient: Launched map tasks=1
12/06/15 15:42:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/15 15:42:45 INFO mapred.JobClient: File Output Format Counters
12/06/15 15:42:45 INFO mapred.JobClient: Bytes Written=0
12/06/15 15:42:45 INFO mapred.JobClient: FileSystemCounters
12/06/15 15:42:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21569
12/06/15 15:42:45 INFO mapred.JobClient: CFS_BYTES_READ=87
12/06/15 15:42:45 INFO mapred.JobClient: File Input Format Counters
12/06/15 15:42:45 INFO mapred.JobClient: Bytes Read=0
12/06/15 15:42:45 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 15:42:45 INFO mapred.JobClient: Map input records=1
12/06/15 15:42:45 INFO mapred.JobClient: Physical memory (bytes) snapshot=137252864
12/06/15 15:42:45 INFO mapred.JobClient: Spilled Records=0
12/06/15 15:42:45 INFO mapred.JobClient: CPU time spent (ms)=520
12/06/15 15:42:45 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600
12/06/15 15:42:45 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2014703616
12/06/15 15:42:45 INFO mapred.JobClient: Map output records=0
12/06/15 15:42:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
12/06/15 15:42:45 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2859 seconds (0 bytes/sec)
12/06/15 15:42:45 INFO mapreduce.ImportJobBase: Retrieved 0 records.
12/06/15 15:42:45 INFO tool.CodeGenTool: Beginning code generation
12/06/15 15:42:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `OrderItem` AS t LIMIT 1
12/06/15 15:42:45 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/..
Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderItem.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/06/15 15:42:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderItem.jar
12/06/15 15:42:46 WARN manager.CatalogQueryManager: The table OrderItem contains a multi-column primary key. Sqoop will default to the column orderNumber only for this job.
12/06/15 15:42:46 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
12/06/15 15:42:46 INFO mapreduce.ImportJobBase: Beginning import of OrderItem
12/06/15 15:42:46 INFO mapred.JobClient: Running job: job_201206151241_0010
12/06/15 15:42:47 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 15:42:55 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 15:42:55 INFO mapred.JobClient: Job complete: job_201206151241_0010
12/06/15 15:42:55 INFO mapred.JobClient: Counters: 17
12/06/15 15:42:55 INFO mapred.JobClient: Job Counters
12/06/15 15:42:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5949
12/06/15 15:42:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 15:42:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 15:42:55 INFO mapred.JobClient: Launched map tasks=1
12/06/15 15:42:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/15 15:42:55 INFO mapred.JobClient: File Output Format Counters
12/06/15 15:42:55 INFO mapred.JobClient: Bytes Written=0
12/06/15 15:42:55 INFO mapred.JobClient: FileSystemCounters
12/06/15 15:42:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21524
12/06/15 15:42:55 INFO mapred.JobClient: CFS_BYTES_READ=87
12/06/15 15:42:55 INFO mapred.JobClient: File Input Format Counters
12/06/15 15:42:55 INFO mapred.JobClient: Bytes Read=0
12/06/15 15:42:55 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 15:42:55 INFO mapred.JobClient: Map input records=1
12/06/15 15:42:55 INFO mapred.JobClient: Physical memory (bytes) snapshot=116674560
12/06/15 15:42:55 INFO mapred.JobClient: Spilled Records=0
12/06/15 15:42:55 INFO mapred.JobClient: CPU time spent (ms)=590
12/06/15 15:42:55 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600
12/06/15 15:42:55 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2014703616
12/06/15 15:42:55 INFO mapred.JobClient: Map output records=0
12/06/15 15:42:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
12/06/15 15:42:55 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2539 seconds (0 bytes/sec)
12/06/15 15:42:55 INFO mapreduce.ImportJobBase: Retrieved 0 records.
12/06/15 15:42:55 INFO tool.CodeGenTool: Beginning code generation
12/06/15 15:42:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Payment` AS t LIMIT 1
12/06/15 15:42:55 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/..
Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Payment.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/06/15 15:42:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Payment.jar
12/06/15 15:42:56 WARN manager.CatalogQueryManager: The table Payment contains a multi-column primary key. Sqoop will default to the column orderNumber only for this job.
12/06/15 15:42:56 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
12/06/15 15:42:56 INFO mapreduce.ImportJobBase: Beginning import of Payment
12/06/15 15:42:56 INFO mapred.JobClient: Running job: job_201206151241_0011
12/06/15 15:42:57 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 15:43:05 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 15:43:05 INFO mapred.JobClient: Job complete: job_201206151241_0011
12/06/15 15:43:05 INFO mapred.JobClient: Counters: 17
12/06/15 15:43:05 INFO mapred.JobClient: Job Counters
12/06/15 15:43:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5914
12/06/15 15:43:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 15:43:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 15:43:05 INFO mapred.JobClient: Launched map tasks=1
12/06/15 15:43:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/15 15:43:05 INFO mapred.JobClient: File Output Format Counters
12/06/15 15:43:05 INFO mapred.JobClient: Bytes Written=0
12/06/15 15:43:05 INFO mapred.JobClient: FileSystemCounters
12/06/15 15:43:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21518
12/06/15 15:43:05 INFO mapred.JobClient: CFS_BYTES_READ=87
12/06/15 15:43:05 INFO mapred.JobClient: File Input Format Counters
12/06/15 15:43:05 INFO mapred.JobClient: Bytes Read=0
12/06/15 15:43:05 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 15:43:05 INFO mapred.JobClient: Map input records=1
12/06/15 15:43:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=137998336
12/06/15 15:43:05 INFO mapred.JobClient: Spilled Records=0
12/06/15 15:43:05 INFO mapred.JobClient: CPU time spent (ms)=520
12/06/15 15:43:05 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600
12/06/15 15:43:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2082865152
12/06/15 15:43:05 INFO mapred.JobClient: Map output records=0
12/06/15 15:43:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
12/06/15 15:43:05 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2642 seconds (0 bytes/sec)
12/06/15 15:43:05 INFO mapreduce.ImportJobBase: Retrieved 0 records.
12/06/15 15:43:05 INFO tool.CodeGenTool: Beginning code generation
12/06/15 15:43:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Product` AS t LIMIT 1
12/06/15 15:43:06 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/..
Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Product.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/06/15 15:43:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Product.jar
12/06/15 15:43:06 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
12/06/15 15:43:06 INFO mapreduce.ImportJobBase: Beginning import of Product
12/06/15 15:43:07 INFO mapred.JobClient: Running job: job_201206151241_0012
12/06/15 15:43:08 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 15:43:16 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 15:43:16 INFO mapred.JobClient: Job complete: job_201206151241_0012
12/06/15 15:43:16 INFO mapred.JobClient: Counters: 18
12/06/15 15:43:16 INFO mapred.JobClient: Job Counters
12/06/15 15:43:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5961
12/06/15 15:43:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 15:43:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 15:43:16 INFO mapred.JobClient: Launched map tasks=1
12/06/15 15:43:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/15 15:43:16 INFO mapred.JobClient: File Output Format Counters
12/06/15 15:43:16 INFO mapred.JobClient: Bytes Written=248262
12/06/15 15:43:16 INFO mapred.JobClient: FileSystemCounters
12/06/15 15:43:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21527
12/06/15 15:43:16 INFO mapred.JobClient: CFS_BYTES_WRITTEN=248262
12/06/15 15:43:16 INFO mapred.JobClient: CFS_BYTES_READ=87
12/06/15 15:43:16 INFO mapred.JobClient: File Input Format Counters
12/06/15 15:43:16 INFO mapred.JobClient: Bytes Read=0
12/06/15 15:43:16 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 15:43:16 INFO mapred.JobClient: Map input records=1
12/06/15 15:43:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=144871424
12/06/15 15:43:16 INFO mapred.JobClient: Spilled Records=0
12/06/15 15:43:16 INFO mapred.JobClient: CPU time spent (ms)=1030
12/06/15 15:43:16 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600
12/06/15 15:43:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2085318656
12/06/15 15:43:16 INFO mapred.JobClient: Map output records=300
12/06/15 15:43:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
12/06/15 15:43:16 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2613 seconds (0 bytes/sec)
12/06/15 15:43:16 INFO mapreduce.ImportJobBase: Retrieved 300 records.
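The output above is long, and the interesting part is how many records each table contributed. A minimal sketch for pulling that out with awk follows, assuming the console output was saved to a file (import.log is a hypothetical name) and that the "Beginning import of" and "Retrieved N records." lines keep the format shown above.

```shell
#!/bin/sh
# Sketch: print "<table>: <records>" for each table in a saved Sqoop log.
# summarize_import_log is an illustrative helper; it reads the files given
# as arguments, or standard input when none are given.
summarize_import_log() {
    awk '
        # Remember the table named on the most recent "Beginning import" line.
        /ImportJobBase: Beginning import of / { table = $NF }
        # The record count is the second-to-last field of the "Retrieved" line.
        /ImportJobBase: Retrieved [0-9]+ records/ { print table ": " $(NF-1) }
    ' "$@"
}

# Example, with import.log holding the saved console output:
#   summarize_import_log import.log
```

This makes it quick to spot tables that imported zero records, such as Customer in the run above.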
I found DataStax an interesting project to explore further. I have also blogged about the issues I faced as a learner and how easily they can be fixed: Issues that you may encounter during the migration to Cassandra using DataStax/Sqoop and the fixes.