Although there are a lot of resources on using Spark with Scala, I couldn't find a halfway decent cheat sheet except for the one on Datacamp, but I thought it needed an update and to be just a bit more extensive than a one-pager. First off, a decent introduction on how Spark works.
Scala configuration:
- To make sure Scala is installed
$ scala -version
- Installation destination
$ cd downloads
- Download zip file of spark
$ tar xvf spark-2.3.0-bin-hadoop2.7.tgz
- Sourcing the ~/.bashrc file
$ open ~/.bashrc
In the .bashrc file, point SPARK_HOME at the extracted directory and add its bin to PATH, e.g. (paths follow the download location above):
export SPARK_HOME=~/downloads/spark-2.3.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
Then reload the file:
$ source ~/.bashrc
- Run scala
$ spark-shell
Scala cheat sheet:
https://www.tutorialspoint.com/scala/scala_basic_syntax.htm #best
Scala Example:
- Save the Scala code in Sublime Text with the file name test_2.scala
- To compile the program
$ scalac test_2.scala
- To run the compiled program, pass the object name rather than the file name (assuming the file defines an object named Test_2, as in the sketch below)
$ scala Test_2
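For reference, a minimal sketch of what test_2.scala might contain (the contents and the object name Test_2 are assumptions for illustration; scalac compiles the file into class files, and the object name is what you pass to scala):
// test_2.scala: a hypothetical example program
object Test_2 {
  def main(args: Array[String]): Unit = {
    println("Hello from Scala")  // prints a greeting to stdout
  }
}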
Spark configuration:
$ conda install pyspark
(the conda/PyPI package is named pyspark, not spark; see the pip note further down)
Spark shell (Scala) code sample (load a DataFrame):
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> import sqlContext.sql
scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("taxi+_zone_lookup.csv")
scala> df.columns
scala> df.count()
scala> df.printSchema()
scala> df.show(2)
scala> df.select("Zone").show(10)
scala> df.filter(df("LocationID") <= 11).select("LocationID").show(10)
scala> df.groupBy("Zone").count().show()
scala> df.registerTempTable("B_friday")
scala> sqlContext.sql("select Zone from B_friday").show()
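Note that sqlContext.read.format("com.databricks.spark.csv") and registerTempTable date from Spark 1.x. On Spark 2.x, spark-shell already provides a SparkSession named spark, and the same session can be sketched roughly like this (assuming the same taxi+_zone_lookup.csv file):
scala> val df = spark.read.option("header", "true").option("inferSchema", "true").csv("taxi+_zone_lookup.csv")
scala> df.createOrReplaceTempView("B_friday")
scala> spark.sql("select Zone from B_friday").show()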
Machine Learning part:
import org.apache.spark.ml.feature.RFormula
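A minimal sketch of how RFormula is typically used (the DataFrame columns clicked, country, and hour are made-up examples; RFormula produces features and label columns for downstream ML stages):
scala> import org.apache.spark.ml.feature.RFormula
scala> val formula = new RFormula().setFormula("clicked ~ country + hour").setFeaturesCol("features").setLabelCol("label")
scala> val output = formula.fit(df).transform(df)
scala> output.select("features", "label").show()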
Scala IDE:
Eclipse
Problem:
"Failed to find Spark jars directory (/Users/bh/downloads/spark/assembly/target/scala-2.10/jars).
You need to build Spark with the target "package" before running this program."
Solution:
$ ./build/sbt assembly
$ build/sbt package
Then it’s good to operate:
~/downloads/spark$ bin/spark-shell
(This is the Scala version)
Problem:
Your PYTHONPATH points to a site-packages dir for Python 2.x but you are running Python 3.x!
PYTHONPATH is currently: "/Users/bridgethuang/downloads/spark/python/lib/py4j-0.10.4-src.zip:/Users/bridgethuang/downloads/spark/python/:"
You should `unset PYTHONPATH` to fix this.
Solution:
export PYTHONPATH=$PYTHONPATH:/usr/local/lib/python3.6/site-packages
Then it’s good to operate:
~/downloads/spark$ ./bin/pyspark
(This is the Pyspark version)
Pyspark in iPython Notebook:
$ pip install findspark
import findspark
findspark.init()  # locate the local Spark installation and add it to sys.path
from pyspark import SparkConf
from pyspark import SparkContext
conf = SparkConf().setAppName("notebook")  # app name is illustrative
sc = SparkContext(conf=conf)  # the notebook can now talk to Spark
Problem:
bin/spark-shell: line 57: /Users/bridgethuang//bin/spark-submit: No such file or directory
Problem: pip 10.0.1 gets the warning "ModuleNotFoundError: No module named 'pip._internal'"
Solution: $ python3 -m pip uninstall spark
instead of "pip install spark"
Spark 2.1.0 doesn’t support python 3.6.0.
conda create -n py35 python=3.5 anaconda
source activate py35
Where to find .bash_profile:
~/.bash_profile
Problem: name 'execfile' is not defined
Python 2: execfile(filename, globals, locals)
Python 3: exec(compile(open(filename, "rb").read(), filename, 'exec'), globals, locals)
RDD: Resilient Distributed Dataset
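To get a feel for the RDD API in spark-shell (a toy example; sc is the SparkContext the shell creates for you):
scala> val nums = sc.parallelize(1 to 10)    // distribute a local range across the cluster
scala> val squares = nums.map(n => n * n)    // lazy transformation
scala> squares.filter(_ % 2 == 0).collect()  // action; returns Array(4, 16, 36, 64, 100)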
Install/upgrade Pyspark
Download new package: http://spark.apache.org/downloads.html
$ tar -xzf spark-1.2.0-bin-hadoop2.4.tgz
$ sudo mv spark-1.2.0-bin-hadoop2.4 /opt/spark-1.2.0
$ sudo ln -s /opt/spark-1.2.0 /opt/spark
$ export SPARK_HOME=/opt/spark
$ export PATH=$SPARK_HOME/bin:$PATH
Problem:
Solution:
$ unset SPARK_HOME
$ ipython notebook --profile=pyspark
Spyder Installation:
$ conda install spyder
reference: https://gist.github.com/ololobus/4c221a0891775eaa86b0
$ tar xvf spark-2.4.0-bin-hadoop2.7.tgz
$ mv spark-2.4.0-bin-hadoop2.7 /opt/spark-2.4.0
$ ln -s /opt/spark-2.4.0 /opt/spark
$ export SPARK_HOME=/opt/spark
$ export PATH=$SPARK_HOME/bin:$PATH
# the path can be written in the ~/.bashrc file
To configure Spark to work with Jupyter notebook and Anaconda:
In the .bash_profile file:
alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'
$ snotebook
Thanks to Brendan O’Connor, this cheatsheet aims to be a quick reference of Scala syntactic constructions. Licensed by Brendan O’Connor under a CC-BY-SA 3.0 license.
variables:

| Example | Description |
| --- | --- |
| Good: `var x = 5` | Variable. |
| Bad: `val x = 5` then `x = 6` | Constant. |
| `var x: Double = 5` | Explicit type. |
functions:

| Example | Description |
| --- | --- |
| Good: `def f(x: Int) = { x * x }` Bad: `def f(x: Int) { x * x }` | Define function. Hidden error: without `=` it's a procedure returning Unit; causes havoc. Deprecated in Scala 2.13. |
| Good: `def f(x: Any) = println(x)` Bad: `def f(x) = println(x)` | Define function. Syntax error: need types for every arg. |
| `type R = Double` | Type alias. |
| `def f(x: R)` vs. `def f(x: => R)` | Call-by-value. Call-by-name (lazy parameters). |
| `(x: R) => x * x` | Anonymous function. |
| `(1 to 5).map(_ * 2)` vs. `(1 to 5).reduceLeft(_ + _)` | Anonymous function: underscore is positionally matched arg. |
| `(1 to 5).map(x => x * x)` | Anonymous function: to use an arg twice, have to name it. |
| `(1 to 5).map { x => val y = x * 2; println(y); y }` | Anonymous function: block style returns last expression. |
| `(1 to 5) filter { _ % 2 == 0 } map { _ * 2 }` | Anonymous functions: pipeline style (or parens too). |
| `def compose(g: R => R, h: R => R) = (x: R) => g(h(x)); val f = compose(_ * 2, _ - 1)` | Anonymous functions: to pass in multiple blocks, need outer parens. |
| `val zscore = (mean: R, sd: R) => (x: R) => (x - mean) / sd` | Currying, obvious syntax. |
| `def zscore(mean: R, sd: R) = (x: R) => (x - mean) / sd` | Currying, obvious syntax. |
| `def zscore(mean: R, sd: R)(x: R) = (x - mean) / sd` | Currying, sugar syntax. But then: |
| `val normer = zscore(7, 0.4) _` | Need trailing underscore to get the partial, only for the sugar version. |
| `def mapmake[T](g: T => T)(seq: List[T]) = seq.map(g)` | Generic type. |
| `5.+(3); 5 + 3` and `(1 to 5) map (_ * 2)` | Infix sugar. |
| `def sum(args: Int*) = args.reduceLeft(_ + _)` | Varargs. |
packages:

| Example | Description |
| --- | --- |
| `import scala.collection._` | Wildcard import. |
| `import scala.collection.Vector` or `import scala.collection.{Vector, Sequence}` | Selective import. |
| `import scala.collection.{Vector => Vec28}` | Renaming import. |
| `import java.util.{Date => _, _}` | Import all from java.util except Date. |
| At start of file: `package pkg` Packaging by scope: `package pkg { ... }` Package singleton: `package object pkg { ... }` | Declare a package. |
data structures:

| Example | Description |
| --- | --- |
| `(1, 2, 3)` | Tuple literal (Tuple3). |
| `var (x, y, z) = (1, 2, 3)` | Destructuring bind: tuple unpacking via pattern matching. |
| Bad: `var x, y, z = (1, 2, 3)` | Hidden error: each assigned to the entire tuple. |
| `var xs = List(1, 2, 3)` | List (immutable). |
| `xs(2)` | Paren indexing (slides). |
| `1 :: List(2, 3)` | Cons. |
| `1 to 5` same as `1 until 6`; `1 to 10 by 2` | Range sugar. |
| `()` | Empty parens is singleton value of the Unit type. Equivalent to void in C and Java. |
control constructs:

| Example | Description |
| --- | --- |
| `if (check) happy else sad` | Conditional. |
| `if (check) happy` same as `if (check) happy else ()` | Conditional sugar. |
| `while (x < 5) { println(x); x += 1 }` | While loop. |
| `do { println(x); x += 1 } while (x < 5)` | Do-while loop. |
| `import scala.util.control.Breaks._; breakable { for (x <- xs) { if (Math.random < 0.1) break } }` | Break (slides). |
| `for (x <- xs if x % 2 == 0) yield x * 10` same as `xs.filter(_ % 2 == 0).map(_ * 10)` | For-comprehension: filter/map. |
| `for ((x, y) <- xs zip ys) yield x * y` same as `(xs zip ys) map { case (x, y) => x * y }` | For-comprehension: destructuring bind. |
| `for (x <- xs; y <- ys) yield x * y` same as `xs flatMap { x => ys map { y => x * y } }` | For-comprehension: cross product. |
| `for (x <- xs; y <- ys) { val div = x / y.toFloat; println("%d/%d = %.1f".format(x, y, div)) }` | For-comprehension: imperative-ish. sprintf style. |
| `for (i <- 1 to 5) { println(i) }` | For-comprehension: iterate including the upper bound. |
| `for (i <- 1 until 5) { println(i) }` | For-comprehension: iterate omitting the upper bound. |
pattern matching:

| Example | Description |
| --- | --- |
| Good: `(xs zip ys) map { case (x, y) => x * y }` Bad: `(xs zip ys) map( (x, y) => x * y )` | Use case in function args for pattern matching. |
| Bad: `val v42 = 42; Some(3) match { case Some(v42) => println("42"); case _ => println("Not 42") }` | v42 is interpreted as a name matching any Int value, and "42" is printed. |
| Good: ``val v42 = 42; Some(3) match { case Some(`v42`) => println("42"); case _ => println("Not 42") }`` | `v42` with backticks is interpreted as the existing val v42, and "Not 42" is printed. |
| Good: `val UppercaseVal = 42; Some(3) match { case Some(UppercaseVal) => println("42"); case _ => println("Not 42") }` | UppercaseVal is treated as an existing val, rather than a new pattern variable, because it starts with an uppercase letter. Thus, the value contained within UppercaseVal is checked against 3, and "Not 42" is printed. |
object orientation:

| Example | Description |
| --- | --- |
| `class C(x: R)` | Constructor params - x is only available in class body. |
| `class C(val x: R); var c = new C(4); c.x` | Constructor params - automatic public member defined. |
| `class C(var x: R) { assert(x > 0, "positive please"); var y = x; val readonly = 5; private var secret = 1; def this() = this(42) }` | Constructor is class body. Declare a public member. Declare a gettable but not settable member. Declare a private member. Alternative constructor. |
| `new { ... }` | Anonymous class. |
| `abstract class D { ... }` | Define an abstract class (non-createable). |
| `class C extends D { ... }` | Define an inherited class. |
| `class D(var x: R); class C(x: R) extends D(x)` | Inheritance and constructor params (wishlist: automatically pass-up params by default). |
| `object O extends D { ... }` | Define a singleton (module-like). |
| `trait T { ... }; class C extends T { ... }; class C extends D with T { ... }` | Traits. Interfaces-with-implementation. No constructor params. mixin-able. |
| `trait T1; trait T2; class C extends T1 with T2; class C extends D with T1 with T2` | Multiple traits. |
| `class C extends D { override def f = ... }` | Must declare method overrides. |
| `new java.io.File("f")` | Create object. |
| Bad: `new List[Int]` Good: `List(1, 2, 3)` | Type error: abstract type. Instead, convention: callable factory shadowing the type. |
| `classOf[String]` | Class literal. |
| `x.isInstanceOf[String]` | Type check (runtime). |
| `x.asInstanceOf[String]` | Type cast (runtime). |
| `x: String` | Ascription (compile time). |
options:

| Example | Description |
| --- | --- |
| `Some(42)` | Construct a non-empty optional value. |
| `None` | The singleton empty optional value. |
| `Option(null) == None` and `Option(obj.field)`, but `Some(null) != None` | Null-safe optional value factory. |
| `val optStr: Option[String] = None` same as `val optStr = Option.empty[String]` | Explicit type for empty optional value. Factory for empty optional value. |
| `val name: Option[String] = request.getParameter("name"); val upper = name map { _.trim } filter { _.length != 0 } map { _.toUpperCase }; println(upper getOrElse "")` | Pipeline style. |
| `val upper = for { name <- request.getParameter("name"); trimmed <- Some(name.trim); upper <- Some(trimmed.toUpperCase) if trimmed.length != 0 } yield upper; println(upper getOrElse "")` | For-comprehension syntax. |
| `option.map(f(_))` same as `option match { case Some(x) => Some(f(x)); case None => None }` | Apply a function on the optional value. |
| `option.flatMap(f(_))` same as `option match { case Some(x) => f(x); case None => None }` | Same as map but function must return an optional value. |
| `optionOfOption.flatten` same as `optionOfOption match { case Some(Some(x)) => Some(x); case _ => None }` | Extract nested option. |
| `option.foreach(proc(_))` same as `option match { case Some(x) => proc(x); case None => () }` | Apply a procedure on optional value. |
| `option.fold(y)(f(_))` same as `option match { case Some(x) => f(x); case None => y }` | Apply function on optional value, return default if empty. |
| `option.collect { case x => g(x) }` same as `option match { case Some(x) => Some(g(x)); case _ => None }` | Apply partial pattern match on optional value. |
| `option.isDefined` same as `option match { case Some(_) => true; case None => false }` | true if not empty. |
| `option.isEmpty` same as `option match { case Some(_) => false; case None => true }` | true if empty. |
| `option.nonEmpty` same as `option match { case Some(_) => true; case None => false }` | true if not empty. |
| `option.size` same as `option match { case Some(_) => 1; case None => 0 }` | 0 if empty, otherwise 1. |
| `option.orElse(Some(y))` same as `option match { case Some(x) => Some(x); case None => Some(y) }` | Evaluate and return alternate optional value if empty. |
| `option.getOrElse(y)` same as `option match { case Some(x) => x; case None => y }` | Evaluate and return default value if empty. |
| `option.get` same as `option match { case Some(x) => x; case None => throw new Exception }` | Return value, throw exception if empty. |
| `option.orNull` same as `option match { case Some(x) => x; case None => null }` | Return value, null if empty. |
| `option.filter(f)` same as `option match { case Some(x) if f(x) => Some(x); case _ => None }` | Optional value satisfies predicate. |
| `option.filterNot(f(_))` same as `option match { case Some(x) if !f(x) => Some(x); case _ => None }` | Optional value doesn't satisfy predicate. |
| `option.exists(f(_))` same as `option match { case Some(x) => f(x); case None => false }` | Apply predicate on optional value or false if empty. |
| `option.forall(f(_))` same as `option match { case Some(x) => f(x); case None => true }` | Apply predicate on optional value or true if empty. |
| `option.contains(y)` same as `option match { case Some(x) => x == y; case None => false }` | Checks if value equals optional value or false if empty. |