H2O.ai Adventures in Artificial Intelligence (ai)

Background

Although I have NOT thought about Artificial Intelligence (AI) since I was a student in Michael Arbib’s class, studying for my M.S., when I became aware of H2O.ai I decided it was time to jump in. 🙂

The following will be a chronicle of my adventures. 🙂

THIS IS A WORK IN PROGRESS

Big Data Hadoop vs Apache Spark

Downloads: (H2O vs Sparkling Water)

H2O.ai’s offerings, H2O and Sparkling Water, seemed to pose the question, “What Big Data platform should I choose, Hadoop or Apache Spark?” I have since learned that the two are not competitors. Katherine Noyes says in InfoWorld,

“They do different things. … Hadoop is essentially a distributed data infrastructure … Spark, on the other hand, is a data-processing tool that operates on those distributed data collections”.

OK. But which of H2O.ai’s downloads (there were only 2 when I started) should I choose to investigate? I picked Sparkling Water because of a page explaining the AI “Classification” use case.

Goal: Install & Run PySparkling

Here are some notes on installing PySparkling on Windows 10.
Be prepared for SysAdmin, SysAdmin, … and more SysAdmin!

Install Apache Spark (to use PySpark)

  • Apache Spark needs to be installed first
    • Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6
    • > echo %PYSPARK_PYTHON% == C:\Python27\python.exe
    • Test Run PySpark in the PySpark Shell
    • > cd Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6
    • > .\bin\pyspark.cmd
    • Test with QuickStart N.B. Click the Python_Tab
    • RESULT: OK
    • Test Run PySpark as a Self-Contained Application
    • Test with Self-Contained Application N.B. Click the Python_Tab
    • RESULT: NO GOOD (ImportError; details below)

Self-Contained PySpark RESULT

Here’s the Self-Contained RESULT with NO MODIFICATIONS of the sys.path

"""SimpleApp.py"""
from pyspark import SparkContext

logFile = "YOUR_SPARK_HOME/README.md"  # Should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()

numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-024ac6cc8be6> in <module>()
      1 """SimpleApp.py"""
----> 2 from pyspark import SparkContext
      3 
      4 logFile = "YOUR_SPARK_HOME/README.md"  # Should be some file on your system
      5 sc = SparkContext("local", "Simple App")

ImportError: No module named pyspark

sys.path Analysis – PySpark Shell vs Plain Python27

# This is pyspark shell sys.path
pysparkShellSysPath = '''
C:\Users\joeco\AppData\Local\Temp\spark-a00a2ab4-63a0-404f-b607-5f34c4206e76\userFiles-5a9b86bf-a518-4fef-b4de-b42005b143d5
C:\Python27\lib\site-packages\pywebview-0.8.2-py2.7.egg
C:\Python27\lib\site-packages\ouimeaux-0.7.9.post0-py2.7.egg
C:\Python27\lib\site-packages\gevent_socketio-0.3.6-py2.7.egg
C:\Python27\lib\site-packages\flask_restful-0.3.5-py2.7.egg
C:\Python27\lib\site-packages\pysignals-0.1.2-py2.7.egg
C:\Python27\lib\site-packages\pyyaml-3.11-py2.7-win-amd64.egg
C:\Python27\lib\site-packages\requests-2.9.1-py2.7.egg
C:\Python27\lib\site-packages\gevent-1.1rc3-py2.7-win-amd64.egg
C:\Python27\lib\site-packages\gevent_websocket-0.9.5-py2.7.egg
C:\Python27\lib\site-packages\pytz-2015.7-py2.7.egg
C:\Python27\lib\site-packages\six-1.10.0-py2.7.egg
C:\Python27\lib\site-packages\flask-0.10.1-py2.7.egg
C:\Python27\lib\site-packages\aniso8601-1.1.0-py2.7.egg
C:\Python27\lib\site-packages\greenlet-0.4.9-py2.7-win-amd64.egg
C:\Python27\lib\site-packages\itsdangerous-0.24-py2.7.egg
C:\Python27\lib\site-packages\werkzeug-0.11.3-py2.7.egg
C:\Python27\lib\site-packages\python_dateutil-2.4.2-py2.7.egg
C:\Python27\lib\site-packages\python_registry-1.1.0-py2.7.egg
C:\Python27\lib\site-packages\enum34-1.1.2-py2.7.egg
C:\Python27\lib\site-packages\speedtest_cli-0.3.4-py2.7.egg
C:\Python27\lib\site-packages\midi-0.2.3-py2.7.egg
C:\Python27\lib\site-packages\h2o_pysparkling_1.6-1.6.5-py2.7.egg
C:\Python27\lib\site-packages\tabulate-0.7.5-py2.7.egg
C:\Python27\lib\site-packages\future-0.15.2-py2.7.egg
C:\Users\joeco\Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6\python\lib\py4j-0.9-src.zip
C:\Users\joeco\Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6\python
C:\Users\joeco\Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6
C:\WINDOWS\SYSTEM32\python27.zip
C:\Python27\DLLs
C:\Python27\lib
C:\Python27\lib\plat-win
C:\Python27\lib\lib-tk
C:\Python27
C:\Python27\lib\site-packages
C:\Python27\lib\site-packages\win32
C:\Python27\lib\site-packages\win32\lib
C:\Python27\lib\site-packages\Pythonwin
C:\Python27\lib\site-packages\wx-3.0-msw
'''


# This is plain python 2.7 sys.path
plainPythonSysPath = '''
C:\Python27\lib\site-packages\pywebview-0.8.2-py2.7.egg
C:\Python27\lib\site-packages\ouimeaux-0.7.9.post0-py2.7.egg
C:\Python27\lib\site-packages\gevent_socketio-0.3.6-py2.7.egg
C:\Python27\lib\site-packages\flask_restful-0.3.5-py2.7.egg
C:\Python27\lib\site-packages\pysignals-0.1.2-py2.7.egg
C:\Python27\lib\site-packages\pyyaml-3.11-py2.7-win-amd64.egg
C:\Python27\lib\site-packages\requests-2.9.1-py2.7.egg
C:\Python27\lib\site-packages\gevent-1.1rc3-py2.7-win-amd64.egg
C:\Python27\lib\site-packages\gevent_websocket-0.9.5-py2.7.egg
C:\Python27\lib\site-packages\pytz-2015.7-py2.7.egg
C:\Python27\lib\site-packages\six-1.10.0-py2.7.egg
C:\Python27\lib\site-packages\flask-0.10.1-py2.7.egg
C:\Python27\lib\site-packages\aniso8601-1.1.0-py2.7.egg
C:\Python27\lib\site-packages\greenlet-0.4.9-py2.7-win-amd64.egg
C:\Python27\lib\site-packages\itsdangerous-0.24-py2.7.egg
C:\Python27\lib\site-packages\werkzeug-0.11.3-py2.7.egg
C:\Python27\lib\site-packages\python_dateutil-2.4.2-py2.7.egg
C:\Python27\lib\site-packages\python_registry-1.1.0-py2.7.egg
C:\Python27\lib\site-packages\enum34-1.1.2-py2.7.egg
C:\Python27\lib\site-packages\speedtest_cli-0.3.4-py2.7.egg
C:\Python27\lib\site-packages\midi-0.2.3-py2.7.egg
C:\Python27\lib\site-packages\h2o_pysparkling_1.6-1.6.5-py2.7.egg
C:\Python27\lib\site-packages\tabulate-0.7.5-py2.7.egg
C:\Python27\lib\site-packages\future-0.15.2-py2.7.egg
C:\WINDOWS\SYSTEM32\python27.zip
C:\Python27\DLLs
C:\Python27\lib
C:\Python27\lib\plat-win
C:\Python27\lib\lib-tk
C:\Python27
C:\Python27\lib\site-packages
C:\Python27\lib\site-packages\win32
C:\Python27\lib\site-packages\win32\lib
C:\Python27\lib\site-packages\Pythonwin
C:\Python27\lib\site-packages\wx-3.0-msw
'''

# print('hello', len( sorted(pysparkShellSysPath.splitlines()) ) )


pysparksp = sorted(pysparkShellSysPath.splitlines())
len(pysparksp)
plainsp = sorted(plainPythonSysPath.splitlines())
len(plainsp)
for i in range( max(len(pysparksp),len(plainsp)) ):
    if i < len(pysparksp):  print ('pyspk', pysparksp[i])
    if i < len(plainsp  ):  print ('plain', plainsp[i])
    print 

----------------------OUTPUT---------------------------
('pyspk', '')
('plain', '')

('pyspk', 'C:\\Python27')
('plain', 'C:\\Python27')

('pyspk', 'C:\\Python27\\DLLs')
('plain', 'C:\\Python27\\DLLs')

('pyspk', 'C:\\Python27\\lib')
('plain', 'C:\\Python27\\lib')

('pyspk', 'C:\\Python27\\lib\\lib-tk')
('plain', 'C:\\Python27\\lib\\lib-tk')

('pyspk', 'C:\\Python27\\lib\\plat-win')
('plain', 'C:\\Python27\\lib\\plat-win')

('pyspk', 'C:\\Python27\\lib\\site-packages')
('plain', 'C:\\Python27\\lib\\site-packages')

('pyspk', 'C:\\Python27\\lib\\site-packages')
('plain', 'C:\\Python27\\lib\\site-packages')

('pyspk', 'C:\\Python27\\lib\\site-packages\x07niso8601-1.1.0-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\x07niso8601-1.1.0-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\tabulate-0.7.5-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\tabulate-0.7.5-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\x0clask-0.10.1-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\x0clask-0.10.1-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\x0clask_restful-0.3.5-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\x0clask_restful-0.3.5-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\x0cuture-0.15.2-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\x0cuture-0.15.2-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\Pythonwin')
('plain', 'C:\\Python27\\lib\\site-packages\\Pythonwin')

('pyspk', 'C:\\Python27\\lib\\site-packages\\enum34-1.1.2-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\enum34-1.1.2-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\gevent-1.1rc3-py2.7-win-amd64.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\gevent-1.1rc3-py2.7-win-amd64.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\gevent_socketio-0.3.6-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\gevent_socketio-0.3.6-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\gevent_websocket-0.9.5-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\gevent_websocket-0.9.5-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\greenlet-0.4.9-py2.7-win-amd64.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\greenlet-0.4.9-py2.7-win-amd64.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\h2o_pysparkling_1.6-1.6.5-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\h2o_pysparkling_1.6-1.6.5-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\itsdangerous-0.24-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\itsdangerous-0.24-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\midi-0.2.3-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\midi-0.2.3-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\ouimeaux-0.7.9.post0-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\ouimeaux-0.7.9.post0-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\pysignals-0.1.2-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\pysignals-0.1.2-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\python_dateutil-2.4.2-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\python_dateutil-2.4.2-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\python_registry-1.1.0-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\python_registry-1.1.0-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\pytz-2015.7-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\pytz-2015.7-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\pywebview-0.8.2-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\pywebview-0.8.2-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\pyyaml-3.11-py2.7-win-amd64.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\pyyaml-3.11-py2.7-win-amd64.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\six-1.10.0-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\six-1.10.0-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\speedtest_cli-0.3.4-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\speedtest_cli-0.3.4-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\werkzeug-0.11.3-py2.7.egg')
('plain', 'C:\\Python27\\lib\\site-packages\\werkzeug-0.11.3-py2.7.egg')

('pyspk', 'C:\\Python27\\lib\\site-packages\\win32')
('plain', 'C:\\Python27\\lib\\site-packages\\win32')

('pyspk', 'C:\\Python27\\lib\\site-packages\\win32\\lib')
('plain', 'C:\\Python27\\lib\\site-packages\\win32\\lib')

('pyspk', 'C:\\Python27\\lib\\site-packages\\wx-3.0-msw')
('plain', 'C:\\Python27\\lib\\site-packages\\wx-3.0-msw')

('pyspk', 'C:\\Users\\joeco\\AppData\\Local\\Temp\\spark-a00a2ab4-63a0-404f-b607-5f34c4206e76\\userFiles-5a9b86bf-a518-4fef-b4de-b42005b143d5')
('plain', 'C:\\WINDOWS\\SYSTEM32\\python27.zip')

('pyspk', 'C:\\Users\\joeco\\Downloads\\ApacheSpark\\spark-1.6.2-bin-hadoop2.6')
('plain', 'equests-2.9.1-py2.7.egg')

('pyspk', 'C:\\Users\\joeco\\Downloads\\ApacheSpark\\spark-1.6.2-bin-hadoop2.6\\python')

('pyspk', 'C:\\Users\\joeco\\Downloads\\ApacheSpark\\spark-1.6.2-bin-hadoop2.6\\python\\lib\\py4j-0.9-src.zip')

('pyspk', 'C:\\WINDOWS\\SYSTEM32\\python27.zip')

('pyspk', 'equests-2.9.1-py2.7.egg')
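A note on the odd-looking entries in the output above. The two path lists were pasted into ordinary (non-raw) string literals, so Python read `\a`, `\f`, `\t` and `\r` inside the Windows paths as escape sequences. That is where `\x07niso8601`, `\x0clask` and the orphaned `equests-2.9.1-py2.7.egg` come from (the `\r` acts as a line break for `splitlines()`). Here is a minimal sketch of the same comparison using raw strings and a set difference. The lists are shortened to a few entries taken from the ones above, and the helper names `path_entries` and `only_in_first` are made up for this example.

```python
# Raw strings (r'''...''') keep every backslash literal, so "\a", "\f",
# "\t" and "\r" in Windows paths are not turned into control characters.
shell_paths = r'''
C:\Users\joeco\AppData\Local\Temp\spark-a00a2ab4-63a0-404f-b607-5f34c4206e76\userFiles-5a9b86bf-a518-4fef-b4de-b42005b143d5
C:\Python27\lib\site-packages\requests-2.9.1-py2.7.egg
C:\Python27\lib\site-packages\flask-0.10.1-py2.7.egg
C:\Users\joeco\Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6\python\lib\py4j-0.9-src.zip
C:\Users\joeco\Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6\python
C:\Users\joeco\Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6
'''

plain_paths = r'''
C:\Python27\lib\site-packages\requests-2.9.1-py2.7.egg
C:\Python27\lib\site-packages\flask-0.10.1-py2.7.egg
'''


def path_entries(text):
    """Turn a pasted block of paths into a set, skipping blank lines."""
    return set(line.strip() for line in text.splitlines() if line.strip())


def only_in_first(first, second):
    """Return the sorted entries that appear in `first` but not in `second`."""
    return sorted(path_entries(first) - path_entries(second))


# Entries the PySpark shell has that plain Python does not.
extra = only_in_first(shell_paths, plain_paths)
for entry in extra:
    print(entry)
```

Run against the full lists, this should leave the same four entries that stand out in the side-by-side output: the Temp `userFiles` directory plus the three Spark directories.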

My Question: Am I missing some secret sauce to set the sys.path?

Answer: I don’t know yet.
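One thing worth trying, going by the comparison above: the PySpark shell's sys.path contains the Spark folder, its `python` subfolder, and the `py4j` zip under `python\lib`, and plain Python has none of them. The sketch below adds the last two by hand before importing pyspark. It is only a guess at the missing sauce, not a confirmed fix. It assumes SPARK_HOME is set and that the folder layout matches the one listed above, and the helper names `spark_python_paths` and `add_spark_to_sys_path` are made up for this example. The Spark Quick Start itself runs SimpleApp.py through `bin/spark-submit`, which takes care of the path, so that is the other route to test.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the Spark-related entries seen on the PySpark shell's sys.path.

    Assumes the layout SPARK_HOME/python and SPARK_HOME/python/lib/py4j-*-src.zip.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return py4j_zips + [python_dir]


def add_spark_to_sys_path(spark_home):
    """Put the Spark entries at the front of sys.path if they are not there yet."""
    for entry in spark_python_paths(spark_home):
        if entry not in sys.path:
            sys.path.insert(0, entry)


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        add_spark_to_sys_path(spark_home)
        # Only after this point would "from pyspark import SparkContext" be attempted.
        print("Added Spark entries for " + spark_home)
    else:
        print("SPARK_HOME is not set")
```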

Pursuing the Secret, sys.path, Sauce

This morning I went back to my original goal: run PySparkling. NOT PySpark, but PySparkling.

  • I found a new download page for Sparkling Water, PySparkling’s Uncle? 🙂
  • I chose my Spark version 1.6 out of [1.4, 1.5, 1.6]
  • Went to the 1.6 Download Page, downloaded Sparkling Water, and clicked on the Python tab, which said “Get started with PySparkling”. Hallelujah! 🙂

Get started with PySparkling Steps

  1. Download Spark
    1.1 DONE
  2. Point SPARK_HOME to the existing installation of Spark and export variable MASTER.
    >echo %SPARK_HOME%
    ...\Downloads\ApacheSpark\spark-1.6.2-bin-hadoop2.6
    
    >echo %MASTER%
    local-cluster[3,2,1024]
    
    >
    
  3. From your terminal, run:
    #To start an interactive Python terminal-
    bin/pysparkling
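Step 2 is easy to get wrong without noticing, so here is a small sketch for checking from Python that both variables are visible to the interpreter. The helper name `missing_settings` is made up for this example, and it only checks that the variables are set, not that their values are right.

```python
import os


def missing_settings(env, required=("SPARK_HOME", "MASTER")):
    """Return the names from `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("SPARK_HOME = " + os.environ["SPARK_HOME"])
        print("MASTER     = " + os.environ["MASTER"])
```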
    

PROBLEM: No bin/pysparkling.cmd

BUT sparkling-env.cmd exists

  • Could bin/pysparkling or bin/sparkling-env.cmd contain Secret sys.path Sauce?
  • I Need a break from Sys Admin
  • I Need TO CODE SOMETHING

MORE NEXT TIME! 🙂

TODO – FOLLOWING NEEDS WORK = UNPUBLISHED

PySparkling Installation

N.B. Click the Python_Tab

  • Be careful of your Python environment.
  • I am running both Python 2 and Python 3.
  • PySparkling needs Python 2.
C:\Users\joeco>python
Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pysparkling import Context
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\site-packages\h2o_pysparkling_1.6-1.6.5-py3.5.egg\pysparkling\__init__.py", line 11, in <module>
    from pysparkling.context import H2OContext
  File "C:\Python35\lib\site-packages\h2o_pysparkling_1.6-1.6.5-py3.5.egg\pysparkling\context.py", line 142
    print self
             ^
SyntaxError: Missing parentheses in call to 'print'
>>>
  • Be careful of your version of Sparkling Water
  • Be careful of your version of Spark
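
The traceback above is Python 3.5 choking on a Python 2 `print self` statement inside the PySparkling egg. Here is a minimal sketch of a guard that checks the interpreter's major version before trying the import. The function name `is_python2` is made up for this example, and the "expects Python 2" message only reflects what I saw with this 1.6.5 egg.

```python
import sys


def is_python2(version_info=sys.version_info):
    """Return True when the given version tuple is a Python 2 interpreter."""
    return version_info[0] == 2


if is_python2():
    print("Python 2 interpreter: OK to try importing pysparkling")
else:
    print("Python %d.%d found; this PySparkling egg expects Python 2"
          % sys.version_info[:2])
```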