Perpetto/hack4data_setup

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Hello, and welcome to HackFMI 8 - Hack for data!

Today you will deal with the following technologies:

We have set up a server for you guys with some e-commerce data.

What you can do (these are suggestions - you can do anything you like, really!):

  • [Scala] Familiarize yourself with the PredictionIO machine learning framework and Spark. Download a few of the example PredictionIO templates below and try to get them up and running:

    When you try out the templates, change engine.json as follows to use the e-commerce dataset we have prepared for you:

    • If the file contains an 'appId' key, change it to 1.
    • If the file contains an 'appName' key, change it to 'perpetto'.
  • [Python] Play around with Spark (state-of-the-art engine for large-scale data processing) - see this readme for details.

  • [Ruby / Python] Get familiar with Elasticsearch by running aggregations and gathering statistics. Some examples may be:

    • Most Popular Categories / Brands by Sessions
    • Most Popular Categories / Brands by Orders
    • Most Popular Items
    • Most Active Users
    • Months with the Most Orders
    • Most Effective Slots by Viewed Recommendations (ask us for details on this one)
  • [Scala] Build a PredictionIO template on your own starting with this skeleton template!

  • [Scala, Ruby/Python] Build a website / web service that utilizes machine learning technology! See (http://predictionio.incubator.apache.org/demo/tapster/) for some inspiration.
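
If you would rather script the engine.json change suggested above than edit each template by hand, here is a minimal Python sketch. It assumes the 'appId' / 'appName' keys can sit anywhere in the file, so it walks the whole structure. The sample JSON and the helper name point_engine_at_dataset are made up for illustration; your template's engine.json will look different.

```python
import json


def point_engine_at_dataset(config, app_id=1, app_name="perpetto"):
    """Return a copy of an engine.json structure with appId/appName rewritten.

    Only keys that already exist are changed; no key is added.
    """
    if isinstance(config, dict):
        updated = {}
        for key, value in config.items():
            if key == "appId":
                updated[key] = app_id
            elif key == "appName":
                updated[key] = app_name
            else:
                updated[key] = point_engine_at_dataset(value, app_id, app_name)
        return updated
    if isinstance(config, list):
        return [point_engine_at_dataset(v, app_id, app_name) for v in config]
    return config


# Hypothetical engine.json content, for illustration only.
example = json.loads("""
{
  "id": "default",
  "datasource": {"params": {"appName": "MyApp1"}},
  "algorithms": [{"name": "als", "params": {"appId": 42, "rank": 10}}]
}
""")

patched = point_engine_at_dataset(example)
print(json.dumps(patched, indent=2))
```

To apply it to a real file, load engine.json with json.load, pass the result through the helper, and write it back with json.dump.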

The server contains the following:

I. Elasticsearch instance, containing e-commerce data as follows:

  • profiles (index)
    • profile (type) - contains profile data
    • session (type) - contains profile session data. Each session is basically a collection of visits on the e-commerce site within a 24-hour interval.
    • order (type) - contains profile cart/order data. A cart will have a paid: true property if it has been purchased.
  • items (index)
    • item (type) - contains item data
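
To make the index / type layout above concrete, here is a rough Python sketch of a search body for one of the suggested statistics (months with the most orders). It only builds the request, so you can send it with whatever client you like. Assumptions: the date field name created_at is hypothetical (check the actual order mapping), and the "interval" parameter is the spelling used by older Elasticsearch releases that still have types; newer releases call it "calendar_interval".

```python
import json


def paid_orders_per_month(date_field="created_at"):
    """Build a search body that counts paid orders per calendar month.

    date_field is a guess; replace it with the real date field of the
    order documents.
    """
    return {
        "size": 0,  # we only want the aggregation, not the matching documents
        "query": {"term": {"paid": True}},
        "aggs": {
            "orders_by_month": {
                "date_histogram": {"field": date_field, "interval": "month"}
            }
        },
    }


def search_path(index, doc_type):
    """Search endpoint for one type inside one index."""
    return "/{}/{}/_search".format(index, doc_type)


body = paid_orders_per_month()
path = search_path("profiles", "order")

print("POST", path)
print(json.dumps(body, indent=2))
```

The buckets come back unsorted by count, so sort them by doc_count on your side to find the busiest months.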

II. HBase / PredictionIO instance, containing e-commerce data in the form of events:

eventType    entityType    entityId    targetEntityType    targetEntityId    eventTime         properties
'$set'       'user'        <pid>       -                   -                 (ignore)          -
'$set'       'item'        <iid>       -                   -                 (ignore)          <item props from elasticsearch>
'buy'        'user'        <pid>       'item'              <iid>             <ISO 8601 date>   -
'view'       'user'        <pid>       'item'              <iid>             <ISO 8601 date>   -

$set events are used to set properties for entities.

view / buy events are used to store relationships between users and items.

'pid' is the profile id for a profile document in Elasticsearch.

'iid' is the item id for an item document in Elasticsearch.
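
As a rough illustration of the event rows above, this Python sketch builds the same shapes as plain dictionaries. Assumptions: PredictionIO's Event API names the first field "event" in JSON (the table calls it eventType), so double-check against the server you are given; the ids, item properties and date below are invented for the example.

```python
import json
from datetime import datetime, timezone


def set_event(entity_type, entity_id, properties=None):
    """A '$set' event: attaches properties to a user or an item."""
    event = {
        "event": "$set",
        "entityType": entity_type,
        "entityId": str(entity_id),
    }
    if properties:
        event["properties"] = properties
    return event


def interaction_event(name, pid, iid, when):
    """A 'view' or 'buy' event linking a user (pid) to an item (iid)."""
    if name not in ("view", "buy"):
        raise ValueError("expected 'view' or 'buy', got %r" % name)
    return {
        "event": name,
        "entityType": "user",
        "entityId": str(pid),
        "targetEntityType": "item",
        "targetEntityId": str(iid),
        "eventTime": when.isoformat(),  # ISO 8601, as in the table
    }


# Hypothetical ids, properties and timestamp.
item = set_event("item", "iid-123", {"brand": "ExampleBrand"})
buy = interaction_event(
    "buy", "pid-456", "iid-123",
    datetime(2016, 11, 5, 12, 30, tzinfo=timezone.utc),
)

print(json.dumps(item, indent=2))
print(json.dumps(buy, indent=2))
```

Reading the events back in a template works the other way around: filter on the event name and entity types, then join users and items on pid / iid.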

About

Ruby Setup for HackFMI 8 Hack for Data
