Saturday, August 15, 2015

Market Basket Analysis with Apache Spark MLlib FP-Growth

[Image: Market basket analysis. Source: http://www.noahdatatech.com/solutions/predictive-analytics/]

Market basket analysis identifies items in a supermarket that customers are likely to buy together.
e.g., customers who bought pampers also bought beer

This is important for supermarkets, both to arrange their items in a customer-convenient manner and to design promotions that take item affinity into consideration.

Frequent Item Set Mining and Association Rule Learning


Frequent item set mining is a subfield of data mining that focuses on identifying frequently co-occurring items. Once the frequent item sets are known, we can derive rules that express associations between items.
        e.g., frequent item set = {pampers, beer, milk}, association rule = {pampers, milk ---> beer}

        There are two popular approaches for frequent item set mining and association rule learning, as given below:

Apriori algorithm 
FP-Growth algorithm

To explain the above algorithms, let us consider an example with 4 customers making 4 transactions in a supermarket that carries 7 items in total, as given below:

    Transaction 1: Jana’s purchase: egg, beer, pampers, milk
    Transaction 2: Abi’s purchase: carrot, milk, pampers, beer
    Transaction 3: Mahesha’s purchase: perfume, tissues, carrot
    Transaction 4: Jayani’s purchase: perfume, pampers, beer

    Item index
    1: egg, 2: beer, 3: pampers, 4: carrot, 5: milk, 6: perfume, 7: tissues
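The example rule from earlier, {pampers, milk ---> beer}, can be checked directly against these four transactions. The following is a minimal plain-Java sketch (not Spark; the class and method names are mine) that computes the support of an item set and the confidence of a rule:

```java
import java.util.List;
import java.util.Set;

public class RuleConfidence {

    // support(itemset) = fraction of transactions that contain every item in the set
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    // confidence(X -> Y) = support(X union Y) / support(X)
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> xUnionY) {
        return support(transactions, xUnionY) / support(transactions, x);
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
                Set.of("egg", "beer", "pampers", "milk"),    // Jana
                Set.of("carrot", "milk", "pampers", "beer"), // Abi
                Set.of("perfume", "tissues", "carrot"),      // Mahesha
                Set.of("perfume", "pampers", "beer"));       // Jayani

        // Rule: {pampers, milk} -> {beer}
        double conf = confidence(transactions,
                Set.of("pampers", "milk"),
                Set.of("pampers", "milk", "beer"));
        System.out.println(conf); // → 1.0 (every basket with pampers and milk also has beer)
    }
}
```

Here the confidence is 1.0 because both baskets containing pampers and milk (Jana's and Abi's) also contain beer.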

Using the Apriori algorithm


The Apriori algorithm identifies frequent item sets by starting with individual items and extending each item set by one item at a time. This is known as the candidate generation step.
The algorithm relies on the observation that any subset of a frequent item set must itself be frequent, so candidates containing an infrequent subset can be pruned.

Transaction: Items
1: 1, 2, 3, 5
2: 4, 5, 3, 2
3: 6, 7, 4
4: 6, 3, 2

Minimum Support 


Minimum support is used to prune the item sets that are less frequent.

support(item set) = number of transactions containing the item set / total number of transactions

For example, let's say we define the minimum support as 0.5.
The support for egg is 1/4 = 0.25 (0.25 < 0.5), so it is eliminated. The support for beer is 3/4 = 0.75 (0.75 > 0.5), so it is considered for further processing.
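The single-item pass below can be sketched in a few lines of plain Java (class and method names are mine, not from the Spark API), using the item indices from the table above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SingleItemSupport {

    // Return the items whose support meets minSupport, in ascending item order.
    static List<Integer> frequentItems(List<Set<Integer>> transactions, double minSupport) {
        // Count how many transactions each item appears in
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Set<Integer> t : transactions)
            for (int item : t)
                counts.merge(item, 1, Integer::sum);

        // Keep items with support >= minSupport
        List<Integer> frequent = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            if ((double) e.getValue() / transactions.size() >= minSupport)
                frequent.add(e.getKey());
        return frequent;
    }

    public static void main(String[] args) {
        // The four transactions from the table above, using the item index
        List<Set<Integer>> transactions = List.of(
                Set.of(1, 2, 3, 5),
                Set.of(4, 5, 3, 2),
                Set.of(6, 7, 4),
                Set.of(6, 3, 2));

        // Items 1 (egg) and 7 (tissues) fall below the minimum support and are pruned
        System.out.println(frequentItems(transactions, 0.5)); // → [2, 3, 4, 5, 6]
    }
}
```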

Calculation of support for all items

size of the candidate itemset = 1

itemset: support
1: 0.25: eliminated
2: 0.75
3: 0.75
4: 0.5
5: 0.5
6: 0.5
7: 0.25: eliminated

remaining items: 2, 3, 4, 5, 6

extend candidate itemsets by 1
size of the candidate itemset = 2

itemset: support
2, 3: 0.75
2, 4: 0.25: eliminated
2, 5: 0.5
2, 6: 0.25: eliminated
3, 4: 0.25: eliminated
3, 5: 0.5
3, 6: 0.25: eliminated
4, 5: 0.25: eliminated
4, 6: 0.25: eliminated
5, 6: 0.25: eliminated

remaining itemsets: {2, 3}, {2, 5}, {3, 5}

extend candidate itemsets by 1
size of the candidate itemset = 3

2, 3, 5: 0.5
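The whole walkthrough above (prune single items, then repeatedly extend surviving candidates by one item) can be condensed into a small plain-Java sketch. This is an illustration of the candidate generation idea on this toy data set, not a production Apriori implementation, and all names in it are mine:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MiniApriori {

    // support(itemset) = fraction of transactions containing every item in the set
    static double support(List<Set<Integer>> tx, Set<Integer> itemset) {
        return (double) tx.stream().filter(t -> t.containsAll(itemset)).count() / tx.size();
    }

    // One candidate generation step: extend each frequent itemset by one surviving
    // single item and keep only the candidates that meet the minimum support.
    static Set<Set<Integer>> extend(List<Set<Integer>> tx, Set<Set<Integer>> frequent,
                                    Set<Integer> items, double minSupport) {
        Set<Set<Integer>> next = new HashSet<>();
        for (Set<Integer> fs : frequent)
            for (int item : items) {
                if (fs.contains(item)) continue;
                Set<Integer> candidate = new TreeSet<>(fs);
                candidate.add(item);
                if (support(tx, candidate) >= minSupport) next.add(candidate);
            }
        return next;
    }

    public static void main(String[] args) {
        List<Set<Integer>> tx = List.of(
                Set.of(1, 2, 3, 5), Set.of(4, 5, 3, 2), Set.of(6, 7, 4), Set.of(6, 3, 2));
        double minSupport = 0.5;

        // Level 1: frequent single items (2, 3, 4, 5, 6 in this example)
        Set<Set<Integer>> frequent = new HashSet<>();
        Set<Integer> items = new TreeSet<>();
        for (int item = 1; item <= 7; item++)
            if (support(tx, Set.of(item)) >= minSupport) {
                items.add(item);
                frequent.add(Set.of(item));
            }

        // Levels 2..n: keep extending until no candidate survives
        Set<Set<Integer>> level = frequent;
        while (!(level = extend(tx, level, items, minSupport)).isEmpty())
            frequent.addAll(level);

        System.out.println(frequent.size() + " frequent itemsets"); // → 9 frequent itemsets
    }
}
```

The 9 itemsets are the five frequent single items, the three pairs {2, 3}, {2, 5}, {3, 5}, and the triple {2, 3, 5}, matching the tables above.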

Using the FP-Growth algorithm


In the FP-Growth algorithm, frequent patterns are mined using a tree approach (construction of a Frequent Pattern Tree, or FP-tree).
FP-Growth has been shown to execute much faster than the Apriori algorithm, since it avoids the explicit candidate generation step.

Calculate the support for the frequent items and sort them in decreasing order of frequency, as given below:

item: frequency
1: 1 - eliminated
2: 3
3: 3
4: 2
5: 2
6: 2
7: 1 - eliminated

Decreasing order of the frequency
2 (3), 3 (3), 4 (2), 5 (2), 6 (2)

Construction of FP-Tree

A) Transaction 1
 1, 2, 3, 5 > 2 (1), 3 (1), 5 (1)

B) Transaction 2
4, 5, 3, 2 > 2 (2), 3 (2), 4 (1), 5 (1)

C) Transaction 3
6, 7, 4 > 4 (1), 6 (1)

D) Transaction 4
6, 3, 2 > 2 (3), 3 (3), 6 (1)

Once the FP-tree is constructed, frequent item sets are calculated using a depth-first strategy along with a divide-and-conquer mechanism.
This makes the algorithm computationally more efficient and parallelizable (e.g., using map-reduce).
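The tree construction step (A through D above) can be sketched in plain Java. Each transaction is first filtered to frequent items and sorted in decreasing order of global frequency, then inserted into the tree, incrementing the counts along the shared prefix path. This covers only the construction, not the mining, and the class and method names are mine:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FpTreeSketch {

    // A tree node: a running count plus children keyed by item id
    static class Node {
        final Map<Integer, Node> children = new HashMap<>();
        int count;
    }

    // Insert one transaction (already filtered and frequency-sorted),
    // incrementing counts along the shared prefix path.
    static void insert(Node root, List<Integer> transaction) {
        Node node = root;
        for (int item : transaction) {
            node = node.children.computeIfAbsent(item, k -> new Node());
            node.count++;
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        // Transactions with the infrequent items (1, 7) removed and the rest
        // sorted in decreasing order of global frequency: 2, 3, 4, 5, 6
        insert(root, List.of(2, 3, 5));    // transaction 1
        insert(root, List.of(2, 3, 4, 5)); // transaction 2
        insert(root, List.of(4, 6));       // transaction 3
        insert(root, List.of(2, 3, 6));    // transaction 4

        Node n2 = root.children.get(2);
        Node n3 = n2.children.get(3);
        System.out.println("count(2)=" + n2.count + ", count(2->3)=" + n3.count);
        // → count(2)=3, count(2->3)=3, matching step D above
    }
}
```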

Code Example with Apache Spark MLlib

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.fpm.FPGrowth;
    import org.apache.spark.mllib.fpm.FPGrowthModel;

    import com.google.common.base.Joiner;

    public static void main(String[] args) {

        SparkConf conf = new SparkConf().setAppName("Market Basket Analysis");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Items
        String item1 = "egg";
        String item2 = "beer";
        String item3 = "pampers";
        String item4 = "carrot";
        String item5 = "milk";
        String item6 = "perfume";
        String item7 = "tissues";

        // Transactions
        List<String> transaction1 = new ArrayList<>();
        transaction1.add(item1);
        transaction1.add(item2);
        transaction1.add(item3);
        transaction1.add(item5);

        List<String> transaction2 = new ArrayList<>();
        transaction2.add(item4);
        transaction2.add(item5);
        transaction2.add(item3);
        transaction2.add(item2);

        List<String> transaction3 = new ArrayList<>();
        transaction3.add(item6);
        transaction3.add(item7);
        transaction3.add(item4);

        List<String> transaction4 = new ArrayList<>();
        transaction4.add(item6);
        transaction4.add(item3);
        transaction4.add(item2);

        List<List<String>> transactions = new ArrayList<>();
        transactions.add(transaction1);
        transactions.add(transaction2);
        transactions.add(transaction3);
        transactions.add(transaction4);

        // Make the transaction collection a distributed dataset with Spark
        JavaRDD<List<String>> transactionsRDD = sc.parallelize(transactions);

        // Set configurations for FP-Growth
        FPGrowth fpg = new FPGrowth()
          .setMinSupport(0.5)
          .setNumPartitions(10);

        // Generate the model
        FPGrowthModel<String> model = fpg.run(transactionsRDD);

        // Display frequently co-occurring item sets and their frequencies
        for (FPGrowth.FreqItemset<String> itemset : model.freqItemsets().toJavaRDD().collect()) {
            System.out.println("[" + Joiner.on(",").join(itemset.javaItems()) + "], " + itemset.freq());
        }
        sc.close();
    }

3 comments:

  1. "min support" is very well explained.
    could you please elaborate on "num partitions" parameter?
    thank you
