Thread.currentThread.join()

From Programmer, For Programmers


Hadoop

Crunching Data with Apache Crunch – Part 6 – Word Co-occurrence

December 27, 2012 | February 25, 2015 | ashish | 2 Comments | Apache Crunch, Hadoop

In this post we look at solving the word co-occurrence problem using Crunch. Please refer to the other posts in the series for the basics: Part 1, Part 2, Part 3, Part 4, Part 5. We shall use Crunch to implement the Pairs approach described in Data-Intensive Text Processing with MapReduce. We […]



Crunching Data with Apache Crunch – Part 5 – Inverted Index

November 22, 2012 | February 25, 2015 | ashish | Leave a comment | Apache Crunch, Hadoop

In this post we look at creating an inverted index using Crunch. Please refer to the other posts in the series for the basics: Part 1, Part 2, Part 3, Part 4. This example is an extension of the Word Count example. There are various examples of creating an inverted index using Hadoop on the net. Here is the […]



Crunching Data with Apache Crunch – Part 4

November 20, 2012 | February 25, 2015 | ashish | Leave a comment | Apache Crunch, Hadoop, Java

So far we have looked at the basics of Crunch. In this post, let's look at the join feature of Crunch. Please refer to the other posts in the series for the basics: Part 1, Part 2 and Part 3. Let's prepare some background on the data before we jump into code. For the purpose of join, I […]



Crunching Data with Apache Crunch – Part 3

November 16, 2012 | February 25, 2015 | ashish | Leave a comment | Apache Crunch, Hadoop

In Part 2 of the series, we saw how to find the top 100 words. Let's explore filtering the data. From the word list, we may be interested in removing certain words. For this post, we shall remove “the” from the list of words. This can be done while splitting text as well, but would […]



Crunching Data with Apache Crunch – Part 2

November 15, 2012 | February 25, 2015 | ashish | Leave a comment | Apache Crunch, Hadoop

In Part 1, we saw the word count example. Let's build on top of it. A very common use case of the Word Count example is to find the top 100 words. Using MapReduce, you would use a secondary sort to get this. Let's try to achieve the same functionality using Crunch. Requirement: Find Top 100 […]



Crunching Data with Apache Crunch – Part 1

November 13, 2012 | February 25, 2015 | ashish | Leave a comment | Apache Crunch, Hadoop

Apache Crunch (incubating) is a Java library for writing, testing, and running MapReduce pipelines, based on Google’s FlumeJava. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run. This multipart series takes a deep dive into this new tool. The first […]



Ashish Paliwal. All rights reserved.