ramangrover29 <ramangrover29@123451ca-8445-de46-9d55-352943316053> in asterixdb

Removing commented content from pom.xml that was used to avoid running the integration tests; the commenting was done temporarily as mvn install was not working.

git-svn-id: https://hyracks.googlecode.com/svn/branches/hyracks_dev_next@1102 123451ca-8445-de46-9d55-352943316053

Modified TreeIndexDropOperator to use the IIndex interface instead of ITreeIndex

git-svn-id: https://hyracks.googlecode.com/svn/branches/hyracks_dev_next@1101 123451ca-8445-de46-9d55-352943316053
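
A minimal sketch of the interface change above; the class and import path (from the hyracks-storage-am-common module) are assumptions, not the actual operator:

    import edu.uci.ics.hyracks.storage.am.common.api.IIndex;

    public class TreeIndexDropSketch {
        // Previously typed as ITreeIndex; dropping an index needs nothing
        // tree-specific, so the more general IIndex interface suffices.
        private final IIndex index;

        public TreeIndexDropSketch(IIndex index) {
            this.index = index;
        }

        public IIndex getIndex() {
            return index;
        }
    }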

added test case for hadoop compatibility layer

git-svn-id: https://hyracks.googlecode.com/svn/branches/hyracks_hadoop_compat_changes@480 123451ca-8445-de46-9d55-352943316053

    • -4 / +22  /hyracks-examples/text-example/pom.xml
    • -23 / +0  /hyracks-examples/text-example/textapp/.project
    • -13052 / +0  /hyracks-examples/text-example/textapp/data/file1.txt
    • -10216 / +0  /hyracks-examples/text-example/textapp/data/file2.txt
    • -155 / +0  /hyracks-examples/text-example/textapp/pom.xml
    … 9 more files in changeset.
1) Made changes to pom.xml for hadoopcompatapp so that the working directory is appropriately set for the CC and NCs. 2) Refactored code.

git-svn-id: https://hyracks.googlecode.com/svn/branches/hyracks_hadoop_compat_changes@479 123451ca-8445-de46-9d55-352943316053

    • -8 / +18  /hyracks-dataflow-hadoop/.settings/org.eclipse.jdt.core.prefs
refactored code in HadoopWriteOperatorDescriptor

git-svn-id: https://hyracks.googlecode.com/svn/branches/hyracks_hadoop_compat_changes@475 123451ca-8445-de46-9d55-352943316053

Refactored code in compatibility layer to support submission of jobs against existing applications + made minor changes in hadoop operators

git-svn-id: https://hyracks.googlecode.com/svn/branches/hyracks_hadoop_compat_changes@460 123451ca-8445-de46-9d55-352943316053

Made changes in HadoopAdapter (the part responsible for converting a JobConf into a JobSpec) in accordance with the new mechanism of configuring partition constraints for operators

git-svn-id: https://hyracks.googlecode.com/svn/branches/hyracks_scheduling@307 123451ca-8445-de46-9d55-352943316053
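
A hedged sketch of the constraint mechanism referenced above: locations are registered against the job specification through PartitionConstraintHelper (a hyracks-dataflow-std utility; the exact package prefix in this branch is assumed). The fragment below assumes a JobSpecification 'spec' and operator descriptors 'mapper' and 'sorter' created elsewhere; node controller ids are hypothetical:

    // Pin each mapper partition to the node holding its HDFS split,
    // and give placement-free operators only a partition count.
    String[] mapperLocations = { "nc1", "nc2" };  // hypothetical NC ids
    PartitionConstraintHelper.addAbsoluteLocationConstraint(spec, mapper, mapperLocations);
    PartitionConstraintHelper.addPartitionCountConstraint(spec, sorter, mapperLocations.length);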

Modified HadoopWriterOperatorDescriptor. The operator previously created sequence files by opening an FSDataOutputStream. Though this results in a correctly created sequence file, it is better to obtain a SequenceFile writer from the outputFormat.

git-svn-id: https://hyracks.googlecode.com/svn/trunk@202 123451ca-8445-de46-9d55-352943316053

Modified HadoopWriterOperatorDescriptor. The operator previously created sequence files by opening an FSDataOutputStream. Though this results in a correctly created sequence file, it is better to obtain a SequenceFile writer from the outputFormat.

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@202 123451ca-8445-de46-9d55-352943316053

External Hadoop clients like Pig and Hive use intermediate output formats. Earlier, the write operator opened an FSDataOutputStream and pushed in the bytes. This works fine, but not with some custom output formats that write to local storage rather than HDFS. In order to be compatible with such custom formats, we must get the writer from the custom format. This check-in ensures that the writer writes in a manner consistent with the custom output format.

git-svn-id: https://hyracks.googlecode.com/svn/trunk@191 123451ca-8445-de46-9d55-352943316053

External Hadoop clients like Pig and Hive use intermediate output formats. Earlier, the write operator opened an FSDataOutputStream and pushed in the bytes. This works fine, but not with some custom output formats that write to local storage rather than HDFS. In order to be compatible with such custom formats, we must get the writer from the custom format. This check-in ensures that the writer writes in a manner consistent with the custom output format.

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@191 123451ca-8445-de46-9d55-352943316053

    • -1 / +259  /hyracks-dataflow-hadoop/.settings/org.eclipse.jdt.core.prefs
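
The approach described in the two check-ins above, sketched against the old-style org.apache.hadoop.mapred API: obtain the RecordWriter from the job's configured OutputFormat instead of opening an FSDataOutputStream directly, so non-HDFS output formats keep working. The class, key/value types, and file name below are placeholders:

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.OutputFormat;
    import org.apache.hadoop.mapred.RecordWriter;
    import org.apache.hadoop.mapred.Reporter;

    public class OutputFormatWriteSketch {
        @SuppressWarnings("unchecked")
        public static void write(JobConf conf, Text key, Text value) throws IOException {
            // Let the job's own OutputFormat create the writer; a custom
            // format may write somewhere other than HDFS.
            OutputFormat<Text, Text> outputFormat =
                    (OutputFormat<Text, Text>) conf.getOutputFormat();
            FileSystem fs = FileSystem.get(conf);
            RecordWriter<Text, Text> writer =
                    outputFormat.getRecordWriter(fs, conf, "part-00000", Reporter.NULL);
            writer.write(key, value);
            writer.close(Reporter.NULL);
        }
    }
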
Fixed issue related to initialization of Job Conf in HadoopReducer

git-svn-id: https://hyracks.googlecode.com/svn/trunk@190 123451ca-8445-de46-9d55-352943316053

Fixed issue related to initialization of Job Conf in HadoopReducer

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@190 123451ca-8445-de46-9d55-352943316053

fixed issue related to initialization of the JobConf instance before calling configure; used ReflectionUtils to create instances of the mapper and reducer classes.

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@188 123451ca-8445-de46-9d55-352943316053

fixed issue related to initialization of the JobConf instance before calling configure; used ReflectionUtils to create instances of the mapper and reducer classes.

git-svn-id: https://hyracks.googlecode.com/svn/trunk@188 123451ca-8445-de46-9d55-352943316053
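
A small sketch of the fix described above, using real Hadoop APIs: ReflectionUtils.newInstance configures JobConfigurable instances with the JobConf passed in, so configure(conf) runs against a fully initialized configuration. The surrounding class is illustrative:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.util.ReflectionUtils;

    public class InstantiateSketch {
        @SuppressWarnings("rawtypes")
        public static void create(JobConf conf) {
            // newInstance(...) invokes configure(conf) on JobConfigurable
            // classes, so the JobConf is applied before first use.
            Mapper mapper = ReflectionUtils.newInstance(conf.getMapperClass(), conf);
            Reducer reducer = ReflectionUtils.newInstance(conf.getReducerClass(), conf);
        }
    }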

Made changes to support the org.apache.hadoop.mapreduce library in addition to the org.apache.hadoop.mapred library. The new library is used in the Hadoop client community, notably by Pig and Mahout. To be compatible with Hadoop, this change is mandatory.

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@187 123451ca-8445-de46-9d55-352943316053

Made changes to support the org.apache.hadoop.mapreduce library in addition to the org.apache.hadoop.mapred library. The new library is used in the Hadoop client community, notably by Pig and Mahout. To be compatible with Hadoop, this change is mandatory.

git-svn-id: https://hyracks.googlecode.com/svn/trunk@187 123451ca-8445-de46-9d55-352943316053

The compatibility layer now supports use of 'org.apache.hadoop.mapreduce' packages

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@185 123451ca-8445-de46-9d55-352943316053

The compatibility layer now supports use of 'org.apache.hadoop.mapreduce' packages

git-svn-id: https://hyracks.googlecode.com/svn/trunk@185 123451ca-8445-de46-9d55-352943316053

Hadoop operators currently do not support 'org.apache.hadoop.mapreduce.*' types and hence cannot run MR jobs referencing those types. In order to be compatible, we need to support them. This change adds support for the mapreduce libraries. The changes are spread across all Hadoop operators. The compatibility layer also changes in order to support the mapreduce package.

git-svn-id: https://hyracks.googlecode.com/svn/trunk@184 123451ca-8445-de46-9d55-352943316053

Hadoop operators currently do not support 'org.apache.hadoop.mapreduce.*' types and hence cannot run MR jobs referencing those types. In order to be compatible, we need to support them. This change adds support for the mapreduce libraries. The changes are spread across all Hadoop operators. The compatibility layer also changes in order to support the mapreduce package.

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@184 123451ca-8445-de46-9d55-352943316053
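
How the two APIs can coexist in one operator, sketched with real Hadoop calls: JobConf records which API a job uses via getUseNewMapper(), and the operator branches accordingly. The surrounding class is hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.ReflectionUtils;

    public class DualApiSketch {
        public static Object createMapper(JobConf conf)
                throws IOException, ClassNotFoundException {
            if (conf.getUseNewMapper()) {
                // New-style job: class recorded via org.apache.hadoop.mapreduce.Job
                Class<?> mapperClass =
                        new org.apache.hadoop.mapreduce.Job(conf).getMapperClass();
                return ReflectionUtils.newInstance(mapperClass, conf);
            }
            // Old-style job: class recorded via JobConf.setMapperClass(...)
            return ReflectionUtils.newInstance(conf.getMapperClass(), conf);
        }
    }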

Currently FileSplit stores a java.io.File as a field and the host housing the file as a string. This is appropriate for a local file system but not for HDFS, since the path contained inside the File is incorrectly normalized. FileSplit should be constructed just from the node and the file path. Added a path (String) field to FileSplit.

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@183 123451ca-8445-de46-9d55-352943316053

Currently FileSplit stores a java.io.File as a field and the host housing the file as a string. This is appropriate for a local file system but not for HDFS, since the path contained inside the File is incorrectly normalized. FileSplit should be constructed just from the node and the file path. Added a path (String) field to FileSplit.

git-svn-id: https://hyracks.googlecode.com/svn/trunk@183 123451ca-8445-de46-9d55-352943316053
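
A simplified sketch of the revised FileSplit shape described above (the real class keeps more than these two fields): storing the path as a plain String avoids the local-filesystem normalization that java.io.File applies, which mangles HDFS paths:

    import java.io.Serializable;

    public class FileSplitSketch implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String nodeName;  // host housing the file
        private final String path;      // new: raw path string, HDFS-safe

        public FileSplitSketch(String nodeName, String path) {
            this.nodeName = nodeName;
            this.path = path;
        }

        public String getNodeName() { return nodeName; }
        public String getPath() { return path; }
    }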

checking in bug fix in the PreClustered Group operator: close() on the aggregator was not being called, which prevented the reducer from closing

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@178 123451ca-8445-de46-9d55-352943316053

checking in bug fix in the PreClustered Group operator: close() on the aggregator was not being called, which prevented the reducer from closing

git-svn-id: https://hyracks.googlecode.com/svn/trunk@178 123451ca-8445-de46-9d55-352943316053
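
The essence of the fix as an illustrative fragment (member names are hypothetical, not the exact Hyracks ones): the operator's close() must also close the aggregator so the final group is flushed and the downstream reducer can observe end-of-input:

    // Fragment; assumes the operator holds an aggregator and a frame writer.
    @Override
    public void close() throws HyracksDataException {
        aggregator.close();  // was missing: finalizes and emits the last group
        writer.close();      // downstream (the reducer) can now close as well
    }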

Modified the compatibility layer to create the JobSpec for an MR job using the HadoopMapper in self-read mode. The jobSpec for an MR job now has 4 operators: Mapper --M:N--> Sorter (external) --1:1--> Reducer --1:1--> Writer. This makes the HadoopReadOperator redundant, but it is not deleted as it is a useful operator in other scenarios.

The compatibility layer uses the Hadoop Mapper in 'dependent mode' when it is forming a pipeline of MR jobs, where the mapper cannot get its input on its own.

git-svn-id: https://hyracks.googlecode.com/svn/trunk@177 123451ca-8445-de46-9d55-352943316053

Modified the compatibility layer to create the JobSpec for an MR job using the HadoopMapper in self-read mode. The jobSpec for an MR job now has 4 operators: Mapper --M:N--> Sorter (external) --1:1--> Reducer --1:1--> Writer. This makes the HadoopReadOperator redundant, but it is not deleted as it is a useful operator in other scenarios.

The compatibility layer uses the Hadoop Mapper in 'dependent mode' when it is forming a pipeline of MR jobs, where the mapper cannot get its input on its own.

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@177 123451ca-8445-de46-9d55-352943316053

    • -1 / +1  /hyracks-hadoop-compat/.settings/org.eclipse.jdt.core.prefs
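
A hedged sketch of the four-operator plan described above, using the hyracks-dataflow-std connector names of this era; constructor arguments for the operators themselves are elided and 'partitionerFactory' (a tuple partition computer factory) is a placeholder:

    // Fragment; 'mapper', 'sorter', 'reducer', and 'writer' are operator
    // descriptors constructed elsewhere.
    JobSpecification spec = new JobSpecification();
    spec.connect(new MToNHashPartitioningConnectorDescriptor(spec, partitionerFactory),
            mapper, 0, sorter, 0);   // Mapper --M:N--> Sorter (external)
    spec.connect(new OneToOneConnectorDescriptor(spec), sorter, 0, reducer, 0);
    spec.connect(new OneToOneConnectorDescriptor(spec), reducer, 0, writer, 0);
    spec.addRoot(writer);
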
modified HadoopMapper operator to work in two modes: 1) SelfReadMode: the mapper reads its input directly from HDFS instead of receiving it from a separate read operator. 2) DependentMode: the mapper is no longer a source operator and requires input to be fed by another operator (e.g. a reducer in a chain of MR jobs). For operators A and B connected by a one-to-one connector, A and B can be fused into a single operator. This change makes the HadoopReadOperator redundant; it is not deleted here because it is useful for reading from HDFS in other scenarios.

Modified AbstractHadoopReadOperator to take the input arity as a constructor argument. The input arity was earlier assumed to be 1 for Map and Reduce, but is 0 for Map in SelfReadMode.

Modified Reducer to pass the inputArity to the base class constructor.

git-svn-id: https://hyracks.googlecode.com/svn/trunk@176 123451ca-8445-de46-9d55-352943316053

modified HadoopMapper operator to work in two modes: 1) SelfReadMode: the mapper reads its input directly from HDFS instead of receiving it from a separate read operator. 2) DependentMode: the mapper is no longer a source operator and requires input to be fed by another operator (e.g. a reducer in a chain of MR jobs). For operators A and B connected by a one-to-one connector, A and B can be fused into a single operator. This change makes the HadoopReadOperator redundant; it is not deleted here because it is useful for reading from HDFS in other scenarios.

Modified AbstractHadoopReadOperator to take the input arity as a constructor argument. The input arity was earlier assumed to be 1 for Map and Reduce, but is 0 for Map in SelfReadMode.

Modified Reducer to pass the inputArity to the base class constructor.

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@176 123451ca-8445-de46-9d55-352943316053
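
The two modes, reduced to the input-arity decision they imply (class and field names here are illustrative, not the actual operator's):

    public class MapperModeSketch {
        private final boolean selfRead;

        public MapperModeSketch(boolean selfRead) {
            this.selfRead = selfRead;
        }

        public int getInputArity() {
            // SelfReadMode: a source operator reading HDFS directly -> no inputs.
            // DependentMode: fed by an upstream operator (e.g. a reducer) -> one input.
            return selfRead ? 0 : 1;
        }
    }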

removed log messages

git-svn-id: https://hyracks.googlecode.com/svn/trunk/hyracks@165 123451ca-8445-de46-9d55-352943316053