DistantSupervisionSyntheticFilter

java.lang.Object
- weka.filters.Filter
  - weka.filters.SimpleFilter
    - weka.filters.SimpleBatchFilter
      - weka.filters.unsupervised.attribute.TweetToFeatureVector
        - weka.filters.unsupervised.attribute.DistantSupervisionSyntheticFilter

All Implemented Interfaces:

java.io.Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler

Direct Known Subclasses:

ASA, PTCM
```
public abstract class DistantSupervisionSyntheticFilter
extends TweetToFeatureVector
```
An abstract filter that creates polarity-labeled instances from unlabeled tweets and a seed polarity lexicon by generating synthetic instances.

Version:

$Revision: 1 $

Author:

Felipe Bravo-Marquez (fbravoma@waikato.ac.nz)

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field	Description
`static java.lang.String`	`CLUSTPREFIX`	The prefix for cluster-based attributes.
`static java.lang.String`	`LEXICON_FOLDER_NAME`	Default path to where lexicons are stored.
`static java.lang.String`	`RESOURCES_FOLDER_NAME`	Default path to where resources are stored.
`static java.lang.String`	`UNIPREFIX`	The prefix for unigram attributes.

Constructor Summary

Constructors
Constructor	Description
`DistantSupervisionSyntheticFilter()`

Method Summary

Methods
Modifier and Type	Method	Description
`it.unimi.dsi.fastutil.objects.Object2IntMap<java.lang.String>`	`calculateDocVec(java.util.List<java.lang.String> tokens)`	Calculates tweet vectors from a list of tokens
`java.io.File`	`getLexicon()`
`int`	`getMinAttDocs()`
`java.lang.String`	`getPolarityAttName()`
`java.lang.String`	`getPolarityAttNegValName()`
`java.lang.String`	`getPolarityAttPosValName()`
`int`	`getRandomSeed()`
`java.io.File`	`getWordClustFile()`
`boolean`	`isCreateClustAtts()`
`boolean`	`isCreateWordAtts()`
`Instances`	`mapTargetInstance(Instances inp)`	Maps tweets from the second batch into instances that are compatible with the ones generated
`void`	`setCreateClustAtts(boolean createClustAtts)`
`void`	`setCreateWordAtts(boolean createWordAtts)`
`void`	`setLexicon(java.io.File lexicon)`
`void`	`setMinAttDocs(int minAttDocs)`
`void`	`setPolarityAttName(java.lang.String polarityAttName)`
`void`	`setPolarityAttNegValName(java.lang.String polarityAttNegValName)`
`void`	`setPolarityAttPosValName(java.lang.String polarityAttPosValName)`
`void`	`setRandomSeed(int randomSeed)`
`void`	`setWordClustFile(java.io.File wordClustFile)`

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from class weka.filters.SimpleBatchFilter
batchFinished, input

Methods inherited from class weka.filters.SimpleFilter
globalInfo, setInputFormat

Methods inherited from class weka.filters.unsupervised.attribute.TweetToFeatureVector
allowAccessToFullInputFormat, getCapabilities, getOptions, getStemmer, getStopwordsHandler, getTextIndex, getTokenizer, isReduceRepeatedLetters, isStandarizeUrlsUsers, isToLowerCase, listOptions, setOptions, setReduceRepeatedLetters, setStandarizeUrlsUsers, setStemmer, setStopwordsHandler, setTextIndex, setTokenizer, setToLowerCase

Field Detail
- RESOURCES_FOLDER_NAME
```
public static java.lang.String RESOURCES_FOLDER_NAME
```
  Default path to where resources are stored.
- LEXICON_FOLDER_NAME
```
public static java.lang.String LEXICON_FOLDER_NAME
```
  Default path to where lexicons are stored.
- UNIPREFIX
```
public static java.lang.String UNIPREFIX
```
  The prefix for unigram attributes.
- CLUSTPREFIX
```
public static java.lang.String CLUSTPREFIX
```
  The prefix for cluster-based attributes.

Constructor Detail
- DistantSupervisionSyntheticFilter
```
public DistantSupervisionSyntheticFilter()
```

Method Detail

mapTargetInstance
```
public Instances mapTargetInstance(Instances inp)
```
Maps tweets from the second batch into instances that are compatible with the ones generated

Parameters:

inp - input Instances

Returns:

the converted Instances
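
One common way to make a later batch compatible with an attribute space fixed by an earlier batch is to project each document onto the known attribute indices and drop anything unseen. The sketch below shows that idea with plain Java collections. The class `TargetMappingSketch`, the method `project`, and the drop-unseen rule are assumptions for illustration, not the behaviour of `mapTargetInstance` itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not the AffectiveTweets implementation.
final class TargetMappingSketch {

    // Projects a sparse document vector onto a fixed attribute index.
    // Attributes that are absent from the index are silently dropped (assumed rule).
    static double[] project(Map<String, Integer> attIndex, Map<String, Integer> docVec) {
        double[] values = new double[attIndex.size()];
        for (Map.Entry<String, Integer> entry : docVec.entrySet()) {
            Integer position = attIndex.get(entry.getKey());
            if (position != null) {
                values[position] = entry.getValue();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Attribute space assumed to have been fixed by the first batch.
        Map<String, Integer> attIndex = new LinkedHashMap<>();
        attIndex.put("WORD-good", 0);
        attIndex.put("WORD-bad", 1);

        // A second-batch tweet that contains one known and one unknown attribute.
        Map<String, Integer> docVec = new LinkedHashMap<>();
        docVec.put("WORD-bad", 3);
        docVec.put("WORD-unseen", 5);

        System.out.println(java.util.Arrays.toString(project(attIndex, docVec)));
    }
}
```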

calculateDocVec
```
public it.unimi.dsi.fastutil.objects.Object2IntMap<java.lang.String> calculateDocVec(java.util.List<java.lang.String> tokens)
```
Calculates tweet vectors from a list of tokens

Parameters:

tokens - a tokenized tweet

Returns:

a mapping between attribute names and values
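
As a rough illustration of a mapping between attribute names and values, the sketch below counts each token and stores the count under a prefixed attribute name. It uses `java.util.Map` in place of fastutil's `Object2IntMap` so that it runs without extra libraries. The class `DocVecSketch`, the prefix value `"WORD-"`, and the counting rule are assumptions; the actual value of `UNIPREFIX` and the filter's weighting scheme are not stated here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the AffectiveTweets implementation.
final class DocVecSketch {

    // Assumed prefix value; the real UNIPREFIX constant may differ.
    static final String UNI_PREFIX = "WORD-";

    // Counts each token and stores the count under a prefixed attribute name.
    static Map<String, Integer> calculateDocVec(List<String> tokens) {
        Map<String, Integer> docVec = new LinkedHashMap<>();
        for (String token : tokens) {
            docVec.merge(UNI_PREFIX + token, 1, Integer::sum);
        }
        return docVec;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("good", "day", "good");
        System.out.println(calculateDocVec(tokens));
    }
}
```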

getMinAttDocs
```
@OptionMetadata(displayName="minAttDocs",
                description="Minimum frequency of a sparse attribute to be considered in the attribute space.",
                commandLineParamName="M",
                commandLineParamSynopsis="-M <int>",
                displayOrder=6)
public int getMinAttDocs()
```

setMinAttDocs
```
public void setMinAttDocs(int minAttDocs)
```
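
A minimum-frequency threshold of this kind is typically applied by counting the documents in which each attribute occurs and discarding attributes below the threshold. The sketch below shows that pruning step. The class `MinAttDocsSketch`, the method `selectAttributes`, and the inclusive comparison (`>=`) are assumptions, since the exact rule is not given in the option description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: not the AffectiveTweets implementation.
final class MinAttDocsSketch {

    // Keeps the attributes that occur in at least minAttDocs documents (assumed inclusive).
    static Set<String> selectAttributes(List<Set<String>> docs, int minAttDocs) {
        // Document frequency: number of documents that contain each attribute.
        Map<String, Integer> docFreq = new TreeMap<>();
        for (Set<String> doc : docs) {
            for (String att : doc) {
                docFreq.merge(att, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() >= minAttDocs) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
                new HashSet<>(Arrays.asList("good", "day")),
                new HashSet<>(Arrays.asList("good", "night")),
                new HashSet<>(Arrays.asList("bad", "day")));
        // With a threshold of 2, only attributes seen in two or more documents remain.
        System.out.println(selectAttributes(docs, 2));
    }
}
```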

isCreateWordAtts
```
@OptionMetadata(displayName="createWordAtts",
                description="True for creating unigram attributes.",
                commandLineParamIsFlag=true,
                commandLineParamName="W",
                commandLineParamSynopsis="-W",
                displayOrder=7)
public boolean isCreateWordAtts()
```

setCreateWordAtts
```
public void setCreateWordAtts(boolean createWordAtts)
```

setCreateClustAtts
```
@OptionMetadata(displayName="createClustAtts",
                description="True for creating attributes using word clusters",
                commandLineParamIsFlag=true,
                commandLineParamName="C",
                commandLineParamSynopsis="-C",
                displayOrder=8)
public void setCreateClustAtts(boolean createClustAtts)
```

isCreateClustAtts
```
public boolean isCreateClustAtts()
```

getWordClustFile
```
@OptionMetadata(displayName="wordClustFile",
                description="The file containing the word clusters.",
                commandLineParamName="H",
                commandLineParamSynopsis="-H <string>",
                displayOrder=9)
public java.io.File getWordClustFile()
```

setWordClustFile
```
public void setWordClustFile(java.io.File wordClustFile)
```

getLexicon
```
@OptionMetadata(displayName="lexicon",
                description="The file containing a lexicon in ARFF format with word polarities.",
                commandLineParamName="lex",
                commandLineParamSynopsis="-lex <string>",
                displayOrder=10)
public java.io.File getLexicon()
```

setLexicon
```
public void setLexicon(java.io.File lexicon)
```

getRandomSeed
```
@OptionMetadata(displayName="randomseed",
                description="The random seed number. \t default: 1",
                commandLineParamName="R",
                commandLineParamSynopsis="-R <int>",
                displayOrder=11)
public int getRandomSeed()
```

setRandomSeed
```
public void setRandomSeed(int randomSeed)
```
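
A fixed seed makes any random sampling repeatable across runs. The sketch below illustrates this with `java.util.Random`: two samplers built from the same seed draw the same indices. How the filter actually consumes the seed is not documented here, so the class `RandomSeedSketch` and the method `sampleIndices` are purely illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: not the AffectiveTweets implementation.
final class RandomSeedSketch {

    // Draws sampleSize indices in [0, populationSize) from a generator seeded with seed.
    static int[] sampleIndices(int seed, int populationSize, int sampleSize) {
        Random rng = new Random(seed);
        int[] picks = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            picks[i] = rng.nextInt(populationSize);
        }
        return picks;
    }

    public static void main(String[] args) {
        int[] first = sampleIndices(1, 100, 5);
        int[] second = sampleIndices(1, 100, 5);
        // Same seed, same sequence of draws.
        System.out.println(Arrays.equals(first, second));
    }
}
```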

getPolarityAttName
```
@OptionMetadata(displayName="polarityAttName",
                description="The lexicon attribute name with the word polarities. \t default: polarity",
                commandLineParamName="polatt",
                commandLineParamSynopsis="-polatt <string>",
                displayOrder=12)
public java.lang.String getPolarityAttName()
```

setPolarityAttName
```
public void setPolarityAttName(java.lang.String polarityAttName)
```

getPolarityAttPosValName
```
@OptionMetadata(displayName="polarityAttPosValName",
                description="The lexicon attribute value name for positive words. \t default: positive",
                commandLineParamName="posval",
                commandLineParamSynopsis="-posval <String>",
                displayOrder=17)
public java.lang.String getPolarityAttPosValName()
```

setPolarityAttPosValName
```
public void setPolarityAttPosValName(java.lang.String polarityAttPosValName)
```

getPolarityAttNegValName
```
@OptionMetadata(displayName="polarityAttNegValName",
                description="The lexicon attribute value name for negative words. \t default: negative",
                commandLineParamName="negval",
                commandLineParamSynopsis="-negval <String>",
                displayOrder=18)
public java.lang.String getPolarityAttNegValName()
```

setPolarityAttNegValName
```
public void setPolarityAttNegValName(java.lang.String polarityAttNegValName)
```
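
The command-line flags listed in the option metadata above can be assembled into an option array of the kind that an `OptionHandler`'s `setOptions(String[])` accepts. The sketch below only builds that array and does not depend on Weka. The class `OptionsSketch`, the method `buildOptions`, and the lexicon path are hypothetical, and whether a concrete subclass such as ASA or PTCM accepts exactly this combination is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles an option array from the documented flags.
final class OptionsSketch {

    static String[] buildOptions(int minAttDocs, boolean wordAtts, boolean clustAtts,
                                 String lexiconPath, int randomSeed) {
        List<String> opts = new ArrayList<>();
        opts.add("-M");
        opts.add(Integer.toString(minAttDocs));
        if (wordAtts) {
            opts.add("-W"); // flag option: present means true
        }
        if (clustAtts) {
            opts.add("-C"); // flag option: present means true
        }
        opts.add("-lex");
        opts.add(lexiconPath);
        opts.add("-R");
        opts.add(Integer.toString(randomSeed));
        // Documented defaults for the lexicon attribute name and its values.
        opts.add("-polatt");
        opts.add("polarity");
        opts.add("-posval");
        opts.add("positive");
        opts.add("-negval");
        opts.add("negative");
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The lexicon path is a placeholder, not a file shipped with the package.
        String[] options = buildOptions(10, true, false, "lexicons/example.arff", 1);
        System.out.println(String.join(" ", options));
    }
}
```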
