Select columns whose names contain a string in PySpark

I have a PySpark DataFrame with a lot of columns, and I want to select the ones whose names contain a certain string, plus a few other columns. For example:

df.columns = ['hello_world','hello_country','hello_everyone','byebye','ciao','index']

I want to select the ones which contain 'hello' and also the column named 'index', so the result will be:

['hello_world','hello_country','hello_everyone','index']

I want something like df.select('hello*','index')

Thanks in advance :)

EDIT:

I found a quick way to solve it, so I answered my own question, Q&A style. If someone sees my solution and can provide a better one, I would appreciate it.

I've found a quick and elegant way:

selected = [s for s in df.columns if 'hello' in s]+['index']
df.select(selected)

With this solution I can add more columns without editing the for loop that Ali AzG suggested.
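
If the same pattern is needed in several places, the list comprehension above could be wrapped in a small helper. This is only a sketch: the function name select_names and its extra parameter are made up for illustration and are not part of PySpark. It works on a plain list of names, so it can be tried without a Spark session.

```python
def select_names(columns, substring, extra=()):
    """Return the names containing `substring`, followed by any `extra`
    names that exist in `columns` and were not already picked."""
    picked = [c for c in columns if substring in c]
    picked += [c for c in extra if c in columns and c not in picked]
    return picked


# Same column names as in the question.
columns = ['hello_world', 'hello_country', 'hello_everyone', 'byebye', 'ciao', 'index']

selected = select_names(columns, 'hello', extra=['index'])
print(selected)  # ['hello_world', 'hello_country', 'hello_everyone', 'index']

# With a real DataFrame (assumed to be named df):
# df.select(select_names(df.columns, 'hello', extra=['index']))
```

The extra check against columns is a design choice: it silently skips names that are missing or already selected, instead of letting df.select fail on an unknown column. Drop it if you would rather get an error.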

You can also try the colRegex function, introduced in Spark 2.3, which lets you specify the column name as a regular expression.
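
A rough sketch of how that might look for the columns in the question. The pattern is my own, and the preview uses Python's re module on the assumption that colRegex matches the pattern against the whole column name; check the result against your Spark version. The actual Spark call is left as a comment so the snippet runs without a cluster.

```python
import re

# Same column names as in the question.
columns = ['hello_world', 'hello_country', 'hello_everyone', 'byebye', 'ciao', 'index']

# "contains hello, or is exactly index"
pattern = r"(.*hello.*|index)"

# Preview which names the pattern covers, using a whole-name match
# (assumption: colRegex also matches against the full column name).
matched = [c for c in columns if re.fullmatch(pattern, c)]
print(matched)

# colRegex expects the regex wrapped in backticks.
colregex_arg = "`" + pattern + "`"
print(colregex_arg)

# With a real DataFrame (assumed to be named df, Spark 2.3+):
# df.select(df.colRegex(colregex_arg))
```

Note that the columns come back in DataFrame order, not in the order the alternatives appear in the pattern.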

This sample code does what you want:

hello_cols = []

# keep every column whose name contains 'index' or 'hello'
for c in df.columns:
    if 'index' in c or 'hello' in c:
        hello_cols.append(c)

df.select(*hello_cols)

Comments
  • Great solution. Don't you need * before selected?
  • Thanks! No, I don't :)
  • Thanks, I fixed an error in your code and it worked.
  • @Antonio Manrique You're right. I wrote my code in Python 2.7. Please accept my answer if it was helpful.
  • I will give it an upvote! But I've found a better option for what I am doing; I'll post it as an answer and accept it. Thank you so much!