python – How do I use the REST Data Source API with access tokens?


I’m trying to get data back from an API without having to loop through an RDD, and I found the REST Data Source library on GitHub: https://github.com/sourav-mazumder/Data-Science-Extensions/tree/master/spark-datasource-rest

The issue I’m running into is how to include an access token alongside the parameter mapping. Among the supported options, the description of `url` says: "This is the uri of the target micro service. You can also provide the common parameters (those that don’t vary with each API call) in this url. This is a mandatory parameter." I’d like to do this in Python/PySpark if possible.
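One reading of that description is that anything constant across calls, possibly including the token, can be baked into `url` itself. A minimal sketch of building such a URL, assuming (hypothetically) that the target API accepts the token as an `access_token` query parameter, which RFC 6750 permits but many APIs do not support. `add_common_params` is an illustrative helper, not part of the library:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_common_params(url, common_params):
    """Return `url` with `common_params` merged into its query string.

    Hypothetical helper: parameters already in `url` are kept, and entries
    in `common_params` with the same name override them.
    """
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update(common_params)
    return urlunparse(parts._replace(query=urlencode(query)))


# Assumption: the API reads the bearer token from an `access_token` query parameter.
print(add_common_params('https://my_final_endpoint/search?format=json',
                        {'access_token': 'abc123'}))
# https://my_final_endpoint/search?format=json&access_token=abc123
```

The resulting string would then be passed as the `url` option. Putting a token in a URL exposes it in logs and proxies, so this only makes sense if the API explicitly supports it.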

So I have code structured like this:

Getting the access token:

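```python
import requests

token_url = 'https://restful-website/oauth/token'
search_url = 'https://my_final_endpoint'

client_id = my_client_id          # placeholder: replace with the real client ID
client_secret = my_client_secret  # placeholder: replace with the real client secret

# OAuth2 client-credentials grant; ID and secret are sent via HTTP Basic auth.
data = {'grant_type': 'client_credentials'}

# verify=False disables TLS certificate checks (kept from my current setup).
access_token_response = requests.post(token_url, data=data, verify=False,
                                      allow_redirects=False,
                                      auth=(client_id, client_secret))
access_token_response.raise_for_status()
tokens = access_token_response.json()
# tokens is a dict, so it is indexed with [] rather than called with ().
api_call_headers = {'Authorization': 'Bearer ' + tokens['access_token']}
```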
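The token-parsing step can be checked on its own without calling the token endpoint. Because `json.loads` returns a dict, the token has to be read with `tokens['access_token']`; calling `tokens('access_token')` raises `TypeError: 'dict' object is not callable`. A small sketch, assuming a standard OAuth2 token response with an `access_token` field; `bearer_header` and the sample body are made up for illustration:

```python
import json


def bearer_header(token_response_text):
    """Build an Authorization header from an OAuth2 token response body."""
    tokens = json.loads(token_response_text)
    # Index the dict; do not call it.
    return {'Authorization': 'Bearer ' + tokens['access_token']}


# Hypothetical response body, shaped like a client-credentials token response.
sample = '{"access_token": "abc123", "token_type": "bearer", "expires_in": 3600}'
print(bearer_header(sample))  # {'Authorization': 'Bearer abc123'}
```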

Then reading the data through the REST data source:

```python
# spark is an active SparkSession; file_directory points at the CSV of input parameters.
expert_input_Df = spark.read.csv(file_directory, inferSchema=True, header=True)
expert_input_Df.createOrReplaceTempView('test_expert')

expert_prms_Soda = {'url': search_url,
                    'input': 'test_expert',   # temp view holding the input parameters
                    'method': 'GET',
                    'readTimeout': '10000',
                    'connectionTimeout': '2000',
                    'partitions': '10'}

expert_returnDf = (spark.read
                   .format('org.apache.dsext.spark.datasource.rest.RestDataSource')
                   .options(**expert_prms_Soda)
                   .load())
expert_returnDf.printSchema()
expert_returnDf.createOrReplaceTempView('expert_table')
spark.sql('select name, address, city_name from expert_table').show()
```
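My understanding from the README is that the data source makes one API call per row of the `input` view, using the column names as the API’s parameter names. For GET, a rough pure-Python picture of that mapping follows, assuming the row values end up as query parameters appended to `url`. `build_request_urls` and the sample rows are hypothetical; this is not the library’s actual code:

```python
from urllib.parse import urlencode


def build_request_urls(base_url, rows):
    """Hypothetical illustration: one GET URL per input row.

    Each row is a dict of column name -> value; column names become query
    parameter names and None values are skipped.
    """
    sep = '&' if '?' in base_url else '?'
    urls = []
    for row in rows:
        params = {k: v for k, v in row.items() if v is not None}
        urls.append(base_url + sep + urlencode(params) if params else base_url)
    return urls


# Made-up rows standing in for the contents of the test_expert view.
rows = [{'name': 'Ann', 'city_name': 'Austin'}, {'name': 'Bo', 'city_name': None}]
for u in build_request_urls('https://my_final_endpoint?access_token=abc123', rows):
    print(u)
```

In this picture the token is a "common parameter" that sits in the base URL, while the per-row values vary with each call.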

What I’d like to know is how to get this working with the access token, and which common parameters are needed to make it work. I’m stuck at this point and could use some guidance, please. The data source options don’t seem to accept headers, so I can’t pass the token that way, and I don’t want to fall back to an RDD just to do this.